In [14]:
import os
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())
groq_api_key = os.environ["GROQ_API_KEY"]

In [15]:
from langchain_groq import ChatGroq
llamaChatModel = ChatGroq(
    model="llama3-70b-8192"
)

In [16]:
messages = [
    ("system", "You are a helpful assistant"),
    ("human", "Tell me about Lonavala.")
]

response = llamaChatModel.invoke(messages)
print(response.content)

Lonavala! A popular hill station in Maharashtra, India, known for its breathtaking scenery, pleasant climate, and proximity to Mumbai and Pune. Here's what you need to know about Lonavala:

**Location**: Lonavala is situated in the Pune district of Maharashtra, approximately 96 km (60 miles) southeast of Mumbai and 64 km (40 miles) northwest of Pune.

**Climate**: Lonavala has a mild climate throughout the year, with temperatures ranging from 12°C to 35°C (54°F to 95°F). The best time to visit is during the monsoon season (June to September) when the hills are lush green and the waterfalls are at their peak.

**Tourist Attractions**:

1. **Tiger's Leap**: A popular viewpoint with a cliff that resembles a tiger's leap, offering stunning views of the valley.
2. **Bhushi Dam**: A beautiful dam surrounded by lush greenery, perfect for a picnic or a relaxing walk.
3. **Lonavala Lake**: A serene lake with a beautiful garden and a walking path, ideal for boating and relaxation.
4. **Rajmachi 

### Data Loader

In [17]:
# TXT Data Loading....

from langchain_community.document_loaders import TextLoader
loader = TextLoader("data/be-good.txt")

loaded_data = loader.load()

In [18]:
loaded_data

[Document(metadata={'source': 'data/be-good.txt'}, page_content='Be good\n\nApril 2008(This essay is derived from a talk at the 2008 Startup School.)About a month after we started Y Combinator we came up with the\nphrase that became our motto: Make something people want.  We\'ve\nlearned a lot since then, but if I were choosing now that\'s still\nthe one I\'d pick.Another thing we tell founders is not to worry too much about the\nbusiness model, at least at first.  Not because making money is\nunimportant, but because it\'s so much easier than building something\ngreat.A couple weeks ago I realized that if you put those two ideas\ntogether, you get something surprising.  Make something people want.\nDon\'t worry too much about making money.  What you\'ve got is a\ndescription of a charity.When you get an unexpected result like this, it could either be a\nbug or a new discovery.  Either businesses aren\'t supposed to be\nlike charities, and we\'ve proven by reductio ad absurdum that one

In [19]:
# CSV Data Loader....

from langchain_community.document_loaders import CSVLoader
loader = CSVLoader("data/Street_Tree_List.csv")

loaded_data = loader.load()

In [20]:
#loaded_data

In [None]:
# HTML File Loader....

from langchain_community.document_loaders import UnstructuredHTMLLoader
loader = UnstructuredHTMLLoader("data/100-startups.html")
loaded_data = loader.load()
# Skipped (not executed): loading all of the page's data takes a long time.

In [22]:
# PDF Data Loader....

from langchain_community.document_loaders import PyPDFLoader
loader = PyPDFLoader("data/5pages.pdf")
loaded_pdf_data = loader.load_and_split()

In [23]:
loaded_pdf_data

[Document(metadata={'source': 'data/5pages.pdf', 'page': 0}, page_content='Page 1 of 4 PDF Files \nScan – Create – Reduce File Size  \n \n \nIt is recommended that you purchase an Adobe Acrobat product that \nallows you to read, create and manipulate PDF documents.  Go to http://www.adobe.com/products/acrobat/matrix.html\n to compare \nAdobe products and features –Adobe  Acrobat Standard is sufficient. \n \n \nScanning Documents \n \nYou should only have to scan docu ments that are not electronic, and \nwhen you are unable to create a PDF using PDFMaker or the Print \nCommand from the applicat ion you are using.   \n \nSignature Pages If you have a document such as a CV that requires a signature on a page only print the page that re quires the signature –printing the \nentire document and scanning it is not\n necessary or desired.  Once you \nsign and scan the signature page you can combine it with the original \ndocument using the Create PDF From Multiple Files feature. \n Scanner Set

In [24]:
loaded_pdf_data[0].page_content

'Page 1 of 4 PDF Files \nScan – Create – Reduce File Size  \n \n \nIt is recommended that you purchase an Adobe Acrobat product that \nallows you to read, create and manipulate PDF documents.  Go to http://www.adobe.com/products/acrobat/matrix.html\n to compare \nAdobe products and features –Adobe  Acrobat Standard is sufficient. \n \n \nScanning Documents \n \nYou should only have to scan docu ments that are not electronic, and \nwhen you are unable to create a PDF using PDFMaker or the Print \nCommand from the applicat ion you are using.   \n \nSignature Pages If you have a document such as a CV that requires a signature on a page only print the page that re quires the signature –printing the \nentire document and scanning it is not\n necessary or desired.  Once you \nsign and scan the signature page you can combine it with the original \ndocument using the Create PDF From Multiple Files feature. \n Scanner Settings Before scanning documents rememb er to make certain that the \nfollo

In [25]:
# Wikipedia Data Loader....

from langchain_community.document_loaders import WikipediaLoader
loader = WikipediaLoader(query="Tesla", load_max_docs=1)
loaded_wikipedia_data = loader.load()[0].page_content

In [26]:
loaded_wikipedia_data

'Tesla, Inc. ( TESS-lə or  TEZ-lə) is an American multinational automotive and clean energy company. Headquartered in Austin, Texas, it designs, manufactures and sells battery electric vehicles (BEVs), stationary battery energy storage devices from home to grid-scale, solar panels and solar shingles, and related products and services.\nTesla was founded in July 2003 by Martin Eberhard and Marc Tarpenning as Tesla Motors. Its name is a tribute to inventor and electrical engineer Nikola Tesla. In February 2004, Elon Musk joined as Tesla\'s largest shareholder; in 2008, he was named chief executive officer. In 2008, the company began production of its first car model, the Roadster sports car, followed by the Model S sedan in 2012, the Model X SUV in 2015, the Model 3 sedan in 2017, the Model Y crossover in 2020, the Tesla Semi truck in 2022 and the Cybertruck pickup truck in 2023. In June 2021, the Model 3 became the first electric car to sell 1 million units globally. In 2023, the Model 

In [27]:
from langchain_core.prompts import ChatPromptTemplate

chat_template = ChatPromptTemplate.from_messages(
    [
        ("human", "Answer this {question}, here is some extra {context}"),
    ]
)

# The template only has {question} and {context} placeholders,
# so the unused `name` argument is dropped.
messages = chat_template.format_messages(
    question="Tell me about Tesla in short.",
    context=loaded_wikipedia_data
)

In [28]:
response = llamaChatModel.invoke(messages)

In [29]:
response.content

"Here's a brief overview of Tesla:\n\n**What:** Tesla is an American multinational automotive and clean energy company.\n\n**Founded:** July 2003 by Martin Eberhard and Marc Tarpenning.\n\n**Named after:** Nikola Tesla, an inventor and electrical engineer.\n\n**Headquarters:** Austin, Texas.\n\n**Products:**\n\n* Battery electric vehicles (BEVs): Roadster, Model S, Model X, Model 3, Model Y, Cybertruck, and Tesla Semi.\n* Stationary battery energy storage devices.\n* Solar panels and solar shingles.\n\n**Notable milestones:**\n\n* 2008: Elon Musk becomes CEO.\n* 2012: Model S sedan launched.\n* 2015: Model X SUV launched.\n* 2017: Model 3 sedan launched.\n* 2020: Model Y crossover launched.\n* 2022: Tesla Semi truck launched.\n* 2023: Cybertruck pickup truck launched.\n* 2023: Model Y becomes best-selling vehicle globally.\n* 2024: Model Y becomes best-selling BEV in history.\n\n**Achievements:**\n\n* World's most valuable automaker since 2020.\n* First electric car to sell 1 million u

## RAG: Retrieval-Augmented Generation

##### Load large data assets and ask the LLM questions about them.


### What if the loaded data is too large? We will use RAG.

##### When you load a document, you end up with strings. Sometimes those strings are too large to fit into the model's context window. In those cases, we use the RAG technique:

###### 1. Split the document into small chunks.

###### 2. Transform the text chunks into numeric vectors (embeddings).

###### 3. Load the embeddings into a vector database (aka vector store).

###### 4. Embed the question and retrieve the chunks whose embeddings are most relevant to it.

###### 5. Send the retrieved chunks, together with the question, to the LLM so it can write the response.
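
The five steps above can be sketched in plain Python. This is a toy illustration under stated assumptions, not how a real RAG stack is built. It splits on whitespace-separated words rather than characters or tokens. Word-count vectors stand in for a real embedding model, and an in-memory list stands in for a vector database. All function names (`split_into_chunks`, `embed`, `build_store`, `retrieve`, `build_prompt`) are hypothetical helpers written for this example. The prompt reuses the template string from the cell above.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, chunk_size=8, overlap=2):
    """Step 1: cut the text into overlapping windows of words."""
    words = text.split()
    step = chunk_size - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + chunk_size]) for i in range(0, last_start, step)]


def embed(text):
    """Step 2 (toy): a word-count vector stands in for a real embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_store(chunks):
    """Step 3 (toy): a list of (chunk, vector) pairs stands in for a vector store."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(store, question, k=1):
    """Step 4: embed the question and keep the k most similar chunks."""
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine_similarity(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, chunks):
    """Step 5: the retrieved text (not the vectors) goes into the prompt."""
    context = "\n---\n".join(chunks)
    return f"Answer this {question}, here is some extra {context}"


document = (
    "Lonavala is a hill station in Maharashtra. "
    "Tesla designs and sells battery electric vehicles. "
    "Y Combinator's motto is: make something people want."
)
question = "What vehicles does Tesla sell?"

chunks = split_into_chunks(document)
store = build_store(chunks)
top_chunks = retrieve(store, question, k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

The resulting `prompt` string could then be passed to `llamaChatModel.invoke(...)` in place of the full document, so only the most relevant chunk has to fit in the context window.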
