
# Document Loaders

In [3]:
from langchain_community.document_loaders import TextLoader

loader = TextLoader('sample.txt')
documents = loader.load()
len(documents)

1
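`TextLoader` reads the whole file into a single `Document`, which is why `len(documents)` is 1. Below is a minimal pure-Python sketch of that behaviour. It uses a hypothetical `SimpleDocument` dataclass in place of LangChain's `Document` and is only an illustration, not LangChain's implementation. (The real `TextLoader` also takes an `encoding` argument, which the sketch mirrors.)

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for LangChain's Document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text_file(path: str, encoding: str = "utf-8") -> list[SimpleDocument]:
    """Read an entire text file into ONE document, recording where it came from."""
    text = Path(path).read_text(encoding=encoding)
    return [SimpleDocument(page_content=text, metadata={"source": path})]
```

Called as `load_text_file("sample.txt")`, it would return a one-element list whose metadata is `{'source': 'sample.txt'}`, matching the metadata seen in the chunks below.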

# Document Transformers

In [8]:
from langchain.text_splitter import CharacterTextSplitter

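# Split on spaces into chunks of at most 300 characters; neighbouring chunks share up to 50 characters
text_splitter = CharacterTextSplitter(separator=" ", chunk_size=300, chunk_overlap=50)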
texts = text_splitter.split_documents(documents)
len(texts)

129

In [9]:
texts[:2]

[Document(metadata={'source': 'sample.txt'}, page_content='Dhoni" redirects here. For other uses, see Dhoni (disambiguation).\nLieutenant colonel\nMahendra Singh Dhoni\n\nDhoni in 2023\nPersonal details\nBorn\t7 July 1981 (age 43)\nRanchi, Bihar (present-day Jharkhand), India\nHeight\t5 ft 9 in (175 cm)[1]\nSpouse\tSakshi Dhoni\nAwards\t\n Padma Bhushan (2018)\n Padma Shri'),
 Document(metadata={'source': 'sample.txt'}, page_content='Dhoni\nAwards\t\n Padma Bhushan (2018)\n Padma Shri (2009)\nMajor Dhyan Chand Khel Ratna Award (2008)\nNickname(s)\tMahi, Thala, Captain Cool[2]\nMilitary service\nAllegiance\t India\nBranch/service\t Indian Army\nYears of service\t2011–present\nRank\t Lieutenant colonel\nUnit\t Territorial Army\nPersonal')]
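Note how the second chunk begins with `Dhoni\nAwards ... Padma Shri`, which is text repeated from the end of the first chunk. That repetition is the `chunk_overlap`. The idea can be sketched in plain Python: split on the separator, greedily pack pieces into chunks of at most `chunk_size` characters, and seed each new chunk with trailing pieces (at most `chunk_overlap` characters) from the previous one. This is a simplified re-implementation written for illustration, not LangChain's code, so exact chunk boundaries (and the count of 129) may differ.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int,
                       separator: str = " ") -> list[str]:
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size
    characters; each new chunk re-uses trailing pieces of the previous chunk
    (at most chunk_overlap characters) so context is not cut at boundaries."""
    pieces = [p for p in text.split(separator) if p]
    chunks: list[str] = []
    current: list[str] = []

    def length(parts: list[str]) -> int:
        # characters in the parts plus the separators that would join them
        return sum(len(p) for p in parts) + len(separator) * max(len(parts) - 1, 0)

    for piece in pieces:
        if current and length(current + [piece]) > chunk_size:
            chunks.append(separator.join(current))
            # drop pieces from the front until the remainder fits the overlap
            # budget and still leaves room for the incoming piece
            while current and (length(current) > chunk_overlap
                               or length(current + [piece]) > chunk_size):
                current.pop(0)
        current.append(piece)
    if current:
        chunks.append(separator.join(current))
    return chunks


print(split_with_overlap("a b c d e f g h", chunk_size=5, chunk_overlap=2))
# → ['a b c', 'c d e', 'e f g', 'g h']
```

A single piece longer than `chunk_size` still becomes its own (oversized) chunk. This is why splitting on `" "` works well for prose: words are short.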

# Text Embedding Models

In [11]:
from langchain_huggingface import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
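An embedding model maps each chunk to a fixed-length vector (`all-MiniLM-L6-v2` produces 384-dimensional vectors). Texts with similar meaning end up with vectors pointing in similar directions. A common way to compare two vectors is cosine similarity, $\cos\theta = \frac{a \cdot b}{\lVert a\rVert\,\lVert b\rVert}$. The sketch below uses made-up 3-dimensional vectors purely for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(theta) = (a . b) / (|a| * |b|); 1.0 means the vectors point the same way."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors (invented numbers, not real model output)
query = [0.9, 0.1, 0.0]
close = [0.8, 0.2, 0.1]
far = [0.0, 0.1, 0.9]
print(round(cosine_similarity(query, close), 3), round(cosine_similarity(query, far), 3))
```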

# Vector Stores

In [15]:
from langchain_community.vectorstores import Chroma
db = Chroma.from_documents(texts, embeddings)

In [14]:
# Optional: peek at the raw vectors Chroma stored (relies on the private _collection attribute)
# db._collection.get(include=['embeddings'])

# Retrievers

In [16]:
# Return the 4 chunks most similar to each query
retriever = db.as_retriever(search_kwargs={"k": 4})

In [17]:
retriever

VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x176c88310>, search_kwargs={'k': 4})
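Conceptually, `Chroma.from_documents` embeds and stores every chunk, and the retriever embeds the question and returns the `k` nearest stored chunks. Below is a hypothetical, minimal in-memory version of that loop. `TinyVectorStore` and `toy_embed` are invented for illustration: the keyword-count "embedding" stands in for `all-MiniLM-L6-v2`, and ranking uses cosine similarity. Chroma's own default distance is L2, which gives the same ranking when vectors are unit-length.

```python
import math
from typing import Callable


class TinyVectorStore:
    """Hypothetical minimal vector store: keeps (text, vector) pairs in a list
    and ranks them by cosine similarity to the query vector."""

    def __init__(self, embed: Callable[[str], list[float]]):
        self.embed = embed
        self.items: list[tuple[str, list[float]]] = []

    def add_texts(self, texts: list[str]) -> None:
        for text in texts:
            self.items.append((text, self.embed(text)))

    def search(self, query: str, k: int = 4) -> list[str]:
        q = self.embed(query)

        def cosine(v: list[float]) -> float:
            dot = sum(x * y for x, y in zip(q, v))
            norms = math.sqrt(sum(x * x for x in q)) * math.sqrt(sum(y * y for y in v))
            return dot / norms if norms else 0.0  # zero vectors score 0

        ranked = sorted(self.items, key=lambda item: cosine(item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def toy_embed(text: str) -> list[float]:
    """Made-up 'embedding': counts of a few keywords, standing in for a real model."""
    words = text.lower().split()
    return [float(words.count(w)) for w in ("born", "debut", "married")]


store = TinyVectorStore(toy_embed)
store.add_texts([
    "Dhoni was born on 7 July 1981 in Ranchi",
    "Dhoni made his debut for India in 2004",
    "Dhoni married Sakshi in 2010",
])
print(store.search("when was dhoni born", k=1))
# → ['Dhoni was born on 7 July 1981 in Ranchi']
```

A linear scan like this is fine for a handful of chunks. Real vector stores use approximate nearest-neighbour indexes so that search stays fast as the collection grows.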

# Question 1

In [42]:
question = input("Enter your question: ")

In [43]:
question

"what is dhoni's date of birth"

In [44]:
# docs = retriever.get_relevant_documents(question)
docs = retriever.invoke(question)

In [45]:
docs

[Document(metadata={'source': 'sample.txt'}, page_content='rank of Lieutenant colonel in the Parachute Regiment of the Indian Territorial Army which was presented to him by the Indian Army in 2011. He is one of the most popular cricketers in the world.\n\nEarly life\nDhoni was born on 7 July 1981 in Ranchi, Bihar (now in Jharkhand) in a Hindu Rajput family to'),
 Document(metadata={'source': 'sample.txt'}, page_content='Dhoni" redirects here. For other uses, see Dhoni (disambiguation).\nLieutenant colonel\nMahendra Singh Dhoni\n\nDhoni in 2023\nPersonal details\nBorn\t7 July 1981 (age 43)\nRanchi, Bihar (present-day Jharkhand), India\nHeight\t5 ft 9 in (175 cm)[1]\nSpouse\tSakshi Dhoni\nAwards\t\n Padma Bhushan (2018)\n Padma Shri'),
 Document(metadata={'source': 'sample.txt'}, page_content='2010, 2016 and was a member of the title winning squad in 2018.\n\nBorn in Ranchi, Dhoni made his first class debut for Bihar in 1999. He made his debut for the Indian cricket team on 23 December 2

In [23]:
docs[1].page_content

'of being a successful leader.[176][177] Dhoni is also known for his cool-headed demeanor on the field which has earned him the monicker "Captain cool".[178]\n\nPersonal life\nDhoni married Sakshi Singh Rawat on 4 July 2010 in Dehradun.[179][180] Dhoni and his wife have a daughter, Ziva Dhoni who was'

In [46]:
relevant_text = ''
for doc in docs:
    relevant_text += doc.page_content
print(relevant_text)

rank of Lieutenant colonel in the Parachute Regiment of the Indian Territorial Army which was presented to him by the Indian Army in 2011. He is one of the most popular cricketers in the world.

Early life
Dhoni was born on 7 July 1981 in Ranchi, Bihar (now in Jharkhand) in a Hindu Rajput family toDhoni" redirects here. For other uses, see Dhoni (disambiguation).
Lieutenant colonel
Mahendra Singh Dhoni

Dhoni in 2023
Personal details
Born	7 July 1981 (age 43)
Ranchi, Bihar (present-day Jharkhand), India
Height	5 ft 9 in (175 cm)[1]
Spouse	Sakshi Dhoni
Awards	
 Padma Bhushan (2018)
 Padma Shri2010, 2016 and was a member of the title winning squad in 2018.

Born in Ranchi, Dhoni made his first class debut for Bihar in 1999. He made his debut for the Indian cricket team on 23 December 2004 in an ODI against Bangladesh and played his first test a year later against Sri Lanka. In 2007, hecricket on 15 August 2020 as he had not played any international cricket since India's loss in the 2019 
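Because the loop concatenates chunks with no separator, the end of one chunk is fused onto the start of the next (`family toDhoni"`, `Padma Shri2010`, `hecricket`). A small sketch that joins chunk texts with a blank line instead keeps the boundaries visible to the LLM. `join_chunks` is a hypothetical helper, and the two strings are short excerpts of the retrieved chunks above.

```python
def join_chunks(chunks: list[str], separator: str = "\n\n") -> str:
    """Join retrieved chunk texts with a separator so the end of one chunk
    is never glued onto the start of the next."""
    return separator.join(chunk.strip() for chunk in chunks)


# Short excerpts of two retrieved chunks from the output above
chunks = ["in a Hindu Rajput family to", 'Dhoni" redirects here.']
print(join_chunks(chunks))
```

In the notebook this would be `relevant_text = join_chunks([doc.page_content for doc in docs])`.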

In [47]:
from dotenv import load_dotenv
import os

load_dotenv()
# os.environ['HUGGINGFACE_HUB_API_KEY']

True
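`load_dotenv()` copies the variables in a local `.env` file into `os.environ`. If the key is missing, `os.environ['HUGGINGFACE_HUB_API_KEY']` in the next cell fails with a bare `KeyError`. A small sketch of failing early with a clearer message follows. `require_env` is a hypothetical helper, not part of `python-dotenv`.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, raising a descriptive error
    if it is missing or empty instead of a bare KeyError later on."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file or environment")
    return value
```

Usage: `token = require_env("HUGGINGFACE_HUB_API_KEY")`, then pass `huggingfacehub_api_token=token`.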

In [49]:
from langchain_huggingface import HuggingFaceEndpoint

# Repository ID of the Gemma 2B model on the Hugging Face Hub
repo_id = "google/gemma-2b"

# Set up a Hugging Face inference endpoint for the Gemma 2B model;
# a near-zero temperature keeps the answer close to deterministic
llm = HuggingFaceEndpoint(
    repo_id=repo_id, temperature=0.01, huggingfacehub_api_token=os.environ['HUGGINGFACE_HUB_API_KEY']
)

In [50]:
from langchain.prompts import PromptTemplate

template = """Answer the question in one line using the following information:

```{information}```.


*** Question ***

{question}

*** Answer ***"""

prompt = PromptTemplate.from_template(template=template)
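By default, `PromptTemplate.from_template` treats `{information}` and `{question}` as f-string-style placeholders. Formatting the prompt is therefore essentially a string substitution, shown below with Python's own `str.format` on the same template text (`build_prompt` is a hypothetical helper). With `str.format`, braces inside the substituted values, such as in retrieved text, are inserted literally rather than re-parsed.

```python
# Same text as the notebook's template, written as concatenated strings
TEMPLATE = (
    "Answer the question in one line using the following information:\n\n"
    "```{information}```.\n\n\n"
    "*** Question ***\n\n"
    "{question}\n\n"
    "*** Answer ***"
)


def build_prompt(information: str, question: str) -> str:
    """Fill the two placeholders; only TEMPLATE is parsed for {fields}."""
    return TEMPLATE.format(information=information, question=question)


print(build_prompt("Dhoni was born on 7 July 1981", "what is dhoni's date of birth"))
```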

In [51]:
question

"what is dhoni's date of birth"

In [52]:
relevant_text

'rank of Lieutenant colonel in the Parachute Regiment of the Indian Territorial Army which was presented to him by the Indian Army in 2011. He is one of the most popular cricketers in the world.\n\nEarly life\nDhoni was born on 7 July 1981 in Ranchi, Bihar (now in Jharkhand) in a Hindu Rajput family toDhoni" redirects here. For other uses, see Dhoni (disambiguation).\nLieutenant colonel\nMahendra Singh Dhoni\n\nDhoni in 2023\nPersonal details\nBorn\t7 July 1981 (age 43)\nRanchi, Bihar (present-day Jharkhand), India\nHeight\t5 ft 9 in (175 cm)[1]\nSpouse\tSakshi Dhoni\nAwards\t\n Padma Bhushan (2018)\n Padma Shri2010, 2016 and was a member of the title winning squad in 2018.\n\nBorn in Ranchi, Dhoni made his first class debut for Bihar in 1999. He made his debut for the Indian cricket team on 23 December 2004 in an ODI against Bangladesh and played his first test a year later against Sri Lanka. In 2007, hecricket on 15 August 2020 as he had not played any international cricket since Ind

In [53]:
prompt_formatted_str: str = prompt.format(question=question, information=relevant_text)

In [54]:
prompt_formatted_str

'Answer the question in one line using the following information:\n\n```rank of Lieutenant colonel in the Parachute Regiment of the Indian Territorial Army which was presented to him by the Indian Army in 2011. He is one of the most popular cricketers in the world.\n\nEarly life\nDhoni was born on 7 July 1981 in Ranchi, Bihar (now in Jharkhand) in a Hindu Rajput family toDhoni" redirects here. For other uses, see Dhoni (disambiguation).\nLieutenant colonel\nMahendra Singh Dhoni\n\nDhoni in 2023\nPersonal details\nBorn\t7 July 1981 (age 43)\nRanchi, Bihar (present-day Jharkhand), India\nHeight\t5 ft 9 in (175 cm)[1]\nSpouse\tSakshi Dhoni\nAwards\t\n Padma Bhushan (2018)\n Padma Shri2010, 2016 and was a member of the title winning squad in 2018.\n\nBorn in Ranchi, Dhoni made his first class debut for Bihar in 1999. He made his debut for the Indian cricket team on 23 December 2004 in an ODI against Bangladesh and played his first test a year later against Sri Lanka. In 2007, hecricket on 

In [56]:
response = llm.invoke(prompt_formatted_str)
response.strip()

'7 July 1981'
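The cells above run the three RAG steps by hand: retrieve relevant chunks, augment the prompt with them, and generate an answer. A sketch that wraps them in one function follows. `answer_question`, `fake_retrieve` and `fake_generate` are hypothetical names, and the stubs let it run offline. In the notebook you would pass `retrieve=lambda q: [d.page_content for d in retriever.invoke(q)]`, `generate=llm.invoke` and `template=template`.

```python
from typing import Callable


def answer_question(question: str,
                    retrieve: Callable[[str], list[str]],
                    generate: Callable[[str], str],
                    template: str) -> str:
    """Retrieve -> augment -> generate."""
    context = "\n\n".join(retrieve(question))  # keep chunk boundaries visible
    prompt = template.format(information=context, question=question)
    return generate(prompt).strip()


# Offline stubs standing in for the retriever and the Gemma endpoint
def fake_retrieve(q: str) -> list[str]:
    return ["Dhoni was born on 7 July 1981 in Ranchi"]


def fake_generate(prompt: str) -> str:
    return " 7 July 1981 " if "born on 7 July 1981" in prompt else " unknown "


TEMPLATE = "Context:\n{information}\n\nQuestion: {question}\nAnswer:"
print(answer_question("what is dhoni's date of birth", fake_retrieve, fake_generate, TEMPLATE))
# → 7 July 1981
```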