What is the RAG system?
Defination:
This is called retrieval augmented generation (RAG), as you would retrieve the relevant data and use it as augmented context for the LLM. Instead of relying solely on knowledge derived from the training data, a RAG workflow pulls relevant information and connects static LLMs with real-time data retrieval.

Architecture:
image.png

Why we create a RAG System?
Retrieval systems (RAG) give LLM systems access to factual, 
access-controlled, 
timely information.

RAG REDUCES HALLUCINATION
Example: In the financial services industry, 
providing accurate information on investment options is crucial because it directly impacts customers' purchasing decisions and financial well-being. 
RAG can help ensure that the information generated about stocks, bonds, or mutual funds

COST-EFFECTIVE ALTERNATIVE
Example: Banks often need to assess the creditworthiness of potential borrowers. Fine-tuning pre-trained language models to analyse credit histories can be resource-intensive. RAG architecture offers a cost-effective alternative by retrieving relevant financial data and credit history information from existing databases, combining this with pre-trained language models

CREDIBLE AND ACCURATE RESPONSES
Example: In customer support, providing accurate and helpful responses is essential for maintaining customer trust, as it demonstrates the company's commitment to providing reliable information and support. The RAG technique is able to do this very effectively by retrieving data from catalogues, policies, and past customer interactions to generate context-aware insights, ensuring that customers receive reliable information on product features, returns, and other inquiries.

DOMAIN-SPECIFIC INFORMATION
Example: In the legal industry, clients often require advice specific to their case or jurisdiction because different legal systems have unique rules and regulations, and understanding these nuances is crucial for effective legal representation. RAG can access domain-specific knowledge bases, such as local statutes and case law, to provide tailored information relevant to clients' legal needs.

https://www.advancinganalytics.co.uk/blog/2023/11/7/10-reasons-why-you-need-to-implement-rag-a-game-changer-in-ai

RAG Practical Usecase
Document Question Answering Systems
Conversational agents
Real-time Event Commentary
Content Generation
Personalised Recommendation
Virtual Assistants

Data Ingetion

getting the data in PDF , TXT form , the extract the data 

https://en.wikipedia.org/wiki/State_of_the_Union#:~:text=Though%20the%20language%20of%20the,as%20late%20as%20March%207

In [10]:
import requests
from langchain.document_loaders import TextLoader
from langchain.vectorstores import FAISS

In [11]:
with open("stateofart.txt","r", encoding="utf8") as f:
  data = f.read()

In [12]:
loder=TextLoader('stateofart.txt', encoding="utf8")

In [13]:
document = loder.load()

In [14]:
print(document[0].page_content)

Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.  

Last year COVID-19 kept us apart. This year we are finally together again. 

Tonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. 

With a duty to one another to the American people to the Constitution. 

And with an unwavering resolve that freedom will always triumph over tyranny. 

Six days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. 

He thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. 

He met the Ukrainian people. 

From President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world. 

Groups of citizens blocking tanks with their bodies. Every

Now Chunking the Extracted Data 



In [15]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [24]:
text_splitter=RecursiveCharacterTextSplitter(chunk_size=500,chunk_overlap=50)
text_chunks=text_splitter.split_documents(document)

In [25]:
print(text_chunks[1].page_content)

Six days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. 

He thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. 

He met the Ukrainian people. 

From President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world.


In [26]:
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

In [28]:
from langchain_community.embeddings import OllamaEmbeddings

embeddings = OllamaEmbeddings(model="deepseek-r1:1.5b")
print(embeddings.embed_query("Ollama + DeepSeek test"))

[-1.4444594383239746, 1.9347456693649292, -1.3072381019592285, -1.9882678985595703, 3.29730486869812, -0.9998583197593689, -3.815270185470581, 0.8051474094390869, 1.5039712190628052, 0.8765518069267273, 0.22258146107196808, -8.865545272827148, 2.845285177230835, -2.0232162475585938, -1.98785400390625, -1.475900650024414, 0.7841082215309143, 0.1571357101202011, -2.5848727226257324, -2.33575439453125, -0.9478411078453064, 1.0555870532989502, -0.9778225421905518, -0.7708240747451782, -0.8347837924957275, 1.1695724725723267, -0.6518177390098572, -2.4499003887176514, -0.6799378991127014, 0.13516569137573242, -0.9139566421508789, 0.5363237261772156, 1.4175111055374146, 1.3292540311813354, -0.6849470138549805, -2.2992334365844727, -0.9889945387840271, -1.4526584148406982, 2.4575655460357666, -2.6271166801452637, -0.5448383092880249, 0.6149179935455322, -1.6535392999649048, 0.15375763177871704, -1.7061092853546143, 0.5295663475990295, -0.4703814685344696, 1.0075713396072388, -1.245901942253112

In [29]:
vectorstore=FAISS.from_documents(text_chunks, embeddings)

In [30]:
retriever=vectorstore.as_retriever()

In [48]:
from langchain.prompts import ChatPromptTemplate

In [63]:

template="""You are an assistant for question-answering tasks.
Use the following pieces of retrieved context to answer the question.
If you don't know the answer, just say that you don't know.
Use ten sentences maximum and keep the answer concise.
generate the answer of length of 1000 of words
the user input is : {question}
refer this : {context}
Answer:
"""

In [64]:
prompt=ChatPromptTemplate.from_template(template)

In [65]:
from langchain_community.chat_models import ChatOllama
from langchain.schema.runnable import RunnablePassthrough
from langchain.schema.output_parser import StrOutputParser

# Replace OpenAI with Ollama (DeepSeek)
llm_model = ChatOllama(model="deepseek-r1:14b", temperature=0)  # Adjust model name if different

# Rest of the chain remains the same
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm_model
    | StrOutputParser()
)

In [66]:
answer = rag_chain.invoke("How is the United States supporting Ukraine economically and militarily?")

In [None]:
print(answer)

<think>
Okay, so I need to figure out how the United States is supporting Ukraine economically and militarily based on the provided context. Let me go through each document one by one.

First, Document 8b9b1d06-a3e8-4fc3-92fd-d352818a6923 talks about detecting new variants of COVID-19 and mentions that if necessary, they might deploy new vaccines within 100 days. It also says they'll have stockpiles ready if needed. So, this document is about the pandemic response.

Next, Document b8f6f637-2477-46ac-aa0a-f04c8411032b discusses vaccination rates and hospitalizations. It says 75% of adults are fully vaccinated and hospitalizations have dropped by 77%. They achieved this with free vaccines, treatments, tests, and masks. So, this is about the vaccine program.

Document 9076417b-a03e-4261-bc30-164eb11d0235 talks about border security measures like new scanners at the U.S.-Mexico border and joint patrols with Mexico and Guatemala. They're also setting up dedicated immigration judges to handl

In [68]:
answer2 = rag_chain.invoke("What action is the U.S. taking to address rising gas prices?")
print(answer2)

<think>
Okay, I need to figure out what actions the U.S. is taking to address rising gas prices based on the provided documents. Let me go through each document one by one.

The first document talks about vaccination rates and mask removal, but there's no mention of gas prices or related policies. So probably not relevant here.

The second document discusses preparing for new COVID variants and vaccine deployment. Again, nothing about gas prices. Not helpful for this question.

The third document is about buying American products and the Bipartisan Innovation Act. It mentions investments in manufacturing and technologies but doesn't touch on gas prices or energy policies. So, not directly related either.

The fourth document talks about border security measures like scanners, joint patrols, immigration judges, and supporting partners in South/Central America. Still no connection to gas prices.

Wait a minute, none of the documents provided actually mention anything about rising gas pri

In [74]:
from langchain_community.chat_models import ChatOllama
from langchain.schema.runnable import RunnablePassthrough
from langchain.schema.output_parser import StrOutputParser

# Replace OpenAI with Ollama (DeepSeek)
llm_model = ChatOllama(model="llama2", temperature=0)  # Adjust model name if different

# Rest of the chain remains the same
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm_model
    | StrOutputParser()
)

In [75]:
answer3 = rag_chain.invoke("How is the United States supporting Ukraine economically and militarily?")

In [76]:
print(answer)

<think>
Okay, so I need to figure out how the United States is supporting Ukraine economically and militarily based on the provided documents. Let me start by reading through each document carefully.

The first document talks about preparing for new COVID-19 variants. It mentions deploying vaccines quickly if needed and stockpiling tests, masks, and pills with Congress's funding. But this doesn't seem related to Ukraine at all. So I can probably ignore this one since it's about public health measures in the U.S.

The second document discusses vaccination rates in the U.S., hospitalizations decreasing, and the importance of continuing efforts with Congressional funding. Again, this is about domestic COVID-19 response, not international aid or military support for Ukraine. So I'll set this aside as well.

The third document mentions installing new technology at borders to detect drug smuggling, joint patrols with Mexico and Guatemala to catch traffickers, and supporting partners in South