###### **Creating the RAG pipeline using LangChain, OpenAI and Vector DB (FAISS)**

FAISS stands for Facebook AI Similarity Search, a library for efficient similarity search over dense vectors.


In [17]:
# Required libraries for building RAG
! pip install langchain openai tiktoken rapidocr-onnxruntime


Defaulting to user installation because normal site-packages is not writeable


In [18]:
! pip install langchain-community

Defaulting to user installation because normal site-packages is not writeable


In [19]:
# Load the OpenAI API key from a .env file and expose it to the environment
import os
from dotenv import load_dotenv

load_dotenv()
api_key = os.getenv("OPENAI_API_KEY")
if api_key is None:
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")
os.environ["OPENAI_API_KEY"] = api_key


Stages of creating a RAG pipeline
1. Data Ingestion
2. Data Retrieval
3. Data Generation


### **Stage 01 : Data Ingestion**
Data ingestion is the stage where the source data is loaded, split into chunks, embedded, and stored in a vector database.


In [20]:
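import requests
# The loaders and vector stores live in the langchain-community package installed above
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS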


In [21]:
url = "https://frontiernerds.com/files/state_of_the_union.txt"
     

In [22]:
response = requests.get(url, timeout=30)
response.raise_for_status()  # stop early if the download failed

In [23]:
response


<Response [200]>

In [26]:
raw_data = response.text

In [27]:
print(raw_data)

Madame Speaker, Vice President Biden, members of Congress, distinguished guests, and fellow Americans:

Our Constitution declares that from time to time, the president shall give to Congress information about the state of our union. For 220 years, our leaders have fulfilled this duty. They have done so during periods of prosperity and tranquility. And they have done so in the midst of war and depression; at moments of great strife and great struggle.

It's tempting to look back on these moments and assume that our progress was inevitable, that America was always destined to succeed. But when the Union was turned back at Bull Run and the Allies first landed at Omaha Beach, victory was very much in doubt. When the market crashed on Black Tuesday and civil rights marchers were beaten on Bloody Sunday, the future was anything but certain. These were times that tested the courage of our convictions and the strength of our union. And despite all our divisions and disagreements, our hesitatio

In [66]:
# Write the response text to a file for later use
with open("State_of_union.txt", "w", encoding="utf-8") as f:
    f.write(raw_data)

In [38]:
# Read the file back with the same encoding it was written in
loader = TextLoader("State_of_union.txt", encoding="utf-8")

In [39]:
document = loader.load()

In [40]:
print(document[0].page_content)

Madame Speaker, Vice President Biden, members of Congress, distinguished guests, and fellow Americans:

Our Constitution declares that from time to time, the president shall give to Congress information about the state of our union. For 220 years, our leaders have fulfilled this duty. They have done so during periods of prosperity and tranquility. And they have done so in the midst of war and depression; at moments of great strife and great struggle.

It's tempting to look back on these moments and assume that our progress was inevitable, that America was always destined to succeed. But when the Union was turned back at Bull Run and the Allies first landed at Omaha Beach, victory was very much in doubt. When the market crashed on Black Tuesday and civil rights marchers were beaten on Bloody Sunday, the future was anything but certain. These were times that tested the courage of our convictions and the strength of our union. And despite all our divisions and disagreements, our hesitatio

##### Chunking of the data 

In [41]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [42]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500,chunk_overlap=50)
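
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size chunking with overlap in plain Python. It is a simplification, not LangChain's implementation: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraph and line breaks, so its chunks are often shorter than `chunk_size` and end on natural boundaries. The function name `sliding_chunks` and the sample string are hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# A 1200-character sample string made of repeating digits.
sample = "".join(str(i % 10) for i in range(1200))
chunks = sliding_chunks(sample)
print([len(c) for c in chunks])  # [500, 500, 300]
```

The overlap means the last 50 characters of one chunk reappear at the start of the next, which helps keep a sentence that straddles a boundary retrievable from at least one chunk.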

In [43]:
text_chunks = text_splitter.split_documents(document)

In [44]:
print(text_chunks[3].page_content)

Again, we are tested. And again, we must answer history's call.

One year ago, I took office amid two wars, an economy rocked by severe recession, a financial system on the verge of collapse and a government deeply in debt. Experts from across the political spectrum warned that if we did not act, we might face a second depression. So we acted immediately and aggressively. And one year later, the worst of the storm has passed.


In [45]:
! pip install langchain-openai

Defaulting to user installation because normal site-packages is not writeable


In [46]:
# Import the embedding model and the FAISS vector store
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

In [49]:
# Create the OpenAI embeddings client
embeddings = OpenAIEmbeddings(openai_api_key=api_key)
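
An embedding model maps each chunk to a fixed-length vector so that similar texts end up close together. The real vectors here come from OpenAI's learned model; the sketch below only illustrates the text-to-vector interface with a toy hashed bag-of-words. `toy_embed` and its `dim` parameter are hypothetical names, and the resulting vectors carry no semantic meaning.

```python
import hashlib
import math


def toy_embed(text: str, dim: int = 16) -> list[float]:
    """Map text to a unit-length vector by hashing each word into one of dim buckets."""
    vector = [0.0] * dim
    for word in text.lower().split():
        # sha256 gives a hash that is stable across runs, unlike the built-in hash().
        bucket = int(hashlib.sha256(word.encode("utf-8")).hexdigest(), 16) % dim
        vector[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vector))
    # Normalise to unit length; an empty text stays the zero vector.
    return [x / norm for x in vector] if norm else vector


v = toy_embed("clean energy jobs")
print(len(v))  # 16
```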

In [52]:
! pip install faiss-cpu

Defaulting to user installation because normal site-packages is not writeable
Collecting faiss-cpu
  Downloading faiss_cpu-1.8.0.post1-cp312-cp312-win_amd64.whl.metadata (3.8 kB)
Downloading faiss_cpu-1.8.0.post1-cp312-cp312-win_amd64.whl (14.6 MB)
Installing collected packages: faiss-cpu
Successfully installed faiss-cpu-1.8.0.post1


In [53]:
# Embed the chunks and build a FAISS index from them
vectorstore = FAISS.from_documents(text_chunks, embeddings)
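
`FAISS.from_documents` embeds every chunk and stores the vectors in an index that can be searched by distance. As a rough illustration of what such a search computes, the sketch below does a brute-force nearest-neighbour lookup by Euclidean (L2) distance. The 2-D vectors, the texts, and the `nearest` function are made up for the example; FAISS itself relies on optimized index structures rather than a Python sort.

```python
import math


def nearest(query: list[float], index: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the texts of the k vectors closest to query by Euclidean distance."""
    ranked = sorted(index, key=lambda item: math.dist(query, item[1]))
    return [text for text, _ in ranked[:k]]


# Hand-made 2-D vectors standing in for real embeddings.
toy_index = [
    ("clean energy jobs", [0.9, 0.1]),
    ("housing rebates", [0.1, 0.9]),
    ("energy efficiency", [0.8, 0.3]),
]
print(nearest([1.0, 0.0], toy_index))  # ['clean energy jobs', 'energy efficiency']
```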

In [54]:
retriever = vectorstore.as_retriever()
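
A retriever returns a list of document objects rather than a single string, so the text usually needs to be joined before it goes into a prompt. A minimal sketch of that step follows. It uses a stand-in `Doc` dataclass in place of LangChain's `Document` so the example runs on its own, and `format_docs` is a hypothetical helper name.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Stand-in for a retrieved document; only the text field is modelled."""
    page_content: str


def format_docs(docs: list[Doc]) -> str:
    """Join the text of retrieved documents, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


retrieved = [Doc("Again, we are tested."), Doc("So we acted immediately and aggressively.")]
print(format_docs(retrieved))
```

In a chain, a function like this could be piped after the retriever (for example `retriever | format_docs`) so that the `{context}` placeholder receives plain text instead of the string form of a list of objects.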

In [55]:
from langchain_core.prompts import ChatPromptTemplate

In [56]:
template = """
You are an assistant for question-answering tasks.
Use the following pieces of retrieved context to answer the question.
If you don't know the answer, just say that you don't know.
Use ten sentences maximum and keep the answer concise.
Question: {question}
Context: {context}
Answer:
"""


In [58]:
prompt = ChatPromptTemplate.from_template(template)
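
`ChatPromptTemplate.from_template` treats `{question}` and `{context}` as placeholders that are filled in when the chain runs. The substitution itself can be sketched with plain `str.format`. The shortened `demo_template` and the sample values below are illustrative only, and the real class additionally wraps the result in a chat message.

```python
# A shortened, illustrative version of the prompt with the same two placeholders.
demo_template = (
    "Use the following pieces of retrieved context to answer the question.\n"
    "Question: {question}\n"
    "Context: {context}\n"
    "Answer:"
)

# Placeholder values; in the pipeline these come from the user query and the retriever.
filled = demo_template.format(
    question="What did the speech say about clean energy?",
    context="<retrieved chunks go here>",
)
print(filled)
```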

In [63]:
# Libraries for loading the model and building the chain
from langchain_openai import ChatOpenAI  # chat model wrapper
from langchain_core.runnables import RunnablePassthrough  # passes the input query through unchanged
from langchain_core.output_parsers import StrOutputParser  # returns the model reply as a plain string

In [64]:
# Load the chat model
llm_model = ChatOpenAI(openai_api_key=api_key, model_name="gpt-4o-mini")

In [65]:
# Create the output parser
output_parser = StrOutputParser()

In [66]:
# The retriever fills {context}; the raw query passes through as {question}
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm_model
    | output_parser
)
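
The `|` operator chains steps so that the output of one becomes the input of the next. The sketch below imitates that idea with a tiny hypothetical `Step` class. It is not LangChain's implementation, and the lambda stages are stand-ins for the retriever, prompt, model, and parser.

```python
from typing import Any, Callable


class Step:
    """Wrap a function so steps can be chained with the | operator."""

    def __init__(self, func: Callable[[Any], Any]):
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # Compose: run this step first, then feed its result to the next one.
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value: Any) -> Any:
        return self.func(value)


# Stand-in stages: build the prompt inputs, fill a prompt, fake a model, tidy the text.
gather = Step(lambda q: {"context": "stub context", "question": q})
fill_prompt = Step(lambda d: f"Question: {d['question']}\nContext: {d['context']}")
fake_model = Step(lambda p: f"  (model reply to {len(p.splitlines())} prompt lines)  ")
parse = Step(str.strip)

toy_chain = gather | fill_prompt | fake_model | parse
print(toy_chain.invoke("What is a RAG pipeline?"))  # (model reply to 2 prompt lines)
```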

In [70]:
rag_chain.invoke("What is americas commitment to global issues")

"America's commitment to global issues involves active leadership and engagement to promote security and prosperity for all. The country collaborates through platforms like the G-20 to ensure a sustainable global recovery and supports education and innovation, particularly in Muslim communities. America has transitioned from being a bystander to a leader in combating climate change and is helping developing nations achieve food security while fighting diseases like HIV/AIDS. The U.S. emphasizes the importance of leading in the clean energy economy to maintain global economic leadership. Additionally, America is dedicated to humanitarian efforts, such as aiding Haiti's recovery and supporting educational and human rights initiatives worldwide. This commitment is rooted in the belief that America's fate is interconnected with global well-being and is driven by moral responsibility. Furthermore, the U.S. seeks to expand trade relationships aggressively to create jobs domestically while en

In [72]:
rag_chain.invoke("What are the measures and initiatives taken by U.S. on housing , clean energy and government spending? ")

'The U.S. has taken several measures and initiatives focused on housing, clean energy, and government spending. Efforts include providing incentives for energy efficiency and clean energy projects, which are seen as essential for future economic leadership. The government has also funded infrastructure projects, such as a high-speed railroad in Tampa, Florida, aimed at job creation and improved logistics. Additionally, there are initiatives to put more Americans to work in building clean energy facilities. Rebates are being offered to homeowners to enhance energy efficiency, thereby supporting clean energy jobs. The focus is on not only addressing domestic needs but also ensuring the U.S. remains competitive globally in clean energy. The approach emphasizes the urgency of action, as other nations are making significant investments in their economies. Overall, these initiatives aim to stimulate economic growth while addressing climate change concerns.'