#**RAG with LangChain**
- Upload PDF
- Read PDF and split text into chunks
- Apply embeddings and store into FAISS DB
- Create Retriever
- Extract the relevant chunks
- Generate Response Using LLM

**FAISS** stands for Facebook AI Similarity Search.

It is an open-source library developed by Meta (Facebook) for efficient similarity search and clustering of dense vectors, such as text embeddings.
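
To make "similarity search" concrete, here is a minimal brute-force sketch in plain Python. It is only an illustration of the idea, not FAISS's API: the function names `l2_distance` and `nearest` and the toy 2-D vectors are assumptions made for this example. FAISS performs the same kind of lookup with optimized index structures so that it scales to very large collections.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(query, vectors, k=2):
    """Return indices of the k vectors closest to the query, nearest first."""
    ranked = sorted(range(len(vectors)), key=lambda i: l2_distance(query, vectors[i]))
    return ranked[:k]

# Toy 2-D "embeddings"; real embeddings have hundreds or thousands of dimensions.
vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]]
print(nearest([1.0, 1.0], vectors, k=2))  # [1, 3]
```

The query matches vector 1 exactly, and vector 3 is the next closest, so those two indices come back first.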

###**Install Dependencies**

In [1]:
!pip install langchain langchain_community pypdf faiss-cpu langchain_openai

Collecting langchain_community
  Downloading langchain_community-0.3.24-py3-none-any.whl.metadata (2.5 kB)
Collecting pypdf
  Downloading pypdf-5.5.0-py3-none-any.whl.metadata (7.2 kB)
Collecting faiss-cpu
  Downloading faiss_cpu-1.11.0-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (4.8 kB)
Collecting langchain_openai
  Downloading langchain_openai-0.3.18-py3-none-any.whl.metadata (2.3 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain_community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting pydantic-settings<3.0.0,>=2.4.0 (from langchain_community)
  Downloading pydantic_settings-2.9.1-py3-none-any.whl.metadata (3.8 kB)
Collecting httpx-sse<1.0.0,>=0.4.0 (from langchain_community)
  Downloading httpx_sse-0.4.0-py3-none-any.whl.metadata (9.0 kB)
Collecting langchain-core<1.0.0,>=0.3.58 (from langchain)
  Downloading langchain_core-0.3.61-py3-none-any.whl.metadata (5.8 kB)
Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json<0.7,>=0.

###**Retrieve API key from Secrets and set it as an environment variable**

In [2]:
# Retrieve the API key from Colab's secrets
from google.colab import userdata
OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')

In [3]:
# Set OPENAI_API_KEY as an ENV
import os
os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY
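
The two cells above only work inside Google Colab. If the notebook is run elsewhere, one possible fallback is to reuse a key that is already in the environment and prompt for it otherwise. The helper name `resolve_api_key` and its `ask` parameter are hypothetical, introduced only for this sketch.

```python
import os
from getpass import getpass

def resolve_api_key(environ, ask=getpass):
    """Return the key from `environ` if set; otherwise prompt for it and store it."""
    key = environ.get("OPENAI_API_KEY")
    if not key:
        key = ask("OpenAI API key: ")
        environ["OPENAI_API_KEY"] = key
    return key

# Demo with a plain dict and a stub prompt so nothing is read from the terminal.
demo_env = {}
resolve_api_key(demo_env, ask=lambda prompt: "sk-demo")
print(demo_env)

# Real use outside Colab (prompts only if the variable is missing):
# resolve_api_key(os.environ)
```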

###**LangChain Import Statements**

In [14]:
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS  # langchain.vectorstores is deprecated
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA

###**Upload your PDF file to the current working directory**

(In Google Colab, this is the /content/ folder.)

In [5]:
from google.colab import files

# Prompt user to upload a PDF
uploaded = files.upload()

Saving RAG.pdf to RAG.pdf


###**Get filename of the uploaded PDF**

In [6]:
filename = list(uploaded.keys())[0]
print(filename)

RAG.pdf


###**Read the Document**

In [7]:
# Load your document
loader = PyPDFLoader(filename)  # Path to the uploaded PDF
documents = loader.load()

###**Split the Text into Chunks**

In [8]:
# Split into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = text_splitter.split_documents(documents)
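
To see how `chunk_size` and `chunk_overlap` relate, here is a simplified sliding-window sketch. It is not how `RecursiveCharacterTextSplitter` works internally: that splitter first tries to break on separators such as paragraph breaks and newlines so that chunks end at natural boundaries. The function name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

print(sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last character of the previous one. With the notebook's settings (500 and 50), neighbouring chunks would share up to 50 characters, which helps keep a sentence that straddles a boundary retrievable from either side.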

###**Convert chunks into embeddings and store in FAISS**

In [9]:
embeddings = OpenAIEmbeddings()
db = FAISS.from_documents(docs, embeddings)

###**Create a Retriever**

In [10]:
retriever = db.as_retriever()

###**Set up OpenAI Chat model**

In [11]:
llm = ChatOpenAI(model="gpt-3.5-turbo")

###**Create RetrievalQA chain**

In [12]:
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True
)
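
When no `chain_type` is given, `RetrievalQA.from_chain_type` uses the "stuff" strategy: the retrieved chunks are placed together into a single prompt alongside the question. The sketch below illustrates that idea in plain Python. The function name `build_stuff_prompt` and the prompt wording are assumptions made for this example, not LangChain's actual template.

```python
def build_stuff_prompt(question, chunks):
    """Join retrieved chunks into one context block and append the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

prompt = build_stuff_prompt(
    "What is RAG?",
    ["RAG retrieves external knowledge.", "The retrieved text is passed to the LLM."],
)
print(prompt)
```

Because every retrieved chunk goes into one prompt, this strategy is simple but limited by the model's context window, which is one reason to keep chunks small.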

###**Query and Response Generation**

In [13]:
query = "What is this document about?"
result = qa_chain.invoke(query)

print("Answer:\n", result["result"])
print("\nSources:\n", result["source_documents"])

Answer:
 This document is about Retrieval Augmented Generation (RAG), a technology that uses external knowledge sources to enhance the accuracy and coherence of generated text. It explains the importance and benefits of RAG, how it works, and provides examples of its applications in various fields such as question answering systems, content generation, legal document generation, medical diagnosis, and more.

Sources:
 [Document(id='b4342c5d-d398-4fb7-b834-2fdd474936d1', metadata={'producer': 'Skia/PDF m136 Google Docs Renderer', 'creator': 'PyPDF', 'creationdate': '', 'title': 'RAG', 'source': 'RAG.pdf', 'total_pages': 3, 'page': 2, 'page_label': '3'}, page_content='coherence\n \nof\n \ngenerated\n \ntext.\n \nWith\n \nRAG,\n \nAI\n \nsystems\n \ncan\n \nbridge\n \nthe\n \ngap\n \nto\n \nreal-world\n \nknowledge\n \nand\n \ncreate\n \nmore\n \nhuman-like\n \ntext.'), Document(id='bd949612-cf31-454f-a7df-e8b17a636fa3', metadata={'producer': 'Skia/PDF m136 Google Docs Renderer', 'creator
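
The raw `source_documents` output is hard to read because the PDF extraction leaves a newline between most words. A small helper along these lines could print one compact line per source. `FakeDocument` is a hypothetical stand-in that mimics only the `page_content` and `metadata` attributes seen in the output above, and `summarize_sources` is likewise an assumed name.

```python
from dataclasses import dataclass, field

@dataclass
class FakeDocument:
    """Hypothetical stand-in with the same two attributes as a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def summarize_sources(docs, width=60):
    """Return one 'source p.N: snippet' line per document."""
    lines = []
    for doc in docs:
        text = " ".join(doc.page_content.split())  # collapse newlines and repeated spaces
        snippet = text if len(text) <= width else text[:width].rstrip() + "..."
        source = doc.metadata.get("source", "unknown")
        page = doc.metadata.get("page_label", "?")
        lines.append(f"{source} p.{page}: {snippet}")
    return lines

sample = FakeDocument(
    page_content="coherence\n \nof\n \ngenerated\n \ntext.",
    metadata={"source": "RAG.pdf", "page_label": "3"},
)
print(summarize_sources([sample]))
# ['RAG.pdf p.3: coherence of generated text.']
```

In the notebook, the same helper could be called as `summarize_sources(result["source_documents"])`.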