#**RAG with LangChain**
- Upload a PDF
- Read the PDF and split its text into chunks
- Embed the chunks and store them in a FAISS vector store
- Create a retriever
- Retrieve the chunks most relevant to the query
- Generate a response using an LLM

**FAISS** stands for Facebook AI Similarity Search.

It is an open-source library developed by Meta (Facebook) for efficient similarity search and clustering of dense vectors, such as text embeddings.
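With a flat (exact) index, a similarity search compares the query embedding $\mathbf{q}$ with every stored chunk embedding $\mathbf{x}_1, \dots, \mathbf{x}_N$ and keeps the $k$ closest. Assuming squared Euclidean (L2) distance, which is the default for LangChain's FAISS wrapper, the score for chunk $i$ is:

$$
d(\mathbf{q}, \mathbf{x}_i) = \lVert \mathbf{q} - \mathbf{x}_i \rVert_2^2 = \sum_{j=1}^{D} \left(q_j - x_{ij}\right)^2
$$

Here $D$ is the embedding dimension, and the $k$ chunks with the smallest $d$ are returned. If the embeddings are unit-length, this ranking is the same as ranking by cosine similarity, because for $\lVert \mathbf{q} \rVert_2 = \lVert \mathbf{x}_i \rVert_2 = 1$:

$$
\lVert \mathbf{q} - \mathbf{x}_i \rVert_2^2 = \lVert \mathbf{q} \rVert_2^2 + \lVert \mathbf{x}_i \rVert_2^2 - 2\,\mathbf{q}^\top \mathbf{x}_i = 2 - 2\cos\theta_i
$$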

###**Install Dependencies**

In [1]:
!pip install langchain langchain_community pypdf faiss-cpu langchain_openai



###**Retrieve the API Key from Secrets and Set It as an Environment Variable**

In [2]:
# Retrieve the API key from Colab's secrets
from google.colab import userdata
OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')

In [3]:
# Expose the key as an environment variable; the langchain_openai classes read OPENAI_API_KEY from it
import os
os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY
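If the key ends up empty, the problem would only surface later, when the first embedding or chat request is rejected. The sketch below is a minimal guard that fails immediately with a clearer message. It uses a hypothetical helper, `require_env`, which is not part of Colab or LangChain.

```python
import os

def require_env(name: str) -> str:
    """Return the environment variable `name`, raising early if it is missing or blank.

    Hypothetical helper for this notebook; not part of Colab or LangChain.
    """
    value = os.environ.get(name, "")
    if not value.strip():
        raise RuntimeError(
            f"{name} is not set or is empty; add it under Colab's Secrets and allow notebook access."
        )
    return value

# Usage, right after setting the variable:
# require_env("OPENAI_API_KEY")
```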

###**LangChain Import Statements**

In [4]:
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
# OpenAIEmbeddings and FAISS come from their current packages;
# the old langchain.embeddings / langchain.vectorstores paths are deprecated
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA

###**Upload Your PDF File to the Current Working Directory**

(In Google Colab, this is the `/content/` folder.)

In [5]:
from google.colab import files

# Prompt user to upload a PDF
uploaded = files.upload()

Saving RAG.pdf to RAG (1).pdf


###**Get the Filename of the Uploaded PDF**

In [6]:
filename = list(uploaded.keys())[0]
print(filename)

RAG (1).pdf
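`files.upload()` returns a dictionary that maps each saved filename to the file's bytes. When a file with the same name already exists, Colab saves the upload under a suffixed name, which is why the key above is `RAG (1).pdf`. Taking the first key works for a single upload. If several files are uploaded at once, a small hypothetical helper, `pick_pdf`, can select the first one that is actually a PDF:

```python
def pick_pdf(uploaded: dict[str, bytes]) -> str:
    """Return the first uploaded filename ending in '.pdf' (case-insensitive).

    Hypothetical helper; `uploaded` is the dict returned by files.upload().
    """
    for name in uploaded:
        if name.lower().endswith(".pdf"):
            return name
    raise ValueError(f"No PDF among uploaded files: {list(uploaded)}")

# Usage in place of list(uploaded.keys())[0]:
# filename = pick_pdf(uploaded)
```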


###**Read the Document**

In [7]:
# Load the uploaded PDF; PyPDFLoader returns one Document per page
loader = PyPDFLoader(filename)  # filename was set in the previous cell
documents = loader.load()
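PDFs exported from Google Docs can come out of PyPDF with a line break between every word. The source chunk printed at the end of this notebook shows this (`'coherence\n \nof\n \ngenerated...'`). Those extra characters count against each 500-character chunk. The sketch below is a minimal cleanup step that collapses every whitespace run into one space before splitting. It assumes the paragraph structure is not needed, and `normalize_whitespace` is a hypothetical name.

```python
import re

def normalize_whitespace(text: str) -> str:
    """Collapse each run of spaces, tabs, and newlines into a single space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()

print(normalize_whitespace("coherence\n \nof\n \ngenerated\n \ntext."))  # coherence of generated text.

# Applied to the loaded pages (page_content is a plain string attribute):
# for doc in documents:
#     doc.page_content = normalize_whitespace(doc.page_content)
```

The trade-off is that paragraph breaks (`"\n\n"`) also disappear. `RecursiveCharacterTextSplitter` tries those first when choosing where to cut, so the cleaned chunks will break at spaces instead.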

###**Split the Text into Chunks**

In [8]:
# Split into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = text_splitter.split_documents(documents)
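`chunk_size=500` caps each chunk at 500 characters (measured with `len` by default). `chunk_overlap=50` lets neighboring chunks share up to 50 characters, so text near a cut appears in both chunks. `RecursiveCharacterTextSplitter` also tries to break on paragraph, line, and word boundaries.

The simplified sliding-window sketch below ignores separators entirely. It only shows how the two numbers interact, and the function name is hypothetical.

```python
def sliding_window_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows; consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each window starts 450 characters after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end of the text
            break
    return chunks

print([len(c) for c in sliding_window_chunks("x" * 1000)])  # [500, 500, 100]
```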

###**Convert chunks into embeddings and store in FAISS**

In [9]:
embeddings = OpenAIEmbeddings()
db = FAISS.from_documents(docs, embeddings)

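`FAISS.from_documents` embeds every chunk's text with `embeddings.embed_documents`, producing one fixed-length vector per chunk. It then stores those vectors in the index alongside the original `Document`s.

The sketch below mimics that ingestion step without an API key by swapping in feature hashing (a hashed bag-of-words). This is *not* how OpenAI embeddings work: it captures word overlap, not meaning. `toy_embed` and `build_index` are hypothetical names.

```python
import hashlib
import math
import re

def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Hashed bag-of-words vector, L2-normalized; a stand-in for a real embedding model."""
    vec = [0.0] * dim
    for token in re.findall(r"\w+", text.lower()):
        # sha256 gives the same bucket on every run (Python's built-in hash() is salted per process)
        bucket = int(hashlib.sha256(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

def build_index(texts: list[str], dim: int = 64) -> list[tuple[str, list[float]]]:
    """Embed every chunk once and keep each vector next to its text, as a vector store does at ingestion."""
    return [(text, toy_embed(text, dim)) for text in texts]

index = build_index(["RAG retrieves relevant chunks.", "FAISS searches vectors."])
print(len(index), len(index[0][1]))  # 2 64
```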


###**Create a Retriever**

In [10]:
retriever = db.as_retriever()
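The retriever embeds the question and returns the stored chunks whose vectors are nearest to it. LangChain's vector-store retrievers return 4 chunks by default, and `db.as_retriever(search_kwargs={"k": 2})` would change that number.

Below is a minimal sketch of exact k-nearest-neighbor search over `(text, vector)` pairs. It assumes squared L2 distance, and the function names are hypothetical.

```python
import heapq

def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def top_k(query_vec: list[float], index: list[tuple[str, list[float]]], k: int = 4) -> list[str]:
    """Brute-force search: score every stored vector and return the k closest texts, nearest first."""
    nearest = heapq.nsmallest(k, index, key=lambda item: squared_l2(query_vec, item[1]))
    return [text for text, _ in nearest]

index = [("cats", [1.0, 0.0]), ("dogs", [0.9, 0.1]), ("cars", [0.0, 1.0])]
print(top_k([1.0, 0.0], index, k=2))  # ['cats', 'dogs']
```

`heapq.nsmallest` avoids fully sorting all scores, but every vector is still compared with the query. FAISS performs the same exhaustive comparison with optimized vector code for flat indexes. It also offers approximate index types for larger collections.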

###**Set up OpenAI Chat model**

In [11]:
llm = ChatOpenAI(model="gpt-3.5-turbo")

###**Create RetrievalQA chain**

In [12]:
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True
)
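`RetrievalQA.from_chain_type` uses the `"stuff"` chain type unless another is given. All retrieved chunks are concatenated into a single prompt together with the question, and the result is sent to the LLM in one call. The sketch below shows that idea. The prompt wording and the `build_stuff_prompt` name are hypothetical, not LangChain's exact template.

```python
def build_stuff_prompt(question: str, chunks: list[str]) -> str:
    """'Stuff' every retrieved chunk into one context block, then append the question."""
    context = "\n\n".join(chunk.strip() for chunk in chunks)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

print(build_stuff_prompt("What is RAG?", ["RAG retrieves text.", "It then generates an answer."]))
```

Stuffing keeps the chain to one LLM call, but all chunks must fit in the model's context window. In this notebook that is small: with the default 4 retrieved chunks of at most 500 characters, the context is about 2,000 characters.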

###**Query and Response Generation**

In [13]:
query = "What is this document about?"
# .invoke() replaces the deprecated direct call qa_chain({...})
result = qa_chain.invoke({"query": query})

print("Answer:\n", result["result"])
print("\nSources:\n", result["source_documents"])



Answer:
 This document is about Retrieval Augmented Generation (RAG), which is a system that uses external knowledge sources to enhance the accuracy and coherence of generated text. The document explains the importance of RAG, its benefits, how it works, has a lab demo on RAG implementation, and provides various use-cases of RAG. It also discusses how Artificial Intelligence has evolved with the rise of large language models (LLMs) and the impact on natural language processing (NLP) applications.

Sources:
 [Document(id='cd0de9e0-7d7b-46d0-816c-66c8baa85a4b', metadata={'producer': 'Skia/PDF m136 Google Docs Renderer', 'creator': 'PyPDF', 'creationdate': '', 'title': 'RAG', 'source': 'RAG (1).pdf', 'total_pages': 3, 'page': 2, 'page_label': '3'}, page_content='coherence\n \nof\n \ngenerated\n \ntext.\n \nWith\n \nRAG,\n \nAI\n \nsystems\n \ncan\n \nbridge\n \nthe\n \ngap\n \nto\n \nreal-world\n \nknowledge\n \nand\n \ncreate\n \nmore\n \nhuman-like\n \ntext.'), Document(id='af3c6a24-8f2
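Printing `result["source_documents"]` dumps each `Document`'s full representation, which is long and hard to scan. A small hypothetical helper, `summarize_sources`, prints one line per retrieved chunk instead. Each line shows the file name, the page label from the metadata, and a short whitespace-collapsed preview.

```python
from types import SimpleNamespace

def summarize_sources(source_documents, preview_chars: int = 80) -> list[str]:
    """One line per retrieved chunk: source file, page label, and a short preview.

    Works with any object exposing `.metadata` (dict) and `.page_content` (str),
    such as LangChain Documents.
    """
    lines = []
    for doc in source_documents:
        meta = doc.metadata or {}
        preview = " ".join(doc.page_content.split())[:preview_chars]
        lines.append(f"{meta.get('source', '?')} (page {meta.get('page_label', '?')}): {preview}")
    return lines

# Stand-in for a LangChain Document, used only to demonstrate the output format
demo_doc = SimpleNamespace(
    metadata={"source": "RAG (1).pdf", "page": 2, "page_label": "3"},
    page_content="coherence\n \nof\n \ngenerated\n \ntext.",
)
print("\n".join(summarize_sources([demo_doc])))  # RAG (1).pdf (page 3): coherence of generated text.

# In the notebook:
# print("\n".join(summarize_sources(result["source_documents"])))
```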