<a href="https://colab.research.google.com/github/GenAIUnplugged/langchain_series/blob/main/02_RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [7]:
!pip install langchain langchain-core langchain-community langchain_openai faiss-cpu PyMuPDF

Collecting PyMuPDF
  Downloading pymupdf-1.25.5-cp39-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (3.4 kB)
Downloading pymupdf-1.25.5-cp39-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (20.0 MB)
Installing collected packages: PyMuPDF
Successfully installed PyMuPDF-1.25.5


# Semantic Search
* `search_type="similarity"`: rank the stored chunks by vector similarity to the query embedding
* `search_kwargs={"k": 3}`: `k` is the number of top results to return
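To make the `similarity` search type concrete, here is a minimal brute-force sketch of top-`k` retrieval. It assumes squared Euclidean (L2) distance, which is the metric of a flat FAISS L2 index, and uses tiny hand-made 2-D vectors in place of real OpenAI embeddings. `l2_squared` and `top_k_similarity` are hypothetical helpers for illustration, not LangChain APIs; FAISS performs the same kind of ranking in optimized native code.

```python
def l2_squared(a, b):
    """Squared Euclidean (L2) distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k_similarity(query_vec, chunk_vecs, k=3):
    """Return (chunk_index, distance) pairs for the k chunks closest to query_vec.

    Brute-force scan over every stored vector; a lower distance means a more
    similar chunk, so results are sorted in ascending order of distance.
    """
    scored = [(i, l2_squared(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Hypothetical 2-D "embeddings" standing in for real OpenAI vectors
toy_chunks = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
toy_query = [0.9, 0.1]
print(top_k_similarity(toy_query, toy_chunks, k=2))  # chunks 0 and 2 are closest
```

Raising `k` simply returns more of the ranked list; it does not change which chunk ranks first.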

In [2]:
from google.colab import userdata
import os
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [10]:
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA

# Path to the source PDF on the mounted Google Drive
url = "/content/drive/MyDrive/langchain/data/Guideline for Vector DaVinci configurator tool.pdf"

In [11]:
loader = PyMuPDFLoader(url)
docs = loader.load()

In [15]:
len(docs),type(docs)

(22, list)

In [16]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
texts = text_splitter.split_documents(docs)

In [20]:
len(texts),type(texts)

(58, list)
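The splitter turned the 22 pages into 58 chunks using `chunk_size=1000` and `chunk_overlap=200`. The simplified sliding-window sketch below illustrates what the overlap means: consecutive chunks share 200 characters, so a short passage cut at a chunk boundary still appears intact in one of the neighbouring chunks. This is an assumption-laden simplification, not `RecursiveCharacterTextSplitter`'s actual algorithm, which prefers to split on paragraph, line, and word boundaries; `sliding_window_chunks` is a hypothetical helper.

```python
def sliding_window_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size character windows sharing chunk_overlap chars.

    Simplified illustration only: the real recursive splitter first tries to
    break on paragraph, line, and word boundaries instead of mid-word.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each new chunk starts this many chars later
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "".join(chr(ord("a") + i % 26) for i in range(2500))  # 2,500 dummy chars
pieces = sliding_window_chunks(sample)
print(len(pieces), [len(p) for p in pieces])
print(pieces[0][-200:] == pieces[1][:200])  # the 200-char overlap is shared
```

With 2,500 characters the windows start at 0, 800 and 1,600, giving lengths 1,000, 1,000 and 900.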

In [21]:
embeddings = OpenAIEmbeddings()
vectorstores = FAISS.from_documents(texts, embeddings)


In [22]:
retriever = vectorstores.as_retriever(search_type="similarity", search_kwargs={"k": 3})

In [23]:
llm = ChatOpenAI(model_name="gpt-4o-mini", temperature=0)

In [24]:
qa_chain = RetrievalQA.from_chain_type(llm, chain_type="stuff", retriever=retriever)

In [26]:
query = "Provide a brief summary of the document"
answer = qa_chain.invoke(query)
answer['result']

'The document appears to be a guide or manual related to the DaVinci tool, covering various functionalities and features. It includes sections on how to use PDF within DaVinci, launching the tool, creating new projects, familiarizing with options, creating configurations using PDF, and import/export functionalities. Additionally, it discusses modifying existing configurations, generating reports, listing generated and output files, and performing generation and compilation without the DaVinci Configurator. There are also sections on generating and compiling using batch files and differences in BUILD option arguments. The document includes figures to illustrate various elements of the DaVinci tool.'

# Similarity Score Threshold
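* `search_type="similarity_score_threshold"`: return only chunks whose relevance score clears a threshold
* `search_kwargs={"score_threshold": 0.6, "k": 3}`: `score_threshold` drops results whose relevance score (normalized so that higher means more similar) is below this value; `k` still caps the number of results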
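With `similarity_score_threshold`, the retriever fetches the top `k` results and then keeps only those whose relevance score is at least `score_threshold`. For FAISS, that relevance score is derived from the L2 distance. The sketch below assumes LangChain's default Euclidean conversion, `1 - distance / sqrt(2)`, which is only meaningful for unit-length embeddings. The distances are made up, and `euclidean_relevance` / `threshold_filter` are hypothetical helpers, not LangChain APIs.

```python
import math


def euclidean_relevance(distance):
    """Map an L2 distance to a relevance score (higher = more relevant).

    Assumes the 1 - d / sqrt(2) conversion for unit-length embeddings.
    """
    return 1.0 - distance / math.sqrt(2)


def threshold_filter(scored_docs, score_threshold=0.6, k=3):
    """Take the top k (doc, distance) pairs, then keep those with relevance >= threshold.

    scored_docs must already be sorted by ascending distance.
    """
    kept = []
    for doc, distance in scored_docs[:k]:
        relevance = euclidean_relevance(distance)
        if relevance >= score_threshold:
            kept.append((doc, round(relevance, 3)))
    return kept


# Hypothetical distances for three retrieved chunks
candidates = [("chunk A", 0.30), ("chunk B", 0.50), ("chunk C", 0.70)]
print(threshold_filter(candidates))
```

Here chunk C (distance 0.70, relevance about 0.505) falls below the 0.6 threshold and is dropped, so the retriever can return fewer than `k` documents.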

In [30]:
retriever = vectorstores.as_retriever(search_type="similarity_score_threshold", search_kwargs={"score_threshold": 0.6,"k":3})
qa_chain = RetrievalQA.from_chain_type(llm, chain_type="stuff", retriever=retriever)
query = "Provide a brief summary of the document"
answer = qa_chain.invoke(query)
answer['result']

'The document appears to be a guide or manual related to the DaVinci tool, covering various functionalities and features. It includes sections on how to use PDF within DaVinci, launching the tool, creating new projects, familiarizing users with the options available, creating new configurations using PDF, and the import/export functionality. Additionally, it discusses modifying existing configurations, generating reports, listing generated and output files, and performing generation and compilation without using the DaVinci Configurator. There are also sections on generating and compiling using batch files and the differences in arguments for the BUILD option. The document concludes with a revision history.'

# Similarity Search With Score
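`similarity_search_with_score(query, k=...)` is a method on the vector store object that:
* Performs a similarity search
* Returns a list of `(document, score)` tuples
* Leaves any score-based filtering to you

For FAISS, the score is the raw L2 distance between the query and chunk embeddings, so a **lower** score means a **more** similar chunk. A filter should therefore keep scores below a cutoff, not above it.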
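A small sketch of the post-hoc filtering step, using made-up `(document, distance)` tuples shaped like the ones `similarity_search_with_score` returns for FAISS. `keep_closest` is a hypothetical helper; the point is the direction of the comparison: with distances, `<` keeps the closest chunks, while `>` keeps the farthest ones.

```python
def keep_closest(docs_and_distances, max_distance):
    """Keep docs whose distance is below max_distance (lower = more similar)."""
    return [doc for doc, distance in docs_and_distances if distance < max_distance]


# Hypothetical (doc, L2 distance) tuples
pairs = [("intro chunk", 0.25), ("toc chunk", 0.35), ("appendix chunk", 0.55)]
print(keep_closest(pairs, max_distance=0.4))  # ['intro chunk', 'toc chunk']
print([doc for doc, d in pairs if d > 0.4])   # ['appendix chunk'], the least similar one
```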

In [40]:
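docs_with_scores = vectorstores.similarity_search_with_score(query, k=3)
# FAISS scores are L2 distances (lower = more similar), so keep scores BELOW the cutoff.
# 0.4 is an arbitrary cutoff: inspect docs_with_scores and tune it for your embeddings.
filtered_docs = [doc for doc, score in docs_with_scores if score < 0.4]
filtered_docs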

[Document(id='37b8c224-e57c-4571-92a6-dce790f50338', metadata={'producer': 'Adobe PDF Library 15.0', 'creator': 'Acrobat PDFMaker 17 for Word', 'creationdate': '2023-06-21T10:04:34+02:00', 'source': '/content/drive/MyDrive/langchain/data/Guideline for Vector DaVinci configurator tool.pdf', 'file_path': '/content/drive/MyDrive/langchain/data/Guideline for Vector DaVinci configurator tool.pdf', 'total_pages': 22, 'format': 'PDF 1.6', 'title': 'Guideline for Vector DaVinci Configurator Tool', 'author': 'Renesas Electronics Corporation', 'subject': 'R01AN6904EA0100', 'keywords': 'MCAL DaVinci', 'moddate': '2023-06-21T10:05:57+02:00', 'trapped': '', 'encryption': 'Standard V4 R4 128-bit AES', 'modDate': "D:20230621100557+02'00'", 'creationDate': "D:20230621100434+02'00'", 'page': 0}, page_content='2.2.1. \nUsage of PDF in DaVinci ........................................................................................................ 6 \n2.2.2. \nHow to Launch or open DaVinci tool ..........

# Use LLM to Generate Answer from Filtered Docs
We use `load_qa_chain` from `langchain.chains.question_answering`.

Both `load_qa_chain` and `RetrievalQA` build question-answering systems in LangChain, but they serve slightly different purposes and fit different contexts.

**load_qa_chain** takes a language model (`llm`) and a list of `input_documents`, and answers a question using only the information in those documents. It is useful when you already have the relevant documents, as here, where `filtered_docs` was selected by score.

**RetrievalQA** is a higher-level chain that combines retrieval and question answering. It takes a language model (`llm`) and a `retriever` object as input:
* The retriever fetches documents relevant to the user's query, typically by searching a vector store (like the FAISS store used in this notebook) for chunks similar to the query.
* RetrievalQA then passes the retrieved documents, together with the original query, to the language model to generate the answer.
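Conceptually, `RetrievalQA` is a retrieval step followed by the same answer-over-given-documents step that `load_qa_chain` performs. The plain-Python sketch below models that composition with a hypothetical keyword retriever and an LLM stub that simply echoes its prompt. None of these functions are LangChain APIs; a real chain would call the FAISS retriever and `ChatOpenAI` instead.

```python
def stuff_documents(doc_texts, question):
    """'stuff' strategy: paste every document into one prompt."""
    context = "\n\n".join(doc_texts)
    return f"Context:\n{context}\n\nQuestion: {question}"


def qa_over_docs(llm, doc_texts, question):
    """load_qa_chain-style: the caller supplies the documents."""
    return llm(stuff_documents(doc_texts, question))


def retrieval_qa(llm, retriever, question):
    """RetrievalQA-style: fetch documents first, then answer over them."""
    return qa_over_docs(llm, retriever(question), question)


# Hypothetical corpus of chunk titles
CORPUS = ["How to launch DaVinci", "Differences in BUILD option arguments", "Revision history"]


def keyword_retriever(question):
    """Hypothetical retriever: return entries sharing a word with the question."""
    words = {w.lower() for w in question.split()}
    return [c for c in CORPUS if words & {w.lower() for w in c.split()}]


def echo_llm(prompt):
    """Hypothetical LLM stub: returns the prompt so we can inspect it."""
    return prompt


print(retrieval_qa(echo_llm, keyword_retriever, "What are the BUILD arguments?"))
```

Only the BUILD chunk shares a word with the question, so it is the only context the stub "model" sees.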



In [41]:
from langchain.chains.question_answering import load_qa_chain

qa_chain = load_qa_chain(llm, chain_type="stuff")

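# Run the QA chain (.invoke replaces the deprecated .run; the answer is under "output_text")
response = qa_chain.invoke({"input_documents": filtered_docs, "question": query})["output_text"]
print(response)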




The document appears to be a guide or manual related to the DaVinci tool, covering various functionalities and features. It includes sections on how to use PDF within DaVinci, launching the tool, creating new projects, familiarizing users with the options available, creating new configurations using PDF, and the import/export functionality. Additionally, it discusses modifying existing configurations, generating reports, listing generated and output files, and performing generation and compilation without using the DaVinci Configurator. There are also sections on generating and compiling using batch files and the differences between BUILD option arguments. The document includes figures to illustrate various elements of the DaVinci tool.


# Using PromptTemplates with RetrievalQA and load_qa_chain
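We build the prompt from `SystemMessagePromptTemplate` and `HumanMessagePromptTemplate`. The cell below passes it to `load_qa_chain` through `prompt=`; with `RetrievalQA`, the same prompt can be supplied through `chain_type_kwargs={"prompt": chat_prompt}`.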

In [42]:
from langchain.chains.question_answering import load_qa_chain
from langchain.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate

# Define the system message template
system_template = "You are a helpful AI assistant that answers questions based on the provided documents."
system_message_prompt = SystemMessagePromptTemplate.from_template(system_template)

# Define the human message template with placeholders for context and question
human_template = """Please answer the question based on the following context:
{context}

Question: {question}"""
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)

# Combine the system and human message prompts into a chat prompt template
chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])

# Load the QA chain with the defined prompt
qa_chain = load_qa_chain(llm, chain_type="stuff", prompt=chat_prompt)

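# Run the QA chain (.invoke replaces the deprecated .run; the answer is under "output_text")
response = qa_chain.invoke({"input_documents": filtered_docs, "question": query})["output_text"]
print(response)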

The document appears to be a guide or manual related to the DaVinci tool, covering various functionalities and features. It includes sections on how to use PDF within DaVinci, launching the tool, creating new projects, and familiarizing users with the available options. Additionally, it discusses creating new configurations using PDF, as well as import and export functionalities. There are instructions for modifying existing configurations and generating reports, along with a list of generated and output files. The document also addresses performing generation and compilation without the DaVinci Configurator, including how to use batch files and the differences in BUILD option arguments. Lastly, it contains a revision history section. Various figures are referenced throughout the document to illustrate key concepts and features.


In the `human_template` above, the placeholder names `{context}` and `{question}` are not arbitrary: they are the input variables the "stuff" chain expects by default, and LangChain fills them when the prompt is used inside a chain such as `load_qa_chain` or `RetrievalQA`.

**Here's how it works:**

* **{context}:** when you pass `input_documents` to the chain, LangChain joins the text of those documents and inserts the result into the `{context}` placeholder.
* **{question}:** the user's query is inserted into the `{question}` placeholder.

The result is a complete prompt containing both the relevant context from the retrieved documents and the user's question, which is then sent to the language model.
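A plain-Python sketch of the substitution described above, reusing the notebook's template strings. It assumes the "stuff" chain's default behavior of joining the documents' text with a blank line between them. `build_messages` is a hypothetical helper, and the `(role, content)` tuples stand in for LangChain's message objects.

```python
SYSTEM_TEMPLATE = "You are a helpful AI assistant that answers questions based on the provided documents."
HUMAN_TEMPLATE = """Please answer the question based on the following context:
{context}

Question: {question}"""


def build_messages(page_contents, question, separator="\n\n"):
    """Sketch of what the 'stuff' chain does before calling the chat model.

    1. Join the documents' text into one string (assumed: blank line between docs).
    2. Substitute it for {context} and the query for {question}.
    3. Return the system and human messages as (role, content) pairs.
    """
    context = separator.join(page_contents)
    human = HUMAN_TEMPLATE.format(context=context, question=question)
    return [("system", SYSTEM_TEMPLATE), ("human", human)]


msgs = build_messages(["Chunk one text.", "Chunk two text."], "Provide a brief summary of the document")
for role, content in msgs:
    print(f"[{role}]\n{content}\n")
```

Renaming a placeholder (for example `{docs}` instead of `{context}`) would leave the template without the variable the chain fills, which is why the names matter.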