<a href="https://colab.research.google.com/github/GenAIUnplugged/langchain_series/blob/main/01_RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install langchain langchain-core langchain-community langchain_openai



In [2]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [3]:
from google.colab import userdata
import os
os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')

In [4]:
!pip install langchain openai faiss-cpu tiktoken pymupdf



In [5]:
# Integrations are imported from their dedicated packages
# (langchain-community, langchain-openai); the old langchain.* paths are deprecated.
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA

# Step 1: Load PDF
url = "/content/drive/MyDrive/langchain/data/Guideline for Vector DaVinci configurator tool.pdf"
loader = PyMuPDFLoader(url)
documents = loader.load()

# Step 2: Split into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
docs = text_splitter.split_documents(documents)

# Step 3: Create vector store (embedding + FAISS)
embeddings = OpenAIEmbeddings()
db = FAISS.from_documents(docs, embeddings)

# Step 4: Set up retriever and QA chain
retriever = db.as_retriever()
qa_chain = RetrievalQA.from_chain_type(
    llm=OpenAI(temperature=0),
    chain_type="stuff",
    retriever=retriever
)

# Step 5: Ask questions
query = "What is the document about?"
# invoke() returns a dict; the generated answer is under the "result" key
response = qa_chain.invoke({"query": query})["result"]
print(response)



 The document is a guideline for using the Vector DaVinci configurator tool, including instructions on how to launch the tool, create new projects and configurations, and perform generation and compilation. It also explains the difference between a Parameter Definition File (PDF) and a Configuration Description File (CDF).
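Step 2 above relies on `chunk_size` and `chunk_overlap`. As a rough illustration of what those two parameters mean, here is a minimal sketch of a fixed-window splitter. It is a simplification and not the algorithm `RecursiveCharacterTextSplitter` uses (the library also tries to break on separators such as paragraphs and sentences); the helper name `split_with_overlap` and the sample string are hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size windows; neighbours share chunk_overlap characters.

    Simplified illustration only: no separator-aware splitting.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


# Hypothetical sample input, small enough to inspect by eye
sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=4)
print(chunks)
```

Each chunk repeats the last four characters of the previous one, so text that straddles a boundary still appears whole in at least one chunk. With `chunk_size=1000, chunk_overlap=200` the same idea applies at a larger scale.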


In [6]:
!pip install langchain openai chromadb tiktoken pymupdf




In [9]:
# Integrations are imported from their dedicated packages
# (langchain-community, langchain-openai); the old langchain.* paths are deprecated.
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA

# Step 1: Load PDF
loader = PyMuPDFLoader(url)
documents = loader.load()

# Step 2: Split the text
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
docs = text_splitter.split_documents(documents)

# Step 3: Create Chroma vector store
embedding_model = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(docs, embedding_model, persist_directory="./chroma_db")

# Step 4: Create retriever and QA chain
retriever = vectorstore.as_retriever()
qa_chain = RetrievalQA.from_chain_type(
    llm=OpenAI(temperature=0),
    chain_type="stuff",
    retriever=retriever
)

# Step 5: Ask a question
query = "Summarize the main points in the PDF"
# invoke() returns a dict; the generated answer is under the "result" key
answer = qa_chain.invoke({"query": query})["result"]
print(answer)


 The PDF is used in DaVinci for creating new configurations and familiarizing oneself with the options present in the tool. It also allows for the import and export of configurations and the modification of existing ones. The PDF is different from the CDF, which is used for mandatory and non-mandatory parameters for specific users.
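Steps 3 to 5 hide two ideas: the retriever ranks chunks by vector similarity to the question, and the `"stuff"` chain type places the retrieved chunks into a single prompt. The sketch below illustrates both with plain Python. Everything in it is an assumption made for illustration: the three-dimensional vectors are hand-written stand-ins for real embeddings, cosine similarity is used although the actual metric depends on the vector store, and the prompt wording is hypothetical rather than LangChain's own template.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed_chunks, k=2):
    """Return the text of the k chunks whose vectors are most similar to the query."""
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


def build_stuff_prompt(question: str, chunks: list[str]) -> str:
    """Concatenate all retrieved chunks into one prompt (hypothetical wording)."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


# Toy index: (chunk text, hand-written vector), not real embeddings
toy_index = [
    ("Launching the tool", [1.0, 0.0, 0.0]),
    ("Creating a new project", [0.0, 1.0, 0.0]),
    ("Generation and compilation", [0.0, 0.0, 1.0]),
]
query_vec = [0.9, 0.1, 0.0]  # stand-in for the embedded question

retrieved = top_k(query_vec, toy_index, k=2)
prompt = build_stuff_prompt("How do I start the tool?", retrieved)
print(prompt)
```

Because every retrieved chunk is pasted into one prompt, the `"stuff"` approach is simple but limited by the model's context window; the number of chunks the retriever returns and the `chunk_size` together determine how much text ends up in the prompt.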
