<a href="https://colab.research.google.com/github/Chahethsen12/Chat_with_PDF/blob/main/Chat_With_PDF.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Install Libraries

In [None]:
!pip install -q langchain langchain-groq langchain-community chromadb pypdf sentence-transformers
print("RAG System libraries installed!")

Setup API Keys

In [None]:
import os
from getpass import getpass

# Only the LLM ("thinking") step needs a Groq API key.
# Embeddings are computed locally with a free HuggingFace model,
# so they cost nothing and use no API quota.
if "GROQ_API_KEY" not in os.environ:
    os.environ["GROQ_API_KEY"] = getpass("Enter your Groq API Key: ")
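The cell above checks the environment and then prompts. If you also keep keys in a secret store, the lookup order can be made explicit. The following is a minimal sketch, not part of the original notebook: `resolve_api_key` and its parameters are hypothetical names, and the lookups are passed in as plain callables so the block runs outside Colab.

```python
from typing import Callable, Mapping, Optional


def resolve_api_key(
    name: str,
    env: Mapping[str, str],
    secret_lookup: Optional[Callable[[str], Optional[str]]] = None,
    prompt: Optional[Callable[[str], str]] = None,
) -> str:
    """Return the first non-empty value: environment, secret store, then prompt."""
    value = env.get(name)
    if value:
        return value
    if secret_lookup is not None:
        try:
            value = secret_lookup(name)
        except Exception:
            # A missing secret or denied access falls through to the prompt.
            value = None
        if value:
            return value
    if prompt is not None:
        value = prompt(f"Enter your {name}: ")
        if value:
            return value
    raise RuntimeError(f"No value found for {name}")


# Demo with stand-in lookups (hypothetical values, no real key involved).
demo_key = resolve_api_key(
    "GROQ_API_KEY",
    env={},
    secret_lookup=lambda name: None,
    prompt=lambda message: "gsk_demo_value",
)
print(demo_key)
```

In Colab, `env` would be `os.environ`, `secret_lookup` could be `userdata.get` from `google.colab`, and `prompt` could be `getpass`. The result would then be written back to `os.environ["GROQ_API_KEY"]`.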

**The RAG Logic (The "Brain")**
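The function below prepares the PDF in three steps before answering a question:

Ingest: Reads the PDF, one page at a time.
Split: Cuts the text into overlapping chunks of up to 1,000 characters, small enough to fit in the model's prompt.
Vectorize: Converts each chunk into an embedding (a numeric vector) and stores it in a vector database (Chroma).

At question time, the retriever pulls the chunks most similar to the question and the LLM answers from them.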
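To see what chunk size and overlap mean in the Split step, here is a minimal sketch using a fixed-width character window. It is an illustration only: `split_with_overlap` is a hypothetical helper, and LangChain's `RecursiveCharacterTextSplitter` behaves differently in that it prefers to break on separators such as paragraphs and line breaks before falling back to raw characters.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
```

Each chunk repeats the last four characters of the previous one, so a sentence cut at a chunk boundary still appears whole in at least one chunk. The notebook uses the same idea at a larger scale (`chunk_size=1000`, `chunk_overlap=200`).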

In [None]:
import os
from langchain_groq import ChatGroq
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA

def process_pdf_and_ask(pdf_path, user_question):
    # 1. Load the PDF
    print("üìÇ Loading PDF...")
    loader = PyPDFLoader(pdf_path)
    docs = loader.load()

    # 2. Split into chunks (AI can't read a whole book at once)
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    splits = text_splitter.split_documents(docs)
    print(f"✂️ Split document into {len(splits)} chunks.")

    # 3. Create Embeddings (Turn text into numbers)
    # We use a free, powerful model from HuggingFace
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

    # 4. Store in Vector Database (Chroma)
    print("üíæ Indexing data into Vector Database...")
    vectorstore = Chroma.from_documents(documents=splits, embedding=embeddings)

    # 5. Setup the Retriever (The "Librarian")
    retriever = vectorstore.as_retriever()

    # 6. Setup the LLM (The "Speaker")
    # Groq has retired "llama3-8b-8192"; "llama-3.1-8b-instant" is its listed replacement.
    llm = ChatGroq(model_name="llama-3.1-8b-instant")

    # 7. Create the Chat Chain
    qa_chain = RetrievalQA.from_chain_type(llm, retriever=retriever, chain_type="stuff")

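    # 8. Ask the Question
    print("🤔 Thinking...")
    try:
        # invoke() replaces the deprecated run(); the answer is under "result"
        response = qa_chain.invoke({"query": user_question})["result"]
    finally:
        # Cleanup (to save RAM in Colab), even if the LLM call fails
        vectorstore.delete_collection()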

    return response

print("RAG Function Ready!")
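What the retriever and the "stuff" chain do can be sketched without any libraries. This is a toy illustration under stated assumptions, not LangChain's or Chroma's implementation: the three-dimensional vectors are made up (all-MiniLM-L6-v2 produces 384-dimensional embeddings), the helper names are hypothetical, and the prompt wording is invented for the example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, indexed, k):
    """Return the text of the k chunks whose vectors are most similar to the query."""
    ranked = sorted(
        indexed,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


def build_stuffed_prompt(question: str, chunks: list[str]) -> str:
    """'Stuff' every retrieved chunk into one prompt, followed by the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Toy index: (chunk text, made-up embedding).
toy_index = [
    ("Invoices are due within 30 days.", [0.9, 0.1, 0.0]),
    ("The office is closed on Sundays.", [0.0, 0.2, 0.9]),
    ("Late payments incur a 2% fee.", [0.8, 0.3, 0.1]),
]
query_vector = [1.0, 0.2, 0.0]  # stands in for the embedded question

best = top_k_chunks(query_vector, toy_index, k=2)
prompt = build_stuffed_prompt("When are invoices due?", best)
print(prompt)
```

The two payment-related chunks rank highest and end up in the prompt, while the unrelated chunk is left out. Because "stuff" sends every retrieved chunk in a single request, the chunk size and the number of retrieved chunks together determine how much of the model's context window is used.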

Run It
For this to work, you need a PDF.

Find a PDF on your computer (e.g., a resume, a small research paper, or a homework assignment).

In Colab, click the Folder Icon 📁 on the left sidebar.

Drag and drop your PDF there.

Update the pdf_filename below to match your file's name.
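If you are unsure of the exact file name after uploading, you can list the PDFs in a directory. A minimal sketch follows; `list_pdfs` is a hypothetical helper, and the demo uses a temporary directory with placeholder files so it runs anywhere.

```python
import tempfile
from pathlib import Path


def list_pdfs(directory: str = ".") -> list[str]:
    """Return the names of the PDF files directly inside directory, sorted."""
    return sorted(
        path.name
        for path in Path(directory).iterdir()
        if path.is_file() and path.suffix.lower() == ".pdf"
    )


# Demo: create one placeholder PDF and one non-PDF file, then list them.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sample.pdf").write_bytes(b"%PDF-1.4\n")
    (Path(tmp) / "notes.txt").write_text("not a pdf")
    found = list_pdfs(tmp)

print(found)
```

In the notebook, calling `list_pdfs(".")` lists the PDFs in the current working directory, which is where files dropped into the Colab sidebar normally land.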

In [None]:
import os

# UPDATE THIS NAME to the file you uploaded
pdf_filename = "sample.pdf"

# What do you want to ask your PDF?
question = "Summarize the main points of this document."

# Run! Check for the file up front: depending on the version, PyPDFLoader
# reports a missing path as a ValueError, which `except FileNotFoundError` would miss.
if not os.path.isfile(pdf_filename):
    print("❌ Error: Please upload a PDF to the Colab files section first!")
else:
    answer = process_pdf_and_ask(pdf_filename, question)
    print("\n" + "="*50)
    print(f"📝 AI ANSWER:\n{answer}")
    print("="*50)