# RAG System for Analyzing PDF Documents with Groq

This notebook demonstrates how to build a Retrieval-Augmented Generation (RAG) system using:
- **Groq** for fast LLM inference (Llama models)
- **FAISS** for vector storage and similarity search
- **LangChain** for orchestrating the RAG pipeline

## Why Groq?
- **Speed**: Groq's LPU (Language Processing Unit) provides extremely fast inference
- **Free tier**: Generous free API access for development
- **Open models**: Access to Llama 3.3, Mixtral, and other open-source models
- **Simple API**: No need for local GPU resources

## Setup

### 1. Get a Groq API Key
1. Go to https://console.groq.com
2. Sign up for a free account
3. Create an API key
4. Set it in the API key cell below, or put `GROQ_API_KEY=...` in a `.env` file

In [None]:
# Install required packages
!pip install langchain langchain-groq langchain-community
!pip install faiss-cpu
!pip install pypdf
!pip install sentence-transformers
!pip install python-dotenv

In [None]:
import os
from dotenv import load_dotenv

# Load environment variables from .env file if it exists
load_dotenv()

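# Set your Groq API key here only if the environment or .env file did not already provide one
# (setdefault keeps an existing value instead of overwriting it)
os.environ.setdefault("GROQ_API_KEY", "your-groq-api-key-here")  # Replace with your actual key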

# Verify the key is set
if os.environ.get("GROQ_API_KEY") == "your-groq-api-key-here":
    print("WARNING: Please set your actual Groq API key!")
else:
    print("Groq API key is set.")

In [None]:
# Import required libraries
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_groq import ChatGroq
from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.messages import AIMessage, HumanMessage

## Step 1: Load and Process the PDF Document

We'll load a PDF and split it into chunks for processing.
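
The splitting step can be pictured with a minimal sketch, assuming fixed-size character windows with overlap. The helper `chunk_text` and its parameters are hypothetical names used for illustration and are not part of LangChain; LangChain's splitters additionally try to break on natural separators such as paragraphs and sentences.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into windows of chunk_size characters that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
# ['abcd', 'defg', 'ghij']
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of storing some text twice.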

In [None]:
# Load your PDF document
# Replace with your PDF file path
pdf_file_path = "your_document.pdf"

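# If the PDF is missing, fall back to a few sample documents so the rest of the notebook still runs
try:
    loader = PyPDFLoader(pdf_file_path)
    # load_and_split() loads the PDF and splits it into chunks with LangChain's default text splitter
    pages = loader.load_and_split()
    print(f"Loaded {len(pages)} chunks from PDF")
except (FileNotFoundError, ValueError):
    # Depending on the langchain-community version, a missing path may raise ValueError instead of FileNotFoundError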
    print(f"PDF not found at {pdf_file_path}")
    print("Creating sample documents for demonstration...")
    
    # Create sample documents for demo
    from langchain_core.documents import Document  # langchain.schema is the older import path
    pages = [
        Document(
            page_content="""Task Decomposition is a technique where complex tasks are broken down 
            into smaller, more manageable sub-tasks. This approach is commonly used in AI systems 
            to improve problem-solving capabilities. Methods include Chain of Thought (CoT), 
            Tree of Thoughts (ToT), and Plan-and-Execute strategies.""",
            metadata={"source": "sample", "page": 0}
        ),
        Document(
            page_content="""Multi-agent systems consist of multiple AI agents working together 
            to accomplish tasks. Each agent can have specialized roles and capabilities. 
            These systems excel at tasks requiring collaboration, debate, and diverse perspectives. 
            Applications include software development, research, and complex problem-solving.""",
            metadata={"source": "sample", "page": 1}
        ),
        Document(
            page_content="""Retrieval-Augmented Generation (RAG) combines the power of large 
            language models with external knowledge retrieval. The system first retrieves 
            relevant documents from a knowledge base, then uses this context to generate 
            accurate and informed responses. This reduces hallucinations and improves factual accuracy.""",
            metadata={"source": "sample", "page": 2}
        )
    ]
    print(f"Created {len(pages)} sample documents")

In [None]:
# View the first document
print("First document content:")
print(pages[0].page_content[:500])
print("\nMetadata:", pages[0].metadata)

## Step 2: Create the Vector Store

We'll use HuggingFace embeddings (free, runs locally) and FAISS for the vector store.

### Why these choices?
- **HuggingFace Embeddings**: Free, no API key needed, runs locally (the model is downloaded on first use)
- **FAISS**: Fast, efficient vector similarity search library from Meta (formerly Facebook)
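
As a rough sketch of what a flat vector index does conceptually, the example below runs a brute-force nearest-neighbor search. It assumes toy 2-D vectors in place of real embeddings (`all-MiniLM-L6-v2` produces 384-dimensional vectors) and Euclidean distance. `l2_distance` and `top_k` are hypothetical helpers for illustration; FAISS performs the same kind of search far more efficiently.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query, vectors, k=2):
    """Return (index, distance) pairs for the k vectors closest to the query."""
    scored = [(i, l2_distance(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Toy "embeddings": vectors 0 and 2 point in nearly the same direction
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
nearest = top_k([1.0, 0.0], toy_vectors, k=2)
print(nearest)  # indices 0 and 2 come back first
```

With real embeddings, texts with similar meaning end up close together, so the nearest vectors correspond to the most relevant chunks.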

In [None]:
# Initialize the embedding model (free, runs locally)
# This model is good for general-purpose text embeddings
embeddings = HuggingFaceEmbeddings(
    model_name="all-MiniLM-L6-v2",  # Small, fast, and effective
    model_kwargs={'device': 'cpu'}   # Use 'cuda' if you have a GPU
)

print("Embedding model loaded!")

In [None]:
# Create the FAISS vector store from our documents
vectorstore = FAISS.from_documents(pages, embedding=embeddings)

print(f"Vector store created with {len(pages)} documents")

In [None]:
# Test the vector store with a similarity search
query = "What is task decomposition?"
docs = vectorstore.similarity_search(query, k=2)

print(f"Query: {query}\n")
print("Top 2 most similar documents:")
for i, doc in enumerate(docs):
    print(f"\n--- Document {i+1} ---")
    print(doc.page_content[:300])

## Step 3: Initialize the Groq LLM

Groq provides access to several open-weight models. Availability changes over time, so check the model list in the Groq console before choosing one:

| Model | Description | Best For |
|-------|-------------|----------|
| `llama-3.3-70b-versatile` | Llama 3.3, 70B parameters | Complex reasoning |
| `llama-3.1-8b-instant` | Smaller, faster Llama | Quick responses |
| `mixtral-8x7b-32768` | Mixture of experts | Long context tasks |
| `gemma2-9b-it` | Google's Gemma 2 | Balanced performance |

In [None]:
# Initialize the Groq LLM
llm = ChatGroq(
    model="llama-3.3-70b-versatile",  # You can change this to other models
    temperature=0.2,  # Lower = more focused/deterministic
    max_tokens=1000
)

print("Groq LLM initialized!")

In [None]:
# Quick test of the LLM
response = llm.invoke("Say hello in one sentence.")
print("LLM Test Response:", response.content)

## Step 4: Build the Basic RAG Chain

Now we'll combine the vector store and LLM into a RAG pipeline.
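
Conceptually, the "stuff" step joins the retrieved chunks into one context string and substitutes it into the system prompt. The sketch below shows that idea with plain strings. `stuff_documents` is a hypothetical helper, and the blank-line separator is an assumption for illustration; the real chain works with LangChain `Document` and prompt objects.

```python
def stuff_documents(system_template, documents, question, separator="\n\n"):
    """Join document texts into one context string and build the chat messages."""
    context = separator.join(documents)
    return [
        ("system", system_template.format(context=context)),
        ("human", question),
    ]


template = "Answer using only this context:\n\n{context}"
messages = stuff_documents(
    template,
    ["Chunk about RAG.", "Chunk about FAISS."],
    "What is RAG?",
)
for role, content in messages:
    print(f"[{role}] {content}")
```

Because every retrieved chunk is placed in the prompt, the number of chunks and their size directly determine how much of the model's context window is consumed.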

In [None]:
# Define the system prompt for the RAG chain
system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)

# Create the RAG chain
question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(vectorstore.as_retriever(), question_answer_chain)

print("RAG chain created!")

In [None]:
# Test the RAG chain
response = rag_chain.invoke({"input": "What is task decomposition?"})

print("Question:", response['input'])
print("\nAnswer:", response['answer'])

In [None]:
# View the retrieved context
print("Retrieved Context Documents:")
for i, doc in enumerate(response['context']):
    print(f"\n--- Document {i+1} ---")
    print(doc.page_content[:200])

## Step 5: Add Chat History (Conversational RAG)

To make the system conversational, we need to:
1. Track chat history
2. Reformulate follow-up questions into standalone questions using the chat history
3. Use history-aware retrieval
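
The steps above can be sketched as a small control flow: with no history the question goes straight to the retriever, otherwise it is rewritten first. This is a simplified illustration, not LangChain's implementation. `history_aware_retrieve`, `fake_rewrite`, and `fake_retrieve` are hypothetical stand-ins for the LLM rewrite call and the vector store retriever.

```python
def history_aware_retrieve(question, chat_history, rewrite_question, retrieve):
    """Retrieve documents, rewriting the question first when there is chat history."""
    if not chat_history:
        # Nothing to resolve against, so the question is used as is
        return retrieve(question)
    standalone_question = rewrite_question(question, chat_history)
    return retrieve(standalone_question)


seen_queries = []  # records what actually reaches the retriever


def fake_retrieve(query):
    """Stand-in for the vector store retriever."""
    seen_queries.append(query)
    return [f"doc for: {query}"]


def fake_rewrite(question, chat_history):
    """Stand-in for the LLM call that produces a standalone question."""
    return "What are common ways of doing Task Decomposition?"


history = []
history_aware_retrieve("What is Task Decomposition?", history, fake_rewrite, fake_retrieve)
history.append(("human", "What is Task Decomposition?"))
history.append(("ai", "Breaking a complex task into smaller sub-tasks."))
history_aware_retrieve("What are common ways of doing it?", history, fake_rewrite, fake_retrieve)
print(seen_queries)
```

The second query reaches the retriever in its rewritten form, which is why a pronoun such as "it" can still match the right chunks.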

In [None]:
# Create a history-aware retriever
contextualize_q_system_prompt = (
    "Given a chat history and the latest user question "
    "which might reference context in the chat history, "
    "formulate a standalone question which can be understood "
    "without the chat history. Do NOT answer the question, "
    "just reformulate it if needed and otherwise return it as is."
)

contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

history_aware_retriever = create_history_aware_retriever(
    llm, vectorstore.as_retriever(), contextualize_q_prompt
)

print("History-aware retriever created!")

In [None]:
# Create the conversational RAG chain
qa_system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)

qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", qa_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)
conversational_rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)

print("Conversational RAG chain created!")

In [None]:
# Have a conversation!
chat_history = []

# First question
question1 = "What is Task Decomposition?"
response1 = conversational_rag_chain.invoke({
    "input": question1, 
    "chat_history": chat_history
})

print(f"Q: {question1}")
print(f"A: {response1['answer']}\n")

# Update chat history
chat_history.extend([
    HumanMessage(content=question1),
    AIMessage(content=response1["answer"]),
])

In [None]:
# Follow-up question (uses "it" to refer to previous topic)
question2 = "What are common ways of doing it?"
response2 = conversational_rag_chain.invoke({
    "input": question2, 
    "chat_history": chat_history
})

print(f"Q: {question2}")
print(f"A: {response2['answer']}")

# The history-aware retriever first rewrites the question so that "it" resolves to "Task Decomposition".

In [None]:
# View the full chat history
print("Chat History:")
for msg in chat_history:
    role = "Human" if isinstance(msg, HumanMessage) else "AI"
    print(f"\n{role}: {msg.content}")

## Step 6: Interactive Chat Loop

Let's create an interactive chat interface!

In [None]:
def chat_with_documents(rag_chain, initial_history=None):
    """
    Interactive chat function for RAG system.
    Type 'quit' or 'exit' to stop.
    Type 'clear' to reset chat history.
    """
    # Copy the initial history so the caller's list is not mutated
    chat_history = list(initial_history) if initial_history else []
    
    print("="*50)
    print("RAG Chat with Groq")
    print("Commands: 'quit'/'exit' to stop, 'clear' to reset")
    print("="*50)
    
    while True:
        user_input = input("\nYou: ").strip()
        
        if user_input.lower() in ['quit', 'exit']:
            print("Goodbye!")
            break
        elif user_input.lower() == 'clear':
            chat_history = []
            print("Chat history cleared.")
            continue
        elif not user_input:
            continue
            
        # Get response from RAG chain
        response = rag_chain.invoke({
            "input": user_input,
            "chat_history": chat_history
        })
        
        print(f"\nAssistant: {response['answer']}")
        
        # Update history
        chat_history.extend([
            HumanMessage(content=user_input),
            AIMessage(content=response["answer"]),
        ])
    
    return chat_history

In [None]:
# Uncomment to run interactive chat
# history = chat_with_documents(conversational_rag_chain)

## Summary

In this notebook, we built a complete RAG system using:

1. **Document Loading**: PyPDFLoader for PDF processing
2. **Embeddings**: HuggingFace's `all-MiniLM-L6-v2` (free, local)
3. **Vector Store**: FAISS for efficient similarity search
4. **LLM**: Groq's `llama-3.3-70b-versatile` (fast, free tier available)
5. **Orchestration**: LangChain for RAG pipeline

### Key Advantages of This Setup:
- **Cost-effective**: Free embeddings + Groq's generous free tier
- **Fast**: Groq's LPU provides extremely fast inference
- **Simple**: No local GPU needed for the LLM
- **Powerful**: Access to state-of-the-art Llama models

### Next Steps:
- Try different Groq models for different use cases
- Experiment with chunk sizes and overlap
- Add more documents to your knowledge base
- Implement persistent vector storage

In [None]:
# Bonus: Save and load the vector store for later use

# Save
# vectorstore.save_local("my_vectorstore")

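# Load (the saved index includes a pickle file, so only load stores you created yourself;
# allow_dangerous_deserialization=True acknowledges that risk)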
# loaded_vectorstore = FAISS.load_local(
#     "my_vectorstore", 
#     embeddings,
#     allow_dangerous_deserialization=True
# )