# RAG Application with Gemini API and LangChain

This notebook implements a Retrieval-Augmented Generation (RAG) system using:
- **LLM**: Google Gemini 2.5 Flash (via Google AI Studio - Free Tier)
- **Embeddings**: Gemini Embedding model (gemini-embedding-001)
- **Vector Store**: FAISS (in-memory)
- **Framework**: LangChain with modern LCEL patterns (1.0 compatible)

## Features
- Upload and process PDF, TXT, and DOCX files
- Create embeddings and store in vector database
- Ask questions and get answers with source citations
- All in-memory (no persistence between sessions)
- Free tier usage only

## 1. Installation and Setup

First, let's install all required packages.

In [1]:
# Install required packages
!pip install -q -U \
    langchain \
    langchain-classic \
    langchain-google-genai \
    langchain-community \
    langchain-text-splitters \
    faiss-cpu \
    pypdf \
    python-docx \
    python-dotenv

## 2. Import Libraries

In [3]:
import os
import warnings
from typing import List, Dict, Any

# LangChain core imports
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.documents import Document

# Document loaders
from langchain_community.document_loaders import (
    PyPDFLoader,
    TextLoader,
    Docx2txtLoader
)

# Vector store
from langchain_community.vectorstores import FAISS

# Google Gemini imports
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings

# Chain constructors that return LCEL Runnables (shipped in langchain-classic for LangChain 1.0)
from langchain_classic.chains import create_retrieval_chain
from langchain_classic.chains.combine_documents import create_stuff_documents_chain

# Suppress warnings for cleaner output
warnings.filterwarnings('ignore')

print("✓ All libraries imported successfully!")

✓ All libraries imported successfully!


## 3. Configuration

### Get Your Free Gemini API Key:
1. Go to [Google AI Studio](https://aistudio.google.com/app/apikey)
2. Click "Create API Key"
3. Copy the key and paste it below

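**Note**: Free-tier limits vary by model and change over time, so check the current Gemini API rate-limit documentation. This notebook assumes: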
- 1,500 requests per day
- 15 requests per minute
- 1 million tokens per minute
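
With a per-minute request cap, spacing calls evenly is the simplest way to stay under the limit. The sketch below is a minimal, hypothetical `MinIntervalThrottle` helper (not part of LangChain or the Gemini SDK) that assumes the 15 requests-per-minute figure listed above. The clock and sleep functions are injectable so the logic can be checked without actually waiting.

```python
import time
from typing import Callable, Optional


class MinIntervalThrottle:
    """Hypothetical helper: enforce a minimum gap between consecutive calls."""

    def __init__(self,
                 requests_per_minute: int = 15,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        if requests_per_minute <= 0:
            raise ValueError("requests_per_minute must be positive")
        self.min_interval = 60.0 / requests_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last_call: Optional[float] = None  # time of the previous call

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last_call is not None:
            elapsed = now - self._last_call
            delay = max(0.0, self.min_interval - elapsed)
            if delay > 0:
                self._sleep(delay)
        # Record when this call is considered to have happened.
        self._last_call = now + delay
        return delay
```

In the notebook, one would create a single `MinIntervalThrottle()` and call its `wait()` method before each `llm.invoke(...)` or `rag.query(...)`.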

In [None]:
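# Set your Gemini API key
# Option 1: Direct input (replace the placeholder; never commit a real key)
GOOGLE_API_KEY = "GOOGLE_API_KEY"

# Option 2: Use environment variable
# GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")

# Fail early if the key is missing (os.environ cannot store None)
if not GOOGLE_API_KEY:
    raise ValueError("GOOGLE_API_KEY is not set")

# Set the environment variable
os.environ["GOOGLE_API_KEY"] = GOOGLE_API_KEY

print("✓ API key configured!")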

✓ API key configured!
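
Pasting a key directly into a notebook makes it easy to leak when the notebook is shared. As a hedged alternative, the sketch below reads the key from an environment mapping and fails early when it is missing. The function name `resolve_api_key` is hypothetical, and rejecting the literal placeholder string is an assumption based on the placeholder used in this notebook.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(env: Optional[Mapping[str, str]] = None,
                    name: str = "GOOGLE_API_KEY") -> str:
    """Hypothetical helper: return the key from the environment or fail early."""
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    # Reject a missing value and the literal placeholder used in this notebook.
    if not key or key == name:
        raise ValueError(f"{name} is not set; export it before running the notebook.")
    return key
```

Since `python-dotenv` is already in the install list, calling `load_dotenv()` from the `dotenv` package before `resolve_api_key()` would let the key live in a local `.env` file instead of the notebook.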


### Initialize Models

In [7]:
# Initialize Gemini LLM
llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",  # Free tier model
    temperature=0,  # For more deterministic outputs
    max_tokens=1024
)

# Initialize Gemini Embeddings
embeddings = GoogleGenerativeAIEmbeddings(
    model="gemini-embedding-001"  # Gemini embedding model
)

print("✓ LLM and Embeddings models initialized!")

✓ LLM and Embeddings models initialized!


### Test the Models

In [8]:
# Test LLM
try:
    response = llm.invoke("Say 'Hello World' if you're working!")
    print("LLM Response:", response.content)
    print("✓ LLM is working!\n")
except Exception as e:
    print(f"❌ LLM Error: {e}\n")

# Test Embeddings
try:
    test_embedding = embeddings.embed_query("test")
    print(f"Embedding dimension: {len(test_embedding)}")
    print(f"First 5 values: {test_embedding[:5]}")
    print("✓ Embeddings are working!")
except Exception as e:
    print(f"❌ Embeddings Error: {e}")

LLM Response: Hello World!
✓ LLM is working!

Embedding dimension: 3072
First 5 values: [-0.020297376438975334, 0.0038267294876277447, 0.016992559656500816, -0.09309638291597366, -0.0009401048882864416]
✓ Embeddings are working!
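
An embedding is only useful when compared with another embedding. As a small illustration, the sketch below computes cosine similarity between two vectors in plain Python. The function name `cosine_similarity` is introduced here for illustration; in the notebook the inputs would be two results of `embeddings.embed_query(...)`.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("Cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Values close to 1 indicate vectors pointing in nearly the same direction, which for text embeddings usually corresponds to similar meaning.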


## 4. Document Loading Functions

Functions to load different document types.

In [9]:
def load_document(file_path: str) -> List[Document]:
    """
    Load a document based on its file extension.

    Supported formats: .pdf, .txt, .docx

    Args:
        file_path: Path to the document file

    Returns:
        List of Document objects
    """
    file_extension = os.path.splitext(file_path)[1].lower()

    try:
        if file_extension == ".pdf":
            loader = PyPDFLoader(file_path)
        elif file_extension == ".txt":
            loader = TextLoader(file_path, encoding="utf-8")
        elif file_extension == ".docx":
            loader = Docx2txtLoader(file_path)
        else:
            raise ValueError(f"Unsupported file format: {file_extension}")

        documents = loader.load()
        print(f"✓ Loaded {len(documents)} document(s) from {file_path}")
        return documents

    except Exception as e:
        print(f"❌ Error loading document: {e}")
        return []


def load_document_from_text(text: str, metadata: Dict[str, Any] = None) -> List[Document]:
    """
    Create a document from raw text.

    Args:
        text: Raw text content
        metadata: Optional metadata dictionary

    Returns:
        List containing a single Document object
    """
    if metadata is None:
        metadata = {"source": "user_input"}

    doc = Document(page_content=text, metadata=metadata)
    print(f"✓ Created document from text ({len(text)} characters)")
    return [doc]


print("✓ Document loading functions defined!")

✓ Document loading functions defined!


## 5. Text Splitting

Split documents into smaller chunks for better retrieval.

In [10]:
def split_documents(documents: List[Document],
                   chunk_size: int = 1000,
                   chunk_overlap: int = 200) -> List[Document]:
    """
    Split documents into smaller chunks.

    Args:
        documents: List of Document objects to split
        chunk_size: Maximum size of each chunk in characters
        chunk_overlap: Number of characters to overlap between chunks

    Returns:
        List of split Document objects
    """
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
        length_function=len,
        separators=["\n\n", "\n", ". ", " ", ""]
    )

    splits = text_splitter.split_documents(documents)
    print(f"✓ Split into {len(splits)} chunks")

    # Show first chunk as example
    if splits:
        print("\nExample chunk (first 200 chars):")
        print(f"{splits[0].page_content[:200]}...\n")

    return splits


print("✓ Text splitting function defined!")

✓ Text splitting function defined!
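
To build intuition for `chunk_size` and `chunk_overlap`, here is a deliberately simplified sliding-window splitter. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that class first splits on the separators listed above and then merges the pieces), so real chunk boundaries will differ. The function name `sliding_window_chunks` is hypothetical.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks
```

Each window starts `chunk_size - chunk_overlap` characters after the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.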


## 6. Vector Store Creation

Create a FAISS vector store from document chunks.

In [11]:
def create_vector_store(chunks: List[Document],
                       embeddings_model: GoogleGenerativeAIEmbeddings) -> FAISS:
    """
    Create a FAISS vector store from document chunks.

    Args:
        chunks: List of document chunks
        embeddings_model: Embeddings model to use

    Returns:
        FAISS vector store
    """
    print(f"Creating embeddings for {len(chunks)} chunks...")
    print("(This may take a moment with free tier rate limits)\n")

    try:
        vectorstore = FAISS.from_documents(
            documents=chunks,
            embedding=embeddings_model
        )
        print("✓ Vector store created successfully!")
        return vectorstore

    except Exception as e:
        print(f"❌ Error creating vector store: {e}")
        raise


print("✓ Vector store function defined!")

✓ Vector store function defined!
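
Conceptually, a flat vector index compares the query vector with every stored vector and returns the closest ones. The brute-force sketch below shows that idea in pure Python with squared L2 distance, assuming a flat L2 index; FAISS performs the same kind of search far more efficiently. The names `squared_l2` and `top_k_by_l2` are hypothetical.

```python
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k_by_l2(query: Sequence[float],
                vectors: Sequence[Sequence[float]],
                k: int = 3) -> List[Tuple[int, float]]:
    """Return (index, distance) pairs for the k stored vectors nearest to query."""
    scored = [(i, squared_l2(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]
```

The returned indices play the role of chunk ids: the vector store maps each one back to the `Document` it was computed from.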


## 7. RAG Chain Creation (Modern LCEL Pattern)

This uses `create_retrieval_chain`, which returns an LCEL Runnable. With LangChain 1.0 it is imported from the `langchain-classic` package, as in the imports in Section 2.

In [12]:
def create_rag_chain(vectorstore: FAISS,
                     llm_model: ChatGoogleGenerativeAI,
                     k: int = 4):
    """
    Create a RAG chain using modern LCEL patterns.

    Args:
        vectorstore: FAISS vector store
        llm_model: Language model to use
        k: Number of documents to retrieve

    Returns:
        RAG chain (Runnable)
    """
    # Create retriever from vector store
    retriever = vectorstore.as_retriever(
        search_type="similarity",
        search_kwargs={"k": k}
    )

    # Define the prompt template
    system_prompt = (
        "You are an assistant for question-answering tasks. "
        "Use the following pieces of retrieved context to answer the question. "
        "If you don't know the answer, just say that you don't know. "
        "Use three sentences maximum and keep the answer concise.\n\n"
        "Context: {context}"
    )

    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", system_prompt),
            ("human", "{input}"),
        ]
    )

    # Create the document chain (combines documents and generates answer)
    question_answer_chain = create_stuff_documents_chain(llm_model, prompt)

    # Create the full RAG chain (retrieval + generation)
    rag_chain = create_retrieval_chain(retriever, question_answer_chain)

    print("✓ RAG chain created successfully!")
    return rag_chain


print("✓ RAG chain function defined!")

✓ RAG chain function defined!
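
The "stuff" strategy is simple at heart: the retrieved chunks are concatenated and substituted into the `{context}` slot of the system prompt. The plain-Python sketch below mimics that step without LangChain. The names `stuff_context` and `build_messages` are hypothetical, the shortened template is for illustration only, and the blank-line separator is an assumption about the chain's default.

```python
from typing import List, Tuple

# Shortened version of the system prompt, for illustration only.
SYSTEM_TEMPLATE = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer the question.\n\n"
    "Context: {context}"
)


def stuff_context(chunks: List[str], separator: str = "\n\n") -> str:
    """Join all retrieved chunks into one context string (assumed separator)."""
    return separator.join(chunks)


def build_messages(question: str, chunks: List[str]) -> List[Tuple[str, str]]:
    """Return (role, text) pairs resembling what the chat model receives."""
    context = stuff_context(chunks)
    return [
        ("system", SYSTEM_TEMPLATE.format(context=context)),
        ("human", question),
    ]
```

Because every retrieved chunk lands in a single prompt, `k` multiplied by `chunk_size` has to stay comfortably within the model's context window.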


## 8. Query Function with Source Display

In [13]:
def query_rag(chain, question: str, show_sources: bool = True) -> Dict[str, Any]:
    """
    Query the RAG chain and display results.

    Args:
        chain: The RAG chain
        question: Question to ask
        show_sources: Whether to display source documents

    Returns:
        Dictionary with answer and context
    """
    print(f"\n{'='*70}")
    print(f"QUESTION: {question}")
    print(f"{'='*70}\n")

    try:
        # Invoke the chain
        result = chain.invoke({"input": question})

        # Display answer
        print("ANSWER:")
        print(result["answer"])
        print()

        # Display sources if requested
        if show_sources and "context" in result:
            print(f"\n{'─'*70}")
            print(f"SOURCES ({len(result['context'])} documents retrieved):")
            print(f"{'─'*70}\n")

            for i, doc in enumerate(result["context"], 1):
                print(f"Source {i}:")
                print(f"Content: {doc.page_content[:300]}...")
                if doc.metadata:
                    print(f"Metadata: {doc.metadata}")
                print()

        return result

    except Exception as e:
        print(f"❌ Error querying RAG chain: {e}")
        return {}


print("✓ Query function defined!")

✓ Query function defined!
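
Overlapping chunks often come from the same file, so a citation list built from the retrieved documents can repeat itself. The sketch below collects unique `source` values in first-seen order. It works on plain metadata dictionaries, and the name `unique_sources` and the `"unknown"` fallback are hypothetical choices.

```python
from typing import Any, Dict, Iterable, List


def unique_sources(metadatas: Iterable[Dict[str, Any]], key: str = "source") -> List[str]:
    """Return distinct source labels in the order they were first retrieved."""
    seen = set()
    ordered = []
    for metadata in metadatas:
        source = str(metadata.get(key, "unknown"))  # fallback label is an assumption
        if source not in seen:
            seen.add(source)
            ordered.append(source)
    return ordered
```

With a chain result in hand, this could be called as `unique_sources(doc.metadata for doc in result["context"])`.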


## 9. Complete RAG Pipeline Class

A wrapper class to manage the entire RAG workflow.

In [14]:
class RAGSystem:
    """
    Complete RAG system that handles document loading, processing, and querying.
    """

    def __init__(self, llm, embeddings, chunk_size=1000, chunk_overlap=200, k=4):
        """
        Initialize the RAG system.

        Args:
            llm: Language model
            embeddings: Embeddings model
            chunk_size: Size of text chunks
            chunk_overlap: Overlap between chunks
            k: Number of documents to retrieve
        """
        self.llm = llm
        self.embeddings = embeddings
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap
        self.k = k

        self.documents = []
        self.chunks = []
        self.vectorstore = None
        self.rag_chain = None

    def load_from_file(self, file_path: str):
        """Load document from file."""
        self.documents = load_document(file_path)
        return self

    def load_from_text(self, text: str, metadata: Dict[str, Any] = None):
        """Load document from text."""
        self.documents = load_document_from_text(text, metadata)
        return self

    def process_documents(self):
        """Split documents and create vector store."""
        if not self.documents:
            print("❌ No documents loaded!")
            return self

        # Split documents
        self.chunks = split_documents(
            self.documents,
            self.chunk_size,
            self.chunk_overlap
        )

        # Create vector store
        self.vectorstore = create_vector_store(self.chunks, self.embeddings)

        # Create RAG chain
        self.rag_chain = create_rag_chain(self.vectorstore, self.llm, self.k)

        print("\n✓ RAG system ready for queries!\n")
        return self

    def query(self, question: str, show_sources: bool = True):
        """Query the RAG system."""
        if self.rag_chain is None:
            print("❌ RAG chain not initialized. Call process_documents() first!")
            return {}

        return query_rag(self.rag_chain, question, show_sources)

    def get_stats(self):
        """Get statistics about the loaded documents."""
        stats = {
            "num_documents": len(self.documents),
            "num_chunks": len(self.chunks),
            "chunk_size": self.chunk_size,
            "chunk_overlap": self.chunk_overlap,
            "retrieval_k": self.k
        }

        print("\nRAG System Statistics:")
        print("─" * 40)
        for key, value in stats.items():
            print(f"{key.replace('_', ' ').title()}: {value}")
        print("─" * 40 + "\n")

        return stats


print("✓ RAGSystem class defined!")

✓ RAGSystem class defined!


## 10. Example Usage

Let's test the system with sample text.

### Example 1: Using Sample Text

In [15]:
# Sample document about AI and LangChain
sample_text = """
LangChain is a framework for developing applications powered by language models.
It enables applications that are context-aware and can reason about how to answer
based on provided context. The framework consists of several key components:

Models: LangChain provides abstractions for working with different language models
from various providers like OpenAI, Anthropic, and Google. These models can be
easily swapped without changing the application code.

Prompts: The framework includes tools for managing and optimizing prompts, which
are the inputs given to language models. Prompt templates allow for dynamic
generation of prompts based on user input.

Memory: LangChain provides different types of memory components that allow
applications to maintain context across multiple interactions. This is essential
for building chatbots and conversational AI systems.

Chains: These are sequences of calls to LLMs or other tools. Chains can be simple
(calling a single LLM) or complex (calling multiple LLMs or tools in sequence).
The newest approach uses LangChain Expression Language (LCEL) which provides
better composability and streaming support.

Retrieval-Augmented Generation (RAG) is one of the most powerful applications of
LangChain. RAG combines the power of language models with external knowledge bases.
It works by first retrieving relevant documents from a knowledge base, then using
those documents as context for the language model to generate an answer.

Vector Stores are a key component in RAG systems. They store document embeddings,
which are numerical representations of text that capture semantic meaning. When a
user asks a question, the question is also embedded, and the vector store finds
the most similar document embeddings, effectively finding the most relevant documents.

The Gemini API from Google provides both powerful language models and embedding
models. Gemini 2.0 Flash is optimized for speed and efficiency, while maintaining
high quality outputs. The Gemini Embedding model produces state-of-the-art embeddings
that work across multiple languages and understand nuanced context.
"""

# Initialize RAG system
rag = RAGSystem(
    llm=llm,
    embeddings=embeddings,
    chunk_size=500,  # Smaller chunks for this example
    chunk_overlap=100,
    k=3  # Retrieve top 3 chunks
)

# Load and process the document
rag.load_from_text(sample_text, metadata={"source": "langchain_intro"})
rag.process_documents()

# Get statistics
rag.get_stats()

✓ Created document from text (2123 characters)
✓ Split into 6 chunks

Example chunk (first 200 chars):
LangChain is a framework for developing applications powered by language models.
It enables applications that are context-aware and can reason about how to answer
based on provided context. The framew...

Creating embeddings for 6 chunks...
(This may take a moment with free tier rate limits)

✓ Vector store created successfully!
✓ RAG chain created successfully!

✓ RAG system ready for queries!


RAG System Statistics:
────────────────────────────────────────
Num Documents: 1
Num Chunks: 6
Chunk Size: 500
Chunk Overlap: 100
Retrieval K: 3
────────────────────────────────────────



{'num_documents': 1,
 'num_chunks': 6,
 'chunk_size': 500,
 'chunk_overlap': 100,
 'retrieval_k': 3}

### Ask Questions

In [16]:
# Question 1
rag.query("What is LangChain?", show_sources=True)


QUESTION: What is LangChain?

ANSWER:
LangChain is a framework designed for developing applications powered by language models. It enables applications to be context-aware and reason about how to answer based on provided context. The framework includes components like models, prompts, memory, and chains to facilitate this development.


──────────────────────────────────────────────────────────────────────
SOURCES (3 documents retrieved):
──────────────────────────────────────────────────────────────────────

Source 1:
Content: LangChain is a framework for developing applications powered by language models.
It enables applications that are context-aware and can reason about how to answer
based on provided context. The framework consists of several key components:

Models: LangChain provides abstractions for working with di...
Metadata: {'source': 'langchain_intro'}

Source 2:
Content: Prompts: The framework includes tools for managing and optimizing prompts, which
are the inputs given


In [17]:
# Question 2
rag.query("How does RAG work?", show_sources=True)


QUESTION: How does RAG work?

ANSWER:
RAG works by first retrieving relevant documents from an external knowledge base. These retrieved documents are then used as context for a language model. The language model then generates an answer based on this provided context.


──────────────────────────────────────────────────────────────────────
SOURCES (3 documents retrieved):
──────────────────────────────────────────────────────────────────────

Source 1:
Content: Retrieval-Augmented Generation (RAG) is one of the most powerful applications of
LangChain. RAG combines the power of language models with external knowledge bases.
It works by first retrieving relevant documents from a knowledge base, then using
those documents as context for the language model to ...
Metadata: {'source': 'langchain_intro'}

Source 2:
Content: Vector Stores are a key component in RAG systems. They store document embeddings,
which are numerical representations of text that capture semantic meaning. When a
user 


In [18]:
# Question 3
rag.query("What are vector stores used for?", show_sources=True)


QUESTION: What are vector stores used for?

ANSWER:
Vector stores are used to store document embeddings, which are numerical representations of text capturing semantic meaning. In RAG systems, they help find the most relevant documents by comparing the embedding of a user's question to stored document embeddings. This process effectively retrieves documents most similar to the query.


──────────────────────────────────────────────────────────────────────
SOURCES (3 documents retrieved):
──────────────────────────────────────────────────────────────────────

Source 1:
Content: Vector Stores are a key component in RAG systems. They store document embeddings,
which are numerical representations of text that capture semantic meaning. When a
user asks a question, the question is also embedded, and the vector store finds
the most similar document embeddings, effectively findin...
Metadata: {'source': 'langchain_intro'}

Source 2:
Content: Retrieval-Augmented Generation (RAG) is one of the 


In [19]:
# Question 4 - Testing out of context
rag.query("What is the capital of France?", show_sources=True)


QUESTION: What is the capital of France?

ANSWER:
I don't know the answer to that question based on the provided context. The context describes LangChain and its components, not geographical facts.


──────────────────────────────────────────────────────────────────────
SOURCES (3 documents retrieved):
──────────────────────────────────────────────────────────────────────

Source 1:
Content: LangChain is a framework for developing applications powered by language models.
It enables applications that are context-aware and can reason about how to answer
based on provided context. The framework consists of several key components:

Models: LangChain provides abstractions for working with di...
Metadata: {'source': 'langchain_intro'}

Source 2:
Content: Prompts: The framework includes tools for managing and optimizing prompts, which
are the inputs given to language models. Prompt templates allow for dynamic
generation of prompts based on user input.

Memory: LangChain provides different ty


### Example 2: Loading from a File

Uncomment and modify the path to load from an actual file.

In [None]:
# # Create a new RAG system for file-based document
# rag_file = RAGSystem(
#     llm=llm,
#     embeddings=embeddings,
#     chunk_size=1000,
#     chunk_overlap=200,
#     k=4
# )

# # Load from file (replace with your file path)
# file_path = "path/to/your/document.pdf"  # or .txt, .docx
# rag_file.load_from_file(file_path)
# rag_file.process_documents()

# # Query the document
# rag_file.query("Your question here")
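
When working with a folder of mixed files, it helps to filter for the formats `load_document` supports before loading anything. The sketch below applies the same extension check used in Section 4; the name `filter_supported_paths` is hypothetical.

```python
import os
from typing import Iterable, List

# Same extensions that load_document() accepts.
SUPPORTED_EXTENSIONS = (".pdf", ".txt", ".docx")


def filter_supported_paths(paths: Iterable[str]) -> List[str]:
    """Keep only paths whose extension (case-insensitive) is supported."""
    supported = []
    for path in paths:
        extension = os.path.splitext(path)[1].lower()
        if extension in SUPPORTED_EXTENSIONS:
            supported.append(path)
    return supported
```

Note that `RAGSystem.load_from_file` replaces `self.documents` on every call, so indexing several files would mean collecting the results of `load_document` into one list first and assigning that list to `rag_file.documents` before calling `process_documents()`.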

## 11. Advanced Features

### Direct Vector Store Search (without LLM)

In [None]:
def search_similar_documents(vectorstore: FAISS, query: str, k: int = 3):
    """
    Search for similar documents without generating an answer.
    Useful for understanding what context is being retrieved.
    """
    print(f"\nSearching for documents similar to: '{query}'\n")
    print("─" * 70)

    # Perform similarity search
    docs = vectorstore.similarity_search(query, k=k)

    for i, doc in enumerate(docs, 1):
        print(f"\nDocument {i}:")
        print(f"Content: {doc.page_content[:400]}...")
        print(f"Metadata: {doc.metadata}")
        print("─" * 70)

# Try it out
search_similar_documents(rag.vectorstore, "embeddings and vector stores", k=2)

### Similarity Search with Scores

In [None]:
def search_with_scores(vectorstore: FAISS, query: str, k: int = 3):
    """
    Search with similarity scores to see how relevant the documents are.
    The score is a distance (L2 by default in LangChain's FAISS wrapper), so lower scores mean more similar.
    """
    print(f"\nSearching with scores for: '{query}'\n")
    print("─" * 70)

    docs_with_scores = vectorstore.similarity_search_with_score(query, k=k)

    for i, (doc, score) in enumerate(docs_with_scores, 1):
        print(f"\nDocument {i} (Score: {score:.4f}):")
        print(f"Content: {doc.page_content[:300]}...")
        print("─" * 70)

# Try it out
search_with_scores(rag.vectorstore, "What is a chain in LangChain?", k=3)
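
Similarity search always returns `k` results, even for a question the documents cannot answer (as in the "capital of France" test earlier). One hedged way to handle this is to drop results whose distance exceeds a cutoff. The function name `filter_by_max_distance` is hypothetical, and any cutoff value has to be tuned on your own data, since none is given in this notebook.

```python
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def filter_by_max_distance(results: List[Tuple[T, float]],
                           max_distance: float) -> List[Tuple[T, float]]:
    """Keep only (item, distance) pairs at or below the distance cutoff."""
    return [(item, score) for item, score in results if score <= max_distance]
```

The input has the same shape as the output of `vectorstore.similarity_search_with_score(query, k=k)`, so the function can be applied directly to `docs_with_scores`.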

## 12. Helper Function: Create Sample PDF

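This optional helper creates a sample PDF file for testing. It requires `reportlab`, which is not in the install list in Section 1, so run `pip install reportlab` before uncommenting it.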

In [None]:
# def create_sample_pdf(filename="sample_document.pdf"):
#     """Create a sample PDF for testing."""
#     from reportlab.lib.pagesizes import letter
#     from reportlab.pdfgen import canvas

#     c = canvas.Canvas(filename, pagesize=letter)
#     c.drawString(100, 750, "Sample Document for RAG Testing")
#     c.drawString(100, 730, "")
#     c.drawString(100, 710, "This is a sample document created for testing the RAG system.")
#     c.drawString(100, 690, "It contains information about artificial intelligence and machine learning.")
#     c.save()
#     print(f"✓ Created {filename}")

# Uncomment to create a sample PDF
# create_sample_pdf()

## 13. Save and Export Functions

Functions to save results for later use.

In [None]:
import json
from datetime import datetime

def save_qa_session(questions_and_answers: List[Dict], filename: str = None):
    """
    Save Q&A session to a JSON file.

    Args:
        questions_and_answers: List of Q&A dictionaries
        filename: Output filename (auto-generated if None)
    """
    if filename is None:
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        filename = f"qa_session_{timestamp}.json"

    with open(filename, 'w', encoding='utf-8') as f:
        json.dump(questions_and_answers, f, indent=2)

    print(f"✓ Session saved to {filename}")

# Example usage:
# qa_history = []
# result = rag.query("What is LangChain?")
# qa_history.append({
#     "question": "What is LangChain?",
#     "answer": result["answer"],
#     "timestamp": datetime.now().isoformat()
# })
# save_qa_session(qa_history)
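
The chain result also holds `Document` objects under `"context"`, which `json.dump` cannot serialize directly. The sketch below builds a JSON-friendly record that keeps only the source metadata. The helper name `make_qa_record` and the record layout are hypothetical; `default=str` is used as a fallback for metadata values that are not JSON types.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, List


def make_qa_record(question: str,
                   answer: str,
                   source_metadatas: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build one JSON-friendly Q&A entry that includes its source metadata."""
    return {
        "question": question,
        "answer": answer,
        "sources": source_metadatas,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


# Toy values standing in for a real chain result.
record = make_qa_record(
    "What is LangChain?",
    "A framework for LLM applications.",
    [{"source": "langchain_intro"}],
)
serialized = json.dumps([record], indent=2, default=str)
```

In the notebook, the metadata list could be built as `[doc.metadata for doc in result["context"]]`.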

## 14. Troubleshooting

### Common Issues and Solutions:

1. **Rate Limit Errors**:
   - Free tier has 15 RPM (requests per minute)
   - Add delays between requests if needed
   - Reduce chunk count or batch size

2. **API Key Issues**:
   - Verify key is correct
   - Check key has not expired
   - Ensure key is properly set in environment

3. **Document Loading Errors**:
   - Check file path is correct
   - Verify file format is supported
   - Ensure file is not corrupted

4. **Memory Issues**:
   - Index fewer chunks: each chunk stores one 3,072-dimensional vector, so smaller chunks mean more vectors, not fewer
   - Process documents in smaller batches
   - Use smaller documents for testing
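
For rate limit errors in particular, "add delays between requests" is often implemented as retry with exponential backoff. The sketch below is a generic, hypothetical `call_with_backoff` helper. It catches every exception for simplicity, which is an assumption; real code should narrow the `except` clause to the rate-limit error your client library actually raises.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(func: Callable[[], T],
                      max_attempts: int = 4,
                      base_delay: float = 2.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func, retrying failures with delays of base_delay * 1, 2, 4, ..."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:  # assumption: narrow this to rate-limit errors
            if attempt == max_attempts:
                raise  # out of retries, surface the last error
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")  # loop always returns or raises
```

A call such as `call_with_backoff(lambda: rag.query("What is LangChain?"))` would not help as written, because `query_rag` catches exceptions itself; wrapping `rag.rag_chain.invoke({"input": question})` is the more direct target.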