# **RAG for Product Managers**
This notebook demonstrates how to build and evaluate a RAG pipeline using specific PDF documents as the knowledge source.
You can get a free GEMINI API key [here](https://aistudio.google.com/app/u/3/apikey) with some rate limitations



# **1. Setup: Install libraries and configure API keys.**


In [None]:
# Install and upgrade required packages to their latest versions
!pip install --upgrade langchain langchain_community langchain_core langchain_google_genai google-generativeai

# Install other dependencies
!pip install pypdf
!pip install chromadb
!pip install pandas
!pip install rank_bm25
!pip install streamlit

Next, Add your Google Gemini API key in the Secrets settings 🔑 in Google Colab on the left panel


In [None]:
# Load your Keys into the environment variable. This will be automatically used by the LLM functions
import os
import google.generativeai as genai
from google.colab import userdata

if not os.getenv("GOOGLE_API_KEY"):
    os.environ["GOOGLE_API_KEY"] = userdata.get('GOOGLE_API_KEY')  # Store API key in environment variable
#GEMINI_API_KEY = userdata.get('GOOGLE_API_KEY')
#genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

# **📄 Part 2: Loading PDF Documents**


##**2.1 Upload External Files**

We have created 5 sample documents for our RAG. They are available in the github links provided below

In [29]:
import os
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# --- 1. Download PDF Files from GitHub ---

# !!! IMPORTANT: Replace the placeholder URLs below with the RAW file links from your GitHub repo.
github_files = {
   "CRM_Pro_Technical_documentation.pdf": "https://raw.githubusercontent.com/ars-codelab/Rag_PM/main/Files/CRM_Pro_Technical_documentation.pdf",
        "CRM_Pro_User_guide.pdf": "https://raw.githubusercontent.com/ars-codelab/Rag_PM/main/Files/CRM_Pro_User_guide.pdf",
        "CRM_Pro_Billing_Information.pdf": "https://raw.githubusercontent.com/ars-codelab/Rag_PM/main/Files/CRM_Pro_Billing_Information.pdf",
        "CRM_Pro_Customer_Support_Guide.pdf": "https://raw.githubusercontent.com/ars-codelab/Rag_PM/main/Files/CRM_Pro_Customer_Support_Guide.pdf",
        "CRM_Pro_Customer_Support_Upsell_guide.pdf": "https://raw.githubusercontent.com/ars-codelab/Rag_PM/main/Files/CRM_Pro_Customer_Support_Upsell_guide.pdf"
}

# Download each file using wget
for filename, url in github_files.items():
    print(f"Downloading {filename} from GitHub...")
    # The '-O' flag ensures the file is saved with its original name
    !wget -q -O "{filename}" "{url}"


# --- 2. Load the Downloaded PDF Documents ---
pdf_files = list(github_files.keys())
all_docs = []

for pdf_path in pdf_files:
    if os.path.exists(pdf_path):
        loader = PyPDFLoader(pdf_path)
        documents = loader.load()
        # Add the source filename to each document's metadata for later reference
        for doc in documents:
            doc.metadata['source'] = os.path.basename(pdf_path)
        all_docs.extend(documents)
        print(f"Loaded {len(documents)} pages from {pdf_path}")
    else:
        print(f"Error: Could not find downloaded file {pdf_path}.")


Downloading CRM_Pro_Technical_documentation.pdf from GitHub...
Downloading CRM_Pro_User_guide.pdf from GitHub...
Downloading CRM_Pro_Billing_Information.pdf from GitHub...
Downloading CRM_Pro_Customer_Support_Guide.pdf from GitHub...
Downloading CRM_Pro_Customer_Support_Upsell_guide.pdf from GitHub...
Loaded 14 pages from CRM_Pro_Technical_documentation.pdf
Loaded 3 pages from CRM_Pro_User_guide.pdf
Loaded 4 pages from CRM_Pro_Billing_Information.pdf
Loaded 5 pages from CRM_Pro_Customer_Support_Guide.pdf
Loaded 5 pages from CRM_Pro_Customer_Support_Upsell_guide.pdf


##**2.2 Split / chunk the loaded Documents into smaller files**

In [None]:
# --- 3. Chunk the Documents ---
# Split the loaded documents into smaller chunks for better retrieval
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
chunked_docs = text_splitter.split_documents(all_docs)

print(f"\nTotal documents loaded: {len(all_docs)}")
print(f"Total chunks created: {len(chunked_docs)}")


Total documents loaded: 31
Total chunks created: 155


# **3. Initialize the Language Models**
First, we'll define the two different models we intend to use for their specific tasks. We will pick a smaller and faster model for  our chat inference, and use a larger (and more expensive) model for evaluating the response

In [None]:
from langchain_google_genai import ChatGoogleGenerativeAI

# 1. Initialize a smaller, faster model for generating answers in the RAG pipeline.
# This is optimized for speed and lower cost, suitable for high-volume requests.
generation_llm = ChatGoogleGenerativeAI(
    model="models/gemini-2.0-flash-lite",
    temperature=0.3
)

# 2. Initialize a more powerful model to act as the "judge" for evaluation.
# This is optimized for reasoning and accuracy, suitable for quality control.
evaluation_llm = ChatGoogleGenerativeAI(
    model="models/gemini-2.5-flash",
    temperature=0.3
)

print("Initialized two LLMs:")
print(f"Generation Model: {generation_llm.model}")
print(f"Evaluation Model: {evaluation_llm.model}")

Initialized two LLMs:
Generation Model: models/gemini-2.0-flash-lite
Evaluation Model: models/gemini-2.5-flash


# **⚖️ 4. Establishing the Evaluation Framework**
This is the most critical new part. We will evaluate our RAG pipeline's accuracy using a predefined set of questions based on the PDF content.

##**4.1 Create an Evaluation Set**

We'll manually create questions, ground-truth answers, and identify the source contexts.

In [None]:
# Create a list of dictionaries, where each dictionary is a test case

eval_dataset = [
    {
        "question": "What is the contact limit for the Starter plan and what happens when you exceed it?",
        "ground_truth_answer": "The Starter plan includes up to 1,000 contacts. When exceeded, you are charged $0.10 per additional contact per month as an overage fee.",
    },

    {
        "question": "What is the response time standard for Enterprise customers?",
        "ground_truth_answer": "Enterprise customers receive premium support with a 4-hour response time via phone and email.",
    },
    {
        "question": "How much can customers save by choosing annual billing instead of monthly?",
        "ground_truth_answer": "Customers save 15% on subscription fees by choosing annual billing instead of monthly billing.",
    },

    {
        "question": "What happens to account data after cancellation?",
        "ground_truth_answer": "Account data is preserved for 30 days during suspension period, customers have 90 days to export data after cancellation before permanent deletion.",
    },

    {
        "question": "What is the grace period for failed payments?",
        "ground_truth_answer": "There is a 5-day grace period for failed payments with email reminders, after which account access is restricted until payment is resolved.",
    },
    {
        "question": "What are the key indicators that suggest a customer might benefit from upgrading to Professional plan?",
        "ground_truth_answer": "Contact count at 80%+ of plan maximum, email sends consistently above 800 per month, asking about automation capabilities, requesting custom reports, or needing priority support.",
    },
    {
        "question": "How should support representatives handle customer requests for plan downgrades?",
        "ground_truth_answer": "Understand the reason for downgrade, explain what features will be lost, offer alternatives like annual discounts or usage optimization, process downgrade for next billing cycle if customer insists, and document feedback for the product team.",
    },
    {
        "question": "What is included in the Professional plan that makes it different from Starter?",
        "ground_truth_answer": "Professional plan includes up to 10,000 contacts (vs 1,000), unlimited deals (vs 100/month), advanced reporting with unlimited custom reports, automation workflows (up to 10 active), calendar integration, and priority support with 24-hour response time.",
    },

      {
        "question": "What is the current status and impact of issue #CRM-2401?",
        "ground_truth_answer": "Issue #CRM-2401 is a critical performance degradation issue affecting accounts with over 100,000 contacts, impacting approximately 15% of Enterprise customers. It's currently in progress with database optimization patches scheduled for Q3 2025.",
    },

    {
        "question": "What minimum browser versions are required for optimal CRM Pro performance?",
        "ground_truth_answer": "Google Chrome 90+, Mozilla Firefox 88+, Microsoft Edge 90+, and Safari 14+ are required, with Chrome 90+ being recommended for optimal performance.",
    },

    {
        "question": "What encryption standards are used for data transmission and storage?",
        "ground_truth_answer": "256-bit SSL/TLS encryption for all data transmission and AES-256 encryption at rest for all sensitive data.",
    },
    {
        "question": "What is the maximum file size limit for individual attachments and total storage per user?",
        "ground_truth_answer": "Maximum file size of 100MB per upload, with 5GB total storage per user, and individual file attachments have a maximum size limit of 25MB.",
    }


]


print(f"{len(eval_dataset)} evaluation questions created.")

12 evaluation questions created.


## **4.2 Create the Reusable Evaluation Function**

This function takes in a rag pipeline and evaluation questions we created in the previous step. Don't worry about the code. You will see later how this works
Since we are using the free Gemini Tier, our models are rate limited, hence we will use some delay to ensure our responses are getting executed


In [None]:
import pandas as pd
import time
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import JsonOutputParser
from pydantic import BaseModel, Field

def evaluate_rag_pipeline(rag_pipeline, eval_dataset, evaluator_llm):
    """
    Evaluates a RAG pipeline with added metrics for latency, factual correctness,
    and respects API rate limits.

    Args:
        rag_pipeline: The RAG pipeline to evaluate.
        eval_dataset: A list of dictionaries with questions and ground truth answers.
        evaluator_llm: The powerful LLM used for judging the responses.

    Returns:
        A tuple containing the results DataFrame and a dictionary of average scores.
    """
    # 1. Define the updated JSON structure, including the new metric
    class Evaluation(BaseModel):
        relevance_score: int = Field(description="A score from 1-5 on how relevant the answer is to the question.")
        faithfulness_score: int = Field(description="A score from 1-5 on if the answer is factually supported by the retrieved context.")
        factual_correctness_score: int = Field(description="A score from 1-5 on how factually correct the answer is when compared to the ground truth answer.")
        reasoning: str = Field(description="A brief explanation for all scores.")

    parser = JsonOutputParser(pydantic_object=Evaluation)

    # 2. Update the evaluator prompt to include the new task
    evaluator_prompt_template = """
    You are an impartial, highly intelligent judge. Your task is to evaluate a generated answer based on three metrics:
    1. Relevance: Is the answer directly relevant to the question?
    2. Faithfulness: Is the answer fully supported by the provided context? The context is the only source of truth for this metric.
    3. Factual Correctness: Is the answer factually correct when compared to the ground truth answer?

    {format_instructions}

    --- START OF EVALUATION ---
    Question: {question}
    Ground Truth Answer: {ground_truth_answer}
    Retrieved Context:
    {context}
    ---
    Generated Answer: {answer}
    """
    evaluator_prompt = PromptTemplate(
        template=evaluator_prompt_template,
        input_variables=["question", "ground_truth_answer", "context", "answer"],
        partial_variables={"format_instructions": parser.get_format_instructions()},
    )

    # 3. Set up the evaluator chain with the powerful "evaluator_llm"
    evaluator_chain = evaluator_prompt | evaluator_llm | parser

    # 4. Define rate limit sleep durations
    # Generation LLM: 30 RPM -> 60 / 30 = 2.0 seconds per request
    # Evaluation LLM: 14 RPM -> 60 / 14 = ~4.29 seconds per request
    gen_sleep_time = 2.0
    eval_sleep_time = 4.3

    # 5. Run the evaluation loop
    results = []
    for item in eval_dataset:
        question = item["question"]
        ground_truth_answer = item["ground_truth_answer"]

        # --- Measure latency for the RAG response generation ---
        start_time = time.monotonic()
        response = rag_pipeline.invoke({"query": question})
        end_time = time.monotonic()

        # The actual latency for this request
        latency = end_time - start_time

        # --- Pause to respect the generation LLM's rate limit ---
        time.sleep(gen_sleep_time)

        generated_answer = response['result']
        retrieved_context = "\n---\n".join([doc.page_content for doc in response['source_documents']])

        # Use the LLM-as-a-judge to score the answer
        try:
            scores = evaluator_chain.invoke({
                "question": question,
                "ground_truth_answer": ground_truth_answer,
                "context": retrieved_context,
                "answer": generated_answer
            })
            # --- Pause to respect the evaluation LLM's rate limit ---
            time.sleep(eval_sleep_time)

        except Exception as e:
            scores = {
                "relevance_score": "Error",
                "faithfulness_score": "Error",
                "factual_correctness_score": "Error",
                "reasoning": f"LLM call failed: {e}"
            }

        # Store all results
        new_item = item.copy()
        new_item['generated_answer'] = generated_answer
        new_item['latency'] = latency
        new_item.update(scores)
        results.append(new_item)

    # Convert results to DataFrame and calculate averages
    eval_df = pd.DataFrame(results)
    eval_df['relevance_score'] = pd.to_numeric(eval_df['relevance_score'], errors='coerce')
    eval_df['faithfulness_score'] = pd.to_numeric(eval_df['faithfulness_score'], errors='coerce')
    eval_df['factual_correctness_score'] = pd.to_numeric(eval_df['factual_correctness_score'], errors='coerce')

    avg_scores = {
        "avg_relevance": eval_df['relevance_score'].mean(),
        "avg_faithfulness": eval_df['faithfulness_score'].mean(),
        "avg_factual_correctness": eval_df['factual_correctness_score'].mean(),
        "avg_latency": eval_df['latency'].mean()
    }

    return eval_df, avg_scores

# **⛓️ 5. Building and Evaluating Naive / Basic RAG Pipeline**


## **5.1 Generate the embeddings and Build the pipeline**

In [None]:
from langchain_community.vectorstores import Chroma
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI
from langchain.chains import RetrievalQA

# --- 1. Embedding and Vector Storage ---
# Use the Gemini embedding model
embedding_model = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004") # Using models for Google

# Create the Chroma vector store from our PDF chunks
vectorstore = Chroma.from_documents(
    documents=chunked_docs,
    embedding=embedding_model
)


# Build the basic RAG pipeline using the cheaper generation_llm
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
basic_rag_pipeline = RetrievalQA.from_chain_type(
    llm=generation_llm, # Using the faster model here
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True
)

--- Evaluating Basic RAG Pipeline ---

Average Scores:
Latency: 0.92s | Relevance: 4.83 | Faithfulness: 4.83 | Factual Correctness: 3.50

Detailed Results:


Unnamed: 0,question,generated_answer,ground_truth_answer,latency,relevance_score,faithfulness_score,factual_correctness_score,reasoning
0,What is the contact limit for the Starter plan...,"The Starter plan includes up to 1,000 contacts...","The Starter plan includes up to 1,000 contacts...",0.900113,4,4,2,The answer correctly states the contact limit ...
1,What is the response time standard for Enterpr...,Enterprise Plan: Respond within 4 hours with d...,Enterprise customers receive premium support w...,0.857613,5,5,4,The generated answer directly addresses the qu...
2,How much can customers save by choosing annual...,Customers can save 15% on subscription fees by...,Customers save 15% on subscription fees by cho...,0.959742,5,5,5,The generated answer directly answers the ques...
3,What happens to account data after cancellation?,Data Export: 90 days to export data after canc...,Account data is preserved for 30 days during s...,0.835998,5,5,3,The generated answer is highly relevant to the...
4,What is the grace period for failed payments?,There is a 5-day grace period for failed payme...,There is a 5-day grace period for failed payme...,0.579629,5,4,3,The answer is directly relevant to the questio...
5,What are the key indicators that suggest a cus...,The key indicators that suggest a customer mig...,"Contact count at 80%+ of plan maximum, email s...",1.172885,5,5,4,The generated answer is highly relevant as it ...
6,How should support representatives handle cust...,Support representatives should:\n\n* Underst...,"Understand the reason for downgrade, explain w...",0.837577,5,5,5,The generated answer directly and completely a...
7,What is included in the Professional plan that...,"The Professional plan includes:\n* Up to 10,...","Professional plan includes up to 10,000 contac...",1.090109,5,5,3,The generated answer is highly relevant as it ...
8,What is the current status and impact of issue...,"I am sorry, but I do not have information abou...",Issue #CRM-2401 is a critical performance degr...,0.715864,4,5,1,The generated answer is relevant as it directl...
9,What minimum browser versions are required for...,* Google Chrome 90+\n* Mozilla Firefox 88+...,"Google Chrome 90+, Mozilla Firefox 88+, Micros...",0.884013,5,5,4,The generated answer directly addresses the qu...


## **5.2. Let's Ask a Question**



In [None]:
# Ask a question relevant to our documents
# question = "How do I contact the Product and Engineering team to provide feedback?"
question = "Hello. What is CRM Pro?"

# Run the pipeline
response = basic_rag_pipeline.invoke({"query": question})

# Print the result
print(f"Question: {response['query']}\n")
print(f"Answer: {response['result']}")

Question: Hello. What is CRM Pro?

Answer: CRM Pro is positioned to become the leading customer relationship management platform for small and medium enterprises, with a focus on intelligent automation, predictive analytics, and seamless integrations.


## **5.3 Evalute the pipeline**

In [None]:



# Evaluate the pipeline using the more powerful evaluation_llm as the judge
# Evaluate the pipeline using the new comprehensive function
print("--- Evaluating Basic RAG Pipeline ---")
basic_results_df, basic_avg_scores = evaluate_rag_pipeline(
    rag_pipeline=basic_rag_pipeline,
    eval_dataset=eval_dataset,
    evaluator_llm=evaluation_llm
)

# Display the results with the new metrics
print("\nAverage Scores:")
print(f"Latency: {basic_avg_scores['avg_latency']:.2f}s | "
      f"Relevance: {basic_avg_scores['avg_relevance']:.2f} | "
      f"Faithfulness: {basic_avg_scores['avg_faithfulness']:.2f} | "
      f"Factual Correctness: {basic_avg_scores['avg_factual_correctness']:.2f}\n")

print("Detailed Results:")
display(basic_results_df[[
    'question',
    'generated_answer',
    'ground_truth_answer',
    'latency',
    'relevance_score',
    'faithfulness_score',
    'factual_correctness_score',
    'reasoning'
]])

--- Evaluating Basic RAG Pipeline ---

Average Scores:
Latency: 0.87s | Relevance: 4.67 | Faithfulness: 4.83 | Factual Correctness: 3.67

Detailed Results:


Unnamed: 0,question,generated_answer,ground_truth_answer,latency,relevance_score,faithfulness_score,factual_correctness_score,reasoning
0,What is the contact limit for the Starter plan...,"The Starter plan includes up to 1,000 contacts...","The Starter plan includes up to 1,000 contacts...",1.276916,5,5,5,The generated answer directly addresses both p...
1,What is the response time standard for Enterpr...,Enterprise Plan: Respond within 4 hours with d...,Enterprise customers receive premium support w...,0.717696,5,5,4,The generated answer is highly relevant as it ...
2,How much can customers save by choosing annual...,Customers can save 15% on subscription fees by...,Customers save 15% on subscription fees by cho...,0.768988,5,5,5,The generated answer directly addresses the qu...
3,What happens to account data after cancellation?,After cancellation:\n\n* **Monthly Plans:** ...,Account data is preserved for 30 days during s...,0.699602,2,3,2,The question asks specifically about 'account ...
4,What is the grace period for failed payments?,The grace period for failed payments is 5 days.,There is a 5-day grace period for failed payme...,0.689076,5,5,4,The generated answer directly answers the ques...
5,What are the key indicators that suggest a cus...,The key indicators that suggest a customer mig...,"Contact count at 80%+ of plan maximum, email s...",1.11617,5,5,3,"The answer is highly relevant to the question,..."
6,How should support representatives handle cust...,Support representatives should:\n\n* Underst...,"Understand the reason for downgrade, explain w...",1.026656,5,5,5,The generated answer directly and completely a...
7,What is included in the Professional plan that...,"The Professional plan includes:\n* Up to 10,...","Professional plan includes up to 10,000 contac...",1.023512,5,5,3,The generated answer is highly relevant as it ...
8,What is the current status and impact of issue...,"I am sorry, but I do not have information abou...",Issue #CRM-2401 is a critical performance degr...,0.958989,4,5,1,The generated answer is relevant as it directl...
9,What minimum browser versions are required for...,* Google Chrome 90+\n* Mozilla Firefox 88+...,"Google Chrome 90+, Mozilla Firefox 88+, Micros...",0.689748,5,5,4,The answer is highly relevant as it directly a...


# **6. Building more Advanced Pipelines**

This section corresponds to "Scaling Up: Advanced RAG Techniques" in the blog post.

The naive approach works, but we can improve retrieval quality by combining semantic search with traditional keyword search (Hybrid Search) and adding a reranking step.

## **6.1 Hybrid Search (Semantic + Keyword)**

We'll combine our existing semantic retriever (Chroma) with a keyword-based retriever (BM25). The EnsembleRetriever will run both and combine the results.

### **6.1.1 Building a pipeline for Hybrid Search**

In [None]:
!pip install rank_bm25

Collecting rank_bm25
  Downloading rank_bm25-0.2.2-py3-none-any.whl.metadata (3.2 kB)
Downloading rank_bm25-0.2.2-py3-none-any.whl (8.6 kB)
Installing collected packages: rank_bm25
Successfully installed rank_bm25-0.2.2


In [None]:
from langchain.retrievers import BM25Retriever, EnsembleRetriever

# 1. Set up the BM25 (keyword) retriever
# It needs the raw text from the documents
doc_texts = [doc.page_content for doc in chunked_docs]

bm25_retriever = BM25Retriever.from_texts(
    texts=doc_texts,
    metadatas=[doc.metadata for doc in chunked_docs] # Include metadata
)

bm25_retriever.k = 3 # Retrieve top 3 results

# 2. Set up the Chroma (semantic) retriever (we already have this)
chroma_retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

# 3. Create the Ensemble Retriever
# This combines the results and gives equal weight to both methods
ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, chroma_retriever],
    weights=[0.5, 0.5]
)

# 4. Build the Hybrid Search RAG Pipeline
hybrid_rag_pipeline = RetrievalQA.from_chain_type(
    llm=generation_llm,
    chain_type="stuff",
    retriever=ensemble_retriever, # Use the new ensemble retriever
    return_source_documents=True
)

--- Evaluating Hybrid Search RAG Pipeline ---

Average Scores:
Latency: 0.76s | Relevance: 4.83 | Faithfulness: 4.67 | Factual Correctness: 3.75

Detailed Results:


Unnamed: 0,question,generated_answer,ground_truth_answer,latency,relevance_score,faithfulness_score,factual_correctness_score,reasoning
0,What is the contact limit for the Starter plan...,"The Starter plan includes up to 1,000 contacts...","The Starter plan includes up to 1,000 contacts...",0.824548,5,3,2,The answer is highly relevant as it directly a...
1,What is the response time standard for Enterpr...,Enterprise Plan: Respond within 4 hours with d...,Enterprise customers receive premium support w...,0.530842,5,5,4,The answer directly addresses the question abo...
2,How much can customers save by choosing annual...,Customers can save 15% on subscription fees by...,Customers save 15% on subscription fees by cho...,0.609017,5,5,5,The generated answer directly addresses the qu...
3,What happens to account data after cancellation?,Data Export: 90 days to export data after canc...,Account data is preserved for 30 days during s...,0.653747,4,5,3,The generated answer is highly relevant to the...
4,What is the grace period for failed payments?,There is a 5-day grace period for failed payme...,There is a 5-day grace period for failed payme...,0.595111,5,5,3,The generated answer is directly relevant to t...
5,What are the key indicators that suggest a cus...,The key indicators that suggest a customer mig...,"Contact count at 80%+ of plan maximum, email s...",1.16941,5,3,3,"The answer is highly relevant to the question,..."
6,How should support representatives handle cust...,Support representatives should:\n\n* Underst...,"Understand the reason for downgrade, explain w...",1.091709,5,5,5,The generated answer directly and completely a...
7,What is included in the Professional plan that...,"The Professional plan includes:\n* Up to 10,...","Professional plan includes up to 10,000 contac...",0.847192,5,5,4,The answer is highly relevant as it directly l...
8,What is the current status and impact of issue...,"I am sorry, but I do not have the information ...",Issue #CRM-2401 is a critical performance degr...,0.705098,4,5,1,The generated answer is relevant as it directl...
9,What minimum browser versions are required for...,* Google Chrome 90+\n* Mozilla Firefox 88+...,"Google Chrome 90+, Mozilla Firefox 88+, Micros...",0.735791,5,5,5,The generated answer directly addresses the qu...


### **6.1.2 Evaluating the Hybrid Search pipeline**

In [None]:
# 5. Evaluate the hybrid pipeline
print("--- Evaluating Hybrid Search RAG Pipeline ---")
hybrid_results_df, hybrid_avg_scores = evaluate_rag_pipeline(
    rag_pipeline=hybrid_rag_pipeline,
    eval_dataset=eval_dataset,
    evaluator_llm=evaluation_llm # Passing in the judge
)
# Display the results with the new metrics
print("\nAverage Scores:")
print(f"Latency: {hybrid_avg_scores['avg_latency']:.2f}s | "
      f"Relevance: {hybrid_avg_scores['avg_relevance']:.2f} | "
      f"Faithfulness: {hybrid_avg_scores['avg_faithfulness']:.2f} | "
      f"Factual Correctness: {hybrid_avg_scores['avg_factual_correctness']:.2f}\n")

print("Detailed Results:")
display(hybrid_results_df[[
    'question',
    'generated_answer',
    'ground_truth_answer',
    'latency',
    'relevance_score',
    'faithfulness_score',
    'factual_correctness_score',
    'reasoning'
]])

##**6.2. Add a Reranking Step**

While Hybrid Search helps retrieve a broader set of potentially relevant documents, not all of them are equally useful. A reranking model can examine the retrieved documents more closely and re-order them based on their direct relevance to the user's query. This ensures that only the most pertinent information reaches the LLM.

###**6.2.1. Build the Reranking Pipeline**

We'll use a CrossEncoder model, which is specifically designed for this reranking task, and wrap it around our ensemble_retriever from Part 5.

In [None]:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import CrossEncoderReranker
from langchain_community.cross_encoders import HuggingFaceCrossEncoder

# 1. Initialize a cross-encoder model for reranking
# These models are very effective at scoring the relevance of a document to a query.
reranker_model = HuggingFaceCrossEncoder(model_name="BAAI/bge-reranker-base")

#2. Increase the k thresholds for hybrid search to return more context which can be later rereanked
new_bm25_retriever = BM25Retriever.from_texts(
    texts=doc_texts,
    metadatas=[doc.metadata for doc in chunked_docs], # Include metadata
    kwargs = {"k": 5}
)
new_chroma_retriever = vectorstore.as_retriever(search_kwargs={"k": 7})

# 3. Create the new Ensemble Retriever
# This combines the results and gives equal weight to both methods
new_ensemble_retriever = EnsembleRetriever(
    retrievers=[new_bm25_retriever, new_chroma_retriever],
    weights=[0.5, 0.5]
)

# 4. Create a document compressor that uses the reranker
# It will filter and reorder the documents from the retriever, keeping the top 4.
compressor = CrossEncoderReranker(model=reranker_model, top_n=4)


# 5. Wrap our hybrid search retriever and applies the reranking step.
compression_reranker_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=new_ensemble_retriever # Using the hybrid retriever from Part 5
)

# 6. Build the final RAG pipeline with reranking
rerank_rag_pipeline = RetrievalQA.from_chain_type(
    llm=generation_llm,
    chain_type="stuff",
    retriever=compression_reranker_retriever,
    return_source_documents=True
)

config.json:   0%|          | 0.00/799 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/1.11G [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/443 [00:00<?, ?B/s]

sentencepiece.bpe.model:   0%|          | 0.00/5.07M [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.1M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/279 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/34.1k [00:00<?, ?B/s]

### **6.2 Evaluating the Hybrid + Reranking RAG pipeline**

In [None]:
# Evaluate the reranking pipeline
print("--- Evaluating Hybrid Search + Reranking RAG Pipeline ---")
rerank_results_df, rerank_avg_scores = evaluate_rag_pipeline(
    rag_pipeline=rerank_rag_pipeline,
    eval_dataset=eval_dataset,
    evaluator_llm=evaluation_llm # Passing in the judge
)

# Display the results with the new metrics
print("\nAverage Scores:")
print(f"Latency: {rerank_avg_scores['avg_latency']:.2f}s | "
      f"Relevance: {rerank_avg_scores['avg_relevance']:.2f} | "
      f"Faithfulness: {rerank_avg_scores['avg_faithfulness']:.2f} | "
      f"Factual Correctness: {rerank_avg_scores['avg_factual_correctness']:.2f}\n")

print("Detailed Results:")
display(rerank_results_df[[
    'question',
    'generated_answer',
    'ground_truth_answer',
    'latency',
    'relevance_score',
    'faithfulness_score',
    'factual_correctness_score',
    'reasoning'
]])

--- Evaluating Hybrid Search + Reranking RAG Pipeline ---

Average Scores:
Latency: 5.52s | Relevance: 5.00 | Faithfulness: 4.92 | Factual Correctness: 3.67

Detailed Results:


Unnamed: 0,question,generated_answer,ground_truth_answer,latency,relevance_score,faithfulness_score,factual_correctness_score,reasoning
0,What is the contact limit for the Starter plan...,"The Starter plan includes up to 1,000 contacts...","The Starter plan includes up to 1,000 contacts...",6.669402,5,5,3,The answer is highly relevant as it directly a...
1,What is the response time standard for Enterpr...,Enterprise Plan: Respond within 4 hours with d...,Enterprise customers receive premium support w...,5.615813,5,5,4,The generated answer directly addresses the qu...
2,How much can customers save by choosing annual...,Customers can save 15% on subscription fees by...,Customers save 15% on subscription fees by cho...,5.308163,5,5,5,The generated answer directly and completely a...
3,What happens to account data after cancellation?,Data Export: 90 days to export data after canc...,Account data is preserved for 30 days during s...,4.518921,5,5,3,The generated answer is highly relevant to the...
4,What is the grace period for failed payments?,There is a 5-day grace period for failed payme...,There is a 5-day grace period for failed payme...,4.552532,5,5,4,The generated answer directly addresses the qu...
5,What are the key indicators that suggest a cus...,The key indicators that suggest a customer mig...,"Contact count at 80%+ of plan maximum, email s...",7.575286,5,5,3,"The answer is highly relevant to the question,..."
6,How should support representatives handle cust...,Support representatives should:\n\n* Underst...,"Understand the reason for downgrade, explain w...",5.914501,5,5,5,The generated answer directly addresses the qu...
7,What is included in the Professional plan that...,"The Professional plan includes:\n* Up to 10,...","Professional plan includes up to 10,000 contac...",6.0156,5,5,2,Relevance (5/5): The answer directly addresses...
8,What is the current status and impact of issue...,"The current status of issue #CRM-2401 is ""In P...",Issue #CRM-2401 is a critical performance degr...,4.645068,5,4,2,The answer is highly relevant as it directly a...
9,What minimum browser versions are required for...,* Google Chrome 90+\n* Mozilla Firefox 88+...,"Google Chrome 90+, Mozilla Firefox 88+, Micros...",4.268152,5,5,5,The generated answer directly addresses the qu...


# **7. Comparing performance across all 3 Approaches**

In [None]:
# Display the results for all 3 approaches
print("\nBasic Rag: Average Scores:")
print(f"Latency: {basic_avg_scores['avg_latency']:.2f}s | "
      f"Relevance: {basic_avg_scores['avg_relevance']:.2f} | "
      f"Faithfulness: {basic_avg_scores['avg_faithfulness']:.2f} | "
      f"Factual Correctness: {basic_avg_scores['avg_factual_correctness']:.2f}\n")

print("\nHybrid Rag: Average Scores:")
print(f"Latency: {hybrid_avg_scores['avg_latency']:.2f}s | "
      f"Relevance: {hybrid_avg_scores['avg_relevance']:.2f} | "
      f"Faithfulness: {hybrid_avg_scores['avg_faithfulness']:.2f} | "
      f"Factual Correctness: {hybrid_avg_scores['avg_factual_correctness']:.2f}\n")

print("\nHybrid Re-Ranked Rag: Average Scores:")
print(f"Latency: {rerank_avg_scores['avg_latency']:.2f}s | "
      f"Relevance: {rerank_avg_scores['avg_relevance']:.2f} | "
      f"Faithfulness: {rerank_avg_scores['avg_faithfulness']:.2f} | "
      f"Factual Correctness: {rerank_avg_scores['avg_factual_correctness']:.2f}\n")


Basic Rag: Average Scores:
Latency: 0.92s | Relevance: 4.83 | Faithfulness: 4.83 | Factual Correctness: 3.50


Hybrid Rag: Average Scores:
Latency: 0.76s | Relevance: 4.83 | Faithfulness: 4.67 | Factual Correctness: 3.75


Hybrid Re-Ranked Rag: Average Scores:
Latency: 5.52s | Relevance: 5.00 | Faithfulness: 4.92 | Factual Correctness: 3.67



#**8. Security-Enhanced RAG Pipeline**
This section corresponds to "Enterprise Considerations: Security, Privacy, and Compliance" in the blog post.

In an enterprise setting, not all users should have access to all information. We'll modify our RAG system to respect document-level permissions.

##**8.1. Add Access-Level Metadata to Documents**
First, we need to update our documents with metadata indicating who is allowed to see them. For this example, we'll use role-based access levels: all, support, and leadership.

In [None]:
# We will use the 'all_docs' variable from Part 1, which contains the
# page-level documents before they were chunked.

# 1. Define document-level permissions
# Create a mapping from the source filename to a specific access level.
doc_level_permissions = {
    "CRM_Pro_Technical_documentation.pdf" : "Analyst_only",
    "CRM_Pro_User_ guide.pdf" : "all",
    "CRM_Pro_Billing_Information.pdf" : "CSR_only",
    "CRM_Pro_Customer_Support_ Guide.pdf" : "CSR_only",
    "CRM_Pro_Customer_Support_Upsell_guide.pdf" : "CSR_only"
    }

print(f"Defined permissions: {doc_level_permissions}")

# 2. Apply the document-level metadata to each page BEFORE chunking
docs_with_permissions = []
for doc in all_docs:
    source_file = doc.metadata.get('source')
    if source_file in doc_level_permissions:
        # We work on a copy to not modify the original 'all_docs' list
        doc_copy = doc.model_copy()
        doc_copy.metadata['access_level'] = doc_level_permissions[source_file]
        docs_with_permissions.append(doc_copy)

# 3. Chunk the documents that now have the correct metadata
# The 'access_level' will be automatically copied to all resulting chunks.
secure_chunked_docs = text_splitter.split_documents(docs_with_permissions)
print(f"\nChunked {len(docs_with_permissions)} pages into {len(secure_chunked_docs)} chunks with permissions.")

# 4. Create the new, secure vector store
print("\nCreating a new vector store with document-level security metadata...")
secure_vectorstore = Chroma.from_documents(
    documents=secure_chunked_docs,
    embedding=embedding_model # Use the same embedding model
)
print("Secure vector store created.")

Defined permissions: {'CRM_Pro_Technical_documentation.pdf': 'Analyst_only', 'CRM_Pro_User_ guide.pdf': 'all', 'CRM_Pro_Billing_Information.pdf': 'CSR_only', 'CRM_Pro_Customer_Support_ Guide.pdf': 'CSR_only', 'CRM_Pro_Customer_Support_Upsell_guide.pdf': 'CSR_only'}

Chunked 17 pages into 68 chunks with permissions.

Creating a new vector store with document-level security metadata...
Secure vector store created.


## **8.2. Create a Permission-Aware Retriever**
This function creates a retriever that can only "see" documents that match the user's assigned role.

In [None]:
def create_secure_retriever(user_role: str):
    """
    Creates a retriever that filters documents based on the user's role.
    """
    # Define the access permissions for each role
    if user_role == "CSR_only":
        allowed_access = ["all", "CSR_only"]
    elif user_role == "Analyst_only":
        allowed_access = ["all", "Analyst_only"]
    else: # General user
        allowed_access = ["all"]

    print(f"\nUser role '{user_role}' has access to: {allowed_access}")

    # Use the vector store's built-in metadata filtering
    retriever = secure_vectorstore.as_retriever(
        search_kwargs={
            "k": 5,
            "filter": {
                "access_level": {"$in": allowed_access}
            }
        }
    )
    return retriever

## **8.3. Build and Test the Secure Pipeline**
Let's simulate two users asking the same question and observe how their access levels affect the answer.

In [None]:
# A question whose answer might be in a restricted document
secure_question = "How should support representatives handle customer requests for plan downgrades?"

# --- Test Case 1: A user with limited access ---
analyst_retriever = create_secure_retriever(user_role="Analyst_only")
secure_analyst_pipeline = RetrievalQA.from_chain_type(
    llm=generation_llm, chain_type="stuff", retriever=analyst_retriever, return_source_documents=True
)

print("\n--- Asking question as 'Analyst_only' user ---")
analyst_response = secure_analyst_pipeline.invoke({"query": secure_question})
print(f"\nGenerated Answer: {analyst_response['result']}")

# --- Test Case 2: A user with full access ---
sme_retriever = create_secure_retriever(user_role="CSR_only")
secure_sme_pipeline = RetrievalQA.from_chain_type(
    llm=generation_llm, chain_type="stuff", retriever=sme_retriever, return_source_documents=True
)

print("\n--- Asking question as 'CSR_only' user ---")
sme_response = secure_sme_pipeline.invoke({"query": secure_question})
print(f"\nGenerated Answer: {sme_response['result']}")


User role 'Analyst_only' has access to: ['all', 'Analyst_only']

--- Asking question as 'Analyst_only' user ---

Generated Answer: I am sorry, but the provided documents do not contain information on how support representatives should handle customer requests for plan downgrades.

User role 'CSR_only' has access to: ['all', 'CSR_only']

--- Asking question as 'CSR_only' user ---

Generated Answer: Support representatives should:

*   Understand the reason for the downgrade (budget, usage, features).
*   Explain what features will be lost.
*   Offer alternatives like annual discounts or usage optimization.
*   Process the downgrade for the next billing cycle if the customer insists.
*   Document feedback for the product team.


#**Building a Streamlit chat App**

In [7]:
! pip install streamlit -q

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m44.3/44.3 kB[0m [31m2.1 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m9.9/9.9 MB[0m [31m59.5 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m6.9/6.9 MB[0m [31m76.9 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m79.1/79.1 kB[0m [31m7.2 MB/s[0m eta [36m0:00:00[0m
[?25h

### **1.Save API Key in .streamlt/secrets.toml file**

In [20]:
!mkdir /content/.streamlit

mkdir: cannot create directory ‘/content/.streamlit’: File exists


In [22]:
from google.colab import userdata

with open('/content/.streamlit/secrets.toml', 'w') as file:
  if not os.getenv("GOOGLE_API_KEY"):
    file.write("GOOGLE_API_KEY = " + userdata.get('GOOGLE_API_KEY') + "\n") # Adding a newline for better readability
  else:
    file.write("GOOGLE_API_KEY = " + os.getenv("GOOGLE_API_KEY") + "\n") # Adding a newline for better readability

### **2. Create a requirements.txt document in the same root folder**

In [23]:
%%writefile requirements.txt
streamlit
langchain
langchain-community
langchain-google-genai
google-generativeai
chromadb
pypdf
pysqlite3-binary
rank-bm25

Overwriting requirements.txt


###**3. Create the Streamlit App with the chat screen and rag components**

In [26]:
%%writefile app.py
import streamlit as st
from langchain_community.vectorstores import Chroma
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI
from langchain.retrievers import EnsembleRetriever
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.chains import RetrievalQA
import os
import google.generativeai as genai
from google.colab import userdata
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.retrievers import BM25Retriever
import subprocess # Add this import

# --- Page Setup ---

st.set_page_config(page_title="CSR Assistant", page_icon="💬")
st.title("Customer Service Representative Assistant")


# --- RAG Pipeline Setup ---

@st.cache_resource
def load_rag_pipeline():
    # --- Fix 1: Handle API Key with Streamlit Secrets ---
    # In your local project, create a file .streamlit/secrets.toml
    # and add your key like this: GOOGLE_API_KEY = "your_api_key_here"
    if "GOOGLE_API_KEY" not in os.environ:
        os.environ["GOOGLE_API_KEY"] = st.secrets["GOOGLE_API_KEY"]

    # --- Fix 2: Initialize the Language Model ---
    generation_llm = ChatGoogleGenerativeAI(model="models/gemini-2.0-flash-lite")

    # --- Suggestion 1: Removed leading spaces from filenames ---
    github_files = {
        "CRM_Pro_Technical_documentation.pdf": "https://raw.githubusercontent.com/ars-codelab/Rag_PM/main/Files/CRM_Pro_Technical_documentation.pdf",
        "CRM_Pro_User_guide.pdf": "https://raw.githubusercontent.com/ars-codelab/Rag_PM/main/Files/CRM_Pro_User_guide.pdf",
        "CRM_Pro_Billing_Information.pdf": "https://raw.githubusercontent.com/ars-codelab/Rag_PM/main/Files/CRM_Pro_Billing_Information.pdf",
        "CRM_Pro_Customer_Support_Guide.pdf": "https://raw.githubusercontent.com/ars-codelab/Rag_PM/main/Files/CRM_Pro_Customer_Support_Guide.pdf",
        "CRM_Pro_Customer_Support_Upsell_guide.pdf": "https://raw.githubusercontent.com/ars-codelab/Rag_PM/main/Files/CRM_Pro_Customer_Support_Upsell_guide.pdf"
    }

    # Download each file using wget
    for filename, url in github_files.items():
        if not os.path.exists(filename): # Only download if it doesn't exist
            print(f"Downloading {filename} from GitHub...")
            command = ['wget', '-q', '-O', filename, url]
            subprocess.run(command)

    # Load the Downloaded PDF Documents
    pdf_files = list(github_files.keys())
    all_docs = []
    for pdf_path in pdf_files:
        if os.path.exists(pdf_path):
            loader = PyPDFLoader(pdf_path)
            documents = loader.load()
            for doc in documents:
                doc.metadata['source'] = os.path.basename(pdf_path)
            all_docs.extend(documents)

    # Chunk the Documents
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
    chunked_docs = text_splitter.split_documents(all_docs)

    # Embedding and Vector Storage
    embedding_model = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004")
    vectorstore = Chroma.from_documents(documents=chunked_docs, embedding=embedding_model)

    # Setup the Hybrid RAG pipeline
    doc_texts = [doc.page_content for doc in chunked_docs]

    # --- Fix 3: Correctly configure BM25Retriever ---
    bm25_retriever = BM25Retriever.from_texts(
        texts=doc_texts,
        metadatas=[doc.metadata for doc in chunked_docs]
    )
    bm25_retriever.k = 5 # Set k value here

    chroma_retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

    ensemble_retriever = EnsembleRetriever(
        retrievers=[bm25_retriever, chroma_retriever],
        weights=[0.5, 0.5]
    )

    # Build the Hybrid Search RAG Pipeline
    hybrid_rag_pipeline = RetrievalQA.from_chain_type(
        llm=generation_llm, # Now uses the defined llm
        chain_type="stuff",
        retriever=ensemble_retriever,
        return_source_documents=True
    )

    return hybrid_rag_pipeline

# We load the pipeline once, and Streamlit caches it
try:
    rag_pipeline = load_rag_pipeline()
except Exception as e:
    st.error(f"Failed to load RAG pipeline. Please ensure all placeholders are filled correctly.")
    st.error(f"Error: {e}")
    st.stop()



# --- Chat History Management ---

# Initialize chat history
if "messages" not in st.session_state:
    st.session_state.messages = [
        {"role": "assistant", "content": "Hello! How can I help you with customer inquiries today?"}
    ]

# Display the existing chat messages
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])
        # If the message is from the assistant and has sources, display them
        if message["role"] == "assistant" and "sources" in message:
            with st.expander("Show Sources"):
                for source_doc in message["sources"]:
                    source_name = source_doc.metadata.get('source', 'Unknown Source')
                    st.write(f"**Source:** {source_name}")
                    st.markdown(source_doc.page_content)
                    st.divider()


# --- Main Chat Input and Response Logic ---

if prompt := st.chat_input("What is the customer's question?"):
    # Add user message to chat history and display it
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    # Get a response from the RAG pipeline
    with st.chat_message("assistant"):
        with st.spinner("Thinking..."):
            # Invoke the pipeline
            rag_response = rag_pipeline.invoke({"query": prompt})
            response = rag_response['result']
            sources = rag_response['source_documents']

            # Display the main response
            st.markdown(response)

            # Display the sources in a collapsible expander
            with st.expander("Show Sources"):
                for source_doc in sources:
                    # Extract the source filename from metadata
                    source_name = source_doc.metadata.get('source', 'Unknown Source')
                    st.write(f"**Source:** {source_name}")
                    st.markdown(source_doc.page_content)
                    st.divider()

    # Add the complete assistant response (with sources) to chat history
    st.session_state.messages.append({
        "role": "assistant",
        "content": response,
        "sources": sources
    })


Overwriting app.py


### **4. Install localtunnel**

In [None]:
!npm install localtunnel

### **5. Run streamlit in background**

Go to the displayed URL and enter the displayed password

In [27]:
!streamlit run /content/app.py &>/content/logs.txt & npx localtunnel --port 8501 & echo "tunnel password" & curl ipv4.icanhazip.com

tunnel password
34.75.106.75
[1G[0K⠙[1G[0Kyour url is: https://great-webs-admire.loca.lt
