<div align="center">
<img src="https://poorit.in/image.png" alt="Poorit" width="40" style="vertical-align: middle;"> <b>AI SYSTEMS ENGINEERING 1</b>

## Unit 4: Introduction to RAG

**CV Raman Global University, Bhubaneswar**  
*AI Center of Excellence*

---

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Poorit-Technologies/cvraman-coe/blob/main/courses-contents/ai-systems-engineering-1/unit-4/01-ai-systems-engineering-1-unit4-introduction-to-rag.ipynb)

</div>

---

### What You'll Learn

In this notebook, you will:

1. **Understand why RAG exists** - see LLMs fail without context
2. **Learn what RAG is** - Retrieval Augmented Generation explained
3. **Understand LLM parameters** - temperature, max_tokens, and more
4. **Build a simple knowledge base** from documents
5. **Implement basic retrieval** using keyword matching
6. **Create a Q&A chatbot** with context injection
7. **See the RAG pipeline in action** - step-by-step visibility

**Duration:** ~2 hours

---

## 1. Environment Setup

In [None]:
# Install required packages
!pip install -q litellm gradio

In [None]:
import os
from getpass import getpass
from litellm import completion
import gradio as gr

In [None]:
# Configure API
api_key = getpass("Enter your OpenAI API Key: ")
os.environ['OPENAI_API_KEY'] = api_key

MODEL = "gpt-4o-mini"

---

## 2. The Problem — LLMs Don't Know Everything

Before we build a RAG system, let's understand **why** we need one.

LLMs are trained mostly on publicly available data collected up to a **training cutoff** date. They **don't know** about:

| What LLMs Don't Know | Example |
|---|---|
| **Your company's internal data** | Employee details, policies, pricing |
| **Recent events** | News after the training cutoff |
| **Private documents** | Internal reports, meeting notes |
| **Domain-specific knowledge** | Niche industry data, local context |

Let's see this in action. We'll ask GPT about our fictional company **TechSolutions India** — which doesn't exist in its training data.

In [None]:
# Ask about something the LLM doesn't know — no context provided
response = completion(
    model=MODEL,
    messages=[
        {"role": "user", "content": "Who is the CEO of TechSolutions India and what is the company's annual revenue?"}
    ],
    temperature=0
)

print("Question: Who is the CEO of TechSolutions India and what is the company's annual revenue?")
print(f"\nLLM Response (no context):\n{response.choices[0].message.content}")

**Observation:** The LLM either made something up (hallucination) or admitted it doesn't know. Either way, it **can't answer accurately** because TechSolutions India doesn't exist in its training data.

Now let's try the same question, but this time we'll **manually inject the relevant information** into the prompt:

In [None]:
# Same question, but now we provide context
context = """
TechSolutions India is a software development company founded in 2018.
Headquarters: Bhubaneswar, Odisha
Employees: 250+
Annual Revenue: ₹50 crores (2024)
CEO: Priya Sharma
"""

response = completion(
    model=MODEL,
    messages=[
        {"role": "system", "content": f"Use the following context to answer the question:\n\n{context}"},
        {"role": "user", "content": "Who is the CEO of TechSolutions India and what is the company's annual revenue?"}
    ],
    temperature=0
)

print("Question: Who is the CEO of TechSolutions India and what is the company's annual revenue?")
print(f"\nLLM Response (with context):\n{response.choices[0].message.content}")

### The Key Insight

When we **gave the LLM the right information**, it answered correctly, using the facts we supplied. That's the core idea behind RAG:

> **Don't expect the LLM to know everything. Find the right information and give it to the LLM. That's RAG.**

The question is: how do we **automatically** find the right information for any question? That's what we'll build in this notebook.

---

## 3. What is RAG?

**Retrieval Augmented Generation (RAG)** is a technique that gives LLMs access to external knowledge by finding relevant information and injecting it into the prompt.

### The Open-Book Exam Analogy

Think of it like the difference between a closed-book and open-book exam:

| | Closed-Book (LLM Alone) | Open-Book (LLM + RAG) |
|---|---|---|
| **Knowledge source** | Only what's memorized (training data) | Can look up reference materials |
| **Accuracy** | May guess or hallucinate | Answers grounded in real documents |
| **Current info** | Frozen at training cutoff | Can access up-to-date information |
| **Domain knowledge** | General knowledge only | Can use specialized documents |

### The New Employee Analogy

Imagine a smart new employee on their first day:

1. Someone asks them a question about company policy
2. They **search** the company wiki for relevant pages
3. They **read** the relevant section
4. They **answer** using what they found

That's exactly what RAG does — but with an LLM instead of an employee!

### The RAG Pipeline

```
┌──────────────┐     ┌──────────────┐     ┌──────────────────┐     ┌──────────────┐
│   QUESTION   │────>│  RETRIEVAL   │────>│   AUGMENTATION   │────>│  GENERATION  │
│              │     │              │     │                  │     │              │
│ "Who is the  │     │ Search the   │     │ Add retrieved    │     │ LLM generates│
│  CEO?"       │     │ knowledge    │     │ docs to the      │     │ answer using │
│              │     │ base         │     │ prompt as context│     │ the context  │
└──────────────┘     └──────────────┘     └──────────────────┘     └──────────────┘
```
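The diagram maps naturally onto three small functions. The sketch below is a toy, offline version: the function names (`retrieve`, `augment`, `generate`, `rag`) are hypothetical, retrieval is a naive key lookup, and `echo_llm` is a stand-in that simply returns the prompt it receives, so the code runs without an API key. In this notebook, the generation step would call `completion()` instead.

```python
from typing import Callable, Dict, List


def retrieve(question: str, knowledge_base: Dict[str, str]) -> List[str]:
    """RETRIEVAL: return documents whose topic key appears as a word in the question."""
    words = question.lower().replace("?", " ").split()
    return [text for key, text in knowledge_base.items() if key in words]


def augment(question: str, docs: List[str]) -> List[dict]:
    """AUGMENTATION: put the retrieved documents into the prompt as context."""
    context = "\n\n".join(docs) if docs else "No context found."
    return [
        {"role": "system", "content": f"Answer using this context:\n\n{context}"},
        {"role": "user", "content": question},
    ]


def generate(messages: List[dict], llm: Callable[[List[dict]], str]) -> str:
    """GENERATION: hand the augmented messages to an LLM callable."""
    return llm(messages)


def rag(question: str, knowledge_base: Dict[str, str], llm) -> str:
    docs = retrieve(question, knowledge_base)
    messages = augment(question, docs)
    return generate(messages, llm)


# Hypothetical stand-in "LLM" for offline testing: it echoes the system prompt
def echo_llm(messages: List[dict]) -> str:
    return messages[0]["content"]


kb = {"ceo": "The CEO of TechSolutions India is Priya Sharma."}
print(rag("Who is the CEO?", kb, echo_llm))
```

To use a real model, you could pass something like `lambda msgs: completion(model=MODEL, messages=msgs, temperature=0).choices[0].message.content` as `llm`.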

### Why RAG?

| Problem | RAG Solution |
|---------|-------------|
| LLM knowledge cutoff | Provide up-to-date information |
| Hallucinations | Ground answers in real documents |
| Domain expertise | Add company-specific knowledge |
| Cost | Usually cheaper than fine-tuning: update the documents instead of retraining the model |
| Transparency | Can cite sources for answers |

---

## 4. Understanding LLM Parameters

Throughout this course, we've been passing parameters like `temperature=0` to our LLM calls. Let's understand what these mean.

### Temperature — Controlling Randomness

Temperature controls how "creative" vs "deterministic" the LLM's responses are:

```
Temperature Scale:

 0.0          0.7          1.0          1.5          2.0
  |------------|------------|------------|------------|
Focused    Balanced     Creative       Wild        Chaotic
(factual)             (API default)             (nonsensical)
```

| Temperature | Behavior | Best For |
|---|---|---|
| **0.0** | Almost always picks the most likely next token, so the same input usually gives the same output (not strictly guaranteed) | Factual Q&A, RAG, data extraction |
| **0.3 - 0.5** | Mostly deterministic with slight variation | Summarization, translation |
| **0.7** | Balanced creativity; a common choice for chat | General conversation, writing |
| **1.0** | More diverse and creative outputs (the default in the OpenAI API) | Brainstorming, creative writing |
| **> 1.0** | Increasingly random and unpredictable (OpenAI allows values up to 2.0) | Rarely useful in practice |

Let's see the difference:

In [None]:
# Compare temperature effects — ask the same question multiple times
question = "Suggest a one-sentence tagline for a tech company in Bhubaneswar."

for temp in [0, 0.7, 1.0]:
    print(f"{'='*60}")
    print(f"Temperature = {temp}")
    print(f"{'='*60}")
    
    for attempt in range(1, 3):
        response = completion(
            model=MODEL,
            messages=[
                {"role": "system", "content": "You are a marketing assistant. Reply with only the tagline, nothing else."},
                {"role": "user", "content": question}
            ],
            temperature=temp
        )
        print(f"  Attempt {attempt}: {response.choices[0].message.content}")
    
    print()

**Notice** (your exact outputs will differ):
- At **temperature=0**, both attempts usually produce the **same** tagline
- At **temperature=0.7**, you typically see **slight variations**
- At **temperature=1.0**, the taglines are often **noticeably different** each time
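Why do the outputs vary? At each step the model assigns a score (a logit) to every candidate next token; temperature divides those scores before softmax turns them into probabilities. The sketch below is a simplified illustration with made-up tokens and scores, not something the API exposes, and it treats temperature 0 as "always pick the top token".

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores (logits) into probabilities, scaled by temperature."""
    if temperature <= 0:
        # Temperature 0 is treated as greedy decoding: all probability on the top score
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [score / temperature for score in logits]
    largest = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up candidate next tokens and scores, for illustration only
tokens = ["innovation", "future", "technology", "banana"]
logits = [2.0, 1.5, 1.0, -1.0]

for t in [0, 0.7, 1.0, 2.0]:
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: " + ", ".join(f"{tok}={p:.2f}" for tok, p in zip(tokens, probs)))
```

Low temperatures concentrate probability on the top token; higher temperatures flatten the distribution, so less likely tokens get picked more often.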

### Why RAG Uses Temperature = 0

For RAG systems, we want **consistent, factual answers** — not creative ones. When we have the right context, we want the LLM to faithfully report what's in the documents, not improvise. That's why we set `temperature=0`.

### Other Useful Parameters

| Parameter | What It Does | Common Values |
|---|---|---|
| **`max_tokens`** | Upper limit on the number of tokens the model generates; longer answers are cut off | 100 - 4000 |
| **`top_p`** | Alternative to temperature (nucleus sampling): sample only from the most likely tokens whose probabilities add up to `top_p` | 0.0 - 1.0 |
| **`stream`** | Return the response in chunks as it is generated, instead of all at once | `True` / `False` |

> **Tip:** OpenAI recommends adjusting either `temperature` or `top_p`, not both.
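`top_p` works on the same probability distribution from a different angle: the candidate tokens are sorted by probability, the smallest set whose probabilities add up to at least `top_p` is kept, and the next token is sampled only from that set. A minimal sketch, using made-up probabilities and hypothetical helpers `nucleus_filter` and `nucleus_sample` (not part of any library):

```python
import random


def nucleus_filter(token_probs, top_p):
    """Keep the most likely tokens until their cumulative probability reaches top_p."""
    ranked = sorted(token_probs.items(), key=lambda item: item[1], reverse=True)
    kept = {}
    cumulative = 0.0
    for token, prob in ranked:
        kept[token] = prob
        cumulative += prob
        if cumulative >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1
    total = sum(kept.values())
    return {token: prob / total for token, prob in kept.items()}


def nucleus_sample(token_probs, top_p, rng=random):
    """Sample one token from the nucleus (top_p) set."""
    candidates = nucleus_filter(token_probs, top_p)
    return rng.choices(list(candidates), weights=list(candidates.values()))[0]


# Made-up next-token probabilities, for illustration only
probs = {"innovation": 0.5, "future": 0.3, "technology": 0.15, "banana": 0.05}

print(nucleus_filter(probs, 0.4))   # only "innovation" survives
print(nucleus_filter(probs, 0.7))   # "innovation" and "future"
print(nucleus_sample(probs, 0.7))   # one of those two, chosen at random
```

A small `top_p` behaves much like a low temperature: unlikely tokens such as "banana" are never considered.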

---

## 5. Create a Sample Knowledge Base

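Let's create a knowledge base for a fictional company. It is a Python dictionary: each key is a one-word topic name and each value is that topic's document. The keys matter, because the retriever in Section 6 matches question words against them.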

In [None]:
# Sample knowledge base for TechSolutions India
knowledge_base = {
    "company": """
TechSolutions India is a software development company founded in 2018.
Headquarters: Bhubaneswar, Odisha
Employees: 250+
Specialization: AI/ML solutions, Cloud services, Mobile apps
Annual Revenue: ₹50 crores (2024)
CEO: Priya Sharma
""",
    
    "priya": """
Priya Sharma - CEO and Co-founder
Education: IIT Delhi (B.Tech), Stanford (MBA)
Experience: 15 years in tech industry
Previous: Senior Director at Infosys
Awards: Forbes 30 Under 30 (2015), Women in Tech Leader (2022)
Email: priya@techsolutions.in
""",
    
    "rahul": """
Rahul Verma - CTO and Co-founder
Education: BITS Pilani (B.Tech), IIM Bangalore (MBA)
Experience: 12 years in software development
Previous: Tech Lead at Google India
Expertise: Machine Learning, Cloud Architecture
Email: rahul@techsolutions.in
""",
    
    "products": """
TechSolutions Products:

1. CloudAssist Pro - Enterprise cloud management platform
   - Price: ₹50,000/month
   - Features: Auto-scaling, monitoring, cost optimization

2. SmartHR - AI-powered HR management system
   - Price: ₹25,000/month
   - Features: Recruitment, payroll, performance tracking

3. DataViz Analytics - Business intelligence dashboard
   - Price: ₹15,000/month
   - Features: Real-time analytics, custom reports
""",
    
    "policies": """
Company Policies:

Work Hours: 9 AM - 6 PM, Monday to Friday
Leave Policy: 24 paid leaves + 10 sick leaves per year
Remote Work: Hybrid model - 3 days office, 2 days remote
Probation: 6 months for all new employees
Notice Period: 2 months for permanent employees
""",
    
    "benefits": """
Employee Benefits:

- Health Insurance: ₹5 lakh coverage for employee + family
- Performance Bonus: Up to 20% of annual salary
- Learning Budget: ₹50,000/year for courses and certifications
- Gym Membership: Fully covered
- Team Outings: Quarterly team events
"""
}

print(f"Knowledge base has {len(knowledge_base)} documents")
print(f"Topics: {list(knowledge_base.keys())}")
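
In this notebook the documents are typed directly into a dictionary; in practice they usually live in files. A minimal sketch, assuming one plain-text file per topic whose file name becomes the key. `load_knowledge_base` is a hypothetical helper, not part of any library. Because our retriever matches single words, each file name should be a single word such as `careers.txt`.

```python
import tempfile
from pathlib import Path


def load_knowledge_base(folder):
    """Build a {topic: text} dictionary from the .txt files in a folder."""
    kb = {}
    for path in sorted(Path(folder).glob("*.txt")):
        # The file name without ".txt", lowercased, becomes the topic key
        kb[path.stem.lower()] = path.read_text(encoding="utf-8")
    return kb


# Usage: write two placeholder files into a temporary folder, then load them
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "Careers.txt").write_text("Placeholder text about open roles.", encoding="utf-8")
    Path(tmp, "clients.txt").write_text("Placeholder text about clients.", encoding="utf-8")
    loaded = load_knowledge_base(tmp)

print(sorted(loaded))  # ['careers', 'clients']
```

You could then merge the loaded documents into the existing dictionary with `knowledge_base.update(loaded)`.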

---

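## 6. Simple Keyword-Based Retrieval

Our first retriever is deliberately simple: it strips punctuation from the question, splits it into lowercase words, and returns the document for every word that matches a knowledge base key.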

In [None]:
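def get_relevant_context(message):
    """Find relevant documents by matching words in the message to knowledge base keys."""
    # Keep only letters and spaces, so "Priya?" becomes "Priya"
    text = ''.join(ch for ch in message if ch.isalpha() or ch.isspace())
    words = text.lower().split()
    
    # A document is relevant if its key appears as a word in the message
    relevant = []
    for word in words:
        # Skip documents already added (e.g. when a key word appears twice)
        if word in knowledge_base and knowledge_base[word] not in relevant:
            relevant.append(knowledge_base[word])
    
    return relevant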

In [None]:
# Test retrieval
question = "Who is Priya?"
context = get_relevant_context(question)

print(f"Question: {question}")
print(f"Found {len(context)} relevant documents")
if context:
    print(f"\nContext:\n{context[0]}")

In [None]:
# Test with multiple keywords
question = "Tell me about the company and its products"
context = get_relevant_context(question)

print(f"Question: {question}")
print(f"Found {len(context)} relevant documents")

---

## 7. Build the RAG System

Now we combine retrieval with LLM generation. Note the use of `temperature=0`: as we learned in Section 4, it keeps answers consistent and focused on the provided context.

In [None]:
SYSTEM_PROMPT = """
You are a helpful assistant for TechSolutions India.
You answer questions about the company, its employees, products, and policies.
Use the provided context to answer questions accurately.
If you don't know the answer or it's not in the context, say so.
Keep answers concise and professional.

Context:
{context}
"""

def format_context(relevant_docs):
    """Format retrieved documents as context."""
    if not relevant_docs:
        return "No specific context available for this question."
    return "\n\n---\n\n".join(relevant_docs)
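The "Why RAG?" table lists transparency (citing sources) as a benefit, but `format_context()` drops the document names, so the LLM cannot say where a fact came from. One possible variant is sketched below as a hypothetical `format_context_with_sources` helper that labels each document with its knowledge-base key. It assumes the retriever returns a `{key: text}` dictionary instead of a plain list, which `get_relevant_context()` does not currently do.

```python
def format_context_with_sources(docs_by_key):
    """Join retrieved documents into one context string, labelling each with its source key."""
    if not docs_by_key:
        return "No specific context available for this question."
    sections = [f"[Source: {key}]\n{text.strip()}" for key, text in docs_by_key.items()]
    return "\n\n---\n\n".join(sections)


# Usage with two short sample documents
sample = {
    "priya": "\nPriya Sharma - CEO and Co-founder\n",
    "company": "\nCEO: Priya Sharma\n",
}
print(format_context_with_sources(sample))
```

A system prompt could then ask the model to name the `[Source: ...]` label behind each fact it uses; whether it does so reliably is something to check, not assume.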

In [None]:
def answer_question(question, history=None):
    """Answer a question using RAG."""
    history = history or []  # avoid a shared mutable default argument
    # Step 1: Retrieve relevant context
    relevant_docs = get_relevant_context(question)
    context = format_context(relevant_docs)
    
    # Step 2: Create prompt with context
    system_message = SYSTEM_PROMPT.format(context=context)
    
    # Step 3: Generate answer
    messages = [
        {"role": "system", "content": system_message}
    ] + history + [
        {"role": "user", "content": question}
    ]
    
    response = completion(
        model=MODEL,
        messages=messages,
        temperature=0
    )
    
    return response.choices[0].message.content

In [None]:
# Test the RAG system
questions = [
    "Who is the CEO of the company?",
    "What products does TechSolutions offer?",
    "What are the leave policies?",  # "policy" alone would not match the "policies" key
    "Tell me about Rahul"
]

for q in questions:
    print(f"Q: {q}")
    answer = answer_question(q)
    print(f"A: {answer}\n")

---

## 8. Build a Chat Interface

In [None]:
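def chat(message, history):
    """Chat function for Gradio."""
    # Keep only the role and content fields; Gradio may attach extra keys
    # (such as metadata) that the LLM API does not accept
    history = [{"role": h["role"], "content": h["content"]} for h in history]
    return answer_question(message, history)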

# Launch chat interface
demo = gr.ChatInterface(
    chat,
    title="TechSolutions Assistant",
    description="Ask questions about TechSolutions India - company, employees, products, and policies.",
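    # Note: with keyword retrieval, the first and last examples find little or no
    # relevant context (see Section 10), so expect the assistant to say so
    examples=[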
        "Who founded the company?",
        "What are the employee benefits?",
        "How much does CloudAssist Pro cost?"
    ],
    type="messages"
)

demo.launch()

---

## 9. See the RAG Pipeline in Action

Our `answer_question()` function works, but it hides what's happening inside. Let's create a **verbose version** that shows each step of the RAG pipeline as it runs.

In [None]:
def answer_question_verbose(question):
    """Answer a question using RAG, showing each pipeline stage."""
    
    print(f"{'='*60}")
    print("RAG PIPELINE — Step by Step")
    print(f"{'='*60}")
    
    # Step 1: Question
    print(f"\n[STEP 1] USER QUESTION")
    print(f'   "{question}"')
    
    # Step 2: Retrieval
    print(f"\n[STEP 2] RETRIEVAL")
    relevant_docs = get_relevant_context(question)
    print(f"   Found {len(relevant_docs)} relevant document(s)")
    if relevant_docs:
        for i, doc in enumerate(relevant_docs, 1):
            preview = doc.strip()[:100].replace('\n', ' ')
            print(f"   Doc {i}: {preview}...")
    else:
        print("   No matching documents found.")
    
    # Step 3: Augmentation
    print(f"\n[STEP 3] AUGMENTATION")
    context = format_context(relevant_docs)
    system_message = SYSTEM_PROMPT.format(context=context)
    messages = [
        {"role": "system", "content": system_message},
        {"role": "user", "content": question}
    ]
    print(f"   Context length: {len(context)} characters")
    print(f"   Total prompt length: {sum(len(m['content']) for m in messages)} characters")
    
    # Step 4: Generation
    print(f"\n[STEP 4] GENERATION")
    response = completion(
        model=MODEL,
        messages=messages,
        temperature=0
    )
    answer = response.choices[0].message.content
    print(f"   {answer}")
    
    print(f"\n{'='*60}")
    return answer


# Test with example questions
test_questions = [
    "What are the employee benefits?",
    "Who is Rahul?",
    "What is the dress code?"  # Not in knowledge base
]

for q in test_questions:
    answer_question_verbose(q)
    print()

---

## 10. Limitations of Simple RAG

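Our keyword-based retrieval only compares question words with the dictionary **keys**, never with the document text. That leads to several limitations:

| Limitation | Example |
|-----------|--------|
| **Exact key match only** | "CEO" won't retrieve the `priya` document, and "policy" won't match the `policies` key |
| **No semantic understanding** | "founder" won't find the `priya` or `rahul` documents, even though both say "Co-founder" |
| **No ranking** | Every match counts equally, so there is no way to put the most relevant document first |
| **Limited scalability** | Every topic needs a hand-picked one-word key, which doesn't scale to large knowledge bases |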

In [None]:
# Demonstrate limitation
question = "Who founded the company?"  # Uses 'founded' not 'priya'
context = get_relevant_context(question)

print(f"Question: {question}")
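print(f"Found {len(context)} documents")  # 1 document: 'company' (matched by the word "company")

# The 'company' document has the founding year but not the founders;
# the founders are named in the 'priya' and 'rahul' documents, which are never retrieved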
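
One small step beyond matching keys is to search the document **text** and rank documents by how many question words they share. The sketch below is a hypothetical `rank_by_overlap` helper: it is still pure keyword matching (no embeddings), it runs on shortened copies of three documents, and its stop-word list is a small hand-picked assumption.

```python
import re

# Small hand-picked stop-word list (an assumption, not a standard list)
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "who", "what",
              "to", "in", "me", "about", "tell", "does", "how"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words, dropping stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def rank_by_overlap(question, docs, top_k=2):
    """Score each document by how many distinct question words appear in its text."""
    q_words = set(tokenize(question))
    scored = []
    for key, text in docs.items():
        score = len(q_words & set(tokenize(text)))
        if score > 0:
            scored.append((score, key))
    # Highest score first; ties broken alphabetically by key
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [key for score, key in scored[:top_k]]


docs = {
    "company": "TechSolutions India is a software development company founded in 2018. CEO: Priya Sharma",
    "priya": "Priya Sharma - CEO and Co-founder",
    "policies": "Leave Policy: 24 paid leaves + 10 sick leaves per year",
}

print(rank_by_overlap("Who is the CEO?", docs))            # ['company', 'priya']
print(rank_by_overlap("What is the leave policy?", docs))  # ['policies']
print(rank_by_overlap("Who founded the company?", docs))   # ['company']
```

Searching the text fixes the "policy" vs "policies" key problem and gives a ranking, but "founded" still doesn't match "Co-founder": matching by meaning needs the semantic search covered in the next notebook.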

---

## 11. Exercise: Expand the Knowledge Base

In [None]:
# Exercise: Add more documents to the knowledge base
# 1. Add a document about "careers" or "jobs"
# 2. Add a document about "clients" or "customers"
# 3. Test with new questions

# Your implementation here
# knowledge_base["careers"] = """..."""
pass

---

## 12. Key Takeaways

1. **LLMs don't know everything** — they can hallucinate or refuse when asked about data outside their training

2. **RAG = Retrieval + Augmentation + Generation** — find relevant documents, inject them as context, then generate an answer

3. **Temperature controls randomness** — RAG uses `temperature=0` for consistent, factual responses

4. **Context is key** — the quality of retrieved documents directly affects answer quality

5. **Simple retrieval works** — keyword matching is a good starting point, but has limitations

6. **System prompts matter** — they instruct the LLM how to use the provided context

### RAG Stage to Code Mapping

| RAG Stage | Our Function | What It Does |
|---|---|---|
| **Retrieval** | `get_relevant_context()` | Matches question words against knowledge base keys |
| **Augmentation** | `format_context()` + `SYSTEM_PROMPT` | Formats docs and injects into prompt |
| **Generation** | `completion()` | LLM generates answer from context |

### What's Next?

In the next notebook, we'll improve our retrieval system using:
- **Vector embeddings** — represent text as numbers to capture meaning
- **Semantic search** — find documents by meaning, not just keywords
- **ChromaDB** — a vector database purpose-built for RAG

---

## Additional Resources

- [RAG Paper](https://arxiv.org/abs/2005.11401) — the original research paper
- [LangChain RAG Tutorial](https://python.langchain.com/docs/tutorials/rag/)
- [LiteLLM Documentation](https://docs.litellm.ai/) — unified API for 100+ LLM providers
- [OpenAI API Parameters](https://platform.openai.com/docs/api-reference/chat/create) — full list of parameters
- [IBM: What is RAG? (video)](https://www.ibm.com/think/videos/rag) — short visual explainer
- [RAG Playground (interactive)](https://ragplay.vercel.app/) — see chunking, embeddings, and retrieval live
- [RAG Pipeline Diagrams](https://www.designveloper.com/blog/rag-pipeline-diagram/) — step-by-step visual guide
- [DeepLearning.AI RAG Course](https://learn.deeplearning.ai/courses/retrieval-augmented-generation/) — free short course

---

**Course Information:**
- **Institution:** CV Raman Global University, Bhubaneswar
- **Program:** AI Center of Excellence
- **Course:** AI Systems Engineering 1
- **Developed by:** [Poorit Technologies](https://poorit.in) - *Transform Graduates into Industry-Ready Professionals*

---