# Day 9: RAG Systems - Document Intelligence Mastery

## From Simple File Reading to Advanced Document Q&A

Yesterday you learned the Assistant's `files` parameter for automatic RAG. Today you'll understand **how RAG actually works** and when to use advanced techniques!

### What is RAG?

**RAG (Retrieval-Augmented Generation)** solves a critical LLM problem:

**Problem**: LLMs have finite context windows (32K tokens is only about 50 pages of text), and stuffing whole documents into every prompt is slow and expensive

**Solution**: Retrieve only the relevant parts!

Think of it like this:
- 📚 **Without RAG**: Give LLM a 1000-page manual → exceeds context limit
- 🎯 **With RAG**: Find the 5 relevant pages → fits perfectly!
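The page counts above come from simple arithmetic. Here is a minimal sketch, assuming the common rules of thumb of about 0.75 English words per token and about 500 words per page (both are rough assumptions; real ratios vary by tokenizer and layout):

```python
# Rough rules of thumb (assumptions, not exact figures)
WORDS_PER_TOKEN = 0.75
WORDS_PER_PAGE = 500

def pages_for_tokens(tokens: int) -> float:
    """Approximate how many pages of text fit in a given token budget."""
    return tokens * WORDS_PER_TOKEN / WORDS_PER_PAGE

def tokens_for_pages(pages: int) -> int:
    """Approximate how many tokens a given number of pages needs."""
    return round(pages * WORDS_PER_PAGE / WORDS_PER_TOKEN)

print(pages_for_tokens(32_000))  # 48.0 -> the "~50 pages" figure
print(tokens_for_pages(1_000))   # 666667 -> far beyond a 32K window
print(tokens_for_pages(5))       # 3333 -> 5 retrieved pages fit easily
```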

### Today's Journey:
1. **RAG workflow explained** - The 7 steps
2. **Assistant with files review** - Simple RAG
3. **Chunking strategies** - How to split documents
4. **ParallelDocQA agent** - For very long documents
5. **Performance optimization** - Making RAG fast
6. **Real-world examples** - Research papers, manuals

Let's master document intelligence! 📚

---
## Part 1: Setup

Same Fireworks API configuration.

```python
import os

# Never hard-code a real API key in a notebook; set FIREWORKS_API_KEY in your
# environment, or replace the placeholder below locally without committing it.
os.environ.setdefault('FIREWORKS_API_KEY', 'YOUR_FIREWORKS_API_KEY')

llm_cfg = {
    'model': 'accounts/fireworks/models/qwen3-235b-a22b-thinking-2507',
    'model_server': 'https://api.fireworks.ai/inference/v1',
    'api_key': os.environ['FIREWORKS_API_KEY'],
    'generate_cfg': {'max_tokens': 32768, 'temperature': 0.6}
}

print('✅ Fireworks API configured')
```

---
## Part 2: The RAG Workflow - 7 Steps Explained

### How RAG Works Under the Hood

When you do this:
```python
Assistant(llm=llm_cfg, files=['document.pdf'])
```

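Here's the general pipeline that runs automatically. (The exact search method depends on the library's configuration: it may be keyword-based, embedding-based, or a hybrid.)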

#### Step 1: **Document Ingestion**
- Read the file (PDF, DOCX, TXT, etc.)
- Extract all text content
- Preserve structure (headings, paragraphs)
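To make "preserve structure" concrete, here is a minimal sketch for plain text only. It assumes headings look like `Chapter N: ...`, as in the sample guide used later; `split_sections` is a hypothetical helper, and real PDF or DOCX ingestion needs a dedicated parser.

```python
import re

# Assumption: section headings look like "Chapter 1: ..." (as in the sample guide)
HEADING = re.compile(r"^Chapter \d+:")

def split_sections(text: str) -> list[tuple[str, str]]:
    """Split plain text into (heading, body) pairs so chunks can keep their section."""
    sections: list[tuple[str, str]] = []
    heading, body = "Preamble", []
    for line in text.splitlines():
        if HEADING.match(line):
            # Close the previous section (skip an empty preamble)
            if heading != "Preamble" or "".join(body).strip():
                sections.append((heading, "\n".join(body).strip()))
            heading, body = line.strip(), []
        else:
            body.append(line)
    sections.append((heading, "\n".join(body).strip()))
    return sections

sample = "ML Guide\n\nChapter 1: Data\nClean data first.\n\nChapter 2: Models\nStart simple."
for heading, body in split_sections(sample):
    print(f"{heading!r}: {body!r}")
```

Keeping the heading next to each body lets a later step attach section titles to chunks, which often helps retrieval.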

#### Step 2: **Chunking**
- Split document into smaller pieces (chunks)
- Typical size: 500-1000 tokens per chunk
- Overlap between chunks: 50-100 tokens
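- **Why?** Small, focused chunks make retrieval precise, and several retrieved chunks plus the query must still fit in the context window; the overlap keeps a sentence that straddles a boundary intact in at least one chunk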
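A sliding window with overlap can be sketched as follows. This is a minimal illustration, assuming whitespace-separated words as a stand-in for tokens; a real pipeline would count tokens with the model's tokenizer.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks of roughly chunk_size words.

    Words stand in for tokens here (an assumption for illustration).
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # each new chunk starts `step` words later
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):  # last window reached the end
            break
    return chunks

doc = " ".join(f"w{i}" for i in range(1200))
chunks = chunk_text(doc, chunk_size=500, overlap=50)
print(len(chunks))           # 3
print(chunks[1].split()[0])  # w450
```

The last 50 words of each chunk reappear at the start of the next one, which is exactly the overlap described above.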

#### Step 3: **Embedding**
- Convert each chunk to a vector (list of numbers)
- Vectors capture semantic meaning
- Similar content → similar vectors
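The idea of "similar content → similar vectors" can be shown with a toy example. This sketch swaps in a bag-of-words count vector for a real embedding model, so it only captures word overlap, not meaning (it would miss synonyms); it illustrates the mechanics of comparing vectors with cosine similarity.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse vectors (1.0 = same direction)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

q = embed("How do I prevent overfitting?")
a = embed("Use early stopping to prevent overfitting.")
b = embed("K-Means is a clustering algorithm.")
print(cosine_similarity(q, a) > cosine_similarity(q, b))  # True
```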

#### Step 4: **Vector Storage**
- Store all chunk vectors in a database
- Enable fast similarity search
- Qwen-Agent handles this for you, so there is no separate database to set up

#### Step 5: **Query Processing**
When user asks a question:
- Convert query to vector
- Search for most similar chunk vectors
- Retrieve top-k chunks (e.g., top 5)

#### Step 6: **Context Augmentation**
- Take retrieved chunks
- Add to LLM context with user query
- Format: `Context: [chunks]\n\nQuestion: [query]`

#### Step 7: **Generation**
- LLM generates answer using context
- Answer is grounded in document
- Grounding reduces, but does not eliminate, hallucination about document content

**The magic**: All 7 steps happen automatically with Assistant's `files` parameter!

---
## Part 3: Assistant with RAG Example

Let's see RAG in action with a comprehensive example.

```python
# Create a detailed technical document
tech_doc = """Machine Learning Best Practices Guide

Chapter 1: Data Preparation
Data quality is crucial for ML success. Clean data by:
- Removing duplicates
- Handling missing values (imputation or removal)
- Normalizing features to 0-1 range
- Encoding categorical variables

Rule of thumb: Spend 70% of time on data preparation.

Chapter 2: Model Selection
Choose models based on problem type:
- Classification: Random Forest, XGBoost, Neural Networks
- Regression: Linear Regression, Gradient Boosting
- Clustering: K-Means, DBSCAN

Start simple (logistic regression) before trying deep learning.

Chapter 3: Training
Best practices:
- Use train/validation/test split (60/20/20)
- Cross-validation for small datasets
- Early stopping to prevent overfitting
- Learning rate: Start at 0.001

Monitor validation loss, not just training loss.

Chapter 4: Evaluation
Metrics by task:
- Classification: Accuracy, F1-score, AUC-ROC
- Regression: MSE, MAE, R²
- Never use training accuracy alone!
"""

# Save to file (explicit UTF-8 because of the "²" character)
with open('ml_guide.txt', 'w', encoding='utf-8') as f:
    f.write(tech_doc)

# Report the real size instead of a hard-coded estimate
print(f"✅ Created ML guide ({len(tech_doc.split())} words)\n")
```

```python
from qwen_agent.agents import Assistant

# Create RAG-enabled assistant
ml_assistant = Assistant(
    llm=llm_cfg,
    name='ML Expert',
    system_message='You are an ML expert. Answer questions based on the ML guide document.',
    files=[os.path.abspath('ml_guide.txt')]
)

print("✅ Created ML Expert with RAG\n")
```

```python
# Test with various questions
questions = [
    "What percentage of time should I spend on data preparation?",
    "What models are good for classification?",
    "What's the recommended train/validation/test split?"
]

for question in questions:
    print(f"\n{'='*70}")
    print(f"Q: {question}")
    print(f"{'='*70}\n")

    messages = [{'role': 'user', 'content': question}]

    # run() streams: each step yields the response so far, so breaking on the
    # first non-empty item would print a partial answer. Consume the whole
    # stream and keep the final snapshot instead.
    response = []
    for response in ml_assistant.run(messages):
        pass
    if response:
        print(f"A: {response[-1].get('content', '')}\n")
```

### What Just Happened?

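Notice:
1. ✅ The answers should use details taken from the document (the 70% rule, the 60/20/20 split)
2. ✅ The assistant retrieves relevant passages instead of relying on memory; for a document this short the retrieved context may well be the whole file, and selective retrieval pays off with longer sources like the paper in Part 4
3. ✅ Answers are grounded in the source material
4. ✅ Hallucination about the document's content is reduced, though not ruled out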

**This is RAG in action!**

---
## Part 4: Real Example from Official Docs

From `assistant_rag.py` - using a real research paper:

```python
# Research paper assistant (from official assistant_rag.py)
research_assistant = Assistant(
    llm=llm_cfg,
    name='Research Assistant',
    system_message='You help researchers understand academic papers. Be technical but clear.',
    files=['https://arxiv.org/pdf/1706.03762.pdf']  # "Attention Is All You Need"
)

print("✅ Research Assistant created")
print("📄 Loaded: Transformer paper\n")

# Test: consume the full stream, then preview the final answer
messages = [{'role': 'user', 'content': 'What is the main contribution of this paper?'}]
response = []
for response in research_assistant.run(messages):
    pass
if response:
    print(response[-1].get('content', '')[:200] + '...\n')
```

---
## Part 5: File-in-Message Pattern

You can pass files in messages too:

```python
# From official example - file in message content
flexible_bot = Assistant(llm=llm_cfg)

messages = [{
    'role': 'user',
    'content': [
        {'text': 'What is the recommended data split ratio?'},
        {'file': os.path.abspath('ml_guide.txt')}
    ]
}]

# Consume the full stream so the printed answer is complete
response = []
for response in flexible_bot.run(messages):
    pass
if response:
    print(response[-1].get('content', ''))
```
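If you send files this way often, a small builder keeps the message shape consistent. This is a hypothetical helper, not part of Qwen-Agent: it only reproduces the `{'text': ...}` / `{'file': ...}` item layout used in the example above, turning local paths into absolute ones and passing URLs through unchanged.

```python
import os

def make_file_message(question: str, *paths: str) -> dict:
    """Hypothetical helper: build a user message mixing one text item and file items."""
    content = [{'text': question}]
    for p in paths:
        # Keep URLs as-is; resolve local paths the same way the example does
        is_url = p.startswith(('http://', 'https://'))
        content.append({'file': p if is_url else os.path.abspath(p)})
    return {'role': 'user', 'content': content}

msg = make_file_message('What is the main contribution?', 'https://arxiv.org/pdf/1706.03762.pdf')
print(msg['content'][1])  # {'file': 'https://arxiv.org/pdf/1706.03762.pdf'}
```

The result can be wrapped in a list and passed to `run()` exactly like the hand-written `messages` above.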

---
## Summary

- ✅ **RAG workflow (7 steps)**: Ingestion → Chunking → Embedding → Storage → Retrieval → Augmentation → Generation
- ✅ **Assistant with files**: automatic RAG for most cases
- ✅ **File patterns**: `files` parameter or file-in-message
- ✅ **Real examples**: research papers, technical docs
- ✅ **Runnable code**: patterns based on the official Qwen-Agent examples

**Tomorrow**: Multi-Agent Systems! 🤖🤖