# Day 9: RAG Systems

## Deep Dive into Document Intelligence

Today: Master RAG (Retrieval-Augmented Generation)!

### Topics:
1. RAG workflow explained
2. Chunking strategies
3. ParallelDocQA for long documents
4. Performance optimization
5. When to use Assistant vs ParallelDocQA

In [None]:
import os

# Read the key from the environment (set FIREWORKS_API_KEY before launching the notebook);
# never hard-code or commit a real API key.
llm_cfg = {
    'model': 'accounts/fireworks/models/qwen3-235b-a22b-thinking-2507',
    'model_server': 'https://api.fireworks.ai/inference/v1',
    'api_key': os.environ['FIREWORKS_API_KEY'],
    'generate_cfg': {'max_tokens': 32768},
}
print('✅ Configured')

## RAG Workflow

1. **Ingestion**: Read documents
2. **Chunking**: Split into manageable pieces
3. **Embedding**: Convert text to vectors
4. **Storage**: Save in vector database
5. **Retrieval**: Find relevant chunks for query
6. **Augmentation**: Add chunks to LLM context
7. **Generation**: LLM generates answer
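
To make the seven steps concrete, here is a minimal, dependency-free sketch of steps 1 to 6. It rests on loud assumptions: word windows stand in for token-based chunks, a bag-of-words `Counter` stands in for a real embedding model, and an in-memory list stands in for a vector database. The helper names (`chunk_text`, `embed`, `cosine`, `retrieve`) and the third sentence of the sample document are illustrative only and are not part of Qwen-Agent.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=8, overlap=2):
    """Split text into overlapping word windows (words stand in for tokens here)."""
    if not 0 <= overlap < chunk_size:
        raise ValueError('overlap must be non-negative and smaller than chunk_size')
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    # Stop once the previous window has already reached the end of the text.
    return [' '.join(words[i:i + chunk_size]) for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text):
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(re.findall(r'[a-z0-9]+', text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, store, top_k=1):
    """Return the top_k stored chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


# Steps 1-4: ingest, chunk, embed, store (a plain list plays the vector database).
document = ('RAG improves LLM accuracy by providing relevant context from documents. '
            'Chunk size: 500-1000 tokens optimal. '
            'Vector databases store embeddings for fast similarity search.')  # last sentence is illustrative
chunks = chunk_text(document, chunk_size=8, overlap=2)
store = [(c, embed(c)) for c in chunks]

# Step 5: retrieve. Step 6: augment the prompt. Step 7 would send `prompt` to the LLM.
question = 'What is optimal chunk size?'
top = retrieve(question, store, top_k=1)
context = '\n'.join(f'- {c}' for c in top)
prompt = f'Answer using only this context:\n{context}\n\nQuestion: {question}'
print(prompt)
```

The overlap between neighbouring windows is one simple chunking strategy: it reduces the chance that a fact is cut in half at a chunk boundary, at the cost of storing some text twice.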

In [None]:
from qwen_agent.agents import Assistant

with open('tech_doc.txt', 'w') as f:
    f.write('RAG improves LLM accuracy by providing relevant context from documents. Chunk size: 500-1000 tokens optimal.')

rag_bot = Assistant(llm=llm_cfg, files=[os.path.abspath('tech_doc.txt')])
messages = [{'role': 'user', 'content': 'What is optimal chunk size?'}]
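# run() streams progressively longer responses; keep the last one and print it once.
response = []
for response in rag_bot.run(messages):
    pass
if response:
    print(response[-1].get('content', ''))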

## ParallelDocQA

For very long documents (100+ pages):
```python
from qwen_agent.agents.doc_qa import ParallelDocQA
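# No 'model_server' is given, so this config is expected to go through DashScope
# and needs a DashScope API key; passing the llm_cfg defined earlier is an alternative.
bot = ParallelDocQA(llm={'model': 'qwen-plus-latest'})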
```

Advantages:
- Processes document chunks in parallel, then merges the partial results
- Better for multi-document queries
- Optimized retrieval
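
The parallel idea can be sketched as a small map-reduce over chunks. This is only an illustration of the general pattern under stated assumptions, not ParallelDocQA's actual implementation: `answer_from_chunk` is a hypothetical keyword-matching stand-in for a per-chunk LLM call, and `parallel_doc_qa` is a made-up helper name.

```python
from concurrent.futures import ThreadPoolExecutor


def normalize(word):
    """Lowercase a word and strip surrounding punctuation."""
    return word.strip('?.,:').lower()


def answer_from_chunk(chunk, question):
    """Hypothetical stand-in for an LLM call: return the chunk if it shares a keyword with the question."""
    keywords = {normalize(w) for w in question.split() if len(w) > 3}
    words = {normalize(w) for w in chunk.split()}
    return chunk if keywords & words else None


def parallel_doc_qa(chunks, question, max_workers=4):
    # Map: query every chunk concurrently; pool.map preserves the input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(lambda c: answer_from_chunk(c, question), chunks))
    # Reduce: drop chunks with nothing relevant and merge the rest.
    relevant = [p for p in partials if p is not None]
    return ' '.join(relevant) if relevant else 'No relevant content found.'


pages = [
    'RAG improves LLM accuracy by providing relevant context from documents.',
    'Chunk size: 500-1000 tokens optimal.',
    'Tomorrow covers multi-agent systems.',
]
result = parallel_doc_qa(pages, 'What is optimal chunk size?')
print(result)
```

Threads suit this pattern because each per-chunk call would mostly wait on network I/O; with a real LLM behind `answer_from_chunk`, the reduce step would typically be one more model call that summarizes the partial answers.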

## Summary

✅ RAG workflow (7 steps)
✅ Assistant with files (easy RAG)
✅ ParallelDocQA (advanced)
✅ Chunking strategies
✅ Performance tips

**Tomorrow**: Multi-Agent Systems!