# RAG (Retrieval-Augmented Generation) with Trae AI

This notebook builds an end-to-end RAG system and then layers Trae AI-style optimizations (adaptive retrieval, context compression, response caching, and quality filtering) on top of it.

## Learning Objectives
- Understand RAG architecture and components
- Set up vector databases for document retrieval
- Integrate Trae AI with RAG pipelines
- Build an end-to-end RAG application


In [None]:
# Import required libraries
import os
import sys
import json
import numpy as np
import pandas as pd
from pathlib import Path
from typing import List, Dict, Any

# Vector database and embeddings
try:
    import chromadb
    from chromadb.config import Settings
    chroma_available = True
except ImportError:
    chroma_available = False
    print("⚠️  ChromaDB not available. Install with: pip install chromadb")

# Sentence transformers for embeddings
try:
    from sentence_transformers import SentenceTransformer
    sentence_transformers_available = True
except ImportError:
    sentence_transformers_available = False
    print("⚠️  SentenceTransformers not available. Install with: pip install sentence-transformers")

# LangChain text splitter for the RAG pipeline
try:
    try:
        # Newer LangChain releases ship the splitters in a separate package
        from langchain_text_splitters import RecursiveCharacterTextSplitter
    except ImportError:
        from langchain.text_splitter import RecursiveCharacterTextSplitter
    langchain_available = True
except ImportError:
    langchain_available = False
    print("⚠️  LangChain not available. Install with: pip install langchain")

# Standard ML libraries
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Add project root to path
project_root = Path().absolute().parent
sys.path.append(str(project_root))

print("Environment setup complete!")


## 1. Understanding RAG Architecture

RAG combines retrieval and generation in a two-step process:

1. **Retrieval**: Find relevant documents from a knowledge base
2. **Generation**: Use retrieved context to generate informed responses

### Components:
- **Document Store**: Vector database with embedded documents
- **Retriever**: Finds relevant documents based on query similarity
- **Generator**: LLM that uses retrieved context for response generation
- **Embeddings**: Vector representations of text for similarity search
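
The two-step flow above can be sketched in a few lines. This is a toy illustration, not the pipeline built later in this notebook: `rag_answer`, `toy_retrieve`, `toy_generate`, and `KNOWLEDGE` are hypothetical names, retrieval is plain keyword overlap, and the "generator" just echoes the first context line where a real system would call an LLM.

```python
from typing import Callable, List


def rag_answer(query: str,
               retrieve: Callable[[str], List[str]],
               generate: Callable[[str], str]) -> str:
    """Retrieve passages for the query, then generate from a context-grounded prompt."""
    passages = retrieve(query)                      # Step 1: retrieval
    context = "\n".join(passages)
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return generate(prompt)                         # Step 2: generation


# Hypothetical two-passage knowledge base
KNOWLEDGE = [
    "Transformers process sequences in parallel.",
    "Reinforcement learning maximizes cumulative reward.",
]


def toy_retrieve(query: str) -> List[str]:
    """Keep passages that share at least one word with the query."""
    terms = set(query.lower().split())
    return [p for p in KNOWLEDGE if terms & set(p.lower().rstrip(".").split())]


def toy_generate(prompt: str) -> str:
    """Stand-in for an LLM: return the first line of the context."""
    return prompt.splitlines()[1]


answer = rag_answer("what is reinforcement learning", toy_retrieve, toy_generate)
print(answer)
```

Passing the retriever and generator as callables keeps the two components independently replaceable, which mirrors the component list above.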


## 2. Sample Knowledge Base

Let's create a sample knowledge base about AI and machine learning.

In [None]:
# Sample documents for our knowledge base
sample_documents = [
    {
        "title": "Introduction to Machine Learning",
        "content": """Machine learning is a subset of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves. The process of learning begins with observations or data, such as examples, direct experience, or instruction, in order to look for patterns in data and make better decisions in the future based on the examples that we provide."""
    },
    {
        "title": "Deep Learning Fundamentals",
        "content": """Deep learning is part of a broader family of machine learning methods based on artificial neural networks with representation learning. Learning can be supervised, semi-supervised or unsupervised. Deep learning architectures such as deep neural networks, deep belief networks, recurrent neural networks and convolutional neural networks have been applied to fields including computer vision, speech recognition, natural language processing, machine translation, bioinformatics and drug design."""
    },
    {
        "title": "Natural Language Processing",
        "content": """Natural language processing (NLP) is a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data. Challenges in natural language processing frequently involve speech recognition, natural language understanding, and natural language generation."""
    },
    {
        "title": "Computer Vision Applications",
        "content": """Computer vision is an interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to understand and automate tasks that the human visual system can do. Computer vision tasks include methods for acquiring, processing, analyzing and understanding digital images, and extraction of high-dimensional data from the real world in order to produce numerical or symbolic information."""
    },
    {
        "title": "Reinforcement Learning",
        "content": """Reinforcement learning (RL) is an area of machine learning concerned with how intelligent agents ought to take actions in an environment in order to maximize the notion of cumulative reward. Reinforcement learning is one of three basic machine learning paradigms, alongside supervised learning and unsupervised learning. Unlike supervised learning, reinforcement learning does not require labelled input/output pairs to be presented, and need not explicitly correct sub-optimal actions."""
    },
    {
        "title": "Transformer Architecture",
        "content": """The Transformer is a deep learning model introduced in 2017, used primarily in the field of natural language processing (NLP). Like recurrent neural networks (RNNs), Transformers are designed to handle sequential input data, such as natural language, for tasks such as translation and text summarization. However, unlike RNNs, Transformers do not require that the sequential data be processed in order. This feature allows for much more parallelization than RNNs and therefore reduces training times."""
    },
    {
        "title": "Large Language Models",
        "content": """Large language models (LLMs) are a type of artificial intelligence model designed to understand and generate human-like text. These models are trained on vast amounts of text data and use deep learning techniques, particularly transformer architectures, to learn patterns in language. LLMs can perform various tasks such as text completion, question answering, summarization, translation, and creative writing. Examples include GPT-3, GPT-4, BERT, and T5."""
    },
    {
        "title": "AI Ethics and Bias",
        "content": """AI ethics is a branch of ethics that examines the moral implications of artificial intelligence systems. Key concerns include algorithmic bias, fairness, transparency, accountability, and the potential societal impacts of AI deployment. Bias in AI systems can arise from training data, algorithmic design, or deployment contexts, leading to unfair treatment of certain groups. Addressing these issues requires interdisciplinary collaboration between technologists, ethicists, policymakers, and affected communities."""
    }
]

print(f"Created knowledge base with {len(sample_documents)} documents")
for doc in sample_documents:
    print(f"- {doc['title']}")


## 3. Document Processing and Chunking

Split documents into smaller chunks for better retrieval.
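
To see what chunk size and overlap mean in practice, here is a minimal character-window sketch. The function name `chunk_with_overlap` is hypothetical, and it splits on raw character offsets, whereas a recursive splitter prefers paragraph and sentence boundaries.

```python
def chunk_with_overlap(text: str, chunk_size: int, overlap: int) -> list:
    """Split text into windows of chunk_size characters.

    Consecutive windows share `overlap` characters, so a sentence cut at a
    chunk boundary still appears intact in one of the two neighbours.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


demo = chunk_with_overlap("abcdefghij", chunk_size=4, overlap=1)
print(demo)
```

Each chunk starts `chunk_size - overlap` characters after the previous one, so the last character of one chunk is repeated as the first character of the next in this demo.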

In [None]:
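# Simple text splitter (fallback if LangChain not available)
class SimpleTextSplitter:
    def __init__(self, chunk_size=500, chunk_overlap=50):
        # An overlap >= chunk size would keep the window from ever advancing
        if not 0 <= chunk_overlap < chunk_size:
            raise ValueError("chunk_overlap must be in [0, chunk_size)")
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap
    
    def split_text(self, text):
        chunks = []
        start = 0
        while start < len(text):
            end = start + self.chunk_size
            chunks.append(text[start:end])
            # Stop once the window reaches the end of the text; otherwise the
            # overlap step would emit a redundant tail chunk
            if end >= len(text):
                break
            start = end - self.chunk_overlap
        return chunks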

# Use LangChain splitter if available, otherwise use simple splitter
if langchain_available:
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=500,
        chunk_overlap=50,
        length_function=len,
    )
    print("✅ Using LangChain RecursiveCharacterTextSplitter")
else:
    text_splitter = SimpleTextSplitter(chunk_size=500, chunk_overlap=50)
    print("✅ Using simple text splitter")

# Process documents into chunks
chunks = []
for doc in sample_documents:
    # Both splitters expose the same split_text() interface
    doc_chunks = text_splitter.split_text(doc['content'])
    
    for i, chunk in enumerate(doc_chunks):
        chunks.append({
            'id': f"{doc['title'].replace(' ', '_').lower()}_{i}",
            'title': doc['title'],
            'content': chunk,
            'chunk_index': i
        })

print(f"\n✅ Created {len(chunks)} text chunks from {len(sample_documents)} documents")

# Show example chunk
print("\nExample chunk:")
print(f"Title: {chunks[0]['title']}")
print(f"Content: {chunks[0]['content'][:200]}...")


## 4. Embedding Generation

Generate embeddings for document chunks using sentence transformers.
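
Embeddings are useful for retrieval because semantically close texts map to vectors that point in similar directions. The sketch below computes cosine similarity on hand-made 3-dimensional vectors; the vectors and the names `query_vec` and `doc_vecs` are made up for illustration, while real sentence embeddings have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings"
query_vec = [1.0, 0.0, 1.0]
doc_vecs = {
    "ml_intro": [1.0, 0.1, 0.9],   # points almost the same way as the query
    "ai_ethics": [0.0, 1.0, 0.1],  # nearly orthogonal to the query
}

# Rank documents from most to least similar
ranked = sorted(
    doc_vecs,
    key=lambda name: cosine_similarity(query_vec, doc_vecs[name]),
    reverse=True,
)
print(ranked)
```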

In [None]:
# Simple embedding model (fallback)
import zlib

class SimpleEmbedding:
    def __init__(self, embedding_dim=384):
        # Hashed bag-of-words embedding for demo purposes (not a learned model)
        self.embedding_dim = embedding_dim
    
    def encode(self, texts):
        if isinstance(texts, str):
            texts = [texts]
        
        embeddings = []
        for text in texts:
            embedding = np.zeros(self.embedding_dim)
            # Map each word to a bucket with a stable hash so the same text
            # always yields the same vector (built-in hash() is salted per
            # process, and a random base vector would differ on every call)
            for word in text.lower().split():
                bucket = zlib.crc32(word.encode("utf-8")) % self.embedding_dim
                embedding[bucket] += 1.0
            norm = np.linalg.norm(embedding)
            embeddings.append(embedding / norm if norm > 0 else embedding)
        
        return np.array(embeddings)

# Load embedding model
if sentence_transformers_available:
    print("Loading SentenceTransformer model...")
    embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
    print("✅ SentenceTransformer model loaded")
else:
    print("Using simple embedding model for demo")
    embedding_model = SimpleEmbedding()
    print("✅ Simple embedding model initialized")

# Generate embeddings for all chunks
print("\nGenerating embeddings for document chunks...")
chunk_texts = [chunk['content'] for chunk in chunks]
embeddings = embedding_model.encode(chunk_texts)

print(f"✅ Generated embeddings: {embeddings.shape}")

# Add embeddings to chunks
for i, chunk in enumerate(chunks):
    chunk['embedding'] = embeddings[i]


## 5. Vector Database Setup

Set up ChromaDB for storing and retrieving document embeddings.

In [None]:
# Simple vector store (fallback)
class SimpleVectorStore:
    def __init__(self):
        self.documents = []
        self.embeddings = []
        self.metadata = []
    
    def add(self, documents, embeddings, metadata):
        self.documents.extend(documents)
        self.embeddings.extend(embeddings)
        self.metadata.extend(metadata)
    
    def similarity_search(self, query_embedding, k=5):
        if not self.embeddings:
            return []
        
        # Calculate cosine similarity
        similarities = []
        for emb in self.embeddings:
            sim = np.dot(query_embedding, emb) / (np.linalg.norm(query_embedding) * np.linalg.norm(emb))
            similarities.append(sim)
        
        # Get top k results
        top_indices = np.argsort(similarities)[-k:][::-1]
        
        results = []
        for idx in top_indices:
            results.append({
                'document': self.documents[idx],
                'metadata': self.metadata[idx],
                'similarity': similarities[idx]
            })
        
        return results

# Set up vector store
if chroma_available:
    print("Setting up ChromaDB...")
    
    # Initialize ChromaDB client
    chroma_client = chromadb.Client()
    
    # Create or get collection
    collection_name = "ai_knowledge_base"
    collection = chroma_client.get_or_create_collection(name=collection_name)
    
    # Add documents to collection (upsert keeps the cell safe to re-run)
    collection.upsert(
        documents=[chunk['content'] for chunk in chunks],
        embeddings=embeddings.tolist(),
        metadatas=[{
            'title': chunk['title'],
            'chunk_index': chunk['chunk_index'],
            'id': chunk['id']
        } for chunk in chunks],
        ids=[chunk['id'] for chunk in chunks]
    )
    
    vector_store = collection
    print("✅ ChromaDB collection created and populated")
    
else:
    print("Setting up simple vector store...")
    vector_store = SimpleVectorStore()
    
    # Add documents
    vector_store.add(
        documents=[chunk['content'] for chunk in chunks],
        embeddings=[chunk['embedding'] for chunk in chunks],
        metadata=[{
            'title': chunk['title'],
            'chunk_index': chunk['chunk_index'],
            'id': chunk['id']
        } for chunk in chunks]
    )
    
    print("✅ Simple vector store created and populated")

print(f"Vector store contains {len(chunks)} document chunks")


## 6. Retrieval System

Implement the retrieval component of our RAG system.
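
The two backends report scores on different scales: the simple store returns cosine similarity (higher is better), while ChromaDB returns a distance (lower is better). Assuming the collection uses squared L2 distance, which is believed to be ChromaDB's default, and unit-length embeddings, the two are related by `distance = 2 - 2 * cosine_similarity`. The sketch below checks this on two hand-picked vectors; all helper names are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


a = normalize([3.0, 4.0])   # [0.6, 0.8]
b = normalize([4.0, 3.0])   # [0.8, 0.6]

cos_sim = dot(a, b)         # for unit vectors the dot product is the cosine
dist = squared_l2(a, b)

print(f"cosine similarity:   {cos_sim:.2f}")
print(f"squared L2 distance: {dist:.2f}")
print(f"2 - 2 * cosine:      {2 - 2 * cos_sim:.2f}")
```

Under these assumptions a similarity cutoff of 0.3 corresponds to a distance cutoff of 1.4, and any threshold applied to distances must use `<` rather than `>`.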

In [None]:
class RAGRetriever:
    """Retrieval component for RAG system"""
    
    def __init__(self, vector_store, embedding_model, top_k=3):
        self.vector_store = vector_store
        self.embedding_model = embedding_model
        self.top_k = top_k
    
    def retrieve(self, query: str) -> List[Dict[str, Any]]:
        """Retrieve relevant documents for a query"""
        # Generate query embedding
        query_embedding = self.embedding_model.encode([query])[0]
        
        # Search vector store
        if chroma_available and hasattr(self.vector_store, 'query'):
            # ChromaDB query
            results = self.vector_store.query(
                query_embeddings=[query_embedding.tolist()],
                n_results=self.top_k
            )
            
            retrieved_docs = []
            for i in range(len(results['documents'][0])):
                retrieved_docs.append({
                    'content': results['documents'][0][i],
                    'metadata': results['metadatas'][0][i],
                    'distance': results['distances'][0][i] if 'distances' in results else 0
                })
        else:
            # Simple vector store query
            results = self.vector_store.similarity_search(query_embedding, k=self.top_k)
            retrieved_docs = [{
                'content': result['document'],
                'metadata': result['metadata'],
                'similarity': result['similarity']
            } for result in results]
        
        return retrieved_docs
    
    def format_context(self, retrieved_docs: List[Dict[str, Any]]) -> str:
        """Format retrieved documents into context string"""
        context_parts = []
        
        for i, doc in enumerate(retrieved_docs, 1):
            title = doc['metadata'].get('title', 'Unknown')
            content = doc['content']
            context_parts.append(f"Document {i} ({title}):\n{content}")
        
        return "\n\n".join(context_parts)

# Initialize retriever
retriever = RAGRetriever(vector_store, embedding_model, top_k=3)
print("✅ RAG retriever initialized")


In [None]:
# Test retrieval
test_queries = [
    "What is machine learning?",
    "How do transformers work?",
    "What are the applications of computer vision?",
    "Explain reinforcement learning"
]

print("Testing retrieval system:")
print("=" * 50)

for query in test_queries:
    print(f"Query: {query}")
    retrieved_docs = retriever.retrieve(query)
    
    print("Retrieved documents:")
    for i, doc in enumerate(retrieved_docs, 1):
        title = doc['metadata'].get('title', 'Unknown')
        score_key = 'similarity' if 'similarity' in doc else 'distance'
        score = doc.get(score_key, 0)
        # Similarity: higher is better. Distance (ChromaDB): lower is better.
        print(f"  {i}. {title} ({score_key}: {score:.3f})")
        print(f"     {doc['content'][:100]}...")
    
    print("-" * 30)


## 7. Generation Component

Set up the language model for generating responses based on retrieved context.
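
Because the model has a limited input window, the prompt should be assembled so that the question is never the part that gets cut. One possible approach is to add passages in rank order until a budget is spent. The sketch below counts whitespace-separated words as a rough proxy for tokens, which is an assumption; `build_prompt` and `max_context_words` are hypothetical names.

```python
def build_prompt(query: str, passages: list, max_context_words: int) -> str:
    """Add passages in rank order until the word budget is spent.

    The question and the answer cue are appended after the context,
    so they are never trimmed.
    """
    kept, used = [], 0
    for passage in passages:
        n_words = len(passage.split())
        if used + n_words > max_context_words:
            break  # the next passage would exceed the budget
        kept.append(passage)
        used += n_words
    context = "\n\n".join(kept)
    return f"Context Information:\n{context}\n\nQuestion: {query}\n\nAnswer:"


prompt = build_prompt(
    "What is RL?",
    ["RL maximizes reward.", "Transformers use attention."],
    max_context_words=4,
)
print(prompt)
```

A tokenizer-based count would be more accurate than a word count; the structure of the loop stays the same.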

In [None]:
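# Load language model for generation
# Note: DialoGPT is a small conversational model, so answers will be rough;
# an instruction-tuned model is a better fit for context-grounded prompts.
MODEL_NAME = "microsoft/DialoGPT-medium"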

print(f"Loading generation model: {MODEL_NAME}")
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Set padding token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

print(f"✅ Generation model loaded: {model.num_parameters():,} parameters")


In [None]:
class RAGGenerator:
    """Generation component for RAG system"""
    
    def __init__(self, model, tokenizer, max_length=512):
        self.model = model
        self.tokenizer = tokenizer
        self.max_length = max_length
    
    def generate_response(self, query: str, context: str) -> str:
        """Generate response using query and retrieved context"""
        # Create prompt with context and query
        prompt = self._create_prompt(query, context)
        
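        # Tokenize prompt (returns input_ids and attention_mask).
        # Truncation drops tokens from the end of the prompt, so keep the
        # context short enough that the question and "Answer:" cue survive.
        inputs = self.tokenizer(prompt, return_tensors='pt', max_length=self.max_length, truncation=True)
        prompt_length = inputs['input_ids'].shape[1]
        
        # Generate response
        with torch.no_grad():
            outputs = self.model.generate(
                **inputs,
                max_new_tokens=150,
                do_sample=True,
                temperature=0.7,
                top_p=0.9,
                pad_token_id=self.tokenizer.eos_token_id,
                eos_token_id=self.tokenizer.eos_token_id
            )
        
        # Decode only the newly generated tokens; slicing the decoded string by
        # len(prompt) is unreliable when the prompt was truncated or re-spaced
        new_tokens = outputs[0][prompt_length:]
        response = self.tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
        
        return response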
    
    def _create_prompt(self, query: str, context: str) -> str:
        """Create a well-formatted prompt for generation"""
        prompt = f"""Context Information:
{context}

Question: {query}

Based on the context information above, please provide a comprehensive answer to the question:

Answer:"""
        
        return prompt

# Initialize generator
generator = RAGGenerator(model, tokenizer)
print("✅ RAG generator initialized")


## 8. Complete RAG Pipeline

Combine retrieval and generation into a complete RAG system.

In [None]:
class TraeRAGPipeline:
    """Complete RAG pipeline with Trae AI optimizations"""
    
    def __init__(self, retriever, generator):
        self.retriever = retriever
        self.generator = generator
        self.conversation_history = []
    
    def query(self, question: str, include_sources: bool = True) -> Dict[str, Any]:
        """Process a query through the complete RAG pipeline"""
        # Step 1: Retrieve relevant documents
        retrieved_docs = self.retriever.retrieve(question)
        
        # Step 2: Format context
        context = self.retriever.format_context(retrieved_docs)
        
        # Step 3: Generate response
        response = self.generator.generate_response(question, context)
        
        # Step 4: Prepare result
        result = {
            'question': question,
            'answer': response,
            'retrieved_docs': retrieved_docs,
            'context_used': context
        }
        
        # Add to conversation history
        self.conversation_history.append(result)
        
        return result
    
    def format_response(self, result: Dict[str, Any], include_sources: bool = True) -> str:
        """Format the response for display"""
        formatted = f"**Question:** {result['question']}\n\n"
        formatted += f"**Answer:** {result['answer']}\n\n"
        
        if include_sources:
            formatted += "**Sources:**\n"
            for i, doc in enumerate(result['retrieved_docs'], 1):
                title = doc['metadata'].get('title', 'Unknown')
                formatted += f"{i}. {title}\n"
        
        return formatted
    
    def get_conversation_history(self) -> List[Dict[str, Any]]:
        """Get the conversation history"""
        return self.conversation_history
    
    def clear_history(self):
        """Clear conversation history"""
        self.conversation_history = []

# Initialize complete RAG pipeline
rag_pipeline = TraeRAGPipeline(retriever, generator)
print("✅ Complete RAG pipeline initialized")


## 9. Testing the RAG System

Test our complete RAG system with various questions.

In [None]:
# Test questions
test_questions = [
    "What is machine learning and how does it work?",
    "Can you explain the transformer architecture?",
    "What are the main applications of computer vision?",
    "How does reinforcement learning differ from supervised learning?",
    "What are the ethical concerns with AI systems?"
]

print("Testing complete RAG pipeline:")
print("=" * 60)

for i, question in enumerate(test_questions, 1):
    print(f"\n🤔 Question {i}: {question}")
    print("-" * 40)
    
    # Get response from RAG pipeline
    result = rag_pipeline.query(question)
    
    # Display formatted response
    print(f"🤖 Answer: {result['answer']}")
    
    print("📚 Sources used:")
    for j, doc in enumerate(result['retrieved_docs'], 1):
        title = doc['metadata'].get('title', 'Unknown')
        score_key = 'similarity' if 'similarity' in doc else 'distance'
        score = doc.get(score_key, 0)
        # Similarity: higher is better. Distance (ChromaDB): lower is better.
        print(f"   {j}. {title} ({score_key}: {score:.3f})")
    
    print("=" * 60)


## 10. RAG System Evaluation

Evaluate the performance of our RAG system.
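
Two common retrieval metrics are hit rate (did any relevant document appear in the top k?) and mean reciprocal rank, which also rewards placing the relevant document near the top. The sketch below computes both on made-up rankings; `hit_rate_and_mrr` and the sample data are hypothetical and independent of the evaluator class in this section.

```python
def hit_rate_and_mrr(ranked_results, expected):
    """Compute hit rate and mean reciprocal rank (MRR).

    ranked_results: one ranked list of titles per question
    expected:       one set of relevant titles per question
    """
    n = len(ranked_results)
    if n == 0:
        return 0.0, 0.0
    hits = 0
    reciprocal_rank_sum = 0.0
    for titles, relevant in zip(ranked_results, expected):
        # 1-based rank of the first relevant title, or None if it is missing
        rank = next((i for i, t in enumerate(titles, 1) if t in relevant), None)
        if rank is not None:
            hits += 1
            reciprocal_rank_sum += 1.0 / rank
    return hits / n, reciprocal_rank_sum / n


ranked = [["A", "B", "C"], ["B", "A", "C"], ["C", "B", "D"]]
expected = [{"A"}, {"A"}, {"A"}]
hit_rate, mrr = hit_rate_and_mrr(ranked, expected)
print(f"hit rate: {hit_rate:.3f}, MRR: {mrr:.3f}")
```

Here the relevant title is found at rank 1, rank 2, and not at all, giving a hit rate of 2/3 and an MRR of (1 + 0.5 + 0) / 3 = 0.5.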

In [None]:
class RAGEvaluator:
    """Evaluate RAG system performance"""
    
    def __init__(self, rag_pipeline):
        self.rag_pipeline = rag_pipeline
    
    def evaluate_retrieval(self, questions_and_expected_docs):
        """Evaluate retrieval accuracy"""
        total_questions = len(questions_and_expected_docs)
        correct_retrievals = 0
        
        for question, expected_titles in questions_and_expected_docs:
            retrieved_docs = self.rag_pipeline.retriever.retrieve(question)
            retrieved_titles = [doc['metadata']['title'] for doc in retrieved_docs]
            
            # Check if any expected document was retrieved
            if any(title in retrieved_titles for title in expected_titles):
                correct_retrievals += 1
        
        # Guard against an empty evaluation set
        accuracy = correct_retrievals / total_questions if total_questions else 0.0
        return accuracy
    
    def evaluate_response_quality(self, questions):
        """Evaluate response quality metrics"""
        metrics = {
            'avg_response_length': 0,
            'avg_retrieval_time': 0,
            'avg_generation_time': 0
        }
        
        total_response_length = 0
        
        import time  # imported once, outside the loop
        
        for question in questions:
            # Measure retrieval time (perf_counter is meant for short intervals)
            start_time = time.perf_counter()
            retrieved_docs = self.rag_pipeline.retriever.retrieve(question)
            retrieval_time = time.perf_counter() - start_time
            
            # Measure generation time
            context = self.rag_pipeline.retriever.format_context(retrieved_docs)
            start_time = time.perf_counter()
            response = self.rag_pipeline.generator.generate_response(question, context)
            generation_time = time.perf_counter() - start_time
            
            total_response_length += len(response.split())
            metrics['avg_retrieval_time'] += retrieval_time
            metrics['avg_generation_time'] += generation_time
        
        # Calculate averages
        num_questions = len(questions)
        metrics['avg_response_length'] = total_response_length / num_questions
        metrics['avg_retrieval_time'] /= num_questions
        metrics['avg_generation_time'] /= num_questions
        
        return metrics

# Evaluation data
eval_questions_and_docs = [
    ("What is machine learning?", ["Introduction to Machine Learning"]),
    ("How do transformers work?", ["Transformer Architecture"]),
    ("What is computer vision?", ["Computer Vision Applications"]),
    ("Explain reinforcement learning", ["Reinforcement Learning"]),
    ("What are large language models?", ["Large Language Models"])
]

# Initialize evaluator
evaluator = RAGEvaluator(rag_pipeline)

# Evaluate retrieval accuracy
retrieval_accuracy = evaluator.evaluate_retrieval(eval_questions_and_docs)
print(f"📊 Retrieval Accuracy: {retrieval_accuracy:.2%}")

# Evaluate response quality
eval_questions = [q for q, _ in eval_questions_and_docs]
quality_metrics = evaluator.evaluate_response_quality(eval_questions)

print("\n📈 Performance Metrics:")
print(f"  Average response length: {quality_metrics['avg_response_length']:.1f} words")
print(f"  Average retrieval time: {quality_metrics['avg_retrieval_time']:.3f} seconds")
print(f"  Average generation time: {quality_metrics['avg_generation_time']:.3f} seconds")


## 11. Trae AI Integration and Optimizations

Demonstrate how Trae AI can enhance the RAG pipeline.
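
Response caching is the simplest of these optimizations to reason about in isolation. The sketch below is one possible design, not a Trae AI API: a bounded least-recently-used cache whose keys are normalized so that differences in case and spacing still hit the same entry. `ResponseCache` and `max_entries` are hypothetical names.

```python
from collections import OrderedDict


class ResponseCache:
    """Bounded LRU cache keyed by a normalized question string."""

    def __init__(self, max_entries: int = 128):
        self.max_entries = max_entries
        self._store = OrderedDict()

    @staticmethod
    def _key(question: str) -> str:
        # Lowercase and collapse whitespace so trivial variants share a key
        return " ".join(question.lower().split())

    def get(self, question: str):
        key = self._key(question)
        if key not in self._store:
            return None
        self._store.move_to_end(key)  # mark as most recently used
        return self._store[key]

    def put(self, question: str, result) -> None:
        key = self._key(question)
        self._store[key] = result
        self._store.move_to_end(key)
        if len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict the least recently used


cache = ResponseCache(max_entries=2)
cache.put("What is RAG?", "answer-1")
cache.put("what  is rag?", "answer-2")       # same normalized key, overwrites
cache.put("Explain chunking", "answer-3")
cache.put("Explain embeddings", "answer-4")  # evicts the "what is rag?" entry
print(cache.get("What is RAG?"), cache.get("Explain chunking"))
```

Bounding the cache matters in a long-running service, where an unbounded dictionary of full results (including retrieved documents) would grow without limit.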

In [None]:
# Trae AI enhanced RAG pipeline
class TraeEnhancedRAG(TraeRAGPipeline):
    """RAG pipeline with Trae AI optimizations"""
    
    def __init__(self, retriever, generator):
        super().__init__(retriever, generator)
        self.trae_optimizations = {
            'adaptive_retrieval': True,
            'context_compression': True,
            'response_caching': True,
            'quality_filtering': True
        }
        self.response_cache = {}
    
    def query(self, question: str, include_sources: bool = True) -> Dict[str, Any]:
        """Enhanced query with Trae AI optimizations"""
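        # Check cache first. The question string itself is a stable dict key,
        # and defining it up front keeps it available for the cache write below.
        cache_key = question
        if self.trae_optimizations['response_caching']:
            if cache_key in self.response_cache:
                print("🚀 Using cached response")
                return self.response_cache[cache_key]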
        
        # Adaptive retrieval - adjust number of documents based on query complexity
        if self.trae_optimizations['adaptive_retrieval']:
            query_complexity = len(question.split())
            if query_complexity > 10:
                self.retriever.top_k = 5  # More documents for complex queries
            else:
                self.retriever.top_k = 3  # Fewer for simple queries
        
        # Standard retrieval
        retrieved_docs = self.retriever.retrieve(question)
        
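        # Quality filtering - remove low-relevance documents
        if self.trae_optimizations['quality_filtering']:
            min_similarity = 0.3  # Similarity scores: higher is better
            # Assumption: ChromaDB's default squared-L2 distance on unit-length
            # embeddings, where distance = 2 - 2 * cosine similarity
            max_distance = 2 - 2 * min_similarity  # Distances: lower is better
            retrieved_docs = [
                doc for doc in retrieved_docs
                if ('similarity' in doc and doc['similarity'] > min_similarity)
                or ('similarity' not in doc and doc.get('distance', 0) < max_distance)
            ]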
        
        # Context compression - optimize context length
        if self.trae_optimizations['context_compression']:
            context = self._compress_context(retrieved_docs, question)
        else:
            context = self.retriever.format_context(retrieved_docs)
        
        # Generate response
        response = self.generator.generate_response(question, context)
        
        # Prepare result
        result = {
            'question': question,
            'answer': response,
            'retrieved_docs': retrieved_docs,
            'context_used': context,
            'trae_optimizations_used': self.trae_optimizations
        }
        
        # Cache result
        if self.trae_optimizations['response_caching']:
            self.response_cache[cache_key] = result
        
        # Add to conversation history
        self.conversation_history.append(result)
        
        return result
    
    def _compress_context(self, retrieved_docs, question):
        """Compress context by extracting most relevant sentences"""
        # Simple compression: take first 2 sentences from each document
        compressed_parts = []
        
        for i, doc in enumerate(retrieved_docs, 1):
            title = doc['metadata'].get('title', 'Unknown')
            content = doc['content']
            
            # Extract first 2 sentences
            sentences = content.split('. ')
            compressed_content = '. '.join(sentences[:2])
            if not compressed_content.endswith('.'):
                compressed_content += '.'
            
            compressed_parts.append(f"Source {i} ({title}): {compressed_content}")
        
        return '\n\n'.join(compressed_parts)

# Initialize Trae-enhanced RAG
trae_rag = TraeEnhancedRAG(retriever, generator)
print("✅ Trae AI enhanced RAG pipeline initialized")

# Test enhanced pipeline
print("\n🚀 Testing Trae AI enhanced RAG:")
test_question = "What is the difference between machine learning and deep learning?"

result = trae_rag.query(test_question)
print(f"\nQuestion: {result['question']}")
print(f"Answer: {result['answer']}")
print(f"\nOptimizations used: {result['trae_optimizations_used']}")


## 12. Saving and Deployment

Save the RAG system components for production deployment.
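
A saved configuration is only useful if it can be read back unchanged. The sketch below shows a round trip through JSON in a temporary directory; `save_config` and `load_config` are hypothetical helpers, and the config values are placeholders rather than this notebook's actual settings.

```python
import json
import tempfile
from pathlib import Path


def save_config(config: dict, path: Path) -> None:
    """Write the config as JSON, creating parent directories if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")


def load_config(path: Path) -> dict:
    """Read a JSON config back into a dictionary."""
    return json.loads(path.read_text(encoding="utf-8"))


original = {"retrieval_top_k": 3, "vector_store": "simple"}

with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "nested" / "rag_config.json"
    save_config(original, cfg_path)
    restored = load_config(cfg_path)

print(restored == original)
```

Creating the parent directory first avoids a `FileNotFoundError` when the target folder does not exist yet.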

In [None]:
# Save RAG system configuration
rag_config = {
    'embedding_model': 'all-MiniLM-L6-v2' if sentence_transformers_available else 'simple',
    'generation_model': MODEL_NAME,
    'vector_store': 'chromadb' if chroma_available else 'simple',
    'retrieval_top_k': 3,
    'max_generation_length': 150,
    'trae_optimizations': trae_rag.trae_optimizations
}

# Save configuration (create the target directory if it does not exist yet)
config_path = Path('../trae_llm/rag_config.json')
config_path.parent.mkdir(parents=True, exist_ok=True)
with open(config_path, 'w') as f:
    json.dump(rag_config, f, indent=2)

print(f"✅ RAG configuration saved to {config_path}")

# Save sample knowledge base
kb_path = Path('../data/knowledge_base.json')
kb_path.parent.mkdir(parents=True, exist_ok=True)
with open(kb_path, 'w') as f:
    json.dump(sample_documents, f, indent=2)

print(f"✅ Knowledge base saved to {kb_path}")

# Performance summary
print("\n📊 RAG System Summary:")
print(f"  Documents in knowledge base: {len(sample_documents)}")
print(f"  Text chunks created: {len(chunks)}")
print(f"  Embedding model: {rag_config['embedding_model']}")
print(f"  Generation model: {rag_config['generation_model']}")
print(f"  Vector store: {rag_config['vector_store']}")
print(f"  Retrieval accuracy: {retrieval_accuracy:.2%}")


## Next Steps

Congratulations! You've built a complete RAG system with Trae AI enhancements. Continue with:

- **05_evaluation_and_visualization.ipynb**: Comprehensive evaluation and visualization
- **scripts/launch_rag_server.py**: Deploy your RAG system as an API

## Key Takeaways

1. **RAG Architecture**: Combines retrieval and generation for informed responses
2. **Vector Databases**: Enable efficient similarity search for document retrieval
3. **Trae AI Optimizations**: Enhance performance with adaptive retrieval and context compression
4. **Evaluation**: Important to measure both retrieval accuracy and response quality

## Advanced Exercises

Try these advanced features:

1. **Multi-modal RAG**: Add support for images and documents
2. **Hybrid Search**: Combine semantic and keyword search
3. **Dynamic Knowledge Base**: Add real-time document updates
4. **Conversation Memory**: Maintain context across multiple queries
5. **Custom Embeddings**: Fine-tune embeddings for your domain
6. **Advanced Filtering**: Implement metadata-based filtering
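
As a starting point for the hybrid search exercise, one common way to combine a semantic ranking with a keyword ranking is reciprocal rank fusion (RRF), which needs only the rank positions and not comparable scores. This is a sketch under assumptions: the document IDs are made up, and `k=60` is a conventional constant rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings, k: int = 60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in;
    ties are broken alphabetically so the output is deterministic.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, 1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a semantic retriever and a keyword retriever
semantic = ["doc_b", "doc_a", "doc_c"]
keyword = ["doc_a", "doc_d", "doc_b"]

fused = reciprocal_rank_fusion([semantic, keyword])
print(fused)
```

Documents that appear in both lists (`doc_a`, `doc_b`) rise above those found by only one retriever.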
