# Module 10: Prompt Engineering for RAG

## 🎯 Learning Objectives
By the end of this module, you will:
- Design effective prompts that maximize utilization of retrieved context
- Implement robust prompt templates for different RAG scenarios
- Handle context length limits using truncation, summarization, and other optimization strategies
- Create model-specific prompt adaptations for different LLMs
- Master advanced techniques like Chain-of-Thought and few-shot prompting in RAG
- Build attribution and citation systems for reliable source tracking

## 📚 Key Concepts

### Why Prompt Engineering is Critical for RAG

**The RAG Quality Chain:**
```
Great Retrieval + Poor Prompt = Mediocre Results
Great Retrieval + Great Prompt = Excellent Results
```

Even perfect retrieval can fail if the prompt does not tell the model how to use the retrieved context. As a rough rule of thumb (not a benchmarked figure), **prompt quality can account for 30-40% of RAG system performance**.

### 🏗️ RAG Prompt Architecture (2025)

**Modern RAG prompts follow a structured format:**

1. **System Context**: Role definition and behavior instructions
2. **Retrieved Context**: Relevant documents with metadata
3. **Task Instructions**: Specific guidance for using context
4. **User Query**: The actual question or request
5. **Output Format**: Structure and citation requirements
6. **Quality Controls**: Fallback instructions and limitations
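
The six components above can be assembled mechanically. The following is a minimal sketch under stated assumptions, not a prescribed implementation: `build_rag_prompt`, its parameter names, and the section labels are all hypothetical choices made for illustration.

```python
from typing import Dict, List


def build_rag_prompt(
    system_context: str,
    documents: List[Dict[str, str]],
    task_instructions: str,
    user_query: str,
    output_format: str,
    quality_controls: str,
) -> str:
    """Join the six prompt components in the order listed above."""
    # Retrieved context: number each document and keep its source metadata.
    context_lines = []
    for i, doc in enumerate(documents, start=1):
        source = doc.get("source", "unknown")
        context_lines.append(f"[{i}] (source: {source})\n{doc['content']}")

    sections = [
        ("SYSTEM", system_context),
        ("CONTEXT", "\n\n".join(context_lines)),
        ("INSTRUCTIONS", task_instructions),
        ("QUESTION", user_query),
        ("OUTPUT FORMAT", output_format),
        ("QUALITY CONTROLS", quality_controls),
    ]
    # Skip empty sections so optional components do not leave bare headers.
    return "\n\n".join(f"{name}:\n{body}" for name, body in sections if body.strip())


example_prompt = build_rag_prompt(
    system_context="You are a careful assistant that answers from the provided sources.",
    documents=[{"source": "release-notes.md", "content": "Version 2 adds batch export."}],
    task_instructions="Use only the numbered sources above.",
    user_query="What does version 2 add?",
    output_format="Answer in one sentence and cite sources as [n].",
    quality_controls="If the sources do not answer the question, say so.",
)
print(example_prompt)
```

Keeping each component as a separate argument makes it easy to vary one part (for example, the output format) while holding the rest fixed during prompt experiments.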

### 🔄 Context Integration Patterns

#### Context-First Pattern (Most Common):
```
CONTEXT: [Retrieved information]
QUESTION: [User query]
ANSWER: [LLM response]
```

#### Question-Context Pattern:
```
QUESTION: [User query]
RELEVANT_CONTEXT: [Retrieved information]
ANSWER: [LLM response]
```

#### Interleaved Pattern (Advanced):
```
QUESTION: [User query]
CONTEXT_1: [Most relevant document]
ANALYSIS_1: [How this relates to question]
CONTEXT_2: [Secondary document]
FINAL_ANSWER: [Synthesized response]
```
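
As a rough illustration of how these three patterns differ in code, the sketch below builds each one as a plain string. The function names are hypothetical. For the interleaved pattern, the sketch assumes the model is asked to produce the `ANALYSIS_n` and `FINAL_ANSWER` lines itself, which is one possible reading of the layout above.

```python
from typing import List


def context_first(context: str, question: str) -> str:
    """Context-First pattern: retrieved text precedes the question."""
    return f"CONTEXT: {context}\nQUESTION: {question}\nANSWER:"


def question_context(context: str, question: str) -> str:
    """Question-Context pattern: the question precedes the retrieved text."""
    return f"QUESTION: {question}\nRELEVANT_CONTEXT: {context}\nANSWER:"


def interleaved(documents: List[str], question: str) -> str:
    """Interleaved pattern: numbered contexts plus a request for per-context analysis."""
    lines = [f"QUESTION: {question}"]
    for i, doc in enumerate(documents, start=1):
        lines.append(f"CONTEXT_{i}: {doc}")
    # Assumption: the model writes the analysis lines in its response.
    lines.append(
        "For each CONTEXT_n above, write an ANALYSIS_n line explaining how it "
        "relates to the question, then finish with FINAL_ANSWER."
    )
    return "\n".join(lines)


print(context_first("Paris is the capital of France.", "What is the capital of France?"))
print(interleaved(["Doc A text", "Doc B text"], "How do A and B differ?"))
```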

### 📊 2025 Prompt Engineering Best Practices

| Technique | Impact (rough estimate, workload-dependent) | Use Case |
|-----------|--------|----------|
| **Structured Templates** | +25% consistency | Production systems |
| **Chain-of-Thought** | +30% reasoning quality | Complex queries |
| **Attribution Requirements** | +40% source accuracy | Fact-sensitive domains |
| **Few-Shot Examples** | +20% format compliance | Specific output formats |
| **Context Summarization** | +15% with long contexts | Large document sets |
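
Attribution requirements are easier to enforce when the answer is checked after generation. The sketch below is a hypothetical helper (`check_citations` is not part of any library) that assumes sources are cited with bracketed numbers such as `[1]`. It only verifies that each marker points at a source that was actually provided; it does not check whether the cited source supports the claim.

```python
import re
from typing import Dict, List


def check_citations(answer: str, num_sources: int) -> Dict[str, List[int]]:
    """Compare the [n] markers in an answer against the sources provided."""
    cited = sorted({int(n) for n in re.findall(r"\[(\d+)\]", answer)})
    valid = [n for n in cited if 1 <= n <= num_sources]
    invalid = [n for n in cited if n not in valid]
    unused = [n for n in range(1, num_sources + 1) if n not in cited]
    return {"valid": valid, "invalid": invalid, "unused": unused}


report = check_citations(
    "Training is faster [1], and inference also improves [3].", num_sources=2
)
print(report)  # {'valid': [1], 'invalid': [3], 'unused': [2]}
```

A non-empty `invalid` list is a useful signal that the model cited a source it was never given.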

### ⚡ Context Length Optimization

**Approximate context window sizes (check each provider's documentation for current limits):**
- **GPT-4 Turbo**: 128K tokens (~200 pages)
- **Claude 3**: 200K tokens (~300 pages)
- **Gemini 1.5 Pro**: up to 2M tokens (~3000 pages)

**Optimization techniques:**
1. **Context truncation**: Keep most relevant portions
2. **Context summarization**: LLM-powered compression
3. **Hierarchical context**: Important info first
4. **Dynamic context**: Adjust based on query complexity
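
The first and third techniques can be combined into a simple budgeting step: rank chunks by relevance, then keep as many as fit. The sketch below is illustrative only. `pack_context`, the `score` and `text` keys, and the 4-characters-per-token estimate are assumptions, and a real system would count tokens with the target model's tokenizer.

```python
from typing import Dict, List


def estimate_tokens(text: str) -> int:
    """Crude estimate: about 4 characters per token for English text."""
    return max(1, len(text) // 4)


def pack_context(chunks: List[Dict], token_budget: int) -> List[Dict]:
    """Greedily keep the highest-scoring chunks that fit the budget."""
    selected = []
    used = 0
    for chunk in sorted(chunks, key=lambda c: c["score"], reverse=True):
        cost = estimate_tokens(chunk["text"])
        if used + cost > token_budget:
            continue  # Skip this chunk; a smaller one may still fit.
        selected.append(chunk)
        used += cost
    return selected


chunks = [
    {"text": "a" * 400, "score": 0.9},  # ~100 tokens
    {"text": "b" * 800, "score": 0.8},  # ~200 tokens
    {"text": "c" * 200, "score": 0.5},  # ~50 tokens
]
kept = pack_context(chunks, token_budget=160)
print([c["score"] for c in kept])  # [0.9, 0.5]
```

Because the result stays sorted by score, the most relevant material also appears first in the prompt.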

## 🛠️ Setup
Let's install the required packages and set up our prompt engineering lab.

In [None]:
# Install required packages
!pip install -q langchain langchain-community langchain-openai
!pip install -q openai anthropic python-dotenv
!pip install -q jinja2 tiktoken
!pip install -q numpy pandas matplotlib seaborn plotly
!pip install -q textstat nltk
# Note: You'll need API keys for OpenAI/Anthropic for full functionality

In [None]:
import os
import re
import time
import json
from datetime import datetime
from typing import List, Dict, Any, Tuple, Optional, Union
import warnings
warnings.filterwarnings('ignore')

# Core libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from collections import defaultdict, Counter

# Template engine
from jinja2 import Template, Environment, BaseLoader
import tiktoken

# Text analysis
import textstat
import nltk
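# Newer NLTK releases load sentence-tokenizer data from 'punkt_tab'; older ones use 'punkt'
for resource in ('punkt', 'punkt_tab'):
    try:
        nltk.data.find(f'tokenizers/{resource}')
    except LookupError:
        nltk.download(resource, quiet=True)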

# LangChain components
from langchain.schema import SystemMessage, HumanMessage, AIMessage
from langchain.prompts import ChatPromptTemplate, PromptTemplate
from langchain.prompts.few_shot import FewShotPromptTemplate
from langchain.output_parsers import StructuredOutputParser, ResponseSchema

# Load environment variables
from dotenv import load_dotenv
load_dotenv()

# Set up visualization
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

# Initialize tokenizer for token counting
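try:
    tokenizer = tiktoken.get_encoding("cl100k_base")  # Encoding used by GPT-4 and GPT-3.5 Turbo
except Exception:
    # Fall back to a character-based estimate if the encoding cannot be loaded
    tokenizer = None
    print("⚠️ Tiktoken not available, token counting will be estimated")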

print("✅ Setup complete!")
print(f"📅 Today's date: {datetime.now().strftime('%Y-%m-%d')}")
print(f"🔑 OpenAI API Key: {'✅ Available' if os.getenv('OPENAI_API_KEY') else '❌ Not found'}")

## 🧪 Exercise 1: RAG Prompt Templates and Patterns

Let's build a comprehensive library of RAG prompt templates for different use cases.

In [None]:
class RAGPromptLibrary:
    """Comprehensive library of RAG prompt templates and patterns"""
    
    def __init__(self):
        self.templates = {}
        self.template_usage_stats = defaultdict(int)
        self.jinja_env = Environment(loader=BaseLoader())
        
        # Initialize standard templates
        self._initialize_standard_templates()
    
    def _initialize_standard_templates(self):
        """Initialize standard RAG prompt templates"""
        
        # 1. Basic RAG Template
        self.templates['basic_rag'] = {
            'name': 'Basic RAG',
            'description': 'Simple context-question-answer format',
            'template': '''You are a helpful assistant that answers questions based on provided context.

CONTEXT:
{{ context }}

QUESTION: {{ question }}

Please answer the question based on the provided context. If the context doesn't contain enough information to answer the question, please say so.

ANSWER:''',
            'variables': ['context', 'question'],
            'use_cases': ['General Q&A', 'Simple fact-finding', 'Basic information retrieval']
        }
        
        # 2. Structured RAG with Citations
        self.templates['structured_citations'] = {
            'name': 'Structured RAG with Citations',
            'description': 'Structured format with mandatory source attribution',
            'template': '''You are an expert assistant that provides accurate answers with proper citations.

CONTEXT SOURCES:
{% for doc in context_sources %}
[{{ loop.index }}] {{ doc.title or "Document " + loop.index|string }}
{{ doc.content }}
{% endfor %}

QUESTION: {{ question }}

INSTRUCTIONS:
1. Answer the question using ONLY information from the provided sources
2. Cite sources using [1], [2], etc. format
3. If insufficient information, state what's missing
4. Be precise and factual

ANSWER:''',
            'variables': ['context_sources', 'question'],
            'use_cases': ['Research assistance', 'Academic writing', 'Fact-checking', 'Legal/Medical domains']
        }
        
        # 3. Chain-of-Thought RAG
        self.templates['chain_of_thought'] = {
            'name': 'Chain-of-Thought RAG',
            'description': 'Step-by-step reasoning with retrieved context',
            'template': '''You are an analytical assistant that thinks step-by-step.

CONTEXT:
{{ context }}

QUESTION: {{ question }}

Please think through this step by step:

STEP 1 - IDENTIFY RELEVANT INFORMATION:
What information from the context is most relevant to the question?

STEP 2 - ANALYZE THE INFORMATION:
How does this information relate to and help answer the question?

STEP 3 - SYNTHESIZE AND CONCLUDE:
Based on the analysis, what is the complete answer?

Let me work through this:''',
            'variables': ['context', 'question'],
            'use_cases': ['Complex reasoning', 'Multi-step problems', 'Analysis tasks', 'Educational content']
        }
        
        # 4. Conversational RAG
        self.templates['conversational'] = {
            'name': 'Conversational RAG',
            'description': 'Multi-turn conversation with context and history',
            'template': '''You are a helpful assistant in an ongoing conversation.

CONTEXT:
{{ context }}

CONVERSATION HISTORY:
{% for turn in conversation_history %}
{{ turn.role|upper }}: {{ turn.content }}
{% endfor %}

CURRENT QUESTION: {{ question }}

Please respond naturally, considering both the provided context and the conversation history. Reference previous parts of our conversation when relevant.

RESPONSE:''',
            'variables': ['context', 'conversation_history', 'question'],
            'use_cases': ['Chatbots', 'Customer support', 'Interactive learning', 'Consultation systems']
        }
        
        # 5. Few-Shot RAG
        self.templates['few_shot'] = {
            'name': 'Few-Shot RAG',
            'description': 'Examples-based learning with retrieved context',
            'template': '''You are an assistant that learns from examples and applies knowledge to new situations.

Here are some examples of how to answer questions using provided context:

{% for example in examples %}
EXAMPLE {{ loop.index }}:
Context: {{ example.context }}
Question: {{ example.question }}
Answer: {{ example.answer }}

{% endfor %}
Now answer this new question following the same pattern:

CONTEXT:
{{ context }}

QUESTION: {{ question }}

ANSWER:''',
            'variables': ['examples', 'context', 'question'],
            'use_cases': ['Specific formats', 'Domain adaptation', 'Style consistency', 'Complex outputs']
        }
        
        # 6. Summarization RAG
        self.templates['summarization'] = {
            'name': 'Summarization RAG',
            'description': 'For handling large contexts with summarization',
            'template': '''You are an expert at analyzing and summarizing information.

LARGE CONTEXT ({{ context_length }} tokens):
{{ context }}

QUESTION: {{ question }}

TASK:
1. First, identify the most relevant sections of the context for the question
2. Summarize the key information from those sections
3. Answer the question based on your summary
4. Note if the full context contains additional relevant details

RELEVANT SECTIONS SUMMARY:
[Summarize key relevant information here]

ANSWER BASED ON SUMMARY:
[Provide answer here]

ADDITIONAL CONTEXT NOTES:
[Note any other relevant details from the full context]''',
            'variables': ['context', 'context_length', 'question'],
            'use_cases': ['Long documents', 'Research papers', 'Legal documents', 'Technical manuals']
        }
        
        # 7. Comparative Analysis RAG
        self.templates['comparative'] = {
            'name': 'Comparative Analysis RAG',
            'description': 'For comparing information across multiple sources',
            'template': '''You are an analyst comparing information from multiple sources.

{% for source in sources %}
SOURCE {{ loop.index }} ({{ source.title or "Document " + loop.index|string }}):
{{ source.content }}

{% endfor %}
ANALYSIS QUESTION: {{ question }}

Please provide a comparative analysis:

SIMILARITIES:
- [What do the sources agree on?]

DIFFERENCES:
- [Where do sources differ or contradict?]

KEY INSIGHTS:
- [What unique insights emerge from comparing these sources?]

CONCLUSION:
[Overall assessment based on the comparison]''',
            'variables': ['sources', 'question'],
            'use_cases': ['Research synthesis', 'Market analysis', 'Literature review', 'Policy comparison']
        }
    
    def get_template(self, template_name: str) -> Dict[str, Any]:
        """Get a template by name"""
        if template_name not in self.templates:
            raise ValueError(f"Template '{template_name}' not found. Available: {list(self.templates.keys())}")
        
        self.template_usage_stats[template_name] += 1
        return self.templates[template_name]
    
    def render_prompt(self, template_name: str, **kwargs) -> str:
        """Render a prompt template with provided variables"""
        template_info = self.get_template(template_name)
        
        # Check required variables
        missing_vars = set(template_info['variables']) - set(kwargs.keys())
        if missing_vars:
            raise ValueError(f"Missing required variables: {missing_vars}")
        
        # Render template
        template = self.jinja_env.from_string(template_info['template'])
        rendered = template.render(**kwargs)
        
        return rendered.strip()
    
    def add_custom_template(self, name: str, template: str, variables: List[str],
                          description: str = "", use_cases: Optional[List[str]] = None) -> None:
        """Add a custom template to the library"""
        self.templates[name] = {
            'name': name,
            'description': description,
            'template': template,
            'variables': variables,
            'use_cases': use_cases or [],
            'custom': True
        }
        print(f"✅ Added custom template: {name}")
    
    def list_templates(self) -> pd.DataFrame:
        """List all available templates"""
        template_data = []
        for name, info in self.templates.items():
            template_data.append({
                'Name': name,
                'Description': info['description'],
                'Variables': ', '.join(info['variables']),
                'Use Cases': '; '.join(info['use_cases']),
                'Usage Count': self.template_usage_stats[name],
                'Type': 'Custom' if info.get('custom', False) else 'Built-in'
            })
        
        return pd.DataFrame(template_data)
    
    def count_tokens(self, text: str) -> int:
        """Count tokens in text (approximate if tiktoken not available)"""
        if tokenizer:
            return len(tokenizer.encode(text))
        else:
            # Rough approximation: 1 token ≈ 4 characters
            return len(text) // 4
    
    def analyze_prompt(self, prompt: str) -> Dict[str, Any]:
        """Analyze prompt characteristics"""
        return {
            'token_count': self.count_tokens(prompt),
            'character_count': len(prompt),
            'word_count': len(prompt.split()),
            'line_count': len(prompt.split('\n')),
            'readability_score': textstat.flesch_reading_ease(prompt),
            'has_instructions': bool(re.search(r'instructions?:|steps?:|task:', prompt.lower())),
            'has_examples': 'example' in prompt.lower(),
            'has_formatting': bool(re.search(r'answer:|response:|step \d+', prompt.lower()))
        }

# Initialize the prompt library
prompt_library = RAGPromptLibrary()
print("📚 RAG Prompt Library initialized!")
print(f"   Available templates: {len(prompt_library.templates)}")

In [None]:
# Explore available templates
templates_df = prompt_library.list_templates()
print("📋 AVAILABLE RAG PROMPT TEMPLATES")
print("=" * 60)
print(templates_df.to_string(index=False))

# Display template examples
print("\n\n🧪 TEMPLATE EXAMPLES")
print("=" * 40)

# Sample data for demonstrations
sample_context = """TensorFlow 2.14.0 introduces significant performance improvements for GPU training, 
with up to 35% faster training times for transformer models. The new version includes optimized 
kernels and improved memory management. PyTorch 2.1 also offers similar optimizations with 
torch.compile() providing up to 60% speedup for inference."""

sample_question = "What are the performance improvements in the latest deep learning frameworks?"

# Example 1: Basic RAG
print("\n1️⃣ Basic RAG Template:")
print("-" * 30)
basic_prompt = prompt_library.render_prompt(
    'basic_rag',
    context=sample_context,
    question=sample_question
)
print(basic_prompt[:200] + "...")
basic_analysis = prompt_library.analyze_prompt(basic_prompt)
print(f"   Tokens: {basic_analysis['token_count']}, Readability: {basic_analysis['readability_score']:.1f}")

# Example 2: Chain-of-Thought RAG
print("\n2️⃣ Chain-of-Thought RAG Template:")
print("-" * 35)
cot_prompt = prompt_library.render_prompt(
    'chain_of_thought',
    context=sample_context,
    question=sample_question
)
print(cot_prompt[:200] + "...")
cot_analysis = prompt_library.analyze_prompt(cot_prompt)
print(f"   Tokens: {cot_analysis['token_count']}, Has Structure: {cot_analysis['has_formatting']}")

# Example 3: Few-Shot RAG
print("\n3️⃣ Few-Shot RAG Template:")
print("-" * 25)
few_shot_examples = [
    {
        'context': 'Python 3.11 is 25% faster than Python 3.10.',
        'question': 'How much faster is Python 3.11?',
        'answer': 'Python 3.11 is 25% faster than Python 3.10 according to the provided information.'
    }
]

few_shot_prompt = prompt_library.render_prompt(
    'few_shot',
    examples=few_shot_examples,
    context=sample_context,
    question=sample_question
)
print(few_shot_prompt[:200] + "...")
few_shot_analysis = prompt_library.analyze_prompt(few_shot_prompt)
print(f"   Tokens: {few_shot_analysis['token_count']}, Has Examples: {few_shot_analysis['has_examples']}")

## 📏 Exercise 2: Context Management and Optimization

Let's implement advanced context management techniques for handling long documents and optimizing context utilization.

In [None]:
class ContextManager:
    """Advanced context management for RAG systems"""
    
    def __init__(self, max_tokens: int = 4000):
        self.max_tokens = max_tokens
        self.prompt_library = prompt_library
        self.truncation_strategies = {
            'head': self._truncate_head,
            'tail': self._truncate_tail,
            'middle': self._truncate_middle,
            'smart': self._truncate_smart,
            'summarize': self._truncate_with_summary
        }
    
    def estimate_context_usage(self, template_name: str, context: str, 
                             question: str, **kwargs) -> Dict[str, Any]:
        """Estimate token usage for a complete prompt"""
        
        # Render full prompt
        full_prompt = self.prompt_library.render_prompt(
            template_name,
            context=context,
            question=question,
            **kwargs
        )
        
        # Analyze token distribution
        template_info = self.prompt_library.get_template(template_name)
        template_without_vars = template_info['template']
        
        # Remove variable placeholders to estimate fixed template size
        for var in template_info['variables']:
            template_without_vars = re.sub(rf'\{{\{{\s*{var}.*?\}}\}}', '', template_without_vars)
        
        # Count the full prompt once and reuse the result below
        total_tokens = self.prompt_library.count_tokens(full_prompt)
        template_tokens = self.prompt_library.count_tokens(template_without_vars)
        context_tokens = self.prompt_library.count_tokens(context)
        question_tokens = self.prompt_library.count_tokens(question)
        # Whatever remains comes from extra kwargs and rendering overhead
        other_tokens = total_tokens - template_tokens - context_tokens - question_tokens
        
        return {
            'total_tokens': total_tokens,
            'template_tokens': template_tokens,
            'context_tokens': context_tokens,
            'question_tokens': question_tokens,
            'other_tokens': max(0, other_tokens),  # Account for any rendering overhead
            'context_percentage': (context_tokens / total_tokens) * 100 if total_tokens else 0.0,
            'fits_in_limit': total_tokens <= self.max_tokens,
            'tokens_over_limit': max(0, total_tokens - self.max_tokens)
        }
    
    def _truncate_head(self, context: str, target_tokens: int) -> str:
        """Keep the beginning of context"""
        words = context.split()
        # Rough approximation: 1 word ≈ 1.3 tokens, so divide to convert tokens to words
        target_words = int(target_tokens / 1.3)
        
        if len(words) <= target_words:
            return context
        
        return ' '.join(words[:target_words]) + "\n\n[Context truncated...]"
    
    def _truncate_tail(self, context: str, target_tokens: int) -> str:
        """Keep the end of context"""
        words = context.split()
        target_words = int(target_tokens / 1.3)  # 1 word ≈ 1.3 tokens
        
        if len(words) <= target_words:
            return context
        
        return "[Context truncated...]\n\n" + ' '.join(words[-target_words:])
    
    def _truncate_middle(self, context: str, target_tokens: int) -> str:
        """Keep beginning and end, remove middle"""
        words = context.split()
        target_words = int(target_tokens / 1.3)  # 1 word ≈ 1.3 tokens
        
        if len(words) <= target_words:
            return context
        
        # Keep first and last portions
        keep_each_side = target_words // 2
        
        beginning = ' '.join(words[:keep_each_side])
        end = ' '.join(words[-keep_each_side:])
        
        return f"{beginning}\n\n[Middle section truncated...]\n\n{end}"
    
    def _truncate_smart(self, context: str, target_tokens: int) -> str:
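        """Truncate at sentence boundaries so no sentence is cut in half.

        Kept sentences are re-joined with single spaces, so paragraph breaks
        in the original context are not preserved.
        """
        # Split into sentences (nltk is imported at the top of the notebook)
        sentences = nltk.sent_tokenize(context)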
        
        # Build truncated context sentence by sentence
        truncated = ""
        current_tokens = 0
        
        for sentence in sentences:
            sentence_tokens = self.prompt_library.count_tokens(sentence)
            
            if current_tokens + sentence_tokens <= target_tokens:
                truncated += sentence + " "
                current_tokens += sentence_tokens
            else:
                truncated += "\n\n[Remaining context truncated for length...]"
                break
        
        return truncated.strip()
    
    def _truncate_with_summary(self, context: str, target_tokens: int) -> str:
        """Simulate summarization with a simple extractive summary.

        Note: target_tokens is not enforced here. A production version would
        ask an LLM for a summary of roughly that length.
        """
        sentences = context.split('. ')
        if len(sentences) <= 3:
            return context
        
        # Keep the first, middle, and last sentences
        key_sentences = [
            sentences[0],
            sentences[len(sentences)//2],
            sentences[-1]
        ]
        
        summary = '. '.join([s for s in key_sentences if s]).strip()
        
        # Add summary indicator
        return f"[SUMMARY] {summary}\n\n[Full context was summarized to fit length constraints]"
    
    def optimize_context(self, template_name: str, context: str, question: str,
                        strategy: str = 'smart', **kwargs) -> Dict[str, Any]:
        """Optimize context to fit within token limits"""
        
        # First, check if optimization is needed
        usage = self.estimate_context_usage(template_name, context, question, **kwargs)
        
        if usage['fits_in_limit']:
            return {
                'optimized_context': context,
                'optimization_applied': False,
                'original_tokens': usage['context_tokens'],
                'final_tokens': usage['context_tokens'],
                'tokens_saved': 0,
                'strategy_used': 'none',
                'full_prompt_tokens': usage['total_tokens'],
                # Same keys as the optimized result so callers can read either one
                'fits_after_optimization': True
            }
        
        # Calculate how much we need to reduce context
        non_context_tokens = usage['total_tokens'] - usage['context_tokens']
        # Leave a 100-token buffer for truncation markers; never go below zero
        target_context_tokens = max(0, self.max_tokens - non_context_tokens - 100)
        
        if strategy not in self.truncation_strategies:
            strategy = 'smart'
        
        # Apply truncation strategy
        optimized_context = self.truncation_strategies[strategy](context, target_context_tokens)
        
        # Verify the optimization worked
        new_usage = self.estimate_context_usage(template_name, optimized_context, question, **kwargs)
        
        return {
            'optimized_context': optimized_context,
            'optimization_applied': True,
            'original_tokens': usage['context_tokens'],
            'final_tokens': new_usage['context_tokens'],
            'tokens_saved': usage['context_tokens'] - new_usage['context_tokens'],
            'strategy_used': strategy,
            'full_prompt_tokens': new_usage['total_tokens'],
            'fits_after_optimization': new_usage['fits_in_limit']
        }
    
    def create_hierarchical_context(self, documents: List[Dict[str, Any]], 
                                  question: str, max_docs: int = 5) -> str:
        """Create hierarchical context with most relevant information first"""
        
        # Sort documents by relevance (assuming they have relevance scores)
        sorted_docs = sorted(documents, key=lambda x: x.get('score', 0), reverse=True)[:max_docs]
        
        hierarchical_context = ""
        current_tokens = 0
        
        for i, doc in enumerate(sorted_docs):
            doc_header = f"\n--- DOCUMENT {i+1} (Relevance: {doc.get('score', 0):.3f}) ---\n"
            doc_content = doc.get('content', doc.get('text', ''))
            
            # Add metadata if available
            if 'metadata' in doc:
                metadata = doc['metadata']
                source_info = f"Source: {metadata.get('source', 'Unknown')} | "
                source_info += f"Date: {metadata.get('date', 'Unknown')} | "
                source_info += f"Category: {metadata.get('category', 'Unknown')}\n"
                doc_header += source_info
            
            doc_section = doc_header + doc_content + "\n"
            section_tokens = self.prompt_library.count_tokens(doc_section)
            
            # Check if adding this document would exceed limits
            if current_tokens + section_tokens <= (self.max_tokens * 0.7):  # Reserve 30% for template
                hierarchical_context += doc_section
                current_tokens += section_tokens
            else:
                hierarchical_context += f"\n[{len(sorted_docs) - i} additional documents truncated due to length limits]\n"
                break
        
        return hierarchical_context
    
    def analyze_context_quality(self, context: str, question: str) -> Dict[str, Any]:
        """Analyze context quality and relevance"""
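        # Tokenize on word characters so punctuation ("frameworks?") does not block matches
        question_words = set(re.findall(r'\w+', question.lower()))
        context_words = set(re.findall(r'\w+', context.lower()))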
        
        # Calculate word overlap
        overlap = len(question_words.intersection(context_words))
        overlap_ratio = overlap / len(question_words) if question_words else 0
        
        # Analyze context structure
        sentences = context.count('.') + context.count('!') + context.count('?')
        paragraphs = len([p for p in context.split('\n\n') if p.strip()])
        
        # Calculate information density
        unique_words = len(set(context.lower().split()))
        total_words = len(context.split())
        lexical_diversity = unique_words / total_words if total_words > 0 else 0
        
        return {
            'word_overlap_count': overlap,
            'word_overlap_ratio': overlap_ratio,
            'sentence_count': sentences,
            'paragraph_count': paragraphs,
            'lexical_diversity': lexical_diversity,
            'context_length': len(context),
            'readability': textstat.flesch_reading_ease(context),
            'estimated_tokens': self.prompt_library.count_tokens(context),
            'quality_score': (overlap_ratio * 0.4 + lexical_diversity * 0.3 + 
                             min(1.0, sentences / 10) * 0.3)  # Normalized quality score
        }

# Initialize context manager
context_manager = ContextManager(max_tokens=4000)
print("📏 Context Manager initialized!")
print(f"   Max tokens: {context_manager.max_tokens}")

In [None]:
# Test context management with different scenarios

# Create long context to test truncation
long_context = """TensorFlow 2.14.0 represents a major milestone in deep learning frameworks, introducing 
significant performance optimizations that benefit both training and inference workflows. The new version 
includes GPU kernel optimizations that can reduce training time by up to 35% for transformer-based models, 
which have become increasingly important in natural language processing applications.

The performance improvements come from several key areas. First, the team has optimized memory management 
in the CUDA kernels, reducing memory fragmentation and improving cache utilization. This is particularly 
beneficial for large models that previously struggled with memory constraints. Second, the new version 
includes improved graph optimization that can eliminate redundant operations and fuse operations together 
for better hardware utilization.

PyTorch 2.1 has also made significant strides in performance optimization. The introduction of torch.compile() 
represents a fundamental shift in how PyTorch handles model compilation. This new feature can provide up to 
60% speedup for inference workloads by leveraging aggressive compiler optimizations and hardware-specific 
code generation.

The compile function works by tracing the execution of your model and then optimizing the resulting graph 
using a variety of techniques including operator fusion, constant folding, and dead code elimination. 
It also includes support for dynamic shapes, which has been a long-standing challenge in graph-based 
optimization systems.

Both frameworks have also improved their distributed training capabilities. TensorFlow now includes better 
support for parameter sharding and gradient accumulation, while PyTorch has enhanced its DistributedDataParallel 
implementation with more efficient communication patterns and better fault tolerance.

Looking forward, both frameworks are investing heavily in support for emerging hardware platforms including 
specialized AI accelerators and edge devices. This includes optimization for quantization, pruning, and 
other model compression techniques that are essential for deployment in resource-constrained environments.

The competition between these frameworks has ultimately benefited the entire machine learning community, 
driving innovation and performance improvements that make it easier to train and deploy sophisticated 
models across a wide range of applications and use cases."""

test_question = "What are the key performance improvements in recent deep learning frameworks?"

print("🧪 CONTEXT MANAGEMENT TESTING")
print("=" * 50)

# 1. Analyze original context
original_analysis = context_manager.estimate_context_usage('basic_rag', long_context, test_question)
print(f"\n📊 Original Context Analysis:")
print(f"   Total tokens: {original_analysis['total_tokens']}")
print(f"   Context tokens: {original_analysis['context_tokens']} ({original_analysis['context_percentage']:.1f}%)")
print(f"   Fits in limit: {original_analysis['fits_in_limit']}")
print(f"   Tokens over limit: {original_analysis['tokens_over_limit']}")

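# 2. Test different truncation strategies
# The default 4000-token limit is larger than this sample context, so no
# truncation would be triggered. Use a deliberately tight limit for the test.
tight_manager = ContextManager(max_tokens=300)
truncation_strategies = ['head', 'tail', 'middle', 'smart', 'summarize']

print(f"\n🔧 Testing Truncation Strategies (limit: {tight_manager.max_tokens} tokens):")
print("-" * 40)

strategy_results = []

for strategy in truncation_strategies:
    optimization = tight_manager.optimize_context(
        'basic_rag', long_context, test_question, strategy=strategy
    )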
    
    print(f"\n{strategy.upper()} Strategy:")
    print(f"   Original tokens: {optimization['original_tokens']}")
    print(f"   Final tokens: {optimization['final_tokens']}")
    print(f"   Tokens saved: {optimization['tokens_saved']}")
    print(f"   Fits after optimization: {optimization['fits_after_optimization']}")
    print(f"   Preview: {optimization['optimized_context'][:100]}...")
    
    strategy_results.append({
        'Strategy': strategy.title(),
        'Original Tokens': optimization['original_tokens'],
        'Final Tokens': optimization['final_tokens'],
        'Tokens Saved': optimization['tokens_saved'],
        'Reduction %': (optimization['tokens_saved'] / optimization['original_tokens']) * 100,
        'Fits Limit': optimization['fits_after_optimization']
    })

# 3. Compare strategies
strategy_df = pd.DataFrame(strategy_results)
print(f"\n📋 Strategy Comparison:")
print(strategy_df.to_string(index=False, float_format='%.1f'))

# 4. Context quality analysis
quality_analysis = context_manager.analyze_context_quality(long_context, test_question)
print(f"\n🎯 Context Quality Analysis:")
print(f"   Word overlap ratio: {quality_analysis['word_overlap_ratio']:.3f}")
print(f"   Lexical diversity: {quality_analysis['lexical_diversity']:.3f}")
print(f"   Quality score: {quality_analysis['quality_score']:.3f}")
print(f"   Readability: {quality_analysis['readability']:.1f}")
print(f"   Sentence count: {quality_analysis['sentence_count']}")
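
The `head`, `tail`, and `middle` strategy names compared above are not defined in this cell. The sketch below shows one common reading of them, assuming whitespace-separated words as the unit. `truncate_words` is a hypothetical helper, `middle` is assumed to mean "drop the middle and keep both ends", and `context_manager.optimize_context` may behave differently.

```python
def truncate_words(text: str, max_words: int, strategy: str = "head") -> str:
    """Hypothetical helper: keep at most max_words words of text."""
    words = text.split()
    if max_words <= 0:
        return ""
    if len(words) <= max_words:
        return " ".join(words)
    if strategy == "head":
        # Keep the beginning of the text
        return " ".join(words[:max_words])
    if strategy == "tail":
        # Keep the end of the text
        return " ".join(words[-max_words:])
    if strategy == "middle":
        # Assumption: keep the beginning and the end, drop the middle
        front = (max_words + 1) // 2
        back = max_words - front
        kept = words[:front] + (words[-back:] if back else [])
        return " ".join(kept)
    raise ValueError(f"Unknown strategy: {strategy}")


sample = "one two three four five six seven eight"
for name in ("head", "tail", "middle"):
    print(name, "->", truncate_words(sample, 4, name))
```

Word-level truncation is only an approximation of token-level truncation. A real tokenizer would be needed for exact limits.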

## 🎛️ Exercise 3: Model-Specific Prompt Optimization

Let's create model-specific prompt adaptations for different LLM providers and architectures.

In [None]:
class ModelSpecificOptimizer:
    """Model-specific prompt optimization for different LLM providers"""
    
    def __init__(self):
        self.model_configurations = {
            'gpt-4': {
                'provider': 'OpenAI',
                'context_window': 128000,
                'strengths': ['reasoning', 'code', 'analysis'],
                'prompt_style': 'detailed_instructions',
                'system_message_support': True,
                'preferred_format': 'structured',
                'citation_style': 'bracketed_numbers'
            },
            'gpt-3.5-turbo': {
                'provider': 'OpenAI',
                'context_window': 16385,
                'strengths': ['conversation', 'general_knowledge'],
                'prompt_style': 'concise_instructions',
                'system_message_support': True,
                'preferred_format': 'conversational',
                'citation_style': 'inline_references'
            },
            'claude-3-opus': {
                'provider': 'Anthropic',
                'context_window': 200000,
                'strengths': ['analysis', 'writing', 'safety'],
                'prompt_style': 'thinking_aloud',
                'system_message_support': True,
                'preferred_format': 'structured_thinking',
                'citation_style': 'detailed_sources'
            },
            'claude-3-sonnet': {
                'provider': 'Anthropic',
                'context_window': 200000,
                'strengths': ['balanced', 'efficient'],
                'prompt_style': 'efficient_instructions',
                'system_message_support': True,
                'preferred_format': 'structured',
                'citation_style': 'bracketed_numbers'
            },
            'gemini-pro': {
                'provider': 'Google',
                'context_window': 2000000,
                'strengths': ['large_context', 'multimodal'],
                'prompt_style': 'comprehensive_context',
                'system_message_support': False,
                'preferred_format': 'detailed_analysis',
                'citation_style': 'source_blocks'
            },
            'llama-2-70b': {
                'provider': 'Meta',
                'context_window': 4096,
                'strengths': ['open_source', 'customizable'],
                'prompt_style': 'explicit_instructions',
                'system_message_support': True,
                'preferred_format': 'step_by_step',
                'citation_style': 'simple_references'
            }
        }
        
        self.model_specific_templates = self._create_model_templates()
    
    def _create_model_templates(self) -> Dict[str, Dict[str, str]]:
        """Create model-specific prompt templates"""
        templates = {}
        
        # GPT-4 template (detailed, structured)
        templates['gpt-4'] = {
            'system': '''You are an expert research assistant with access to relevant documents. Your task is to provide accurate, well-reasoned answers based on the provided context.

Key requirements:
1. Base your answer primarily on the provided context
2. Use logical reasoning to synthesize information
3. Cite sources using [1], [2], etc. format
4. Acknowledge limitations when context is insufficient
5. Provide detailed, comprehensive responses''',
            
            'user': '''Context Sources:
{% for doc in context_sources %}
[{{ loop.index }}] {{ doc.title or "Document " + loop.index|string }}
{{ doc.content }}

{% endfor %}
Question: {{ question }}

Please provide a comprehensive answer based on the context above.'''
        }
        
        # GPT-3.5 template (concise, conversational)
        templates['gpt-3.5-turbo'] = {
            'system': '''You are a helpful assistant that answers questions using provided context. Be concise but informative.''',
            
            'user': '''Context:
{{ context }}

Question: {{ question }}

Answer based on the context above, and mention your sources when relevant.'''
        }
        
        # Claude-3-Opus template (thinking aloud)
        templates['claude-3-opus'] = {
            'system': '''You are Claude, an AI assistant created by Anthropic. You think carefully and provide well-reasoned responses based on given context.''',
            
            'user': '''I have some relevant documents that may help answer a question. Let me share them first:

<documents>
{% for doc in context_sources %}
<document id="{{ loop.index }}">
{{ doc.content }}
</document>
{% endfor %}
</documents>

Now, here's my question: {{ question }}

Please think through this step by step, referencing the relevant documents as needed.'''
        }
        
        # Gemini Pro template (comprehensive context)
        templates['gemini-pro'] = {
            'user': '''Given the following comprehensive context, please answer the question with detailed analysis:

=== CONTEXT START ===
{{ context }}
=== CONTEXT END ===

QUESTION: {{ question }}

ANALYSIS FRAMEWORK:
1. Key information from context
2. Logical connections and reasoning
3. Comprehensive answer with evidence
4. Source attribution

Please provide your detailed analysis:'''
        }
        
        # Llama-2 template (explicit instructions)
        templates['llama-2-70b'] = {
            'system': '''<s>[INST] <<SYS>>
You are a helpful assistant. Answer questions based on the provided context. Follow these steps:
1. Read the context carefully
2. Find information relevant to the question
3. Provide a clear answer
4. Mention your sources
<</SYS>>''',
            
            'user': '''Context:
{{ context }}

Question: {{ question }}

Step-by-step answer: [/INST]'''
        }
        
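        # Claude-3-Sonnet is configured above but has no dedicated template;
        # reuse the Opus template so rendering does not return an empty prompt
        templates['claude-3-sonnet'] = dict(templates['claude-3-opus'])
        
        return templates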
    
    def get_model_info(self, model_name: str) -> Dict[str, Any]:
        """Get information about a specific model"""
        model_key = self._normalize_model_name(model_name)
        return self.model_configurations.get(model_key, {})
    
    def _normalize_model_name(self, model_name: str) -> str:
        """Normalize model name to match our configurations"""
        model_name = model_name.lower()
        
        # Handle common variations
        if 'gpt-4' in model_name:
            return 'gpt-4'
        elif 'gpt-3.5' in model_name:
            return 'gpt-3.5-turbo'
        elif 'claude-3-opus' in model_name:
            return 'claude-3-opus'
        elif 'claude-3-sonnet' in model_name:
            return 'claude-3-sonnet'
        elif 'gemini' in model_name:
            return 'gemini-pro'
        elif 'llama' in model_name:
            return 'llama-2-70b'
        
        return model_name
    
    def optimize_for_model(self, model_name: str, context: str, question: str, 
                          context_sources: List[Dict] = None) -> Dict[str, Any]:
        """Create model-optimized prompt"""
        model_key = self._normalize_model_name(model_name)
        model_config = self.model_configurations.get(model_key, {})
        
        if not model_config:
            return {
                'error': f'Model {model_name} not supported',
                'supported_models': list(self.model_configurations.keys())
            }
        
        # Get model-specific template
        templates = self.model_specific_templates.get(model_key, {})
        
        # Check context length against model limits
        estimated_tokens = len(context.split()) * 1.3  # Rough estimate
        context_limit = model_config.get('context_window', 4096)
        
        if estimated_tokens > context_limit * 0.8:  # Use 80% of limit
            # Context truncation needed
            target_tokens = int(context_limit * 0.6)  # Leave room for response
            truncated_context = self._truncate_for_model(context, target_tokens, model_config)
        else:
            truncated_context = context
        
        # Prepare context based on model preferences
        if context_sources:
            # Caller-supplied sources are passed through unchanged (not truncated here)
            prepared_context = context_sources
        else:
            # Convert the (possibly truncated) single context to sources format,
            # so templates that iterate over context_sources also respect the limit
            prepared_context = [{'content': truncated_context, 'title': 'Source Document'}]
        
        # Render template
        rendered_prompt = {}
        
        if 'system' in templates and model_config.get('system_message_support', True):
            # Render system message
            system_template = Template(templates['system'])
            rendered_prompt['system'] = system_template.render(
                context=truncated_context,
                question=question,
                context_sources=prepared_context
            )
        
        if 'user' in templates:
            # Render user message
            user_template = Template(templates['user'])
            rendered_prompt['user'] = user_template.render(
                context=truncated_context,
                question=question,
                context_sources=prepared_context
            )
        
        # Calculate final token usage
        total_prompt = '\n\n'.join(rendered_prompt.values())
        estimated_prompt_tokens = len(total_prompt.split()) * 1.3
        
        return {
            'model': model_name,
            'model_config': model_config,
            'rendered_prompt': rendered_prompt,
            'prompt_optimization': {
                'original_context_length': len(context),
                'truncated_context_length': len(truncated_context),
                'estimated_prompt_tokens': int(estimated_prompt_tokens),
                'context_window_utilization': estimated_prompt_tokens / context_limit,
                'truncation_applied': len(truncated_context) < len(context)
            },
            'recommendations': self._get_model_recommendations(model_config)
        }
    
    def _truncate_for_model(self, context: str, target_tokens: int, model_config: Dict) -> str:
        """Model-specific context truncation"""
        prompt_style = model_config.get('prompt_style', 'detailed_instructions')
        
        if prompt_style == 'comprehensive_context':
            # For models like Gemini with large context windows, preserve more
            return context  # Don't truncate for large context models
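        elif prompt_style == 'thinking_aloud':
            # For Claude, keep whole sentences and stop before the budget is exceeded
            sentences = context.split('. ')
            kept = []
            word_count = 0
            for sentence in sentences:
                sentence_words = len(sentence.split())
                if (word_count + sentence_words) * 1.3 > target_tokens:
                    break
                kept.append(sentence)
                word_count += sentence_words
            truncated = '. '.join(kept)
            # The split removed the period after the last kept sentence; restore it
            if kept and len(kept) < len(sentences):
                truncated += '.'
            return truncated.strip()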
        else:
            # Standard truncation
            words = context.split()
            target_words = int(target_tokens / 1.3)
            return ' '.join(words[:target_words])
    
    def _get_model_recommendations(self, model_config: Dict) -> List[str]:
        """Get model-specific recommendations"""
        recommendations = []
        
        if model_config.get('context_window', 0) < 16000:
            recommendations.append("Consider context truncation for long documents")
        
        if 'reasoning' in model_config.get('strengths', []):
            recommendations.append("This model excels at complex reasoning tasks")
        
        if not model_config.get('system_message_support', True):
            recommendations.append("Include instructions in the user message (no system message support)")
        
        return recommendations
    
    def compare_models(self, context: str, question: str) -> pd.DataFrame:
        """Compare prompt optimization across different models"""
        comparison_data = []
        
        for model_name in self.model_configurations.keys():
            optimization = self.optimize_for_model(model_name, context, question)
            
            if 'error' not in optimization:
                prompt_opt = optimization['prompt_optimization']
                model_config = optimization['model_config']
                
                comparison_data.append({
                    'Model': model_name,
                    'Provider': model_config['provider'],
                    'Context Window': f"{model_config['context_window']:,}",
                    'Estimated Tokens': int(prompt_opt['estimated_prompt_tokens']),
                    'Window Utilization': f"{prompt_opt['context_window_utilization']:.1%}",
                    'Truncation Applied': prompt_opt['truncation_applied'],
                    'Strengths': ', '.join(model_config.get('strengths', [])),
                    'System Message': model_config.get('system_message_support', False)
                })
        
        return pd.DataFrame(comparison_data)

# Initialize model-specific optimizer
model_optimizer = ModelSpecificOptimizer()
print("🎛️ Model-Specific Optimizer initialized!")
print(f"   Supported models: {len(model_optimizer.model_configurations)}")
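
The truncation decision in `optimize_for_model` comes down to simple arithmetic: estimate tokens at roughly 1.3 per word, truncate once the estimate exceeds 80% of the context window, and aim for 60% of the window when truncating. The sketch below reproduces only that arithmetic. `plan_context_budget` is a hypothetical helper, not part of the class above.

```python
def plan_context_budget(context: str, context_window: int) -> dict:
    """Hypothetical helper mirroring the 80% / 60% thresholds used above."""
    # Rough estimate: about 1.3 tokens per whitespace-separated word
    estimated_tokens = len(context.split()) * 1.3
    needs_truncation = estimated_tokens > context_window * 0.8
    # When truncating, aim for 60% of the window to leave room for the response
    target_tokens = int(context_window * 0.6) if needs_truncation else None
    return {
        "estimated_tokens": int(estimated_tokens),
        "needs_truncation": needs_truncation,
        "target_tokens": target_tokens,
    }


long_text = "word " * 3000  # 3000 words, about 3900 estimated tokens
print(plan_context_budget(long_text, 4096))   # 4,096-token window: truncation needed
print(plan_context_budget(long_text, 16385))  # 16,385-token window: fits as-is
```

The 1.3 tokens-per-word factor is a heuristic. A model-specific tokenizer gives more reliable counts when the prompt is close to the limit.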

In [None]:
# Test model-specific optimization
test_context = """Machine learning model optimization has become increasingly important as models grow larger 
and more complex. Recent advances in deep learning frameworks have focused on improving training efficiency 
through better memory management, optimized computation graphs, and hardware-specific optimizations.

TensorFlow 2.14.0 introduces GPU kernel optimizations that can reduce training time by up to 35% for 
transformer models. The improvements come from better memory management and graph optimization techniques.

PyTorch 2.1 offers torch.compile() which provides up to 60% speedup for inference through compiler 
optimizations and hardware-specific code generation. This feature represents a significant advancement 
in making PyTorch models more efficient in production environments."""

test_question = "What are the recent performance improvements in machine learning frameworks?"

print("🎛️ MODEL-SPECIFIC OPTIMIZATION TESTING")
print("=" * 60)

# Test optimization for different models
test_models = ['gpt-4', 'claude-3-opus', 'gpt-3.5-turbo', 'gemini-pro']

for model in test_models:
    print(f"\n🤖 {model.upper()} Optimization:")
    print("-" * 30)
    
    optimization = model_optimizer.optimize_for_model(model, test_context, test_question)
    
    if 'error' in optimization:
        print(f"❌ Error: {optimization['error']}")
        continue
    
    model_config = optimization['model_config']
    prompt_opt = optimization['prompt_optimization']
    rendered = optimization['rendered_prompt']
    
    print(f"Provider: {model_config['provider']}")
    print(f"Context Window: {model_config['context_window']:,} tokens")
    print(f"Strengths: {', '.join(model_config['strengths'])}")
    print(f"Estimated Prompt Tokens: {prompt_opt['estimated_prompt_tokens']}")
    print(f"Window Utilization: {prompt_opt['context_window_utilization']:.1%}")
    
    if 'system' in rendered:
        print(f"\nSystem Message Preview:")
        print(f"   {rendered['system'][:100]}...")
    
    if 'user' in rendered:
        print(f"\nUser Message Preview:")
        print(f"   {rendered['user'][:100]}...")
    
    if optimization['recommendations']:
        print(f"\n💡 Recommendations:")
        for rec in optimization['recommendations']:
            print(f"   - {rec}")

# Compare all models
print(f"\n\n📊 MODEL COMPARISON TABLE")
print("=" * 40)

comparison_df = model_optimizer.compare_models(test_context, test_question)
print(comparison_df.to_string(index=False))

# Visualize model characteristics
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(16, 12))

# 1. Context window sizes
context_windows = [int(w.replace(',', '')) for w in comparison_df['Context Window']]
models = comparison_df['Model']

bars1 = ax1.barh(range(len(models)), context_windows)
ax1.set_title('Context Window Sizes', fontweight='bold', fontsize=14)
ax1.set_xlabel('Context Window (tokens)')
ax1.set_yticks(range(len(models)))
ax1.set_yticklabels(models)
ax1.set_xscale('log')
ax1.grid(True, alpha=0.3)

# 2. Token utilization
utilization = [float(u.rstrip('%')) for u in comparison_df['Window Utilization']]
bars2 = ax2.bar(range(len(models)), utilization)
ax2.set_title('Context Window Utilization', fontweight='bold', fontsize=14)
ax2.set_ylabel('Utilization (%)')
ax2.set_xticks(range(len(models)))
ax2.set_xticklabels(models, rotation=45, ha='right')
ax2.grid(True, alpha=0.3)

# 3. Provider distribution
provider_counts = comparison_df['Provider'].value_counts()
ax3.pie(provider_counts.values, labels=provider_counts.index, autopct='%1.0f%%')
ax3.set_title('Models by Provider', fontweight='bold', fontsize=14)

# 4. System message support
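system_support = comparison_df['System Message'].value_counts()
# Derive labels from the index so they always match the counted values
support_labels = ['Supported' if supported else 'Not Supported' for supported in system_support.index]
ax4.pie(system_support.values, labels=support_labels, autopct='%1.0f%%')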
ax4.set_title('System Message Support', fontweight='bold', fontsize=14)

plt.tight_layout()
plt.show()

print("\n💡 Model Selection Guidelines:")
print("- GPT-4: Best for complex reasoning and analysis")
print("- Claude-3-Opus: Excellent for detailed thinking and safety")
print("- GPT-3.5-Turbo: Cost-effective for general conversations")
print("- Gemini Pro: Best for very long contexts (up to 2M tokens)")
print("- Choose based on your specific use case and context length needs")
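
The `rendered_prompt` dictionaries previewed above still have to be turned into whatever request shape a provider expects. The sketch below assumes a generic list of `role`/`content` messages and folds the system text into the user turn when a model is configured without system message support. `to_chat_messages` is a hypothetical helper, and the exact request format depends on the provider SDK.

```python
from typing import Dict, List


def to_chat_messages(rendered_prompt: Dict[str, str],
                     system_message_support: bool = True) -> List[Dict[str, str]]:
    """Hypothetical helper: convert {'system': ..., 'user': ...} into chat messages."""
    system_text = rendered_prompt.get("system", "")
    user_text = rendered_prompt.get("user", "")
    messages = []
    if system_text and system_message_support:
        messages.append({"role": "system", "content": system_text})
    elif system_text:
        # No system role available: fold the instructions into the user turn
        user_text = f"{system_text}\n\n{user_text}"
    messages.append({"role": "user", "content": user_text})
    return messages


rendered = {
    "system": "Answer using only the provided context.",
    "user": "Context: PyTorch 2.1 offers torch.compile().\n\nQuestion: What does PyTorch 2.1 offer?",
}
print(to_chat_messages(rendered, system_message_support=True))
print(to_chat_messages(rendered, system_message_support=False))
```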

## 🧪 Exercise 4: Advanced RAG Prompting Techniques

Let's implement advanced prompting techniques including few-shot learning, chain-of-thought, and attribution systems.

In [None]:
class AdvancedRAGPrompting:
    """Advanced RAG prompting techniques and patterns"""
    
    def __init__(self):
        self.prompt_library = prompt_library
        self.few_shot_examples = self._create_few_shot_examples()
        
    def _create_few_shot_examples(self) -> Dict[str, List[Dict]]:
        """Create curated few-shot examples for different domains"""
        return {
            'technical_qa': [
                {
                    'context': 'TensorFlow 2.10 introduced tf.data.Dataset.batch() optimizations that improve throughput by 15% for large datasets.',
                    'question': 'How much did TensorFlow 2.10 improve dataset performance?',
                    'answer': 'According to the provided context, TensorFlow 2.10 introduced optimizations to tf.data.Dataset.batch() that improved throughput by 15% specifically for large datasets.'
                },
                {
                    'context': 'The new API endpoint /v2/predict requires authentication headers and returns JSON responses with prediction scores and confidence intervals.',
                    'question': 'What does the v2 predict API return?',
                    'answer': 'The /v2/predict API endpoint returns JSON responses containing prediction scores and confidence intervals. Note that it requires authentication headers for access.'
                }
            ],
            'business_analysis': [
                {
                    'context': 'Q3 revenue increased 22% year-over-year, driven primarily by enterprise sales which grew 35% while consumer sales remained flat.',
                    'question': 'What drove the revenue growth in Q3?',
                    'answer': 'Q3 revenue growth of 22% year-over-year was primarily driven by strong enterprise sales, which grew 35%. Consumer sales remained flat during this period, highlighting the enterprise segment as the key growth driver.'
                }
            ],
            'research_synthesis': [
                {
                    'context': 'Study A found that method X improved accuracy by 12% on dataset Y. Study B reported 8% improvement using the same method on dataset Z.',
                    'question': 'How effective is method X across different datasets?',
                    'answer': 'Based on the provided studies, method X shows consistent effectiveness across different datasets: Study A reported 12% accuracy improvement on dataset Y, while Study B found 8% improvement on dataset Z. This suggests method X is generally effective, though performance may vary by dataset characteristics.'
                }
            ]
        }
    
    def create_chain_of_thought_prompt(self, context: str, question: str, 
                                     domain: str = 'general') -> str:
        """Create chain-of-thought prompt for complex reasoning"""
        
        domain_instructions = {
            'technical': 'Focus on technical accuracy, implementation details, and potential trade-offs.',
            'business': 'Consider business impact, stakeholder perspectives, and strategic implications.',
            'research': 'Evaluate evidence quality, methodology, and limitations of conclusions.',
            'general': 'Think through the problem systematically and consider multiple perspectives.'
        }
        
        instruction = domain_instructions.get(domain, domain_instructions['general'])
        
        cot_template = f"""You are an expert analyst. {instruction}

CONTEXT:
{context}

QUESTION: {question}

Please think through this step-by-step:

STEP 1 - INFORMATION EXTRACTION:
What are the key facts and data points from the context that relate to this question?

STEP 2 - ANALYSIS:
How do these facts connect to answer the question? What patterns or relationships do you see?

STEP 3 - REASONING:
What logical conclusions can you draw? What assumptions are you making?

STEP 4 - SYNTHESIS:
Based on your analysis, what is the complete answer to the question?

STEP 5 - CONFIDENCE AND LIMITATIONS:
How confident are you in this answer? What information would strengthen your conclusion?

Let me work through this systematically:"""
        
        return cot_template
    
    def create_few_shot_prompt(self, context: str, question: str, 
                              domain: str = 'technical_qa', num_examples: int = 2) -> str:
        """Create few-shot prompt with domain-specific examples"""
        
        if domain not in self.few_shot_examples:
            domain = 'technical_qa'
        
        examples = self.few_shot_examples[domain][:num_examples]
        
        few_shot_prompt = """You are an expert assistant that provides accurate answers based on context. Here are examples of how to respond:

"""
        
        # Add examples
        for i, example in enumerate(examples, 1):
            few_shot_prompt += f"""EXAMPLE {i}:
Context: {example['context']}
Question: {example['question']}
Answer: {example['answer']}

"""
        
        # Add current query
        few_shot_prompt += f"""Now answer this question following the same pattern:

Context: {context}
Question: {question}
Answer:"""
        
        return few_shot_prompt
    
    def create_attribution_prompt(self, context_sources: List[Dict[str, Any]], 
                                question: str, citation_style: str = 'academic') -> str:
        """Create prompt with strong attribution requirements"""
        
        citation_styles = {
            'academic': 'Use [Source 1], [Source 2] format and include page numbers if available',
            'journalistic': 'Use "according to [Source Name]" format',
            'legal': 'Use precise citations with document names and section numbers',
            'medical': 'Include study details and confidence levels with citations'
        }
        
        citation_instruction = citation_styles.get(citation_style, citation_styles['academic'])
        
        # Build sources section
        sources_text = ""
        for i, source in enumerate(context_sources, 1):
            source_header = f"SOURCE {i}"
            if 'title' in source:
                source_header += f" - {source['title']}"
            if 'date' in source:
                source_header += f" ({source['date']})"
            
            sources_text += f"{source_header}:\n{source['content']}\n\n"
        
        attribution_template = f"""You are a research assistant that provides meticulously cited answers. Accuracy and proper attribution are critical.

SOURCES:
{sources_text}
QUESTION: {question}

ATTRIBUTION REQUIREMENTS:
1. {citation_instruction}
2. Every factual claim must be attributed to a source
3. If sources conflict, acknowledge the disagreement
4. If information is not in sources, explicitly state this
5. Distinguish between direct quotes and paraphrases

FORMAT:
- Main Answer: [Your response with inline citations]
- Source Summary: [Brief note on which sources were most relevant]
- Confidence Level: [High/Medium/Low based on source quality and agreement]

ANSWER:"""
        
        return attribution_template
    
    def create_conversational_rag_prompt(self, context: str, question: str, 
                                       conversation_history: List[Dict[str, str]], 
                                       max_history: int = 5) -> str:
        """Create conversational RAG prompt with history"""
        
        # Limit conversation history
        # Slicing already handles short histories; max_history <= 0 means no history
        recent_history = conversation_history[-max_history:] if max_history > 0 else []
        
        # Build conversation history section
        history_text = ""
        for turn in recent_history:
            role = turn.get('role', 'user').upper()
            content = turn.get('content', '')
            history_text += f"{role}: {content}\n"
        
        conversational_template = f"""You are a helpful assistant in an ongoing conversation. Use the provided context to answer questions, but also consider the conversation history to maintain continuity and reference previous topics when relevant.

CONTEXT:
{context}

RECENT CONVERSATION:
{history_text}
CURRENT QUESTION: {question}

CONVERSATION GUIDELINES:
1. Reference previous parts of our conversation when relevant
2. Build on previously established information
3. Maintain a natural conversational flow
4. Use the context to provide accurate information
5. Ask follow-up questions if clarification would be helpful

RESPONSE:"""
        
        return conversational_template
    
    def create_multi_document_synthesis_prompt(self, document_sources: List[Dict[str, Any]], 
                                             question: str, synthesis_type: str = 'comparative') -> str:
        """Create prompt for synthesizing information across multiple documents"""
        
        synthesis_instructions = {
            'comparative': {
                'instruction': 'Compare and contrast the information across sources',
                'structure': ['SIMILARITIES', 'DIFFERENCES', 'UNIQUE INSIGHTS', 'SYNTHESIS']
            },
            'chronological': {
                'instruction': 'Organize information chronologically to show development over time',
                'structure': ['TIMELINE', 'KEY DEVELOPMENTS', 'CURRENT STATE', 'IMPLICATIONS']
            },
            'thematic': {
                'instruction': 'Organize information by themes and topics',
                'structure': ['MAIN THEMES', 'SUPPORTING EVIDENCE', 'CONNECTIONS', 'CONCLUSIONS']
            }
        }
        
        synthesis_config = synthesis_instructions.get(synthesis_type, synthesis_instructions['comparative'])
        
        # Build document sources section
        documents_text = ""
        for i, doc in enumerate(document_sources, 1):
            doc_header = f"DOCUMENT {i}"
            if 'title' in doc:
                doc_header += f": {doc['title']}"
            if 'metadata' in doc:
                metadata = doc['metadata']
                if 'date' in metadata:
                    doc_header += f" ({metadata['date']})"
                if 'source' in metadata:
                    doc_header += f" - {metadata['source']}"
            
            documents_text += f"{doc_header}:\n{doc['content']}\n\n"
        
        # Build structure section
        structure_text = ""
        for section in synthesis_config['structure']:
            structure_text += f"{section}:\n[Address this aspect based on the documents]\n\n"
        
        synthesis_template = f"""You are an expert analyst specializing in multi-document synthesis. {synthesis_config['instruction']}.

DOCUMENTS TO ANALYZE:
{documents_text}
RESEARCH QUESTION: {question}

ANALYSIS FRAMEWORK:
{structure_text}
SYNTHESIS REQUIREMENTS:
1. Draw connections between documents
2. Identify patterns and relationships
3. Note any conflicts or contradictions
4. Provide evidence-based conclusions
5. Cite specific documents for each claim

COMPREHENSIVE ANALYSIS:"""
        
        return synthesis_template
    
    def evaluate_prompt_effectiveness(self, prompt: str, expected_elements: List[str]) -> Dict[str, Any]:
        """Evaluate prompt quality and completeness"""
        
        prompt_lower = prompt.lower()
        
        # Check for essential RAG elements
        essential_elements = {
            'context_section': any(word in prompt_lower for word in ['context', 'sources', 'documents']),
            'question_section': 'question' in prompt_lower,
            'instructions': any(word in prompt_lower for word in ['instructions', 'requirements', 'guidelines']),
            'attribution_guidance': any(word in prompt_lower for word in ['cite', 'source', 'reference', 'attribution']),
            'format_specification': any(word in prompt_lower for word in ['format', 'structure', 'answer:']),
            # Match whole words only, so 'if' does not match inside 'specific' or 'significant'
            'quality_controls': bool(re.search(r'\b(?:if|when|ensure|avoid)\b', prompt_lower))
        }
        
        # Check for expected elements
        expected_found = {element: element.lower() in prompt_lower for element in expected_elements}
        
        # Calculate scores
        essential_score = sum(essential_elements.values()) / len(essential_elements)
        expected_score = sum(expected_found.values()) / len(expected_found) if expected_found else 1.0
        
        # Analyze prompt characteristics
        characteristics = {
            'length': len(prompt),
            'word_count': len(prompt.split()),
            'sentence_count': len([s for s in prompt.split('.') if s.strip()]),
            'has_examples': 'example' in prompt_lower,
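            # Numbered items must start a line, so version numbers like '2.10' do not count as steps
            'has_steps': bool(re.search(r'step \d+|^\s*\d+\.\s|\b(?:first|second|third)\b', prompt_lower, re.MULTILINE)),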
            'readability': textstat.flesch_reading_ease(prompt)
        }
        
        return {
            'essential_elements': essential_elements,
            'essential_score': essential_score,
            'expected_elements': expected_found,
            'expected_score': expected_score,
            'overall_score': (essential_score + expected_score) / 2,
            'characteristics': characteristics,
            'recommendations': self._get_prompt_recommendations(essential_elements, characteristics)
        }
    
    def _get_prompt_recommendations(self, essential_elements: Dict[str, bool], 
                                   characteristics: Dict[str, Any]) -> List[str]:
        """Generate recommendations for improving prompt quality"""
        recommendations = []
        
        if not essential_elements['attribution_guidance']:
            recommendations.append("Add citation requirements to ensure proper source attribution")
        
        if not essential_elements['quality_controls']:
            recommendations.append("Include fallback instructions for insufficient context")
        
        if characteristics['readability'] < 30:
            recommendations.append("Simplify language for better readability")
        
        if characteristics['word_count'] > 500:
            recommendations.append("Consider shortening prompt to reduce token usage")
        
        if not characteristics['has_steps'] and characteristics['word_count'] > 200:
            recommendations.append("Consider adding step-by-step structure for complex prompts")
        
        return recommendations

# Initialize advanced RAG prompting
advanced_prompting = AdvancedRAGPrompting()
print("🧪 Advanced RAG Prompting initialized!")
print(f"   Few-shot domains: {list(advanced_prompting.few_shot_examples.keys())}")

# Test advanced prompting techniques
test_context = """Recent research has shown significant improvements in large language model performance. 
Study A (Johnson et al., 2024) found that fine-tuning with human feedback improved accuracy by 23% 
on reasoning tasks. Study B (Chen et al., 2024) reported that constitutional AI methods reduced 
harmful outputs by 67% while maintaining helpfulness. Study C (Williams et al., 2024) demonstrated 
that multi-step reasoning approaches increased problem-solving success rates from 45% to 78%."""

test_question = "What recent improvements have been made to large language models?"

print("🧪 ADVANCED RAG PROMPTING TECHNIQUES")
print("=" * 60)

# 1. Chain-of-Thought Prompting
print("\n1️⃣ Chain-of-Thought RAG:")
print("-" * 30)
cot_prompt = advanced_prompting.create_chain_of_thought_prompt(test_context, test_question, 'research')
print(cot_prompt[:300] + "...")
cot_eval = advanced_prompting.evaluate_prompt_effectiveness(cot_prompt, ['step', 'analysis', 'reasoning'])
print(f"Quality Score: {cot_eval['overall_score']:.2f}")
print(f"Has Steps: {cot_eval['characteristics']['has_steps']}")

# 2. Few-Shot Prompting
print("\n2️⃣ Few-Shot RAG:")
print("-" * 20)
few_shot_prompt = advanced_prompting.create_few_shot_prompt(test_context, test_question, 'research_synthesis')
print(few_shot_prompt[:300] + "...")
fs_eval = advanced_prompting.evaluate_prompt_effectiveness(few_shot_prompt, ['example', 'pattern'])
print(f"Quality Score: {fs_eval['overall_score']:.2f}")
print(f"Has Examples: {fs_eval['characteristics']['has_examples']}")

# 3. Attribution-Focused Prompting
print("\n3️⃣ Attribution RAG:")
print("-" * 25)
context_sources = [
    {'title': 'Johnson et al. (2024)', 'content': 'Study A found that fine-tuning with human feedback improved accuracy by 23% on reasoning tasks.'},
    {'title': 'Chen et al. (2024)', 'content': 'Study B reported that constitutional AI methods reduced harmful outputs by 67% while maintaining helpfulness.'},
    {'title': 'Williams et al. (2024)', 'content': 'Study C demonstrated that multi-step reasoning approaches increased problem-solving success rates from 45% to 78%.'}
]

attribution_prompt = advanced_prompting.create_attribution_prompt(context_sources, test_question, 'academic')
print(attribution_prompt[:300] + "...")
attr_eval = advanced_prompting.evaluate_prompt_effectiveness(attribution_prompt, ['citation', 'source', 'attribution'])
print(f"Quality Score: {attr_eval['overall_score']:.2f}")
print(f"Attribution Guidance: {attr_eval['essential_elements']['attribution_guidance']}")

# 4. Multi-Document Synthesis
print("\n4️⃣ Multi-Document Synthesis:")
print("-" * 35)
document_sources = [
    {
        'title': 'Study A - Human Feedback',
        'content': 'Fine-tuning with human feedback improved accuracy by 23% on reasoning tasks.',
        'metadata': {'date': '2024-01', 'source': 'Johnson et al.'}
    },
    {
        'title': 'Study B - Constitutional AI',
        'content': 'Constitutional AI methods reduced harmful outputs by 67% while maintaining helpfulness.',
        'metadata': {'date': '2024-02', 'source': 'Chen et al.'}
    }
]

synthesis_prompt = advanced_prompting.create_multi_document_synthesis_prompt(
    document_sources, test_question, 'comparative'
)
print(synthesis_prompt[:300] + "...")
synth_eval = advanced_prompting.evaluate_prompt_effectiveness(synthesis_prompt, ['compare', 'synthesis', 'documents'])
print(f"Quality Score: {synth_eval['overall_score']:.2f}")

# 5. Conversational RAG
print("\n5️⃣ Conversational RAG:")
print("-" * 25)
conversation_history = [
    {'role': 'user', 'content': 'What are some recent AI developments?'},
    {'role': 'assistant', 'content': 'Recent developments include improvements in reasoning and safety.'},
    {'role': 'user', 'content': 'Can you be more specific about the reasoning improvements?'}
]

conversational_prompt = advanced_prompting.create_conversational_rag_prompt(
    test_context, test_question, conversation_history
)
print(conversational_prompt[:300] + "...")
conv_eval = advanced_prompting.evaluate_prompt_effectiveness(conversational_prompt, ['conversation', 'history', 'continuity'])
print(f"Quality Score: {conv_eval['overall_score']:.2f}")

# Compare all techniques
print("\n\n📊 TECHNIQUE COMPARISON")
print("=" * 40)

techniques_comparison = pd.DataFrame([
    {'Technique': 'Chain-of-Thought', 'Quality Score': cot_eval['overall_score'], 'Word Count': cot_eval['characteristics']['word_count'], 'Readability': cot_eval['characteristics']['readability']},
    {'Technique': 'Few-Shot', 'Quality Score': fs_eval['overall_score'], 'Word Count': fs_eval['characteristics']['word_count'], 'Readability': fs_eval['characteristics']['readability']},
    {'Technique': 'Attribution', 'Quality Score': attr_eval['overall_score'], 'Word Count': attr_eval['characteristics']['word_count'], 'Readability': attr_eval['characteristics']['readability']},
    {'Technique': 'Synthesis', 'Quality Score': synth_eval['overall_score'], 'Word Count': synth_eval['characteristics']['word_count'], 'Readability': synth_eval['characteristics']['readability']},
    {'Technique': 'Conversational', 'Quality Score': conv_eval['overall_score'], 'Word Count': conv_eval['characteristics']['word_count'], 'Readability': conv_eval['characteristics']['readability']}
])

print(techniques_comparison.to_string(index=False, float_format='%.2f'))

# Show recommendations for improvement
print("\n💡 IMPROVEMENT RECOMMENDATIONS:")
print("-" * 35)
for i, (name, eval_result) in enumerate([
    ('Chain-of-Thought', cot_eval),
    ('Few-Shot', fs_eval),
    ('Attribution', attr_eval),
    ('Synthesis', synth_eval),
    ('Conversational', conv_eval)
], 1):
    if eval_result['recommendations']:
        print(f"\n{i}. {name}:")
        for rec in eval_result['recommendations']:
            print(f"   - {rec}")
    else:
        print(f"\n{i}. {name}: No specific recommendations - good quality prompt")

## 🎯 Key Takeaways

From this module, you should now understand:

### 🏗️ RAG Prompt Architecture:
1. **System Context**: Role definition and behavior instructions
2. **Retrieved Context**: Relevant documents with proper formatting
3. **Task Instructions**: Specific guidance for using context
4. **User Query**: The actual question or request
5. **Output Format**: Structure and citation requirements
6. **Quality Controls**: Fallback instructions and limitations
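
The six parts above can be assembled in a fixed order. The following is a minimal sketch, assuming a hypothetical `build_rag_prompt` helper and source dicts with `title` and `content` keys (the same shape as the test data used earlier in this module). The wording of each section is illustrative only, not a recommended template.

```python
from typing import Dict, List


def build_rag_prompt(question: str, sources: List[Dict[str, str]]) -> str:
    """Assemble a RAG prompt from the six architectural parts, in order."""
    # 2. Retrieved context: number each source so the answer can cite it as [n].
    context = "\n".join(
        f"[{i}] {src['title']}: {src['content']}"
        for i, src in enumerate(sources, start=1)
    )
    parts = [
        # 1. System context
        "You are a careful assistant that answers from the provided context.",
        f"CONTEXT:\n{context}",
        # 3. Task instructions
        "Use only the context above to answer the question.",
        # 4. User query
        f"QUESTION: {question}",
        # 5. Output format
        "Answer in one short paragraph and cite sources as [n].",
        # 6. Quality controls
        "If the context is insufficient, say so instead of guessing.",
    ]
    return "\n\n".join(parts)


prompt = build_rag_prompt(
    "What improved reasoning accuracy?",
    [{"title": "Johnson et al. (2024)",
      "content": "Fine-tuning with human feedback improved accuracy by 23%."}],
)
print(prompt)
```

Keeping the parts in a list makes it easy to reorder them, for example to try the Question-Context pattern instead of Context-First.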

### 📚 Template Library Benefits:
- **Consistency**: Standardized prompts across use cases
- **Reusability**: Templated approach with variables
- **Optimization**: Domain-specific adaptations
- **Maintenance**: Centralized prompt management
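
As a rough illustration of centralized, variable-driven templates, here is a small sketch built on the standard library's `string.Template`. The `PromptLibrary` class and the `qa_basic` template name are hypothetical and are not the template library implemented earlier in this module.

```python
from string import Template
from typing import Dict


class PromptLibrary:
    """Hypothetical central registry of named prompt templates."""

    def __init__(self) -> None:
        self._templates: Dict[str, Template] = {}

    def register(self, name: str, text: str) -> None:
        self._templates[name] = Template(text)

    def render(self, name: str, **variables: str) -> str:
        # substitute() raises KeyError when a placeholder has no value,
        # so incomplete prompts fail early instead of reaching the LLM.
        return self._templates[name].substitute(**variables)


library = PromptLibrary()
library.register("qa_basic", "CONTEXT: $context\nQUESTION: $question\nANSWER:")

rendered = library.render(
    "qa_basic",
    context="RAG combines retrieval and generation.",
    question="What is RAG?",
)
print(rendered)
```

Because every prompt goes through one registry, a change to a template is made in a single place and can be tracked in version control.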

### 📏 Context Management Strategies:

#### Truncation Methods:
- **Head Truncation**: Keep beginning (good for intros)
- **Tail Truncation**: Keep ending (good for conclusions)
- **Middle Truncation**: Keep beginning + end
- **Smart Truncation**: Preserve complete sentences/paragraphs
- **Summarization**: LLM-powered context compression
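
The first four methods can be sketched in a few lines each. This is a simplified illustration with hypothetical function names; it measures length in characters rather than tokens (an assumption made to keep the example dependency-free), and it leaves out summarization because that requires an LLM call.

```python
import re


def truncate_head(text: str, max_chars: int) -> str:
    """Head truncation: keep the beginning of the text."""
    return text[:max_chars]


def truncate_tail(text: str, max_chars: int) -> str:
    """Tail truncation: keep the end of the text."""
    return text[-max_chars:] if max_chars > 0 else ""


def truncate_middle(text: str, max_chars: int, marker: str = " ... ") -> str:
    """Middle truncation: keep the beginning and end, drop the middle."""
    if len(text) <= max_chars:
        return text
    keep = max_chars - len(marker)
    if keep <= 0:
        return text[:max_chars]
    head = keep - keep // 2
    tail = keep // 2
    return text[:head] + marker + (text[-tail:] if tail else "")


def truncate_smart(text: str, max_chars: int) -> str:
    """Smart truncation: keep whole sentences from the start while they fit."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept = []
    used = 0
    for sentence in sentences:
        extra = len(sentence) + (1 if kept else 0)  # +1 for the joining space
        if used + extra > max_chars:
            break
        kept.append(sentence)
        used += extra
    return " ".join(kept)


sample = "First point. Second point is longer. Third."
print(truncate_head(sample, 5))    # First
print(truncate_tail(sample, 6))    # Third.
print(truncate_smart(sample, 20))  # First point.
```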

#### Context Optimization:
- **Hierarchical**: Most relevant information first
- **Quality Analysis**: Measure relevance and diversity
- **Token Management**: Stay within model limits
- **Metadata Integration**: Rich context with source info
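
Hierarchical ordering and token management can be combined into one packing step. The sketch below assumes each chunk is a dict with `content`, `score`, and `source` keys, and it estimates tokens with a rough 4-characters-per-token heuristic. Both are assumptions for illustration; a real system would use the model's own tokenizer.

```python
from typing import Dict, List


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def pack_context(chunks: List[Dict], token_budget: int) -> List[Dict]:
    """Select chunks in descending relevance order until the budget is used."""
    selected = []
    used = 0
    for chunk in sorted(chunks, key=lambda c: c["score"], reverse=True):
        cost = estimate_tokens(chunk["content"])
        if used + cost > token_budget:
            continue  # skip chunks that do not fit; a smaller one may still fit
        selected.append(chunk)
        used += cost
    return selected


chunks = [
    {"content": "a" * 40, "score": 0.2, "source": "doc-c"},
    {"content": "b" * 80, "score": 0.9, "source": "doc-a"},
    {"content": "c" * 40, "score": 0.5, "source": "doc-b"},
]
packed = pack_context(chunks, token_budget=30)
print([c["source"] for c in packed])  # ['doc-a', 'doc-b']
```

The selected chunks come back most-relevant first and keep their `source` metadata, so they can be formatted directly into the context section.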

### 🎛️ Model-Specific Adaptations:

| Model | Context Window (tokens) | Best For | Prompt Style |
|-------|-------------------------|----------|--------------|
| **GPT-4 Turbo** | 128K | Complex reasoning | Detailed instructions |
| **Claude-3-Opus** | 200K | Analysis & safety | Thinking aloud |
| **GPT-3.5-Turbo** | 16K | Conversations | Concise instructions |
| **Gemini 1.5 Pro** | Up to 2M | Large contexts | Comprehensive analysis |
| **Llama-2** | 4K | Open source | Explicit instructions |
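
The context window determines how much retrieved text fits once the fixed parts of the prompt and the answer are accounted for. Below is a minimal sketch of that arithmetic, using the rounded window sizes from the table. The dictionary keys, the `context_budget` function, and the overhead figures are hypothetical.

```python
# Context windows in tokens, rounded as in the table above.
# The keys are hypothetical short names, not official model identifiers.
CONTEXT_WINDOWS = {
    "gpt-4": 128_000,
    "claude-3-opus": 200_000,
    "gpt-3.5-turbo": 16_000,
    "gemini-pro": 2_000_000,
    "llama-2": 4_000,
}


def context_budget(model: str, prompt_overhead: int, answer_reserve: int) -> int:
    """Tokens left for retrieved context after fixed costs are subtracted."""
    budget = CONTEXT_WINDOWS[model] - prompt_overhead - answer_reserve
    if budget <= 0:
        raise ValueError(f"No room for retrieved context in {model}")
    return budget


budget = context_budget("llama-2", prompt_overhead=300, answer_reserve=500)
print(budget)  # 3200
```

With a 4K model, instructions and the reserved answer space consume a noticeable share of the window, which is why small-context models benefit most from concise templates.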

### 🧪 Advanced Techniques Impact:

#### Chain-of-Thought RAG:
- **Impact**: +30% reasoning quality
- **Best for**: Complex analysis, multi-step problems
- **Structure**: Step-by-step thinking process

#### Few-Shot RAG:
- **Impact**: +20% format compliance
- **Best for**: Specific output formats, style consistency
- **Method**: Domain-specific examples

#### Attribution Systems:
- **Impact**: +40% source accuracy
- **Best for**: Fact-sensitive domains, research
- **Styles**: Academic, journalistic, legal, medical
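
Citation requirements in the prompt are easier to enforce when the output is checked afterwards. The sketch below assumes answers cite sources with numeric markers such as `[1]`; the `check_citations` function is hypothetical. It only verifies that each marker points at an existing source, not that the source actually supports the claim.

```python
import re
from typing import Dict, List


def check_citations(answer: str, num_sources: int) -> Dict[str, List[int]]:
    """Compare [n] markers in an answer against the numbered sources."""
    cited = sorted({int(n) for n in re.findall(r"\[(\d+)\]", answer)})
    valid = [n for n in cited if 1 <= n <= num_sources]
    invalid = [n for n in cited if n not in valid]
    uncited = [n for n in range(1, num_sources + 1) if n not in valid]
    return {"valid": valid, "invalid": invalid, "uncited": uncited}


report = check_citations(
    "Human feedback improved accuracy by 23% [1]. Harmful outputs fell by 67% [2][5].",
    num_sources=3,
)
print(report)  # {'valid': [1, 2], 'invalid': [5], 'uncited': [3]}
```

A non-empty `invalid` list suggests a hallucinated citation, while `uncited` shows which retrieved sources the answer ignored.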

#### Multi-Document Synthesis:
- **Types**: Comparative, chronological, thematic
- **Best for**: Research synthesis, analysis across sources
- **Structure**: Organized comparison and conclusions

### 📊 Prompt Quality Framework:

#### Essential Elements Checklist:
- ✅ Context section clearly defined
- ✅ Question/task specification
- ✅ Clear instructions and requirements
- ✅ Attribution guidance
- ✅ Format specification
- ✅ Quality controls and fallbacks

#### Optimization Metrics:
- **Token efficiency**: Context utilization vs limits
- **Readability**: Flesch reading ease score
- **Structure**: Step-by-step organization
- **Completeness**: All required elements present

### 🔄 Prompt Engineering Workflow:
1. **Define Use Case** → 2. **Select Template** → 3. **Adapt for Model** → 4. **Optimize Context** → 5. **Test & Iterate** → 6. **Deploy & Monitor**

### 💡 Production Guidelines:
- **Template Library**: Maintain centralized prompt templates
- **Version Control**: Track prompt changes and performance
- **A/B Testing**: Compare prompt variations systematically
- **Model Adaptation**: Customize prompts for specific LLMs
- **Context Management**: Implement smart truncation strategies
- **Quality Monitoring**: Track output quality and citation accuracy
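
For A/B testing, each user should see the same prompt variant on every request so that results are comparable. One common way to do this is deterministic hashing, sketched below. The `assign_variant` function, the experiment name, and the variant labels are hypothetical.

```python
import hashlib
from typing import List


def assign_variant(user_id: str, experiment: str, variants: List[str]) -> str:
    """Deterministically map a user to one prompt variant."""
    # Hashing experiment and user together gives a stable assignment
    # that differs between experiments for the same user.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


variants = ["prompt_v1", "prompt_v2"]
first = assign_variant("user-42", "citation-style", variants)
second = assign_variant("user-42", "citation-style", variants)
print(first, first == second)
```

Logging the assigned variant next to each response makes it possible to compare output quality and citation accuracy per prompt version.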

## 🎯 Next Steps

In the next modules, we'll explore:
- **Module 11**: LLM integration and model selection strategies
- **Module 12**: Complete RAG system integration and deployment

Mastering prompt engineering is crucial for maximizing the effectiveness of your RAG system and ensuring high-quality, reliable outputs!

## 🤔 Discussion Questions

1. How would you adapt prompts for domain-specific applications (legal, medical, technical)?
2. What strategies would you use to handle conflicting information in retrieved context?
3. How would you balance prompt complexity with token efficiency?
4. What role should user feedback play in prompt optimization?
5. How would you ensure consistent citation quality across different models?

## 📝 Optional Exercises

1. **Domain Adaptation**: Create specialized prompt templates for your industry
2. **A/B Testing**: Design experiments to compare prompt variations
3. **Real LLM Integration**: Connect templates to actual LLM APIs
4. **Advanced Attribution**: Implement sophisticated citation systems
5. **Conversational Memory**: Build multi-turn RAG conversations with persistent context