# Agent Vocal Prof - Demo Pipeline
### End-to-End Demonstration: ASR → RAG → LLM → TTS

This notebook demonstrates the complete tutoring pipeline with examples for each subject.

**Pipeline stages:**
1. 🎤 **ASR**: Speech-to-text transcription
2. 🧭 **Router**: Subject detection
3. 📚 **RAG**: Context retrieval
4. 🤖 **LLM**: Hint generation
5. 🔊 **TTS**: Speech synthesis

Each stage can be tested independently or as a complete flow.
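
To make the idea of independently testable stages concrete, here is a minimal sketch of how stages could be chained. It does not reflect the project's actual orchestrator: `PipelineState`, `route`, `retrieve`, `generate`, and `run_pipeline` are hypothetical names, and each stage is a stub that only shows how data might flow from one stage to the next.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PipelineState:
    """Carries intermediate results from one stage to the next (hypothetical)."""
    query: str
    subject: str = ""
    context: List[str] = field(default_factory=list)
    hints: List[str] = field(default_factory=list)


Stage = Callable[[PipelineState], PipelineState]


def route(state: PipelineState) -> PipelineState:
    # Stub router: a real one would score the query against subject keywords.
    state.subject = "maths" if "equation" in state.query.lower() else "anglais"
    return state


def retrieve(state: PipelineState) -> PipelineState:
    # Stub retriever: returns a single placeholder passage for the subject.
    state.context = [f"[{state.subject}] placeholder passage"]
    return state


def generate(state: PipelineState) -> PipelineState:
    # Stub generator: produces three placeholder hints.
    state.hints = [f"Hint {i} from {len(state.context)} passage(s)" for i in (1, 2, 3)]
    return state


def run_pipeline(query: str, stages: List[Stage]) -> PipelineState:
    """Apply each stage in order, threading the state through."""
    state = PipelineState(query=query)
    for stage in stages:
        state = stage(state)
    return state


result = run_pipeline("How do I solve this equation?", [route, retrieve, generate])
print(result.subject, len(result.hints))
```

Because every stage takes and returns the same state object, any single stage can be called on a hand-built `PipelineState` in a test, which mirrors how the cells below exercise each component on its own.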

## Setup

In [None]:
import sys
import os
import numpy as np
from pathlib import Path

# Add the project root to the import path so the `src` package resolves
sys.path.insert(0, os.path.abspath('..'))

from src.config import get_config
from src.utils import setup_logging, get_logger
from src.asr import get_asr_engine
from src.rag import get_retriever
from src.router import get_router
from src.llm import get_llm_engine
from src.tts import get_tts_engine
from src.orchestrator import get_orchestrator

# Setup logging
config = get_config()
setup_logging(log_level="INFO")
logger = get_logger(__name__)

print("✅ All modules loaded successfully!")

## Stage 1: ASR - Speech Recognition

Test transcription with a sample audio or text input.
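
If no recording is at hand, a synthetic WAV file is enough to check that the file-loading path works before trying real speech. The sketch below uses only the standard library. The 16 kHz mono 16-bit format is an assumption (check what your ASR engine expects), and `write_test_tone` is a hypothetical helper. A pure tone will not yield a meaningful transcript; it only exercises the plumbing.

```python
import math
import os
import struct
import tempfile
import wave


def write_test_tone(path, seconds=1.0, sample_rate=16000, freq_hz=440.0):
    """Write a mono 16-bit PCM sine tone to `path` and return the frame count."""
    n_frames = int(seconds * sample_rate)
    frames = bytearray()
    for n in range(n_frames):
        sample = 0.3 * math.sin(2 * math.pi * freq_hz * n / sample_rate)
        frames += struct.pack("<h", int(sample * 32767))  # little-endian int16
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return n_frames


tone_path = os.path.join(tempfile.gettempdir(), "asr_test_tone.wav")
frames_written = write_test_tone(tone_path)

with wave.open(tone_path, "rb") as wav:
    print(wav.getnchannels(), wav.getframerate(), wav.getnframes())
```

The resulting path could then be passed to `asr.transcribe_file(tone_path)`, the call mentioned in the cell below.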

In [None]:
asr = get_asr_engine()

print("=" * 60)
print("ASR Demo")
print("=" * 60)

# For demo, we'll simulate with text
# In practice, you'd use actual audio: asr.transcribe_file("audio.wav")

sample_queries = {
    "maths": "Comment résoudre une équation du second degré avec discriminant négatif?",
    "physique": "Quelle est la différence entre énergie cinétique et énergie potentielle?",
    "anglais": "How do I use the present perfect tense?"
}

print("\nSample queries to test:")
for subject, query in sample_queries.items():
    print(f"\n📝 {subject.title()}: {query}")

# Select one for demo
demo_query = sample_queries["maths"]
print(f"\n🎤 Selected query: {demo_query}")
print(f"   Length: {len(demo_query)} characters")
print("\n✅ ASR stage complete (simulated)")

## Stage 2: Router - Subject Detection
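
The cell below reports a detected subject, a confidence, and matched keywords. As a rough illustration of how keyword-based routing with a confidence score might work, here is a minimal sketch. The keyword table, the `RouteResult` type, the fallback subject, and the scoring rule (the winner's share of all hits) are assumptions for illustration, not the project's actual router.

```python
from dataclasses import dataclass

# Hypothetical keyword table; the real one would come from the router's configuration.
KEYWORDS = {
    "maths": ["equation", "discriminant", "derivative"],
    "physique": ["energy", "force", "velocity"],
    "anglais": ["tense", "grammar", "present perfect"],
}


@dataclass
class RouteResult:
    subject: str
    confidence: float


def pick_subject(query, keywords=KEYWORDS, default="maths"):
    """Score subjects by keyword hits; confidence is the winner's share of all hits."""
    text = query.lower()
    hits = {subj: sum(kw in text for kw in kws) for subj, kws in keywords.items()}
    total = sum(hits.values())
    if total == 0:
        # No keyword matched: fall back to a default subject with zero confidence.
        return RouteResult(default, 0.0)
    best = max(hits, key=hits.get)
    return RouteResult(best, hits[best] / total)


print(pick_subject("How do I use the present perfect tense?"))
print(pick_subject("What is the capital of France?"))
```

Plain substring matching is cheap but crude: `"tense"` also matches inside `"intense"`. Matching on word boundaries would be a natural refinement.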

In [None]:
router = get_router()

print("=" * 60)
print("Router Demo")
print("=" * 60)

# Test all sample queries
for subject_hint, query in sample_queries.items():
    print(f"\n📝 Query: {query}")
    
    model_spec = router.pick_model(query)
    
    print(f"🎯 Detected subject: {model_spec.subject}")
    print(f"   Confidence: {model_spec.confidence:.2f}")
    print(f"   Model: {Path(model_spec.model_path).name}")
    
    # Show keyword matches
    if model_spec.subject in router.keywords:
        keywords = router.get_subject_keywords(model_spec.subject)
        query_lower = query.lower()
        matches = [kw for kw in keywords if kw.lower() in query_lower]
        if matches:
            print(f"   Matched keywords: {', '.join(matches[:3])}")

print("\n✅ Router stage complete")

## Stage 3: RAG - Context Retrieval
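
Retrieval returns the `k` passages whose embeddings score highest against the query embedding. The sketch below shows that ranking step with an exhaustive cosine-similarity search in pure Python. It is a stand-in for the FAISS search mentioned in the benchmark notes, not the retriever's implementation, and the two-dimensional vectors are toy values rather than real embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, passage_vecs, k=3):
    """Return the k (index, score) pairs with the highest similarity, best first."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(passage_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


toy_passages = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_query = [1.0, 0.1]

for idx, score in top_k(toy_query, toy_passages, k=2):
    print(idx, round(score, 3))
```

An exhaustive scan costs time linear in the number of passages, which is fine for small indexes. A dedicated vector index becomes worthwhile as the corpus grows.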

In [None]:
retriever = get_retriever()

print("=" * 60)
print("RAG Demo")
print("=" * 60)

print(f"\n📚 Available subjects: {retriever.get_available_subjects()}")

# Test retrieval for each subject
for subject, query in sample_queries.items():
    if not retriever.is_available(subject):
        print(f"\n⚠️  Index not available for {subject}")
        continue
    
    print(f"\n{'='*60}")
    print(f"Subject: {subject.upper()}")
    print(f"{'='*60}")
    print(f"Query: {query}\n")
    
    results = retriever.retrieve(subject, query, k=3)
    
    print(f"Retrieved {len(results)} passages:\n")
    
    for i, result in enumerate(results, 1):
        page_info = f" (page {result.page})" if result.page else ""
        print(f"{i}. 📄 {result.source}{page_info}")
        print(f"   Score: {result.score:.3f}")
        print(f"   Text: {result.text[:200]}...")
        print()

print("\n✅ RAG stage complete")

## Stage 4: LLM - Hint Generation

Generate pedagogical hints with 3 levels.
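
The three levels have to be separated out of the model's raw text at some point. Assuming the model emits `HINT LEVEL n:` markers, as the simulated response in the streaming demo does, a parser could look like the sketch below. `HintLadder` and `parse_hints` are hypothetical names; the real engine may structure its output differently.

```python
import re
from dataclasses import dataclass


@dataclass
class HintLadder:
    level1: str
    level2: str
    level3: str


# Capture the text after each marker, up to the next marker or the end of input.
_HINT_RE = re.compile(
    r"HINT LEVEL ([123]):\s*(.*?)(?=HINT LEVEL [123]:|\Z)", re.DOTALL
)


def parse_hints(raw):
    """Split raw model output into three hint levels; missing levels become ''."""
    found = {int(level): text.strip() for level, text in _HINT_RE.findall(raw)}
    return HintLadder(found.get(1, ""), found.get(2, ""), found.get(3, ""))


raw_output = (
    "HINT LEVEL 1: Think about rates of change.\n\n"
    "HINT LEVEL 2: Use the power rule.\n\n"
    "HINT LEVEL 3: For x^n the derivative is n*x^(n-1)."
)
ladder = parse_hints(raw_output)
print(ladder.level2)
```

Returning empty strings for missing levels keeps downstream code simple when the model stops early or skips a marker.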

In [None]:
llm = get_llm_engine()

print("=" * 60)
print("LLM Demo")
print("=" * 60)

# Use math query as example
subject = "maths"
query = sample_queries[subject]

print(f"\n📝 Query: {query}")
print(f"🎯 Subject: {subject}\n")

# Get RAG context
if retriever.is_available(subject):
    rag_results = retriever.retrieve(subject, query, k=3)
    context = retriever.format_context(rag_results, max_length=1500)
    print(f"📚 Retrieved {len(rag_results)} passages for context\n")
else:
    context = "No specific context available."
    print("⚠️  No RAG context available\n")

# Build prompt
prompt = llm.build_tutoring_prompt(query, context, subject)

print("📋 Prompt structure:")
print("   - System instructions: ✓")
print(f"   - Context passages: ✓ ({len(context)} chars)")
print("   - Student question: ✓")
print("   - Hint template: ✓")
print(f"\n   Total prompt length: {len(prompt)} characters\n")

# Note: Actual generation commented out to avoid long wait times
# Uncomment to test with real model:

# hints = llm.generate_tutoring_response(query, context, subject)
# 
# print("💡 Generated Hint Ladder:\n")
# print("🔵 Level 1 (Conceptual):")
# print(f"   {hints.level1}\n")
# print("🟡 Level 2 (Strategic):")
# print(f"   {hints.level2}\n")
# print("🟢 Level 3 (Detailed):")
# print(f"   {hints.level3}\n")

print("💡 Sample Hint Ladder (for demonstration):\n")
print("🔵 Level 1 (Conceptual):")
print("   Les équations du second degré ont une structure particulière.")
print("   Pensez à la formule générale qui permet de les résoudre.\n")
print("🟡 Level 2 (Strategic):")
print("   Utilisez la formule quadratique avec le discriminant (b² - 4ac).")
print("   Le discriminant détermine la nature des solutions.\n")
print("🟢 Level 3 (Detailed):")
print("   1. Identifiez les coefficients a, b, et c")
print("   2. Calculez Δ = b² - 4ac")
print("   3. Si Δ < 0, les solutions sont complexes")
print("   4. Utilisez x = (-b ± i√|Δ|) / (2a), car √Δ = i√|Δ| quand Δ < 0\n")

print("✅ LLM stage complete")

## Stage 5: TTS - Speech Synthesis
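
Before synthesis, the engine has to decide which voice to use, which is why the cell below tests `detect_language`. One simple way such detection could work is counting common function words per language, as sketched here. The word lists, the `guess_language` name, and the French default on ties are all assumptions; the real engine may use a dedicated language-identification library instead.

```python
# Small, hypothetical function-word lists; real detectors use much richer signals.
FRENCH_HINTS = {"le", "la", "les", "un", "une", "est", "et", "comment", "pour", "vous"}
ENGLISH_HINTS = {"the", "is", "and", "how", "what", "you", "are", "do", "for", "of"}


def guess_language(text, default="fr"):
    """Return 'fr' or 'en' based on function-word counts; ties fall back to `default`."""
    words = [w.strip(".,;:!?'\"").lower() for w in text.split()]
    fr = sum(w in FRENCH_HINTS for w in words)
    en = sum(w in ENGLISH_HINTS for w in words)
    if fr == en:
        return default
    return "fr" if fr > en else "en"


print(guess_language("Bonjour, comment allez-vous aujourd'hui?"))
print(guess_language("Hello, how are you doing today?"))
```

Very short or mixed-language inputs give this approach little to work with, so a sensible default matters more than the scoring itself.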

In [None]:
tts = get_tts_engine()

print("=" * 60)
print("TTS Demo")
print("=" * 60)

# Test language detection
test_texts = {
    "fr": "Bonjour, comment allez-vous aujourd'hui?",
    "en": "Hello, how are you doing today?"
}

print("\n🔍 Language Detection:\n")
for expected_lang, text in test_texts.items():
    detected = tts.detect_language(text)
    match = "✓" if detected == expected_lang else "✗"
    print(f"{match} Text: {text}")
    print(f"   Expected: {expected_lang}, Detected: {detected}\n")

# Test synthesis (if models available)
sample_text = "Voici un indice pour vous aider à résoudre le problème."

print("🔊 Testing synthesis:")
print(f"   Text: {sample_text}")
print("   Language: fr")

audio_data = tts.synthesize(sample_text, language="fr")

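# Explicit check: a bare truth test is ambiguous if synthesize() returns an array
if audio_data is not None and len(audio_data) > 0:
    print(f"   ✅ Generated {len(audio_data)} bytes of audio")
    # In a real environment, you could play or save this audio
else:
    print("   ⚠️  TTS models not available (optional feature)")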
    print("   System works without TTS for text-based interactions")

print("\n✅ TTS stage complete")

## Complete Pipeline Demo

Run the entire pipeline end-to-end through the orchestrator.
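
The cell below consumes a stream of typed events and groups them by type. The same pattern can be tried without loading any models by using a stand-in generator. Here `Event`, `fake_query_events`, and the payload values (including the `algebra.pdf` file name) are hypothetical placeholders, not output from the real orchestrator.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Iterator, List


@dataclass
class Event:
    type: str
    data: Any


def fake_query_events(query: str) -> Iterator[Event]:
    """Stand-in for a streaming pipeline; yields placeholder events in order."""
    yield Event("state_change", "routing")
    yield Event("subject_detected", {"subject": "maths", "confidence": 0.9})
    yield Event("rag_results", [{"source": "algebra.pdf", "score": 0.81}])
    yield Event("state_change", "done")


def group_events(events: Iterable[Event]) -> Dict[str, List[Event]]:
    """Group events by type, preserving arrival order within each group."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event.type].append(event)
    return dict(grouped)


grouped = group_events(fake_query_events("demo"))
print({name: len(items) for name, items in grouped.items()})
```

Using `defaultdict(list)` removes the need for the explicit "is this key present yet" check while keeping the same grouping behavior.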

In [None]:
orchestrator = get_orchestrator()

print("=" * 60)
print("Complete Pipeline Demo")
print("=" * 60)

# Test each subject
for subject_name, query in sample_queries.items():
    print(f"\n{'='*60}")
    print(f"Test: {subject_name.upper()}")
    print(f"{'='*60}\n")
    
    # Create session
    session = orchestrator.create_session()
    
    print(f"📝 Query: {query}\n")
    
    # Process through pipeline
    events_by_type = {}
    
    for event in orchestrator.process_text_query(session, query):
        event_type = event.type
        
        if event_type not in events_by_type:
            events_by_type[event_type] = []
        events_by_type[event_type].append(event)
        
        # Print key events
        if event_type == "subject_detected":
            data = event.data
            print(f"🎯 Subject: {data['subject']} (confidence: {data['confidence']:.2f})")
        
        elif event_type == "rag_results":
            results = event.data
            print(f"📚 Retrieved {len(results)} passages")
            if results:
                best = results[0]
                print(f"   Top result: {best['source']} (score: {best['score']:.2f})")
        
        elif event_type == "hints":
            hints = event.data
            print("\n💡 Generated Hints:")
            print(f"\n🔵 Level 1: {hints['level1'][:100]}...")
            print(f"\n🟡 Level 2: {hints['level2'][:100]}...")
            print(f"\n🟢 Level 3: {hints['level3'][:100]}...")
        
        elif event_type == "state_change":
            print(f"📊 State: {event.data}")
        
        elif event_type == "error":
            print(f"❌ Error: {event.data}")
    
    # Session summary
    summary = orchestrator.get_session_summary(session.session_id)
    print("\n📋 Session Summary:")
    print(f"   Duration: {summary['elapsed_time']:.2f}s")
    print(f"   Events: {summary['num_events']}")
    print(f"   State: {summary['state']}")
    
    print()

print("=" * 60)
print("✅ Complete pipeline demo finished!")
print("=" * 60)

## Performance Metrics
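
Averaging over a fixed loop, as the cell below does, is easy to read but sensitive to one-off stalls and first-call warm-up. A small reusable helper that reports the median and best time is one alternative. This is a sketch: `benchmark` is a hypothetical helper, and the workload timed here is a placeholder rather than a pipeline component.

```python
import statistics
import time


def benchmark(fn, repeats=10, warmup=1):
    """Return (median, best) wall-clock seconds per call of fn()."""
    for _ in range(warmup):
        fn()  # warm caches and lazy initialisation before measuring
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples), min(samples)


median_s, best_s = benchmark(lambda: sum(range(10_000)), repeats=20)
print(f"median {median_s * 1000:.3f} ms, best {best_s * 1000:.3f} ms")
```

With the notebook's objects loaded, the same helper could be applied as `benchmark(lambda: router.pick_model(query))`.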

In [None]:
import time

print("=" * 60)
print("Performance Benchmarks")
print("=" * 60)

# Benchmark each component
benchmarks = {}

# 1. Router
query = sample_queries["maths"]
# time.perf_counter() is a monotonic, high-resolution clock suited to short intervals
start = time.perf_counter()
for _ in range(10):
    router.pick_model(query)
benchmarks["Router"] = (time.perf_counter() - start) / 10

# 2. RAG Retrieval
if retriever.is_available("maths"):
    start = time.perf_counter()
    for _ in range(10):
        retriever.retrieve("maths", query, k=3)
    benchmarks["RAG"] = (time.perf_counter() - start) / 10

# 3. Language Detection
start = time.perf_counter()
for _ in range(100):
    tts.detect_language(query)
benchmarks["Lang Detect"] = (time.perf_counter() - start) / 100

# Display results
print("\nAverage execution time per operation:\n")
for component, avg_time in benchmarks.items():
    print(f"{component:15} : {avg_time*1000:6.2f} ms")

print("\n" + "=" * 60)
print("\n📊 Notes:")
print("- Router: Keyword matching is very fast (< 1ms)")
print("- RAG: Embedding + FAISS search is efficient (< 50ms typically)")
print("- LLM: Generation time depends on model size and hardware")
print("       (1-10 seconds per response on CPU)")
print("- TTS: Synthesis time is proportional to text length")
print("       (typically 1-2x real-time)")

## Streaming Demo

Demonstrate streaming capabilities for LLM and TTS.
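
For speech to start before the model has finished writing, streamed text has to be cut into units a synthesizer can speak. One common approach is to buffer incoming chunks and emit a sentence as soon as a terminator arrives. The generator below is a sketch of that idea; `sentence_chunks` is a hypothetical helper, and how the real TTS engine consumes text is not shown in this notebook.

```python
def sentence_chunks(token_stream, terminators=".!?"):
    """Buffer streamed text and yield one complete sentence at a time."""
    buffer = ""
    for token in token_stream:
        buffer += token
        while True:
            # Position of the first sentence terminator in the buffer, or -1.
            cut = next((i for i, ch in enumerate(buffer) if ch in terminators), -1)
            if cut == -1:
                break
            sentence, buffer = buffer[:cut + 1].strip(), buffer[cut + 1:]
            if sentence:
                yield sentence
    # Flush whatever is left when the stream ends without a terminator.
    if buffer.strip():
        yield buffer.strip()


demo_tokens = ["The deriv", "ative measures ", "change. Use ", "the power rule", "! Done"]
for sentence in sentence_chunks(demo_tokens):
    print(sentence)
```

Splitting on every period is a simplification: decimals such as `3.14` and abbreviations would be cut in the wrong place, so a production splitter would need extra rules.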

In [None]:
print("=" * 60)
print("Streaming Demo")
print("=" * 60)

query = "Expliquez la dérivée d'une fonction"
subject = "maths"

print(f"\n📝 Query: {query}")
print(f"🎯 Subject: {subject}\n")

# Get context
if retriever.is_available(subject):
    rag_results = retriever.retrieve(subject, query, k=2)
    context = retriever.format_context(rag_results, max_length=1000)
else:
    context = ""

print("🔄 Streaming LLM response:\n")
print("-" * 60)

# Note: Actual streaming commented out to avoid long wait
# Uncomment to test:

# for chunk in llm.generate_tutoring_response_stream(query, context, subject):
#     print(chunk, end='', flush=True)

# Simulate streaming output
import time

demo_response = """HINT LEVEL 1: La dérivée mesure le taux de variation d'une fonction.

HINT LEVEL 2: Utilisez les règles de dérivation de base (constante, puissance, somme).

HINT LEVEL 3: Pour f(x) = xⁿ, la dérivée est f'(x) = n·xⁿ⁻¹"""

for char in demo_response:
    print(char, end='', flush=True)
    time.sleep(0.01)  # Simulate streaming delay

print("\n" + "-" * 60)
print("\n✅ Streaming complete")

# TTS streaming
print("\n🔊 TTS would stream audio chunks as text is generated")
print("   This enables real-time voice responses")

## Error Handling Demo

Test robustness with edge cases.
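
Many of these edge cases can be handled once, at the pipeline entrance, by normalizing the query before any component sees it. The sketch below collapses whitespace, rejects empty input, and truncates very long input. `normalize_query` and the 500-character limit are assumptions for illustration, not settings taken from the project.

```python
from typing import Optional

MAX_QUERY_CHARS = 500  # hypothetical limit; tune to the model's context budget


def normalize_query(raw: Optional[str], max_chars: int = MAX_QUERY_CHARS) -> Optional[str]:
    """Collapse whitespace and truncate; return None when nothing usable remains."""
    if raw is None:
        return None
    cleaned = " ".join(raw.split())
    if not cleaned:
        return None
    return cleaned[:max_chars]


for raw in ["", "   ", "a", "x" * 1000, "  What is   the capital of France? "]:
    cleaned = normalize_query(raw)
    print(repr(raw[:20]), "->", None if cleaned is None else len(cleaned))
```

Returning `None` for unusable input gives callers a single condition to check, rather than scattering `if test_query` guards across the router and language-detection calls.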

In [None]:
print("=" * 60)
print("Error Handling Demo")
print("=" * 60)

edge_cases = [
    ("Empty query", ""),
    ("Very short", "a"),
    ("Very long", "x" * 1000),
    ("Non-subject", "What is the capital of France?"),
    ("Mixed language", "Comment dit-on Hello en français?"),
]

for name, test_query in edge_cases:
    print(f"\n📝 Test: {name}")
    print(f"   Query: {test_query[:50]}{'...' if len(test_query) > 50 else ''}")
    
    try:
        # Test router
        model_spec = router.pick_model(test_query) if test_query else None
        
        if model_spec:
            print(f"   ✅ Router: {model_spec.subject}")
        else:
            print("   ⚠️  Router: Empty query handled")
        
        # Test language detection
        if test_query:
            lang = tts.detect_language(test_query)
            print(f"   ✅ Lang detect: {lang}")
    
    except Exception as e:
        print(f"   ❌ Error: {type(e).__name__}: {str(e)[:50]}")

print("\n✅ Error handling tests complete")

## 🎉 Demo Complete!

### Summary

This notebook demonstrated:

1. ✅ **ASR**: Speech-to-text transcription (simulated with text)
2. ✅ **Router**: Intelligent subject detection with confidence scores
3. ✅ **RAG**: Context retrieval from subject-specific indexes
4. ✅ **LLM**: Pedagogical hint generation with 3 levels
5. ✅ **TTS**: Language detection and speech synthesis
6. ✅ **Orchestrator**: End-to-end pipeline coordination
7. ✅ **Streaming**: Real-time response generation
8. ✅ **Error Handling**: Robust behavior with edge cases

### Key Takeaways

- **Modular Design**: Each component can be tested and used independently
- **Multi-Subject**: Automatic routing to appropriate models and knowledge bases
- **Pedagogical Focus**: Never gives direct answers, guides with progressive hints
- **RAG Integration**: Retrieves relevant context to ground responses
- **Fully Local**: No external API calls, all processing on-device

### Next Steps

1. **Add real audio**: Test with actual voice recordings
2. **Customize prompts**: Adjust hint generation templates
3. **Expand knowledge**: Add more documents to each subject
4. **Fine-tune routing**: Improve subject detection accuracy
5. **Deploy**: Use in production with the Gradio UI

---

**Try the full interactive experience with `ui_gradio.py`!**