# 🎙️ Voice Testing Session - LLM+RAG vs LLM Standalone
## Jetson Orin Nano 8GB - Guided Testing Protocol

**Researcher:** aRJey  
**Platform:** NVIDIA Jetson Orin Nano 8GB  
**Date:** 2025

---

## 📋 Testing Protocol

### Participant Information:
- **Number of participants:** 1 person
- **Questions:** 20 questions (4 categories)
- **Repetitions:** 5 times per question
- **Modes:** Non-RAG and RAG (tested separately)

### Total Tests:
- 20 questions × 5 repetitions × 2 modes = **200 total tests**
- Estimated time: **20-30 minutes**
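The test count is the product of the three protocol parameters. The minimal sketch below computes it along with a rough duration estimate. `session_budget` is a hypothetical helper. `seconds_per_test` is an assumed placeholder covering recording, transcription, the LLM response and TTS playback together; it is not a value measured in this study.

```python
def session_budget(num_questions: int, repetitions: int, modes: int,
                   seconds_per_test: float, break_minutes: float = 0.0,
                   num_breaks: int = 0) -> tuple[int, float]:
    """Return (total_tests, estimated_minutes) for a testing session.

    seconds_per_test is an assumption supplied by the caller; measure it
    (e.g. during the pre-test system check) before relying on the estimate.
    """
    total_tests = num_questions * repetitions * modes
    minutes = total_tests * seconds_per_test / 60 + num_breaks * break_minutes
    return total_tests, minutes


# Protocol parameters from above, with a hypothetical 15 s per test
# and three 3-minute breaks
tests, minutes = session_budget(20, 5, 2, seconds_per_test=15,
                                break_minutes=3, num_breaks=3)
print(tests, round(minutes, 1))  # 200 59.0
```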

### Important Notes:
1. **Speak clearly** into the microphone
2. **Wait for beep** before speaking
3. **Speak naturally** - don't read robotically
4. **Take breaks** - 2-3 minutes after every 10 questions
5. **Stay consistent** - try to keep a similar pace and volume across all repetitions
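Note 5 can be checked objectively by comparing the loudness of each recording across repetitions. The minimal sketch below uses only the standard library. It assumes the recordings are 16-bit PCM WAV files; the format actually written by `assistant.record_audio` is not shown in this notebook. `wav_rms_dbfs` is a hypothetical helper.

```python
import array
import math
import sys
import wave


def wav_rms_dbfs(path: str) -> float:
    """Return the RMS level of a 16-bit PCM WAV file in dBFS (0 dBFS = full scale).

    Samples from all channels are pooled together.
    """
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        samples = array.array("h", wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms / 32768) if rms > 0 else float("-inf")
```

Comparing `wav_rms_dbfs(...)` across the five repetitions of a question shows whether loudness drifted. The acceptable spread is left to the researcher. This check needs access to the recorded WAV files.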

---
## 1. Setup and Initialization

In [None]:
import sys
import os
import json
import time
from pathlib import Path
from datetime import datetime
from IPython.display import display, HTML, clear_output
import warnings
warnings.filterwarnings('ignore')

# Add project directory to path
PROJECT_DIR = Path.cwd()
sys.path.insert(0, str(PROJECT_DIR))

print("✓ Initialization complete")
print(f"✓ Project directory: {PROJECT_DIR}")
print(f"✓ Session started: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

In [None]:
# Import the enhanced assistant
import enhanced_assistant_v2 as assistant

print("✓ Enhanced Assistant v2 loaded")
print(f"✓ Knowledge base chunks: {len(assistant.kb.documents)}")

# Check knowledge base stats
kb_stats = assistant.kb.get_stats()
print("\nKnowledge Base Statistics:")
print(f"  Total chunks: {kb_stats['total_chunks']}")
print(f"  Total characters: {kb_stats['total_chars']:,}")
print(f"  Number of sources: {kb_stats['num_sources']}")
if kb_stats['sources']:
    print("  Sources:")
    for source, count in kb_stats['sources'].items():
        print(f"    - {source}: {count} chunks")
else:
    print("  ⚠️ WARNING: No knowledge base loaded!")
    print(f"  Please add documents to: {assistant.Config.KB_USER_DIR}")

In [None]:
# Verify Ollama server
print("Checking system components...\n")

if assistant.check_ollama_server():
    print("\n✅ Ollama server is ready!")
else:
    print("\n❌ ERROR: Ollama server not running!")
    print("Please start it in another terminal with: ollama serve")
    print("Then re-run this cell.")
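`assistant.check_ollama_server()` is defined in `enhanced_assistant_v2`, and its internals are not shown here. The sketch below is an equivalent standalone check using the standard library. It assumes Ollama's default address `http://localhost:11434` and queries the `/api/tags` endpoint, which lists the locally available models. `ollama_models` is a hypothetical helper.

```python
import json
import urllib.error
import urllib.request


def ollama_models(base_url: str = "http://localhost:11434", timeout: float = 3.0):
    """Return the list of local model names, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            data = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply
        return None
    return [m.get("name") for m in data.get("models", [])]


models = ollama_models()
if models is None:
    print("❌ Ollama not reachable - start it with: ollama serve")
else:
    print(f"✅ Ollama is up; local models: {models}")
```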

---
## 2. Load Test Questions

In [None]:
# Load questions from reference_answers.json
ref_answers_file = PROJECT_DIR / 'reference_answers.json'

with open(ref_answers_file, 'r') as f:
    ref_data = json.load(f)

# Extract questions
all_questions = []
question_categories = ref_data['reference_answers']

for category, qa_pairs in question_categories.items():
    for question, answer in qa_pairs.items():
        all_questions.append({
            'category': category,
            'question': question,
            'reference_answer': answer
        })

print(f"✓ Loaded {len(all_questions)} questions")
print("\nQuestions by category:")
for category in question_categories.keys():
    count = sum(1 for q in all_questions if q['category'] == category)
    print(f"  {category}: {count} questions")
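The protocol expects 20 questions across 4 categories. The loader implies that `reference_answers.json` maps each category to a `{question: answer}` object. The small sanity check below is sketched under that inferred structure and can stop the session early if the file does not match the protocol. `validate_question_set` is a hypothetical helper, and its default counts come from the protocol above.

```python
def validate_question_set(ref_data: dict, expected_categories: int = 4,
                          expected_questions: int = 20) -> list[str]:
    """Return a list of problems found in the reference-answer data (empty if OK)."""
    problems = []
    categories = ref_data.get("reference_answers", {})
    if len(categories) != expected_categories:
        problems.append(f"expected {expected_categories} categories, found {len(categories)}")
    total = sum(len(qa) for qa in categories.values())
    if total != expected_questions:
        problems.append(f"expected {expected_questions} questions, found {total}")
    # Flag questions whose reference answer is blank
    for category, qa in categories.items():
        for question, answer in qa.items():
            if not str(answer).strip():
                problems.append(f"[{category}] empty reference answer for: {question}")
    return problems


sample = {"reference_answers": {"Basics": {"What is a resistor?": "A component that limits current."}}}
print(validate_question_set(sample, expected_categories=1, expected_questions=1))  # []
```

In this notebook, `validate_question_set(ref_data)` can be called right after loading the file.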

In [None]:
# Display first 3 questions as preview
print("\n📝 Preview of questions:\n")
for i, q in enumerate(all_questions[:3], 1):
    print(f"{i}. [{q['category']}]")
    print(f"   Q: {q['question']}")
    print(f"   Reference: {q['reference_answer']}")
    print()

---
## 3. Pre-Test System Check

In [None]:
# Quick test to verify everything works
print("Running quick system test...\n")

test_question = "What is a resistor?"

# Test Non-RAG
print("Testing Non-RAG mode...")
result_nonrag = assistant.process_single_query(test_question, use_rag=False)
print(f"Response: {result_nonrag['response'][:100]}...")
print(f"Time: {result_nonrag['total_time']:.2f}s")

print("\nTesting RAG mode...")
result_rag = assistant.process_single_query(test_question, use_rag=True)
print(f"Response: {result_rag['response'][:100]}...")
print(f"Time: {result_rag['total_time']:.2f}s")
if 'rag_info' in result_rag:
    print(f"Docs retrieved: {result_rag['rag_info']['docs_retrieved']}")

print("\n✅ System test complete!")
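Ollama typically loads the model into memory on the first request. As a result, the first timing in a run can be much slower than the ones after it. If the pre-test check should also provide a latency baseline, the sketch below times a query several times and discards the warm-up runs. `time_calls` is a hypothetical helper. The commented usage assumes `assistant.process_single_query` as called above.

```python
import statistics
import time
from typing import Callable


def time_calls(fn: Callable[[], object], runs: int = 3, warmup: int = 1) -> dict:
    """Call fn warmup+runs times and summarise wall-clock times of the measured runs."""
    for _ in range(warmup):
        fn()  # not timed: may include one-off costs such as model loading
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return {
        "mean_s": statistics.mean(times),
        "stdev_s": statistics.stdev(times) if len(times) > 1 else 0.0,
        "runs": runs,
    }


# Usage in this notebook (commented out so the sketch runs on its own):
# print(time_calls(lambda: assistant.process_single_query("What is a resistor?", use_rag=False)))
```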

---
## 4. Main Testing Session

### ⚠️ IMPORTANT INSTRUCTIONS FOR PARTICIPANT:

1. **Position yourself** comfortably in front of the microphone
2. **Read the question** displayed on screen
3. **Wait for the beep** (system recording signal)
4. **Speak the question clearly** into the microphone
5. **Listen to the response** from the speaker
6. **Repeat 5 times** for the same question
7. **Move to next question**

### Testing Flow:
```
Questions 1-10: Non-RAG mode (LLM Standalone)
  ‚Üí BREAK (2-3 minutes)
Questions 11-20: Non-RAG mode continued
  ‚Üí BREAK (2-3 minutes)
Questions 1-10: RAG mode (LLM + RAG)
  ‚Üí BREAK (2-3 minutes)
Questions 11-20: RAG mode continued
```
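The flow above can also be written as an explicit schedule, which makes the break points easy to check. The minimal sketch below is illustrative only: `build_schedule` and its tuple layout are hypothetical and not part of `enhanced_assistant_v2`.

```python
def build_schedule(num_questions: int = 20, break_every: int = 10,
                   modes: tuple[str, ...] = ("Non-RAG", "RAG")) -> list[tuple]:
    """Return ordered steps: ("question", mode, q_idx) or ("break", mode, q_idx)."""
    steps = []
    for m_idx, mode in enumerate(modes):
        for q_idx in range(1, num_questions + 1):
            steps.append(("question", mode, q_idx))
            last_question = q_idx == num_questions
            last_mode = m_idx == len(modes) - 1
            # Break every `break_every` questions and between modes, but not after the final question
            if (q_idx % break_every == 0 or last_question) and not (last_question and last_mode):
                steps.append(("break", mode, q_idx))
    return steps


schedule = build_schedule()
breaks = [s for s in schedule if s[0] == "break"]
print(len(schedule), breaks)
# 43 [('break', 'Non-RAG', 10), ('break', 'Non-RAG', 20), ('break', 'RAG', 10)]
```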

In [None]:
# Configuration for testing session
NUM_REPETITIONS = 5
QUESTIONS_BEFORE_BREAK = 10

print("="*60)
print("TESTING SESSION CONFIGURATION")
print("="*60)
print(f"Total questions: {len(all_questions)}")
print(f"Repetitions per question: {NUM_REPETITIONS}")
print(f"Modes to test: 2 (Non-RAG, RAG)")
print(f"Total tests: {len(all_questions) * NUM_REPETITIONS * 2}")
print(f"Break after every: {QUESTIONS_BEFORE_BREAK} questions")
print("="*60)
print("\nReady to start?")
print("Make sure:")
print("  ✓ Microphone is connected and working")
print("  ✓ Speakers are connected and volume is good")
print("  ✓ You are in a quiet environment")
print("  ✓ Ollama server is running")
print("\nRun the next cell to start the testing session.")

In [None]:
# MAIN TESTING FUNCTION
# This will run the complete guided testing session

def run_guided_testing_session():
    """
    Run complete guided voice testing session
    """
    all_results = []
    session_start = time.time()
    
    # Prepare questions list
    questions_list = [q['question'] for q in all_questions]
    
    print("\n" + "="*60)
    print(" " * 15 + "STARTING TESTING SESSION")
    print("="*60)
    
    # Test both modes
    for mode_idx, (mode_name, use_rag) in enumerate([('Non-RAG', False), ('RAG', True)], 1):
        print("\n" + "#"*60)
        print(f"#  MODE {mode_idx}/2: {mode_name}")
        print("#"*60)
        
        # Test each question
        for q_idx, question_data in enumerate(all_questions, 1):
            question = question_data['question']
            category = question_data['category']
            
            print("\n" + "="*60)
            print(f"Question {q_idx}/{len(all_questions)} [{category}]")
            print("="*60)
            print("\n📝 QUESTION TO SPEAK:")
            print(f"\n   {question}")
            print("\n" + "-"*60)
            
            # Wait for participant to be ready
            input(f"\nPress ENTER when ready to record (this question will be recorded {NUM_REPETITIONS} times)...")
            
            # Do repetitions. Reassigning `rep` inside a `for` loop has no effect,
            # so a while loop is used: `rep` only advances after a successful test,
            # and attempts are capped so a broken microphone cannot loop forever.
            import tempfile
            rep = 0
            attempts = 0
            max_attempts = NUM_REPETITIONS * 3
            while rep < NUM_REPETITIONS and attempts < max_attempts:
                attempts += 1
                print(f"\n🎤 Repetition {rep+1}/{NUM_REPETITIONS} - Get ready...")
                time.sleep(1)
                
                # Reserve a temporary WAV path; the file is deleted in `finally`
                with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmpfile:
                    wav_path = tmpfile.name
                
                try:
                    # Record
                    print("   Recording... (speak now)")
                    rec_time = assistant.record_audio(wav_path)
                    
                    # Transcribe
                    transcription, trans_time = assistant.transcribe_audio(wav_path)
                    
                    if not transcription or len(transcription.strip()) < 2:
                        print("   ⚠️ No speech detected. Please retry...")
                        continue
                    
                    print(f"   ✓ Heard: '{transcription}'")
                    
                    # Process query
                    result = assistant.process_single_query(transcription, use_rag=use_rag)
                    
                    # Add metadata
                    result.update({
                        'question_id': f"q{q_idx}",
                        'question_original': question,
                        'category': category,
                        'repetition': rep + 1,
                        'mode': mode_name,
                        'transcription': transcription,
                        'recording_time': rec_time,
                        'transcription_time': trans_time,
                        'reference_answer': question_data['reference_answer']
                    })
                    
                    print(f"   💬 Response: {result['response'][:80]}...")
                    
                    # Speak response
                    if result.get('success'):
                        result['tts_time'] = assistant.text_to_speech(result['response'])
                    
                    all_results.append(result)
                    rep += 1
                    
                    # Small delay
                    time.sleep(1)
                    
                except Exception as e:
                    print(f"   ❌ Error: {e} - retrying")
                finally:
                    # Remove the temporary recording
                    if os.path.exists(wav_path):
                        os.remove(wav_path)
            
            if rep < NUM_REPETITIONS:
                print(f"   ⚠️ Only {rep}/{NUM_REPETITIONS} repetitions recorded for this question")
            
            # Break check
            if q_idx % QUESTIONS_BEFORE_BREAK == 0 and q_idx < len(all_questions):
                print("\n" + "="*60)
                print("⏸️  BREAK TIME - Rest for 2-3 minutes")
                print("="*60)
                input("Press ENTER when ready to continue...")
        
        # Break between modes
        if mode_idx == 1:
            print("\n" + "="*60)
            print("✅ Non-RAG mode completed!")
            print("⏸️  BREAK TIME - Rest for 2-3 minutes before RAG mode")
            print("="*60)
            input("Press ENTER when ready to continue with RAG mode...")
    
    # Session complete
    session_time = time.time() - session_start
    
    print("\n" + "="*60)
    print(" " * 15 + "🎉 SESSION COMPLETE! 🎉")
    print("="*60)
    print(f"Total time: {session_time/60:.1f} minutes")
    print(f"Total tests: {len(all_results)}")
    if all_results:
        success_rate = sum(1 for r in all_results if r.get('success')) / len(all_results) * 100
        print(f"Success rate: {success_rate:.1f}%")
    else:
        print("Success rate: n/a (no results recorded)")
    
    # Save results (create the output directory if it does not exist yet)
    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
    results_dir = Path(assistant.Config.RESULTS_DIR)
    results_dir.mkdir(parents=True, exist_ok=True)
    results_file = results_dir / f'voice_test_complete_{timestamp}.json'
    
    with open(results_file, 'w') as f:
        json.dump({
            'session_info': {
                # When the session actually started, not when it was saved
                'start_time': datetime.fromtimestamp(session_start).isoformat(),
                'duration_minutes': session_time / 60,
                'total_tests': len(all_results),
                'num_questions': len(all_questions),
                'num_repetitions': NUM_REPETITIONS
            },
            'results': all_results
        }, f, indent=2, default=str)  # default=str guards against non-JSON types in results
    
    print(f"\n✓ Results saved to: {results_file}")
    print("\nYou can now proceed to the analysis notebook!")
    
    return all_results

print("✓ Testing function ready")
print("\nTo start the testing session, run the next cell.")
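The session function needs a live microphone, so its intended retry rule is hard to test directly. The rule is that a repetition should only count once usable speech has been transcribed, as the "Please retry" message suggests. The sketch below isolates that rule with an injected callable so it can be exercised with fake data. `collect_repetitions` and its parameters are hypothetical.

```python
from typing import Callable, Optional


def collect_repetitions(record_and_transcribe: Callable[[], Optional[str]],
                        n: int, max_attempts: int) -> list[str]:
    """Collect n usable transcriptions, retrying empty ones, up to max_attempts tries."""
    collected = []
    attempts = 0
    while len(collected) < n and attempts < max_attempts:
        attempts += 1
        text = record_and_transcribe()
        if not text or len(text.strip()) < 2:
            continue  # no usable speech: retry without consuming a repetition
        collected.append(text)
    return collected


# Fake recorder with two failed captures among the attempts
fake = iter(["What is a resistor?", "", "What is a resistor?", None,
             "What is a resistor?"])
print(len(collect_repetitions(lambda: next(fake), n=3, max_attempts=10)))  # 3
```

The attempt cap matters: without it, a disconnected microphone would keep the loop asking for speech indefinitely.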

In [None]:
# START THE TESTING SESSION
# This will take approximately 20-30 minutes

print("🚀 Starting guided testing session...\n")
print("⚠️  REMINDER:")
print("  - Speak clearly into the microphone")
print("  - Wait for beep before speaking")
print("  - Take breaks when prompted")
print("  - Stay consistent in your speaking\n")

input("Press ENTER to begin...")

# Run the session
results = run_guided_testing_session()
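`run_guided_testing_session` writes its JSON file only after the last test. An interruption such as a kernel crash or a Ctrl+C interrupt therefore loses the whole session. One option is to append every result to a JSON Lines file as soon as it is produced. This is an assumption, not part of the original protocol. `append_checkpoint` and `load_checkpoint` are hypothetical helpers.

```python
import json
from pathlib import Path


def append_checkpoint(path: Path, result: dict) -> None:
    """Append one result as a single JSON line (the file is created if missing)."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(result, default=str) + "\n")


def load_checkpoint(path: Path) -> list[dict]:
    """Read back all results written by append_checkpoint."""
    if not path.exists():
        return []
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Calling `append_checkpoint(...)` right after each result is added to `all_results` keeps a partial record on disk. `load_checkpoint` can then recover it after a crash.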

---
## 5. Quick Results Preview

In [None]:
# Quick statistics from the session
if 'results' in locals():
    import pandas as pd
    
    df = pd.DataFrame(results)
    
    print("\n" + "="*60)
    print("QUICK SESSION STATISTICS")
    print("="*60)
    
    print("\n1. Overall:")
    print(f"   Total tests: {len(df)}")
    print(f"   Success rate: {df['success'].mean()*100:.1f}%")
    print(f"   Avg response time: {df['total_time'].mean():.2f}s")
    
    print("\n2. By Mode:")
    mode_stats = df.groupby('mode').agg({
        'total_time': ['mean', 'std'],
        'success': 'mean'
    }).round(3)
    print(mode_stats)
    
    print("\n3. By Category:")
    cat_stats = df.groupby('category')['total_time'].agg(['mean', 'count']).round(3)
    print(cat_stats)
    
    print("\n✓ For detailed analysis, open the analysis notebook")
else:
    print("No results available. Please run the testing session first.")
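Every question is asked in both modes, so a per-question (paired) comparison of response times is often more informative than the overall mode means. The stdlib-only sketch below assumes each result dict carries the `question_id` and `mode` fields added during the session, plus `total_time` and `success` from `process_single_query`. It is an illustration, not the analysis notebook's method. Skipping failed queries is a design choice.

```python
import statistics
from collections import defaultdict


def paired_time_differences(results: list[dict]) -> dict[str, float]:
    """Return {question_id: mean RAG time - mean Non-RAG time} for questions seen in both modes."""
    times = defaultdict(list)
    for r in results:
        # Skip failed queries, whose timings are not comparable
        if r.get("success") and r.get("total_time") is not None:
            times[(r["question_id"], r["mode"])].append(r["total_time"])
    diffs = {}
    for qid in sorted({q for q, _ in times}):
        rag, base = times.get((qid, "RAG")), times.get((qid, "Non-RAG"))
        if rag and base:
            diffs[qid] = statistics.mean(rag) - statistics.mean(base)
    return diffs


sample = [
    {"question_id": "q1", "mode": "Non-RAG", "total_time": 2.0, "success": True},
    {"question_id": "q1", "mode": "RAG", "total_time": 3.0, "success": True},
    {"question_id": "q1", "mode": "RAG", "total_time": 4.0, "success": True},
]
print(paired_time_differences(sample))  # {'q1': 1.5}
```

In this notebook, the function can be applied directly to the session results with `paired_time_differences(results)`.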

---
## ‚úÖ Session Complete!

### Next Steps:

1. **Results are saved** in `experiment_results/voice_test_complete_[timestamp].json`

2. **Open the analysis notebook** for detailed analysis:
   - Statistical comparisons
   - ROUGE/BLEU score calculations
   - Visualization generation
   - Report creation

3. **Backup your results** (recommended):
   ```bash
   tar -czf results_backup_$(date +%Y%m%d).tar.gz experiment_results/
   ```
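The ROUGE/BLEU calculations listed under Next Steps live in the analysis notebook and are not shown here. For a quick sanity check of a single response against its `reference_answer`, ROUGE-1 F1 (unigram overlap) can be sketched in a few lines. This simplified version lowercases text, splits on non-alphanumeric characters and applies no stemming. Its scores will therefore differ from packages such as `rouge-score`.

```python
import re
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall (clipped counts)."""
    cand = Counter(re.findall(r"[a-z0-9]+", candidate.lower()))
    ref = Counter(re.findall(r"[a-z0-9]+", reference.lower()))
    overlap = sum((cand & ref).values())  # each shared word counted at most min(count) times
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


print(round(rouge1_f1("a resistor limits current", "a resistor limits electric current"), 3))  # 0.889
```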

### Thank you for participating! 🙏