# 🔧 Preparation and System Setup
## LLM+RAG vs LLM Standalone - Pre-Testing Verification

**Researcher:** aRJey  
**Platform:** NVIDIA Jetson Orin Nano 8GB  
**Date:** 2025

---

## 📋 Purpose of This Notebook

This notebook is used to:
1. ✅ Verify that all system components work
2. ✅ Load and verify the knowledge base
3. ✅ Test the Ollama LLM server
4. ✅ Test ASR (Whisper) and TTS (Piper)
5. ✅ Verify the reference answers
6. ✅ Run a quick system test

**Run this notebook BEFORE the testing session to make sure everything is ready!**

---
## 1. Import Libraries and Setup

In [None]:
import sys
import os
import json
from pathlib import Path
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

print("="*60)
print("PREPARATION AND SETUP")
print("="*60)
print(f"Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print(f"Python version: {sys.version.split()[0]}")
print(f"Working directory: {os.getcwd()}")
print("="*60)

---
## 2. Load Enhanced Assistant v2

In [None]:
# Import the main program
import enhanced_assistant_v2 as assistant

print("✓ Enhanced Assistant v2.0 loaded successfully")
print("✓ Configuration loaded")
print(f"\nConfiguration Summary:")
print(f"  Device: {assistant.Config.DEVICE}")
print(f"  Whisper Model: {assistant.Config.WHISPER_MODEL}")
print(f"  Embedding Model: {assistant.Config.EMBEDDING_MODEL}")
print(f"  LLM Model: {assistant.Config.OLLAMA_MODEL}")
print(f"  RAG Top-K: {assistant.Config.RAG_TOP_K}")
print(f"  RAG Chunk Size: {assistant.Config.RAG_CHUNK_SIZE}")

---
## 3. Verify Directory Structure

In [None]:
print("\nVerifying directory structure...\n")

# Check important directories
directories_to_check = [
    ('Project Directory', assistant.Config.PROJECT_DIR),
    ('Knowledge Base Root', assistant.Config.KB_DIR),
    ('Default Knowledge', assistant.Config.KB_DEFAULT_DIR),
    ('User Documents', assistant.Config.KB_USER_DIR),
    ('Results Directory', assistant.Config.RESULTS_DIR),
    ('Logs Directory', assistant.Config.LOG_DIR),
]

all_ok = True
for name, path in directories_to_check:
    exists = path.exists()
    status = "✓" if exists else "✗"
    print(f"{status} {name}: {path}")
    if not exists:
        all_ok = False
        print(f"  WARNING: Directory does not exist!")

if all_ok:
    print("\n✅ All directories OK!")
else:
    print("\n⚠️ Some directories are missing. They will be created automatically.")

---
## 4. Check Knowledge Base Documents

In [None]:
print("\nChecking knowledge base documents...\n")

# Check default knowledge
print("📁 Default Knowledge Directory:")
default_files = list(assistant.Config.KB_DEFAULT_DIR.glob('*'))
if default_files:
    for f in default_files:
        size = f.stat().st_size / 1024  # KB
        print(f"  ✓ {f.name} ({size:.1f} KB)")
else:
    print("  ⚠️ No default knowledge files (OK - will be created later)")

print("\n📁 User Documents Directory:")
user_files = list(assistant.Config.KB_USER_DIR.glob('*'))
if user_files:
    total_size = 0
    for f in user_files:
        size = f.stat().st_size / 1024  # KB
        total_size += size
        print(f"  ✓ {f.name} ({size:.1f} KB)")
    print(f"\n  Total: {len(user_files)} files ({total_size:.1f} KB)")
else:
    print("  ⚠️ WARNING: No user documents found!")
    print(f"  Please add PDF files to: {assistant.Config.KB_USER_DIR}")
    print(f"  Example: cp ~/Downloads/*.pdf {assistant.Config.KB_USER_DIR}/")

---
## 5. Load and Verify Knowledge Base

In [None]:
print("\n" + "="*60)
print("LOADING KNOWLEDGE BASE")
print("="*60)

# Get knowledge base statistics
kb_stats = assistant.kb.get_stats()

print(f"\nKnowledge Base Statistics:")
print(f"  Total chunks: {kb_stats['total_chunks']}")
print(f"  Total characters: {kb_stats['total_chars']:,}")
print(f"  Number of sources: {kb_stats['num_sources']}")

if kb_stats['sources']:
    print(f"\n  Sources breakdown:")
    for source, count in kb_stats['sources'].items():
        print(f"    - {source}: {count} chunks")
else:
    print(f"\n  ⚠️ WARNING: No knowledge base loaded!")
    print(f"  System will work but RAG will have no documents to retrieve.")

# Evaluation
if kb_stats['total_chunks'] == 0:
    print("\n❌ Knowledge base is EMPTY!")
    print("   Action required: Add documents to user_documents folder")
elif kb_stats['total_chunks'] < 50:
    print("\n⚠️ Knowledge base is SMALL (< 50 chunks)")
    print("   Recommendation: Add more documents for better RAG performance")
elif kb_stats['total_chunks'] < 100:
    print("\n✓ Knowledge base is ADEQUATE (50-99 chunks)")
    print("   Status: OK for testing, but more documents would be better")
else:
    print(f"\n✅ Knowledge base is GOOD ({kb_stats['total_chunks']} chunks)")
    print("   Status: Ready for testing!")

---
## 6. Test Knowledge Base Search

In [None]:
if kb_stats['total_chunks'] > 0:
    print("\nTesting knowledge base search...\n")
    
    # Test queries
    test_queries = [
        "Arduino",
        "resistor",
        "sensor"
    ]
    
    for query in test_queries:
        results = assistant.kb.search(query, top_k=3)
        print(f"Query: '{query}'")
        print(f"  Found: {len(results)} documents")
        
        if results:
            for i, result in enumerate(results, 1):
                similarity = result['similarity']
                text_preview = result['text'][:60] + "..."
                print(f"    {i}. Similarity: {similarity:.3f} - {text_preview}")
        else:
            print(f"    No results found")
        print()
    
    print("✅ Knowledge base search is working!")
else:
    print("⚠️ Skipping search test (no knowledge base loaded)")

---
## 7. Verify Ollama LLM Server

In [None]:
print("\n" + "="*60)
print("CHECKING OLLAMA LLM SERVER")
print("="*60)

if assistant.check_ollama_server():
    print("\n✅ Ollama server is READY!")
    print(f"   Using model: {assistant.Config.OLLAMA_MODEL}")
else:
    print("\n❌ Ollama server is NOT running!")
    print("\nAction required:")
    print("  1. Open a new terminal")
    print("  2. Run: ollama serve")
    print("  3. Keep that terminal running")
    print("  4. Re-run this cell")
    print("\nOr check if model is available:")
    print(f"  ollama list | grep {assistant.Config.OLLAMA_MODEL}")
    print(f"  ollama pull {assistant.Config.OLLAMA_MODEL}  # if not found")

---
## 8. Test LLM (Quick Query)

In [None]:
print("\nTesting LLM with quick query...\n")

test_query = "What is a resistor? Answer in one sentence."

try:
    print(f"Query: {test_query}")
    response, llm_time = assistant.query_llm(test_query)
    
    print(f"\nResponse: {response}")
    print(f"Time: {llm_time:.2f}s")
    
    if response and not response.startswith("Error"):
        print("\n✅ LLM is working correctly!")
    else:
        print("\n⚠️ LLM returned an error")
        
except Exception as e:
    print(f"\n❌ LLM test failed: {e}")
    print("Check if Ollama server is running.")

---
## 9. Test RAG System

In [None]:
if kb_stats['total_chunks'] > 0:
    print("\nTesting RAG system...\n")
    
    test_query = "What is PWM?"
    
    try:
        print(f"Query: {test_query}")
        response, llm_time, rag_info = assistant.query_with_rag(test_query)
        
        print(f"\nRAG Info:")
        print(f"  Documents retrieved: {rag_info['docs_retrieved']}")
        print(f"  Average similarity: {rag_info['avg_similarity']:.3f}")
        
        print(f"\nResponse: {response}")
        print(f"Time: {llm_time:.2f}s")
        
        if rag_info['docs_retrieved'] > 0:
            print("\n✅ RAG system is working correctly!")
        else:
            print("\n⚠️ RAG retrieved no documents (similarity too low)")
            
    except Exception as e:
        print(f"\n❌ RAG test failed: {e}")
else:
    print("⚠️ Skipping RAG test (no knowledge base loaded)")

---
## 10. Verify Whisper ASR

In [None]:
print("\n" + "="*60)
print("CHECKING WHISPER ASR")
print("="*60)

try:
    # Check if Whisper model is loaded
    if hasattr(assistant, 'whisper_model'):
        print("\n✅ Whisper ASR model loaded")
        print(f"   Model: {assistant.Config.WHISPER_MODEL}")
        print(f"   Device: {assistant.Config.DEVICE}")
    else:
        print("\n❌ Whisper model not loaded")
        
except Exception as e:
    print(f"\n❌ Error checking Whisper: {e}")

print("\nNote: Actual audio recording will be tested in voice testing session.")

---
## 11. Verify Piper TTS

In [None]:
print("\n" + "="*60)
print("CHECKING PIPER TTS")
print("="*60)

# Check Piper binary
piper_bin = Path(assistant.Config.PIPER_BIN)
piper_model = Path(assistant.Config.PIPER_MODEL)

print(f"\nPiper binary: {piper_bin}")
if piper_bin.exists():
    print("  ✓ Found")
else:
    print("  ✗ NOT FOUND")

print(f"\nPiper model: {piper_model}")
if piper_model.exists():
    size = piper_model.stat().st_size / (1024*1024)  # MB
    print(f"  ✓ Found ({size:.1f} MB)")
else:
    print("  ✗ NOT FOUND")

if piper_bin.exists() and piper_model.exists():
    print("\n✅ Piper TTS is available!")
else:
    print("\n❌ Piper TTS not properly installed")
    print("   Refer to the tutorial for installation instructions")

---
## 12. Check Audio Files

In [None]:
print("\nChecking audio notification files...\n")

audio_files = [
    ('Beep Start', assistant.Config.BIP_SOUND),
    ('Beep End', assistant.Config.BIP2_SOUND)
]

all_found = True
for name, path in audio_files:
    if path.exists():
        size = path.stat().st_size / 1024  # KB
        print(f"✓ {name}: {path.name} ({size:.1f} KB)")
    else:
        print(f"✗ {name}: NOT FOUND at {path}")
        all_found = False

if all_found:
    print("\n✅ All audio files present")
else:
    print("\n⚠️ Some audio files missing (not critical, but nice to have)")

---
## 13. Load and Verify Reference Answers

In [None]:
print("\n" + "="*60)
print("CHECKING REFERENCE ANSWERS")
print("="*60)

ref_file = Path('reference_answers.json')

if ref_file.exists():
    with open(ref_file, 'r', encoding='utf-8') as f:  # explicit encoding for non-ASCII answers
        ref_data = json.load(f)
    
    ref_answers = ref_data['reference_answers']
    
    print(f"\n✓ Reference answers file found")
    print(f"\nCategories and question counts:")
    
    total_questions = 0
    for category, qa_pairs in ref_answers.items():
        count = len(qa_pairs)
        total_questions += count
        print(f"  {category}: {count} questions")
    
    print(f"\nTotal questions: {total_questions}")
    
    # Show a sample (guarded so an empty file does not raise IndexError)
    first_category = next(iter(ref_answers), None)
    if first_category and ref_answers[first_category]:
        first_q, first_a = next(iter(ref_answers[first_category].items()))
        print(f"\nSample question and answer:")
        print(f"  Q: {first_q}")
        print(f"  A: {first_a}")
    
    print("\n✅ Reference answers loaded successfully!")
    
else:
    print(f"\n❌ Reference answers file NOT FOUND: {ref_file}")
    print("   This file is required for evaluation!")
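
The cell above reads `reference_answers.json` as a top-level `reference_answers` object that maps each category to a dictionary of question/answer pairs. Under that assumption, a structure check run before the evaluation could look roughly like the sketch below. The helper name `validate_reference_answers` and the sample data are hypothetical and are not part of `enhanced_assistant_v2`.

```python
import json


def validate_reference_answers(ref_data):
    """Return a list of problems found in a parsed reference-answers dict.

    Assumed layout (inferred from how this notebook reads the file):
    {"reference_answers": {category: {question: answer, ...}, ...}}
    """
    problems = []
    ref_answers = ref_data.get("reference_answers") if isinstance(ref_data, dict) else None
    if not isinstance(ref_answers, dict):
        return ["missing or invalid top-level 'reference_answers' object"]
    if not ref_answers:
        problems.append("no categories defined")
    for category, qa_pairs in ref_answers.items():
        if not isinstance(qa_pairs, dict) or not qa_pairs:
            problems.append(f"category '{category}' has no question/answer pairs")
            continue
        for question, answer in qa_pairs.items():
            if not isinstance(answer, str) or not answer.strip():
                problems.append(f"empty answer for '{question}' in '{category}'")
    return problems


# Hypothetical sample data, for illustration only
sample = json.loads('{"reference_answers": {"basics": {"What is a resistor?": ""}}}')
print(validate_reference_answers(sample))
```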

---
## 14. Complete System Test

In [None]:
print("\n" + "="*60)
print("COMPLETE SYSTEM TEST")
print("="*60)

print("\nTesting both Non-RAG and RAG modes...\n")

test_question = "What is the function of a capacitor?"

# Test Non-RAG
print("--- Non-RAG Mode ---")
result_nonrag = assistant.process_single_query(test_question, use_rag=False)
print(f"Query: {test_question}")
print(f"Response: {result_nonrag['response'][:150]}...")
print(f"Time: {result_nonrag['total_time']:.2f}s")
print(f"Success: {result_nonrag['success']}")

print("\n--- RAG Mode ---")
if kb_stats['total_chunks'] > 0:
    result_rag = assistant.process_single_query(test_question, use_rag=True)
    print(f"Query: {test_question}")
    print(f"Response: {result_rag['response'][:150]}...")
    print(f"Time: {result_rag['total_time']:.2f}s")
    print(f"Success: {result_rag['success']}")
    if 'rag_info' in result_rag:
        print(f"Docs retrieved: {result_rag['rag_info']['docs_retrieved']}")
else:
    print("Skipped (no knowledge base)")

print("\n" + "="*60)
print("✅ SYSTEM TEST COMPLETE")
print("="*60)

---
## 15. Final System Report

In [None]:
print("\n" + "#"*60)
print("#" + "SYSTEM READINESS REPORT".center(58) + "#")  # 60 columns, matching the border
print("#"*60)

# Collect status
checks = {
    'Directories': all_ok,
    'Knowledge Base': kb_stats['total_chunks'] >= 50,
    'Ollama Server': assistant.check_ollama_server(),
    'Whisper ASR': hasattr(assistant, 'whisper_model'),
    'Piper TTS': piper_bin.exists() and piper_model.exists(),
    'Reference Answers': ref_file.exists(),
}

print("\nComponent Status:")
for component, status in checks.items():
    symbol = "✅" if status else "❌"
    status_text = "READY" if status else "NOT READY"
    print(f"  {symbol} {component}: {status_text}")

# Overall readiness
critical_checks = ['Ollama Server', 'Whisper ASR', 'Reference Answers']
critical_ok = all(checks[c] for c in critical_checks)

print("\n" + "-"*60)

if all(checks.values()):
    print("\n🎉 SYSTEM IS FULLY READY FOR TESTING! 🎉")
    print("\nYou can proceed to: voice_testing_session.ipynb")
elif critical_ok:
    print("\n✅ SYSTEM IS READY (with minor issues)")
    print("\nCritical components are OK. Non-critical issues:")
    for comp, status in checks.items():
        if not status and comp not in critical_checks:
            print(f"  - {comp}")
    print("\nYou can proceed, but consider fixing the issues above.")
else:
    print("\n❌ SYSTEM IS NOT READY")
    print("\nCritical issues that must be fixed:")
    for comp in critical_checks:
        if not checks[comp]:
            print(f"  - {comp}")
    print("\nPlease fix these issues before proceeding to testing.")

print("\n" + "#"*60)
print(f"\nReport generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

---
## 📝 Notes and Recommendations

### If Knowledge Base is Empty:
```bash
# Add documents to user_documents folder
cd ~/voice_assistant_research
cp ~/Downloads/*.pdf knowledge_base/user_documents/
```
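
After copying, it can help to confirm that the PDF files actually landed in the folder before re-running the knowledge base cells. The sketch below only lists files and sizes; the helper name `list_documents` is hypothetical, and the path is the one used in the shell commands above.

```python
from pathlib import Path


def list_documents(folder, pattern="*.pdf"):
    """Return (file name, size in KB) for each matching file, sorted by name."""
    folder = Path(folder).expanduser()
    if not folder.is_dir():
        return []
    return [(f.name, f.stat().st_size / 1024)
            for f in sorted(folder.glob(pattern)) if f.is_file()]


# Path taken from the shell commands above
docs = list_documents("~/voice_assistant_research/knowledge_base/user_documents")
print(f"{len(docs)} PDF file(s) found")
for name, size_kb in docs:
    print(f"  {name} ({size_kb:.1f} KB)")
```

An empty result means either the folder does not exist or it contains no files matching the pattern; the pattern match is case-sensitive on Linux, so `*.PDF` files would need a second call.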

### If Ollama is Not Running:
```bash
# Terminal 1 (keep running)
ollama serve
```
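
The notebook does not show how `assistant.check_ollama_server()` works internally, so the following is an independent sketch rather than the project's implementation. It assumes Ollama is listening on its default address, `http://localhost:11434`, and queries the `/api/tags` endpoint, which lists the locally pulled models. The function name `ollama_models` is hypothetical.

```python
import json
import urllib.error
import urllib.request


def ollama_models(base_url="http://localhost:11434", timeout=3.0):
    """Return the names of locally available models, or None if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a response that is not valid JSON
        return None
    return [m.get("name", "") for m in payload.get("models", [])]


models = ollama_models()
if models is None:
    print("Ollama server is not reachable; start it with: ollama serve")
else:
    print(f"Ollama is running with {len(models)} model(s): {models}")
```

If the server responds but the configured model is missing from the list, `ollama pull` with the model name is the likely fix, as the cell in section 7 suggests.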

### Knowledge Base Recommendations:
- **Minimum:** 3-5 PDF files (50-100 chunks)
- **Ideal:** 5-10 PDF files (100-300 chunks)
- **Topics:** Arduino, sensors, components, protocols, microcontrollers
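
These recommendations line up with the chunk-count thresholds printed in section 5. As a compact restatement, the tiers can be sketched as a small function; the name `classify_kb_size` is hypothetical, while the thresholds (0, 50, 100) are the ones the notebook already uses.

```python
def classify_kb_size(total_chunks):
    """Map a chunk count to the tier labels used in section 5."""
    if total_chunks == 0:
        return "EMPTY"
    if total_chunks < 50:
        return "SMALL"
    if total_chunks < 100:
        return "ADEQUATE"
    return "GOOD"


for n in (0, 30, 75, 250):
    print(f"{n:>4} chunks -> {classify_kb_size(n)}")
```

The final readiness report marks the knowledge base as READY from 50 chunks upward, which corresponds to the ADEQUATE and GOOD tiers here.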

---

## ✅ Next Steps

After this notebook shows "SYSTEM IS READY":

1. **Open:** `voice_testing_session.ipynb`
2. **Run:** Complete guided testing session
3. **Duration:** 20-30 minutes for full testing
4. **Participant:** Be ready to speak clearly into the microphone

---

**Good luck with your testing! 🚀**