# MCP-Powered Voice Agent with DeepSeek

## 🎯 Project Overview

This notebook implements a comprehensive voice-enabled AI agent that integrates:
- **MCP (Model Context Protocol)**: Structured AI model interactions with context management
- **DeepSeek Language Model**: Advanced natural language understanding and generation
- **Speech Recognition**: Real-time voice input processing
- **Text-to-Speech**: Voice output for AI responses
- **Interactive Interface**: User-friendly Jupyter widget controls

## 🏗️ System Architecture

```
Voice Input → Speech Recognition → MCP Request (adds context) → DeepSeek Model → MCP Response (updates context) → Text-to-Speech → Voice Output
```
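The flow above is a chain of stages, each consuming the previous stage's output. The sketch below shows that shape with hypothetical stand-in functions (`fake_recognize`, `fake_model`, `fake_tts`) in place of the real recognizer, DeepSeek call, and TTS engine; it omits the MCP bookkeeping and all audio or network I/O.

```python
from typing import Callable, List

# Each stage takes the previous stage's output and returns its own.
Stage = Callable[[object], object]

def run_pipeline(stages: List[Stage], payload: object) -> object:
    """Feed payload through each stage in order and return the final output."""
    for stage in stages:
        payload = stage(payload)
    return payload

# Hypothetical stand-ins for the real components (no audio or network I/O).
def fake_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")      # pretend the "audio" bytes are already text

def fake_model(text: str) -> str:
    return f"You said: {text}"        # stands in for the DeepSeek call

def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")       # stands in for pyttsx3 synthesis

result = run_pipeline([fake_recognize, str.strip, fake_model, fake_tts], b" hello ")
print(result)  # b'You said: hello'
```

Keeping stages as plain callables makes each one replaceable and testable in isolation, which is useful when the microphone or API is unavailable.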

### Key Components:
1. **Audio Processing Pipeline**: Handles microphone input and speaker output
2. **MCP Protocol Layer**: Manages conversation context and structured communication
3. **DeepSeek Integration**: Provides intelligent language processing
4. **Interactive Controls**: Jupyter widgets for user interaction
5. **Health Monitoring**: System diagnostics and performance tracking

## 📋 Prerequisites

- Python 3.8+ environment
- Microphone access for voice input
- Audio output capabilities (speakers/headphones)
- Internet connection for model downloads and API access
- DeepSeek API key (read from the `DEEPSEEK_API_KEY` environment variable in the configuration section)


# 1. Environment Setup and Dependencies

This section installs and configures all necessary dependencies for the voice agent.
Each dependency serves a specific purpose in our voice processing pipeline.

## 1.1 Interactive Widget Setup

Jupyter widgets provide the interactive user interface for our voice agent.
These widgets enable buttons, progress bars, and real-time status displays.

In [None]:
# Install and configure Jupyter interactive widgets
# 
# ipywidgets: Core library for interactive HTML widgets in Jupyter
# - Provides buttons, sliders, text inputs, and output displays
# - Essential for creating user-friendly voice agent controls
# - Enables real-time status updates and interaction feedback

!pip install --upgrade ipywidgets

# Enable widget extensions in Jupyter environment
# Note: this step only matters for the classic Jupyter Notebook (< 7). JupyterLab, Notebook 7,
# and recent ipywidgets releases register the widget extension automatically, so a
# 'not found' message here is harmless and widgets will still render.
!jupyter nbextension enable --py widgetsnbextension

print("✅ Jupyter widgets setup completed")
print("📝 Note: 'jupyter-nbextension not found' messages are normal in some environments")
print("🎛️ Interactive controls will be available for voice agent operation")

## 1.2 Machine Learning and AI Dependencies

These libraries form the core of our AI processing capabilities:

- **Transformers**: Hugging Face library providing access to pre-trained language models
- **PyTorch**: Deep learning framework that powers most modern NLP models
- **Local model support**: Together these libraries can load and run open-weight DeepSeek checkpoints locally; calls to the hosted DeepSeek API do not depend on them
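If you do load a model locally, weight memory is roughly parameter count times bytes per parameter, which is why the imports cell brings in `BitsAndBytesConfig` for quantization. The sketch below does that arithmetic for a hypothetical 7-billion-parameter model; it counts weights only, so activations, the KV cache, and framework overhead come on top.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights only, in decimal gigabytes."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 1e9

# Illustrative 7-billion-parameter model (size chosen for the example only).
params = 7e9
for label, bits in [("fp32", 32), ("fp16/bf16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{label:>10}: {weight_memory_gb(params, bits):.1f} GB")
```

Comparing these figures with the `MAX_MEMORY_MB` setting in the configuration section gives a quick sense of whether local inference is realistic on a given machine.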

In [None]:
# Install core AI and machine learning dependencies

# transformers: Hugging Face Transformers library
# - Provides access to thousands of pre-trained language models
# - Includes automatic tokenizers, model architectures, and utilities
# - Supports DeepSeek and other state-of-the-art language models
# - Handles model loading, tokenization, and inference pipelines

# torch: PyTorch deep learning framework
# - Provides tensor operations and neural network building blocks
# - Required backend for transformer model computations
# - Enables GPU acceleration when available
# - Handles automatic differentiation and optimization

!pip install transformers torch

print("✅ Machine Learning dependencies installed successfully")
print("🤖 Transformer models and PyTorch backend ready")
print("🧠 DeepSeek integration capabilities enabled")
print("⚡ GPU acceleration available if CUDA is detected")

## 1.3 Audio Processing Dependencies

These libraries handle the complete audio pipeline for voice interaction:

- **SpeechRecognition**: Converts speech to text using multiple engines
- **PyAudio**: Low-level audio I/O for recording and playback
- **pyttsx3**: Text-to-speech conversion with multiple voice engines
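The capture settings used later in the configuration (16 kHz, mono, 16-bit samples, 1024-frame chunks, 5-second recordings) determine buffer latency and size. The constants below are copied from that section for illustration; the arithmetic shows why a 1024-frame chunk corresponds to 64 ms of audio.

```python
import math

SAMPLE_RATE = 16000      # Hz, mirrors VoiceAgentConfig.SAMPLE_RATE
CHUNK_SIZE = 1024        # frames per read, mirrors VoiceAgentConfig.CHUNK_SIZE
CHANNELS = 1             # mono
BYTES_PER_SAMPLE = 2     # 16-bit PCM (pyaudio.paInt16)
RECORD_SECONDS = 5       # mirrors VoiceAgentConfig.RECORD_SECONDS

# Duration and size of one buffer read
chunk_ms = CHUNK_SIZE / SAMPLE_RATE * 1000
chunk_bytes = CHUNK_SIZE * CHANNELS * BYTES_PER_SAMPLE

# Reads needed to cover a full recording (last chunk may be partial)
reads_per_recording = math.ceil(SAMPLE_RATE * RECORD_SECONDS / CHUNK_SIZE)
recording_bytes = SAMPLE_RATE * RECORD_SECONDS * CHANNELS * BYTES_PER_SAMPLE

print(f"{chunk_ms:.0f} ms per chunk, {chunk_bytes} bytes per chunk")
print(f"{reads_per_recording} reads, {recording_bytes} bytes for {RECORD_SECONDS} s")
```

Doubling `CHUNK_SIZE` halves the number of reads but doubles the delay before each chunk is available; it does not change audio quality, which is set by the sample rate and sample width.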

In [None]:
# Install comprehensive audio processing libraries

# SpeechRecognition: Advanced speech-to-text library
# - Supports multiple recognition engines: Google, Sphinx, Wit.ai, etc.
# - Handles microphone input and audio file processing
# - Calibrates its energy threshold to ambient noise (Recognizer.adjust_for_ambient_noise)
# - Supports multiple languages and locales

# pyaudio: Cross-platform audio I/O library
# - Provides low-level access to audio hardware
# - Enables real-time audio recording from microphone
# - Supports audio playback through system speakers
# - Required for live audio streaming and processing

# pyttsx3: Text-to-speech synthesis library
# - Works offline with system TTS engines
# - Cross-platform: Windows SAPI, macOS NSSpeechSynthesizer, Linux espeak
# - Configurable voice properties: rate, volume, voice selection
# - No internet connection required for speech synthesis

!pip install SpeechRecognition pyaudio pyttsx3

print("✅ Audio processing libraries installed successfully")
print("🎤 Speech recognition engine ready")
print("🔊 Text-to-speech synthesis ready")
print("🎚️ Audio I/O capabilities enabled")
print("⚠️  Note: If PyAudio fails to build, install the PortAudio headers first (e.g. portaudio19-dev on Debian/Ubuntu, 'brew install portaudio' on macOS)")

## 1.4 API and Communication Dependencies

These libraries enable communication with external services and APIs:

- **Requests**: HTTP client for API communications
- **OpenAI**: Client library compatible with DeepSeek's API format
- **aiohttp**: Asynchronous HTTP client for non-blocking operations
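Because DeepSeek exposes an OpenAI-compatible chat-completions endpoint, all three clients ultimately send the same JSON body. The sketch below assembles that body using only the standard library; `build_chat_payload` is a hypothetical helper, and its defaults mirror the model settings in the configuration section.

```python
import json
from typing import Dict, List

def build_chat_payload(messages: List[Dict[str, str]], model: str = "deepseek-chat",
                       max_tokens: int = 1000, temperature: float = 0.7,
                       top_p: float = 0.9) -> Dict:
    """Assemble an OpenAI-style chat-completions request body (no network I/O)."""
    return {
        "model": model,
        "messages": messages,        # ordered list of {"role", "content"} dicts
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "stream": False,             # set True to receive incremental chunks
    }

payload = build_chat_payload([
    {"role": "system", "content": "You are a concise voice assistant."},
    {"role": "user", "content": "What's the weather like on Mars?"},
])
print(json.dumps(payload, indent=2))
```

With the `openai` package the same fields can be passed as keyword arguments, e.g. `OpenAI(api_key=..., base_url=config.DEEPSEEK_BASE_URL).chat.completions.create(**payload)`; with `requests` the body is POSTed as JSON to `{base_url}/chat/completions` with an `Authorization: Bearer <key>` header.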

In [None]:
# Install API communication and networking libraries

# requests: HTTP requests library for API communication
# - Simple, elegant HTTP client for Python
# - Handles authentication, headers, and request formatting
# - Used for synchronous API calls to DeepSeek services
# - Includes built-in JSON handling and error management

# openai: OpenAI-compatible API client
# - Provides standardized interface for AI model APIs
# - DeepSeek API is compatible with OpenAI format
# - Handles authentication, automatic retries on transient errors, and typed exceptions
# - Supports streaming responses and async operations

# aiohttp: Asynchronous HTTP client/server framework
# - Enables non-blocking HTTP operations
# - Useful for concurrent API calls and streaming
# - Improves responsiveness during voice processing
# - Supports WebSocket connections for real-time communication

!pip install requests openai aiohttp

print("✅ API communication libraries installed successfully")
print("🌐 HTTP client capabilities ready")
print("🔗 DeepSeek API integration enabled")
print("⚡ Asynchronous operations supported")
print("🔐 Authentication and security features available")
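The configuration section defines a `FALLBACK_MODEL` to use "if primary fails". One way to wire that up is to try model names in order and return the first successful reply. In this sketch, `call_with_fallback` and `fake_call` are hypothetical; `fake_call` simulates an outage of the primary model so the example runs without network access.

```python
import logging
from typing import Callable, Sequence

logger = logging.getLogger("voice_agent.fallback")

def call_with_fallback(call_model: Callable[[str, str], str], prompt: str,
                       models: Sequence[str]) -> str:
    """Try each model name in order; return the first successful reply."""
    last_error = None
    for model in models:
        try:
            return call_model(model, prompt)
        except Exception as exc:  # real code should catch the client's specific errors
            logger.warning("Model %s failed: %s", model, exc)
            last_error = exc
    raise RuntimeError(f"All models failed: {list(models)}") from last_error

# Fake backend: the primary model is "down", the fallback answers.
def fake_call(model: str, prompt: str) -> str:
    if model == "deepseek-chat":
        raise ConnectionError("simulated outage")
    return f"[{model}] {prompt}"

print(call_with_fallback(fake_call, "hello", ["deepseek-chat", "deepseek-coder"]))
```

Passing the backend in as a callable keeps the retry policy independent of whether `openai`, `requests`, or `aiohttp` performs the actual HTTP call.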

# 2. Library Imports and System Initialization

This section imports all required libraries and initializes core system components.
Each import group serves a specific functional area of the voice agent.

In [None]:
# ============================================================================
# CORE PYTHON LIBRARIES
# ============================================================================

import os                    # Operating system interface for environment variables
import json                  # JSON data handling for API requests/responses
import time                  # Time operations for delays and timestamps
import threading             # Multi-threading for concurrent operations
import asyncio               # Asynchronous programming support
import logging               # Logging system for debugging and monitoring
from typing import Dict, List, Optional, Any, Union  # Type hints for better code clarity
from datetime import datetime, timedelta             # Date/time operations

# ============================================================================
# AUDIO PROCESSING LIBRARIES
# ============================================================================

import speech_recognition as sr  # Speech-to-text conversion engine
import pyttsx3                   # Text-to-speech synthesis engine
import pyaudio                   # Low-level audio input/output operations

# ============================================================================
# MACHINE LEARNING AND NLP LIBRARIES
# ============================================================================

import torch                     # PyTorch deep learning framework
from transformers import (
    AutoTokenizer,               # Automatic tokenizer selection for models
    AutoModelForCausalLM,        # Causal language model architecture (GPT-style)
    pipeline,                    # High-level model interface for common tasks
    BitsAndBytesConfig          # Quantization configuration for memory efficiency
)

# ============================================================================
# API AND NETWORKING LIBRARIES
# ============================================================================

import requests                  # HTTP requests for API communication
from openai import OpenAI        # OpenAI-compatible client for DeepSeek API
import aiohttp                   # Asynchronous HTTP client

# ============================================================================
# JUPYTER NOTEBOOK LIBRARIES
# ============================================================================

import ipywidgets as widgets     # Interactive HTML widgets for Jupyter
from IPython.display import (
    display,                     # Display objects in notebook cells
    clear_output,                # Clear cell output programmatically
    HTML,                        # Render HTML content
    Audio,                       # Audio playback widget
    Javascript                   # Execute JavaScript in browser
)

# ============================================================================
# LOGGING CONFIGURATION
# ============================================================================

# Configure comprehensive logging system
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.StreamHandler(),  # Console output
        logging.FileHandler('voice_agent.log', mode='a')  # File logging
    ]
)

# Create logger instance for this module
logger = logging.getLogger(__name__)

# ============================================================================
# SYSTEM COMPATIBILITY CHECKS
# ============================================================================

import sys            # Python runtime information
import transformers   # Imported here only to report its version

print("✅ All libraries imported successfully")
print(f"🐍 Python version: {sys.version}")
print(f"🔥 PyTorch version: {torch.__version__}")
print(f"🤗 Transformers version: {transformers.__version__}")
print(f"🎮 CUDA available: {torch.cuda.is_available()}")
print("📦 Core components ready for initialization")

logger.info("Voice Agent initialization started")

# 3. Configuration Management System

This section defines a comprehensive configuration system that centralizes all settings
for the voice agent. This approach makes the system easily configurable and maintainable.

In [None]:
class VoiceAgentConfig:
    """
    Comprehensive configuration class for the Voice Agent system.
    
    This class centralizes all configuration parameters, making it easy to:
    - Adjust system behavior without modifying code throughout the notebook
    - Maintain consistent settings across all components
    - Enable easy deployment with different configurations
    - Provide clear documentation for each parameter
    """
    
    # ========================================================================
    # API AND MODEL CONFIGURATION
    # ========================================================================
    
    # DeepSeek API Configuration
    DEEPSEEK_API_KEY = os.getenv("DEEPSEEK_API_KEY", "your-deepseek-api-key-here")
    DEEPSEEK_BASE_URL = "https://api.deepseek.com/v1"  # Official DeepSeek API endpoint
    
    # Model Selection and Parameters
    MODEL_NAME = "deepseek-chat"         # Primary model for conversation
    FALLBACK_MODEL = "deepseek-coder"    # Fallback model if primary fails (verify against DeepSeek's current model list)
    MAX_TOKENS = 1000                    # Maximum tokens in model response
    TEMPERATURE = 0.7                    # Sampling temperature (API accepts 0.0-2.0; lower = more deterministic)
    TOP_P = 0.9                         # Nucleus sampling parameter
    FREQUENCY_PENALTY = 0.0             # Penalizes tokens in proportion to how often they have appeared
    PRESENCE_PENALTY = 0.0              # Penalizes tokens that have appeared at all (encourages new topics)
    
    # ========================================================================
    # AUDIO PROCESSING CONFIGURATION
    # ========================================================================
    
    # Audio Input/Output Settings
    SAMPLE_RATE = 16000                 # Audio sample rate in Hz (16kHz standard for speech)
    CHUNK_SIZE = 1024                   # Frames per buffer read (64 ms at 16 kHz; larger = fewer reads, more latency)
    AUDIO_FORMAT = pyaudio.paInt16      # 16-bit integer audio format
    CHANNELS = 1                        # Mono audio (1 channel)
    RECORD_SECONDS = 5                  # Maximum duration of a single recording
    
    # Speech Recognition Settings
    RECOGNITION_TIMEOUT = 5             # Timeout for speech recognition (seconds)
    PHRASE_TIMEOUT = 1                  # Pause detection timeout between phrases
    ENERGY_THRESHOLD = 300              # Minimum audio energy for speech detection
    DYNAMIC_ENERGY_THRESHOLD = True     # Automatically adjust energy threshold
    RECOGNITION_LANGUAGE = "en-US"      # Primary language for speech recognition
    FALLBACK_LANGUAGES = ["en-GB", "en-AU"]  # Alternative languages to try
    
    # Text-to-Speech Configuration
    TTS_RATE = 200                      # Speech rate in words per minute
    TTS_VOLUME = 0.9                    # Speech volume (0.0 to 1.0)
    TTS_VOICE_INDEX = 0                 # Voice selection index (system dependent)
    TTS_ENGINE = "default"              # TTS engine preference
    
    # ========================================================================
    # MCP PROTOCOL CONFIGURATION
    # ========================================================================
    
    # Model Context Protocol Settings
    MCP_VERSION = "1.0"                 # MCP protocol version
    MCP_TIMEOUT = 30                    # MCP operation timeout (seconds)
    CONTEXT_WINDOW_SIZE = 10            # Number of previous interactions to remember
    MAX_CONTEXT_TOKENS = 2048           # Maximum tokens for context history
    CONTEXT_COMPRESSION_RATIO = 0.7     # Compress context when approaching limit
    
    # ========================================================================
    # SYSTEM PERFORMANCE CONFIGURATION
    # ========================================================================
    
    # Threading and Concurrency
    MAX_WORKER_THREADS = 4              # Maximum concurrent processing threads
    ASYNC_TIMEOUT = 10                  # Timeout for async operations
    
    # Memory Management
    MAX_MEMORY_MB = 1024                # Maximum memory usage in MB
    GARBAGE_COLLECTION_INTERVAL = 100   # GC interval in operations
    
    # Caching Configuration
    ENABLE_MODEL_CACHE = True           # Cache loaded models
    CACHE_EXPIRY_MINUTES = 60           # Cache expiration time
    
    # ========================================================================
    # USER INTERFACE CONFIGURATION
    # ========================================================================
    
    # Widget Display Settings
    WIDGET_WIDTH = "300px"              # Default widget width
    WIDGET_HEIGHT = "40px"              # Default widget height
    UPDATE_INTERVAL_MS = 100            # UI update interval in milliseconds
    
    # Status and Feedback
    SHOW_DEBUG_INFO = True              # Display debug information
    ENABLE_PROGRESS_BARS = True         # Show progress indicators
    VOICE_FEEDBACK = True               # Provide audio feedback for actions
    
    # ========================================================================
    # LOGGING AND MONITORING
    # ========================================================================
    
    # Logging Configuration
    LOG_LEVEL = "INFO"                  # Logging level (DEBUG, INFO, WARNING, ERROR)
    LOG_FILE = "voice_agent.log"        # Log file name
    MAX_LOG_SIZE_MB = 10                # Maximum log file size
    LOG_BACKUP_COUNT = 3                # Number of backup log files
    
    # Performance Monitoring
    ENABLE_METRICS = True               # Collect performance metrics
    METRICS_INTERVAL = 60               # Metrics collection interval (seconds)
    
    @classmethod
    def validate_config(cls) -> List[str]:
        """
        Validate configuration settings and return list of issues.
        
        Returns:
            List of configuration validation warnings/errors
        """
        issues = []
        
        # API Key Validation
        if cls.DEEPSEEK_API_KEY == "your-deepseek-api-key-here":
            issues.append("⚠️  DeepSeek API key not configured")
        
        # Audio Settings Validation
        if cls.SAMPLE_RATE not in [8000, 16000, 44100, 48000]:
            issues.append(f"⚠️  Unusual sample rate: {cls.SAMPLE_RATE}Hz")
        
        # Sampling validation (DeepSeek's OpenAI-compatible API accepts 0.0-2.0)
        if cls.TEMPERATURE < 0.0 or cls.TEMPERATURE > 2.0:
            issues.append(f"⚠️  Temperature should be 0.0-2.0, got {cls.TEMPERATURE}")
        
        # Memory Validation
        if cls.MAX_MEMORY_MB < 512:
            issues.append(f"⚠️  Low memory limit: {cls.MAX_MEMORY_MB}MB")
        
        return issues
    
    @classmethod
    def get_summary(cls) -> str:
        """
        Get a formatted summary of current configuration.
        
        Returns:
            Formatted configuration summary string
        """
        return f"""
📋 Voice Agent Configuration Summary
{'='*50}
🤖 Model: {cls.MODEL_NAME} (temp: {cls.TEMPERATURE})
🎤 Audio: {cls.SAMPLE_RATE}Hz, {cls.CHANNELS} channel(s)
🗣️  TTS: {cls.TTS_RATE}WPM, volume {cls.TTS_VOLUME}
🧠 Context: {cls.CONTEXT_WINDOW_SIZE} interactions, {cls.MAX_CONTEXT_TOKENS} tokens
⚡ Performance: {cls.MAX_WORKER_THREADS} threads, {cls.MAX_MEMORY_MB}MB limit
📊 Monitoring: {'Enabled' if cls.ENABLE_METRICS else 'Disabled'}
        """.strip()

# Initialize configuration and validate settings
config = VoiceAgentConfig()
validation_issues = config.validate_config()

print("✅ Configuration system initialized")
print(config.get_summary())

if validation_issues:
    print("\n⚠️  Configuration Issues:")
    for issue in validation_issues:
        print(f"   {issue}")
else:
    print("\n✅ All configuration settings validated successfully")

logger.info(f"Configuration loaded: {len(validation_issues)} issues found")
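The logging settings above (`MAX_LOG_SIZE_MB`, `LOG_BACKUP_COUNT`) describe a size-capped log with rotated backups, but the imports cell attaches a plain `FileHandler` that grows without limit. A minimal sketch using the standard library's `RotatingFileHandler` follows; `make_rotating_handler` is a hypothetical helper, and the demo writes to a temporary directory rather than `voice_agent.log`.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler

def make_rotating_handler(log_path: str, max_log_size_mb: int = 10,
                          backup_count: int = 3) -> RotatingFileHandler:
    """File handler that rolls over at max_log_size_mb and keeps backup_count old files."""
    handler = RotatingFileHandler(
        log_path,
        maxBytes=max_log_size_mb * 1024 * 1024,  # convert MB to bytes
        backupCount=backup_count,
    )
    handler.setFormatter(logging.Formatter(
        "%(asctime)s - %(name)s - %(levelname)s - %(message)s"))
    return handler

log_path = os.path.join(tempfile.mkdtemp(), "voice_agent.log")
demo_logger = logging.getLogger("voice_agent.demo")
demo_logger.setLevel(logging.INFO)
handler = make_rotating_handler(log_path, max_log_size_mb=10, backup_count=3)
demo_logger.addHandler(handler)
demo_logger.info("rotating handler attached")
handler.flush()

with open(log_path) as f:
    print(f.read().strip())
```

In the notebook you would pass `config.LOG_FILE`, `config.MAX_LOG_SIZE_MB`, and `config.LOG_BACKUP_COUNT`, and could apply `config.LOG_LEVEL` with `logger.setLevel(config.LOG_LEVEL)`, since `Logger.setLevel` accepts level names. Note that `logging.basicConfig` does nothing once the root logger has handlers, so re-running the imports cell will not swap handlers unless you pass `force=True` (Python 3.8+).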

# 4. MCP (Model Context Protocol) Implementation

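The Model Context Protocol layer provides a structured framework for managing AI model interactions.
This implementation handles conversation context, session management, and structured communication
between the user interface and the AI model. It is a lightweight, notebook-local protocol built on
Python dictionaries: it borrows the MCP name but does not implement the JSON-RPC-based Model Context
Protocol specification used by MCP servers and clients.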
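The MCP layer stores context as interaction records, while DeepSeek's chat endpoint expects an ordered `messages` list. A possible bridge is sketched below. `context_to_messages` is hypothetical; it assumes the record shapes produced by `_prepare_context_for_request` (an optional `context_summary` entry followed by records with `user_input` and `response`), and it skips empty `user_input` fields because `process_response` leaves that field for the caller to fill.

```python
from typing import Any, Dict, List

def context_to_messages(context: List[Dict[str, Any]], user_input: str,
                        system_prompt: str = "You are a helpful voice assistant.") -> List[Dict[str, str]]:
    """Flatten MCP-style interaction records into chat messages, oldest first."""
    messages = [{"role": "system", "content": system_prompt}]
    for record in context:
        if record.get("type") == "context_summary":
            # Summaries of compressed history become an extra system note
            topics = ", ".join(record.get("key_topics", []))
            messages.append({"role": "system",
                             "content": f"{record['summary']}. Key topics: {topics}"})
            continue
        if record.get("user_input"):
            messages.append({"role": "user", "content": record["user_input"]})
        if record.get("response"):
            messages.append({"role": "assistant", "content": record["response"]})
    messages.append({"role": "user", "content": user_input})  # the new turn goes last
    return messages

history = [
    {"type": "context_summary", "summary": "Previous 2 interactions compressed",
     "key_topics": ["mars", "weather"]},
    {"user_input": "How cold is it there?", "response": "Well below freezing."},
]
for m in context_to_messages(history, "And at night?"):
    print(m["role"], "->", m["content"])
```

Folding the compression summary into a system message keeps older topics visible to the model without resending every earlier turn.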

In [None]:
class MCPProtocol:
    """
    Model Context Protocol implementation for structured AI interactions.
    
    The MCP system provides:
    - Conversation context management with memory optimization
    - Structured request/response handling with validation
    - Session state management and persistence
    - Error handling and recovery mechanisms
    - Performance monitoring and metrics collection
    
    This protocol ensures consistent communication format between all system
    components and maintains conversation coherence across interactions.
    """
    
    def __init__(self, config: VoiceAgentConfig):
        """
        Initialize MCP protocol handler with comprehensive state management.
        
        Args:
            config: Voice agent configuration object
        """
        self.config = config
        
        # Session Management
        self.session_id = f"session_{int(time.time())}_{os.getpid()}"
        self.session_start_time = time.time()
        
        # Context and Memory Management
        self.context_history: List[Dict[str, Any]] = []  # Complete conversation history
        self.compressed_context: List[Dict[str, Any]] = []  # Compressed older context
        self.active_context_tokens = 0  # Current context token count
        
        # Session Metadata
        self.metadata = {
            "mcp_version": config.MCP_VERSION,
            "session_id": self.session_id,
            "created_at": time.time(),
            "model_name": config.MODEL_NAME,
            "language": config.RECOGNITION_LANGUAGE,
            "client_info": {
                "platform": os.name,
                "python_version": os.sys.version,
                "torch_version": torch.__version__
            }
        }
        
        # Performance Metrics
        self.metrics = {
            "total_requests": 0,
            "successful_requests": 0,
            "failed_requests": 0,
            "total_tokens_processed": 0,
            "average_response_time": 0.0,
            "context_compressions": 0
        }
        
        # State Management
        self.is_active = True
        self.last_activity = time.time()
        
        logger.info(f"MCP Protocol initialized - Session: {self.session_id}")
    
    def create_request(self, user_input: str, context: Optional[Dict] = None, 
                      request_type: str = "chat") -> Dict[str, Any]:
        """
        Create a structured MCP request with comprehensive metadata.
        
        This method packages user input into a standardized format that includes:
        - Request identification and routing information
        - Complete conversation context with token management
        - Session state and metadata
        - Performance tracking information
        
        Args:
            user_input: User's text input to process
            context: Additional context information (optional)
            request_type: Type of request (chat, command, query, etc.)
            
        Returns:
            Structured MCP request dictionary ready for model processing
        """
        request_id = f"req_{self.metrics['total_requests']}_{int(time.time())}"
        request_timestamp = time.time()
        
        # Prepare conversation context with token management
        context_for_request = self._prepare_context_for_request()
        
        # Build comprehensive request structure
        request = {
            # Core MCP Headers
            "mcp_version": self.config.MCP_VERSION,
            "session_id": self.session_id,
            "request_id": request_id,
            "timestamp": request_timestamp,
            "request_type": request_type,
            
            # User Input and Context
            "user_input": user_input,
            "input_length": len(user_input),
            "additional_context": context or {},
            
            # Conversation History
            "conversation_context": context_for_request,
            "context_token_count": self.active_context_tokens,
            "context_compression_level": len(self.compressed_context),
            
            # Model Configuration
            "model_config": {
                "model_name": self.config.MODEL_NAME,
                "max_tokens": self.config.MAX_TOKENS,
                "temperature": self.config.TEMPERATURE,
                "top_p": self.config.TOP_P
            },
            
            # Session Information
            "session_info": {
                "session_duration": request_timestamp - self.session_start_time,
                "total_interactions": len(self.context_history),
                "last_activity": self.last_activity
            },
            
            # System Metadata
            "system_metadata": self.metadata.copy()
        }
        
        # Update metrics and state
        self.metrics["total_requests"] += 1
        self.last_activity = request_timestamp
        
        logger.info(f"MCP request created: {request_id} (type: {request_type})")
        logger.debug(f"Request context tokens: {self.active_context_tokens}")
        
        return request
    
    def process_response(self, response: str, request_id: str, 
                        processing_time: float = 0.0) -> Dict[str, Any]:
        """
        Process and structure an MCP response with comprehensive metadata.
        
        This method handles the AI model's response by:
        - Structuring the response in MCP format
        - Adding performance and quality metrics
        - Updating conversation context and history
        - Managing memory and token limits
        
        Args:
            response: Model's response text
            request_id: ID of the original request
            processing_time: Time taken to generate response (seconds)
            
        Returns:
            Structured MCP response dictionary with metadata
        """
        response_timestamp = time.time()
        response_tokens = len(response.split())  # Approximate token count
        
        # Build comprehensive response structure
        response_data = {
            # Core MCP Headers
            "mcp_version": self.config.MCP_VERSION,
            "session_id": self.session_id,
            "request_id": request_id,
            "response_id": f"resp_{request_id}_{int(response_timestamp)}",
            "timestamp": response_timestamp,
            
            # Response Content
            "response": response,
            "response_length": len(response),
            "response_tokens": response_tokens,
            "status": "success",
            
            # Performance Metrics
            "performance": {
                "processing_time": processing_time,
                "tokens_per_second": response_tokens / max(processing_time, 0.001),
                "response_quality_score": self._calculate_response_quality(response)
            },
            
            # Model Information
            "model_info": {
                "model_name": self.config.MODEL_NAME,
                "tokens_used": response_tokens,
                "context_tokens": self.active_context_tokens
            },
            
            # Context State
            "context_state": {
                "total_interactions": len(self.context_history) + 1,
                "context_window_full": len(self.context_history) >= self.config.CONTEXT_WINDOW_SIZE,
                "compression_needed": self.active_context_tokens > self.config.MAX_CONTEXT_TOKENS
            }
        }
        
        # Add to conversation context
        interaction_record = {
            "request_id": request_id,
            "timestamp": response_timestamp,
            "user_input": "",  # Will be filled by caller if needed
            "response": response,
            "processing_time": processing_time,
            "tokens": response_tokens
        }
        
        self.context_history.append(interaction_record)
        self.active_context_tokens += response_tokens
        
        # Manage context size and compression
        self._manage_context_size()
        
        # Update performance metrics
        self._update_performance_metrics(processing_time, response_tokens, True)
        
        logger.info(f"MCP response processed: {request_id}")
        logger.debug(f"Response tokens: {response_tokens}, Total context: {self.active_context_tokens}")
        
        return response_data
    
    def _prepare_context_for_request(self) -> List[Dict[str, Any]]:
        """
        Prepare conversation context for a model request using a fixed interaction window.
        
        Returns:
            The most recent interactions, preceded by a summary entry if older ones were compressed
        """
        # Keep only the last CONTEXT_WINDOW_SIZE interactions; the token budget is
        # enforced separately by _manage_context_size()
        context_window = self.context_history[-self.config.CONTEXT_WINDOW_SIZE:]
        
        # Add compressed context summary if available
        if self.compressed_context:
            context_summary = {
                "type": "context_summary",
                "summary": f"Previous {len(self.compressed_context)} interactions compressed",
                "key_topics": self._extract_key_topics(self.compressed_context)
            }
            return [context_summary] + context_window
        
        return context_window
    
    def _manage_context_size(self):
        """
        Manage context size through intelligent compression and pruning.
        """
        if self.active_context_tokens > self.config.MAX_CONTEXT_TOKENS:
            # Move older interactions to compressed context
            compress_count = max(1, len(self.context_history) // 4)
            to_compress = self.context_history[:compress_count]
            
            # Add to compressed context
            self.compressed_context.extend(to_compress)
            
            # Remove from active context
            self.context_history = self.context_history[compress_count:]
            
            # Recalculate token count
            self.active_context_tokens = sum(
                interaction.get("tokens", 0) for interaction in self.context_history
            )
            
            self.metrics["context_compressions"] += 1
            logger.info(f"Context compressed: {compress_count} interactions moved to compressed storage")
    
    def _calculate_response_quality(self, response: str) -> float:
        """
        Calculate a simple quality score for the response.
        
        Args:
            response: Response text to evaluate
            
        Returns:
            Quality score between 0.0 and 1.0
        """
        if not response:
            return 0.0
        
        # Simple heuristics for response quality
        score = 0.5  # Base score
        
        # Length appropriateness (not too short, not too long)
        length = len(response)
        if 20 <= length <= 500:
            score += 0.2
        
        # Sentence structure (contains periods or question marks)
        if any(punct in response for punct in '.?!'):
            score += 0.1
        
        # Coherence (no excessive repetition)
        words = response.lower().split()
        if len(set(words)) / max(len(words), 1) > 0.5:
            score += 0.1
        
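        # Engagement (addresses the user directly or asks a question).
        # Compare whole words: a substring test for 'i' would match almost any text.
        bare_words = {w.strip(".,!?;:\"'") for w in words}
        if bare_words & {"you", "your", "i"} or "?" in response:
            score += 0.1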
        
        return min(score, 1.0)
    
    def _extract_key_topics(self, interactions: List[Dict]) -> List[str]:
        """
        Extract key topics from compressed interactions.
        
        Args:
            interactions: List of interaction records
            
        Returns:
            List of key topics/themes
        """
        # Simple keyword extraction (can be enhanced with NLP)
        all_text = " ".join(
            interaction.get("response", "") for interaction in interactions
        )
        
        # Extract meaningful words (excluding common words)
        words = all_text.lower().split()
        common_words = {"the", "is", "and", "or", "but", "in", "on", "at", "to", "for", "of", "with", "by", "a", "an"}
        meaningful_words = [word for word in words if len(word) > 3 and word not in common_words]
        
        # Return most frequent meaningful words as topics
        from collections import Counter
        word_counts = Counter(meaningful_words)
        return [word for word, count in word_counts.most_common(5)]
    
    def _update_performance_metrics(self, processing_time: float,
                                    tokens: int, success: bool):
        """
        Update performance metrics with the latest operation data.
        """
        self.metrics["total_tokens_processed"] += tokens
        
        if not success:
            self.metrics["failed_requests"] += 1
            return
        
        self.metrics["successful_requests"] += 1
        
        # Incremental mean over successful requests only: n must count exactly the
        # samples folded into the average, so failed requests do not update it
        n = self.metrics["successful_requests"]
        current_avg = self.metrics["average_response_time"]
        self.metrics["average_response_time"] = (
            (current_avg * (n - 1) + processing_time) / n
        )
    
    def get_context_summary(self) -> str:
        """
        Get a comprehensive summary of the current conversation context.
        
        Returns:
            Formatted string summary of conversation context and system state
        """
        session_duration = time.time() - self.session_start_time
        
        return f"""
🗣️  MCP Session Summary
{'='*40}
Session ID: {self.session_id}
Duration: {session_duration:.1f} seconds
Total Interactions: {len(self.context_history)}
Active Context Tokens: {self.active_context_tokens}
Compressed Interactions: {len(self.compressed_context)}

📊 Performance Metrics:
  • Total Requests: {self.metrics['total_requests']}
  • Success Rate: {self.metrics['successful_requests']}/{self.metrics['total_requests']}
  • Average Response Time: {self.metrics['average_response_time']:.2f}s
  • Tokens Processed: {self.metrics['total_tokens_processed']}
  • Context Compressions: {self.metrics['context_compressions']}
        """.strip()
    
    def reset_session(self):
        """
        Reset the session state while preserving configuration.
        """
        old_session_id = self.session_id
        
        # Create new session
        self.session_id = f"session_{int(time.time())}_{os.getpid()}"
        self.session_start_time = time.time()
        
        # Clear context
        self.context_history.clear()
        self.compressed_context.clear()
        self.active_context_tokens = 0
        
        # Reset metrics
        self.metrics = {
            "total_requests": 0,
            "successful_requests": 0,
            "failed_requests": 0,
            "total_tokens_processed": 0,
            "average_response_time": 0.0,
            "context_compressions": 0
        }
        
        logger.info(f"MCP session reset: {old_session_id} -> {self.session_id}")

# Initialize MCP Protocol
mcp = MCPProtocol(config)

print("✅ MCP Protocol system initialized")
print(f"🆔 Session ID: {mcp.session_id}")
print(f"📊 Context window: {config.CONTEXT_WINDOW_SIZE} interactions")
print(f"🧠 Token limit: {config.MAX_CONTEXT_TOKENS} tokens")
print("🔄 Context compression and management enabled")

logger.info("MCP Protocol initialization completed")
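The running average in `_update_performance_metrics` relies on the incremental-mean update `avg_n = (avg_(n-1) * (n - 1) + x_n) / n`, which avoids storing every latency sample. The standalone sketch below (the function name `running_mean` is hypothetical, not part of the notebook) folds samples in one at a time so the result can be compared with the ordinary batch mean.

```python
from typing import Iterable


def running_mean(samples: Iterable[float]) -> float:
    """Fold samples into a mean one at a time (hypothetical helper)."""
    avg, n = 0.0, 0
    for x in samples:
        n += 1
        # Same update shape as the notebook: weight the old mean by (n - 1) samples
        avg = (avg * (n - 1) + x) / n
    return avg


latencies = [0.8, 1.2, 0.5, 1.5]
print(running_mean(latencies))          # incremental result
print(sum(latencies) / len(latencies))  # batch mean, equal up to float rounding
```

The invariant that makes this correct is that `n` counts exactly the samples folded in; updating the average with a counter that did not advance (for example, on a failed request) would skew it.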

# 5. DeepSeek Model Integration

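This section integrates the DeepSeek chat API (called through the OpenAI-compatible
client) and adds two fallbacks: a small local conversational model
(`microsoft/DialoGPT-small`, not a DeepSeek model) for offline use, and rule-based
replies when neither model is available. It also keeps a rolling conversation memory,
trims the context sent to each backend, and tracks call counts, token usage, and
estimated cost.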
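Conceptually, the API, local, and rule-based backends form an ordered fallback chain: try each option in turn and move on when one raises. A minimal, self-contained sketch, assuming each backend is a plain callable that raises on failure; `generate_with_fallback`, `flaky_api`, and `tiny_local` are hypothetical stand-ins, not names from this notebook or from any DeepSeek library:

```python
import logging
from typing import Callable, List, Tuple

logger = logging.getLogger("fallback_sketch")

# (backend name, function that maps a prompt to a reply or raises)
Backend = Tuple[str, Callable[[str], str]]


def generate_with_fallback(prompt: str, backends: List[Backend],
                           last_resort: Callable[[str], str]) -> Tuple[str, str]:
    """Try each backend in order; return (reply, name of the backend used)."""
    for name, generate in backends:
        try:
            return generate(prompt), name
        except Exception as exc:  # any backend error moves on to the next option
            logger.warning("%s backend failed: %s", name, exc)
    # Every model backend failed, or none was configured
    return last_resort(prompt), "Fallback"


def flaky_api(prompt: str) -> str:
    raise ConnectionError("simulated network outage")


def tiny_local(prompt: str) -> str:
    return f"(local) You said: {prompt}"


reply, used = generate_with_fallback(
    "Hello there",
    backends=[("API", flaky_api), ("Local", tiny_local)],
    last_resort=lambda p: "I hear you.",
)
print(used, "->", reply)
```

Because each backend is just a callable, the chain can be exercised without network access or model downloads.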

In [None]:
class DeepSeekIntegration:
    """
    DeepSeek model integration with layered fallbacks.
    
    This class provides:
    - DeepSeek API generation through the OpenAI-compatible client
    - A small local fallback model and rule-based replies as a last resort
    - Rolling conversation memory and context-window trimming
    - Error handling with fallback mechanisms
    - Performance monitoring (call, failure, and token counts)
    - Token usage tracking and cost estimation
    """
    
    def __init__(self, config: VoiceAgentConfig, mcp_protocol: MCPProtocol):
        """
        Initialize DeepSeek integration with comprehensive configuration.
        
        Args:
            config: Voice agent configuration
            mcp_protocol: MCP protocol handler for structured communication
        """
        self.config = config
        self.mcp = mcp_protocol
        
        # API Client Configuration
        self.api_client = None
        self.api_available = False
        
        # Local Model Configuration
        self.local_model = None
        self.local_tokenizer = None
        self.local_model_available = False
        
        # Performance Tracking
        self.performance_stats = {
            "total_api_calls": 0,
            "total_local_calls": 0,
            "api_failures": 0,
            "local_failures": 0,
            "average_api_latency": 0.0,
            "average_local_latency": 0.0,
            "total_tokens_consumed": 0,
            "estimated_cost_usd": 0.0
        }
        
        # Conversation State
        self.conversation_memory = []  # Structured conversation history
        self.user_preferences = {}     # Learned user preferences
        self.context_keywords = set()  # Important context keywords
        
        # Initialize available backends
        self._initialize_api_client()
        self._initialize_local_model()
        
        logger.info("DeepSeek integration initialized")
    
    def _initialize_api_client(self):
        """
        Initialize DeepSeek API client with authentication and error handling.
        """
        try:
            api_key = self.config.DEEPSEEK_API_KEY
            if api_key and api_key != "your-deepseek-api-key-here":
                self.api_client = OpenAI(
                    api_key=api_key,
                    base_url=self.config.DEEPSEEK_BASE_URL
                )
                
                # Minimal connectivity check; note this is a real (billable) 5-token request
                self.api_client.chat.completions.create(
                    model=self.config.MODEL_NAME,
                    messages=[{"role": "user", "content": "Hello"}],
                    max_tokens=5
                )
                
                self.api_available = True
                logger.info("DeepSeek API client initialized and tested successfully")
                
            else:
                logger.warning("DeepSeek API key not configured")
                
        except Exception as e:
            logger.error(f"Failed to initialize DeepSeek API: {e}")
            self.api_available = False
    
    def _initialize_local_model(self):
        """
        Initialize a small local conversational model for offline operation.
        
        Note: this loads `microsoft/DialoGPT-small` as a stand-in, not a DeepSeek
        model. The weights are downloaded on first use, and `device_map="auto"`
        requires the `accelerate` package.
        """
        try:
            # Check available memory before loading the local model
            import psutil
            available_memory_gb = psutil.virtual_memory().available / (1024**3)
            
            if available_memory_gb < 8:  # Conservative threshold: require at least 8 GB free
                logger.warning(f"Insufficient memory for local model: {available_memory_gb:.1f}GB available")
                return
            
            # Small conversational model used as the offline fallback
            model_name = "microsoft/DialoGPT-small"
            
            logger.info(f"Loading local model: {model_name}")
            
            # Load tokenizer
            self.local_tokenizer = AutoTokenizer.from_pretrained(model_name)
            self.local_tokenizer.pad_token = self.local_tokenizer.eos_token
            
            # Load model with optimization
            self.local_model = AutoModelForCausalLM.from_pretrained(
                model_name,
                torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
                device_map="auto" if torch.cuda.is_available() else None,
                low_cpu_mem_usage=True
            )
            
            self.local_model_available = True
            logger.info("Local model initialized successfully")
            
        except Exception as e:
            logger.error(f"Failed to initialize local model: {e}")
            self.local_model_available = False
    
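    def generate_response(self, user_input: str, context: Optional[Dict] = None) -> str:
        """
        Generate a response using the available backends, falling back in order.
        
        This method:
        1. Creates a structured MCP request
        2. Tries the DeepSeek API first
        3. Falls back to the local model if the API is unavailable or fails
        4. Uses rule-based replies if neither model produces a response
        5. Records the interaction in MCP and conversation memory
        6. Tracks latency and failure counts per backend
        
        Args:
            user_input: User's input text
            context: Additional context information
            
        Returns:
            Generated response text
        """
        if not user_input.strip():
            return "I didn't hear anything. Could you please repeat that?"
        
        # Create MCP request
        try:
            mcp_request = self.mcp.create_request(user_input, context)
        except Exception as e:
            logger.error(f"Failed to create MCP request: {e}")
            return "I'm having trouble processing that right now. Could you try rephrasing your question?"
        
        start_time = time.time()
        stats = self.performance_stats
        response = None
        method_used = "Fallback"
        
        # 1. DeepSeek API
        if self.api_available:
            try:
                call_start = time.time()
                response = self._generate_api_response(mcp_request)
                latency = time.time() - call_start
                # Incremental mean; total_api_calls was just incremented by the call
                n = stats["total_api_calls"]
                stats["average_api_latency"] += (latency - stats["average_api_latency"]) / n
                method_used = "API"
            except Exception as e:
                logger.error(f"DeepSeek API generation failed: {e}")
                stats["api_failures"] += 1
        
        # 2. Local model, if the API is unavailable or failed
        if response is None and self.local_model_available:
            try:
                call_start = time.time()
                response = self._generate_local_response(mcp_request)
                latency = time.time() - call_start
                # Incremental mean; total_local_calls was just incremented by the call
                n = stats["total_local_calls"]
                stats["average_local_latency"] += (latency - stats["average_local_latency"]) / n
                method_used = "Local"
            except Exception as e:
                logger.error(f"Local model generation failed: {e}")
                stats["local_failures"] += 1
        
        # 3. Rule-based reply as the last resort
        if response is None:
            response = self._generate_fallback_response(user_input)
        
        processing_time = time.time() - start_time
        
        try:
            # Record the interaction through MCP and in local memory
            self.mcp.process_response(response, mcp_request["request_id"], processing_time)
            self._update_conversation_memory(user_input, response, method_used)
            self._learn_from_interaction(user_input, response)
        except Exception as e:
            # Bookkeeping errors should not discard a valid reply
            logger.error(f"Error recording interaction: {e}")
        
        logger.info(f"Response generated using {method_used} in {processing_time:.2f}s")
        return response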
    
    def _generate_api_response(self, mcp_request: Dict) -> str:
        """
        Generate response using DeepSeek API with advanced context management.
        
        Args:
            mcp_request: Structured MCP request
            
        Returns:
            API-generated response text
        """
        # Prepare conversation messages with context
        messages = self._prepare_conversation_messages(mcp_request)
        
        # Make API call with comprehensive parameters
        response = self.api_client.chat.completions.create(
            model=self.config.MODEL_NAME,
            messages=messages,
            max_tokens=self.config.MAX_TOKENS,
            temperature=self.config.TEMPERATURE,
            top_p=self.config.TOP_P,
            frequency_penalty=self.config.FREQUENCY_PENALTY,
            presence_penalty=self.config.PRESENCE_PENALTY,
            stream=False
        )
        
        # Extract and process response (message content may be None)
        response_text = (response.choices[0].message.content or "").strip()
        
        # Update performance statistics (usage may be missing on some responses)
        tokens_used = response.usage.total_tokens if response.usage else 0
        self.performance_stats["total_api_calls"] += 1
        self.performance_stats["total_tokens_consumed"] += tokens_used
        self.performance_stats["estimated_cost_usd"] += self._estimate_cost(tokens_used)
        
        return response_text
    
    def _generate_local_response(self, mcp_request: Dict) -> str:
        """
        Generate response using local model with memory optimization.
        
        Args:
            mcp_request: Structured MCP request
            
        Returns:
            Locally-generated response text
        """
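        # Prepare input with conversation context
        conversation_text = self._prepare_local_context(mcp_request)
        max_new_tokens = min(self.config.MAX_TOKENS, 200)
        
        # Tokenize input. DialoGPT (GPT-2) has 1024 positions, so leave room for the
        # generated tokens and truncate from the left to keep the current turn.
        self.local_tokenizer.truncation_side = "left"
        inputs = self.local_tokenizer(
            conversation_text,
            return_tensors="pt",
            max_length=1024 - max_new_tokens,
            truncation=True
        ).to(self.local_model.device)
        
        # Generate response with controlled parameters
        with torch.no_grad():
            outputs = self.local_model.generate(
                **inputs,  # input_ids and attention_mask, on the model's device
                max_new_tokens=max_new_tokens,
                temperature=self.config.TEMPERATURE,
                top_p=self.config.TOP_P,
                do_sample=True,
                pad_token_id=self.local_tokenizer.eos_token_id,
                eos_token_id=self.local_tokenizer.eos_token_id
            )
        
        # Decode only the newly generated tokens (the prompt is echoed in outputs)
        prompt_length = inputs["input_ids"].shape[-1]
        response_text = self.local_tokenizer.decode(
            outputs[0][prompt_length:], skip_special_tokens=True
        )
        
        # Clean up response: drop any invented next "Human:" turn
        response_text = response_text.split("Human:")[0].strip()
        if not response_text:
            response_text = "I understand. Could you tell me more about that?"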
        
        # Update performance statistics
        self.performance_stats["total_local_calls"] += 1
        
        return response_text
    
    def _generate_fallback_response(self, user_input: str) -> str:
        """
        Generate fallback response when no models are available.
        
        Args:
            user_input: User's input text
            
        Returns:
            Rule-based fallback response
        """
        import re  # local import keeps this method self-contained
        
        user_lower = user_input.lower()
        tokens = set(re.findall(r"[a-z']+", user_lower))
        
        def mentions(*phrases: str) -> bool:
            # Single words must match whole tokens ("hi" must not match "this");
            # multi-word phrases are matched as substrings
            return any((p in user_lower) if " " in p else (p in tokens) for p in phrases)
        
        # Simple rule-based responses
        if mentions("hello", "hi", "hey", "good morning", "good afternoon"):
            return "Hello! I'm here to help. What would you like to talk about?"
        
        elif mentions("how are you", "how do you feel", "what's up"):
            return "I'm doing well, thank you for asking! How can I assist you today?"
        
        elif mentions("bye", "goodbye", "see you", "farewell"):
            return "Goodbye! It was nice talking with you. Have a great day!"
        
        elif "?" in user_input:
            return "That's an interesting question. I'm currently running in limited mode, but I'd be happy to discuss this with you when my full capabilities are available."
        
        else:
            return "I hear you. Could you help me understand what you'd like to know more about?"
    
    def _prepare_conversation_messages(self, mcp_request: Dict) -> List[Dict[str, str]]:
        """
        Prepare conversation messages for API with intelligent context management.
        
        Args:
            mcp_request: MCP request with context
            
        Returns:
            List of conversation messages formatted for API
        """
        messages = [{
            "role": "system",
            "content": "You are a helpful, knowledgeable, and friendly AI assistant. Provide clear, accurate, and engaging responses. Keep responses conversational and appropriate for voice interaction."
        }]
        
        # Add conversation context
        context = mcp_request.get("conversation_context", [])
        for interaction in context[-5:]:  # Last 5 interactions
            if "user_input" in interaction and "response" in interaction:
                messages.append({"role": "user", "content": interaction["user_input"]})
                messages.append({"role": "assistant", "content": interaction["response"]})
        
        # Add current user input
        messages.append({"role": "user", "content": mcp_request["user_input"]})
        
        return messages
    
    def _prepare_local_context(self, mcp_request: Dict) -> str:
        """
        Prepare context string for local model generation.
        
        Args:
            mcp_request: MCP request with context
            
        Returns:
            Context string formatted for local model
        """
        context_parts = []
        
        # Add recent conversation
        context = mcp_request.get("conversation_context", [])
        for interaction in context[-3:]:  # Last 3 interactions for local model
            if "user_input" in interaction and "response" in interaction:
                context_parts.append(f"Human: {interaction['user_input']}")
                context_parts.append(f"Assistant: {interaction['response']}")
        
        # Add current input
        context_parts.append(f"Human: {mcp_request['user_input']}")
        context_parts.append("Assistant:")
        
        return "\n".join(context_parts)
    
    def _update_conversation_memory(self, user_input: str, response: str, method: str):
        """
        Update conversation memory with interaction details.
        
        Args:
            user_input: User's input
            response: Generated response
            method: Generation method used
        """
        interaction = {
            "timestamp": time.time(),
            "user_input": user_input,
            "response": response,
            "method": method,
            "user_input_length": len(user_input),
            "response_length": len(response)
        }
        
        self.conversation_memory.append(interaction)
        
        # Keep memory manageable
        if len(self.conversation_memory) > 100:
            self.conversation_memory = self.conversation_memory[-50:]  # Keep last 50
    
    def _learn_from_interaction(self, user_input: str, response: str):
        """
        Track context keywords from the user's input.
        
        Args:
            user_input: User's input
            response: Generated response (currently unused)
        """
        # Extract keywords from user input
        words = user_input.lower().split()
        meaningful_words = {word for word in words if len(word) > 3}
        self.context_keywords.update(meaningful_words)
        
        # Keep keyword set manageable. Sets are unordered, so "most recent" cannot be
        # recovered; keep the current input's keywords plus an arbitrary subset of
        # older ones, up to 150 in total.
        if len(self.context_keywords) > 200:
            older = list(self.context_keywords - meaningful_words)
            keep_older = max(0, 150 - len(meaningful_words))
            self.context_keywords = meaningful_words | set(older[:keep_older])
    
    def _estimate_cost(self, tokens: int) -> float:
        """
        Estimate API cost based on token usage.
        
        Args:
            tokens: Number of tokens used
            
        Returns:
            Estimated cost in USD
        """
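        # Flat placeholder rate (adjust as needed): DeepSeek bills input and output
        # tokens at different rates, so check the current pricing page
        cost_per_1k_tokens = 0.002  # USD per 1K tokens (rough estimate, not official)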
        return (tokens / 1000) * cost_per_1k_tokens
    
    def get_performance_summary(self) -> str:
        """
        Get comprehensive performance summary.
        
        Returns:
            Formatted performance statistics
        """
        stats = self.performance_stats
        total_calls = stats["total_api_calls"] + stats["total_local_calls"]
        
        return f"""
🚀 DeepSeek Performance Summary
{'='*40}
API Status: {'✅ Available' if self.api_available else '❌ Unavailable'}
Local Model: {'✅ Available' if self.local_model_available else '❌ Unavailable'}

📊 Usage Statistics:
  • Total Calls: {total_calls}
  • API Calls: {stats['total_api_calls']}
  • Local Calls: {stats['total_local_calls']}
  • API Failures: {stats['api_failures']}
  • Local Failures: {stats['local_failures']}

⚡ Performance:
  • Avg API Latency: {stats['average_api_latency']:.2f}s
  • Avg Local Latency: {stats['average_local_latency']:.2f}s

💰 Resource Usage:
  • Tokens Consumed: {stats['total_tokens_consumed']}
  • Estimated Cost: ${stats['estimated_cost_usd']:.4f}
  • Stored Interactions: {len(self.conversation_memory)}
  • Context Keywords: {len(self.context_keywords)}
        """.strip()

# Initialize DeepSeek Integration
deepseek = DeepSeekIntegration(config, mcp)

print("✅ DeepSeek integration initialized")
print(f"🌐 API Available: {deepseek.api_available}")
print(f"💻 Local Model Available: {deepseek.local_model_available}")
print("🧠 Intelligent context management enabled")
print("📊 Performance monitoring active")

logger.info("DeepSeek integration initialization completed")
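`_prepare_conversation_messages` builds an OpenAI-style chat message list: a system prompt, then a sliding window of the last five user/assistant pairs, then the new user turn. The sketch below reproduces that logic as a standalone function so the window behaviour can be checked without an API key. `build_chat_messages` is a hypothetical name, and the sketch assumes history records carry `user_input` and `response` keys, as the class does.

```python
from typing import Dict, List


def build_chat_messages(history: List[Dict[str, str]], user_input: str,
                        system_prompt: str, window: int = 5) -> List[Dict[str, str]]:
    """System prompt, the last `window` complete turns, then the new user input."""
    messages = [{"role": "system", "content": system_prompt}]
    for turn in history[-window:]:
        # Skip malformed records instead of sending half a turn
        if "user_input" in turn and "response" in turn:
            messages.append({"role": "user", "content": turn["user_input"]})
            messages.append({"role": "assistant", "content": turn["response"]})
    messages.append({"role": "user", "content": user_input})
    return messages


history = [{"user_input": f"question {i}", "response": f"answer {i}"} for i in range(8)]
msgs = build_chat_messages(history, "What's next?", "You are a helpful voice assistant.")
print(len(msgs))           # 12: 1 system + 5 turns x 2 + 1 new user message
print(msgs[1]["content"])  # question 3 (the oldest turn inside the window)
```

The window counts interactions, not tokens, so a few very long turns can still exceed the model's context limit; a token-based budget would be the natural refinement.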