# SPEECH-TO-TEXT PRACTICE GUIDE

## EXERCISE GOAL
In this practice session, you'll build a real-time speech recognition system using Vosk and PyAudio. Your program will continuously listen for speech, convert it to text, and detect simple voice commands. This forms the foundation for a voice assistant.

## STEP-BY-STEP INSTRUCTIONS

### Step 1: Set Up Your Project
Let's start by importing the necessary modules:

In [None]:
# Import necessary modules
from vosk import Model, KaldiRecognizer, SetLogLevel
import pyaudio
import numpy as np
import json
import time
import os
import queue
import threading
from scipy.signal import butter, lfilter
import matplotlib.pyplot as plt

# Suppress excessive Vosk logging
SetLogLevel(-1)

# Check that all imports were successful
print("All modules imported successfully!")
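If a package is missing, the cell above stops at the first failing import and hides any others that are also missing. The sketch below uses a hypothetical helper, `missing_modules` (not part of any library), which checks every module name up front with the standard library's `importlib.util.find_spec`. That call locates a top-level module without importing it.

```python
import importlib.util

# Top-level module names used in this guide (the pip package for `pyaudio` is "PyAudio")
REQUIRED_MODULES = ["vosk", "pyaudio", "numpy", "scipy", "matplotlib"]

def missing_modules(names):
    """Return the module names that cannot be found on the current Python path."""
    # find_spec() returns None for a top-level module it cannot locate
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are installed")
```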

### Step 2: Define Helper Functions for Audio Processing

First, let's create some helper functions for audio processing and visualization:

In [None]:
def bandpass_filter(audio, sr, lowcut=300, highcut=3000, order=5):
    """Apply a bandpass filter to focus on speech frequencies"""
    nyquist = 0.5 * sr
    low = lowcut / nyquist
    high = highcut / nyquist
    b, a = butter(order, [low, high], btype='band')
    return lfilter(b, a, audio)

def normalize_audio(audio, target_dB=-3):
    """Scale audio so its peak reaches target_dB (in dBFS, where 0 dB is full scale)"""
    # Find the maximum absolute amplitude
    max_amplitude = np.max(np.abs(audio))
    
    # Calculate current peak in dB
    current_dB = 20 * np.log10(max_amplitude) if max_amplitude > 0 else -80
    
    # Calculate the gain needed
    gain_dB = target_dB - current_dB
    gain_linear = 10 ** (gain_dB / 20)
    
    # Apply gain
    normalized = audio * gain_linear
    
    # Ensure no clipping
    if np.max(np.abs(normalized)) > 1.0:
        normalized = normalized / np.max(np.abs(normalized))
    
    return normalized

def plot_waveform(audio, sr, title="Audio Waveform"):
    """Plot audio waveform"""
    time_axis = np.linspace(0, len(audio)/sr, len(audio))
    plt.figure(figsize=(10, 4))
    plt.plot(time_axis, audio)
    plt.title(title)
    plt.xlabel("Time (seconds)")
    plt.ylabel("Amplitude")
    plt.ylim(-1.1, 1.1)
    plt.grid(True)
    plt.show()

# Test the functions with a simple sine wave
sample_rate = 16000
duration = 1  # 1 second
t = np.linspace(0, duration, int(sample_rate * duration), endpoint=False)
# Create a test signal with frequencies at 200Hz, 1000Hz and 4000Hz
test_signal = 0.3 * np.sin(2 * np.pi * 200 * t) + 0.4 * np.sin(2 * np.pi * 1000 * t) + 0.3 * np.sin(2 * np.pi * 4000 * t)

# Apply filtering and normalization
filtered_signal = bandpass_filter(test_signal, sample_rate)
normalized_signal = normalize_audio(filtered_signal)

# Plot the original and processed signals
plot_waveform(test_signal, sample_rate, "Original Test Signal")
plot_waveform(filtered_signal, sample_rate, "Bandpass Filtered Signal (300-3000Hz)")
plot_waveform(normalized_signal, sample_rate, "Normalized Signal")
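The plots show the overall shape of the signal but not how much of each tone survived the filter. A small FFT-based check makes that measurable. The sketch below defines a hypothetical helper, `tone_amplitude`, and applies it to a fresh copy of the same three-tone test signal. Because the signal lasts exactly one second and the frequencies are whole numbers, each tone lands exactly on one FFT bin.

```python
import numpy as np

def tone_amplitude(signal, sr, freq):
    """Estimate the amplitude of the sine component at `freq` Hz using the FFT.

    Exact when the signal spans a whole number of cycles of `freq`
    (for example, 1 second of audio and an integer frequency).
    """
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    bin_index = np.argmin(np.abs(freqs - freq))  # nearest FFT bin
    # Scale by 2/N to turn an FFT magnitude into a sine amplitude
    return 2.0 * np.abs(spectrum[bin_index]) / len(signal)

# Same mix as the test signal above: 0.3 @ 200 Hz, 0.4 @ 1000 Hz, 0.3 @ 4000 Hz
sr_demo = 16000
t_demo = np.arange(sr_demo) / sr_demo
demo_signal = (0.3 * np.sin(2 * np.pi * 200 * t_demo)
               + 0.4 * np.sin(2 * np.pi * 1000 * t_demo)
               + 0.3 * np.sin(2 * np.pi * 4000 * t_demo))

for f in (200, 1000, 4000):
    print(f"{f} Hz: {tone_amplitude(demo_signal, sr_demo, f):.3f}")
```

Calling `tone_amplitude(filtered_signal, sample_rate, f)` on the filtered output should show the 1000 Hz component largely preserved. The 200 Hz and 4000 Hz components should be reduced, because they lie outside the 300-3000 Hz passband.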

### Step 3: Implement Basic Real-time Speech Recognition

Now let's implement a basic real-time speech recognition system:

In [None]:
class BasicRecognizer:
    """Basic real-time speech recognition with Vosk"""
    
    def __init__(self, model_path, sample_rate=16000):
        """
        Initialize the recognizer
        
        Parameters:
        - model_path: Path to Vosk model directory
        - sample_rate: Audio sample rate (must match what the model expects)
        """
        self.sample_rate = sample_rate
        self.chunk_size = 1024
        
        # Check if model exists
        if not os.path.exists(model_path):
            raise FileNotFoundError(f"Model not found at {model_path}")
        
        # Load Vosk model
        self.model = Model(model_path)
        self.recognizer = KaldiRecognizer(self.model, self.sample_rate)
        
        # PyAudio setup
        self.p = None
        self.stream = None
    
    def start_listening(self, max_seconds=10):
        """
        Start listening and recognizing speech
        
        Parameters:
        - max_seconds: Maximum seconds to listen before stopping
        """
        self.p = pyaudio.PyAudio()
        self.stream = self.p.open(
            format=pyaudio.paInt16,
            channels=1,
            rate=self.sample_rate,
            input=True,
            frames_per_buffer=self.chunk_size
        )
        
        print(f"Listening for up to {max_seconds} seconds...")
        self.stream.start_stream()
        
        start_time = time.time()
        try:
            while time.time() - start_time < max_seconds:
                # Read audio chunk
                data = self.stream.read(self.chunk_size, exception_on_overflow=False)
                
                # Process with Vosk
                if self.recognizer.AcceptWaveform(data):
                    result = json.loads(self.recognizer.Result())
                    text = result.get("text", "")
                    if text:
                        print(f"\nRecognized: {text}")
                else:
                    # Show partial results
                    partial = json.loads(self.recognizer.PartialResult())
                    partial_text = partial.get("partial", "")
                    if partial_text:
                        print(f"\rPartial: {partial_text}", end="", flush=True)
        
        except KeyboardInterrupt:
            print("\nStopped by user")
        finally:
            # Get the result for any audio still buffered in Vosk, then release
            # the microphone even if an error interrupted the loop
            final_result = json.loads(self.recognizer.FinalResult())
            final_text = final_result.get("text", "")
            if final_text:
                print(f"\nFinal: {final_text}")
            self.stop_listening()
        
        return final_text
    
    def stop_listening(self):
        """Stop listening and clean up resources"""
        if self.stream:
            self.stream.stop_stream()
            self.stream.close()
            self.stream = None
        
        if self.p:
            self.p.terminate()
            self.p = None
            
        print("Stopped listening")

# Define path to your Vosk model
model_path = "models/vosk-model-small-en-us-0.15"

# Check if the model exists, provide guidance if it doesn't
if not os.path.exists(model_path):
    print(f"Model not found at {model_path}")
    print("Please download a model from https://alphacephei.com/vosk/models")
    print("For example, download vosk-model-small-en-us-0.15.zip and extract it to the models directory")
else:
    print(f"Found model at {model_path}")
    
    # Uncomment to test the basic recognizer
    # recognizer = BasicRecognizer(model_path)
    # recognizer.start_listening(max_seconds=10)
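`start_listening()` returns only the text from `FinalResult()`, which is the segment still pending when listening stops. Segments finalized earlier by `Result()` are printed but not returned. One way to keep them, sketched below with a hypothetical helper named `join_vosk_results`, is to collect the raw JSON string from every `Result()` call plus the one from `FinalResult()` and join their `"text"` fields. The example strings are hand-written in the `{"text": ...}` shape those calls return.

```python
import json

def join_vosk_results(result_strings):
    """Combine Vosk Result()/FinalResult() JSON strings into one transcript.

    Empty segments (for example, stretches of silence) are skipped.
    """
    segments = []
    for raw in result_strings:
        text = json.loads(raw).get("text", "").strip()
        if text:
            segments.append(text)
    return " ".join(segments)

# Hand-written examples in the shape Result()/FinalResult() return
collected = [
    '{"text": "turn on the lights"}',
    '{"text": ""}',
    '{"text": "and play some music"}',
]
print(join_vosk_results(collected))
```

Inside the recognition loop, this means storing the string before parsing it (`raw = self.recognizer.Result()`, `collected.append(raw)`, `result = json.loads(raw)`) and appending `self.recognizer.FinalResult()` once the loop ends.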

### Step 4: Enhance the Recognizer with Preprocessing

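Now let's add preprocessing aimed at improving recognition accuracy. Its effect depends on your microphone and environment, so compare results with `enable_bandpass` and `enable_normalization` switched on and off: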

In [None]:
class EnhancedRecognizer(BasicRecognizer):
    """Enhanced speech recognizer with preprocessing"""
    
    def __init__(self, model_path, sample_rate=16000):
        super().__init__(model_path, sample_rate)
        
        # Preprocessing settings
        self.enable_bandpass = True
        self.enable_normalization = True
        self.bandpass_lowcut = 300
        self.bandpass_highcut = 3000
        
        # For silence detection
        self.silence_threshold = 700  # Adjust based on your microphone
        self.speech_detected = False
        self.last_speech_time = 0
        self.silence_timeout = 2.0  # Seconds of silence to end recognition
    
    def preprocess_audio(self, audio_bytes):
        """
        Preprocess audio data
        
        Parameters:
        - audio_bytes: Raw audio bytes from PyAudio
        
        Returns:
        - processed_bytes: Processed audio bytes
        - is_speech: Whether speech was detected
        """
        # Convert bytes to numpy array
        audio = np.frombuffer(audio_bytes, dtype=np.int16).astype(np.float32) / 32767.0
        
        # Check audio level for silence detection
        audio_level = np.max(np.abs(audio)) * 32767
        is_speech = audio_level > self.silence_threshold
        
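        # Apply preprocessing if enabled. Filter first, then normalize, so the
        # filter's output cannot push the peak back above full scale.
        if self.enable_bandpass:
            audio = bandpass_filter(audio, self.sample_rate,
                                    self.bandpass_lowcut, self.bandpass_highcut)
        
        # Only normalize chunks that contain speech: boosting quiet chunks to
        # -3 dBFS would amplify background noise
        if self.enable_normalization and is_speech:
            audio = normalize_audio(audio)
        
        # Clip before converting: values outside [-1, 1] would wrap around
        # when cast to int16
        processed_bytes = (np.clip(audio, -1.0, 1.0) * 32767).astype(np.int16).tobytes()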
        
        return processed_bytes, is_speech
    
    def start_listening(self, max_seconds=30):
        """
        Start listening with enhanced processing
        
        Parameters:
        - max_seconds: Maximum seconds to listen before stopping
        """
        self.p = pyaudio.PyAudio()
        self.stream = self.p.open(
            format=pyaudio.paInt16,
            channels=1,
            rate=self.sample_rate,
            input=True,
            frames_per_buffer=self.chunk_size
        )
        
        print(f"Listening for up to {max_seconds} seconds...")
        print(f"Speak clearly; a pause of {self.silence_timeout:g} seconds after speech will stop recognition")
        self.stream.start_stream()
        
        start_time = time.time()
        self.speech_detected = False
        self.last_speech_time = start_time
        
        try:
            while time.time() - start_time < max_seconds:
                # Read audio chunk
                data = self.stream.read(self.chunk_size, exception_on_overflow=False)
                
                # Preprocess audio
                processed_data, is_speech = self.preprocess_audio(data)
                
                # Update speech detection state
                if is_speech:
                    if not self.speech_detected:
                        print("\nSpeech detected...")
                    self.speech_detected = True
                    self.last_speech_time = time.time()
                elif self.speech_detected and time.time() - self.last_speech_time > self.silence_timeout:
                    print("\nSilence detected, finalizing...")
                    break
                
                # Process with Vosk
                if self.recognizer.AcceptWaveform(processed_data):
                    result = json.loads(self.recognizer.Result())
                    text = result.get("text", "")
                    if text:
                        print(f"\nRecognized: {text}")
                else:
                    # Show partial results
                    partial = json.loads(self.recognizer.PartialResult())
                    partial_text = partial.get("partial", "")
                    if partial_text:
                        print(f"\rPartial: {partial_text}", end="", flush=True)
        
        except KeyboardInterrupt:
            print("\nStopped by user")
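        finally:
            # Get final result
            final_result = json.loads(self.recognizer.FinalResult())
            final_text = final_result.get("text", "")
            if final_text:
                print(f"\nFinal result: {final_text}")
            
            # Clean up
            self.stop_listening()
        
        # Return after the try statement: a return inside `finally` would
        # silently discard any exception raised in the loop
        return final_text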

# Uncomment to test the enhanced recognizer
# if os.path.exists(model_path):
#     enhanced_recognizer = EnhancedRecognizer(model_path)
#     enhanced_recognizer.start_listening(max_seconds=30)
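`bandpass_filter()` starts every call from a zero filter state. When it is applied to one 1024-sample chunk at a time, each chunk therefore begins with a fresh transient, which can add clicks at chunk boundaries. The minimal sketch below defines a hypothetical `StreamingBandpass` class built on SciPy's `butter` and `lfilter`. It carries the filter state from one chunk to the next through `lfilter`'s `zi` argument, so filtering chunk by chunk gives the same output as filtering the whole recording at once. It also includes a clipping float-to-int16 conversion (`float_to_int16_bytes`, likewise hypothetical).

```python
import numpy as np
from scipy.signal import butter, lfilter

class StreamingBandpass:
    """Band-pass filter that keeps its internal state between chunks."""

    def __init__(self, sr, lowcut=300, highcut=3000, order=5):
        nyquist = 0.5 * sr
        self.b, self.a = butter(order, [lowcut / nyquist, highcut / nyquist], btype="band")
        # lfilter needs max(len(a), len(b)) - 1 state values; start from silence
        self.zi = np.zeros(max(len(self.a), len(self.b)) - 1)

    def process(self, chunk):
        # Passing zi makes lfilter also return the final state, which seeds the next chunk
        filtered, self.zi = lfilter(self.b, self.a, chunk, zi=self.zi)
        return filtered

def float_to_int16_bytes(audio):
    """Convert float audio in [-1, 1] to int16 bytes, clipping to avoid wrap-around."""
    return (np.clip(audio, -1.0, 1.0) * 32767).astype(np.int16).tobytes()

# Filtering in 1024-sample chunks should match filtering the whole signal at once
rng = np.random.default_rng(0)
noise = rng.uniform(-0.5, 0.5, 16000)
streamer = StreamingBandpass(16000)
chunked = np.concatenate([streamer.process(noise[i:i + 1024])
                          for i in range(0, len(noise), 1024)])
b, a = butter(5, [300 / 8000, 3000 / 8000], btype="band")
print(np.allclose(chunked, lfilter(b, a, noise)))
```

To use this inside a recognizer, one `StreamingBandpass` would be created per listening session (for example at the start of `start_listening()`), and its `process()` method called wherever `bandpass_filter()` is called now.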

### Step 5: Implement Non-Blocking Recognition

Now, let's implement a non-blocking version of our recognizer using threading:

In [None]:
class NonBlockingRecognizer:
    """Non-blocking speech recognizer using threading"""
    
    def __init__(self, model_path, sample_rate=16000):
        """Initialize the recognizer"""
        self.sample_rate = sample_rate
        self.chunk_size = 1024
        self.model_path = model_path
        
        # Check if model exists
        if not os.path.exists(model_path):
            raise FileNotFoundError(f"Model not found at {model_path}")
        
        # For audio processing
        self.p = None
        self.stream = None
        self.recognizer = None
        
        # Threading components
        self.running = False
        self.audio_queue = queue.Queue()
        self.result_queue = queue.Queue()
        
        # Preprocessing settings
        self.enable_bandpass = True
        self.enable_normalization = True
        
        # Silence detection
        self.silence_threshold = 700
        self.speech_detected = False
        self.last_speech_time = 0
        self.silence_timeout = 2.0
    
    def _audio_callback(self, in_data, frame_count, time_info, status):
        """PyAudio callback for streaming audio data"""
        self.audio_queue.put(in_data)
        return (in_data, pyaudio.paContinue)
    
    def _preprocess_audio(self, audio_bytes):
        """Preprocess audio data"""
        # Convert bytes to numpy array
        audio = np.frombuffer(audio_bytes, dtype=np.int16).astype(np.float32) / 32767.0
        
        # Check audio level for silence detection
        audio_level = np.max(np.abs(audio)) * 32767
        is_speech = audio_level > self.silence_threshold
        
        # Update speech detection state
        if is_speech:
            if not self.speech_detected:
                self.result_queue.put(("status", "Speech detected"))
            self.speech_detected = True
            self.last_speech_time = time.time()
        elif self.speech_detected and time.time() - self.last_speech_time > self.silence_timeout:
            self.speech_detected = False
            self.result_queue.put(("status", "Speech ended"))
        
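        # Apply preprocessing: filter first, then normalize, so the filter's
        # output cannot push the peak back above full scale
        if self.enable_bandpass:
            audio = bandpass_filter(audio, self.sample_rate, 300, 3000)
        
        # Only normalize chunks that contain speech, to avoid amplifying noise
        if self.enable_normalization and is_speech:
            audio = normalize_audio(audio)
        
        # Clip before converting: values outside [-1, 1] would wrap around in int16
        return (np.clip(audio, -1.0, 1.0) * 32767).astype(np.int16).tobytes()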
    
    def _process_audio(self):
        """Process audio data from queue in a separate thread"""
        while self.running:
            # Wait for the next chunk; the timeout lets the loop re-check
            # self.running regularly without busy-polling the queue
            try:
                audio_data = self.audio_queue.get(timeout=0.1)
            except queue.Empty:
                continue
            
            # Preprocess the audio
            processed_data = self._preprocess_audio(audio_data)
            
            # Process with Vosk
            if self.recognizer.AcceptWaveform(processed_data):
                result = json.loads(self.recognizer.Result())
                text = result.get("text", "")
                if text:
                    self.result_queue.put(("final", text))
            else:
                partial = json.loads(self.recognizer.PartialResult())
                partial_text = partial.get("partial", "")
                if partial_text:
                    self.result_queue.put(("partial", partial_text))
    
    def start(self):
        """Start the non-blocking recognizer"""
        if self.running:
            return
        
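        # Set up Vosk. Loading the model is slow, so load it once and reuse it
        # on restarts; a new KaldiRecognizer starts from a clean state.
        if getattr(self, "model", None) is None:
            self.model = Model(self.model_path)
        self.recognizer = KaldiRecognizer(self.model, self.sample_rate)
        
        # Discard audio and results left over from a previous session
        self.audio_queue = queue.Queue()
        self.result_queue = queue.Queue()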
        
        # Set up PyAudio
        self.p = pyaudio.PyAudio()
        self.stream = self.p.open(
            format=pyaudio.paInt16,
            channels=1,
            rate=self.sample_rate,
            input=True,
            frames_per_buffer=self.chunk_size,
            stream_callback=self._audio_callback
        )
        
        # Reset states
        self.running = True
        self.speech_detected = False
        self.last_speech_time = time.time()
        
        # Start the processing thread
        self.process_thread = threading.Thread(target=self._process_audio)
        self.process_thread.daemon = True  # Thread will exit when main program exits
        self.process_thread.start()
        
        # Start the audio stream
        self.stream.start_stream()
        
        print("Non-blocking recognizer started")
        return True
    
    def stop(self):
        """Stop the recognizer"""
        if not self.running:
            return
        
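        # Signal the processing thread to stop and wait for it to exit, so it
        # is no longer using the recognizer when FinalResult() is called below
        self.running = False
        if getattr(self, "process_thread", None) is not None:
            self.process_thread.join(timeout=1.0)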
        
        # Clean up PyAudio
        if self.stream:
            self.stream.stop_stream()
            self.stream.close()
            self.stream = None
        
        if self.p:
            self.p.terminate()
            self.p = None
        
        # Get final result if any
        if self.recognizer:
            final_result = json.loads(self.recognizer.FinalResult())
            final_text = final_result.get("text", "")
            if final_text:
                self.result_queue.put(("final", final_text))
        
        print("Recognizer stopped")
    
    def get_result(self, block=False, timeout=None):
        """
        Get recognition result if available
        
        Returns tuple of (type, text) where type is 'final', 'partial', or 'status'
        Returns None if no result available
        """
        try:
            return self.result_queue.get(block=block, timeout=timeout)
        except queue.Empty:
            return None

# Example of using the non-blocking recognizer
def demo_non_blocking(run_time=15):
    """Demo of non-blocking recognition"""
    if not os.path.exists(model_path):
        print(f"Model not found at {model_path}")
        return
    
    recognizer = NonBlockingRecognizer(model_path)
    recognizer.start()
    
    print(f"Running for {run_time} seconds. Speak now!")
    
    try:
        start_time = time.time()
        while time.time() - start_time < run_time:
            # Check for recognition results
            result = recognizer.get_result(block=False)
            if result:
                result_type, text = result
                if result_type == "partial":
                    print(f"\rPartial: {text}", end="", flush=True)
                elif result_type == "final":
                    print(f"\nFinal: {text}")
                elif result_type == "status":
                    print(f"\n{text}")
            
            # The main thread could do other things here
            time.sleep(0.1)
    
    except KeyboardInterrupt:
        print("\nStopped by user")
    finally:
        recognizer.stop()

# Uncomment to run the non-blocking demo
# demo_non_blocking(run_time=30)
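Another common shape for the worker thread is sketched here using only the standard library. `ChunkWorker` is a hypothetical name, and `process` stands in for the preprocessing-plus-Vosk step. The worker blocks on `queue.get()` and shuts down when a sentinel object arrives; `stop()` then `join()`s the thread. Every chunk queued before the sentinel is processed. Once `join()` returns, no other thread is using the recognizer, so calling `FinalResult()` at that point is safe.

```python
import queue
import threading

_STOP = object()  # sentinel: tells the worker to exit after earlier chunks are done

class ChunkWorker:
    """Consume chunks on a background thread and collect the results in order."""

    def __init__(self, process):
        self.process = process          # e.g. preprocessing + AcceptWaveform
        self.inbox = queue.Queue()
        self.outbox = queue.Queue()
        self._thread = None

    def start(self):
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            chunk = self.inbox.get()    # blocks without spinning the CPU
            if chunk is _STOP:
                break
            self.outbox.put(self.process(chunk))

    def stop(self):
        self.inbox.put(_STOP)           # queued behind any pending chunks
        self._thread.join()             # returns once every earlier chunk is handled

    def drain(self):
        """Return all results produced so far, in order."""
        results = []
        while True:
            try:
                results.append(self.outbox.get_nowait())
            except queue.Empty:
                return results

worker = ChunkWorker(process=len)       # stand-in: report each chunk's size in bytes
worker.start()
for chunk in (b"\x00" * 2048, b"\x00" * 2048, b"\x00" * 1280):
    worker.inbox.put(chunk)
worker.stop()
results = worker.drain()
print(results)
```

With a PyAudio callback feeding `inbox`, the audio stream would be stopped before `stop()` is called, so no chunks arrive after the sentinel.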

### Step 6: Implement Command Recognition

Now, let's implement a simple command recognition system using our speech recognizer:

In [None]:
class CommandRecognizer:
    """Recognizer for specific voice commands"""
    
    def __init__(self, model_path, commands=None):
        """
        Initialize the command recognizer
        
        Parameters:
        - model_path: Path to Vosk model
        - commands: Dictionary of commands and their handlers
        """
        self.model_path = model_path
        self.commands = commands or {}
        self.recognizer = NonBlockingRecognizer(model_path)
        self.running = False
    
    def add_command(self, phrase, handler):
        """
        Add a command to recognize
        
        Parameters:
        - phrase: Command phrase to recognize
        - handler: Function to call when command is recognized
        """
        self.commands[phrase.lower()] = handler
        print(f"Added command: '{phrase}'")
    
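    def _command_matches(self, text):
        """Return the handler of the first command found in text, or None"""
        # Pad with spaces so commands match whole words only
        # (e.g. "hello" should not match inside "othello")
        padded_text = f" {text.lower()} "
        
        for command, handler in self.commands.items():
            # Also covers an exact match, since the padded strings are then equal
            if f" {command.lower()} " in padded_text:
                return handler
        
        return None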
    
    def start_listening(self, max_seconds=60):
        """
        Start listening for commands
        
        Parameters:
        - max_seconds: Maximum seconds to listen
        """
        if not self.commands:
            print("Warning: No commands added. Use add_command() first.")
        
        self.running = True
        self.recognizer.start()
        
        print(f"Listening for commands for up to {max_seconds} seconds...")
        print("Available commands:")
        for command in self.commands.keys():
            print(f"  - {command}")
        
        try:
            start_time = time.time()
            while self.running and time.time() - start_time < max_seconds:
                # Get recognition results
                result = self.recognizer.get_result(block=False)
                if result:
                    result_type, text = result
                    
                    if result_type == "partial":
                        print(f"\rListening: {text}", end="", flush=True)
                    
                    elif result_type == "final":
                        print(f"\nYou said: {text}")
                        
                        # Check for command match
                        handler = self._command_matches(text)
                        if handler:
                            print("Command recognized!")
                            handler(text)
                        else:
                            print("No command matched")
                
                time.sleep(0.1)
        
        except KeyboardInterrupt:
            print("\nStopped by user")
        finally:
            self.stop_listening()
    
    def stop_listening(self):
        """Stop listening for commands"""
        self.running = False
        self.recognizer.stop()
        print("Command recognition stopped")

# Example command handlers
def handle_hello(text):
    print("Hello to you too!")

def handle_time(text):
    current_time = time.strftime("%H:%M:%S")
    print(f"The current time is {current_time}")

def handle_weather(text):
    print("I don't have a weather API connected yet, but it's probably nice outside!")

def handle_lights(text):
    # Compare whole words: a substring test would find "on" inside words like "second"
    words = text.lower().split()
    if "on" in words:
        print("Turning lights on...")
    elif "off" in words:
        print("Turning lights off...")
    else:
        print("Do you want the lights on or off?")

# Demonstrate command recognition
def demo_command_recognition(run_time=60):
    """Demo of command recognition"""
    if not os.path.exists(model_path):
        print(f"Model not found at {model_path}")
        return
    
    # Create command recognizer
    cmd_recognizer = CommandRecognizer(model_path)
    
    # Add commands
    cmd_recognizer.add_command("hello", handle_hello)
    cmd_recognizer.add_command("what time is it", handle_time)
    cmd_recognizer.add_command("what's the weather", handle_weather)
    cmd_recognizer.add_command("turn on the lights", handle_lights)
    cmd_recognizer.add_command("turn off the lights", handle_lights)
    
    # Start listening
    cmd_recognizer.start_listening(max_seconds=run_time)

# Uncomment to run the command recognition demo
# demo_command_recognition(run_time=30)
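When one command phrase is contained in another, for example "lights" and "turn off the lights", the first match in dictionary order may not be the intended command. The standalone sketch below uses a hypothetical function, `best_command`, that matches on whole words and prefers the longest matching phrase:

```python
def best_command(text, phrases):
    """Return the longest phrase that occurs in `text` as whole words, or None."""
    # Pad with spaces so "hello" does not match inside "othello"
    padded = f" {text.lower()} "
    matches = [p for p in phrases if f" {p.lower()} " in padded]
    return max(matches, key=len) if matches else None

commands = ["hello", "lights", "turn on the lights", "turn off the lights"]
print(best_command("please turn off the lights", commands))
print(best_command("othello is a play", commands))
```

Because `add_command()` stores phrases in lower case, the handler for the winning phrase could then be looked up as `self.commands[best]`.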

### Step 7: Creating a Grammar-Based Recognizer

For better accuracy with a limited set of commands, let's implement a grammar-based recognizer:

In [None]:
class GrammarRecognizer:
    """Recognizer constrained by grammar for better accuracy"""
    
    def __init__(self, model_path, grammar_phrases=None):
        """
        Initialize grammar-based recognizer
        
        Parameters:
        - model_path: Path to Vosk model
        - grammar_phrases: List of phrases to recognize
        """
        self.model_path = model_path
        self.grammar_phrases = grammar_phrases or []
        
        # Initialize components
        self.p = None
        self.stream = None
        self.recognizer = None
        
        # Recognition settings
        self.sample_rate = 16000
        self.chunk_size = 1024
    
    def add_phrase(self, phrase):
        """Add phrase to grammar"""
        self.grammar_phrases.append(phrase.lower())
    
    def start_listening(self, max_seconds=30):
        """Start listening with grammar constraint"""
        if not self.grammar_phrases:
            print("Warning: No phrases added to grammar.")
            return ""
        
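        # Load the model once and reuse it on later calls (loading is slow)
        if getattr(self, "model", None) is None:
            self.model = Model(self.model_path)
        
        # Vosk expects the grammar as a JSON array of phrases. The "[unk]" entry
        # lets out-of-grammar speech map to an "unknown" token instead of being
        # forced onto the closest phrase.
        grammar = json.dumps(self.grammar_phrases + ["[unk]"])
        
        # Create recognizer with grammar
        self.recognizer = KaldiRecognizer(self.model, self.sample_rate, grammar)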
        
        # Initialize PyAudio
        self.p = pyaudio.PyAudio()
        self.stream = self.p.open(
            format=pyaudio.paInt16,
            channels=1,
            rate=self.sample_rate,
            input=True,
            frames_per_buffer=self.chunk_size
        )
        
        print("Listening with grammar constraint...")
        print("Recognized phrases will be limited to:")
        for phrase in self.grammar_phrases:
            print(f"  - {phrase}")
        
        # Start listening
        recognized_text = ""
        try:
            start_time = time.time()
            while time.time() - start_time < max_seconds:
                data = self.stream.read(self.chunk_size, exception_on_overflow=False)
                
                if self.recognizer.AcceptWaveform(data):
                    result = json.loads(self.recognizer.Result())
                    text = result.get("text", "")
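                    if text and "[unk]" not in text:
                        print(f"\nRecognized: {text}")
                        recognized_text = text
                        break  # Stop after first recognition
                    elif text:
                        # Speech that matched no phrase in the grammar
                        print("\nNot a known phrase, still listening...")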
                else:
                    partial = json.loads(self.recognizer.PartialResult())
                    partial_text = partial.get("partial", "")
                    if partial_text:
                        print(f"\rPartial: {partial_text}", end="", flush=True)
        
        except KeyboardInterrupt:
            print("\nStopped by user")
        finally:
            # Clean up
            if self.stream:
                self.stream.stop_stream()
                self.stream.close()
            
            if self.p:
                self.p.terminate()
            
            print("\nGrammar-based recognition completed")
        
        return recognized_text

# Demo of grammar-based recognition
def demo_grammar_recognition():
    """Demo grammar-based recognition"""
    if not os.path.exists(model_path):
        print(f"Model not found at {model_path}")
        return
    
    # Create grammar recognizer
    grammar_rec = GrammarRecognizer(model_path)
    
    # Add phrases
    grammar_rec.add_phrase("turn on the lights")
    grammar_rec.add_phrase("turn off the lights")
    grammar_rec.add_phrase("what time is it")
    grammar_rec.add_phrase("what's the weather")
    grammar_rec.add_phrase("play music")
    grammar_rec.add_phrase("stop music")
    grammar_rec.add_phrase("open the door")
    grammar_rec.add_phrase("close the door")
    
    # Start listening
    recognized = grammar_rec.start_listening(max_seconds=15)
    
    if recognized:
        print(f"\nFinal recognized command: '{recognized}'")
        print("Taking action based on command...")
        
        # Simple command handling
        if "light" in recognized:
            if "on" in recognized:
                print("Turning lights on!")
            else:
                print("Turning lights off!")
        elif "time" in recognized:
            current_time = time.strftime("%H:%M:%S")
            print(f"The current time is {current_time}")
        elif "music" in recognized:
            if "play" in recognized:
                print("Playing your favorite music...")
            else:
                print("Stopping music playback.")
        elif "door" in recognized:
            if "open" in recognized:
                print("Opening the door...")
            else:
                print("Closing the door...")
        elif "weather" in recognized:
            print("The weather is sunny with a chance of Python!")

# Uncomment to run the grammar-based recognition demo
# demo_grammar_recognition()
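Vosk takes a runtime grammar as a JSON array of phrase strings. The special entry `"[unk]"` gives out-of-grammar speech somewhere to go instead of forcing it onto the nearest phrase. The hypothetical helpers below build that string (lower-casing and de-duplicating the phrases) and check whether a result was rejected. Not every Vosk model supports runtime grammars, and words outside the model's vocabulary cannot be recognized, so check the model list for your model's capabilities.

```python
import json

def build_grammar(phrases, allow_unknown=True):
    """Return a Vosk grammar string: a JSON array of lower-cased, unique phrases."""
    # dict.fromkeys removes duplicates while keeping the original order
    unique = list(dict.fromkeys(p.strip().lower() for p in phrases if p.strip()))
    if allow_unknown:
        unique.append("[unk]")  # catch-all for speech outside the grammar
    return json.dumps(unique)

def is_rejected(text):
    """True if a grammar-constrained result is empty or contains [unk]."""
    return not text or "[unk]" in text

grammar = build_grammar(["Play music", "stop music", "play music"])
print(grammar)
# The string is passed as the third argument: KaldiRecognizer(model, 16000, grammar)
```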

### Step 8: Put Everything Together - A Simple Voice Assistant

Now, let's bring everything together to create a simple voice assistant:

In [None]:
class SimpleVoiceAssistant:
    """A simple voice assistant using our speech recognition components"""
    
    def __init__(self, model_path, wake_word="hey assistant"):
        """
        Initialize voice assistant
        
        Parameters:
        - model_path: Path to Vosk model
        - wake_word: Wake word to activate assistant
        """
        self.model_path = model_path
        self.wake_word = wake_word.lower()
        
        # Recognition components
        self.listener = NonBlockingRecognizer(model_path)
        self.command_recognizer = GrammarRecognizer(model_path)
        
        # Assistant state
        self.running = False
        self.listening_for_wake_word = True
        self.listening_for_command = False
        
        # Set up command grammar
        self._setup_commands()
    
    def _setup_commands(self):
        """Set up the available commands"""
        commands = [
            "what time is it",
            "what's the weather",
            "tell me a joke",
            "turn on the lights",
            "turn off the lights",
            "play music",
            "stop music",
            "goodbye"
        ]
        
        for command in commands:
            self.command_recognizer.add_phrase(command)
    
    def _handle_command(self, command):
        """Handle recognized command"""
        if not command:
            print("Sorry, I didn't understand that command.")
            return True
        
        command = command.lower()
        
        if "time" in command:
            current_time = time.strftime("%I:%M %p")
            print(f"It's {current_time}")
        
        elif "weather" in command:
            print("I don't have a weather API connected yet, but I'm sure it's lovely outside!")
        
        elif "joke" in command:
            jokes = [
                "Why do programmers prefer dark mode? Because light attracts bugs!",
                "Why did the function go to therapy? It had too many complex issues.",
                "What's a computer's favorite snack? Microchips!",
                "Why did the programmer quit his job? Because he didn't get arrays."
            ]
            print(f"Here's a joke: {np.random.choice(jokes)}")
        
        elif "light" in command:
            if "on" in command:
                print("Turning the lights on!")
            else:
                print("Turning the lights off!")
        
        elif "music" in command:
            if "play" in command:
                print("Playing your favorite music...")
            else:
                print("Stopping music playback.")
        
        elif "goodbye" in command:
            print("Goodbye! Have a great day!")
            return False
        
        return True
    
    def run(self, run_time=120):
        """
        Run the voice assistant
        
        Parameters:
        - run_time: Maximum run time in seconds
        """
        self.running = True
        self.listener.start()
        
        print(f"Voice assistant started! Say '{self.wake_word}' to activate.")
        print(f"Running for up to {run_time} seconds (Ctrl+C to exit)")
        
        try:
            start_time = time.time()
            while self.running and time.time() - start_time < run_time:
                if self.listening_for_wake_word:
                    # Listen for wake word
                    result = self.listener.get_result(block=False)
                    if result:
                        result_type, text = result
                        if result_type == "final":
                            print(f"\nHeard: {text}")
                            
                            # Check for wake word
                            if self.wake_word in text.lower():
                                print("Wake word detected! What can I help you with?")
                                
                                # Stop wake word listener and switch to command mode
                                self.listener.stop()
                                self.listening_for_wake_word = False
                                self.listening_for_command = True
                                
                                # Get command
                                command = self.command_recognizer.start_listening(max_seconds=10)
                                
                                # Handle command
                                if self._handle_command(command):
                                    # Restart wake word listener
                                    self.listening_for_wake_word = True
                                    self.listening_for_command = False
                                    self.listener.start()
                                else:
                                    # Command was "goodbye"
                                    self.running = False
                
                time.sleep(0.1)
        
        except KeyboardInterrupt:
            print("\nVoice assistant stopped by user")
        finally:
            if self.listener.running:
                self.listener.stop()
            print("Voice assistant shutdown complete")

# Demo the simple voice assistant
def start_voice_assistant():
    """Start the simple voice assistant"""
    if not os.path.exists(model_path):
        print(f"Model not found at {model_path}")
        return
    
    assistant = SimpleVoiceAssistant(model_path, wake_word="hey assistant")
    assistant.run(run_time=120)

# Uncomment to start the voice assistant
# start_voice_assistant()
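In `run()`, the wake-word and command switching is interleaved with audio I/O, which makes it hard to test in isolation. One option, sketched below with hypothetical names (`Mode`, `next_mode`), is to move the transitions into a pure function that can be checked without a microphone:

```python
from enum import Enum, auto

class Mode(Enum):
    WAITING_FOR_WAKE_WORD = auto()
    WAITING_FOR_COMMAND = auto()
    STOPPED = auto()

def next_mode(mode, text, wake_word="hey assistant"):
    """Return the assistant's next mode after hearing `text` while in `mode`."""
    text = text.lower().strip()
    if mode is Mode.WAITING_FOR_WAKE_WORD:
        # Stay idle until the wake word appears anywhere in the utterance
        return Mode.WAITING_FOR_COMMAND if wake_word in text else mode
    if mode is Mode.WAITING_FOR_COMMAND:
        # "goodbye" ends the session; any other command returns to idle
        return Mode.STOPPED if text == "goodbye" else Mode.WAITING_FOR_WAKE_WORD
    return mode  # STOPPED is terminal

mode = Mode.WAITING_FOR_WAKE_WORD
for heard in ["hello there", "hey assistant", "what time is it",
              "hey assistant", "goodbye"]:
    mode = next_mode(mode, heard)
    print(f"{heard!r:20} -> {mode.name}")
```

The audio loop then only has to feed recognized text into `next_mode()` and start or stop the matching recognizer whenever the mode changes.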

## Conclusion

Congratulations! You've built a complete real-time speech recognition system for your voice assistant. In this practice session, you've learned how to:

1. Capture and process audio in real-time
2. Apply preprocessing techniques to improve recognition
3. Use threading for non-blocking recognition
4. Implement command recognition
5. Use grammar constraints for better accuracy
6. Create a simple wake-word based voice assistant

In the next module, you'll learn about speech understanding - how to extract meaning and intent from the recognized text.

## Exploring Further

To expand your speech recognition system, you might want to explore:

1. Integrating with a more advanced NLU (Natural Language Understanding) system
2. Adding more sophisticated voice activity detection
3. Implementing adaptive noise cancellation
4. Storing and learning from user interactions
5. Adding multiple wake word options

Feel free to experiment with the code and adapt it to your specific needs!

### Challenge Exercise

As a challenge, try to extend the voice assistant to:

1. Remember user preferences
2. Use contextual information (e.g., time of day) for more natural interactions
3. Integrate with a web API (e.g., for weather information)
4. Add conversational context (remember previous commands)

This will prepare you well for the next modules on Speech Understanding and Integration!