# AUDIO PREPROCESSING PRACTICE GUIDE

## EXERCISE GOAL
In this practice exercise, you'll build a complete audio preprocessing pipeline for speech recognition and test whether it improves Vosk's accuracy on your recordings.

## STEP-BY-STEP INSTRUCTIONS

### Step 1: Setup Project and Install Required Libraries

In [None]:
# Install required libraries (pyaudio is used by the recording function in Step 2)
!pip install vosk librosa scipy numpy matplotlib noisereduce webrtcvad soundfile pyaudio

# Verify that all libraries import correctly
try:
    import vosk, librosa, scipy, numpy, matplotlib, noisereduce
    import webrtcvad, soundfile, pyaudio
    print('All libraries installed successfully!')
except ImportError as e:
    print(f"Error importing: {e}")

### Step 2: Create a Sample Audio Recording

We'll need an audio file to test our preprocessing. You can either:
1. Record your own audio with any recording tool
2. Use the function below to record from your microphone
3. Use an existing audio file

Let's set up a function to record some test audio:

In [None]:
import pyaudio
import wave
import time
import os

def record_test_audio(filename="test_audio.wav", duration=5, sample_rate=16000):
    """
    Record audio for testing preprocessing
    
    Parameters:
    - filename: Output filename
    - duration: Recording duration in seconds
    - sample_rate: Sample rate (16000 recommended for Vosk)
    """
    p = pyaudio.PyAudio()
    stream = p.open(format=pyaudio.paInt16,
                    channels=1,
                    rate=sample_rate,
                    input=True,
                    frames_per_buffer=1024)
    
    print(f"Recording for {duration} seconds...")
    frames = []
    
    for i in range(0, int(sample_rate / 1024 * duration)):
        data = stream.read(1024)
        frames.append(data)
        if i % 10 == 0:
            # Print a dot every 10 frames to show progress
            print(".", end="", flush=True)
            
    print("\nFinished recording!")
    
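    stream.stop_stream()
    stream.close()
    # Look up the sample width before releasing PortAudio
    sample_width = p.get_sample_size(pyaudio.paInt16)
    p.terminate()
    
    # The context manager closes the file even if writing fails
    with wave.open(filename, 'wb') as wf:
        wf.setnchannels(1)
        wf.setsampwidth(sample_width)
        wf.setframerate(sample_rate)
        wf.writeframes(b''.join(frames))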
    
    print(f"Audio saved to {filename}")
    return filename

# Record a test audio file (uncomment if you want to record)
# test_file = record_test_audio(duration=5)

# If you already have an audio file, set the path here:
# test_file = "path/to/your/audio.wav"
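
If no microphone is available, a synthetic file can still exercise the file handling in the later steps. The sketch below writes a one-second sine tone as 16-bit mono PCM using only the standard library. The helper name `write_test_tone` and its defaults are assumptions made for illustration, and a pure tone is not speech, so it is only useful for checking formats and plumbing, not recognition accuracy.

```python
import math
import struct
import wave

def write_test_tone(filename="test_tone.wav", duration=1.0, sample_rate=16000,
                    frequency=440.0, amplitude=0.5):
    """Write a mono 16-bit PCM sine tone (hypothetical helper, not speech)."""
    n_samples = int(duration * sample_rate)
    samples = [
        int(amplitude * 32767 * math.sin(2 * math.pi * frequency * n / sample_rate))
        for n in range(n_samples)
    ]
    with wave.open(filename, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)            # 2 bytes per sample = 16-bit audio
        wf.setframerate(sample_rate)
        wf.writeframes(struct.pack(f"<{n_samples}h", *samples))
    return filename

tone_file = write_test_tone()

# Read the header back to confirm the format that was written
with wave.open(tone_file, "rb") as wf:
    info = (wf.getnchannels(), wf.getsampwidth(), wf.getframerate(), wf.getnframes())
print(info)
```

The tuple printed at the end lists channels, bytes per sample, sample rate, and frame count, which are the same properties the Vosk step checks before recognition.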

### Step 3: Create the Audio Preprocessing Module

Now, let's build a comprehensive audio preprocessing module. We'll create a class that implements all the preprocessing techniques we learned about.

In [None]:
import numpy as np
import librosa
import soundfile as sf
import noisereduce as nr
from scipy.signal import butter, lfilter
import matplotlib.pyplot as plt
import webrtcvad
import struct
import os
import json

class AudioPreprocessor:
    """A complete audio preprocessing pipeline for speech recognition."""
    
    def __init__(self, target_sr=16000, normalize_level=-3, 
                 vad_aggressiveness=3, lowcut=300, highcut=3000):
        """
        Initialize the preprocessor
        
        Parameters:
        - target_sr: Target sample rate in Hz
        - normalize_level: Target peak normalization level in dB
        - vad_aggressiveness: VAD aggressiveness (0-3)
        - lowcut: Low cutoff frequency for bandpass filter
        - highcut: High cutoff frequency for bandpass filter
        """
        self.target_sr = target_sr
        self.normalize_level = normalize_level
        self.vad_aggressiveness = vad_aggressiveness
        self.lowcut = lowcut
        self.highcut = highcut
        
        # Check if VAD sample rate is valid
        if self.target_sr not in (8000, 16000, 32000, 48000):
            print(f"Warning: VAD requires sample rate of 8000, 16000, 32000, or 48000 Hz")
            print(f"VAD will be disabled if sample rate is not compatible")
    
    def resample(self, audio, original_sr):
        """Resample audio to target sample rate"""
        if original_sr == self.target_sr:
            return audio
        return librosa.resample(audio, orig_sr=original_sr, target_sr=self.target_sr)
    
    def normalize(self, audio):
        """Normalize audio to target level"""
        # Find the maximum absolute amplitude
        max_amplitude = np.max(np.abs(audio))
        
        # Calculate current peak in dB
        current_dB = 20 * np.log10(max_amplitude) if max_amplitude > 0 else -80
        
        # Calculate the gain needed
        gain_dB = self.normalize_level - current_dB
        gain_linear = 10 ** (gain_dB / 20)
        
        # Apply gain
        normalized = audio * gain_linear
        
        # Ensure no clipping
        if np.max(np.abs(normalized)) > 1.0:
            normalized = normalized / np.max(np.abs(normalized))
        
        return normalized
    
    def reduce_noise(self, audio, stationary=True):
        """Apply noise reduction"""
        return nr.reduce_noise(y=audio, sr=self.target_sr, stationary=stationary)
    
    def detect_speech(self, audio, frame_duration=30):
        """
        Detect speech segments using WebRTC VAD
        Returns list of booleans (True = speech), one per complete frame
        """
        # Calculate frame size in samples (WebRTC VAD accepts 10, 20, or 30 ms frames)
        frame_size = int(self.target_sr * frame_duration / 1000)
        n_frames = len(audio) // frame_size
        
        # WebRTC VAD requires specific sample rates; otherwise keep every frame
        if self.target_sr not in (8000, 16000, 32000, 48000):
            return [True] * n_frames
        
        # Initialize VAD
        vad = webrtcvad.Vad(self.vad_aggressiveness)
        
        # Clip to [-1, 1] before scaling so loud samples cannot wrap around in int16
        audio_int16 = (np.clip(audio, -1.0, 1.0) * 32767).astype(np.int16)
        
        # Process every complete frame, including the last one
        speech_frames = []
        for i in range(n_frames):
            frame = audio_int16[i * frame_size:(i + 1) * frame_size]
            try:
                speech_frames.append(vad.is_speech(frame.tobytes(), self.target_sr))
            except Exception as e:
                print(f"VAD error: {e}")
                speech_frames.append(True)  # Default to keeping the frame
        
        return speech_frames
    
    def apply_vad(self, audio, speech_frames, frame_duration=30):
        """Zero out frames without detected speech (the audio length is unchanged)"""
        frame_size = int(self.target_sr * frame_duration / 1000)
        speech_only = np.zeros_like(audio)
        
        for i, is_speech in enumerate(speech_frames):
            if is_speech:
                start = i * frame_size
                end = min(start + frame_size, len(audio))
                speech_only[start:end] = audio[start:end]
        
        return speech_only
    
    def apply_bandpass(self, audio, order=5):
        """Apply bandpass filter to focus on speech frequencies"""
        nyquist = 0.5 * self.target_sr
        low = self.lowcut / nyquist
        high = self.highcut / nyquist
        
        # Design filter
        b, a = butter(order, [low, high], btype='band')
        
        # Apply filter
        return lfilter(b, a, audio)
    
    def process(self, audio_path, apply_noise_reduction=True, 
                apply_vad=True, apply_bandpass=True):
        """
        Process audio file with complete pipeline
        
        Parameters:
        - audio_path: Path to audio file
        - apply_noise_reduction: Whether to apply noise reduction
        - apply_vad: Whether to apply voice activity detection
        - apply_bandpass: Whether to apply bandpass filtering
        
        Returns:
        - processed_audio: Processed audio data
        - original_audio: Unprocessed audio for comparison
        - settings: Dictionary of settings used
        """
        print(f"\nProcessing: {audio_path}")
        
        # Load audio
        audio, original_sr = librosa.load(audio_path, sr=None)
        print(f"Loaded audio: {len(audio)/original_sr:.1f}s, {original_sr}Hz")
        
        # Step 1: Resample
        if original_sr != self.target_sr:
            audio = self.resample(audio, original_sr)
            print(f"Resampled from {original_sr}Hz to {self.target_sr}Hz")
        
        # Keep the unprocessed audio for comparison. It is copied after
        # resampling so that plot_comparison can use target_sr for both signals.
        original_audio = audio.copy()
        
        # Step 2: Normalize
        audio = self.normalize(audio)
        print(f"Normalized audio to {self.normalize_level}dB")
        
        # Step 3: Noise reduction
        if apply_noise_reduction:
            audio = self.reduce_noise(audio)
            print("Applied noise reduction")
        
        # Step 4: Voice activity detection
        if apply_vad:
            speech_frames = self.detect_speech(audio)
            if speech_frames:
                speech_percentage = sum(speech_frames) / len(speech_frames) * 100
                print(f"VAD: Speech detected in {speech_percentage:.1f}% of frames")
                audio = self.apply_vad(audio, speech_frames)
            else:
                print("VAD skipped: audio is shorter than one frame")
        
        # Step 5: Bandpass filter
        if apply_bandpass:
            audio = self.apply_bandpass(audio)
            print(f"Applied bandpass filter: {self.lowcut}-{self.highcut}Hz")
        
        print("Processing complete!")
        
        # Return both processed audio and settings used
        settings = {
            "sample_rate": self.target_sr,
            "normalize_level": self.normalize_level,
            "vad_aggressiveness": self.vad_aggressiveness,
            "bandpass_range": [self.lowcut, self.highcut],
            "steps_applied": {
                "resampling": True,
                "normalization": True,
                "noise_reduction": apply_noise_reduction,
                "vad": apply_vad,
                "bandpass": apply_bandpass
            }
        }
        
        return audio, original_audio, settings
    
    def plot_comparison(self, original, processed, title="Audio Preprocessing Comparison"):
        """Plot original vs processed audio waveforms"""
        fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8))
        
        # Time axis
        time_original = np.linspace(0, len(original)/self.target_sr, len(original))
        time_processed = np.linspace(0, len(processed)/self.target_sr, len(processed))
        
        # Plot original audio
        ax1.plot(time_original, original)
        ax1.set_title("Original Audio")
        ax1.set_ylabel("Amplitude")
        ax1.set_xlim(0, len(original)/self.target_sr)
        
        # Plot processed audio
        ax2.plot(time_processed, processed, color='orange')
        ax2.set_title("Processed Audio")
        ax2.set_xlabel("Time (seconds)")
        ax2.set_ylabel("Amplitude")
        ax2.set_xlim(0, len(processed)/self.target_sr)
        
        plt.tight_layout()
        plt.suptitle(title, fontsize=16)
        plt.subplots_adjust(top=0.9)
        plt.show()
        
    def save_audio(self, audio, output_path):
        """Save processed audio to file"""
        sf.write(output_path, audio, self.target_sr)
        print(f"Audio saved to {output_path}")

# Create an instance of the preprocessor
preprocessor = AudioPreprocessor(
    target_sr=16000,         # 16kHz is optimal for Vosk
    normalize_level=-3,      # -3dB peak level
    vad_aggressiveness=2,    # Medium aggressiveness
    lowcut=300,              # Low cutoff frequency for speech
    highcut=3000             # High cutoff frequency for speech
)
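
To make the `normalize_level=-3` setting concrete, here is the peak-normalization arithmetic worked through for a single value. This is a minimal sketch using only the `math` module; the function name `peak_gain` and the example peak of 0.25 are assumptions chosen for illustration, while the formulas mirror the ones in `normalize`.

```python
import math

def peak_gain(peak_amplitude, target_db=-3.0):
    """Return (gain_db, gain_linear) that moves a peak to target_db dBFS."""
    current_db = 20 * math.log10(peak_amplitude)   # peak level in dB relative to 1.0
    gain_db = target_db - current_db               # how far the peak must move
    return gain_db, 10 ** (gain_db / 20)           # convert dB back to a multiplier

gain_db, gain_linear = peak_gain(0.25)
new_peak = 0.25 * gain_linear
print(f"gain: {gain_db:.2f} dB, x{gain_linear:.3f}, new peak: {new_peak:.3f}")
```

A peak of 0.25 sits at about -12 dB, so it needs roughly 9 dB of gain, and the resulting peak of about 0.708 is what -3 dB means on a linear scale.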

### Step 4: Test the Preprocessing Pipeline

Now that we have our processor, let's test it with an audio file:

In [None]:
# Path to your test audio file
test_file = "test_audio.wav"  # Change this if you're using a different file

# Check if file exists
if not os.path.exists(test_file):
    print(f"File {test_file} not found. Please record an audio file first.")
else:
    # Process with all techniques
    processed_audio, original_audio, settings = preprocessor.process(
        test_file,
        apply_noise_reduction=True,
        apply_vad=True,
        apply_bandpass=True
    )
    
    # Visualize the results
    preprocessor.plot_comparison(original_audio, processed_audio, "Full Preprocessing Pipeline")
    
    # Save processed audio
    processed_file = "processed_" + os.path.basename(test_file)
    preprocessor.save_audio(processed_audio, processed_file)
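
The VAD percentage printed during processing is counted over fixed-size frames, so it helps to see how many samples and bytes one frame contains. The sketch below reproduces that arithmetic without any audio libraries. The helper name `frame_layout` is hypothetical, and it assumes 16-bit samples (2 bytes each) as used in `detect_speech`.

```python
def frame_layout(n_samples, sample_rate=16000, frame_ms=30):
    """Return (samples per frame, bytes per frame, number of whole frames)."""
    if frame_ms not in (10, 20, 30):
        raise ValueError("WebRTC VAD accepts 10, 20, or 30 ms frames")
    frame_size = sample_rate * frame_ms // 1000
    # Each 16-bit sample occupies 2 bytes; a trailing partial frame is dropped
    return frame_size, frame_size * 2, n_samples // frame_size

# A 5-second recording at 16 kHz
print(frame_layout(5 * 16000))
```

Because only whole frames are classified, a few milliseconds at the end of a recording may never be seen by the VAD.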

### Step 5: Testing with Vosk

Now, let's test how our preprocessing affects speech recognition accuracy:

In [None]:
from vosk import Model, KaldiRecognizer
import wave
import json

def recognize_with_vosk(audio_file, model_path="models/vosk-model-small-en-us-0.15"):
    """
    Recognize speech in audio file using Vosk
    
    Parameters:
    - audio_file: Path to audio file
    - model_path: Path to Vosk model
    
    Returns:
    - text: Recognized text
    """
    if not os.path.exists(model_path):
        print(f"Model not found at {model_path}. Please download a Vosk model first.")
        return ""
        
    if not os.path.exists(audio_file):
        print(f"Audio file not found: {audio_file}")
        return ""
    
    # Open the audio file; the context manager closes it on every return path
    with wave.open(audio_file, "rb") as wf:
        # Check format
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2 or wf.getcomptype() != "NONE":
            print("Audio file must be 16-bit mono PCM WAV.")
            return ""
        
        # Load model
        model = Model(model_path)
        recognizer = KaldiRecognizer(model, wf.getframerate())
        
        # Process the audio in blocks of 4000 frames
        segments = []
        while True:
            data = wf.readframes(4000)
            if len(data) == 0:
                break
            if recognizer.AcceptWaveform(data):
                result = json.loads(recognizer.Result())
                segments.append(result.get("text", ""))
        
        # Get final result
        final_result = json.loads(recognizer.FinalResult())
        segments.append(final_result.get("text", ""))
    
    # Join non-empty segments with single spaces
    return " ".join(s for s in segments if s)

# Check if we have a Vosk model and both audio files
model_path = "models/vosk-model-small-en-us-0.15"
processed_file = "processed_" + os.path.basename(test_file)
if not os.path.exists(model_path):
    print(f"Model not found at {model_path}. Please download a Vosk model first.")
    print("You can download models from: https://alphacephei.com/vosk/models")
elif not (os.path.exists(test_file) and os.path.exists(processed_file)):
    print("Audio files not found. Run Step 4 first to create the processed file.")
else:
    # Recognize original audio
    print("\n--- Recognition Results ---")
    print("\nOriginal Audio:")
    original_text = recognize_with_vosk(test_file, model_path)
    print(f'"{original_text}"')
    
    # Recognize processed audio
    print("\nProcessed Audio:")
    processed_text = recognize_with_vosk(processed_file, model_path)
    print(f'"{processed_text}"')
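
Comparing two transcripts by eye is hard to do consistently, so one common way to put a number on accuracy is the word error rate (WER): the word-level edit distance between what was said and what was recognized, divided by the number of words spoken. The sketch below is a minimal standard-library version. The function name `word_error_rate` and the reference sentence are assumptions for illustration; you would supply the sentence you actually recorded.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # reference word was dropped
                            curr[j - 1] + 1,      # extra word was inserted
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)

reference = "turn on the kitchen lights"  # assumed: what was actually said
print(word_error_rate(reference, "turn on the kitchen light"))
print(word_error_rate(reference, "turn the kitchen lights"))
```

A lower value is better, and 0.0 means the transcript matched the reference word for word. Passing `original_text` and `processed_text` as the hypothesis gives one number per variant to compare.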

### Step 6: Experimentation

Now, let's experiment with different preprocessing settings to see how they affect recognition:

In [None]:
def experiment_with_settings(audio_file, model_path, output_file="experiment_results.txt"):
    """Run experiments with different preprocessing settings"""
    results = []
    
    # Original audio (baseline)
    original_text = recognize_with_vosk(audio_file, model_path)
    results.append({
        "experiment": "Baseline (No preprocessing)",
        "settings": {},
        "recognized_text": original_text
    })
    
    # Experiment 1: Only normalization
    print("\nExperiment 1: Only normalization")
    audio1, _, settings1 = preprocessor.process(
        audio_file, 
        apply_noise_reduction=False,
        apply_vad=False, 
        apply_bandpass=False
    )
    output1 = "exp1_norm_only.wav"
    preprocessor.save_audio(audio1, output1)
    text1 = recognize_with_vosk(output1, model_path)
    results.append({
        "experiment": "Experiment 1: Only normalization",
        "settings": settings1,
        "recognized_text": text1
    })
    
    # Experiment 2: Normalization + Noise Reduction
    print("\nExperiment 2: Normalization + Noise Reduction")
    audio2, _, settings2 = preprocessor.process(
        audio_file, 
        apply_noise_reduction=True,
        apply_vad=False, 
        apply_bandpass=False
    )
    output2 = "exp2_norm_noise.wav"
    preprocessor.save_audio(audio2, output2)
    text2 = recognize_with_vosk(output2, model_path)
    results.append({
        "experiment": "Experiment 2: Normalization + Noise Reduction",
        "settings": settings2,
        "recognized_text": text2
    })
    
    # Experiment 3: Full Pipeline
    print("\nExperiment 3: Full Pipeline")
    audio3, _, settings3 = preprocessor.process(
        audio_file, 
        apply_noise_reduction=True,
        apply_vad=True, 
        apply_bandpass=True
    )
    output3 = "exp3_full.wav"
    preprocessor.save_audio(audio3, output3)
    text3 = recognize_with_vosk(output3, model_path)
    results.append({
        "experiment": "Experiment 3: Full Pipeline",
        "settings": settings3,
        "recognized_text": text3
    })
    
    # Experiment 4: Noise reduction + bandpass (no VAD)
    print("\nExperiment 4: Noise reduction + Bandpass")
    audio4, _, settings4 = preprocessor.process(
        audio_file, 
        apply_noise_reduction=True,
        apply_vad=False, 
        apply_bandpass=True
    )
    output4 = "exp4_noise_bandpass.wav"
    preprocessor.save_audio(audio4, output4)
    text4 = recognize_with_vosk(output4, model_path)
    results.append({
        "experiment": "Experiment 4: Noise reduction + Bandpass",
        "settings": settings4,
        "recognized_text": text4
    })
    
    # Save experiment results
    with open(output_file, "w") as f:
        for res in results:
            f.write(f"=== {res['experiment']} ===\n")
            f.write(f"Recognized text: \"{res['recognized_text']}\"\n\n")
    
    print(f"\nExperiment results saved to {output_file}")
    return results

# Run experiments if Vosk model exists
if os.path.exists(model_path) and os.path.exists(test_file):
    experiment_results = experiment_with_settings(test_file, model_path)
    
    # Display results comparison
    print("\n=== EXPERIMENT RESULTS COMPARISON ===")
    for res in experiment_results:
        print(f"\n{res['experiment']}:")
        print(f'"{res["recognized_text"]}"')
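
The four experiments above cover only some of the possible on/off combinations of the three optional steps. As a sketch of how the full set could be enumerated, the snippet below builds every combination with `itertools.product`. The names `STEP_FLAGS` and `build_experiment_grid` are hypothetical; the flag names themselves match the keyword arguments of `process`.

```python
from itertools import product

# Keyword arguments of AudioPreprocessor.process that toggle optional steps
STEP_FLAGS = ("apply_noise_reduction", "apply_vad", "apply_bandpass")

def build_experiment_grid():
    """Return one dict of keyword arguments per on/off combination."""
    return [dict(zip(STEP_FLAGS, values))
            for values in product((False, True), repeat=len(STEP_FLAGS))]

grid = build_experiment_grid()
for flags in grid:
    enabled = [name.replace("apply_", "") for name, on in flags.items() if on]
    # Resampling and normalization always run, so an empty list means those only
    print(" + ".join(enabled) or "normalization only")
```

Each dictionary can be unpacked directly into a call such as `preprocessor.process(audio_file, **flags)`, which keeps the experiment loop short as more combinations are added.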

### Step 7: Create a Reusable Module

Now, let's create a reusable module that you can import into your voice assistant project:

In [None]:
# Save the AudioPreprocessor class as a Python module
module_code = '''"""
audio_preprocessor.py - A comprehensive audio preprocessing module for speech recognition
"""

import numpy as np
import librosa
import soundfile as sf
import noisereduce as nr
from scipy.signal import butter, lfilter
import matplotlib.pyplot as plt
import webrtcvad
import struct
import os

class AudioPreprocessor:
    """A complete audio preprocessing pipeline for speech recognition."""
    
    def __init__(self, target_sr=16000, normalize_level=-3, 
                 vad_aggressiveness=3, lowcut=300, highcut=3000):
        """
        Initialize the preprocessor
        
        Parameters:
        - target_sr: Target sample rate in Hz
        - normalize_level: Target peak normalization level in dB
        - vad_aggressiveness: VAD aggressiveness (0-3)
        - lowcut: Low cutoff frequency for bandpass filter
        - highcut: High cutoff frequency for bandpass filter
        """
        self.target_sr = target_sr
        self.normalize_level = normalize_level
        self.vad_aggressiveness = vad_aggressiveness
        self.lowcut = lowcut
        self.highcut = highcut
        
        # Check if VAD sample rate is valid
        if self.target_sr not in (8000, 16000, 32000, 48000):
            print(f"Warning: VAD requires sample rate of 8000, 16000, 32000, or 48000 Hz")
            print(f"VAD will be disabled if sample rate is not compatible")
    
    def resample(self, audio, original_sr):
        """Resample audio to target sample rate"""
        if original_sr == self.target_sr:
            return audio
        return librosa.resample(audio, orig_sr=original_sr, target_sr=self.target_sr)
    
    def normalize(self, audio):
        """Normalize audio to target level"""
        # Find the maximum absolute amplitude
        max_amplitude = np.max(np.abs(audio))
        
        # Calculate current peak in dB
        current_dB = 20 * np.log10(max_amplitude) if max_amplitude > 0 else -80
        
        # Calculate the gain needed
        gain_dB = self.normalize_level - current_dB
        gain_linear = 10 ** (gain_dB / 20)
        
        # Apply gain
        normalized = audio * gain_linear
        
        # Ensure no clipping
        if np.max(np.abs(normalized)) > 1.0:
            normalized = normalized / np.max(np.abs(normalized))
        
        return normalized
    
    def reduce_noise(self, audio, stationary=True):
        """Apply noise reduction"""
        return nr.reduce_noise(y=audio, sr=self.target_sr, stationary=stationary)
    
    def detect_speech(self, audio, frame_duration=30):
        """
        Detect speech segments using WebRTC VAD
        Returns list of booleans (True = speech), one per complete frame
        """
        # Calculate frame size in samples (WebRTC VAD accepts 10, 20, or 30 ms frames)
        frame_size = int(self.target_sr * frame_duration / 1000)
        n_frames = len(audio) // frame_size
        
        # WebRTC VAD requires specific sample rates; otherwise keep every frame
        if self.target_sr not in (8000, 16000, 32000, 48000):
            return [True] * n_frames
        
        # Initialize VAD
        vad = webrtcvad.Vad(self.vad_aggressiveness)
        
        # Clip to [-1, 1] before scaling so loud samples cannot wrap around in int16
        audio_int16 = (np.clip(audio, -1.0, 1.0) * 32767).astype(np.int16)
        
        # Process every complete frame, including the last one
        speech_frames = []
        for i in range(n_frames):
            frame = audio_int16[i * frame_size:(i + 1) * frame_size]
            try:
                speech_frames.append(vad.is_speech(frame.tobytes(), self.target_sr))
            except Exception as e:
                print(f"VAD error: {e}")
                speech_frames.append(True)  # Default to keeping the frame
        
        return speech_frames
    
    def apply_vad(self, audio, speech_frames, frame_duration=30):
        """Zero out frames without detected speech (the audio length is unchanged)"""
        frame_size = int(self.target_sr * frame_duration / 1000)
        speech_only = np.zeros_like(audio)
        
        for i, is_speech in enumerate(speech_frames):
            if is_speech:
                start = i * frame_size
                end = min(start + frame_size, len(audio))
                speech_only[start:end] = audio[start:end]
        
        return speech_only
    
    def apply_bandpass(self, audio, order=5):
        """Apply bandpass filter to focus on speech frequencies"""
        nyquist = 0.5 * self.target_sr
        low = self.lowcut / nyquist
        high = self.highcut / nyquist
        
        # Design filter
        b, a = butter(order, [low, high], btype='band')
        
        # Apply filter
        return lfilter(b, a, audio)
    
    def process(self, audio_path, apply_noise_reduction=True, 
                apply_vad=True, apply_bandpass=True):
        """
        Process audio file with complete pipeline
        
        Parameters:
        - audio_path: Path to audio file
        - apply_noise_reduction: Whether to apply noise reduction
        - apply_vad: Whether to apply voice activity detection
        - apply_bandpass: Whether to apply bandpass filtering
        
        Returns:
        - processed_audio: Processed audio data
        - original_audio: Original audio data
        - settings: Dictionary of settings used
        """
        print(f"Processing: {audio_path}")
        
        # Load audio
        audio, original_sr = librosa.load(audio_path, sr=None)
        print(f"Loaded audio: {len(audio)/original_sr:.1f}s, {original_sr}Hz")
        
        # Step 1: Resample
        if original_sr != self.target_sr:
            audio = self.resample(audio, original_sr)
            print(f"Resampled from {original_sr}Hz to {self.target_sr}Hz")
        
        # Keep the unprocessed audio for comparison. It is copied after
        # resampling so that plot_comparison can use target_sr for both signals.
        original_audio = audio.copy()
        
        # Step 2: Normalize
        audio = self.normalize(audio)
        print(f"Normalized audio to {self.normalize_level}dB")
        
        # Step 3: Noise reduction
        if apply_noise_reduction:
            audio = self.reduce_noise(audio)
            print("Applied noise reduction")
        
        # Step 4: Voice activity detection
        if apply_vad:
            speech_frames = self.detect_speech(audio)
            if speech_frames:
                speech_percentage = sum(speech_frames) / len(speech_frames) * 100
                print(f"VAD: Speech detected in {speech_percentage:.1f}% of frames")
                audio = self.apply_vad(audio, speech_frames)
            else:
                print("VAD skipped: audio is shorter than one frame")
        
        # Step 5: Bandpass filter
        if apply_bandpass:
            audio = self.apply_bandpass(audio)
            print(f"Applied bandpass filter: {self.lowcut}-{self.highcut}Hz")
        
        print("Processing complete!")
        
        # Return both processed audio and settings used
        settings = {
            "sample_rate": self.target_sr,
            "normalize_level": self.normalize_level,
            "vad_aggressiveness": self.vad_aggressiveness,
            "bandpass_range": [self.lowcut, self.highcut],
            "steps_applied": {
                "resampling": True,
                "normalization": True,
                "noise_reduction": apply_noise_reduction,
                "vad": apply_vad,
                "bandpass": apply_bandpass
            }
        }
        
        return audio, original_audio, settings
    
    def process_stream(self, audio_data, sr, apply_noise_reduction=True, 
                       apply_vad=False, apply_bandpass=True):
        """
        Process audio data from a stream (e.g., microphone)
        
        Parameters:
        - audio_data: Numpy array of audio data
        - sr: Sample rate of the audio data
        - apply_noise_reduction: Whether to apply noise reduction
        - apply_vad: Whether to apply voice activity detection
        - apply_bandpass: Whether to apply bandpass filtering
        
        Returns:
        - processed_audio: Processed audio data
        """
        # Copy original audio
        audio = audio_data.copy()
        
        # Step 1: Resample if needed
        if sr != self.target_sr:
            audio = self.resample(audio, sr)
        
        # Step 2: Normalize
        audio = self.normalize(audio)
        
        # Step 3: Noise reduction
        if apply_noise_reduction:
            audio = self.reduce_noise(audio)
        
        # Step 4: Voice activity detection
        if apply_vad:
            speech_frames = self.detect_speech(audio)
            audio = self.apply_vad(audio, speech_frames)
        
        # Step 5: Bandpass filter
        if apply_bandpass:
            audio = self.apply_bandpass(audio)
        
        return audio
    
    def plot_comparison(self, original, processed, title="Audio Preprocessing Comparison"):
        """Plot original vs processed audio waveforms"""
        fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8))
        
        # Time axis
        time_original = np.linspace(0, len(original)/self.target_sr, len(original))
        time_processed = np.linspace(0, len(processed)/self.target_sr, len(processed))
        
        # Plot original audio
        ax1.plot(time_original, original)
        ax1.set_title("Original Audio")
        ax1.set_ylabel("Amplitude")
        ax1.set_xlim(0, len(original)/self.target_sr)
        
        # Plot processed audio
        ax2.plot(time_processed, processed, color='orange')
        ax2.set_title("Processed Audio")
        ax2.set_xlabel("Time (seconds)")
        ax2.set_ylabel("Amplitude")
        ax2.set_xlim(0, len(processed)/self.target_sr)
        
        plt.tight_layout()
        plt.suptitle(title, fontsize=16)
        plt.subplots_adjust(top=0.9)
        plt.show()
        
    def save_audio(self, audio, output_path):
        """Save processed audio to file"""
        sf.write(output_path, audio, self.target_sr)
        print(f"Audio saved to {output_path}")

# Example usage
if __name__ == "__main__":
    # Create preprocessor
    preprocessor = AudioPreprocessor(
        target_sr=16000,
        normalize_level=-3,
        vad_aggressiveness=2,
        lowcut=300,
        highcut=3000
    )
    
    # Process a file if provided as an argument
    import sys
    if len(sys.argv) > 1:
        input_file = sys.argv[1]
        output_file = "processed_" + os.path.basename(input_file)
        
        processed, original, _ = preprocessor.process(input_file)
        preprocessor.save_audio(processed, output_file)
        
        # Show a comparison; plotting can fail when no display backend is available
        try:
            preprocessor.plot_comparison(original, processed)
        except Exception as e:
            print(f"Could not plot comparison: {e}")
'''

# Write the module to a file
with open('audio_preprocessor.py', 'w') as f:
    f.write(module_code)

print("Created audio_preprocessor.py module!")
print("You can now import this module in your projects:")
print("from audio_preprocessor import AudioPreprocessor")
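
Source code kept in a string is easy to break with a stray quote or indent, and the mistake only shows up when the file is imported later. One way to catch this early is to run the built-in `compile` on the string before writing it. The sketch below shows the idea on two tiny sample strings; the helper name `check_python_source` is hypothetical, and the same call could be applied to `module_code`.

```python
def check_python_source(source, filename="<generated>"):
    """Return None if the source parses, otherwise a short error message."""
    try:
        # compile() only parses and byte-compiles; it does not run the code
        compile(source, filename, "exec")
    except SyntaxError as err:
        return f"{filename}:{err.lineno}: {err.msg}"
    return None

good = "def double(x):\n    return 2 * x\n"
bad = "def double(x)\n    return 2 * x\n"   # missing colon on the def line

print(check_python_source(good))
print(check_python_source(bad, "broken.py"))
```

This check covers syntax only. Missing imports or wrong names inside the module still surface when the module is actually imported and used.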

### Step 8: Integration with Real-time Audio

In a real voice assistant application, you'll often need to process audio in real time. Let's create a simple example that preprocesses microphone audio in 2-second chunks and feeds each chunk to Vosk:

In [None]:
def create_realtime_example():
    """Create a real-time audio processing example file"""
    
    code = '''"""
real_time_example.py - Example of real-time audio preprocessing for speech recognition
"""

import pyaudio
import numpy as np
import time
from audio_preprocessor import AudioPreprocessor
from vosk import Model, KaldiRecognizer
import json
import os

# Constants
CHUNK = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
RECORD_SECONDS = 5

# Initialize the preprocessor
preprocessor = AudioPreprocessor(
    target_sr=16000,
    normalize_level=-3,
    vad_aggressiveness=1,  # Lower for real-time
    lowcut=300,
    highcut=3000
)

# Initialize Vosk recognizer
def init_recognizer(model_path):
    if not os.path.exists(model_path):
        print(f"Model not found at {model_path}")
        return None
    
    model = Model(model_path)
    recognizer = KaldiRecognizer(model, RATE)
    return recognizer

def process_audio_stream():
    # Initialize PyAudio
    p = pyaudio.PyAudio()
    
    # Set up the audio stream
    stream = p.open(format=FORMAT,
                   channels=CHANNELS,
                   rate=RATE,
                   input=True,
                   frames_per_buffer=CHUNK)
    
    # Initialize Vosk
    model_path = "models/vosk-model-small-en-us-0.15"  # Change this if needed
    recognizer = init_recognizer(model_path)
    
    if not recognizer:
        print("Failed to initialize speech recognition. Exiting.")
        # Release the microphone before exiting
        stream.close()
        p.terminate()
        return
    
    print("Ready to process audio! Say something...")
    
    try:
        while True:
            # Collect audio for processing
            frames = []
            for _ in range(0, int(RATE / CHUNK * 2)):  # 2 seconds of audio
                data = stream.read(CHUNK, exception_on_overflow=False)
                frames.append(data)
            
            # Convert to numpy array
            audio_data = np.frombuffer(b''.join(frames), dtype=np.int16)
            audio_data = audio_data.astype(np.float32) / 32767.0  # Convert to float
            
            print("\\nProcessing audio chunk...")
            
            # Apply preprocessing
            start_time = time.time()
            processed_audio = preprocessor.process_stream(
                audio_data, 
                RATE, 
                apply_noise_reduction=True,
                apply_vad=False,  # VAD can be tricky in real-time
                apply_bandpass=True
            )
            
            # Convert back to int16, clipping first so filter overshoot cannot wrap around
            processed_int16 = (np.clip(processed_audio, -1.0, 1.0) * 32767).astype(np.int16)
            processed_bytes = processed_int16.tobytes()
            
            # Feed to recognizer
            if recognizer.AcceptWaveform(processed_bytes):
                result = json.loads(recognizer.Result())
                text = result.get("text", "")
                if text:
                    print(f"Recognized: {text}")
            else:
                partial = json.loads(recognizer.PartialResult())
                partial_text = partial.get("partial", "")
                if partial_text:
                    print(f"Partial: {partial_text}")
            
            proc_time = time.time() - start_time
            print(f"Processing time: {proc_time:.3f}s")
    
    except KeyboardInterrupt:
        print("\\nStopping...")
    finally:
        # Clean up
        stream.stop_stream()
        stream.close()
        p.terminate()
        print("Audio processing stopped")

if __name__ == "__main__":
    process_audio_stream()
'''
    
    with open('real_time_example.py', 'w') as f:
        f.write(code)
    
    print("Created real_time_example.py!")
    print("You can run this example to test real-time audio preprocessing.")

create_realtime_example()
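
The real-time example converts microphone bytes to floats for processing and back to 16-bit integers for Vosk. The sketch below isolates that round trip using the standard-library `array` module instead of NumPy, so the clipping step is easy to see. The function names are hypothetical, and it assumes the platform's `"h"` type is 2 bytes wide, which holds on common systems.

```python
from array import array

def pcm16_to_float(pcm_bytes):
    """Decode native-endian 16-bit PCM bytes into floats in [-1.0, 1.0)."""
    samples = array("h")
    samples.frombytes(pcm_bytes)
    return [s / 32768.0 for s in samples]

def float_to_pcm16(values):
    """Clip floats to [-1.0, 1.0] and encode them as 16-bit PCM bytes."""
    clipped = (max(-1.0, min(1.0, v)) for v in values)
    return array("h", (int(round(v * 32767)) for v in clipped)).tobytes()

loud = [0.25, 1.2, -1.7]  # the last two values are outside the valid range
encoded = float_to_pcm16(loud)
print(list(array("h", encoded)))
print(pcm16_to_float(encoded))
```

Without the clipping step, values beyond the valid range would not fit in 16 bits. With NumPy's `astype(np.int16)` that typically shows up as wrap-around distortion rather than an error, which is why clipping before the cast is a useful habit.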

## Conclusion

In this practice exercise, you've built and tested a complete audio preprocessing pipeline for speech recognition. You've learned how to:

1. **Resample audio** to the proper sample rate for Vosk
2. **Normalize audio** to consistent volume levels
3. **Reduce background noise** in recordings
4. **Apply Voice Activity Detection** to identify speech segments
5. **Filter audio** to focus on speech frequencies
6. **Integrate preprocessing** with Vosk and compare recognition results

You now have a reusable `AudioPreprocessor` class that you can integrate into your voice assistant project. In the next module, we'll build on these skills to create a more robust speech-to-text system that can handle continuous real-time audio.

## Next Steps

To further improve your preprocessing skills:

1. Experiment with different VAD aggressiveness settings
2. Try different filter bandwidths
3. Test with various noise environments
4. Compare stationary vs. non-stationary noise reduction
5. Integrate with your own voice assistant project
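
When trying different filter bandwidths, keep in mind that `butter` expects cutoffs as fractions of the Nyquist frequency (half the sample rate), and both must lie strictly between 0 and 1. The sketch below checks a candidate band before it reaches the filter design. The helper name `normalized_band` is hypothetical, and the second example band (300 to 3400 Hz at 8 kHz) is only an illustration.

```python
def normalized_band(lowcut, highcut, sample_rate):
    """Return (low, high) as fractions of the Nyquist frequency."""
    nyquist = sample_rate / 2
    if not 0 < lowcut < highcut < nyquist:
        raise ValueError(
            f"need 0 < lowcut < highcut < {nyquist:.0f} Hz, "
            f"got {lowcut}-{highcut} Hz"
        )
    return lowcut / nyquist, highcut / nyquist

print(normalized_band(300, 3000, 16000))   # the band used in this exercise
print(normalized_band(300, 3400, 8000))    # a wider band at a lower sample rate

try:
    normalized_band(300, 4000, 8000)       # highcut equals Nyquist: rejected
except ValueError as err:
    print(f"Invalid band: {err}")
```

The returned pair corresponds to the `low` and `high` values computed inside `apply_bandpass`, so a band that passes this check is safe to hand to the filter at that sample rate.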