# 🎤 Offline Speech Recognition - User Guide

## 📋 How to use this notebook

### Step 1️⃣: Installation and Imports (Cell 1)
Run this cell to import the required libraries.
If the packages are not installed yet, make sure the `pip install ...` line at the top of the cell is uncommented so that it runs first.

### Step 2️⃣: Configuration (Cell 2)
Run this cell to load the configuration of the available models.

### Step 3️⃣: Engine Selection (Cell 3) ⚠️ **IMPORTANT**
**This is where you choose your recognition engine!**
- Select **Sphinx** (runs through `pocketsphinx`, no model download, US English only by default) or **Vosk** (better accuracy, English and French models available)
- If you choose Vosk, select the language model
- Click "Apply & Load Model"
- **Wait for the model to finish loading before continuing**

### Step 4️⃣: Helper Functions (Cell 4)
Run this cell to load the recognition functions.

### Step 5️⃣: Check Your Microphones (Cell 5)
Optional: lists every audio device the system reports (output devices such as speakers appear too).

### Step 6️⃣: Quick Test (Cell 6)
Uncomment `test_single_recognition()` to test with a single phrase.

### Step 7️⃣: Continuous Listening (Cell 7)
Uncomment `start_continuous_listening()` for continuous recognition.
**Press the stop button (■) to stop.**

### Step 8️⃣: Recognition from a File (Cell 8)
Transcribes an existing audio file (WAV, AIFF, or FLAC).

### Step 9️⃣: Create a Test Recording (Cell 9)
Records a short audio file to test with.

---

## 🚀 Quick Start
1. Run cells 1 → 2 → 3 → 4
2. In cell 3, choose your engine and click "Apply & Load Model"
3. Pick a usage cell (6, 7, 8, or 9) and uncomment its function call
4. Run it and speak!

---

# Cell 1: Installation & Imports

Run each cell sequentially. Start by installing dependencies.


In [14]:
# Comment out this line once the packages are installed
%pip install SpeechRecognition pyaudio pydub pocketsphinx vosk ipywidgets

import speech_recognition as sr
import os
from pathlib import Path
import json
import ipywidgets as widgets
from IPython.display import display, clear_output, HTML

print("✓ Imports successful")
print(f"Speech Recognition version: {sr.__version__}")


✓ Imports successful
Speech Recognition version: 3.14.4


Could not find platform independent libraries <prefix>



# Cell 2: Configuration Setup

Configure your speech recognition settings


In [15]:
# Available Vosk models
VOSK_MODELS = {
    'english_small': {
        'url': 'https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip',
        'name': 'vosk-model-small-en-us-0.15',
        'size': '40 MB',
        'language': 'English (US)'
    },
    'english_large': {
        'url': 'https://alphacephei.com/vosk/models/vosk-model-en-us-0.22.zip',
        'name': 'vosk-model-en-us-0.22',
        'size': '1.8 GB',
        'language': 'English (US)'
    },
    'french_small': {
        'url': 'https://alphacephei.com/vosk/models/vosk-model-small-fr-0.22.zip',
        'name': 'vosk-model-small-fr-0.22',
        'size': '41 MB',
        'language': 'French'
    },
    'french': {
        'url': 'https://alphacephei.com/vosk/models/vosk-model-fr-0.22.zip',
        'name': 'vosk-model-fr-0.22',
        'size': '1.5 GB',
        'language': 'French'
    }
}

# Global state
class Config:
    engine = 'sphinx'
    vosk_model = None
    vosk_model_name = 'french_small'
    recognizer = sr.Recognizer()

config = Config()

print("✓ Configuration loaded")

✓ Configuration loaded


# Cell 3: Interactive Engine Selection

Select your speech recognition engine


In [25]:
# Create widgets
engine_selector = widgets.RadioButtons(
    options=[
        ('CMU Sphinx (Built-in, no download needed)', 'sphinx'),
        ('Vosk (Better accuracy, requires model download)', 'vosk')
    ],
    value='sphinx',
    description='Engine:',
    style={'description_width': 'initial'},
    layout=widgets.Layout(width='500px')
)

vosk_model_selector = widgets.Dropdown(
    options=[
        ('🇬🇧 English Small - 40 MB', 'english_small'),
        ('🇬🇧 English Large - 1.8 GB (best accuracy)', 'english_large'),
        ('🇫🇷 French Small - 41 MB', 'french_small'),
        ('🇫🇷 French Large - 1.5 GB (best accuracy)', 'french')
    ],
    value='french_small',
    description='Vosk Model:',
    style={'description_width': 'initial'},
    layout=widgets.Layout(width='400px'),
    disabled=True
)

status_output = widgets.Output()
apply_btn = widgets.Button(
    description='Apply & Load Model',
    button_style='success',
    icon='check',
    layout=widgets.Layout(width='200px')
)

# Widget interactions
def on_engine_change(change):
    vosk_model_selector.disabled = (change['new'] == 'sphinx')

def download_and_extract_model(model_info, model_path):
    """Download and extract Vosk model"""
    import urllib.request
    import zipfile
    import shutil

    zip_filename = f"{model_info['name']}.zip"

    print(f"\n📥 Downloading {model_info['name']}...")
    print(f"   Size: {model_info['size']} - This may take a while...")

    try:
        # Download with progress
        def download_progress(block_num, block_size, total_size):
            downloaded = block_num * block_size
            if total_size <= 0:
                # Server did not report a size (urlretrieve passes -1): show MB received
                print(f"\r   Downloaded: {downloaded / 1e6:.1f} MB", end='')
                return
            percent = min(downloaded * 100 / total_size, 100)
            print(f"\r   Progress: {percent:.1f}%", end='')

        urllib.request.urlretrieve(
            model_info['url'],
            zip_filename,
            reporthook=download_progress
        )
        print("\n✓ Download complete!")

        # Extract
        print(f"\n📦 Extracting {zip_filename}...")
        with zipfile.ZipFile(zip_filename, 'r') as zip_ref:
            zip_ref.extractall('.')
        print("✓ Extraction complete!")

        # Clean up zip file
        os.remove(zip_filename)
        print("✓ Cleanup complete!")

        return True

    except Exception as e:
        print(f"\n❌ Download/extraction failed: {e}")
        if os.path.exists(zip_filename):
            os.remove(zip_filename)
        return False

def on_apply_click(b):
    config.engine = engine_selector.value
    config.vosk_model_name = vosk_model_selector.value

    with status_output:
        clear_output()
        print(f"✓ Engine set to: {config.engine.upper()}")

        if config.engine == 'vosk':
            model_info = VOSK_MODELS[config.vosk_model_name]
            print(f"\nModel: {model_info['name']}")
            print(f"Language: {model_info['language']}")
            print(f"Size: {model_info['size']}")

            try:
                from vosk import Model
                model_path = Path(model_info['name'])

                if not model_path.exists():
                    print(f"\n⚠️  Model not found locally!")
                    print(f"\n📥 Do you want to download {model_info['name']} ({model_info['size']})?")

                    # Create download buttons
                    download_btn = widgets.Button(
                        description=f'✓ Download ({model_info["size"]})',
                        button_style='success',
                        icon='download'
                    )
                    cancel_btn = widgets.Button(
                        description='✗ Cancel',
                        button_style='danger',
                        icon='times'
                    )

                    button_output = widgets.Output()

                    def on_download(b):
                        with button_output:
                            clear_output()
                            if download_and_extract_model(model_info, model_path):
                                print(f"\n⏳ Loading model...")
                                config.vosk_model = Model(str(model_path))
                                print(f"✓ Model loaded successfully!")
                            else:
                                print("\n❌ Failed to download model. Please try manual download:")
                                print(f"   wget {model_info['url']}")
                                print(f"   unzip {model_info['name']}.zip")
                                config.vosk_model = None

                    def on_cancel(b):
                        with button_output:
                            clear_output()
                            print("\n❌ Download cancelled.")
                            print(f"\nManual download instructions:")
                            print(f"   wget {model_info['url']}")
                            print(f"   unzip {model_info['name']}.zip")
                            print(f"\nOr download from: {model_info['url']}")
                            config.vosk_model = None

                    download_btn.on_click(on_download)
                    cancel_btn.on_click(on_cancel)

                    display(widgets.HBox([download_btn, cancel_btn]))
                    display(button_output)

                else:
                    print(f"\n⏳ Loading model...")
                    config.vosk_model = Model(str(model_path))
                    print(f"✓ Model loaded successfully!")

            except ImportError:
                print(f"\n⚠️  Vosk not installed!")
                print(f"Run: !pip install vosk")
                config.vosk_model = None
            except Exception as e:
                print(f"\n❌ Error: {e}")
                config.vosk_model = None
        else:
            print("\n✓ Using CMU Sphinx (built-in)")
            print("No additional download needed!")

engine_selector.observe(on_engine_change, names='value')
apply_btn.on_click(on_apply_click)

# Display
display(HTML("<h3>🎤 Speech Recognition Configuration</h3>"))
display(widgets.VBox([
    engine_selector,
    vosk_model_selector,
    apply_btn,
    status_output
]))

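`on_apply_click()` treats a model as installed when a folder named `model_info['name']` exists, so it relies on each archive unpacking into a single top-level directory with exactly that name. A minimal sketch for checking that assumption before calling `extractall()`, using only the standard `zipfile` module; `zip_top_level_names` and `extracts_to` are hypothetical helpers, not part of Vosk.

```python
import zipfile
from typing import Set


def zip_top_level_names(zip_path: str) -> Set[str]:
    """Return the first path component of every entry in a zip archive.

    Zip entry names always use "/" as separator, so splitting on it is safe
    on every platform.
    """
    with zipfile.ZipFile(zip_path) as zf:
        return {name.split("/", 1)[0] for name in zf.namelist() if name}


def extracts_to(zip_path: str, expected_dir: str) -> bool:
    """True if the archive unpacks into exactly one top-level folder named `expected_dir`."""
    return zip_top_level_names(zip_path) == {expected_dir}
```

For example, `extracts_to(zip_filename, model_info['name'])` could be checked inside `download_and_extract_model()` just before extraction, turning a mismatched archive into a clear message instead of a failure later when `Model(str(model_path))` is called.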

# Cell 4: Helper Functions

Core recognition functions

In [36]:
def recognize_audio(audio_data):
    """Recognize speech using the configured engine"""

    if config.engine == 'sphinx':
        try:
            return config.recognizer.recognize_sphinx(audio_data)
        except sr.UnknownValueError:
            return None
        except sr.RequestError as e:
            print(f"Sphinx error: {e}")
            return None

    elif config.engine == 'vosk':
        if config.vosk_model is None:
            print("⚠️  Vosk model not loaded. Please configure and load a model first.")
            return None

        try:
            from vosk import KaldiRecognizer

            # Vosk expects 16-bit mono PCM. AudioData from Microphone and
            # AudioFile is mono; force 2-byte samples (files may be 24- or
            # 32-bit) and keep the native sample rate.
            sample_rate = audio_data.sample_rate
            frames = audio_data.get_raw_data(convert_width=2)

            # Create a recognizer that matches the audio's sample rate
            rec = KaldiRecognizer(config.vosk_model, sample_rate)
            rec.SetWords(True)

            # Feed all audio at once, then flush to get the final transcript
            rec.AcceptWaveform(frames)
            result = json.loads(rec.FinalResult())

            text = result.get('text', '').strip()

            # Debug: show what Vosk actually returned
            if not text:
                print(f"Debug: Vosk result = {result}")

            return text if text else None

        except Exception as e:
            print(f"Vosk error: {e}")
            import traceback
            traceback.print_exc()
            return None

    return None

def configure_recognizer():
    """Optimize recognizer settings"""
    config.recognizer.energy_threshold = 4000
    config.recognizer.dynamic_energy_threshold = True
    config.recognizer.dynamic_energy_adjustment_damping = 0.15
    config.recognizer.dynamic_energy_ratio = 1.5
    config.recognizer.pause_threshold = 0.8
    config.recognizer.phrase_threshold = 0.3
    config.recognizer.non_speaking_duration = 0.5

print("✓ Helper functions loaded")

✓ Helper functions loaded
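With `rec.SetWords(True)`, Vosk's JSON result carries, besides `text`, a `result` list with one entry per word (`word`, `start` and `end` in seconds, and `conf`). `recognize_audio()` keeps only `text`. The sketch below, assuming that layout, also extracts the per-word entries so uncertain words can be flagged; `Word`, `parse_vosk_result`, `low_confidence` and the 0.5 threshold are hypothetical choices for illustration.

```python
import json
from typing import List, NamedTuple, Tuple


class Word(NamedTuple):
    """One recognized word with its timing (seconds) and confidence (0 to 1)."""
    word: str
    start: float
    end: float
    conf: float


def parse_vosk_result(result_json: str) -> Tuple[str, List[Word]]:
    """Split a Vosk JSON result into the full text and a list of words.

    Assumes the layout produced when SetWords(True) is enabled: a "text"
    string plus an optional "result" list of per-word dicts. A missing
    "result" key yields an empty word list instead of an error.
    """
    data = json.loads(result_json)
    text = data.get("text", "").strip()
    words = [
        Word(w["word"], float(w["start"]), float(w["end"]), float(w.get("conf", 0.0)))
        for w in data.get("result", [])
    ]
    return text, words


def low_confidence(words: List[Word], threshold: float = 0.5) -> List[str]:
    """Return the words whose confidence falls below `threshold`."""
    return [w.word for w in words if w.conf < threshold]
```

Inside `recognize_audio()`, `text, words = parse_vosk_result(rec.FinalResult())` would replace the `json.loads(...)` call, and `low_confidence(words)` would list the words worth double-checking.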


# Cell 5: List Available Microphones

Check your available audio input devices


In [18]:
"""
Check your available audio input devices
"""

def list_microphones():
    print("Available microphones:")
    for index, name in enumerate(sr.Microphone.list_microphone_names()):
        print(f"  {index}: {name}")

# Run this to see your microphones
list_microphones()

Available microphones:
  0: Mappeur de sons Microsoft - Input
  1: Réseau de microphones (Realtek(
  2: Casque (PowerLocus)
  3: Mappeur de sons Microsoft - Output
  4: Casque (PowerLocus)
  5: Haut-parleurs (Realtek(R) Audio
  6: Pilote de capture audio principal
  7: Réseau de microphones (Realtek(R) Audio)
  8: Casque (PowerLocus)
  9: Périphérique audio principal
  10: Casque (PowerLocus)
  11: Haut-parleurs (Realtek(R) Audio)
  12: Haut-parleurs (Realtek(R) Audio)
  13: Casque (PowerLocus)
  14: Casque (PowerLocus)
  15: Réseau de microphones (Realtek(R) Audio)
  16: Casque ()
  17: Speakers (Nahimic Wave Speaker)
  18: Réseau de microphones (Realtek HD Audio Mic Array input)
  19: Headphones 1 (Realtek HD Audio 2nd output with SST)
  20: Headphones 2 (Realtek HD Audio 2nd output with SST)
  21: Haut-parleur du PC (Realtek HD Audio 2nd output with SST)
  22: Mixage stéréo (Realtek HD Audio Stereo input)
  23: Microphone (Realtek HD Audio Mic input)
  24: Sp
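The list above mixes capture devices with output-only ones (speakers, headphones), and the same hardware appears several times, most likely once per Windows audio API. PyAudio's `get_device_info_by_index()` returns one dict per device that includes `maxInputChannels`, so input-capable devices can be kept by checking that this value is greater than zero. A minimal sketch; `input_devices` and `query_pyaudio_devices` are hypothetical helper names.

```python
from typing import Dict, Iterable, List, Tuple


def input_devices(device_infos: Iterable[Dict]) -> List[Tuple[int, str]]:
    """Keep only devices that can capture audio (maxInputChannels > 0).

    `device_infos` is expected to hold dicts shaped like those returned by
    PyAudio's get_device_info_by_index(), with "index", "name" and
    "maxInputChannels" keys.
    """
    return [
        (int(info["index"]), info["name"])
        for info in device_infos
        if info.get("maxInputChannels", 0) > 0
    ]


def query_pyaudio_devices() -> List[Dict]:
    """Collect device info dicts from PyAudio (needs pyaudio and audio hardware)."""
    import pyaudio  # imported lazily so input_devices() works without PyAudio

    pa = pyaudio.PyAudio()
    try:
        return [pa.get_device_info_by_index(i) for i in range(pa.get_device_count())]
    finally:
        pa.terminate()  # release PortAudio resources
```

`for index, name in input_devices(query_pyaudio_devices()): print(index, name)` prints only capture devices; the chosen index can then be passed as `sr.Microphone(device_index=index)` in Cells 6, 7 and 9 instead of relying on the system default.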

# Cell 6: Single Recognition Test
Test recognition with a single phrase
Run this cell and speak when prompted

In [37]:
def test_single_recognition():
    print(f"Using engine: {config.engine.upper()}")

    if config.engine == 'vosk' and config.vosk_model is None:
        print("⚠️  Please load a Vosk model first (Cell 3)")
        return

    with sr.Microphone() as source:
        print("\n⏳ Adjusting for ambient noise...")
        config.recognizer.adjust_for_ambient_noise(source, duration=1)

        print("🎤 Listening... Speak now!")
        try:
            # listen() raises WaitTimeoutError if no speech starts within 5 s
            audio = config.recognizer.listen(source, timeout=5, phrase_time_limit=10)
        except sr.WaitTimeoutError:
            print("\n❌ No speech detected within 5 seconds")
            return

        print("⏳ Processing...")
        text = recognize_audio(audio)

        if text:
            print(f"\n✓ Recognized: '{text}'")
        else:
            print("\n❌ Could not understand the audio")

test_single_recognition()

Using engine: VOSK

⏳ Adjusting for ambient noise...
🎤 Listening... Speak now!
⏳ Processing...

✓ Recognized: 'test'


# Cell 7: Continuous Listening Mode
Continuous speech recognition
Press the stop button (■) to interrupt

In [38]:
def start_continuous_listening():
    print(f"=== Continuous Listening ({config.engine.upper()}) ===")

    if config.engine == 'vosk' and config.vosk_model is None:
        print("⚠️  Please load a Vosk model first (Cell 3)")
        return

    print("\n🎤 Speak clearly into your microphone")
    print("🛑 Press the stop button (■) to exit\n")

    configure_recognizer()

    # Initialize before the try block so the final summary still works
    # if the cell is interrupted during microphone setup
    phrase_count = 0

    try:
        with sr.Microphone() as source:
            print("⏳ Adjusting for ambient noise...")
            config.recognizer.adjust_for_ambient_noise(source, duration=1)
            print("✓ Ready! Start speaking...\n")

            while True:
                try:
                    print(f"[Listening for phrase #{phrase_count + 1}...]")
                    audio = config.recognizer.listen(source, timeout=None, phrase_time_limit=10)

                    print("⏳ Processing...")
                    text = recognize_audio(audio)

                    if text:
                        phrase_count += 1
                        print(f"✓ Phrase #{phrase_count}: '{text}'\n")
                    else:
                        print("❌ Could not understand audio\n")

                except KeyboardInterrupt:
                    break
                except Exception as e:
                    print(f"❌ Error: {e}\n")

    except KeyboardInterrupt:
        pass

    print(f"\n🛑 Stopped. Total phrases recognized: {phrase_count}")

start_continuous_listening()

=== Continuous Listening (VOSK) ===

🎤 Speak clearly into your microphone
🛑 Press the stop button (■) to exit

⏳ Adjusting for ambient noise...
✓ Ready! Start speaking...

[Listening for phrase #1...]
⏳ Processing...
✓ Phrase #1: 'est un test en continu'

[Listening for phrase #2...]
⏳ Processing...
✓ Phrase #2: 'si je tente de parler que me diras-tu'

[Listening for phrase #3...]
⏳ Processing...
✓ Phrase #3: 'peut-être que tu nous diras que j'ai façon'

[Listening for phrase #4...]
⏳ Processing...
✓ Phrase #4: 'peut-être que tu diras des conneries'

[Listening for phrase #5...]

🛑 Stopped. Total phrases recognized: 4
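`configure_recognizer()`, called at the start of this cell, sets `pause_threshold`, `phrase_threshold` and `non_speaking_duration` in seconds. As far as I can tell from the SpeechRecognition source, `Recognizer.listen()` converts each one into a whole number of audio buffers with `ceil(seconds / seconds_per_buffer)`, where a buffer is `chunk_size` frames (1024 by default for `sr.Microphone`) at the microphone's sample rate, and it requires `pause_threshold >= non_speaking_duration >= 0`. The sketch below reproduces that arithmetic; `listen_buffer_counts` is a hypothetical helper, not a library function.

```python
import math
from typing import Dict


def listen_buffer_counts(
    sample_rate: int,
    chunk_size: int = 1024,
    pause_threshold: float = 0.8,
    phrase_threshold: float = 0.3,
    non_speaking_duration: float = 0.5,
) -> Dict[str, int]:
    """Convert listen() timing settings (in seconds) into whole audio buffers.

    Assumption: mirrors the ceil(seconds / seconds_per_buffer) conversion in
    SpeechRecognition's Recognizer.listen(); one buffer = chunk_size frames.
    Defaults match the values set by configure_recognizer().
    """
    if not pause_threshold >= non_speaking_duration >= 0:
        # listen() asserts this ordering before it starts reading audio
        raise ValueError("need pause_threshold >= non_speaking_duration >= 0")
    seconds_per_buffer = chunk_size / sample_rate
    return {
        "pause": math.ceil(pause_threshold / seconds_per_buffer),
        "phrase": math.ceil(phrase_threshold / seconds_per_buffer),
        "non_speaking": math.ceil(non_speaking_duration / seconds_per_buffer),
    }
```

At 16 kHz one buffer lasts 64 ms, so the 0.8 s, 0.3 s and 0.5 s settings become 13, 5 and 8 buffers; at 44.1 kHz they become 35, 13 and 22. Because the counts are rounded up, the effective pause at 16 kHz is 13 × 64 ms = 832 ms rather than exactly 800 ms.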


# Cell 8: Recognize from Audio File
Transcribe an existing audio file
Supports: WAV, AIFF, FLAC

In [21]:
def recognize_from_file(audio_file_path):
    if not os.path.exists(audio_file_path):
        print(f"❌ File not found: {audio_file_path}")
        return None

    print(f"Using engine: {config.engine.upper()}")

    if config.engine == 'vosk' and config.vosk_model is None:
        print("⚠️  Please load a Vosk model first (Cell 3)")
        return None

    with sr.AudioFile(audio_file_path) as source:
        print(f"⏳ Loading audio from {audio_file_path}...")
        audio = config.recognizer.record(source)

        print("⏳ Processing...")
        text = recognize_audio(audio)

        if text:
            print(f"\n✓ Recognized: '{text}'")
            return text
        else:
            print("\n❌ Could not understand the audio")
            return None

# Example usage:
# recognize_from_file("your_audio_file.wav")
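`recognize_from_file()` loads the whole file with `record()` and passes it to `recognize_audio()` in a single call. For long recordings, the Python examples in the Vosk repository instead check that the WAV file is mono 16-bit PCM, read it in fixed-size chunks (4000 frames) and pass each chunk to `AcceptWaveform()`, collecting a result whenever it returns true. The sketch below follows that pattern; `iter_pcm_chunks` and `transcribe_wav_in_chunks` are hypothetical helpers, and the recognizer is passed in so the chunking logic can be exercised without a model.

```python
import json
import wave
from typing import Iterator, List


def iter_pcm_chunks(wav_path: str, frames_per_chunk: int = 4000) -> Iterator[bytes]:
    """Yield raw PCM chunks from a mono, 16-bit, uncompressed WAV file."""
    with wave.open(wav_path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2 or wf.getcomptype() != "NONE":
            raise ValueError("Vosk expects mono 16-bit PCM WAV audio")
        while True:
            data = wf.readframes(frames_per_chunk)
            if not data:  # readframes() returns b"" at end of file
                break
            yield data


def transcribe_wav_in_chunks(wav_path: str, recognizer, frames_per_chunk: int = 4000) -> str:
    """Feed a WAV file to a Vosk-style recognizer chunk by chunk.

    `recognizer` is expected to behave like vosk.KaldiRecognizer:
    AcceptWaveform(bytes) returns True when an utterance is complete,
    Result() returns that utterance as JSON, FinalResult() flushes the rest.
    """
    pieces: List[str] = []
    for chunk in iter_pcm_chunks(wav_path, frames_per_chunk):
        if recognizer.AcceptWaveform(chunk):
            pieces.append(json.loads(recognizer.Result()).get("text", ""))
    pieces.append(json.loads(recognizer.FinalResult()).get("text", ""))
    return " ".join(p for p in pieces if p)
```

With a loaded model this would be called as `transcribe_wav_in_chunks(path, KaldiRecognizer(config.vosk_model, rate))`, where `rate` is the file's frame rate from `wave.open(path).getframerate()`.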

# Cell 9: Create Test Recording
Record audio to a file for testing

In [22]:
def create_test_recording(filename="test_recording.wav", duration=5):
    with sr.Microphone() as source:
        # Calibrate first so the calibration second is not part of the recording
        config.recognizer.adjust_for_ambient_noise(source)
        print(f"🔴 Recording for {duration} seconds...")
        # record() captures a fixed duration; listen() would stop at the first pause
        audio = config.recognizer.record(source, duration=duration)

    with open(filename, "wb") as f:
        f.write(audio.get_wav_data())

    print(f"✓ Recording saved to {filename}")
    return filename

# Uncomment to run:
# create_test_recording()
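To confirm what `create_test_recording()` actually wrote before passing the file to `recognize_from_file()`, the WAV header can be read back with Python's standard `wave` module; the duration is simply the frame count divided by the frame rate. A minimal sketch; `describe_wav` is a hypothetical helper.

```python
import wave
from typing import Dict


def describe_wav(path: str) -> Dict[str, float]:
    """Report the format of a WAV file and its duration in seconds."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "frame_rate": rate,
            "frames": frames,
            "duration_s": frames / rate,  # seconds = frames / (frames per second)
        }
```

`describe_wav("test_recording.wav")` reports the channel count, sample width, frame rate and duration. The file keeps the microphone's native sample rate, because `get_wav_data()` only resamples when `convert_rate` is given; at 16 kHz, 5 s of mono 16-bit audio is 80,000 frames, or 160,000 bytes.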