# ffvoice Python Bindings Tutorial

Welcome to the ffvoice tutorial! This notebook will guide you through all the features of the ffvoice Python library.

## What is ffvoice?

ffvoice is a high-performance offline speech recognition library with:
- ⚡ **3-10x faster** than pure Python solutions
- 🔒 **100% offline** - no cloud dependencies
- 🎙️ **Complete audio pipeline** - capture, denoise, VAD, transcription
- 🐍 **Easy Python API** - NumPy arrays and callbacks

## Prerequisites

```bash
pip install ffvoice numpy
```

## 1. Basic Imports

In [None]:
import ffvoice
import numpy as np

print(f"ffvoice version: {ffvoice.__version__}")
print(f"Available components: {[x for x in dir(ffvoice) if not x.startswith('_')]}")

## 2. Whisper Speech Recognition

Let's start with the core feature - speech recognition using Whisper.

In [None]:
# Configure Whisper
config = ffvoice.WhisperConfig()
config.model_type = ffvoice.WhisperModelType.TINY  # Fastest model
config.language = "auto"  # Auto-detect language

print(f"Model: {ffvoice.WhisperASR.get_model_type_name(config.model_type)}")
print(f"Language: {config.language}")

In [None]:
# Initialize ASR (this downloads the model on first run)
asr = ffvoice.WhisperASR(config)
print("Loading model...")
if asr.initialize():
    print("✓ Model loaded successfully!")
else:
    print(f"✗ Error: {asr.get_last_error()}")

### Transcribe from NumPy Array

In [None]:
# Create sample audio (1 second of silence)
sample_rate = 48000
audio = np.zeros(sample_rate, dtype=np.int16)

print(f"Audio shape: {audio.shape}")
print(f"Audio dtype: {audio.dtype}")
print(f"Duration: {len(audio)/sample_rate:.1f}s")
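
Audio loaded from other libraries often arrives as floating-point samples in the range [-1.0, 1.0], while the examples in this tutorial pass `int16` arrays. The sketch below shows one way to convert between the two. The helper name `float_to_int16` is hypothetical and not part of ffvoice, and whether ffvoice also accepts float input directly is not covered here.

```python
import numpy as np

def float_to_int16(samples: np.ndarray) -> np.ndarray:
    """Convert float audio in [-1.0, 1.0] to 16-bit PCM."""
    # Clip first so out-of-range peaks saturate instead of wrapping around
    clipped = np.clip(samples, -1.0, 1.0)
    return np.round(clipped * 32767).astype(np.int16)

float_audio = np.array([0.0, 1.0, -1.0, 1.5], dtype=np.float32)
pcm = float_to_int16(float_audio)
print(pcm.dtype, pcm.tolist())
```

The last input value (1.5) is out of range and is clamped to full scale rather than overflowing.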

In [None]:
# Transcribe (note: silence won't produce meaningful results)
segments = asr.transcribe_buffer(audio)
inference_time = asr.get_last_inference_time_ms()

print(f"Inference time: {inference_time}ms")
print(f"Number of segments: {len(segments)}")

for i, seg in enumerate(segments):
    print(f"\nSegment {i+1}:")
    print(f"  Time: [{seg.start_ms}ms - {seg.end_ms}ms]")
    print(f"  Text: {seg.text}")
    print(f"  Confidence: {seg.confidence:.2f}")
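
For longer recordings, raw millisecond offsets become hard to read. A small formatting helper can render them as `HH:MM:SS.mmm`. This is a sketch only: `format_timestamp` is a hypothetical helper, not an ffvoice function, and it assumes `start_ms` and `end_ms` are integer-like millisecond values, as their names suggest.

```python
def format_timestamp(ms: int) -> str:
    """Format a millisecond offset as HH:MM:SS.mmm."""
    hours, rem = divmod(int(ms), 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"

print(format_timestamp(0))
print(format_timestamp(83_450))
```

In the loop above, it could be used as `format_timestamp(seg.start_ms)` in place of the raw value.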

## 3. Noise Reduction with RNNoise

RNNoise is a neural-network-based noise suppressor that also reports a Voice Activity Detection (VAD) probability for each frame it processes.

In [None]:
# Configure RNNoise
rnnoise_config = ffvoice.RNNoiseConfig()
rnnoise_config.enable_vad = True

# Initialize
rnnoise = ffvoice.RNNoise(rnnoise_config)
rnnoise.initialize(sample_rate=48000, channels=1)
print("✓ RNNoise initialized")

In [None]:
# Create noisy audio (random noise)
noisy_audio = np.random.randint(-1000, 1000, 256, dtype=np.int16)

print(f"Before processing: mean={noisy_audio.mean():.1f}, std={noisy_audio.std():.1f}")

# Process (modifies array in-place)
rnnoise.process(noisy_audio)

print(f"After processing: mean={noisy_audio.mean():.1f}, std={noisy_audio.std():.1f}")
print(f"VAD probability: {rnnoise.get_vad_probability():.2%}")
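
The cell above processes a single 256-sample frame. A longer recording has to be cut into frames first. The sketch below splits a signal into fixed-size frames with NumPy and zero-pads the last one. `split_into_frames` is a hypothetical helper, and the choice of 256 simply mirrors the frame length used in this tutorial; whether ffvoice requires a particular frame size is not stated here.

```python
import numpy as np

FRAME_SIZE = 256  # frame length used in this tutorial's examples

def split_into_frames(signal: np.ndarray, frame_size: int = FRAME_SIZE) -> np.ndarray:
    """Return a (n_frames, frame_size) array, zero-padding the final frame."""
    n_frames = -(-len(signal) // frame_size)  # ceiling division
    padded = np.zeros(n_frames * frame_size, dtype=signal.dtype)
    padded[:len(signal)] = signal
    return padded.reshape(n_frames, frame_size)

signal = np.arange(600, dtype=np.int16)
frames = split_into_frames(signal)
print(frames.shape)
```

Each row of `frames` could then be passed to `rnnoise.process(...)` in a loop, reading `get_vad_probability()` after each call.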

## 4. Voice Activity Detection

VADSegmenter groups incoming frames into speech segments based on per-frame voice activity probabilities and delivers each completed segment to a callback.
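
To make the idea of VAD-based segmentation concrete, here is a minimal pure-Python sketch of threshold segmentation with a "hangover" that tolerates short pauses. It illustrates the general technique only and is not ffvoice's implementation; the function name, `threshold`, and `hangover` are assumptions chosen for this example.

```python
def segment_by_vad(probs, threshold=0.5, hangover=2):
    """Group frame indices into (start, end) speech segments; end is exclusive.

    A segment opens at the first frame whose probability reaches `threshold`
    and closes once more than `hangover` consecutive frames fall below it.
    """
    segments = []
    start = None        # index where the current segment began
    last_voiced = None  # index of the most recent voiced frame
    for i, p in enumerate(probs):
        if p >= threshold:
            if start is None:
                start = i
            last_voiced = i
        elif start is not None and i - last_voiced > hangover:
            segments.append((start, last_voiced + 1))
            start = None
    if start is not None:  # flush a segment still open at the end
        segments.append((start, last_voiced + 1))
    return segments

probs = [0.1, 0.8, 0.9, 0.3, 0.7, 0.2, 0.1, 0.1, 0.1, 0.9]
print(segment_by_vad(probs))
```

The brief dip at index 3 does not split the first segment because it is shorter than the hangover; the final flush plays the same role as `vad.flush(...)` in the cells below.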

In [None]:
# Available sensitivity presets
print("VAD Sensitivity Presets:")
for sensitivity in [ffvoice.VADSensitivity.VERY_SENSITIVE,
                    ffvoice.VADSensitivity.SENSITIVE,
                    ffvoice.VADSensitivity.BALANCED,
                    ffvoice.VADSensitivity.CONSERVATIVE,
                    ffvoice.VADSensitivity.VERY_CONSERVATIVE]:
    print(f"  - {sensitivity}")

In [None]:
# Create VAD with balanced sensitivity
vad_config = ffvoice.VADConfig.from_preset(ffvoice.VADSensitivity.BALANCED)
vad = ffvoice.VADSegmenter(vad_config, sample_rate=48000)

print("✓ VAD Segmenter initialized")
print(f"  Threshold: {vad.get_current_threshold():.2f}")
print(f"  Buffer size: {vad.get_buffer_size()} samples")

In [None]:
# Process audio frames with callback
segment_count = 0

def on_segment(segment_array):
    """Called when a complete speech segment is detected"""
    global segment_count
    segment_count += 1
    duration_ms = len(segment_array) / 48 # samples to ms at 48kHz
    print(f"Segment {segment_count}: {len(segment_array)} samples ({duration_ms:.0f}ms)")

# Simulate processing multiple frames
for i in range(10):
    frame = np.random.randint(-500, 500, 256, dtype=np.int16)
    vad_prob = 0.8 if i % 3 == 0 else 0.3  # Simulate intermittent speech
    vad.process_frame(frame, vad_prob, on_segment)

# Flush remaining audio
vad.flush(on_segment)

# Print statistics
stats = vad.get_statistics()
print("\nStatistics:")
print(f"  Average VAD: {stats['avg_vad_prob']:.2f}")
print(f"  Speech ratio: {stats['speech_ratio']:.1%}")
print(f"  Is in speech: {vad.is_in_speech()}")

## 5. Audio File Writing

Save NumPy arrays to WAV or FLAC files.

In [None]:
# Generate test audio (1-second sine wave at 440Hz)
duration = 1.0
frequency = 440  # Hz (A4 note)
# endpoint=False keeps the sample spacing at exactly 1/sample_rate
t = np.linspace(0, duration, int(sample_rate * duration), endpoint=False)
audio = (np.sin(2 * np.pi * frequency * t) * 32767 * 0.5).astype(np.int16)

print(f"Generated {len(audio)} samples ({duration}s at {sample_rate}Hz)")

In [None]:
# Write WAV file
wav = ffvoice.WAVWriter()
wav.open("/tmp/test.wav", sample_rate, channels=1)
samples_written = wav.write_samples_array(audio)
wav.close()

print(f"✓ Wrote WAV: {samples_written} samples")
print("  File: /tmp/test.wav")
print(f"  Total samples: {wav.total_samples}")

In [None]:
# Write FLAC file (compressed)
flac = ffvoice.FLACWriter()
flac.open("/tmp/test.flac", sample_rate, channels=1, bits_per_sample=16, compression_level=5)
samples_written = flac.write_samples_array(audio)
compression_ratio = flac.get_compression_ratio()
flac.close()

print(f"✓ Wrote FLAC: {samples_written} samples")
print("  File: /tmp/test.flac")
print(f"  Compression: {compression_ratio:.2f}x")
print(f"  Total samples: {flac.total_samples}")
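
One way to sanity-check a written WAV file is to read its header back with Python's standard `wave` module. The sketch below writes its own small stand-in file so that it runs without ffvoice; `describe_wav` and the file name are hypothetical. Pointing `describe_wav` at `/tmp/test.wav` should report the parameters passed to `wav.open(...)` above. Note that `wave` reads only WAV files, so it cannot be used for the FLAC output.

```python
import math
import os
import struct
import tempfile
import wave

def describe_wav(path):
    """Return basic header fields of a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        n_frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_rate": rate,
            "bits_per_sample": wf.getsampwidth() * 8,
            "frames": n_frames,
            "duration_s": n_frames / rate,
        }

# Stand-in file: 0.1 s of a 440 Hz tone, 16-bit mono at 48 kHz
path = os.path.join(tempfile.gettempdir(), "ffvoice_sketch.wav")
rate = 48000
samples = [int(0.5 * 32767 * math.sin(2 * math.pi * 440 * n / rate))
           for n in range(rate // 10)]
with wave.open(path, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)  # 2 bytes = 16 bits per sample
    wf.setframerate(rate)
    wf.writeframes(struct.pack(f"<{len(samples)}h", *samples))

info = describe_wav(path)
print(info)
```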

## 6. Audio Device Information

List available audio input devices.

In [None]:
# Initialize PortAudio
ffvoice.AudioCapture.initialize()

# Get devices
devices = ffvoice.AudioCapture.get_devices()

print(f"Found {len(devices)} audio device(s):\n")

for dev in devices:
    default_marker = " [DEFAULT]" if dev.is_default else ""
    print(f"Device {dev.id}: {dev.name}{default_marker}")
    print(f"  Input channels: {dev.max_input_channels}")
    print(f"  Output channels: {dev.max_output_channels}")
    print(f"  Supported sample rates: {dev.supported_sample_rates[:5]}...")
    print()

# Get default device
default_dev = ffvoice.AudioCapture.get_default_input_device()
if default_dev:
    print(f"Default input device: {default_dev.name}")

# Cleanup
ffvoice.AudioCapture.terminate()

## 7. Complete Real-time Pipeline

This example demonstrates the full pipeline:
AudioCapture → RNNoise → VADSegmenter → WhisperASR

**Note:** This requires a microphone and should be run in a local environment, not in a cloud notebook.

In [None]:
# This is a demonstration - won't work without a microphone
# See complete_realtime_pipeline.py for a working example

class RealtimePipeline:
    def __init__(self):
        # Initialize all components
        self.rnnoise = ffvoice.RNNoise(ffvoice.RNNoiseConfig())
        self.rnnoise.initialize(48000, 1)
        
        vad_config = ffvoice.VADConfig.from_preset(ffvoice.VADSensitivity.BALANCED)
        self.vad = ffvoice.VADSegmenter(vad_config, 48000)
        
        config = ffvoice.WhisperConfig()
        config.model_type = ffvoice.WhisperModelType.TINY
        self.asr = ffvoice.WhisperASR(config)
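        # Fail early if the model cannot be loaded
        if not self.asr.initialize():
            raise RuntimeError(f"Whisper initialization failed: {self.asr.get_last_error()}")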
        
        ffvoice.AudioCapture.initialize()
        self.capture = ffvoice.AudioCapture()
        self.capture.open(48000, 1, 256)
    
    def on_segment(self, segment_array):
        """Transcribe complete speech segments"""
        segments = self.asr.transcribe_buffer(segment_array)
        for seg in segments:
            print(f"→ {seg.text}")
    
    def on_audio(self, audio_array):
        """Process each audio frame"""
        # 1. Denoise
        self.rnnoise.process(audio_array)
        
        # 2. Get VAD and segment
        vad_prob = self.rnnoise.get_vad_probability()
        self.vad.process_frame(audio_array, vad_prob, self.on_segment)
    
    def run(self, duration=5):
        """Run pipeline for specified duration"""
        import time
        print(f"Recording for {duration} seconds...")
        self.capture.start(self.on_audio)
        try:
            time.sleep(duration)
        finally:
            # Release the device even if recording is interrupted
            self.capture.stop()
            self.vad.flush(self.on_segment)
            self.capture.close()
            ffvoice.AudioCapture.terminate()

# Uncomment to run (requires microphone)
# pipeline = RealtimePipeline()
# pipeline.run(duration=5)
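
In the class above, `on_segment` calls `transcribe_buffer` directly. This tutorial does not say which thread ffvoice uses to invoke callbacks; if it is the capture thread, a slow transcription could delay audio delivery. One common pattern is to hand segments to a worker thread through a queue. The sketch below shows that pattern with the standard library only; `fake_transcribe` is a hypothetical stand-in for the real ASR call.

```python
import queue
import threading

def fake_transcribe(segment):
    """Stand-in for a slow ASR call; returns a description of the segment."""
    return f"{len(segment)} samples"

segment_queue = queue.Queue()
results = []

def worker():
    """Consume segments until the None sentinel arrives."""
    while True:
        segment = segment_queue.get()
        if segment is None:  # sentinel: no more segments
            break
        results.append(fake_transcribe(segment))

thread = threading.Thread(target=worker, daemon=True)
thread.start()

# In the pipeline, on_segment would only enqueue and return immediately
for n in (4800, 9600):
    segment_queue.put([0] * n)

segment_queue.put(None)  # signal shutdown after the last segment
thread.join()
print(results)
```

If the library reuses the buffer it passes to the callback (also not stated here), copying the segment before enqueueing it would be the safer choice.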

## Summary

You've learned:
1. ✅ Transcribe audio with Whisper ASR
2. ✅ Process NumPy arrays
3. ✅ Reduce noise with RNNoise
4. ✅ Detect voice activity with VAD
5. ✅ Write audio files (WAV/FLAC)
6. ✅ List audio devices
7. ✅ Build complete pipelines

## Next Steps

- Try the [complete_realtime_pipeline.py](../../examples/complete_realtime_pipeline.py) example
- Read the [Quick Start Guide](../QUICKSTART.md)
- Check the [API Reference](../../README.md#api-reference)
- Explore [performance benchmarks](../../README.md#performance)

## Resources

- GitHub: https://github.com/chicogong/ffvoice-engine
- Issues: https://github.com/chicogong/ffvoice-engine/issues
- Documentation: https://github.com/chicogong/ffvoice-engine/tree/master/docs