# Voxtral.cpp on Google Colab

This notebook runs [Voxtral.cpp](https://github.com/andrijdavid/voxtral.cpp), a ggml-based C++ implementation of Voxtral Realtime 4B.

The model runs inference on mono 16-bit PCM WAV files sampled at 16 kHz.

## 1. Setup and Installation

In [None]:
# Install required dependencies
!apt-get update -qq
!apt-get install -y -qq cmake build-essential ffmpeg git

In [None]:
# Clone the repository with submodules
!git clone --recursive https://github.com/andrijdavid/voxtral.cpp.git
%cd voxtral.cpp

In [None]:
# If submodules weren't initialized, do it manually
!git submodule update --init --recursive

## 2. Build the Project

In [None]:
# Build with CMake
!cmake -B build -DCMAKE_BUILD_TYPE=Release
!cmake --build build -j$(nproc)
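
Before moving on, it can help to confirm that the build actually produced an executable. The following is a minimal sketch, assuming the binary ends up at `build/voxtral` (the path the inference cells in this notebook use); `find_voxtral_binary` is a hypothetical helper name, not part of the repository.

```python
import os

def find_voxtral_binary(path="build/voxtral"):
    """Return the path if it points to an executable file, else None."""
    # The default path is an assumption taken from the inference cells
    # later in this notebook.
    if os.path.isfile(path) and os.access(path, os.X_OK):
        return path
    return None

binary = find_voxtral_binary()
if binary:
    print(f"Found executable: {binary}")
else:
    print("build/voxtral not found; check the CMake output above.")
```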

## 3. Download the Model

This step downloads the Q4_0-quantized GGUF model from Hugging Face.

In [None]:
# Download the pre-converted GGUF model (Q4_0 quantization)
!chmod +x ./tools/download_model.sh
!./tools/download_model.sh Q4_0
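
A quick sanity check after the download can catch a truncated file or an HTML error page saved in place of the model. GGUF files start with the four ASCII bytes `GGUF`, so the sketch below only inspects that header. The model path is assumed from the `MODEL_PATH` used in the inference section, and `looks_like_gguf` is a hypothetical helper name.

```python
import os

GGUF_MAGIC = b"GGUF"  # the first four bytes of a GGUF file

def looks_like_gguf(path):
    """Return True if the file exists and starts with the GGUF magic bytes."""
    if not os.path.isfile(path):
        return False
    with open(path, "rb") as f:
        return f.read(4) == GGUF_MAGIC

# Assumed location, matching MODEL_PATH in the inference section.
model_path = "models/voxtral/Q4_0.gguf"
if looks_like_gguf(model_path):
    size_gb = os.path.getsize(model_path) / 1e9
    print(f"{model_path} looks like a GGUF file ({size_gb:.2f} GB)")
else:
    print(f"{model_path} is missing or not a GGUF file; re-run the download cell.")
```

This checks only the header, not the integrity of the full file.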

## 4. Audio Processing Utilities

In [None]:
import os
import shlex
import subprocess
from IPython.display import Audio, display
from google.colab import files

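def convert_audio_to_wav(input_file, output_file="input.wav"):
    """
    Convert audio to 16 kHz mono 16-bit PCM WAV and print the full FFmpeg log.

    Returns the output path on success, or None if the conversion failed.
    """
    # Passing the arguments as a list avoids shell quoting problems with
    # filenames that contain spaces or special characters.
    cmd = [
        "ffmpeg", "-y", "-i", input_file,
        "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le",
        output_file,
    ]
    print(f"Running: {shlex.join(cmd)}")

    print("--- Starting FFmpeg Log ---")
    # FFmpeg writes its log to stderr, so merge it into stdout.
    process = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)

    for line in process.stdout:
        print(line.rstrip())

    returncode = process.wait()
    print("--- End of FFmpeg Log ---\n")

    # Check the exit code too, so a stale file from an earlier run
    # is not mistaken for a successful conversion.
    if returncode == 0 and os.path.exists(output_file) and os.path.getsize(output_file) > 0:
        print(f"✓ Success: {output_file} created.")
        return output_file
    else:
        print(f"✗ Error: {output_file} was not created or is empty.")
        return None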

def upload_audio():
    print("Please upload your audio file...")
    uploaded = files.upload()
    if uploaded:
        # Only the first file is used if several are uploaded
        filename = next(iter(uploaded))
        print(f"✓ Uploaded: {filename}")
        return filename
    return None
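
The next section caps input at 30 seconds, but `convert_audio_to_wav` converts the whole file. One way to enforce the limit is FFmpeg's `-t` output option, which stops writing once the output reaches the given duration. The sketch below is an illustration, not part of the repository: `build_trim_command` and `convert_and_trim` are hypothetical names, and the 30-second default is taken from the section heading.

```python
import subprocess

def build_trim_command(input_file, output_file="input.wav", max_seconds=30):
    """Build an ffmpeg argument list that converts and truncates the audio."""
    return [
        "ffmpeg", "-y",
        "-i", input_file,
        "-t", str(max_seconds),  # stop writing output after max_seconds
        "-ar", "16000",          # 16 kHz sample rate
        "-ac", "1",              # mono
        "-c:a", "pcm_s16le",     # 16-bit PCM
        output_file,
    ]

def convert_and_trim(input_file, output_file="input.wav", max_seconds=30):
    """Run ffmpeg; return output_file on success, None on failure."""
    result = subprocess.run(
        build_trim_command(input_file, output_file, max_seconds),
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        print(result.stderr)  # ffmpeg logs to stderr
        return None
    return output_file
```

Calling `convert_and_trim(audio_file)` in place of `convert_audio_to_wav(audio_file)` would keep only the first 30 seconds of the upload.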

## 5. Upload and Convert Your Audio (Max 30s)

In [None]:
# Upload your audio file
audio_file = upload_audio()

if audio_file:
    # Convert to the required format
    wav_file = convert_audio_to_wav(audio_file)

    # Only display if the conversion actually worked
    if wav_file:
        print("\nYour audio:")
        display(Audio(filename=wav_file))
    else:
        print("\nConversion failed. Please check the filename or ffmpeg logs.")
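
To confirm that the converted file matches the stated requirements (16 kHz, mono, 16-bit PCM, at most 30 seconds), the WAV header can be read with Python's standard `wave` module. This is a minimal sketch; `describe_wav` and `is_valid_input` are hypothetical helper names, and the 30-second limit comes from the section heading rather than from a documented constraint of the binary.

```python
import wave

def describe_wav(path):
    """Return basic format information for a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "bits": wf.getsampwidth() * 8,  # sample width is given in bytes
            "rate": rate,
            "seconds": wf.getnframes() / rate,
        }

def is_valid_input(info, max_seconds=30):
    """Check the mono, 16-bit, 16 kHz, and maximum-duration requirements."""
    return (
        info["channels"] == 1
        and info["bits"] == 16
        and info["rate"] == 16000
        and info["seconds"] <= max_seconds
    )
```

For example, `info = describe_wav("input.wav")` followed by `is_valid_input(info)` reports whether the converted upload is ready for inference.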

## 6. Run Inference

Process the audio file using Voxtral.cpp.

In [None]:
# Run inference
MODEL_PATH = "models/voxtral/Q4_0.gguf"
AUDIO_PATH = "input.wav"  # default output of convert_audio_to_wav; or use wav_file
THREADS = 8  # adjust to the runtime's CPU count (check with !nproc)

!./build/voxtral \
  --model {MODEL_PATH} \
  --audio {AUDIO_PATH} \
  --threads {THREADS}
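
If the output is needed in a Python variable rather than only in the cell log, the same command can be run through `subprocess`. The sketch below uses only the `--model`, `--audio`, and `--threads` flags shown above. It assumes the program writes its result to standard output, which this notebook does not confirm; `build_voxtral_command` and `run_voxtral` are hypothetical helper names.

```python
import subprocess

def build_voxtral_command(model_path, audio_path, threads, binary="./build/voxtral"):
    """Build the argument list for the voxtral executable."""
    return [
        binary,
        "--model", model_path,
        "--audio", audio_path,
        "--threads", str(threads),
    ]

def run_voxtral(model_path, audio_path, threads=8):
    """Run inference and return whatever the program printed to stdout."""
    result = subprocess.run(
        build_voxtral_command(model_path, audio_path, threads),
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        # Surface the program's own error output instead of failing silently.
        raise RuntimeError(f"voxtral exited with {result.returncode}:\n{result.stderr}")
    # Assumption: the result is written to stdout.
    return result.stdout
```

With the variables defined above, `output = run_voxtral(MODEL_PATH, AUDIO_PATH, THREADS)` would store the text for later processing.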

## 7. Alternative: Use Sample Audio

If you want to test with sample audio files included in the repository:

In [None]:
# List available sample files
!ls -lh samples/*.wav 2>/dev/null || echo "No sample files found"

In [None]:
# Run inference on a sample file; reuses MODEL_PATH and THREADS from section 6
SAMPLE_FILE = "samples/8297-275156-0000.wav"  # Replace with a file listed above

!./build/voxtral \
  --model {MODEL_PATH} \
  --audio {SAMPLE_FILE} \
  --threads {THREADS}