<a href="https://colab.research.google.com/github/LaansDole/whisperX-FastAPI/blob/main/notebooks/whisperx_fastapi_colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# WhisperX FastAPI on Google Colab

This notebook sets up and runs the WhisperX FastAPI project on Google Colab, using the Colab GPU for speech-to-text processing. The API service is exposed through a Cloudflare tunnel so that it can be reached from outside the notebook.

## Features

- Speech-to-text transcription
- Audio alignment
- Speaker diarization
- Combined services

## Requirements

- Google Colab with a GPU runtime
- Hugging Face token for model access
- Cloudflare account (free tier works fine)

## Setup Instructions

1. Make sure you're running this notebook with a GPU runtime
2. Execute each cell in order
3. Use the Cloudflare tunnel URL to access the API

Let's start by checking if we have GPU access and setting up the environment.
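
The GPU check can be sketched as follows. This is a minimal example, assuming the `nvidia-smi` tool from the NVIDIA driver is on the PATH whenever a GPU runtime is attached. The helper name `gpu_runtime_available` is hypothetical and not part of the project.

```python
import shutil
import subprocess

def gpu_runtime_available() -> bool:
    """Return True if nvidia-smi exists and lists at least one GPU."""
    if shutil.which("nvidia-smi") is None:
        return False  # driver tools missing: most likely a CPU runtime
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"],  # -L lists the attached GPUs, one per line
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout

if gpu_runtime_available():
    print("GPU runtime detected.")
else:
    print("No GPU detected. Switch via Runtime -> Change runtime type.")
```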

In [None]:
%%html
<audio src="https://oobabooga.github.io/silence.m4a" controls></audio>

## 1. Install System Dependencies

First, we need to install the required system packages and utilities.

In [3]:
# Install ffmpeg for audio/video processing
!apt-get update && apt-get install -y ffmpeg

# Install git and other utilities
!apt-get install -y git curl wget

# Install the cuDNN 8 runtime and development libraries
!apt-get install -y libcudnn8 libcudnn8-dev

Hit:1 http://archive.ubuntu.com/ubuntu jammy InRelease
Hit:2 http://archive.ubuntu.com/ubuntu jammy-updates InRelease
Hit:3 http://archive.ubuntu.com/ubuntu jammy-backports InRelease
Hit:4 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease
Hit:5 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease
Hit:6 https://r2u.stat.illinois.edu/ubuntu jammy InRelease
Hit:7 http://security.ubuntu.com/ubuntu jammy-security InRelease
Hit:8 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease
Hit:9 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy InRelease
Hit:10 https://pp
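
After the installation, it may be worth confirming that the tools are actually on the PATH. The sketch below uses only the standard library; the helper name `missing_tools` is hypothetical, and the tool list simply mirrors the packages installed above.

```python
import shutil

def missing_tools(tools):
    """Return the subset of tool names that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

# Tools installed by the cell above
missing = missing_tools(["ffmpeg", "git", "curl", "wget"])
print("Missing tools:", missing or "none")
```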

## 2. Clone the WhisperX FastAPI Repository

In [4]:
# Clone the repository
!rm -rf whisperX-FastAPI
!git clone https://github.com/LaansDole/whisperX-FastAPI.git
!cd whisperX-FastAPI && ls -la

Cloning into 'whisperX-FastAPI'...
remote: Enumerating objects: 1336, done.
remote: Counting objects: 100% (524/524), done.
remote: Compressing objects: 100% (244/244), done.
remote: Total 1336 (delta 361), reused 317 (delta 278), pack-reused 812 (from 2)
Receiving objects: 100% (1336/1336), 40.54 MiB | 44.54 MiB/s, done.
Resolving deltas: 100% (730/730), done.
total 116
drwxr-xr-x 11 root root  4096 Jun 25 14:12 .
drwxr-xr-x  1 root root  4096 Jun 25 14:12 ..
drwxr-xr-x  4 root root  4096 Jun 25 14:12 app
drwxr-xr-x  2 root root  4096 Jun 25 14:12 .devcontainer
-rw-r--r--  1 root root   531 Jun 25 14:12 docker-compose.yml
-rw-r--r--  1 root root  1627 Jun 25 14:12 dockerfile
-rw-r--r--  1 root root   331 Jun 25 14:12 .dockerignore
-rw-r--r--  1 root root   727 Jun 25 14:12 .env.example
drwxr-xr-x  8 root root  4096 Jun 25 14:12 .git
drwxr-xr-x  3 root root  4096 Jun 25 14:12 .github
-rw-r--r--  1 root root   168 Jun 25 14:12 .gitignore
-rw-r--r--  1 root root   207 Jun 25 

## 3. Install Dependencies

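The first cell verifies that PyTorch and NumPy import correctly and that CUDA is visible. The second cell installs the project requirements and a few extra packages for the Colab environment.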

In [5]:
"""
Test script to verify PyTorch installation and CUDA availability
"""
import sys

def test_torch_installation():
    try:
        import torch
        print(f"✓ PyTorch installed successfully: {torch.__version__}")

        # Test CUDA availability
        if hasattr(torch, 'cuda'):
            if torch.cuda.is_available():
                print(f"✓ CUDA is available: {torch.cuda.get_device_name(0)}")
                print(f"✓ CUDA version: {torch.version.cuda}")
            else:
                print("⚠ CUDA is not available, will use CPU")
        else:
            print("✗ torch.cuda module not found - PyTorch installation is corrupted")
            return False

        # Test basic tensor operations
        x = torch.randn(3, 3)
        print(f"✓ Basic tensor operations work: {x.shape}")

        return True

    except ImportError as e:
        print(f"✗ Failed to import PyTorch: {e}")
        return False
    except Exception as e:
        print(f"✗ PyTorch test failed: {e}")
        return False

def test_numpy_installation():
    try:
        import numpy as np
        print(f"✓ NumPy installed successfully: {np.__version__}")

        # Test basic operations
        arr = np.array([1, 2, 3])
        print(f"✓ Basic NumPy operations work: {arr.shape}")

        return True

    except ImportError as e:
        print(f"✗ Failed to import NumPy: {e}")
        return False
    except Exception as e:
        print(f"✗ NumPy test failed: {e}")
        return False

if __name__ == "__main__":
    print("Testing PyTorch and NumPy installation...")
    print("=" * 50)

    numpy_ok = test_numpy_installation()
    torch_ok = test_torch_installation()

    print("=" * 50)
    if numpy_ok and torch_ok:
        print("✓ All tests passed! Environment is ready.")
    else:
        print("✗ Some tests failed. Please check the installation.")

Testing PyTorch and NumPy installation...
✓ NumPy installed successfully: 2.0.2
✓ Basic NumPy operations work: (3,)
✓ PyTorch installed successfully: 2.6.0+cu124
✓ CUDA is available: Tesla T4
✓ CUDA version: 12.4
✓ Basic tensor operations work: torch.Size([3, 3])
✓ All tests passed! Environment is ready.


In [None]:
# Install project requirements
!cd whisperX-FastAPI && pip install -r requirements/prod.txt

# Install additional packages for Colab environment
!cd whisperX-FastAPI && pip install colorlog pyngrok python-dotenv

## 4. Set Up Environment Variables

Configure the required environment variables for WhisperX. You'll need to enter your Hugging Face API token to access the models.

### Create Hugging Face API token
1. Go to your Hugging Face token settings page.
2. Select the token you are using.
3. Under the "Token permissions" section, make sure that "Read access to public gated repositories" is enabled **[IMPORTANT]**.
4. Save the changes to your token.



To add your Hugging Face token as a secret in Google Colab:

1.  Go to your Hugging Face settings page: [https://huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
2.  Create a new token or copy an existing one.
3.  In your Google Colab notebook, click on the "🔑 Secrets" tab in the left sidebar.
4.  Click on "Add new secret".
5.  For the "Name" field, enter `HF_TOKEN`.
6.  For the "Value" field, paste your Hugging Face token.
7.  Make sure the "Notebook access" toggle is enabled for this notebook.
8.  Restart your Colab session by going to "Runtime" -> "Restart session".

Once you have followed these steps, the `HF_TOKEN` secret will be available in your notebook and the warning message should disappear after restarting the runtime.
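
Reading the secret from code can be sketched as below. This assumes the `google.colab.userdata` module is available, which is only the case inside Colab, and falls back to the `HF_TOKEN` environment variable otherwise. The helper name `get_hf_token` is hypothetical.

```python
import os

def get_hf_token():
    """Return the HF_TOKEN Colab secret, or the environment variable as a fallback."""
    try:
        from google.colab import userdata  # only importable inside Colab
    except ImportError:
        return os.environ.get("HF_TOKEN")
    try:
        return userdata.get("HF_TOKEN")
    except Exception:
        # Secret is missing or notebook access has not been granted
        return os.environ.get("HF_TOKEN")

token = get_hf_token()
print("HF_TOKEN found." if token else "HF_TOKEN is not set.")
```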

In [34]:
from huggingface_hub import login
login(new_session=False)

In [None]:
import os
from huggingface_hub import snapshot_download

def download_model(model_name, cache_dir=None):
    """
    Downloads a model from the Hugging Face Hub.

    Args:
        model_name (str): The name of the model to download.
        cache_dir (str, optional): The directory to cache the model in. Defaults to None.
    """
    print(f"Downloading model: {model_name}")
    try:
        snapshot_download(
            repo_id=model_name,
            cache_dir=cache_dir,
            token=os.environ.get("HF_TOKEN"),
        )
        print(f"Model '{model_name}' downloaded successfully.")
    except Exception as e:
        print(f"Error downloading model '{model_name}': {e}")

# Directly call the download function with the desired model name
download_model(model_name="pyannote/speaker-diarization-3.1", cache_dir="models/pyannote/speaker-diarization-3.1")

In [None]:
import os
from getpass import getpass

# Check if we're already in the whisperX-FastAPI directory
current_dir = os.path.basename(os.getcwd())
if current_dir != "whisperX-FastAPI":
    os.chdir("whisperX-FastAPI")
    print("Changed directory to whisperX-FastAPI")
else:
    print("Already in whisperX-FastAPI directory")

# Enter your Hugging Face token here (getpass hides the input instead of echoing it)
HF_TOKEN = getpass("Enter your Hugging Face token: ").strip()

# Choose Whisper model size
WHISPER_MODEL = input("Enter Whisper model size (default: tiny): ") or "tiny"

# Set log level
LOG_LEVEL = "INFO"

# Create .env file
env_content = f"""HF_TOKEN={HF_TOKEN}
WHISPER_MODEL={WHISPER_MODEL}
LOG_LEVEL={LOG_LEVEL}
DEVICE=cuda
COMPUTE_TYPE=float16
DB_URL=sqlite:///records.db
"""

with open(".env", "w") as f:
    f.write(env_content)

print("Environment configuration completed.")

In [None]:
!cat .env
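
Note that `cat .env` prints the token in plain text into the cell output, which is saved with the notebook. A possible alternative is to mask secret values before printing. The sketch below is illustrative only: the helper name `mask_secrets` and the sample values are hypothetical.

```python
def mask_secrets(env_text, secret_keys=("HF_TOKEN",)):
    """Return env_text with the values of secret_keys partially hidden."""
    masked = []
    for line in env_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() in secret_keys and value:
            # Keep the first four characters so the token is still recognizable
            line = f"{key}={value[:4]}{'*' * max(len(value) - 4, 0)}"
        masked.append(line)
    return "\n".join(masked)

# Made-up sample content; in the notebook this could be open(".env").read()
sample = "HF_TOKEN=hf_abcdef123456\nWHISPER_MODEL=tiny\nDEVICE=cuda"
print(mask_secrets(sample))
```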

## 5. Start the FastAPI Service

In [None]:
import os
import signal
import subprocess
import time
from google.colab.output import serve_kernel_port_as_iframe

# --- Configuration ---
PORT = 8000
LOG_CONFIG_PATH = "app/uvicorn_log_conf.yaml"
APP_MODULE = "app.main:app"

# --- Global variable to hold the server process ---
server_process = None

def kill_port(port):
    """Kills any process listening on the given port."""
    print(f"Checking for and terminating any process on port {port}...")
    try:
        result = subprocess.run(["lsof", "-ti", f":{port}"], capture_output=True, text=True)
        if result.stdout:
            pids = result.stdout.strip().split('\n')
            for pid in pids:
                try:
                    os.kill(int(pid), signal.SIGKILL)
                    print(f"Killed process {pid} on port {port}.")
                except (ProcessLookupError, ValueError):
                    pass  # Process already gone
    except FileNotFoundError:
        print("`lsof` command not found. Skipping port clearing.")
    except Exception as e:
        print(f"An error occurred while trying to kill port {port}: {e}")

def start_server():
    """Starts the Uvicorn server as a background process and embeds it in the notebook."""
    global server_process

    # Ensure we are in the correct directory
    if os.path.basename(os.getcwd()) != "whisperX-FastAPI":
        os.chdir("whisperX-FastAPI")
        print("Changed directory to whisperX-FastAPI")

    # First, ensure the port is free
    kill_port(PORT)

    # Command to start Uvicorn
    command = [
        "uvicorn",
        APP_MODULE,
        "--host", "0.0.0.0",
        "--port", str(PORT),
        "--log-config", LOG_CONFIG_PATH,
        "--log-level", "info"
    ]

    # Start the server as a background process
    print("Starting FastAPI server...")
    server_process = subprocess.Popen(command)
    print(f"Server process started with PID: {server_process.pid}")

    # Give the server a fixed 12 seconds to initialize before embedding it
    time.sleep(12)

    # Embed the served port in the cell output as an iframe (not a public URL)
    print(f"Exposing port {PORT} as an iframe...")
    serve_kernel_port_as_iframe(port=PORT, height=800)

def stop_server():
    """Stops the background Uvicorn server."""
    global server_process
    if server_process:
        print(f"Stopping server process with PID: {server_process.pid}...")
        server_process.terminate()
        try:
            # Wait for the process to terminate
            server_process.wait(timeout=10)
            print("Server stopped successfully.")
        except subprocess.TimeoutExpired:
            print("Server did not terminate gracefully. Forcing shutdown...")
            server_process.kill()
            print("Server forced to shut down.")
        server_process = None
    else:
        print("Server is not running.")

# --- Main execution ---
if __name__ == "__main__":
    try:
        start_server()
        # The server runs in a background process. The loop below keeps this
        # cell busy, so no other cell can run until it is interrupted.
        # Interrupting the cell raises KeyboardInterrupt, which calls stop_server().
        print("\nServer is running in the background.")
        print("To stop the server, interrupt this cell (Runtime -> Interrupt execution).")
        # Keep the cell alive until it is interrupted
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        print("\nKeyboard interrupt received. Shutting down server...")
        stop_server()
        print("Shutdown complete.")
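
The cell above waits a fixed 12 seconds for the server to initialize. Loading models may take more or less time than that, so one possible alternative is to poll the port until it accepts connections. The sketch below uses only the standard library; the helper name `wait_for_port` and its default values are assumptions, not part of the project.

```python
import socket
import time

def wait_for_port(port, host="127.0.0.1", timeout=60.0, interval=0.5):
    """Poll until a TCP connection to host:port succeeds or timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True  # something is accepting connections
        except OSError:
            time.sleep(interval)  # not listening yet, retry shortly
    return False

# Possible use inside start_server(), in place of time.sleep(12):
#     if not wait_for_port(PORT):
#         print("Server did not come up in time.")
```

An open port only shows that the process is listening. The application may still be loading models when the first request arrives.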
