<a href="https://colab.research.google.com/github/LaansDole/whisperX-FastAPI/blob/main/notebooks/whisperx_fastapi_colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# WhisperX FastAPI on Google Colab

This notebook sets up and runs the WhisperX FastAPI project on Google Colab, utilizing its GPU for speech-to-text processing. The API service is exposed through a Cloudflare tunnel to allow external access.

## Features

- Speech-to-text transcription
- Audio alignment
- Speaker diarization
- Combined services

## Requirements

- Google Colab with GPU runtime
- Hugging Face token for model access
- Cloudflare account (free tier works fine)

## Setup Instructions

1. Make sure you're running this notebook with GPU runtime
2. Execute each cell in order
3. Use the Cloudflare tunnel URL to access the API

Let's start by checking if we have GPU access and setting up the environment.
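
The GPU check mentioned above can be sketched with the standard library alone. This is a minimal sketch, assuming the `nvidia-smi` utility is on the PATH of a GPU runtime (as it usually is on Colab); `list_gpus` is a hypothetical helper name, not part of the project.

```python
import shutil
import subprocess


def list_gpus():
    """Return the GPU names reported by nvidia-smi, or [] if none are visible.

    Assumption: a GPU runtime exposes the `nvidia-smi` command-line tool.
    """
    if shutil.which("nvidia-smi") is None:
        return []  # tool not installed: most likely a CPU-only runtime
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []  # tool present but no usable device or driver
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = list_gpus()
if gpus:
    print("GPU detected:", ", ".join(gpus))
else:
    print("No GPU detected. Select a GPU runtime under Runtime > Change runtime type.")
```

If the list comes back empty, switch the runtime type before running the cells that follow, since the configuration later in this notebook sets `DEVICE=cuda`.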

In [1]:
# Keep this tab alive to prevent Colab from disconnecting you { display-mode: "form" }

#@markdown Press play on the music player that will appear below:
%%html
<audio src="https://oobabooga.github.io/silence.m4a" controls>

## 1. Install System Dependencies

First, we need to install the required system packages and utilities.

In [None]:
# Install ffmpeg for audio/video processing
!apt-get update && apt-get install -y ffmpeg

# Install git and other utilities
!apt-get install -y git curl wget
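
To confirm the installation worked before moving on, it may help to check that each tool is actually on the PATH. The following is a small sketch; `missing_tools` is a hypothetical helper, and the tool list simply mirrors the packages installed in the cell above.

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# Tool names taken from the install cell above.
required = ["ffmpeg", "git", "curl", "wget"]
missing = missing_tools(required)
if missing:
    print("Missing tools:", ", ".join(missing))
else:
    print("All required tools are available.")
```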

## 2. Clone the WhisperX FastAPI Repository

In [2]:
# Clone the repository
!rm -rf whisperX-FastAPI
!git clone https://github.com/pavelzbornik/whisperX-FastAPI.git
!cd whisperX-FastAPI && ls -la

Cloning into 'whisperX-FastAPI'...
remote: Enumerating objects: 1265, done.
remote: Counting objects: 100% (454/454), done.
remote: Compressing objects: 100% (196/196), done.
remote: Total 1265 (delta 316), reused 258 (delta 258), pack-reused 811 (from 2)
Receiving objects: 100% (1265/1265), 40.47 MiB | 40.79 MiB/s, done.
Resolving deltas: 100% (682/682), done.
total 96
drwxr-xr-x 9 root root 4096 Jun 24 17:28 .
drwxr-xr-x 1 root root 4096 Jun 24 17:28 ..
drwxr-xr-x 4 root root 4096 Jun 24 17:28 app
drwxr-xr-x 2 root root 4096 Jun 24 17:28 .devcontainer
-rw-r--r-- 1 root root  531 Jun 24 17:28 docker-compose.yml
-rw-r--r-- 1 root root 1627 Jun 24 17:28 dockerfile
-rw-r--r-- 1 root root  331 Jun 24 17:28 .dockerignore
drwxr-xr-x 8 root root 4096 Jun 24 17:28 .git
drwxr-xr-x 3 root root 4096 Jun 24 17:28 .github
-rw-r--r-- 1 root root   52 Jun 24 17:28 .gitignore
-rw-r--r-- 1 root root  207 Jun 24 17:28 .gitleaks.toml
-rw-r--r-- 1 root root 1070 Jun 24 17:28 LICENSE
-rw-r--r-

## 3. Install PyTorch and Project Dependencies

We'll install PyTorch with CUDA support and all required dependencies.

In [None]:
# Install PyTorch with CUDA support
!cd whisperX-FastAPI && pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu118

# Install project requirements
!cd whisperX-FastAPI && pip install -r requirements/prod.txt

# Install additional packages for Colab environment
!cd whisperX-FastAPI && pip install colorlog pyngrok python-dotenv
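
A quick way to see whether the installs above succeeded is to query the installed package metadata. This is a sketch only: `installed_versions` is a hypothetical helper, and the package names are the ones that appear in the pip commands above.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Package names taken from the pip commands above.
report = installed_versions(["torch", "torchaudio", "python-dotenv"])
for name, version in report.items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Reading metadata rather than importing the packages keeps the check fast and avoids loading CUDA libraries just to print a version number.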

## 4. Set Up Environment Variables

Configure the required environment variables for WhisperX. You'll need to enter your Hugging Face API token to access the models.

In [None]:
import os

# Check if we're already in the whisperX-FastAPI directory
current_dir = os.path.basename(os.getcwd())
if current_dir != "whisperX-FastAPI":
    os.chdir("whisperX-FastAPI")
    print("Changed directory to whisperX-FastAPI")
else:
    print("Already in whisperX-FastAPI directory")

# Enter your Hugging Face token here (getpass keeps it out of the cell output)
from getpass import getpass
HF_TOKEN = getpass("Enter your Hugging Face token: ").strip()

# Choose Whisper model size
WHISPER_MODEL = input("Enter Whisper model size (default: tiny): ") or "tiny"

# Set log level
LOG_LEVEL = "INFO"

# Create .env file
env_content = f"""HF_TOKEN={HF_TOKEN}
WHISPER_MODEL={WHISPER_MODEL}
LOG_LEVEL={LOG_LEVEL}
DEVICE=cuda
COMPUTE_TYPE=float16
DB_URL=sqlite:///records.db
"""

with open(".env", "w") as f:
    f.write(env_content)

print("Environment configuration completed.")
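
Before starting the server, it can be useful to read the configuration back and confirm that the expected keys are present without printing the token in full. The sketch below parses `KEY=VALUE` text of the kind written above; `parse_env` and `mask` are hypothetical helpers, and the sample string (including its token) is a made-up placeholder standing in for the contents of `.env`.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def mask(secret, visible=4):
    """Hide all but the last `visible` characters of a secret."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


# Illustrative content only; the token below is a made-up placeholder.
sample = "HF_TOKEN=hf_exampletoken1234\nWHISPER_MODEL=tiny\nDEVICE=cuda\n"
config = parse_env(sample)
missing = [key for key in ("HF_TOKEN", "WHISPER_MODEL", "DEVICE") if not config.get(key)]
print("Missing keys:", missing)
print("HF_TOKEN:", mask(config["HF_TOKEN"]))
```

In the notebook itself, the sample string would be replaced by the text read from `.env` with `open(".env").read()`.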

In [13]:
from huggingface_hub import snapshot_download

# Pre-download the model snapshot into the local Hugging Face cache
snapshot_download(repo_id='openai/whisper-base')

Fetching 16 files:   0%|          | 0/16 [00:00<?, ?it/s]

'/root/.cache/huggingface/hub/models--openai--whisper-base/snapshots/e37978b90ca9030d5170a5c07aadb050351a65bb'

## 5. Start the FastAPI Service

In [14]:
import os
import signal
import subprocess
import time
from google.colab.output import serve_kernel_port_as_iframe

# --- Configuration ---
PORT = 8000
LOG_CONFIG_PATH = "app/uvicorn_log_conf.yaml"
APP_MODULE = "app.main:app"

# --- Global variable to hold the server process ---
server_process = None

def kill_port(port):
    """Kills any process listening on the given port."""
    print(f"Checking for and terminating any process on port {port}...")
    try:
        result = subprocess.run(["lsof", "-ti", f":{port}"], capture_output=True, text=True)
        if result.stdout:
            pids = result.stdout.strip().split('\n')
            for pid in pids:
                try:
                    os.kill(int(pid), signal.SIGKILL)
                    print(f"Killed process {pid} on port {port}.")
                except (ProcessLookupError, ValueError):
                    pass  # Process already gone
    except FileNotFoundError:
        print("`lsof` command not found. Skipping port clearing.")
    except Exception as e:
        print(f"An error occurred while trying to kill port {port}: {e}")

def start_server():
    """Starts the Uvicorn server as a background subprocess."""
    global server_process

    # Ensure we are in the correct directory
    if os.path.basename(os.getcwd()) != "whisperX-FastAPI":
        os.chdir("whisperX-FastAPI")
        print("Changed directory to whisperX-FastAPI")

    # First, ensure the port is free
    kill_port(PORT)

    # Command to start Uvicorn
    command = [
        "uvicorn",
        APP_MODULE,
        "--host", "0.0.0.0",
        "--port", str(PORT),
        "--log-config", LOG_CONFIG_PATH,
        "--log-level", "info"
    ]

    # Start the server as a background process
    print("Starting FastAPI server...")
    server_process = subprocess.Popen(command)
    print(f"Server process started with PID: {server_process.pid}")

    # Wait a moment for the server to initialize
    time.sleep(12)

    # Embed the served port in this cell's output as an iframe (not a public URL)
    print(f"Exposing port {PORT} as an iframe...")
    serve_kernel_port_as_iframe(port=PORT, height=800)

def stop_server():
    """Stops the background Uvicorn server."""
    global server_process
    if server_process:
        print(f"Stopping server process with PID: {server_process.pid}...")
        server_process.terminate()
        try:
            # Wait for the process to terminate
            server_process.wait(timeout=10)
            print("Server stopped successfully.")
        except subprocess.TimeoutExpired:
            print("Server did not terminate gracefully. Forcing shutdown...")
            server_process.kill()
            print("Server forced to shut down.")
        server_process = None
    else:
        print("Server is not running.")

# --- Main execution ---
if __name__ == "__main__":
    try:
        start_server()
        # The server runs as a background subprocess.
        # The loop below keeps this cell running while the server is active.
        # Interrupt the cell (stop button) to trigger stop_server() below.
        print("\nServer is running in the background.")
        print("To stop the server, call the stop_server() function.")
        # Keep the main thread alive
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        print("\nKeyboard interrupt received. Shutting down server...")
        stop_server()
        print("Shutdown complete.")


Checking for and terminating any process on port 8000...
Starting FastAPI server...
Server process started with PID: 5240
Exposing port 8000 as an iframe...


<IPython.core.display.Javascript object>


Server is running in the background.
To stop the server, call the stop_server() function.

Keyboard interrupt received. Shutting down server...
Stopping server process with PID: 5240...
Server stopped successfully.
Shutdown complete.
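
The fixed `time.sleep(12)` in `start_server()` above may be too short when a large model is loading and needlessly long otherwise. One alternative is to poll the port until it accepts TCP connections. This is a minimal sketch, not part of the project: `wait_for_port` is a hypothetical helper, and an open port is a reasonable but imperfect signal that the application is ready.

```python
import socket
import time


def wait_for_port(port, host="127.0.0.1", timeout=60.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)  # not accepting connections yet; retry
    return False


# Demo against a throwaway local listener instead of the real server.
with socket.socket() as listener:
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen()
    demo_port = listener.getsockname()[1]
    print("Port ready:", wait_for_port(demo_port, timeout=5))
```

In `start_server()`, the sleep could then be replaced by a call such as `wait_for_port(PORT, timeout=120)`, with the iframe shown only when it returns `True`.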


## Cleanup

Finally, clean up any temporary files and free up GPU memory.

In [None]:
# Clean up temporary files
!rm -f cloudflared-linux-amd64.deb

# Free up GPU memory (if any is still in use)
import torch
if torch.cuda.is_available():
    torch.cuda.empty_cache()
    print("GPU memory cleared.")

print("Cleanup completed. You can now close this notebook or run it again if needed.")