# WhisperX FastAPI on Google Colab

This notebook sets up and runs the WhisperX FastAPI project on Google Colab, using the Colab GPU for speech-to-text processing. A Cloudflare tunnel exposes the API service so that it can be reached from outside the Colab VM.

## Features

- Speech-to-text transcription
- Audio alignment
- Speaker diarization
- Combined services

## Requirements

- Google Colab with GPU runtime
- Hugging Face token for model access
- `cloudflared` client (installed in step 5); the quick tunnel used here (`trycloudflare.com`) does not require a Cloudflare account

## Setup Instructions

1. Make sure you're running this notebook with a GPU runtime (Runtime > Change runtime type)
2. Execute each cell in order
3. Use the Cloudflare tunnel URL to access the API

Let's start by checking if we have GPU access and setting up the environment.

In [None]:
# Check if GPU is available
!nvidia-smi
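
If you would rather check for a GPU from Python than read the `nvidia-smi` table, a minimal sketch follows. `list_gpus` is a hypothetical helper name, not part of the project; it relies on `nvidia-smi -L`, which prints one line per GPU, and returns an empty list when the tool is missing or fails.

```python
import shutil
import subprocess

def list_gpus():
    """Return one description line per visible GPU, or [] if none are found."""
    if shutil.which("nvidia-smi") is None:
        return []  # driver tools not installed: most likely a CPU runtime
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]

gpus = list_gpus()
print(gpus if gpus else "No GPU detected. Switch the runtime type to GPU.")
```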

## 1. Install System Dependencies

First, we need to install the required system packages and utilities.

In [None]:
# Install ffmpeg for audio/video processing
!apt-get update && apt-get install -y ffmpeg

# Install git and other utilities
!apt-get install -y git curl wget

## 2. Clone the WhisperX FastAPI Repository

In [None]:
# Clone the repository
!git clone https://github.com/pavelzbornik/whisperX-FastAPI.git
!cd whisperX-FastAPI && ls -la

## 3. Create Python Virtual Environment and Install Dependencies

We'll install PyTorch with CUDA support and all required dependencies.

In [None]:
# Create and activate a virtual environment
!cd whisperX-FastAPI && python -m venv venv
!cd whisperX-FastAPI && source venv/bin/activate && pip install --upgrade pip

# Install PyTorch with CUDA support
!cd whisperX-FastAPI && source venv/bin/activate && pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu118

# Install project requirements
!cd whisperX-FastAPI && source venv/bin/activate && pip install -r requirements/dev.txt

# Install additional packages for Colab environment
!cd whisperX-FastAPI && source venv/bin/activate && pip install colorlog pyngrok python-dotenv

## 4. Set Up Environment Variables

Configure the required environment variables for WhisperX. You'll need to enter your Hugging Face API token to access the models.

In [None]:
import os

# Enter your Hugging Face token (getpass keeps it out of the cell output)
from getpass import getpass
HF_TOKEN = getpass("Enter your Hugging Face token: ").strip()

# Choose Whisper model size
WHISPER_MODEL = input("Enter Whisper model size (default: tiny): ").strip() or "tiny"

# Set log level
LOG_LEVEL = "INFO"

# Create .env file
env_content = f"""HF_TOKEN={HF_TOKEN}
WHISPER_MODEL={WHISPER_MODEL}
LOG_LEVEL={LOG_LEVEL}
DEVICE=cuda
COMPUTE_TYPE=float16
DB_URL=sqlite:///records.db
"""

with open("whisperX-FastAPI/.env", "w") as f:
    f.write(env_content)

print("Environment configuration completed.")
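
To confirm what was written, the file can be read back with the token masked. This is a sketch that assumes simple `KEY=VALUE` lines like the ones written above; `parse_env` and `mask` are hypothetical helpers, not part of the project.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values

def mask(value, visible=4):
    """Hide all but the last few characters of a secret."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]

# Sample content with a made-up token, so the sketch runs on its own
sample = "HF_TOKEN=hf_exampletoken1234\nWHISPER_MODEL=tiny\nDEVICE=cuda\n"
config = parse_env(sample)
for key, value in config.items():
    print(key, "=", mask(value) if key == "HF_TOKEN" else value)
```

To inspect the real file, pass `open("whisperX-FastAPI/.env").read()` to `parse_env` instead of `sample`.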

## 5. Install and Configure Cloudflare Tunnel

We'll use Cloudflare tunnels to expose the FastAPI service to the internet, allowing you to access it from your browser or other clients.

In [None]:
# Download and install cloudflared
!wget -q https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb
!dpkg -i cloudflared-linux-amd64.deb

print("Cloudflare tunnel client installed.")

## 6. Start the FastAPI Service

Now we'll run the FastAPI application in the background and expose it through the Cloudflare tunnel.

In [None]:
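# Run FastAPI and the Cloudflare tunnel as background processes
import os
import re
import subprocess
import time
import threading
import IPython.display as display

REPO_DIR = os.path.abspath("whisperX-FastAPI")
VENV_BIN = os.path.join(REPO_DIR, "venv", "bin")

# Start FastAPI server in a separate process
def start_fastapi_server():
    print("Starting FastAPI server...")
    # Run the venv's uvicorn directly instead of `source venv/bin/activate`:
    # subprocess's default shell is /bin/sh, which may not support `source`,
    # and without a shell wrapper terminate() signals uvicorn itself.
    command = [
        os.path.join(VENV_BIN, "uvicorn"), "app.main:app",
        "--host", "0.0.0.0", "--port", "8000",
        "--log-config", "app/uvicorn_log_conf.yaml",
    ]
    # Put the venv first on PATH, as activation would
    env = dict(os.environ, PATH=VENV_BIN + os.pathsep + os.environ.get("PATH", ""))
    # Write output to a log file; an unread PIPE can fill up and stall the server
    log_file = open(os.path.join(REPO_DIR, "uvicorn.log"), "w")
    process = subprocess.Popen(command, cwd=REPO_DIR, env=env,
                               stdout=log_file, stderr=subprocess.STDOUT)
    print("FastAPI server started. Waiting for it to initialize...")
    time.sleep(5)  # Give some time for the server to start
    if process.poll() is not None:
        raise RuntimeError(f"FastAPI server exited early; see {REPO_DIR}/uvicorn.log")
    return process

# Start Cloudflare tunnel in a separate process
def start_cloudflare_tunnel():
    print("Starting Cloudflare tunnel...")
    tunnel_process = subprocess.Popen(
        ["cloudflared", "tunnel", "--url", "http://localhost:8000"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True
    )

    # Extract the tunnel URL from the output. cloudflared typically prints it
    # inside a bordered box, so the last token on the line may be a border
    # character; match the URL itself with a regex instead.
    tunnel_url = None
    for line in tunnel_process.stdout:
        print(line.strip())
        match = re.search(r"https://[a-z0-9-]+\.trycloudflare\.com", line)
        if match:
            tunnel_url = match.group(0)
            break

    # Keep reading the remaining output in the background so the pipe never fills up
    def _drain():
        for _ in tunnel_process.stdout:
            pass
    threading.Thread(target=_drain, daemon=True).start()

    return tunnel_process, tunnel_url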

# Define global processes to keep them alive
fastapi_process = None
tunnel_process = None
tunnel_url = None

# Start the services
try:
    fastapi_process = start_fastapi_server()
    time.sleep(2)  # Wait for FastAPI to start
    tunnel_process, tunnel_url = start_cloudflare_tunnel()
    
    # Display the tunnel URL with a clickable link
    if tunnel_url:
        print(f"\n✨ WhisperX FastAPI is now running and accessible at: {tunnel_url}")
        print(f"\n📝 API Documentation: {tunnel_url}/docs")
        display.display(display.HTML(f'<a href="{tunnel_url}/docs" target="_blank">Open API Documentation</a>'))
    else:
        print("\n⚠️ Failed to get tunnel URL. Check the output above for errors.")
except Exception as e:
    print(f"Error starting services: {str(e)}")
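
The fixed `time.sleep` calls above are a guess; loading models can take longer on a cold runtime. One alternative is to poll the server until it answers. The sketch below uses only the standard library. `wait_until_ready` is a hypothetical helper, and treating any HTTP response (even an error status) as "ready" is an assumption.

```python
import time
import urllib.error
import urllib.request

def wait_until_ready(url, timeout=60.0, interval=1.0):
    """Poll url until the server answers any HTTP response or timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            return True  # the server answered, just not with a 2xx status
        except (urllib.error.URLError, OSError):
            time.sleep(interval)  # not accepting connections yet
    return False
```

For example, `wait_until_ready("http://localhost:8000/docs", timeout=120)` could run before the tunnel is started, so that the tunnel URL is only printed once the API responds.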


## 7. Monitor GPU Usage

You can monitor GPU usage while the service is running to ensure it's properly utilizing the available resources.

In [None]:
# Check GPU usage
!nvidia-smi
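
For a number you can log or compare between runs, `nvidia-smi` can also emit CSV. This sketch assumes the `--query-gpu` and `--format=csv,noheader,nounits` options are supported by the installed driver tools; `parse_gpu_memory` and `gpu_memory` are hypothetical helper names.

```python
import shutil
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"]

def parse_gpu_memory(csv_text):
    """Turn 'used, total' CSV lines (MiB) into a list of (used, total) ints."""
    rows = []
    for line in csv_text.strip().splitlines():
        try:
            used, total = (int(part.strip()) for part in line.split(","))
        except ValueError:
            continue  # skip lines that are not two integers
        rows.append((used, total))
    return rows

def gpu_memory():
    """Return per-GPU (used_mib, total_mib), or [] when nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True)
    return parse_gpu_memory(result.stdout) if result.returncode == 0 else []

for index, (used, total) in enumerate(gpu_memory()):
    print(f"GPU {index}: {used} / {total} MiB")
```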

## 8. Test the API

Here's an example of how to use the API to transcribe an audio file uploaded to Google Colab.

In [None]:
import requests
from google.colab import files

# Function to upload a file to the WhisperX API
def transcribe_audio(file_path, api_url):
    # API endpoint for speech-to-text
    endpoint = f"{api_url}/speech-to-text"

    # Parameters for the request
    params = {
        "model": WHISPER_MODEL,  # Use the same model as configured earlier
        "language": "en"  # Change this to match your audio language
    }

    # Create the multipart form data. If the upload is rejected, confirm on the
    # /docs page that the field name matches the endpoint's upload parameter.
    # The name `form_files` avoids shadowing the imported google.colab `files` module.
    with open(file_path, "rb") as audio_file:
        form_files = {"audio_file": (os.path.basename(file_path), audio_file)}
        response = requests.post(endpoint, params=params, files=form_files)

    # Return the parsed JSON response
    return response.json()

# Upload an audio file
print("Please upload an audio file (supported formats: mp3, wav, m4a, etc.)")
uploaded = files.upload()

# Process the uploaded file
if uploaded:
    file_name = list(uploaded.keys())[0]
    
    # Check if tunnel URL is available
    if tunnel_url:
        print(f"Transcribing {file_name}...")
        result = transcribe_audio(file_name, tunnel_url)
        
        # Display the identifier for checking the task status
        if "identifier" in result:
            print(f"\nTask identifier: {result['identifier']}")
            print(f"\nCheck the task status at: {tunnel_url}/task/{result['identifier']}")
            display.display(display.HTML(f'<a href="{tunnel_url}/task/{result["identifier"]}" target="_blank">Check Task Status</a>'))
        else:
            print("Error:", result)
    else:
        print("Tunnel URL is not available. Make sure the FastAPI service and Cloudflare tunnel are running.")
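
`response.json()` raises an exception if the body is not JSON, which can happen when the tunnel or a proxy returns an HTML error page. A defensive sketch follows. `decode_api_response` is a hypothetical helper that takes the status code and body text (for a `requests` response, `response.status_code` and `response.text`).

```python
import json

def decode_api_response(status_code, body_text):
    """Return (ok, payload): parsed JSON if decodable, an error description otherwise."""
    try:
        payload = json.loads(body_text)
    except json.JSONDecodeError:
        # Keep only the start of the body; error pages can be long
        return False, {"status_code": status_code, "error": body_text[:200]}
    ok = 200 <= status_code < 300
    return ok, payload

print(decode_api_response(200, '{"identifier": "abc"}'))
print(decode_api_response(502, "<html>Bad gateway</html>"))
```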

## 9. Check Task Status

Use this cell to check the status of your transcription task using the task identifier.

In [None]:
import json
import requests

# Function to check task status
def check_task_status(task_id, api_url):
    endpoint = f"{api_url}/task/{task_id}"
    response = requests.get(endpoint)
    return response.json()

# Enter task identifier (strip stray whitespace from copy and paste)
task_id = input("Enter task identifier: ").strip()

# Check task status
if tunnel_url and task_id:
    status = check_task_status(task_id, tunnel_url)
    print("Task Status:")
    print(json.dumps(status, indent=2))
else:
    print("Tunnel URL or task ID is not available.")
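
Rather than re-running this cell by hand, the check can be repeated until the task finishes. This is a sketch under assumptions: the response schema is not shown in this notebook, so the caller supplies an `is_done` predicate. `poll_until` and the status values in the demo are hypothetical.

```python
import time

def poll_until(fetch, is_done, interval=5.0, timeout=600.0):
    """Call fetch() every interval seconds until is_done(result) or timeout."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if is_done(result):
            return result
        if time.monotonic() + interval > deadline:
            raise TimeoutError("task did not finish before the timeout")
        time.sleep(interval)

# Demo with a fake fetcher so the sketch runs without a server
_fake = iter([{"status": "processing"}, {"status": "processing"}, {"status": "completed"}])
final = poll_until(lambda: next(_fake),
                   is_done=lambda r: r.get("status") != "processing",
                   interval=0.01, timeout=1.0)
print(final)
```

In this notebook, `fetch` could be `lambda: check_task_status(task_id, tunnel_url)`, with `is_done` written once you have seen which field of the response marks completion.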

## 10. Shutdown Services

When you're done, use this cell to properly shut down the services and free up resources.

In [None]:
# Function to shut down services
def shutdown_services():
    global fastapi_process, tunnel_process
    
    print("Shutting down services...")
    
    # Terminate Cloudflare tunnel
    if tunnel_process:
        tunnel_process.terminate()
        tunnel_process.wait()
        print("Cloudflare tunnel stopped.")
    
    # Terminate FastAPI server
    if fastapi_process:
        fastapi_process.terminate()
        fastapi_process.wait()
        print("FastAPI server stopped.")
    
    print("All services have been shut down.")

# Shut down the services
shutdown_services()
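
`terminate()` followed by an unbounded `wait()` can hang if a process ignores the termination signal. Below is a sketch of a bounded shutdown that escalates to `kill()`. `stop_process` is a hypothetical helper, and the default 10-second grace period is an arbitrary choice.

```python
import subprocess
import sys

def stop_process(process, grace_seconds=10.0):
    """Terminate process, escalating to kill() if it does not exit in time."""
    if process is None:
        return None
    if process.poll() is not None:
        return process.returncode  # already exited
    process.terminate()
    try:
        return process.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        process.kill()
        return process.wait()

# Demo on a throwaway child process
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
exit_code = stop_process(child, grace_seconds=5.0)
print("child exited with", exit_code)
```

Note that for a process started with `shell=True`, the signal goes to the shell, which does not necessarily stop the server the shell launched.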

## 11. Cleanup

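Finally, clean up temporary files. GPU memory used by the server is released when its process exits; the `torch.cuda.empty_cache()` call below only releases memory that PyTorch has cached in this notebook's own process.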

In [None]:
# Clean up temporary files
!rm -f cloudflared-linux-amd64.deb

# Release GPU memory cached by PyTorch in this notebook process (if any)
import torch
if torch.cuda.is_available():
    torch.cuda.empty_cache()
    print("PyTorch CUDA cache cleared.")

print("Cleanup completed. You can now close this notebook or run it again if needed.")