# WhisperX FastAPI on Google Colab

This notebook sets up and runs the WhisperX FastAPI project on Google Colab, using the Colab GPU for speech-to-text processing. The API is exposed through a Cloudflare tunnel so that it can be reached from outside the Colab VM.

## Features

- Speech-to-text transcription
- Audio alignment
- Speaker diarization
- Combined services

## Requirements

- Google Colab with GPU runtime
- Hugging Face token for model access
- Cloudflare account (free tier works fine)

## Setup Instructions

1. Make sure you're running this notebook with a GPU runtime (Runtime > Change runtime type)
2. Execute each cell in order
3. Use the Cloudflare tunnel URL to access the API

Let's start by checking if we have GPU access and setting up the environment.

In [None]:
# Check if GPU is available using multiple methods
import torch
import subprocess
import platform

print('=== GPU Availability Check ===')

# Method 1: PyTorch CUDA check
print(f'PyTorch version: {torch.__version__}')
print(f'CUDA available in PyTorch: {torch.cuda.is_available()}')

if torch.cuda.is_available():
    print(f'CUDA version: {torch.version.cuda}')
    print(f'Number of GPUs: {torch.cuda.device_count()}')
    for i in range(torch.cuda.device_count()):
        gpu_name = torch.cuda.get_device_name(i)
        gpu_memory = torch.cuda.get_device_properties(i).total_memory / 1024**3
        print(f'GPU {i}: {gpu_name} ({gpu_memory:.1f} GB)')

    # Set device
    device = torch.device('cuda')
    print(f'Current device: {device}')

    # Test GPU with a simple tensor operation
    try:
        test_tensor = torch.randn(1000, 1000).to(device)
        result = torch.matmul(test_tensor, test_tensor)
        print('✅ GPU tensor operations working!')
        del test_tensor, result
        torch.cuda.empty_cache()
    except Exception as e:
        print(f'❌ GPU tensor operation failed: {e}')
else:
    print('❌ No CUDA-capable GPU found')
    device = torch.device('cpu')
    print(f'Falling back to CPU: {device}')

# Method 2: Check for nvidia-ml-py (if available)
try:
    import pynvml
    pynvml.nvmlInit()
    gpu_count = pynvml.nvmlDeviceGetCount()
    print(f'NVML GPU count: {gpu_count}')
    for i in range(gpu_count):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        # Older pynvml releases return bytes, newer ones return str
        if isinstance(name, bytes):
            name = name.decode('utf-8')
        memory_info = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f'GPU {i}: {name}')
        print(f'  Memory: {memory_info.used/1024**3:.1f}GB / {memory_info.total/1024**3:.1f}GB')
except ImportError:
    print('Note: pynvml not available for detailed GPU info')
except Exception as e:
    print(f'NVML error: {e}')

# Method 3: System info
print(f'System: {platform.system()} {platform.release()}')
print(f'Python: {platform.python_version()}')

# Method 4: Try nvidia-smi if available (fallback)
try:
    result = subprocess.run(
        ['nvidia-smi', '--query-gpu=name,memory.total,memory.used', '--format=csv,noheader,nounits'],
        capture_output=True, text=True, timeout=10)
    if result.returncode == 0:
        print('=== nvidia-smi output ===')
        # One CSV line per GPU
        lines = result.stdout.strip().splitlines()
        for i, line in enumerate(lines):
            if line.strip():
                name, total, used = line.split(', ')
                print(f'GPU {i}: {name.strip()} ({used}MB / {total}MB used)')
    else:
        print('nvidia-smi not available or failed')
except (subprocess.TimeoutExpired, FileNotFoundError):
    print('nvidia-smi command not found or timed out')
except Exception as e:
    print(f'nvidia-smi error: {e}')

print('=== Summary ===')
if torch.cuda.is_available():
    print('✅ GPU is available and ready for use!')
else:
    print('⚠️  No GPU available, will use CPU')

## 1. Install System Dependencies

First, we need to install the required system packages and utilities.

In [None]:
# Install ffmpeg for audio/video processing
!apt-get update && apt-get install -y ffmpeg

# Install git and other utilities
!apt-get install -y git curl wget
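
To confirm that the packages above ended up on `PATH` before moving on, a minimal check might look like the sketch below. The helper name `missing_tools` and the `REQUIRED_TOOLS` list are illustrative, not part of the project.

```python
import shutil


def missing_tools(tools):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# Command-line tools installed by the cell above
REQUIRED_TOOLS = ["ffmpeg", "git", "curl", "wget"]

missing = missing_tools(REQUIRED_TOOLS)
print("Missing tools:", ", ".join(missing) if missing else "none")
```

If the list is not empty, re-run the install cell before continuing.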

## 2. Clone the WhisperX FastAPI Repository

In [None]:
# Clone the repository
!git clone https://github.com/pavelzbornik/whisperX-FastAPI.git
!cd whisperX-FastAPI && ls -la

## 3. Create Python Virtual Environment and Install Dependencies

We'll install PyTorch with CUDA support and all required dependencies.

In [None]:
# Create and activate a virtual environment
!cd whisperX-FastAPI && python -m venv venv
!cd whisperX-FastAPI && source venv/bin/activate && pip install --upgrade pip

# Install PyTorch with CUDA support
!cd whisperX-FastAPI && source venv/bin/activate && pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu118

# Install project requirements
!cd whisperX-FastAPI && source venv/bin/activate && pip install -r requirements/dev.txt

# Install additional packages for Colab environment
!cd whisperX-FastAPI && source venv/bin/activate && pip install colorlog pyngrok python-dotenv

## 4. Set Up Environment Variables

Configure the required environment variables for WhisperX. You'll need to enter your Hugging Face API token to access the models.

In [None]:
import os

# Enter your Hugging Face token here
HF_TOKEN = input("Enter your Hugging Face token: ")

# Choose Whisper model size
WHISPER_MODEL = input("Enter Whisper model size (default: tiny): ") or "tiny"

# Set log level
LOG_LEVEL = "INFO"

# Create .env file
env_content = f"""HF_TOKEN={HF_TOKEN}
WHISPER_MODEL={WHISPER_MODEL}
LOG_LEVEL={LOG_LEVEL}
DEVICE=cuda
COMPUTE_TYPE=float16
DB_URL=sqlite:///records.db
"""

with open("whisperX-FastAPI/.env", "w") as f:
    f.write(env_content)

print("Environment configuration completed.")
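
A token typed into `input()` is echoed into the cell output, which is then saved with the notebook. One possible alternative, sketched here, reads the token with `getpass` and builds the `.env` content from a dictionary. The helpers `render_env` and `write_env` are hypothetical names introduced for this example.

```python
from getpass import getpass  # hides the typed value instead of echoing it


def render_env(settings):
    """Render a dict as KEY=VALUE lines suitable for a .env file."""
    for key, value in settings.items():
        if "\n" in str(value):
            raise ValueError(f"{key} must not contain a newline")
    return "".join(f"{key}={value}\n" for key, value in settings.items())


def write_env(path, settings):
    """Write the rendered settings to `path`, replacing any existing file."""
    with open(path, "w") as f:
        f.write(render_env(settings))
```

A call such as `write_env("whisperX-FastAPI/.env", {"HF_TOKEN": getpass("Hugging Face token: "), "WHISPER_MODEL": "tiny", "LOG_LEVEL": "INFO", "DEVICE": "cuda", "COMPUTE_TYPE": "float16", "DB_URL": "sqlite:///records.db"})` would produce the same file as the cell above without displaying the token.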

## 5. Install and Configure Cloudflare Tunnel

We'll use Cloudflare tunnels to expose the FastAPI service to the internet, allowing you to access it from your browser or other clients.

In [None]:
# Download and install cloudflared
!wget -q https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb
!dpkg -i cloudflared-linux-amd64.deb

print("Cloudflare tunnel client installed.")

## 6. Start the FastAPI Service

Now we'll run the FastAPI application in the background and expose it through the Cloudflare tunnel.

In [None]:
# Run FastAPI in the background
import subprocess
import time
import os
import requests
import IPython.display as display

# Start the FastAPI server in a separate process
def start_fastapi_server():
    print("Starting FastAPI server...")
    try:
        # Activate the virtual environment, then launch uvicorn
        command = "bash -c 'source venv/bin/activate && uvicorn app.main:app --host 0.0.0.0 --port 8000 --log-config app/uvicorn_log_conf.yaml'"

        # Send server output to a log file: a PIPE that nobody reads
        # can fill up and block the server
        log_file = open("fastapi.log", "w")
        process = subprocess.Popen(
            command,
            shell=True,
            cwd="whisperX-FastAPI",  # run from the project directory
            stdout=log_file,
            stderr=subprocess.STDOUT,
            preexec_fn=os.setsid,  # create a new process group
        )

        print("FastAPI server process started. Process ID:", process.pid)
        time.sleep(3)  # give the server some time to start
        return process
    except Exception as e:
        print(f"Error in start_fastapi_server: {e}")
        return None

# Wait for the FastAPI HTTP API to be ready
def wait_for_fastapi(timeout=60):
    print("Waiting for FastAPI to become ready...")
    for i in range(timeout):
        try:
            # Probe the root endpoint; any of these codes means the server is answering
            response = requests.get("http://localhost:8000/", timeout=2)
            if response.status_code in [200, 404, 422]:
                print(f"✅ FastAPI is ready! (after {i+1}s)")
                return True
        except requests.exceptions.RequestException:
            pass

        print(f"⏳ Waiting for FastAPI to start... {i+1}s")
        time.sleep(1)

    print("❌ FastAPI did not start within timeout period")
    return False

# Start the service
fastapi_process = None
try:
    print("=== Starting FastAPI Service ===")
    fastapi_process = start_fastapi_server()

    if fastapi_process:
        # Wait for the service to be ready
        if wait_for_fastapi():
            print("\n✅ FastAPI service is running successfully!")
            print("📝 API Documentation: http://localhost:8000/docs")
            print("🔗 API Root: http://localhost:8000/")

            # Display clickable links in the notebook
            display.display(display.HTML('''
                <div style="border: 2px solid #4CAF50; padding: 10px; border-radius: 5px; background-color: #f9fff9;">
                    <h3>🎉 FastAPI is Ready!</h3>
                    <p><strong>API Documentation:</strong> <a href="http://localhost:8000/docs" target="_blank">http://localhost:8000/docs</a></p>
                    <p><strong>API Root:</strong> <a href="http://localhost:8000/" target="_blank">http://localhost:8000/</a></p>
                    <p><em>Note: These links work if you're running locally. For Colab, you'll need to use the Cloudflare tunnel.</em></p>
                </div>
            '''))
        else:
            print("❌ FastAPI failed to start properly")
            fastapi_process.terminate()
    else:
        print("❌ Failed to start FastAPI process")
except Exception as e:
    print(f"❌ Error starting FastAPI service: {str(e)}")
    if fastapi_process:
        try:
            fastapi_process.terminate()
        except Exception:
            pass

# fastapi_process stays defined at notebook level so later cells can stop the server
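
The cells that follow read `tunnel_url` and `tunnel_process`, which no cell so far defines. A minimal sketch of one way to create them is shown below, assuming `cloudflared` is on `PATH`, the server listens on port 8000, and a quick tunnel (`cloudflared tunnel --url ...`, which prints a `trycloudflare.com` address in its log output) is acceptable. The names `extract_tunnel_url` and `start_tunnel` and the log file `cloudflared.log` are hypothetical.

```python
import re
import subprocess
import time

# Quick tunnels log a URL of this shape; the lookahead skips the
# api.trycloudflare.com host that can appear in error messages
TUNNEL_URL_RE = re.compile(r"https://(?!api\.)[a-z0-9-]+\.trycloudflare\.com")


def extract_tunnel_url(text):
    """Return the first quick-tunnel URL found in `text`, or None."""
    match = TUNNEL_URL_RE.search(text)
    return match.group(0) if match else None


def start_tunnel(port=8000, log_path="cloudflared.log", timeout=30):
    """Start a quick tunnel to localhost:`port` and return (process, url).

    Returns (None, None) if no URL shows up in the log within `timeout` seconds.
    """
    # Log to a file so that an unread pipe can never block cloudflared
    with open(log_path, "w") as log_file:
        process = subprocess.Popen(
            ["cloudflared", "tunnel", "--url", f"http://localhost:{port}"],
            stdout=log_file,
            stderr=subprocess.STDOUT,
        )
    deadline = time.time() + timeout
    while time.time() < deadline and process.poll() is None:
        with open(log_path) as f:
            url = extract_tunnel_url(f.read())
        if url:
            return process, url
        time.sleep(1)
    process.terminate()
    return None, None
```

Running `tunnel_process, tunnel_url = start_tunnel()` in a cell and printing `tunnel_url` would then provide the values the later cells expect.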

## 7. Monitor GPU Usage

You can monitor GPU usage while the service is running to ensure it's properly utilizing the available resources.

In [None]:
# Check GPU usage
!nvidia-smi
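
For monitoring from Python rather than reading the table by eye, the CSV query mode of `nvidia-smi` can be parsed into dictionaries. The sketch below assumes every queried field comes back as a plain integer (a GPU that reports `[N/A]` for a field would need extra handling). `parse_gpu_csv` and `query_gpus` are illustrative names.

```python
import subprocess

# Fields requested from nvidia-smi, in the order they are parsed below
QUERY_FIELDS = ["name", "memory.used", "memory.total", "utilization.gpu"]


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    rows = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        name, used, total, util = [part.strip() for part in line.split(",")]
        rows.append({
            "name": name,
            "memory_used_mb": int(used),
            "memory_total_mb": int(total),
            "utilization_pct": int(util),
        })
    return rows


def query_gpus():
    """Run nvidia-smi once and return the parsed rows."""
    result = subprocess.run(
        ["nvidia-smi",
         f"--query-gpu={','.join(QUERY_FIELDS)}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, timeout=10, check=True,
    )
    return parse_gpu_csv(result.stdout)
```

Calling `query_gpus()` while a transcription is running should show memory use and utilization rising if the model is actually on the GPU.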

## 8. Test the API

Here's an example of how to use the API to transcribe an audio file uploaded to Google Colab.

In [None]:
import os
import requests
import IPython.display as display
from google.colab import files

# Function to upload a file to the WhisperX API
def transcribe_audio(file_path, api_url):
    # API endpoint for speech-to-text
    endpoint = f"{api_url}/speech-to-text"

    # Parameters for the request
    params = {
        "model": WHISPER_MODEL,  # Use the same model as configured earlier
        "language": "en"  # Change this to match your audio language
    }

    # Create the multipart form data
    with open(file_path, "rb") as audio_file:
        # Named `upload` to avoid confusion with google.colab's `files` module
        upload = {"audio_file": (os.path.basename(file_path), audio_file)}
        response = requests.post(endpoint, params=params, files=upload)

    # Return the decoded JSON response
    return response.json()

# Upload an audio file
print("Please upload an audio file (supported formats: mp3, wav, m4a, etc.)")
uploaded = files.upload()

# Look up the tunnel URL without raising NameError if it was never set
tunnel_url = globals().get("tunnel_url")

# Process the uploaded file
if uploaded:
    file_name = list(uploaded.keys())[0]

    # Check if tunnel URL is available
    if tunnel_url:
        print(f"Transcribing {file_name}...")
        result = transcribe_audio(file_name, tunnel_url)

        # Display the identifier for checking the task status
        if "identifier" in result:
            print(f"\nTask identifier: {result['identifier']}")
            print(f"\nCheck the task status at: {tunnel_url}/task/{result['identifier']}")
            display.display(display.HTML(f'<a href="{tunnel_url}/task/{result["identifier"]}" target="_blank">Check Task Status</a>'))
        else:
            print("Error:", result)
    else:
        print("Tunnel URL is not available. Make sure the FastAPI service and Cloudflare tunnel are running.")

## 9. Check Task Status

Use this cell to check the status of your transcription task using the task identifier.

In [None]:
import json
import requests

# Function to check task status
def check_task_status(task_id, api_url):
    endpoint = f"{api_url}/task/{task_id}"
    response = requests.get(endpoint)
    return response.json()

# Enter task identifier
task_id = input("Enter task identifier: ")

# Look up the tunnel URL without raising NameError if it was never set
tunnel_url = globals().get("tunnel_url")

# Check task status
if tunnel_url and task_id:
    status = check_task_status(task_id, tunnel_url)
    print("Task Status:")
    print(json.dumps(status, indent=2))
else:
    print("Tunnel URL or task ID is not available.")
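
Transcription runs asynchronously, so a single status check often returns an unfinished task. A polling loop along the following lines could wait for the result. This is a sketch under assumptions: the response is assumed to carry a `status` field whose final values are `"completed"` or `"failed"`, which should be checked against the actual API output, and `poll_task` is a hypothetical helper.

```python
import time


def poll_task(fetch_status, task_id, interval=5.0, timeout=600.0,
              done_states=("completed", "failed")):
    """Call `fetch_status(task_id)` until its "status" is in `done_states`.

    `fetch_status` is any callable that returns the decoded JSON payload.
    The "status" key and the state names are assumptions about the API.
    """
    deadline = time.monotonic() + timeout
    while True:
        payload = fetch_status(task_id)
        if payload.get("status") in done_states:
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Task {task_id} did not finish within {timeout}s")
        time.sleep(interval)
```

With the function from the cell above, usage would be `poll_task(lambda tid: check_task_status(tid, tunnel_url), task_id)`. Passing the fetch function as an argument keeps the loop independent of `requests` and easy to try out with a stub.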

## 10. Shutdown Services

When you're done, use this cell to properly shut down the services and free up resources.

In [None]:
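import os
import signal

# Function to shut down services
def shutdown_services():
    # Use .get() so a service that was never started does not raise NameError
    tunnel_process = globals().get("tunnel_process")
    fastapi_process = globals().get("fastapi_process")

    print("Shutting down services...")

    # Terminate Cloudflare tunnel
    if tunnel_process:
        tunnel_process.terminate()
        tunnel_process.wait()
        print("Cloudflare tunnel stopped.")

    # Terminate FastAPI server
    if fastapi_process:
        # The server runs in its own process group (os.setsid), so signal the
        # whole group to stop uvicorn as well as the wrapping shell
        try:
            os.killpg(os.getpgid(fastapi_process.pid), signal.SIGTERM)
        except ProcessLookupError:
            pass  # process already exited
        fastapi_process.wait()
        print("FastAPI server stopped.")

    print("All services have been shut down.")

# Shut down the services
shutdown_services()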

## 11. Cleanup

Finally, clean up any temporary files and free up GPU memory.

In [None]:
# Clean up temporary files
!rm -f cloudflared-linux-amd64.deb

# Free up GPU memory (if any is still in use)
import torch
if torch.cuda.is_available():
    torch.cuda.empty_cache()
    print("GPU memory cleared.")

print("Cleanup completed. You can now close this notebook or run it again if needed.")