💬 Speech recognition is now a commodity
FastAPI-based API for transcribing audio files using faster-whisper
and Auto-Tuning-Spectral-Clustering for diarization
(based on this GitHub implementation).
Important
To see how Wordcab-Transcribe performs compared to other ASR tools on the market, check out our benchmark project: Rate that ASR.
- ⚡ Fast: The faster-whisper library and CTranslate2 make audio processing incredibly fast compared to other implementations.
- 🐳 Easy to deploy: You can deploy the project on your workstation or in the cloud using Docker.
- 🔥 Batch requests: You can transcribe multiple audio files at once thanks to the API's built-in batch support.
- 💸 Cost-effective: As an open-source solution, you won't have to pay for costly ASR platforms.
- 🫶 Easy-to-use API: With just a few lines of code, you can use the API to transcribe audio files or even YouTube videos.
- 🤗 MIT License: You can use the project for commercial purposes without any restrictions.
Run the API locally with:
hatch run runtime:launch
- Docker (optional for deployment)
- NVIDIA GPU + NVIDIA Container Toolkit (optional for deployment)
Build the image.
docker build -t wordcab-transcribe:latest .
Run the container.
docker run -d --name wordcab-transcribe \
--gpus all \
--shm-size 1g \
--restart unless-stopped \
-p 5001:5001 \
-v ~/.cache:/root/.cache \
wordcab-transcribe:latest
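Then check the container logs to confirm the API started correctly (the exact output depends on your configuration):
# Follow the API container logs
docker logs -f wordcab-transcribe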
You can mount a volume to the container to load local whisper models. If you mount a volume, you need to update the WHISPER_MODEL environment variable in the .env file.
docker run -d --name wordcab-transcribe \
--gpus all \
--shm-size 1g \
--restart unless-stopped \
-p 5001:5001 \
-v ~/.cache:/root/.cache \
-v /path/to/whisper/models:/app/whisper/models \
wordcab-transcribe:latest
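With that volume mapping, the .env entry could look like the following sketch (the exact directory name is an assumption; adjust it to your model layout):
# .env (sketch): point WHISPER_MODEL at the mounted model directory
WHISPER_MODEL=/app/whisper/models/large-v3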
You can simply enter the container using the following command:
docker exec -it wordcab-transcribe /bin/bash
This is useful for checking that everything is working as expected.
You can run the API behind a reverse proxy like Nginx. We have included a nginx.conf
file to help you get started.
# Create a docker network and connect the api container to it
docker network create transcribe
docker network connect transcribe wordcab-transcribe
# Replace /absolute/path/to/nginx.conf with the absolute path to the nginx.conf
# file on your machine (e.g. /home/user/wordcab-transcribe/nginx.conf).
docker run -d \
--name nginx \
--network transcribe \
-p 80:80 \
-v /absolute/path/to/nginx.conf:/etc/nginx/nginx.conf:ro \
nginx
# Check everything is working as expected
docker logs nginx
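Assuming your nginx.conf forwards traffic to the API container, the interactive docs should now be reachable through the proxy on port 80:
# Smoke test through the proxy (assumes nginx.conf routes /docs to the API)
curl -I http://localhost/docs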
⏱️ Profile the API
You can profile process execution using py-spy as a profiler.
# Launch the container with the cap-add=SYS_PTRACE option
docker run -d --name wordcab-transcribe \
--gpus all \
--shm-size 1g \
--restart unless-stopped \
--cap-add=SYS_PTRACE \
-p 5001:5001 \
-v ~/.cache:/root/.cache \
wordcab-transcribe:latest
# Enter the container
docker exec -it wordcab-transcribe /bin/bash
# Install py-spy
pip install py-spy
# Find the PID of the process to profile
top # 28 for example
# Run the profiler
py-spy record --pid 28 --format speedscope -o profile.speedscope.json
# Launch any task on the API to generate some profiling data
# Exit the container and copy the generated file to your local machine
exit
docker cp wordcab-transcribe:/app/profile.speedscope.json profile.speedscope.json
# Go to https://www.speedscope.app/ and upload the file to visualize the profile
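If you only want a live view instead of a recording, py-spy also provides a top subcommand that you can run against the same PID:
# Live, top-like view of where the process spends its time
py-spy top --pid 28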
Once the container is running, you can test the API.
The API documentation is available at http://localhost:5001/docs.
- Audio file:
import json
import requests

filepath = "/path/to/audio/file.wav"  # or any other format convertible by ffmpeg
data = {
    "num_speakers": -1,  # Leave at -1 to guess the number of speakers
    "diarization": True,  # Longer processing time but speaker segment attribution
    "multi_channel": False,  # Only for stereo audio files with one speaker per channel
    "source_lang": "en",  # optional, default is "en"
    "timestamps": "s",  # optional, default is "s". Can be "s", "ms" or "hms".
    "word_timestamps": False,  # optional, default is False
}

with open(filepath, "rb") as f:
    files = {"file": f}
    response = requests.post(
        "http://localhost:5001/api/v1/audio",
        files=files,
        data=data,
    )

r_json = response.json()

filename = filepath.rsplit(".", 1)[0]  # strip the extension for the output name
with open(f"{filename}.json", "w", encoding="utf-8") as f:
    json.dump(r_json, f, indent=4, ensure_ascii=False)
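A rough curl equivalent of the request above (a sketch: sending the booleans as plain form fields is an assumption, so check the interactive docs for the exact schema):
# Multipart form request mirroring the Python example above
curl -X POST "http://localhost:5001/api/v1/audio" \
  -F "file=@/path/to/audio/file.wav" \
  -F "diarization=true" \
  -F "source_lang=en" \
  -o file.json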
- YouTube video:
import json
import requests

headers = {"accept": "application/json", "Content-Type": "application/json"}
params = {"url": "https://youtu.be/JZ696sbfPHs"}
data = {
    "diarization": True,  # Longer processing time but speaker segment attribution
    "source_lang": "en",  # optional, default is "en"
    "timestamps": "s",  # optional, default is "s". Can be "s", "ms" or "hms".
    "word_timestamps": False,  # optional, default is False
}

response = requests.post(
    "http://localhost:5001/api/v1/youtube",
    headers=headers,
    params=params,
    data=json.dumps(data),
)

r_json = response.json()

with open("youtube_video_output.json", "w", encoding="utf-8") as f:
    json.dump(r_json, f, indent=4, ensure_ascii=False)
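The same request as a curl sketch (the URL, query parameter, and body fields come straight from the Python example above):
# JSON body request mirroring the Python example above
curl -X POST "http://localhost:5001/api/v1/youtube?url=https://youtu.be/JZ696sbfPHs" \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{"diarization": true, "source_lang": "en"}' \
  -o youtube_video_output.json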
You can point the API at a local folder path to use a custom model. If you do so, mount that folder as a volume in the docker run command, or include the model directory in your Dockerfile to bake it into the image.
Note that for the default tensorrt-llm whisper engine, the simplest way to get a converted model is to use hatch to start the server locally once. Specify the WHISPER_MODEL and ALIGN_MODEL in .env, then run hatch run runtime:launch in your terminal. This will download and convert these models. You'll then find the converted models in cloned_wordcab_transcribe_repo/src/wordcab_transcribe/whisper_models.
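To confirm the conversion succeeded, you can list that directory from the repo root; you should see one subdirectory per converted model (the names depend on your .env settings):
# Converted models appear as subdirectories of whisper_models
ls src/wordcab_transcribe/whisper_models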
Then, in your Dockerfile, copy the converted models to the /app/src/wordcab_transcribe/whisper_models directory.
Example Dockerfile line for WHISPER_MODEL:
COPY cloned_wordcab_transcribe_repo/src/wordcab_transcribe/whisper_models/large-v3 /app/src/wordcab_transcribe/whisper_models/large-v3
Example Dockerfile line for ALIGN_MODEL:
COPY cloned_wordcab_transcribe_repo/src/wordcab_transcribe/whisper_models/tiny /app/src/wordcab_transcribe/whisper_models/tiny
- Ensure you have Hatch installed (with pipx, for example):
pipx install hatch
- Clone the repo
git clone https://github.com/Wordcab/wordcab-transcribe.git
cd wordcab-transcribe
- Install dependencies and start coding
hatch env create
- Run quality checks and tests
# Quality checks without modifying the code
hatch run quality:check
# Quality checks and auto-formatting
hatch run quality:format
# Run tests with coverage
hatch run tests:run
- Create an issue for the feature or bug you want to work on.
- Create a branch using the left panel on GitHub.
- git fetch and git checkout the branch.
- Make changes and commit.
- Push the branch to GitHub.
- Create a pull request and ask for review.
- Merge the pull request when it's approved and CI passes.
- Delete the branch.
- Update your local repo with git fetch and git pull.