Skip to content

A minimalist, performance-oriented server for automatic speech recognition

License

Notifications You must be signed in to change notification settings

SaladTechnologies/asr-api

Folders and files

NameName
Last commit message
Last commit date

Latest commit

261c22f · Mar 26, 2024

History

33 Commits
Jan 18, 2024
Jan 19, 2024
Nov 30, 2023
Mar 26, 2024
Dec 14, 2023
Jan 18, 2024
Nov 30, 2023
Jan 19, 2024
Nov 29, 2023
Mar 26, 2024
Jan 18, 2024
Dec 14, 2023
Nov 30, 2023
Jan 19, 2024
Mar 11, 2024
Jan 18, 2024

Repository files navigation

Automatic Speech Recognition API

A minimalist, performance-oriented inference server for automatic speech recognition.

Here's some prelimary performance numbers. These numbers are for total round-trip request time, including downloading the audio file, and parsing the response. The default configuration is used for all models.

More extensive benchmarks are available here:

RTX 3080 Ti w/ BetterTransformers

Model Input Audio Length Realtime Multiple
OpenAI Whisper Large v3 19 min 51s 50x
Distil Whisper Distil Large v2 19 min 51s 78x

RTX 4090 w/ BetterTransformers

Model Input Audio Length Realtime Multiple
OpenAI Whisper Large v3 19 min 51s 68x
Distil Whisper Distil Large v2 19 min 51s 83x

RTX 4090 w/ Flash Attention 2

Model Input Audio Length Realtime Multiple
OpenAI Whisper Large v3 19 min 51s 63x
Distil Whisper Distil Large v2 19 min 51s 93x

API

GET /hc

This healthcheck will not respond until the server is fully ready to accept requests.

Response

{
  "status": "ok",
  "version": "0.0.5",
}

POST /asr

Request - JSON

URL should be a download link to an audio file. It can also be a local filepath, if the server is running on the same machine as the file.

Verified extension support:

  • mp3
  • ogg
  • wav
  • webm
  • flac

It may support more formats. It is using ffmpeg and Soundfile under the hood.

{
  "url": "https://example.com/audio.mp3",
}
{
  "url": "/path/to/local/audio.mp3"
}

Request - Upload

You can also upload an audio file directly. Use the raw bytes of the file as the request body.

Python

with open(file_path, 'rb') as f:
    # Make the POST request, uploading the file's bytes directly
    response = requests.post(base_url + "/asr", data=f).json()

CURL

curl -X POST http://example.com/asr \
--data-binary @/path/to/your/audiofile.mp3 \
-H "Content-Type: application/octet-stream"

Response

{
  "text": "hello world",
  "chunks": [
    {
      "timestamp": [
        0.0,
        2.1
      ],
      "text": "hello world"
    },
  ]
}

GET /docs

Swagger docs for the API.

Configuration

All configuration is via environment variables.

See documentation for the ASR Pipeline for more information on the model configuration options:

Name Description Default
HOST The host to listen on *
PORT The port to listen on 8000
MODEL_ID The model to use. See Automatic Speech Recognition Models openai/whisper-large-v3
CACHE_DIR The directory to cache models in /data
FLASH_ATTENTION_2 Whether to use flash attention 2. Must be 1 to enable. Enabled by default in -fa2 images. Note, if your GPU does not support compute capability >= 8.9, BetterTransformers will be used instead. None
BATCH_SIZE The batch size to use. 16
MAX_NEW_TOKENS Not sure what this does. 128
CHUNK_LENGTH_S The length of each chunk in seconds. 30
STRIDE_LENGTH_S The stride length in seconds. Defaults to 1/6 of CHUNK_LENGTH_S CHUNK_LENGTH_S / 6

Docker Images

Note: The -fa2 images are larger, and require a GPU with compute capability >= 8.9. If your GPU does not support this, use the non -fa2 images.

  • saladtechnologies/asr-api:latest, saladtechnologies/asr-api:0.0.5 - The base image, no models included. Does not support flash attention 2, but is a smaller base image. Will download the model at runtime.
  • saladtechnologies/asr-api:latest-fa2 ,saladtechnologies/asr-api:0.0.5-fa2 - The base image, no models included. Supports flash attention 2, but is a larger base image. Will download the model at runtime.
  • saladtechnologies/asr-api:latest-openai-whisper-large-v3, saladtechnologies/asr-api:0.0.5-openai-whisper-large-v3 - The base image, with the OpenAI Whisper Large v3 model included. Does not support flash attention 2.
  • saladtechnologies/asr-api:latest-fa2-openai-whisper-large-v3, saladtechnologies/asr-api:0.0.5-fa2-openai-whisper-large-v3 - The base image, with the OpenAI Whisper Large v3 model included. Supports flash attention 2.
  • saladtechnologies/asr-api:latest-distil-whisper-distil-large-v2, saladtechnologies/asr-api:0.0.5-distil-whisper-distil-large-v2 - The base image, with the Distil Whisper Distil Large v2 model included. Does not support flash attention 2.
  • saladtechnologies/asr-api:latest-fa2-distil-whisper-distil-large-v2, saladtechnologies/asr-api:0.0.5-fa2-distil-whisper-distil-large-v2 - The base image, with the Distil Whisper Distil Large v2 model included. Supports flash attention 2.

Deploying On Salad

Deploying with the API

You can deploy this API on Salad using the following command:

See API Docs for more information.

organization_name="my-org"
project_name="my-project"
salad_api_key="my-api-key"
curl -X POST \
  --url https://api.salad.com/api/public/organizations/${organization_name}/projects/${project_name}/containers \
  --header "Salad-Api-Key: ${salad_api_key}" \
  --data '
{
  "name": "asr-api-distil-whisper-lg-v2",
  "display_name": "asr-api-distil-whisper-lg-v2",
  "container": {
    "image": "saladtechnologies/asr-api:latest-distil-whisper-distil-large-v2",
    "resources": {
      "cpu": 2,
      "memory": 8192,
      "gpu_classes": [
        "65247de0-746f-45c6-8537-650ba613966a"
      ]
    },
    "command": [],
  },
  "autostart_policy": true,
  "restart_policy": "always",
  "replicas": 3,
  "networking": {
    "protocol": "http",
    "port": 8000,
    "auth": false
  },
  "startup_probe": {
    "http": {
      "path": "/hc",
      "port": 8000,
      "scheme": "http",
      "headers": []
    },
    "initial_delay_seconds": 1,
    "period_seconds": 1,
    "timeout_seconds": 1,
    "success_threshold": 1,
    "failure_threshold": 20
  }
}'

Deploying With The Portal

You can also deploy this API on Salad using the Salad Portal.

Select or Create the organization and project you want to work with, then click the "Deploy a Container Group" button.

  1. Give your container group a name that is unique within this organization and project.

Name the container group

  1. Select the saladtechnologies/asr-api:latest-distil-whisper-distil-large-v2 image to deploy distil-whisper large v2, using BetterTransformers.

Select the correct docker image

  1. Set your replica count. We recommend at least 3 replicas for production use.

Choose Replicas

  1. Set the CPU to 2, and the memory to 8 GB.

Select CPU and RAM

  1. Set the GPU to 1x RTX 3080 Ti (Or another GPU. We haven't done comprehensive testing on all GPUs, so your mileage may vary).

Select GPU

  1. Configure the Startup Probe. This is used to determine when the container is ready to accept requests. Select the HTTP protocol, set the path to /hc, and the port to 8000. Set the initial delay, period, and timeout to 1. Set the success threshold to 1, and the failure threshold to 20. If you are using an image that downloads the model weights at runtime, you should increase intial delay to 10 or more, and a failure threshold of 180 to allow up to 3 minutes for the container to start.

Configure your startup probe

  1. Enable networking for port 8000, and choose authenticated or not authenticated. If you choose authenticated, you will need to provide an API key when making requests.

Enable networking

  1. Click "Deploy" to deploy your container group.

Hit Deploy

About

A minimalist, performance-oriented server for automatic speech recognition

Resources

License

Stars

Watchers

Forks

Packages

No packages published