Fast Audio/Video Transcription Using OpenAI's Whisper and Modal

Backend for a third-place project from the August 2023 Epson Innovation Challenge hackathon, created by Christopher Smith and Kevin Mora. It is meant to be used with our JavaScript frontend. It is based on "Fast Audio/Video transcribe using Openai's Whisper and Modal" by mharrvic.

With Modal.com providing on-demand parallel processing, an hour-long audio file can be transcribed in about one minute.

"Modal’s dead-simple parallelism primitives are the key to doing the transcription so quickly. Even with a GPU, transcribing a full episode serially was taking around 10 minutes. But by pulling in ffmpeg with a simple `.pip_install("ffmpeg-python")` addition to our Modal Image, we could exploit the natural silences of the podcast medium to partition episodes into hundreds of short segments. Each segment is transcribed by Whisper in its own container task with 2 physical CPU cores, and when all are done we stitch the segments back together with only a minimal loss in transcription quality. This approach actually accords quite well with Whisper’s model architecture."

Whisper itself processes audio in 30-second windows, so short segments fit its input naturally.
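The split, fan-out, and stitch steps described above can be sketched in Python. This is a hypothetical illustration, not this repository's code: the function names (`detect_silences`, `parse_silences`, `split_at_silences`, `transcribe_all`), the `-30dB` / 0.5 s silencedetect thresholds, the 30-second minimum segment length, and the thread pool standing in for Modal's per-segment containers are all assumptions.

```python
import re
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds)

# ffmpeg's silencedetect filter logs lines like these to stderr:
#   [silencedetect @ 0x...] silence_start: 12.3
#   [silencedetect @ 0x...] silence_end: 13.5 | silence_duration: 1.2
_START_RE = re.compile(r"silence_start: (-?\d+(?:\.\d+)?)")
_END_RE = re.compile(r"silence_end: (-?\d+(?:\.\d+)?)")


def detect_silences(path: str, noise: str = "-30dB", min_silence: float = 0.5) -> str:
    """Run ffmpeg's silencedetect filter and return its stderr log.
    The noise and duration thresholds are assumed values, not the project's."""
    cmd = ["ffmpeg", "-i", path,
           "-af", f"silencedetect=noise={noise}:d={min_silence}",
           "-f", "null", "-"]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stderr


def parse_silences(log: str) -> List[Segment]:
    """Pair each silence_start with the silence_end that follows it."""
    silences: List[Segment] = []
    start = None
    for line in log.splitlines():
        if (m := _START_RE.search(line)):
            start = float(m.group(1))
        elif (m := _END_RE.search(line)) and start is not None:
            silences.append((start, float(m.group(1))))
            start = None
    return silences


def split_at_silences(silences: List[Segment], duration: float,
                      min_len: float = 30.0) -> List[Segment]:
    """Cut at the midpoint of a silence once the current segment
    is at least min_len seconds long; the remainder becomes the last segment."""
    segments: List[Segment] = []
    cur = 0.0
    for s_start, s_end in silences:
        cut = (s_start + s_end) / 2
        if cut - cur >= min_len:
            segments.append((cur, cut))
            cur = cut
    if cur < duration:
        segments.append((cur, duration))
    return segments


def transcribe_all(segments: List[Segment],
                   transcribe: Callable[[Segment], str],
                   workers: int = 8) -> str:
    """Transcribe segments concurrently, then stitch the texts in order.
    A local thread pool stands in here for Modal's remote containers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        texts = list(pool.map(transcribe, segments))  # map preserves input order
    return " ".join(t.strip() for t in texts if t.strip())
```

Cutting at the middle of a silence keeps words from being split across segments, and because `pool.map` returns results in input order, stitching is a simple join. In this sketch a trailing segment shorter than `min_len` is kept as-is rather than merged into its neighbor; either choice is reasonable.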

How to develop

Create a Modal account and get your API key.
- Run these commands to install the Modal client and generate an API token.
```
pip install modal-client
modal token new
```
  - The first command will install the Modal client library on your computer, along with its dependencies.
  - The second command creates an API token by authenticating through your web browser. It will open a new tab, but you can close it when you are done.
Deploy your Modal project with the following command:
```
modal deploy whisper_api.main
```

To-do items

How to use

Transcribe an audio file with the following curl command. The `transcribe` endpoint expects a JSON-formatted request body:

```
curl --location --request POST 'https://your-domain.modal.run/api/transcribe' \
--header 'Content-Type: application/json' \
--data-raw '{
    "src_url": "https://storage.googleapis.com/your-bucket/filename.mp3",
    "unique_id": 987654,
    "session_title": "Session Title Here",
    "presenters": "Presenters Here",
    "is_video": false
}'
```

Sample response:

```
{
  "job_id": "your-job-id"
}
```
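For clients written in Python, the same request can be sent with only the standard library. This is a minimal sketch: the function names are hypothetical, the URL is the same placeholder used in the curl example, and the response is assumed to be JSON containing the `job_id` field shown in the sample response.

```python
import json
import urllib.request
from typing import Any, Dict

# Placeholder URL from the curl example; replace it with your deployment's URL.
ENDPOINT = "https://your-domain.modal.run/api/transcribe"


def build_transcribe_request(src_url: str, unique_id: int, session_title: str,
                             presenters: str, is_video: bool = False,
                             endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a POST request carrying the same JSON body as the curl example."""
    body: Dict[str, Any] = {
        "src_url": src_url,
        "unique_id": unique_id,
        "session_title": session_title,
        "presenters": presenters,
        "is_video": is_video,
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_job_id(raw: bytes) -> str:
    """Extract job_id from the endpoint's JSON response body."""
    return json.loads(raw)["job_id"]


def submit(req: urllib.request.Request) -> str:
    """Send the request (network call) and return the job ID."""
    with urllib.request.urlopen(req) as resp:
        return parse_job_id(resp.read())
```

Keeping request construction and response parsing separate from `submit` makes both testable without a live deployment.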

chriscarrollsmith/session-scribe-whisper-api

Folders and files

Latest commit

History

Repository files navigation

Fast Audio/Video transcribe using Openai's Whisper and Modal

Powered by Modal.com for parallel processing on-demand, an hour audio file can be transcribed in ~1 minute.

How to develop

To-do items

How to use

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages