Ara 🦜

Overview

Ara is a script and API for transcribing ✍️ and diarizing 📓 audio. The typical use case is transcribing interviews, podcasts, and any other recording where multiple people are speaking. The output is 'easy' to read (if you like .txt files), formatted so that the speaker of each segment is clear.

It uses Whisper to transcribe the audio into text, then Pyannote to diarize the different speakers. Finally, it matches the segments from the two models and writes the output to a file or returns it through the API.
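As a rough illustration of the matching step, the sketch below assigns each transcribed segment to the speaker turn it overlaps most in time, then formats one line per segment. This is not necessarily how Ara does it: the `Segment` and `Turn` classes, the function names, the `UNKNOWN` fallback label and the output layout are all hypothetical, chosen only to show the idea.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A transcribed chunk of audio (hypothetical shape)."""
    start: float
    end: float
    text: str


@dataclass
class Turn:
    """A diarized speaker turn (hypothetical shape)."""
    start: float
    end: float
    speaker: str


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns):
    """Pair each segment with the speaker whose turn overlaps it most."""
    labelled = []
    for seg in segments:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        labelled.append((best_speaker, seg))
    return labelled


def format_transcript(labelled):
    """Render one readable line per segment, prefixed with its speaker."""
    return "\n".join(
        f"[{seg.start:.1f}s - {seg.end:.1f}s] {speaker}: {seg.text.strip()}"
        for speaker, seg in labelled
    )


# Made-up example data, for illustration only.
segments = [Segment(0.0, 2.0, " Hello there."), Segment(2.0, 5.0, " Hi, thanks.")]
turns = [Turn(0.0, 2.2, "SPEAKER_00"), Turn(2.2, 6.0, "SPEAKER_01")]
labelled = assign_speakers(segments, turns)
transcript = format_transcript(labelled)
```

Picking the turn with the largest overlap keeps a segment with its dominant speaker even when the two models disagree slightly on where a boundary falls.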

Usage

Script

Call the script like so:

python script.py -i input.wav -o output.txt -l English 
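For readers wondering how flags like these are typically wired up, here is a minimal `argparse` sketch. Only the short flags `-i`, `-o` and `-l` come from the command above. The long option names, the help texts and the choice of which flags are required are assumptions, and the actual script.py may parse its arguments differently.

```python
import argparse


def build_parser():
    """Build a parser for the three flags shown in the README command."""
    parser = argparse.ArgumentParser(description="Transcribe and diarize an audio file.")
    # Long option names below are assumed for illustration.
    parser.add_argument("-i", "--input", required=True, help="path to the input audio file")
    parser.add_argument("-o", "--output", required=True, help="path to the output text file")
    parser.add_argument("-l", "--language", default=None, help="spoken language, e.g. English")
    return parser


# Parse the same arguments as the example command, without touching sys.argv.
args = build_parser().parse_args(["-i", "input.wav", "-o", "output.txt", "-l", "English"])
```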

FastAPI

main.py defines a basic FastAPI app with an endpoint for transcription. Start the server:

uvicorn main:app --reload 

Then query it:

curl 127.0.0.1:8000/transcribe/sample_data.interview.wav

This can be useful for interacting with Ara through Docker, or for deploying the code.
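The same request can be made from Python using only the standard library. This is a sketch under assumptions: the helper names `build_url` and `transcribe` and the timeout value are hypothetical, and because the response format is not documented here, the body is returned as raw text rather than parsed.

```python
from urllib.parse import quote
from urllib.request import urlopen


def build_url(filename, host="127.0.0.1", port=8000):
    """Build the /transcribe/<filename> URL used in the curl example."""
    return f"http://{host}:{port}/transcribe/{quote(filename, safe='')}"


def transcribe(filename, timeout=600.0):
    """Send a GET request to a running server and return the raw response body."""
    with urlopen(build_url(filename), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server running, `transcribe("sample_data.interview.wav")` issues the same request as the curl command above. The generous timeout is a guess, since transcription of long recordings may take a while.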

The repo comes with a Dockerfile, which makes it easier to deploy in a containerised way. Build the image, then run it like so:

sudo docker run -p 80:80 --gpus all <CONTAINER NAME>