This project uses OpenAI Whisper, an open-source speech recognition model, to transcribe audio or video files for free on Google Colab.
It can generate transcripts in plain text, as well as subtitle files in SRT and VTT formats.
- ✅ Upload an audio/video file from your computer.
- ✅ Transcribe it using Whisper's large model (high accuracy).
- ✅ Save the results as:
  - `transcript.txt` - plain-text transcription
  - `transcript.srt` - subtitle file with timestamps (for media players)
  - `transcript.vtt` - WebVTT subtitle file (for YouTube/HTML5)
- ✅ Store the Whisper model cache on Google Drive (so it doesn't re-download the model each run).
- ✅ 100% free (runs on Google Colab's free GPU).
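The SRT and VTT outputs differ mainly in their timestamp separator: SRT uses a comma before the milliseconds, WebVTT a dot. A minimal helper for formatting Whisper's per-segment times (which are given in seconds) might look like this; the function name and layout are illustrative, not the notebook's actual code:

```python
def format_timestamp(seconds: float, srt: bool = True) -> str:
    """Convert a time in seconds to an SRT ('HH:MM:SS,mmm') or
    WebVTT ('HH:MM:SS.mmm') timestamp string."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1_000)
    sep = "," if srt else "."  # the only difference between the two formats
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{millis:03d}"
```

For example, `format_timestamp(3.5)` gives `"00:00:03,500"`, and the same time with `srt=False` gives `"00:00:03.500"`.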
- A Google account (to use Google Colab).
- Access to Google Drive (for caching the model).
- Internet connection (first run downloads the Whisper model).
- Open this notebook directly in Google Colab.
- Click Runtime -> Change runtime type -> select T4 GPU -> Save.
- Follow the instructions in the notebook.
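The transcription step boils down to a few Whisper calls. The sketch below shows the assumed flow (the notebook's actual cells may differ); the Whisper calls are commented out because they need the installed package and a GPU, and the `save_transcript` helper is illustrative:

```python
# Assumed flow -- in Colab you would first run:  !pip install openai-whisper
#
#   import whisper
#   model = whisper.load_model("large")      # or "tiny", "base", "small", "medium"
#   result = model.transcribe("audio.mp3")   # dict with "text" and "segments"

def save_transcript(result: dict, stem: str = "transcript") -> str:
    """Write the plain-text transcript from a Whisper-style result dict
    and return the output filename. (Illustrative helper, not notebook code.)"""
    path = f"{stem}.txt"
    with open(path, "w", encoding="utf-8") as f:
        f.write(result["text"].strip() + "\n")
    return path
```

The SRT and VTT files follow the same pattern, iterating over `result["segments"]` and writing each segment's start/end timestamps and text.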
- For faster transcriptions, try a smaller model: `"tiny"`, `"base"`, `"small"`, or `"medium"`.
- For the best accuracy, use `"large"`, but it is slower and needs more GPU memory.
- Works with most audio/video formats supported by FFmpeg (`.mp3`, `.wav`, `.m4a`, `.mp4`, `.mkv`, etc.).
- For video files, extracting the audio track first before running Whisper is usually a bit faster.
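Extracting the audio track can be done with FFmpeg (preinstalled on Colab). A small helper that builds the command, to be run with `subprocess.run(..., check=True)`; the filenames and the helper itself are illustrative:

```python
def ffmpeg_extract_audio_cmd(video_path: str, audio_path: str) -> list:
    """Build an ffmpeg command that drops the video stream and
    re-encodes the audio track as MP3."""
    return [
        "ffmpeg", "-i", video_path,  # input video
        "-vn",                       # discard the video stream
        "-acodec", "libmp3lame",     # encode audio as MP3
        "-q:a", "2",                 # VBR quality (0 = best, 9 = worst)
        audio_path,
    ]

# Example:
#   import subprocess
#   subprocess.run(ffmpeg_extract_audio_cmd("talk.mp4", "talk.mp3"), check=True)
```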
This project uses OpenAI Whisper, released under the MIT License.