A Python script that converts speech to text using the Google Cloud Speech-to-Text API. Suitable for long audio and video files.
Originally based on Sundar Krishnan’s work.
- The code actually works now
- Detects language automatically (from up to 6 predefined options)
- Supports any audio or video format as a source, not just MP3
- Converts to Opus format, which takes less space → faster upload
- Supports source files of any length (previously limited to 100 MB)
- Removes temporary files from disk after transcribing
- Prints a short status message for every stage of the process
- Works on Windows, too
- Set up Google Cloud: complete the 6 setup steps
- Create a storage bucket
- Install ffmpeg
- Install all the dependencies for the .py
- Change the settings at the top of the .py to suit your needs
- Put your audio or video files into the specified folder
- Run the .py
- Do something else while it runs (the whole process takes about 50–80% of your files' duration)
- Collect your transcripts from the output folder
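
The Opus conversion mentioned in the feature list can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function names `build_opus_command` and `convert_to_opus`, the mono downmix, and the `32k` bitrate are all assumptions chosen for the example. It only relies on ffmpeg being installed and on its standard `-i`, `-vn`, `-ac`, `-c:a` and `-b:a` options.

```python
import subprocess
from pathlib import Path
from typing import List


def build_opus_command(source: Path, target: Path, bitrate: str = "32k") -> List[str]:
    """Return an ffmpeg command that extracts the audio track as mono Opus.

    The bitrate and the mono downmix are assumptions for this sketch,
    not values taken from the script.
    """
    return [
        "ffmpeg",
        "-y",                # overwrite the target if it already exists
        "-i", str(source),   # any audio or video file ffmpeg can read
        "-vn",               # drop the video stream, keep audio only
        "-ac", "1",          # downmix to a single channel
        "-c:a", "libopus",   # encode with the Opus codec
        "-b:a", bitrate,     # audio bitrate (hypothetical default)
        str(target),
    ]


def convert_to_opus(source: Path, out_dir: Path) -> Path:
    """Run ffmpeg and return the path of the temporary .opus file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / (source.stem + ".opus")
    # check=True raises CalledProcessError if ffmpeg exits with an error
    subprocess.run(build_opus_command(source, target), check=True)
    return target
```

A call such as `convert_to_opus(Path("talk.mp4"), Path("tmp"))` would write `tmp/talk.opus`; once the transcript is saved, the temporary file can be deleted with `target.unlink()`.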