A simple media transcription CLI that converts audio and video files to text with diarization and speaker labels using AssemblyAI.
- Transcribe both audio and video files
- Support for numerous media formats:
  - Audio: mp3, wav, m4a, flac, aac, ogg, wma, aiff, alac
  - Video: mp4, mov, avi, wmv, webm, mkv, mpg, mpeg, m4v, asf, dv, ogv, vp8
- Speaker diarization (identifying who said what)
- Easy-to-use command-line interface
- Batch processing for multiple files
- Optional email notifications when transcriptions complete
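
As an illustration of how the supported extensions listed above could be used, here is a minimal sketch that classifies a path as audio or video by its file extension. The `classify_media` helper and the two set names are hypothetical and are not taken from the transcribe source; only the extension lists come from this README, and the real CLI may detect formats differently.

```python
from pathlib import Path
from typing import Optional

# Extension lists copied from the feature list above.
AUDIO_EXTENSIONS = {"mp3", "wav", "m4a", "flac", "aac", "ogg", "wma", "aiff", "alac"}
VIDEO_EXTENSIONS = {
    "mp4", "mov", "avi", "wmv", "webm", "mkv", "mpg",
    "mpeg", "m4v", "asf", "dv", "ogv", "vp8",
}


def classify_media(path: str) -> Optional[str]:
    """Return "audio", "video", or None for an unsupported extension.

    Hypothetical helper for illustration only.
    """
    # Compare case-insensitively and without the leading dot.
    ext = Path(path).suffix.lower().lstrip(".")
    if ext in AUDIO_EXTENSIONS:
        return "audio"
    if ext in VIDEO_EXTENSIONS:
        return "video"
    return None
```

A check like this could be run before submitting a file, so unsupported files are skipped early rather than failing during upload.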
# Install from source
git clone https://github.com/alex-salazar/transcribe.git
cd transcribe
pip install -e .
After installation, verify the package is installed correctly:
pip list | grep transcribe
Requirements:
- Python 3.7+
- AssemblyAI API key (set as the environment variable ASSEMBLY_API_KEY)
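
The two requirements above can be checked programmatically before running a transcription. The following is a minimal sketch under stated assumptions: `check_requirements` is a hypothetical helper, not part of the transcribe package, and it only confirms that the variable is non-empty, not that the key is valid.

```python
import sys
from typing import List, Mapping

MIN_PYTHON = (3, 7)               # minimum version stated in this README
API_KEY_VAR = "ASSEMBLY_API_KEY"  # variable name stated in this README


def check_requirements(env: Mapping[str, str], version=sys.version_info) -> List[str]:
    """Return a list of human-readable problems; empty means all checks passed.

    Hypothetical helper for illustration only.
    """
    problems = []
    # Compare only (major, minor) against the minimum version.
    if tuple(version[:2]) < MIN_PYTHON:
        problems.append("Python 3.7 or newer is required")
    # Treat a missing or blank value as "not set".
    if not env.get(API_KEY_VAR, "").strip():
        problems.append(f"{API_KEY_VAR} is not set")
    return problems
```

Calling `check_requirements(os.environ)` (after `import os`) would run the checks against the current process environment.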
# Transcribe a single file
python -m transcribe file path/to/media_file.mp3
# With custom output path
python -m transcribe file path/to/media_file.mp3 --output path/to/output.md
# Enable verbose logging
python -m transcribe file path/to/media_file.mp3 --verbose
# Process all new media files in a directory
python -m transcribe batch path/to/directory
# With email notifications
python -m transcribe batch path/to/directory --email
For email notifications, set the following environment variables:
- SMTP_SERVER: SMTP server address
- SMTP_PORT: SMTP port (default: 587)
- SMTP_USERNAME: SMTP username
- SMTP_PASSWORD: SMTP password
- EMAIL_FROM: Sender email address
- EMAIL_TO: Recipient email address
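
To show how these variables fit together, here is a minimal sketch that reads them into a settings object and sends a message with Python's standard `smtplib`. The `EmailSettings`, `load_email_settings`, and `send_notification` names are hypothetical and do not come from the transcribe source. The use of STARTTLS is also an assumption, based on 587 being the default port; the tool's actual email code may differ.

```python
import smtplib
from email.message import EmailMessage
from typing import Mapping, NamedTuple


class EmailSettings(NamedTuple):
    server: str
    port: int
    username: str
    password: str
    sender: str
    recipient: str


def load_email_settings(env: Mapping[str, str]) -> EmailSettings:
    """Build settings from environment variables (hypothetical helper)."""
    required = ["SMTP_SERVER", "SMTP_USERNAME", "SMTP_PASSWORD", "EMAIL_FROM", "EMAIL_TO"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError("Missing email settings: " + ", ".join(missing))
    return EmailSettings(
        server=env["SMTP_SERVER"],
        # SMTP_PORT is optional and falls back to 587, as documented above.
        port=int(env.get("SMTP_PORT") or "587"),
        username=env["SMTP_USERNAME"],
        password=env["SMTP_PASSWORD"],
        sender=env["EMAIL_FROM"],
        recipient=env["EMAIL_TO"],
    )


def send_notification(settings: EmailSettings, subject: str, body: str) -> None:
    """Send a plain-text email. Assumes the server supports STARTTLS."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = settings.sender
    msg["To"] = settings.recipient
    msg.set_content(body)
    with smtplib.SMTP(settings.server, settings.port) as smtp:
        smtp.starttls()
        smtp.login(settings.username, settings.password)
        smtp.send_message(msg)
```

Loading the settings first and failing on missing variables surfaces configuration mistakes before any transcription work starts.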
# Show general help
python -m transcribe --help
# Show help for a specific command
python -m transcribe file --help
python -m transcribe batch --help
You can automate transcription of new media files using cron jobs:
# Create a copy of the example script and customize it
cp run_transcribe_example.sh run_transcribe.sh
chmod +x run_transcribe.sh
Edit run_transcribe.sh to:
- Set the path to your media directory
- Configure your Python environment
- Set your API key and email settings if needed
Then set up a cron job:
# Open crontab editor
crontab -e
# Add a line to run the script every minute
* * * * * /path/to/your/run_transcribe.sh
# Save and exit (in vi: press Esc, type :wq, press Enter)
For less frequent runs, adjust the cron timing:
- */5 * * * *: every 5 minutes
- 0 * * * *: every hour
- 0 0 * * *: daily at midnight
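
Cron typically runs jobs with a minimal environment and a different working directory than an interactive shell, so relative paths in the script tend to fail. The following minimal sketch checks that each configured path is absolute and exists before the cron job is installed. `check_paths` is a hypothetical helper and is not part of the transcribe package.

```python
from pathlib import Path
from typing import Dict, List


def check_paths(paths: Dict[str, str]) -> List[str]:
    """Return a list of problems for paths that are relative or missing.

    Hypothetical helper for illustration only.
    """
    problems = []
    for label, raw in paths.items():
        path = Path(raw)
        if not path.is_absolute():
            problems.append(f"{label}: {raw} is not an absolute path")
        elif not path.exists():
            problems.append(f"{label}: {raw} does not exist")
    return problems
```

For example, passing a dictionary that maps labels such as "media directory" and "project directory" to the paths used in run_transcribe.sh returns an empty list when every path is usable.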
If you encounter an error like command not found: transcribe, use the module invocation pattern with Python:
python -m transcribe [command] [options]
If you encounter authentication errors, verify your AssemblyAI API key is set correctly:
# Check if environment variable is set
echo $ASSEMBLY_API_KEY
# Set the API key if needed
export ASSEMBLY_API_KEY="your_api_key_here"
If using the automation script, ensure all paths are correct and absolute. Verify paths with:
# Check media directory exists
ls -la /path/to/your/media/directory
# Check project directory exists
ls -la /path/to/transcribe/project
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.