Voice transcription tool with AI-powered cleanup capabilities. Transcribe audio from YouTube videos, uploaded files, or live recordings, with intelligent summarization and message refinement using a local LLM or OpenAI GPT.
- Multiple Input Methods:
  - YouTube URL transcription
  - Audio file upload (mp3, wav, m4a, ogg, flac)
  - Live voice recording
- Dual AI Processing Modes:
  - Summarize Mode: Get clear, concise summaries of transcripts with key points extraction
  - Refine Mode: Transform voice recordings into well-structured, professional messages
- Flexible AI Providers:
  - Local LLM (Default): Run AI models on your machine via Ollama - free, private, no API key needed
  - OpenAI GPT: Cloud-based option for GPT-4/3.5 models
- Local Transcription: Uses Whisper model locally for privacy and cost-effectiveness
- User-Friendly Interface: Built with Streamlit for an intuitive web experience
AiTranscript follows a service-oriented architecture with clear separation of concerns:
- Services Layer: YouTube extraction, audio transcription, AI summarization
- Utils Layer: Input validation, file handling
- UI Layer: Reusable Streamlit components
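As an illustration of the utils layer's input validation, a YouTube URL check might look like the following sketch (the function name and regex are assumptions for illustration, not the project's actual `validators.py` code):

```python
import re

# Hypothetical validator in the spirit of src/utils/validators.py.
_YOUTUBE_RE = re.compile(
    r"^(https?://)?(www\.)?"
    r"(youtube\.com/watch\?v=|youtu\.be/)"
    r"[A-Za-z0-9_-]{11}"
)

def is_valid_youtube_url(url: str) -> bool:
    """Return True if the string looks like a YouTube video URL."""
    return bool(_YOUTUBE_RE.match(url.strip()))
```

Validating early in the utils layer keeps the services layer free of input-format concerns.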
- Python: 3.11 or higher
- FFmpeg: Required for audio processing
- Ollama: For local LLM (recommended, free)
- OpenAI API Key: Optional, only if using OpenAI provider
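A quick way to verify these prerequisites from Python (a sketch; the ffmpeg and ollama checks simply look for the executables on PATH):

```python
import shutil
import sys

def check_prerequisites() -> dict:
    """Report which prerequisites are available on this machine."""
    return {
        "python_3_11_plus": sys.version_info >= (3, 11),
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "ollama": shutil.which("ollama") is not None,
    }

if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'OK' if ok else 'missing'}")
```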
macOS:

```bash
brew install ffmpeg
```

Ubuntu/Debian:

```bash
sudo apt-get update
sudo apt-get install ffmpeg
```

Windows: Download from ffmpeg.org and add to PATH.

macOS/Linux:

```bash
curl https://ollama.ai/install.sh | sh
```

Windows: Download from ollama.ai

Pull a model:

```bash
# Start Ollama
ollama serve
# Pull a model (in another terminal)
ollama pull llama2
# or
ollama pull mistral
```

Clone the repository:

```bash
git clone <repository-url>
cd AiTranscript
```

Install uv:

```bash
# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
```

Set up the environment:

```bash
# Create virtual environment
uv venv
# Activate virtual environment
# On macOS/Linux:
source .venv/bin/activate
# On Windows:
.venv\Scripts\activate
# Install dependencies
uv pip install -e .
```

Configure:

```bash
# Copy the example environment file
cp .env.example .env
# Edit .env if needed (optional - defaults work out of the box)
```

Environment variables (all optional):

- AI_PROVIDER: Choose 'local' (default) or 'openai'
- LOCAL_MODEL: Local model to use (default: llama2)
- OPENAI_API_KEY: Your OpenAI API key (only if using OpenAI)
- OPENAI_MODEL: OpenAI model to use (default: gpt-4-turbo-preview)
- WHISPER_MODEL_SIZE: Whisper model size (default: base)

Run the app:

```bash
streamlit run app.py
```

The application will open in your default browser at http://localhost:8501.
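Reading the environment variables above with their documented defaults could look roughly like this (a sketch, not the project's actual `src/utils/config.py`):

```python
import os

def load_config() -> dict:
    """Read AiTranscript settings from the environment, with the README's defaults."""
    return {
        "ai_provider": os.getenv("AI_PROVIDER", "local"),
        "local_model": os.getenv("LOCAL_MODEL", "llama2"),
        "openai_api_key": os.getenv("OPENAI_API_KEY"),  # None unless set
        "openai_model": os.getenv("OPENAI_MODEL", "gpt-4-turbo-preview"),
        "whisper_model_size": os.getenv("WHISPER_MODEL_SIZE", "base"),
    }
```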
```
aitranscript/
├── src/
│   ├── __init__.py
│   ├── common/              # Shared services and utilities
│   │   ├── __init__.py
│   │   ├── ai_processing.py # AI processing logic
│   │   ├── ai_service.py    # AI service integration
│   │   └── audio_service.py # Audio transcription (Whisper)
│   ├── recording/           # Voice recording feature
│   │   ├── __init__.py
│   │   ├── service.py
│   │   └── view.py
│   ├── ui/                  # UI components
│   │   ├── __init__.py
│   │   └── components.py
│   ├── upload/              # File upload feature
│   │   ├── __init__.py
│   │   ├── service.py
│   │   └── view.py
│   ├── utils/               # Utilities
│   │   ├── __init__.py
│   │   ├── config.py
│   │   ├── file_handler.py
│   │   ├── time_utils.py
│   │   └── validators.py
│   └── youtube/             # YouTube feature
│       ├── __init__.py
│       ├── provider.py
│       ├── service.py
│       └── view.py
├── app.py                   # Main Streamlit application
├── pyproject.toml           # Project configuration
├── .env.example             # Environment variables template
└── README.md                # This file
```
Choose the appropriate model size based on your needs:
| Model | Size | RAM | Speed | Accuracy | Use Case |
|---|---|---|---|---|---|
| tiny | 39M | ~1GB | Fastest | Lowest | Quick drafts |
| base | 74M | ~1GB | Fast | Good | Default choice |
| small | 244M | ~2GB | Medium | Better | Quality focus |
| medium | 769M | ~5GB | Slow | High | Professional |
| large | 1550M | ~10GB | Slowest | Best | Maximum accuracy |
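Based on the table above, a helper that picks the largest Whisper size fitting the available RAM might look like this (illustrative only; the thresholds follow the table's approximate RAM column):

```python
# Approximate RAM needs (GB), from the model table above.
_WHISPER_RAM_GB = [
    ("large", 10),
    ("medium", 5),
    ("small", 2),
    ("base", 1),
    ("tiny", 1),
]

def pick_whisper_size(available_ram_gb: float) -> str:
    """Return the largest Whisper model size that fits in the given RAM."""
    for size, needed in _WHISPER_RAM_GB:
        if available_ram_gb >= needed:
            return size
    return "tiny"  # fall back to the smallest model
```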
Set in .env:

```
WHISPER_MODEL_SIZE=base
```

Ollama model options:

| Model | Size | RAM | Speed | Quality | Use Case |
|---|---|---|---|---|---|
| llama2 | 3.8GB | ~8GB | Fast | Good | Default, balanced |
| llama3 | 4.7GB | ~8GB | Fast | Excellent | Best quality |
| mistral | 4.1GB | ~8GB | Fast | Very Good | Great alternative |
| phi | 1.6GB | ~4GB | Fastest | Good | Low resource |
| codellama | 3.8GB | ~8GB | Fast | Good | Code-focused |
Advantages:

- ✅ Free - no API costs
- ✅ Private - data stays on your machine
- ✅ No API key needed
- ✅ Works offline
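Under the hood, talking to a local Ollama server means POSTing to its HTTP API on port 11434. A minimal sketch using Ollama's standard `/api/generate` endpoint (the payload builder is separated out so it can be used without a running server):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_request(model: str, prompt: str) -> dict:
    """Build a non-streaming generate request for Ollama's HTTP API."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """Send a prompt to a locally running Ollama server (requires `ollama serve`)."""
    data = json.dumps(build_generate_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```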
Setup:

```bash
# Install Ollama
curl https://ollama.ai/install.sh | sh
# Pull a model
ollama pull llama2
# Run the app (Ollama will start automatically)
streamlit run app.py
```

OpenAI model options:

| Model | Cost/1K tokens | Speed | Quality | Use Case |
|---|---|---|---|---|
| gpt-3.5-turbo | $0.0015 | Fast | Good | Cost-effective |
| gpt-4-turbo | $0.01 | Medium | Excellent | High quality |
| gpt-4 | $0.03 | Slow | Best | Premium quality |
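With the OpenAI provider, the two processing modes boil down to different system prompts. A hedged sketch using the official openai Python SDK (the prompt wording is invented for illustration, not the project's actual prompts):

```python
def build_messages(mode: str, transcript: str, tone: str = "professional") -> list:
    """Build chat messages for summarize or refine mode (prompt text is illustrative)."""
    if mode == "summarize":
        system = "Summarize the transcript clearly and extract the key points."
    else:  # refine
        system = f"Rewrite the transcript as a well-structured, {tone} message."
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": transcript},
    ]

def process(mode: str, transcript: str, model: str = "gpt-4-turbo-preview") -> str:
    """Run one chat completion; requires OPENAI_API_KEY in the environment."""
    from openai import OpenAI  # pip install openai
    client = OpenAI()
    resp = client.chat.completions.create(
        model=model, messages=build_messages(mode, transcript)
    )
    return resp.choices[0].message.content
```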
Setup:

```bash
# Set in .env
AI_PROVIDER=openai
OPENAI_API_KEY=your-api-key-here
OPENAI_MODEL=gpt-4-turbo-preview
```

Managing dependencies:

```bash
# Add a new dependency
uv pip install package-name
# Add a development dependency
uv pip install --dev package-name
# Update dependencies
uv pip install --upgrade package-name
# Sync dependencies from pyproject.toml
uv pip sync
```

Running tests:

```bash
pytest
```

Code quality:

```bash
# Format code with black
black .
# Lint with ruff
ruff check .
```

Summarize Mode - Get clear summaries of content:
- Perfect for YouTube videos, podcasts, or long recordings
- Extracts key points automatically
- Choose from concise, detailed, or bullet-point styles
Refine Mode - Transform your voice into professional messages:
- Record what you want to say naturally
- AI refines it into a clear, well-structured message
- Choose tone: professional, friendly, formal, or casual
- Optionally specify recipient context for better refinement
1. Configure Settings (in sidebar):
   - Select processing mode (Summarize or Refine)
   - Configure mode-specific options
   - Note: AI Provider is configured via the .env file

2. Choose Input Method:

   YouTube Transcription:
   - Navigate to the "YouTube" tab
   - Paste a YouTube URL
   - Click "Get Transcript"
   - View transcript and AI-processed result

   File Upload:
   - Navigate to the "Upload File" tab
   - Upload an audio file (mp3, wav, m4a, ogg, flac)
   - Click "Transcribe File"
   - View transcript and AI-processed result

   Voice Recording:
   - Navigate to the "Record Audio" tab
   - Click the microphone button to start/stop recording
   - Click "Transcribe Recording"
   - View transcript and AI-processed result

3. Download Results:
   - Use the download buttons to save your transcript and AI-processed output
- Local Transcription: Audio is transcribed locally using Whisper (no data sent to external services)
- No Data Storage: Transcripts are not stored permanently
- Temporary Files: Automatically cleaned up after processing
- API Key Security: Store your OpenAI API key securely in .env (never commit to version control)
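The automatic temporary-file cleanup mentioned above is typically done with Python's tempfile module. A sketch of the pattern (not the project's actual `file_handler.py`; `fake_transcribe` is a stand-in for the Whisper call):

```python
import os
import tempfile

def transcribe_upload(audio_bytes: bytes, suffix: str = ".mp3") -> str:
    """Write uploaded audio to a temp file, process it, and always clean up."""
    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    try:
        tmp.write(audio_bytes)
        tmp.close()
        return fake_transcribe(tmp.name)  # placeholder for the Whisper call
    finally:
        os.unlink(tmp.name)  # remove the temp file even if transcription fails

def fake_transcribe(path: str) -> str:
    # Stand-in for the audio service; reports the file size as proof of work.
    return f"transcribed {os.path.getsize(path)} bytes"
```

The `try`/`finally` guarantees the temp file is removed whether transcription succeeds or raises.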
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI Whisper for local transcription
- Ollama for local LLM support
- Streamlit for the web framework
- OpenAI for GPT API
For issues and questions, please open an issue on GitHub.
Built with ❤️ using Streamlit, Whisper, Ollama, and OpenAI GPT


