Voice transcription tool with AI-powered cleanup capabilities. Transcribe audio from YouTube videos, uploaded files, or live recordings, with intelligent summarization and message refinement using a local LLM or OpenAI GPT.
- Multiple Input Methods:
  - YouTube URL transcription
  - Audio file upload (mp3, wav, m4a, ogg, flac)
  - Live voice recording
- Dual AI Processing Modes:
  - Summarize Mode: Get clear, concise summaries of transcripts with key points extraction
  - Refine Mode: Transform voice recordings into well-structured, professional messages
- Flexible AI Providers:
  - Local LLM (Default): Run AI models on your machine via Ollama - free, private, no API key needed
  - OpenAI GPT: Cloud-based option for GPT-4/3.5 models
- Local Transcription: Uses Whisper model locally for privacy and cost-effectiveness
- User-Friendly Interface: Built with Streamlit for an intuitive web experience
AiTranscript follows a service-oriented architecture with clear separation of concerns:
- Services Layer: YouTube extraction, audio transcription, AI summarization
- Utils Layer: Input validation, file handling
- UI Layer: Reusable Streamlit components
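As an illustration of the utils layer's input validation, a YouTube URL check might look like the following sketch (the function name and regex are assumptions for illustration, not the project's actual `validators.py` code):

```python
import re

# Hypothetical validator in the spirit of src/utils/validators.py.
_YOUTUBE_RE = re.compile(
    r"^(https?://)?(www\.)?"
    r"(youtube\.com/watch\?v=|youtu\.be/)"
    r"[A-Za-z0-9_-]{11}"
)

def is_valid_youtube_url(url: str) -> bool:
    """Return True if the string looks like a YouTube video URL."""
    return bool(_YOUTUBE_RE.match(url.strip()))
```

Validating early in the utils layer keeps the services layer free of input-format concerns.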
- Python: 3.11 or higher
- FFmpeg: Required for audio processing
- Ollama: For local LLM (recommended, free)
- OpenAI API Key: Optional, only if using OpenAI provider
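A quick way to verify these prerequisites from Python (a sketch; the ffmpeg and ollama checks simply look for the executables on PATH):

```python
import shutil
import sys

def check_prerequisites() -> dict:
    """Report which prerequisites are available on this machine."""
    return {
        "python_3_11_plus": sys.version_info >= (3, 11),
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "ollama": shutil.which("ollama") is not None,
    }

if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'OK' if ok else 'missing'}")
```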
macOS:

```bash
brew install ffmpeg
```

Ubuntu/Debian:

```bash
sudo apt-get update
sudo apt-get install ffmpeg
```

Windows: Download from ffmpeg.org and add to PATH.

macOS/Linux:

```bash
curl https://ollama.ai/install.sh | sh
```

Windows: Download from ollama.ai

Pull a model:

```bash
# Start Ollama
ollama serve
# Pull a model (in another terminal)
ollama pull llama2
# or
ollama pull mistral
```

Clone the repository:

```bash
git clone <repository-url>
cd AiTranscript
```

Install uv:

```bash
# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
```

Set up the environment:

```bash
# Create virtual environment
uv venv
# Activate virtual environment
# On macOS/Linux:
source .venv/bin/activate
# On Windows:
.venv\Scripts\activate
# Install dependencies
uv pip install -e .
```

Configure:

```bash
# Copy the example environment file
cp .env.example .env
# Edit .env if needed (optional - defaults work out of the box)
```

Environment variables (all optional):

- AI_PROVIDER: Choose 'local' (default) or 'openai'
- LOCAL_MODEL: Local model to use (default: llama2)
- OPENAI_API_KEY: Your OpenAI API key (only if using OpenAI)
- OPENAI_MODEL: OpenAI model to use (default: gpt-4-turbo-preview)
- WHISPER_MODEL_SIZE: Whisper model size (default: base)

Run the app:

```bash
streamlit run app.py
```

The application will open in your default browser at http://localhost:8501.
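Reading the environment variables above with their documented defaults could look roughly like this (a sketch, not the project's actual `src/utils/config.py`):

```python
import os

def load_config() -> dict:
    """Read AiTranscript settings from the environment, with the README's defaults."""
    return {
        "ai_provider": os.getenv("AI_PROVIDER", "local"),
        "local_model": os.getenv("LOCAL_MODEL", "llama2"),
        "openai_api_key": os.getenv("OPENAI_API_KEY"),  # None unless set
        "openai_model": os.getenv("OPENAI_MODEL", "gpt-4-turbo-preview"),
        "whisper_model_size": os.getenv("WHISPER_MODEL_SIZE", "base"),
    }
```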
```
aitranscript/
├── src/
│   ├── __init__.py
│   ├── common/              # Shared services and utilities
│   │   ├── __init__.py
│   │   ├── ai_processing.py # AI processing logic
│   │   ├── ai_service.py    # AI service integration
│   │   └── audio_service.py # Audio transcription (Whisper)
│   ├── recording/           # Voice recording feature
│   │   ├── __init__.py
│   │   ├── service.py
│   │   └── view.py
│   ├── ui/                  # UI components
│   │   ├── __init__.py
│   │   └── components.py
│   ├── upload/              # File upload feature
│   │   ├── __init__.py
│   │   ├── service.py
│   │   └── view.py
│   ├── utils/               # Utilities
│   │   ├── __init__.py
│   │   ├── config.py
│   │   ├── file_handler.py
│   │   ├── time_utils.py
│   │   └── validators.py
│   └── youtube/             # YouTube feature
│       ├── __init__.py
│       ├── provider.py
│       ├── service.py
│       └── view.py
├── app.py                   # Main Streamlit application
├── pyproject.toml           # Project configuration
├── .env.example             # Environment variables template
└── README.md                # This file
```
Choose the appropriate model size based on your needs:
| Model | Size | RAM | Speed | Accuracy | Use Case |
|---|---|---|---|---|---|
| tiny | 39M | ~1GB | Fastest | Lowest | Quick drafts |
| base | 74M | ~1GB | Fast | Good | Default choice |
| small | 244M | ~2GB | Medium | Better | Quality focus |
| medium | 769M | ~5GB | Slow | High | Professional |
| large | 1550M | ~10GB | Slowest | Best | Maximum accuracy |
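Based on the table above, a helper that picks the largest Whisper size fitting the available RAM might look like this (illustrative only; the thresholds follow the table's approximate RAM column):

```python
# Approximate RAM needs (GB), from the model table above.
_WHISPER_RAM_GB = [
    ("large", 10),
    ("medium", 5),
    ("small", 2),
    ("base", 1),
    ("tiny", 1),
]

def pick_whisper_size(available_ram_gb: float) -> str:
    """Return the largest Whisper model size that fits in the given RAM."""
    for size, needed in _WHISPER_RAM_GB:
        if available_ram_gb >= needed:
            return size
    return "tiny"  # fall back to the smallest model
```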
Set in .env:

```
WHISPER_MODEL_SIZE=base
```

Ollama model options:

| Model | Size | RAM | Speed | Quality | Use Case |
|---|---|---|---|---|---|
| llama2 | 3.8GB | ~8GB | Fast | Good | Default, balanced |
| llama3 | 4.7GB | ~8GB | Fast | Excellent | Best quality |
| mistral | 4.1GB | ~8GB | Fast | Very Good | Great alternative |
| phi | 1.6GB | ~4GB | Fastest | Good | Low resource |
| codellama | 3.8GB | ~8GB | Fast | Good | Code-focused |
Advantages:

- ✅ Free - no API costs
- ✅ Private - data stays on your machine
- ✅ No API key needed
- ✅ Works offline
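Under the hood, talking to a local Ollama server means POSTing to its HTTP API on port 11434. A minimal sketch using Ollama's standard `/api/generate` endpoint (the payload builder is separated out so it can be used without a running server):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_request(model: str, prompt: str) -> dict:
    """Build a non-streaming generate request for Ollama's HTTP API."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """Send a prompt to a locally running Ollama server (requires `ollama serve`)."""
    data = json.dumps(build_generate_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```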
Setup:

```bash
# Install Ollama
curl https://ollama.ai/install.sh | sh
# Pull a model
ollama pull llama2
# Run the app (Ollama will start automatically)
streamlit run app.py
```

OpenAI model options:

| Model | Cost/1K tokens | Speed | Quality | Use Case |
|---|---|---|---|---|
| gpt-3.5-turbo | $0.0015 | Fast | Good | Cost-effective |
| gpt-4-turbo | $0.01 | Medium | Excellent | High quality |
| gpt-4 | $0.03 | Slow | Best | Premium quality |
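With the OpenAI provider, the two processing modes boil down to different system prompts. A hedged sketch using the official openai Python SDK (the prompt wording is invented for illustration, not the project's actual prompts):

```python
def build_messages(mode: str, transcript: str, tone: str = "professional") -> list:
    """Build chat messages for summarize or refine mode (prompt text is illustrative)."""
    if mode == "summarize":
        system = "Summarize the transcript clearly and extract the key points."
    else:  # refine
        system = f"Rewrite the transcript as a well-structured, {tone} message."
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": transcript},
    ]

def process(mode: str, transcript: str, model: str = "gpt-4-turbo-preview") -> str:
    """Run one chat completion; requires OPENAI_API_KEY in the environment."""
    from openai import OpenAI  # pip install openai
    client = OpenAI()
    resp = client.chat.completions.create(
        model=model, messages=build_messages(mode, transcript)
    )
    return resp.choices[0].message.content
```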
Setup:

```bash
# Set in .env
AI_PROVIDER=openai
OPENAI_API_KEY=your-api-key-here
OPENAI_MODEL=gpt-4-turbo-preview
```

Managing dependencies:

```bash
# Add a new dependency
uv pip install package-name
# Add a development dependency
uv pip install --dev package-name
# Update dependencies
uv pip install --upgrade package-name
# Sync dependencies from pyproject.toml
uv pip sync
```

Running tests:

```bash
pytest
```

Code quality:

```bash
# Format code with black
black .
# Lint with ruff
ruff check .
```

Summarize Mode - Get clear summaries of content:
- Perfect for YouTube videos, podcasts, or long recordings
- Extracts key points automatically
- Choose from concise, detailed, or bullet-point styles
Refine Mode - Transform your voice into professional messages:
- Record what you want to say naturally
- AI refines it into a clear, well-structured message
- Choose tone: professional, friendly, formal, or casual
- Optionally specify recipient context for better refinement
1. Configure Settings (in sidebar):
   - Select processing mode (Summarize or Refine)
   - Configure mode-specific options
   - Note: AI Provider is configured via the .env file

2. Choose Input Method:

   YouTube Transcription:
   - Navigate to the "YouTube" tab
   - Paste a YouTube URL
   - Click "Get Transcript"
   - View transcript and AI-processed result

   File Upload:
   - Navigate to the "Upload File" tab
   - Upload an audio file (mp3, wav, m4a, ogg, flac)
   - Click "Transcribe File"
   - View transcript and AI-processed result

   Voice Recording:
   - Navigate to the "Record Audio" tab
   - Click the microphone button to start/stop recording
   - Click "Transcribe Recording"
   - View transcript and AI-processed result

3. Download Results:
   - Use the download buttons to save your transcript and AI-processed output
- Local Transcription: Audio is transcribed locally using Whisper (no data sent to external services)
- No Data Storage: Transcripts are not stored permanently
- Temporary Files: Automatically cleaned up after processing
- API Key Security: Store your OpenAI API key securely in .env (never commit to version control)
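The automatic temporary-file cleanup mentioned above is typically done with Python's tempfile module. A sketch of the pattern (not the project's actual `file_handler.py`; `fake_transcribe` is a stand-in for the Whisper call):

```python
import os
import tempfile

def transcribe_upload(audio_bytes: bytes, suffix: str = ".mp3") -> str:
    """Write uploaded audio to a temp file, process it, and always clean up."""
    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    try:
        tmp.write(audio_bytes)
        tmp.close()
        return fake_transcribe(tmp.name)  # placeholder for the Whisper call
    finally:
        os.unlink(tmp.name)  # remove the temp file even if transcription fails

def fake_transcribe(path: str) -> str:
    # Stand-in for the audio service; reports the file size as proof of work.
    return f"transcribed {os.path.getsize(path)} bytes"
```

The `try`/`finally` guarantees the temp file is removed whether transcription succeeds or raises.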
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI Whisper for local transcription
- Ollama for local LLM support
- Streamlit for the web framework
- OpenAI for GPT API
For issues and questions, please open an issue on GitHub.
Built with ❤️ using Streamlit, Whisper, Ollama, and OpenAI GPT


