πŸŽ™οΈ AiTranscript

Voice transcription tool with AI-powered cleanup. Transcribe audio from YouTube videos, uploaded files, or live recordings, then summarize or refine the result with a local LLM (via Ollama) or OpenAI GPT.

🎥 Demo

1. Record Transcription

2. YouTube Transcription

3. Audio File Transcription

✨ Features

  • Multiple Input Methods:

    • YouTube URL transcription
    • Audio file upload (mp3, wav, m4a, ogg, flac)
    • Live voice recording
  • Dual AI Processing Modes:

    • Summarize Mode: Get clear, concise summaries of transcripts with key points extraction
    • Refine Mode: Transform voice recordings into well-structured, professional messages
  • Flexible AI Providers:

    • Local LLM (Default): Run AI models on your machine via Ollama - free, private, no API key needed
    • OpenAI GPT: Cloud-based option for GPT-4/3.5 models
  • Local Transcription: Uses Whisper model locally for privacy and cost-effectiveness

  • User-Friendly Interface: Built with Streamlit for an intuitive web experience

πŸ—οΈ Architecture

AiTranscript follows a service-oriented architecture with clear separation of concerns:

  • Services Layer: YouTube extraction, audio transcription, AI summarization
  • Utils Layer: Input validation, file handling
  • UI Layer: Reusable Streamlit components
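The separation of concerns can be sketched in a few lines. This is a hypothetical illustration of the layering, not the repository's actual API; the function and class names are invented for the example.

```python
import re

def is_valid_youtube_url(url: str) -> bool:
    """Utils layer: validate input before any work is done (illustrative)."""
    return bool(re.match(r"https?://(www\.)?(youtube\.com|youtu\.be)/", url))

class YouTubeService:
    """Services layer: turns a validated URL into a transcript (illustrative)."""

    def get_transcript(self, url: str) -> str:
        if not is_valid_youtube_url(url):
            raise ValueError(f"Not a YouTube URL: {url}")
        # The real service would download the audio and run Whisper on it.
        return "transcript text"
```

The UI layer (the Streamlit views) would only call service methods and render the result, keeping validation and extraction logic out of presentation code.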

📋 Prerequisites

  • Python: 3.11 or higher
  • FFmpeg: Required for audio processing
  • Ollama: For local LLM (recommended, free)
  • OpenAI API Key: Optional, only if using OpenAI provider

Installing FFmpeg

macOS:

brew install ffmpeg

Ubuntu/Debian:

sudo apt-get update
sudo apt-get install ffmpeg

Windows: Download from ffmpeg.org and add to PATH.

Installing Ollama (for Local LLM)

macOS/Linux:

curl -fsSL https://ollama.ai/install.sh | sh

Windows: Download from ollama.ai

Pull a model:

# Start Ollama
ollama serve

# Pull a model (in another terminal)
ollama pull llama2
# or
ollama pull mistral

🚀 Quick Start

1. Clone the Repository

git clone <repository-url>
cd AiTranscript

2. Install uv (if not already installed)

# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

3. Create Virtual Environment and Install Dependencies

# Create virtual environment
uv venv

# Activate virtual environment
# On macOS/Linux:
source .venv/bin/activate
# On Windows:
.venv\Scripts\activate

# Install dependencies
uv pip install -e .

4. Configure Environment Variables

# Copy the example environment file
cp .env.example .env

# Edit .env if needed (optional - defaults work out of the box)

Environment variables (all optional):

  • AI_PROVIDER: Choose 'local' (default) or 'openai'
  • LOCAL_MODEL: Local model to use (default: llama2)
  • OPENAI_API_KEY: Your OpenAI API key (only if using OpenAI)
  • OPENAI_MODEL: OpenAI model to use (default: gpt-4-turbo-preview)
  • WHISPER_MODEL_SIZE: Whisper model size (default: base)
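Put together, a minimal .env using the documented defaults might look like this (the commented lines are only needed when switching to the OpenAI provider):

```
# All values optional; these are the documented defaults
AI_PROVIDER=local
LOCAL_MODEL=llama2
WHISPER_MODEL_SIZE=base

# Only when AI_PROVIDER=openai:
# OPENAI_API_KEY=your-api-key-here
# OPENAI_MODEL=gpt-4-turbo-preview
```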

5. Run the Application

streamlit run app.py

The application will open in your default browser at http://localhost:8501.

📦 Project Structure

aitranscript/
├── src/
│   ├── __init__.py
│   ├── common/              # Shared services and utilities
│   │   ├── __init__.py
│   │   ├── ai_processing.py # AI processing logic
│   │   ├── ai_service.py    # AI service integration
│   │   └── audio_service.py # Audio transcription (Whisper)
│   ├── recording/           # Voice recording feature
│   │   ├── __init__.py
│   │   ├── service.py
│   │   └── view.py
│   ├── ui/                  # UI components
│   │   ├── __init__.py
│   │   └── components.py
│   ├── upload/              # File upload feature
│   │   ├── __init__.py
│   │   ├── service.py
│   │   └── view.py
│   ├── utils/               # Utilities
│   │   ├── __init__.py
│   │   ├── config.py
│   │   ├── file_handler.py
│   │   ├── time_utils.py
│   │   └── validators.py
│   └── youtube/             # YouTube feature
│       ├── __init__.py
│       ├── provider.py
│       ├── service.py
│       └── view.py
├── app.py                   # Main Streamlit application
├── pyproject.toml           # Project configuration
├── .env.example             # Environment variables template
└── README.md                # This file

🔧 Configuration

Whisper Model Sizes

Choose the appropriate model size based on your needs:

| Model  | Size  | RAM   | Speed   | Accuracy | Use Case         |
|--------|-------|-------|---------|----------|------------------|
| tiny   | 39M   | ~1GB  | Fastest | Lowest   | Quick drafts     |
| base   | 74M   | ~1GB  | Fast    | Good     | Default choice   |
| small  | 244M  | ~2GB  | Medium  | Better   | Quality focus    |
| medium | 769M  | ~5GB  | Slow    | High     | Professional     |
| large  | 1550M | ~10GB | Slowest | Best     | Maximum accuracy |

Set in .env:

WHISPER_MODEL_SIZE=base
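A sketch of how this setting might be read at startup. The helper below is illustrative (the app's actual loading presumably lives in src/utils/config.py), but the valid sizes and the default match the table above.

```python
import os

# Valid openai-whisper model sizes; "base" is this project's documented default.
VALID_SIZES = {"tiny", "base", "small", "medium", "large"}

def whisper_model_size(default: str = "base") -> str:
    """Read WHISPER_MODEL_SIZE from the environment, validating the value."""
    size = os.environ.get("WHISPER_MODEL_SIZE", default)
    if size not in VALID_SIZES:
        raise ValueError(f"Unknown Whisper model size: {size!r}")
    return size
```

The chosen size would then be passed to Whisper, e.g. whisper.load_model(whisper_model_size()).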

AI Provider Models

Local LLM Models (via Ollama) - Default & Recommended

| Model     | Size  | RAM  | Speed   | Quality   | Use Case          |
|-----------|-------|------|---------|-----------|-------------------|
| llama2    | 3.8GB | ~8GB | Fast    | Good      | Default, balanced |
| llama3    | 4.7GB | ~8GB | Fast    | Excellent | Best quality      |
| mistral   | 4.1GB | ~8GB | Fast    | Very Good | Great alternative |
| phi       | 1.6GB | ~4GB | Fastest | Good      | Low resource      |
| codellama | 3.8GB | ~8GB | Fast    | Good      | Code-focused      |

Advantages:

  • ✅ Free - no API costs
  • ✅ Private - data stays on your machine
  • ✅ No API key needed
  • ✅ Works offline

Setup:

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull a model
ollama pull llama2

# Run the app (Ollama will start automatically)
streamlit run app.py
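Under the hood, a local-LLM request is an HTTP call to Ollama's API on localhost:11434. The sketch below builds the JSON body for Ollama's documented /api/generate endpoint; the helper function itself is illustrative, not the app's actual code.

```python
import json

def ollama_payload(model: str, prompt: str) -> str:
    """Build the JSON body for a POST to http://localhost:11434/api/generate."""
    # "model", "prompt", and "stream" are fields of Ollama's generate API;
    # stream=False requests a single complete response instead of a stream.
    return json.dumps({"model": model, "prompt": prompt, "stream": False})
```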

OpenAI Models (Optional)

| Model         | Cost/1K tokens | Speed  | Quality   | Use Case        |
|---------------|----------------|--------|-----------|-----------------|
| gpt-3.5-turbo | $0.0015        | Fast   | Good      | Cost-effective  |
| gpt-4-turbo   | $0.01          | Medium | Excellent | High quality    |
| gpt-4         | $0.03          | Slow   | Best      | Premium quality |

Setup:

# Set in .env
AI_PROVIDER=openai
OPENAI_API_KEY=your-api-key-here
OPENAI_MODEL=gpt-4-turbo-preview

πŸ› οΈ Development

Using uv for Package Management

# Add a new dependency
uv pip install package-name

# Install the project with its development extras (if a [dev] extra is defined)
uv pip install -e ".[dev]"

# Upgrade a dependency
uv pip install --upgrade package-name

# Re-sync the environment from pyproject.toml
uv pip install -e .

Running Tests (Coming Soon)

pytest

Code Formatting

# Format code with black
black .

# Lint with ruff
ruff check .

πŸ“ Usage

Processing Modes

Summarize Mode - Get clear summaries of content:

  • Perfect for YouTube videos, podcasts, or long recordings
  • Extracts key points automatically
  • Choose from concise, detailed, or bullet-point styles

Refine Mode - Transform your voice into professional messages:

  • Record what you want to say naturally
  • AI refines it into a clear, well-structured message
  • Choose tone: professional, friendly, formal, or casual
  • Optionally specify recipient context for better refinement
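The two modes differ mainly in the instruction sent to the model. This is an illustrative sketch of a mode-aware prompt builder; the app's real templates live in its AI processing code and their exact wording is not shown here.

```python
# Illustrative only: shapes a prompt for either processing mode.
def build_prompt(mode: str, transcript: str, *,
                 style: str = "concise", tone: str = "professional") -> str:
    if mode == "summarize":
        return (f"Summarize the following transcript in a {style} style "
                f"and extract the key points.\n\n{transcript}")
    if mode == "refine":
        return (f"Rewrite the following spoken message as a clear, "
                f"well-structured message in a {tone} tone.\n\n{transcript}")
    raise ValueError(f"Unknown mode: {mode}")
```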

Step-by-Step Guide

  1. Configure Settings (in sidebar):

    • Select processing mode (Summarize or Refine)
    • Configure mode-specific options
    • Note: AI Provider is configured via .env file
  2. Choose Input Method:

    YouTube Transcription:

    • Navigate to the "YouTube" tab
    • Paste a YouTube URL
    • Click "Get Transcript"
    • View transcript and AI-processed result

    File Upload:

    • Navigate to the "Upload File" tab
    • Upload an audio file (mp3, wav, m4a, ogg, flac)
    • Click "Transcribe File"
    • View transcript and AI-processed result

    Voice Recording:

    • Navigate to the "Record Audio" tab
    • Click the microphone button to start/stop recording
    • Click "Transcribe Recording"
    • View transcript and AI-processed result
  3. Download Results:

    • Use the download buttons to save your transcript and AI-processed output

🔒 Privacy & Security

  • Local Transcription: Audio is transcribed locally using Whisper (no data sent to external services)
  • No Data Storage: Transcripts are not stored permanently
  • Temporary Files: Automatically cleaned up after processing
  • API Key Security: Store your OpenAI API key securely in .env (never commit to version control)
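The temporary-file guarantee follows a standard pattern: write the audio to a named temporary file and delete it in a finally block so cleanup happens even on failure. The function below is an illustrative sketch, not the app's actual file handler.

```python
import os
import tempfile

def transcribe_upload(data: bytes) -> str:
    """Write uploaded audio to a temp file, process it, and always clean up."""
    tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
    try:
        tmp.write(data)
        tmp.close()
        # A real implementation would run Whisper on tmp.name here.
        return f"transcribed {len(data)} bytes"
    finally:
        tmp.close()
        os.unlink(tmp.name)  # temporary file is removed even on error
```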

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

📞 Support

For issues and questions, please open an issue on GitHub.


Built with ❤️ using Streamlit, Whisper, Ollama, and OpenAI GPT
