Transcribrr

Transcribrr is a desktop tool that turns audio into text and then refines the output using OpenAI's GPT models. It works with audio or video files on your computer, YouTube videos via a provided URL, or recordings made directly in the app. While functional, this is a personal project that I work on in my free time, and it is very much a work in progress.

Features

  • Fast, accurate, local transcription with optional speaker detection (via the excellent whisperx library; a rough sketch of the underlying call is shown after this list)
  • GPT-4 processing and summarization of transcripts
  • Configurable transcription quality settings
  • Preset prompt management for GPT processing
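
For context, the local transcription step relies on whisperx. A minimal sketch of what such a call looks like (this is illustrative, not Transcribrr's actual code; the "example.wav" file name and device choice are assumptions):

import whisperx

device = "cuda"  # or "cpu" if you have no supported GPU
audio = whisperx.load_audio("example.wav")
model = whisperx.load_model("large-v2", device, compute_type="float16")
result = model.transcribe(audio, batch_size=16)
print(result["segments"])  # list of timestamped text segments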

Installation

Prerequisites

Before installing the application, ensure you have the following dependencies (a quick environment check is sketched after this list):

  • Python 3.10
  • CUDA 11.8 or higher (optional but highly recommended for hardware acceleration; requires a supported NVIDIA GPU)
  • ffmpeg
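
If you want to confirm the basics before proceeding, here is a small, hypothetical check script (not part of the Transcribrr codebase):

import shutil, subprocess, sys

# Hypothetical environment check -- not part of Transcribrr itself.
print("Python:", sys.version.split()[0])                # expect 3.10.x
print("ffmpeg:", shutil.which("ffmpeg") or "NOT FOUND")
try:
    gpu = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    print("NVIDIA driver:", "found" if gpu.returncode == 0 else "not found (CPU mode)")
except FileNotFoundError:
    print("NVIDIA driver: not found (CPU mode)")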

Clone the Repository

Clone the repository to your local machine:

git clone https://github.com/jbmiller10/transcribrr.git
cd transcribrr

Create a Virtual Environment

Windows

python -m venv venv
.\venv\Scripts\activate

MacOS/Linux

python -m venv venv
source venv/bin/activate

Install Dependencies

Install PyTorch with CUDA support (optional but highly recommended for hardware acceleration; requires an NVIDIA GPU and the CUDA toolkit):

pip3 install torch~=2.0.0 torchaudio~=2.0.0 --index-url https://download.pytorch.org/whl/cu118
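
To verify that the CUDA-enabled build is active, an optional quick check is:

import torch

# Prints the installed version and True when the CUDA build can see your GPU.
print(torch.__version__, torch.cuda.is_available())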

Install requirements.txt

pip install -r requirements.txt

Usage

Run the main script to start the application:

python main.py

Configuration

Before use, configure the application through the 'Settings' menu with your OpenAI API key and, optionally, a Hugging Face access token (required only for speaker detection/diarization). You can also adjust transcription quality, GPT model selection, max tokens, temperature, speaker detection settings, and your preset GPT prompts.

Speaker Detection/Diarization

To enable speaker detection, you will need a Hugging Face access token, generated from your Hugging Face account, which you can set in the Settings menu. Additionally, while logged into your Hugging Face account, you will need to accept the usage terms for the following models: Segmentation and Speaker-Diarization.
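
If you want to sanity-check your token and gated-model access before running the app, here is a minimal sketch using the huggingface_hub package (the exact pyannote repository names below are assumptions based on the models whisperx diarization typically uses, not something Transcribrr prescribes):

from huggingface_hub import HfApi

token = "hf_..."  # placeholder: paste your Hugging Face access token here
api = HfApi()
print("Logged in as:", api.whoami(token=token)["name"])

# These calls raise an error if the model's usage terms have not been accepted.
api.model_info("pyannote/segmentation", token=token)
api.model_info("pyannote/speaker-diarization", token=token)
print("Gated model access OK")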

How to Use

  1. Choose the mode of transcription (File Upload or YouTube URL).
  2. If using File Upload, select your video/audio file using the "Open Audio/Video File" button.
  3. If using the YouTube URL mode, paste the YouTube link into the corresponding field.
  4. Click the "Start Transcription" button to begin processing.
  5. After transcription, set your prompt and click the "Process with GPT-4" button to process the text with GPT-4.
