Transcribrr

Transcribrr is a desktop tool that turns audio into text and then refines the output using OpenAI's GPT models. It works with audio or video files on your computer, YouTube videos via a provided URL, or recordings made directly in the app. While functional, this is a personal project that I work on in my free time, and it is very much a work in progress.

Features

  • Fast, accurate, local transcription with optional speaker detection (via the excellent whisperx library; a rough sketch of the underlying call is shown after this list)
  • GPT-4 processing and summarization of transcripts
  • Configurable transcription quality settings
  • Preset prompt management for GPT processing
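
For context, the local transcription step relies on whisperx. A minimal sketch of what such a call looks like (this is illustrative, not Transcribrr's actual code; the "example.wav" file name and device choice are assumptions):

import whisperx

device = "cuda"  # or "cpu" if you have no supported GPU
audio = whisperx.load_audio("example.wav")
model = whisperx.load_model("large-v2", device, compute_type="float16")
result = model.transcribe(audio, batch_size=16)
print(result["segments"])  # list of timestamped text segments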

Installation

Prerequisites

Before installing the application, ensure you have the following dependencies (a quick environment check is sketched after this list):

  • Python 3.10
  • CUDA 11.8 or higher (optional but highly recommended for hardware acceleration; requires a supported NVIDIA GPU)
  • ffmpeg
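
If you want to confirm the basics before proceeding, here is a small, hypothetical check script (not part of the Transcribrr codebase):

import shutil, subprocess, sys

# Hypothetical environment check -- not part of Transcribrr itself.
print("Python:", sys.version.split()[0])                # expect 3.10.x
print("ffmpeg:", shutil.which("ffmpeg") or "NOT FOUND")
try:
    gpu = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    print("NVIDIA driver:", "found" if gpu.returncode == 0 else "not found (CPU mode)")
except FileNotFoundError:
    print("NVIDIA driver: not found (CPU mode)")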

Clone the Repository

Clone the repository to your local machine:

git clone https://github.com/jbmiller10/transcribrr.git
cd transcribrr

Create a Virtual Environment

Windows

python -m venv venv
.\venv\Scripts\activate

MacOS/Linux

python -m venv venv
source venv/bin/activate

Install Dependencies

Install PyTorch with CUDA support (optional but highly recommended for hardware acceleration; requires an NVIDIA GPU and the CUDA toolkit):

pip3 install torch~=2.0.0 torchaudio~=2.0.0 --index-url https://download.pytorch.org/whl/cu118
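
To verify that the CUDA-enabled build is active, an optional quick check is:

import torch

# Prints the installed version and True when the CUDA build can see your GPU.
print(torch.__version__, torch.cuda.is_available())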

Install requirements.txt

pip install -r requirements.txt

Usage

Run the main script to start the application:

python main.py

Configuration

Before use, configure the application through the 'Settings' menu with your OpenAI API key and, optionally, a Hugging Face access token (required only for speaker detection/diarization). You can also adjust transcription quality, GPT model selection, max tokens, temperature, speaker detection settings, and your preset GPT prompts.

Speaker Detection/Diarization

To enable speaker detection, you will need a Hugging Face access token, generated from your Hugging Face account, which you can set in the Settings menu. Additionally, while logged into your Hugging Face account, you will need to accept the usage terms for the following models: Segmentation and Speaker-Diarization.
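
If you want to sanity-check your token and gated-model access before running the app, here is a minimal sketch using the huggingface_hub package (the exact pyannote repository names below are assumptions based on the models whisperx diarization typically uses, not something Transcribrr prescribes):

from huggingface_hub import HfApi

token = "hf_..."  # placeholder: paste your Hugging Face access token here
api = HfApi()
print("Logged in as:", api.whoami(token=token)["name"])

# These calls raise an error if the model's usage terms have not been accepted.
api.model_info("pyannote/segmentation", token=token)
api.model_info("pyannote/speaker-diarization", token=token)
print("Gated model access OK")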

How to Use

  1. Choose the mode of transcription (File Upload or YouTube URL).
  2. If using File Upload, select your video/audio file using the "Open Audio/Video File" button.
  3. If using the YouTube URL mode, paste the YouTube link into the corresponding field.
  4. Click the "Start Transcription" button to begin processing.
  5. After transcription, set your prompt and click the "Process with GPT-4" button to process the text with GPT-4.
