This script is a simple voice interface: press a key to record from your microphone, have the recording transcribed by the Whisper Large V3 Turbo model, and get the transcribed text copied to the clipboard automatically.
It is intended to let you paste your speech as text into the ChatGPT web application, which at the time of writing does not have voice capability.
- Real-time speech recording at a sampling rate of 16 kHz.
- Transcription of recorded audio using Whisper Large V3 Turbo.
- Auto-copy the transcribed text to the system clipboard.
- Keyboard controls to start/stop recording and quit the program.
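The recording feature could be sketched roughly as follows. This is a minimal illustration rather than the code in main.py: the `Recorder` class and `SAMPLE_RATE` constant are hypothetical names, and it assumes a mono float32 stream opened with `sounddevice.InputStream` on the default input device.

```python
SAMPLE_RATE = 16000  # Hz, matching the sampling rate listed above


class Recorder:
    """Hypothetical helper that collects microphone blocks between start() and stop()."""

    def __init__(self):
        self.chunks = []
        self.stream = None

    def _callback(self, indata, frames, time, status):
        # sounddevice calls this for every captured block; copy the data
        # because the underlying buffer is reused after the callback returns.
        self.chunks.append(indata.copy())

    def start(self):
        import sounddevice as sd  # imported lazily; needs a working audio backend

        self.chunks = []
        self.stream = sd.InputStream(
            samplerate=SAMPLE_RATE,
            channels=1,
            dtype="float32",
            callback=self._callback,
        )
        self.stream.start()

    def stop(self):
        import numpy as np

        self.stream.stop()
        self.stream.close()
        if not self.chunks:
            return np.zeros(0, dtype="float32")
        # Each block has shape (frames, 1); join them and flatten to 1-D mono.
        return np.concatenate(self.chunks)[:, 0]
```

Calling `start()` and later `stop()` would return the captured audio as a one-dimensional array that can be handed to the transcription step.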
Before running the code, ensure that you have the following dependencies installed:
- Conda for managing the environment.
- Python 3.12.7 or higher.
- The necessary Python libraries, as listed below.
Create a new Conda environment and activate it:
conda create --name voice-interface python=3.12.7
conda activate voice-interface
Run the following command to install the required libraries:
pip install sounddevice numpy torch transformers pyperclip
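To confirm that the libraries are visible in the active environment, a small check like the one below can help. It is only a sketch: `REQUIRED` and `missing_modules` are illustrative names, not part of the project.

```python
import importlib.util

# Import names of the libraries installed by the pip command above.
REQUIRED = ["sounddevice", "numpy", "torch", "transformers", "pyperclip"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("Missing:", ", ".join(missing) if missing else "none")
```

This only checks that the modules can be located; it does not import them or verify their versions.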
The Whisper Large V3 Turbo model should be placed in the ./whisper-large-v3-turbo directory. You can find the model file here.
Ensure the downloaded model file is located at ./whisper-large-v3-turbo/model.safetensors.
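A quick way to verify the placement before launching the script is sketched below. The helper name `model_file_present` is hypothetical, and the check only covers the model.safetensors file named above.

```python
from pathlib import Path

# Directory from the instructions above; adjust if your layout differs.
MODEL_DIR = Path("./whisper-large-v3-turbo")


def model_file_present(model_dir=MODEL_DIR):
    """Return True if model.safetensors exists inside model_dir."""
    return (Path(model_dir) / "model.safetensors").is_file()


if __name__ == "__main__":
    print("Model file found:", model_file_present())
```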
After setting up the environment and placing the model file in the correct directory, you can run the script:
python main.py
- Press r to start or stop recording.
- Press q to quit the application.
The transcribed text will be printed to the console and copied to your clipboard for easy use.
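The key handling described above can be pictured as a small state machine. The sketch below is an assumption about the logic, not the code in main.py: `handle_key` and `copy_to_clipboard` are hypothetical names, and how keypresses are actually read is left out.

```python
def handle_key(key, recording):
    """Return (new_recording_state, action) for a single keypress.

    action is "start", "stop", "quit", or None for keys that are ignored.
    """
    if key == "q":
        return recording, "quit"
    if key == "r":
        # The same key toggles recording on and off.
        return (not recording), ("stop" if recording else "start")
    return recording, None


def copy_to_clipboard(text):
    # pyperclip is imported lazily so handle_key works without it installed.
    import pyperclip

    pyperclip.copy(text)
```

For example, `handle_key("r", False)` yields `(True, "start")`, and a second `r` press yields `(False, "stop")`, at which point the recording would be transcribed and passed to `copy_to_clipboard`.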
- The transcription pipeline uses the Whisper Large V3 Turbo model, which works best with a 16 kHz sampling rate.
- The application automatically detects whether a CUDA-enabled GPU is available and, if so, uses it for faster processing.
- Audio input is recorded using the system's default microphone.
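The device selection and transcription step might look roughly like this. It is a sketch under assumptions: the function names are hypothetical, the model directory is assumed to hold every file the `transformers` pipeline needs (configuration and tokenizer files as well as the weights), and long recordings may need additional pipeline options.

```python
def choose_device(cuda_available):
    """Map a CUDA availability flag to a device string."""
    return "cuda:0" if cuda_available else "cpu"


def build_transcriber(model_dir="./whisper-large-v3-turbo"):
    # Heavy imports stay inside the function so choose_device is usable alone.
    import torch
    from transformers import pipeline

    return pipeline(
        "automatic-speech-recognition",
        model=model_dir,
        device=choose_device(torch.cuda.is_available()),
    )


def transcribe(transcriber, audio, sample_rate=16000):
    # audio: 1-D float32 NumPy array of mono samples at sample_rate Hz.
    result = transcriber({"raw": audio, "sampling_rate": sample_rate})
    return result["text"].strip()
```

Keeping `choose_device` separate from the `torch` call makes the fallback to CPU easy to check without a GPU.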
This project is licensed under the terms of the MIT license.