The main goal of the project is to provide a graphical user interface for the Whisper speech recognition library. The GUI is built with the Flet library.
The project supports two recognition modes:
- single file process - recognition of only one file
- batch process - processing multiple files at once
You can select a mode by clicking on the popup menu button on the main tab:
In single file mode, select an audio file, choose a recognition model, and click the Recognize button. The recognition progress is displayed in real time, and you can copy part of the result. Look at the example below:
In batch mode, select an output folder, one or more audio files, and a recognition model. You can remove an individual file from the audio list or clear all files.
Take a look at the process of recognizing multiple files:
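The batch mode described above can be sketched outside the GUI as a simple loop. This is a minimal illustration, not the project's actual code: the helper names `make_whisper_transcriber` and `recognize_batch` are hypothetical, and writing one `.txt` file per audio file into the output folder is an assumption about the output layout. Only `whisper.load_model` and `model.transcribe` are calls from the Whisper library itself.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable, Iterable


def make_whisper_transcriber(model_name: str) -> Callable[[Path], str]:
    """Return a function that transcribes one audio file with Whisper."""
    # Imported lazily so the batch loop below can be used without Whisper installed.
    import whisper

    model = whisper.load_model(model_name)
    return lambda audio_path: model.transcribe(str(audio_path))["text"]


def recognize_batch(
    audio_files: Iterable[Path],
    output_dir: Path,
    transcribe: Callable[[Path], str],
) -> list[Path]:
    """Transcribe each file and write the text to <output_dir>/<stem>.txt.

    The one-text-file-per-input layout is an assumption made for this sketch.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for audio_path in audio_files:
        text = transcribe(audio_path)
        target = output_dir / f"{audio_path.stem}.txt"
        target.write_text(text.strip() + "\n", encoding="utf-8")
        written.append(target)
    return written
```

With Whisper and FFmpeg installed, a call might look like `recognize_batch([Path("a.mp3"), Path("b.wav")], Path("out"), make_whisper_transcriber("base"))`. The model is loaded once and reused for every file, which avoids reloading it per file.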
Whisper requires FFmpeg to be installed.
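A quick way to verify this requirement before starting recognition is to look for the `ffmpeg` executable on `PATH`. The sketch below uses only the standard library. The function names are hypothetical and not part of the project.

```python
import shutil
from typing import Optional


def find_executable(name: str) -> Optional[str]:
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


def require_ffmpeg() -> str:
    """Return the path to ffmpeg, or raise if it is not installed."""
    path = find_executable("ffmpeg")
    if path is None:
        raise RuntimeError(
            "FFmpeg was not found on PATH; install it before running recognition."
        )
    return path
```

Calling `require_ffmpeg()` at startup turns a missing dependency into a clear error message instead of a failure in the middle of recognition.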
- By default, the model is downloaded to the directory "~/.cache/whisper". This location cannot be changed via the UI.
- The output tab does not display the progress of the model download. The beginning and end of the download are indicated as follows:
    Model download directory: ~/.cache/whisper #download model started
    Model {model_name} downloaded #download model finished
- On Linux, the FilePicker control depends on Zenity when running Flet as an app. This is not a requirement when running Flet in a browser.
To install Zenity on Ubuntu/Debian, run the following command:
sudo apt-get install zenity
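Regarding the note about the model download directory: because the output tab shows no download progress, it can help to check in advance whether a model is already cached. The sketch below is an illustration under two assumptions: the cache directory is resolved the way Whisper does by default (`$XDG_CACHE_HOME/whisper`, falling back to `~/.cache/whisper`), and the model file is named `{model_name}.pt`, which may not hold for aliases such as `large`. The function names are hypothetical.

```python
import os
from pathlib import Path
from typing import Optional


def default_whisper_cache_dir() -> Path:
    """Default download directory: $XDG_CACHE_HOME/whisper or ~/.cache/whisper."""
    fallback = os.path.join(os.path.expanduser("~"), ".cache")
    return Path(os.getenv("XDG_CACHE_HOME", fallback)) / "whisper"


def is_model_cached(model_name: str, cache_dir: Optional[Path] = None) -> bool:
    """Return True if a file named <model_name>.pt exists in the cache directory.

    The <model_name>.pt naming is an assumption; aliases may map to other files.
    """
    directory = Path(cache_dir) if cache_dir is not None else default_whisper_cache_dir()
    return (directory / f"{model_name}.pt").is_file()
```

If `is_model_cached("base")` returns `False`, the first recognition run with that model will likely include a download, so a longer wait before output appears is expected.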