AudioDictate is a desktop application that transcribes spoken audio into text entirely offline. It transcribes WAV audio files and converts non-WAV formats to WAV before processing. A simple graphical user interface handles file selection and displays the resulting text.
- Features
- Prerequisites
- Setup and Installation
- Running the Application
- How to Use
- Tools and Technologies
- Contributing
- License
- Contact
- Audio File Transcription: Converts spoken words from audio files into written text with high accuracy.
- WAV File Conversion: Automatically converts non-WAV files to WAV format for processing.
- Offline Functionality: Processes audio files offline, ensuring data privacy and security.
- Interactive GUI: Provides a user-friendly interface for file selection and displaying transcription results.
- Python 3.x
- Tkinter
- PyDub
- Vosk Speech Recognition Toolkit
Clone the repository and change into its directory:
git clone https://github.com/ascender1729/AudioDictate.git
cd AudioDictate
Create and activate a virtual environment:
python -m venv myenv
.\myenv\Scripts\Activate.ps1 # On Windows
source myenv/bin/activate # On Unix or macOS
Install the dependencies:
pip install -r requirements.txt
Download the Vosk model appropriate for your language and note the directory path where it is saved.
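Before starting the application, it can help to confirm that the path you noted really points at an unpacked model directory. The following is a minimal sketch and not part of AudioDictate: the function name `check_model_dir` is hypothetical, and treating an empty folder as an unextracted download is an assumption.

```python
from pathlib import Path


def check_model_dir(path_text: str) -> Path:
    """Return the resolved model directory, or raise ValueError if unusable."""
    model_dir = Path(path_text).expanduser().resolve()
    if not model_dir.is_dir():
        raise ValueError(f"Not a directory: {model_dir}")
    # Assumption: an unpacked model contains files and subfolders, so an
    # empty folder most likely means the archive was never extracted.
    if not any(model_dir.iterdir()):
        raise ValueError(f"Directory is empty: {model_dir}")
    return model_dir
```

Calling `check_model_dir` with the path you saved either returns the resolved path or raises a `ValueError` explaining what is wrong.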
Run the application:
python transcribe.py
When you are finished using AudioDictate, you can deactivate the virtual environment:
deactivate
- Start AudioDictate.
- When prompted, input the directory path to the Vosk model.
- Use the application's interface to select your audio file. The application supports browsing and selecting the file directly within the app.
- If the selected file is not in WAV format, you'll be asked to select an output directory for the conversion process.
- After processing, the transcribed text will be displayed within the application window.
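The conversion step in the list above can be sketched with PyDub roughly as follows. This is an illustration under stated assumptions, not the code in `transcribe.py`: the function names are hypothetical, and the 16 kHz mono 16-bit target format is an assumed choice that suits typical Vosk models. Reading most non-WAV formats with PyDub also requires ffmpeg to be installed.

```python
from pathlib import Path


def needs_conversion(audio_path: str) -> bool:
    """Return True when the file extension is anything other than .wav."""
    return Path(audio_path).suffix.lower() != ".wav"


def wav_output_path(audio_path: str, output_dir: str) -> Path:
    """Build the target path: same base name, .wav extension, in output_dir."""
    return Path(output_dir) / (Path(audio_path).stem + ".wav")


def convert_to_wav(audio_path: str, output_dir: str) -> Path:
    """Convert an audio file to mono 16-bit WAV and return the new path."""
    # Imported here so the helpers above work without PyDub installed.
    from pydub import AudioSegment

    target = wav_output_path(audio_path, output_dir)
    audio = AudioSegment.from_file(audio_path)
    # Assumed target format: 1 channel, 16 kHz, 2 bytes (16 bits) per sample.
    audio = audio.set_channels(1).set_frame_rate(16000).set_sample_width(2)
    audio.export(str(target), format="wav")
    return target
```

Keeping the path logic separate from the PyDub call makes it easy to check where the converted file will be written before any audio is processed.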
| Area | Tool/Technology | Description |
|---|---|---|
| Audio Processing | PyDub | Handles the audio file format conversion. |
| Speech Recognition | Vosk | Performs the speech-to-text transcription. |
| GUI | Tkinter | Provides the graphical user interface for the application. |
| Programming Language | Python | The core language used for developing the application. |
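To illustrate how the speech recognition component in the table is typically driven, here is a minimal sketch of offline transcription with Vosk's `Model` and `KaldiRecognizer` classes. It is not taken from AudioDictate's source: the function names `collect_text` and `transcribe_wav` are hypothetical, the chunk size of 4000 frames is an arbitrary choice, and the input is assumed to be a mono 16-bit PCM WAV file.

```python
import json
import wave


def collect_text(result_json_chunks):
    """Join the "text" fields of Vosk JSON result strings into one transcript."""
    parts = (json.loads(chunk).get("text", "") for chunk in result_json_chunks)
    return " ".join(p for p in parts if p)


def transcribe_wav(wav_path: str, model_dir: str) -> str:
    """Transcribe a mono 16-bit PCM WAV file offline with Vosk."""
    # Imported here so collect_text can be used without Vosk installed.
    from vosk import KaldiRecognizer, Model

    model = Model(model_dir)
    results = []
    with wave.open(wav_path, "rb") as wf:
        recognizer = KaldiRecognizer(model, wf.getframerate())
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            # AcceptWaveform returns True once a full utterance is recognized.
            if recognizer.AcceptWaveform(data):
                results.append(recognizer.Result())
        # Flush whatever audio is still buffered in the recognizer.
        results.append(recognizer.FinalResult())
    return collect_text(results)
```

Each result from the recognizer is a JSON string, so the transcript is assembled by reading the `text` field of every chunk and skipping the empty ones.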
To contribute:
- Fork the repository.
- Create a new branch (`git checkout -b feature/YourFeature`).
- Commit your changes (`git commit -m 'Add YourFeature'`).
- Push to the branch (`git push origin feature/YourFeature`).
- Create a new Pull Request.
Distributed under the MIT License. See LICENSE for more information.
Pavan Kumar - pavankumard.pg19.ma@nitp.ac.in
LinkedIn: linkedin.com/in/im-pavankumar
Project Link: https://github.com/ascender1729/AudioDictate