ascender1729/AudioDictate

AudioDictate

AudioDictate is a desktop application that transcribes spoken audio into text entirely offline. It works on WAV audio files and converts non-WAV formats to WAV before transcription. A simple graphical user interface handles file selection and displays the resulting text.

Features

  • Audio File Transcription: Converts spoken words from audio files into written text with high accuracy.
  • WAV File Conversion: Automatically converts non-WAV files to WAV format for processing.
  • Offline Functionality: Processes audio files locally, so recordings are not sent to an external service.
  • Interactive GUI: Provides a user-friendly interface for file selection and displaying transcription results.

Prerequisites

  • Python 3.x
  • Tkinter
  • PyDub
  • Vosk Speech Recognition Toolkit

Setup and Installation

Clone the Repository

git clone https://github.com/ascender1729/AudioDictate.git
cd AudioDictate

Environment Setup

Create and activate a virtual environment:

python -m venv myenv
.\myenv\Scripts\Activate.ps1  # On Windows (PowerShell)
source myenv/bin/activate  # On Unix or macOS

Install Dependencies

pip install -r requirements.txt
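After installation, a quick check can confirm that the prerequisites are importable from the active environment. The sketch below is not part of the repository; the helper name `missing_modules` is hypothetical, and the import names `tkinter`, `pydub`, and `vosk` are assumed to correspond to the prerequisites listed above. Note that Tkinter ships with most Python installers rather than being installed through pip, so on some Linux distributions it may need a separate system package.

```python
import importlib.util

# Import names assumed to match the prerequisites listed in this README.
REQUIRED_MODULES = ["tkinter", "pydub", "vosk"]


def missing_modules(names):
    """Return the module names that cannot be located in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All prerequisites are importable.")
```

An empty result means every listed module was found; any name that is reported should be installed before starting the application.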

Download Vosk Model

Download the Vosk model for your language, extract it if it arrives as an archive, and note the path of the resulting directory. The application asks for this path when it starts.
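Because the model path is typed in by hand, it can help to validate it before handing it to the application. The following is a minimal sketch using only the standard library; the function name `model_dir_problem` is hypothetical, and it only checks that the path is a non-empty directory, not that the contents are a valid Vosk model.

```python
from pathlib import Path


def model_dir_problem(path):
    """Return a short description of what is wrong with the path, or None if it looks usable."""
    model_dir = Path(path).expanduser()
    if not model_dir.is_dir():
        return f"not a directory: {model_dir}"
    if not any(model_dir.iterdir()):
        return f"directory is empty: {model_dir}"
    return None
```

A return value of `None` means the basic checks passed; any other value is a message that could be shown to the user.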

Running the Application

python transcribe.py

Shutting Down

When you are finished using AudioDictate, you can deactivate the virtual environment:

deactivate

How to Use

  1. Start AudioDictate.
  2. When prompted, input the directory path to the Vosk model.
  3. Use the application's interface to select your audio file. The application supports browsing and selecting the file directly within the app.
  4. If the selected file is not in WAV format, you'll be asked to select an output directory for the conversion process.
  5. After processing, the transcribed text will be displayed within the application window.
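The conversion in step 4 could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in transcribe.py: the helper names `wav_output_path` and `ensure_wav` are hypothetical, the output file is assumed to keep the source file's base name, and PyDub is assumed to have ffmpeg (or libav) available for decoding non-WAV input. The downmix to mono 16-bit audio is an added assumption based on what Vosk's example scripts expect.

```python
from pathlib import Path


def wav_output_path(source, output_dir):
    """Build the output path: same base name as the source, with a .wav extension."""
    return Path(output_dir) / (Path(source).stem + ".wav")


def ensure_wav(source, output_dir):
    """Return a WAV path for the source file, converting with PyDub when needed."""
    source = Path(source)
    if source.suffix.lower() == ".wav":
        return source  # already WAV, nothing to convert

    # Imported lazily so the path helper above works without PyDub installed.
    from pydub import AudioSegment

    target = wav_output_path(source, output_dir)
    audio = AudioSegment.from_file(str(source))
    # Assumption: downmix to mono, 16-bit samples before transcription.
    audio = audio.set_channels(1).set_sample_width(2)
    audio.export(str(target), format="wav")
    return target
```

For example, `ensure_wav("talk.mp3", "converted")` would write `converted/talk.wav` and return that path, while a file that already ends in `.wav` is returned unchanged.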

Tools and Technologies

Area                 | Tool/Technology | Description
Audio Processing     | PyDub           | Handles the audio file format conversion.
Speech Recognition   | Vosk            | Performs the speech-to-text transcription.
GUI                  | Tkinter         | Provides the graphical user interface for the application.
Programming Language | Python          | The core language used for developing the application.
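To illustrate how Vosk is typically driven for the speech recognition step, here is a hedged sketch modeled on the usage pattern in Vosk's own Python examples rather than on this repository's code. The function names `is_mono_pcm16` and `transcribe` are hypothetical, and the chunk size of 4000 frames is an arbitrary choice.

```python
import json
import wave


def is_mono_pcm16(wav_path):
    """Return True if the WAV file is mono, 16-bit, uncompressed PCM."""
    with wave.open(str(wav_path), "rb") as wf:
        return (
            wf.getnchannels() == 1
            and wf.getsampwidth() == 2
            and wf.getcomptype() == "NONE"
        )


def transcribe(wav_path, model_dir):
    """Transcribe a mono 16-bit PCM WAV file with a local Vosk model."""
    # Imported lazily so the format check above works without Vosk installed.
    from vosk import KaldiRecognizer, Model

    model = Model(str(model_dir))
    pieces = []
    with wave.open(str(wav_path), "rb") as wf:
        recognizer = KaldiRecognizer(model, wf.getframerate())
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            # AcceptWaveform returns True when a complete utterance is ready.
            if recognizer.AcceptWaveform(data):
                pieces.append(json.loads(recognizer.Result()).get("text", ""))
        # Collect whatever is still buffered after the last chunk.
        pieces.append(json.loads(recognizer.FinalResult()).get("text", ""))
    return " ".join(piece for piece in pieces if piece)
```

Running `is_mono_pcm16` before `transcribe` gives an early, readable error path for files in an unexpected format instead of a poor or empty transcript.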

Contributing

To contribute:

  1. Fork the repository.
  2. Create a new branch (git checkout -b feature/YourFeature).
  3. Commit your changes (git commit -m 'Add YourFeature').
  4. Push to the branch (git push origin feature/YourFeature).
  5. Create a new Pull Request.

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Pavan Kumar - pavankumard.pg19.ma@nitp.ac.in

LinkedIn: linkedin.com/in/im-pavankumar

Project Link: https://github.com/ascender1729/AudioDictate