AudioDictate

AudioDictate is a desktop application that transcribes spoken audio to text. It works offline on WAV files and converts non-WAV formats to WAV before transcription. A graphical user interface handles file selection and displays the resulting text.

Features

Audio File Transcription: Converts speech in audio files into written text.
WAV File Conversion: Automatically converts non-WAV files to WAV format before processing.
Offline Functionality: Processes audio files locally without a network connection, so recordings stay on your machine.
Interactive GUI: Provides a graphical interface for selecting files and viewing transcription results.

Prerequisites

Python 3.x
Tkinter
PyDub (needs FFmpeg or Libav installed to read non-WAV formats)
Vosk Speech Recognition Toolkit
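
Before launching the application, it can help to confirm that the prerequisites are importable. The following is a minimal sketch and not part of the repository: the helper name missing_modules is hypothetical, and the import names tkinter, pydub, and vosk are assumed to match the packages listed above. It only checks that the modules can be located, not that they work.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names assumed for the prerequisites listed above.
    missing = missing_modules(["tkinter", "pydub", "vosk"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All prerequisite modules were found.")
```

Run it inside the activated virtual environment so that it inspects the same packages the application will use.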

Setup and Installation

Clone the Repository

git clone https://github.com/ascender1729/AudioDictate.git
cd AudioDictate

Environment Setup

Create and activate a virtual environment:

python -m venv myenv
.\myenv\Scripts\Activate.ps1  # On Windows (PowerShell)
source myenv/bin/activate  # On Unix or macOS

Install Dependencies

pip install -r requirements.txt

Download Vosk Model

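Download the Vosk model for your language from the Vosk models page (https://alphacephei.com/vosk/models), extract the archive, and note the path of the extracted directory. AudioDictate asks for this path at startup.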
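
A wrong model path is an easy mistake to make, so it may be worth checking the directory before handing it to Vosk. The sketch below is an illustration under stated assumptions, not the code in transcribe.py: check_model_dir and load_model are hypothetical helper names, and the check only confirms that the directory exists and is not empty, since model layouts can differ between releases.

```python
from pathlib import Path


def check_model_dir(path_str):
    """Return the model directory as a Path, or raise if it looks unusable."""
    path = Path(path_str).expanduser()
    if not path.is_dir():
        raise FileNotFoundError(f"Model directory not found: {path}")
    if not any(path.iterdir()):
        raise ValueError(f"Model directory is empty: {path}")
    return path


def load_model(path_str):
    """Load a Vosk model from a directory that passed the basic check."""
    # Imported lazily so check_model_dir can be used without Vosk installed.
    from vosk import Model

    return Model(str(check_model_dir(path_str)))
```

Calling load_model with the path of the extracted model directory returns a model object that a recognizer can use.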

Running the Application

python transcribe.py

Shutting Down

When you are finished using AudioDictate, you can deactivate the virtual environment:

deactivate

How to Use

Start AudioDictate.
When prompted, input the directory path to the Vosk model.
Select your audio file using the file browser in the application window.
If the selected file is not in WAV format, you'll be asked to select an output directory for the conversion process.
After processing, the transcribed text will be displayed within the application window.
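
The steps above can be sketched without the GUI as a convert-then-transcribe pipeline. This is a rough illustration using PyDub and Vosk, not necessarily how transcribe.py is implemented. All function names are hypothetical, and the 16 kHz mono 16-bit target format and the 4000-frame chunk size are assumptions chosen because Vosk expects mono 16-bit PCM input.

```python
import json
import wave
from pathlib import Path


def needs_conversion(audio_path):
    """True when the file extension is anything other than .wav."""
    return Path(audio_path).suffix.lower() != ".wav"


def wav_output_path(audio_path, output_dir):
    """Place the converted file in output_dir, keeping the original stem."""
    return Path(output_dir) / (Path(audio_path).stem + ".wav")


def convert_to_wav(audio_path, output_dir):
    """Convert to 16 kHz, mono, 16-bit PCM WAV (an assumed target format)."""
    from pydub import AudioSegment  # lazy import; needs FFmpeg for most formats

    target = wav_output_path(audio_path, output_dir)
    audio = AudioSegment.from_file(str(audio_path))
    audio = audio.set_channels(1).set_frame_rate(16000).set_sample_width(2)
    audio.export(str(target), format="wav")
    return target


def transcribe_wav(wav_path, model_dir, chunk_frames=4000):
    """Feed the WAV file to Vosk in chunks and join the recognized text."""
    from vosk import KaldiRecognizer, Model  # lazy import

    pieces = []
    with wave.open(str(wav_path), "rb") as wf:
        recognizer = KaldiRecognizer(Model(str(model_dir)), wf.getframerate())
        while True:
            data = wf.readframes(chunk_frames)
            if not data:
                break
            # AcceptWaveform returns true when an utterance is complete.
            if recognizer.AcceptWaveform(data):
                pieces.append(json.loads(recognizer.Result()).get("text", ""))
        pieces.append(json.loads(recognizer.FinalResult()).get("text", ""))
    return " ".join(piece for piece in pieces if piece)


def transcribe(audio_path, model_dir, output_dir):
    """Convert the file if it is not WAV, then transcribe it."""
    wav_path = audio_path
    if needs_conversion(audio_path):
        wav_path = convert_to_wav(audio_path, output_dir)
    return transcribe_wav(wav_path, model_dir)
```

Note that this sketch passes files that already have a .wav extension straight to the recognizer, so a stereo or non-PCM WAV file would need the same conversion step to transcribe reliably.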

Tools and Technologies

Area	Tool/Technology	Description
Audio Processing	PyDub	Handles the audio file format conversion.
Speech Recognition	Vosk	Performs the speech-to-text transcription.
GUI	Tkinter	Provides the graphical user interface for the application.
Programming Language	Python	The core language used for developing the application.

Contributing

To contribute:

Fork the repository.
Create a new branch (git checkout -b feature/YourFeature).
Commit your changes (git commit -m 'Add YourFeature').
Push to the branch (git push origin feature/YourFeature).
Create a new Pull Request.

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Pavan Kumar - pavankumard.pg19.ma@nitp.ac.in

LinkedIn: linkedin.com/in/im-pavankumar

Project Link: https://github.com/ascender1729/AudioDictate
