Generative Audio Synthesis Problem Statement(ML01) for TRI-NIT Hackathon 2023.

yash-srivastava19/TRINIT_EzDub_ML01

TRI NIT Hackathon 2023

Track Chosen - ML01

Problem

Sustainability in language technology is crucial for supporting and preserving linguistic diversity, which is an important aspect of human culture and communication. However, many underrepresented languages and dialects are not well-supported by current speech technology due to a shortage of available speech data for training models. Developing speech recognition models for these languages and dialects can play a crucial role in promoting sustainability in language technology and preserving linguistic diversity.

Objective:

  • We propose to develop a generative model that can create synthetic speech samples for underrepresented languages and dialects.
  • The model should be able to generate speech samples in a variety of languages and dialects, with a focus on those that are underrepresented in existing speech datasets.
  • The generated samples can then be used to train and improve speech recognition models for these languages and dialects, promoting linguistic diversity and reducing language barriers in speech technology.

Approach

This problem stood out to us more than any other problem in any track. Our motto from the start was: A Voice for Everyone. The project aims to support low-resource languages such as Tamil, Telugu, Kannada, and many more. There is also an option to change the accent.

We know hackathons are a bit messy, but we have tried to keep our approach as simple as possible. Glancing through our solution, you'll find clean code. We believe that alone is a good selling point 😉

Solution

We built Voce per Tutti, which translates to 'A Voice For Everyone'. It provides Text-To-Speech (Speech Synthesis), Text-To-Text (Translation), and Speech-To-Text (Automatic Speech Recognition). The application is written entirely in Python and built on Gradio and other packages. The code is clearly documented to ensure better readability (and hence better software). The application is fully deployed. Check out the related links.

  • Check out the live demo(deployed at HuggingFace Spaces) here
  • Website for the project (deployed at Firebase) here
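
To make the three-task layout described above more concrete, here is a minimal sketch of how such a tabbed Gradio app could be wired together. It is not the code from this repository: the names `LANGUAGES`, `Task`, `TASKS`, `language_code`, and `build_demo` are hypothetical, the language table is only the subset named in this README, and the model backends are assumed to be supplied by the caller as plain Python functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# ISO 639-1 codes for the languages named in this README.
# Hypothetical subset; the real app may support more.
LANGUAGES: Dict[str, str] = {"Tamil": "ta", "Telugu": "te", "Kannada": "kn"}


@dataclass(frozen=True)
class Task:
    label: str        # tab title shown in the UI
    input_kind: str   # Gradio component shortcut: "text" or "audio"
    output_kind: str  # Gradio component shortcut: "text" or "audio"


# The three features listed in the Solution section.
TASKS: Dict[str, Task] = {
    "tts": Task("Text-To-Speech", "text", "audio"),
    "translate": Task("Text-To-Text", "text", "text"),
    "asr": Task("Speech-To-Text", "audio", "text"),
}


def language_code(name: str) -> str:
    """Map a display name such as 'Tamil' to its ISO 639-1 code."""
    try:
        return LANGUAGES[name]
    except KeyError:
        raise ValueError(f"unsupported language: {name!r}") from None


def build_demo(backends: Dict[str, Callable]):
    """Build one tab per task.

    `backends` maps a task key ("tts", "translate", "asr") to a function
    taking (user_input, language_name). These functions are assumed to
    wrap whatever models the app uses; none are provided here.
    """
    import gradio as gr  # imported lazily so the helpers above work without it

    interfaces = [
        gr.Interface(
            fn=backends[key],
            inputs=[task.input_kind, gr.Dropdown(choices=list(LANGUAGES))],
            outputs=task.output_kind,
            title=task.label,
        )
        for key, task in TASKS.items()
    ]
    return gr.TabbedInterface(interfaces, [t.label for t in TASKS.values()])
```

With real backend functions in hand, something like `build_demo(backends).launch()` would start the app locally. Keeping the task table separate from the UI code means a new task or language can be added without touching the layout logic.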
