# Audio-Book-Corpus for European Languages

The Audio Book Corpus (ABC) project was developed to aid linguistics researchers working on text-to-speech, for purely academic purposes. In its current form, the corpus consists of approximately 200 minutes of German speech data. Besides German, we are also developing corpora for Portuguese and Italian. Future versions of the corpus are intended to cover most European languages, such as French, Spanish, Czech, Dutch, Polish, and Romanian.

CORPUS DETAILS

The ABC project consists of three modules. The speech data is in WAV file format and is taken from LibriVox (https://librivox.org/). LibriVox provides free public-domain audiobooks, which can be used for academic linguistic research.
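As a minimal sketch of how the WAV data could be inspected, the snippet below sums the duration of all `.wav` files under a directory using only the Python standard library. The function names and the idea of passing a corpus root directory are assumptions made for illustration; they are not part of the project's code.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path) -> float:
    """Return the duration of one WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        # Duration = number of audio frames / frames per second.
        return wav.getnframes() / wav.getframerate()


def corpus_duration_minutes(root) -> float:
    """Sum the durations of all .wav files below `root` (hypothetical layout)."""
    total_seconds = sum(wav_duration_seconds(p) for p in Path(root).rglob("*.wav"))
    return total_seconds / 60.0
```

Calling `corpus_duration_minutes` on a local copy of the corpus would give a quick check against the stated size of roughly 200 minutes.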

TECHNIQUE FOR ANNOTATION

After removing noise from the audio data, we used semi-automatic annotation based on deep learning and fuzzy matching. About 20% of the corpus was annotated manually; using deep learning techniques, we then trained a model to validate the remaining 80% of the speech data. To support this, we built a small GUI (in Python) to visualize the audio files together with the annotated text and to check that the text matches the speech signal.
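The fuzzy matching step can be illustrated with a short sketch. This is not the project's implementation: it assumes that a recognized transcript of a segment is compared against the book text, uses `difflib.SequenceMatcher` from the standard library as a stand-in similarity measure, and the function names and the `0.85` threshold are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase the text and replace punctuation with single spaces."""
    kept = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(kept.split())


def match_score(reference: str, hypothesis: str) -> float:
    """Similarity in [0, 1] between the book text and a recognized transcript."""
    return SequenceMatcher(None, normalize(reference), normalize(hypothesis)).ratio()


def is_validated(reference: str, hypothesis: str, threshold: float = 0.85) -> bool:
    """Accept a segment when the similarity reaches the (assumed) threshold."""
    return match_score(reference, hypothesis) >= threshold
```

Segments that fall below the threshold would then be candidates for manual review, for example in the annotation GUI.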

CONTRIBUTORS/CORRESPONDENCE

Ajinkya Kulkarni (ajinkyakulkarni14@gmail.com)

LICENSE FOR USAGE

This project is licensed under the GNU GPL, which gives users:

 the freedom to use the software for any purpose,

 the freedom to change the software to suit your needs,

 the freedom to share the software with your friends and neighbors, and

 the freedom to share the changes you make.

We recommend giving due acknowledgement to the authors, Ajinkya Kulkarni and Parth Gargava, when using the corpus for research.

