The AMMI 2020 Speech Recognition course was taught by Gabriel Synnaeve, Neil Zeghidour, Emmanuel Dupoux, Laurent Besacier and Morgane Rivera.
This repo contains two main recordings in the Hausa language: about 2h of read speech, and raw speech describing the Frog Story. Both were collected using the mobile app lig-aikuma. (Source: https://lig-aikuma.imag.fr/download/)
Caveat: This recording consists of approximately 2h of speech read from text (source: https://github.com/afrisauti/hausa_text_corpus) containing long sentences (don't be surprised to find a prompt of up to 7 lines). It is very useful for two groups of researchers:
- Researchers specifically looking for recorded speech of very long sentences.
- Researchers interested in the effect of long sentences on their speech or language models.
Although short sentences are generally preferred for speech recordings, this long-sentence recording has its own role to play in research that requires testing the robustness of models. Because the sentences are so long, a single recorded prompt can last anywhere from one minute to 5 or 7 minutes. The data is valuable for its uniqueness and its research prospects.
This recording contains raw speech in Hausa, produced by describing the events depicted in the series of images that make up the story. (Source: https://lig-aikuma.imag.fr/wp-content/uploads/2018/07/FrogStory.zip)
- The data may contain background noise, such as the hum of the air conditioner or the soft chatter of flatmates.
- In case my voice sounds sleepy: the recording took place in the middle of the night (I was hoping no one would be awake).
- The recording was done shortly after minor dental surgery (in case I sound funny).
- The recorded language, Hausa, is the language of my region of birth, not my mother tongue. Apologies to native Hausa speakers for any mispronounced words and for every injustice done to the stress patterns.
Tunde Ajayi
Machine Intelligence Student, African Masters of Machine Intelligence (AMMI), African Institute for Mathematical Sciences (AIMS) Ghana.
Many thanks to the following contributors:
Dattijo Murtala Makama - for providing a native-speaker version of the Frog Story.
Abubakr Babiker - for data preprocessing.
Laurent Besacier - for initiating the project.