In this project we build a deep learning model that converts African-language (Amharic) speech to text.
The World Food Program wants to deploy an intelligent form that collects nutritional information on food bought and sold at markets in two different countries in Africa - Ethiopia and Kenya.
The design of this intelligent form requires selected people to install an app on their mobile phones; whenever they buy food, they use their voice to activate the app and register the list of items they just bought in their own language. The intelligent systems in the app are expected to live-transcribe the speech to text and organize the information in an easy-to-process way in a database.
Our responsibility is to build a deep learning model that is capable of transcribing speech to text. The model we produce should be accurate and robust against background noise. This project was built during the fourth week of the 10Academy Machine Learning training session.
- Install Required Python Modules

```shell
git clone https://github.com/10acad-group3/speech_recognition
cd speech_recognition
pip install -r requirements.txt
```
- Jupyter Notebook

```shell
cd notebooks
jupyter notebook
```
- Model Training UI (not implemented)

```shell
mlflow ui
```
- Dashboard (not implemented)

```shell
streamlit run app.py
```
The folder is tracked with DVC, so the files only appear after cloning and setting up the project locally. The sub-folder AMHARIC contains the training and testing files for our model. Both share the same file structure:
- `wav/` : folder containing all audio files
- `text` : file containing the metadata (audio file name and corresponding transcription)
- `spk2utt`, `trsTest.txt`, `utt2spk`, `wav.scp` : files provided with the dataset; they currently have no purpose in this project but could be used for future analysis.
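Since `spk2utt`, `utt2spk` and `wav.scp` follow the Kaldi data-preparation convention, the `text` file most likely maps an utterance id to its transcription, one utterance per line. A minimal sketch of loading it (the single-space separator and the function name are assumptions, not the repository's actual API):

```python
def load_transcriptions(path):
    """Parse a Kaldi-style `text` file with `<utt_id> <transcription>` per line."""
    transcripts = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # Split only on the first space: everything after it is the transcription.
            utt_id, _, text = line.partition(" ")
            transcripts[utt_id] = text
    return transcripts
```

The resulting dict can then be joined against the audio files in `wav/` by utterance id.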
- `1.0 preprocessing.ipynb` : notebook showing metadata generation, new features, data exploration, outlier removal, audio cleaning and text cleaning
- `1.0 acoustic_modeling_v2.ipynb` : notebook similar to `1.0 preprocessing.ipynb`, but with more analysis on audio visualization and text data
- `2.0 outliers.ipynb` : visualizes the effect of outlier removal on features of the dataset, then saves the outlier-cleaned file
- `3.0 speech_recognition.ipynb` : notebook showing how to tokenize, augment and generate data from the outlier-cleaned data
- `audio_visualization` : Google Colab notebook showing how to visualize an audio file using its waveform and spectrogram
- `4.0 acoustic_modeling` : in progress ...
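The tokenization step in `3.0 speech_recognition.ipynb` can be illustrated with a character-level tokenizer, a common choice for speech-to-text models over Amharic text. This is a hedged sketch, not the notebook's actual code; the class name and the convention of reserving id 0 for a CTC blank token are assumptions:

```python
class CharTokenizer:
    """Map characters to integer ids and back, e.g. for a CTC acoustic model."""

    def __init__(self, transcriptions):
        # Build the vocabulary from every character seen in the corpus.
        chars = sorted(set("".join(transcriptions)))
        # Reserve id 0 for the CTC blank token.
        self.char_to_id = {c: i + 1 for i, c in enumerate(chars)}
        self.id_to_char = {i: c for c, i in self.char_to_id.items()}

    def encode(self, text):
        return [self.char_to_id[c] for c in text]

    def decode(self, ids):
        return "".join(self.id_to_char[i] for i in ids)
```

Encoding and then decoding a string recovers it exactly, which makes the mapping easy to sanity-check during preprocessing.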
- `audio_vis.py` : helper class for visualizing and playing audio files
- `clean_audio.py` : helper class for cleaning audio files
- `config.py` : project configuration and file paths
- `file_handler.py` : helper class for reading files
- `log.py` : helper class for logging
- `script.py` : utility functions
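As an illustration of the kind of cleaning `clean_audio.py` could perform (the project aims for robustness against background noise), here is a minimal NumPy sketch that normalizes amplitude and trims leading/trailing silence. The function names and the threshold value are assumptions for illustration, not the repository's actual API:

```python
import numpy as np

def normalize(signal):
    """Scale a mono signal so its peak amplitude is 1.0."""
    peak = np.max(np.abs(signal))
    return signal / peak if peak > 0 else signal

def trim_silence(signal, threshold=0.02):
    """Drop leading and trailing samples whose amplitude is below `threshold`."""
    voiced = np.where(np.abs(signal) >= threshold)[0]
    if voiced.size == 0:
        return signal[:0]  # the whole clip is silence
    return signal[voiced[0]:voiced[-1] + 1]
```

Normalizing before thresholding keeps the silence cutoff meaningful across recordings made at different input gains.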