This is a model for identifying the language spoken in a short audio segment.
To install the required libraries (tested on Ubuntu 17.10), run:
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Convert an audio file to a spectrogram:
python data/dataset_gen.py -z speech.wav -o .
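For context, the conversion roughly amounts to rendering a spectrogram of the audio as an image. The sketch below illustrates the idea with librosa and matplotlib; it is not the implementation in data/dataset_gen.py, and the packages, parameters, and file names used here are assumptions:

# Illustrative sketch only -- the actual conversion is done by data/dataset_gen.py.
# Assumes numpy, librosa and matplotlib are available; parameters are arbitrary examples.
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

def wav_to_spectrogram(wav_path, png_path, n_mels=128):
    # Load the audio and compute a mel-scaled spectrogram in decibels.
    signal, sample_rate = librosa.load(wav_path, sr=None)
    mel = librosa.feature.melspectrogram(y=signal, sr=sample_rate, n_mels=n_mels)
    mel_db = librosa.power_to_db(mel, ref=np.max)

    # Render the spectrogram as an image with no axes or margins.
    fig = plt.figure(figsize=(4, 4), frameon=False)
    ax = fig.add_axes([0, 0, 1, 1])
    ax.axis("off")
    librosa.display.specshow(mel_db, sr=sample_rate, ax=ax)
    fig.savefig(png_path, dpi=100)
    plt.close(fig)

wav_to_spectrogram("speech.wav", "speech.png")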
Obtain a prediction using a pre-trained model:
python main.py --model-dir your-trained-model/ --params your-trained-model/params.json --model combo --predict speech.png
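To classify many recordings, the two commands above can also be driven from a short Python loop. This is only a convenience sketch around the documented command line; the recordings/ folder is hypothetical, and it assumes each spectrogram is written to the current directory as <name>.png:

# Convenience sketch: run the documented commands for every .wav file in a folder.
# The recordings/ directory is hypothetical; the .png naming is an assumption.
import glob
import os
import subprocess

for wav in sorted(glob.glob("recordings/*.wav")):
    subprocess.run(["python", "data/dataset_gen.py", "-z", wav, "-o", "."], check=True)
    png = os.path.splitext(os.path.basename(wav))[0] + ".png"
    subprocess.run(["python", "main.py",
                    "--model-dir", "your-trained-model/",
                    "--params", "your-trained-model/params.json",
                    "--model", "combo",
                    "--predict", png], check=True)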
Prepare a dataset:
- Place your spectrograms in a folder
- Create a training set CSV file containing "Filename,Language" pairs
- Create an evaluation set CSV file (same format as the training set); see the sketch after this list for one way to generate both files
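One way to produce the two CSV files is sketched below. The directory layout, the language-in-filename convention, the header row, and the 90/10 split are all assumptions; adapt them to how your data is organised:

# Hypothetical helper for building the CSV files; adjust paths and labels to your data.
# Assumes spectrogram file names start with the language code, e.g. "en_0001.png",
# and that each CSV has a "Filename,Language" header row followed by one pair per line.
import csv
import os
import random

image_dir = "your-data/"
rows = []
for name in sorted(os.listdir(image_dir)):
    if name.endswith(".png"):
        language = name.split("_")[0]  # assumed naming convention
        rows.append((name, language))

# Shuffle and hold out 10% of the examples for evaluation.
random.seed(0)
random.shuffle(rows)
split = int(len(rows) * 0.9)

for path, subset in [("your-data/train-set.csv", rows[:split]),
                     ("your-data/eval-set.csv", rows[split:])]:
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Filename", "Language"])
        writer.writerows(subset)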
Train the model:
python main.py --model-dir your-trained-model/ --params your-trained-model/params.json --model combo --image-dir your-data/ --train-set your-data/train-set.csv --eval-set your-data/eval-set.csv
This project was developed by Rimvydas Naktinis during Pi School's AI programme in Fall 2017.