Spoken language identification with deep learning
Read more in the following blog posts:
- About TopCoder contest and our CNN-based solution implemented in Caffe (October 2015)
- About combining CNN and RNN using Theano/Lasagne (June 2016)
Theano/Lasagne models are here. The basic steps to run them are:
- Download the dataset from here or use your own dataset.
- Create spectrograms for the recordings using augment_data.py, which also augments the data by randomly perturbing the spectrograms and cropping a random 9 s interval from each recording.
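The random 9 s crop mentioned above can be illustrated with a minimal sketch. This is not the code in augment_data.py: the function name `random_crop`, the (frequency, time) array layout, and the `frames_per_second` value are all assumptions made for illustration.

```python
import numpy as np

def random_crop(spec, crop_seconds=9.0, frames_per_second=100, rng=None):
    """Return a random time window from a (frequency, time) spectrogram.

    frames_per_second is a hypothetical resolution; use whatever
    your spectrograms actually have.
    """
    if rng is None:
        rng = np.random.default_rng()
    width = int(round(crop_seconds * frames_per_second))
    n_frames = spec.shape[1]
    if n_frames <= width:
        # Recording is not longer than the window: return it unchanged.
        return spec
    # Pick a start column so the window fits entirely inside the recording.
    start = int(rng.integers(0, n_frames - width + 1))
    return spec[:, start:start + width]
```

Passing a seeded `np.random.default_rng(seed)` as `rng` makes the crops reproducible across runs.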
- Create listfiles for the training set and the validation set. Each row of a listfile describes one example and has 2 values separated by a comma: the first is the name of the example, the second is its label (counting starts from 0). A typical listfile will look like this.
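The listfile format described above (one `name,label` row per example, labels starting at 0) can be written and read back with a short sketch like the following. The helper names `write_listfile` and `read_listfile` and the example names in the usage note are hypothetical, not part of the repository.

```python
import csv

def write_listfile(path, examples):
    """Write (name, label) pairs as comma-separated rows, one per example."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for name, label in examples:
            writer.writerow([name, int(label)])

def read_listfile(path):
    """Read a listfile back into a list of (name, integer label) pairs."""
    with open(path, newline="") as f:
        return [(name, int(label)) for name, label in csv.reader(f)]
```

For example, `write_listfile("trainEqual.csv", [("rec_0001", 0), ("rec_0002", 3)])` would produce two rows, `rec_0001,0` and `rec_0002,3`; both the file name and the example names here are made up.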
- Change the png_folder and listfile paths in