Solution for TensorFlow Speech Recognition Challenge on Kaggle (125th place, top 10%)

creafz/kaggle-speech-recognition

Solution for TensorFlow Speech Recognition Challenge on Kaggle (125th place, top 10%). I used mel-scaled spectrograms and mel-frequency cepstral coefficients as inputs for two NASNet-A Convolutional Neural Networks and then averaged their predictions.

Competition

In the competition, we were given the Speech Commands Dataset, which includes 65,000 one-second-long utterances of 30 short words recorded by thousands of different people. The goal is to assign one of the following 12 labels to each command: yes, no, up, down, left, right, on, off, stop, go, silence, unknown.
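The relationship between the 30 recorded words and the 12 output labels can be sketched as below. This is a minimal illustration, not code from this repository: `word_to_label` is a hypothetical helper, and it assumes that any word outside the ten target commands is treated as unknown.

```python
# The ten target command words listed in the competition description.
TARGET_WORDS = ["yes", "no", "up", "down", "left", "right", "on", "off", "stop", "go"]

# The full label set: ten commands plus the two special labels.
LABELS = TARGET_WORDS + ["silence", "unknown"]


def word_to_label(word):
    """Map a recorded word to one of the 12 labels (hypothetical helper).

    Assumption: every word that is not one of the ten target commands
    (and is not silence) falls into the "unknown" class.
    """
    word = word.lower()
    if word in TARGET_WORDS:
        return word
    if word == "silence":
        return "silence"
    return "unknown"


if __name__ == "__main__":
    for example in ["yes", "Stop", "bed", "silence"]:
        print(example, "->", word_to_label(example))
```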

My solution

I used PyTorch to train two NASNet-A Convolutional Neural Networks. The first network was trained on mel-scaled spectrograms, the second on mel-frequency cepstral coefficients. Then I averaged their predictions to make a final submission.
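The averaging step can be illustrated with a small sketch. It assumes each network outputs one probability per label and that the two outputs are combined with a plain arithmetic mean. Whether the repository averages probabilities or raw scores, and whether it weights the two models, is not stated here, so the function names below are hypothetical.

```python
LABELS = ["yes", "no", "up", "down", "left", "right",
          "on", "off", "stop", "go", "silence", "unknown"]


def average_predictions(probs_a, probs_b):
    """Return the element-wise mean of two per-class probability lists."""
    if len(probs_a) != len(probs_b):
        raise ValueError("both models must predict the same number of classes")
    return [(a + b) / 2 for a, b in zip(probs_a, probs_b)]


def predict_label(probs_a, probs_b, labels=LABELS):
    """Pick the label with the highest averaged probability."""
    averaged = average_predictions(probs_a, probs_b)
    best_index = max(range(len(averaged)), key=averaged.__getitem__)
    return labels[best_index]


if __name__ == "__main__":
    # Made-up outputs for one clip: the spectrogram model prefers "yes",
    # the MFCC model strongly prefers "no".
    spectrogram_probs = [0.6, 0.4] + [0.0] * 10
    mfcc_probs = [0.1, 0.9] + [0.0] * 10
    print(predict_label(spectrogram_probs, mfcc_probs))
```

Averaging lets a confident prediction from one model outweigh a marginal prediction from the other, which is the usual motivation for this kind of ensemble.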

Examples of mel-scaled spectrograms for speech commands:

[Image: Mel-scaled spectrograms for speech commands]
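For readers unfamiliar with the mel scale, the sketch below shows how frequencies in Hz map to mels and how the band edges of a mel filter bank are spaced. It uses the common HTK-style formula, which is only one of several conventions. The number of bands and the 0 to 8000 Hz range are illustrative assumptions, not the settings used in this repository.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_mels, f_min, f_max):
    """Return n_mels + 2 edge frequencies (Hz), evenly spaced on the mel scale."""
    low, high = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (high - low) / (n_mels + 1)
    return [mel_to_hz(low + i * step) for i in range(n_mels + 2)]


if __name__ == "__main__":
    # Assumed values for illustration: 40 bands covering 0-8000 Hz.
    edges = mel_band_edges(n_mels=40, f_min=0.0, f_max=8000.0)
    print([round(edge) for edge in edges[:5]], "...", round(edges[-1]))
```

Because the edges are evenly spaced in mels rather than Hz, the bands are narrow at low frequencies and wide at high frequencies, which gives a mel-scaled spectrogram finer resolution at the low end.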

Directory structure

The code expects the following directory structure:

input/
├── train/
│   ├── audio/
│   └── ...
├── test/
│   ├── audio/
│   └── ...
└── sample_submission.csv
code/
├── __init__.py
├── augmentation_transforms.py
├── custom_transforms.py
├── config.py
└── ...
predictions/
└── ...
saved_models/
└── ...
submissions/
└── ...
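Before running anything, it may help to confirm that the layout above is in place. The following is a small sketch, not part of the repository: `missing_paths` is a hypothetical helper, and it checks only the directories and files named explicitly in the tree.

```python
from pathlib import Path

# Directories and files named explicitly in the tree above.
EXPECTED_DIRS = [
    "input/train/audio",
    "input/test/audio",
    "code",
    "predictions",
    "saved_models",
    "submissions",
]
EXPECTED_FILES = [
    "input/sample_submission.csv",
    "code/config.py",
]


def missing_paths(root="."):
    """Return the expected paths that do not exist under root."""
    root = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    problems = missing_paths(".")
    if problems:
        print("Missing paths:")
        for path in problems:
            print("  " + path)
    else:
        print("Directory structure looks complete.")
```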

Requirements

To run the code

  • Adjust the config variables in config.py
  • Execute the run.sh file
