Spoken digit recognition

The dataset is a subset of the TensorFlow Speech Commands dataset, which also contains recordings other than the digits 0–9.
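
In the full Speech Commands dataset, each spoken word is stored in its own folder (`zero` through `nine`, plus words such as `yes` and `no`, and a `_background_noise_` folder). Below is a minimal sketch of selecting only the digit recordings by folder name. It assumes that standard `<root>/<word>/<file>.wav` layout. `digit_label` and `select_digits` are hypothetical helpers, and the project's own subset may have been built differently.

```python
from pathlib import PurePosixPath

# Folder names used for the ten digits in the Speech Commands dataset.
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]
WORD_TO_DIGIT = {word: i for i, word in enumerate(DIGIT_WORDS)}

def digit_label(path):
    """Return 0-9 for a digit recording, or None for any other folder.

    Assumes the layout <root>/<word>/<file>.wav, so the parent folder is the label.
    """
    return WORD_TO_DIGIT.get(PurePosixPath(path).parent.name)

def select_digits(paths):
    """Keep only digit recordings, paired with their integer label."""
    selected = []
    for path in paths:
        label = digit_label(path)
        if label is not None:
            selected.append((path, label))
    return selected

paths = [
    "speech_commands/seven/0a7c2a8d_nohash_0.wav",
    "speech_commands/yes/0a7c2a8d_nohash_0.wav",
    "speech_commands/_background_noise_/white_noise.wav",
    "speech_commands/zero/1b2c3d4e_nohash_1.wav",
]
digits = select_digits(paths)
print(digits)
```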

The project compares the following approaches to classifying the recordings:

Logistic regression using five extracted features - 76.19% accuracy.
CNN using Mel spectrograms - 95.81% accuracy.
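
The first approach is a multinomial (softmax) logistic regression over a five-dimensional feature vector per recording. Below is a minimal sketch of that model, trained with full-batch gradient descent on the cross-entropy loss. The README does not say which five features are used, so the sketch uses synthetic 5-D clusters as a stand-in. It is written in plain NumPy rather than Torch, and the learning rate and epoch count are arbitrary illustrative choices.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_softmax_regression(X, y, n_classes=10, lr=0.5, epochs=1000):
    """Full-batch gradient descent on the mean cross-entropy loss."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]            # one-hot targets, shape (n, n_classes)
    for _ in range(epochs):
        P = softmax(X @ W + b)          # predicted class probabilities
        G = (P - Y) / n                 # gradient of the mean loss w.r.t. the logits
        W -= lr * X.T @ G
        b -= lr * G.sum(axis=0)
    return W, b

def predict(X, W, b):
    return np.argmax(X @ W + b, axis=1)

# Synthetic stand-in data (assumption): 10 digit classes, 50 recordings each,
# 5 features per recording.
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(10, 5))
y = np.repeat(np.arange(10), 50)
X = centers[y] + rng.normal(scale=0.5, size=(500, 5))
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each feature

W, b = train_softmax_regression(X, y)
accuracy = np.mean(predict(X, W, b) == y)
print(f"training accuracy: {accuracy:.2%}")
```

Standardizing the features keeps a single learning rate stable across features with very different scales. The accuracy printed here is measured on the training data. A figure comparable to the ones above would need a held-out test split.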

The project contains the following .ipynb files:

Feature extraction - Extracts the features used by the classification approaches and generates the necessary CSV files.
Feature visualization - The features are plotted for two examples in each class.
Spokendigit-Five features - Implementation of logistic regression using five extracted features.
Spokendigit-CNNs - Implementation of CNN using Mel spectrogram.
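
The CNN approach turns each recording into a Mel spectrogram and treats it as a single-channel image. Below is a minimal NumPy sketch of a log-Mel spectrogram with these assumptions:

* 16 kHz mono audio
* 512-point FFT with a Hann window
* 160-sample (10 ms) hop
* 40 Mel bands
* the HTK-style formula mel = 2595 * log10(1 + f / 700)

These parameters are illustrative, not taken from the notebooks, which may rely on an audio library with different defaults. `mel_filterbank` and `log_mel_spectrogram` are hypothetical helper names.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to sr / 2."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):       # rising edge of the triangle
            fb[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):      # falling edge of the triangle
            fb[m - 1, k] = (right - k) / (right - center)
    return fb

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=160, n_mels=40):
    """Return an (n_mels, n_frames) log-Mel spectrogram; needs len(signal) >= n_fft."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2     # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T    # (n_frames, n_mels)
    return np.log(mel + 1e-10).T                         # small offset avoids log(0)

# Usage: a 1-second 1 kHz tone, the same length as a Speech Commands clip.
sr = 16000
t = np.arange(sr) / sr
spec = log_mel_spectrogram(np.sin(2 * np.pi * 1000 * t), sr=sr)
print(spec.shape)
```

For a one-second clip at these settings the result has 40 Mel bands by 97 frames. Adding a channel axis (for example `spec[np.newaxis]`) gives the image-like input shape a 2-D convolutional layer expects.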

Medium article - Torch: Spoken digits recognition from features to model.