Skip to content

Latest commit

 

History

History
15 lines (11 loc) · 980 Bytes

README.md

File metadata and controls

15 lines (11 loc) · 980 Bytes

Spoken digit recognition

The dataset is a subset of the Tensorflow speech commands dataset that includes other sound recordings besides the digits 0–9.

The project has three approaches to classifying the recordings:

  1. Logistic Regression using five extracted features - 76.19% accuracy.
  2. CNN using Mel spectrogram - 95.81% accuracy.

There are five .ipynb files:

  • Feature extraction - The necessary CSV files and features used by the three approaches are extracted.
  • Feature visualization - The features are plotted for two examples in each class.
  • Spokendigit-Five features - Implementation of logistic regression using five extracted features.
  • Spokendigit-CNNs - Implementation of CNN using Mel spectrogram.

Medium article - Torch: Spoken digits recognition from features to model.