AyishaR/Spokendigit

Spoken digit recognition

The dataset is a subset of the TensorFlow Speech Commands dataset, which also includes recordings other than the digits 0–9.

The project has three approaches to classifying the recordings:

  1. Logistic Regression using five extracted features - 76.19% accuracy.
  2. CNN using Mel spectrogram - 95.81% accuracy.
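As a rough illustration of the logistic-regression approach, the sketch below trains a multinomial (softmax) logistic regression on five-dimensional feature vectors using only the Python standard library. The synthetic data, the function names (`train`, `make_synthetic`, and so on), and the hyperparameters are assumptions made for this example. The notebooks may use different features, tooling, and settings, and the accuracy printed here is unrelated to the figures reported above.

```python
import math
import random


def softmax(logits):
    """Convert a list of raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, biases, x):
    """Class probabilities for one feature vector x."""
    logits = [sum(w * v for w, v in zip(row, x)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)


def train(features, labels, num_classes, epochs=100, lr=0.5):
    """Fit softmax regression with full-batch gradient descent."""
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    n = len(features)
    for _ in range(epochs):
        grad_w = [[0.0] * dim for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        for x, y in zip(features, labels):
            probs = predict_proba(weights, biases, x)
            for c in range(num_classes):
                # Gradient of cross-entropy w.r.t. the logit of class c.
                err = probs[c] - (1.0 if c == y else 0.0)
                grad_b[c] += err
                for j in range(dim):
                    grad_w[c][j] += err * x[j]
        for c in range(num_classes):
            biases[c] -= lr * grad_b[c] / n
            for j in range(dim):
                weights[c][j] -= lr * grad_w[c][j] / n
    return weights, biases


def accuracy(weights, biases, features, labels):
    """Fraction of samples whose most probable class matches the label."""
    hits = 0
    for x, y in zip(features, labels):
        probs = predict_proba(weights, biases, x)
        hits += probs.index(max(probs)) == y
    return hits / len(labels)


def make_synthetic(num_samples=200, num_classes=10, dim=5, noise=0.05, seed=0):
    """Hypothetical stand-in data: one noisy cluster per digit class."""
    rng = random.Random(seed)
    features, labels = [], []
    for i in range(num_samples):
        label = i % num_classes
        angle = 2 * math.pi * label / num_classes
        center = [math.cos(angle), math.sin(angle)] + [0.0] * (dim - 2)
        features.append([m + rng.gauss(0.0, noise) for m in center])
        labels.append(label)
    return features, labels


if __name__ == "__main__":
    X, y = make_synthetic()
    W, b = train(X, y, num_classes=10)
    print(f"training accuracy on synthetic data: {accuracy(W, b, X, y):.2%}")
```

To try it on real data, replace `make_synthetic` with code that loads one five-value feature row and one digit label per recording.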

There are five .ipynb files:

  • Feature extraction - Extracts the CSV files and features used by the three approaches.
  • Feature visualization - Plots the features for two examples from each class.
  • Spokendigit-Five features - Implementation of logistic regression using five extracted features.
  • Spokendigit-CNNs - Implementation of CNN using Mel spectrogram.
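For readers unfamiliar with the Mel spectrogram used as the CNN input, the sketch below shows the mel scale that underlies it: frequencies are spaced evenly in mels, so the bands are narrow at low frequencies and wide at high ones. It uses the common 2595 · log10(1 + f/700) conversion. The function names, the band count, and the 8000 Hz upper limit (the Nyquist frequency if the audio is sampled at 16 kHz) are assumptions for illustration, and the notebooks may compute their spectrograms with a library and different parameters.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (2595 * log10(1 + f / 700))."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(num_bands, f_min, f_max):
    """Return num_bands + 2 frequencies (Hz) equally spaced on the mel scale.

    Consecutive triples of these edges give the lower edge, peak, and upper
    edge of each triangular filter in a mel filterbank.
    """
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (num_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(num_bands + 2)]


if __name__ == "__main__":
    # Hypothetical settings: 8 bands covering 0 Hz to 8000 Hz.
    for edge in mel_band_edges(8, 0.0, 8000.0):
        print(f"{edge:8.1f} Hz")
```

Applying such a filterbank to each frame of a short-time power spectrum, then taking the logarithm, yields the image-like array that a CNN can classify.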

Medium article - Torch: Spoken digits recognition from features to model.
