This repo contains my experiments with audio data processing and deep learning for speech recognition.
I will be working with some of the well-known datasets used in audio machine learning, as well as a few lesser-known sets. The following datasets will be used:
- TensorFlow Speech Recognition Challenge
- VOiCES
- LibriSpeech 960h
- Switchboard 300h
- Mozilla's Common Voice Dataset
- Speech Accent Archive
- Audio, Speech, and Vision Processing Lab Emotional Sound database (ASVP-ESD)
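
Most experiments on these datasets start from a time–frequency representation of the raw waveform. As a minimal sketch of that preprocessing step (using only NumPy, with a synthetic tone standing in for a real clip; the function name and parameters here are illustrative, not from this repo):

```python
import numpy as np

def spectrogram(y, n_fft=512, hop=128):
    """Magnitude spectrogram via a Hann-windowed short-time FFT."""
    window = np.hanning(n_fft)
    frames = [y[i:i + n_fft] * window
              for i in range(0, len(y) - n_fft + 1, hop)]
    # One row per frequency bin, one column per frame.
    return np.abs(np.fft.rfft(frames, axis=1)).T

# Synthetic 1-second, 16 kHz 440 Hz tone standing in for a dataset clip.
sr = 16000
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440.0 * t)

S = spectrogram(y)
print(S.shape)  # (n_fft // 2 + 1 frequency bins, number of frames)
```

In practice the notebooks would swap the synthetic tone for audio loaded from one of the datasets above and likely use mel-scaled features, but the windowed-FFT idea is the same.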
```
├── Data            <- CSV data and nested subfolders of audio data
│   └── ...
├── images          <- saved images from notebooks
│   └── ...
├── notebooks       <- more in-depth notebooks
│   ├── .ipynb      <-
│   ├── .ipynb      <-
│   ├── .ipynb      <-
│   └── .ipynb      <-
├── .gitattributes  <- specifies files for Git LFS to track
├── .gitignore      <- specifies files/directories for Git to ignore
└── README.md       <- top-level README
```