Urban Sound 8K Classification Using CNN
-
UrbanSound8K Dataset: UrbanSound8K is a widely used benchmark for environmental sound classification. It contains 8,732 labeled clips of urban sounds across 10 classes and can be accessed through the official website.
-
Understanding Dataset Structure and Feature Extraction:
- Regardless of the dataset used, it's crucial to comprehend its structure and methods for extracting necessary features.
-
Downloading UrbanSound8K Dataset:
- The UrbanSound8K dataset can be downloaded from this link. The download is a compressed tar archive of approximately 6 GB.
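Unpacking the archive can be sketched with Python's standard `tarfile` module. The archive name `UrbanSound8K.tar.gz` and the destination folder `data` are assumptions for illustration; adjust them to match the file you actually downloaded.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a tar archive (compressed or not) into dest_dir and return that directory."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # "r:*" lets tarfile detect the compression scheme automatically.
    with tarfile.open(archive_path, "r:*") as archive:
        archive.extractall(dest)
    return dest


if __name__ == "__main__":
    # Hypothetical file name; use the actual name of the downloaded archive.
    archive = Path("UrbanSound8K.tar.gz")
    if archive.exists():
        out = extract_archive(archive, "data")
        print(sorted(p.name for p in out.iterdir()))
```

Only extract archives obtained from a source you trust, since a tar file can contain arbitrary paths.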
-
Dataset Contents:
- Upon extraction, the dataset comprises two main folders: 'audio' and 'metadata'.
-
Audio Folder:
- The 'audio' folder contains 10 subfolders named fold1 through fold10, each holding roughly 800 to 1,000 audio files. Each clip lasts at most 4 seconds.
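A quick way to check the extracted 'audio' folder is to count the clips in each fold. The sketch below assumes the clips are `.wav` files and that the folder lives at `UrbanSound8K/audio`; both the path and the helper name `count_clips_per_fold` are illustrative.

```python
from pathlib import Path


def count_clips_per_fold(audio_dir):
    """Return {fold_name: number_of_wav_files} for every foldN subfolder."""
    counts = {}
    for fold_dir in Path(audio_dir).iterdir():
        name = fold_dir.name
        # Keep only directories named "fold" followed by digits.
        if fold_dir.is_dir() and name.startswith("fold") and name[4:].isdigit():
            counts[name] = sum(1 for _ in fold_dir.glob("*.wav"))
    # Sort numerically so fold10 comes after fold9 rather than after fold1.
    return dict(sorted(counts.items(), key=lambda kv: int(kv[0][4:])))


if __name__ == "__main__":
    audio_dir = Path("UrbanSound8K/audio")  # assumed extraction path
    if audio_dir.exists():
        for fold, n in count_clips_per_fold(audio_dir).items():
            print(f"{fold}: {n} clips")
```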
-
Metadata Folder:
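- The 'metadata' folder contains a CSV file (UrbanSound8K.csv) with one row per clip. Its columns include slice_file_name (the audio file name), fold, classID (the numeric ID of the label), class (the label name), and salience, among others.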
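The metadata can be read with Python's standard `csv` module and used to locate each clip on disk. This is a minimal sketch: the column names are assumed to follow the published UrbanSound8K.csv layout (`slice_file_name`, `fold`, `classID`, `class`), the in-memory rows are illustrative stand-ins for the real file, and `read_metadata` and `clip_path` are hypothetical helper names.

```python
import csv
import io
from pathlib import Path

# Illustrative stand-in for metadata/UrbanSound8K.csv (assumed column layout).
SAMPLE_CSV = """slice_file_name,fsID,start,end,salience,fold,classID,class
100032-3-0-0.wav,100032,0.0,0.317551,1,5,3,dog_bark
100263-2-0-117.wav,100263,58.5,62.5,1,5,2,children_playing
"""


def read_metadata(file_obj):
    """Parse the metadata CSV into a list of dicts keyed by column name."""
    return list(csv.DictReader(file_obj))


def clip_path(row, audio_dir):
    """Build the path to a clip from its fold number and file name."""
    return Path(audio_dir) / f"fold{row['fold']}" / row["slice_file_name"]


if __name__ == "__main__":
    for row in read_metadata(io.StringIO(SAMPLE_CSV)):
        print(row["class"], int(row["classID"]), clip_path(row, "audio"))
```

For the real dataset, pass an opened file instead of the in-memory string, for example `open("UrbanSound8K/metadata/UrbanSound8K.csv", newline="")`, where the path is again an assumption about where the archive was extracted.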
-
Detailed Description:
- More detailed information about the dataset structure and its contents can be found here.
- Meyda
- pyAudioAnalysis
- Speech Recognition with Deep Learning
- Urban Sound Classification Part 1
- Audio & Voice Processing with Deep Learning
- Librosa Library: This library can be installed using the following commands:
    pip install librosa
    pip install ffmpeg-python
Librosa reads an audio file and returns its samples as an array of amplitude values. For instance, a 4-second audio file loaded at a sampling rate of 22050 Hz (Librosa's default) yields an array of size 88200 (4 × 22050), with each element representing one amplitude sample.
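The sample-count arithmetic for Librosa can be sketched as follows. `expected_num_samples` and `load_clip` are hypothetical helper names; `load_clip` wraps `librosa.load`, which resamples to the requested rate and returns the samples together with the sampling rate. The example file path is an assumption.

```python
from pathlib import Path


def expected_num_samples(duration_s, sample_rate=22050):
    """Number of samples in a mono clip of the given duration."""
    return int(round(duration_s * sample_rate))


def load_clip(path, sample_rate=22050):
    """Load an audio file as a mono float array resampled to sample_rate."""
    # Imported lazily so the arithmetic helper works without Librosa installed.
    import librosa

    samples, sr = librosa.load(path, sr=sample_rate, mono=True)
    return samples, sr


if __name__ == "__main__":
    print(expected_num_samples(4.0))  # 4 s * 22050 Hz = 88200 samples

    clip = Path("UrbanSound8K/audio/fold5/100032-3-0-0.wav")  # assumed path
    if clip.exists():
        samples, sr = load_clip(clip)
        print(samples.shape, sr)
```

Clips shorter than 4 seconds produce proportionally shorter arrays, so a fixed-size model input needs padding or truncation.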