anujdutt9/Audio-Scene-Classification

Audio Scene Classification

This repository contains the code for the project "Audio Scene Classification". The project uses audio from the nearby environment to classify what is in a scene, without using a visual component.

PROJECT STATUS: Ongoing

To convert WAVE audio files from 44.1 or 48 kHz to 16 kHz PCM WAVE files, run the following command from the folder containing the audio files:

for f in *.wav; do
  ffmpeg -i "$f" -ar 16000 "path_to_destination_folder/${f}"
done
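The same batch conversion can be sketched in Python, which may be convenient when the rest of the pipeline is already in Python. This is a minimal sketch, not code from this repository: the function names `build_ffmpeg_command` and `convert_folder` are hypothetical, and it assumes `ffmpeg` is available on the PATH.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src, dst_dir, sample_rate=16000):
    """Return the ffmpeg argument list that resamples one WAVE file.

    Hypothetical helper; mirrors `ffmpeg -i <src> -ar 16000 <dst_dir>/<name>`.
    """
    src = Path(src)
    dst = Path(dst_dir) / src.name
    return ["ffmpeg", "-i", str(src), "-ar", str(sample_rate), str(dst)]


def convert_folder(src_dir, dst_dir, sample_rate=16000):
    """Resample every .wav file in src_dir into dst_dir (needs ffmpeg on PATH)."""
    Path(dst_dir).mkdir(parents=True, exist_ok=True)
    for wav in sorted(Path(src_dir).glob("*.wav")):
        # Passing a list (not a shell string) keeps file names with spaces intact.
        subprocess.run(build_ffmpeg_command(wav, dst_dir, sample_rate), check=True)
```

Calling `convert_folder(".", "path_to_destination_folder")` would then behave like the shell loop above.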

Requirements

1. Python 3.6

2. Librosa 0.6 [Audio Processing Library]

pip3 install librosa --upgrade

3. Matplotlib

pip3 install matplotlib --upgrade

4. Keras

pip3 install keras --upgrade

5. Tensorflow

pip3 install tensorflow --upgrade

or

pip3 install tensorflow-gpu --upgrade

NOTE: Tensorflow GPU requires CUDA and cuDNN.

6. Pickle

Pickle is part of the Python standard library, so no installation is required.

7. TQDM [for Progressbar]

pip3 install tqdm --upgrade

Dataset

The dataset I am using for this project is the "UrbanSound dataset".

Download the dataset from the link below and place it inside the dataset folder.

https://serv.cusp.nyu.edu/projects/urbansounddataset/

Extracted Audio Features

The main extracted features from the audio are:

a). Mel Spectrogram: Mel-scaled Power Spectrogram

b). MFCC: Mel-Frequency Cepstral Coefficients

c). Chroma STFT: Chromagram computed from a waveform or power spectrogram

d). Spectral Contrast: Contrast between spectral peaks and valleys in each frequency sub-band

e). Tonnetz: Tonal centroid features (tonnetz)

Following are the extracted features for some audio files:

1. Air Conditioner Audio Features

2. Car Horn Audio Features

3. Children Playing Audio Features

4. Dog Barking Audio Features

5. Idle Engine Audio Features
