Audio Scene Classifier

The main goal of this project is to present a performance comparison between different audio features. We will use the same model, based on a convolutional neural network, and train it with each feature type as input, surveying how the choice of features affects classification performance. The project was made for the Advanced Signal Processing Laboratory course at Tampere University of Technology (TUT).

Feature extraction

We will train a separate network with each of the following features and present a comparison.

  1. Mel spectrogram: Mel-scaled power spectrogram
  2. MFCC: Mel-frequency cepstral coefficients
  3. Chroma STFT: Chromagram from a waveform or power spectrogram
  4. Chroma CQT: Constant-Q chromagram
  5. Spectral contrast: Octave-based spectral contrast

All of the proposed features are spectral-based; to extract them we will use the Python library librosa.
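
The snippet below is a minimal sketch of how these five features can be extracted with librosa from a stereo recording. The file path, sample rate, frame parameters, and feature dimensions are illustrative assumptions and may differ from the values actually used in this repository.

```python
# Minimal sketch: extract the five feature types with librosa.
# Parameters (sr, n_fft, hop_length, n_mels, n_mfcc) are placeholders.
import librosa
import numpy as np

def extract_features(path, sr=44100, n_fft=2048, hop_length=1024):
    # Load the stereo signal without downmixing; y has shape (2, n_samples).
    y, sr = librosa.load(path, sr=sr, mono=False)

    features = {}
    for ch, name in zip(y, ("left", "right")):
        mel = librosa.feature.melspectrogram(
            y=ch, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=128)
        features[name] = {
            # Mel-scaled power spectrogram, converted to dB.
            "mel": librosa.power_to_db(mel, ref=np.max),
            # Mel-frequency cepstral coefficients.
            "mfcc": librosa.feature.mfcc(y=ch, sr=sr, n_mfcc=20),
            # Chromagram computed from the STFT power spectrogram.
            "chroma_stft": librosa.feature.chroma_stft(
                y=ch, sr=sr, n_fft=n_fft, hop_length=hop_length),
            # Constant-Q chromagram.
            "chroma_cqt": librosa.feature.chroma_cqt(
                y=ch, sr=sr, hop_length=hop_length),
            # Octave-based spectral contrast.
            "contrast": librosa.feature.spectral_contrast(
                y=ch, sr=sr, n_fft=n_fft, hop_length=hop_length),
        }
    return features
```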

[Todo] Detail each transformation.

Convolutional Neural Network

Our CNN model is based on this reference, and is structured as follows:

[Figure: CNN architecture diagram]

For each stereo audio signal, we pre-process the left and right channels independently, extracting the features from each. The two feature maps are the inputs to two independent CNN branches, as described in the previous figure; their outputs are then concatenated and the category is estimated through a softmax regression.
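
The following is a minimal sketch of this two-branch architecture written with tf.keras. The framework choice, layer sizes, kernel shapes, input dimensions, and number of classes are illustrative assumptions, not the exact configuration of this repository.

```python
# Minimal sketch of the two-branch CNN: one convolutional branch per
# channel, concatenation, then a softmax classifier. All hyperparameters
# below are assumptions for illustration only.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(input_shape=(128, 431, 1), n_classes=15):
    def branch(name):
        # One independent convolutional branch per audio channel.
        inp = layers.Input(shape=input_shape, name=f"{name}_features")
        x = layers.Conv2D(32, (3, 3), activation="relu", padding="same")(inp)
        x = layers.MaxPooling2D((2, 2))(x)
        x = layers.Conv2D(64, (3, 3), activation="relu", padding="same")(x)
        x = layers.MaxPooling2D((2, 2))(x)
        x = layers.Flatten()(x)
        return inp, x

    left_in, left_out = branch("left")
    right_in, right_out = branch("right")

    # Concatenate both branches and estimate the scene class with softmax.
    merged = layers.concatenate([left_out, right_out])
    merged = layers.Dense(128, activation="relu")(merged)
    out = layers.Dense(n_classes, activation="softmax")(merged)

    return models.Model(inputs=[left_in, right_in], outputs=out)

model = build_model()
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```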
