Audio_Scene_Classification

This repository provides an implementation of this paper.

In this paper, audio scene classification is performed using deep learning methods. The dataset is the LITIS-Rouen collection, which contains 3026 audio clips drawn from 19 different environments (classes), each 30 seconds long and sampled at 22050 Hz. Following the split defined with the dataset, 2419 of the 3026 clips are used for training and the remaining 607 for testing.

In general, machine-learning pipelines for audio (and similar signals) consist of preprocessing, feature extraction, and classification. For preprocessing, background-noise removal (to create an auxiliary channel) and the log-mel spectrogram (to expose as many features of the signal as possible) are used. A CNN, a GRU, and an attention mechanism extract the features, and a linear SVM is used in place of the softmax layer for classification.
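As a rough illustration, the log-mel spectrogram front end could be computed as in the sketch below (using librosa; the FFT size, hop length, and number of mel bands are illustrative assumptions, not values taken from this repository or the paper):

```python
import librosa

def log_mel_spectrogram(path, sr=22050, n_fft=2048, hop_length=512, n_mels=64):
    """Load an audio clip and compute its log-mel spectrogram.

    n_fft, hop_length, and n_mels are illustrative defaults, not the
    exact settings used in the paper or this repo.
    """
    y, _ = librosa.load(path, sr=sr)  # resample to 22050 Hz, as in LITIS-Rouen
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels
    )
    return librosa.power_to_db(mel)   # log scale compresses the dynamic range
```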

The network is trained in two stages. First, the training set is enlarged with the between-class data augmentation method, and the network is trained with a softmax function in the classification layer. Once the network is fully trained, the softmax layer is removed and replaced by an SVM classifier. The SVM is trained only on the real data; the examples generated by the between-class method are not used for SVM training.
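A minimal sketch of between-class mixing, in the spirit of BC learning (details such as sound-pressure normalization are omitted; the function below is illustrative, not the repository's code):

```python
import numpy as np

def between_class_mix(x1, y1, x2, y2, num_classes):
    """Mix two examples from different classes (y1 != y2) with a random ratio.

    The mixed input receives a soft label reflecting the mixing ratio.
    """
    r = np.random.uniform(0, 1)
    x = r * x1 + (1 - r) * x2
    y = np.zeros(num_classes)
    y[y1], y[y2] = r, 1 - r
    return x, y
```

For the second stage, one plausible reading is to take the trained network up to its penultimate layer as a feature extractor and fit a linear SVM on those features. The names below (`embed`, the data arrays) are hypothetical stand-ins:

```python
import numpy as np
from sklearn.svm import LinearSVC

def embed(x):
    """Placeholder for the trained network with its softmax layer removed."""
    return x

# Hypothetical real (non-augmented) training data.
X_train_real = np.random.randn(100, 32)
y_train_real = np.random.randint(0, 19, size=100)

svm = LinearSVC()
svm.fit(embed(X_train_real), y_train_real)  # fit on real data only
```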

Other parameters used in the neural network include the number of epochs, batch size, optimizer, and learning rate, which are set to 500, 100, Adam, and 0.0001, respectively.
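In a framework such as PyTorch (the repository's framework is not shown here), these settings would translate directly; the model below is a hypothetical placeholder for the CNN-GRU-attention network:

```python
import torch

model = torch.nn.Linear(64, 19)  # placeholder, not the actual architecture

EPOCHS = 500
BATCH_SIZE = 100
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```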

For more information, refer to the original paper.
