This project is a convolutional neural network (CNN) for classifying audio files. It was developed as a final project for the course Artificial Intelligence from Engineering to Arts at the University of Roma Tre.
The project uses a simple CNN trained on the UrbanSound8K and EMOVO datasets.
The UrbanSound8K dataset contains 8732 labeled urban sounds from 10 classes:
- air_conditioner,
- car_horn,
- children_playing,
- dog_bark,
- drilling,
- engine_idling,
- gun_shot,
- jackhammer,
- siren,
- street_music.
The classes are drawn from the urban sound taxonomy. For more details, see the UrbanSound8K paper.
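As a small illustration of how these labels are usually recovered, the sketch below maps an UrbanSound8K slice file name to its class name. It assumes the dataset's standard naming scheme `[fsID]-[classID]-[occurrenceID]-[sliceID].wav`, with class IDs following the alphabetical order of the list above. The names `URBANSOUND_CLASSES` and `class_from_filename` are hypothetical helpers and are not taken from this repository.

```python
# Class IDs 0..9 follow the alphabetical order of the class names.
URBANSOUND_CLASSES = [
    "air_conditioner", "car_horn", "children_playing", "dog_bark", "drilling",
    "engine_idling", "gun_shot", "jackhammer", "siren", "street_music",
]


def class_from_filename(filename: str) -> str:
    """Return the class name encoded in an UrbanSound8K slice file name.

    Assumes the scheme [fsID]-[classID]-[occurrenceID]-[sliceID].wav.
    """
    class_id = int(filename.split("-")[1])  # second field is the class ID
    return URBANSOUND_CLASSES[class_id]


print(class_from_filename("100032-3-0-0.wav"))  # dog_bark
```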
The EMOVO dataset contains 588 labeled recordings of 6 actors who each performed 14 sentences simulating 7 emotional states:
- dis: disgust
- gio: joy
- pau: fear
- rab: anger
- sor: surprise
- tri: sadness
- neu: neutral
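The codes above can be turned into readable labels with a simple lookup. The following is a minimal sketch, assuming each EMOVO file name starts with the emotion code followed by a hyphen (for example `dis-f1-b1.wav`). That file name pattern, and the names `EMOVO_EMOTIONS` and `emotion_from_filename`, are assumptions made for illustration, not something confirmed by this repository.

```python
# Mapping from EMOVO emotion codes to English labels, as listed above.
EMOVO_EMOTIONS = {
    "dis": "disgust",
    "gio": "joy",
    "pau": "fear",
    "rab": "anger",
    "sor": "surprise",
    "tri": "sadness",
    "neu": "neutral",
}


def emotion_from_filename(filename: str) -> str:
    """Return the emotion label for a file named '<code>-<rest>.wav'.

    Assumption: the emotion code is the first hyphen-separated field.
    """
    code = filename.split("-")[0].lower()
    return EMOVO_EMOTIONS[code]


print(emotion_from_filename("dis-f1-b1.wav"))  # disgust
```

The dataset size is consistent with the description: 6 actors × 14 sentences × 7 emotional states = 588 recordings.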
git clone https://github.com/Marini97/Audio-CNN.git
It's recommended to use a virtual environment and then run the following command to install the required packages:
pip install -r requirements.txt
Download the datasets from the following links:
Before running the code, you need to set up the folder structure as described in the README files inside the folders 'Emovo' and 'UrbanSound8K'.
Inside the folders 'Emovo' and 'UrbanSound8K' there are the following files:
- 'prep_dataset.ipynb': used to create the dataset of images for the CNN training.
- 'train.ipynb': used to train the model with k-fold cross-validation.
- 'predict.ipynb': used to predict the class of a random audio file.
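To make the k-fold idea concrete, here is a minimal, dependency-free sketch of a leave-one-fold-out split. It assumes every sample already carries a fold number (UrbanSound8K ships with 10 predefined folds). The function name `leave_one_fold_out` and the sample file names are hypothetical, and the notebooks may build their splits differently.

```python
from collections import defaultdict


def leave_one_fold_out(samples):
    """Yield (held_out_fold, train_items, val_items) for each fold.

    `samples` is an iterable of (item, fold) pairs.
    """
    by_fold = defaultdict(list)
    for item, fold in samples:
        by_fold[fold].append(item)

    folds = sorted(by_fold)
    for held_out in folds:
        # Validate on one fold, train on all the others.
        val_items = by_fold[held_out]
        train_items = [x for f in folds if f != held_out for x in by_fold[f]]
        yield held_out, train_items, val_items


# Toy example with three folds (file names are placeholders).
samples = [("a.png", 1), ("b.png", 1), ("c.png", 2), ("d.png", 3)]
for fold, train_items, val_items in leave_one_fold_out(samples):
    print(f"fold {fold}: train={train_items} val={val_items}")
```

Each fold is used exactly once for validation, so the average of the per-fold scores gives a less optimistic estimate than a single random split.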
If you're getting the error:
ImportError: cannot import name 'builder' from 'google.protobuf.internal' ...
You can fix it by copying the file 'builder.py' to your environment's folder C:\ ... \Lib\site-packages\google\protobuf\internal
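If you are unsure where that folder is, the sketch below prints the expected location for the currently active Python environment. It assumes protobuf is installed under the environment's standard site-packages directory, which is the usual case but not guaranteed. The helper name `protobuf_internal_dir` is hypothetical.

```python
import sysconfig
from pathlib import Path


def protobuf_internal_dir() -> Path:
    """Return the expected google/protobuf/internal folder of this environment.

    Assumption: packages live in the standard 'purelib' site-packages directory.
    """
    site_packages = Path(sysconfig.get_paths()["purelib"])
    return site_packages / "google" / "protobuf" / "internal"


target = protobuf_internal_dir()
status = "exists" if target.is_dir() else "not found in this environment"
print(f"{target} ({status})")
```

Run it with the same interpreter (or activated virtual environment) that you use for the notebooks, otherwise the printed path points at a different installation.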