DCASE 2018 - Task 5 - Baseline system
- Gert Dekkers (email@example.com, https://iiw.kuleuven.be/onderzoek/advise/People/Gert_Dekkers)
- Peter Karsmakers (firstname.lastname@example.org, https://iiw.kuleuven.be/onderzoek/advise/People/PeterKarsmakers)
- Toni Heittola (email@example.com, http://www.cs.tut.fi/~heittolt/, https://github.com/toni-heittola)
- Clone the repository from GitHub. The baseline code for Task 5 is available under subdirectory
- Install requirements with the command: `pip install -r requirements.txt`
- Run the application with default settings: `python task5.py`
- The code mainly uses the dcase_util library. It is advised to read its manual.
Note: The baseline has been tested on CentOS Linux 7.4 and Windows 7 using Python 3.6 and TensorFlow 1.4.
Important note
Make sure you have dcase_util v0.2.4 (or higher) installed (released 16/05/2018). An error in the Task 5 dataset was reported here. Earlier versions of dcase_util refer to an older repository. A quick fix is available here in case you prefer not to download the entire dataset again.
Important note 2
A bug was reported in baselines prior to version 3.0.0: features were not normalized in the data processing chain (when reading in the data in the data generator). This is fixed in version 3.0.0.
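The missing step can be sketched as follows. This is an illustrative numpy version, not the baseline's actual data generator code; the function names are assumptions. Normalization statistics are computed on the training set and applied to every feature matrix before it is fed to the network.

```python
import numpy as np

def fit_normalizer(train_features):
    """Compute per-feature mean/std over all training frames.

    train_features: list of (n_frames, n_mels) feature matrices.
    """
    stacked = np.concatenate(train_features, axis=0)  # (total_frames, n_mels)
    return stacked.mean(axis=0), stacked.std(axis=0)

def normalize(feature_matrix, mean, std):
    """Z-score normalization, as the data generator should apply it."""
    return (feature_matrix - mean) / (std + 1e-8)
```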
Baseline system description
This is the baseline system for Task 5 of the DCASE 2018 challenge. The baseline system is intended to lower the hurdle to participating in the DCASE challenge(s). It provides an entry-level approach that is simple but relatively close to state-of-the-art systems. High-end performance is left for the challenge participants to achieve.
Participants are allowed to build their systems on top of the given baseline system. The system has all the functionality needed for dataset handling, storing/accessing features and models, and evaluating the results, making it rather easy to adapt to one's needs. The baseline system is also a good starting point for entry-level researchers.
task5.py contains the main code of the baseline system and is largely controlled by the configuration file task5.yaml. The code handles downloading and reading the dataset, calculating the features, training the models, and evaluating the results. The code is commented to assist you in understanding its structure. If you still have questions, feel free to contact us.
```
.
├── task5.py               # main code
├── task5.yaml             # configuration file (system state machine control, baseline parameters, ...)
├── task5_datagenerator.py # data generator class
├── task5_utils            # some additional functions used in the baseline system code
├── README.md              # This file
└── requirements.txt       # External module dependencies
```
By default, the code is set to development mode. In development mode, results are acquired in a 4-fold cross-validation fashion; this mode is used for developing your system. The code provides an option to switch to evaluation mode, which then uses the full development dataset to train a model to be tested on the evaluation dataset. The option for evaluation mode is available in the configuration file (`eval_mode`).
Feature/Machine Learning parameters
During the recording campaign, data was measured simultaneously using multiple microphone arrays (nodes) each containing 4 microphones. Hence, each domestic activity is recorded as many times as there were microphones. The baseline system trains a single classifier model that takes a single channel as input. Each parallel recording of a single activity is considered as a different example during training. The learner in the baseline system is based on a Neural Network architecture using convolutional and dense layers. As input, log mel-band energies are provided to the network for each microphone channel separately. In the prediction stage a single outcome is computed for each node by averaging the 4 model outcomes (posteriors) that were computed by evaluating the trained classifier model on all 4 microphones.
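The per-node fusion step described above can be sketched as follows (a toy numpy example, not the baseline's actual evaluation code):

```python
import numpy as np

def fuse_node_posteriors(channel_posteriors):
    """Average class posteriors over the microphones of one node.

    channel_posteriors: array of shape (n_channels, n_classes);
    in the baseline this is (4, 9). Returns the fused posterior
    vector and the predicted class index.
    """
    fused = np.mean(channel_posteriors, axis=0)
    return fused, int(np.argmax(fused))

# Toy example with 4 channels and 3 classes (the baseline uses 9 classes)
p = np.array([[0.60, 0.30, 0.10],
              [0.50, 0.40, 0.10],
              [0.20, 0.70, 0.10],
              [0.55, 0.35, 0.10]])
fused, label = fuse_node_posteriors(p)
```

Note that averaging can flip a prediction relative to an individual channel: the third microphone alone would predict class 1, but the fused posterior favors class 0.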
The baseline system parameters are as follows:
- Frame size: 40 ms (50% hop size)
- Feature matrix:
- 40 log mel-band energies in 501 successive frames (10 s)
- Neural Network:
- Input data: 40x501 (each microphone channel is considered to be a separate example for the learner)
- 1D Convolutional layer (filters: 32, kernel size: 5, stride: 1, axis: time) + Batch Normalization + ReLU activation
- 1D Max Pooling (pool size: 5, stride: 5) + Dropout (rate: 20%)
- 1D Convolutional layer (filters: 64, kernel size: 3, stride: 1, axis: time) + Batch Normalization + ReLU activation
- 1D Global Max Pooling + Dropout (rate: 20%)
- Dense layer (neurons: 64) + ReLU activation + Dropout (rate: 20%)
- Softmax output layer (classes: 9)
- Optimizer: Adam (learning rate: 0.0001)
- Epochs: 500
- On each epoch, the training dataset is randomly subsampled so that the number of examples for each class match the size of the smallest class
- Batch size: 256 * 4 channels (each channel is considered as a different example for the learner)
- Fusion: Output probabilities from the four microphones in a particular node under test are averaged to obtain the final posterior probability.
- Model selection: The performance of the model is evaluated every 10 epochs on a validation subset (30% subsampled from the training set). The model with the highest Macro-averaged F1-score is picked.
When running in development mode (`eval_mode = False`) the baseline system provides results for a 4-fold cross-validation setup. The table below shows the Macro-averaged F1-score averaged over these 4 folds. The F1-score is calculated for each class separately and averaged over all classes to obtain the Macro-averaged F1-score. A full 10 s multi-channel audio segment is considered to be one sample.
| Activity | F1-score |
|----------|----------|
| Social activity | 93.92 % |
| Vacuum cleaning | 99.31 % |
| Watching TV | 99.59 % |
| **Macro-averaged F1-score** | **84.50 %** |
Note: The performance might not be exactly reproducible, but similar results should be obtainable.
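The macro-averaged F1-score used above can be sketched as follows (a plain-Python/numpy version for illustration, not the baseline's own evaluation code):

```python
import numpy as np

def macro_f1(y_true, y_pred, n_classes):
    """Macro-averaged F1: per-class F1 averaged with equal class weight,
    so small classes count as much as large ones."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    f1_scores = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1_scores.append(2 * precision * recall / (precision + recall)
                         if precision + recall else 0.0)
    return float(np.mean(f1_scores))
```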
Changelog
3.0.0 / 2018-07-24
- Fixed bug where feature normalization was not performed in the data processing chain
2.0.1 / 2018-07-16
- Fixed bug when no labels are available in the evaluation set
2.0.0 / 2018-06-29
- Added evaluation mode
1.0.3 / 2018-04-16
- Adjusted results (due to a reported dataset error)
1.0.2 / 2018-04-16
- Changed requirements of dcase_util to match the newest release of the dataset
1.0.1 / 2018-04-16
- Added validation split saving for additional safety
- Minor edits
1.0.0 / 2018-04-16
- First public release
This software is released under the terms of the MIT License.