 <pre>
 ____   ____    _    ____  _____                          _      _     
|  _ \ / ___|  / \  / ___|| ____|     _ __ ___   ___   __| | ___| |___ 
| | | | |     / _ \ \___ \|  _| _____| '_ ` _ \ / _ \ / _` |/ _ \ / __|
| |_| | |___ / ___ \ ___) | |__|_____| | | | | | (_) | (_| |  __/ \__ \
|____/ \____/_/   \_\____/|_____|    |_| |_| |_|\___/ \__,_|\___|_|___/
                                                                        
</pre>

# DCASE-models Notebooks
Python Notebooks for [DCASE-models](https://github.com/pzinemanas/DCASE-models)

---

### About 
This Notebook reproduces the results for **Sound Event Detection (SED)** presented in:
<ul>
<li><a href="http://www.justinsalamon.com/uploads/4/3/9/4/4394963/salamon_scaper_waspaa_2017.pdf"><strong>
    Scaper: A Library for Soundscape Synthesis and Augmentation</strong></a>
    J. Salamon, D. MacConnell, M. Cartwright, P. Li, and J. P. Bello.
    In IEEE Workshop on Applications of Signal Processing to
    Audio and Acoustics (WASPAA), New Paltz, NY, USA, Oct. 2017.
    <br>
   <a type="button" class="btn btn-default btn-xs" target="_blank" href="http://www.justinsalamon.com/uploads/4/3/9/4/4394963/salamon_scaper_waspaa_2017.pdf"> PDF </a>
   <a type="button" class="btn btn-default btn-xs" target="_blank" href="https://ieeexplore.ieee.org/document/8170052"> IEEE</a>
    </li>   
</ul>

### Overview

The paper introduces [Scaper](https://github.com/justinsalamon/scaper), an open-source library for soundscape synthesis and augmentation. To illustrate the potential of the library, the authors generate a dataset of 10,000 soundscapes, namely [URBAN-SED](http://urbansed.weebly.com/), and use it to compare the performance of two state-of-the-art algorithms for sound event detection:
- the Convolutional Recurrent Neural Network (CRNN) proposed by Cakir et al. [[C-CRNN]](https://ieeexplore.ieee.org/document/7933050)
- an adaptation of the Convolutional Neural Network (CNN) proposed by Salamon and Bello [[SB-CNN]](http://ieeexplore.ieee.org/document/7829341/)

### Organization

The Notebook is organized into the following sections.
* [1. Load parameters](#LoadParameters)
* [2. Extract features](#ExtractFeatures)
* [3. Load data](#LoadData)
* [4. Initialize model](#InitModel)
* [5. Train model](#TrainModel)
* [6. Evaluate model](#EvaluateModel)

In [1]:
%load_ext autoreload
%autoreload 2
rootdir_path = '../../'
import sys
import os
import json
import warnings
import glob
import numpy as np
import argparse

sys.path.append(rootdir_path)
from dcase_models.util.files import load_json, mkdir_if_not_exists
from dcase_models.util.data import evaluation_setup
from dcase_models.data.data_generator import DataGenerator
from dcase_models.data.datasets import URBAN_SED
from dcase_models.data.features import MelSpectrogram
from dcase_models.data.scaler import Scaler
from dcase_models.model.models import SB_CNN_SED

os.environ["CUDA_VISIBLE_DEVICES"] = "1"

Using TensorFlow backend.


<a id="LoadParameters"></a>
## 1. Load parameters

The dataset, feature extraction, and training parameters are stored in a JSON file (`parameters.json`) in the root directory.

In [2]:
# load all parameters from json file
params = load_json(os.path.join(rootdir_path, 'parameters.json'))
# set the dataset we are going to use
dataset = 'URBAN_SED'

# get dataset parameters
params_dataset = params["datasets"][dataset]

# get feature extraction parameters
params_features = params["features"]

# get training parameters
params_train = params["train"]
# Override the default training parameters with the values used in the paper
params_train["epochs"] = 300
params_train["early_stopping"] = 100

params_model = params["models"]["SB_CNN_SED"]

Check that the values of the parameters are correct.

In [3]:
# print the dataset parameters
print("Dataset Parameters:\n", json.dumps(params_dataset, indent=4, sort_keys=True))
# print feature extraction parameters
print("Features' Parameters:\n", json.dumps(params_features, indent=4, sort_keys=True))
# print training parameters
print("Training Parameters:\n", json.dumps(params_train, indent=4, sort_keys=True))

Dataset Parameters:
 {
    "dataset_path": "datasets/URBAN-SED_v2.0.0",
    "evaluation_mode": "train-validate-test"
}
Features' Parameters:
 {
    "MelSpectrogram": {
        "mel_bands": 64,
        "n_fft": 1024
    },
    "Openl3": {
        "content_type": "env",
        "embedding_size": 512,
        "input_repr": "mel256"
    },
    "Spectrogram": {
        "n_fft": 1024
    },
    "audio_hop": 690,
    "audio_win": 1024,
    "sequence_hop_time": 1.0,
    "sequence_time": 2.0,
    "sr": 22050
}
Training Parameters:
 {
    "batch_size": 32,
    "considered_improvement": 0,
    "early_stopping": 100,
    "epochs": 300,
    "learning_rate": 0.001,
    "optimizer": "Adam",
    "verbose": 1
}


<a id="ExtractFeatures"></a>
## 2. Extract features

Initialize the feature extractor and the dataset object.

In [4]:
# Initialize Feature Extractor
features = MelSpectrogram(sequence_time=params_features['sequence_time'], 
                          sequence_hop_time=params_features['sequence_hop_time'], 
                          audio_win=params_features['audio_win'], 
                          audio_hop=params_features['audio_hop'], 
                          sr=params_features['sr'],
                          **params_features['MelSpectrogram'])

print(features.get_shape())

(10, 64, 64)
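
The last two dimensions of the printed shape can be checked by hand from the feature parameters printed above. The sketch below assumes that the number of frames per sequence is obtained by rounding `sequence_time * sr / audio_hop`; the library's exact rounding rule may differ, and the variable names here are illustrative only.

```python
# Values copied from the feature parameters printed above.
sr = 22050            # sampling rate in Hz
audio_hop = 690       # hop between analysis frames, in samples
sequence_time = 2.0   # length of one input sequence, in seconds
mel_bands = 64        # number of mel bands

# Duration of one frame hop in milliseconds.
frame_period_ms = 1000.0 * audio_hop / sr

# Assumption: frames per sequence = round(sequence_time * sr / audio_hop).
frames_per_sequence = int(round(sequence_time * sr / audio_hop))

print(f"frame period: {frame_period_ms:.2f} ms")
print(f"sequence shape (frames, mel bands): ({frames_per_sequence}, {mel_bands})")
```

Under this assumption each 2-second sequence holds 64 frames of 64 mel bands, which agrees with the `(64, 64)` part of the shape reported by `features.get_shape()`.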


In [5]:
# Initialize the dataset as an instance of URBAN_SED
kwargs = {'sequence_hop_time': params_features['sequence_hop_time']}
dataset = URBAN_SED(os.path.join(rootdir_path, params_dataset["dataset_path"]), **kwargs)

Check whether the dataset exists, and download it if it does not.

In [6]:
dataset.download()

Extract the features (if they were not extracted before).

In [7]:
if not features.check_if_extracted(dataset):
    features.extract(dataset)
print('Done!')

Done!


<a id="LoadData"></a>
## 3. Load data

In [8]:
# Get train/validation/test folds
folds_train, folds_val, folds_test = evaluation_setup(
    'fold1', dataset.fold_list,
    params_dataset['evaluation_mode'],
    use_validate_set=True)
# Initialize the training data generator (the scaler is attached after fitting)
data_gen_train = DataGenerator(dataset, features, folds=folds_train,
                               batch_size=params_train['batch_size'],
                               shuffle=True, train=True, scaler=None)

Fit a scaler on the training data and attach it to the training data generator.

In [9]:
scaler = Scaler(normalizer=params_model['normalizer'])
print('Fitting features ...')
scaler.fit(data_gen_train)
print('Done!')

data_gen_train.set_scaler(scaler)


Fitting features ...
Done!
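
The reason for fitting the scaler on the training generator only is that the normalization statistics must not depend on validation or test data. The sketch below illustrates the idea with a hypothetical min-max scaler written in NumPy. It is not the `dcase_models` `Scaler` class, and the choice of min-max scaling to [-1, 1] is an assumption, since the value of `params_model['normalizer']` is not shown in this notebook.

```python
import numpy as np


class MinMaxSketch:
    """Hypothetical stand-in for a feature scaler (not the dcase_models Scaler)."""

    def __init__(self):
        self.min_ = None
        self.max_ = None

    def fit(self, batches):
        # Accumulate the global minimum and maximum over all training batches.
        self.min_ = float(min(np.min(b) for b in batches))
        self.max_ = float(max(np.max(b) for b in batches))
        return self

    def transform(self, x):
        # Map [min_, max_] linearly onto [-1, 1] using the training statistics.
        span = self.max_ - self.min_
        if span == 0:
            return np.zeros_like(x, dtype=float)
        return 2.0 * (x - self.min_) / span - 1.0


rng = np.random.default_rng(0)
# Synthetic stand-ins for batches of (frames, mel bands) features.
train_batches = [rng.normal(size=(4, 64, 64)) for _ in range(3)]
val_batch = rng.normal(size=(4, 64, 64))

sketch_scaler = MinMaxSketch().fit(train_batches)
train_scaled = sketch_scaler.transform(np.concatenate(train_batches))
val_scaled = sketch_scaler.transform(val_batch)  # reuses training statistics

print(train_scaled.min(), train_scaled.max())
```

Validation values may fall slightly outside [-1, 1], because the range was estimated on the training data alone.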


Initialize the validation data generator.

In [10]:
data_gen_val = DataGenerator(dataset, features, folds=folds_val,
                             batch_size=params_train['batch_size'],
                             shuffle=False, train=False, scaler=scaler)

In [11]:
print(f"X: {data_gen_train.get_data_batch(0)[0][0].shape}")
print(f"Y: {data_gen_train.get_data_batch(0)[1][0].shape}")

X: (64, 64)
Y: (10,)


<a id="InitModel"></a>
## 4. Initialize model

In [12]:
X, y = data_gen_train.get_data_batch(0)
 
n_frames_cnn = X.shape[1]
n_freq_cnn = X.shape[2]
n_classes = y.shape[1]

metrics = ['sed']

model_container = SB_CNN_SED(model=None, model_path=None, n_classes=n_classes, 
                             n_frames_cnn=n_frames_cnn, n_freq_cnn=n_freq_cnn,
                             metrics=metrics, **params_model['model_arguments'])

model_container.model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 64, 64)            0         
_________________________________________________________________
lambda_1 (Lambda)            (None, 64, 64, 1)         0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 60, 60, 64)        1664      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 30, 30, 64)        0         
_________________________________________________________________
batch_normalization_1 (Batch (None, 30, 30, 64)        256       
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 26, 26, 64)        102464    
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 13, 13, 64)        0         
__________
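
The output shapes and parameter counts in the visible part of the summary can be reproduced by hand. The sketch below assumes 5x5 convolution kernels with no padding and 2x2 max pooling. These values are inferred from the summary (64 to 60 to 30, and 1664 parameters for the first layer), not read from `params_model['model_arguments']`, and the helper functions are illustrative.

```python
def conv_valid_size(size, kernel):
    # Output length of an unpadded ("valid") convolution with stride 1.
    return size - kernel + 1


def conv2d_params(kernel, in_channels, out_channels):
    # One kernel x kernel x in_channels filter plus one bias per output channel.
    return (kernel * kernel * in_channels + 1) * out_channels


kernel = 5     # assumption inferred from the summary
filters = 64   # number of channels shown in the summary

size = 64                                   # input is (64, 64, 1) after the Lambda layer
size_conv1 = conv_valid_size(size, kernel)  # conv2d_1
size_pool1 = size_conv1 // 2                # max_pooling2d_1
size_conv2 = conv_valid_size(size_pool1, kernel)  # conv2d_2
size_pool2 = size_conv2 // 2                # max_pooling2d_2

params_conv1 = conv2d_params(kernel, 1, filters)
params_bn1 = 4 * filters  # gamma, beta, moving mean, moving variance per channel
params_conv2 = conv2d_params(kernel, filters, filters)

print(size_conv1, size_pool1, size_conv2, size_pool2)
print(params_conv1, params_bn1, params_conv2)
```

The computed values match the rows listed in the summary: spatial sizes 60, 30, 26, 13 and parameter counts 1664, 256, 102464.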

<a id="TrainModel"></a>
## 5. Train model

In [None]:
#define path to save weights and training log
mkdir_if_not_exists('./output')
exp_folder = './output/SB_CNN'
mkdir_if_not_exists(exp_folder)


kwargs = {'label_list': dataset.label_list}

# Uncomment the following line to run fewer epochs
# params_train["epochs"] = 50

model_container.train(data_gen_train, data_gen_val, weights_path=exp_folder, **params_train, **kwargs)

Epoch 1/300
F1 = 0.3037, ER = 4.5373 - Best val F1: 0.3037
                  (IMPROVEMENT, saving)

Epoch 2/300
F1 = 0.3196, ER = 3.9983 - Best val F1: 0.3196
                  (IMPROVEMENT, saving)

Epoch 3/300
F1 = 0.3201, ER = 3.4725 - Best val F1: 0.3201
                  (IMPROVEMENT, saving)

Epoch 4/300
F1 = 0.3537, ER = 2.9456 - Best val F1: 0.3537
                  (IMPROVEMENT, saving)

Epoch 5/300
F1 = 0.3355, ER = 2.7377 - Best val F1: 0.3537 (3)

Epoch 6/300
F1 = 0.3870, ER = 1.5939 - Best val F1: 0.3870
                  (IMPROVEMENT, saving)

Epoch 7/300
F1 = 0.3689, ER = 1.6889 - Best val F1: 0.3870 (5)

Epoch 8/300
F1 = 0.4070, ER = 1.2900 - Best val F1: 0.4070
                  (IMPROVEMENT, saving)

Epoch 9/300
F1 = 0.4694, ER = 0.9778 - Best val F1: 0.4694
                  (IMPROVEMENT, saving)

Epoch 10/300
F1 = 0.3014, ER = 1.6840 - Best val F1: 0.4694 (8)

Epoch 11/300
F1 = 0.4353, ER = 1.0321 - Best val F1: 0.4694 (8)

Epoch 12/300
F1 = 0.5127, ER = 0.7532 - Be

F1 = 0.5641, ER = 0.6154 - Best val F1: 0.5772 (37)

Epoch 58/300
F1 = 0.5786, ER = 0.5902 - Best val F1: 0.5786
                  (IMPROVEMENT, saving)

Epoch 59/300
F1 = 0.5837, ER = 0.5887 - Best val F1: 0.5837
                  (IMPROVEMENT, saving)

Epoch 60/300
F1 = 0.5769, ER = 0.5757 - Best val F1: 0.5837 (58)

Epoch 61/300
F1 = 0.5283, ER = 0.7306 - Best val F1: 0.5837 (58)

Epoch 62/300
F1 = 0.5588, ER = 0.6473 - Best val F1: 0.5837 (58)

Epoch 63/300
F1 = 0.5481, ER = 0.7084 - Best val F1: 0.5837 (58)

Epoch 64/300
F1 = 0.5572, ER = 0.6458 - Best val F1: 0.5837 (58)

Epoch 65/300
F1 = 0.5722, ER = 0.6044 - Best val F1: 0.5837 (58)

Epoch 66/300
F1 = 0.5734, ER = 0.5876 - Best val F1: 0.5837 (58)

Epoch 67/300
F1 = 0.5743, ER = 0.5898 - Best val F1: 0.5837 (58)

Epoch 68/300
F1 = 0.5773, ER = 0.5917 - Best val F1: 0.5837 (58)

Epoch 69/300
F1 = 0.5263, ER = 0.7598 - Best val F1: 0.5837 (58)

Epoch 70/300
F1 = 0.5755, ER = 0.5950 - Best val F1: 0.5837 (58)

Epoch 71/300
F1 = 0
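
In the log above, the number in parentheses after `Best val F1` appears to be the zero-based index of the epoch that produced the best validation F1 so far (for example `(8)` after the improvement at epoch 9). The sketch below shows one plausible way the `early_stopping` and `considered_improvement` parameters could interact. It is an assumption about the behaviour, not the library's training loop, and `run_early_stopping` is a hypothetical helper.

```python
def run_early_stopping(val_f1_per_epoch, patience, min_improvement=0.0):
    """Return (best_f1, best_epoch, stopped_epoch); stopped_epoch is None if never stopped."""
    best_f1 = float("-inf")
    best_epoch = -1
    for epoch, f1 in enumerate(val_f1_per_epoch):
        if f1 > best_f1 + min_improvement:
            # Improvement: this is where the weights would be saved.
            best_f1, best_epoch = f1, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop training.
            return best_f1, best_epoch, epoch
    return best_f1, best_epoch, None


# Validation F1 of the first 11 epochs, copied from the log above.
val_f1 = [0.3037, 0.3196, 0.3201, 0.3537, 0.3355, 0.3870,
          0.3689, 0.4070, 0.4694, 0.3014, 0.4353]

# With the patience of 100 used in this notebook, training is not stopped here.
print(run_early_stopping(val_f1, patience=100))
# A much smaller patience, for illustration only.
print(run_early_stopping(val_f1, patience=2))
```

With a patience of 100 the best epoch index after these 11 epochs is 8, which agrees with the `(8)` printed in the log.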

<a id="EvaluateModel"></a>
## 6. Evaluate model

In [None]:
# Load the best weights saved during training
model_container.load_model_weights(exp_folder)
data_gen_test = DataGenerator(dataset, features, folds=folds_test,
                              batch_size=params_train['batch_size'],
                              shuffle=False, train=False, scaler=scaler)

kwargs = {'sequence_time_sec': params_features['sequence_hop_time'],
          'metric_resolution_sec': 1.0}
results = model_container.evaluate(data_gen_test, label_list=dataset.label_list, **kwargs)

print(results[metrics[0]])
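
To make the `F1` and `ER` values reported during training and evaluation easier to interpret, the sketch below computes segment-based F1 and error rate from binary activity matrices, following the standard definitions used in sound event detection. It is a simplified illustration, not the code path used by `model_container.evaluate`, and `segment_based_scores` is a hypothetical helper.

```python
import numpy as np


def segment_based_scores(reference, estimate):
    """Segment-based F1 and error rate for (n_segments, n_classes) binary matrices."""
    reference = np.asarray(reference, dtype=bool)
    estimate = np.asarray(estimate, dtype=bool)

    # Per-segment counts over classes.
    tp = np.sum(reference & estimate, axis=1)
    fp = np.sum(~reference & estimate, axis=1)
    fn = np.sum(reference & ~estimate, axis=1)
    n_ref = np.sum(reference, axis=1)

    # Error rate components, computed per segment and then summed.
    substitutions = np.minimum(fp, fn)
    deletions = np.maximum(0, fn - fp)
    insertions = np.maximum(0, fp - fn)

    denom_f1 = 2 * tp.sum() + fp.sum() + fn.sum()
    f1 = 2 * tp.sum() / denom_f1 if denom_f1 > 0 else 0.0
    # Assumes at least one active reference event.
    er = (substitutions.sum() + deletions.sum() + insertions.sum()) / n_ref.sum()
    return float(f1), float(er)


# Toy example: 3 one-second segments, 2 classes.
reference = [[1, 0], [1, 1], [0, 0]]
estimate = [[1, 1], [0, 1], [0, 1]]
f1, er = segment_based_scores(reference, estimate)
print(f"F1 = {f1:.4f}, ER = {er:.4f}")
```

Because insertions are counted in the numerator while only reference events are counted in the denominator, the error rate can exceed 1, which is consistent with the large `ER` values in the first training epochs.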