<pre>
 ____   ____    _    ____  _____                          _      _     
|  _ \ / ___|  / \  / ___|| ____|     _ __ ___   ___   __| | ___| |___ 
| | | | |     / _ \ \___ \|  _| _____| '_ ` _ \ / _ \ / _` |/ _ \ / __|
| |_| | |___ / ___ \ ___) | |__|_____| | | | | | (_) | (_| |  __/ \__ \
|____/ \____/_/   \_\____/|_____|    |_| |_| |_|\___/ \__,_|\___|_|___/
                                                                        
</pre>

# DCASE-models Notebooks
Python Notebooks for [DCASE-models](https://github.com/pzinemanas/DCASE-models)

---

### About 
This Notebook reproduces the results for **Sound Event Detection (SED)** presented in:
<ul>
<li><a href="https://arxiv.org/pdf/1706.02291.pdf"><strong>
    Sound event detection using spatial features and convolutional recurrent neural network </strong></a>
   S. Adavanne, P. Pertilä, T. Virtanen. ICASSP 2017.
    <br>
   <a type="button" class="btn btn-default btn-xs" target="_blank" href="https://arxiv.org/pdf/1706.02291.pdf"> PDF </a>
   <a type="button" class="btn btn-default btn-xs" target="_blank" href="https://ieeexplore.ieee.org/document/7952260"> IEEE</a>
    </li>   
</ul>

### Overview

This paper extends the Convolutional Recurrent Neural Network (CRNN) proposed by Cakir et al. [[C-CRNN]](https://ieeexplore.ieee.org/document/7933050) to multichannel audio event detection. Besides log mel-band energy, it uses spatial features (generalized cross-correlation with phase-based weighting, GCC-PHAT) and autocorrelation. The system is evaluated on TUT Sound Events 2009 ([TUT-SED 2009](http://www.cs.tut.fi/sgn/arg/taslp2017-crnn-sed/#tut-sed-2009), not public) and on the TUT Sound Events 2016 development set ([TUT-SED 2016](http://www.cs.tut.fi/sgn/arg/taslp2017-crnn-sed/#tut-sed-2016)).
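For reference, the standard GCC-PHAT definition for a pair of microphone channels is given below. The notation ($X_1$, $X_2$ for the spectra of the two channels, $^{*}$ for complex conjugation, $\tau$ for the lag) is ours, and the paper's exact framing and lag range are not reproduced here.

$$
R_{12}(\tau) = \mathcal{F}^{-1}\left[\frac{X_1(f)\,X_2^{*}(f)}{\lvert X_1(f)\,X_2^{*}(f)\rvert}\right](\tau)
$$

Dividing the cross-spectrum by its magnitude (the phase transform) discards amplitude information, so only the inter-channel phase difference, and hence the time delay between the channels, is retained.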

Here we use TUT Sound Events 2017 (TUT-SED 2017) and mel-band energies only, without the spatial features.
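Mel-band energies are obtained by summing spectral energy under triangular filters spaced uniformly on the mel scale. The sketch below computes the filter edge frequencies for 40 bands at a 44.1 kHz sampling rate (the values set later in this notebook), assuming the HTK mel formula m = 2595 log10(1 + f/700). The function names are hypothetical, and if the extractor relies on librosa's default (Slaney-style) mel scale, its exact band edges will differ.

```python
import math
from typing import List


def hz_to_mel(f_hz: float) -> float:
    """HTK-style mel scale (an assumption, see the text above)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(n_bands: int, f_min: float, f_max: float) -> List[float]:
    """Return n_bands + 2 frequencies (Hz) equally spaced on the mel scale.

    Triangular filter i rises from edges[i], peaks at edges[i + 1]
    and falls back to zero at edges[i + 2].
    """
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_bands + 1)
    return [mel_to_hz(m_min + i * step) for i in range(n_bands + 2)]


edges = mel_band_edges(40, 0.0, 44100 / 2)
print(len(edges), round(edges[0], 1), round(edges[-1], 1))  # 42 0.0 22050.0
```

The bands are narrow at low frequencies and widen towards the Nyquist frequency, which is why mel features give more resolution where most sound-event energy and perceptual detail lie.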

### Organization

The Notebook is organized into the following sections.
* [1. Load parameters](#LoadParameters)
* [2. Extract features](#ExtractFeatures)
* [3. Load data](#LoadData)
* [4. Initialize model](#InitModel)
* [5. Train model](#TrainModel)
* [6. Evaluate model](#EvaluateModel)

In [1]:
%load_ext autoreload
%autoreload 2
import os
import sys
import json
import numpy as np

# Select the GPU before TensorFlow is loaded (it is imported through dcase_models)
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# Make the dcase_models package importable from the repository root
rootdir_path = '../../'
sys.path.append(rootdir_path)
from dcase_models.utils.files import load_json, mkdir_if_not_exists
from dcase_models.data.data_generator import DataGenerator
from dcase_models.data.datasets import TUTSoundEvents2017
from dcase_models.model.models import A_CRNN
from dcase_models.data.features import MelSpectrogram
from dcase_models.data.scaler import Scaler

Using TensorFlow backend.


<a id="LoadParameters"></a>
## 1. Load parameters

All parameters (dataset, features, training and model) are stored in a JSON file, `parameters.json`, in the root directory.
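The cell below reads four top-level keys from that file. A hypothetical excerpt with the assumed layout is sketched here: the feature and training values are copied from the printed output further down, while the TUT-SED 2017 dataset path and the `"normalizer"` value are placeholders, not the repository's actual values.

```python
import json

# Hypothetical excerpt of parameters.json: only the keys read by this notebook.
# "dataset_path" and "normalizer" are placeholders.
PARAMETERS_JSON = """
{
  "datasets": {
    "TUTSoundEvents2017": {
      "dataset_path": "datasets/TUT-SED-2017",
      "evaluation_mode": "train-validate-test"
    }
  },
  "features": {
    "MelSpectrogram": {"mel_bands": 64, "n_fft": 1024},
    "audio_hop": 690,
    "audio_win": 1024,
    "sequence_hop_time": 1.0,
    "sequence_time": 2.0,
    "sr": 22050
  },
  "train": {
    "batch_size": 256, "considered_improvement": 0, "early_stopping": 30,
    "epochs": 50, "learning_rate": 0.001, "optimizer": "Adam", "verbose": 1
  },
  "models": {"A_CRNN": {"normalizer": "standard"}}
}
"""

cfg = json.loads(PARAMETERS_JSON)
dataset_cfg = cfg["datasets"]["TUTSoundEvents2017"]  # per-dataset settings
features_cfg = cfg["features"]                       # shared feature settings
train_cfg = cfg["train"]                             # training settings
model_cfg = cfg["models"]["A_CRNN"]                  # per-model settings
print(dataset_cfg["evaluation_mode"], features_cfg["sr"], model_cfg["normalizer"])
# train-validate-test 22050 standard
```

Keeping datasets and models as nested dictionaries keyed by name lets the same file serve several notebooks, each selecting its own entries.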

In [1]:
# load all parameters from json file
params = load_json(os.path.join(rootdir_path, 'parameters.json'))
# set the dataset we are going to use
dataset = 'TUTSoundEvents2017'

# get dataset parameters
params_dataset = params["datasets"][dataset]

# get feature extraction parameters
params_features = params["features"]

# get training parameters
params_train = params["train"]

# get model parameters
params_model = params["models"]["A_CRNN"]




Check that the values of the parameters are correct.

In [3]:
# print the dataset parameters 
print("Dataset Parameters:\n", json.dumps(params_dataset, indent=4, sort_keys=True))
# print feature extraction parameters 
print("Features' Parameters:\n",json.dumps(params_features, indent=4, sort_keys=True))
# print training parameters 
print("Training Parameters:\n",json.dumps(params_train, indent=4, sort_keys=True))

Dataset Parameters:
 {
    "dataset_path": "datasets/URBAN-SED_v2.0.0",
    "evaluation_mode": "train-validate-test"
}
Features' Parameters:
 {
    "MelSpectrogram": {
        "mel_bands": 64,
        "n_fft": 1024
    },
    "Openl3": {
        "content_type": "env",
        "embedding_size": 512,
        "input_repr": "mel256"
    },
    "Spectrogram": {
        "n_fft": 1024
    },
    "audio_hop": 690,
    "audio_win": 1024,
    "sequence_hop_time": 1.0,
    "sequence_time": 2.0,
    "sr": 22050
}
Training Parameters:
 {
    "batch_size": 256,
    "considered_improvement": 0,
    "early_stopping": 30,
    "epochs": 50,
    "learning_rate": 0.001,
    "optimizer": "Adam",
    "verbose": 1
}


<a id="ExtractFeatures"></a>
## 2. Extract features

Initialize the feature extractor, the dataset and the data generator.

In [4]:
# Override the feature parameters loaded from parameters.json
# with the values used in this experiment (40 mel bands at 44.1 kHz)

params_features = {
    "MelSpectrogram": {
        "mel_bands": 40,
        "n_fft": 1024
    },
    "audio_hop": 512,
    "audio_win": 1024,
    "sequence_hop_time": 0.02,
    "sequence_time": 0.04,
    "sr": 44100
}
# Initialize the feature extractor
feature_extractor = MelSpectrogram(sequence_time=params_features['sequence_time'],
                                   sequence_hop_time=params_features['sequence_hop_time'],
                                   audio_win=params_features['audio_win'],
                                   audio_hop=params_features['audio_hop'],
                                   sr=params_features['sr'],
                                   **params_features['MelSpectrogram'])


print(feature_extractor.get_shape())

(10, 64, 64)
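The shape returned by `get_shape()` depends on how many STFT frames fit in one sequence. As a rough check, the sketch below assumes the extractor frames audio like librosa's centred STFT, where a signal of n samples yields 1 + n // hop frames; this is an assumption, and DCASE-models may pad or trim differently. With the default parameters (2 s sequences at 22050 Hz, hop 690) it gives 64 frames, consistent with the 64 x 64 shape printed above having come from the default configuration (64 mel bands) rather than from the values set in this cell.

```python
def frames_per_sequence(sequence_time: float, sr: int, audio_hop: int) -> int:
    """Number of STFT frames in one sequence, assuming centred framing."""
    n_samples = int(round(sequence_time * sr))
    return 1 + n_samples // audio_hop


print(frames_per_sequence(2.0, 22050, 690))   # 64 (defaults from parameters.json)
print(frames_per_sequence(0.04, 44100, 512))  # 4  (values defined in this cell)
```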


In [5]:
# Initialize the dataset as an instance of TUTSoundEvents2017
params_data = {'sequence_hop_time': params_features['sequence_hop_time']}
dataset = TUTSoundEvents2017(os.path.join(rootdir_path, params_dataset["dataset_path"]), **params_data)

Check whether the dataset exists, and download it if it does not.

In [6]:
dataset.download()

Initialize the data generator.

In [7]:
data_generator = DataGenerator(dataset, feature_extractor,
                               evaluation_mode=params_dataset["evaluation_mode"])

Extract the features (if they were not extracted before).

In [8]:
if not data_generator.check_if_features_extracted():
    data_generator.extract_features()
print('Done!')

Done!
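`check_if_features_extracted()` and `extract_features()` make this cell cheap to rerun. The underlying idea, caching features on disk and computing only what is missing, is sketched below with hypothetical names (`extract_missing`, `fake_features`) and a JSON file per audio file, chosen only to keep the example dependency-free; DCASE-models' own storage format and folder layout are not reproduced here.

```python
import json
import os
import tempfile
from typing import Callable, List


def extract_missing(audio_files: List[str], features_dir: str,
                    compute: Callable[[str], list]) -> List[str]:
    """Compute and cache features for files that have no cached result yet.

    Returns the files that were actually processed on this call.
    """
    os.makedirs(features_dir, exist_ok=True)
    processed = []
    for audio_file in audio_files:
        name = os.path.splitext(os.path.basename(audio_file))[0]
        cache_path = os.path.join(features_dir, name + ".json")
        if os.path.exists(cache_path):
            continue  # extracted on an earlier run: skip
        with open(cache_path, "w") as f:
            json.dump(compute(audio_file), f)
        processed.append(audio_file)
    return processed


def fake_features(path: str) -> list:
    """Stand-in for a real feature extractor."""
    return [len(path)]


with tempfile.TemporaryDirectory() as tmp_dir:
    files = ["a.wav", "b.wav"]
    first_run = extract_missing(files, tmp_dir, fake_features)
    second_run = extract_missing(files, tmp_dir, fake_features)

print(first_run, second_run)  # ['a.wav', 'b.wav'] []
```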


<a id="LoadData"></a>
## 3. Load data

In [9]:
print('Loading data... ')
data_generator.load_data()
print('Done!')

Loading data... 
fold: [############################################################] 3/3
Done!


Next, get the training and validation data, fit a scaler on the training data, and use it to transform both the training and validation data.

In [10]:
fold_test = 'test'
fold_train = 'train'
X_train, Y_train, X_val, Y_val = data_generator.get_data_for_training(fold_train)
scaler = Scaler(normalizer=params_model['normalizer'])
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_val = scaler.transform(X_val)

In [11]:
print(X_train.shape, Y_train.shape, X_val[0].shape, Y_val[0].shape)

(60000, 64, 64) (60000, 10) (10, 64, 64) (10, 10)
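The scaler is fitted on the training data only and then applied unchanged to the validation (and later the test) data, so no statistics leak from the held-out sets. A minimal pure-Python sketch of that pattern, assuming the `normalizer` option performs per-mel-band standardization (z-scores); whether the real `Scaler` normalizes per band or globally, and which normalizer `parameters.json` selects, is not shown here. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

FrameMatrix = List[List[float]]  # one sequence: frames x mel bands


def fit_standard(X: Sequence[FrameMatrix]) -> Tuple[List[float], List[float]]:
    """Per-band mean and (population) standard deviation over all sequences and frames."""
    n_bands = len(X[0][0])
    per_band = [[frame[b] for seq in X for frame in seq] for b in range(n_bands)]
    means = [sum(v) / len(v) for v in per_band]
    stds = [(sum((x - m) ** 2 for x in v) / len(v)) ** 0.5
            for v, m in zip(per_band, means)]
    return means, stds


def transform_standard(X: Sequence[FrameMatrix], means: List[float],
                       stds: List[float], eps: float = 1e-8) -> List[FrameMatrix]:
    """Apply (x - mean) / std band by band; eps guards against constant bands."""
    return [[[(frame[b] - means[b]) / (stds[b] + eps) for b in range(len(frame))]
             for frame in seq]
            for seq in X]


# Toy data: one training sequence with two frames and two mel bands.
X_train_toy = [[[1.0, 10.0], [3.0, 30.0]]]
X_val_toy = [[[2.0, 40.0]]]

means, stds = fit_standard(X_train_toy)              # fit on training data only
Z_train = transform_standard(X_train_toy, means, stds)
Z_val = transform_standard(X_val_toy, means, stds)   # reuse the training statistics
print(means, stds)  # [2.0, 20.0] [1.0, 10.0]
```

Note that the validation value 40.0 maps to about 2.0 rather than being clipped: the held-out data may fall outside the training range, which is expected.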


<a id="InitModel"></a>
## 4. Initialize model

In [12]:
n_frames_cnn = X_train.shape[1]
n_freq_cnn = X_train.shape[2]
n_classes = Y_train.shape[1]

metrics = ['sed']

model_container = A_CRNN(model=None, model_path=None, n_classes=n_classes,
                         n_frames_cnn=n_frames_cnn, n_freq_cnn=n_freq_cnn,
                         metrics=metrics)

model_container.model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 64, 64)            0         
_________________________________________________________________
lambda_1 (Lambda)            (None, 64, 64, 1)         0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 60, 60, 64)        1664      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 30, 30, 64)        0         
_________________________________________________________________
batch_normalization_1 (Batch (None, 30, 30, 64)        256       
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 26, 26, 64)        102464    
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 13, 13, 64)        0         
__________
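The output shapes and parameter counts in the (truncated) summary can be checked by hand. The sketch below assumes 5x5 kernels with unpadded ('valid') convolution, stride 1, and 2x2 max pooling; these are inferred from the printed shapes (64 → 60 → 30 → 26 → 13) and parameter counts, not read from the `A_CRNN` source.

```python
def valid_conv_out(size: int, kernel: int) -> int:
    """Output length of an unpadded ('valid') convolution with stride 1."""
    return size - kernel + 1


def conv2d_params(kernel_h: int, kernel_w: int, in_channels: int, filters: int) -> int:
    """One kernel_h x kernel_w x in_channels weight tensor plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters


def batchnorm_params(channels: int) -> int:
    """Keras counts gamma, beta, moving mean and moving variance per channel."""
    return 4 * channels


# Size along each spatial axis: 5x5 valid conv, then 2x2 max pooling, twice.
sizes = []
size = 64
for _ in range(2):
    size = valid_conv_out(size, 5)
    sizes.append(size)
    size //= 2
    sizes.append(size)

print(sizes)                        # [60, 30, 26, 13]
print(conv2d_params(5, 5, 1, 64))   # 1664   (conv2d_1)
print(batchnorm_params(64))         # 256    (batch_normalization_1)
print(conv2d_params(5, 5, 64, 64))  # 102464 (conv2d_2)
```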

<a id="TrainModel"></a>
## 5. Train model

In [None]:
# Folder where the best weights are saved
exp_folder = './'

# Replace the default training parameters with the ones used in the paper
params_train["epochs"] = 300
params_train["early_stopping"] = 50
params_train["batch_size"] = 32

kwargs = {'label_list': dataset.label_list}
model_container.train(X_train, Y_train, X_val, Y_val, weights_path=exp_folder,
                      **params_train, **kwargs)

Epoch 1/300
F1 = 0.3352, ER = 2.6273 - Best val F1: 0.3352
                  (IMPROVEMENT, saving)

Epoch 2/300
F1 = 0.3101, ER = 3.2438 - Best val F1: 0.3352 (0)

Epoch 3/300
F1 = 0.3569, ER = 2.6834 - Best val F1: 0.3569
                  (IMPROVEMENT, saving)

Epoch 4/300
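The log reports the validation F1 and error rate (ER) after each epoch and saves the weights whenever the best F1 improves; with `early_stopping = 50`, training presumably stops once 50 epochs pass without improvement. A minimal sketch of that patience logic, assuming an improvement means exceeding the best F1 by more than `considered_improvement`; the function name is hypothetical and the actual DCASE-models training loop may differ in details such as how it counts epochs.

```python
from typing import Iterable, Tuple


def early_stopping_epoch(val_scores: Iterable[float], patience: int,
                         min_improvement: float = 0.0) -> Tuple[int, float]:
    """Return (epoch at which training stops, best score), with 1-indexed epochs.

    Training stops once `patience` consecutive epochs pass without the
    score exceeding the best one by more than `min_improvement`.
    """
    best = float("-inf")
    waited = 0
    epoch = 0
    for epoch, score in enumerate(val_scores, start=1):
        if score > best + min_improvement:
            best = score  # improvement: this is where the weights would be saved
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                break
    return epoch, best


# The first three validation F1 values from the log above, followed by
# hypothetical scores that never improve again.
scores = [0.3352, 0.3101, 0.3569] + [0.30] * 10
print(early_stopping_epoch(scores, patience=5))  # (8, 0.3569)
```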

<a id="EvaluateModel"></a>
## 6. Evaluate model

In [None]:
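# Load the best weights saved during training
model_container.load_model_weights(exp_folder)
# Get the test data and scale it with the scaler fitted on the training data
X_test, Y_test = data_generator.get_data_for_testing(fold_test)
X_test = scaler.transform(X_test)
# Sanity check: range of the scaled test features
print(np.amin(X_test), np.amax(X_test))
# Time resolution of the predictions (sequence hop) and of the metrics (1 s segments)
kwargs = {'sequence_time_sec': params_features['sequence_hop_time'],
          'metric_resolution_sec': 1.0,
          'label_list': dataset.label_list}
results = model_container.evaluate(X_test, Y_test, **kwargs)

print(results['sed'])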
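With `metric_resolution_sec = 1.0`, the SED results are computed on one-second segments. The segment-based F1 and error rate commonly used in DCASE evaluations (as implemented, for example, in the sed_eval toolbox) are sketched below in pure Python: frame-level activities are pooled into segments (a class is active in a segment if it is active in any of its frames), then F1 and ER are computed from per-segment counts. The function names, the 0.5 threshold and the toy data are assumptions for illustration; whether DCASE-models computes its `'sed'` results exactly this way is not confirmed here.

```python
from typing import List, Tuple

Activity = List[List[int]]  # rows: frames or segments, columns: classes (0/1)


def binarize(probs: List[List[float]], threshold: float = 0.5) -> Activity:
    """Frame-level class probabilities -> 0/1 activity."""
    return [[int(p > threshold) for p in frame] for frame in probs]


def to_segments(activity: Activity, frames_per_segment: int) -> Activity:
    """A class is active in a segment if it is active in any frame of that segment."""
    segments = []
    for start in range(0, len(activity), frames_per_segment):
        block = activity[start:start + frames_per_segment]
        segments.append([int(any(column)) for column in zip(*block)])
    return segments


def segment_f1_er(reference: Activity, estimate: Activity) -> Tuple[float, float]:
    """Micro-averaged segment-based F1 and error rate."""
    tp = fp = fn = 0
    subs = dels = ins = n_ref = 0
    for ref, est in zip(reference, estimate):
        seg_tp = sum(1 for r, e in zip(ref, est) if r and e)
        seg_fp = sum(1 for r, e in zip(ref, est) if e and not r)
        seg_fn = sum(1 for r, e in zip(ref, est) if r and not e)
        tp, fp, fn = tp + seg_tp, fp + seg_fp, fn + seg_fn
        subs += min(seg_fn, seg_fp)      # substitutions
        dels += max(0, seg_fn - seg_fp)  # deletions
        ins += max(0, seg_fp - seg_fn)   # insertions
        n_ref += sum(ref)                # active reference classes
    f1 = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 0.0
    er = (subs + dels + ins) / n_ref if n_ref else 0.0
    return f1, er


# Toy example: 2 classes, 4 frames, 2 frames per segment
# (with a 0.02 s hop and 1.0 s segments this would be 50 frames per segment).
reference_frames = [[1, 0], [0, 0], [0, 1], [0, 1]]
estimated_probs = [[0.9, 0.2], [0.1, 0.1], [0.7, 0.3], [0.2, 0.4]]

f1, er = segment_f1_er(to_segments(reference_frames, 2),
                       to_segments(binarize(estimated_probs), 2))
print(f1, er)  # 0.5 0.5
```

In the toy example the second segment swaps one class for another, which counts once as a substitution in ER but as both a false positive and a false negative in F1; this is why the two metrics are usually reported together.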