<pre>
 ____   ____    _    ____  _____                          _      _     
|  _ \ / ___|  / \  / ___|| ____|     _ __ ___   ___   __| | ___| |___ 
| | | | |     / _ \ \___ \|  _| _____| '_ ` _ \ / _ \ / _` |/ _ \ / __|
| |_| | |___ / ___ \ ___) | |__|_____| | | | | | (_) | (_| |  __/ \__ \
|____/ \____/_/   \_\____/|_____|    |_| |_| |_|\___/ \__,_|\___|_|___/
                                                                        
</pre>

# DCASE-models Notebooks
Python Notebooks for [DCASE-models](https://github.com/pzinemanas/DCASE-models)

---

### About 
This Notebook reproduces *some* results for **Environmental Sound Classification** presented in:
<ul>
<li><a href="http://www.justinsalamon.com/uploads/4/3/9/4/4394963/salamon_cnn-aug-env_ieeespl_2017.pdf"><strong>
    Deep Convolutional Neural Networks and Data Augmentation For Environmental Sound Classification</strong></a>
    J. Salamon and J. P. Bello. IEEE Signal Processing Letters, 24(3), pages 279-283, 2017.
    <br>
   <a type="button" class="btn btn-default btn-xs" target="_blank" href="http://www.justinsalamon.com/uploads/4/3/9/4/4394963/salamon_cnn-aug-env_ieeespl_2017.pdf"> PDF </a>
   <a type="button" class="btn btn-default btn-xs" target="_blank" href="https://ieeexplore.ieee.org/document/7829341"> IEEE</a>
    </li>   
</ul>

### Overview

The paper introduces an adaptation of the Convolutional Neural Network (CNN) for environmental
sound classification, using the [UrbanSound8k dataset](https://urbansounddataset.weebly.com/urbansound8k.html). To overcome data scarcity, the dataset is augmented through the following transformations:
* Time stretching
* Pitch shifting
* Dynamic Range Compression
* Background Noise

This notebook aims to reproduce the results for the augmented set *PS1*, obtained by pitch shifting each sample by four values (in semitones): {−2, −1, 1, 2}. The reported accuracy for all classes can be found in *Figure 3* of the paper.
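
To make the semitone values concrete, the sketch below converts a shift of *n* semitones into a frequency ratio using the equal-temperament relation ratio = 2^(n/12). The helper name `semitones_to_ratio` is illustrative only and is not part of DCASE-models.

```python
def semitones_to_ratio(n_semitones):
    """Frequency ratio for a pitch shift of n_semitones (12-tone equal temperament)."""
    return 2.0 ** (n_semitones / 12.0)


# The four PS1 shifts used in this notebook
for s in [-2, -1, 1, 2]:
    print(f"{s:+d} semitones -> frequency ratio {semitones_to_ratio(s):.4f}")
```

A shift of 12 semitones doubles every frequency, so the PS1 shifts change pitch by roughly 6 to 12 percent in either direction.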

### Organization

The Notebook is organized into the following sections.
* [1. Load parameters](#LoadeParameters)
* [2. Data augmentation](#DataAugmentation)
* [3. Extract features](#ExtractFeatures)
* [4. Load data](#LoadData)
* [5. Initialize model](#initmodel)
* [6. Train model](#train)
* [7. Evaluate model](#Eval)


In [1]:
%load_ext autoreload
%autoreload 2
rootdir_path = '../../../'
import sys
import os
import json
import warnings
import glob
import numpy as np
import argparse

sys.path.append(rootdir_path)
from dcase_models.util.files import load_json, mkdir_if_not_exists
from dcase_models.data.data_generator import DataGenerator
from dcase_models.data.datasets import UrbanSound8k
from dcase_models.data.scaler import Scaler
from dcase_models.data.data_augmentation import AugmentedDataset
from dcase_models.data.features import MelSpectrogram
from dcase_models.model.models import SB_CNN
from dcase_models.util.data import evaluation_setup

os.environ["CUDA_VISIBLE_DEVICES"] = "1"

Using TensorFlow backend.


<a id="LoadeParameters"></a>
## 1. Load parameters

The default parameters for the dataset, feature extraction, training and data augmentation are stored in a JSON file in the root directory.

In [2]:
# load all parameters from json file
params = load_json(os.path.join(rootdir_path, 'parameters.json'))
# set the dataset we are going to use
dataset = 'UrbanSound8k'

# get dataset parameters
params_dataset = params["datasets"][dataset]

# get augmentation parameters
params_augmentation = params["data_augmentations"]

# get feature extraction parameters
params_features = params["features"]

# get training parameters
params_train = params["train"]

# get model parameters
params_model = params["models"]["SB_CNN"]
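
The cell above relies on five top-level sections being present in `parameters.json`. As a minimal sketch, the check below lists any section that is missing. The JSON text is a hypothetical, heavily shortened stand-in for the real file, and `missing_keys` is an illustrative helper, not a DCASE-models function.

```python
import json

# Hypothetical, shortened stand-in for parameters.json (not the real file).
EXAMPLE = """
{
  "datasets": {"UrbanSound8k": {"dataset_path": "datasets/UrbanSound8K",
                                "evaluation_mode": "cross-validation"}},
  "data_augmentations": [{"type": "pitch_shift", "n_semitones": -1}],
  "features": {"sr": 22050},
  "train": {"batch_size": 32},
  "models": {"SB_CNN": {}}
}
"""

# Top-level sections read by the notebook
REQUIRED_KEYS = ("datasets", "data_augmentations", "features", "train", "models")


def missing_keys(params, required=REQUIRED_KEYS):
    """Return the required top-level keys that are absent from params."""
    return [key for key in required if key not in params]


params_example = json.loads(EXAMPLE)
print(missing_keys(params_example))  # an empty list means every section is present
```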


In [3]:
# print the dataset parameters
print("Dataset Parameters:\n", json.dumps(params_dataset, indent=4, sort_keys=True))
# print augmentation parameters
print("Augmentation Parameters:\n", json.dumps(params_augmentation, indent=4, sort_keys=True))
# print feature extraction parameters
print("Features' Parameters:\n", json.dumps(params_features, indent=4, sort_keys=True))
# print training parameters
print("Training Parameters:\n", json.dumps(params_train, indent=4, sort_keys=True))


Dataset Parameters:
 {
    "dataset_path": "datasets/UrbanSound8K",
    "evaluation_mode": "cross-validation"
}
Augmentation Parameters:
 [
    {
        "n_semitones": -1,
        "type": "pitch_shift"
    },
    {
        "factor": 1.05,
        "type": "time_stretching"
    },
    {
        "snr": 60,
        "type": "white_noise"
    }
]
Features' Parameters:
 {
    "MelSpectrogram": {
        "mel_bands": 64,
        "n_fft": 1024
    },
    "Openl3": {
        "content_type": "env",
        "embedding_size": 512,
        "input_repr": "mel256"
    },
    "Spectrogram": {
        "n_fft": 1024
    },
    "audio_hop": 690,
    "audio_win": 1024,
    "sequence_hop_time": 1.0,
    "sequence_time": 2.0,
    "sr": 22050
}
Training Parameters:
 {
    "batch_size": 32,
    "considered_improvement": 0,
    "early_stopping": 30,
    "epochs": 50,
    "learning_rate": 0.001,
    "optimizer": "Adam",
    "verbose": 1
}


<a id="DataAugmentation"></a>
## 2. Data Augmentation

We apply the pitch-shifting augmentation described in the paper, shifting each clip by −2, −1, 1 and 2 semitones.

In [4]:
# Initialize the dataset as an instance of UrbanSound8k
dataset = UrbanSound8k(os.path.join(rootdir_path, params_dataset["dataset_path"]))
# Download if needed
dataset.download()
# Set augmentation parameters to match the values described in the paper
semitones = [-2, -1, 1, 2]
params_augmentation = [{'type': 'pitch_shift', 'n_semitones': s} for s in semitones]

# Initialize AugmentedDataset; 44100 matches the sampling rate used for feature extraction
aug_dataset = AugmentedDataset(dataset, 44100, params_augmentation)

# Process all files
print('Processing ...')
aug_dataset.process()
print('Done!')


Processing ...
Done!


<a id="ExtractFeatures"></a>
## 3. Extract features

Initialize the feature extractor and extract features from the augmented dataset.

First, define the feature extraction parameters as described in the paper.

In [5]:
# Define params

params_features = {
    "MelSpectrogram": {
        "mel_bands": 128,
        "n_fft": 1024
    },
    "audio_hop": 1024,
    "audio_win": 1024,
    "sequence_hop_time": 1.0,
    "sequence_time": 3.0,
    "sr": 44100
}
# Initialize Feature Extractor
features = MelSpectrogram(sequence_time=params_features['sequence_time'], 
                          sequence_hop_time=params_features['sequence_hop_time'], 
                          audio_win=params_features['audio_win'], 
                          audio_hop=params_features['audio_hop'], 
                          sr=params_features['sr'],
                          **params_features['MelSpectrogram'])
print(features.get_shape())

(11, 129, 128)
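
The last two dimensions of the printed shape can be checked by hand. The sketch below assumes that the number of frames per sequence is the sequence length in samples divided by the hop size, truncated to an integer. This formula is an assumption that happens to reproduce the 129 frames printed above; the library may compute the value differently (for example with padding). The second dimension, 128, is the number of mel bands.

```python
def frames_per_sequence(sequence_time, sr, audio_hop):
    """Assumed frame count: one frame per hop, truncated to an integer."""
    return int(sequence_time * sr / audio_hop)


# Values taken from params_features in the cell above
n_frames = frames_per_sequence(sequence_time=3.0, sr=44100, audio_hop=1024)
n_mels = 128
print((n_frames, n_mels))  # (129, 128)
```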


In [6]:
# Extract the features (if they were not extracted before).
if not features.check_if_extracted(aug_dataset):
    features.extract(aug_dataset)
print('Done!')



Done!


<a id="LoadData"></a>
## 4. Load data

In [7]:
# Get train/validation/test folds (fold1 is the test fold)
folds_train, folds_val, folds_test = evaluation_setup('fold1', dataset.fold_list,
                                                      params_dataset['evaluation_mode'],
                                                      use_validate_set=True)
# Initialise the training data generator
data_gen_train = DataGenerator(dataset, features, folds=folds_train,
                               batch_size=params_train['batch_size'],
                               shuffle=True, train=True, scaler=None)
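
To illustrate what a cross-validation split with a validation set looks like, here is a minimal sketch. It assumes that the validation fold is the one following the test fold, wrapping around at the end of the list; `evaluation_setup` may select the validation fold differently. `split_folds` is a hypothetical helper, not part of DCASE-models.

```python
def split_folds(fold_test, fold_list):
    """Split folds into train/validation/test lists for one cross-validation run.

    Assumption: the validation fold is the one that follows the test fold,
    wrapping around at the end of the list.
    """
    idx = fold_list.index(fold_test)
    fold_val = fold_list[(idx + 1) % len(fold_list)]
    folds_train = [f for f in fold_list if f not in (fold_test, fold_val)]
    return folds_train, [fold_val], [fold_test]


fold_list = [f"fold{i}" for i in range(1, 11)]  # UrbanSound8K ships with 10 folds
train, val, test = split_folds("fold1", fold_list)
print(train, val, test)
```

Keeping the three lists disjoint ensures that no clip used for model selection or testing is seen during training.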


Next, fit a scaler on the training data and attach it to the training data generator.

In [8]:
scaler = Scaler(normalizer=params_model['normalizer'])
print('Fitting features ...')
scaler.fit(data_gen_train)
print('Done!')

data_gen_train.set_scaler(scaler)

Fitting features ...
Done!
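
The important point in this step is that the scaler statistics come from the training data only and are then reused unchanged for validation and test data. The sketch below shows the idea with a toy min-max scaler on plain lists. Min-max scaling is an assumption chosen for illustration; the normalizer actually used is whatever `params_model['normalizer']` specifies, and `MinMaxScalerSketch` is not the library's `Scaler`.

```python
class MinMaxScalerSketch:
    """Toy min-max scaler for a flat list of numbers (illustrative only)."""

    def fit(self, values):
        # Store the statistics of the data the scaler is fitted on
        self.min_ = min(values)
        self.max_ = max(values)
        return self

    def transform(self, values):
        span = self.max_ - self.min_
        if span == 0:
            # Constant training data: map everything to 0 to avoid dividing by zero
            return [0.0 for _ in values]
        return [(v - self.min_) / span for v in values]


train_values = [0.0, 5.0, 10.0]
val_values = [5.0, 20.0]

scaler_sketch = MinMaxScalerSketch().fit(train_values)  # statistics from training data only
print(scaler_sketch.transform(train_values))  # [0.0, 0.5, 1.0]
print(scaler_sketch.transform(val_values))    # [0.5, 2.0]
```

Validation values outside the training range fall outside [0, 1], which is expected when the scaler is not refitted.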


Initialise the validation data generator, reusing the scaler fitted on the training data.

In [9]:
data_gen_val = DataGenerator(dataset, features, folds=folds_val,
                             batch_size=params_train['batch_size'],
                             shuffle=False, train=False, scaler=scaler)

<a id="initmodel"></a>
## 5. Define and initialise model

In [10]:
features_shape = features.get_shape()
n_frames_cnn = features_shape[1]
n_freq_cnn = features_shape[2]
n_classes = len(dataset.label_list)


print(n_frames_cnn, n_freq_cnn, n_classes)

model_container = SB_CNN(model=None, model_path=None, n_classes=n_classes, 
                         n_frames_cnn=n_frames_cnn, n_freq_cnn=n_freq_cnn)

model_container.model.summary()

129 128 10
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input (InputLayer)           (None, 129, 128)          0         
_________________________________________________________________
lambda (Lambda)              (None, 129, 128, 1)       0         
_________________________________________________________________
conv1 (Conv2D)               (None, 125, 124, 24)      624       
_________________________________________________________________
maxpool1 (MaxPooling2D)      (None, 62, 62, 24)        0         
_________________________________________________________________
batchnorm1 (BatchNormalizati (None, 62, 62, 24)        96        
_________________________________________________________________
conv2 (Conv2D)               (None, 58, 58, 48)        28848     
_________________________________________________________________
maxpool2 (MaxPooling2D)      (None, 14, 29, 48)        0         
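
The shapes and parameter counts in the summary above can be reproduced with a little arithmetic. The sketch below assumes 5x5 convolution kernels with stride 1 and no padding, and pooling sizes of (2, 2) and (4, 2). These values are inferred from the printed shapes, not read from the model definition.

```python
def conv_output(size, kernel):
    """Output length of a stride-1 convolution without padding."""
    return size - kernel + 1


def pool_output(size, pool):
    """Output length of non-overlapping max pooling (any remainder is dropped)."""
    return size // pool


def conv_params(kernel_h, kernel_w, in_channels, filters):
    """Number of weights plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters


# conv1: 129 x 128 input with 1 channel, 24 filters
print(conv_output(129, 5), conv_output(128, 5), conv_params(5, 5, 1, 24))   # 125 124 624
# maxpool1: assumed pool size (2, 2)
print(pool_output(125, 2), pool_output(124, 2))                             # 62 62
# batchnorm1: 4 parameters per channel (scale, offset, moving mean, moving variance)
print(4 * 24)                                                               # 96
# conv2: 48 filters on 24 input channels
print(conv_output(62, 5), conv_output(62, 5), conv_params(5, 5, 24, 48))    # 58 58 28848
# maxpool2: assumed pool size (4, 2)
print(pool_output(58, 4), pool_output(58, 2))                               # 14 29
```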

<a id="train"></a>
## 6. Train model

In [11]:
# Define the path where the weights are saved
exp_folder = './'
# The number of epochs can be modified
params_train["epochs"] = 100
# Train model
model_container.train(data_gen_train, data_gen_val,
                      label_list=dataset.label_list,
                      weights_path=exp_folder, **params_train)
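
The training parameters printed earlier include `early_stopping` (30) and `considered_improvement` (0). A plausible reading, stated here as an assumption rather than as the library's documented behaviour, is that training stops after `early_stopping` consecutive epochs in which the validation score does not improve by more than `considered_improvement`. The sketch below implements that reading in plain Python with a hypothetical helper and made-up scores.

```python
def early_stopping_epoch(val_scores, patience, considered_improvement=0.0):
    """Return the 0-based epoch at which training stops, or None if it never stops.

    A score counts as an improvement when it exceeds the best score so far
    by more than considered_improvement.
    """
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, score in enumerate(val_scores):
        if score > best + considered_improvement:
            best = score
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Made-up validation accuracies, one per epoch
scores = [0.50, 0.60, 0.59, 0.58, 0.61, 0.60, 0.60, 0.60]
print(early_stopping_epoch(scores, patience=3))  # 7
```

Under this reading, a patience of 30 with 100 epochs means training ends early only after a long plateau.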

<a id="Eval"></a>
## 7. Evaluate model

In [20]:
# Load the best weights saved during training
model_container.load_model_weights(exp_folder)
data_gen_test = DataGenerator(dataset, features, folds=folds_test,
                              batch_size=params_train['batch_size'],
                              shuffle=False, train=False, scaler=scaler)

results = model_container.evaluate(data_gen_test, label_list=dataset.label_list)

print(results["classification"])

Scene classification metrics
  Scene labels                      : 10 
  Evaluated units                   : 873 

  Class-wise average metrics (macro-average)
  Accuracy
    Accuracy                        : 70.85 %

  Class-wise metrics
    Scene label       | Ncorr       Nref      | Accuracy   
    ----------------- | ---------   --------- | ---------  
    air_conditioner   | 16          100       | 16.0%      
    car_horn          | 29          36        | 80.6%      
    children_playing  | 84          100       | 84.0%      
    dog_bark          | 93          100       | 93.0%      
    drilling          | 65          100       | 65.0%      
    engine_idling     | 42          96        | 43.8%      
    gun_shot          | 33          35        | 94.3%      
    jackhammer        | 81          120       | 67.5%      
    siren             | 70          86        | 81.4%      
    street_music      | 83          100       | 83.0%      
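
The macro-average reported above is the unweighted mean of the class-wise accuracies. As a cross-check, the sketch below recomputes it from the `Ncorr` and `Nref` columns and also derives the overall accuracy (total correct over total evaluated), which weights classes by their number of clips. The counts are copied from the table; the variable names are illustrative.

```python
# (Ncorr, Nref) per class, copied from the table above
counts = {
    "air_conditioner": (16, 100),
    "car_horn": (29, 36),
    "children_playing": (84, 100),
    "dog_bark": (93, 100),
    "drilling": (65, 100),
    "engine_idling": (42, 96),
    "gun_shot": (33, 35),
    "jackhammer": (81, 120),
    "siren": (70, 86),
    "street_music": (83, 100),
}

# Macro-average: mean of the per-class accuracies
class_acc = {label: ncorr / nref for label, (ncorr, nref) in counts.items()}
macro_acc = sum(class_acc.values()) / len(class_acc)

# Overall accuracy: all correct clips over all evaluated clips
total_correct = sum(ncorr for ncorr, _ in counts.values())
total_ref = sum(nref for _, nref in counts.values())
overall_acc = total_correct / total_ref

print(f"macro-average accuracy: {100 * macro_acc:.2f} %")                             # 70.85 %
print(f"overall accuracy: {100 * overall_acc:.2f} % ({total_correct}/{total_ref})")   # 68.27 % (596/873)
```

The overall accuracy is lower than the macro-average here because the weakest classes (`air_conditioner`, `engine_idling`) are among the largest.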


