<pre>
 ____   ____    _    ____  _____                          _      _     
|  _ \ / ___|  / \  / ___|| ____|     _ __ ___   ___   __| | ___| |___ 
| | | | |     / _ \ \___ \|  _| _____| '_ ` _ \ / _ \ / _` |/ _ \ / __|
| |_| | |___ / ___ \ ___) | |__|_____| | | | | | (_) | (_| |  __/ \__ \
|____/ \____/_/   \_\____/|_____|    |_| |_| |_|\___/ \__,_|\___|_|___/
                                                                        
</pre>

# DCASE-models Notebooks
Python Notebooks for [DCASE-models](https://github.com/pzinemanas/DCASE-models)

---

### About 
This Notebook reproduces the baseline system for Task 1, Subtask A of the DCASE 2020 challenge, presented in:
<ul>
<li><a href="https://arxiv.org/pdf/2005.14623.pdf"><strong>
    Acoustic scene classification in DCASE 2020 Challenge: generalization across devices and low complexity solutions.</strong></a>
    Toni Heittola, Annamaria Mesaros, and Tuomas Virtanen. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2020 Workshop. 2020.
    <br>
   <a type="button" class="btn btn-default btn-xs" target="_blank" href="https://arxiv.org/pdf/2005.14623.pdf"> Paper </a>
   <a type="button" class="btn btn-default btn-xs" target="_blank" href="http://dcase.community/challenge2020/task-acoustic-scene-classification"> Challenge</a>
    </li>   
</ul>

### Overview

Subtask A of the DCASE 2020 Acoustic Scene Classification task focuses on generalization across different audio recording devices. For this challenge, a dataset containing both real and simulated mobile recording devices, [TAU Urban Acoustic Scenes 2020 Mobile](https://zenodo.org/record/3670185), was released. The baseline system uses OpenL3 embeddings followed by two fully-connected feed-forward layers, an architecture proposed by Cramer et al. [[L3-Net]](http://www.justinsalamon.com/uploads/4/3/9/4/4394963/cramer_looklistenlearnmore_icassp_2019.pdf).
We aim to reproduce the accuracy averaged across devices.
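
As a rough illustration of what "averaged across devices" can mean, the sketch below computes one accuracy per recording device and then takes the unweighted mean of those values. The function name, the toy labels, and the device names are hypothetical, and reading the target metric as a mean of per-device accuracies is an assumption rather than something this Notebook states.

```python
import numpy as np


def device_averaged_accuracy(y_true, y_pred, devices):
    """Return (mean of per-device accuracies, dict of per-device accuracies)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    devices = np.asarray(devices)

    per_device = {}
    for device in np.unique(devices):
        mask = devices == device
        # Fraction of correctly classified clips recorded with this device
        per_device[str(device)] = float(np.mean(y_true[mask] == y_pred[mask]))

    # Unweighted mean: every device counts equally, whatever its clip count
    average = float(np.mean(list(per_device.values())))
    return average, per_device


# Toy example: device "a" has 4 clips (3 correct), device "b" has 2 clips (1 correct)
y_true = [0, 1, 2, 1, 0, 2]
y_pred = [0, 1, 2, 0, 0, 1]
devices = ["a", "a", "a", "a", "b", "b"]

average, per_device = device_averaged_accuracy(y_true, y_pred, devices)
print(per_device, average)
```

With this kind of averaging, a device with few clips weighs as much as a device with many, so the result can differ from the plain accuracy over all clips.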



### Organization

The Notebook is organized into the following sections.
* [1. Set parameters](#LoadParameters)
* [2. Extract features](#ExtractFeatures)
* [3. Load data](#LoadData)
* [4. Define model](#DefineModel)
* [5. Init model](#InitModel)
* [6. Train model](#TrainModel)
* [7. Test model](#EvaluateModel)

In [4]:
import sys
import os
import glob
import numpy as np
import argparse

# Add the repository root to the import path
sys.path.append('../../')

from dcase_models.data.data_generator import DataGenerator
from dcase_models.model.container import KerasModelContainer
from dcase_models.data.datasets import TAUUrbanAcousticScenes2020Mobile
from dcase_models.data.features import Openl3
from dcase_models.data.scaler import Scaler
from dcase_models.util.data import evaluation_setup
from dcase_models.util.files import load_json, mkdir_if_not_exists

# Select which GPU is visible to the backend
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

<a id="LoadParameters"></a>
## 1. Set parameters

In [5]:
# features parameters
sequence_time = 1.0
sequence_hop_time = 0.1
audio_hop = 512
audio_win = 1024
n_fft = 2048
sr = 44100

features_name = 'Openl3'
features_params = {'content_type': 'music', 
                   'input_repr': 'mel256',
                   'embedding_size': 512} 

# normalizer
normalizer = 'minmax'

# train parameters
early_stopping = 20
epochs = 200
considered_improvement = 0
learning_rate = 0.001
batch_size = 64
verbose = 1
optimizer = 'Adam'



# dataset parameters
dataset_name = 'TAUUrbanAcousticScenes2020Mobile'
dataset_path = "datasets/TAUUrbanAcousticScenes2020Mobile"
evaluation_mode = "train-test"

<a id="ExtractFeatures"></a>
## 2. Extract features

Initialize Dataset and Feature Extractor.

In [11]:
# Initialize Feature Extractor
features = Openl3(sequence_time=sequence_time, sequence_hop_time=sequence_hop_time,
                  audio_win=audio_win, audio_hop=audio_hop, sr=sr,
                  content_type=features_params["content_type"],
                  input_repr=features_params["input_repr"],
                  embedding_size=features_params["embedding_size"])

print(features.get_shape())

(96, 512)
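
The second dimension of the printed shape is the embedding size (512). The first dimension, 96, is consistent with framing a 10 s clip into 1 s analysis windows every 0.1 s, if half a window of padding is added at the start of the clip. The sketch below reproduces that count. Both the 10 s clip length used for the shape computation and the half-window padding are assumptions about the extractor, and `num_embedding_frames` is a hypothetical helper, not part of DCASE-models.

```python
def num_embedding_frames(duration_s, window_s, hop_s, sr, center=True):
    """Count analysis windows over a clip, working in integer samples."""
    n_samples = int(round(duration_s * sr))
    frame_len = int(round(window_s * sr))
    hop_len = int(round(hop_s * sr))

    if center:
        # Assumed: half a window of padding is prepended to the clip
        n_samples += frame_len // 2

    remaining = max(n_samples - frame_len, 0)
    # 1 + ceil(remaining / hop_len), using integer arithmetic only
    return 1 + -(-remaining // hop_len)


n_frames_centered = num_embedding_frames(10.0, 1.0, 0.1, 44100)
n_frames_plain = num_embedding_frames(10.0, 1.0, 0.1, 44100, center=False)
print(n_frames_centered, n_frames_plain)
```

Without the padding the same parameters give 91 windows, which is why the padding assumption is needed to arrive at 96.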


In [12]:
# Initialize the dataset as an instance of TAUUrbanAcousticScenes2020Mobile
rootdir_path = '../../'
dataset = TAUUrbanAcousticScenes2020Mobile(os.path.join(rootdir_path, dataset_path))
# Download dataset if needed
dataset.download()

TypeError: download() takes from 1 to 2 positional arguments but 4 were given

Extract features if needed.

In [9]:
if not features.check_if_extracted(dataset):
    features.extract(dataset)
print('Done!')

Changing sampling rate ...
Done!


FileNotFoundError: [Errno 2] No such file or directory: '../../datasets/TAUUrbanAcousticScenes2020Mobile/features/Openl3/original/parameters.json'

<a id="LoadData"></a>
## 3. Load data

Initialise train data generator.

In [None]:
# Get train/test folds
folds_train, folds_val, folds_test = evaluation_setup('test', dataset.fold_list,
                                                      evaluation_mode,
                                                      use_validate_set=True)
# Initialise train data generator
data_gen_train = DataGenerator(dataset, features, folds=folds_train,
                               batch_size=batch_size,
                               shuffle=True, train=True, scaler=None)

Fit a scaler on the training data and attach it to the train data generator.

In [None]:
scaler = Scaler(normalizer=normalizer)
print('Fitting features ...')
scaler.fit(data_gen_train)
print('Done!')

data_gen_train.set_scaler(scaler)
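
To make the idea behind the `'minmax'` normalizer concrete, here is a minimal NumPy sketch of min-max scaling: the statistics are taken from the training data only and then reused unchanged on other data. `MinMaxSketch` is a hypothetical class, and the target range [0, 1] and the use of a single global minimum and maximum are assumptions; the `Scaler` class in DCASE-models may differ in both respects.

```python
import numpy as np


class MinMaxSketch:
    """Illustrative min-max scaler (not the DCASE-models Scaler)."""

    def fit(self, X):
        # Statistics come from the training data only
        self.min_ = float(np.min(X))
        self.max_ = float(np.max(X))
        return self

    def transform(self, X):
        span = self.max_ - self.min_
        if span == 0.0:
            # Constant training data: avoid division by zero
            return np.zeros_like(X, dtype=float)
        return (np.asarray(X, dtype=float) - self.min_) / span


rng = np.random.default_rng(0)
X_train = rng.normal(size=(4, 5, 3))
X_val = rng.normal(size=(2, 5, 3))

sketch_scaler = MinMaxSketch().fit(X_train)
X_train_scaled = sketch_scaler.transform(X_train)
X_val_scaled = sketch_scaler.transform(X_val)  # may fall slightly outside [0, 1]
```

Reusing the training statistics on validation and test data keeps all splits on the same scale and avoids leaking information from held-out data into the preprocessing.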

Initialise validation data generator.

In [None]:
data_gen_val = DataGenerator(dataset, features, folds=folds_val,
                             batch_size=batch_size,
                             shuffle=False, train=False, scaler=scaler)

<a id="DefineModel"></a>
## 4. Define model

In [None]:
from autopool import AutoPool1D
from keras.layers import Input, TimeDistributed, Dense
from keras.models import Model


class DCASE2020Task1Baseline(KerasModelContainer):
    
    def __init__(self, model=None, model_path=None, metrics=['accuracy'],
                 n_frames_cnn=96, n_freq_cnn=64, n_classes=10,
                 hidden_layers_size=[512, 128]):
        
        self.n_frames_cnn = n_frames_cnn 
        self.n_freq_cnn = n_freq_cnn
        self.n_classes = n_classes
        self.hidden_layers_size = hidden_layers_size
        
        super().__init__(model=model, model_path=model_path,
                         model_name='DCASE2020Task1Baseline', metrics=metrics)
        
    def build(self):
        # Input
        inputs = Input(shape=(self.n_frames_cnn, self.n_freq_cnn), dtype='float32', name='input')

        # Hidden layers (the same dense weights are applied to every frame)
        y = inputs
        for idx, layer_size in enumerate(self.hidden_layers_size):
            y = TimeDistributed(Dense(layer_size, activation='relu',
                                      name='dense_{}'.format(idx+1)))(y)

        # Output layer
        y = TimeDistributed(Dense(self.n_classes, activation='softmax',
                            name='output_t'))(y)

        # Apply autopool over time dimension
        y = AutoPool1D(axis=1, name='output')(y)

        # Create model
        self.model = Model(inputs=inputs, outputs=y, name='model')

        super().build()
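
The last layer of the model pools the frame-level class probabilities over time. As a sketch of the idea behind auto-pooling, the NumPy function below weights each frame by a softmax over `alpha * p` along the time axis: `alpha = 0` gives the mean over frames, and a large `alpha` approaches the maximum. This illustrates the pooling operator only; it is not the `AutoPool1D` layer itself, where `alpha` is a trainable parameter, and the toy probabilities are made up.

```python
import numpy as np


def autopool(p, alpha):
    """Pool frame-level probabilities p of shape (n_frames, n_classes) over time.

    alpha is a scalar or an array of shape (n_classes,).
    """
    scaled = alpha * p
    # Subtract the per-class maximum for numerical stability of the softmax
    scaled = scaled - scaled.max(axis=0, keepdims=True)
    weights = np.exp(scaled)
    weights = weights / weights.sum(axis=0, keepdims=True)
    # Weighted average of the probabilities along the time axis
    return (p * weights).sum(axis=0)


# Three frames, two classes
p = np.array([[0.1, 0.9],
              [0.3, 0.5],
              [0.8, 0.1]])

mean_like = autopool(p, 0.0)      # uniform weights: plain mean over frames
max_like = autopool(p, 1000.0)    # weights concentrate on the largest frame
print(mean_like, max_like)
```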

<a id="InitModel"></a>
## 5. Init model

In [None]:
features_shape = features.get_shape()
# Index from the end so this works with or without a leading sequence axis
n_frames_cnn = features_shape[-2]
n_freq_cnn = features_shape[-1]
n_classes = len(dataset.label_list)

print(n_frames_cnn, n_freq_cnn, n_classes)
model_container = DCASE2020Task1Baseline(model=None, model_path=None, n_classes=n_classes, 
                                         n_frames_cnn=n_frames_cnn, n_freq_cnn=n_freq_cnn)

model_container.model.summary()
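
As a sanity check on the summary, the number of weights in the dense layers can be computed by hand. Because `TimeDistributed` applies the same dense layer to every frame, the count does not depend on the number of frames. The sketch below assumes 512-dimensional input embeddings, the default hidden sizes `[512, 128]`, and 10 classes; the helper names are hypothetical, and any parameters of the pooling layer are not included.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully-connected layer."""
    return n_in * n_out + n_out


def mlp_params(n_in, hidden_sizes, n_classes):
    """Total parameters of a stack of dense layers ending in n_classes outputs."""
    sizes = [n_in] + list(hidden_sizes) + [n_classes]
    return sum(dense_params(a, b) for a, b in zip(sizes[:-1], sizes[1:]))


total = mlp_params(512, [512, 128], 10)
print(total)  # 262656 + 65664 + 1290 = 329610
```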

## Set paths and save model JSON

In [None]:
model_name = 'DCASE2020Task1Baseline'
mkdir_if_not_exists(model_name)
exp_folder = os.path.join(model_name, dataset_name)
mkdir_if_not_exists(exp_folder)

# save model as json
print('saving model to %s' % exp_folder)
model_container.save_model_json(exp_folder)

<a id="TrainModel"></a>
## 6. Train model

In [None]:
train_arguments = {'early_stopping': early_stopping,
                  'epochs': epochs,
                  'considered_improvement': considered_improvement,
                  'learning_rate': learning_rate,
                  'batch_size': batch_size,
                  'verbose': verbose,
                  'optimizer': optimizer}

model_container.train(data_gen_train, data_gen_val, weights_path=exp_folder, **train_arguments)
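
The `early_stopping` and `considered_improvement` arguments suggest a patience-based stopping rule. The sketch below shows one common form of that rule: training stops once the validation score has failed to improve by more than a threshold for a given number of consecutive epochs. How `train` actually interprets these two arguments is an assumption here, and `stop_epoch` and the toy scores are hypothetical.

```python
def stop_epoch(val_scores, patience, min_improvement=0.0):
    """Return (epoch at which training stops, epoch with the best score)."""
    best_score = float("-inf")
    best_epoch = -1
    epochs_without_improvement = 0

    for epoch, score in enumerate(val_scores):
        if score > best_score + min_improvement:
            # New best score: remember it and reset the patience counter
            best_score = score
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch

    # Patience never ran out: training used every epoch
    return len(val_scores) - 1, best_epoch


val_scores = [0.50, 0.60, 0.58, 0.59, 0.60, 0.55]
stopped_at, best_at = stop_epoch(val_scores, patience=3)
print(stopped_at, best_at)
```

Under this rule the weights worth keeping are those from the best epoch rather than the last one, which matches loading the best weights before testing.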

<a id="EvaluateModel"></a>
## 7. Test model

In [None]:
# Load best weights
model_container.load_model_weights(exp_folder)

# Initialise test data generator; no scaler is attached here because
# the fitted scaler is applied explicitly below
data_gen_test = DataGenerator(dataset, features, folds=folds_test,
                              batch_size=batch_size,
                              shuffle=False, train=False, scaler=None)

# Test model
X_test, Y_test = data_gen_test.get_data_for_testing()
X_test = scaler.transform(X_test)
results = model_container.evaluate(X_test, Y_test)

print(results['accuracy'])

