# Distinguish author-specific patterns in music

* Find this notebook at `EpyNN/nnlive/author_music/train.ipynb`.
* Regular Python code at `EpyNN/nnlive/author_music/train.py`.

In this notebook we will review:

* Handling a univariate time series that represents a **huge number of data points**.

## Environment and data

Follow [this link](prepare_dataset.ipynb) for details about data preparation.

Briefly, raw data are acoustic guitar music from the *True* author and the *False* author. These are raw ``.wav`` files that were resampled, clipped and digitized using a 4-bit encoder.

Commonly, music ``.wav`` files have a sampling rate of 44100 Hz. This means that each second of music represents a numerical time series of length 44100.

In [1]:
# EpyNN/nnlive/author_music/train.ipynb
# Standard library imports
import random

# Related third party imports
import numpy as np

# Local application/library specific imports
import nnlibs.initialize
from nnlibs.commons.maths import relu, softmax
from nnlibs.commons.library import (
    configure_directory,
    read_model,
)
from nnlibs.network.models import EpyNN
from nnlibs.embedding.models import Embedding
from nnlibs.rnn.models import RNN
# from nnlibs.lstm.models import LSTM
from nnlibs.gru.models import GRU
from nnlibs.flatten.models import Flatten
from nnlibs.dropout.models import Dropout
from nnlibs.dense.models import Dense
from prepare_dataset import prepare_dataset
from settings import se_hPars


########################## CONFIGURE ##########################
random.seed(1)

np.set_printoptions(threshold=10)

np.seterr(all='warn')
np.seterr(under='ignore')


############################ DATASET ##########################
X_features, Y_label = prepare_dataset(N_SAMPLES=1000)


In [2]:
print(len(X_features))
print(X_features[0])
print(np.min(X_features[0]), np.max(X_features[0]))

1000
[10  9  9 ...  9  9  9]
2 14


In addition to resampling and clipping, data have been normalized and re-encoded on 4 bits, which gives 16 bins. Audio files are typically 16-bit data, that is, sequences of integers that can take 65536 distinct values (from -32768 to 32767 for signed samples). Herein, we will apply one-hot encoding to the input data, so the vocabulary size will be 16.
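To make the 16-bin encoding concrete, here is a minimal sketch of one way to normalize a signal and map it onto 16 integer levels. The function name ``quantize_4bit`` and the random input are hypothetical, and this is not necessarily how ``prepare_dataset`` proceeds.

```python
import numpy as np


def quantize_4bit(signal, n_bins=16):
    """Normalize a 1-D signal to [0, 1] and map it onto n_bins integer levels.

    Hypothetical helper for illustration, not the EpyNN implementation.
    """
    signal = np.asarray(signal, dtype=np.float64)
    lo, hi = signal.min(), signal.max()
    if hi == lo:
        # A constant signal carries no amplitude information.
        return np.zeros(signal.shape, dtype=np.int64)
    normalized = (signal - lo) / (hi - lo)
    # Scale to [0, n_bins] and clip so that the maximum falls in the last bin.
    return np.minimum((normalized * n_bins).astype(np.int64), n_bins - 1)


# Random 16-bit style samples standing in for a short audio clip (assumption).
rng = np.random.default_rng(1)
raw = rng.integers(-32768, 32768, size=1000)

encoded = quantize_4bit(raw)
print(encoded.min(), encoded.max())
```

Every value in ``encoded`` is an integer between 0 and 15, which is what a vocabulary of size 16 expects.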

## Feed-Forward (FF)

We start with our reference, a Feed-Forward network with dropout regularization.

### Embedding

We one-hot encode both features and label and set a batch size of 32.

Note that we could alternatively set ``X_scale`` instead of ``X_encode``. That would keep a single value per time step instead of 16, which reduces the size of the input, and would also retain the amplitude of the signal along with the frequencies it contains.

Because each data point turns into an array of shape ``(16,)`` and sum ``1`` upon one-hot encoding, the direct information about amplitude is lost.
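The effect of one-hot encoding on amplitude can be checked with a small NumPy sketch. The five sample values below are made up for illustration, and the encoding is done by hand rather than through ``Embedding``.

```python
import numpy as np

vocab_size = 16

# A few made-up 4-bit values standing in for consecutive data points.
sample = np.array([10, 9, 9, 2, 14])

# Row i of the identity matrix is the one-hot vector for level i.
one_hot = np.eye(vocab_size)[sample]

print(one_hot.shape)        # (5, 16)
print(one_hot.sum(axis=1))  # every row sums to 1

# Levels 10 and 9 are close in amplitude, levels 10 and 2 are far apart,
# yet their one-hot vectors are equally distant from each other.
d_near = np.linalg.norm(one_hot[0] - one_hot[1])
d_far = np.linalg.norm(one_hot[0] - one_hot[3])
print(d_near == d_far)      # True
```

After encoding, all distinct levels are equidistant, so the network only sees which level occurs, not how loud it is.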

In [3]:
embedding = Embedding(X_data=X_features,
                      Y_data=Y_label,
                      X_encode=True,
                      Y_encode=True,
                      batch_size=32,
                      relative_size=(2, 1, 0))

Let's inspect the shape of the data.

In [4]:
print(embedding.dtrain.X.shape)

(667, 44100, 16)
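The first dimension is consistent with ``relative_size=(2, 1, 0)`` applied to 1000 samples: two thirds go to the training set, one third to the validation set and none to the testing set. The sketch below shows that arithmetic. The helper ``split_counts`` is hypothetical, and the exact rounding used by EpyNN is an assumption.

```python
def split_counts(n_samples, relative_size):
    """Turn relative set sizes into sample counts (hypothetical helper)."""
    total = sum(relative_size)
    # Simple rounding; the rounding rule used by the library may differ.
    return [round(n_samples * r / total) for r in relative_size]


counts = split_counts(1000, (2, 1, 0))
print(counts)  # [667, 333, 0]
```

The two remaining dimensions are the 44100 time steps and the 16 one-hot positions per time step.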


Let's proceed with the network design and training.

### Flatten-(Dense)n with Dropout

We place two *dropout* layers with a ``keep_prob`` of 0.5 each to reduce overfitting.
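As a reminder of what ``keep_prob`` controls, here is a minimal sketch of a dropout forward pass in NumPy. It uses the common "inverted dropout" variant, which rescales the kept values by ``1 / keep_prob``. Whether the EpyNN ``Dropout`` layer rescales in the same way is an assumption, and ``dropout_forward`` is a hypothetical name.

```python
import numpy as np


def dropout_forward(A, keep_prob, rng):
    """Zero each value with probability 1 - keep_prob (inverted dropout sketch)."""
    # Boolean mask: True where the value is kept.
    mask = rng.random(A.shape) < keep_prob
    # Rescale the kept values so the expected sum is unchanged.
    return A * mask / keep_prob, mask


rng = np.random.default_rng(1)
A = np.ones((4, 1000))

out, mask = dropout_forward(A, keep_prob=0.5, rng=rng)

print(mask.mean())  # fraction of kept values, close to 0.5
print(out.mean())   # close to the mean of the input
```

With ``keep_prob=0.5``, about half of the values are dropped on each forward pass during training, which prevents the network from relying on any single input.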

In [5]:
name = 'Flatten_Dropout05_Dense-64-relu_Dropout-05_Dense-2-softmax'

se_hPars['learning_rate'] = 0.005
se_hPars['softmax_temperature'] = 5

flatten = Flatten()

dropout1 = Dropout(keep_prob=0.5)

hidden_dense = Dense(64, relu)

dropout2 = Dropout(keep_prob=0.5)

dense = Dense(2, softmax)

layers = [embedding, flatten, dropout1, hidden_dense, dropout2, dense]

model = EpyNN(layers=layers, name=name)

We have set the softmax temperature to ``5`` to soften the output probabilities, which lowers the confidence of the model and reduces the risk of vanishing/exploding gradients.
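The effect of the temperature can be illustrated with a small sketch. It assumes the usual definition, in which the inputs are divided by the temperature before the softmax is applied. The function ``softmax_t`` and the example values are hypothetical and independent from ``nnlibs.commons.maths.softmax``.

```python
import numpy as np


def softmax_t(x, temperature=1.0):
    """Softmax with temperature (sketch, assuming x / temperature scaling)."""
    z = np.asarray(x, dtype=np.float64) / temperature
    # Subtract the maximum for numerical stability; the result is unchanged.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


# Made-up pre-activation values for the two output nodes.
logits = np.array([4.0, 0.0])

p1 = softmax_t(logits, temperature=1.0)
p5 = softmax_t(logits, temperature=5.0)

print(p1)
print(p5)
```

The same inputs give a flatter distribution at temperature 5 than at temperature 1, so the output is less confident.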

We can initialize the model.

In [6]:
model.initialize(loss='BCE', seed=1, metrics=['accuracy', 'recall', 'precision'], se_hPars=se_hPars.copy())

[1m--- EpyNN Check --- [0m
[1mLayer: Embedding[0m
[1m[32mcompute_shapes: Embedding[0m
[1m[32minitialize_parameters: Embedding[0m
[1m[32mforward: Embedding[0m
[1mLayer: Flatten[0m
[1m[32mcompute_shapes: Flatten[0m
[1m[32minitialize_parameters: Flatten[0m
[1m[32mforward: Flatten[0m
[1mLayer: Dropout[0m
[1m[32mcompute_shapes: Dropout[0m
[1m[32minitialize_parameters: Dropout[0m
[1m[32mforward: Dropout[0m
[1mLayer: Dense[0m
[1m[32mcompute_shapes: Dense[0m
[1m[32minitialize_parameters: Dense[0m
[1m[32mforward: Dense[0m
[1mLayer: Dropout[0m
[1m[32mcompute_shapes: Dropout[0m
[1m[32minitialize_parameters: Dropout[0m
[1m[32mforward: Dropout[0m
[1mLayer: Dense[0m
[1m[32mcompute_shapes: Dense[0m
[1m[32minitialize_parameters: Dense[0m
[1m[32mforward: Dense[0m
[1mLayer: Dense[0m
[1m[36mbackward: Dense[0m
[1m[36mcompute_gradients: Dense[0m
[1mLayer: Dropout[0m
[1m[36mbackward: Dropout[0m
[1m[36mcompute_gradients: Dropo

Train it for 50 epochs.

In [7]:
# model.train(epochs=50, init_logs=False)

Despite the dropout layers meant to prevent overfitting, the model reproduces the training data well but fails to achieve a comparable accuracy on the testing set.

## Recurrent Architectures


### Embedding

The embedding setup is the same as above.

In [8]:
embedding = Embedding(X_data=X_features,
                      Y_data=Y_label,
                      X_encode=True,
                      Y_encode=True,
                      batch_size=32,
                      relative_size=(2, 1, 0))

MemoryError: Unable to allocate 3.51 GiB for an array with shape (667, 44100, 16) and data type float64
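The size reported in this error follows directly from the shape and the data type of the array. The sketch below redoes the arithmetic and compares it with a one-byte data type. Whether EpyNN can work with such a data type is not stated here, so the comparison is for illustration only.

```python
import math

import numpy as np

# Shape of the one-hot encoded training set reported in the error.
shape = (667, 44100, 16)
n_elements = math.prod(shape)


def size_in_gib(dtype):
    """Memory needed to store n_elements values of the given dtype, in GiB."""
    return n_elements * np.dtype(dtype).itemsize / 2**30


print(f"float64: {size_in_gib(np.float64):.2f} GiB")  # float64: 3.51 GiB
print(f"uint8:   {size_in_gib(np.uint8):.2f} GiB")    # uint8:   0.44 GiB
```

One second of audio at 44100 Hz already amounts to 705600 values per sample after one-hot encoding, which is why the memory footprint grows so quickly.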


### RNN(sequences=True)-Flatten-Dense

We set an even greater softmax temperature since the *RNN* layer is very sensitive to exploding/vanishing gradients.

We have also set the ``sequences=True`` flag, which means that the *RNN* layer will forward the hidden states of all cells to the next layer, and not only the last one computed at the final time step.
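The difference between the two output modes can be seen on shapes alone. Below is a minimal sketch of a simple recurrent forward pass in NumPy. The function ``rnn_forward``, its weight initialization and the toy input are assumptions for illustration and do not reproduce the EpyNN ``RNN`` layer.

```python
import numpy as np


def rnn_forward(X, units, sequences, rng):
    """Simple tanh RNN forward pass (sketch, not the EpyNN implementation)."""
    m, s, v = X.shape                          # samples, time steps, vocabulary
    U = rng.standard_normal((v, units)) * 0.1      # input-to-hidden weights
    V = rng.standard_normal((units, units)) * 0.1  # hidden-to-hidden weights
    b = np.zeros(units)
    h = np.zeros((m, units))
    states = np.zeros((m, s, units))
    for t in range(s):
        # One cell per time step: combine the current input and previous state.
        h = np.tanh(X[:, t] @ U + h @ V + b)
        states[:, t] = h
    # Return every hidden state, or only the one from the last time step.
    return states if sequences else h


# Toy one-hot input: 2 samples, 50 time steps, vocabulary size 16.
X = np.eye(16)[np.random.default_rng(1).integers(0, 16, size=(2, 50))]

all_states = rnn_forward(X, 16, True, np.random.default_rng(0))
last_state = rnn_forward(X, 16, False, np.random.default_rng(0))

print(all_states.shape)  # (2, 50, 16)
print(last_state.shape)  # (2, 16)
```

With all hidden states forwarded, the output keeps the time dimension, which is why a ``Flatten`` layer is placed before the final ``Dense`` layer.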

In [None]:
name = 'RNN-16-Seq_Flatten_Dense-2-softmax'

se_hPars['learning_rate'] = 0.05
se_hPars['softmax_temperature'] = 10

rnn = RNN(16, sequences=True)

flatten = Flatten()

dense = Dense(2, softmax)

layers = [embedding, rnn, flatten, dense]

model = EpyNN(layers=layers, name=name)

We initialize the model.

In [None]:
model.initialize(loss='BCE', seed=1, metrics=['accuracy', 'recall', 'precision'], se_hPars=se_hPars.copy())

We will only train for 5 epochs.

In [None]:
model.train(epochs=5, init_logs=False)

Again, the network seems to reproduce the training data well, but the metrics on the testing set do not improve.

### GRU(sequences=True)-Flatten-Dense


In [None]:
name = 'GRU-16-Seq_Flatten_Dense-2-softmax'

se_hPars['learning_rate'] = 0.05
se_hPars['softmax_temperature'] = 10

gru = GRU(16, sequences=True)

flatten = Flatten()

dense = Dense(2, softmax)

layers = [embedding, gru, flatten, dense]

model = EpyNN(layers=layers, name=name)

Initialize and train the model.

In [None]:
model.initialize(loss='BCE', seed=1, metrics=['accuracy', 'recall', 'precision'], se_hPars=se_hPars.copy())

model.train(epochs=10, init_logs=False)

