# Autoencoder for ESNs using sklearn

## Introduction

In this notebook, we demonstrate how the ESN can deal with multipitch tracking, a challenging multilabel classification problem in music analysis.

As this is a computationally expensive task, we have pre-trained models to serve as an entry point.

First, we import all packages required for this task. You can find the import statements below.

To use an objective other than `accuracy_score` for hyperparameter tuning, check out the documentation of [make_scorer](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html) or ask me.
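As a minimal sketch of how a different objective could be wrapped, the following defines a macro-averaged F1 score and turns it into a scorer. The function name `macro_f1` and the toy labels are illustrative assumptions, not part of this notebook. Note that plain scikit-learn metrics expect flat label arrays, so for sequence data a sequence-aware metric (as provided in `pyrcn.metrics`) may be needed.

```python
import numpy as np
from sklearn.metrics import f1_score, make_scorer


def macro_f1(y_true, y_pred):
    """Macro-averaged F1 score (hypothetical alternative objective)."""
    return f1_score(y_true, y_pred, average="macro")


# greater_is_better=True is the default and only spelled out for clarity
macro_f1_scorer = make_scorer(macro_f1, greater_is_better=True)

# Direct check of the metric on toy labels
y_true = np.array([0, 1, 1, 0])
y_pred = np.array([0, 1, 0, 0])
score = macro_f1(y_true, y_pred)
```

The resulting `macro_f1_scorer` could then be passed as the `scoring` argument of a search instead of `make_scorer(accuracy_score)`.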

In [None]:
import numpy as np
import pandas as pd
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.neural_network import MLPRegressor
from scipy.stats import loguniform, uniform
from joblib import dump, load

# The module path may differ between PyRCN versions (older releases: pyrcn.base)
from pyrcn.base.blocks import PredefinedWeightsInputToNode
from pyrcn.echo_state_network import ESNClassifier  # ESNRegressor
from pyrcn.metrics import accuracy_score  # more available or create custom score
from pyrcn.model_selection import SequentialSearchCV

import seaborn as sns
from matplotlib import pyplot as plt
from matplotlib import ticker
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 unused import
%matplotlib inline
# Plot options (usetex=True requires a working LaTeX installation)
plt.rc('image', cmap='RdBu')
plt.rc('font', family='serif', serif='Times')
plt.rc('text', usetex=True)
plt.rc('xtick', labelsize=8)
plt.rc('ytick', labelsize=8)
plt.rc('axes', labelsize=8)

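# set_matplotlib_formats was moved out of IPython.display (deprecated since IPython 7.23)
from matplotlib_inline.backend_inline import set_matplotlib_formats
set_matplotlib_formats('png', 'pdf')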
from mpl_toolkits.axes_grid1 import make_axes_locatable

## Load and preprocess the dataset

This might require a large amount of time and memory.

In [None]:
# First, load all training and test sequences and targets.
# Each sequence should be a numpy.array with the shape (n_samples, n_features).
# Each target should be either
# - a 2D numpy.array with the shape (n_samples, n_targets), or
# - a 1D numpy.array with the shape (n_samples, ).
train_sequences = ......................
train_targets = ......................
if len(train_sequences) != len(train_targets):
    raise ValueError("Number of training sequences does not match number of training targets!")
n_train_sequences = len(train_sequences)
test_sequences = ......................
test_targets = ......................
if len(test_sequences) != len(test_targets):
    raise ValueError("Number of test sequences does not match number of test targets!")
n_test_sequences = len(test_sequences)

# Initialize training and test sequences
X_train = np.empty(shape=(n_train_sequences, ), dtype=object)
y_train = np.empty(shape=(n_train_sequences, ), dtype=object)
X_test = np.empty(shape=(n_test_sequences, ), dtype=object)
y_test = np.empty(shape=(n_test_sequences, ), dtype=object)

for k, (train_sequence, train_target) in enumerate(zip(train_sequences, train_targets)):
    X_train[k] = train_sequence
    y_train[k] = train_target

for k, (test_sequence, test_target) in enumerate(zip(test_sequences, test_targets)):
    X_test[k] = test_sequence
    y_test[k] = test_target
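To make the expected data layout concrete, here is a small sketch with synthetic data. The sequence lengths, feature count and random values are purely hypothetical; only the container layout (a 1D object array holding one 2D array per sequence) mirrors the cell above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_targets = 4, 2
lengths = [30, 45, 20]  # hypothetical sequence lengths

# One (n_samples, n_features) array and one (n_samples, n_targets) array per sequence
toy_sequences = [rng.normal(size=(n, n_features)) for n in lengths]
toy_targets = [rng.integers(0, 2, size=(n, n_targets)) for n in lengths]

# 1D object arrays can hold sequences of different lengths
X_toy = np.empty(shape=(len(toy_sequences), ), dtype=object)
y_toy = np.empty(shape=(len(toy_targets), ), dtype=object)
for k, (seq, tgt) in enumerate(zip(toy_sequences, toy_targets)):
    X_toy[k] = seq
    y_toy[k] = tgt

# Concatenating stacks all time steps of all sequences along the first axis
X_toy_flat = np.concatenate(X_toy)
```

Filling the object arrays element by element avoids NumPy trying to broadcast the sequences into one rectangular array.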

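Set the variables that must be identical in the autoencoder and in the ESN. The hidden layer weights of the autoencoder are later used as the input weights of the ESN, so the hidden layer size and the activation function have to match.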

In [None]:
hidden_layer_size = 500
input_activation = 'relu'

## Train an MLP autoencoder

The autoencoder is currently very rudimentary: it has a single hidden layer. However, it can flexibly be made deeper or more complex. Check the [MLPRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html) documentation for the available hyper-parameters.

In [None]:
mlp_autoencoder = MLPRegressor(hidden_layer_sizes=(hidden_layer_size, ), activation=input_activation)
# X_train is a numpy array of sequences, but the MLP does not handle sequences. Thus, concatenate all sequences.
# The target of an autoencoder is its own input.
mlp_autoencoder.fit(np.concatenate(X_train), np.concatenate(X_train))

# Scale each column of the first-layer weights (one column per hidden neuron) to unit Euclidean norm
w_in = np.divide(mlp_autoencoder.coefs_[0], np.linalg.norm(mlp_autoencoder.coefs_[0], axis=0)[None, :])
# w_in = mlp_autoencoder.coefs_[0]  # use this instead if the normalization does not make sense
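The normalization of the input weights can be checked in isolation. The sketch below uses a random matrix as a stand-in for `mlp_autoencoder.coefs_[0]` (an assumption for illustration only) and confirms that every column, i.e. the weight vector of one hidden neuron, ends up with unit Euclidean norm.

```python
import numpy as np

rng = np.random.default_rng(42)
# Stand-in for mlp_autoencoder.coefs_[0] with shape (n_features, hidden_layer_size)
coefs = rng.normal(size=(6, 3))

# Norm of each column; keepdims=True keeps the shape (1, 3) for broadcasting
col_norms = np.linalg.norm(coefs, axis=0, keepdims=True)
w_in_toy = coefs / col_norms

# After scaling, each column has unit length
norms_after = np.linalg.norm(w_in_toy, axis=0)
```

Using `keepdims=True` is equivalent to the `[None, :]` indexing in the cell above.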

## Set up an ESN

To develop an ESN model, we need to tune several hyper-parameters, e.g., `input_scaling`, `spectral_radius`, `bias_scaling` and `leakage` (leaky integration).

We define the search spaces for each step in a sequential search together with the type of search (a grid or random search in this context).

Finally, we initialize an ESNClassifier with the pre-trained input weights and with the initially fixed parameters.

In [None]:
initially_fixed_params = {'hidden_layer_size': hidden_layer_size,
                          'k_in': 10,
                          'input_scaling': 0.4,
                          'input_activation': input_activation,
                          'bias_scaling': 0.0,
                          'spectral_radius': 0.0,
                          'leakage': 1.0,
                          'k_rec': 10,
                          'reservoir_activation': 'tanh',
                          'bidirectional': False,
                          'wash_out': 0,
                          'continuation': False,
                          'alpha': 1e-3,
                          'random_state': 42}

step1_esn_params = {'input_scaling': uniform(loc=1e-2, scale=1),
                    'spectral_radius': uniform(loc=0, scale=2)}

step2_esn_params = {'leakage': loguniform(1e-5, 1e0)}
step3_esn_params = {'bias_scaling': np.linspace(0.0, 1.0, 11)}
step4_esn_params = {'alpha': loguniform(1e-5, 1e1)}

kwargs_step1 = {'n_iter': 200, 'random_state': 42, 'verbose': 1, 'n_jobs': -1, 'scoring': make_scorer(accuracy_score)}
kwargs_step2 = {'n_iter': 50, 'random_state': 42, 'verbose': 1, 'n_jobs': -1, 'scoring': make_scorer(accuracy_score)}
kwargs_step3 = {'verbose': 1, 'n_jobs': -1, 'scoring': make_scorer(accuracy_score)}
kwargs_step4 = {'n_iter': 50, 'random_state': 42, 'verbose': 1, 'n_jobs': -1, 'scoring': make_scorer(accuracy_score)}

# The searches are defined similarly to the steps of a sklearn.pipeline.Pipeline:
searches = [('step1', RandomizedSearchCV, step1_esn_params, kwargs_step1),
            ('step2', RandomizedSearchCV, step2_esn_params, kwargs_step2),
            ('step3', GridSearchCV, step3_esn_params, kwargs_step3),
            ('step4', RandomizedSearchCV, step4_esn_params, kwargs_step4)]

base_esn = ESNClassifier(input_to_node=PredefinedWeightsInputToNode(predefined_input_weights=w_in),
                         **initially_fixed_params)
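To get a feeling for the search spaces defined above, it can help to draw samples from the distributions. This sketch only uses `scipy.stats`; the sample size of 1000 is an arbitrary choice. Keep in mind that `uniform(loc, scale)` covers the interval `[loc, loc + scale]`, and that `loguniform` spreads samples evenly across decades.

```python
import numpy as np
from scipy.stats import loguniform, uniform

# uniform(loc=1e-2, scale=1) covers the interval [0.01, 1.01]
input_scaling_samples = uniform(loc=1e-2, scale=1).rvs(size=1000, random_state=42)

# loguniform(1e-5, 1e0) covers five decades with equal probability per decade
leakage_samples = loguniform(1e-5, 1e0).rvs(size=1000, random_state=42)

# Three of the five decades lie below 1e-2, so roughly 60 % of the samples do as well
frac_small = np.mean(leakage_samples < 1e-2)
```

This is why a log-uniform distribution suits parameters such as `leakage` and `alpha`, whose useful values span several orders of magnitude.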

## Optimization

We provide a SequentialSearchCV that basically iterates through the list of searches that we have defined before. It can be combined with any model selection tool from scikit-learn.

In [None]:
try:
    sequential_search = load("sequential_search.joblib")
except FileNotFoundError as e:
    # No cached search found: report the missing file and run the optimization
    print(e)
    sequential_search = SequentialSearchCV(base_esn, searches=searches).fit(X_train, y_train)
    dump(sequential_search, "sequential_search.joblib")
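The load-or-fit pattern used above can be sketched as a small helper. The name `load_or_fit`, the file name and the dictionary that stands in for a fitted model are hypothetical; only `joblib.dump` and `joblib.load` are taken from the notebook.

```python
import os
import tempfile

from joblib import dump, load


def load_or_fit(path, fit_fn):
    """Return the object cached at path, or call fit_fn and cache its result."""
    try:
        return load(path)
    except FileNotFoundError:
        model = fit_fn()
        dump(model, path)
        return model


calls = []


def expensive_fit():
    # Stand-in for a long-running hyper-parameter search
    calls.append(1)
    return {"best_params": {"alpha": 1e-3}}


with tempfile.TemporaryDirectory() as tmp_dir:
    cache_path = os.path.join(tmp_dir, "toy_search.joblib")
    first = load_or_fit(cache_path, expensive_fit)   # fits and writes the cache
    second = load_or_fit(cache_path, expensive_fit)  # only reads the cache
```

One caveat of this pattern is that a stale cache file is silently reused, so the file should be deleted whenever the data or the search spaces change.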

## Visualize hyper-parameter optimization

### First optimization step: input scaling and spectral radius

Either create a scatter plot, which is useful after a random search over input scaling and spectral radius:

In [None]:
df = pd.DataFrame(sequential_search.all_cv_results_["step1"])

fig = plt.figure()
ax = sns.scatterplot(x="param_spectral_radius", y="param_input_scaling", hue="mean_test_score", palette='RdBu', data=df)
plt.xlabel("Spectral Radius")
plt.ylabel("Input Scaling")

norm = plt.Normalize(0, df['mean_test_score'].max())
sm = plt.cm.ScalarMappable(cmap="RdBu", norm=norm)
sm.set_array([])
plt.xlim((0, 2.05))
plt.ylim((0, 1.05))

# Remove the legend and add a colorbar
ax.get_legend().remove()
ax.figure.colorbar(sm)
fig.set_size_inches(4, 2.5)
# Each axis needs its own locator instance
ax.yaxis.set_major_locator(ticker.MaxNLocator(5))
ax.xaxis.set_major_locator(ticker.MaxNLocator(5))

Or create a heatmap, which is useful after a grid search over input scaling and spectral radius:

In [None]:
df = pd.DataFrame(sequential_search.all_cv_results_["step1"])
pvt = pd.pivot_table(df,
                     values='mean_test_score', index='param_input_scaling', columns='param_spectral_radius')

pvt.columns = pvt.columns.astype(float)
pvt2 = pd.DataFrame(pvt.loc[pd.IndexSlice[0:1], pd.IndexSlice[0.0:1.0]])

fig = plt.figure()
ax = sns.heatmap(pvt2, xticklabels=pvt2.columns.values.round(2), yticklabels=pvt2.index.values.round(2), cbar_kws={'label': 'Score'})
ax.invert_yaxis()
plt.xlabel("Spectral Radius")
plt.ylabel("Input Scaling")
fig.set_size_inches(4, 2.5)
# Each axis needs its own locator instance
ax.yaxis.set_major_locator(ticker.MaxNLocator(10))
ax.xaxis.set_major_locator(ticker.MaxNLocator(10))
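The pivot step behind the heatmap can be tried on a small synthetic result table. The grid values and the score formula below are made up for illustration; only the column names follow the `cv_results_` convention used above.

```python
import numpy as np
import pandas as pd

input_scalings = [0.1, 0.5, 1.0]
spectral_radii = [0.0, 0.5, 1.0, 1.5]

# Synthetic scores with a known optimum at input_scaling=0.5, spectral_radius=0.0
rows = [{"param_input_scaling": i,
         "param_spectral_radius": s,
         "mean_test_score": 1.0 - abs(i - 0.5) - 0.1 * s}
        for i in input_scalings for s in spectral_radii]
toy_results = pd.DataFrame(rows)

# Rows become input scalings, columns become spectral radii
toy_pvt = pd.pivot_table(toy_results, values="mean_test_score",
                         index="param_input_scaling", columns="param_spectral_radius")

# Locate the best cell of the grid
row_idx, col_idx = np.unravel_index(np.argmax(toy_pvt.to_numpy()), toy_pvt.shape)
best_input_scaling = toy_pvt.index[row_idx]
best_spectral_radius = toy_pvt.columns[col_idx]
```

The pivoted table has exactly the layout that `sns.heatmap` expects: one cell per parameter combination.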

### Second optimization step: leakage

In [None]:
df = pd.DataFrame(sequential_search.all_cv_results_["step2"])
fig = plt.figure()
fig.set_size_inches(2, 1.25)
ax = sns.lineplot(data=df, x="param_leakage", y="mean_test_score")
ax.set_xscale('log')
plt.xlabel("Leakage")
plt.ylabel("Score")
plt.xlim((1e-5, 1e0))
tick_locator = ticker.MaxNLocator(10)
ax.xaxis.set_major_locator(tick_locator)
ax.yaxis.set_major_formatter(ticker.FormatStrFormatter('%.4f'))
plt.grid()

### Third optimization step: bias_scaling

In [None]:
df = pd.DataFrame(sequential_search.all_cv_results_["step3"])
fig = plt.figure()
fig.set_size_inches(2, 1.25)
ax = sns.lineplot(data=df, x="param_bias_scaling", y="mean_test_score")
plt.xlabel("Bias Scaling")
plt.ylabel("Score")
plt.xlim((0, 1))
tick_locator = ticker.MaxNLocator(5)
ax.xaxis.set_major_locator(tick_locator)
ax.yaxis.set_major_formatter(ticker.FormatStrFormatter('%.5f'))
plt.grid()

### Fourth optimization step: alpha (regularization)

In [None]:
df = pd.DataFrame(sequential_search.all_cv_results_["step4"])
fig = plt.figure()
fig.set_size_inches(2, 1.25)
ax = sns.lineplot(data=df, x="param_alpha", y="mean_test_score")
ax.set_xscale('log')
plt.xlabel("Alpha")
plt.ylabel("Score")
plt.xlim((1e-5, 1e1))  # matches the search range of alpha in step 4
tick_locator = ticker.MaxNLocator(5)
ax.xaxis.set_major_locator(tick_locator)
ax.yaxis.set_major_formatter(ticker.FormatStrFormatter('%.5f'))
plt.grid()

## Test the ESN

Finally, we test the ESN on unseen data.

In [None]:
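# Use the best model found by the search (assumes SequentialSearchCV exposes best_estimator_)
esn = sequential_search.best_estimator_
y_pred = esn.predict(X_test)
y_pred_proba = esn.predict_proba(X_test)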
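Once predictions are available, a score still has to be computed from them. As a minimal sketch, assuming that both targets and predictions are object arrays with one 1D label array per sequence, a frame-wise accuracy can be obtained with plain NumPy. The toy labels below are hypothetical.

```python
import numpy as np

# Hypothetical ground truth and predictions for two sequences of different length
y_true_toy = np.empty(2, dtype=object)
y_pred_toy = np.empty(2, dtype=object)
y_true_toy[0], y_true_toy[1] = np.array([0, 1, 1]), np.array([1, 0])
y_pred_toy[0], y_pred_toy[1] = np.array([0, 1, 0]), np.array([1, 0])

# Frame-wise accuracy over all sequences: concatenate, then compare
overall_accuracy = np.mean(np.concatenate(y_true_toy) == np.concatenate(y_pred_toy))

# Accuracy per sequence, useful to spot individual hard sequences
per_sequence_accuracy = [float(np.mean(t == p)) for t, p in zip(y_true_toy, y_pred_toy)]
```

The same comparison could be applied to `y_test` and `y_pred`, provided they follow this layout.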
