# Prediction of musical notes

## Introduction

This notebook adapts one reference experiment for note prediction using ESNs from ([https://arxiv.org/abs/1812.11527](https://arxiv.org/abs/1812.11527)) to PyRCN and shows that introducing bidirectional ESNs significantly improves the results in terms of Accuracy, already for rather small networks.

The tutorial is based on numpy, scikit-learn, joblib and PyRCN. We are using the ESNRegressor, because we further process the outputs of the ESN. Note that the same can also be done using the ESNClassifier. Then, during prediction, we simply call "predict_proba".

This tutorial requires the Python modules numpy, scikit-learn, matplotlib and pyrcn.

In [1]:
import numpy as np
import os
from joblib import load
from sklearn.model_selection import ParameterGrid
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.base import clone
from sklearn.metrics import mean_squared_error

from matplotlib import pyplot as plt
plt.rcParams['image.cmap'] = 'jet'
plt.rcParams['pdf.fonttype'] = 42
plt.rcParams['ps.fonttype'] = 42
%matplotlib inline

from pyrcn.echo_state_network import ESNRegressor

## Load the dataset

The datasets are online available at ([http://www-etud.iro.umontreal.ca/~boulanni/icml2012](http://www-etud.iro.umontreal.ca/~boulanni/icml2012)). In this notebook, we use the pre-processed piano-rolls. They are coming as a serialized file including a dictionary with training, validation and test partitions. In this example, we are using the "piano-midi.de"-datset, because it is relatively small compared to the other datasets.

In [2]:
dataset_path = os.path.normpath(r"C:\Temp\MusicPrediction\Piano-midi.de.pickle")
dataset = load(dataset_path)
training_set = dataset['train']
validation_set = dataset['valid']
test_set = dataset['test']
print("Number of sequences in the training, validation and test set: {0}, {1}, {2}".format(len(training_set), len(validation_set), len(test_set)))

Number of sequences in the training, validation and test set: 87, 12, 25


## Prepare the dataset

We use the MultiLabelBinarizer to transform the sequences of MIDI pitches into one-hot encoded vectors. Although the piano is restricted to 88 keys, we are initializing the MultiLabelBinarizer with 128 possible pitches to stay more general. Note that this does not affect the performance critically. 

We can see that the sequences have different lenghts, but consist of vector with 128 dimensions.

In [3]:
mlb = MultiLabelBinarizer(classes=range(128))
training_set = [mlb.fit_transform(training_set[k]) for k in range(len(training_set))]
validation_set = [mlb.fit_transform(validation_set[k]) for k in range(len(validation_set))]
test_set = [mlb.fit_transform(training_set[k]) for k in range(len(test_set))]
print("Shape of first sequences in the training, validation and test set: {0}, {1}, {2}".format(training_set[0].shape, validation_set[0].shape, test_set[0].shape))

Shape of first sequences in the training, validation and test set: (347, 128), (471, 128), (347, 128)


## Set up a basic ESN

To develop an ESN model for musical note prediction, we need to tune several hyper-parameters, e.g., input_scaling, spectral_radius, bias_scaling and leaky integration.

We follow the way proposed in the introductory paper of PyRCN to optimize hyper-parameters sequentially.

We start to jointly optimize input_scaling and spectral_radius and therefore deactivate bias connections and leaky integration. This is our base_reg.

We define the search space for input_scaling and spectral_radius. This is done using best practice and background information from the literature: The spectral radius, the largest absolute eigenvalue of the reservoir matrix, is often smaller than 1. Thus, we can search in a space between 0.0 (e.g. no recurrent connections) and 1.0 (maximum recurrent connections). It is usually recommended to tune the input_scaling factor between 0.1 and 1.0. However, as this is strongly task-dependent, we decided to slightly increase the search space.

In [4]:
base_reg = ESNRegressor(k_in = 10, input_scaling = 0.1, spectral_radius = 0.0, bias = 0.0, leakage = 1.0, reservoir_size = 500, 
                   k_res = 10, reservoir_activation = 'tanh', teacher_scaling = 1.0, teacher_shift = 0.0, 
                   bi_directional = True, solver = 'ridge', beta = 1e-2, random_state = 1)

grid = {'input_scaling': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5], 
        'spectral_radius': [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
       }

## Optimize input_scaling and spectral_radius

We use the ParameterGrid from scikit-learn, which converts the grid parameters defined before into a list of dictionaries for each parameter combination. 

We loop over each entry of the Parameter Grid, set the parameters in reg and fit our model on the training data. Afterwards, we report the MSE on the training and validation set.  

    The lowest training MSE: 0.000238243207656839; parameter combination: {'input_scaling': 0.4, 'spectral_radius': 0.5}
    The lowest validation MSE: 0.000223548432343247; parameter combination: {'input_scaling': 0.4, 'spectral_radius': 0.5}

We use the best parameter combination from the validation set.

As we can see in the python call, we have modified the training procedure: We use "partial_fit" in order to present the ESN all sequences independently from each other. The function "partial_fit" is part of the scikit-learn API. We have added one optional argument "update_output_weights". By default, it is True and thus, after feeding one sequence through the ESN, output weights are computed.

However, as this is computationally expensive, we can deactivate computing output weights after each sequence by setting "update_output_weights" to False. Now, we simply collect sufficient statistics for the later linear regression. To finish the training process, we call finalize() after passing all sequences through the ESN.

In [5]:
for params in ParameterGrid(grid):
    print(params)
    reg = clone(base_reg)
    reg.set_params(**params)
    for X in training_set:
        reg.partial_fit(X=X[:-1, :], y=X[1:, :], update_output_weights=False)
    reg.finalize()
    err_train = []
    for X in training_set:
        y_pred = reg.predict(X=X[:-1, :], keep_reservoir_state=False)
        err_train.append(mean_squared_error(X[1:, :], y_pred))
    err_test = []
    for X in validation_set:
        y_pred = reg.predict(X=X[:-1, :], keep_reservoir_state=False)
        err_test.append(mean_squared_error(X[1:, :], y_pred))
    print('{0}\t{1}'.format(np.mean(err_train), np.mean(err_test)))
        
    

{'input_scaling': 0.1, 'spectral_radius': 0.0}
0.016589532048833903	0.01748523486921971
{'input_scaling': 0.1, 'spectral_radius': 0.1}
0.007709785924933541	0.007746762176190139
{'input_scaling': 0.1, 'spectral_radius': 0.2}
0.003262465320384634	0.003129792702764861
{'input_scaling': 0.1, 'spectral_radius': 0.3}
0.0017897540901361687	0.0016728830387868702
{'input_scaling': 0.1, 'spectral_radius': 0.4}
0.0012150311984190773	0.0011213037256691764
{'input_scaling': 0.1, 'spectral_radius': 0.5}
0.0009659250154892994	0.0008883409761153083
{'input_scaling': 0.1, 'spectral_radius': 0.6}
0.0008687069363400937	0.0008014777814304316
{'input_scaling': 0.1, 'spectral_radius': 0.7}
0.0008635463972149178	0.0008016080245510459
{'input_scaling': 0.1, 'spectral_radius': 0.8}
0.0009361041697281352	0.0008764565921668792
{'input_scaling': 0.1, 'spectral_radius': 0.9}
0.0010967235688695044	0.001039255340276024
{'input_scaling': 0.1, 'spectral_radius': 1.0}
0.0014058216652793885	0.0013567150570479487
{'input

0.0005019693800363906	0.0005020359994038717
{'input_scaling': 0.9, 'spectral_radius': 0.3}
0.0004095160975112882	0.00041379598406662306
{'input_scaling': 0.9, 'spectral_radius': 0.4}
0.00040134544405467476	0.0004105226845364574
{'input_scaling': 0.9, 'spectral_radius': 0.5}
0.0004362796873213878	0.00045175299442955105
{'input_scaling': 0.9, 'spectral_radius': 0.6}
0.0005102718557682024	0.0005338608836306312
{'input_scaling': 0.9, 'spectral_radius': 0.7}
0.0006337145125580534	0.0006681268357974379
{'input_scaling': 0.9, 'spectral_radius': 0.8}
0.0008294846231495112	0.0008782822512568923
{'input_scaling': 0.9, 'spectral_radius': 0.9}
0.0011357602313460768	0.0012031394665568238
{'input_scaling': 0.9, 'spectral_radius': 1.0}
0.0016070234252433299	0.0016968590485489435
{'input_scaling': 1.0, 'spectral_radius': 0.0}
0.016271328319144487	0.017463862329435816
{'input_scaling': 1.0, 'spectral_radius': 0.1}
0.00107140421003329	0.0010737355711080809
{'input_scaling': 1.0, 'spectral_radius': 0.2}


## Update parameter of the basic ESN

After optimizing input_scaling and spectral_radius, we update our basic ESN with the identified values for input_scaling and spectral_radius. 

For the next optimization step, we jointly optimize bias and leakage.

We define the search space for bias and leakage. This is again done using best practice and background information from the literature: The bias often lies in a similar value range as the input scaling. Thus we use exactly the same search space as before. The leakage, the parameter of the leaky integration is defined in (0.0, 1.0]. Thus, we tune the leakage between 0.1 and 1.0.

In [6]:
base_reg = ESNRegressor(k_in = 10, input_scaling = 0.4, spectral_radius = 0.5, bias = 0.0, leakage = 1.0, reservoir_size = 500, 
                   k_res = 10, reservoir_activation = 'tanh', teacher_scaling = 1.0, teacher_shift = 0.0, 
                   bi_directional = True, solver = 'ridge', beta = 1e-2, random_state = 1)

grid = {'bias': [0.0, 0.1, 0.2, 0.3, 0.4, 0.5], 
        'leakage': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
       }

## Optimize bias and leakage

The optimization workflow is exactly the same as before: We define a ParameterGrid, loop over each entry, set the parameters in reg and fit our model on the training data. Afterwards, we report the MSE on the training and validation set.  

    The lowest training MSE: 0.000229618469284352; parameter combination: {'bias': 0.8, 'leakage': 0.2}
    The lowest validation MSE: 0.000213898523704083; parameter combination: {'bias': 0.1, 'leakage': 0.2}

We use the best parameter combination from the validation set.

In [7]:
for params in ParameterGrid(grid):
    print(params)
    reg = clone(base_reg)
    reg.set_params(**params)
    for X in training_set:
        reg.partial_fit(X=X[:-1, :], y=X[1:, :], update_output_weights=False)
    reg.finalize()
    err_train = []
    for X in training_set:
        y_pred = reg.predict(X=X[:-1, :], keep_reservoir_state=False)
        err_train.append(mean_squared_error(X[1:, :], y_pred))
    err_test = []
    for X in validation_set:
        y_pred = reg.predict(X=X[:-1, :], keep_reservoir_state=False)
        err_test.append(mean_squared_error(X[1:, :], y_pred))
    print('{0}\t{1}'.format(np.mean(err_train), np.mean(err_test)))
    

{'bias': 0.0, 'leakage': 0.1}
0.010249419444392355	0.010694929335977462
{'bias': 0.0, 'leakage': 0.2}
0.007634640247970895	0.008019580554021867
{'bias': 0.0, 'leakage': 0.3}
0.006202359825832479	0.006553572448562299
{'bias': 0.0, 'leakage': 0.4}
0.00476240868743226	0.005027693247644061
{'bias': 0.0, 'leakage': 0.5}
0.0032390119072056023	0.003382260405944299
{'bias': 0.0, 'leakage': 0.6}
0.0019125084229791305	0.0019694307251265698
{'bias': 0.0, 'leakage': 0.7}
0.000994285875766011	0.0010024937425722345
{'bias': 0.0, 'leakage': 0.8}
0.0004872118065037238	0.00047597771412737365
{'bias': 0.0, 'leakage': 0.9}
0.00028600086080475914	0.0002715961629013839
{'bias': 0.0, 'leakage': 1.0}
0.00023824320765684012	0.00022354843234325005
{'bias': 0.1, 'leakage': 0.1}
0.010253821751560671	0.01070071678552123
{'bias': 0.1, 'leakage': 0.2}
0.007636276956516659	0.008020700023232323
{'bias': 0.1, 'leakage': 0.3}
0.0062053930397647505	0.006555662840512899
{'bias': 0.1, 'leakage': 0.4}
0.0047700479365708984

## Update parameter of the basic ESN

After optimizing bias and leakage, we update our basic ESN with the identified values for bias and leakage. 

Finally, we would quickly like to see whether the regularization parameter beta lies in the correct range.

Typically, it is rather difficult to find a proper search range. Here, we use a very rough logarithmic search space.

In [8]:
base_reg = ESNRegressor(k_in = 10, input_scaling = 0.4, spectral_radius = 0.5, bias = 0.1, leakage = 1.0, reservoir_size = 500, 
                   k_res = 10, reservoir_activation = 'tanh', teacher_scaling = 1.0, teacher_shift = 0.0, 
                   bi_directional = True, solver = 'ridge', beta = 1e-2, random_state = 1)

grid = {'beta': [1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3, 1e-2, 5e-2, 1e-1, 5e-1, 1e0], 
       }

## Optimize beta

The optimization workflow is exactly the same as before: We define a ParameterGrid, loop over each entry, set the parameters in reg and fit our model on the training data. Afterwards, we report the MSE on the training and test set.  

    The lowest training MSE: 0.00012083938686566446; parameter combination: {'beta': 5e-4}
    The lowest validation MSE: 0.00011885985457347002; parameter combination: {'beta': 5e-3}

We use the best parameter combination from the validation set, because the regularization is responsible to prevent overfitting on the training set. In a running system, of course, we should determine the regularization on a separate validation set.

In [9]:
for params in ParameterGrid(grid):
    print(params)
    reg = clone(base_reg)
    reg.set_params(**params)
    for X in training_set:
        reg.partial_fit(X=X[:-1, :], y=X[1:, :], update_output_weights=False)
    reg.finalize()
    err_train = []
    for X in training_set:
        y_pred = reg.predict(X=X[:-1, :], keep_reservoir_state=False)
        err_train.append(mean_squared_error(X[1:, :], y_pred))
    err_test = []
    for X in validation_set:
        y_pred = reg.predict(X=X[:-1, :], keep_reservoir_state=False)
        err_test.append(mean_squared_error(X[1:, :], y_pred))
    print('{0}\t{1}'.format(np.mean(err_train), np.mean(err_test)))
    

{'beta': 1e-05}
0.00011971948963492298	0.00011887365607614947
{'beta': 5e-05}
0.00011971953972792966	0.00011887024002026733
{'beta': 0.0001}
0.0001197199148855893	0.00011885985457394135
{'beta': 0.0005}
0.00011985335435862776	0.00011869323792661982
{'beta': 0.001}
0.00012083938686551243	0.00011902382213449177
{'beta': 0.005}
0.00015525611945948047	0.000147550151721302
{'beta': 0.01}
0.0002296184692843593	0.00021389852370407565
{'beta': 0.05}
0.001468717890151173	0.0013746110993211873
{'beta': 0.1}
0.0038005653926923254	0.003670915948844293
{'beta': 0.5}
0.01475683280005029	0.01518164144729703
{'beta': 1.0}
0.018486210642812168	0.01925338491171024


## Update parameter of the basic ESN

After optimizing beta, we update our basic ESN with the identified value for beta.

Note that we have used almost the ideal value already in the beginning. Thus, the impact is rather small.

Next, we want to measure the classification accuracy. To do that, we compare several reservoir sizes as well as unidirectional and bidirectional architectures.

Because this is a rather small dataset, we can use rather small reservoir sizes and increase it up to 5000 neurons.

In [17]:
base_reg = ESNRegressor(k_in = 10, input_scaling = 0.4, spectral_radius = 0.5, bias = 0.1, leakage = 1.0, reservoir_size = 500, 
                   k_res = 10, reservoir_activation = 'tanh', teacher_scaling = 1.0, teacher_shift = 0.0, 
                   bi_directional = True, solver = 'ridge', beta = 5e-3, random_state = 1)

grid = {'reservoir_size': [500, 1000, 2000, 4000, 5000], 
        'bi_directional': [True]
       }

## Test the ESN

In the test case, we train the ESN using the entire training and validation set as seen before. Next, we compute the predicted outputs on the training, validation and test set and fix a threshold of 0.5, above a note is assumed to be predicted.

We report the accuracy score for each frame in order to follow the reference paper. 

As can be seen, the bidirectional mode has a very strong impact on the classification result.

In [23]:
from sklearn.metrics import accuracy_score
for params in ParameterGrid(grid):
    print(params)
    reg = clone(base_reg)
    reg.set_params(**params)
    for X in training_set + validation_set:
        reg.partial_fit(X=X[:-1, :], y=X[1:, :], update_output_weights=False)
    reg.finalize()
    err_train = []
    for X in training_set + validation_set:
        y_pred = reg.predict(X=X[:-1, :], keep_reservoir_state=False)
        y_pred_bin = np.asarray(y_pred > 0.1, dtype=int)
        err_train.append(accuracy_score(y_true=X[1:, :], y_pred=y_pred_bin))
    err_test = []
    for X in test_set:
        y_pred = reg.predict(X=X[:-1, :], keep_reservoir_state=False)
        print(np.sum(y_pred, axis=0))
        y_pred_bin = np.asarray(y_pred > 0.1, dtype=int)
        err_test.append(accuracy_score(y_true=X[1:, :], y_pred=y_pred_bin))
    print('{0}\t{1}'.format(np.mean(err_train), np.mean(err_test)))
    

{'bi_directional': True, 'reservoir_size': 500}
[   0.            0.            0.            0.            0.
    0.            0.            0.            0.            0.
    0.            0.            0.            0.            0.
    0.            0.            0.            0.            0.
    0.            6.01866676   36.47688378   37.68663084  -61.14086222
   51.86156938  -52.83853087   49.593491    -27.64517327    9.12419274
   65.03793307  142.07913416  109.59346424  108.01160537   75.3843315
  -36.6857703   -16.32784639  -25.0700788    -8.93352544 -282.96339922
   83.25300363 -107.47427433   76.85927141  -17.51388713 -136.52384675
  107.8788745    79.24087186  163.85768438  -60.54655061   86.54642076
   -3.88157629  -27.30526929   88.90043076 -104.80236222   -2.36999444
  -68.11301463  -87.65683868  -45.27762017  -14.53448006   97.00307415
   73.41427666   24.52858331 -292.62510723  134.53480879  -70.39639697
 -162.50969515  232.63533871   74.52960698  -25.54242869 -144.

    0.            0.            0.        ]
[   0.            0.            0.            0.            0.
    0.            0.            0.            0.            0.
    0.            0.            0.            0.            0.
    0.            0.            0.            0.            0.
    0.           11.5366638    72.51654593   74.82229021 -121.73088783
  104.46438422 -104.90584068   98.89625714  -57.96393574   21.04995648
  126.39600121  282.69141993  216.92777642  213.68312024  151.34984133
  -74.25668961  -33.02252727  -49.54995736  -17.34828985 -563.05229023
  166.19449613 -215.12666829  152.88284994  -35.01413121 -270.3741961
  215.00773735  160.65812516  325.83978146 -117.34062502  171.33133872
   -5.23065045  -54.33015385  176.43308392 -209.00361833   -4.59670494
 -133.70736128 -173.45498011  -94.86765611  -31.13115848  197.51250069
  142.82489693   45.62296213 -585.08861275  267.67874332 -141.88051982
 -324.92122046  459.98141825  151.01211798  -51.38677351 -287.3670

[    0.             0.             0.             0.
     0.             0.             0.             0.
     0.             0.             0.             0.
     0.             0.             0.             0.
     0.             0.             0.             0.
     0.            51.675373     328.70172745   338.90981595
  -552.12401139   475.65118732  -475.20712091   448.66844132
  -267.28085345   100.19014219   567.86328794  1280.8368688
   981.15536682   966.64505388   687.86690444  -339.176828
  -151.55573498  -224.30828903   -78.18514246 -2551.90624097
   754.57317861  -977.13860822   692.71329053  -159.27832352
 -1223.96886798   974.75206383   733.12435666  1475.81308219
  -527.99621847   774.94220192   -19.39196616  -246.77167137
   798.51136428  -948.06258128   -20.77536927  -603.45757122
  -784.48277257  -437.93393221  -144.3025953    903.16155458
   640.67611333   201.62200182 -2656.39764066  1212.79254982
  -645.91118654 -1475.89799634  2079.58999836   688.31761563
  -233

[    0.             0.             0.             0.
     0.             0.             0.             0.
     0.             0.             0.             0.
     0.             0.             0.             0.
     0.             0.             0.             0.
     0.            26.87513333   195.57883065   201.51107861
  -329.55782497   294.77894616  -282.36746494   269.48971395
  -185.81754793    84.74036882   314.59719968   769.01595016
   578.41041661   567.09361834   424.11969102  -211.32274385
   -91.11722842  -130.85683888   -42.40582088 -1528.36845646
   454.90021292  -595.7400007    414.39647825   -95.03686455
  -719.92191258   587.65361771   464.20335996   884.46640755
  -283.67317176   456.76062549     8.59708019  -145.74174969
   475.6786797   -571.49695533   -10.97267426  -343.80511409
  -462.07367622  -302.1538771   -105.54933794   575.73892373
   361.26042036    93.69248989 -1614.25281211   726.83332015
  -401.5860148   -894.07149552  1222.43102712   437.27101493
  -

[   0.            0.            0.            0.            0.
    0.            0.            0.            0.            0.
    0.            0.            0.            0.            0.
    0.            0.            0.            0.            0.
    0.           11.56854505   72.11889272   74.40262678 -121.07893188
  103.65241661 -104.3578074    98.33119224  -57.09642189   20.42606709
  126.17600131  281.0528674   215.84344109  212.67405056  150.22207659
  -73.67622386  -32.81092714  -49.34245351  -17.32363571 -559.81973974
  165.18546619 -213.6845148   152.01826545  -34.8228921  -269.06474515
  213.73600614  159.23394884  323.99393122 -117.32111508  170.47084626
   -5.63338032  -54.07729596  175.4550594  -207.71209937   -4.59767456
 -133.28068347 -172.65085019  -93.50697057  -30.56897078  195.63781015
  142.46390659   45.89049954 -581.19695243  266.15487659 -140.7961207
 -322.82210449  457.82703852  149.62013732  -50.92569518 -285.6465948
 -231.80447615  -33.60935903  -47.138465

KeyboardInterrupt: 