# TP 4: Neural networks (2)

## KALMOGO Benjamin

## 2. Jazz improvisation with recurrent neural networks

In this exercise, we will train a recurrent neural network to generate jazz music. The network is trained on an existing jazz melody and learns to predict the next note from the history of preceding notes. Once the network is trained, we can use it to generate a new sequence of notes, and from that a piece of music.

Note: for this exercise you will need to install one extra module for music processing, called $music21$. You can install this module using $pip$, either within the Jupyter notebook or from the Anaconda prompt.

In [90]:
!pip install music21



Recurrent neural networks (RNNs) are well suited to processing sequential information. In a traditional neural network we assume that all inputs (and outputs) are independent of each other, but this is not the case for sequences. For example, in order to predict the next word in a sentence you need to know which words came before. RNNs are called recurrent because they perform the same task for every element of a sequence, producing an output based on the previous computations. Another way to think about RNNs is that they have a "memory" which captures information about what has been calculated so far. A graphical representation is given in the figure below.

![Graphical representation of a recurrent neural network](img/rnn.jpg)


The picture on the left shows the neural network with recurrent connection; the picture on the right shows the same recurrent network, "unrolled" for three different timesteps. For music generation the process is as follows: at each timestep, the current note in a sequence is submitted to the neural network. This note is represented as a one-hot vector: say that we have a set of 7 notes, $[do, re, mi, fa, sol, la, ti]$. The one-hot vector representation of $mi$ would then be $[0 0 1 0 0 0 0]$, i.e. all-zero, except for the position of the correct note. The recurrent neural network combines the current note with the representation of the context history (the preceding notes) in order to construct a representation for the current timestep. The representation of the current timestep may then be used for the prediction of the note in the next timestep (which can then be submitted to the network again in order to repeat the process).
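
As a rough illustration of the process described above, the sketch below one-hot encodes the seven-note example and runs a plain (Elman-style) recurrent step over a short melody in NumPy. The weight names `W_xh`, `W_hh` and `b_h`, the hidden size of 4, and the random initialisation are assumptions made for illustration; they are not taken from the model trained later in this notebook.

```python
import numpy as np

notes = ['do', 're', 'mi', 'fa', 'sol', 'la', 'ti']
note_to_index = {note: i for i, note in enumerate(notes)}

def one_hot(note):
    """Return the one-hot vector for a note name."""
    vec = np.zeros(len(notes))
    vec[note_to_index[note]] = 1.0
    return vec

print(one_hot('mi'))  # [0. 0. 1. 0. 0. 0. 0.]

# Hypothetical parameters of a vanilla RNN cell (random, untrained).
rng = np.random.default_rng(0)
hidden_size = 4
W_xh = rng.normal(scale=0.1, size=(len(notes), hidden_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """Combine the current note with the representation of the context history."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

h = np.zeros(hidden_size)           # empty context before the first note
for note in ['do', 'mi', 'sol']:    # the same weights are reused at every timestep
    h = rnn_step(one_hot(note), h)

print(h.shape)  # (4,)
```

The final `h` plays the role of the hidden representation of the whole sequence; a prediction layer would take it as input.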

Note that a number of more advanced recurrent architectures exist, such as long short-term memory (LSTM) networks and gated recurrent unit (GRU) networks. Their gating mechanisms make it easier to model interactions between events that occur many timesteps apart.

Let's see how we can implement such a recurrent neural network for jazz improvisation. First, we'll import a number of necessary modules and helper functions.

In [91]:
from __future__ import print_function
from music21 import *
import numpy as np
from tensorflow import keras  # keras.Sequential and keras.layers are used below
from data_utils import *

Next, we'll load the necessary data files. $X$ and $y$ represent our training data; the remaining objects are used for music generation according to the predictions of the network.

In [92]:
(X, y, n_values, chords,
 abstract_grammars, corpus, tones, 
 tones_indices, indices_tones) = load_music_data()

In [93]:
n_train = X.shape[0]
n_timesteps = X.shape[1]

print('number of training examples:', n_train)
print('Tx (length of sequence):', n_timesteps)
print('total # of unique values:', n_values)

number of training examples: 58
Tx (length of sequence): 20
total # of unique values: 78


Our data consists of 58 training examples. Each training example in $X$ is represented by a 20x78 matrix. This matrix represents a sequence of 20 notes, and each of the 20 notes can be one of 78 possibilities (represented as a one-hot vector).


In [94]:
print('shape of X:', X.shape)

shape of X: (58, 20, 78)


Each label in $y$ is a 78-valued one-hot vector, representing the note that follows the 20 notes represented in $X$. The goal of the network is to predict the correct note in $y$ given the 20 notes that precede it.
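
One plausible way to obtain such (sequence, next note) pairs is a sliding window over a melody encoded as integer note indices. The helper `make_windows` below is hypothetical and the toy melody is made up; how `load_music_data` actually builds $X$ and $y$ is not shown in this notebook.

```python
import numpy as np

def make_windows(note_indices, n_timesteps, n_values):
    """Build one-hot inputs X and one-hot next-note labels y (hypothetical helper)."""
    n_examples = len(note_indices) - n_timesteps
    X = np.zeros((n_examples, n_timesteps, n_values))
    y = np.zeros((n_examples, n_values))
    for i in range(n_examples):
        window = note_indices[i:i + n_timesteps]
        X[i, np.arange(n_timesteps), window] = 1.0   # one-hot vector per timestep
        y[i, note_indices[i + n_timesteps]] = 1.0    # the note that follows the window
    return X, y

toy_melody = [0, 2, 4, 5, 4, 2, 0, 1, 3, 5]   # made-up note indices in [0, 7)
X_toy, y_toy = make_windows(toy_melody, n_timesteps=4, n_values=7)
print(X_toy.shape, y_toy.shape)  # (6, 4, 7) (6, 7)
```

With 20 timesteps and 78 possible values, the same construction would give arrays shaped like the `(58, 20, 78)` and `(58, 78)` arrays used here.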


In [95]:
print('Shape of Y:', y.shape)

Shape of Y: (58, 78)


We will predict the correct note using a recurrent neural network. The RNN processes the sequence of 20 notes, which results in a hidden representation of the sequence. In the model below, this hidden representation passes through two intermediate dense layers and then through a dense layer with softmax activation, resulting in a probability distribution over the 78 possibilities for the next note.


In [147]:
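model = keras.Sequential([
    # Recurrent layer: summarises the 20-note sequence in a 100-dimensional vector
    keras.layers.GRU(100, input_shape=(n_timesteps, n_values)),
    keras.layers.Dense(256, activation='exponential'),
    keras.layers.Dense(32, activation='relu'),
    # Output layer: probability distribution over the n_values (78) possible notes
    keras.layers.Dense(n_values, activation='softmax')
])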
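
To make the role of the final softmax layer concrete, here is a small NumPy sketch that maps a hidden vector to a probability distribution over notes. The weights `W_out` and `b_out` and the hidden vector are random stand-ins (assumptions for illustration), not the trained Keras parameters.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
hidden_size, n_notes = 100, 78
hidden = rng.normal(size=hidden_size)                        # stand-in for the RNN output
W_out = rng.normal(scale=0.1, size=(hidden_size, n_notes))   # hypothetical output weights
b_out = np.zeros(n_notes)

probs = softmax(hidden @ W_out + b_out)
print(probs.shape)        # (78,)
print(np.argmax(probs))   # index of the most likely next note
```

Every entry of `probs` is positive and the entries sum to 1, which is what lets the output be read as a distribution over the next note.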

In [148]:
#activation='tanh', recurrent_activation='sigmoid',

Once our model is defined, we compile it.

In [149]:
model.compile(loss='categorical_crossentropy', optimizer='adam')
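
For reference, the categorical cross-entropy is the negative log-probability that the model assigns to the correct note, averaged over the examples. A minimal NumPy sketch follows, using a made-up 3-note example; the function name and the `eps` clipping constant are illustrative choices and do not reproduce the Keras internals.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over examples of -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1.0)   # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))

y_true = np.array([[0, 1, 0], [1, 0, 0]])                  # one-hot labels
y_pred = np.array([[0.1, 0.8, 0.1], [0.5, 0.25, 0.25]])    # predicted distributions
loss = categorical_crossentropy(y_true, y_pred)
print(round(loss, 4))  # 0.4581
```

As a point of comparison, a model that guesses uniformly among 78 notes would score $\ln 78 \approx 4.36$, so a useful model should end up well below that value on the training data.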

Finally, we fit it to our data.

In [150]:
model.fit(X, y, batch_size=10, epochs=100)

Epoch 1/100
Epoch 2/100
...
Epoch 77/100
Epoch 78

<tensorflow.python.keras.callbacks.History at 0x7fa2455e6790>

Once our model is fitted, we can use it to predict sequences of notes. These sequences are then used by the function below to generate some jazz improvisation. The function also applies a number of post-processing steps, which are beyond the scope of this practical session.


In [151]:
out_stream = generate_music(model, chords, abstract_grammars,
                            corpus, tones, tones_indices,
                            indices_tones, X)

Predicting new values for different set of chords.
Generated 51 sounds (chord 1)
Generated 51 sounds (chord 2)
Generated 51 sounds (chord 3)
Generated 51 sounds (chord 4)
Generated 51 sounds (chord 5)
Your generated music is saved in output/my_music.midi
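
The internals of `generate_music` are not shown here, but the basic idea of generating a sequence from a next-note predictor can be sketched as follows: sample a note from the predicted distribution, append it to the context window, and repeat. Both `sample_sequence` and `dummy_predict` are hypothetical names, and `dummy_predict` is only a stand-in for the trained model.

```python
import numpy as np

def sample_sequence(predict_fn, seed, n_steps, rng):
    """Autoregressively extend a one-hot seed of shape (n_timesteps, n_values)."""
    window = seed.copy()
    generated = []
    for _ in range(n_steps):
        probs = predict_fn(window)                    # distribution over the next note
        next_index = int(rng.choice(len(probs), p=probs))
        generated.append(next_index)
        next_note = np.zeros(window.shape[1])
        next_note[next_index] = 1.0
        window = np.vstack([window[1:], next_note])   # slide the window by one note
    return generated

def dummy_predict(window):
    """Stand-in for the trained model: favours the note after the last one."""
    n_values = window.shape[1]
    last = int(np.argmax(window[-1]))
    probs = np.full(n_values, 0.1 / (n_values - 1))
    probs[(last + 1) % n_values] = 0.9
    return probs

rng = np.random.default_rng(0)
seed = np.eye(7)[[0, 1, 2, 3]]    # four one-hot notes out of seven
melody = sample_sequence(dummy_predict, seed, n_steps=8, rng=rng)
print(melody)
```

With the Keras model, `predict_fn` could presumably wrap `model.predict` on a batch of one window; the returned probabilities may need to be cast to float64 and renormalised before sampling.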


The code above generates a midi file in the directory $output$, based on the predictions of the network. The code below should display a widget to listen to the generated music.

In [146]:
mf = midi.MidiFile()
mf.open('output/my_music.midi')
mf.read()
mf.close()
s = midi.translate.midiFileToStream(mf)
s.show('midi')


Alternatively, you can use a midi player available on your computer to play the file, or if you don't have a midi player, you can use an online conversion tool that converts midi to mp3. Does the result sound convincing?

**The result is not very convincing.**

### Exercise

Try to improve the neural network's performance. Try a GRU instead of an LSTM. Try to change the hidden representation size. Experiment with an extra recurrent layer. How does this change the loss? Is the resulting music better?

**The result is not great; however, the loss reaches a low value with the present configuration.**
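
When comparing recurrent layers and hidden sizes, it can help to keep the number of trainable parameters in mind, since only 58 training examples are available. The sketch below uses the standard parameter-count formulas for LSTM and GRU layers. The function names are hypothetical, and the `reset_after=True` variant (two bias vectors per GRU block) is assumed to match the default of the Keras version used here.

```python
def lstm_params(input_dim, units):
    """Four weight blocks (three gates plus the candidate cell state), one bias each."""
    return 4 * (units * (input_dim + units) + units)

def gru_params(input_dim, units, reset_after=True):
    """Three weight blocks (two gates plus the candidate state).

    reset_after=True (assumed Keras default) keeps separate input and
    recurrent biases, hence two bias vectors per block.
    """
    n_biases = 2 if reset_after else 1
    return 3 * (units * (input_dim + units) + n_biases * units)

n_values = 78
for units in (32, 100):
    print(units, lstm_params(n_values, units), gru_params(n_values, units))
```

Under these assumptions, the recurrent layer with 100 units already has tens of thousands of parameters, so a low training loss may largely reflect memorisation of the training melody rather than better music.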