<a href="https://colab.research.google.com/github/callysthenes/python_deep_learning_mbd/blob/main/2023_Music_Generation_LSTM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Music generation with LSTM in Keras
In this notebook, we will **generate piano compositions** using a Long Short-Term Memory (LSTM) network. We will train the network on piano compositions by Chopin, fed to it as **MIDI files**. These are not audio files: they contain the notes, chords, and other information about a composition, but no audio. The trained network will be able to generate new MIDI files.


This notebook was created for **Google Colaboratory**, so some lines of code are dedicated to uploading our files to Google Colab or to saving outputs from the LSTM network to Google Drive. If you are not running this notebook on Colab, set **run_on_colab** to `False`.

In [1]:
# Set to False if you are not running
# this notebook in Google Colaboratory
run_on_colab = True


**This notebook was inspired by [towardsdatascience](https://towardsdatascience.com/how-to-generate-music-using-a-lstm-neural-network-in-keras-68786834d4c5), and almost all the code comes from that post.**

**Files used in this notebook can be found at [github](https://github.com/unmonoqueteclea/DeepLearning-Notebooks/tree/master/LSTM-Music-Generation)**

## LSTM networks
Long Short-Term Memory networks are one type of **Recurrent Neural Network (RNN)**. 
In these networks, the output at each step depends on the previous steps. This loop behaviour makes them well suited to sequences and lists. 

If you want to know more about this type of network, read [this amazing post](http://karpathy.github.io/2015/05/21/rnn-effectiveness/) 

As we can see in [this very famous post](http://colah.github.io/posts/2015-08-Understanding-LSTMs/) , a recurrent neural network can be thought of as multiple copies of the same network, each passing a message to a successor.


![RNN vs DNN t](http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/RNN-unrolled.png)




The problem comes when they have to deal with **long-term dependencies**. Although, theoretically, they should be able to handle these dependencies, researchers have found some fairly fundamental reasons why this is difficult in practice. LSTMs are a special kind of RNN capable of learning long-term dependencies; they are explicitly designed to remember information for long periods.
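
To make the "loop" concrete, here is a minimal sketch of a plain (non-LSTM) recurrent step in pure Python. The function name `run_rnn` and the weights `w_x`, `w_h` and `b` are made up for illustration; they are not part of Keras or of the model trained in this notebook.

```python
import math

def run_rnn(inputs, w_x=0.5, w_h=0.9, b=0.0):
    """Apply the same tiny recurrent cell to every element of `inputs`."""
    h = 0.0  # hidden state: the "message" passed on to the next step
    states = []
    for x in inputs:
        # The new state depends on the current input AND the previous state
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

# Two sequences that differ only in their first element
a = run_rnn([1.0, 0.0, 0.0, 0.0])
b_states = run_rnn([0.0, 0.0, 0.0, 0.0])
print(a)
print(b_states)
```

In this toy example the first input still influences the last state, but its effect shrinks at every step. That fading is an informal picture of why plain RNNs struggle with long-term dependencies.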


## Music generation
To train our model, we will use [MIDI files](https://en.wikipedia.org/wiki/MIDI).
MIDI files contain **information about a music composition**, not the music itself. They contain information about notation, pitch, velocity, vibrato, panning, and clock signals (which set tempo).

There are some synthesizers which are able to transform this composition information into a real audio track. Our model will learn from these **MIDI files** and will be able to generate new ones. 

![MIDI file visualization](http://i1-win.softpedia-static.com/screenshots/Speedy-MIDI_1.png)


## Google drive configuration  (only Colab)
This will let us use our own **Google Drive** account to store files that can be used inside the Jupyter notebook. When you execute this cell, you will be prompted to visit a web page, allow Google Colab to access your Google Drive account, and copy the authorization code into the notebook.

In [None]:
if run_on_colab:
  from google.colab import drive
  # This will prompt for authorization.
  drive.mount('/content/drive')

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=email%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdocs.test%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.photos.readonly%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fpeopleapi.readonly&response_type=code

Enter your authorization code:
··········
Mounted at /content/drive


## Packages and data
Instead of using raw MIDI files, we will process them to obtain only the information we need and discard the rest. 

That's why we will use the [**music21**](http://web.mit.edu/music21/) package. It contains a set of tools that let us work with MIDI files easily. 
It creates its own representation of a MIDI file, with **Note** and **Chord** objects representing all the music inside the file. This representation is easier to read than raw MIDI, so it will help our network to *understand* music and create new compositions.

Let's install it.

In [None]:
!pip install music21;

Collecting music21
  Downloading https://files.pythonhosted.org/packages/81/de/5af13438e28b80b41e1db0d6f082204fadccd3b1d90c1951568d92df7c68/music21-5.5.0.tar.gz (18.5MB)
Building wheels for collected packages: music21
  Running setup.py bdist_wheel for music21 ... done
  Stored in directory: /root/.cache/pip/wheels/7b/21/95/d396f231b8095f30aba2a1fbffbc2411fb22eb4e611ddbed57
Successfully built music21
Installing collected packages: music21
Successfully installed music21-5.5.0


### Original MIDI files
 We have obtained  **MIDI files** from [piano-midi.de](http://www.piano-midi.de/midis/format0/). 
 
 We downloaded all of the Chopin MIDI files: 50 files, which is enough to train the network. 
 
 Feel free to train the network on other composers.
 
 If you are not using Google Colaboratory, make sure that you have a **midi_files.zip** file inside your working directory.
 
 **FILE DOWNLOAD: https://github.com/unmonoqueteclea/DeepLearning-Notebooks/tree/master/LSTM-Music-Generation**
 


In [None]:
if run_on_colab:
  from google.colab import files
  files.upload()


Saving midi_files.zip to midi_files.zip


In [None]:
!unzip midi_files.zip;

## Processing data

Let's process the files, and load them into **music21**

In [None]:
# Importing dependencies
import glob
import pickle
import numpy
from music21 import converter, instrument, note, chord, stream
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM, Activation
from keras.utils import np_utils
from keras.callbacks import ModelCheckpoint

Using TensorFlow backend.


Let's see how **music21** represents music.
As we can see below, there are two different kinds of elements:
- **Notes**
- **Chords**

We also have the time offset of each element. This is the time when the note or chord must be played.

In [None]:
file = "midi_files/chopin/chpn-p9_format0.mid"
midi = converter.parse(file)
notes_to_parse = midi.flat.notes
for element in notes_to_parse[:10]:
  print(element, element.offset)

<music21.chord.Chord B3 E2> 0.0
<music21.chord.Chord E3 G#3> 0.0
<music21.note.Note B> 1/3
<music21.chord.Chord E3 G#3> 2/3
<music21.note.Note B> 1.0
<music21.chord.Chord E-3 F#3> 1.0
<music21.note.Note B> 1.0
<music21.note.Note B> 4/3
<music21.chord.Chord E-3 F#3> 5/3
<music21.note.Note B> 1.75


We will process all the MIDI files, extracting data from each note or chord.

- If we process a **note**, we store in the list a string with its pitch (the note name) and octave.

- If we process a **chord** (remember that a chord is a set of notes played at the same time), we store a different kind of string: numbers separated by dots. Each number is the pitch class (0 to 11) of one chord note, taken from the chord's normal order.

As you can see, **we are not yet considering the time offset of each element**. In this first version we ignore offsets, so all notes and chords will have the same duration. A future version may take them into account.

We are creating a big list with all the elements of all the compositions.
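
As a quick illustration of the two string formats, here is a small sketch that needs no MIDI file. The `('note', ...)` and `('chord', ...)` tuples and the `encode` function are hypothetical stand-ins for music21's Note and Chord objects, used only to show what ends up in the list.

```python
def encode(element):
    """Turn a stand-in element into the string stored in the notes list."""
    kind, value = element
    if kind == 'note':
        # A note is stored as name plus octave, e.g. 'E-5' (E flat, octave 5)
        return value
    # A chord is stored as its pitch classes joined by dots, e.g. '4.8.11'
    return '.'.join(str(n) for n in value)

toy_elements = [('note', 'B3'), ('chord', [4, 8, 11]), ('note', 'E-5')]
toy_tokens = [encode(e) for e in toy_elements]
print(toy_tokens)
```

Because chords keep only pitch classes, the octave of each chord note is lost in this encoding, while single notes keep theirs.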

In [None]:
notes = []
for i,file in enumerate(glob.glob("midi_files/chopin/*.mid")):
  midi = converter.parse(file)
  print('\r', 'Parsing file ', i, " ",file, end='')
  notes_to_parse = None
  try: # file has instrument parts
    s2 = instrument.partitionByInstrument(midi)
    notes_to_parse = s2.parts[0].recurse() 
  except Exception: # file has notes in a flat structure
    notes_to_parse = midi.flat.notes
  for element in notes_to_parse:
    if isinstance(element, note.Note):
      notes.append(str(element.pitch))
    elif isinstance(element, chord.Chord):
      notes.append('.'.join(str(n) for n in element.normalOrder))
with open('notes', 'wb') as filepath:
  pickle.dump(notes, filepath)

 Parsing file  50   midi_files/chopin/chpn-p11-format0.mid

We obtain the number of different notes in our dataset, because this will be the **number of possible output classes**  of our model.

In [None]:
# Count different possible outputs
n_vocab = (len(set(notes)))
n_vocab

456
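
If you also want to know how often each note or chord occurs, and not only how many distinct ones there are, `collections.Counter` gives both. This is a sketch on a made-up list (`toy_notes`); in the notebook you would pass the real `notes` list instead.

```python
from collections import Counter

# Made-up tokens in the same format as the real `notes` list
toy_notes = ['B3', '4.8.11', 'B3', 'E-5', '4.8.11', 'B3']

counts = Counter(toy_notes)
toy_vocab = len(counts)          # same value as len(set(toy_notes))
print(toy_vocab)
print(counts.most_common(2))     # the two most frequent tokens
```

Tokens that appear only a handful of times give the network very few examples to learn from, so this distribution is worth a look before training.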

Now, there is some **data processing** that we have to do:

- We will map each pitch or chord to an integer
- We will create pairs of input sequences and their corresponding output note

We can try different **sequence_length** values to obtain different results. In this first version, we will use a sequence_length of 100.

The network will make its prediction of the next note (or chord) based on the previous *sequence_length* notes (or chords). 
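
The windowing is easier to see on a tiny example. The sketch below uses a hypothetical helper, `make_windows`, and a window of 3 instead of 100; it is only meant to show the shape of the input/output pairs.

```python
def make_windows(tokens, sequence_length):
    """Pair every run of `sequence_length` tokens with the token that follows it."""
    pairs = []
    for i in range(len(tokens) - sequence_length):
        window = tokens[i:i + sequence_length]   # input: sequence_length tokens
        target = tokens[i + sequence_length]     # output: the next token
        pairs.append((window, target))
    return pairs

toy = ['A', 'B', 'C', 'D', 'E']
pairs = make_windows(toy, 3)
for window, target in pairs:
    print(window, '->', target)
```

A list of `n` tokens yields `n - sequence_length` pairs, and consecutive windows overlap in all but one element.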

![Sequence learning](https://unmonoqueteclea.github.io/assets/images/inputoutputsequences.png)

In [None]:
sequence_length = 100
# get all pitch names
pitchnames = sorted(set(item for item in notes))
# create a dictionary to map pitches to integers
note_to_int = dict((note, number) for number, note in enumerate(pitchnames))
network_input = []
network_output = []
# create input sequences and the corresponding outputs
for i in range(0, len(notes) - sequence_length, 1):
  sequence_in = notes[i:i + sequence_length] # Size sequence_length
  sequence_out = notes[i + sequence_length]  # Size 1
  # Map pitches of sequence_in to integers
  network_input.append([note_to_int[char] for char in sequence_in])
  # Map the output note (or chord) to its integer
  network_output.append(note_to_int[sequence_out])
n_patterns = len(network_input)
# reshape the input into a format compatible with LSTM layers
network_input = numpy.reshape(network_input, (n_patterns, sequence_length, 1))
# normalize input
network_input = network_input / float(n_vocab)
# one-hot encode the outputs; num_classes keeps the width equal to n_vocab
network_output = np_utils.to_categorical(network_output, num_classes=n_vocab)

Let's see the new network_input shape

In [None]:
network_input.shape

(55535, 100, 1)
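
The shape reads as (number of patterns, time steps per pattern, features per time step). As a sanity check, the sketch below recomputes it from the token count. The helper `expected_input_shape` is hypothetical, and the figure of 55,635 tokens is inferred from the output above (55,535 patterns plus a window of 100); the notebook never prints it.

```python
def expected_input_shape(n_tokens, sequence_length, n_features=1):
    """Shape of the LSTM input built from one long list of tokens."""
    n_patterns = n_tokens - sequence_length   # one pattern per window start
    return (n_patterns, sequence_length, n_features)

shape = expected_input_shape(55635, 100)
print(shape)
```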

## Creating model

Let's create the network. It has 9 layers, 3 of them **LSTM layers**.

For regularization, the network includes 3 **Dropout** layers.

In [None]:
def create_network(network_input, n_vocab):
    """ create the structure of the neural network """
    model = Sequential()
    model.add(LSTM(
        512,
        input_shape=(network_input.shape[1], network_input.shape[2]),
        return_sequences=True
    ))
    model.add(Dropout(0.3))
    model.add(LSTM(512, return_sequences=True))
    model.add(Dropout(0.3))
    model.add(LSTM(512))
    model.add(Dense(256))
    model.add(Dropout(0.3))
    model.add(Dense(n_vocab))
    model.add(Activation('softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
    return model

In [None]:
model = create_network(network_input,n_vocab)
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_1 (LSTM)                (None, 100, 512)          1052672   
_________________________________________________________________
dropout_1 (Dropout)          (None, 100, 512)          0         
_________________________________________________________________
lstm_2 (LSTM)                (None, 100, 512)          2099200   
_________________________________________________________________
dropout_2 (Dropout)          (None, 100, 512)          0         
_________________________________________________________________
lstm_3 (LSTM)                (None, 512)               2099200   
_________________________________________________________________
dense_1 (Dense)              (None, 256)               131328    
_________________________________________________________________
dropout_3 (Dropout)          (None, 256)               0         
__________
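
The parameter counts in the summary can be checked by hand. A standard LSTM layer has four gates, each with input weights, recurrent weights and a bias; a Dense layer has one weight per input/output pair plus a bias per output. The helper names below are hypothetical.

```python
def lstm_params(input_dim, units):
    """Trainable parameters of a standard LSTM layer with biases."""
    # 4 gates x (input weights + recurrent weights + bias)
    return 4 * ((input_dim + units) * units + units)

def dense_params(input_dim, units):
    """Trainable parameters of a fully connected layer with biases."""
    return input_dim * units + units

lstm_1 = lstm_params(1, 512)       # input has 1 feature per time step
lstm_2 = lstm_params(512, 512)     # same for lstm_3
dense_1 = dense_params(512, 256)
print(lstm_1, lstm_2, dense_1)
```

These values match the `Param #` column for `lstm_1`, `lstm_2`, `lstm_3` and `dense_1`.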

If we want to continue training from where we left off, we should load the previously trained weights into the model. 

This is very useful in Google Colaboratory, which usually kills the virtual machine running the Jupyter notebook after a certain amount of time. If this happens to you, look for the latest weights file in your configured Drive account and use it to resume training.

In [None]:
# In case we want to use previously trained weights
weights = ""
if weights:
  model.load_weights(weights)

We will use **ModelCheckpoint**.

With the settings below (`save_best_only=True`, monitoring `loss`), ModelCheckpoint saves our weights to a file at the end of every epoch in which the training loss improves.

This way, if training stops, we can resume from where we left off.

You can train for as many epochs as you want; the cell below requests 200. I have found that, after **75 epochs**, the network is able to compose interesting new music. Those 75 epochs take about 13 hours on a Google Colab GPU. You don't need an extremely low loss: 0.7 or 0.8 is fine.





In [None]:
filepath = "/content/drive/My Drive/{epoch:02d}-{loss:.4f}.h5"

checkpoint = ModelCheckpoint(filepath, monitor='loss',verbose=0,
                             save_best_only=True,mode='min')

callbacks_list = [checkpoint]
model.fit(network_input, network_output, epochs=200, batch_size=64, 
          callbacks=callbacks_list)
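
If the virtual machine is killed mid-training, you need to find the most recent checkpoint among the saved files. Here is a minimal sketch that relies only on the `{epoch:02d}-{loss:.4f}.h5` naming pattern used above. The helper `latest_checkpoint` and the example file names are hypothetical.

```python
import os

def latest_checkpoint(filenames):
    """Return the checkpoint with the highest epoch number, or None."""
    best_name, best_epoch = None, -1
    for name in filenames:
        stem = os.path.basename(name)
        if not stem.endswith('.h5'):
            continue
        # File names look like '<epoch>-<loss>.h5'; the epoch comes first
        epoch_text = stem.split('-', 1)[0]
        if epoch_text.isdigit() and int(epoch_text) > best_epoch:
            best_name, best_epoch = name, int(epoch_text)
    return best_name

# Made-up file names, for illustration only
example = ['01-4.7512.h5', '02-4.6120.h5', '10-3.9001.h5', 'notes']
newest = latest_checkpoint(example)
print(newest)
```

In the notebook, you could pass it the result of `glob.glob("/content/drive/My Drive/*.h5")` and assign the returned path to `weights` before calling `model.load_weights`.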

## Music generation

Let's compose music!
We have renamed our last weights file as **final-weights.h5**.

 **FILE DOWNLOAD: https://github.com/unmonoqueteclea/DeepLearning-Notebooks/tree/master/LSTM-Music-Generation**

Some generated songs may sound awful, but if you run the generation
process several times you will get interesting results.

![Music Generation](https://unmonoqueteclea.github.io/assets/images/lstm.png)

In [None]:
# In case we want to use other previously trained weights
weights = "final-weights.h5"
if weights:
  model.load_weights(weights)

In [None]:
# Generate network input again
network_input = []
output = []
for i in range(0, len(notes) - sequence_length, 1):
  sequence_in = notes[i:i + sequence_length]
  sequence_out = notes[i + sequence_length]
  network_input.append([note_to_int[char] for char in sequence_in])
  output.append(note_to_int[sequence_out])
n_patterns = len(network_input)

The workflow now is:


1.   Pick a **seed sequence** randomly from your list of inputs (*pattern* variable)
2.   Pass it as input for your model to generate a new element (note or chord)
3.   Add the new element to your final song and to your *pattern* list
4.   Remove the first item from *pattern*
5.   Go to step 2
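
The loop can be tried without a trained model. In the sketch below, `fake_model` is a hypothetical stand-in for the network (it simply returns the last element plus one, modulo 5), and `generate` is an illustrative helper, not part of the notebook.

```python
def generate(seed, predict_next, n_steps):
    """Run the sliding-window generation loop for `n_steps` elements."""
    pattern = list(seed)                   # step 1: copy the seed sequence
    song = []
    for _ in range(n_steps):
        new_index = predict_next(pattern)  # step 2: predict the next element
        song.append(new_index)             # step 3: add it to the song...
        pattern.append(new_index)          # ...and to the pattern
        pattern = pattern[1:]              # step 4: drop the oldest element
    return song, pattern

def fake_model(pattern):
    # Stand-in predictor, for illustration only
    return (pattern[-1] + 1) % 5

song, final_pattern = generate([0, 1, 2], fake_model, 4)
print(song)
print(final_pattern)
```

The pattern always keeps the same length as the seed, so after enough steps the model is predicting purely from its own earlier output.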




In [None]:
# Generate notes from the neural network based on a sequence of notes.
# Pick a random sequence from the input as a starting point for the prediction
start = numpy.random.randint(0, len(network_input)-1)
int_to_note = dict((number, note) for number, note in enumerate(pitchnames))
# Copy the seed so that network_input itself is not modified
pattern = list(network_input[start])
prediction_output = []
# generate 500 notes
for i in range(500):
  prediction_input = numpy.reshape(pattern, (1, len(pattern), 1))
  prediction_input = prediction_input / float(n_vocab)
  prediction = model.predict(prediction_input, verbose=0)
  # Take the most likely class and map it back to a note or chord string
  index = numpy.argmax(prediction)
  result = int_to_note[index]
  print('\r', 'Predicted ', i, " ",result, end='')
  prediction_output.append(result)
  # Slide the window: append the prediction and drop the oldest element
  pattern.append(index)
  pattern = pattern[1:]

 Predicted  499   E-5

The last step is creating a MIDI file from the predictions.

**music21** will help us again for this task. We should create a **Stream** and add to it the predicted notes and chords.

We are adding an offset of 0.5 between elements.

In [None]:
offset = 0
output_notes = []
# create note and chord objects based on the values generated by the model
for pattern in prediction_output:
    # pattern is a chord
    if ('.' in pattern) or pattern.isdigit():
        notes_in_chord = pattern.split('.')
        # use a separate name so the global `notes` list is not overwritten
        chord_notes = []
        for current_note in notes_in_chord:
            new_note = note.Note(int(current_note))
            new_note.storedInstrument = instrument.Piano()
            chord_notes.append(new_note)
        new_chord = chord.Chord(chord_notes)
        new_chord.offset = offset
        output_notes.append(new_chord)
    # pattern is a note
    else:
        new_note = note.Note(pattern)
        new_note.offset = offset
        new_note.storedInstrument = instrument.Piano()
        output_notes.append(new_note)

    # increase offset each iteration so that notes do not stack
    offset += 0.5

midi_stream = stream.Stream(output_notes)
midi_stream.write('midi', fp='test_output.mid')

'test_output.mid'
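
The decoding and timing logic can be checked on its own, without music21. In this sketch, `decode_token` and `place_on_timeline` are hypothetical helpers that mirror the chord/note test and the fixed 0.5 step used above; music21 measures offsets in quarter lengths.

```python
def decode_token(token):
    """Return ('chord', [pitch classes]) or ('note', name) for one token."""
    if ('.' in token) or token.isdigit():
        return ('chord', [int(p) for p in token.split('.')])
    return ('note', token)

def place_on_timeline(tokens, step=0.5):
    """Give element number i the offset i * step."""
    return [(i * step, decode_token(t)) for i, t in enumerate(tokens)]

timeline = place_on_timeline(['E-5', '4.8.11', '7'])
for offset, element in timeline:
    print(offset, element)
```

With 500 generated elements and a step of 0.5, the last element starts at offset 249.5.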