# Music generation with LSTM in Keras

In [None]:
# Set to False if you are not running
# this notebook in Google Colaboratory
run_on_colab = True

## Packages and data
Instead of feeding raw MIDI files to the network, we will process them to keep only the information we need and discard the rest.

For that we will use the [**music21 package**](http://web.mit.edu/music21/), a set of tools that makes it easy to work with MIDI files.
It builds its own representation of a MIDI file, in which **Note** and **Chord** objects represent all the music the file contains. This representation is easier to read than raw MIDI, so it will help our network to *understand* music and create new compositions.

Let's install it.

In [2]:
!pip install music21;

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [3]:
if run_on_colab:
  from google.colab import files
  files.upload()


Saving xmusic.zip to xmusic.zip


In [4]:
!unzip xmusic.zip;

Archive:  xmusic.zip
   creating: xmusic/
  inflating: xmusic/AnimalSong.mid   
  inflating: xmusic/AOD3.mid         
  inflating: xmusic/AOD4.mid         
  inflating: xmusic/AOD5.mid         
  inflating: xmusic/Bearclaw.mid     
  inflating: xmusic/bjorkoh_so_quiet.mid  
  inflating: xmusic/BloodBrother.mid  
  inflating: xmusic/BuffaloDance.mid  
  inflating: xmusic/Cerosimo.mid     
  inflating: xmusic/Crystals.mid     
  inflating: xmusic/DistantDrums.mid  
  inflating: xmusic/DrumsofThunder.mid  
  inflating: xmusic/duran_duranrelax.mid  
  inflating: xmusic/EarthandSky.mid  
  inflating: xmusic/EarthMother.mid  
  inflating: xmusic/FallingWater.mid  
  inflating: xmusic/Firefly.mid      
  inflating: xmusic/FourColors.mid   
  inflating: xmusic/FrankMills_relax.mid  
  inflating: xmusic/GreatDivide.mid  
  inflating: xmusic/GreenbriarGlen.mid  
  inflating: xmusic/HealingDoors.mid  
  inflating: xmusic/HighNoon.mid     
  inflating: xmusic/Lightwaves.mid   
  inflating: xmusic/

## Processing data

Let's process the files and load them into **music21**.

In [5]:
# Importing dependencies
import glob
import pickle
import numpy
from music21 import converter, instrument, note, chord, stream
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM, Activation
from keras.utils import np_utils
from keras.callbacks import ModelCheckpoint

Let's see how **music21** represents music.
As we can see below, there are two different kinds of elements:
- **Notes**
- **Chords**

We also have the time offset of each element, measured in quarter-note lengths. This is the point in time at which the note or chord must be played.

In [6]:
file = "/content/xmusic/Crystals.mid"
midi = converter.parse(file)
notes_to_parse = midi.flat.notes
for element in notes_to_parse[:10]:
  print(element, element.offset)



<music21.chord.Chord C4 E4> 0.5
<music21.note.Note F> 0.5
<music21.chord.Chord E4 C4> 5/3
<music21.note.Note F> 2.25
<music21.chord.Chord C6 A5 D6> 2.5
<music21.chord.Chord D4 B3> 8/3
<music21.note.Note F> 3.25
<music21.chord.Chord D4 B3> 3.75
<music21.note.Note F> 4.0
<music21.chord.Chord C6 A5 D6> 4.0
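
The offsets printed above are expressed in quarter-note lengths, and values such as `5/3` or `8/3` fall on thirds of a beat, which is why they are shown as fractions instead of decimals. As a rough illustration, the sketch below converts a few of these offsets to seconds. The helper `offset_to_seconds` and the constant tempo of 120 BPM are assumptions made for the example; the real tempo of the file may differ.

```python
from fractions import Fraction

def offset_to_seconds(offset, bpm=120):
    """Convert an offset in quarter-note lengths to seconds at a constant tempo."""
    seconds_per_quarter = Fraction(60, bpm)  # assumed tempo, not read from the file
    return float(Fraction(offset) * seconds_per_quarter)

# A few offsets taken from the output above
offsets = [Fraction(1, 2), Fraction(1, 2), Fraction(5, 3), Fraction(9, 4)]
seconds = [offset_to_seconds(o) for o in offsets]
print(seconds)
```

Two elements with the same offset, like the first chord and the first note, start at the same time.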


In [7]:
notes = []
for i,file in enumerate(glob.glob("/content/xmusic/*.mid")):
  midi = converter.parse(file)
  print('\r', 'Parsing file ', i, " ",file, end='')
  notes_to_parse = None
  try: # file has instrument parts
    s2 = instrument.partitionByInstrument(midi)
    notes_to_parse = s2.parts[0].recurse() 
  except Exception: # file has notes in a flat structure
    notes_to_parse = midi.flat.notes
  for element in notes_to_parse:
    if isinstance(element, note.Note):
      notes.append(str(element.pitch))
    elif isinstance(element, chord.Chord):
      notes.append('.'.join(str(n) for n in element.normalOrder))
with open('notes', 'wb') as filepath:
  pickle.dump(notes, filepath)

 Parsing file  0   /content/xmusic/MH3.mid
 Parsing file  1   /content/xmusic/BuffaloDance.mid
 Parsing file  5   /content/xmusic/Crystals.mid
 Parsing file  6   /content/xmusic/SkyDog.mid
 Parsing file  9   /content/xmusic/EarthandSky.mid
 Parsing file  10   /content/xmusic/Reach.mid
 Parsing file  12   /content/xmusic/NHeaven.mid
 Parsing file  16   /content/xmusic/OctoberMoon.mid
 Parsing file  18   /content/xmusic/FallingWater.mid
 Parsing file  19   /content/xmusic/Firefly.mid
 Parsing file  20   /content/xmusic/TrailofTears.mid
 Parsing file  24   /content/xmusic/RELAX.mid
 Parsing file  26   /content/xmusic/OhSoQuiet_Bjork.mid
 Parsing file  29   /content/xmusic/MaidenoftheWood.mid
 Parsing file  30   /content/xmusic/FourColors.mid
 Parsing file  32   /content/xmusic/Metukweasyn.mid
 Parsing file  35   /content/xmusic/quiet_man.mid
 Parsing file  38   /content/xmusic/oh_so_quiet.mid
 Parsing file  41   /content/xmusic/NExp.mid
 Parsing file  46   /content/xmusic/MysticWaters.mid
 Parsing file  52   /content/xmusic/duran_duranrelax.mid
 Parsing file  55   /content/xmusic/MH4.mid
 Parsing file  57   /content/xmusic/Cerosimo.mid
 Parsing file  59   /content/xmusic/DrumsofThunder.mid
 Parsing file  62   /content/xmusic/Winter94.mid

We count the distinct tokens (single notes and chords) in our dataset, because this will be the **number of possible output classes** of our model.

In [8]:
# Count different possible outputs
n_vocab = (len(set(notes)))
n_vocab

217
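
Every distinct string in `notes` counts as one class, whether it is a pitch name for a single note or a dot-separated list of pitch classes for a chord. The sketch below shows the idea on a made-up token list; `toy_notes` and the other `toy_` names are hypothetical and are not part of the dataset.

```python
# Made-up tokens in the same format as `notes`: pitch names for single
# notes, dot-separated pitch classes for chords.
toy_notes = ["F4", "0.4", "F4", "9.0.2", "0.4", "D5"]

# Distinct tokens, sorted so that the integer ids are reproducible
toy_pitchnames = sorted(set(toy_notes))
toy_note_to_int = {token: index for index, token in enumerate(toy_pitchnames)}

print(len(toy_pitchnames))  # 4
print(toy_note_to_int)      # {'0.4': 0, '9.0.2': 1, 'D5': 2, 'F4': 3}
```

Sorting matters only for reproducibility: the same dataset always yields the same token-to-integer mapping, so saved weights stay compatible between runs.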

In [9]:
sequence_length = 100
# get all pitch names
pitchnames = sorted(set(item for item in notes))
# create a dictionary to map pitches to integers
note_to_int = dict((note, number) for number, note in enumerate(pitchnames))
network_input = []
network_output = []
# create input sequences and the corresponding outputs
for i in range(0, len(notes) - sequence_length, 1):
  sequence_in = notes[i:i + sequence_length] # Size sequence_length
  sequence_out = notes[i + sequence_length]  # Size 1
  # Map pitches of sequence_in to integers
  network_input.append([note_to_int[char] for char in sequence_in])
  # Map pitch of sequence_out to an integer
  network_output.append(note_to_int[sequence_out])
n_patterns = len(network_input)
# reshape the input into a format compatible with LSTM layers
network_input = numpy.reshape(network_input, (n_patterns, sequence_length, 1))
# normalize input
network_input = network_input / float(n_vocab)
network_output = np_utils.to_categorical(network_output)

Let's check the shape of the new `network_input`.

In [10]:
network_input.shape

(37225, 100, 1)
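
Each of the 37,225 samples is a window of 100 consecutive tokens, and its target is the token that follows the window, so `notes` must contain 37,325 tokens in total. The toy sketch below reproduces the same windowing, reshaping and normalization with NumPy on a short made-up sequence. All `toy_` names are hypothetical, and `np.eye` is used here as a stand-in for `to_categorical` so that the one-hot width is set explicitly.

```python
import numpy as np

toy_tokens = [0, 1, 2, 3, 0, 1, 2, 3]   # tokens already mapped to integers
toy_vocab = 4                           # number of distinct tokens
toy_length = 3                          # stand-in for sequence_length

inputs, targets = [], []
for i in range(len(toy_tokens) - toy_length):
    inputs.append(toy_tokens[i:i + toy_length])   # window of toy_length tokens
    targets.append(toy_tokens[i + toy_length])    # the token right after it

# Shape (samples, timesteps, features), scaled into [0, 1)
x = np.reshape(inputs, (len(inputs), toy_length, 1)) / float(toy_vocab)

# One-hot targets with an explicit width, so the number of columns equals
# the vocabulary size even if the largest index never occurs as a target.
y = np.eye(toy_vocab)[targets]

print(x.shape, y.shape)  # (5, 3, 1) (5, 4)
```

The one-hot width has to match the size of the final `Dense` layer, which is why fixing it to the vocabulary size is the safer choice.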

## Creating model

Let's create the network. We will create a network with 9 layers (3 of them **LSTM layers**).

For regularization, we will also add 3 **Dropout** layers.

In [11]:
def create_network(network_input, n_vocab):
    """ create the structure of the neural network """
    model = Sequential()
    model.add(LSTM(
        512,
        input_shape=(network_input.shape[1], network_input.shape[2]),
        return_sequences=True
    ))
    model.add(Dropout(0.3))
    model.add(LSTM(512, return_sequences=True))
    model.add(Dropout(0.3))
    model.add(LSTM(512))
    model.add(Dense(256))
    model.add(Dropout(0.3))
    model.add(Dense(n_vocab))
    model.add(Activation('softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
    return model

In [12]:
model = create_network(network_input,n_vocab)
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 lstm (LSTM)                 (None, 100, 512)          1052672   
                                                                 
 dropout (Dropout)           (None, 100, 512)          0         
                                                                 
 lstm_1 (LSTM)               (None, 100, 512)          2099200   
                                                                 
 dropout_1 (Dropout)         (None, 100, 512)          0         
                                                                 
 lstm_2 (LSTM)               (None, 512)               2099200   
                                                                 
 dense (Dense)               (None, 256)               131328    
                                                                 
 dropout_2 (Dropout)         (None, 256)               0
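
The parameter counts in the summary can be checked by hand. An LSTM layer has four gates, each with its own input weights, recurrent weights and bias, while a Dense layer has one weight per input-output pair plus one bias per output. The helper functions below are hypothetical names written for this check; they are not part of Keras.

```python
def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * (input_dim * units + units * units + units)

def dense_params(input_dim, units):
    """One weight per input-output pair plus one bias per output."""
    return input_dim * units + units

print(lstm_params(1, 512))     # 1052672, first LSTM (one feature per timestep)
print(lstm_params(512, 512))   # 2099200, second and third LSTM
print(dense_params(512, 256))  # 131328, first Dense layer
```

Dropout layers have no trainable parameters, which is why they show 0 in the summary.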

In [13]:
# In case we want to use previously trained weights
weights = ""
if weights: model.load_weights(weights)

In [14]:
filepath = "/content/drive/My Drive/{epoch:02d}-{loss:.4f}.h5"

checkpoint = ModelCheckpoint(filepath, monitor='loss',verbose=0,
                             save_best_only=True,mode='min')

callbacks_list = [checkpoint]
model.fit(network_input, network_output, epochs=100, batch_size=64, 
          callbacks=callbacks_list)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78

<keras.callbacks.History at 0x7fd2714bf0d0>
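
The checkpoint path is a format template: `ModelCheckpoint` fills in the epoch number and the monitored metrics each time it saves, and with `save_best_only=True` it only saves when the training loss improves. The sketch below mimics that substitution with plain `str.format`. The epoch and loss values are illustrative, picked to match the name of the weights file loaded in the next cell.

```python
template = "{epoch:02d}-{loss:.4f}.h5"

# Illustrative values only; real ones come from the training run
early = template.format(epoch=3, loss=4.71234)
best = template.format(epoch=54, loss=0.2086)

print(early)  # 03-4.7123.h5
print(best)   # 54-0.2086.h5
```

Encoding the loss in the file name makes it easy to pick the best checkpoint later without reloading every file.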

In [17]:
# In case we want to use other previously trained weights
weights = "/content/drive/My Drive/54-0.2086.h5"
if weights: model.load_weights(weights)

In [18]:
# Generate network input again
network_input = []
output = []
for i in range(0, len(notes) - sequence_length, 1):
  sequence_in = notes[i:i + sequence_length]
  sequence_out = notes[i + sequence_length]
  network_input.append([note_to_int[char] for char in sequence_in])
  output.append(note_to_int[sequence_out])
n_patterns = len(network_input)

In [39]:
""" Generate notes from the neural network based on a sequence of notes """
# pick a random sequence from the input as a starting point for the prediction
start = numpy.random.randint(0, len(network_input)-1)
int_to_note = dict((number, note) for number, note in enumerate(pitchnames))
pattern = network_input[start]
prediction_output = []
# generate 500 notes
for i in range(500):
  prediction_input = numpy.reshape(pattern, (1, len(pattern), 1))
  prediction_input = prediction_input / float(n_vocab)
  prediction = model.predict(prediction_input, verbose=0)
  index = numpy.argmax(prediction)
  result = int_to_note[index]
  print('\r', 'Predicted ', i, " ",result, end='')
  prediction_output.append(result)
  pattern.append(index)
  pattern = pattern[1:len(pattern)]

 Predicted  499   D5
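
The generation loop feeds the model a fixed-length window, takes the most likely next token, appends it to the window and drops the oldest token. The sketch below isolates that sliding-window mechanism, replacing the trained network with a hypothetical `fake_predict` function so that it runs without Keras.

```python
def fake_predict(window):
    """Stand-in for model.predict + argmax: returns (last token + 1) mod 4."""
    return (window[-1] + 1) % 4

window = [0, 1, 2]   # seed sequence of token indices
generated = []
for _ in range(5):
    next_index = fake_predict(window)
    generated.append(next_index)
    # Append the prediction and drop the oldest token, keeping the length fixed
    window = window[1:] + [next_index]

print(generated)  # [3, 0, 1, 2, 3]
print(window)     # [1, 2, 3]
```

Because `argmax` is deterministic, the same starting sequence always produces the same continuation; the only randomness in the notebook's loop comes from the choice of `start`.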

In [40]:
offset = 0
output_notes = []
# create note and chord objects based on the values generated by the model
for pattern in prediction_output:
    # pattern is a chord
    if ('.' in pattern) or pattern.isdigit():
        notes_in_chord = pattern.split('.')
        # use a separate name so the global `notes` list is not overwritten
        chord_notes = []
        for current_note in notes_in_chord:
            new_note = note.Note(int(current_note))
            new_note.storedInstrument = instrument.Piano()
            chord_notes.append(new_note)
        new_chord = chord.Chord(chord_notes)
        new_chord.offset = offset
        output_notes.append(new_chord)
    # pattern is a note
    else:
        new_note = note.Note(pattern)
        new_note.offset = offset
        new_note.storedInstrument = instrument.Piano()
        output_notes.append(new_note)

    # increase offset each iteration so that notes do not stack
    offset += 0.5

midi_stream = stream.Stream(output_notes)
midi_stream.write('midi', fp='test.mid')

'x0.mid'
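
The decoding step relies on the token format alone: a token that contains a dot or consists only of digits is a chord given as pitch classes, and anything else is a pitch name. The sketch below applies the same rule without music21, pairing each token with an offset that grows by 0.5 per event. `parse_token` and the sample tokens are hypothetical and only illustrate the rule.

```python
def parse_token(token):
    """Return ('chord', [pitch classes]) or ('note', pitch name)."""
    if "." in token or token.isdigit():
        return "chord", [int(p) for p in token.split(".")]
    return "note", token

tokens = ["D5", "4.7.11", "9", "F#3"]   # made-up model output
step = 0.5                              # quarter-note lengths between events

events = [(i * step, *parse_token(t)) for i, t in enumerate(tokens)]
for event in events:
    print(event)
```

With a fixed step every event gets the same spacing, so the generated piece has a uniform rhythm regardless of the timing in the training files.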