# Music Generation Using Deep Learning  
### Machine Learning Course Project  

This is the project for the Machine Learning course at UL FRI in semester 1 of 2022/23.
The aim of this project is to take MIDI files and train a deep neural network to generate MIDI files of new music.
The main paper I am taking inspiration from is "This time with feeling: learning expressive musical performance" by Oore et al., published online in 2018: https://doi.org/10.1007/s00521-018-3758-9.  

The notebook has several sections: data exploration, model building, training, and finally generating and saving new MIDI files.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
import sys; sys.path.append(".")
import os
import mido
import random

In [2]:
# set random seeds for reproducibility
random.seed(0)
np.random.seed(0)
tf.random.set_seed(0)

## The Data  
The International Piano-e-Competition™ (https://piano-e-competition.com/default.asp) is a competition in which the pieces are performed on a Disklavier, a real piano that simultaneously records what is played as MIDI data.
The MIDI files are freely available online (I obtained them from this GitHub repository: https://github.com/studioph/international-e-piano-dataset).

This section is about analysing, exploring and preprocessing this data to be used in subsequent sections.

In [3]:
test_file = mido.MidiFile("midi/ADIG01.mid")
test_file

MidiFile(type=0, ticks_per_beat=480, tracks=[
  MidiTrack([
    MetaMessage('sequencer_specific', data=(67, 113, 0, 0, 0, 67), time=0),
    MetaMessage('sequencer_specific', data=(67, 113, 0, 1, 0, 1, 0, 83, 111, 110, 97, 116, 97, 32, 105, 110, 32, 70, 32, 32, 32, 32, 32, 77, 46, 32, 65, 100, 105, 103, 101, 122, 97, 108, 122, 97, 100, 101, 32), time=0),
    MetaMessage('sequencer_specific', data=(67, 123, 12, 0, 1), time=0),
    MetaMessage('set_tempo', tempo=512820, time=0),
    MetaMessage('time_signature', numerator=4, denominator=4, clocks_per_click=24, notated_32nd_notes_per_beat=8, time=0),
    Message('sysex', data=(126, 127, 9, 1), time=0),
    Message('sysex', data=(67, 16, 76, 0, 0, 126, 0), time=1),
    Message('control_change', channel=0, control=0, value=0, time=4),
    Message('control_change', channel=0, control=32, value=0, time=1),
    Message('program_change', channel=0, program=0, time=1),
    Message('control_change', channel=0, control=7, value=100, time=1),
    Me

In [4]:
np.unique([x.type for x in test_file.tracks[0]])

array(['control_change', 'end_of_track', 'note_on', 'polytouch',
       'program_change', 'sequencer_specific', 'set_tempo', 'sysex',
       'time_signature'], dtype='<U18')

In [5]:
test_file.tracks[0][150:165]

MidiTrack([
  Message('control_change', channel=0, control=64, value=47, time=3),
  Message('control_change', channel=0, control=64, value=0, time=36),
  Message('note_on', channel=0, note=69, velocity=74, time=3),
  Message('note_on', channel=0, note=65, velocity=60, time=12),
  Message('note_on', channel=0, note=41, velocity=60, time=1),
  Message('polytouch', channel=0, note=73, value=127, time=18),
  Message('polytouch', channel=0, note=73, value=0, time=34),
  Message('note_on', channel=0, note=70, velocity=70, time=258),
  Message('note_on', channel=0, note=50, velocity=57, time=18),
  Message('note_on', channel=0, note=41, velocity=0, time=15),
  Message('note_on', channel=0, note=69, velocity=0, time=33),
  Message('polytouch', channel=0, note=50, value=22, time=240),
  Message('note_on', channel=0, note=74, velocity=66, time=2),
  Message('note_on', channel=0, note=50, velocity=0, time=3),
  Message('note_on', channel=0, note=46, velocity=53, time=13)])
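
Some of the `note_on` messages above have `velocity=0`. By MIDI convention such a message releases the note, so it acts as a note-off. A minimal sketch of that check, using `SimpleNamespace` objects as hypothetical stand-ins for mido messages (the helper name `is_note_off` is my own, not part of mido):

```python
from types import SimpleNamespace

def is_note_off(msg):
    """Return True for an explicit note_off or a note_on with velocity 0."""
    if msg.type == "note_off":
        return True
    return msg.type == "note_on" and msg.velocity == 0

# Hypothetical stand-ins that mimic the attributes of mido messages.
pressed = SimpleNamespace(type="note_on", note=41, velocity=60, time=1)
released = SimpleNamespace(type="note_on", note=41, velocity=0, time=15)

print(is_note_off(pressed), is_note_off(released))
```

A file whose track contains only `note_on` messages can therefore still encode every key release.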

In [6]:
for message in test_file.tracks[0]:
    if message.time == 10 and message.type in ["note_on", "note_off"]:
        print(message, message.note)

note_on channel=0 note=53 velocity=52 time=10 53
note_on channel=0 note=58 velocity=52 time=10 58
note_on channel=0 note=65 velocity=0 time=10 65
note_on channel=0 note=69 velocity=0 time=10 69
note_on channel=0 note=69 velocity=65 time=10 69
note_on channel=0 note=65 velocity=60 time=10 65
note_on channel=0 note=69 velocity=66 time=10 69
note_on channel=0 note=65 velocity=58 time=10 65
note_on channel=0 note=65 velocity=69 time=10 65
note_on channel=0 note=65 velocity=0 time=10 65
note_on channel=0 note=61 velocity=86 time=10 61
note_on channel=0 note=69 velocity=77 time=10 69
note_on channel=0 note=53 velocity=68 time=10 53
note_on channel=0 note=79 velocity=0 time=10 79
note_on channel=0 note=76 velocity=78 time=10 76
note_on channel=0 note=74 velocity=0 time=10 74
note_on channel=0 note=57 velocity=66 time=10 57
note_on channel=0 note=59 velocity=75 time=10 59
note_on channel=0 note=56 velocity=69 time=10 56
note_on channel=0 note=74 velocity=0 time=10 74
note_on channel=0 note=59 

In [7]:
print(test_file.ticks_per_beat, test_file.ticks_per_beat*30/4)

480 3600.0
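
Delta times in a MIDI file are measured in ticks. Together with `ticks_per_beat` and the tempo in microseconds per beat (the `set_tempo` message of the test file reports 512820), they convert to seconds. A small sketch of that arithmetic; the function name `ticks_to_seconds` is illustrative and it assumes the tempo stays constant:

```python
def ticks_to_seconds(ticks, ticks_per_beat, tempo_us_per_beat):
    """Convert MIDI ticks to seconds, assuming a constant tempo."""
    seconds_per_tick = tempo_us_per_beat / 1_000_000 / ticks_per_beat
    return ticks * seconds_per_tick

# Values taken from the header of the test file shown earlier.
one_beat = ticks_to_seconds(480, ticks_per_beat=480, tempo_us_per_beat=512820)
eight_ticks = ticks_to_seconds(8, ticks_per_beat=480, tempo_us_per_beat=512820)
print(one_beat, eight_ticks)
```

At this tempo 8 ticks last roughly 8.5 ms, which is close to the 8 ms time-shift step of the reference paper. mido also provides a similar helper, `mido.tick2second`.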


In [8]:
print("type:" + str(test_file.type), "; num. tracks: " + str(len(test_file.tracks)), " ; ", str(test_file.length) + " seconds", "; ", str(len(test_file.tracks[0])) + "messages")
    

type:0 ; num. tracks: 1  ;  261.4196103749941 seconds ;  7874messages


In [9]:
midi_dir = os.listdir("midi/")
print(f'{len(midi_dir)} files')

2431 files


## The Model  
Using Keras (from the TensorFlow library), we can build a model with a few lines of code.
In the reference paper, Oore et al. used a model whose input layer takes a one-hot 413-dimensional vector, followed by three hidden LSTM layers with 512 cells each and an output layer that mirrors the input layer.

Following the Keras documentation: https://keras.io/api/models/model/  
and: https://keras.io/examples/nlp/lstm_seq2seq/
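
The 413 dimensions of the reference paper's event vector are made up of 128 note-on, 128 note-off, 125 time-shift and 32 velocity events. One way to lay out such a vocabulary is with fixed offsets. The sketch below assumes a particular ordering of the families, and the names `SIZES`, `OFFSETS` and `event_index` are mine, not the paper's code:

```python
# Sizes of each event family, in an assumed order.
SIZES = {"note_on": 128, "note_off": 128, "time_shift": 125, "velocity": 32}

# Starting index of each family inside the one-hot vector.
OFFSETS = {}
_total = 0
for _name, _size in SIZES.items():
    OFFSETS[_name] = _total
    _total += _size
VOCAB_SIZE = _total

def event_index(kind, value):
    """Map an event (family, value) to its position in the one-hot vector."""
    if not 0 <= value < SIZES[kind]:
        raise ValueError(f"{kind} value {value} out of range")
    return OFFSETS[kind] + value

print(VOCAB_SIZE, event_index("note_off", 60), event_index("time_shift", 0))
```

Dropping the velocity family leaves 381 entries, which is the `input_size` used in this notebook.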

In [109]:
# define input size: 128 note-on events + 128 note-off events + 125 time-shift events
# The reference paper also includes 32 velocity events; I leave them out to keep the model simpler
input_size = 128 + 128 + 125
num_messages = 800
inputs = keras.layers.Input(shape=(input_size, num_messages), name="Input")

lstm0, state_h, state_c = keras.layers.LSTM(512, name="LSTM-0", return_sequences=True, return_state=True) (inputs)
zero_states = [state_h, state_c]

# The reference paper uses 3 LSTM layers, I use 2 for simplicity and faster training
lstm1_layer= keras.layers.LSTM(512, name="LSTM-1", return_sequences=True, return_state=True)
lstm1, state_h, state_c = lstm1_layer(lstm0, initial_state=zero_states)
first_states = [state_h, state_c]

inputs3 = keras.layers.Input(shape=(input_size, num_messages), name="Input3")
lstm3, _, _ = keras.layers.LSTM(512, name="LSTM-3", return_sequences=True, return_state=True) (inputs3, initial_state=first_states)

outputs = keras.layers.Dense(num_messages, name="Output", activation="softmax") (lstm3)

model = keras.Model(inputs=[inputs, inputs3], outputs=outputs)
model.compile(loss=keras.losses.categorical_crossentropy, optimizer=keras.optimizers.RMSprop(learning_rate=0.01))
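
Keras recurrent layers read their input as `(batch, timesteps, features)`. With the shape used above, the 381 event indices sit on the timestep axis and the 800 messages on the feature axis. If the intent is for the LSTM to step through the messages one at a time, the last two axes can be swapped. A NumPy sketch under that assumption (the small sizes only keep the example light):

```python
import numpy as np

input_size, num_messages = 381, 8  # 8 instead of 800 to keep the example small

# One sequence laid out as in the notebook: (batch, input_size, num_messages).
batch = np.zeros((1, input_size, num_messages), dtype=np.int8)
batch[0, 60, 0] = 1  # first message is a note-on for pitch 60

# Swap the last two axes so each timestep is one one-hot event vector.
time_major = np.transpose(batch, (0, 2, 1))
print(time_major.shape)
print(time_major[0, 0].argmax())
```

With that layout the matching layer definitions would be `Input(shape=(num_messages, input_size))` and `Dense(input_size, activation="softmax")`, so that the softmax runs over the event vocabulary.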

## Training  
We train on a random subset of the data in order to make training faster.

In [20]:
# set random seeds for reproducibility
random.seed(0)
np.random.seed(0)
tf.random.set_seed(0)

In [12]:
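# note: np.random.choice samples with replacement by default, so a file can be picked more than once
training_files = np.random.choice(midi_dir, 100)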
print(training_files)

['ParkJH05.mid' 'KaszoS07.mid' 'Jussow01.mid' 'Richardson02.mid'
 'Kociuban14.mid' 'ChowK05.mid' 'Savitski09.mid' 'Shamray03.mid'
 'ZhangH06.mid' 'Izzard01.mid' 'Wang08.mid' 'Woo05.mid'
 'PrjevalskayaM10.mid' 'GonzalezJ01.mid' 'Verbaite03.mid' 'Staupe01.mid'
 'WangY02.mid' 'KiselevaD16.mid' 'JohannsonP24.mid' 'Tetzloff08.mid'
 'Shilyaev03.mid' 'BENABD09.mid' 'Tario07.mid' 'JohannsonP24.mid'
 'KaiRuiR08.mid' 'HuangSW06.mid' 'FALIKS01.mid' 'Huang12.mid'
 'GonzalezJ08.mid' 'JeonH06.mid' 'Yeletskiy04.mid' 'BuiJL02.mid'
 'Lee_E03.mid' 'Tsianos02.mid' 'KanM02.mid' 'Wilshire04.mid'
 'Wilshire04.mid' 'Sun02.mid' 'Song05.mid' 'JohannsonP25.mid'
 'ChowK01.mid' 'ZhaoK02.mid' 'Eras01.mid' 'Park09.mid' 'Mizumoto02.mid'
 'KimG06.mid' 'ChernovA22.mid' 'LiYZ08.mid' 'Tysman09.mid' 'YuP04.mid'
 'Falzone02.mid' 'BENABD01.mid' 'SOLOM05.mid' 'Nikiforov01.mid'
 'Avila01.mid' 'Yang03.mid' 'Denisova09.mid' 'KabuliL01.mid'
 'KotysV16.mid' 'Huang12.mid' 'TET01.mid' 'Lee02.mid' 'KiselevaD12.mid'
 'LeeSH14.mid' '
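
`np.random.choice` samples with replacement by default, which is why some file names appear twice in the list above. If distinct files are wanted, sampling without replacement is one option. A standard-library sketch with made-up file names:

```python
import random

# Hypothetical file names standing in for the contents of midi/.
all_files = [f"piece{i:03d}.mid" for i in range(500)]

rng = random.Random(0)               # local generator, seeded for repeatability
subset = rng.sample(all_files, 100)  # 100 distinct files

print(len(subset), len(set(subset)))
```

With NumPy the equivalent is `np.random.choice(midi_dir, 100, replace=False)`.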

In [13]:
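def convertMidiToArray(messages, num_messages = -1, input_size=128*2+125):
    """Encode note messages as one-hot columns, shape (1, input_size, n_columns).

    Rows 0-127: note-on pitch, rows 128-255: note-off pitch,
    rows 256-380: time shift in steps of 8 ticks (capped at bin 124).
    """
    ret_arr = np.empty((input_size,0), np.int8)
    current_message = 0
    template_vector = np.zeros((input_size,1), np.int8)
    for m in messages:
        # each note message produces two columns (time shift + note),
        # so num_messages columns correspond to num_messages/2 note messages
        if num_messages >= 0 and current_message >= num_messages/2.0:
            break

        if (m.type == "note_on"):
            # add time shift vector (only this message's own delta time is used)
            tmp_vector = template_vector.copy()
            tmp_vector[128*2+int(min(m.time/8.0, 124))] = 1
            ret_arr = np.hstack((ret_arr, tmp_vector))

            # add note vector
            tmp_vector = template_vector.copy()
            tmp_vector[m.note] = 1
            ret_arr = np.hstack((ret_arr, tmp_vector))
            current_message += 1

        elif (m.type == "note_off"):
            # add time shift vector (only this message's own delta time is used)
            tmp_vector = template_vector.copy()
            tmp_vector[128*2+int(min(m.time/8.0, 124))] = 1
            ret_arr = np.hstack((ret_arr, tmp_vector))

            # add note vector, offset by 128 to mark it as a note-off
            tmp_vector = template_vector.copy()
            tmp_vector[m.note+128] = 1
            ret_arr = np.hstack((ret_arr, tmp_vector))
            current_message += 1

    return ret_arr.reshape((1, input_size, -1))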
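
In a MIDI track the `time` of each message is the delay since the previous message of any type, so pedal or `polytouch` messages between two notes also carry part of the gap. A sketch of accumulating those deltas before emitting a note event, with `SimpleNamespace` objects as hypothetical stand-ins for mido messages (the generator name `note_events` is my own):

```python
from types import SimpleNamespace

def note_events(messages):
    """Yield (ticks_since_last_note_event, type, note) for note messages."""
    elapsed = 0
    for msg in messages:
        elapsed += msg.time  # count every delta, whatever the message type
        if msg.type in ("note_on", "note_off"):
            yield elapsed, msg.type, msg.note
            elapsed = 0

track = [
    SimpleNamespace(type="note_on", note=69, velocity=74, time=3),
    SimpleNamespace(type="control_change", control=64, value=47, time=36),
    SimpleNamespace(type="note_on", note=65, velocity=60, time=12),
]
print(list(note_events(track)))
# [(3, 'note_on', 69), (48, 'note_on', 65)]
```

The second note is reported 48 ticks after the first, because the 36 ticks of the pedal message are included.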

In [14]:
training_data = np.empty((0, input_size, num_messages))
for file in training_files:
    tmp = mido.MidiFile(f"midi/{file}")
    
    test_if_valid = np.unique([x.type for x in tmp.tracks[0]])
    if not ("note_on" in test_if_valid and "note_off" in test_if_valid):
        continue
    new_data = convertMidiToArray(tmp.tracks[0], num_messages=num_messages, input_size=input_size)
    print(training_data.shape, new_data.shape)
    if (new_data.shape == (1, input_size, num_messages)):
        training_data = np.vstack((training_data, new_data))

(0, 381, 800) (1, 381, 800)
(1, 381, 800) (1, 381, 800)
(2, 381, 800) (1, 381, 800)
(3, 381, 800) (1, 381, 800)
(4, 381, 800) (1, 381, 800)
(5, 381, 800) (1, 381, 800)
(6, 381, 800) (1, 381, 800)
(7, 381, 800) (1, 381, 800)
(8, 381, 800) (1, 381, 800)
(9, 381, 800) (1, 381, 800)
(10, 381, 800) (1, 381, 800)
(11, 381, 800) (1, 381, 800)
(12, 381, 800) (1, 381, 800)
(13, 381, 800) (1, 381, 800)
(14, 381, 800) (1, 381, 800)
(15, 381, 800) (1, 381, 800)
(16, 381, 800) (1, 381, 800)
(17, 381, 800) (1, 381, 800)
(18, 381, 800) (1, 381, 800)
(19, 381, 800) (1, 381, 800)
(20, 381, 800) (1, 381, 800)
(21, 381, 800) (1, 381, 800)
(22, 381, 800) (1, 381, 800)
(23, 381, 800) (1, 381, 800)
(24, 381, 800) (1, 381, 800)
(25, 381, 800) (1, 381, 800)
(26, 381, 800) (1, 381, 800)
(27, 381, 800) (1, 381, 800)
(28, 381, 800) (1, 381, 800)
(29, 381, 800) (1, 381, 800)
(30, 381, 800) (1, 381, 800)
(31, 381, 800) (1, 381, 800)
(32, 381, 800) (1, 381, 800)
(33, 381, 800) (1, 381, 800)
(34, 381, 800) (1, 381, 

In [15]:
training_data.shape

(91, 381, 800)

In [16]:
target_data = np.zeros(training_data.shape)
target_data[:, :, 0:-1] = training_data[:, :, 1:]
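
The two lines above build next-step targets: position `t` of `target_data` along the last axis holds position `t+1` of `training_data`, and the final position stays all zeros. A toy example of the same shift:

```python
import numpy as np

# One sequence, 3 possible events, 4 steps; the event index goes 0, 1, 2, 0.
data = np.zeros((1, 3, 4))
for step, event in enumerate([0, 1, 2, 0]):
    data[0, event, step] = 1

target = np.zeros(data.shape)
target[:, :, 0:-1] = data[:, :, 1:]  # shift left by one step

print(target[0].argmax(axis=0))
print(target[0, :, -1])
```

The last column of `target` contains no event at all, so it is not a valid one-hot target for that step.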

In [98]:
# set random seeds for reproducibility
random.seed(0)
np.random.seed(0)
tf.random.set_seed(0)

In [99]:
model.fit([training_data, training_data], target_data, batch_size=64, epochs=20, validation_split=0.2)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x238a25f0eb0>

In [102]:
model.save("test1")



INFO:tensorflow:Assets written to: test1\assets


INFO:tensorflow:Assets written to: test1\assets


## Generating predictions  
Now we can make the model 'predict' in order to make it generate new music.

Relying on Keras documentation examples for code: https://keras.io/examples/nlp/lstm_seq2seq/

In [103]:
model.layers

[<keras.engine.input_layer.InputLayer at 0x238a724bbb0>,
 <keras.engine.input_layer.InputLayer at 0x238a71a75b0>,
 <keras.layers.rnn.lstm.LSTM at 0x238a369f340>,
 <keras.layers.rnn.lstm.LSTM at 0x238a73eabc0>,
 <keras.layers.core.dense.Dense at 0x238bf0fa0b0>]

In [104]:
model = keras.models.load_model("test1")

encoder_inputs = model.input[0]  # first model input
encoder_outputs, state_h_enc, state_c_enc = model.layers[2].output  # first LSTM layer
encoder_states = [state_h_enc, state_c_enc]
encoder_model = keras.Model(encoder_inputs, encoder_states)

decoder_inputs = model.input[1]  # second model input
decoder_state_input_h = keras.Input(shape=(512,))
decoder_state_input_c = keras.Input(shape=(512,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_lstm = model.layers[3]
decoder_outputs, state_h_dec, state_c_dec = decoder_lstm(
    decoder_inputs, initial_state=decoder_states_inputs
)
decoder_states = [state_h_dec, state_c_dec]
decoder_dense = model.layers[4]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = keras.Model(
    [decoder_inputs] + decoder_states_inputs, [decoder_outputs] + decoder_states
)

In [105]:
def decode_sequence(input_seq = np.zeros((1, input_size, num_messages))):
    # Encode the input as state vectors.
    states_value = encoder_model.predict(input_seq)

    ret_seq = np.empty((input_size,0), np.int8)
    # Create an empty full-length target sequence.
    target_seq = np.zeros((1, input_size, num_messages))
    # Seed the first position with a random note-on event (pitch 35 to 99).
    target_seq[0, np.random.randint(35,100), 0] = 1

    # Run the decoder once over the whole sequence (batch of size 1);
    # the step-by-step sampling loop below is currently disabled.
    stop_condition = False
    current_messages = 0
    output_tokens, h, c = decoder_model.predict([target_seq] + states_value)
    ret_seq = output_tokens
    # while not stop_condition:
    #     output_tokens, h, c = decoder_model.predict([target_seq] + states_value)

    #     # Sample a token
    #     sampled_token_index = np.argmax(output_tokens[0, -1, :])
    #     current_messages += 1

    #     print(current_messages)
    #     # Exit condition: either hit max length
    #     # or find stop character.
    #     if current_messages > num_messages:
    #         stop_condition = True

    #     # Update the target sequence (of length 1).
    #     target_seq = np.zeros((1, input_size, 1))
    #     target_seq[0, sampled_token_index, 0] = 1

    #     ret_seq = np.hstack((ret_seq, target_seq[0]))

    #     # Update states
    #     states_value = [h, c]
    return ret_seq
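
The commented-out loop picks the most likely event with `np.argmax`, which always gives the same continuation for the same state. A common alternative is to sample from the softmax output, optionally sharpened or flattened by a temperature. A hedged sketch; `sample_index` and its parameters are my own naming, not part of Keras:

```python
import numpy as np

def sample_index(probs, temperature=1.0, rng=None):
    """Draw an index from a probability vector; temperature must be > 0."""
    rng = np.random.default_rng() if rng is None else rng
    probs = np.asarray(probs, dtype=np.float64)
    # Rescale in log space; a small constant avoids log(0).
    logits = np.log(probs + 1e-12) / temperature
    logits -= logits.max()           # for numerical stability
    weights = np.exp(logits)
    weights /= weights.sum()         # renormalise to a valid distribution
    return int(rng.choice(len(weights), p=weights))

rng = np.random.default_rng(0)
probs = [0.1, 0.7, 0.2]
print(sample_index(probs, temperature=0.01, rng=rng))   # near-greedy
print(sample_index(probs, temperature=1.0, rng=rng))    # plain sampling
```

A temperature close to zero behaves like `argmax`, while larger values make less likely events appear more often.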

In [106]:
# set random seeds for reproducibility
random.seed(0)
np.random.seed(0)
tf.random.set_seed(0)

In [107]:
output_sequence = decode_sequence()



In [94]:
print(output_sequence.shape)

(1, 381, 800)


## Exporting  
Now we have to convert the output back into a MIDI file.

In [95]:
def convertArrayToMidi(arr, filename = None):
    ret_file = mido.MidiFile(type=0) # create empty midi file
    track = mido.MidiTrack()
    ret_file.tracks.append(track)

    # track.append(mido.Message('program_change', program=12, time=0))
    time = 0
    prev_was_time_event = True
    for col in range(arr.shape[-1]):
        tmp_col = arr[:, col]
        tmp_note = np.argmax(tmp_col)

        if tmp_note < 128: # note on event
            if not (prev_was_time_event):
                time = 32
                print("no time")
            track.append(mido.Message('note_on', note=tmp_note, velocity=64, time=time))
            time = 0
            prev_was_time_event = False
        elif tmp_note >= 128 and tmp_note < 128*2:
            if not (prev_was_time_event):
                print("no time")
                time = 32
            track.append(mido.Message('note_off', note=tmp_note-128, velocity=127, time=time))
            time = 0
            prev_was_time_event = False
        else:
            time += int((tmp_note-128*2)*8*2)
            prev_was_time_event = True
        
    

    if (filename is not None):
        ret_file.save(filename)
    
    return ret_file
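
Encoding divides the delta time by 8 ticks and caps the bin at 124, while the decoder above multiplies the bin by `8*2`. For a faithful round trip the two step sizes would need to match. A sketch assuming a symmetric step of 8 ticks (the function names are illustrative):

```python
TICKS_PER_BIN = 8   # step size used when encoding
MAX_BIN = 124       # highest time-shift bin (125 bins in total)

def encode_shift(ticks):
    """Quantise a delta time in ticks to a time-shift bin."""
    return int(min(ticks / TICKS_PER_BIN, MAX_BIN))

def decode_shift(bin_index):
    """Convert a time-shift bin back to ticks."""
    return bin_index * TICKS_PER_BIN

print(encode_shift(258), decode_shift(encode_shift(258)))
print(encode_shift(5000), decode_shift(encode_shift(5000)))
```

Delta times are rounded down to a multiple of 8 ticks, and anything longer than 992 ticks is clipped to the last bin.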

In [96]:
print(training_files[0])
_ = convertArrayToMidi(training_data[0], "ParkJH05-decoded.mid")

ParkJH05.mid


In [108]:
ret_midi = convertArrayToMidi(output_sequence[0], "test1.mid")
ret_midi.ticks_per_beat = mido.MidiFile(f'midi/{training_files[0]}').ticks_per_beat
ret_midi.save("test1.mid")

no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time
no time


In [50]:
testing = mido.MidiFile(f'midi/{training_files[0]}')

In [51]:
testing

MidiFile(type=0, ticks_per_beat=384, tracks=[
  MidiTrack([
    MetaMessage('track_name', name='SchumannOp16_2ParkJooHyeon', time=0),
    MetaMessage('text', text='2014 Alaska International Piano-e-Competition', time=0),
    MetaMessage('text', text='captured in June and July 2014 in Alaska', time=0),
    MetaMessage('text', text='http://www.piano-e-competition.com', time=0),
    MetaMessage('text', text='standard-resolution version', time=0),
    MetaMessage('text', text='File processed for distribution with software designed and coded by Dr. John Q. Walker', time=0),
    MetaMessage('text', text='Title: Kreisleriana, Op. 16: II. Sehr inning und nicht zu rasch', time=0),
    MetaMessage('text', text='Composer: Robert Alexander Schumann', time=0),
    MetaMessage('text', text='Performer: Joo Hyeon Park', time=0),
    MetaMessage('instrument_name', name='Yamaha Disklavier Pro Mark IV concert grand piano, model DCFIIISM4PRO', time=0),
    MetaMessage('sequencer_specific', data=(67, 113, 