# MeloDeep

In this project, we use deep learning with Keras to generate new music.

## Dependencies
1. **numpy** for number and matrix manipulation
2. **music21** for midi file parsing

In [1]:
# dependencies
import numpy as np
from music21 import converter, duration, instrument, midi, note, stream

## Initialization of constants
The output size of our neural net is 129: MIDI defines 128 pitches, numbered 0 (C-1) to 127 (G9), and we add 1 extra slot for the rest note.

In [2]:
OUTPUT_SIZE = 129
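
To make the index layout concrete, here is a small sketch that maps an output index to a readable name. It assumes the convention where middle C (MIDI 60) is C4, which is also what music21 uses. The helper `index_to_name` and the constant `REST_INDEX` are made up for illustration and are not part of the notebook.

```python
# Hypothetical helper, not part of the notebook: names each of the 129 output slots.
PITCH_CLASSES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
OUTPUT_SIZE = 129               # 128 MIDI pitches + 1 rest slot
REST_INDEX = OUTPUT_SIZE - 1    # the last slot is reserved for the rest

def index_to_name(index):
    """Return a name such as 'C4' for a MIDI pitch, or 'rest' for the last slot."""
    if index == REST_INDEX:
        return 'rest'
    octave = index // 12 - 1    # assumes middle C (MIDI 60) is C4
    return PITCH_CLASSES[index % 12] + str(octave)

print(index_to_name(0))      # lowest MIDI pitch
print(index_to_name(60))     # middle C
print(index_to_name(127))    # highest MIDI pitch
print(index_to_name(128))    # the extra rest slot
```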

## Parsing of features
First, we parse the training file and extract its melody. For now, we cannot generate music with multiple tracks (e.g. melody + harmony, or different instruments); this is something to consider in the future. We keep only the notes and rests from the melody track.

In [3]:
mid = converter.parse('qinghuaci.mid')  # TODO: read the file name from the command line
melody = mid.parts[0]  # the first part is the melody

In [4]:
all_sounds = []
is_beginning_rest = True
for i in melody.notesAndRests:
    if is_beginning_rest and i.isRest:  # skip every rest before the first sound
        continue
    is_beginning_rest = False  # a sound was seen, so later rests are kept
    all_sounds.append(i)

Next, we one-hot encode all the sounds obtained in the previous step. Each note or rest becomes an array of zeros with a 1 at the index of its MIDI pitch; the rest uses the last index. For example, the lowest MIDI pitch (index 0) is [1 0 0 ... 0] and a rest is [0 0 0 ... 0 1]. Notice that I didn't process chords: a chord contains multiple notes, which would make this a **multi-label classification** problem and complicate things. With the chord branch commented out, a chord is left as an all-zero vector. Again, something to consider in the future.

In [5]:
features = []
for sound in all_sounds:
    one_hot_format = [0] * OUTPUT_SIZE
    if sound.isRest:
        one_hot_format[-1] = 1
    elif sound.isNote:
        one_hot_format[sound.pitch.midi] = 1
#     elif sound.isChord:
#         for pitch in sound.pitches:
#             one_hot_format[pitch.midi] = 1
    features.append(one_hot_format)

In [6]:
for feature in features[:5]:  # show the pitch index of the first 5 sounds
    print(np.argmax(feature))

76
78
76
73
71


## Generating training data sets
After we get our list of notes in one-hot encoded format, we generate the training data set. Given a fixed number of consecutive notes, we want to predict the note most likely to come next. For example, with a window of 10, we predict the 11th note from the 1st to 10th notes. We split the parsed notes into (window, next note) pairs accordingly.

In [7]:
WINDOWS = 10  # use 10 notes to predict the 11th note
X, y = [], []
for i in range(len(features) - WINDOWS):
    X.append(features[i:(i+WINDOWS)])
    y.append(features[i+WINDOWS])
    
seed = features[0:WINDOWS]  # the seed here will be used to generate new music later

In [8]:
X = np.asarray(X)
y = np.asarray(y)

print(X.shape)
print(y.shape)

(515, 10, 129)
(515, 129)
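
The windowing is easier to see on a toy sequence. The sketch below uses plain integers instead of one-hot vectors and a window of 3; the helper `make_windows` and the pitch list are made up for illustration. A sequence of N items yields N - window pairs, which is why 515 pairs with a window of 10 correspond to 525 encoded sounds.

```python
# Toy illustration of the windowing above, using integers instead of one-hot vectors.
def make_windows(sequence, window):
    """Pair each run of `window` items with the item that follows it."""
    inputs, targets = [], []
    for start in range(len(sequence) - window):
        inputs.append(sequence[start:start + window])
        targets.append(sequence[start + window])
    return inputs, targets

notes = [76, 78, 76, 73, 71, 69]   # made-up pitch sequence
inputs, targets = make_windows(notes, window=3)
for window_notes, target in zip(inputs, targets):
    print('{} -> {}'.format(window_notes, target))
```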


## Building the LSTM model
Now that we have our training data ready, we can start building the LSTM model!

In [9]:
# build LSTM model here!
from keras.layers import LSTM, Activation, Dropout, Dense
from keras.models import Sequential
from keras.optimizers import SGD, RMSprop, Adagrad, Adadelta, Adam, Adamax, Nadam

Using Theano backend.


The model we are building is a Keras *Sequential* model, i.e. a linear stack of layers.
We stack 2 *LSTM* layers at the beginning (to give the network more capacity), each followed by a *dropout* layer (to reduce overfitting). After that comes a normal *dense* layer, which serves as the output layer. We apply a *softmax* activation at the end to get a probability for each of the 129 indices. We use *categorical_crossentropy* as the loss function: it pushes up the probability of the correct index and pushes down the rest. The *accuracy* metric reports how often the most probable index matches the training label.
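
To illustrate what softmax and categorical cross-entropy compute, here is a minimal numpy sketch on 3 made-up classes. It is a hand-written illustration of the formulas, not Keras's own implementation, and the scores are invented.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)   # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps)

def categorical_crossentropy(one_hot_target, probabilities):
    """Negative log-probability assigned to the true class."""
    return -np.sum(one_hot_target * np.log(probabilities))

logits = np.array([2.0, 1.0, 0.1])   # made-up scores for 3 classes
target = np.array([1.0, 0.0, 0.0])   # the true class is index 0
probs = softmax(logits)
loss = categorical_crossentropy(target, probs)
print(probs)
print(loss)
```

The loss only depends on the probability given to the true class, so lowering it means raising that probability, and the softmax normalization lowers the others in turn.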

TODO: 
I mentioned that we skip chords because they contain multiple notes. To support multi-label targets, we would need to change the activation function to *sigmoid* and the loss function to *binary_crossentropy*, so that each index gets its own independent probability.
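
As a rough sketch of what the multi-label version would look like on the data side: a chord becomes a vector with several 1s, and decoding keeps every index above a threshold instead of taking a single argmax. The helpers `multi_hot` and `decode`, and the 0.5 threshold, are assumptions for illustration only.

```python
OUTPUT_SIZE = 129   # same layout as above: 128 MIDI pitches + 1 rest slot

def multi_hot(midi_pitches):
    """Encode several simultaneous pitches as one vector with a 1 per pitch."""
    vector = [0] * OUTPUT_SIZE
    for pitch in midi_pitches:
        vector[pitch] = 1
    return vector

def decode(probabilities, threshold=0.5):
    """Keep every index whose independent probability reaches the threshold."""
    return [i for i, p in enumerate(probabilities) if p >= threshold]

c_major = multi_hot([60, 64, 67])   # C4, E4, G4
print(sum(c_major))
print(decode(c_major))
```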

In [10]:
model = Sequential()
model.add(LSTM(OUTPUT_SIZE, input_shape=(WINDOWS, OUTPUT_SIZE), return_sequences=True))
model.add(Dropout(0.1))
model.add(LSTM(OUTPUT_SIZE, return_sequences=False))  # input shape is inferred from the previous layer
model.add(Dropout(0.2))
model.add(Dense(OUTPUT_SIZE))
model.add(Activation('softmax'))

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_1 (LSTM)                (None, 10, 129)           133644    
_________________________________________________________________
dropout_1 (Dropout)          (None, 10, 129)           0         
_________________________________________________________________
lstm_2 (LSTM)                (None, 129)               133644    
_________________________________________________________________
dropout_2 (Dropout)          (None, 129)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 129)               16770     
_________________________________________________________________
activation_1 (Activation)    (None, 129)               0         
Total params: 284,058
Trainable params: 284,058
Non-trainable params: 0
_________________________________________________________________
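
The parameter counts in the summary can be checked by hand. An LSTM layer has 4 gates, each with input weights, recurrent weights and a bias; a dense layer has one weight per input/output pair plus a bias per output. The helper names below are made up for this check.

```python
def lstm_params(input_dim, units):
    """4 gates, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    """One weight per input/output pair plus one bias per output."""
    return input_dim * units + units

lstm_1 = lstm_params(129, 129)
lstm_2 = lstm_params(129, 129)
dense_1 = dense_params(129, 129)
total = lstm_1 + lstm_2 + dense_1
print('{} {} {} {}'.format(lstm_1, lstm_2, dense_1, total))
```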


Now we compile and train the model.

In [11]:
model.compile(optimizer=RMSprop(), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X, y, batch_size=20, epochs=200, verbose=1)  # no need to care about validation? :) 

Epoch 1/200
Epoch 2/200
...
Epoch 199/200
Epoch 200/200


<keras.callbacks.History at 0x11608c6d0>

In [12]:
model.save('trained_lstm_model.h5')

## Generating music!
After training, we can use the model to generate music! Because I use the first 10 notes of the training data as the seed, the model doesn't really create a new piece of music. Since this is just an experiment, I didn't bother to pass in 10 different notes.

We first reshape the seed (the new input for the model) into the shape the model expects:

In [13]:
seed = np.asarray(seed)
seed = np.expand_dims(seed, axis=0)
print(seed.shape)
predictions = []
X = seed

(1, 10, 129)


Then we will use the model to generate the 11th note, slide the window to the 2nd - 11th note to generate the 12th note, and slide the window again to 3rd - 12th note to generate 13th note, so on and so forth...

In [14]:
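for i in range(150):  # generate 150 notes
    preds = model.predict(X)  # probabilities, shape (1, OUTPUT_SIZE)
    index = np.argmax(preds)  # greedy choice: the most likely next note
    predictions.append(index)
    preds = np.zeros(OUTPUT_SIZE)  # re-encode the chosen note as one-hot
    preds[index] = 1
    preds = np.asarray(preds).reshape(1, -1)
    X = np.squeeze(X)  # (1, WINDOWS, OUTPUT_SIZE) -> (WINDOWS, OUTPUT_SIZE)
    X = np.concatenate((X[1:], preds))  # drop the oldest note, append the new one
    X = np.expand_dims(X, axis=0)  # restore the batch dimension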
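
The sliding-window mechanics can be tried without a trained model. In the sketch below, `fake_predict` is a made-up stand-in for `model.predict` that always favors the pitch one above the last note, and the seed is an invented run of pitches 60 to 69. The window is updated along the time axis directly, which is an equivalent and slightly shorter way of writing the squeeze/concatenate/expand steps.

```python
import numpy as np

OUTPUT_SIZE = 129
WINDOWS = 10

def fake_predict(window):
    """Stand-in for model.predict: favors the pitch one above the last note."""
    last_index = int(np.argmax(window[0, -1]))
    probs = np.zeros((1, OUTPUT_SIZE))
    probs[0, (last_index + 1) % OUTPUT_SIZE] = 1.0
    return probs

# Seed window: ten one-hot rows for MIDI pitches 60..69, shaped (1, WINDOWS, OUTPUT_SIZE).
window = np.eye(OUTPUT_SIZE)[np.arange(60, 60 + WINDOWS)][np.newaxis, :, :]

generated = []
for _ in range(5):
    index = int(np.argmax(fake_predict(window)))
    generated.append(index)
    next_row = np.zeros((1, 1, OUTPUT_SIZE))
    next_row[0, 0, index] = 1
    # Drop the oldest step and append the new one along the time axis.
    window = np.concatenate((window[:, 1:, :], next_row), axis=1)

print(generated)
print(window.shape)
```

Because every step takes the argmax, the output is fully determined by the seed; the same seed always produces the same notes.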

## Saving the music
This is the most straightforward part: saving the notes to an output file. Notice that the note duration is hardcoded to 0.75 quarter lengths (music21 durations are measured in quarter notes, not seconds). Including duration in training and generation is something to consider in the future.

In [15]:
NOTE_DURATION = 0.75  # in quarter lengths (a dotted eighth note)
s = stream.Stream()
s.append(instrument.Piano())  # play the notes with a piano
for pred in predictions:
    if pred == OUTPUT_SIZE - 1:  # the last index encodes a rest
        s.append(note.Rest(duration=duration.Duration(NOTE_DURATION)))
    else:
        s.append(note.Note(ps=pred, duration=duration.Duration(NOTE_DURATION)))
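
Since music21 measures durations in quarter lengths, how long a note actually sounds depends on the tempo. A quick sketch of the conversion, assuming a tempo of 120 beats per minute (the default when a MIDI file sets no tempo); the helper name is made up.

```python
def quarter_length_to_seconds(quarter_length, bpm):
    """Convert a duration in quarter notes to seconds at a given tempo."""
    return quarter_length * 60.0 / bpm

NOTE_DURATION = 0.75
ASSUMED_BPM = 120   # assumption: the tempo used for playback

per_note = quarter_length_to_seconds(NOTE_DURATION, ASSUMED_BPM)
whole_piece = 150 * per_note   # 150 generated notes, all with the same duration
print(per_note)
print(whole_piece)
```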


In [16]:
mf = midi.translate.streamToMidiFile(s)
mf.open('qinghuaci_output.mid', 'wb')
mf.write()
mf.close()


The code below is not used; these are just some snippets I initially used for further testing.

In [None]:
from keras.models import load_model

model = load_model('trained_lstm_model.h5')

In [None]:
def export_to_midi_file(notes, output_name):  # another function copied from internet to output notes to file, not used
    mt = midi.MidiTrack(1)
    t = 0 
    tLast = 0
    duration = 1024
    for pitch in notes:
        dt = midi.DeltaTime(mt)
        dt.time = t - tLast
        mt.events.append(dt)

        me = midi.MidiEvent(mt, type="NOTE_ON", channel=1)
        me.pitch = pitch
        me.velocity = 127
        mt.events.append(me)

        dt = midi.DeltaTime(mt)
        dt.time = duration
        mt.events.append(dt)

        me = midi.MidiEvent(mt, type="NOTE_ON", channel=1)
        me.pitch = pitch
        me.velocity = 0
        mt.events.append(me)

        tLast = t + duration
        t += duration

    dt = midi.DeltaTime(mt)
    dt.time = 0
    mt.events.append(dt)

    me = midi.MidiEvent(mt, type="END_OF_TRACK", channel=1)
    me.data = ''
    mt.events.append(me)

    mf = midi.MidiFile()
    mf.ticksPerQuarterNote = 1024
    mf.tracks.append(mt)

    mf.open(output_name, 'wb')
    mf.write()
    mf.close()

export_to_midi_file(predictions, 'qinghuaci_output.mid')