### Exercise 2a)

In this exercise, I try to generate a Mozart melody.

In [1]:
%load_ext autoreload
%autoreload 2

import numpy as np
import tensorflow as tf
import joblib
import matplotlib.pyplot as plt
from tensorflow.keras import layers, callbacks
from tensorflow.keras.models import Model
from pathlib import Path
from tqdm import tqdm
from music21 import midi
from block_3_utils import *

tf.random.set_seed(42)

In [3]:
files = list(Path("/Users/retoheller/ml4ds_2020_g1/ml4ds_2020_g1/data/mozart_midis").glob("./*.mid"))
print("Number of midi files from Mozart:", len(files))

Number of midi files from Mozart: 9


I downloaded 9 MIDI files with melodies by Mozart from https://www.midiworld.com/mozart.html.

In [4]:
s = midi.translate.midiFilePathToStream(files[0])
s.show('midi')

In [5]:
def stream2notes(stream):
    return [str(x.pitch.midi) if x.isNote else "0" for x in list(stream.flat.notesAndRests)]

In [6]:
if Path('/Users/retoheller/ml4ds_2020_g1/ml4ds_2020_g1/mozart.joblib').exists():
    midis = joblib.load('/Users/retoheller/ml4ds_2020_g1/ml4ds_2020_g1/mozart.joblib')
else: 
    midis = []
    for midi_file in tqdm(files):
        s = midi.translate.midiFilePathToStream(midi_file)  
        n = stream2notes(s)
        midis.append(n)
    joblib.dump(midis, '/Users/retoheller/ml4ds_2020_g1/ml4ds_2020_g1/mozart.joblib')

100%|██████████| 9/9 [01:03<00:00,  7.02s/it]


In [7]:
print("Number of midi files:", len(midis))
print("Number of notes in each midi file:", [len(mid) for mid in midis])
print("Total number of notes in dataset:", sum([len(mid) for mid in midis]))

Number of midi files: 9
Number of notes in each midi file: [2472, 1826, 7756, 4601, 4824, 9428, 40560, 1644, 4479]
Total number of notes in dataset: 77590


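A vocabulary is built from the notes in these files. It contains 67 distinct tokens: the MIDI pitch numbers that occur in the data, plus the token `'0'`, which `stream2notes` emits for every element that is not a note (for example a rest).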

In [8]:
if Path('/Users/retoheller/ml4ds_2020_g1/ml4ds_2020_g1/mozart_vocab.joblib').exists():
    vocab = joblib.load('/Users/retoheller/ml4ds_2020_g1/ml4ds_2020_g1/mozart_vocab.joblib')
else:
    vocab = list(set([y for x in midis for y in x]))
    joblib.dump(vocab, '/Users/retoheller/ml4ds_2020_g1/ml4ds_2020_g1/mozart_vocab.joblib')

itos = vocab
stoi = {u:i for i, u in enumerate(vocab)}
N_VOCAB = len(itos)
print("Number of notes in vocab:", N_VOCAB)

Number of notes in vocab: 67
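
The order of `list(set(...))` over strings is not guaranteed to be the same in every Python process, so the cached `mozart_vocab.joblib` is what keeps the indices stable between runs. As a minimal sketch of an alternative, sorting the tokens numerically gives the same mapping on every run. The names `toy_midis`, `toy_vocab`, `toy_stoi`, and `toy_itos` are made up for illustration and stand in for the real data.

```python
# Toy stand-in for `midis`: lists of note tokens as strings, "0" = rest.
toy_midis = [["0", "60", "62", "60"], ["64", "0", "62"]]

# Sort numerically so the index assignment is reproducible without a cache file.
toy_vocab = sorted({tok for seq in toy_midis for tok in seq}, key=int)

toy_itos = toy_vocab                                    # index -> token
toy_stoi = {tok: i for i, tok in enumerate(toy_vocab)}  # token -> index

# Encode to integers and decode back to check the round trip.
encoded = [[toy_stoi[tok] for tok in seq] for seq in toy_midis]
decoded = [[toy_itos[i] for i in seq] for seq in encoded]

print(toy_vocab)
print(encoded)
```

Switching to a sorted vocabulary would change the indices shown in this notebook, so it would only make sense before training a model.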


In [9]:
midis[0][:10]

['0', '0', '0', '0', '0', '0', '0', '0', '0', '0']

The vocabulary of the 67 tokens, mapped to their integer indices:

In [17]:
stoi

{'66': 0,
 '55': 1,
 '49': 2,
 '61': 3,
 '87': 4,
 '42': 5,
 '56': 6,
 '27': 7,
 '86': 8,
 '73': 9,
 '84': 10,
 '72': 11,
 '0': 12,
 '51': 13,
 '45': 14,
 '70': 15,
 '52': 16,
 '30': 17,
 '77': 18,
 '93': 19,
 '92': 20,
 '48': 21,
 '74': 22,
 '46': 23,
 '31': 24,
 '85': 25,
 '62': 26,
 '33': 27,
 '54': 28,
 '83': 29,
 '68': 30,
 '71': 31,
 '64': 32,
 '75': 33,
 '44': 34,
 '81': 35,
 '59': 36,
 '88': 37,
 '91': 38,
 '29': 39,
 '36': 40,
 '32': 41,
 '78': 42,
 '69': 43,
 '50': 44,
 '82': 45,
 '35': 46,
 '58': 47,
 '40': 48,
 '79': 49,
 '39': 50,
 '38': 51,
 '65': 52,
 '60': 53,
 '37': 54,
 '34': 55,
 '53': 56,
 '57': 57,
 '67': 58,
 '80': 59,
 '41': 60,
 '47': 61,
 '63': 62,
 '43': 63,
 '76': 64,
 '89': 65,
 '90': 66}

Next, the note sequences are encoded as integer indices using the vocabulary.

In [10]:
midis_enc = [[stoi[y] for y in x] for x in midis]
midis_enc[0][:10]

[12, 12, 12, 12, 12, 12, 12, 12, 12, 12]

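Next, a training dataset is created with a batch size of 128 and a sequence length of 64 notes. Because the notes were encoded with the vocabulary, the sequences now consist of integer values. Each input sequence is paired with a target sequence shifted by one note, as the batch printed below shows.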

In [26]:
SEQ_LEN    = 64
SHIFT      = 4
BATCH_SIZE = 128
SHUFFLE_BUFFER_SIZE = 100000

train_ds = batch_ds(midis_enc, seq_len=SEQ_LEN, shift=SHIFT,
                    shuffle_buffer_size=SHUFFLE_BUFFER_SIZE,
                    batch_size=BATCH_SIZE, prefetch=1000, only_last=False)

In [27]:
next(iter(train_ds))

(<tf.Tensor: shape=(128, 64), dtype=int32, numpy=
 array([[62, 53,  1, ..., 11, 58, 62],
        [12, 12, 12, ..., 12, 49, 49],
        [10, 33, 43, ..., 58, 45, 11],
        ...,
        [45, 12, 45, ..., 45, 45, 45],
        [12, 12, 12, ..., 42, 22, 58],
        [ 0,  1, 10, ..., 36, 49, 26]], dtype=int32)>,
 <tf.Tensor: shape=(128, 64), dtype=int32, numpy=
 array([[53,  1, 58, ..., 58, 62, 11],
        [12, 12, 12, ..., 49, 49, 49],
        [33, 43, 58, ..., 45, 11, 58],
        ...,
        [12, 45, 45, ..., 45, 45, 45],
        [12, 12, 12, ..., 22, 58, 42],
        [ 1, 10, 43, ..., 49, 26, 36]], dtype=int32)>)
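
`batch_ds` comes from `block_3_utils` and is not shown here. Judging from the batch above, each target row is the input row shifted by one note. The following pure-Python sketch illustrates that windowing idea. The helper `make_windows` is hypothetical: it is not the actual `batch_ds` implementation, it assumes that `shift` is the stride between window starts, and it leaves out shuffling, batching, and prefetching.

```python
def make_windows(seq, seq_len, shift):
    """Cut `seq` into (input, target) pairs; the target is the input shifted by one step."""
    pairs = []
    # Each window needs seq_len + 1 tokens: seq_len inputs plus one extra for the last target.
    for start in range(0, len(seq) - seq_len, shift):
        window = seq[start:start + seq_len + 1]
        pairs.append((window[:-1], window[1:]))
    return pairs


toy_seq = list(range(10))  # stand-in for one encoded MIDI file
pairs = make_windows(toy_seq, seq_len=4, shift=2)
for x, y in pairs:
    print(x, "->", y)
```

Under these assumptions, `SEQ_LEN = 64` and `SHIFT = 4` would mean that consecutive input windows from the same file share 60 of their 64 notes.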

Next, I build the neural network. It starts with an Embedding layer, followed by 2 LSTM layers with `return_sequences=True`, a Dropout layer with a rate of 30%, and finally 2 Dense layers. The output layer uses a softmax activation and outputs 67 dimensions, one per vocabulary entry.

In [45]:
N_EMBEDDING_DIMS = 30

model = tf.keras.models.Sequential([
    tf.keras.layers.Embedding(input_dim=N_VOCAB, output_dim=N_EMBEDDING_DIMS, input_shape=[None]),
    tf.keras.layers.LSTM(128, return_sequences=True),
    tf.keras.layers.LSTM(128, return_sequences=True),
    #tf.keras.layers.LSTM(128, return_sequences=True),
    #tf.keras.layers.Dense(200, activation="relu"),
    #tf.keras.layers.Activation(200, activation= "sigmoid"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(200, activation="relu"),
    tf.keras.layers.Dense(N_VOCAB, activation="softmax")
])

model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_2 (Embedding)      (None, None, 30)          2010      
_________________________________________________________________
lstm_5 (LSTM)                (None, None, 128)         81408     
_________________________________________________________________
lstm_6 (LSTM)                (None, None, 128)         131584    
_________________________________________________________________
dropout_1 (Dropout)          (None, None, 128)         0         
_________________________________________________________________
dense_3 (Dense)              (None, None, 200)         25800     
_________________________________________________________________
dense_4 (Dense)              (None, None, 67)          13467     
Total params: 254,269
Trainable params: 254,269
Non-trainable params: 0
________________________________________________

In summary, the network first has an Embedding layer with 30 output dimensions. It is followed by 2 LSTM layers with 128 units each (128 is the size of the hidden state, not the sequence length), then a Dropout layer with a rate of 0.3, and finally 2 Dense layers, the first with a ReLU activation and the second with a softmax activation.
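
The parameter counts in the summary above can be reproduced by hand. The sketch below uses the standard formulas for Embedding, LSTM, and Dense layers; the helper names `lstm_params` and `dense_params` are made up for illustration.

```python
n_vocab, emb_dim, lstm_units, dense_units = 67, 30, 128, 200


def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights, and a bias vector.
    return 4 * ((input_dim + units + 1) * units)


def dense_params(input_dim, units):
    # Weight matrix plus bias vector.
    return input_dim * units + units


counts = {
    "embedding": n_vocab * emb_dim,                    # one 30-dim vector per token
    "lstm_1": lstm_params(emb_dim, lstm_units),
    "lstm_2": lstm_params(lstm_units, lstm_units),
    "dense_1": dense_params(lstm_units, dense_units),
    "dense_2": dense_params(dense_units, n_vocab),
}
total = sum(counts.values())
print(counts, total)
```

The Dropout layer has no weights, which is why it shows 0 parameters in the summary.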

In [46]:
lf  = tf.keras.losses.SparseCategoricalCrossentropy()
opt = tf.keras.optimizers.Adam()
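met1 = tf.keras.metrics.SparseCategoricalAccuracy(name="top1_accuracy")
met2 = tf.keras.metrics.SparseTopKCategoricalAccuracy(k=5, name='top5_accuracy')

# met1 must be defined: it is passed to compile() here, and the ModelCheckpoint
# file name further down formats the logged `top1_accuracy` value.
model.compile(loss=lf, optimizer=opt, metrics=[met1])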

In [48]:
if Path('/Users/retoheller/ml4ds_2020_g1/ml4ds_2020_g1/models/').exists():
    [p.unlink() for p in Path('/Users/retoheller/ml4ds_2020_g1/ml4ds_2020_g1/models').glob("./*.hdf5")]

I fit the model for up to 20 epochs, using callbacks for early stopping, model checkpoints, and learning-rate reduction on plateau.

In [49]:
est = tf.keras.callbacks.EarlyStopping(monitor='loss', min_delta=0.001, patience=5, verbose=1, mode='min', baseline=None, restore_best_weights=True)
rlr = tf.keras.callbacks.ReduceLROnPlateau(monitor='loss', factor=0.5, patience=3, min_lr=1e-6, mode='min', verbose=1)
mcp = tf.keras.callbacks.ModelCheckpoint(filepath="/Users/retoheller/ml4ds_2020_g1/ml4ds_2020_g1/models/mozart_model.{epoch:03d}-{top1_accuracy:.4f}.hdf5", monitor="loss")
cbs = [est, rlr, mcp]   

model.fit(train_ds, epochs=20, callbacks = cbs, initial_epoch=0)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<tensorflow.python.keras.callbacks.History at 0x147d7ea90>
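
To make the learning-rate reduction concrete, here is a simplified pure-Python sketch of the plateau rule with the settings used above (`factor=0.5`, `patience=3`, `min_lr=1e-6`). The function `simulate_reduce_lr`, the toy loss values, and the `min_delta` of `1e-4` are assumptions for illustration; the starting rate of `0.001` is the default of Adam in Keras. This is not Keras code and it ignores details such as cooldown.

```python
def simulate_reduce_lr(losses, lr=1e-3, factor=0.5, patience=3,
                       min_lr=1e-6, min_delta=1e-4):
    """Return the learning rate in effect after each epoch (simplified plateau rule)."""
    best = float("inf")
    wait = 0
    history = []
    for loss in losses:
        if loss < best - min_delta:
            best, wait = loss, 0           # improvement: reset the counter
        else:
            wait += 1                      # no improvement this epoch
            if wait >= patience:
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history


# Made-up training losses with two plateaus.
toy_losses = [2.0, 1.5, 1.2, 1.2, 1.2, 1.2, 1.1, 1.1, 1.1, 1.1]
lrs = simulate_reduce_lr(toy_losses)
print(lrs)
```

In this toy run the rate is halved once after each plateau of three epochs without improvement. Early stopping with `patience=5` would only trigger on a longer plateau.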

In [50]:
#model.load_weights('/Users/retoheller/ml4ds_2020_g1/ml4ds_2020_g1/m')
Path('./mozart').mkdir(parents=True, exist_ok=True)
model.save("./mozart/mozart_model_final.h5")
model = tf.keras.models.load_model('./mozart/mozart_model_final.h5')

In [51]:
seed_stream = midi.translate.midiFilePathToStream(files[3])
seed_notes  = stream2notes(seed_stream)
seed_notes_enc = [stoi[x] for x in seed_notes][:50]

In [53]:
midi_out_enc = generate_music(model, seed_notes_enc, 50, temperature=2.)
midi_out     = np.array([itos[x] for x in midi_out_enc.tolist()])
print("Length of output midi:", len(midi_out))

100%|██████████| 50/50 [00:03<00:00, 12.96it/s]

Length of output midi: 100
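
`generate_music` is also imported from `block_3_utils`, so its internals are not shown here. A common way to use a `temperature` argument is to rescale the log-probabilities before sampling the next token. The sketch below assumes that approach; the helper `apply_temperature` and the toy probabilities are made up for illustration.

```python
import math
import random


def apply_temperature(probs, temperature):
    """Rescale a probability vector to p_i^(1/T), renormalized. Expects all probs > 0."""
    logits = [math.log(p) / temperature for p in probs]
    m = max(logits)                                  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


toy_probs = [0.7, 0.2, 0.1]                 # stand-in for one softmax output of the model
flat = apply_temperature(toy_probs, 2.0)    # T > 1 flattens the distribution
sharp = apply_temperature(toy_probs, 0.5)   # T < 1 sharpens it

# Sample the next token index from the rescaled distribution.
rng = random.Random(42)
next_token = rng.choices(range(len(toy_probs)), weights=flat, k=1)[0]
print(flat, sharp, next_token)
```

If `generate_music` follows this scheme, `temperature=2.` as in the call above would pick less likely notes more often than the raw model output suggests.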





In [55]:
for i in range(100):
    if i==50: print("="*50, "> Predictions from here", )
    print(f"Timestep {i:02d}: Original / Predicted note: {int(np.array(seed_notes)[i]):02d} / {int(midi_out[i]):02d}")

Timestep 00: Original / Predicted note: 00 / 00
Timestep 01: Original / Predicted note: 00 / 00
Timestep 02: Original / Predicted note: 00 / 00
Timestep 03: Original / Predicted note: 00 / 00
Timestep 04: Original / Predicted note: 00 / 00
Timestep 05: Original / Predicted note: 00 / 00
Timestep 06: Original / Predicted note: 00 / 00
Timestep 07: Original / Predicted note: 00 / 00
Timestep 08: Original / Predicted note: 00 / 00
Timestep 09: Original / Predicted note: 00 / 00
Timestep 10: Original / Predicted note: 00 / 00
Timestep 11: Original / Predicted note: 00 / 00
Timestep 12: Original / Predicted note: 00 / 00
Timestep 13: Original / Predicted note: 00 / 00
Timestep 14: Original / Predicted note: 00 / 00
Timestep 15: Original / Predicted note: 00 / 00
Timestep 16: Original / Predicted note: 00 / 00
Timestep 17: Original / Predicted note: 00 / 00
Timestep 18: Original / Predicted note: 00 / 00
Timestep 19: Original / Predicted note: 00 / 00
Timestep 20: Original / Predicted note: 

In [56]:
s_out = enc2stream(midi_out)

#### My model's melody:

In [57]:
s_out.show('midi')

In [41]:
s_in = enc2stream(seed_notes[:100])

#### The original:

In [42]:
s_in.show("midi")

### Conclusion:
The task is somewhat difficult for the model because the seed starts with a long run of rests: timesteps 00 to 19 in the comparison above are all rests (token 0). As a result, there is still a fairly clear difference between the original and the melody generated by the LSTM.
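
One way to check this explanation would be to skip the leading rests before taking the 50 seed tokens. A minimal sketch, assuming a hypothetical helper `strip_leading_rests` and a toy sequence in place of `seed_notes`:

```python
from itertools import dropwhile


def strip_leading_rests(notes, rest_token="0"):
    """Drop rest tokens at the start of the sequence; later rests are kept."""
    return list(dropwhile(lambda tok: tok == rest_token, notes))


toy_seed_notes = ["0", "0", "0", "60", "0", "62", "64"]
trimmed = strip_leading_rests(toy_seed_notes)
print(trimmed)
```

In the notebook this would correspond to encoding `strip_leading_rests(seed_notes)[:50]` with `stoi`. Whether the generated melody actually improves would have to be tried.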