# Environment-based Music Generation for Video Games
### A Computational Creativity Project by Tyler H. McIntosh

The sections below serve as a guide to using our system. We have provided our trained models so that you don't have to spend 30+ hours training them yourself. The training functions are included in case you are curious, but they are commented out. If you are unsure, skip any section marked with the <font color='red'>[Training only]</font> tag. All enquiries should be directed to Hello@TylerHMc.com.

In [None]:
#@title Imports

from IPython.display import clear_output
!pip install pypianoroll
clear_output()
from tensorflow.keras.utils import get_file
from IPython.display import HTML, display
from tensorflow.keras import layers
from keras.models import Sequential
from keras.backend import sigmoid
from keras.utils import np_utils
from tensorflow import keras
import tensorflow as tf
import matplotlib.pyplot as plt
import pypianoroll as p
from PIL import Image
from music21 import *
import numpy as np
import shutil
import glob
import os

!mkdir -p output
shutil.rmtree('/content/sample_data', ignore_errors=True)  # Remove Colab sample data if present

<hr>
<br />

## Seed Generation

First, we must load our two datasets: Img2Mid and ZeldaMIDI. These datasets were created specifically for this project: Img2Mid is a set of paired Zelda screenshots and MIDI files, and ZeldaMIDI is a much larger collection of Zelda music. The datasets are stored on a personal server, and are downloaded and unzipped by the cell below.
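
To make the pairing concrete, here is a minimal sketch of how screenshots and MIDI files could be matched by filename stem. It assumes, as the commented-out loader later in this notebook does, that each `.mid` file shares its name with a `.png` screenshot. The function name `pair_by_stem` and the sample filenames are hypothetical and are not part of the dataset.

```python
import os

def pair_by_stem(filenames):
    """Return (image, midi) pairs for files that share a filename stem.

    Assumption: screenshots end in .png and tracks end in .mid.
    """
    names = set(filenames)
    pairs = []
    for name in sorted(names):
        stem, ext = os.path.splitext(name)
        if ext == '.mid' and stem + '.png' in names:
            pairs.append((stem + '.png', name))
    return pairs

# Hypothetical directory listing; 'shop.mid' has no screenshot and is skipped
listing = ['forest.mid', 'forest.png', 'shop.mid', 'castle.png', 'castle.mid']
pairs = pair_by_stem(listing)
print(pairs)
```

Tracks without a matching screenshot are simply skipped, so an incomplete download would shrink the training set rather than raise an error.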

In [None]:
#@title Load the Datasets

!wget https://www.tylerhmc.com/datasets/Img2Mid.zip
!wget https://www.tylerhmc.com/datasets/ZeldaMIDI.zip
!wget https://www.tylerhmc.com/datasets/sample_image.jpg
!unzip "Img2Mid.zip" -d "/content/Img2Mid"
!unzip "ZeldaMIDI.zip" -d "/content/ZeldaMIDI"
!rm Img2Mid.zip
!rm ZeldaMIDI.zip
clear_output()

Next, we define the variational autoencoder (VAE) that serves as the seed generator. We use convolutional layers so that the VAE can interpret the spatial features of an image. The network takes the screenshots from the Img2Mid dataset as inputs and the corresponding MIDI tracks as training targets. While training, it learns to associate pictures of the game with the music found in that area. The trained network can then produce a short MIDI seed from any image, based on how similar its features are to the training data.
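
As a quick sanity check on the architecture, the sketch below traces the spatial dimensions through the layer settings used in the VAE definition in this section: three stride-2 convolutions with "same" padding in the encoder, and three stride-2 transposed convolutions in the decoder. It uses plain arithmetic rather than Keras, and the helper names are hypothetical.

```python
import math

def conv_same_stride2(size):
    """Output length of a stride-2 convolution with 'same' padding."""
    return math.ceil(size / 2)

def deconv_same_stride2(size):
    """Output length of a stride-2 transposed convolution with 'same' padding."""
    return size * 2

# Encoder: 306 x 544 screenshot through three stride-2 convolutions
h, w = 306, 544
for _ in range(3):
    h, w = conv_same_stride2(h), conv_same_stride2(w)
encoder_shape = (h, w)

# Decoder: 16 x 65 feature map through three stride-2 transposed convolutions
h, w = 16, 65
for _ in range(3):
    h, w = deconv_same_stride2(h), deconv_same_stride2(w)
decoder_shape = (h, w)

print(encoder_shape)  # (39, 68)
print(decoder_shape)  # (128, 520)
```

The decoder output of 128 by 520 corresponds to a piano roll with 128 MIDI pitches and 520 time steps, which is the size the training loader slices each track to.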

In [None]:
#@title Define the VAE

# Based on Keras example implementation
# https://keras.io/examples/generative/vae/

class Sampling(layers.Layer):
    def call(self, inputs):
        z_mean, z_log_var = inputs
        batch = tf.shape(z_mean)[0]
        dim = tf.shape(z_mean)[1]
        epsilon = tf.keras.backend.random_normal(shape=(batch, dim))
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon

class VAE(keras.Model):
    def __init__(self, encoder, decoder, **kwargs):
        super(VAE, self).__init__(**kwargs)
        self.encoder = encoder
        self.decoder = decoder
        self.total_loss_tracker = keras.metrics.Mean(name="total_loss")
        self.reconstruction_loss_tracker = keras.metrics.Mean(
            name="reconstruction_loss"
        )
        self.kl_loss_tracker = keras.metrics.Mean(name="kl_loss")

    @property
    def metrics(self):
        return [
            self.total_loss_tracker,
            self.reconstruction_loss_tracker,
            self.kl_loss_tracker,
        ]

    def train_step(self, data):
        with tf.GradientTape() as tape:
            x, y = data
            z_mean, z_log_var, z = self.encoder(x)
            reconstruction = self.decoder(z)
            reconstruction_loss = tf.reduce_mean(
                tf.reduce_sum(
                    keras.losses.binary_crossentropy(y, reconstruction), axis=(1, 2)
                )
            )
            kl_loss = -0.5 * (1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var))
            kl_loss = tf.reduce_mean(tf.reduce_sum(kl_loss, axis=1))
            total_loss = reconstruction_loss + kl_loss
        grads = tape.gradient(total_loss, self.trainable_weights)
        self.optimizer.apply_gradients(zip(grads, self.trainable_weights))
        self.total_loss_tracker.update_state(total_loss)
        self.reconstruction_loss_tracker.update_state(reconstruction_loss)
        self.kl_loss_tracker.update_state(kl_loss)
        return {
            "loss": self.total_loss_tracker.result(),
            "reconstruction_loss": self.reconstruction_loss_tracker.result(),
            "kl_loss": self.kl_loss_tracker.result(),
        }

# Modified input shape and structure
encoder_inputs = keras.Input(shape=(306, 544, 3))
x = layers.Conv2D(16, 3, activation="relu", strides=2, padding="same")(encoder_inputs)
x = layers.Conv2D(32, 3, activation="relu", strides=2, padding="same")(x)
x = layers.Conv2D(64, 3, activation="relu", strides=2, padding="same")(x)
x = layers.Flatten()(x)
x = layers.Dense(16, activation="relu")(x)
z_mean = layers.Dense(10, name="z_mean")(x)
z_log_var = layers.Dense(10, name="z_log_var")(x)
z = Sampling()([z_mean, z_log_var])
encoder = keras.Model(encoder_inputs, [z_mean, z_log_var, z], name="encoder")

latent_inputs = keras.Input(shape=(10,))
x = layers.Dense(16*65*64, activation="relu")(latent_inputs)
x = layers.Reshape((16, 65, 64))(x)
x = layers.Conv2DTranspose(64, 3, activation="relu", strides=2, padding="same")(x)
x = layers.Conv2DTranspose(32, 3, activation="relu", strides=2, padding="same")(x)
x = layers.Conv2DTranspose(16, 3, activation="relu", strides=2, padding="same")(x)
decoder_outputs = layers.Conv2DTranspose(1, 3, activation="sigmoid", padding="same")(x)
decoder = keras.Model(latent_inputs, decoder_outputs, name="decoder")

While this network does not take terribly long to train, it learns stochastically - meaning its output will not always be exactly the same every time it is trained. Therefore, we do not recommend trying to train it yourself, as it can be something of a rabbit hole. Future work will look into making this process deterministic - for example, by using a deterministic regularized autoencoder. For now, use the "Load the trained VAE" function.
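
One source of this randomness is the `Sampling` layer, which draws a fresh noise vector each time it is called (weight initialisation and data shuffling are others). The sketch below reproduces the sampling formula in plain Python to show that different random draws give different latent vectors, while a fixed seed repeats the same draw. The input values are illustrative only, and seeding this toy function says nothing about whether full training can be made reproducible.

```python
import math
import random

def sample_latent(z_mean, z_log_var, rng):
    """Draw z = mean + exp(0.5 * log_var) * epsilon, element by element."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(z_mean, z_log_var)
    ]

z_mean = [0.5, -1.0, 2.0]     # illustrative values, not taken from the model
z_log_var = [0.0, -2.0, 0.1]

a = sample_latent(z_mean, z_log_var, random.Random(1))
b = sample_latent(z_mean, z_log_var, random.Random(2))
c = sample_latent(z_mean, z_log_var, random.Random(1))

print(a != b)  # different seeds give different samples
print(a == c)  # the same seed repeats the sample
```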

In [None]:
#@title Train the VAE <font color='red'>[Training only]</font>

# def findStart(array):               # Function to shift piano roll to first note
#   for i in range(array.shape[1]):   # For every (x,y) in piano roll
#     for j in range(array.shape[0]): #
#       if array[j,i] == True:        # If there is a note
#         return array[:,i:]          # Slice piano roll to position of first note

# def loadData(midiPath, imagePath):  # Loads the Img2Mid dataset
#   midi = list()                     # Initialise midi list
#   image = list()                         # Initialise image list
#   for filename in os.listdir(midiPath):  # For every midi in the midi path
#     imageName = filename[:-4]+'.png'     # Set image path
#     if '.mid' in filename:                    # If is a midi file
#       if imageName in os.listdir(imagePath):  # If there is an image for it
#         s = p.read(midiPath+filename)         # Load the midi file
#         try:                                  # Try to merge left and right hand
#           s = p.binarize(s).stack()[0] + p.binarize(s).stack()[1] # By combining them
#         except:                                                   # If only 1 channel
#           s = p.binarize(s).stack()[0]                      # Just use that one channel
#         s = np.flip(np.swapaxes(s,0,1),0)             # Flip time and notes
#         s = findStart(s)[:,:520]              # Slice to the start
#         s = np.expand_dims(s, axis=-1)*1      # Add feature dimension
#         midi.append(s)                        # Add to midi list
#         print(filename)                       # Log progress
#         i = Image.open(imagePath+imageName)   # Load the image
#         i.thumbnail((544,306))                # Downsample
#         i = i.resize((544,306))               # Stretch to the model input size
#         i = np.array(i)                       # Convert to np array
#         image.append(i)                       # Add to images
#   return np.array(image), np.array(midi)      # Return dataset

# x_train, y_train = loadData(midiPath='/content/Img2Mid/', imagePath='/content/Img2Mid/')
# x_train, y_train = x_train/255, y_train  # Normalise

# vae = VAE(encoder, decoder)                    # Initialise VAE
# vae.compile(optimizer=keras.optimizers.Adam()) # Compile
# vae.fit(x_train, y_train, epochs=2500)         # Train for 2.5k epochs

In [None]:
#@title Load the trained VAE

vae = VAE(encoder, decoder)                     # Initialize
vae.compile(optimizer=keras.optimizers.Adam())  # Compile
vae.built = True                                # Build
vae.load_weights(get_file('VAE', 'https://www.tylerhmc.com/models/seedGenerator.hdf5')) # Load from server

Next, we define a helper function that takes an image and generates a seed using the trained VAE. In essence, this function loads the image, transforms it into a standardised thumbnail to fit the dimensions of the model, passes it through the VAE, applies a threshold to the output, and returns the result as a multitrack object. The "Generate a Seed" cell further down writes that object to a MIDI file called "seed.mid".
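
The post-processing steps can be sketched on a toy grid without NumPy or pypianoroll. This is an illustration under stated assumptions rather than the helper itself: the function names are hypothetical, and the 3 by 4 grid stands in for the 128 by 520 decoder output.

```python
def binarize(grid, threshold=0.8):
    """Map every cell to 1 if it exceeds the threshold, else 0."""
    return [[1 if value > threshold else 0 for value in row] for row in grid]

def to_time_major(grid):
    """Reverse the pitch axis, then swap axes so rows index time steps."""
    flipped = grid[::-1]
    return [list(column) for column in zip(*flipped)]

# Toy decoder output: 3 pitch rows x 4 time steps
decoded = [
    [0.95, 0.10, 0.20, 0.85],
    [0.30, 0.90, 0.05, 0.40],
    [0.10, 0.20, 0.99, 0.79],
]
roll = binarize(decoded)
track = to_time_major(roll)
is_empty = not any(any(row) for row in roll)

print(roll)      # [[1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0]]
print(track)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
print(is_empty)  # False
```

The emptiness check matters because an image whose prediction never exceeds the threshold would otherwise produce a silent MIDI file.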

In [None]:
#@title Seed generator helper function

def generateSeed(path):                # Generate VAE prediction
  try:                                 # Try to open file
    i = Image.open(path).convert('RGB') # Open input image
  except OSError:                      # If missing or unreadable
    print('File not found')            # Report the problem
    return                             # Exit
  i.thumbnail((544,306))               # Downsample
  i = i.resize((544,306))              # Stretch to the model input size
  i = np.expand_dims(i,axis=0)         # Add batch dim
  i = i / 255                          # Normalize
  i = vae.encoder.predict(i)           # Encode to latent
  i = vae.decoder.predict(i[0])        # Decode z_mean into a prediction
  i = np.squeeze(i[0], axis=-1)        # Remove feature dim
  i = np.where(i>0.8, 1, 0)            # Threshold and binarize
  if not np.any(i):                    # If seed is empty
    print('Sorry, please try another image') # Report the problem
    return                             # Exit
  plt.imshow(i, interpolation='none')  # Generate piano roll
  plt.show()                           # Display piano roll
  i = np.flip(i,0).astype(int)         # Reverse note order
  i = np.swapaxes(i,0,1)               # Swap time and notes
  i = [p.BinaryTrack(name='Staff-1', program=1, is_drum=False, pianoroll=i)] # Create midi track
  i = p.Multitrack(tracks=i)  # Create multitrack object
  return i                    # Return multitrack

We're now ready to generate a MIDI seed. Find an image that you would like to turn into music and upload it to your Colab session. The closer its aspect ratio is to 16:9 the better, but this is not a requirement for it to work. Type the name of your image into the field below, and make sure to include the filetype (e.g., "cat.png"). Alternatively, you can use the sample image we have provided. Then, click the run button. The system will pass your image through the VAE and write a MIDI seed file to the output folder. If it doesn't pop up in your files immediately, give it a few seconds - Colab can be kinda slow.
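
If you want to check an image before uploading it, the sketch below measures how far its aspect ratio is from the 544 by 306 input size, which reduces to exactly 16:9. The function name `aspect_distortion` is hypothetical and is not used elsewhere in the notebook.

```python
from fractions import Fraction

TARGET = Fraction(544, 306)  # the model's input size, which reduces to 16/9

def aspect_distortion(width, height):
    """Relative stretch applied when an image is forced to the target ratio.

    0.0 means the image is already 16:9; larger values mean more distortion.
    """
    ratio = Fraction(width, height)
    return float(abs(ratio - TARGET) / TARGET)

print(TARGET)                         # 16/9
print(aspect_distortion(1920, 1080))  # 0.0
print(aspect_distortion(1024, 768))   # 0.25
```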

In [None]:
#@title Generate a Seed <font color='limegreen'>+</font>
Filename = 'sample_image.jpg' #@param {type:"string"}
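seedTrack = generateSeed(Filename)         # Generate seed
if seedTrack is not None:                  # Skip writing if generation failed
  p.write("output/seed.mid", seedTrack)    # Write seed to the output folder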

<hr>
<br />

## Motif Generation

Now that we have a seed file to generate the music with, we can move on to the next section of the system: the motif generator. This section uses a Long Short-Term Memory (LSTM) network to generate next-note predictions in a loop. If we give it the seed as the input, it will generate a composition based on it. First, we need to define the LSTM network and a few helper functions. This may take a minute or two, as the system needs to build a dictionary of notes from the dataset.
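
The note dictionary and the training windows can be illustrated on a toy melody. This is a simplified sketch of the same idea in plain Python: the function names are hypothetical, and a window length of 3 is used here in place of the 50 used by the notebook.

```python
def build_vocab(notes):
    """Map each distinct token to an integer index, in sorted order."""
    return {token: index for index, token in enumerate(sorted(set(notes)))}

def sliding_windows(notes, note_to_int, sequence_length):
    """Return (inputs, targets): each input window predicts the next token."""
    inputs, targets = [], []
    for start in range(len(notes) - sequence_length):
        window = notes[start:start + sequence_length]
        inputs.append([note_to_int[token] for token in window])
        targets.append(note_to_int[notes[start + sequence_length]])
    return inputs, targets

# Toy melody; '0.4.7' stands for a chord encoded as dot-joined pitch classes
notes = ['C4', 'E4', 'G4', '0.4.7', 'E4', 'C4']
note_to_int = build_vocab(notes)
inputs, targets = sliding_windows(notes, note_to_int, sequence_length=3)

print(note_to_int)  # {'0.4.7': 0, 'C4': 1, 'E4': 2, 'G4': 3}
print(inputs)       # [[1, 2, 3], [2, 3, 0], [3, 0, 2]]
print(targets)      # [0, 2, 1]
```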

In [None]:
#@title Define the LSTM
# LSTM network and helper functions

def progress(value, max=100):         # Progress bar function
    return HTML("""
        <progress
            value='{value}'
            max='{max}'
            style='width: 100%'
        >
            {value}
        </progress>
    """.format(value=value, max=max)) # Some basic html

def midiGenerator(network_input, n_vocab):  # LSTM network
    model = Sequential([ # Standard LSTM, lots of dropout
      layers.LSTM(512, input_shape=(network_input.shape[1], network_input.shape[2]), recurrent_dropout=0.2, return_sequences=True),
      layers.LSTM(512, return_sequences=True, recurrent_dropout=0.2),
      layers.LSTM(512),
      layers.BatchNormalization(),
      layers.Dropout(0.2),
      layers.Dense(n_vocab, activation='relu'),
      layers.BatchNormalization(),
      layers.Dropout(0.2),
      layers.Dense(n_vocab, activation='softmax'),
    ])
    model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
    return model

def get_notes(path, define=False):      # Retrieve note list
    notes = list()                      # Initialize notes
    if define:                          # If define
      out = display(progress(0, 162), display_id=True) # Initialise progress bar
      x=0                                # Initialise progress counter
    for file in glob.glob(path):         # For MIDI in path
        midi = converter.parse(file)     # Load MIDI
        notes_to_parse = midi.flat.notes   # Convert to list of note objects
        for element in notes_to_parse:           # For note in list
            if isinstance(element, note.Note):     # If is single note
                notes.append(str(element.pitch))     # Add pitch to notes
            elif isinstance(element, chord.Chord):   # If chord
                notes.append('.'.join(str(n) for n in element.normalOrder)) # Encode and add chord code to list
        if define:                       # If define
          x+=1                           # Iterate progress
          out.update(progress(x, 162))   # Update progress
    return notes  # Return list of notes

def prepare_train_sequences(notes, n_vocab):  # Create time-stepped training sets
    sequence_length = 50                              # 50 note long chunks
    pitchnames = sorted(set(item for item in notes))  # Grab list of unique pitches
    # Generate note to index dictionary map
    note_to_int = dict((note, number) for number, note in enumerate(pitchnames))

    network_input = []   # Initialise train input
    network_output = []  # Initialise train output

    for i in range(0, len(notes) - sequence_length, 1):  # For index in note window
        sequence_in = notes[i:i + sequence_length]       # Grab notes in sliding window
        sequence_out = notes[i + sequence_length]        # Grab note after that
        network_input.append([note_to_int[char] for char in sequence_in]) # Convert to index values and add to train input
        network_output.append(note_to_int[sequence_out])                  # Convert to index value and add to train output

    n_patterns = len(network_input)  # Grab number of input samples

    network_input = np.reshape(network_input, (n_patterns, sequence_length, 1)) # Reshape input to fit network
    network_input = network_input / float(n_vocab)                              # Normalize

    network_output = np_utils.to_categorical(network_output)  # Convert to class vector

    return network_input, network_output, pitchnames, note_to_int  # Return input, output, unique pitches, and mapping dictionary

notes = get_notes('ZeldaMIDI/*.mid', define=True)  # Grabs notes in train set
n_vocab = len(set(notes))                          # Calc number of unique values
network_input, network_output, pitchnames, direct = prepare_train_sequences(notes, n_vocab)  # Generate training data and dict map
midiGen = midiGenerator(network_input, n_vocab)                         # Initialise LSTM
midiGen.compile(loss='categorical_crossentropy', optimizer='RMSprop')   # Compile
clear_output()                                                          # Clear output

Unlike the VAE, the LSTM network takes an **extremely** long time to train. In our case, we spent roughly 30 hours training it on a Tesla P100 GPU, and it is still not perfect. However, the training time could be greatly reduced using a TPU. So, if you're feeling up to it, you can try to improve the model yourself... Or you can save yourself the hassle and use the trained model we supply below.

In [None]:
#@title Train the LSTM <font color='red'>[Training only]</font>
# midiGen.fit(network_input, network_output, epochs=200)  # Train network

In [None]:
#@title Load the pre-trained LSTM

# Load weights from server
midiGen.load_weights(get_file('LSTM', 'https://www.tylerhmc.com/models/motifGenerator.hdf5'))

Now for some motif generation helper functions. These functions allow us to load the seed file, turn it into a list of notes, pass it through the LSTM, and turn the resulting list of generated notes into a MIDI file called "motif.mid".
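
The core of these helpers is an autoregressive loop: each prediction is appended to a fixed-length window and the oldest entry is dropped. The sketch below shows that loop together with the seed preparation, using a hypothetical `toy_model` function in place of the LSTM and a window of 4 instead of 50. All names here are illustrative.

```python
def prepare_seed(seed, note_to_int, sequence_length):
    """Drop unknown tokens, repeat the seed until long enough, then encode."""
    known = [token for token in seed if token in note_to_int]
    if not known:
        raise ValueError('seed has no tokens in the vocabulary')
    while len(known) < sequence_length:
        known = known + known
    return [note_to_int[token] for token in known[:sequence_length]]

def generate_tokens(predict_next, pattern, int_to_note, steps):
    """Feed each prediction back into a fixed-length sliding window."""
    pattern = list(pattern)
    generated = []
    for _ in range(steps):
        index = predict_next(pattern)
        generated.append(int_to_note[index])
        pattern = pattern[1:] + [index]
    return generated

note_to_int = {'C4': 0, 'E4': 1, 'G4': 2}
int_to_note = {index: token for token, index in note_to_int.items()}

# Hypothetical stand-in for the LSTM: picks the token after the last one
def toy_model(pattern):
    return (pattern[-1] + 1) % len(note_to_int)

pattern = prepare_seed(['C4', 'X9', 'E4'], note_to_int, sequence_length=4)
motif = generate_tokens(toy_model, pattern, int_to_note, steps=5)

print(pattern)  # [0, 1, 0, 1]
print(motif)    # ['G4', 'C4', 'E4', 'G4', 'C4']
```

Filtering into a new list, rather than removing items while iterating, avoids skipping entries, and the empty-seed check prevents the doubling loop from running forever.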

In [None]:
#@title Motif Generator helper functions

def generate(n_vocab, pitchnames, notes, direct):  # Generate motif function
    network_input, normalized_input = prepare_sequences(notes, pitchnames, n_vocab, direct) # Process seed
    prediction_output = generate_notes(midiGen, network_input, pitchnames, n_vocab, direct) # Generate prediction
    output = create_midi(prediction_output)  # Create MIDI
    return output                            # Return MIDI

def prepare_sequences(notes, pitchnames, n_vocab, direct): # Similar to training version

    note_to_int = direct  # Load the previously created mapping dictionary
 
    sequence_length = 50   # Set sequence length
    network_input = list() # Initialise input

    notes = [n for n in notes if n in direct]  # Drop notes missing from the dictionary
    # This occurs when two notes overlap and generate
    # a chord that hasn't been seen previously

    if not notes:                        # If nothing usable is left
      raise ValueError('Seed contains no notes known to the model') # Stop instead of looping forever

    while len(notes) < sequence_length: # While seed is shorter than min length
      notes = notes + notes              # Double it

    sequence_in = notes[0:sequence_length]  # Slice first 50 notes
    network_input.append([note_to_int[char] for char in sequence_in]) # Convert to index

    normalized_input = np.reshape(network_input, (1, sequence_length, 1)) # Reshape to fit network
    normalized_input = normalized_input / float(n_vocab) # Normalize


    return (network_input, normalized_input)  # Return input, normalised input

def generate_notes(model, network_input, pitchnames, n_vocab, direct):  # Prediction loop

    int_to_note = dict((v,k) for k,v in direct.items())  # Swap keys with values in mapping dictionary

    pattern = network_input[0]  # Grab the encoded seed sequence
    prediction_output = list()  # Initialize predictions

    # generate 100 notes
    for note_index in range(100):  # For 100 cycles
        prediction_input = np.reshape(pattern, (1, len(pattern), 1)) # Reshape to fit network
        prediction_input = prediction_input / float(n_vocab)         # Normalise

        prediction = model.predict(prediction_input, verbose=0)  # Predict next note

        index = np.argmax(prediction)  # Return index of highest output
        result = int_to_note[index]       # Convert to note names
        prediction_output.append(result)  # Add pred name to pred list

        pattern.append(index)             # Add pred index to input
        pattern = pattern[1:len(pattern)] # Remove first entry

    return prediction_output  # Return prediction

def create_midi(prediction_output):  # Creates a midi object from prediction
    offset = 0                             # Initialise offset
    output_notes = []                      # Initialise output
    for pattern in prediction_output:              # For entry in preds
        if ('.' in pattern) or pattern.isdigit():  # If entry is chord
            notes_in_chord = pattern.split('.')    # Retrieve notes in chord
            notes = []                             # Reset notes
            for current_note in notes_in_chord:         # For note in chord
                new_note = note.Note(int(current_note)) # Create note object
                new_note.storedInstrument = instrument.Piano() # Set program
                notes.append(new_note)             # Add note object to chord
            new_chord = chord.Chord(notes)         # Create chord object
            new_chord.offset = offset              # Add offset
            output_notes.append(new_chord)         # Add chord to output
        else:                                      # If is a single note
            new_note = note.Note(pattern)          # Create note object
            new_note.offset = offset               # Add offset
            new_note.storedInstrument = instrument.Piano() # Set program
            output_notes.append(new_note)          # Add note to output

        offset += 0.5  # Increment offset to stop notes overlapping

    midi_stream = stream.Stream(output_notes) # Create MIDI object from output

    midi_stream.write('midi', fp='output/motif.mid') # Save as MIDI file
    return output_notes                              # Return notes object

Next, we can use the cell below to load the seed file, using the helper functions discussed above, and turn it into a generated motif. The cell also stores the result in the "output" variable, which is used later to generate an expressive performance.
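
For reference, the way `create_midi` turns predicted tokens into timed events can be sketched without music21. Tokens that contain a dot or consist only of digits are treated as chord codes, everything else as a single note name, and each event is placed 0.5 after the previous one. The tuple layout and the function name below are hypothetical.

```python
def tokens_to_events(tokens, step=0.5):
    """Turn predicted tokens into (offset, kind, payload) tuples."""
    events = []
    offset = 0.0
    for token in tokens:
        if '.' in token or token.isdigit():
            # Chord code: dot-joined integers, as produced by the note parser
            chord_code = [int(part) for part in token.split('.')]
            events.append((offset, 'chord', chord_code))
        else:
            # Single note, identified by its pitch name
            events.append((offset, 'note', token))
        offset += step
    return events

events = tokens_to_events(['G4', '0.4.7', '11', 'C5'])
for event in events:
    print(event)
```

At this stage every event is evenly spaced and has no dynamics, which is what the expressive rendering stage later adjusts.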

In [None]:
#@title Generate motif <font color='limegreen'>+</font>

seed = get_notes('output/seed.mid')  # Load seed
output = generate(n_vocab, pitchnames, seed, direct) # Generate motif

<hr>
<br>

## Expressive Performance Rendering

So, we now have a musical motif generated from an image, but why stop there? The LSTM network only produces a list of notes in the order in which they are played. This isn't terribly musical, and certainly wouldn't win any game music awards. Therefore, we need to make use of the Expressive Performance Rendering (EPR) models detailed below. These are two fairly straightforward convolutional regression networks performing sequence-to-sequence vector translation. They take a list of 100 notes as input and output both the time-offsets and velocities that the notes *should* have. These separate networks are both trained using the same ZeldaMIDI dataset used to train the LSTM, but instead of the targets being the same notes time-shifted, they are either relative offsets or velocities. First, we must define them using the function below.
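
To show what the regression targets look like, here is a minimal sketch of how relative offsets can be derived from absolute note onsets and scaled into the 0 to 1 range that a sigmoid output can reach. The clipping ceiling of 2 follows the commented-out training code. Dividing velocities by 127 is an assumption, inferred from the rendering step multiplying predictions by 127. All function names are hypothetical.

```python
def relative_offsets(offsets):
    """Gap between consecutive note onsets; the last gap is repeated."""
    gaps = [abs(round(b - a, 3)) for a, b in zip(offsets, offsets[1:])]
    gaps.append(gaps[-1])
    return gaps

def normalise_gaps(gaps, ceiling=2.0):
    """Clip each gap to the ceiling, then scale into the range 0..1."""
    return [min(max(gap, 0.0), ceiling) / ceiling for gap in gaps]

def normalise_velocities(velocities):
    """Scale MIDI velocities (0..127) into the range 0..1 (assumed scaling)."""
    return [v / 127 for v in velocities]

onsets = [0.0, 0.5, 1.0, 4.0, 4.25]  # toy absolute onsets
gaps = relative_offsets(onsets)
targets = normalise_gaps(gaps)

print(gaps)     # [0.5, 0.5, 3.0, 0.25, 0.25]
print(targets)  # [0.25, 0.25, 1.0, 0.125, 0.125]
```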

In [None]:
#@title Define the EPR models

def durations():  # Durations model architecture
    model = Sequential([
      layers.Input(shape=(100, 1)),
      layers.Conv1D(filters=8, kernel_size=3, activation='relu', padding='same'),
      layers.Conv1D(filters=16, kernel_size=3, activation='relu', padding='same'),
      layers.Conv1D(filters=32, kernel_size=3, activation='relu', padding='same'),
      layers.Conv1D(filters=64, kernel_size=3, activation='relu', padding='same'),
      layers.Dense(300, activation='sigmoid'), # Sigmoid for regression
      layers.Dense(200, activation='sigmoid'),
      layers.Dense(100, activation='sigmoid')
    ])
    model.compile(loss='mean_squared_error', optimizer='RMSprop')
    return model # MSE for regression

def velocity(): # Velocity model architecture
    model = Sequential([
      layers.Input(shape=(100, 1)),
      layers.Conv1D(filters=8, kernel_size=3, activation='relu', padding='same'),
      layers.Conv1D(filters=16, kernel_size=3, activation='relu', padding='same'),
      layers.Conv1D(filters=32, kernel_size=3, activation='relu', padding='same'),
      layers.Conv1D(filters=64, kernel_size=3, activation='relu', padding='same'),
      layers.Dense(300, activation='sigmoid'),
      layers.Dense(200, activation='sigmoid'),
      layers.Dense(100, activation='sigmoid')
    ])
    model.compile(loss='mean_squared_error', optimizer='RMSprop')
    return model

durro = durations() # Initialise duration model
velo = velocity()   # Initialise velocity model

Once again, you have the option of training these models yourself using the training function below. However, this is not recommended, as they can take quite a while to train depending on your hardware, and there is nothing to gain by doing so. Instead, you can use the trained models we have supplied by running the "Load trained EPR models" function below.

In [None]:
#@title Create EPR dataset and train <font color='red'>[Training only]</font>

# Creates both the duration and velocity datasets at the same time

# train_x = list()         # Initialize input data
# durro_train_y = list()   # Initialize duration output
# velo_train_y = list()    # Initialize velocity output
# for file in glob.glob('ZeldaMIDI/*.mid'): # For MIDI in database
#   midi = converter.parse(file)            # Load MIDI
#   print("Parsing %s" % file)              # Log progress
#   notes_to_parse = midi.flat.notes        # Convert to list of note objects
#   step = 0                                # Reset step
#   while step+100 <= len(notes_to_parse):  # While in range
#     notes = list()                        # Reset notes
#     offsets = list()                      # Reset offsets
#     velocity = list()                     # Reset velocity
#     rel_offsets = list()                  # Reset relative offsets
#     for element in notes_to_parse[step:step+100]:  # For entry in notes
#       if isinstance(element, note.Note):           # If a single note
#         notes.append(str(element.pitch.midi))      # Add to notes
#       elif isinstance(element, chord.Chord):       # If a chord
#         notes.append(''.join(str(n) for n in element.normalOrder)) # Encode notes
#       offsets.append(float(element.offset))         # Add offset to offsets
#       velocity.append(str(element.volume.velocity)) # Add velocity to velocities
#     train_x.append(notes)                           # Add notes to train set
#     for i in range(len(offsets)-1):                                # For every entry in offsets
#       rel_offsets.append(abs(round(offsets[i+1] - offsets[i],3)))  # Calculate distance between offsets
#     rel_offsets.append(rel_offsets[-1])                            # Duplicate last (has no effect, prevents error)
#     durro_train_y.append(rel_offsets)               # Add relative offsets to duration train set
#     velo_train_y.append(velocity)                   # Add velocities to velocities train set
#     step+=1                                         # Increment step

# train_x = np.expand_dims(np.array(train_x), -1)             # Add feature dim
# durro_train_y = np.expand_dims(np.array(durro_train_y), -1) # Add feature dim
# velo_train_y = np.expand_dims(np.array(velo_train_y), -1)   # Add feature dim

# durro_train_y = np.clip(durro_train_y,0.,2.)/2  # Limit durations to 2 quarter lengths and normalize

# train_x = train_x.astype('float32')             # Convert to float
# durro_train_y = durro_train_y.astype('float32') # Convert to float
# velo_train_y = velo_train_y.astype('float32')   # Convert to float


# durro.fit(train_x, durro_train_y, epochs=2000)  # Train duration model
# velo.fit(train_x, velo_train_y, epochs=2000)    # Train velocities model

In [None]:
#@title Load trained EPR models

# Load weights from server
velo.load_weights(get_file('VELO', 'https://www.tylerhmc.com/models/velocity.hdf5'))
durro.load_weights(get_file('DURO', 'https://www.tylerhmc.com/models/durations.hdf5'))

Finally, we can pass our motif through the EPR models to render the final expressive performance. This produces a file called "performance.mid" containing the final rendering of the composition. As it is supplied in MIDI format, you can apply any instrument voice to it you like, and can modify it as much as you want. We suggest that you use it as inspiration for a new original composition, carrying on the cycle of creativity, or you can simply return to the start and try again with a new image.
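
The rendering arithmetic can be sketched in plain Python. Each predicted gap is rescaled from the 0 to 1 range back to a maximum of 2, snapped to a 0.25 grid, and accumulated into an absolute offset, while each predicted velocity is scaled to the MIDI range. The function names are hypothetical, and rounding the velocity to an integer is a choice made for this sketch.

```python
def quantise_offsets(predictions, ceiling=2.0, grid=0.25):
    """Undo the 0..1 scaling, snap each gap to the grid, and accumulate."""
    offsets = []
    total = 0.0
    for value in predictions:
        total += grid * round((value * ceiling) / grid)
        offsets.append(total)
    return offsets

def to_midi_velocity(value):
    """Map a 0..1 prediction to an integer MIDI velocity in 0..127."""
    return max(0, min(127, int(round(value * 127))))

predicted_gaps = [0.26, 0.12, 0.5, 0.0]  # illustrative model outputs
offsets = quantise_offsets(predicted_gaps)
velocities = [to_midi_velocity(v) for v in [0.0, 0.5, 1.0]]

print(offsets)     # [0.5, 0.75, 1.75, 1.75]
print(velocities)  # [0, 64, 127]
```

A predicted gap that rounds to zero places two notes at the same offset, so they sound together.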

*Note: You only need to re-run the cells with "<font color='limegreen'>+</font>" in the title.*

In [None]:
#@title Generate expressive performance <font color='limegreen'>+</font>

output_notes = list()     # Initialise output
for element in output:    # For note in the motif output
  if isinstance(element, note.Note):  # If is a single note
    output_notes.append(float(element.pitch.midi))  # Add the MIDI pitch value to output
  elif isinstance(element, chord.Chord):            # If is a chord
    output_notes.append(float(''.join(str(n) for n in element.normalOrder))) # Add encoded value to output

output_notes = np.expand_dims(np.expand_dims(np.array(output_notes), -1), 0) # Add feature and batch dims
offsets = np.round(durro.predict(output_notes), 2)[0].tolist()  # Generate durations
velocity = velo.predict(output_notes)[0].tolist()               # Generate velocities

offset = 0                                           # Reset offset
for i in range(len(output)):                      # For every note in output
  offset += (0.25 * round((offsets[i][0]*2)/0.25))   # Increment offset by pred, rescaled and snapped to a 0.25 grid
  output[i].offset = offset                          # Add quantized offset to note
  output[i].volume.velocity = velocity[i][0]*127     # Add velocity to note

midi_stream = stream.Stream()                           # Initialize MIDI object
midi_stream.append(tempo.MetronomeMark(number=100))     # Set standard tempo
midi_stream.append(stream.Stream(output))               # Add output to MIDI stream
midi_stream.write('midi', fp='output/performance.mid')  # Write MIDI to file

<hr>