<a href="https://colab.research.google.com/github/asigalov61/Noobiano/blob/master/Noobiano.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Noobiano 2 (Ver. 4.0): Melody RNN

***

## A Beginner Introduction to Music Generation with Artificial Intelligence (Neural Networks)

***

## Huge thanks and all credit for this beautiful colab go out to Charles Martin https://github.com/cpmpercussion/creative-prediction

***

- Music is a complex phenomenon with many representations (e.g., digital audio, musical scores, lead sheets)
- A simple representation of music is as a sequence of notes and rests:
    - (equivalent to one line of melody)
- This can be one-hot encoded and applied to a CharRNN!

***

### A simple music representation

- Our musical representation is going to be a sequence of integers between 0 and 129.
- Each integer represents a musical instruction lasting for one sixteenth note (one semiquaver) of duration.
    - This is a typical level of detail for electronic music sequencers.
- MIDI is a standard way of encoding instructions for synthesised instruments and can represent whole musical scores.
    - Standard MIDI allows 128 pitches (there are only 88 on a piano keyboard) where number 60 is 'middle C'.
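To make the pitch numbering concrete, here is a small sketch that turns MIDI note numbers into note names and frequencies. The helpers `midi_to_name` and `midi_to_hz` are illustrative names, not part of this notebook. The sketch assumes the common convention in which middle C (60) is written C4 and A4 (69) is tuned to 440 Hz.

```python
NOTE_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def midi_to_name(number):
    """Name a MIDI note number, assuming middle C (60) is called C4."""
    return NOTE_NAMES[number % 12] + str(number // 12 - 1)

def midi_to_hz(number):
    """Equal-temperament frequency, assuming A4 (MIDI 69) = 440 Hz."""
    return 440.0 * 2 ** ((number - 69) / 12)

print(midi_to_name(60))                      # C4
print(midi_to_name(21), midi_to_name(108))   # A0 C8 (the ends of an 88-key piano)
print(round(midi_to_hz(60), 2))              # 261.63
```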

***
    
#### Melody-RNN Encoding

- 0-127 play a note at that MIDI note number. (`MELODY_NOTE_ON`)
- 128 stop whatever note was playing. (`MELODY_NOTE_OFF`)
- 129 do nothing. (`MELODY_NO_EVENT`)
    
This encoding should allow long notes (a note-on followed by one or more no-change events, then a note-off), and rests (a note-off followed by one or more no-change events).

Here's a standard melody converted into this format:

![](https://github.com/cpmpercussion/creative-prediction/blob/master/notebooks/figures/wm_score_example.png?raw=1)
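As a rough sketch of the encoding described above, the snippet below turns a hand-written melody into Melody-RNN tokens. The function `encode_melody` and its input format (pairs of pitch and length in sixteenth-note steps, with `None` for a rest) are assumptions made for this illustration. The notebook itself builds the tokens from Music21 streams.

```python
MELODY_NOTE_OFF = 128  # stop whatever note was playing
MELODY_NO_EVENT = 129  # do nothing, keep the previous state

def encode_melody(notes):
    """Encode (midi_pitch or None for a rest, length in sixteenth steps) pairs."""
    tokens = []
    for pitch, steps in notes:
        # A note starts with its pitch; a rest starts with a note-off.
        tokens.append(MELODY_NOTE_OFF if pitch is None else pitch)
        # The remaining steps of the note or rest are no-change events.
        tokens.extend([MELODY_NO_EVENT] * (steps - 1))
    if notes and notes[-1][0] is not None:
        tokens.append(MELODY_NOTE_OFF)  # close the final note
    return tokens

# C4 quarter note, D4 eighth note, eighth rest, E4 quarter note
print(encode_melody([(60, 4), (62, 2), (None, 2), (64, 4)]))
# [60, 129, 129, 129, 62, 129, 128, 129, 64, 129, 129, 129, 128]
```

Note that a note followed directly by another note needs no note-off in between: the new note-on replaces the old one.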

### Convert between MIDI files and numpy arrays in melody format

- Music is more complex than text (e.g., more than one note might happen at once). 
- We use the Music21 library to read MIDI music files and then convert them to our Melody-RNN format.
- The functions below turn a Music21 "stream" (of notes) into a numpy array of small integers (`np.int16` in the code, since 129 does not fit in a signed 8-bit integer).
- All complex rhythms are simplified to sixteenth note versions.
- Chords are simplified to the highest note.
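The two simplifications in the list above can be sketched without Music21. The function `quantise_events` and its tuple format (offset and duration in quarter notes, then MIDI pitch) are assumptions for this illustration only. The notebook's own conversion works on Music21 objects and a pandas dataframe.

```python
def quantise_events(events, step=0.25):
    """Snap (offset, duration, pitch) events to a sixteenth-note grid.

    Offsets and durations are in quarter notes, so one sixteenth is 0.25.
    When several notes land on the same grid position (a chord),
    only the highest pitch is kept.
    """
    best = {}
    for offset, duration, pitch in events:
        pos = round(offset / step)
        dur = round(duration / step)
        if pos not in best or pitch > best[pos][1]:
            best[pos] = (dur, pitch)
    return [(pos, dur, pitch) for pos, (dur, pitch) in sorted(best.items())]

events = [
    (0.0, 1.0, 60), (0.0, 1.0, 64), (0.0, 1.0, 67),  # C major chord
    (1.1, 0.5, 62),                                  # slightly late eighth note
    (2.0, 0.3, 65),                                  # oddly short note
]
print(quantise_events(events))
# [(0, 4, 67), (4, 2, 62), (8, 1, 65)]
```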



# Set up the environment and install/import all dependencies

In [None]:
#@title Install all dependencies
!pip install pyFluidSynth
!apt install fluidsynth #Pip does not work for some reason. Only apt works
!pip install midi2audio
!pip install pretty_midi
!pip install pypianoroll
!pip install mir_eval
!pip install keras_self_attention
!git clone https://github.com/asigalov61/arc-diagrams

In [None]:
#@title Define all functions, variables, and import all modules.
output_instrument = "Piano"
max_number_of_silent_sequential_notes = 4


from midi2audio import FluidSynth
from google.colab import output
from IPython.display import display, Javascript, HTML, Audio


# Imports
from music21 import converter, instrument, note, chord, stream, midi
import glob
import time
import numpy as np
import keras.utils as utils
import pandas as pd
import tensorflow as tf
import os





# Melody-RNN Format is a sequence of 8-bit integers indicating the following:
# MELODY_NOTE_ON = [0, 127] # (note on at that MIDI pitch)
MELODY_NOTE_OFF = 128 # (stop playing all previous notes)
MELODY_NO_EVENT = 129 # (no change from previous event)
# Each element in the sequence lasts for one sixteenth note.
# This can encode monophonic music only.

def streamToNoteArray(stream):
    """
    Convert a Music21 stream into a numpy array of int16s in Melody-RNN format:
        0-127 - note on at specified pitch
        128   - note off
        129   - no event
    """
    # Part one, extract from stream
    total_length = int(np.round(stream.flat.highestTime / 0.25)) # in semiquavers
    stream_list = []
    for element in stream.flat:
        if isinstance(element, note.Note):
            stream_list.append([np.round(element.offset / 0.25), np.round(element.quarterLength / 0.25), element.pitch.midi])
        elif isinstance(element, chord.Chord):
            stream_list.append([np.round(element.offset / 0.25), np.round(element.quarterLength / 0.25), element.sortAscending().pitches[-1].midi])
    np_stream_list = np.array(stream_list, dtype=int) # the np.int alias was removed from recent NumPy releases
    df = pd.DataFrame({'pos': np_stream_list.T[0], 'dur': np_stream_list.T[1], 'pitch': np_stream_list.T[2]})
    df = df.sort_values(['pos','pitch'], ascending=[True, False]) # sort the dataframe properly
    df = df.drop_duplicates(subset=['pos']) # drop duplicate values
    # part 2, convert into a sequence of note events
    output = np.zeros(total_length+1, dtype=np.int16) + np.int16(MELODY_NO_EVENT)  # set array full of no events by default.
    # Fill in the output list
    for i in range(total_length):
        if not df[df.pos==i].empty:
            try:
                n = df[df.pos==i].iloc[0] # pick the highest pitch at each semiquaver
                output[i] = n.pitch # set note on
                output[i+n.dur] = MELODY_NOTE_OFF
            except IndexError:
                # the note-off position falls outside the output array
                print('Bad note. Skipping...')
    return output


def noteArrayToDataFrame(note_array):
    """
    Convert a numpy array containing a Melody-RNN sequence into a dataframe.
    """
    df = pd.DataFrame({"code": note_array})
    df['offset'] = df.index
    df['duration'] = df.index
    df = df[df.code != MELODY_NO_EVENT]
    df.duration = df.duration.diff(-1) * -1 * 0.25  # calculate durations and change to quarter note fractions
    df = df.fillna(0.25)
    return df[['code','duration']]

def noteArrayToStream(note_array):
    """
    Convert a numpy array containing a Melody-RNN sequence into a music21 stream.
    """
    df = noteArrayToDataFrame(note_array)
    melody_stream = stream.Stream()
    if output_instrument == 'Piano': melody_stream.append(instrument.Piano())
    if output_instrument == 'Violin': melody_stream.append(instrument.Violin())
    if output_instrument == 'Flute': melody_stream.append(instrument.Flute())
    if output_instrument == 'Clarinet': melody_stream.append(instrument.Clarinet())
    silent_notes = 0 # local counter of silent events emitted so far
    for index, row in df.iterrows():
        if row.code == MELODY_NO_EVENT:
            # noteArrayToDataFrame already drops no-event rows, so this branch is only a safeguard.
            if silent_notes >= max_number_of_silent_sequential_notes:
                continue # skip further silent events
            new_note = note.Rest() # bit of an oversimplification, doesn't produce long notes.
            silent_notes += 1
        elif row.code == MELODY_NOTE_OFF:
            new_note = note.Rest()
        else:
            new_note = note.Note(int(row.code)) # iterrows() upcasts the code to float
        new_note.quarterLength = row.duration
        melody_stream.append(new_note)
    return melody_stream

#wm_mid = converter.parse("/content/seed.mid")
#wm_mid.show()
#wm_mel_rnn = streamToNoteArray(wm_mid)
#print(wm_mel_rnn)
#noteArrayToStream(wm_mel_rnn)


## (Optional) Download pre-processed melody dataset and pre-trained model based on JBS Collection. 

In [None]:
#@title Download pre-processed dataset and pre-trained model
!wget https://github.com/asigalov61/Noobiano/raw/master/noobiano-pre-trained-model.h5
!wget https://github.com/asigalov61/Noobiano/raw/master/melody_training_dataset.npz

## Construct a dataset of popular melodies

Open some midi files and extract the melodies as numpy note sequence arrays.

In [None]:
#@title Alex Piano Only Original 450 MIDIs 
%cd /content
!wget 'https://github.com/asigalov61/AlexMIDIDataSet/raw/master/AlexMIDIDataSet-CC-BY-NC-SA-Piano-Only.zip'
!unzip -j 'AlexMIDIDataSet-CC-BY-NC-SA-Piano-Only.zip'

In [None]:
#@title Execute this cell to upload your MIDIs Data Set. Do not upload a lot and make sure that the files are not broken or have unusual configuration/settings.
from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))

In [None]:
#@title Parse the uploaded MIDI DataSet into a special Numpy Array of notes
import time
import tqdm.auto


total_time = 0

z = 0

midi_files = glob.glob("/content/*.mid") # MIDI files downloaded or uploaded in the cells above

training_arrays = []
for f in tqdm.auto.tqdm(midi_files):
    try:
        start = time.perf_counter() # time.clock() was removed in Python 3.8
        s = converter.parse(f)
        #print("Parsed:", f, "it took", time.perf_counter() - start)
        total_time += time.perf_counter() - start
    except Exception:
        continue # skip files that music21 cannot parse
    for p in s.parts:
        start = time.perf_counter()
        arr = streamToNoteArray(p)
        training_arrays.append(arr)
        #print("Converted:", f, "it took", time.perf_counter() - start)
        total_time += time.perf_counter() - start
    z+=1
training_dataset = np.array(training_arrays, dtype=object) # melodies differ in length, so store them as an object array
print('Writing Melody Training Dataset to file...')
np.savez('melody_training_dataset.npz', train=training_dataset)
print('Total number of converted files: ', z)
print('Total conversion time is:', total_time, 'seconds, which is', total_time/60, 'minutes' )

# Load Training Data and Create RNN

In the following we load in the training dataset, slice the melodies into example sequences and build our Melody RNN.
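Here is a minimal sketch of the slicing step on a toy melody. The function `make_examples` is an illustrative name. It mirrors the loop bounds of the notebook's `slice_sequence_examples` helper and the split done by `seq_to_singleton_format`, but it works on plain lists instead of numpy arrays.

```python
def make_examples(sequence, seq_len):
    """Slice one melody into (X, y) pairs: seq_len input tokens and the token that follows."""
    window = seq_len + 1
    xs, ys = [], []
    # Same loop bounds as the notebook's slice_sequence_examples helper.
    for i in range(len(sequence) - window - 1):
        example = sequence[i:i + window]
        xs.append(example[:-1])  # the first seq_len tokens are the input
        ys.append(example[-1])   # the last token is the prediction target
    return xs, ys

melody = [60, 129, 62, 129, 64, 129, 128, 129]
X_toy, y_toy = make_examples(melody, seq_len=3)
print(X_toy)  # [[60, 129, 62], [129, 62, 129], [62, 129, 64]]
print(y_toy)  # [129, 64, 129]
```

Because the windows overlap, a melody of a few hundred tokens already yields hundreds of training examples.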

In [None]:
#@title Training Hyperparameters { run: "auto" }
generated_sequence_length = 512 #@param {type:"slider", min:0, max:512, step:8}
hidden_layer_size = 256 #@param {type:"slider", min:0, max:512, step:16}
number_of_training_epochs = 200 #@param {type:"slider", min:0, max:200, step:1}
training_batch_size = 1024 #@param {type:"number"}

VOCABULARY_SIZE = 130 # known 0-127 notes + 128 note_off + 129 no_event
SEQ_LEN = generated_sequence_length
BATCH_SIZE = training_batch_size
HIDDEN_UNITS = hidden_layer_size
EPOCHS = number_of_training_epochs
SEED = 2345  # 2345 seems to be good.
np.random.seed(SEED)

with np.load('./melody_training_dataset.npz', allow_pickle=True) as data:
    train_set = data['train']

print("Training melodies:", len(train_set))

In [None]:
#@title Defining additional Conversion Functions
def slice_sequence_examples(sequence, num_steps):
    """Slice a sequence into overlapping sequences of length num_steps."""
    xs = []
    for i in range(len(sequence) - num_steps - 1):
        example = sequence[i: i + num_steps]
        xs.append(example)
    return xs

def seq_to_singleton_format(examples):
    """
    Return the examples in seq to singleton format.
    """
    xs = []
    ys = []
    for ex in examples:
        xs.append(ex[:-1])
        ys.append(ex[-1])
    return (xs,ys)

# Prepare training data as X and Y.
# This slices the melodies into sequences of length SEQ_LEN+1.
# Then, each sequence is split into an X of length SEQ_LEN and a y of length 1.

# Slice the sequences:
slices = []
for seq in train_set:
    slices +=  slice_sequence_examples(seq, SEQ_LEN+1)

# Split the sequences into Xs and ys:
X, y = seq_to_singleton_format(slices)
# Convert into numpy arrays.
X = np.array(X)
y = np.array(y)

# Look at the size of the training corpus:
print("Total Training Corpus:")
print("X:", X.shape)
print("y:", y.shape)
print()

# Have a look at one example:
print("Looking at one example:")
print("X:", X[95])
print("y:", y[95])
# Note: Music data is sparser than text, there's lots of 129s (do nothing)
# and few examples of any particular note on.
# As a result, it's a bit harder to train a melody-rnn.

In [None]:
#@title Uploaded MIDIs Statistics
# Do some stats on the corpus.
all_notes = np.concatenate(train_set)
print("Number of notes:")
print(all_notes.shape)
all_notes_df = pd.DataFrame(all_notes)
print("Notes that do appear:")
unique, counts = np.unique(all_notes, return_counts=True)
print(unique)
print("Notes that don't appear:")
print(np.setdiff1d(np.arange(VOCABULARY_SIZE), unique)) # check all 130 codes, including 129 (no event)

print("Plot the relative occurrences of each note:")
import matplotlib.pyplot as plt
%matplotlib inline

#plt.style.use('dark_background')
plt.bar(unique, counts)
plt.yscale('log')
plt.xlabel('melody RNN value')
plt.ylabel('occurrences (log scale)')

# Define the Training RNN

- The training RNN will be more complex than in the text examples.
- Using 3 LSTM layers of 256 cells each.
- Using an Embedding layer on the input (saves some effort in creating one-hot examples)
- Using sparse categorical cross entropy for loss (so that ys don't have to be one-hot)
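To see why the last two choices save effort, here is a dependency-free sketch. It shows that looking up row `i` of an embedding table gives the same vector as multiplying a one-hot vector by that table, and that sparse categorical cross-entropy on an integer target equals ordinary categorical cross-entropy on the matching one-hot target. All function names and the tiny table are made up for the illustration. Keras does this internally with tensors.

```python
import math

def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]

def embed_via_one_hot(index, table):
    """Multiply a one-hot row vector by the embedding table."""
    vec = one_hot(index, len(table))
    width = len(table[0])
    return [sum(vec[r] * table[r][c] for r in range(len(table))) for c in range(width)]

def categorical_crossentropy(target_one_hot, probs):
    return -sum(t * math.log(p) for t, p in zip(target_one_hot, probs) if t > 0)

def sparse_categorical_crossentropy(target_index, probs):
    return -math.log(probs[target_index])

table = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]  # 4 tokens, 2 dimensions
probs = [0.1, 0.6, 0.2, 0.1]                              # a model's softmax output

print(embed_via_one_hot(2, table) == table[2])  # True: the lookup is enough
dense_loss = categorical_crossentropy(one_hot(1, 4), probs)
sparse_loss = sparse_categorical_crossentropy(1, probs)
print(abs(dense_loss - sparse_loss) < 1e-12)    # True: same loss, no one-hot needed
```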

In [None]:
#@title Import needed modules and build the model
import keras
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.layers import LSTM, Dropout, Bidirectional, Flatten
from keras.layers import Embedding
from keras.optimizers import RMSprop
from keras.utils import get_file
from keras.models import load_model
from keras_self_attention import SeqSelfAttention
# build the model: 3-layer LSTM network.
# Using Embedding layer and sparse_categorical_crossentropy loss function 
# to save some effort in preparing data.
print('Building model...')
model_train = Sequential()
model_train.add(Embedding(VOCABULARY_SIZE, HIDDEN_UNITS, input_length=SEQ_LEN))

# LSTM part (you can easily add or remove layers below)
#model_train.add(LSTM(HIDDEN_UNITS, return_sequences=True))
#model_train.add(Dropout(0.3))
#model_train.add(LSTM(HIDDEN_UNITS, return_sequences=True))
#model_train.add(Dropout(0.3))
model_train.add(LSTM(HIDDEN_UNITS, return_sequences=True))
model_train.add(Dropout(0.3))
model_train.add(LSTM(HIDDEN_UNITS, return_sequences=True))
model_train.add(Dropout(0.3))
model_train.add(LSTM(HIDDEN_UNITS))
model_train.add(Dropout(0.3))
model_train.add(Flatten())
# Project back to vocabulary
model_train.add(Dense(VOCABULARY_SIZE, activation='softmax'))
model_train.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model_train.summary()

# Training

- I trained this model on Google's Colaboratory system (free online Python machine learning environment, including GPU).
- Training for many epochs helps: I used 100, though fewer can work.
- At around 3 minutes per epoch on an NVIDIA K80 GPU, that comes to about 5 hours of training.

Here's the training diagram:

<img src="https://github.com/cpmpercussion/creative-prediction/blob/master/notebooks/figures/training_melody_rnn.png?raw=1" style="width: 600px;"/>

Probably could have stopped after about 50 epochs to save some time!

This trained model is included in the repo, so you can go ahead and load that, or train again with your own dataset.

In [None]:
#@title Train the model (this takes time, 50 epochs min recommended) and plot the results
save_every_number_of_steps = 100 #@param {type:"slider", min:1, max:100, step:1}
save_only_best_checkpoints = True #@param {type:"boolean"}


from keras.callbacks import ModelCheckpoint
import matplotlib.pyplot as plt
checkpoint = ModelCheckpoint(
        'noobiano-pre-trained-model.h5',
        save_freq=save_every_number_of_steps, # an integer save_freq counts batches, not epochs
        monitor='loss',
        verbose=1,
        save_best_only=save_only_best_checkpoints,
        mode='min'
    )
history = model_train.fit(X, y, validation_split=0.33, batch_size=BATCH_SIZE, epochs=EPOCHS, callbacks= [checkpoint])

print(history.history.keys())
# summarize history for accuracy
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
model_train.save("noobiano-pre-trained-model.h5")

Save the resulting trained model

In [None]:
#@title SAVE
model_train.save("noobiano-pre-trained-model.h5")


Load the saved model (if you need to restore/reload the model)

In [None]:
#@title LOAD
# Load if necessary - don't need to do this.
model_train = keras.models.load_model("noobiano-pre-trained-model.h5")
model_train.summary()

# Decoding Model

Now we build a 1-in, 1-out model for decoding. This is the same model as for training, just with an input length of 1 and LSTM statefulness turned on.

- Much faster to use the network with this model!
- The weights are loaded directly from the saved `model_train` file (`noobiano-pre-trained-model.h5`).
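The idea behind a stateful 1-in, 1-out model can be shown with a toy recurrent cell. This is not Keras and not an LSTM. `rnn_step`, `StatefulToyRNN`, and the weights are invented for the illustration. The point is only that feeding one token at a time while keeping the state gives the same result as processing the whole sequence in one pass.

```python
import math

def rnn_step(state, token, w_state=0.5, w_in=0.01):
    """One step of a toy recurrent cell with a single scalar state."""
    return math.tanh(w_state * state + w_in * token)

def run_whole_sequence(tokens):
    """Process the full sequence in one go, starting from a blank state."""
    state = 0.0
    for token in tokens:
        state = rnn_step(state, token)
    return state

class StatefulToyRNN:
    """Keeps its state between calls, like a stateful layer would."""
    def __init__(self):
        self.state = 0.0

    def reset_states(self):
        self.state = 0.0

    def predict(self, token):
        self.state = rnn_step(self.state, token)
        return self.state

toy_tokens = [60, 129, 62]
toy_model = StatefulToyRNN()
stepwise = [toy_model.predict(t) for t in toy_tokens]
print(stepwise[-1] == run_whole_sequence(toy_tokens))  # True
toy_model.reset_states()  # start the next melody from a blank state
```

Each new token costs one step instead of a full pass over everything generated so far, which is where the speed-up comes from.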

In [None]:
#@title Build a decoding model (input length 1, batch size 1, stateful)
# Build a decoding model (input length 1, batch size 1, stateful)
model_dec = Sequential()
model_dec.add(Embedding(VOCABULARY_SIZE, HIDDEN_UNITS, input_length=1, batch_input_shape=(1,1)))
# LSTM part
model_dec.add(LSTM(HIDDEN_UNITS, stateful=True, return_sequences=True))
model_dec.add(Dropout(0.3))
model_dec.add(LSTM(HIDDEN_UNITS, stateful=True, return_sequences=True))
model_dec.add(Dropout(0.3))
model_dec.add(LSTM(HIDDEN_UNITS, stateful=True))
model_dec.add(Dropout(0.3))
model_dec.add(Flatten())
# project back to vocabulary
model_dec.add(Dense(VOCABULARY_SIZE, activation='softmax'))
model_dec.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
model_dec.summary()
# set weights from training model
#model_dec.set_weights(model_train.get_weights())
model_dec.load_weights("noobiano-pre-trained-model.h5")

# Sampling from the Model

- We need to define two functions for sampling:
    - `sample`: samples from the categorical distribution output by the model, with a diversity adjustment procedure.
    - `sample_model`: samples a number of notes from the model using a one-note seed.
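The diversity adjustment is a temperature rescaling of the predicted probabilities. Below is a plain-Python sketch of that idea. `apply_temperature` and `sample_index` are illustrative names, and subtracting the largest logit before exponentiating is an extra numerical-stability step added for this sketch. It does not change the resulting probabilities.

```python
import math
import random

def apply_temperature(probs, temperature):
    """Rescale a probability list: low temperature sharpens it, high temperature flattens it."""
    logits = [math.log(p) / temperature for p in probs]
    biggest = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(l - biggest) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_index(probs, temperature=1.0, rng=random):
    """Draw one index from the temperature-adjusted distribution."""
    adjusted = apply_temperature(probs, temperature)
    return rng.choices(range(len(adjusted)), weights=adjusted)[0]

toy_probs = [0.5, 0.3, 0.2]
cold = apply_temperature(toy_probs, 0.5)     # more conservative
neutral = apply_temperature(toy_probs, 1.0)  # unchanged
hot = apply_temperature(toy_probs, 2.0)      # more adventurous
print([round(p, 3) for p in cold])  # [0.658, 0.237, 0.105]
print(sample_index(toy_probs, temperature=0.7))
```

A temperature below 1 makes the most likely note even more likely, so the melody gets safer and more repetitive. A temperature above 1 gives rare notes a better chance.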

In [None]:
#@title Define Sampling/Generation Functions
import tqdm.auto

def sample(preds, temperature=1.0):
    """ helper function to sample an index from a probability array"""
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)

## Sampling function

def sample_model(seed, model_name, length=400, temperature=1.0):
    '''Samples a musicRNN given a seed sequence.'''
    generated = []  
    generated.append(seed)
    next_index = seed
    for i in tqdm.auto.tqdm(range(length)):
        x = np.array([next_index])
        x = np.reshape(x,(1,1))
        preds = model_name.predict(x, verbose=0)[0]
        next_index = sample(preds, temperature)        
        generated.append(next_index)
    return np.array(generated)

# Let's sample some music!

- Generate a melody from a one-note seed: for example, 127 notes plus the starting note 60 (middle C) give 128 sixteenth notes, which is 8 bars in 4/4
- Turn the sequence back into a music21 stream
- Show as musical score, play it back, or save as a MIDI file!
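Turning tokens back into notes comes down to measuring the distance between consecutive events. The sketch below does this on plain lists. `decode_tokens` is an illustrative name, and it follows the same rule as the notebook's `noteArrayToDataFrame`: no-event tokens are dropped, each remaining event lasts until the next one, and the final event gets one sixteenth note (0.25 of a quarter note).

```python
MELODY_NOTE_OFF = 128
MELODY_NO_EVENT = 129

def decode_tokens(tokens, step=0.25):
    """Return (midi_pitch or None for a rest, duration in quarter notes) pairs."""
    # Keep only real events together with the position where they start.
    events = [(code, i) for i, code in enumerate(tokens) if code != MELODY_NO_EVENT]
    decoded = []
    for k, (code, start) in enumerate(events):
        # An event lasts until the next event; the last one lasts a single step.
        end = events[k + 1][1] if k + 1 < len(events) else start + 1
        pitch = None if code == MELODY_NOTE_OFF else code
        decoded.append((pitch, (end - start) * step))
    return decoded

print(decode_tokens([60, 129, 129, 129, 62, 129, 128, 129, 64]))
# [(60, 1.0), (62, 0.5), (None, 0.5), (64, 0.25)]
```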



In [None]:
#@title Generate some Music from your model :) Play with parameters below until you get what you like
primer_length = 16 #@param {type:"slider", min:1, max:128, step:1}
desired_composition_length_in_tokens = 512 #@param {type:"slider", min:0, max:1024, step:8}
creativity_temperature = 0.7 #@param {type:"slider", min:0, max:4, step:0.1}
max_number_of_silent_sequential_notes = 10 #@param {type:"slider", min:0, max:64, step:1}
output_instrument = "Clarinet" #@param ["Piano", "Violin", "Flute", "Clarinet"]
%cd /content/

model_dec.reset_states() # Start with LSTM state blank
o = sample_model(primer_length, model_dec, length=desired_composition_length_in_tokens, temperature=creativity_temperature) # the first argument is used as the one-note seed token; generates the requested number of tokens after it

melody_stream = noteArrayToStream(o) # turn into a music21 stream
#melody_stream.show() # show the score.
fp = melody_stream.write('midi', fp='output_midi.mid')
from google.colab import files
files.download('/content/output_midi.mid')

# Plot and Play generated Melody :)

In [None]:
#@title Plot and Graph the Output :)
graphs_length_inches = 18 #@param {type:"slider", min:0, max:20, step:1}
notes_graph_height = 6 #@param {type:"slider", min:0, max:20, step:1}
highest_displayed_pitch = 100 #@param {type:"slider", min:1, max:128, step:1}
lowest_displayed_pitch = 20 #@param {type:"slider", min:1, max:128, step:1}
%cd /content/
rendered_wav_graph_height = 3
import librosa
import numpy as np
import pretty_midi
import pypianoroll
from pypianoroll import Multitrack, Track
import matplotlib
import matplotlib.pyplot as plt
#matplotlib.use('SVG')
# For plotting
import mir_eval.display
import librosa.display
%matplotlib inline


midi_data = pretty_midi.PrettyMIDI('/content/output_midi.mid')

def plot_piano_roll(pm, start_pitch, end_pitch, fs=100):
    # Use librosa's specshow function for displaying the piano roll
    librosa.display.specshow(pm.get_piano_roll(fs)[start_pitch:end_pitch],
                             hop_length=1, sr=fs, x_axis='time', y_axis='cqt_note',
                             fmin=pretty_midi.note_number_to_hz(start_pitch))



roll = np.zeros([int(graphs_length_inches), 128])
# Plot the output

#track = Multitrack('/content/output_midi.mid', name='track')
#plt.figure(figsize=[graphs_length_inches, notes_graph_height])
#fig, ax = track.plot()
#fig.set_size_inches(graphs_length_inches, notes_graph_height)
plt.figure(figsize=[graphs_length_inches, notes_graph_height])
ax2 = plot_piano_roll(midi_data, lowest_displayed_pitch, highest_displayed_pitch)
plt.show(block=False)

## Play a melody stream


!cp /usr/share/sounds/sf2/FluidR3_GM.sf2 /content/font.sf2


FluidSynth("/content/font.sf2").midi_to_audio('output_midi.mid','output_wav.wav')
# set the src and play
Audio("output_wav.wav")

# Congratulations !!! You did it :)))

In [None]:
#@title Reward yourself by making a nice Arc diagram from the generated output/MIDI file
%cd '/content/arc-diagrams'

from mido import MidiFile
from arc_diagram import plot_arc_diagram

midi_file = '/content/output_midi.mid'
plot_title = "Noobiano Output Arc Diagram"

# midi_file = 'midis/fuer_elise.mid'
# plot_title = "Für Elise (Beethoven)"


def stringify_notes(midi_file, track_number ):

    mid = MidiFile(midi_file)
    track_notes = {}
    for i, track in enumerate(mid.tracks):
        track_notes[i] = ''
        for msg in track:
            if( msg.type == 'note_on'):
                track_notes[i] += str(msg.note) +'n'
            if( msg.type == 'note_off'):
                track_notes[i] += str(msg.note) +'f'
    return track_notes[track_number] 

plot_arc_diagram(stringify_notes(midi_file, 0), plot_title)

from google.colab import files
files.download('/content/arc-diagrams/output.png')

You can upload or download everything to your Google Drive here (standard Google Drive connect code)

In [None]:
from google.colab import drive
drive.mount('/content/drive')