With a fairly good internet connection, it takes about <font color=red>1 min 22 s</font> to run the code up to the "Use" section.

## **Libraries Installation**

In [None]:
!sudo apt install -y fluidsynth

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  fluid-soundfont-gm libevdev2 libfluidsynth3 libgudev-1.0-0 libinput-bin
  libinput10 libinstpatch-1.0-2 libmd4c0 libmtdev1 libqt5core5a libqt5dbus5
  libqt5gui5 libqt5network5 libqt5svg5 libqt5widgets5 libwacom-bin
  libwacom-common libwacom9 libxcb-icccm4 libxcb-image0 libxcb-keysyms1
  libxcb-render-util0 libxcb-util1 libxcb-xinerama0 libxcb-xinput0 libxcb-xkb1
  libxkbcommon-x11-0 qsynth qt5-gtk-platformtheme qttranslations5-l10n
  timgm6mb-soundfont
Suggested packages:
  fluid-soundfont-gs qt5-image-formats-plugins qtwayland5 jackd
The following NEW packages will be installed:
  fluid-soundfont-gm fluidsynth libevdev2 libfluidsynth3 libgudev-1.0-0
  libinput-bin libinput10 libinstpatch-1.0-2 libmd4c0 libmtdev1 libqt5core5a
  libqt5dbus5 libqt5gui5 libqt5network5 libqt5svg5 libqt5widgets5 libwacom-bin
  libwacom-common libwacom9 libx

In [None]:
!pip install --upgrade pyfluidsynth

Collecting pyfluidsynth
  Downloading pyFluidSynth-1.3.2-py3-none-any.whl (19 kB)
Installing collected packages: pyfluidsynth
Successfully installed pyfluidsynth-1.3.2


In [None]:
!pip install pretty_midi

Collecting pretty_midi
  Downloading pretty_midi-0.2.10.tar.gz (5.6 MB)
  Preparing metadata (setup.py) ... done
Collecting mido>=1.1.16 (from pretty_midi)
  Downloading mido-1.3.0-py3-none-any.whl (50 kB)
Building wheels for collected packages: pretty_midi
  Building wheel for pretty_midi (setup.py) ... done
  Created wheel for pretty_midi: filename=pretty_midi-0.2.10-py3-none-any.whl size=5592284 sha256=f93f00266b7bb6f11db7c9b54d067b13444c41e4a408cd332d276607bf97692a
  Stored in directory: /root/.cache/pip/wheels/cd/a5/30/7b8b7f58709f5150f67f98fde4b891ebf0be9ef07a8af49f25
Successfully built pretty_midi
Installing collected packages: mido, pretty_midi
Successfully installed mido-1.3.0 pretty_midi-0.2.10


## **Import Libraries**

In [None]:
import collections
# import datetime
import fluidsynth
import glob
import numpy as np
import pathlib
import pandas as pd
import pretty_midi
# import seaborn as sns
import tensorflow as tf


from tensorflow.keras.utils import register_keras_serializable
from IPython.display import Audio, display
import time
from google.colab import files


# from IPython import display
# from matplotlib import pyplot as plt
# from typing import Optional

### Setting seeds for random number generators

In [None]:
seed = 42
tf.random.set_seed(seed)
np.random.seed(seed)

## **Load Dataset**

In [None]:
data_dir = pathlib.Path('data/maestro-v2.0.0')
if not data_dir.exists():
  tf.keras.utils.get_file(
      'maestro-v2.0.0-midi.zip',
      origin='https://storage.googleapis.com/magentadata/datasets/maestro/v2.0.0/maestro-v2.0.0-midi.zip',
      extract=True,
      cache_dir='.', cache_subdir='data',
  )

filenames = glob.glob(str(data_dir/'**/*.mid*'))

Downloading data from https://storage.googleapis.com/magentadata/datasets/maestro/v2.0.0/maestro-v2.0.0-midi.zip
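The pattern passed to `glob.glob` above contains `**` but is used without `recursive=True`. In that mode `**` behaves like a single `*`, so only files exactly one folder below `data_dir` are matched. The sketch below illustrates the difference on a throwaway directory tree; the folder and file names are made up for the example and do not describe the real dataset layout.

```python
import glob
import pathlib
import tempfile

# Build a small throwaway tree (hypothetical names, not the real dataset).
root = pathlib.Path(tempfile.mkdtemp())
(root / "2004").mkdir()
(root / "extra" / "nested").mkdir(parents=True)
(root / "top.midi").touch()
(root / "2004" / "a.midi").touch()
(root / "extra" / "nested" / "b.midi").touch()

# Without recursive=True, "**" acts like "*": exactly one directory level.
one_level = sorted(glob.glob(str(root / "**/*.mid*")))

# With recursive=True, "**" matches zero or more directory levels.
any_level = sorted(glob.glob(str(root / "**/*.mid*"), recursive=True))

print(len(one_level), len(any_level))
```

If the MIDI files all sit exactly one folder below `data_dir`, both variants find the same files, and the non-recursive form used above is enough.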


## **Global Parameters Definition**

In [None]:
# Sampling rate for audio playback
_SAMPLING_RATE = 16000
key_order = ['pitch', 'step', 'duration']
seq_length = 25
vocab_size = 128

## **Functions Definition**

**Functions for MIDI Processing and Music Generation**

These functions aid in converting MIDI files to notes, generating new music sequences based on trained models, and adding to existing musical sequences. This code segment contains utilities for MIDI-to-note conversion, model prediction, sequence regeneration, and audio display.
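As a rough illustration of the note representation used by `midi_to_notes` and `notes_to_midi`, the sketch below works on plain `(pitch, start, end)` tuples instead of a MIDI file. The helper names `encode` and `decode` are hypothetical and the note values are made up; only the `step` (time since the previous note's start) and `duration` arithmetic mirrors the functions in this notebook.

```python
def encode(notes):
    """Turn (pitch, start, end) tuples into (pitch, step, duration) tuples."""
    ordered = sorted(notes, key=lambda n: n[1])  # sort by start time
    prev_start = ordered[0][1]
    encoded = []
    for pitch, start, end in ordered:
        encoded.append((pitch, start - prev_start, end - start))
        prev_start = start
    return encoded


def decode(encoded):
    """Rebuild (pitch, start, end) tuples, starting the clock at 0."""
    prev_start = 0.0
    decoded = []
    for pitch, step, duration in encoded:
        start = prev_start + step
        decoded.append((pitch, start, start + duration))
        prev_start = start
    return decoded


notes = [(60, 1.0, 1.5), (64, 1.5, 2.0), (67, 2.5, 3.0)]
encoded = encode(notes)
decoded = decode(encoded)
print(encoded)
print(decoded)
```

Note that the first `step` is always 0, so the rebuilt sequence starts at time 0 and any silence before the first note of the original is not preserved.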



<font color=red>Main Concerns Here:</font>
<ul>
  <li> In add_sequence, I don't know whether the transformation "/ np.array([vocab_size, 1, 1])" needs to be applied to aux_1;</li>
  <li> In add_sequence, I don't know whether the "instrument" part is going to work.</li>
</ul>
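On the first concern, one way to reason about it: the division by `[vocab_size, 1, 1]` is applied to what is fed to the model, while `notes_to_midi` converts `pitch` with `int(...)` and therefore expects raw MIDI pitch numbers. The sketch below uses plain tuples and made-up note values to show what would happen if scaled pitches were written out. `scale_for_model` is a hypothetical helper, and the sketch says nothing about the model itself.

```python
vocab_size = 128  # same value as in the global parameters


def scale_for_model(note):
    """Scale only the pitch, as done for the model input."""
    pitch, step, duration = note
    return (pitch / vocab_size, step, duration)


seed_notes = [(60, 0.0, 0.5), (72, 0.25, 0.5)]
model_input = [scale_for_model(n) for n in seed_notes]

# notes_to_midi calls int(note['pitch']); a scaled pitch would collapse to 0.
pitches_if_scaled = [int(p) for p, _, _ in model_input]
pitches_if_raw = [int(p) for p, _, _ in seed_notes]
print(pitches_if_scaled, pitches_if_raw)
```

Under this reading, `aux_1`, which ends up in the output file, would stay unscaled.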

In [None]:
def midi_to_notes(midi_file: str) -> pd.DataFrame:
  pm = pretty_midi.PrettyMIDI(midi_file)
  instrument = pm.instruments[0]
  notes = collections.defaultdict(list)

  # Sort the notes by start time
  sorted_notes = sorted(instrument.notes, key=lambda note: note.start)
  prev_start = sorted_notes[0].start

  for note in sorted_notes:
    start = note.start
    end = note.end
    notes['pitch'].append(note.pitch)
    notes['start'].append(start)
    notes['end'].append(end)
    notes['step'].append(start - prev_start)
    notes['duration'].append(end - start)
    prev_start = start

  return pd.DataFrame({name: np.array(value) for name, value in notes.items()})

def notes_to_midi(
    notes: pd.DataFrame,
    out_file: str,
    instrument_name: str,
    velocity: int = 100,  # note loudness
    ) -> pretty_midi.PrettyMIDI:

    pm = pretty_midi.PrettyMIDI()
    instrument = pretty_midi.Instrument(
        program=pretty_midi.instrument_name_to_program(instrument_name))

    prev_start = 0

    for i, note in notes.iterrows():
      start = float(prev_start + note['step'])
      end = float(start + note['duration'])
      note = pretty_midi.Note(
          velocity=velocity,
          pitch=int(note['pitch']),
          start=start,
          end=end,
      )
      instrument.notes.append(note)
      prev_start = start

    pm.instruments.append(instrument)
    pm.write(out_file)
    return pm

@register_keras_serializable()
def mse_with_positive_pressure(y_true: tf.Tensor, y_pred: tf.Tensor):
  mse = (y_true - y_pred) ** 2
  positive_pressure = 10 * tf.maximum(-y_pred, 0.0)
  return tf.reduce_mean(mse + positive_pressure)

def predict_next_note(
    notes: np.ndarray,
    model: tf.keras.Model,
    temperature: float = 1.0) -> tuple[int, float, float]:

    """Generates a note as a tuple of (pitch, step, duration), using a trained
    sequence model."""

    assert temperature > 0

    # Add batch dimension
    inputs = tf.expand_dims(notes, 0)

    predictions = model.predict(inputs)
    pitch_logits = predictions['pitch']
    step = predictions['step']
    duration = predictions['duration']

    pitch_logits /= temperature
    pitch = tf.random.categorical(pitch_logits, num_samples=1)
    pitch = tf.squeeze(pitch, axis=-1)
    duration = tf.squeeze(duration, axis=-1)
    step = tf.squeeze(step, axis=-1)

    # `step` and `duration` values should be non-negative
    step = tf.maximum(0, step)
    duration = tf.maximum(0, duration)

    return int(pitch), float(step), float(duration)

"""
The next two functions differ in how much of the original song they keep in
the output. regenerate_sequence keeps only the first seq_length notes, so it
suits generating the first bar; add_sequence keeps the whole input song, so it
suits adding subsequent bars.
"""

def regenerate_sequence(midi_file, num_predictions=25, temperature=2.0):
  raw_notes = midi_to_notes(midi_file)
  # key_order = ['pitch', 'step', 'duration'] is defined above
  # (see "Global Parameters Definition")
  sample_notes = np.stack([raw_notes[key] for key in key_order], axis=1)

  # The initial sequence of notes; pitch is normalized similar to training
  # sequences

  # seq_length = 25 and vocab_size = 128 are defined above
  # (see "Global Parameters Definition")
  input_notes = (sample_notes[:seq_length] / np.array([vocab_size, 1, 1]))

  # I don't know whether I need to do the transformation
  # "/ np.array([vocab_size, 1, 1])" to aux_1 or not
  aux_1 = np.stack([raw_notes[key] for key in ['pitch', 'step', 'duration',
                                               'start', 'end']], axis=1)
  aux_1 = aux_1[:seq_length]

  generated_notes = [tuple(x) for x in aux_1]

  # generated_notes = []
  prev_start = 0
  for _ in range(num_predictions):
    # model is a global loaded in the "Model Loading" section below
    pitch, step, duration = predict_next_note(input_notes, model, temperature)
    start = prev_start + step
    end = start + duration
    input_note = (pitch, step, duration)
    generated_notes.append((*input_note, start, end))
    input_notes = np.delete(input_notes, 0, axis=0)
    input_notes = np.append(input_notes, np.expand_dims(input_note, 0), axis=0)
    prev_start = start

  generated_notes = pd.DataFrame(generated_notes,
                                 columns=(*key_order, 'start', 'end'))

  # Debug output: number of notes before writing the MIDI file
  print(f'\nnr. notes: {len(generated_notes)}\n')

  # I don't know if the "instrument" part is going to work or not
  pm = pretty_midi.PrettyMIDI(midi_file)
  instrument = pm.instruments[0]

  out_file = 'output.mid'
  instrument_name = pretty_midi.program_to_instrument_name(instrument.program)
  out_pm = notes_to_midi(generated_notes, out_file=out_file,
                         instrument_name=instrument_name)

  # Debug output: number of notes read back from the written MIDI file
  raw_notes = midi_to_notes(out_file)
  aux = len(raw_notes)
  print(f'\nnr. notes: {aux}\n')

  return out_pm

def add_sequence(midi_file, num_predictions=25, temperature=2.0):
  raw_notes = midi_to_notes(midi_file)
  # key_order = ['pitch', 'step', 'duration'] is defined above
  # (see "Global Parameters Definition")
  sample_notes = np.stack([raw_notes[key] for key in key_order], axis=1)

  # The initial sequence of notes; pitch is normalized similar to training
  # sequences

  # seq_length = 25 and vocab_size = 128 are defined above
  # (see "Global Parameters Definition")
  input_notes = (sample_notes[:seq_length] / np.array([vocab_size, 1, 1]))

  # I don't know whether I need to do the transformation
  # "/ np.array([vocab_size, 1, 1])" to aux_1 or not
  aux_1 = np.stack([raw_notes[key] for key in ['pitch', 'step', 'duration',
                                               'start', 'end']], axis=1)

  generated_notes = [tuple(x) for x in aux_1]

  # generated_notes = []
  prev_start = 0
  for _ in range(num_predictions):
    # model is a global loaded in the "Model Loading" section below
    pitch, step, duration = predict_next_note(input_notes, model, temperature)
    start = prev_start + step
    end = start + duration
    input_note = (pitch, step, duration)
    generated_notes.append((*input_note, start, end))
    input_notes = np.delete(input_notes, 0, axis=0)
    input_notes = np.append(input_notes, np.expand_dims(input_note, 0), axis=0)
    prev_start = start

  generated_notes = pd.DataFrame(generated_notes,
                                 columns=(*key_order, 'start', 'end'))

  # I don't know if the "instrument" part is going to work or not
  pm = pretty_midi.PrettyMIDI(midi_file)
  instrument = pm.instruments[0]

  out_file = 'output.mid'
  instrument_name = pretty_midi.program_to_instrument_name(instrument.program)
  out_pm = notes_to_midi(generated_notes, out_file=out_file,
                         instrument_name=instrument_name)

  return out_pm


def display_audio(pm: pretty_midi.PrettyMIDI, seconds=0):
  waveform = pm.fluidsynth(fs=_SAMPLING_RATE)
  if seconds==0:
    seconds = int(pm.get_end_time())
  # Take a sample of the generated waveform to mitigate kernel resets
  waveform_short = waveform[:seconds*_SAMPLING_RATE]
  return display(Audio(waveform_short, rate=_SAMPLING_RATE))

"""
def display_audio(pm: pretty_midi.PrettyMIDI, seconds=0):
  waveform = pm.fluidsynth(fs=_SAMPLING_RATE)
  if seconds==0:
    seconds = int(pm.get_end_time())
  # Take a sample of the generated waveform to mitigate kernel resets
  waveform_short = waveform[:seconds*_SAMPLING_RATE]
  return display.Audio(waveform_short, rate=_SAMPLING_RATE)
"""


def regenerate(par):

  print("\n\nInput the generation parameters below (press ENTER to move to the next): \n")

  index = input(f"\t\t file index (default = {par[0]}): ")
  if index:
    par[0] = int(index)

  nr_notes = input(f"\t\t nr. of notes (default = {par[1]}): ")
  if nr_notes:
    par[1] = int(nr_notes)

  temperature = input(f"\t\t temperature (default = {par[2]}): ")
  if temperature:
    par[2] = float(temperature)

  print('\nGenerating a new bar...\n\n')

  index, nr_notes, temperature = par[0], par[1], par[2]
  output = regenerate_sequence(midi_file = filenames[index], num_predictions = nr_notes,
                        temperature = temperature)

  display_audio(output)

  time.sleep(1)

  ans = input('Do you want to re-generate from scratch the last bar (yes or no)? ')

  return ans, output, par

def add(song, par):

  print("\n\nInput the add parameters below (press ENTER to move to the next): \n")

  nr_notes = input(f"\t\t nr. of notes (default = {par[1]}): ")
  if nr_notes:
    par[1] = int(nr_notes)

  temperature = input(f"\t\t temperature (default = {par[2]}): ")
  if temperature:
    par[2] = float(temperature)

  print('\nGenerating a new bar...\n\n')

  nr_notes, temperature = par[1], par[2]
  output = add_sequence(midi_file = song, num_predictions = nr_notes,
                        temperature = temperature)

  display_audio(output)

  time.sleep(1)

  ans = input('Do you want to re-generate from scratch the last bar (yes or no)? ')

  return ans, output, par
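`predict_next_note` divides the pitch logits by `temperature` before sampling a pitch. The sketch below, in plain Python with made-up logits, shows how that division changes the probabilities the pitch is drawn from: low temperatures concentrate the mass on the highest logit, high temperatures flatten the distribution. `softmax_with_temperature` is a hypothetical helper written for this illustration and is not part of the notebook.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Probabilities obtained from logits after dividing by the temperature."""
    assert temperature > 0
    scaled = [x / temperature for x in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three pitches
for t in (0.5, 1.0, 2.0, 5.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

This is why raising the temperature prompt in `regenerate` or `add` tends to give less predictable pitches.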

## **Model Loading**

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
from google.colab import files
uploaded = files.upload()

Saving model.keras to model.keras


In [None]:
# Load the saved model; adjust the path if model.keras is stored elsewhere
model = tf.keras.models.load_model('model.keras')

## **Use/ Experiment**

There are 1282 MIDI files in our database. To try a different base composition for your new song, change the index into `filenames` below (any integer from 0 to 1281) and run the cell.

In [None]:
pm_original = pretty_midi.PrettyMIDI(filenames[77])
display_audio(pm_original, 15)

#### **The idea behind the loop**

In [None]:
"""
(1) Before entering the loop, we could let the user choose the base file for
the new song, from which the machine would start generating. For example:
"To try different MIDI base files, just give an integer index to filenames[]".

(2) We ask the user for the parameters for add_sequence(). The user can press
ENTER to keep any of the default parameters. Then we update the default
parameters to the user's preferences, generate the new audio, save the output
and display it. Then, the user listens to the new audio. Go to (3).

(3) Prompt the user with "Do you want to re-generate from scratch the last bar?"
  - If "yes", go to (2).

  - If "no", prompt the user with "Do you want to 'add' a generated bar,
  'make changes' or 'finish'?"
      - if 'add', go to (2).
      - if 'make changes'
          - download file
          - prompt the user with upload field for the modified file
          - ask whether he/she wishes to keep adding or finish
              - if add, go to (2)
              - if finish, say "Bye!"
      - if 'finish', say "Bye!"
"""

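The flow described in (2) and (3) can be read as a small state machine. The sketch below encodes only the decision logic, with no audio or file handling. The state names and the helpers `next_state` and `run` are hypothetical, and re-asking the same question on an unrecognized answer is an assumption made for the example; the description above does not say what should happen in that case.

```python
def next_state(state, answer):
    """Return the next step of the co-creation loop for a given user answer."""
    if state == "ask_regenerate":
        return {"yes": "generate", "no": "ask_action"}.get(answer, state)
    if state == "ask_action":
        return {"add": "generate", "make changes": "upload",
                "finish": "finish"}.get(answer, state)
    if state == "ask_after_upload":
        return {"add": "generate", "finish": "finish"}.get(answer, state)
    raise ValueError(f"unknown state: {state}")


def run(answers):
    """Replay a list of scripted answers and record the visited states."""
    state = "ask_regenerate"  # assumes a first bar has already been generated
    visited = [state]
    for answer in answers:
        state = next_state(state, answer)
        visited.append(state)
        if state == "finish":
            break
        # Steps that need no user input move on automatically
        if state == "generate":
            state = "ask_regenerate"
            visited.append(state)
        elif state == "upload":
            state = "ask_after_upload"
            visited.append(state)
    return visited


print(run(["yes", "no", "make changes", "finish"]))
```

Keeping the transitions in one function like this makes it easier to check that every answer leads somewhere sensible before wiring in generation and uploads.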

## <font color=red>Note: This cell is for statistical purposes only</font>
**Count human-made notes and machine-made notes**

In [None]:
# Assumes the user only changes the last generated bar.
# modified_notes counts generated notes that the user either deleted or modified.

def update_ratio(down_midi, up_midi, last_bar_length, human_notes, machine_notes, modified_notes):
  # Pitches of the file offered for download and of the file uploaded back
  down_raw_notes = np.array(midi_to_notes(down_midi)['pitch'])
  up_raw_notes = np.array(midi_to_notes(up_midi)['pitch'])

  size_diff = len(up_raw_notes) - len(down_raw_notes)

  nr_added = 0
  if size_diff > 0:
    # The uploaded file is longer: the extra notes were added by the user
    nr_added = size_diff
  else:
    # The uploaded file is shorter (or equal): the missing notes were deleted
    machine_notes += size_diff
    modified_notes += -size_diff

  # Compare pitches position by position over the last bar
  nr_modified = 0
  minim = min(len(up_raw_notes), len(down_raw_notes))
  # max(..., 0) avoids negative indices when a file has fewer notes than the bar
  for i in range(max(minim - last_bar_length, 0), minim):
    if up_raw_notes[i] != down_raw_notes[i]:
      nr_modified += 1

  machine_notes += -nr_modified
  human_notes += nr_added + nr_modified
  modified_notes += nr_modified

  return human_notes, machine_notes, modified_notes
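To make the bookkeeping in `update_ratio` easier to follow, the sketch below repeats the same counting on plain pitch lists instead of MIDI files. `count_changes` is a hypothetical helper and the pitch values are made up. Like the function above, it assumes the user only touches the last generated bar and compares notes position by position.

```python
def count_changes(before, after, last_bar_length):
    """Count added, removed and changed notes between two pitch lists."""
    size_diff = len(after) - len(before)
    added = max(size_diff, 0)     # extra notes in the uploaded file
    removed = max(-size_diff, 0)  # notes missing from the uploaded file

    # Compare only the last bar of the part both lists have in common
    shared = min(len(before), len(after))
    first = max(shared - last_bar_length, 0)
    changed = sum(1 for i in range(first, shared) if before[i] != after[i])
    return added, removed, changed


before = [60, 62, 64, 65, 67, 69]     # pitches offered for download
after = [60, 62, 64, 66, 67, 70, 72]  # pitches uploaded back by the user
added, removed, changed = count_changes(before, after, last_bar_length=3)
print(added, removed, changed)
```

In terms of the counters used above, `human_notes` would grow by `added + changed`, while `machine_notes` would shrink and `modified_notes` would grow by `changed + removed`.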

## **Welcome to the cycle of co-creativity!**
 <font color=red>Note: Run this cell to try our approach:</font>

In [None]:
par = [1281, 25, 2.0]
ans,output,par = regenerate(par)
machine_notes = par[1]
total_machine_notes = par[1]
output_midi_0  = 'output_midi_0.mid'
output.write(output_midi_0)
human_notes, modified_notes = 0, 0

while True:
    if ans == 'yes':
        # pm_original = pretty_midi.PrettyMIDI(filenames[par[0]])
        # display_audio(pm_original, 5)
        ans,output,par = regenerate(par)
        # The song restarts from scratch, so the machine-note counters restart too
        machine_notes = par[1]
        total_machine_notes = par[1]
        output_midi_0  = 'output_midi_0.mid'
        output.write(output_midi_0)
        # Re-check the answer so the user can regenerate more than once
        continue

    if ans == 'no':
        ans_1 = input("Do you want to 'add' a generated bar, 'make changes' or 'finish'? ")

        while ans_1 == 'add' or ans_1 == 'make changes':
            if ans_1 == 'add':
                ans_2,output,par = add(output_midi_0,par)
                # while the user wants to regenerate the last bar (eventually he/she will give up on it)
                while ans_2 == 'yes':
                    ans_2,output,par = add(output_midi_0,par)

                machine_notes += par[1]
                total_machine_notes += par[1]
                output_midi_1 = 'output_midi_1.mid'
                output.write(output_midi_1)

                output_midi_0 = output_midi_1
                ans_1 = input("Do you want to 'add' a generated bar, 'make changes' or 'finish'? ")

            if ans_1 == 'make changes':
                # The latest version of the song is the file output_midi_0 points to
                latest_midi = output_midi_0
                print("To download the generated MIDI file:")
                print("\t1 - click on the folder icon on the left")
                print(f"\t2 - hover over the file '{latest_midi}'")
                print("\t3 - click on the three dots and select 'Download'")
                print("\n Please upload the modified file to check the differences...\n")
                print("Make sure you upload a .mid file ...")
                uploaded_file = files.upload()
                for fn in uploaded_file.keys():
                    content = uploaded_file[fn]
                    with open('/content/output_midi_2.mid', 'wb') as f:
                        f.write(content)
                print("The uploaded file was saved as output_midi_2.mid.")
                output_midi_0 = 'output_midi_2.mid'

                human_notes, machine_notes, modified_notes = update_ratio(latest_midi, 'output_midi_2.mid', par[1], human_notes, machine_notes, modified_notes)

                ans_1 = input("Next, would you wish to 'add' a generated bar to the file you've just modified or 'finish'? ")

        if ans_1 == 'finish':
            print("Thank you for generating music with us!!")
            print(f"\nThe percentage of notes input by the user is: {human_notes/(human_notes + machine_notes) * 100}%\n")
            print(f"\nThe percentage of notes modified by the user is: {modified_notes/total_machine_notes * 100}%\n")
            break

    else:
        # Any answer other than 'yes' or 'no' ends the session
        print("Thank you for generating music with us!!")
        print(f"\nThe percentage of notes input by the user is: {human_notes/(human_notes + machine_notes) * 100}%\n")
        print(f"\nThe percentage of notes modified by the user is: {modified_notes/total_machine_notes * 100}%\n")
        break



Input the generation paramaters below (press ENTER to move to the next): 

		 file index (default = 1281): 300
		 nr. of notes (default = 25): 10
		 temperature (default = 2.0): 2

Generating a new bar...



nr. notes: 35


nr. notes: 31



Do you want to re-generate from scratch the last bar (yes or no)? yes


Input the generation paramaters below (press ENTER to move to the next): 

		 file index (default = 300): 300
		 nr. of notes (default = 10): 10
		 temperature (default = 2.0): 5

Generating a new bar...



nr. notes: 35


nr. notes: 26



Do you want to re-generate from scratch the last bar (yes or no)? no
Do you want to 'add' a generated bar, 'make changes' or 'finish'? make changes
To download the generated MIDI file:
	1 - click on the folder icon on left
	2 - hover over the file 'output_midi_1.mid
	3 - click on the three dots and select the correct option

 Please upload the modified file to check the differences...

Make sure you upload a .midi file ...


Saving output_midi_1.mid to output_midi_1.mid
The uploaded file was saved as output_midi_2.mid.
Next, would you wish to 'add' a generated bar to the file you've just modified or 'finish'? finish
Thank you for generating music with us!!

The percentage of notes input by the user is: 0.0%


The percentage of notes modified by the user is: 0.0%



<font color=red>Some concerns about our approach:</font>
* When we generate, for example, 10 notes and then add another 10, the final MIDI file contains only 35 notes instead of the expected 45.
* The base RNN model we used needs some modifications.
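One way to start investigating the first concern is to inspect the generated notes before they are written to MIDI. `predict_next_note` clamps `step` and `duration` at 0, so zero-length notes and repeated notes on the same onset are possible. Whether these account for the notes that go missing after the MIDI round trip is an assumption that would need checking, not an established cause. The hypothetical helper below counts such notes in a list of `(pitch, step, duration)` tuples; the note values and the `min_duration` threshold are made up for the example.

```python
def suspicious_notes(notes, min_duration=1e-3):
    """Count very short notes and notes repeating a pitch on the same onset."""
    too_short = 0
    repeated = 0
    seen = set()
    start = 0.0
    for pitch, step, duration in notes:
        start += step  # same onset arithmetic as notes_to_midi
        if duration < min_duration:
            too_short += 1
        key = (pitch, round(start, 6))
        if key in seen:
            repeated += 1
        seen.add(key)
    return too_short, repeated


notes = [(60, 0.0, 0.5), (60, 0.0, 0.4), (62, 0.3, 0.0), (64, 0.2, 0.25)]
too_short, repeated = suspicious_notes(notes)
print(too_short, repeated)
```

If both counts are zero for a file that still loses notes, the cause lies elsewhere and this check can be dropped.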

After generating for the 1st time

In [None]:
down_raw_notes = midi_to_notes('output_midi_0.mid')
down_raw_notes = np.array(down_raw_notes['pitch'])
len(down_raw_notes)

27

After adding

In [None]:
down_raw_notes = midi_to_notes('output_midi_1.mid')
down_raw_notes = np.array(down_raw_notes['pitch'])
len(down_raw_notes)

34