<a href="https://colab.research.google.com/github/GiovanniSorice/Deep_Music_Generator/blob/main/notebooks/Music_Generation_Transformer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Transformer Music Generator 



In this notebook, we use a Transformer (a Compressive Transformer) to generate music.


**This notebook was inspired by [Music_Generation_LSTM](https://colab.research.google.com/drive/19TQqekOlnOSW36VCL8CPVEQKBBukmaEQ#scrollTo=DDOBVWULXfpz), and part of its code comes from there.**




**Load dependencies**

In [2]:
pip install compressive_transformer_pytorch

Collecting compressive_transformer_pytorch
  Downloading https://files.pythonhosted.org/packages/30/39/b8caf2671abcb8615977c08766aa9f450addd6949f57c7dda87224e844b5/compressive_transformer_pytorch-0.3.20-py3-none-any.whl
Collecting mogrifier
  Downloading https://files.pythonhosted.org/packages/77/01/62a55d0f8048e788fce435f2ade6478f443e4e53ed9b89b55ba0fc42c198/mogrifier-0.0.3-py3-none-any.whl
Installing collected packages: mogrifier, compressive-transformer-pytorch
Successfully installed compressive-transformer-pytorch-0.3.20 mogrifier-0.0.3


In [3]:
# standard library
import glob
import pickle

# numerical / deep learning
import numpy as np
import torch
import tqdm
from torch.utils.data import DataLoader
from torchsummary import summary
from compressive_transformer_pytorch import CompressiveTransformer, AutoregressiveWrapper

# MIDI parsing and writing
from music21 import converter, instrument, stream, note, chord

In [4]:
# Set to False if you are not running
# this notebook in Google Colaboratory
run_on_colab = True

**Set hyperparameters**

In [20]:
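# output directory name:
output_dir = 'model_output/Transformer'

# training:
epochs = 100           # number of training iterations, one batch each (not full passes over the data)
batch_size = 32
max_batch_size = 4     # each batch is processed in chunks of this size (gradient accumulation)
learning_rate = 1e-1   # very high for Adam on a Transformer (1e-4 to 1e-3 is more usual); may explain the unstable loss below
# vector-space embedding (currently unused: create_network sets dim = sequence_length):
n_dim = 64
sequence_length = 16

# currently unused:
VALIDATE_EVERY = 100
GENERATE_EVERY = 500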



**Google drive configuration (only Colab)**

In [7]:
if run_on_colab:
  from google.colab import drive
  # This will prompt for authorization.
  drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


**Load data**

I obtained the original **MIDI files** from [The Lakh MIDI Dataset v0.1](https://colinraffel.com/projects/lmd/).

## Processing data

Let's parse the files and load them into **music21**.

In [8]:
file = "/content/drive/My Drive/ISPR_project/midi_songs/Andra tutto bene ('58).1.mid"
midi = converter.parse(file)
notes_to_parse = midi.flat.notes
for element in notes_to_parse[:10]:
  print(element, element.offset)

<music21.chord.Chord F3 F2> 4.0
<music21.note.Note A> 4.0
<music21.chord.Chord B1 F#3 F#2> 4.0
<music21.note.Note F> 4.0
<music21.chord.Chord C4 F4> 4.0
<music21.chord.Chord F#3 C#6 F#2> 4.5
<music21.note.Note C#> 4.75
<music21.chord.Chord F#2 E2 F#3> 5.0
<music21.chord.Chord A4 A3 F4 C4 A3> 5.0
<music21.note.Note F> 5.0


I will process all MIDI files, extracting data from each note or chord.

- If I process a **note**, I store a string with its pitch name and octave (for example, `F#3`).

- If I process a **chord** (remember that chords are sets of notes played at the same time), I store a different kind of string: numbers separated by dots. Each number is the pitch class (0 to 11, with C = 0) of a chord note, taken from music21's `normalOrder`, so the octave information is lost.

Note that **I am not yet considering the time offset of each element**. In this first version, all notes and chords are treated as having the same duration; I may take offsets into account in a future version.

I am building one big list with all the elements of all the compositions.

In [9]:
notes = []
for i,file in enumerate(glob.glob("/content/drive/My Drive/ISPR_project/midi_songs/*.mid")):
  midi = converter.parse(file)
  print('\r', 'Parsing file ', i, " ",file, end='')
  try:  # file has instrument parts: take the first part
    s2 = instrument.partitionByInstrument(midi)
    notes_to_parse = s2.parts[0].recurse()
  except Exception:  # file has notes in a flat structure
    notes_to_parse = midi.flat.notes
  for element in notes_to_parse:
    if isinstance(element, note.Note):
      notes.append(str(element.pitch))
    elif isinstance(element, chord.Chord):
      notes.append('.'.join(str(n) for n in element.normalOrder))
with open('notes', 'wb') as filepath:
  pickle.dump(notes, filepath)

 Parsing file  3   /content/drive/My Drive/ISPR_project/midi_songs/Andra tutto bene ('58).1.mid
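The two encodings described above can be sketched without music21. This is a simplified illustration, not the notebook's code: the helpers `chord_token` and `is_chord_token` are hypothetical, chords are given as MIDI pitch numbers, and the pitch classes are simply sorted, whereas music21's `normalOrder` also rotates them into their most compact ordering, so the two tokens can differ for some chords. It also shows why a bare number such as `5` is a chord token: the first chord printed earlier, F3 + F2, has only one distinct pitch class.

```python
def chord_token(midi_pitches):
    """Simplified chord token: distinct pitch classes (C = 0, ..., B = 11), sorted and dot-joined.

    Hypothetical helper. music21's normalOrder additionally rotates the pitch
    classes into their most compact ordering, so its tokens can differ.
    """
    pitch_classes = sorted({p % 12 for p in midi_pitches})
    return '.'.join(str(pc) for pc in pitch_classes)


def is_chord_token(token):
    """Chord tokens contain dots or are a bare number; note tokens look like 'F#3'."""
    return '.' in token or token.isdigit()


# C major triad C4-E4-G4 (MIDI 60, 64, 67)
print(chord_token([60, 64, 67]))  # 0.4.7
# F3 + F2 (MIDI 53, 41) share pitch class 5, so the token is a single number
print(chord_token([53, 41]))      # 5
print(is_chord_token('5'), is_chord_token('4.6.11'), is_chord_token('F#3'))  # True True False
```

The same test (`'.' in pattern or pattern.isdigit()`) is what the MIDI-writing cell at the end of the notebook uses to tell chords from notes.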

Next, I count the distinct tokens (notes and chords) in the dataset, because this is the **number of possible output classes** of the model.

In [10]:
# Count different possible outputs
n_vocab = (len(set(notes)))
n_vocab

71

**Preprocess data**

Now there is some **data processing** to do:

- I will map each pitch or chord to an integer
- I will create pairs of input sequences and their corresponding output notes

I can try different values of **sequence_length** to obtain different results. In this version, I use a sequence_length of 16.

The network predicts the next note (or chord) based on the previous *sequence_length* notes (or chords).


In [12]:
# get all pitch names
pitchnames = sorted(set(item for item in notes))
# create a dictionary to map pitches to integers
note_to_int = dict((note, number) for number, note in enumerate(pitchnames))
network_input = []
# create input sequences; the AutoregressiveWrapper builds the next-token targets
# itself by shifting each sequence by one position, so no separate outputs are stored
for i in range(0, len(notes) - sequence_length, 1):
  # map the pitches of each window to integers
  network_input.append([note_to_int[char] for char in notes[i:i + sequence_length]])
n_patterns = len(network_input)
# convert to a 2-D integer array of shape (n_patterns, sequence_length)
network_input = np.reshape(network_input, (n_patterns, sequence_length))
# no normalization: the model embeds the integer token ids directly


Let's check the shape of `network_input`:

In [13]:
network_input.shape

(4987, 16)
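The mapping and windowing done in the preprocessing cell can be reproduced on a toy token list (made up for illustration; the `toy_` names are hypothetical and chosen so they do not overwrite the notebook's variables). Each window of length `toy_seq_len` is paired with the token that follows it. Since the number of windows is `len(notes) - sequence_length`, the shape `(4987, 16)` above means the dataset holds 4987 + 16 = 5003 tokens.

```python
# Toy token list (hypothetical data, for illustration only)
toy_notes = ['C4', '0.4.7', 'E4', 'G4', '0.4.7', 'C4']
toy_seq_len = 3

# same mapping as the notebook: sorted unique tokens -> consecutive integers
toy_vocab = sorted(set(toy_notes))
toy_note_to_int = {token: number for number, token in enumerate(toy_vocab)}

# sliding windows and the token that follows each window
n_windows = len(toy_notes) - toy_seq_len
windows = [[toy_note_to_int[t] for t in toy_notes[i:i + toy_seq_len]] for i in range(n_windows)]
targets = [toy_note_to_int[toy_notes[i + toy_seq_len]] for i in range(n_windows)]

print(toy_note_to_int)  # {'0.4.7': 0, 'C4': 1, 'E4': 2, 'G4': 3}
print(windows)          # [[1, 0, 2], [0, 2, 3], [2, 3, 0]]
print(targets)          # [3, 0, 1]
```

Note that chord tokens sort before note tokens, because digits come before letters in string order.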

**Design neural network architecture** 

In [16]:
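def create_network(sequence_length, n_vocab):
    """ create the structure of the neural network """
    model = CompressiveTransformer(
        num_tokens = n_vocab,        # vocabulary size (distinct notes and chords)
        dim = sequence_length,       # model width; note this reuses sequence_length (16), n_dim is not used
        depth = 6,                   # number of layers
        seq_len = sequence_length,   # length of each input segment
        mem_len = sequence_length,   # length of the regular memory
        cmem_len = 256,              # length of the compressed memory
        cmem_ratio = 4,              # how many memory states are compressed into one
        memory_layers = [5, 6]       # layers that use the (compressed) memory
    )

    # the wrapper adds the next-token loss (return_loss=True) and generate()
    model = AutoregressiveWrapper(model)
    model.cuda()
    return model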

In [17]:
model = create_network(sequence_length,n_vocab)

print(model)
#for loss, aux_loss, _ in model(inputs, return_loss = True):
#    (loss + aux_loss).backward()


AutoregressiveWrapper(
  (net): CompressiveTransformer(
    (token_emb): Embedding(71, 16)
    (to_model_dim): Identity()
    (to_logits): Sequential(
      (0): Identity()
      (1): Linear(in_features=16, out_features=71, bias=True)
    )
    (attn_layers): ModuleList(
      (0): GRUGating(
        (fn): PreNorm(
          (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True)
          (fn): SelfAttention(
            (compress_mem_fn): ConvCompress(
              (conv): Conv1d(16, 16, kernel_size=(4,), stride=(4,))
            )
            (to_q): Linear(in_features=16, out_features=16, bias=False)
            (to_kv): Linear(in_features=16, out_features=32, bias=False)
            (to_out): Linear(in_features=16, out_features=16, bias=True)
            (attn_dropout): Dropout(p=0.0, inplace=False)
            (dropout): Dropout(p=0.0, inplace=False)
            (reconstruction_attn_dropout): Dropout(p=0.0, inplace=False)
          )
        )
        (gru): GRUCell(16, 16)
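The `ConvCompress` module in the printout (a `Conv1d` with `kernel_size=4` and `stride=4`, matching `cmem_ratio = 4`) is what gives the Compressive Transformer its name: hidden states that fall out of the regular memory are not discarded but compressed into a shorter compressed memory. The toy sketch below illustrates that update rule with scalars standing in for hidden-state vectors. It is a simplification, not the library's code: it uses mean pooling instead of the learned convolution, and `update_memories` is a hypothetical name.

```python
def update_memories(mem, cmem, new_states, mem_len, cmem_len, ratio):
    """Toy Compressive Transformer memory update (scalars instead of hidden-state vectors).

    New states are appended to the memory; states pushed out of the memory
    are compressed by mean pooling every `ratio` of them (the real model uses
    a learned Conv1d) and appended to the compressed memory, which keeps only
    its newest `cmem_len` entries.
    """
    mem = mem + new_states
    overflow = len(mem) - mem_len
    if overflow > 0:
        evicted, mem = mem[:overflow], mem[overflow:]
        groups = [evicted[i:i + ratio] for i in range(0, len(evicted), ratio)]
        compressed = [sum(g) / len(g) for g in groups]
        cmem = (cmem + compressed)[-cmem_len:] if cmem_len > 0 else []
    return mem, cmem


mem, cmem = [], []
for step in range(3):
    segment = [float(4 * step + k) for k in range(4)]  # 4 new "hidden states" per segment
    mem, cmem = update_memories(mem, cmem, segment, mem_len=4, cmem_len=2, ratio=2)
    print(step, mem, cmem)
# 0 [0.0, 1.0, 2.0, 3.0] []
# 1 [4.0, 5.0, 6.0, 7.0] [0.5, 2.5]
# 2 [8.0, 9.0, 10.0, 11.0] [4.5, 6.5]
```

Under this simplified rule, the notebook's settings (segments of up to 16 states, `mem_len = 16`, `cmem_ratio = 4`) would turn each evicted segment into 4 compressed entries.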

In [18]:
def cycle(loader):
    """Yield batches from the loader forever, starting a new pass when it is exhausted."""
    while True:
        for data in loader:
            yield data


data_train = torch.from_numpy(network_input).cuda()
cycle_train_loader = cycle(DataLoader(data_train, batch_size=batch_size))
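Why a hand-written `cycle` instead of `itertools.cycle`? `itertools.cycle` iterates its input once, saves a copy of every item, and then replays the saved items, whereas the generator above starts a fresh pass over the loader each time it is exhausted. With a `DataLoader` created with `shuffle=True` (not used in this notebook), only the hand-written version would reshuffle on every pass, and it does not keep every batch in memory. `CountingLoader` is a hypothetical stand-in for a `DataLoader` that counts how many passes are started.

```python
import itertools


class CountingLoader:
    """Hypothetical stand-in for a DataLoader that counts how many passes are started."""

    def __init__(self, batches):
        self.batches = batches
        self.passes = 0

    def __iter__(self):
        self.passes += 1
        return iter(self.batches)


def cycle(loader):
    """Same generator as in the notebook: restart the loader whenever it is exhausted."""
    while True:
        for data in loader:
            yield data


loader = CountingLoader(['b0', 'b1', 'b2'])
gen = cycle(loader)
first = [next(gen) for _ in range(7)]
print(first, loader.passes)    # ['b0', 'b1', 'b2', 'b0', 'b1', 'b2', 'b0'] 3

loader2 = CountingLoader(['b0', 'b1', 'b2'])
gen2 = itertools.cycle(loader2)
second = [next(gen2) for _ in range(7)]
print(second, loader2.passes)  # same batches, but only 1 pass: later items are replayed from the saved copy
```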

In [23]:
# optimizer

optim = torch.optim.Adam(model.parameters(), lr=learning_rate)

In [24]:
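# training
# Each iteration takes ONE batch from cycle_train_loader, so 100 iterations of
# 32 sequences cover less than one pass over the 4987 training sequences.
grad_accum_every = batch_size // max_batch_size  # 32 // 4 = 8 chunks per batch

for i in tqdm.tqdm(range(epochs), mininterval=10., desc='training'):
    model.train()
    avg_loss = 0.0

    # the wrapper splits the batch into chunks of max_batch_size and yields
    # (next-token loss, auxiliary memory-reconstruction loss, is_last) per chunk
    for mlm_loss, aux_loss, is_last in model(next(cycle_train_loader), max_batch_size=max_batch_size, return_loss=True):
        loss = mlm_loss + aux_loss
        # scale each chunk so the accumulated gradient matches the full-batch mean
        (loss / grad_accum_every).backward()

        # note: dividing by batch_size (32) instead of by the number of chunks (8)
        # reports one quarter of the mean chunk loss
        avg_loss += loss.item() / batch_size

        if is_last:
            torch.nn.utils.clip_grad_norm_(model.parameters(), 0.5)
            optim.step()
            optim.zero_grad()

    print(f'training loss: {avg_loss:.4f}')

print('Training complete.')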







training:   0%|          | 0/100 [00:00<?, ?it/s]

training loss: 0.8862
training loss: 0.8651
training loss: 0.9273
training loss: 0.9819
training loss: 0.8828
training loss: 0.8966
training loss: 0.8635
training loss: 0.9492
training loss: 0.8561
training loss: 0.7338
training loss: 0.8892
training loss: 0.8995
training loss: 0.7427
training loss: 1.0554
training loss: 1.0671
training loss: 0.8517
training loss: 0.6186
training loss: 0.5559
training loss: 0.4928
training loss: 0.4356
training loss: 0.4190
training loss: 0.4360
training loss: 1.6494
training loss: 1.7879
training loss: 1.7013
training loss: 1.5919
training loss: 1.6565
training loss: 0.9834
training loss: 1.1374
training loss: 0.9051
training loss: 0.8638
training loss: 1.1797
training loss: 1.0949
training loss: 0.8970
training loss: 2.0774
training loss: 2.1825
training loss: 1.8995
training loss: 2.0409
training loss: 1.3838
training loss: 1.1878
training loss: 1.0832
training loss: 1.5420
training loss: 1.7655
training loss: 1.6899
training loss: 2.3685
training l


training:  50%|█████     | 50/100 [00:10<00:10,  4.95it/s]

training loss: 1.5679
training loss: 1.2198


training:  51%|█████     | 51/100 [00:30<00:09,  5.04it/s]

training loss: 1.3785
training loss: 1.1233
training loss: 0.9529
training loss: 1.2272
training loss: 1.1438
training loss: 1.4745
training loss: 1.3102
training loss: 0.8996
training loss: 1.0432
training loss: 1.1418
training loss: 1.0032
training loss: 1.1581
training loss: 0.9525
training loss: 0.7799
training loss: 0.8819
training loss: 0.8967
training loss: 0.7695
training loss: 1.0040
training loss: 0.8082
training loss: 0.7679
training loss: 1.0046
training loss: 0.8354
training loss: 0.8012
training loss: 0.8823
training loss: 0.6618
training loss: 0.7689
training loss: 0.8271
training loss: 0.8306
training loss: 1.0038
training loss: 0.6479
training loss: 0.6644
training loss: 0.8270
training loss: 0.8321
training loss: 0.9220
training loss: 0.7226
training loss: 0.6649
training loss: 0.8286
training loss: 0.8174
training loss: 0.6721
training loss: 2.6208
training loss: 2.6112
training loss: 3.7898
training loss: 2.1128
training loss: 1.7219
training loss: 1.2591
training l

training: 100%|██████████| 100/100 [00:19<00:00,  5.02it/s]

training loss: 1.4283
training loss: 1.6007
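The training cell uses gradient accumulation: each batch of `batch_size = 32` sequences is processed in chunks of `max_batch_size = 4`, and every chunk loss is divided by `grad_accum_every = 8` before `backward()`. When the chunks have equal size, the gradients summed over the chunks then equal the gradient of the mean loss over the whole batch. The pure-Python check below verifies that identity on a made-up one-parameter least-squares model with a hand-written gradient; the data and the `toy_` names are illustrative only.

```python
def grad_mse(w, xs, ys):
    """Derivative with respect to w of mean((w * x - y) ** 2) over the examples."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


# made-up data for a one-parameter model y = w * x
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 14.0, 16.2]
w = 0.5

toy_batch_size, toy_chunk_size = 8, 2
toy_accum = toy_batch_size // toy_chunk_size  # 4 chunks

full_batch_grad = grad_mse(w, xs, ys)

# accumulate the gradient chunk by chunk, scaling each one by 1 / toy_accum
accumulated_grad = 0.0
for start in range(0, toy_batch_size, toy_chunk_size):
    chunk_x = xs[start:start + toy_chunk_size]
    chunk_y = ys[start:start + toy_chunk_size]
    accumulated_grad += grad_mse(w, chunk_x, chunk_y) / toy_accum

print(full_batch_grad, accumulated_grad)  # equal up to floating-point rounding
```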





In [35]:
# note: np.array((1, 16)) is the 1-D array [1, 16], so its shape is (2,)
size_input = np.array((1,16))
size_input.shape
#summary(model, size_input)

(2,)

If we want to resume training from where we left off, we should load previously trained weights into the model.

This is very useful in Google Colaboratory, which usually shuts down the virtual machine running the notebook after a certain amount of time. If this happens to you, look for the latest weights file in your configured Drive account and load it before training the network again.


In [None]:
# In case we want to use previously trained weights
# (a file saved with torch.save(model.state_dict(), path))
weights = ""
if len(weights) > 0:
    model.load_state_dict(torch.load(weights))

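**Configure checkpoints**

In [None]:
# PyTorch replacement for the Keras ModelCheckpoint callback (save_best_only=True, mode='min');
# call save_checkpoint(model, i, avg_loss) at the end of each training iteration
filepath = "/content/drive/My Drive/ISPR_project/Transformer{epoch:02d}-{loss:.4f}.pt"
best_loss = float('inf')

def save_checkpoint(model, epoch, loss):
    global best_loss
    if loss < best_loss:  # only save when the loss improves
        best_loss = loss
        torch.save(model.state_dict(), filepath.format(epoch=epoch, loss=loss))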

**Music generation**

In [None]:
# In case we want to use other previously trained weights
weights = "path/to/weights"  # replace with the path of a checkpoint saved with torch.save
if len(weights) > 0:
    model.load_state_dict(torch.load(weights))

In [None]:
# Generate network input again
network_input = []
output = []
for i in range(0, len(notes) - sequence_length, 1):
  sequence_in = notes[i:i + sequence_length]
  sequence_out = notes[i + sequence_length]
  network_input.append([note_to_int[char] for char in sequence_in])
  output.append(note_to_int[sequence_out])
n_patterns = len(network_input)

The workflow now is:


1.   Pick a **seed sequence** randomly from your list of inputs (*pattern* variable)
2.   Pass it as input for your model to generate a new element (note or chord)
3.   Add the new element to your final song and to your *pattern* list
4.   Remove the first item from *pattern*
5.   Go to step 2
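Steps 1 to 5 can be sketched as a plain loop. `predict_next` is a hypothetical stand-in for the trained network (here a toy rule that returns the last token plus one, modulo 5), and the windows are made-up integer sequences; in this notebook, `model.generate` takes care of the loop itself, so the sketch only illustrates the workflow.

```python
import random


def generate_sliding_window(seed, predict_next, n_steps):
    """Generate n_steps tokens, feeding the predictor a fixed-length window each time."""
    pattern = list(seed)
    song = []
    for _ in range(n_steps):                # step 5: repeat from step 2
        new_token = predict_next(pattern)   # step 2: predict the next element
        song.append(new_token)              # step 3: add it to the song...
        pattern.append(new_token)           # ...and to the pattern
        pattern = pattern[1:]               # step 4: drop the oldest element
    return song


def toy_predict_next(pattern):
    """Hypothetical stand-in for the network: last token + 1, modulo 5."""
    return (pattern[-1] + 1) % 5


windows = [[0, 1, 2], [1, 2, 3], [2, 3, 4]]  # toy input windows
seed = random.choice(windows)                # step 1: random seed sequence
print(seed, generate_sliding_window(seed, toy_predict_next, 4))
print(generate_sliding_window([0, 1, 2], toy_predict_next, 4))  # [3, 4, 0, 1]
```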


In [160]:
""" Generate notes from the neural network based on a sequence of notes """
# pick a random sequence from the input as a starting point for the prediction
start = np.random.randint(0, len(network_input)-1)
int_to_note = dict((number, note) for number, note in enumerate(pitchnames))
pattern = torch.from_numpy(network_input[start]).cuda()

prediction_output = model.generate(pattern, 500)


<enumerate object at 0x7efbbf11ce10>


In [185]:
result_sample = []

# convert each generated integer back to its note/chord string
for i, token in enumerate(prediction_output.tolist()):
  result = int_to_note[token]
  print(f'Predicted {i}: {result}')
  result_sample.append(result)

prediction_output = result_sample

Predicted 0: 6
Predicted 1: G5
Predicted 2: 4.6.11
Predicted 3: G4
Predicted 4: G5
Predicted 5: F#3
Predicted 6: G5
Predicted 7: C5
Predicted 8: F#3
Predicted 9: G4
Predicted 10: C5
Predicted 11: G1
Predicted 12: G5
Predicted 13: G1
Predicted 14: C5
Predicted 15: G5
Predicted 16: F#3
Predicted 17: F#3
Predicted 18: G1
Predicted 19: G5
Predicted 20: C5
Predicted 21: G1
Predicted 22: C5
Predicted 23: G5
Predicted 24: 6
Predicted 25: G5
Predicted 26: C5
Predicted 27: G5
Predicted 28: 4.6.11
Predicted 29: 4.6.11
Predicted 30: G1
Predicted 31: F#3
Predicted 32: 6
Predicted 33: F#3
Predicted 34: G4
Predicted 35: G4
Predicted 36: G5
Predicted 37: G4
Predicted 38: G4
Predicted 39: F#3
Predicted 40: F#3
Predicted 41: F#3
Predicted 42: C5

The last step is creating a MIDI file from the predictions.

**music21** helps us again here: we create a **Stream** and add the predicted notes and chords to it.

Consecutive elements are placed 0.5 apart. music21 offsets are measured in quarter lengths, so each element starts an eighth note after the previous one.
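With 500 generated elements placed 0.5 quarter lengths apart, the timeline of the output file can be worked out directly. Two assumptions are made here: new music21 notes and chords keep their default duration of one quarter length (the cell below does not set `quarterLength`), and the file plays at 120 BPM, the standard MIDI default tempo when none is set. Under these assumptions consecutive elements overlap by half a quarter note, and the piece lasts a little over two minutes.

```python
n_elements = 500                 # tokens returned by model.generate(pattern, 500)
step = 0.5                       # offset increment between elements, in quarter lengths
note_length = 1.0                # assumption: music21's default quarterLength for a Note/Chord
seconds_per_quarter = 60 / 120   # assumption: 120 BPM (MIDI default tempo)

offsets = [i * step for i in range(n_elements)]
end = offsets[-1] + note_length       # last element starts at 249.5 and ends at 250.5
overlap = note_length - step          # how long each element still sounds after the next starts

print(end, end * seconds_per_quarter, overlap)  # 250.5 125.25 0.5
```

Setting each element's `quarterLength` to 0.5 would remove the overlap, if non-overlapping notes are preferred.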

In [186]:
offset = 0
output_notes = []
# create note and chord objects based on the values generated by the model
for pattern in prediction_output:
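    # pattern is a chord: dot-separated pitch classes, or a single pitch-class number
    if ('.' in pattern) or pattern.isdigit():
        notes_in_chord = pattern.split('.')
        # a separate name, so the training data in `notes` is not overwritten
        chord_notes = []
        for current_note in notes_in_chord:
            new_note = note.Note(int(current_note))
            new_note.storedInstrument = instrument.Piano()
            chord_notes.append(new_note)
        new_chord = chord.Chord(chord_notes)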
        new_chord.offset = offset
        output_notes.append(new_chord)
    # pattern is a note
    else:
        new_note = note.Note(pattern)
        new_note.offset = offset
        new_note.storedInstrument = instrument.Piano()
        output_notes.append(new_note)

    # increase offset each iteration so that notes do not stack
    offset += 0.5

midi_stream = stream.Stream(output_notes)
midi_stream.write('midi', fp='test_output.mid')

'test_output.mid'