<a href="https://colab.research.google.com/github/GiovanniSorice/Deep_Music_Generator/blob/main/notebooks/Music_Generation_Transformer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Transformer Music Generator 



In this notebook, we use a Transformer (the `CompressiveTransformer` from `compressive_transformer_pytorch`) to generate some music.


**This notebook was inspired by [Music_Generation_LSTM](https://colab.research.google.com/drive/19TQqekOlnOSW36VCL8CPVEQKBBukmaEQ#scrollTo=DDOBVWULXfpz), and part of the code comes from it.**




**Load dependencies**

In [1]:
pip install compressive_transformer_pytorch

Collecting compressive_transformer_pytorch
  Downloading https://files.pythonhosted.org/packages/30/39/b8caf2671abcb8615977c08766aa9f450addd6949f57c7dda87224e844b5/compressive_transformer_pytorch-0.3.20-py3-none-any.whl
Collecting mogrifier
  Downloading https://files.pythonhosted.org/packages/77/01/62a55d0f8048e788fce435f2ade6478f443e4e53ed9b89b55ba0fc42c198/mogrifier-0.0.3-py3-none-any.whl
Installing collected packages: mogrifier, compressive-transformer-pytorch
Successfully installed compressive-transformer-pytorch-0.3.20 mogrifier-0.0.3


In [2]:
import torch
import tqdm
import numpy as np
import pandas as pd
import tensorflow as tf
import os
from compressive_transformer_pytorch import CompressiveTransformer
from compressive_transformer_pytorch import AutoregressiveWrapper
from torchsummary import summary
from torch.utils.data import DataLoader, Dataset
from tensorflow.keras import utils
from sklearn.metrics import roc_auc_score 
import matplotlib.pyplot as plt
import glob
import pickle
from music21 import converter, instrument, stream, note, chord
import math
import shutil

In [3]:
# Set to false if you are not running
# this notebook in Google Colaboratory
run_on_colab = True

**Set hyperparameters**

In [56]:
# output directory name:
output_dir = '/content/drive/My Drive/ISPR_project/Transformer/'
current_path ='/content/drive/My Drive/ISPR_project/'
# training:
epochs = 100
batch_size = 64
learning_rate=1e-3
# vector-space embedding: 
n_dim = 64 
sequence_length = 128


VALIDATE_EVERY  = 5

GENERATE_EVERY  = 500



**Save model function**

In [5]:
def save_checkpoint(state, is_best, filename='checkpoint.pth.tar'):
    torch.save(state, output_dir+filename)
    if is_best:
        shutil.copyfile(output_dir+filename, output_dir+'model_best.pth.tar')

**Google drive configuration (only Colab)**

In [6]:
if(run_on_colab):
  from google.colab import drive
  # This will prompt for authorization.
  drive.mount('/content/drive')

Mounted at /content/drive


**Load data** \\
The original **MIDI files** come from [The Lakh MIDI Dataset v0.1](https://colinraffel.com/projects/lmd/).

## Processing data

Let's process the files and load them into **music21**.

In [8]:
file = current_path+"midi_songs/small_dataset/Metal/Metallica/Am I Evil?.mid"
midi = converter.parse(file)
notes_to_parse = midi.flat.notes
for element in notes_to_parse[:10]:
  print(element, element.offset)

<music21.chord.Chord E2 E3 B3 E4> 0.0
<music21.chord.Chord E2 E3 B3 E4> 0.0
<music21.note.Note E> 0.0
<music21.chord.Chord C2 C#3> 0.0
<music21.note.Note G#> 2.0
<music21.chord.Chord D3 A3 D4> 3.0
<music21.chord.Chord D3 A3 D4> 3.0
<music21.note.Note D> 3.0
<music21.chord.Chord C#3 C2> 3.0
<music21.chord.Chord B3 E3 E4> 3.5


I will process all MIDI files, extracting data from each note or chord.

- If I process a **note**, I will store in the list a string representing its pitch (the note name) and octave.

- If I process a **chord** (remember that a chord is a set of notes played at the same time), I will store a different kind of string: numbers separated by dots, where each number represents the pitch of one chord note.

As you can see, **I am not yet considering the time offset of each element**. In this first version all the notes and chords will have the same duration. Maybe, in the future, I will consider offsets too.
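The encoding rule above can be sketched without music21. This is only an illustration: the `("note", ...)` and `("chord", ...)` tuples and the `encode_element` helper are hypothetical stand-ins for parsed music21 objects, and the chord numbers are assumed to be the pitch-class integers that `normalOrder` returns.

```python
# Hypothetical stand-ins for parsed elements (not music21 objects):
#   ("note", "E4")        -> a single note: pitch name plus octave
#   ("chord", [4, 11])    -> a chord: list of pitch-class integers
def encode_element(kind, value):
    """Return the string token stored in the notes list for one element."""
    if kind == "note":
        return str(value)                       # e.g. "E4"
    if kind == "chord":
        return ".".join(str(n) for n in value)  # e.g. "4.11"
    raise ValueError(f"unsupported element kind: {kind!r}")

parsed = [("chord", [4, 11]), ("note", "E4"), ("note", "G#2"), ("chord", [2, 9])]
tokens = [encode_element(kind, value) for kind, value in parsed]
print(tokens)  # ['4.11', 'E4', 'G#2', '2.9']
```

Because durations and offsets are dropped, two elements with the same pitches map to the same token regardless of when they are played.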

I am creating one big list with all the elements of all the compositions.

In [10]:
notes = []
for i,file in enumerate(glob.glob(current_path+"midi_songs/small_dataset/*/*/*.mid")):
  midi = converter.parse(file)
  print('Parsing file ', i, " ",file)
  notes_to_parse = None
  try: # file has instrument parts
    s2 = instrument.partitionByInstrument(midi)
    notes_to_parse = s2.recurse() 
  except Exception: # file has notes in a flat structure
    notes_to_parse = midi.flat.notes
  for element in notes_to_parse:
    if isinstance(element, note.Note):
      notes.append(str(element.pitch))
    elif isinstance(element, chord.Chord):
      notes.append('.'.join(str(n) for n in element.normalOrder))
with open('notes', 'wb') as filepath:
  pickle.dump(notes, filepath)

Parsing file  0   /content/drive/My Drive/ISPR_project/midi_songs/small_dataset/Pop_rock/883/Nessun rimpianto.1.mid
Parsing file  1   /content/drive/My Drive/ISPR_project/midi_songs/small_dataset/Pop_rock/883/Grazie mille.1.mid
Parsing file  2   /content/drive/My Drive/ISPR_project/midi_songs/small_dataset/Pop_rock/883/Andra tutto bene ('58).1.mid
Parsing file  3   /content/drive/My Drive/ISPR_project/midi_songs/small_dataset/Pop_rock/883/Andra tutto bene ('58).mid
Parsing file  4   /content/drive/My Drive/ISPR_project/midi_songs/small_dataset/Pop_rock/883/Hanno ucciso l'uomo ragno.1.mid
Parsing file  5   /content/drive/My Drive/ISPR_project/midi_songs/small_dataset/Pop_rock/883/Hanno ucciso l'uomo ragno.mid
Parsing file  6   /content/drive/My Drive/ISPR_project/midi_songs/small_dataset/Pop_rock/test/I'll Be Over You.mid
Parsing file  7   /content/drive/My Drive/ISPR_project/midi_songs/small_dataset/Pop_rock/test/Non ti passa piu.mid
Parsing file  8   /content/drive/My Drive/ISPR_proje

In [12]:
notes_validation = []
for i,file in enumerate(glob.glob(current_path+"midi_songs/test/*.mid")):
  midi = converter.parse(file)
  print( 'Parsing file ', i, " ",file)
  notes_to_parse = None
  try: # file has instrument parts
    s2 = instrument.partitionByInstrument(midi)
    notes_to_parse = s2.recurse() 
  except Exception: # file has notes in a flat structure
    notes_to_parse = midi.flat.notes
  for element in notes_to_parse:
    if isinstance(element, note.Note):
      notes_validation.append(str(element.pitch))
    elif isinstance(element, chord.Chord):
      notes_validation.append('.'.join(str(n) for n in element.normalOrder))
# use a separate file so the training notes pickle is not overwritten
with open('notes_validation', 'wb') as filepath:
  pickle.dump(notes_validation, filepath)

Parsing file  0   /content/drive/My Drive/ISPR_project/midi_songs/test/I Disappear.mid
Parsing file  1   /content/drive/My Drive/ISPR_project/midi_songs/test/Hit the Lights.mid
Parsing file  2   /content/drive/My Drive/ISPR_project/midi_songs/test/Fight Fire With Fire.mid
Parsing file  3   /content/drive/My Drive/ISPR_project/midi_songs/test/Smile.mid
Parsing file  4   /content/drive/My Drive/ISPR_project/midi_songs/test/Another One Bites The Dust.2.mid
Parsing file  5   /content/drive/My Drive/ISPR_project/midi_songs/test/Bicycle Race.1.mid
Parsing file  6   /content/drive/My Drive/ISPR_project/midi_songs/test/Se tornerai.1.mid
Parsing file  7   /content/drive/My Drive/ISPR_project/midi_songs/test/Non ti passa piu.mid
Parsing file  8   /content/drive/My Drive/ISPR_project/midi_songs/test/I'll Be Over You.mid


I obtain the number of different notes in our dataset, because this will be the **number of possible output classes**  of our model.

In [48]:
# Count different possible outputs
n_vocab = (len(set(notes)))
n_vocab

476

In [49]:
# Count different possible outputs in the validation set
print(len(set(notes_validation)))

287
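The validation sequences are later mapped to integers with the dictionary built from the training vocabulary, so any validation token that never occurs in the training notes would raise a `KeyError`. A small check along these lines can catch that early. The helper name `out_of_vocabulary` and the demo lists are made up for illustration.

```python
def out_of_vocabulary(train_tokens, validation_tokens):
    """Return the validation tokens that never appear in the training tokens."""
    train_vocab = set(train_tokens)
    return sorted(set(validation_tokens) - train_vocab)

# Toy data, not taken from the real dataset
train_demo = ["E4", "4.11", "G#2", "E4"]
validation_demo = ["E4", "2.9", "G#2"]

missing = out_of_vocabulary(train_demo, validation_demo)
print(missing)  # ['2.9']
```

In the notebook this would be called as `out_of_vocabulary(notes, notes_validation)`; an empty result means every validation token has an integer id.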


**Preprocess data** \\
Now, there is some **data processing** that I have to do:

- I will map each pitch or chord to an integer
- I will create fixed-length input sequences of notes (the code below builds only the inputs; `network_output` stays empty)

I can try different values of **sequence_length** to obtain different results. In this version, I will use the sequence_length of 128 set in the hyperparameters.

The network will make its prediction of the next note (or chord) based on the previous *sequence_length* notes (or chords). 
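The two preprocessing steps can be sketched on a toy token list. This is a minimal illustration in plain Python; `make_windows` and the demo tokens are hypothetical names, and the loop bound mirrors the notebook's `range(0, len(notes) - sequence_length)`.

```python
def make_windows(tokens, window):
    """Map tokens to integer ids and cut overlapping windows of fixed length."""
    vocab = sorted(set(tokens))
    token_to_int = {tok: idx for idx, tok in enumerate(vocab)}
    encoded = [token_to_int[tok] for tok in tokens]
    # One window per start position, stepping by one token
    windows = [encoded[i:i + window] for i in range(len(encoded) - window)]
    return token_to_int, windows

demo_tokens = ["C4", "E4", "G4", "0.4.7", "C4", "E4"]
token_to_int, windows = make_windows(demo_tokens, window=4)
print(token_to_int)  # {'0.4.7': 0, 'C4': 1, 'E4': 2, 'G4': 3}
print(windows)       # [[1, 2, 3, 0], [2, 3, 0, 1]]
```

With `n` tokens and a window of length `w`, this produces `n - w` windows, so longer windows yield slightly fewer training sequences.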


In [15]:
# get all pitch names
pitchnames = sorted(set(item for item in notes))
# create a dictionary to map pitches to integers
note_to_int = dict((note, number) for number, note in enumerate(pitchnames))
network_input = []
network_output = []
# create input sequences and the corresponding outputs
for i in range(0, len(notes) - sequence_length, 1):
  # Map pitches of sequence_in to integers
  network_input.append([note_to_int[char] for char in notes[i:i + sequence_length]])
n_patterns = len(network_input)
# reshape the input into a 2-D array of shape (n_patterns, sequence_length)
network_input = np.reshape(network_input, (n_patterns, sequence_length))
# normalize input
#network_input = network_input / float(n_vocab)


In [16]:
# map validation pitches to integers with the same vocabulary as the training set
# (a validation token that never appears in the training notes raises a KeyError)
note_to_int_validation = dict((note, number) for number, note in enumerate(pitchnames))
network_input_validation = []
network_output_validation = []
# create input sequences
for i in range(0, len(notes_validation) - sequence_length, 1):
  # Map pitches of sequence_in to integers
  network_input_validation.append([note_to_int_validation[char] for char in notes_validation[i:i + sequence_length]])
n_patterns = len(network_input_validation)
# reshape the input into a 2-D array of shape (n_patterns, sequence_length)
network_input_validation = np.reshape(network_input_validation, (n_patterns, sequence_length))
# normalize input
#network_input = network_input / float(n_vocab)


Let's see the new network_input size

In [50]:
network_input.shape

(135036, 128)

**Design neural network architecture** 

In [51]:
def create_network(sequence_length, n_vocab):
    """Create the structure of the neural network."""
    model = CompressiveTransformer(
        num_tokens = n_vocab,
        dim = sequence_length,   # model width is set to the sequence length (n_dim is not used)
        depth = 6,
        seq_len = sequence_length,
        mem_len = sequence_length,
        cmem_len = 256,
        cmem_ratio = 4,
        memory_layers = [5,6]
    )

    model = AutoregressiveWrapper(model)
    model.cuda()
    return model

In [58]:
model = create_network(sequence_length,n_vocab)

print(model)


AutoregressiveWrapper(
  (net): CompressiveTransformer(
    (token_emb): Embedding(476, 128)
    (to_model_dim): Identity()
    (to_logits): Sequential(
      (0): Identity()
      (1): Linear(in_features=128, out_features=476, bias=True)
    )
    (attn_layers): ModuleList(
      (0): GRUGating(
        (fn): PreNorm(
          (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
          (fn): SelfAttention(
            (compress_mem_fn): ConvCompress(
              (conv): Conv1d(128, 128, kernel_size=(4,), stride=(4,))
            )
            (to_q): Linear(in_features=128, out_features=128, bias=False)
            (to_kv): Linear(in_features=128, out_features=256, bias=False)
            (to_out): Linear(in_features=128, out_features=128, bias=True)
            (attn_dropout): Dropout(p=0.0, inplace=False)
            (dropout): Dropout(p=0.0, inplace=False)
            (reconstruction_attn_dropout): Dropout(p=0.0, inplace=False)
          )
        )
        (gru): GR

In [53]:
def cycle(loader):
    while True:
        for data in loader:
          yield data


data_train = torch.from_numpy(network_input).cuda()
train_loader = torch.utils.data.DataLoader(data_train, batch_size=32) 
cycle_train_loader  = cycle(DataLoader(data_train, batch_size = data_train.shape[0]))
num_batches=math.ceil(data_train.shape[0]/batch_size) # Total number of batches

In [54]:
#Validation
data_validation = torch.from_numpy(network_input_validation).cuda()
validation_loader = torch.utils.data.DataLoader(data_validation, batch_size=32) 
cycle_validation_loader  = cycle(DataLoader(data_validation, batch_size = data_validation.shape[0]))
num_batches_val=math.ceil(data_validation.shape[0]/batch_size) # Total number of batches

In [59]:
# optimizer

optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

If we want to continue training from the point where we left off, we should load the previously trained weights into the model.

This is very useful in Google Colaboratory, which usually kills the virtual machine running the Jupyter notebook after a certain amount of time. If this happens to you, look for the last weights file in your configured Drive account and use it to resume training.
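Finding the last weights file can be automated. The sketch below assumes checkpoints follow the naming used by the training cell in this notebook (`Tran_128_Checkpoint<epoch>_<loss>.pth.tar`); the `latest_checkpoint` helper and the demo file list are illustrative, not part of the original code.

```python
import re

# Matches names such as "Tran_128_Checkpoint90_3.2760.pth.tar"
CHECKPOINT_PATTERN = re.compile(r"^Tran_128_Checkpoint(\d+)_(\d+\.\d+)\.pth\.tar$")

def latest_checkpoint(filenames):
    """Return (epoch, filename) for the highest-epoch checkpoint, or None."""
    best = None
    for name in filenames:
        match = CHECKPOINT_PATTERN.match(name)
        if match is None:
            continue  # skip files that are not epoch checkpoints
        epoch = int(match.group(1))
        if best is None or epoch > best[0]:
            best = (epoch, name)
    return best

# Illustrative directory listing
demo_files = [
    "Tran_128_Checkpoint5_5.6232.pth.tar",
    "Tran_128_Checkpoint90_3.2760.pth.tar",
    "model_best.pth.tar",
    "Tran_128_Checkpoint10_4.9286.pth.tar",
]
print(latest_checkpoint(demo_files))  # (90, 'Tran_128_Checkpoint90_3.2760.pth.tar')
```

In Colab the list would come from something like `os.listdir(output_dir)`, and the returned filename can be passed to `torch.load` in place of `model_best.pth.tar`.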


In [None]:
# In case we want to use previously trained weights
weights = "model_best.pth.tar"
checkpoint = torch.load(output_dir + weights)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']


In [None]:
# training

best_loss_value = float('inf')  # best training loss so far, kept across epochs
for i in tqdm.tqdm(range(epochs), mininterval=20., desc='training'):
    model.train()
    tot_loss = 0.0
    is_best = False
    avg_loss_val = 0
    for mlm_loss, aux_loss, is_last in model(next(cycle_train_loader), max_batch_size = batch_size, return_loss = True):
        loss = mlm_loss + aux_loss

        loss.backward()

        # detach so the running total does not keep the autograd graph alive
        tot_loss += loss.detach()

        # gradients accumulate over the chunks; step once on the last chunk
        if is_last:
            torch.nn.utils.clip_grad_norm_(model.parameters(), 0.5)
            optimizer.step()
            optimizer.zero_grad()

    if i % VALIDATE_EVERY == 0 or i == epochs-1:
      model.eval()
      with torch.no_grad():
          for loss_val, aux_loss_val, is_last_val in model(next(cycle_validation_loader), max_batch_size = batch_size, return_loss = True):
            avg_loss_val += loss_val/num_batches_val

            if is_last_val:
              print(f'validation loss: {avg_loss_val.item():.4f}')

    avg_loss = tot_loss/num_batches

    # save a checkpoint every 5 epochs and mark it as best if the loss improved
    if i % 5 == 0 or i == epochs-1:
      if best_loss_value > avg_loss:
        best_loss_value = avg_loss
        is_best = True

      save_checkpoint({
      'epoch': i,
      'model_state_dict': model.state_dict(),
      'optimizer_state_dict' : optimizer.state_dict(),
      'loss':avg_loss.item(),
     }, is_best, 'Tran_128_Checkpoint'+str(i)+'_'+"{:.4f}".format(avg_loss.item())+'.pth.tar')
    print(f'\n Epoch: {i} |Training loss: {avg_loss.item():.4f}')
print('Training complete.')

validation loss: 6.1077
Epoch: 0 |Training loss: 6.1499
Epoch: 1 |Training loss: 6.0965
Epoch: 2 |Training loss: 6.0164
Epoch: 3 |Training loss: 5.9095
Epoch: 4 |Training loss: 5.7784
validation loss: 5.5115
Epoch: 5 |Training loss: 5.6232
Epoch: 6 |Training loss: 5.4580
Epoch: 7 |Training loss: 5.2964
Epoch: 8 |Training loss: 5.1512
Epoch: 9 |Training loss: 5.0287
validation loss: 4.9797
Epoch: 10 |Training loss: 4.9286
Epoch: 11 |Training loss: 4.8472
Epoch: 12 |Training loss: 4.7794
Epoch: 13 |Training loss: 4.7201
Epoch: 14 |Training loss: 4.6712
validation loss: 4.7741
Epoch: 15 |Training loss: 4.6260
Epoch: 16 |Training loss: 4.5847
Epoch: 17 |Training loss: 4.5422
Epoch: 18 |Training loss: 4.5011
Epoch: 19 |Training loss: 5.5504
validation loss: 4.6230
Epoch: 20 |Training loss: 4.4529
Epoch: 21 |Training loss: 4.4183
Epoch: 22 |Training loss: 4.3953
Epoch: 23 |Training loss: 4.3593
Epoch: 24 |Training loss: 4.3385
validation loss: 4.4742
Epoch: 25 |Training loss: 4.3077
Epoch: 26 |Training loss: 4.2802
Epoch: 27 |Training loss: 4.2514
Epoch: 28 |Training loss: 4.2228
Epoch: 29 |Training loss: 4.1944
validation loss: 4.3256
Epoch: 30 |Training loss: 4.1708
Epoch: 31 |Training loss: 4.1418
Epoch: 32 |Training loss: 4.1141
Epoch: 33 |Training loss: 4.0998
Epoch: 34 |Training loss: 4.0683
validation loss: 4.2245
Epoch: 35 |Training loss: 4.0447
Epoch: 36 |Training loss: 4.0272
Epoch: 37 |Training loss: 3.9974
Epoch: 38 |Training loss: 3.9800
Epoch: 39 |Training loss: 3.9567
validation loss: 4.1325
Epoch: 40 |Training loss: 3.9352
Epoch: 41 |Training loss: 3.9329
Epoch: 42 |Training loss: 3.8973
Epoch: 43 |Training loss: 3.9075
Epoch: 44 |Training loss: 3.8702
validation loss: 4.0560
Epoch: 45 |Training loss: 3.8911
Epoch: 46 |Training loss: 3.8538
Epoch: 47 |Training loss: 3.8340
Epoch: 48 |Training loss: 3.8273
Epoch: 49 |Training loss: 3.8008
validation loss: 3.9974
Epoch: 50 |Training loss: 3.7906
Epoch: 51 |Training loss: 3.7759
Epoch: 52 |Training loss: 3.7555
Epoch: 53 |Training loss: 3.7406
Epoch: 54 |Training loss: 3.7306
validation loss: 3.9053
Epoch: 55 |Training loss: 3.7063
Epoch: 56 |Training loss: 3.6963
Epoch: 57 |Training loss: 3.6769
Epoch: 58 |Training loss: 3.6667
Epoch: 59 |Training loss: 3.6507
validation loss: 3.8173
Epoch: 60 |Training loss: 3.6300
Epoch: 61 |Training loss: 3.6177
Epoch: 62 |Training loss: 3.6022
Epoch: 63 |Training loss: 3.5866
Epoch: 64 |Training loss: 3.5715
validation loss: 3.7393
Epoch: 65 |Training loss: 3.5628
Epoch: 66 |Training loss: 3.5400
Epoch: 67 |Training loss: 3.5314
Epoch: 68 |Training loss: 3.5128
Epoch: 69 |Training loss: 3.5031
validation loss: 3.6745
Epoch: 70 |Training loss: 3.4878
Epoch: 71 |Training loss: 3.4804
Epoch: 72 |Training loss: 3.4639
Epoch: 73 |Training loss: 3.4462
Epoch: 74 |Training loss: 3.4387
validation loss: 3.6200
Epoch: 75 |Training loss: 3.4219
Epoch: 76 |Training loss: 3.4148
Epoch: 77 |Training loss: 3.4005
Epoch: 78 |Training loss: 3.3973
Epoch: 79 |Training loss: 3.3805
validation loss: 3.5534
Epoch: 80 |Training loss: 3.3698
Epoch: 81 |Training loss: 3.3611
Epoch: 82 |Training loss: 3.3484
Epoch: 83 |Training loss: 3.3397
Epoch: 84 |Training loss: 3.3259
validation loss: 3.4989
Epoch: 85 |Training loss: 3.3173
Epoch: 86 |Training loss: 3.3061
Epoch: 87 |Training loss: 3.3037
Epoch: 88 |Training loss: 3.2928
Epoch: 89 |Training loss: 3.2771
validation loss: 3.4494
Epoch: 90 |Training loss: 3.2760
Epoch: 91 |Training loss: 3.2566
Epoch: 92 |Training loss: 3.2823
Epoch: 93 |Training loss: 3.2603
Epoch: 94 |Training loss: 3.2610
validation loss: 3.4309
Epoch: 95 |Training loss: 3.2454
Epoch: 96 |Training loss: 3.2543
training:  98%|█████████▊| 98/100 [7:31:16<09:11, 275.80s/it]
Epoch: 97 |Training loss: 3.2377
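One way to read the logged losses is to compare them with the loss of a uniform guess over the vocabulary and to convert them to perplexity. This is a rough sketch. It assumes the logged values are mean cross-entropies in nats, and it uses the validation losses because the training loss also includes the auxiliary loss.

```python
import math

n_vocab = 476                      # vocabulary size reported earlier in the notebook
uniform_loss = math.log(n_vocab)   # cross-entropy of a uniform guess, in nats

def perplexity(cross_entropy_nats):
    """Effective number of equally likely choices implied by a cross-entropy."""
    return math.exp(cross_entropy_nats)

first_val, last_val = 6.1077, 3.4309   # first and last validation losses in the log

print(f"uniform baseline loss: {uniform_loss:.4f}")
print(f"perplexity at start:   {perplexity(first_val):.1f}")
print(f"perplexity at end:     {perplexity(last_val):.1f}")
```

Under these assumptions the first validation loss sits close to the uniform baseline, and the last one corresponds to a far smaller effective number of candidate tokens per step.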


In [1]:
model.eval()
with torch.no_grad():
    for loss_val, aux_loss_val, is_last_val in model(next(cycle_validation_loader), max_batch_size = batch_size, return_loss = True):
      avg_loss_val+=loss_val/num_batches_val;

      if is_last_val:
        print(f'validation loss: {avg_loss_val.item():.4f}')


NameError: ignored

**Music generation**

In [None]:
# In case we want to use previously trained weights
weights = "model_best.pth.tar"
checkpoint = torch.load(output_dir+weights)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']


In [None]:
# Generate network input again
network_input = []
network_output = []
for i in range(0, len(notes) - sequence_length, 1):
  network_input.append([note_to_int[char] for char in notes[i:i + sequence_length]])
n_patterns = len(network_input)
network_input = np.reshape(network_input, (n_patterns, sequence_length))


The workflow now is:


1.   Pick a **seed sequence** randomly from your list of inputs (*pattern* variable)
2.   Pass it as input for your model to generate a new element (note or chord)
3.   Add the new element to your final song and to your *pattern* list
4.   Remove the first item from *pattern*
5.   Go to step 2
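The loop in steps 2 to 5 can be sketched with a placeholder predictor. The notebook itself delegates generation to `model.generate`; the `generate` and `toy_predictor` functions below are hypothetical and only illustrate the sliding-window idea.

```python
from collections import deque

def generate(seed, predict_next, steps):
    """Repeatedly predict one element and slide the pattern window forward."""
    # maxlen makes append() drop the oldest item, which covers steps 3 and 4
    pattern = deque(seed, maxlen=len(seed))
    song = []
    for _ in range(steps):
        next_token = predict_next(list(pattern))  # step 2
        song.append(next_token)                   # step 3: extend the song
        pattern.append(next_token)                # steps 3 and 4: slide the window
    return song

def toy_predictor(pattern, vocab_size=5):
    """Placeholder model: the next token is the last token plus one, modulo vocab_size."""
    return (pattern[-1] + 1) % vocab_size

print(generate([0, 1, 2], toy_predictor, steps=6))  # [3, 4, 0, 1, 2, 3]
```

A real model would replace `toy_predictor` with a function that samples the next token id from the network's output distribution.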


In [None]:
""" Generate notes from the neural network based on a sequence of notes """
# pick a random sequence from the input as a starting point for the prediction
start = np.random.randint(0, len(network_input)-1)
int_to_note = dict((number, note) for number, note in enumerate(pitchnames))
pattern = torch.from_numpy(network_input[start]).cuda()

prediction_output = model.generate(pattern, 500)


In [None]:
result_sample=[]

for i in range(500):
  result = int_to_note[prediction_output[i].item()]
  # one line per prediction; the earlier extra print(i) ran the index into the token text
  print('Predicted ', i, " ", result)
  result_sample.append(result)

prediction_output=result_sample

Predicted  0   6
Predicted  1   4.6
Predicted  2   6.11
Predicted  3   6
Predicted  4   6.11
Predicted  5   A4
Predicted  6   4.6
Predicted  7   F4
Predicted  8   6
Predicted  9   6
Predicted  10   5.7.9.0
Predicted  11   2.3.7.10
Predicted  12   D5
Predicted  13   C5
Predicted  14   5.7.9.0
Predicted  15   C5
Predicted  16   4.6
Predicted  17   B-1
Predicted  18   10.2.5
Predicted  19   C5
Predicted  20   6.11
Predicted  21   6
Predicted  22   F2
Predicted  23   6.11
Predicted  24   4.6
Predicted  25   B-2
Predicted  26   B-1
Predicted  27   A4
Predicted  28   6
Predicted  29   C5
Predicted  30   E-3
Predicted  31   F2
Predicted  32   4.6
Predicted  33   5
Predicted  34   5.10
Predicted  35   4.6
Predicted  36   6
Predicted  37   4.6
Predicted  38   4.6
Predicted  39   F2
Predicted  40   4.6
Predicted  41   B-2
 Predicted  42

The last step is creating a MIDI file from the predictions.

**music21** will help us again with this task. We create a **Stream** and add the predicted notes and chords to it.

We are adding an offset of 0.5 between elements.
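The decoding rule can be sketched without music21: a token that contains a dot or consists only of digits is treated as a chord, anything else as a single note, and each element is placed 0.5 after the previous one. The `decode_tokens` helper and its tuple output are illustrative assumptions, not the notebook's actual objects.

```python
def decode_tokens(tokens, step=0.5):
    """Turn string tokens into (kind, pitches, offset) tuples."""
    events = []
    offset = 0.0
    for token in tokens:
        if "." in token or token.isdigit():
            # chord: dot-separated integers, possibly a single one
            pitches = [int(p) for p in token.split(".")]
            events.append(("chord", pitches, offset))
        else:
            # single note: pitch name with octave, kept as a string
            events.append(("note", token, offset))
        offset += step  # advance so that elements do not stack
    return events

print(decode_tokens(["6", "4.6", "A4"]))
# [('chord', [6], 0.0), ('chord', [4, 6], 0.5), ('note', 'A4', 1.0)]
```

In the notebook the same branches build `chord.Chord` and `note.Note` objects and assign the offset to each before they are added to the stream.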

In [None]:
offset = 0
output_notes = []
# create note and chord objects based on the values generated by the model
for pattern in prediction_output:
    # pattern is a chord
    if ('.' in pattern) or pattern.isdigit():
        notes_in_chord = pattern.split('.')
        chord_notes = []  # separate name so the global `notes` list is not overwritten
        for current_note in notes_in_chord:
            new_note = note.Note(int(current_note))
            new_note.storedInstrument = instrument.Piano()
            chord_notes.append(new_note)
        new_chord = chord.Chord(chord_notes)
        new_chord.offset = offset
        output_notes.append(new_chord)
    # pattern is a note
    else:
        new_note = note.Note(pattern)
        new_note.offset = offset
        new_note.storedInstrument = instrument.Piano()
        output_notes.append(new_note)

    # increase offset each iteration so that notes do not stack
    offset += 0.5

midi_stream = stream.Stream(output_notes)
midi_stream.write('midi', fp='test_output.mid')

'test_output.mid'