<a href="https://colab.research.google.com/github/GiovanniSorice/Deep_Music_Generator/blob/main/notebooks/Music_Generation_Transformer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Transformer Music Generator 



In this notebook, we use a Transformer to generate some music.


**This notebook was inspired by [Music_Generation_LSTM](https://colab.research.google.com/drive/19TQqekOlnOSW36VCL8CPVEQKBBukmaEQ#scrollTo=DDOBVWULXfpz), and part of the code comes from it.**




**Load dependencies**

In [1]:
pip install compressive_transformer_pytorch

Collecting compressive_transformer_pytorch
  Downloading https://files.pythonhosted.org/packages/30/39/b8caf2671abcb8615977c08766aa9f450addd6949f57c7dda87224e844b5/compressive_transformer_pytorch-0.3.20-py3-none-any.whl
Collecting mogrifier
  Downloading https://files.pythonhosted.org/packages/77/01/62a55d0f8048e788fce435f2ade6478f443e4e53ed9b89b55ba0fc42c198/mogrifier-0.0.3-py3-none-any.whl
Installing collected packages: mogrifier, compressive-transformer-pytorch
Successfully installed compressive-transformer-pytorch-0.3.20 mogrifier-0.0.3


In [2]:
import torch
import tqdm
import numpy as np
import pandas as pd
import tensorflow as tf
import os
from compressive_transformer_pytorch import CompressiveTransformer
from compressive_transformer_pytorch import AutoregressiveWrapper
from torchsummary import summary
from torch.utils.data import DataLoader, Dataset
from tensorflow.keras import utils
from sklearn.metrics import roc_auc_score 
import matplotlib.pyplot as plt
import glob
import pickle
from music21 import converter, instrument, stream, note, chord
import math
import shutil

In [3]:
# Set to false if you are not running
# this notebook in Google Colaboratory
run_on_colab = True

**Set hyperparameters**

In [19]:
# output directory name:
output_dir = '/content/drive/My Drive/ISPR_project/Transformer/'
current_path ='/content/drive/My Drive/ISPR_project/'
# training:
epochs = 500
batch_size = 64
learning_rate=1e-3
# vector-space embedding: 
n_dim = 64 
sequence_length = 128


VALIDATE_EVERY  = 100

GENERATE_EVERY  = 500



**Save model function**

In [5]:
def save_checkpoint(state, is_best, filename='checkpoint.pth.tar'):
    torch.save(state, output_dir+filename)
    if is_best:
        shutil.copyfile(output_dir+filename, output_dir+'model_best.pth.tar')

**Google drive configuration (only Colab)**

In [6]:
if(run_on_colab):
  from google.colab import drive
  # This will prompt for authorization.
  drive.mount('/content/drive')

Mounted at /content/drive


**Load data** \\
The original **MIDI files** come from [The Lakh MIDI Dataset v0.1](https://colinraffel.com/projects/lmd/).

## Processing data

Let's process the files and load them into **music21**.

In [7]:
file = current_path+"midi_songs/Andra tutto bene ('58).1.mid"
midi = converter.parse(file)
notes_to_parse = midi.flat.notes
for element in notes_to_parse[:10]:
  print(element, element.offset)

<music21.chord.Chord F3 F2> 4.0
<music21.note.Note A> 4.0
<music21.chord.Chord B1 F#3 F#2> 4.0
<music21.note.Note F> 4.0
<music21.chord.Chord C4 F4> 4.0
<music21.chord.Chord F#3 C#6 F#2> 4.5
<music21.note.Note C#> 4.75
<music21.chord.Chord F#2 E2 F#3> 5.0
<music21.chord.Chord A4 A3 F4 C4 A3> 5.0
<music21.note.Note F> 5.0


I will process all the MIDI files, extracting data from each note or chord.

- If I process a **note**, I will store in the list a string representing the pitch (the note name) and the octave.

- If I process a **chord** (remember that a chord is a set of notes played at the same time), I will store a different type of string, made of numbers separated by dots. Each number represents the pitch class of one note of the chord.

As you can see, **I am not yet considering the time offsets of the elements**. In this first version I ignore them, so all the notes and chords will have the same duration. Maybe, in the future, I will take them into account.

I build one big list with all the elements of all the compositions.
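
To make the two token formats concrete, here is a minimal sketch in plain Python (no music21 required) that tells a note token from a chord token. The helper `describe_token` and the table `PITCH_CLASS_NAMES` are illustrative names that do not appear elsewhere in this notebook. The sketch assumes that the numbers in a chord token are pitch classes with 0 = C, which is what music21's `normalOrder` returns.

```python
# Illustrative lookup table (an assumption of this sketch): pitch class -> name, 0 = C
PITCH_CLASS_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F',
                     'F#', 'G', 'G#', 'A', 'A#', 'B']


def describe_token(token):
    """Classify a vocabulary token as a note or a chord.

    Chord tokens are dot-separated integers such as '6.10.1';
    note tokens are pitch names with an octave such as 'F#3'.
    """
    if '.' in token or token.isdigit():
        names = [PITCH_CLASS_NAMES[int(p)] for p in token.split('.')]
        return 'chord', names
    return 'note', token


print(describe_token('F#3'))
print(describe_token('6.10.1'))
```

Because a chord token keeps only pitch classes, the octave of each chord note is lost in this encoding, while single notes keep theirs.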

In [8]:
notes = []
for i,file in enumerate(glob.glob(current_path+"midi_songs/*.mid")):
  midi = converter.parse(file)
  print('\r', 'Parsing file ', i, " ",file, end='')
  notes_to_parse = None
  try: # file has instrument parts
    s2 = instrument.partitionByInstrument(midi)
    notes_to_parse = s2.recurse() 
  except: # file has notes in a flat structure
    notes_to_parse = midi.flat.notes
  for element in notes_to_parse:
    if isinstance(element, note.Note):
      notes.append(str(element.pitch))
    elif isinstance(element, chord.Chord):
      notes.append('.'.join(str(n) for n in element.normalOrder))
with open('notes', 'wb') as filepath:
  pickle.dump(notes, filepath)

 Parsing file  3   /content/drive/My Drive/ISPR_project/midi_songs/Andra tutto bene ('58).1.mid

Next I count the distinct notes and chords in the dataset, because this will be the **number of possible output classes** of the model.

In [9]:
# Count different possible outputs
n_vocab = (len(set(notes)))
n_vocab

145
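
A rough yardstick for reading the training loss later in this notebook: a model that guesses uniformly over the vocabulary has a cross-entropy of ln(n_vocab). The sketch below computes it for the 145 classes found here. The name `n_vocab_example` is only illustrative, and the comparison is approximate: it assumes the main loss is a standard cross-entropy in nats, and the logged training loss also includes the transformer's auxiliary loss.

```python
import math

# Vocabulary size reported by the cell above (hard-coded for this sketch)
n_vocab_example = 145

# Cross-entropy, in nats, of a model that gives every class the same probability
uniform_loss = math.log(n_vocab_example)
print(f'Uniform-guess loss: {uniform_loss:.4f}')
```

A training loss well below this value suggests the model has learned something beyond the size of the vocabulary.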

**Preprocess data** \\
Now, there is some **data processing** that I have to do:

- I will map each pitch or chord to an integer
- I will create the input sequences of *sequence_length* elements used for next-note prediction

I can try different values of **sequence_length** to obtain different results. In this first version, I use a sequence_length of 128.

The network will make its prediction of the next note (or chord) based on the previous *sequence_length* notes (or chords).
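
The two steps above can be tried on a toy example first. This is a minimal sketch with made-up data: the `toy_*` names and the six-element note list are assumptions for illustration only, and the window length is 4 instead of the notebook's value so the result is easy to read.

```python
# Made-up stream of tokens: three notes, one chord token, then two repeats
toy_notes = ['C4', 'E4', '0.4.7', 'G4', 'C4', 'E4']
toy_sequence_length = 4

# Step 1: map each distinct token to an integer (sorted for a stable mapping)
toy_pitchnames = sorted(set(toy_notes))
toy_note_to_int = {name: idx for idx, name in enumerate(toy_pitchnames)}

# Step 2: slide a window of toy_sequence_length over the stream, one step at a time
toy_windows = [
    [toy_note_to_int[n] for n in toy_notes[i:i + toy_sequence_length]]
    for i in range(len(toy_notes) - toy_sequence_length)
]

print(toy_note_to_int)  # {'0.4.7': 0, 'C4': 1, 'E4': 2, 'G4': 3}
print(toy_windows)      # [[1, 2, 0, 3], [2, 0, 3, 1]]
```

With this loop bound, a stream of N tokens yields N - sequence_length windows, so consecutive windows overlap in all but one element.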


In [11]:
# get all pitch names
pitchnames = sorted(set(item for item in notes))
# create a dictionary to map pitches to integers
note_to_int = dict((note, number) for number, note in enumerate(pitchnames))
network_input = []
network_output = []
# create input sequences and the corresponding outputs
for i in range(0, len(notes) - sequence_length, 1):
  # Map pitches of sequence_in to integers
  network_input.append([note_to_int[char] for char in notes[i:i + sequence_length]])
n_patterns = len(network_input)
# reshape the input into a 2D array of shape (n_patterns, sequence_length)
network_input = np.reshape(network_input, (n_patterns, sequence_length))
# normalize input
#network_input = network_input / float(n_vocab)


Let's see the new network_input size:

In [12]:
network_input.shape

(13343, 128)

**Design neural network architecture** 

In [13]:
def create_network(sequence_length, n_vocab):
    """ create the structure of the neural network """
    model = CompressiveTransformer(
    num_tokens = n_vocab,
    dim = sequence_length,
    depth = 6,
    seq_len = sequence_length,
    mem_len = sequence_length,
    cmem_len = 256,
    cmem_ratio = 4,
    memory_layers = [5,6]
    )

    model = AutoregressiveWrapper(model)
    model.cuda()
    return model

In [14]:
model = create_network(sequence_length,n_vocab)

print(model)


AutoregressiveWrapper(
  (net): CompressiveTransformer(
    (token_emb): Embedding(145, 128)
    (to_model_dim): Identity()
    (to_logits): Sequential(
      (0): Identity()
      (1): Linear(in_features=128, out_features=145, bias=True)
    )
    (attn_layers): ModuleList(
      (0): GRUGating(
        (fn): PreNorm(
          (norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
          (fn): SelfAttention(
            (compress_mem_fn): ConvCompress(
              (conv): Conv1d(128, 128, kernel_size=(4,), stride=(4,))
            )
            (to_q): Linear(in_features=128, out_features=128, bias=False)
            (to_kv): Linear(in_features=128, out_features=256, bias=False)
            (to_out): Linear(in_features=128, out_features=128, bias=True)
            (attn_dropout): Dropout(p=0.0, inplace=False)
            (dropout): Dropout(p=0.0, inplace=False)
            (reconstruction_attn_dropout): Dropout(p=0.0, inplace=False)
          )
        )
        (gru): GR

In [15]:
def cycle(loader):
    while True:
        for data in loader:
          yield data


data_train = torch.from_numpy(network_input).cuda()
train_loader = torch.utils.data.DataLoader(data_train, batch_size=32) 
cycle_train_loader  = cycle(DataLoader(data_train, batch_size = data_train.shape[0]))
num_bathes=math.ceil(data_train.shape[0]/batch_size) # Total number of batches

In [16]:
# optimizer

optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

In case we want to continue training from the point where we left off, we should load the previously trained weights into the model.

This is very useful in Google Colaboratory, which usually kills the virtual machine running the Jupyter notebook after a certain amount of time. If this happens to you, look for the latest weights file in your configured Drive account and use it to resume training the network.
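
Looking for the latest weights file can be automated. The sketch below is one possible helper, not part of the original notebook: `find_latest_checkpoint` and `CHECKPOINT_PATTERN` are hypothetical names, and the pattern assumes the file naming used by the training cell in this notebook (`Tran_Checkpoint<epoch>_<loss>.pth.tar`).

```python
import os
import re

# Assumed naming scheme: Tran_Checkpoint<epoch>_<loss>.pth.tar
CHECKPOINT_PATTERN = re.compile(r'^Tran_Checkpoint(\d+)_(\d+\.\d+)\.pth\.tar$')


def find_latest_checkpoint(directory):
    """Return (path, epoch) for the checkpoint with the highest epoch.

    Returns None when the directory holds no matching file.
    """
    latest = None
    for name in os.listdir(directory):
        match = CHECKPOINT_PATTERN.match(name)
        if match is None:
            continue  # skip model_best.pth.tar and unrelated files
        epoch = int(match.group(1))
        if latest is None or epoch > latest[1]:
            latest = (os.path.join(directory, name), epoch)
    return latest
```

Calling `find_latest_checkpoint(output_dir)` would give the path to pass to `torch.load`. Comparing the epoch as an integer avoids the trap of sorting filenames as strings, where `Tran_Checkpoint100` would sort before `Tran_Checkpoint20`.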


In [39]:
# In case we want to use previously trained weights
weights = "model_best.pth.tar"
checkpoint = torch.load(output_dir + weights)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']


In [20]:
# training

# Best loss seen so far; initialised once so it persists across epochs
best_loss_value = float('inf')

for i in tqdm.tqdm(range(epochs), mininterval=20., desc='training'):
    model.train()
    avg_loss = 0.0
    for mlm_loss, aux_loss, is_last in model(next(cycle_train_loader), max_batch_size = batch_size, return_loss = True):
        loss = mlm_loss + aux_loss
        loss.backward()

        # accumulate a plain float so the autograd graph is not kept alive
        avg_loss += loss.item() / num_bathes

        # gradients are accumulated over all chunks; step once per epoch
        if is_last:
            torch.nn.utils.clip_grad_norm_(model.parameters(), 0.5)
            optimizer.step()
            optimizer.zero_grad()

    # save a checkpoint every 20 epochs and flag it when it is the best so far
    if i % 20 == 0:
        is_best = avg_loss < best_loss_value
        if is_best:
            best_loss_value = avg_loss

        save_checkpoint({
            'epoch': i,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'loss': avg_loss,
        }, is_best, 'Tran_Checkpoint'+str(i)+'_'+"{:.4f}".format(avg_loss)+'.pth.tar')

    print(f'Epoch: {i} |Training loss: {avg_loss:.4f}')
print('Training complete.')

Epoch: 0 |Training loss: 2.4668
Epoch: 1 |Training loss: 2.4180
Epoch: 2 |Training loss: 2.3709
Epoch: 3 |Training loss: 2.3166
Epoch: 4 |Training loss: 2.2580
Epoch: 5 |Training loss: 2.2006
Epoch: 6 |Training loss: 2.1511
Epoch: 7 |Training loss: 2.1286
Epoch: 8 |Training loss: 2.0893
Epoch: 9 |Training loss: 2.1122
Epoch: 10 |Training loss: 2.0559
Epoch: 11 |Training loss: 2.0402
Epoch: 12 |Training loss: 2.0213
Epoch: 13 |Training loss: 2.0088
Epoch: 14 |Training loss: 1.9918
Epoch: 15 |Training loss: 1.9746
Epoch: 16 |Training loss: 1.9511
Epoch: 17 |Training loss: 1.9273
Epoch: 18 |Training loss: 1.9363
Epoch: 19 |Training loss: 1.8999
Epoch: 20 |Training loss: 1.8875
Epoch: 21 |Training loss: 1.8725
Epoch: 22 |Training loss: 1.8576
Epoch: 23 |Training loss: 1.8462
Epoch: 24 |Training loss: 1.8326
Epoch: 25 |Training loss: 1.8210
Epoch: 26 |Training loss: 1.8099
Epoch: 27 |Training loss: 1.8032
Epoch: 28 |Training loss: 1.7916
Epoch: 29 |Training loss: 1.7803
Epoch: 30 |Training loss: 1.7755
Epoch: 31 |Training loss: 1.7655
Epoch: 32 |Training loss: 1.7591
Epoch: 33 |Training loss: 1.7511
Epoch: 34 |Training loss: 1.7449
Epoch: 35 |Training loss: 1.7377
Epoch: 36 |Training loss: 1.7284
Epoch: 37 |Training loss: 1.7176
Epoch: 38 |Training loss: 1.7093
Epoch: 39 |Training loss: 1.7032
Epoch: 40 |Training loss: 1.6973
Epoch: 41 |Training loss: 1.6897
Epoch: 42 |Training loss: 1.6846
Epoch: 43 |Training loss: 1.6780
Epoch: 44 |Training loss: 1.6720
Epoch: 45 |Training loss: 1.6665
Epoch: 46 |Training loss: 1.6596
Epoch: 47 |Training loss: 1.6537
Epoch: 48 |Training loss: 1.6467
Epoch: 49 |Training loss: 1.6424
Epoch: 50 |Training loss: 1.6376
Epoch: 51 |Training loss: 1.6330
Epoch: 52 |Training loss: 1.6320
Epoch: 53 |Training loss: 1.6234
Epoch: 54 |Training loss: 1.6192
Epoch: 55 |Training loss: 1.6152
Epoch: 56 |Training loss: 1.6100
Epoch: 57 |Training loss: 1.6045
Epoch: 58 |Training loss: 1.6016
Epoch: 59 |Training loss: 1.5937
Epoch: 60 |Training loss: 1.5906
Epoch: 61 |Training loss: 1.5843
Epoch: 62 |Training loss: 1.5793
Epoch: 63 |Training loss: 1.5748
Epoch: 64 |Training loss: 1.5685
Epoch: 65 |Training loss: 1.5635
Epoch: 66 |Training loss: 1.5566
Epoch: 67 |Training loss: 1.5503
Epoch: 68 |Training loss: 1.5462
Epoch: 69 |Training loss: 1.5395
Epoch: 70 |Training loss: 1.5324
Epoch: 71 |Training loss: 1.5265
Epoch: 72 |Training loss: 1.5251
Epoch: 73 |Training loss: 1.5147
Epoch: 74 |Training loss: 1.5065
Epoch: 75 |Training loss: 1.5013
Epoch: 76 |Training loss: 1.5021
Epoch: 77 |Training loss: 1.4873
Epoch: 78 |Training loss: 1.5060
Epoch: 79 |Training loss: 1.4885
Epoch: 80 |Training loss: 1.4888
Epoch: 81 |Training loss: 1.4866
Epoch: 82 |Training loss: 1.4744
Epoch: 83 |Training loss: 1.4644
Epoch: 84 |Training loss: 1.4927
Epoch: 85 |Training loss: 1.4631
Epoch: 86 |Training loss: 1.4531
Epoch: 87 |Training loss: 1.4508
Epoch: 88 |Training loss: 1.4447
Epoch: 89 |Training loss: 1.4412
Epoch: 90 |Training loss: 1.4415
Epoch: 91 |Training loss: 1.4301
Epoch: 92 |Training loss: 1.4285
Epoch: 93 |Training loss: 1.4294
Epoch: 94 |Training loss: 1.4139
Epoch: 95 |Training loss: 1.4359
Epoch: 96 |Training loss: 1.4227
Epoch: 97 |Training loss: 1.4201
Epoch: 98 |Training loss: 1.4091
Epoch: 99 |Training loss: 1.4064
Epoch: 100 |Training loss: 1.4048
Epoch: 101 |Training loss: 1.3953
Epoch: 102 |Training loss: 1.3938
Epoch: 103 |Training loss: 1.3841
Epoch: 104 |Training loss: 1.3831
Epoch: 105 |Training loss: 1.3772
Epoch: 106 |Training loss: 1.3713
Epoch: 107 |Training loss: 1.3704
Epoch: 108 |Training loss: 1.3630
Epoch: 109 |Training loss: 1.3584
Epoch: 110 |Training loss: 1.3535
Epoch: 111 |Training loss: 1.3447
Epoch: 112 |Training loss: 1.3511
Epoch: 113 |Training loss: 1.3388
Epoch: 114 |Training loss: 1.3443
Epoch: 115 |Training loss: 1.3366
Epoch: 116 |Training loss: 1.3326
Epoch: 117 |Training loss: 1.3237
Epoch: 118 |Training loss: 1.3201
Epoch: 119 |Training loss: 1.3151
Epoch: 120 |Training loss: 1.3078
Epoch: 121 |Training loss: 1.3095
Epoch: 122 |Training loss: 1.2961
Epoch: 123 |Training loss: 1.2953
Epoch: 124 |Training loss: 1.2914
Epoch: 125 |Training loss: 1.2839
Epoch: 126 |Training loss: 1.2785
Epoch: 127 |Training loss: 1.2733
Epoch: 128 |Training loss: 1.2732
Epoch: 129 |Training loss: 1.2651
Epoch: 130 |Training loss: 1.2603
Epoch: 131 |Training loss: 1.2548
Epoch: 132 |Training loss: 1.2517
Epoch: 133 |Training loss: 1.2457
Epoch: 134 |Training loss: 1.2431
Epoch: 135 |Training loss: 1.2395
Epoch: 136 |Training loss: 1.2360
Epoch: 137 |Training loss: 1.2295
Epoch: 138 |Training loss: 1.2294
Epoch: 139 |Training loss: 1.2197
Epoch: 140 |Training loss: 1.2214
Epoch: 141 |Training loss: 1.2137
Epoch: 142 |Training loss: 1.2124
Epoch: 143 |Training loss: 1.2084
Epoch: 144 |Training loss: 1.2039
Epoch: 145 |Training loss: 1.1994
Epoch: 146 |Training loss: 1.1935
Epoch: 147 |Training loss: 1.1909
Epoch: 148 |Training loss: 1.1839
Epoch: 149 |Training loss: 1.1848
Epoch: 150 |Training loss: 1.1839
Epoch: 151 |Training loss: 1.1752
Epoch: 152 |Training loss: 1.1760
Epoch: 153 |Training loss: 1.1666
Epoch: 154 |Training loss: 1.1764
Epoch: 155 |Training loss: 1.1628
Epoch: 156 |Training loss: 1.1794
Epoch: 157 |Training loss: 1.1722
Epoch: 158 |Training loss: 1.1742
Epoch: 159 |Training loss: 1.1783
Epoch: 160 |Training loss: 1.1684
Epoch: 161 |Training loss: 1.1620
Epoch: 162 |Training loss: 1.1594
Epoch: 163 |Training loss: 1.1515
Epoch: 164 |Training loss: 1.1520
Epoch: 165 |Training loss: 1.1455
Epoch: 166 |Training loss: 1.1408
Epoch: 167 |Training loss: 1.1380
Epoch: 168 |Training loss: 1.1337
Epoch: 169 |Training loss: 1.1364
Epoch: 170 |Training loss: 1.1281
Epoch: 171 |Training loss: 1.1240
Epoch: 172 |Training loss: 1.1217
Epoch: 173 |Training loss: 1.1203
Epoch: 174 |Training loss: 1.1137
Epoch: 175 |Training loss: 1.1125
Epoch: 176 |Training loss: 1.1176
Epoch: 177 |Training loss: 1.1061
Epoch: 178 |Training loss: 1.1172
Epoch: 179 |Training loss: 1.1069
Epoch: 180 |Training loss: 1.1131
Epoch: 181 |Training loss: 1.1020
Epoch: 182 |Training loss: 1.1155
Epoch: 183 |Training loss: 1.1081
Epoch: 184 |Training loss: 1.1023
Epoch: 185 |Training loss: 1.1013
Epoch: 186 |Training loss: 1.0925
Epoch: 187 |Training loss: 1.0945
Epoch: 188 |Training loss: 1.0914
Epoch: 189 |Training loss: 1.0870
Epoch: 190 |Training loss: 1.0815
Epoch: 191 |Training loss: 1.0780
Epoch: 192 |Training loss: 1.0754
Epoch: 193 |Training loss: 1.0747
Epoch: 194 |Training loss: 1.0747
Epoch: 195 |Training loss: 1.0703
Epoch: 196 |Training loss: 1.0633
Epoch: 197 |Training loss: 1.0702
Epoch: 198 |Training loss: 1.0633
Epoch: 199 |Training loss: 1.0623
Epoch: 200 |Training loss: 1.0551
Epoch: 201 |Training loss: 1.0567
Epoch: 202 |Training loss: 1.0522
Epoch: 203 |Training loss: 1.0476
Epoch: 204 |Training loss: 1.0466
Epoch: 205 |Training loss: 1.0425
Epoch: 206 |Training loss: 1.0453
Epoch: 207 |Training loss: 1.0446
training:  42%|████▏     | 208/500 [1:33:30<2:11:20, 26.99s/it]

KeyboardInterrupt: ignored

**Music generation**

In [None]:
# In case we want to use previously trained weights
weights = "model_best.pth.tar"
checkpoint = torch.load(os.path.join(output_dir, weights))
# Restore the model and optimizer state saved during training
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
# Epoch and loss at which the checkpoint was written
epoch = checkpoint['epoch']
loss = checkpoint['loss']


In [None]:
# Generate network input again:
# one sliding window of sequence_length encoded notes per starting position
network_input = []
for i in range(len(notes) - sequence_length):
  network_input.append([note_to_int[char] for char in notes[i:i + sequence_length]])
n_patterns = len(network_input)
network_input = np.reshape(network_input, (n_patterns, sequence_length))


The generation workflow is:


1.   Pick a **seed sequence** at random from the list of inputs (the *pattern* variable)
2.   Pass it to the model to generate a new element (note or chord)
3.   Append the new element to the generated song and to *pattern*
4.   Remove the first item from *pattern*
5.   Go to step 2
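
The loop described in these steps can be sketched in plain Python, independently of the trained network. This is only an illustration: `generate_sequence`, `toy_predict_next` and the integer "notes" are hypothetical stand-ins and are not part of the notebook, which calls `model.generate` instead.

```python
import random


def generate_sequence(predict_next, seed, n_steps):
    """Sliding-window generation: predict, append, drop the oldest element."""
    pattern = list(seed)   # current context window (copy, so the seed is untouched)
    generated = []         # the "song" built so far
    for _ in range(n_steps):
        next_item = predict_next(pattern)  # step 2: predict a new element
        generated.append(next_item)        # step 3: add it to the song...
        pattern.append(next_item)          # ...and to the pattern
        pattern.pop(0)                     # step 4: remove the first item
    return generated


def toy_predict_next(pattern):
    """Hypothetical stand-in for the model: next integer, wrapping at 5."""
    return (pattern[-1] + 1) % 5


# Step 1: pick a random seed from a toy list of input windows
toy_inputs = [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
seed = random.choice(toy_inputs)
song = generate_sequence(toy_predict_next, seed, 6)
print(seed, "->", song)
```

The window length stays constant because every appended element is matched by one removal, so the stand-in predictor always sees the same amount of context.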


In [21]:
# Generate notes from the neural network based on a sequence of notes
# pick a random sequence from the input as a starting point for the prediction
# (np.random.randint excludes the upper bound, so len(network_input) covers every index)
start = np.random.randint(0, len(network_input))
# map integer tokens back to note/chord names
int_to_note = dict((number, name) for number, name in enumerate(pitchnames))
pattern = torch.from_numpy(network_input[start]).cuda()

# generate 500 new tokens from the seed sequence
prediction_output = model.generate(pattern, 500)


In [22]:
result_sample=[]

# Decode each generated token back to its note/chord name
for i in range(500):
  result = int_to_note[prediction_output[i].item()]
  print('Predicted', i, result)
  result_sample.append(result)

prediction_output=result_sample

Predicted 0 6
Predicted 1 4.6
Predicted 2 6.11
Predicted 3 6
Predicted 4 6.11
Predicted 5 A4
Predicted 6 4.6
Predicted 7 F4
Predicted 8 6
Predicted 9 6
Predicted 10 5.7.9.0
Predicted 11 2.3.7.10
Predicted 12 D5
Predicted 13 C5
Predicted 14 5.7.9.0
Predicted 15 C5
Predicted 16 4.6
Predicted 17 B-1
Predicted 18 10.2.5
Predicted 19 C5
Predicted 20 6.11
Predicted 21 6
Predicted 22 F2
Predicted 23 6.11
Predicted 24 4.6
Predicted 25 B-2
Predicted 26 B-1
Predicted 27 A4
Predicted 28 6
Predicted 29 C5
Predicted 30 E-3
Predicted 31 F2
Predicted 32 4.6
Predicted 33 5
Predicted 34 5.10
Predicted 35 4.6
Predicted 36 6
Predicted 37 4.6
Predicted 38 4.6
Predicted 39 F2
Predicted 40 4.6
Predicted 41 B-2
Predicted 42

The last step is creating a MIDI file from the predictions.

**music21** helps again here: we create a **Stream** and add the predicted notes and chords to it.

Each element is placed 0.5 quarter notes after the previous one, so that the notes do not stack on top of each other.

In [23]:
offset = 0
output_notes = []
# create note and chord objects based on the values generated by the model
for pattern in prediction_output:
    # pattern is a chord: dot-separated integers (e.g. '4.6') or a single integer
    if ('.' in pattern) or pattern.isdigit():
        notes_in_chord = pattern.split('.')
        # named chord_notes so the global `notes` list used to build network_input is not overwritten
        chord_notes = []
        for current_note in notes_in_chord:
            new_note = note.Note(int(current_note))
            new_note.storedInstrument = instrument.Piano()
            chord_notes.append(new_note)
        new_chord = chord.Chord(chord_notes)
        new_chord.offset = offset
        output_notes.append(new_chord)
    # pattern is a note name such as 'A4' or 'B-1'
    else:
        new_note = note.Note(pattern)
        new_note.offset = offset
        new_note.storedInstrument = instrument.Piano()
        output_notes.append(new_note)

    # increase offset each iteration so that notes do not stack
    offset += 0.5

# collect all elements in a Stream and write it out as a MIDI file
midi_stream = stream.Stream(output_notes)
midi_stream.write('midi', fp='test_output.mid')

'test_output.mid'
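
To check how predicted tokens are split into chords and notes without needing music21, the same branching rule can be sketched on its own. The helpers `parse_token` and `schedule` are hypothetical names introduced only for this illustration. They mirror the `'.' in pattern or pattern.isdigit()` test and the 0.5 offset step used in the cell above, but they do not build any music21 objects.

```python
def parse_token(token):
    """Classify a predicted token using the same rule as the MIDI cell."""
    if '.' in token or token.isdigit():
        # chord: dot-separated integers, or a single integer
        return 'chord', [int(part) for part in token.split('.')]
    # anything else is treated as a note name such as 'A4' or 'B-1'
    return 'note', token


def schedule(tokens, step=0.5):
    """Pair each parsed token with its offset, advancing by `step` per element."""
    return [(i * step, *parse_token(tok)) for i, tok in enumerate(tokens)]


# A few tokens taken from the prediction output shown earlier
events = schedule(['6', '4.6', 'A4', 'B-1'])
for event in events:
    print(event)
```

Printing the events before writing the file is a quick way to spot tokens that would be routed to the wrong branch.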