<a href="https://colab.research.google.com/github/GiovanniSorice/Deep_Music_Generator/blob/main/notebooks/Music_Generation_Transformer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Transformer Music Generator 



In this notebook, we use an Transformer to generate some music.


**This notebook was inspired (and part of the code comes from it) by [Music_Generation_LSTM](https://colab.research.google.com/drive/19TQqekOlnOSW36VCL8CPVEQKBBukmaEQ#scrollTo=DDOBVWULXfpz)**




**Load dependencies**

In [1]:
pip install compressive_transformer_pytorch

Collecting compressive_transformer_pytorch
  Downloading https://files.pythonhosted.org/packages/30/39/b8caf2671abcb8615977c08766aa9f450addd6949f57c7dda87224e844b5/compressive_transformer_pytorch-0.3.20-py3-none-any.whl
Collecting mogrifier
  Downloading https://files.pythonhosted.org/packages/77/01/62a55d0f8048e788fce435f2ade6478f443e4e53ed9b89b55ba0fc42c198/mogrifier-0.0.3-py3-none-any.whl
Installing collected packages: mogrifier, compressive-transformer-pytorch
Successfully installed compressive-transformer-pytorch-0.3.20 mogrifier-0.0.3


In [2]:
import torch
import tqdm
import numpy as np
import pandas as pd
import tensorflow as tf
import os
from compressive_transformer_pytorch import CompressiveTransformer
from compressive_transformer_pytorch import AutoregressiveWrapper
from torchsummary import summary
from torch.utils.data import DataLoader, Dataset
from tensorflow.keras import utils
from sklearn.metrics import roc_auc_score 
import matplotlib.pyplot as plt
import glob
import pickle
from music21 import converter, instrument, stream, note, chord
import math
import shutil

In [3]:
# Set to false if you are not running
# this notebook in Google Colaboratory
run_on_colab = True

**Set hyperparameters**

In [4]:
# output directory name:
output_dir = '/content/drive/My Drive/ISPR_project/Transformer/'
current_path ='/content/drive/My Drive/ISPR_project/'
# training:
epochs = 2000
batch_size = 64
learning_rate=1e-2
# vector-space embedding: 
n_dim = 64 
sequence_length = 64


VALIDATE_EVERY  = 5

GENERATE_EVERY  = 500



**Save model function**

In [5]:
def save_checkpoint(state, is_best, filename='checkpoint.pth.tar'):
    torch.save(state, output_dir+filename)
    if is_best:
        shutil.copyfile(output_dir+filename, output_dir+'model_best.pth.tar')

**Google drive configuration (only Colab)**

In [6]:
if(run_on_colab):
  from google.colab import drive
  # This will prompt for authorization.
  drive.mount('/content/drive')

Mounted at /content/drive


**Load data** \\
Original MIDI files
 I have obtained  **MIDI files** from [The Lakh MIDI Dataset v0.1](https://colinraffel.com/projects/lmd/). 

## Processing data

Let's process the files, and load them into **music21**

In [7]:
file = current_path+"midi_songs/small_dataset/Metal/Metallica/Am I Evil?.mid"
midi = converter.parse(file)
notes_to_parse = midi.flat.notes
for element in notes_to_parse[:10]:
  print(element, element.offset)

<music21.chord.Chord E2 E3 B3 E4> 0.0
<music21.chord.Chord E2 E3 B3 E4> 0.0
<music21.note.Note E> 0.0
<music21.chord.Chord C2 C#3> 0.0
<music21.note.Note G#> 2.0
<music21.chord.Chord D3 A3 D4> 3.0
<music21.chord.Chord D3 A3 D4> 3.0
<music21.note.Note D> 3.0
<music21.chord.Chord C#3 C2> 3.0
<music21.chord.Chord B3 E3 E4> 3.5


I will process all MIDI files obtaining data from each note of chord.

- If I process a **note**, I will store in the list a string representing the pitch (the note name) and the octave.

- If I process a **chord** (Remember that chords are set of notes that are played at the same time) I will store a different type of string with numbers separated by dots. Each number represents the pitch of a chord note. 

As you can see, **I are not considering yet time offsets of each element**. In this first version, we won't consider them, so all the notes and chords will have the same duration. Maybe, in the future, I will consider them.

I are creating a big list with all the elements of all the compositions.

In [8]:
notes_for_instruments = []
for i,file in enumerate(glob.glob(current_path+"midi_songs/small_dataset/*/*/*.mid")):
      midi = converter.parse(file)
      print('Parsing file ', i, " ", file)
      notes_to_parse = None
      try:  # file has instrument parts
          s2 = instrument.partitionByInstrument(midi)
          notes_to_parse = s2.recurse()
      except:  # file has notes in a flat structure
          notes_to_parse = midi.flat.notes
      notes_instrument = []
      for element in notes_to_parse:
          if isinstance(element, note.Note):
              notes_instrument.append(str(element.pitch))
          elif isinstance(element, chord.Chord):
              notes_instrument.append('.'.join(str(n) for n in element.normalOrder))
      notes_for_instruments.append(notes_instrument)
with open(current_path + 'SMALL_notes_for_instruments', 'wb') as filepath:
    pickle.dump(notes_for_instruments, filepath)


Parsing file  0   /content/drive/My Drive/ISPR_project/midi_songs/small_dataset/Pop_rock/883/Nessun rimpianto.1.mid
Parsing file  1   /content/drive/My Drive/ISPR_project/midi_songs/small_dataset/Pop_rock/883/Grazie mille.1.mid
Parsing file  2   /content/drive/My Drive/ISPR_project/midi_songs/small_dataset/Pop_rock/883/Andra tutto bene ('58).1.mid
Parsing file  3   /content/drive/My Drive/ISPR_project/midi_songs/small_dataset/Pop_rock/883/Andra tutto bene ('58).mid
Parsing file  4   /content/drive/My Drive/ISPR_project/midi_songs/small_dataset/Pop_rock/883/Hanno ucciso l'uomo ragno.1.mid
Parsing file  5   /content/drive/My Drive/ISPR_project/midi_songs/small_dataset/Pop_rock/883/Hanno ucciso l'uomo ragno.mid
Parsing file  6   /content/drive/My Drive/ISPR_project/midi_songs/small_dataset/Pop_rock/test/I'll Be Over You.mid
Parsing file  7   /content/drive/My Drive/ISPR_project/midi_songs/small_dataset/Pop_rock/test/Non ti passa piu.mid
Parsing file  8   /content/drive/My Drive/ISPR_proje

In [9]:
notes_for_instruments_validation = []
for i,file in enumerate(glob.glob(current_path+"midi_songs/test/*.mid")):
      midi = converter.parse(file)
      print('Parsing file ', i, " ", file)
      notes_to_parse = None
      try:  # file has instrument parts
          s2 = instrument.partitionByInstrument(midi)
          notes_to_parse = s2.recurse()
      except:  # file has notes in a flat structure
          notes_to_parse = midi.flat.notes
      notes_instrument = []
      for element in notes_to_parse:
          if isinstance(element, note.Note):
              notes_instrument.append(str(element.pitch))
          elif isinstance(element, chord.Chord):
              notes_instrument.append('.'.join(str(n) for n in element.normalOrder))
      notes_for_instruments_validation.append(notes_instrument)
with open(current_path + 'SMALL_VALIDATION_notes_for_instruments', 'wb') as filepath:
    pickle.dump(notes_for_instruments_validation, filepath)


Parsing file  0   /content/drive/My Drive/ISPR_project/midi_songs/test/I Disappear.mid
Parsing file  1   /content/drive/My Drive/ISPR_project/midi_songs/test/Hit the Lights.mid
Parsing file  2   /content/drive/My Drive/ISPR_project/midi_songs/test/Fight Fire With Fire.mid
Parsing file  3   /content/drive/My Drive/ISPR_project/midi_songs/test/Smile.mid
Parsing file  4   /content/drive/My Drive/ISPR_project/midi_songs/test/Another One Bites The Dust.2.mid
Parsing file  5   /content/drive/My Drive/ISPR_project/midi_songs/test/Bicycle Race.1.mid
Parsing file  6   /content/drive/My Drive/ISPR_project/midi_songs/test/Se tornerai.1.mid
Parsing file  7   /content/drive/My Drive/ISPR_project/midi_songs/test/Non ti passa piu.mid
Parsing file  8   /content/drive/My Drive/ISPR_project/midi_songs/test/I'll Be Over You.mid


In [21]:
with open(current_path + 'SMALL_notes_for_instruments', 'rb') as f:
    notes_for_instruments = pickle.load(f)

In [25]:
with open(current_path + 'SMALL_VALIDATION_notes_for_instruments', 'rb') as f:
    notes_for_instruments_validation = pickle.load(f)

I obtain the number of different notes in our dataset, because this will be the **number of possible output classes**  of our model.

In [10]:
# Count different possible outputs
n_vocab = (len(set(item for notes_for_instrument in notes_for_instruments for item in notes_for_instrument)))
n_vocab

476

In [12]:
# Count different possible outputs valifation
print(len(set(item for notes_for_instrument in notes_for_instruments_validation for item in notes_for_instrument)))

287


**Preprocess data** \\
Now, there is some **data processing** that I have to do:

- I will map each pitch or chord to an integer
- I will create pairs of input sequences and its corresponding output note

I can try different **sequence_length** to obtain different results. In this first version, I will use a sequence_length of 100.

The network will made its prediction of the next note (or chord), based on the previous *sequence_length* notes (or chords). 


In [13]:
# get all pitch names
pitchnames_training = set(item for notes_for_instrument in notes_for_instruments for item in notes_for_instrument)
pitchnames_validation = set(item for notes_for_instrument in notes_for_instruments_validation for item in notes_for_instrument)
pitchnames = sorted(pitchnames_training.union(pitchnames_validation))

In [14]:
# create a dictionary to map pitches to integers
note_to_int = dict((note, number) for number, note in enumerate(pitchnames))
network_input = []
for notes in notes_for_instruments:
    if len(notes) - sequence_length<=0:
        print("canzone troppo corta")
    # create input sequences and the corresponding outputs
    for i in range(0, len(notes) - sequence_length, 1):
      # Map pitches of sequence_in to integers
      network_input.append([note_to_int[char] for char in notes[i:i + sequence_length]])
n_patterns = len(network_input)
# reshape the input into a format compatible with Transormer layers
network_input = np.reshape(network_input, (n_patterns, sequence_length))

In [15]:
# create a dictionary to map pitches to integers
note_to_int_validation = dict((notes_validation, number) for number, notes_validation in enumerate(pitchnames))
network_input_validation = []
network_output_validation = []
for notes_validation in notes_for_instruments:
    if len(notes) - sequence_length<=0:
        print("canzone troppo corta")
    # create input sequences and the corresponding outputs
    for i in range(0, len(notes_validation) - sequence_length, 1):
      # Map pitches of sequence_in to integers
      network_input_validation.append([note_to_int_validation[char] for char in notes_validation[i:i + sequence_length]])
n_patterns = len(network_input_validation)
# reshape the input into a format compatible with Transormer layers
network_input_validation = np.reshape(network_input_validation, (n_patterns, sequence_length))

Let's see the new metwork_input size

In [19]:
network_input.shape

(132668, 64)

**Design neural network architecture** 

In [20]:
def create_network(sequence_length, n_vocab):
    """ create the structure of the neural network """
    model = CompressiveTransformer(
    num_tokens = n_vocab,
    dim = sequence_length,
    depth = 6,
    seq_len = sequence_length,
    mem_len = sequence_length,
    cmem_len = 256,
    cmem_ratio = 4,
    memory_layers = [5,6]
    )

    model = AutoregressiveWrapper(model)
    model.cuda()
    return model

In [21]:
model = create_network(sequence_length,n_vocab)

print(model)


AutoregressiveWrapper(
  (net): CompressiveTransformer(
    (token_emb): Embedding(476, 64)
    (to_model_dim): Identity()
    (to_logits): Sequential(
      (0): Identity()
      (1): Linear(in_features=64, out_features=476, bias=True)
    )
    (attn_layers): ModuleList(
      (0): GRUGating(
        (fn): PreNorm(
          (norm): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
          (fn): SelfAttention(
            (compress_mem_fn): ConvCompress(
              (conv): Conv1d(64, 64, kernel_size=(4,), stride=(4,))
            )
            (to_q): Linear(in_features=64, out_features=64, bias=False)
            (to_kv): Linear(in_features=64, out_features=128, bias=False)
            (to_out): Linear(in_features=64, out_features=64, bias=True)
            (attn_dropout): Dropout(p=0.0, inplace=False)
            (dropout): Dropout(p=0.0, inplace=False)
            (reconstruction_attn_dropout): Dropout(p=0.0, inplace=False)
          )
        )
        (gru): GRUCell(64, 

In [22]:
def cycle(loader):
    while True:
        for data in loader:
          yield data


data_train = torch.from_numpy(network_input).cuda()
train_loader = torch.utils.data.DataLoader(data_train, batch_size=32) 
cycle_train_loader  = cycle(DataLoader(data_train, batch_size = data_train.shape[0]))
num_batches=math.ceil(data_train.shape[0]/batch_size) # Total number of batches

In [23]:
#Validation
data_validation = torch.from_numpy(network_input_validation).cuda()
validation_loader = torch.utils.data.DataLoader(data_validation, batch_size=32) 
cycle_validation_loader  = cycle(DataLoader(data_validation, batch_size = data_validation.shape[0]))
num_batches_val=math.ceil(data_validation.shape[0]/batch_size) # Total number of batches

In [24]:
# optimizer

optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

In case we want to use previously trained weights, to continue the training in the point we left it, we should load them into the model.

This is very useful in Google Colaboratory, that usually kills the virtual machine that is executing the Jupyter notework after a certime amount of time. If this happens to you, you should have to look for the last weights file in your configured Drive account and use it to train the network.


In [None]:
# In case we want to use previously trained weights
weights = "model_best.pth.tar"
checkpoint = torch.load("/content/drive/MyDrive/ISPR_project/Transformer/model_32_408_epoche_best.pth.tar")
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']


In [None]:
# training

for i in tqdm.tqdm(range(epochs), mininterval=20., desc='training'):
    model.train()
    tot_loss = 0.0
    is_best=0
    best_loss_value=n_vocab
    avg_loss_val=0
    for mlm_loss, aux_loss, is_last in model(next(cycle_train_loader), max_batch_size = batch_size, return_loss = True):
        loss = mlm_loss + aux_loss

        loss.backward()

        tot_loss+=loss;

        if is_last:
            torch.nn.utils.clip_grad_norm_(model.parameters(), 0.5)
            optimizer.step()
            optimizer.zero_grad()
    
    if i % VALIDATE_EVERY == 0 or i==epochs-1:
      model.eval()
      with torch.no_grad():
          for loss_val, aux_loss_val, is_last_val in model(next(cycle_validation_loader), max_batch_size = batch_size, return_loss = True):
            avg_loss_val+=loss_val/num_batches_val;

            if is_last_val:
              print(f'\n validation loss: {avg_loss_val.item():.4f}')


    avg_loss=tot_loss/num_batches

    if i%5==0 or i==epochs-1:
      if best_loss_value>avg_loss:
        best_loss_value=avg_loss;
        is_best=1

      save_checkpoint({
      'epoch': i,
      'model_state_dict': model.state_dict(),
      'optimizer_state_dict' : optimizer.state_dict(),
      'loss':avg_loss.item(),
     }, is_best, 'Tran_64_Checkpoint'+str(i)+'_'+"{:.4f}".format(avg_loss.item())+'.pth.tar')
      is_best=0
    print(f'\n Epoch: {i} |Training loss: {avg_loss.item():.4f}')
print('\nTraining complete.')







training:   0%|          | 0/2000 [00:00<?, ?it/s][A


 validation loss: 5.5796



training:   0%|          | 1/2000 [01:38<54:38:56, 98.42s/it][A


 Epoch: 0 |Training loss: 6.0865



training:   0%|          | 2/2000 [02:51<50:25:24, 90.85s/it][A


 Epoch: 1 |Training loss: 5.5796



training:   0%|          | 3/2000 [04:05<47:31:37, 85.68s/it][A


 Epoch: 2 |Training loss: 5.0980



training:   0%|          | 4/2000 [05:18<45:29:01, 82.03s/it][A


 Epoch: 3 |Training loss: 4.7692



training:   0%|          | 5/2000 [06:32<44:02:52, 79.49s/it][A


 Epoch: 4 |Training loss: 4.6566

 validation loss: 4.6055



training:   0%|          | 6/2000 [08:11<47:15:37, 85.32s/it][A


 Epoch: 5 |Training loss: 4.6178



training:   0%|          | 7/2000 [09:24<45:15:02, 81.74s/it][A


 Epoch: 6 |Training loss: 4.6055



training:   0%|          | 8/2000 [10:37<43:49:11, 79.19s/it][A


 Epoch: 7 |Training loss: 4.5979



training:   0%|          | 9/2000 [11:51<42:47:55, 77.39s/it][A


 Epoch: 8 |Training loss: 4.5900



training:   0%|          | 10/2000 [13:04<42:02:44, 76.06s/it][A


 Epoch: 9 |Training loss: 4.5896

 validation loss: 4.5745



training:   1%|          | 11/2000 [14:43<45:50:31, 82.97s/it][A


 Epoch: 10 |Training loss: 4.5798



training:   1%|          | 12/2000 [15:56<44:11:59, 80.04s/it][A


 Epoch: 11 |Training loss: 4.5745



training:   1%|          | 13/2000 [17:09<43:02:27, 77.98s/it][A


 Epoch: 12 |Training loss: 4.5688



training:   1%|          | 14/2000 [18:22<42:13:29, 76.54s/it][A


 Epoch: 13 |Training loss: 4.5715



training:   1%|          | 15/2000 [19:35<41:36:39, 75.47s/it][A


 Epoch: 14 |Training loss: 4.5590

 validation loss: 4.5335



training:   1%|          | 16/2000 [21:13<45:22:42, 82.34s/it][A


 Epoch: 15 |Training loss: 4.5519



training:   1%|          | 17/2000 [22:27<43:52:03, 79.64s/it][A


 Epoch: 16 |Training loss: 4.5335



training:   1%|          | 18/2000 [23:40<42:43:32, 77.60s/it][A


 Epoch: 17 |Training loss: 4.5212



training:   1%|          | 19/2000 [24:53<41:59:22, 76.31s/it][A


 Epoch: 18 |Training loss: 4.5120



training:   1%|          | 20/2000 [26:06<41:23:39, 75.26s/it][A


 Epoch: 19 |Training loss: 4.5009

 validation loss: 4.4729



training:   1%|          | 21/2000 [27:44<45:12:52, 82.25s/it][A


 Epoch: 20 |Training loss: 4.4914



training:   1%|          | 22/2000 [28:57<43:40:16, 79.48s/it][A


 Epoch: 21 |Training loss: 4.4729



training:   1%|          | 23/2000 [30:10<42:36:02, 77.57s/it][A


 Epoch: 22 |Training loss: 4.4623



training:   1%|          | 24/2000 [31:24<41:51:30, 76.26s/it][A


 Epoch: 23 |Training loss: 4.4418



training:   1%|▏         | 25/2000 [32:36<41:15:24, 75.20s/it][A


 Epoch: 24 |Training loss: 4.4343

 validation loss: 4.3969



training:   1%|▏         | 26/2000 [34:15<45:03:03, 82.16s/it][A


 Epoch: 25 |Training loss: 4.4157



training:   1%|▏         | 27/2000 [35:27<43:27:30, 79.30s/it][A


 Epoch: 26 |Training loss: 4.3969



training:   1%|▏         | 28/2000 [36:41<42:26:29, 77.48s/it][A


 Epoch: 27 |Training loss: 4.3781



training:   1%|▏         | 29/2000 [37:54<41:40:34, 76.12s/it][A


 Epoch: 28 |Training loss: 4.3621



training:   2%|▏         | 30/2000 [39:07<41:09:14, 75.21s/it][A


 Epoch: 29 |Training loss: 4.3464

 validation loss: 4.3107



training:   2%|▏         | 31/2000 [40:46<45:00:49, 82.30s/it][A


 Epoch: 30 |Training loss: 4.3281



training:   2%|▏         | 32/2000 [41:59<43:30:16, 79.58s/it][A


 Epoch: 31 |Training loss: 4.3107



training:   2%|▏         | 33/2000 [43:12<42:22:05, 77.54s/it][A


 Epoch: 32 |Training loss: 4.2922



training:   2%|▏         | 34/2000 [44:25<41:36:47, 76.20s/it][A


 Epoch: 33 |Training loss: 4.2756



training:   2%|▏         | 35/2000 [45:38<41:05:27, 75.28s/it][A


 Epoch: 34 |Training loss: 4.2580

 validation loss: 4.2250



training:   2%|▏         | 36/2000 [47:16<44:51:33, 82.23s/it][A


 Epoch: 35 |Training loss: 4.2421



training:   2%|▏         | 37/2000 [48:29<43:19:25, 79.45s/it][A


 Epoch: 36 |Training loss: 4.2250



training:   2%|▏         | 38/2000 [49:42<42:15:13, 77.53s/it][A


 Epoch: 37 |Training loss: 4.2127



training:   2%|▏         | 39/2000 [50:55<41:30:49, 76.21s/it][A


 Epoch: 38 |Training loss: 4.1978



training:   2%|▏         | 40/2000 [52:08<40:58:00, 75.25s/it][A


 Epoch: 39 |Training loss: 4.1828

 validation loss: 4.1490



training:   2%|▏         | 41/2000 [53:47<44:47:24, 82.31s/it][A


 Epoch: 40 |Training loss: 4.1641



training:   2%|▏         | 42/2000 [55:00<43:14:52, 79.52s/it][A


 Epoch: 41 |Training loss: 4.1490



training:   2%|▏         | 43/2000 [56:13<42:12:33, 77.65s/it][A


 Epoch: 42 |Training loss: 4.1278



training:   2%|▏         | 44/2000 [57:26<41:25:54, 76.25s/it][A


 Epoch: 43 |Training loss: 4.1175



training:   2%|▏         | 45/2000 [58:39<40:51:45, 75.25s/it][A


 Epoch: 44 |Training loss: 4.0952

 validation loss: 4.0585



training:   2%|▏         | 46/2000 [1:00:18<44:37:12, 82.21s/it][A


 Epoch: 45 |Training loss: 4.0786



training:   2%|▏         | 47/2000 [1:01:31<43:08:46, 79.53s/it][A


 Epoch: 46 |Training loss: 4.0585



training:   2%|▏         | 48/2000 [1:02:44<42:00:57, 77.49s/it][A


 Epoch: 47 |Training loss: 4.0434



training:   2%|▏         | 49/2000 [1:03:57<41:19:47, 76.26s/it][A


 Epoch: 48 |Training loss: 4.0263



training:   2%|▎         | 50/2000 [1:05:10<40:45:54, 75.26s/it][A


 Epoch: 49 |Training loss: 4.0164

 validation loss: 3.9811



training:   3%|▎         | 51/2000 [1:06:49<44:32:21, 82.27s/it][A


 Epoch: 50 |Training loss: 3.9935



training:   3%|▎         | 52/2000 [1:08:02<43:00:45, 79.49s/it][A


 Epoch: 51 |Training loss: 3.9811



training:   3%|▎         | 53/2000 [1:09:14<41:52:57, 77.44s/it][A


 Epoch: 52 |Training loss: 3.9651



training:   3%|▎         | 54/2000 [1:10:27<41:07:47, 76.09s/it][A


 Epoch: 53 |Training loss: 3.9440



training:   3%|▎         | 55/2000 [1:11:40<40:36:38, 75.17s/it][A


 Epoch: 54 |Training loss: 3.9238

 validation loss: 3.8946



training:   3%|▎         | 56/2000 [1:13:19<44:22:40, 82.18s/it][A


 Epoch: 55 |Training loss: 3.9109



training:   3%|▎         | 57/2000 [1:14:32<42:48:37, 79.32s/it][A


 Epoch: 56 |Training loss: 3.8946



training:   3%|▎         | 58/2000 [1:15:45<41:45:44, 77.42s/it][A


 Epoch: 57 |Training loss: 3.8808



training:   3%|▎         | 59/2000 [1:16:57<40:58:50, 76.01s/it][A


 Epoch: 58 |Training loss: 3.8654



training:   3%|▎         | 60/2000 [1:18:10<40:28:34, 75.11s/it][A


 Epoch: 59 |Training loss: 3.8456

 validation loss: 3.8255



training:   3%|▎         | 61/2000 [1:19:49<44:14:10, 82.13s/it][A


 Epoch: 60 |Training loss: 3.8395



training:   3%|▎         | 62/2000 [1:21:02<42:44:00, 79.38s/it][A


 Epoch: 61 |Training loss: 3.8255



training:   3%|▎         | 63/2000 [1:22:15<41:39:09, 77.41s/it][A


 Epoch: 62 |Training loss: 3.8060



training:   3%|▎         | 64/2000 [1:23:28<40:56:32, 76.13s/it][A


 Epoch: 63 |Training loss: 3.7988



training:   3%|▎         | 65/2000 [1:24:41<40:25:03, 75.20s/it][A


 Epoch: 64 |Training loss: 3.7798

 validation loss: 3.7570



training:   3%|▎         | 66/2000 [1:26:19<44:07:10, 82.13s/it][A


 Epoch: 65 |Training loss: 3.7726



training:   3%|▎         | 67/2000 [1:27:32<42:39:09, 79.44s/it][A


 Epoch: 66 |Training loss: 3.7570



training:   3%|▎         | 68/2000 [1:28:45<41:36:18, 77.53s/it][A


 Epoch: 67 |Training loss: 3.7412



training:   3%|▎         | 69/2000 [1:29:58<40:51:41, 76.18s/it][A


 Epoch: 68 |Training loss: 3.7279



training:   4%|▎         | 70/2000 [1:31:11<40:19:56, 75.23s/it][A


 Epoch: 69 |Training loss: 3.7183

 validation loss: 3.6989



training:   4%|▎         | 71/2000 [1:32:50<44:03:24, 82.22s/it][A


 Epoch: 70 |Training loss: 3.7043



training:   4%|▎         | 72/2000 [1:34:03<42:35:04, 79.51s/it][A


 Epoch: 71 |Training loss: 3.6989



training:   4%|▎         | 73/2000 [1:35:16<41:32:13, 77.60s/it][A


 Epoch: 72 |Training loss: 3.6812



training:   4%|▎         | 74/2000 [1:36:29<40:46:59, 76.23s/it][A


 Epoch: 73 |Training loss: 3.6703



training:   4%|▍         | 75/2000 [1:37:42<40:16:33, 75.32s/it][A


 Epoch: 74 |Training loss: 3.6636

 validation loss: 3.6366



training:   4%|▍         | 76/2000 [1:39:21<43:55:53, 82.20s/it][A


 Epoch: 75 |Training loss: 3.6495



training:   4%|▍         | 77/2000 [1:40:34<42:29:32, 79.55s/it][A


 Epoch: 76 |Training loss: 3.6366



training:   4%|▍         | 78/2000 [1:41:47<41:23:54, 77.54s/it][A


 Epoch: 77 |Training loss: 3.6341



training:   4%|▍         | 79/2000 [1:43:00<40:40:19, 76.22s/it][A


 Epoch: 78 |Training loss: 3.6213



training:   4%|▍         | 80/2000 [1:44:13<40:06:49, 75.21s/it][A


 Epoch: 79 |Training loss: 3.6089

 validation loss: 3.5939



training:   4%|▍         | 81/2000 [1:45:52<43:51:00, 82.26s/it][A


 Epoch: 80 |Training loss: 3.6036



training:   4%|▍         | 82/2000 [1:47:05<42:24:15, 79.59s/it][A


 Epoch: 81 |Training loss: 3.5939



training:   4%|▍         | 83/2000 [1:48:18<41:21:23, 77.67s/it][A


 Epoch: 82 |Training loss: 3.5865



training:   4%|▍         | 84/2000 [1:49:31<40:38:39, 76.37s/it][A


 Epoch: 83 |Training loss: 3.5711



training:   4%|▍         | 85/2000 [1:50:44<40:05:00, 75.35s/it][A


 Epoch: 84 |Training loss: 3.5617

 validation loss: 3.5571



training:   4%|▍         | 86/2000 [1:52:23<43:45:56, 82.32s/it][A


 Epoch: 85 |Training loss: 3.5923



training:   4%|▍         | 87/2000 [1:53:36<42:12:57, 79.44s/it][A


 Epoch: 86 |Training loss: 3.5571



training:   4%|▍         | 88/2000 [1:54:49<41:11:20, 77.55s/it][A


 Epoch: 87 |Training loss: 3.5418



training:   4%|▍         | 89/2000 [1:56:02<40:26:47, 76.19s/it][A


 Epoch: 88 |Training loss: 3.5453



training:   4%|▍         | 90/2000 [1:57:15<39:59:38, 75.38s/it][A


 Epoch: 89 |Training loss: 3.5202

 validation loss: 3.4996



training:   5%|▍         | 91/2000 [1:58:54<43:41:04, 82.38s/it][A


 Epoch: 90 |Training loss: 3.5198



training:   5%|▍         | 92/2000 [2:00:07<42:10:45, 79.58s/it][A


 Epoch: 91 |Training loss: 3.4996



training:   5%|▍         | 93/2000 [2:01:20<41:04:06, 77.53s/it][A


 Epoch: 92 |Training loss: 3.4994



training:   5%|▍         | 94/2000 [2:02:33<40:19:57, 76.18s/it][A


 Epoch: 93 |Training loss: 3.4836



training:   5%|▍         | 95/2000 [2:03:46<39:48:49, 75.24s/it][A


 Epoch: 94 |Training loss: 3.4840

 validation loss: 3.4706



training:   5%|▍         | 96/2000 [2:05:24<43:28:16, 82.19s/it][A


 Epoch: 95 |Training loss: 3.4689



training:   5%|▍         | 97/2000 [2:06:38<42:01:55, 79.51s/it][A


 Epoch: 96 |Training loss: 3.4706



training:   5%|▍         | 98/2000 [2:07:51<41:00:29, 77.62s/it][A


 Epoch: 97 |Training loss: 3.4520



training:   5%|▍         | 99/2000 [2:09:04<40:16:13, 76.26s/it][A


 Epoch: 98 |Training loss: 3.4467



training:   5%|▌         | 100/2000 [2:10:17<39:44:19, 75.29s/it][A


 Epoch: 99 |Training loss: 3.4315

 validation loss: 3.4118



training:   5%|▌         | 101/2000 [2:11:55<43:22:59, 82.24s/it][A


 Epoch: 100 |Training loss: 3.4256



training:   5%|▌         | 102/2000 [2:13:09<41:57:11, 79.57s/it][A


 Epoch: 101 |Training loss: 3.4118



training:   5%|▌         | 103/2000 [2:14:22<40:57:12, 77.72s/it][A


 Epoch: 102 |Training loss: 3.4041



training:   5%|▌         | 104/2000 [2:15:35<40:11:25, 76.31s/it][A


 Epoch: 103 |Training loss: 3.3918



training:   5%|▌         | 105/2000 [2:16:48<39:37:19, 75.27s/it][A


 Epoch: 104 |Training loss: 3.3836

 validation loss: 3.3643



training:   5%|▌         | 106/2000 [2:18:26<43:15:13, 82.21s/it][A


 Epoch: 105 |Training loss: 3.3727



training:   5%|▌         | 107/2000 [2:19:40<41:50:10, 79.56s/it][A


 Epoch: 106 |Training loss: 3.3643



training:   5%|▌         | 108/2000 [2:20:52<40:42:44, 77.47s/it][A


 Epoch: 107 |Training loss: 3.3560



training:   5%|▌         | 109/2000 [2:22:05<40:00:18, 76.16s/it][A


 Epoch: 108 |Training loss: 3.3451



training:   6%|▌         | 110/2000 [2:23:19<39:30:36, 75.26s/it][A


 Epoch: 109 |Training loss: 3.3302

 validation loss: 3.3103



training:   6%|▌         | 111/2000 [2:24:57<43:10:28, 82.28s/it][A


 Epoch: 110 |Training loss: 3.3212



training:   6%|▌         | 112/2000 [2:26:10<41:42:39, 79.53s/it][A


 Epoch: 111 |Training loss: 3.3104



training:   6%|▌         | 113/2000 [2:27:24<40:40:18, 77.59s/it][A


 Epoch: 112 |Training loss: 3.2975



training:   6%|▌         | 114/2000 [2:28:36<39:54:30, 76.18s/it][A


 Epoch: 113 |Training loss: 3.2963



training:   6%|▌         | 115/2000 [2:29:50<39:25:55, 75.31s/it][A


 Epoch: 114 |Training loss: 3.2819

 validation loss: 3.2607



training:   6%|▌         | 116/2000 [2:31:28<43:04:24, 82.31s/it][A


 Epoch: 115 |Training loss: 3.2700



training:   6%|▌         | 117/2000 [2:32:41<41:32:38, 79.43s/it][A


 Epoch: 116 |Training loss: 3.2608



training:   6%|▌         | 118/2000 [2:33:54<40:31:53, 77.53s/it][A


 Epoch: 117 |Training loss: 3.2499



training:   6%|▌         | 119/2000 [2:35:07<39:46:39, 76.13s/it][A


 Epoch: 118 |Training loss: 3.2467



training:   6%|▌         | 120/2000 [2:36:20<39:19:20, 75.30s/it][A


 Epoch: 119 |Training loss: 3.2317

 validation loss: 3.2092



training:   6%|▌         | 121/2000 [2:37:59<42:57:37, 82.31s/it][A


 Epoch: 120 |Training loss: 3.2222



training:   6%|▌         | 122/2000 [2:39:12<41:30:40, 79.57s/it][A


 Epoch: 121 |Training loss: 3.2092



training:   6%|▌         | 123/2000 [2:40:25<40:28:07, 77.62s/it][A


 Epoch: 122 |Training loss: 3.1990



training:   6%|▌         | 124/2000 [2:41:38<39:44:43, 76.27s/it][A


 Epoch: 123 |Training loss: 3.1925



training:   6%|▋         | 125/2000 [2:42:52<39:15:07, 75.36s/it][A


 Epoch: 124 |Training loss: 3.1854

 validation loss: 3.1652



training:   6%|▋         | 126/2000 [2:44:30<42:50:30, 82.30s/it][A


 Epoch: 125 |Training loss: 3.1707



training:   6%|▋         | 127/2000 [2:45:43<41:22:20, 79.52s/it][A


 Epoch: 126 |Training loss: 3.1652



training:   6%|▋         | 128/2000 [2:46:56<40:19:54, 77.56s/it][A


 Epoch: 127 |Training loss: 3.1532



training:   6%|▋         | 129/2000 [2:48:09<39:36:17, 76.20s/it][A


 Epoch: 128 |Training loss: 3.1438



training:   6%|▋         | 130/2000 [2:49:22<39:05:24, 75.25s/it][A


 Epoch: 129 |Training loss: 3.1426

 validation loss: 3.1256



training:   7%|▋         | 131/2000 [2:51:01<42:45:08, 82.35s/it][A


 Epoch: 130 |Training loss: 3.1331



training:   7%|▋         | 132/2000 [2:52:14<41:19:17, 79.63s/it][A


 Epoch: 131 |Training loss: 3.1256



training:   7%|▋         | 133/2000 [2:53:27<40:16:30, 77.66s/it][A


 Epoch: 132 |Training loss: 3.1123



training:   7%|▋         | 134/2000 [2:54:40<39:31:35, 76.26s/it][A


 Epoch: 133 |Training loss: 3.1152



training:   7%|▋         | 135/2000 [2:55:53<38:59:23, 75.26s/it][A


 Epoch: 134 |Training loss: 3.1030



training:   7%|▋         | 136/2000 [2:57:32<42:35:05, 82.25s/it][A


 validation loss: 3.0886

 Epoch: 135 |Training loss: 3.0994



training:   7%|▋         | 137/2000 [2:58:45<41:10:10, 79.55s/it][A


 Epoch: 136 |Training loss: 3.0886



training:   7%|▋         | 138/2000 [2:59:58<40:06:05, 77.53s/it][A


 Epoch: 137 |Training loss: 3.0836



training:   7%|▋         | 139/2000 [3:01:11<39:27:11, 76.32s/it][A


 Epoch: 138 |Training loss: 3.0813



training:   7%|▋         | 140/2000 [3:02:24<38:53:24, 75.27s/it][A


 Epoch: 139 |Training loss: 3.0633

 validation loss: 3.0531



training:   7%|▋         | 141/2000 [3:04:03<42:27:03, 82.21s/it][A


 Epoch: 140 |Training loss: 3.0578



training:   7%|▋         | 142/2000 [3:05:16<41:00:07, 79.44s/it][A


 Epoch: 141 |Training loss: 3.0531



training:   7%|▋         | 143/2000 [3:06:29<39:58:44, 77.50s/it][A


 Epoch: 142 |Training loss: 3.0493



training:   7%|▋         | 144/2000 [3:07:41<39:13:42, 76.09s/it][A


 Epoch: 143 |Training loss: 3.0412



training:   7%|▋         | 145/2000 [3:08:55<38:45:13, 75.21s/it][A


 Epoch: 144 |Training loss: 3.0350

 validation loss: 3.0181



training:   7%|▋         | 146/2000 [3:10:33<42:21:17, 82.24s/it][A


 Epoch: 145 |Training loss: 3.0228



training:   7%|▋         | 147/2000 [3:11:46<40:53:02, 79.43s/it][A


 Epoch: 146 |Training loss: 3.0181



training:   7%|▋         | 148/2000 [3:12:59<39:54:56, 77.59s/it][A


 Epoch: 147 |Training loss: 3.0095



training:   7%|▋         | 149/2000 [3:14:12<39:09:52, 76.17s/it][A


 Epoch: 148 |Training loss: 3.0098



training:   8%|▊         | 150/2000 [3:15:25<38:40:19, 75.25s/it][A


 Epoch: 149 |Training loss: 2.9943

 validation loss: 2.9868



training:   8%|▊         | 151/2000 [3:17:04<42:16:01, 82.29s/it][A


 Epoch: 150 |Training loss: 2.9890



training:   8%|▊         | 152/2000 [3:18:17<40:50:55, 79.58s/it][A


 Epoch: 151 |Training loss: 2.9868



training:   8%|▊         | 153/2000 [3:19:30<39:46:43, 77.53s/it][A


 Epoch: 152 |Training loss: 2.9739



training:   8%|▊         | 154/2000 [3:20:43<39:05:32, 76.24s/it][A


 Epoch: 153 |Training loss: 2.9722



training:   8%|▊         | 155/2000 [3:21:56<38:35:06, 75.29s/it][A


 Epoch: 154 |Training loss: 2.9648

 validation loss: 2.9502



training:   8%|▊         | 156/2000 [3:23:35<42:07:44, 82.25s/it][A


 Epoch: 155 |Training loss: 2.9594



training:   8%|▊         | 157/2000 [3:24:48<40:43:20, 79.54s/it][A


 Epoch: 156 |Training loss: 2.9502



training:   8%|▊         | 158/2000 [3:26:01<39:40:49, 77.55s/it][A


 Epoch: 157 |Training loss: 2.9517



training:   8%|▊         | 159/2000 [3:27:14<38:58:38, 76.22s/it][A


 Epoch: 158 |Training loss: 2.9468



training:   8%|▊         | 160/2000 [3:28:27<38:27:47, 75.25s/it][A


 Epoch: 159 |Training loss: 2.9891

 validation loss: 2.9657



training:   8%|▊         | 161/2000 [3:30:06<42:04:10, 82.35s/it][A


 Epoch: 160 |Training loss: 2.9517



training:   8%|▊         | 162/2000 [3:31:19<40:39:39, 79.64s/it][A


 Epoch: 161 |Training loss: 2.9657



training:   8%|▊         | 163/2000 [3:32:33<39:39:22, 77.72s/it][A


 Epoch: 162 |Training loss: 2.9688



training:   8%|▊         | 164/2000 [3:33:46<38:56:28, 76.36s/it][A


 Epoch: 163 |Training loss: 2.9461



training:   8%|▊         | 165/2000 [3:34:59<38:24:52, 75.36s/it][A


 Epoch: 164 |Training loss: 2.9442

 validation loss: 2.9284



training:   8%|▊         | 166/2000 [3:36:37<41:53:45, 82.24s/it][A


 Epoch: 165 |Training loss: 2.9370



training:   8%|▊         | 167/2000 [3:37:50<40:30:02, 79.54s/it][A


 Epoch: 166 |Training loss: 2.9284



training:   8%|▊         | 168/2000 [3:39:03<39:28:32, 77.57s/it][A


 Epoch: 167 |Training loss: 2.9198



training:   8%|▊         | 169/2000 [3:40:17<38:46:57, 76.25s/it][A


 Epoch: 168 |Training loss: 2.9185



training:   8%|▊         | 170/2000 [3:41:30<38:17:52, 75.34s/it][A


 Epoch: 169 |Training loss: 2.9048



training:   9%|▊         | 171/2000 [3:43:09<41:51:54, 82.40s/it][A


 validation loss: 2.8862

 Epoch: 170 |Training loss: 2.8909



training:   9%|▊         | 172/2000 [3:44:22<40:27:25, 79.67s/it][A


 Epoch: 171 |Training loss: 2.8862



training:   9%|▊         | 173/2000 [3:45:35<39:23:26, 77.62s/it][A


 Epoch: 172 |Training loss: 2.8727



training:   9%|▊         | 174/2000 [3:46:48<38:39:09, 76.20s/it][A


 Epoch: 173 |Training loss: 2.8733



training:   9%|▉         | 175/2000 [3:48:01<38:08:06, 75.23s/it][A


 Epoch: 174 |Training loss: 2.8728

 validation loss: 2.8602



training:   9%|▉         | 176/2000 [3:49:39<41:41:10, 82.28s/it][A


 Epoch: 175 |Training loss: 2.9061



training:   9%|▉         | 177/2000 [3:50:52<40:15:34, 79.50s/it][A


 Epoch: 176 |Training loss: 2.8602



training:   9%|▉         | 178/2000 [3:52:05<39:14:54, 77.55s/it][A


 Epoch: 177 |Training loss: 2.8556



training:   9%|▉         | 179/2000 [3:53:18<38:31:37, 76.17s/it][A


 Epoch: 178 |Training loss: 2.8474



training:   9%|▉         | 180/2000 [3:54:31<38:03:03, 75.27s/it][A


 Epoch: 179 |Training loss: 2.8364

 validation loss: 2.8242



training:   9%|▉         | 181/2000 [3:56:10<41:33:19, 82.24s/it][A


 Epoch: 180 |Training loss: 2.8300



training:   9%|▉         | 182/2000 [3:57:23<40:10:59, 79.57s/it][A


 Epoch: 181 |Training loss: 2.8242



training:   9%|▉         | 183/2000 [3:58:36<39:08:05, 77.54s/it][A


 Epoch: 182 |Training loss: 2.8177



training:   9%|▉         | 184/2000 [3:59:49<38:28:41, 76.28s/it][A


 Epoch: 183 |Training loss: 2.8074



training:   9%|▉         | 185/2000 [4:01:03<37:59:24, 75.35s/it][A


 Epoch: 184 |Training loss: 2.8038

 validation loss: 2.7903



training:   9%|▉         | 186/2000 [4:02:41<41:30:28, 82.38s/it][A


 Epoch: 185 |Training loss: 2.8029



training:   9%|▉         | 187/2000 [4:03:55<40:05:20, 79.60s/it][A


 Epoch: 186 |Training loss: 2.7903



training:   9%|▉         | 188/2000 [4:05:07<39:03:20, 77.59s/it][A


 Epoch: 187 |Training loss: 2.7936



training:   9%|▉         | 189/2000 [4:06:20<38:19:37, 76.19s/it][A


 Epoch: 188 |Training loss: 2.7771



training:  10%|▉         | 190/2000 [4:07:33<37:49:47, 75.24s/it][A


 Epoch: 189 |Training loss: 2.7827

 validation loss: 2.7791



training:  10%|▉         | 191/2000 [4:09:12<41:18:42, 82.21s/it][A


 Epoch: 190 |Training loss: 2.7676



training:  10%|▉         | 192/2000 [4:10:25<39:54:33, 79.47s/it][A


 Epoch: 191 |Training loss: 2.7791



training:  10%|▉         | 193/2000 [4:11:38<38:56:58, 77.60s/it][A


 Epoch: 192 |Training loss: 2.7674



training:  10%|▉         | 194/2000 [4:12:51<38:16:11, 76.29s/it][A


 Epoch: 193 |Training loss: 2.7603



training:  10%|▉         | 195/2000 [4:14:04<37:45:59, 75.32s/it][A


 Epoch: 194 |Training loss: 2.7571

 validation loss: 2.7454



training:  10%|▉         | 196/2000 [4:15:43<41:14:12, 82.29s/it][A


 Epoch: 195 |Training loss: 2.7422



training:  10%|▉         | 197/2000 [4:16:56<39:51:18, 79.58s/it][A


 Epoch: 196 |Training loss: 2.7454



training:  10%|▉         | 198/2000 [4:18:09<38:48:29, 77.53s/it][A


 Epoch: 197 |Training loss: 2.7490



training:  10%|▉         | 199/2000 [4:19:22<38:08:50, 76.25s/it][A


 Epoch: 198 |Training loss: 2.7264



training:  10%|█         | 200/2000 [4:20:35<37:38:22, 75.28s/it][A


 Epoch: 199 |Training loss: 2.7436

 validation loss: 2.7239



training:  10%|█         | 201/2000 [4:22:14<41:05:47, 82.24s/it][A


 Epoch: 200 |Training loss: 2.7498



training:  10%|█         | 202/2000 [4:23:27<39:44:23, 79.57s/it][A


 Epoch: 201 |Training loss: 2.7239



training:  10%|█         | 203/2000 [4:24:40<38:44:32, 77.61s/it][A


 Epoch: 202 |Training loss: 2.7229



training:  10%|█         | 204/2000 [4:25:53<38:03:53, 76.30s/it][A


 Epoch: 203 |Training loss: 2.7191



training:  10%|█         | 205/2000 [4:27:07<37:35:27, 75.39s/it][A


 Epoch: 204 |Training loss: 2.7046

 validation loss: 2.6926



training:  10%|█         | 206/2000 [4:28:45<41:03:00, 82.38s/it][A


 Epoch: 205 |Training loss: 2.6940



training:  10%|█         | 207/2000 [4:29:58<39:36:54, 79.54s/it][A


 Epoch: 206 |Training loss: 2.6926



training:  10%|█         | 208/2000 [4:31:12<38:42:17, 77.76s/it][A


 Epoch: 207 |Training loss: 2.6933



training:  10%|█         | 209/2000 [4:32:25<37:56:46, 76.27s/it][A


 Epoch: 208 |Training loss: 2.6797



training:  10%|█         | 210/2000 [4:33:38<37:30:08, 75.42s/it][A


 Epoch: 209 |Training loss: 2.6765

 validation loss: 2.6643



training:  11%|█         | 211/2000 [4:35:17<40:57:19, 82.41s/it][A


 Epoch: 210 |Training loss: 2.6629



training:  11%|█         | 212/2000 [4:36:30<39:36:22, 79.74s/it][A


 Epoch: 211 |Training loss: 2.6643



training:  11%|█         | 213/2000 [4:37:43<38:35:42, 77.75s/it][A


 Epoch: 212 |Training loss: 2.6518



training:  11%|█         | 214/2000 [4:38:57<37:57:06, 76.50s/it][A


 Epoch: 213 |Training loss: 2.6546



training:  11%|█         | 215/2000 [4:40:10<37:26:35, 75.52s/it][A


 Epoch: 214 |Training loss: 2.6474

 validation loss: 2.6471



training:  11%|█         | 216/2000 [4:41:49<40:52:17, 82.48s/it][A


 Epoch: 215 |Training loss: 2.6463



training:  11%|█         | 217/2000 [4:43:02<39:30:55, 79.78s/it][A


 Epoch: 216 |Training loss: 2.6471



training:  11%|█         | 218/2000 [4:44:16<38:32:24, 77.86s/it][A


 Epoch: 217 |Training loss: 2.6317



training:  11%|█         | 219/2000 [4:45:29<37:48:42, 76.43s/it][A


 Epoch: 218 |Training loss: 2.6351



training:  11%|█         | 220/2000 [4:46:42<37:19:01, 75.47s/it][A


 Epoch: 219 |Training loss: 2.6306

 validation loss: 2.6196



training:  11%|█         | 221/2000 [4:48:21<40:44:02, 82.43s/it][A


 Epoch: 220 |Training loss: 2.6178



training:  11%|█         | 222/2000 [4:49:34<39:19:21, 79.62s/it][A


 Epoch: 221 |Training loss: 2.6196



training:  11%|█         | 223/2000 [4:50:47<38:23:37, 77.78s/it][A


 Epoch: 222 |Training loss: 2.6058



training:  11%|█         | 224/2000 [4:52:01<37:42:59, 76.45s/it][A


 Epoch: 223 |Training loss: 2.6050



training:  11%|█▏        | 225/2000 [4:53:14<37:13:42, 75.51s/it][A


 Epoch: 224 |Training loss: 2.5977

 validation loss: 2.5942



training:  11%|█▏        | 226/2000 [4:54:53<40:38:51, 82.49s/it][A


 Epoch: 225 |Training loss: 2.5899



training:  11%|█▏        | 227/2000 [4:56:06<39:17:51, 79.79s/it][A


 Epoch: 226 |Training loss: 2.5942



training:  11%|█▏        | 228/2000 [4:57:19<38:17:18, 77.79s/it][A


 Epoch: 227 |Training loss: 2.5799



training:  11%|█▏        | 229/2000 [4:58:33<37:38:07, 76.50s/it][A


 Epoch: 228 |Training loss: 2.5777



training:  12%|█▏        | 230/2000 [4:59:46<37:05:35, 75.44s/it][A


 Epoch: 229 |Training loss: 2.5766

 validation loss: 2.5602



training:  12%|█▏        | 231/2000 [5:01:25<40:30:02, 82.42s/it][A


 Epoch: 230 |Training loss: 2.5746



training:  12%|█▏        | 232/2000 [5:02:38<39:05:52, 79.61s/it][A


 Epoch: 231 |Training loss: 2.5602



training:  12%|█▏        | 233/2000 [5:03:51<38:06:08, 77.63s/it][A


 Epoch: 232 |Training loss: 2.5752



training:  12%|█▏        | 234/2000 [5:05:04<37:27:30, 76.36s/it][A


 Epoch: 233 |Training loss: 2.5564



training:  12%|█▏        | 235/2000 [5:06:17<36:57:11, 75.37s/it][A


 Epoch: 234 |Training loss: 2.5726

 validation loss: 2.5544



training:  12%|█▏        | 236/2000 [5:07:56<40:22:30, 82.40s/it][A


 Epoch: 235 |Training loss: 2.5596



training:  12%|█▏        | 237/2000 [5:09:09<39:00:00, 79.64s/it][A


 Epoch: 236 |Training loss: 2.5544



training:  12%|█▏        | 238/2000 [5:10:23<38:05:06, 77.81s/it][A


 Epoch: 237 |Training loss: 2.5466



training:  12%|█▏        | 239/2000 [5:11:36<37:22:18, 76.40s/it][A


 Epoch: 238 |Training loss: 2.5504



training:  12%|█▏        | 240/2000 [5:12:49<36:54:11, 75.48s/it][A


 Epoch: 239 |Training loss: 2.5392

 validation loss: 2.5299



training:  12%|█▏        | 241/2000 [5:14:28<40:16:24, 82.42s/it][A


 Epoch: 240 |Training loss: 2.5345



training:  12%|█▏        | 242/2000 [5:15:41<38:54:35, 79.68s/it][A


 Epoch: 241 |Training loss: 2.5299



training:  12%|█▏        | 243/2000 [5:16:54<37:57:24, 77.77s/it][A


 Epoch: 242 |Training loss: 2.5337



training:  12%|█▏        | 244/2000 [5:18:08<37:17:36, 76.46s/it][A


 Epoch: 243 |Training loss: 2.5177



training:  12%|█▏        | 245/2000 [5:19:21<36:45:44, 75.41s/it][A


 Epoch: 244 |Training loss: 2.5124

 validation loss: 2.5213



training:  12%|█▏        | 246/2000 [5:20:59<40:09:43, 82.43s/it][A


 Epoch: 245 |Training loss: 2.5060



training:  12%|█▏        | 247/2000 [5:22:13<38:48:14, 79.69s/it][A


 Epoch: 246 |Training loss: 2.5213



training:  12%|█▏        | 248/2000 [5:23:26<37:48:27, 77.69s/it][A


 Epoch: 247 |Training loss: 2.5107



training:  12%|█▏        | 249/2000 [5:24:39<37:07:43, 76.34s/it][A


 Epoch: 248 |Training loss: 2.5027



training:  12%|█▎        | 250/2000 [5:25:52<36:38:58, 75.39s/it][A


 Epoch: 249 |Training loss: 2.5015

 validation loss: 2.5002



training:  13%|█▎        | 251/2000 [5:27:31<40:01:04, 82.37s/it][A


 Epoch: 250 |Training loss: 2.4902



training:  13%|█▎        | 252/2000 [5:28:44<38:39:17, 79.61s/it][A


 Epoch: 251 |Training loss: 2.5002



training:  13%|█▎        | 253/2000 [5:29:57<37:38:40, 77.57s/it][A


 Epoch: 252 |Training loss: 2.4867



training:  13%|█▎        | 254/2000 [5:31:10<36:56:00, 76.15s/it][A


 Epoch: 253 |Training loss: 2.4820



training:  13%|█▎        | 255/2000 [5:32:23<36:27:50, 75.23s/it][A


 Epoch: 254 |Training loss: 2.4853

 validation loss: 2.4665



training:  13%|█▎        | 256/2000 [5:34:01<39:48:30, 82.17s/it][A


 Epoch: 255 |Training loss: 2.4770



training:  13%|█▎        | 257/2000 [5:35:15<38:33:11, 79.63s/it][A


 Epoch: 256 |Training loss: 2.4665



training:  13%|█▎        | 258/2000 [5:36:28<37:33:52, 77.63s/it][A


 Epoch: 257 |Training loss: 2.4610



training:  13%|█▎        | 259/2000 [5:37:41<36:53:54, 76.30s/it][A


 Epoch: 258 |Training loss: 2.4680



training:  13%|█▎        | 260/2000 [5:38:54<36:26:14, 75.39s/it][A


 Epoch: 259 |Training loss: 2.4584

 validation loss: 2.4491



training:  13%|█▎        | 261/2000 [5:40:33<39:46:55, 82.35s/it][A


 Epoch: 260 |Training loss: 2.4494



training:  13%|█▎        | 262/2000 [5:41:46<38:26:01, 79.61s/it][A


 Epoch: 261 |Training loss: 2.4491



training:  13%|█▎        | 263/2000 [5:42:59<37:28:48, 77.68s/it][A


 Epoch: 262 |Training loss: 2.4524



training:  13%|█▎        | 264/2000 [5:44:12<36:48:36, 76.33s/it][A


 Epoch: 263 |Training loss: 2.4343



training:  13%|█▎        | 265/2000 [5:45:25<36:18:31, 75.34s/it][A


 Epoch: 264 |Training loss: 2.4472

 validation loss: 2.4422



training:  13%|█▎        | 266/2000 [5:47:04<39:41:44, 82.41s/it][A


 Epoch: 265 |Training loss: 2.4314



training:  13%|█▎        | 267/2000 [5:48:18<38:21:34, 79.69s/it][A


 Epoch: 266 |Training loss: 2.4422



training:  13%|█▎        | 268/2000 [5:49:31<37:26:58, 77.84s/it][A


 Epoch: 267 |Training loss: 2.4288



training:  13%|█▎        | 269/2000 [5:50:44<36:43:34, 76.38s/it][A


 Epoch: 268 |Training loss: 2.4415



training:  14%|█▎        | 270/2000 [5:51:58<36:16:18, 75.48s/it][A


 Epoch: 269 |Training loss: 2.4295

 validation loss: 2.4334



training:  14%|█▎        | 271/2000 [5:53:36<39:36:12, 82.46s/it][A


 Epoch: 270 |Training loss: 2.4306



training:  14%|█▎        | 272/2000 [5:54:49<38:14:52, 79.68s/it][A


 Epoch: 271 |Training loss: 2.4334



training:  14%|█▎        | 273/2000 [5:56:02<37:15:50, 77.68s/it][A


 Epoch: 272 |Training loss: 2.4238



training:  14%|█▎        | 274/2000 [5:57:16<36:39:29, 76.46s/it][A


 Epoch: 273 |Training loss: 2.4239



training:  14%|█▍        | 275/2000 [5:58:30<36:13:18, 75.59s/it][A


 Epoch: 274 |Training loss: 2.4124

 validation loss: 2.4113



training:  14%|█▍        | 276/2000 [6:00:09<39:32:15, 82.56s/it][A


 Epoch: 275 |Training loss: 2.4204



training:  14%|█▍        | 277/2000 [6:01:22<38:12:34, 79.83s/it][A


 Epoch: 276 |Training loss: 2.4113



training:  14%|█▍        | 278/2000 [6:02:35<37:14:06, 77.84s/it][A


 Epoch: 277 |Training loss: 2.4073



training:  14%|█▍        | 279/2000 [6:03:48<36:32:53, 76.45s/it][A


 Epoch: 278 |Training loss: 2.4015



training:  14%|█▍        | 280/2000 [6:05:02<36:05:01, 75.52s/it][A


 Epoch: 279 |Training loss: 2.3902

 validation loss: 2.3925



training:  14%|█▍        | 281/2000 [6:06:41<39:24:34, 82.53s/it][A


 Epoch: 280 |Training loss: 2.3945



training:  14%|█▍        | 282/2000 [6:07:54<38:03:21, 79.74s/it][A


 Epoch: 281 |Training loss: 2.3925



training:  14%|█▍        | 283/2000 [6:09:07<37:04:24, 77.73s/it][A


 Epoch: 282 |Training loss: 2.3778



training:  14%|█▍        | 284/2000 [6:10:20<36:22:42, 76.32s/it][A


 Epoch: 283 |Training loss: 2.3828



training:  14%|█▍        | 285/2000 [6:11:33<35:56:29, 75.45s/it][A


 Epoch: 284 |Training loss: 2.3798

 validation loss: 2.3678



training:  14%|█▍        | 286/2000 [6:13:12<39:13:44, 82.39s/it][A


 Epoch: 285 |Training loss: 2.3791



training:  14%|█▍        | 287/2000 [6:14:25<37:54:31, 79.67s/it][A


 Epoch: 286 |Training loss: 2.3678



training:  14%|█▍        | 288/2000 [6:15:38<36:54:21, 77.61s/it][A


 Epoch: 287 |Training loss: 2.3712



training:  14%|█▍        | 289/2000 [6:16:51<36:17:30, 76.36s/it][A


 Epoch: 288 |Training loss: 2.3600



training:  14%|█▍        | 290/2000 [6:18:05<35:48:52, 75.40s/it][A


 Epoch: 289 |Training loss: 2.3660

 validation loss: 2.3455



training:  15%|█▍        | 291/2000 [6:19:43<39:06:57, 82.40s/it][A


 Epoch: 290 |Training loss: 2.3541



training:  15%|█▍        | 292/2000 [6:20:57<37:48:41, 79.70s/it][A


 Epoch: 291 |Training loss: 2.3455



training:  15%|█▍        | 293/2000 [6:22:10<36:53:23, 77.80s/it][A


 Epoch: 292 |Training loss: 2.3539



training:  15%|█▍        | 294/2000 [6:23:24<36:14:26, 76.48s/it][A


 Epoch: 293 |Training loss: 2.3469



training:  15%|█▍        | 295/2000 [6:24:37<35:44:24, 75.46s/it][A


 Epoch: 294 |Training loss: 2.3457

 validation loss: 2.3421



training:  15%|█▍        | 296/2000 [6:26:15<39:00:01, 82.40s/it][A


 Epoch: 295 |Training loss: 2.3371



training:  15%|█▍        | 297/2000 [6:27:28<37:38:50, 79.58s/it][A


 Epoch: 296 |Training loss: 2.3421



training:  15%|█▍        | 298/2000 [6:28:42<36:44:33, 77.72s/it][A


 Epoch: 297 |Training loss: 2.3348



training:  15%|█▍        | 299/2000 [6:29:55<36:03:54, 76.33s/it][A


 Epoch: 298 |Training loss: 2.3386



training:  15%|█▌        | 300/2000 [6:31:08<35:36:47, 75.42s/it][A


 Epoch: 299 |Training loss: 2.3243

 validation loss: 2.3182



training:  15%|█▌        | 301/2000 [6:32:47<38:52:36, 82.38s/it][A


 Epoch: 300 |Training loss: 2.3299



training:  15%|█▌        | 302/2000 [6:34:00<37:33:06, 79.61s/it][A


 Epoch: 301 |Training loss: 2.3182



training:  15%|█▌        | 303/2000 [6:35:13<36:35:23, 77.62s/it][A


 Epoch: 302 |Training loss: 2.3282



training:  15%|█▌        | 304/2000 [6:36:26<35:57:38, 76.33s/it][A


 Epoch: 303 |Training loss: 2.3154



training:  15%|█▌        | 305/2000 [6:37:39<35:29:43, 75.39s/it][A


 Epoch: 304 |Training loss: 2.3250

 validation loss: 2.3219



training:  15%|█▌        | 306/2000 [6:39:18<38:45:39, 82.37s/it][A


 Epoch: 305 |Training loss: 2.3159



training:  15%|█▌        | 307/2000 [6:40:31<37:29:50, 79.73s/it][A


 Epoch: 306 |Training loss: 2.3219



training:  15%|█▌        | 308/2000 [6:41:45<36:33:53, 77.80s/it][A


 Epoch: 307 |Training loss: 2.3236



training:  15%|█▌        | 309/2000 [6:42:58<35:51:47, 76.35s/it][A


 Epoch: 308 |Training loss: 2.3006



training:  16%|█▌        | 310/2000 [6:44:11<35:25:34, 75.46s/it][A


 Epoch: 309 |Training loss: 2.3096

 validation loss: 2.3042



training:  16%|█▌        | 311/2000 [6:45:50<38:43:04, 82.53s/it][A


 Epoch: 310 |Training loss: 2.2987



training:  16%|█▌        | 312/2000 [6:47:04<37:25:36, 79.82s/it][A


 Epoch: 311 |Training loss: 2.3042



training:  16%|█▌        | 313/2000 [6:48:17<36:30:07, 77.89s/it][A


 Epoch: 312 |Training loss: 2.3002



training:  16%|█▌        | 314/2000 [6:49:30<35:49:39, 76.50s/it][A


 Epoch: 313 |Training loss: 2.2888



training:  16%|█▌        | 315/2000 [6:50:44<35:21:19, 75.54s/it][A


 Epoch: 314 |Training loss: 2.3031

 validation loss: 2.2984



training:  16%|█▌        | 316/2000 [6:52:22<38:34:09, 82.45s/it][A


 Epoch: 315 |Training loss: 2.2809



training:  16%|█▌        | 317/2000 [6:53:36<37:18:43, 79.81s/it][A


 Epoch: 316 |Training loss: 2.2984



training:  16%|█▌        | 318/2000 [6:54:49<36:22:45, 77.86s/it][A


 Epoch: 317 |Training loss: 2.2827



training:  16%|█▌        | 319/2000 [6:56:03<35:43:51, 76.52s/it][A


 Epoch: 318 |Training loss: 2.2772



training:  16%|█▌        | 320/2000 [6:57:16<35:14:58, 75.53s/it][A


 Epoch: 319 |Training loss: 2.2868

 validation loss: 2.2827



training:  16%|█▌        | 321/2000 [6:58:55<38:30:28, 82.57s/it][A


 Epoch: 320 |Training loss: 2.2712



training:  16%|█▌        | 322/2000 [7:00:08<37:12:39, 79.83s/it][A


 Epoch: 321 |Training loss: 2.2827



training:  16%|█▌        | 323/2000 [7:01:21<36:14:43, 77.81s/it][A


 Epoch: 322 |Training loss: 2.2723



training:  16%|█▌        | 324/2000 [7:02:34<35:34:56, 76.43s/it][A


 Epoch: 323 |Training loss: 2.2751



training:  16%|█▋        | 325/2000 [7:03:48<35:07:03, 75.48s/it][A


 Epoch: 324 |Training loss: 2.2627

 validation loss: 2.2617



training:  16%|█▋        | 326/2000 [7:05:27<38:22:37, 82.53s/it][A


 Epoch: 325 |Training loss: 2.2636



training:  16%|█▋        | 327/2000 [7:06:40<37:01:54, 79.69s/it][A


 Epoch: 326 |Training loss: 2.2617



training:  16%|█▋        | 328/2000 [7:07:53<36:07:14, 77.77s/it][A


 Epoch: 327 |Training loss: 2.2588



training:  16%|█▋        | 329/2000 [7:09:06<35:24:48, 76.29s/it][A


 Epoch: 328 |Training loss: 2.2425



training:  16%|█▋        | 330/2000 [7:10:20<35:01:30, 75.50s/it][A


 Epoch: 329 |Training loss: 2.2597

 validation loss: 2.2479



training:  17%|█▋        | 331/2000 [7:11:58<38:15:07, 82.51s/it][A


 Epoch: 330 |Training loss: 2.2419



training:  17%|█▋        | 332/2000 [7:13:12<36:56:44, 79.74s/it][A


 Epoch: 331 |Training loss: 2.2479



training:  17%|█▋        | 333/2000 [7:14:25<36:00:37, 77.77s/it][A


 Epoch: 332 |Training loss: 2.2449



training:  17%|█▋        | 334/2000 [7:15:39<35:25:30, 76.55s/it][A


 Epoch: 333 |Training loss: 2.2340



training:  17%|█▋        | 335/2000 [7:16:52<34:57:36, 75.59s/it][A


 Epoch: 334 |Training loss: 2.2314

 validation loss: 2.2217



training:  17%|█▋        | 336/2000 [7:18:31<38:09:51, 82.57s/it][A


 Epoch: 335 |Training loss: 2.2325



training:  17%|█▋        | 337/2000 [7:19:45<36:55:28, 79.93s/it][A


 Epoch: 336 |Training loss: 2.2217



training:  17%|█▋        | 338/2000 [7:20:58<35:59:27, 77.96s/it][A


 Epoch: 337 |Training loss: 2.2308



training:  17%|█▋        | 339/2000 [7:22:11<35:21:03, 76.62s/it][A


 Epoch: 338 |Training loss: 2.2168



training:  17%|█▋        | 340/2000 [7:23:25<34:52:21, 75.63s/it][A


 Epoch: 339 |Training loss: 2.2248

 validation loss: 2.2101



training:  17%|█▋        | 341/2000 [7:25:04<38:08:08, 82.75s/it][A


 Epoch: 340 |Training loss: 2.2290



training:  17%|█▋        | 342/2000 [7:26:18<36:51:43, 80.04s/it][A


 Epoch: 341 |Training loss: 2.2101



training:  17%|█▋        | 343/2000 [7:27:31<35:57:18, 78.12s/it][A


 Epoch: 342 |Training loss: 2.2138



training:  17%|█▋        | 344/2000 [7:28:45<35:19:02, 76.78s/it][A


 Epoch: 343 |Training loss: 2.2065



training:  17%|█▋        | 345/2000 [7:29:59<34:52:59, 75.88s/it][A


 Epoch: 344 |Training loss: 2.2049

 validation loss: 2.1936



training:  17%|█▋        | 346/2000 [7:31:38<38:01:21, 82.76s/it][A


 Epoch: 345 |Training loss: 2.2031



training:  17%|█▋        | 347/2000 [7:32:52<36:48:07, 80.15s/it][A


 Epoch: 346 |Training loss: 2.1936



training:  17%|█▋        | 348/2000 [7:34:05<35:51:53, 78.16s/it][A


 Epoch: 347 |Training loss: 2.2025



training:  17%|█▋        | 349/2000 [7:35:19<35:16:22, 76.91s/it][A


 Epoch: 348 |Training loss: 2.1867



training:  18%|█▊        | 350/2000 [7:36:33<34:45:39, 75.84s/it][A


 Epoch: 349 |Training loss: 2.2009

 validation loss: 2.2063



training:  18%|█▊        | 351/2000 [7:38:12<37:56:44, 82.84s/it][A


 Epoch: 350 |Training loss: 2.1826



training:  18%|█▊        | 352/2000 [7:39:26<36:41:21, 80.15s/it][A


 Epoch: 351 |Training loss: 2.2063



training:  18%|█▊        | 353/2000 [7:40:39<35:44:38, 78.13s/it][A


 Epoch: 352 |Training loss: 2.1906



training:  18%|█▊        | 354/2000 [7:41:53<35:07:19, 76.82s/it][A


 Epoch: 353 |Training loss: 2.1869



training:  18%|█▊        | 355/2000 [7:43:06<34:38:58, 75.83s/it][A


 Epoch: 354 |Training loss: 2.1912

 validation loss: 2.1706



training:  18%|█▊        | 356/2000 [7:44:46<37:52:19, 82.93s/it][A


 Epoch: 355 |Training loss: 2.1924



training:  18%|█▊        | 357/2000 [7:46:00<36:36:39, 80.22s/it][A


 Epoch: 356 |Training loss: 2.1706



training:  18%|█▊        | 358/2000 [7:47:14<35:43:57, 78.34s/it][A


 Epoch: 357 |Training loss: 2.1943



training:  18%|█▊        | 359/2000 [7:48:27<35:03:25, 76.91s/it][A


 Epoch: 358 |Training loss: 2.1787



training:  18%|█▊        | 360/2000 [7:49:41<34:36:50, 75.98s/it][A


 Epoch: 359 |Training loss: 2.1795

 validation loss: 2.1729



training:  18%|█▊        | 361/2000 [7:51:20<37:47:29, 83.01s/it][A


 Epoch: 360 |Training loss: 2.1699



training:  18%|█▊        | 362/2000 [7:52:34<36:29:25, 80.20s/it][A


 Epoch: 361 |Training loss: 2.1729



training:  18%|█▊        | 363/2000 [7:53:48<35:32:43, 78.17s/it][A


 Epoch: 362 |Training loss: 2.1650



training:  18%|█▊        | 364/2000 [7:55:01<34:55:40, 76.86s/it][A


 Epoch: 363 |Training loss: 2.1612



training:  18%|█▊        | 365/2000 [7:56:15<34:26:52, 75.85s/it][A


 Epoch: 364 |Training loss: 2.1621

 validation loss: 2.1592



training:  18%|█▊        | 366/2000 [7:57:54<37:35:33, 82.82s/it][A


 Epoch: 365 |Training loss: 2.1474



training:  18%|█▊        | 367/2000 [7:59:08<36:19:35, 80.08s/it][A


 Epoch: 366 |Training loss: 2.1592



training:  18%|█▊        | 368/2000 [8:00:21<35:25:57, 78.16s/it][A


 Epoch: 367 |Training loss: 2.1470



training:  18%|█▊        | 369/2000 [8:01:35<34:47:30, 76.79s/it][A


 Epoch: 368 |Training loss: 2.1500



training:  18%|█▊        | 370/2000 [8:02:49<34:21:34, 75.89s/it][A


 Epoch: 369 |Training loss: 2.1430

 validation loss: 2.1360



training:  19%|█▊        | 371/2000 [8:04:28<37:29:46, 82.86s/it][A


 Epoch: 370 |Training loss: 2.1443



training:  19%|█▊        | 372/2000 [8:05:41<36:11:57, 80.05s/it][A


 Epoch: 371 |Training loss: 2.1360



training:  19%|█▊        | 373/2000 [8:06:55<35:17:29, 78.09s/it][A


 Epoch: 372 |Training loss: 2.1436



training:  19%|█▊        | 374/2000 [8:08:08<34:39:27, 76.73s/it][A


 Epoch: 373 |Training loss: 2.1295



training:  19%|█▉        | 375/2000 [8:09:22<34:14:20, 75.85s/it][A


 Epoch: 374 |Training loss: 2.1350

 validation loss: 2.1242



training:  19%|█▉        | 376/2000 [8:11:01<37:22:54, 82.87s/it][A


 Epoch: 375 |Training loss: 2.1302



training:  19%|█▉        | 377/2000 [8:12:15<36:09:25, 80.20s/it][A


 Epoch: 376 |Training loss: 2.1242



training:  19%|█▉        | 378/2000 [8:13:29<35:12:08, 78.13s/it][A


 Epoch: 377 |Training loss: 2.1271



training:  19%|█▉        | 379/2000 [8:14:43<34:35:57, 76.84s/it][A


 Epoch: 378 |Training loss: 2.1135



training:  19%|█▉        | 380/2000 [8:15:56<34:09:37, 75.91s/it][A


 Epoch: 379 |Training loss: 2.1202

 validation loss: 2.1369



training:  19%|█▉        | 381/2000 [8:17:36<37:17:40, 82.93s/it][A


 Epoch: 380 |Training loss: 2.1047



training:  19%|█▉        | 382/2000 [8:18:49<36:02:43, 80.20s/it][A


 Epoch: 381 |Training loss: 2.1369



training:  19%|█▉        | 383/2000 [8:20:03<35:07:54, 78.22s/it][A


 Epoch: 382 |Training loss: 2.1081



training:  19%|█▉        | 384/2000 [8:21:17<34:33:57, 77.00s/it][A


 Epoch: 383 |Training loss: 2.1247



training:  19%|█▉        | 385/2000 [8:22:31<34:06:47, 76.04s/it][A


 Epoch: 384 |Training loss: 2.1090

 validation loss: 2.1224



training:  19%|█▉        | 386/2000 [8:24:11<37:16:33, 83.14s/it][A


 Epoch: 385 |Training loss: 2.1271



training:  19%|█▉        | 387/2000 [8:25:24<35:56:06, 80.20s/it][A


 Epoch: 386 |Training loss: 2.1224



training:  19%|█▉        | 388/2000 [8:26:38<35:04:56, 78.35s/it][A


 Epoch: 387 |Training loss: 2.1100



training:  19%|█▉        | 389/2000 [8:27:52<34:25:36, 76.93s/it][A


 Epoch: 388 |Training loss: 2.1116



training:  20%|█▉        | 390/2000 [8:29:05<33:57:52, 75.95s/it][A


 Epoch: 389 |Training loss: 2.1097

 validation loss: 2.1042



training:  20%|█▉        | 391/2000 [8:30:45<37:04:32, 82.95s/it][A


 Epoch: 390 |Training loss: 2.0961



training:  20%|█▉        | 392/2000 [8:31:58<35:48:37, 80.17s/it][A


 Epoch: 391 |Training loss: 2.1042



training:  20%|█▉        | 393/2000 [8:33:12<34:52:18, 78.12s/it][A


 Epoch: 392 |Training loss: 2.1023



training:  20%|█▉        | 394/2000 [8:34:25<34:16:15, 76.82s/it][A


 Epoch: 393 |Training loss: 2.0836



training:  20%|█▉        | 395/2000 [8:35:39<33:48:45, 75.84s/it][A


 Epoch: 394 |Training loss: 2.1006

 validation loss: 2.0937



training:  20%|█▉        | 396/2000 [8:37:18<36:54:21, 82.83s/it][A


 Epoch: 395 |Training loss: 2.0862



training:  20%|█▉        | 397/2000 [8:38:32<35:39:12, 80.07s/it][A


 Epoch: 396 |Training loss: 2.0937



training:  20%|█▉        | 398/2000 [8:39:45<34:47:12, 78.17s/it][A


 Epoch: 397 |Training loss: 2.0891



training:  20%|█▉        | 399/2000 [8:40:59<34:08:45, 76.78s/it][A


 Epoch: 398 |Training loss: 2.0835



training:  20%|██        | 400/2000 [8:42:13<33:43:17, 75.87s/it][A


 Epoch: 399 |Training loss: 2.0829

 validation loss: 2.0697



training:  20%|██        | 401/2000 [8:43:52<36:48:36, 82.87s/it][A


 Epoch: 400 |Training loss: 2.0739



training:  20%|██        | 402/2000 [8:45:06<35:33:44, 80.12s/it][A


 Epoch: 401 |Training loss: 2.0697



training:  20%|██        | 403/2000 [8:46:19<34:39:18, 78.12s/it][A


 Epoch: 402 |Training loss: 2.0744



training:  20%|██        | 404/2000 [8:47:33<34:03:47, 76.83s/it][A


 Epoch: 403 |Training loss: 2.0661



training:  20%|██        | 405/2000 [8:48:47<33:38:05, 75.92s/it][A


 Epoch: 404 |Training loss: 2.0684

 validation loss: 2.0521



training:  20%|██        | 406/2000 [8:50:26<36:41:43, 82.88s/it][A


 Epoch: 405 |Training loss: 2.0606



training:  20%|██        | 407/2000 [8:51:40<35:29:18, 80.20s/it][A


 Epoch: 406 |Training loss: 2.0521



training:  20%|██        | 408/2000 [8:52:53<34:36:01, 78.24s/it][A


 Epoch: 407 |Training loss: 2.0677



training:  20%|██        | 409/2000 [8:54:07<33:58:38, 76.88s/it][A


 Epoch: 408 |Training loss: 2.0562



training:  20%|██        | 410/2000 [8:55:21<33:32:01, 75.93s/it][A


 Epoch: 409 |Training loss: 2.0482

 validation loss: 2.0568



training:  21%|██        | 411/2000 [8:57:00<36:34:38, 82.87s/it][A


 Epoch: 410 |Training loss: 2.0520



training:  21%|██        | 412/2000 [8:58:14<35:24:01, 80.25s/it][A


 Epoch: 411 |Training loss: 2.0568



training:  21%|██        | 413/2000 [8:59:28<34:29:31, 78.24s/it][A


 Epoch: 412 |Training loss: 2.0399



training:  21%|██        | 414/2000 [9:00:41<33:51:49, 76.87s/it][A


 Epoch: 413 |Training loss: 2.0459



training:  21%|██        | 415/2000 [9:01:55<33:24:59, 75.90s/it][A


 Epoch: 414 |Training loss: 2.0375

 validation loss: 2.0246



training:  21%|██        | 416/2000 [9:03:35<36:31:47, 83.02s/it][A


 Epoch: 415 |Training loss: 2.0505



training:  21%|██        | 417/2000 [9:04:48<35:17:10, 80.25s/it][A


 Epoch: 416 |Training loss: 2.0246



training:  21%|██        | 418/2000 [9:06:02<34:26:21, 78.37s/it][A


 Epoch: 417 |Training loss: 2.0686



training:  21%|██        | 419/2000 [9:07:16<33:46:00, 76.89s/it][A


 Epoch: 418 |Training loss: 2.0392



training:  21%|██        | 420/2000 [9:08:30<33:20:20, 75.96s/it][A


 Epoch: 419 |Training loss: 2.0738


**Music generation**

In [None]:
# In case we want to use previously trained weights
weights = "model_best.pth.tar"
checkpoint = torch.load(output_dir+weights)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']


In [None]:
# Generate network input again
network_input = []
network_output = []
for i in range(0, len(notes) - sequence_length, 1):
  network_input.append([note_to_int[char] for char in notes[i:i + sequence_length]])
n_patterns = len(network_input)
network_input = np.reshape(network_input, (n_patterns, sequence_length))


The workflow now is:


1.   Pick a **seed sequence** randomly from your list of inputs (*pattern* variable)
2.   Pass it as input for your model to generate a new element (note or chord)
3.   Add the new element to your final song and to your *pattern* list
4.   Remove the first item from *pattern*
5.   Go to step 2


In [None]:
""" Generate notes from the neural network based on a sequence of notes """
# pick a random sequence from the input as a starting point for the prediction
start = np.random.randint(0, len(network_input)-1)
int_to_note = dict((number, note) for number, note in enumerate(pitchnames))
pattern = torch.from_numpy(network_input[start]).cuda()

prediction_output = model.generate(pattern, 500)


In [None]:
result_sample=[]

for i in range(500):
  print(i)
  result = int_to_note[prediction_output[i].item()]
  print('\r', 'Predicted ', i, " ",result, end='')
  result_sample.append(result)

prediction_output=result_sample

0
 Predicted  0   61
 Predicted  1   4.62
 Predicted  2   6.113
 Predicted  3   64
 Predicted  4   6.115
 Predicted  5   A46
 Predicted  6   4.67
 Predicted  7   F48
 Predicted  8   69
 Predicted  9   610
 Predicted  10   5.7.9.011
 Predicted  11   2.3.7.1012
 Predicted  12   D513
 Predicted  13   C514
 Predicted  14   5.7.9.015
 Predicted  15   C516
 Predicted  16   4.617
 Predicted  17   B-118
 Predicted  18   10.2.519
 Predicted  19   C520
 Predicted  20   6.1121
 Predicted  21   622
 Predicted  22   F223
 Predicted  23   6.1124
 Predicted  24   4.625
 Predicted  25   B-226
 Predicted  26   B-127
 Predicted  27   A428
 Predicted  28   629
 Predicted  29   C530
 Predicted  30   E-331
 Predicted  31   F232
 Predicted  32   4.633
 Predicted  33   534
 Predicted  34   5.1035
 Predicted  35   4.636
 Predicted  36   637
 Predicted  37   4.638
 Predicted  38   4.639
 Predicted  39   F240
 Predicted  40   4.641
 Predicted  41   B-242
 Predicted  42

The last step is creating a MIDI file from the predictions.

**music21** will help us again for this task. We should create a **Stream** and add to it the predicted notes and chords.

We are adding an offset of 0.5 between elements.

In [None]:
offset = 0
output_notes = []
# create note and chord objects based on the values generated by the model
for pattern in prediction_output:
    # pattern is a chord
    if ('.' in pattern) or pattern.isdigit():
        notes_in_chord = pattern.split('.')
        notes = []
        for current_note in notes_in_chord:
            new_note = note.Note(int(current_note))
            new_note.storedInstrument = instrument.Piano()
            notes.append(new_note)
        new_chord = chord.Chord(notes)
        new_chord.offset = offset
        output_notes.append(new_chord)
    # pattern is a note
    else:
        new_note = note.Note(pattern)
        new_note.offset = offset
        new_note.storedInstrument = instrument.Piano()
        output_notes.append(new_note)

    # increase offset each iteration so that notes do not stack
    offset += 0.5

midi_stream = stream.Stream(output_notes)
midi_stream.write('midi', fp='test_output.mid')

'test_output.mid'