<a href="https://colab.research.google.com/github/GiovanniSorice/Deep_Music_Generator/blob/main/notebooks/Music_Generation_Transformer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Transformer Music Generator 



In this notebook, we use a Transformer (specifically a Compressive Transformer) to generate music.


**This notebook was inspired by [Music_Generation_LSTM](https://colab.research.google.com/drive/19TQqekOlnOSW36VCL8CPVEQKBBukmaEQ#scrollTo=DDOBVWULXfpz), and part of the code comes from it.**




**Load dependencies**

In [1]:
pip install compressive_transformer_pytorch

Collecting compressive_transformer_pytorch
  Downloading https://files.pythonhosted.org/packages/30/39/b8caf2671abcb8615977c08766aa9f450addd6949f57c7dda87224e844b5/compressive_transformer_pytorch-0.3.20-py3-none-any.whl
Collecting mogrifier
  Downloading https://files.pythonhosted.org/packages/77/01/62a55d0f8048e788fce435f2ade6478f443e4e53ed9b89b55ba0fc42c198/mogrifier-0.0.3-py3-none-any.whl
Installing collected packages: mogrifier, compressive-transformer-pytorch
Successfully installed compressive-transformer-pytorch-0.3.20 mogrifier-0.0.3


In [2]:
import torch
import tqdm
import numpy as np
import pandas as pd
import tensorflow as tf
import os
from compressive_transformer_pytorch import CompressiveTransformer
from compressive_transformer_pytorch import AutoregressiveWrapper
from torchsummary import summary
from torch.utils.data import DataLoader, Dataset
from tensorflow.keras import utils
from sklearn.metrics import roc_auc_score 
import matplotlib.pyplot as plt
import glob
import pickle
from music21 import converter, instrument, stream, note, chord
import math
import shutil

In [3]:
# Set to False if you are not running
# this notebook in Google Colaboratory
run_on_colab = True

**Set hyperparameters**

In [30]:
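# output directory names:
output_dir = '/content/drive/My Drive/ISPR_project/Transformer/'
current_path = '/content/drive/My Drive/ISPR_project/'
# training:
epochs = 2000
batch_size = 64
learning_rate = 1e-2
# vector-space embedding (note: create_network() below uses sequence_length,
# not n_dim, as the model dimension):
n_dim = 64
sequence_length = 32

VALIDATE_EVERY = 5    # compute the validation loss every VALIDATE_EVERY epochs
GENERATE_EVERY = 500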


**Save model function**

In [6]:
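def save_checkpoint(state, is_best, filename='checkpoint.pth.tar'):
    # Save the training state under output_dir; if it is the best one so far,
    # also keep a copy named model_best.pth.tar
    torch.save(state, os.path.join(output_dir, filename))
    if is_best:
        shutil.copyfile(os.path.join(output_dir, filename), os.path.join(output_dir, 'model_best.pth.tar'))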

**Google drive configuration (only Colab)**

In [7]:
if run_on_colab:
  from google.colab import drive
  # This will prompt for authorization.
  drive.mount('/content/drive')

Mounted at /content/drive


**Load data**

I obtained the original **MIDI files** from [The Lakh MIDI Dataset v0.1](https://colinraffel.com/projects/lmd/).

## Processing data

Let's parse the files and load them into **music21**.

In [8]:
file = current_path+"midi_songs/small_dataset/Metal/Metallica/Am I Evil?.mid"
midi = converter.parse(file)
notes_to_parse = midi.flat.notes
for element in notes_to_parse[:10]:
  print(element, element.offset)

<music21.chord.Chord E2 E3 B3 E4> 0.0
<music21.chord.Chord E2 E3 B3 E4> 0.0
<music21.note.Note E> 0.0
<music21.chord.Chord C2 C#3> 0.0
<music21.note.Note G#> 2.0
<music21.chord.Chord D3 A3 D4> 3.0
<music21.chord.Chord D3 A3 D4> 3.0
<music21.note.Note D> 3.0
<music21.chord.Chord C#3 C2> 3.0
<music21.chord.Chord B3 E3 E4> 3.5


I will process all the MIDI files, extracting data from each note or chord.

- For a **note**, I store a string with the pitch name and the octave (e.g. `E4`).

- For a **chord** (remember that a chord is a set of notes played at the same time), I store a different kind of string: numbers separated by dots. Each number is the pitch class (0 to 11, where C = 0) of one of the chord's notes, listed in the chord's normal order, so octave information is lost.

Note that **I am not yet considering the time offset of each element**. In this first version offsets are ignored, so all notes and chords are treated as having the same duration; they may be taken into account in a future version.

All the elements of all the compositions are collected into one big list.
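To make the token scheme concrete without reading any MIDI file, here is a small sketch that works directly on MIDI note numbers. The helpers `encode_note` and `encode_chord` are hypothetical (they are not music21 functions), they assume sharp spellings for note names, and `encode_chord` simply sorts the pitch classes. That is a simplification of music21's `Chord.normalOrder`, which picks the most compact rotation of the set (for the E/B chord below it should give `11.4` rather than `4.11`).

```python
from typing import Iterable

# Sharp spellings for the 12 pitch classes (an assumption: music21 may spell some notes differently)
PITCH_CLASS_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']


def encode_note(midi_number: int) -> str:
    """Note token: pitch name plus octave, with MIDI 60 = middle C = 'C4'."""
    return f"{PITCH_CLASS_NAMES[midi_number % 12]}{midi_number // 12 - 1}"


def encode_chord(midi_numbers: Iterable[int]) -> str:
    """Chord token: dot-separated pitch classes (0 = C, ..., 11 = B), sorted.

    Simplification: music21's Chord.normalOrder may rotate this list.
    """
    pitch_classes = sorted({n % 12 for n in midi_numbers})
    return '.'.join(str(pc) for pc in pitch_classes)


print(encode_note(64))                 # E4
print(encode_chord([40, 52, 59, 64]))  # E2 E3 B3 E4 -> 4.11 (octaves and doublings are lost)
print(encode_chord([60, 64, 67]))      # C major triad -> 0.4.7
```

Either way, a chord token keeps only pitch classes, so the same chord voiced in different octaves maps to a single token, while single notes keep their octave.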

In [9]:
notes = []
for i,file in enumerate(glob.glob(current_path+"midi_songs/small_dataset/*/*/*.mid")):
  midi = converter.parse(file)
  print('Parsing file ', i, " ",file)
  notes_to_parse = None
  try: # file has instrument parts
    s2 = instrument.partitionByInstrument(midi)
    notes_to_parse = s2.recurse() 
  except Exception: # file has notes in a flat structure
    notes_to_parse = midi.flat.notes
  for element in notes_to_parse:
    if isinstance(element, note.Note):
      notes.append(str(element.pitch))
    elif isinstance(element, chord.Chord):
      notes.append('.'.join(str(n) for n in element.normalOrder))
with open('notes', 'wb') as filepath:
  pickle.dump(notes, filepath)

Parsing file  0   /content/drive/My Drive/ISPR_project/midi_songs/small_dataset/Pop_rock/883/Nessun rimpianto.1.mid
Parsing file  1   /content/drive/My Drive/ISPR_project/midi_songs/small_dataset/Pop_rock/883/Grazie mille.1.mid
Parsing file  2   /content/drive/My Drive/ISPR_project/midi_songs/small_dataset/Pop_rock/883/Andra tutto bene ('58).1.mid
Parsing file  3   /content/drive/My Drive/ISPR_project/midi_songs/small_dataset/Pop_rock/883/Andra tutto bene ('58).mid
Parsing file  4   /content/drive/My Drive/ISPR_project/midi_songs/small_dataset/Pop_rock/883/Hanno ucciso l'uomo ragno.1.mid
Parsing file  5   /content/drive/My Drive/ISPR_project/midi_songs/small_dataset/Pop_rock/883/Hanno ucciso l'uomo ragno.mid
Parsing file  6   /content/drive/My Drive/ISPR_project/midi_songs/small_dataset/Pop_rock/test/I'll Be Over You.mid
Parsing file  7   /content/drive/My Drive/ISPR_project/midi_songs/small_dataset/Pop_rock/test/Non ti passa piu.mid
Parsing file  8   /content/drive/My Drive/ISPR_proje

In [10]:
notes_validation = []
for i,file in enumerate(glob.glob(current_path+"midi_songs/test/*.mid")):
  midi = converter.parse(file)
  print('Parsing file ', i, " ", file)
  notes_to_parse = None
  try: # file has instrument parts
    s2 = instrument.partitionByInstrument(midi)
    notes_to_parse = s2.recurse() 
  except Exception: # file has notes in a flat structure
    notes_to_parse = midi.flat.notes
  for element in notes_to_parse:
    if isinstance(element, note.Note):
      notes_validation.append(str(element.pitch))
    elif isinstance(element, chord.Chord):
      notes_validation.append('.'.join(str(n) for n in element.normalOrder))
# save to a separate file, so the training notes pickled in 'notes' are not overwritten
with open('notes_validation', 'wb') as filepath:
  pickle.dump(notes_validation, filepath)

Parsing file  0   /content/drive/My Drive/ISPR_project/midi_songs/test/I Disappear.mid
Parsing file  1   /content/drive/My Drive/ISPR_project/midi_songs/test/Hit the Lights.mid
Parsing file  2   /content/drive/My Drive/ISPR_project/midi_songs/test/Fight Fire With Fire.mid
Parsing file  3   /content/drive/My Drive/ISPR_project/midi_songs/test/Smile.mid
Parsing file  4   /content/drive/My Drive/ISPR_project/midi_songs/test/Another One Bites The Dust.2.mid
Parsing file  5   /content/drive/My Drive/ISPR_project/midi_songs/test/Bicycle Race.1.mid
Parsing file  6   /content/drive/My Drive/ISPR_project/midi_songs/test/Se tornerai.1.mid
Parsing file  7   /content/drive/My Drive/ISPR_project/midi_songs/test/Non ti passa piu.mid
Parsing file  8   /content/drive/My Drive/ISPR_project/midi_songs/test/I'll Be Over You.mid


Next, I count the distinct notes and chords in the dataset, because this is the **number of possible output classes** of our model (its vocabulary size).

In [11]:
# Count different possible outputs
n_vocab = (len(set(notes)))
n_vocab

476

In [12]:
# Count different possible outputs (validation)
print(len(set(notes_validation)))

287


**Preprocess data**

Now there is some **data processing** to do:

- I will map each pitch or chord to an integer
- I will create input sequences of *sequence_length* consecutive elements; no separate output array is needed, because the target for each position is simply the next element of the same sequence

I can try different values of **sequence_length** to obtain different results. In this version, I use a sequence_length of 32.

The network predicts the next note (or chord) based on the notes (or chords) that precede it in the sequence.
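A minimal, pure-Python sketch of the two preprocessing steps on a toy token list, plus the next-token pairing used for training. The function names are hypothetical; the sliding window uses the same range as the notebook, `range(0, len(notes) - sequence_length)`, which yields `len(notes) - sequence_length` windows (so the very last possible window is skipped).

```python
from typing import Dict, List, Sequence, Tuple


def build_vocabulary(tokens: Sequence[str]) -> Dict[str, int]:
    """Give every distinct note/chord string an integer id, in sorted order."""
    return {token: idx for idx, token in enumerate(sorted(set(tokens)))}


def make_windows(tokens: Sequence[str], vocab: Dict[str, int], seq_len: int) -> List[List[int]]:
    """Overlapping windows of seq_len consecutive ids with stride 1,
    using the same range as the notebook: len(tokens) - seq_len windows."""
    ids = [vocab[t] for t in tokens]
    return [ids[i:i + seq_len] for i in range(len(ids) - seq_len)]


def next_token_pair(window: List[int]) -> Tuple[List[int], List[int]]:
    """Input and target for next-token prediction: the target is the input shifted by one."""
    return window[:-1], window[1:]


tokens = ['E4', '4.11', 'G#4', 'E4', '2.9', 'D4']
vocab = build_vocabulary(tokens)
windows = make_windows(tokens, vocab, seq_len=4)
print(vocab)                        # {'2.9': 0, '4.11': 1, 'D4': 2, 'E4': 3, 'G#4': 4}
print(windows)                      # [[3, 1, 4, 3], [1, 4, 3, 0]]
print(next_token_pair(windows[0]))  # ([3, 1, 4], [1, 4, 3])
```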


In [13]:
# get all pitch names
pitchnames = sorted(set(notes))
# create a dictionary to map pitches to integers
note_to_int = {pitch: number for number, pitch in enumerate(pitchnames)}
network_input = []
network_output = []
# create input sequences (the next-token targets are the same sequences shifted by one step)
for i in range(0, len(notes) - sequence_length, 1):
  # Map pitches of sequence_in to integers
  network_input.append([note_to_int[char] for char in notes[i:i + sequence_length]])
n_patterns = len(network_input)
# reshape the input into an integer array of shape (n_patterns, sequence_length)
network_input = np.reshape(network_input, (n_patterns, sequence_length))
# no normalization: the Transformer embeds the integer token ids directly
#network_input = network_input / float(n_vocab)


In [14]:
# map validation pitches with the training vocabulary, so each token gets the same id
# in both sets (a KeyError here means a pitch/chord that never appears in training)
note_to_int_validation = note_to_int
network_input_validation = []
network_output_validation = []
# create input sequences (the next-token targets are the same sequences shifted by one step)
for i in range(0, len(notes_validation) - sequence_length, 1):
  # Map pitches of sequence_in to integers
  network_input_validation.append([note_to_int_validation[char] for char in notes_validation[i:i + sequence_length]])
n_patterns = len(network_input_validation)
# reshape the input into an integer array of shape (n_patterns, sequence_length)
network_input_validation = np.reshape(network_input_validation, (n_patterns, sequence_length))


Let's check the shape of the new `network_input`:

In [15]:
network_input.shape

(135132, 32)

**Design neural network architecture** 

In [16]:
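def create_network(sequence_length, n_vocab):
    """ create the structure of the neural network """
    model = CompressiveTransformer(
        num_tokens = n_vocab,          # vocabulary size (distinct notes and chords)
        dim = sequence_length,         # model/embedding dimension, reused from sequence_length (32)
        depth = 6,                     # number of layers
        seq_len = sequence_length,     # length of each input segment
        mem_len = sequence_length,     # length of the uncompressed memory
        cmem_len = 256,                # length of the compressed memory
        cmem_ratio = 4,                # 4 old memories are compressed into 1
        memory_layers = [5,6]          # layers (1-indexed) that use memories: the last two
    )

    # the wrapper computes the next-token (autoregressive) loss
    model = AutoregressiveWrapper(model)
    model.cuda()
    return model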

In [17]:
model = create_network(sequence_length,n_vocab)

print(model)


AutoregressiveWrapper(
  (net): CompressiveTransformer(
    (token_emb): Embedding(476, 32)
    (to_model_dim): Identity()
    (to_logits): Sequential(
      (0): Identity()
      (1): Linear(in_features=32, out_features=476, bias=True)
    )
    (attn_layers): ModuleList(
      (0): GRUGating(
        (fn): PreNorm(
          (norm): LayerNorm((32,), eps=1e-05, elementwise_affine=True)
          (fn): SelfAttention(
            (compress_mem_fn): ConvCompress(
              (conv): Conv1d(32, 32, kernel_size=(4,), stride=(4,))
            )
            (to_q): Linear(in_features=32, out_features=32, bias=False)
            (to_kv): Linear(in_features=32, out_features=64, bias=False)
            (to_out): Linear(in_features=32, out_features=32, bias=True)
            (attn_dropout): Dropout(p=0.0, inplace=False)
            (dropout): Dropout(p=0.0, inplace=False)
            (reconstruction_attn_dropout): Dropout(p=0.0, inplace=False)
          )
        )
        (gru): GRUCell(32, 3
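The printed model shows the compression step (`ConvCompress`, a `Conv1d` with kernel size and stride 4, matching `cmem_ratio = 4`). As a rough sketch of the memory bookkeeping described in the Compressive Transformer paper (Rae et al.), assuming one layer and a single sequence: new hidden states are appended to a FIFO memory of length `mem_len`; the states pushed out of it are compressed `cmem_ratio` to 1 and appended to a compressed memory capped at `cmem_len`. The function `update_memories` is hypothetical and is not the library's implementation: it uses mean pooling in place of the learned convolution (mean pooling is one of the compression functions the paper compares).

```python
import numpy as np


def update_memories(mem, cmem, new_hidden, mem_len, cmem_len, cmem_ratio):
    """Sketch of one Compressive Transformer memory update for a single layer.

    All arrays have shape (time, dim). Mean pooling stands in for the
    learned compression (the library prints a Conv1d with stride cmem_ratio).
    """
    # 1. Append the newest hidden states to the uncompressed FIFO memory
    mem = np.concatenate([mem, new_hidden], axis=0)
    # 2. Evict the oldest states so that at most mem_len remain
    n_evict = max(0, mem.shape[0] - mem_len)
    evicted, mem = mem[:n_evict], mem[n_evict:]
    # 3. Compress every group of cmem_ratio evicted states into one state;
    #    leftover states that do not fill a whole group are dropped in this sketch
    n_groups = evicted.shape[0] // cmem_ratio
    if n_groups > 0:
        groups = evicted[:n_groups * cmem_ratio].reshape(n_groups, cmem_ratio, -1)
        cmem = np.concatenate([cmem, groups.mean(axis=1)], axis=0)
    # 4. The compressed memory is a FIFO too, capped at cmem_len states
    return mem, cmem[max(0, cmem.shape[0] - cmem_len):]


dim = 2
mem, cmem = np.zeros((0, dim)), np.zeros((0, dim))
for step in range(3):
    t = np.arange(4 * step, 4 * step + 4, dtype=float)
    new_hidden = np.stack([t, t], axis=1)  # 4 new states, each filled with its time index
    mem, cmem = update_memories(mem, cmem, new_hidden, mem_len=4, cmem_len=3, cmem_ratio=2)

print(mem[:, 0])   # the 4 most recent states: times 8 to 11
print(cmem[:, 0])  # compressed pairs (2,3), (4,5), (6,7); the pair (0,1) fell out
```

In this notebook only `memory_layers = [5,6]` keep such memories, which gives the model access to context older than the current 32-token segment.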

In [18]:
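def cycle(loader):
    # iterate over the loader forever, restarting it after each full pass
    while True:
        for data in loader:
            yield data


data_train = torch.from_numpy(network_input).cuda()
train_loader = torch.utils.data.DataLoader(data_train, batch_size=32)  # not used by the training loop below
# the whole training set is returned as a single batch; the model processes it in chunks of batch_size
cycle_train_loader = cycle(DataLoader(data_train, batch_size = data_train.shape[0]))
num_batches = math.ceil(data_train.shape[0] / batch_size)  # total number of chunks per epoch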

In [19]:
# Validation
data_validation = torch.from_numpy(network_input_validation).cuda()
validation_loader = torch.utils.data.DataLoader(data_validation, batch_size=32)  # not used by the training loop below
# the whole validation set is returned as a single batch, like the training set
cycle_validation_loader = cycle(DataLoader(data_validation, batch_size = data_validation.shape[0]))
num_batches_val = math.ceil(data_validation.shape[0] / batch_size)  # total number of chunks
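The loaders above hand the model the entire dataset as a single batch (`batch_size = data_train.shape[0]`), and the training loop passes `max_batch_size = batch_size`, so the wrapper processes that batch in chunks of 64 windows and flags the final one with `is_last`; only then does the loop clip the accumulated gradients and take an optimizer step. In other words, this is gradient accumulation with one update per epoch. The hypothetical helper below only reproduces the chunk arithmetic, assuming contiguous, unshuffled chunks in the style of `Tensor.split`, for the 135132 training windows:

```python
import math
from typing import Iterator, Tuple


def chunk_rows(n_rows: int, max_batch_size: int) -> Iterator[Tuple[range, bool]]:
    """Yield (row indices, is_last) for consecutive, unshuffled chunks of a batch."""
    n_chunks = math.ceil(n_rows / max_batch_size)
    for k in range(n_chunks):
        rows = range(k * max_batch_size, min((k + 1) * max_batch_size, n_rows))
        yield rows, k == n_chunks - 1


chunks = list(chunk_rows(135132, 64))
print(len(chunks))         # 2112, the value of num_batches
print(len(chunks[-1][0]))  # 28: the last chunk is only partially filled
print([is_last for _, is_last in chunks].count(True))  # 1: a single optimizer step per epoch
```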

In [31]:
# optimizer

optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

To resume training from where we left off, we can load previously saved weights into the model.

This is very useful in Google Colaboratory, which usually shuts down the virtual machine running the notebook after a certain amount of time. If this happens, look for the most recent checkpoint file in your Drive folder and load it to continue training.
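Checkpoints are written as `Tran_32_Checkpoint<epoch>_<loss>.pth.tar` by the training loop below, so the most recent one can be found by parsing the epoch number out of each file name. A small sketch with a hypothetical helper, `find_latest_checkpoint`; sorting the names as plain strings would not work, because `'Tran_32_Checkpoint95_...'` sorts after `'Tran_32_Checkpoint100_...'`.

```python
import os
import re
from typing import Optional

# Matches names such as 'Tran_32_Checkpoint95_3.7096.pth.tar'
CHECKPOINT_RE = re.compile(r'^Tran_32_Checkpoint(\d+)_[\d.]+\.pth\.tar$')


def find_latest_checkpoint(directory: str) -> Optional[str]:
    """Return the path of the checkpoint with the highest epoch number, or None if there is none."""
    best_epoch, best_name = -1, None
    for name in os.listdir(directory):
        match = CHECKPOINT_RE.match(name)
        if match and int(match.group(1)) > best_epoch:
            best_epoch, best_name = int(match.group(1)), name
    return os.path.join(directory, best_name) if best_name is not None else None
```

The returned path could then be passed to `torch.load`, e.g. `torch.load(find_latest_checkpoint(output_dir))`.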


In [26]:
# In case we want to use previously trained weights
weights = "model_32_best.pth.tar"
checkpoint = torch.load(os.path.join(output_dir, weights))
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
# note: the training loop below restarts its epoch counter from 0
epoch = checkpoint['epoch']
loss = checkpoint['loss']


In [32]:
# training

best_loss_value = float('inf')  # best average training loss so far, kept across epochs

for i in tqdm.tqdm(range(epochs), mininterval=20., desc='training'):
    model.train()
    tot_loss = 0.0
    avg_loss_val = 0.0
    # The whole training set arrives as one batch; the wrapper processes it in chunks of
    # max_batch_size and yields one loss per chunk, with is_last set on the final chunk
    for mlm_loss, aux_loss, is_last in model(next(cycle_train_loader), max_batch_size = batch_size, return_loss = True):
        loss = mlm_loss + aux_loss
        loss.backward()
        # accumulate a Python float so the autograd graph of each chunk is not kept alive
        tot_loss += loss.item()

        if is_last:
            # gradients of all chunks have been accumulated: clip and take one optimizer step
            torch.nn.utils.clip_grad_norm_(model.parameters(), 0.5)
            optimizer.step()
            optimizer.zero_grad()

    if i % VALIDATE_EVERY == 0 or i == epochs - 1:
        model.eval()
        with torch.no_grad():
            for loss_val, aux_loss_val, is_last_val in model(next(cycle_validation_loader), max_batch_size = batch_size, return_loss = True):
                avg_loss_val += loss_val.item() / num_batches_val
                if is_last_val:
                    print(f'validation loss: {avg_loss_val:.4f}')

    avg_loss = tot_loss / num_batches

    # every 5 epochs, save a checkpoint; also copy it to model_best.pth.tar
    # if its training loss is the lowest among the saved epochs so far
    if i % 5 == 0 or i == epochs - 1:
        is_best = avg_loss < best_loss_value
        if is_best:
            best_loss_value = avg_loss

        save_checkpoint({
            'epoch': i,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'loss': avg_loss,
        }, is_best, f'Tran_32_Checkpoint{i}_{avg_loss:.4f}.pth.tar')
    print(f'\nEpoch: {i} | Training loss: {avg_loss:.4f}')
print('Training complete.')







training:   0%|          | 0/2000 [00:00<?, ?it/s]
validation loss: 6.8448
training:   0%|          | 1/2000 [00:54<30:19:57, 54.63s/it]
/n Epoch: 0 |Training loss: 3.9707
training:   0%|          | 2/2000 [01:44<29:29:57, 53.15s/it]
/n Epoch: 1 |Training loss: 6.9679
training:   0%|          | 3/2000 [02:33<28:51:02, 52.01s/it]
/n Epoch: 2 |Training loss: 5.0047
training:   0%|          | 4/2000 [03:22<28:14:10, 50.93s/it]
/n Epoch: 3 |Training loss: 4.8742
training:   0%|          | 5/2000 [04:09<27:42:55, 50.01s/it]
/n Epoch: 4 |Training loss: 4.7228
validation loss: 4.7899
training:   0%|          | 6/2000 [05:01<27:55:19, 50.41s/it]
/n Epoch: 5 |Training loss: 4.6481
training:   0%|          | 7/2000 [05:49<27:32:44, 49.76s/it]
/n Epoch: 6 |Training loss: 4.6078
training:   0%|          | 8/2000 [06:37<27:10:50, 49.12s/it]
/n Epoch: 7 |Training loss: 4.5901
training:   0%|          | 9/2000 [07:24<26:56:40, 48.72s/it]
/n Epoch: 8 |Training loss: 4.5773
training:   0%|          | 10/2000 [08:12<26:48:20, 48.49s/it]
/n Epoch: 9 |Training loss: 4.5798
validation loss: 4.7807
training:   1%|          | 11/2000 [09:04<27:16:14, 49.36s/it]
/n Epoch: 10 |Training loss: 4.5794
training:   1%|          | 12/2000 [09:52<27:00:07, 48.90s/it]
/n Epoch: 11 |Training loss: 4.5764
training:   1%|          | 13/2000 [10:40<26:50:23, 48.63s/it]
/n Epoch: 12 |Training loss: 4.5707
training:   1%|          | 14/2000 [11:27<26:37:40, 48.27s/it]
/n Epoch: 13 |Training loss: 4.5667
training:   1%|          | 15/2000 [12:15<26:30:54, 48.09s/it]
/n Epoch: 14 |Training loss: 4.5702
validation loss: 4.7716
training:   1%|          | 16/2000 [13:06<27:01:36, 49.04s/it]
/n Epoch: 15 |Training loss: 4.5669
training:   1%|          | 17/2000 [13:53<26:43:08, 48.51s/it]
/n Epoch: 16 |Training loss: 4.5713
training:   1%|          | 18/2000 [14:40<26:28:46, 48.10s/it]
/n Epoch: 17 |Training loss: 4.5663
training:   1%|          | 19/2000 [15:27<26:17:26, 47.78s/it]
/n Epoch: 18 |Training loss: 4.5621
training:   1%|          | 20/2000 [16:15<26:12:10, 47.64s/it]
/n Epoch: 19 |Training loss: 4.5537
validation loss: 4.7212
training:   1%|          | 21/2000 [17:05<26:42:07, 48.57s/it]
/n Epoch: 20 |Training loss: 4.5564
training:   1%|          | 22/2000 [17:53<26:32:57, 48.32s/it]
/n Epoch: 21 |Training loss: 4.5478
training:   1%|          | 23/2000 [18:42<26:35:18, 48.42s/it]
/n Epoch: 22 |Training loss: 4.5423
training:   1%|          | 24/2000 [19:33<26:57:21, 49.11s/it]
/n Epoch: 23 |Training loss: 4.5257
training:   1%|▏         | 25/2000 [20:23<27:06:01, 49.40s/it]
/n Epoch: 24 |Training loss: 4.5178
validation loss: 4.7008
training:   1%|▏         | 26/2000 [21:16<27:48:49, 50.72s/it]
/n Epoch: 25 |Training loss: 4.5041
training:   1%|▏         | 27/2000 [22:06<27:40:33, 50.50s/it]
/n Epoch: 26 |Training loss: 4.4939
training:   1%|▏         | 28/2000 [22:58<27:49:53, 50.81s/it]
/n Epoch: 27 |Training loss: 4.4761
training:   1%|▏         | 29/2000 [23:50<28:02:19, 51.21s/it]
/n Epoch: 28 |Training loss: 4.4555
training:   2%|▏         | 30/2000 [24:43<28:13:38, 51.58s/it]
/n Epoch: 29 |Training loss: 4.4404
validation loss: 4.6361
training:   2%|▏         | 31/2000 [25:39<29:04:45, 53.17s/it]
/n Epoch: 30 |Training loss: 4.4416
training:   2%|▏         | 32/2000 [26:32<28:58:02, 52.99s/it]
/n Epoch: 31 |Training loss: 4.4034
training:   2%|▏         | 33/2000 [27:25<28:52:45, 52.85s/it]
/n Epoch: 32 |Training loss: 4.3865
training:   2%|▏         | 34/2000 [28:17<28:51:28, 52.84s/it]
/n Epoch: 33 |Training loss: 4.3660
training:   2%|▏         | 35/2000 [29:10<28:48:32, 52.78s/it]
/n Epoch: 34 |Training loss: 4.3477
validation loss: 4.5376
training:   2%|▏         | 36/2000 [30:07<29:26:51, 53.98s/it]
/n Epoch: 35 |Training loss: 4.3748
training:   2%|▏         | 37/2000 [31:00<29:18:03, 53.74s/it]
/n Epoch: 36 |Training loss: 4.3242
training:   2%|▏         | 38/2000 [31:53<29:07:54, 53.45s/it]
/n Epoch: 37 |Training loss: 4.3173
training:   2%|▏         | 39/2000 [32:46<29:05:14, 53.40s/it]
/n Epoch: 38 |Training loss: 4.3008
training:   2%|▏         | 40/2000 [33:39<29:03:10, 53.36s/it]
/n Epoch: 39 |Training loss: 4.2841
validation loss: 4.4566
training:   2%|▏         | 41/2000 [34:36<29:39:57, 54.52s/it]
/n Epoch: 40 |Training loss: 4.2806
training:   2%|▏         | 42/2000 [35:30<29:26:36, 54.14s/it]
/n Epoch: 41 |Training loss: 4.2639
training:   2%|▏         | 43/2000 [36:24<29:30:42, 54.29s/it]
/n Epoch: 42 |Training loss: 4.2595
training:   2%|▏         | 44/2000 [37:18<29:23:53, 54.11s/it]
/n Epoch: 43 |Training loss: 4.2405
training:   2%|▏         | 45/2000 [38:11<29:14:20, 53.84s/it]
/n Epoch: 44 |Training loss: 4.2304
validation loss: 4.4133
training:   2%|▏         | 46/2000 [39:08<29:42:48, 54.74s/it]
/n Epoch: 45 |Training loss: 4.2209
training:   2%|▏         | 47/2000 [40:00<29:10:00, 53.76s/it]
/n Epoch: 46 |Training loss: 4.2041
training:   2%|▏         | 48/2000 [40:52<29:00:12, 53.49s/it]
/n Epoch: 47 |Training loss: 4.1876
training:   2%|▏         | 49/2000 [41:46<29:03:09, 53.61s/it]
/n Epoch: 48 |Training loss: 4.1777
training:   2%|▎         | 50/2000 [42:38<28:46:35, 53.13s/it]
/n Epoch: 49 |Training loss: 4.1651
validation loss: 4.3751
training:   3%|▎         | 51/2000 [43:35<29:20:47, 54.21s/it]
/n Epoch: 50 |Training loss: 4.1625
training:   3%|▎         | 52/2000 [44:27<28:56:59, 53.50s/it]
/n Epoch: 51 |Training loss: 4.1519
training:   3%|▎         | 53/2000 [45:17<28:25:07, 52.55s/it]
/n Epoch: 52 |Training loss: 4.1374
training:   3%|▎         | 54/2000 [46:07<28:00:01, 51.80s/it]
/n Epoch: 53 |Training loss: 4.1231
training:   3%|▎         | 55/2000 [46:58<27:44:15, 51.34s/it]
/n Epoch: 54 |Training loss: 4.1155
validation loss: 4.3197
training:   3%|▎         | 56/2000 [47:53<28:19:09, 52.44s/it]
/n Epoch: 55 |Training loss: 4.0979
training:   3%|▎         | 57/2000 [48:44<28:07:23, 52.11s/it]
/n Epoch: 56 |Training loss: 4.1067
training:   3%|▎         | 58/2000 [49:35<27:57:10, 51.82s/it]
/n Epoch: 57 |Training loss: 4.0792
training:   3%|▎         | 59/2000 [50:25<27:34:58, 51.16s/it]
/n Epoch: 58 |Training loss: 4.0720
training:   3%|▎         | 60/2000 [51:14<27:13:01, 50.51s/it]
/n Epoch: 59 |Training loss: 4.0586
validation loss: 4.2400
training:   3%|▎         | 61/2000 [52:06<27:27:10, 50.97s/it]
/n Epoch: 60 |Training loss: 4.0404
training:   3%|▎         | 62/2000 [52:56<27:18:04, 50.71s/it]
/n Epoch: 61 |Training loss: 4.0206
training:   3%|▎         | 63/2000 [53:45<27:02:25, 50.26s/it]
/n Epoch: 62 |Training loss: 4.0058
training:   3%|▎         | 64/2000 [54:33<26:42:52, 49.68s/it]
/n Epoch: 63 |Training loss: 3.9942
training:   3%|▎         | 65/2000 [55:21<26:26:12, 49.18s/it]
/n Epoch: 64 |Training loss: 3.9748
validation loss: 4.1991
training:   3%|▎         | 66/2000 [56:13<26:48:07, 49.89s/it]
/n Epoch: 65 |Training loss: 4.0155
training:   3%|▎         | 67/2000 [57:01<26:26:12, 49.24s/it]
/n Epoch: 66 |Training loss: 3.9598
training:   3%|▎         | 68/2000 [57:49<26:12:30, 48.84s/it]
/n Epoch: 67 |Training loss: 3.9810
training:   3%|▎         | 69/2000 [58:36<26:00:48, 48.50s/it]
/n Epoch: 68 |Training loss: 3.9771
training:   4%|▎         | 70/2000 [59:25<26:00:55, 48.53s/it]
/n Epoch: 69 |Training loss: 3.9642
validation loss: 4.1704
training:   4%|▎         | 71/2000 [1:00:17<26:38:52, 49.73s/it]
/n Epoch: 70 |Training loss: 3.9286
training:   4%|▎         | 72/2000 [1:01:06<26:31:03, 49.51s/it]
/n Epoch: 71 |Training loss: 3.9279
training:   4%|▎         | 73/2000 [1:01:55<26:23:05, 49.29s/it]
/n Epoch: 72 |Training loss: 3.9097
training:   4%|▎         | 74/2000 [1:02:43<26:11:35, 48.96s/it]
/n Epoch: 73 |Training loss: 3.8928
training:   4%|▍         | 75/2000 [1:03:32<26:03:47, 48.74s/it]
/n Epoch: 74 |Training loss: 3.8897
validation loss: 4.0885
training:   4%|▍         | 76/2000 [1:04:23<26:30:25, 49.60s/it]
/n Epoch: 75 |Training loss: 3.8681
training:   4%|▍         | 77/2000 [1:05:12<26:21:14, 49.34s/it]
/n Epoch: 76 |Training loss: 3.8583
training:   4%|▍         | 78/2000 [1:06:00<26:09:58, 49.01s/it]
/n Epoch: 77 |Training loss: 3.8545
training:   4%|▍         | 79/2000 [1:06:48<25:57:25, 48.64s/it]
/n Epoch: 78 |Training loss: 3.8405
training:   4%|▍         | 80/2000 [1:07:36<25:48:03, 48.38s/it]
/n Epoch: 79 |Training loss: 3.8279
validation loss: 4.0430
training:   4%|▍         | 81/2000 [1:08:27<26:18:34, 49.36s/it]
/n Epoch: 80 |Training loss: 3.8143
training:   4%|▍         | 82/2000 [1:09:15<26:03:50, 48.92s/it]
/n Epoch: 81 |Training loss: 3.8096
training:   4%|▍         | 83/2000 [1:10:03<25:54:59, 48.67s/it]
/n Epoch: 82 |Training loss: 3.8022
training:   4%|▍         | 84/2000 [1:10:51<25:44:44, 48.37s/it]
/n Epoch: 83 |Training loss: 3.7884
training:   4%|▍         | 85/2000 [1:11:39<25:35:36, 48.11s/it]
/n Epoch: 84 |Training loss: 3.7840
validation loss: 4.0029
training:   4%|▍         | 86/2000 [1:12:30<26:05:31, 49.08s/it]
/n Epoch: 85 |Training loss: 3.7707
training:   4%|▍         | 87/2000 [1:13:17<25:49:49, 48.61s/it]
/n Epoch: 86 |Training loss: 3.7744
training:   4%|▍         | 88/2000 [1:14:05<25:40:45, 48.35s/it]
/n Epoch: 87 |Training loss: 3.7607
training:   4%|▍         | 89/2000 [1:14:53<25:32:43, 48.12s/it]
/n Epoch: 88 |Training loss: 3.7584
training:   4%|▍         | 90/2000 [1:15:40<25:26:06, 47.94s/it]
/n Epoch: 89 |Training loss: 3.7514
validation loss: 3.9729
training:   5%|▍         | 91/2000 [1:16:31<25:55:16, 48.88s/it]
/n Epoch: 90 |Training loss: 3.7355
training:   5%|▍         | 92/2000 [1:17:19<25:41:09, 48.46s/it]
/n Epoch: 91 |Training loss: 3.7462
training:   5%|▍         | 93/2000 [1:18:06<25:30:45, 48.16s/it]
/n Epoch: 92 |Training loss: 3.7266
training:   5%|▍         | 94/2000 [1:18:53<25:21:14, 47.89s/it]
/n Epoch: 93 |Training loss: 3.7515
training:   5%|▍         | 95/2000 [1:19:41<25:17:50, 47.81s/it]
/n Epoch: 94 |Training loss: 3.7438
validation loss: 3.9401
training:   5%|▍         | 96/2000 [1:20:33<25:53:51, 48.97s/it]
/n Epoch: 95 |Training loss: 3.7096
training:   5%|▍         | 97/2000 [1:21:20<25:39:55, 48.55s/it]
/n Epoch: 96 |Training loss: 3.7075
training:   5%|▍         | 98/2000 [1:22:08<25:32:27, 48.34s/it]
/n Epoch: 97 |Training loss: 3.6978
training:   5%|▍         | 99/2000 [1:22:56<25:25:04, 48.14s/it]
/n Epoch: 98 |Training loss: 3.6896
training:   5%|▌         | 100/2000 [1:23:44<25:21:09, 48.04s/it]
/n Epoch: 99 |Training loss: 3.6880
validation loss: 3.8979
training:   5%|▌         | 101/2000 [1:24:35<25:49:43, 48.96s/it]
/n Epoch: 100 |Training loss: 3.6763
training:   5%|▌         | 102/2000 [1:25:23<25:37:52, 48.62s/it]
/n Epoch: 101 |Training loss: 3.6725
training:   5%|▌         | 103/2000 [1:26:10<25:26:35, 48.28s/it]
/n Epoch: 102 |Training loss: 3.6653
training:   5%|▌         | 104/2000 [1:26:58<25:18:02, 48.04s/it]
/n Epoch: 103 |Training loss: 3.6783
training:   5%|▌         | 105/2000 [1:27:45<25:13:03, 47.91s/it]
/n Epoch: 104 |Training loss: 3.6668
validation loss: 3.8650
training:   5%|▌         | 106/2000 [1:28:37<25:49:41, 49.09s/it]
/n Epoch: 105 |Training loss: 3.6561
training:   5%|▌         | 107/2000 [1:29:25<25:42:46, 48.90s/it]
/n Epoch: 106 |Training loss: 3.6412
training:   5%|▌         | 108/2000 [1:30:15<25:50:20, 49.17s/it]
/n Epoch: 107 |Training loss: 3.6282
training:   5%|▌         | 109/2000 [1:31:04<25:42:41, 48.95s/it]
/n Epoch: 108 |Training loss: 3.6219
training:   6%|▌         | 110/2000 [1:31:52<25:37:55, 48.82s/it]
/n Epoch: 109 |Training loss: 3.6244
validation loss: 3.8257
training:   6%|▌         | 111/2000 [1:32:44<26:01:02, 49.58s/it]
/n Epoch: 110 |Training loss: 3.6161
training:   6%|▌         | 112/2000 [1:33:31<25:41:29, 48.99s/it]
/n Epoch: 111 |Training loss: 3.5998
training:   6%|▌         | 113/2000 [1:34:19<25:30:08, 48.65s/it]
/n Epoch: 112 |Training loss: 3.6026
training:   6%|▌         | 114/2000 [1:35:07<25:24:13, 48.49s/it]
/n Epoch: 113 |Training loss: 3.5890
training:   6%|▌         | 115/2000 [1:35:55<25:15:10, 48.23s/it]
/n Epoch: 114 |Training loss: 3.5803
validation loss: 3.7915
training:   6%|▌         | 116/2000 [1:36:46<25:43:54, 49.17s/it]
/n Epoch: 115 |Training loss: 3.5793
training:   6%|▌         | 117/2000 [1:37:33<25:25:42, 48.62s/it]
/n Epoch: 116 |Training loss: 3.5699
training:   6%|▌         | 118/2000 [1:38:21<25:13:27, 48.25s/it]
/n Epoch: 117 |Training loss: 3.5707
training:   6%|▌         | 119/2000 [1:39:09<25:07:06, 48.07s/it]
/n Epoch: 118 |Training loss: 3.5500
training:   6%|▌         | 120/2000 [1:39:56<25:00:52, 47.90s/it]
/n Epoch: 119 |Training loss: 3.6008
validation loss: 3.7642
training:   6%|▌         | 121/2000 [1:40:48<25:33:44, 48.98s/it]
/n Epoch: 120 |Training loss: 3.6034
training:   6%|▌         | 122/2000 [1:41:35<25:19:38, 48.55s/it]
/n Epoch: 121 |Training loss: 3.5389
training:   6%|▌         | 123/2000 [1:42:23<25:10:46, 48.29s/it]
/n Epoch: 122 |Training loss: 3.5876
training:   6%|▌         | 124/2000 [1:43:10<25:02:38, 48.06s/it]
/n Epoch: 123 |Training loss: 3.5970
training:   6%|▋         | 125/2000 [1:43:58<25:00:19, 48.01s/it]
/n Epoch: 124 |Training loss: 3.5362
validation loss: 3.8533
training:   6%|▋         | 126/2000 [1:44:50<25:31:20, 49.03s/it]
/n Epoch: 125 |Training loss: 3.5671
training:   6%|▋         | 127/2000 [1:45:37<25:14:28, 48.52s/it]
/n Epoch: 126 |Training loss: 3.6152
training:   6%|▋         | 128/2000 [1:46:25<25:05:35, 48.26s/it]
/n Epoch: 127 |Training loss: 3.5519
training:   6%|▋         | 129/2000 [1:47:12<24:55:43, 47.97s/it]
/n Epoch: 128 |Training loss: 3.5233
training:   6%|▋         | 130/2000 [1:47:59<24:50:37, 47.83s/it]
/n Epoch: 129 |Training loss: 3.5396
validation loss: 3.7209
training:   7%|▋         | 131/2000 [1:48:51<25:25:10, 48.96s/it]
/n Epoch: 130 |Training loss: 3.5336
training:   7%|▋         | 132/2000 [1:49:40<25:23:53, 48.95s/it]
/n Epoch: 131 |Training loss: 3.5039
training:   7%|▋         | 133/2000 [1:50:28<25:14:00, 48.66s/it]
/n Epoch: 132 |Training loss: 3.5144
training:   7%|▋         | 134/2000 [1:51:16<25:09:45, 48.55s/it]
/n Epoch: 133 |Training loss: 3.5092
training:   7%|▋         | 135/2000 [1:52:05<25:08:21, 48.53s/it]
/n Epoch: 134 |Training loss: 3.4933
validation loss: 3.6902
training:   7%|▋         | 136/2000 [1:52:57<25:40:23, 49.58s/it]
/n Epoch: 135 |Training loss: 3.4975
training:   7%|▋         | 137/2000 [1:53:45<25:24:36, 49.10s/it]
/n Epoch: 136 |Training loss: 3.4859
training:   7%|▋         | 138/2000 [1:54:33<25:20:21, 48.99s/it]
/n Epoch: 137 |Training loss: 3.4734
training:   7%|▋         | 139/2000 [1:55:22<25:16:18, 48.89s/it]
/n Epoch: 138 |Training loss: 3.4773
training:   7%|▋         | 140/2000 [1:56:11<25:12:20, 48.79s/it]
/n Epoch: 139 |Training loss: 3.4692
validation loss: 3.6604
training:   7%|▋         | 141/2000 [1:57:03<25:42:09, 49.77s/it]
/n Epoch: 140 |Training loss: 3.4545
training:   7%|▋         | 142/2000 [1:57:51<25:28:05, 49.35s/it]
/n Epoch: 141 |Training loss: 3.4596
training:   7%|▋         | 143/2000 [1:58:39<25:17:09, 49.02s/it]
/n Epoch: 142 |Training loss: 3.4528
training:   7%|▋         | 144/2000 [1:59:28<25:10:05, 48.82s/it]
/n Epoch: 143 |Training loss: 3.4429
training:   7%|▋         | 145/2000 [2:00:15<25:00:43, 48.54s/it]
/n Epoch: 144 |Training loss: 3.4398
validation loss: 3.6267
training:   7%|▋         | 146/2000 [2:01:07<25:29:52, 49.51s/it]
/n Epoch: 145 |Training loss: 3.4264
training:   7%|▋         | 147/2000 [2:01:56<25:23:39, 49.34s/it]
/n Epoch: 146 |Training loss: 3.4208
training:   7%|▋         | 148/2000 [2:02:45<25:14:03, 49.05s/it]
/n Epoch: 147 |Training loss: 3.4257
training:   7%|▋         | 149/2000 [2:03:34<25:13:56, 49.07s/it]
/n Epoch: 148 |Training loss: 3.4099
training:   8%|▊         | 150/2000 [2:04:23<25:14:38, 49.12s/it]
/n Epoch: 149 |Training loss: 3.4190
validation loss: 3.6255
training:   8%|▊         | 151/2000 [2:05:16<25:52:31, 50.38s/it]
/n Epoch: 150 |Training loss: 3.4170
training:   8%|▊         | 152/2000 [2:06:06<25:46:13, 50.20s/it]
/n Epoch: 151 |Training loss: 3.4193
training:   8%|▊         | 153/2000 [2:06:55<25:38:01, 49.96s/it]
/n Epoch: 152 |Training loss: 3.3996
training:   8%|▊         | 154/2000 [2:07:45<25:34:15, 49.87s/it]
/n Epoch: 153 |Training loss: 3.4123
training:   8%|▊         | 155/2000 [2:08:34<25:23:18, 49.54s/it]
/n Epoch: 154 |Training loss: 3.4131
validation loss: 3.5880
training:   8%|▊         | 156/2000 [2:09:26<25:45:31, 50.29s/it]
/n Epoch: 155 |Training loss: 3.3881
training:   8%|▊         | 157/2000 [2:10:14<25:26:37, 49.70s/it]
/n Epoch: 156 |Training loss: 3.3779
training:   8%|▊         | 158/2000 [2:11:02<25:11:49, 49.25s/it]
/n Epoch: 157 |Training loss: 3.3722
training:   8%|▊         | 159/2000 [2:11:51<25:01:36, 48.94s/it]
/n Epoch: 158 |Training loss: 3.3678
training:   8%|▊         | 160/2000 [2:12:39<24:54:39, 48.74s/it]
/n Epoch: 159 |Training loss: 3.3565
validation loss: 3.5551
training:   8%|▊         | 161/2000 [2:13:31<25:22:12, 49.66s/it]
/n Epoch: 160 |Training loss: 3.3580
training:   8%|▊         | 162/2000 [2:14:19<25:06:44, 49.19s/it]
/n Epoch: 161 |Training loss: 3.3432
training:   8%|▊         | 163/2000 [2:15:07<24:54:20, 48.81s/it]
/n Epoch: 162 |Training loss: 3.3809
training:   8%|▊         | 164/2000 [2:15:55<24:44:22, 48.51s/it]
/n Epoch: 163 |Training loss: 3.3787
training:   8%|▊         | 165/2000 [2:16:42<24:35:53, 48.26s/it]
/n Epoch: 164 |Training loss: 3.3482
validation loss: 3.5312
training:   8%|▊         | 166/2000 [2:17:34<25:06:29, 49.29s/it]
/n Epoch: 165 |Training loss: 3.3463
training:   8%|▊         | 167/2000 [2:18:22<24:53:30, 48.89s/it]
/n Epoch: 166 |Training loss: 3.3309
training:   8%|▊         | 168/2000 [2:19:10<24:41:33, 48.52s/it]
/n Epoch: 167 |Training loss: 3.3274
training:   8%|▊         | 169/2000 [2:19:57<24:34:23, 48.31s/it]
/n Epoch: 168 |Training loss: 3.3213
training:   8%|▊         | 170/2000 [2:20:45<24:26:45, 48.09s/it]
/n Epoch: 169 |Training loss: 3.3082
validation loss: 3.5073
training:   9%|▊         | 171/2000 [2:21:36<24:57:10, 49.11s/it]
/n Epoch: 170 |Training loss: 3.3041
training:   9%|▊         | 172/2000 [2:22:25<24:48:56, 48.87s/it]
/n Epoch: 171 |Training loss: 3.3015
training:   9%|▊         | 173/2000 [2:23:13<24:38:22, 48.55s/it]
/n Epoch: 172 |Training loss: 3.2882
training:   9%|▊         | 174/2000 [2:24:00<24:27:53, 48.23s/it]
/n Epoch: 173 |Training loss: 3.3223
training:   9%|▉         | 175/2000 [2:24:48<24:23:34, 48.12s/it]

/n Epoch: 174 |Training loss: 3.3220
validation loss: 3.5015



training:   9%|▉         | 176/2000 [2:25:39<24:51:43, 49.07s/it][A

/n Epoch: 175 |Training loss: 3.2937



training:   9%|▉         | 177/2000 [2:26:27<24:39:16, 48.69s/it][A

/n Epoch: 176 |Training loss: 3.2885



training:   9%|▉         | 178/2000 [2:27:15<24:28:30, 48.36s/it][A

/n Epoch: 177 |Training loss: 3.2770



training:   9%|▉         | 179/2000 [2:28:02<24:23:21, 48.22s/it][A

/n Epoch: 178 |Training loss: 3.2778



training:   9%|▉         | 180/2000 [2:28:50<24:16:09, 48.01s/it][A

/n Epoch: 179 |Training loss: 3.2600
validation loss: 3.4998



training:   9%|▉         | 181/2000 [2:29:41<24:46:52, 49.04s/it][A

/n Epoch: 180 |Training loss: 3.2879



training:   9%|▉         | 182/2000 [2:30:29<24:35:00, 48.68s/it][A

/n Epoch: 181 |Training loss: 3.2829



training:   9%|▉         | 183/2000 [2:31:17<24:22:52, 48.31s/it][A

/n Epoch: 182 |Training loss: 3.2506



training:   9%|▉         | 184/2000 [2:32:05<24:23:20, 48.35s/it][A

/n Epoch: 183 |Training loss: 3.2572



training:   9%|▉         | 185/2000 [2:32:54<24:22:42, 48.35s/it][A

/n Epoch: 184 |Training loss: 3.2466
validation loss: 3.4614



training:   9%|▉         | 186/2000 [2:33:45<24:52:34, 49.37s/it][A

/n Epoch: 185 |Training loss: 3.2594



training:   9%|▉         | 187/2000 [2:34:34<24:42:04, 49.05s/it][A

/n Epoch: 186 |Training loss: 3.2575



training:   9%|▉         | 188/2000 [2:35:22<24:34:26, 48.82s/it][A

/n Epoch: 187 |Training loss: 3.2477



training:   9%|▉         | 189/2000 [2:36:10<24:25:05, 48.54s/it][A

/n Epoch: 188 |Training loss: 3.2438



training:  10%|▉         | 190/2000 [2:36:58<24:19:40, 48.39s/it][A

/n Epoch: 189 |Training loss: 3.2326
validation loss: 3.4223



training:  10%|▉         | 191/2000 [2:37:50<24:50:14, 49.43s/it][A

/n Epoch: 190 |Training loss: 3.2224



training:  10%|▉         | 192/2000 [2:38:37<24:35:21, 48.96s/it][A

/n Epoch: 191 |Training loss: 3.2186



training:  10%|▉         | 193/2000 [2:39:25<24:23:09, 48.58s/it][A

/n Epoch: 192 |Training loss: 3.2037



training:  10%|▉         | 194/2000 [2:40:13<24:14:18, 48.32s/it][A

/n Epoch: 193 |Training loss: 3.1952



training:  10%|▉         | 195/2000 [2:41:01<24:09:46, 48.19s/it][A

/n Epoch: 194 |Training loss: 3.1909
validation loss: 3.3888



training:  10%|▉         | 196/2000 [2:41:52<24:35:54, 49.09s/it][A

/n Epoch: 195 |Training loss: 3.1810



training:  10%|▉         | 197/2000 [2:42:40<24:23:10, 48.69s/it][A

/n Epoch: 196 |Training loss: 3.1800



training:  10%|▉         | 198/2000 [2:43:28<24:17:14, 48.52s/it][A

/n Epoch: 197 |Training loss: 3.1752



training:  10%|▉         | 199/2000 [2:44:16<24:13:37, 48.43s/it][A

/n Epoch: 198 |Training loss: 3.1620



training:  10%|█         | 200/2000 [2:45:04<24:10:04, 48.34s/it][A

/n Epoch: 199 |Training loss: 3.1664
validation loss: 3.3546



training:  10%|█         | 201/2000 [2:45:56<24:36:39, 49.25s/it][A

/n Epoch: 200 |Training loss: 3.1637



training:  10%|█         | 202/2000 [2:46:44<24:30:24, 49.07s/it][A

/n Epoch: 201 |Training loss: 3.1467



training:  10%|█         | 203/2000 [2:47:33<24:22:58, 48.85s/it][A

/n Epoch: 202 |Training loss: 3.1502



training:  10%|█         | 204/2000 [2:48:21<24:17:29, 48.69s/it][A

/n Epoch: 203 |Training loss: 3.1427



training:  10%|█         | 205/2000 [2:49:09<24:12:48, 48.56s/it][A

/n Epoch: 204 |Training loss: 3.1386
validation loss: 3.3336



training:  10%|█         | 206/2000 [2:50:01<24:40:16, 49.51s/it][A

/n Epoch: 205 |Training loss: 3.1272



training:  10%|█         | 207/2000 [2:50:49<24:26:07, 49.06s/it][A

/n Epoch: 206 |Training loss: 3.1227



training:  10%|█         | 208/2000 [2:51:37<24:13:50, 48.68s/it][A

/n Epoch: 207 |Training loss: 3.1206



training:  10%|█         | 209/2000 [2:52:24<24:04:16, 48.38s/it][A

/n Epoch: 208 |Training loss: 3.1118



training:  10%|█         | 210/2000 [2:53:13<24:06:36, 48.49s/it][A

/n Epoch: 209 |Training loss: 3.1069
validation loss: 3.3124



training:  11%|█         | 211/2000 [2:54:06<24:41:52, 49.70s/it][A

/n Epoch: 210 |Training loss: 3.0991



training:  11%|█         | 212/2000 [2:54:55<24:36:41, 49.55s/it][A

/n Epoch: 211 |Training loss: 3.0998



training:  11%|█         | 213/2000 [2:55:44<24:35:49, 49.55s/it][A

/n Epoch: 212 |Training loss: 3.0885



training:  11%|█         | 214/2000 [2:56:32<24:20:47, 49.07s/it][A

/n Epoch: 213 |Training loss: 3.0811



training:  11%|█         | 215/2000 [2:57:20<24:11:12, 48.78s/it][A

/n Epoch: 214 |Training loss: 3.0714
validation loss: 3.2721



training:  11%|█         | 216/2000 [2:58:12<24:38:58, 49.74s/it][A

/n Epoch: 215 |Training loss: 3.0726



training:  11%|█         | 217/2000 [2:59:00<24:22:12, 49.20s/it][A

/n Epoch: 216 |Training loss: 3.0621



training:  11%|█         | 218/2000 [2:59:48<24:09:15, 48.80s/it][A

/n Epoch: 217 |Training loss: 3.0602



training:  11%|█         | 219/2000 [3:00:36<23:59:35, 48.50s/it][A

/n Epoch: 218 |Training loss: 3.0589



training:  11%|█         | 220/2000 [3:01:24<23:52:59, 48.30s/it][A

/n Epoch: 219 |Training loss: 3.0403
validation loss: 3.2524



training:  11%|█         | 221/2000 [3:02:15<24:19:38, 49.23s/it][A

/n Epoch: 220 |Training loss: 3.0597



training:  11%|█         | 222/2000 [3:03:03<24:04:28, 48.74s/it][A

/n Epoch: 221 |Training loss: 3.0434



training:  11%|█         | 223/2000 [3:03:51<23:56:16, 48.50s/it][A

/n Epoch: 222 |Training loss: 3.0609



training:  11%|█         | 224/2000 [3:04:38<23:46:25, 48.19s/it][A

/n Epoch: 223 |Training loss: 3.0369



training:  11%|█▏        | 225/2000 [3:05:26<23:42:46, 48.09s/it][A

/n Epoch: 224 |Training loss: 3.0763
validation loss: 3.2416



training:  11%|█▏        | 226/2000 [3:06:17<24:11:05, 49.08s/it][A

/n Epoch: 225 |Training loss: 3.0692



training:  11%|█▏        | 227/2000 [3:07:05<23:57:00, 48.63s/it][A

/n Epoch: 226 |Training loss: 3.0318



training:  11%|█▏        | 228/2000 [3:07:53<23:47:16, 48.33s/it][A

/n Epoch: 227 |Training loss: 3.0312



training:  11%|█▏        | 229/2000 [3:08:40<23:39:02, 48.08s/it][A

/n Epoch: 228 |Training loss: 3.0167



training:  12%|█▏        | 230/2000 [3:09:28<23:35:12, 47.97s/it][A

/n Epoch: 229 |Training loss: 3.0070
validation loss: 3.2081



training:  12%|█▏        | 231/2000 [3:10:19<24:05:17, 49.02s/it][A

/n Epoch: 230 |Training loss: 3.0190



training:  12%|█▏        | 232/2000 [3:11:07<23:56:14, 48.74s/it][A

/n Epoch: 231 |Training loss: 2.9965



training:  12%|█▏        | 233/2000 [3:11:55<23:47:25, 48.47s/it][A

/n Epoch: 232 |Training loss: 3.0438



training:  12%|█▏        | 234/2000 [3:12:43<23:39:36, 48.23s/it][A

/n Epoch: 233 |Training loss: 3.0504



training:  12%|█▏        | 235/2000 [3:13:30<23:32:31, 48.02s/it][A

/n Epoch: 234 |Training loss: 3.0083
validation loss: 3.2096



training:  12%|█▏        | 236/2000 [3:14:22<24:06:50, 49.21s/it][A

/n Epoch: 235 |Training loss: 3.0024



training:  12%|█▏        | 237/2000 [3:15:10<23:50:46, 48.69s/it][A

/n Epoch: 236 |Training loss: 3.0037



training:  12%|█▏        | 238/2000 [3:15:58<23:42:58, 48.46s/it][A

/n Epoch: 237 |Training loss: 2.9901



training:  12%|█▏        | 239/2000 [3:16:45<23:32:13, 48.12s/it][A

/n Epoch: 238 |Training loss: 2.9824



training:  12%|█▏        | 240/2000 [3:17:33<23:30:33, 48.09s/it][A

/n Epoch: 239 |Training loss: 2.9763
validation loss: 3.1651



training:  12%|█▏        | 241/2000 [3:18:25<23:59:44, 49.11s/it][A

/n Epoch: 240 |Training loss: 2.9689



training:  12%|█▏        | 242/2000 [3:19:13<23:49:10, 48.78s/it][A

/n Epoch: 241 |Training loss: 2.9586



training:  12%|█▏        | 243/2000 [3:20:00<23:38:44, 48.45s/it][A

/n Epoch: 242 |Training loss: 2.9472



training:  12%|█▏        | 244/2000 [3:20:48<23:31:29, 48.23s/it][A

/n Epoch: 243 |Training loss: 2.9528



training:  12%|█▏        | 245/2000 [3:21:36<23:24:39, 48.02s/it][A

/n Epoch: 244 |Training loss: 2.9656
validation loss: 3.1748



training:  12%|█▏        | 246/2000 [3:22:27<23:55:41, 49.11s/it][A

/n Epoch: 245 |Training loss: 2.9520



training:  12%|█▏        | 247/2000 [3:23:15<23:42:33, 48.69s/it][A

/n Epoch: 246 |Training loss: 2.9688



training:  12%|█▏        | 248/2000 [3:24:03<23:35:21, 48.47s/it][A

/n Epoch: 247 |Training loss: 2.9459



training:  12%|█▏        | 249/2000 [3:24:51<23:31:21, 48.36s/it][A

/n Epoch: 248 |Training loss: 2.9559



training:  12%|█▎        | 250/2000 [3:25:39<23:26:45, 48.23s/it][A

/n Epoch: 249 |Training loss: 2.9490
validation loss: 3.1282



training:  13%|█▎        | 251/2000 [3:26:31<23:55:41, 49.25s/it][A

/n Epoch: 250 |Training loss: 2.9366



training:  13%|█▎        | 252/2000 [3:27:18<23:41:32, 48.79s/it][A

/n Epoch: 251 |Training loss: 2.9336



training:  13%|█▎        | 253/2000 [3:28:06<23:32:11, 48.50s/it][A

/n Epoch: 252 |Training loss: 2.9274



training:  13%|█▎        | 254/2000 [3:28:54<23:24:10, 48.25s/it][A

/n Epoch: 253 |Training loss: 2.9156



training:  13%|█▎        | 255/2000 [3:29:42<23:21:19, 48.18s/it][A

/n Epoch: 254 |Training loss: 2.9197
validation loss: 3.0929



training:  13%|█▎        | 256/2000 [3:30:33<23:48:57, 49.16s/it][A

/n Epoch: 255 |Training loss: 2.9158



training:  13%|█▎        | 257/2000 [3:31:21<23:34:55, 48.71s/it][A

/n Epoch: 256 |Training loss: 2.9016



training:  13%|█▎        | 258/2000 [3:32:09<23:26:20, 48.44s/it][A

/n Epoch: 257 |Training loss: 2.8900



training:  13%|█▎        | 259/2000 [3:32:56<23:16:32, 48.13s/it][A

/n Epoch: 258 |Training loss: 2.8973



training:  13%|█▎        | 260/2000 [3:33:44<23:14:04, 48.07s/it][A

/n Epoch: 259 |Training loss: 2.8758
validation loss: 3.0758



training:  13%|█▎        | 261/2000 [3:34:35<23:41:29, 49.04s/it][A

/n Epoch: 260 |Training loss: 2.9018



training:  13%|█▎        | 262/2000 [3:35:23<23:28:45, 48.63s/it][A

/n Epoch: 261 |Training loss: 2.8952



training:  13%|█▎        | 263/2000 [3:36:11<23:19:56, 48.36s/it][A

/n Epoch: 262 |Training loss: 2.8685



training:  13%|█▎        | 264/2000 [3:36:58<23:11:39, 48.10s/it][A

/n Epoch: 263 |Training loss: 2.8726



training:  13%|█▎        | 265/2000 [3:37:46<23:05:39, 47.92s/it][A

/n Epoch: 264 |Training loss: 2.8629
validation loss: 3.0411



training:  13%|█▎        | 266/2000 [3:38:37<23:32:17, 48.87s/it][A

/n Epoch: 265 |Training loss: 2.8536



training:  13%|█▎        | 267/2000 [3:39:24<23:18:21, 48.41s/it][A

/n Epoch: 266 |Training loss: 2.8593



training:  13%|█▎        | 268/2000 [3:40:12<23:12:35, 48.24s/it][A

/n Epoch: 267 |Training loss: 2.8524



training:  13%|█▎        | 269/2000 [3:41:00<23:05:23, 48.02s/it][A

/n Epoch: 268 |Training loss: 2.8366



training:  14%|█▎        | 270/2000 [3:41:47<23:01:31, 47.91s/it][A

/n Epoch: 269 |Training loss: 2.8416
validation loss: 3.0134



training:  14%|█▎        | 271/2000 [3:42:39<23:29:46, 48.92s/it][A

/n Epoch: 270 |Training loss: 2.8378



training:  14%|█▎        | 272/2000 [3:43:26<23:15:20, 48.45s/it][A

/n Epoch: 271 |Training loss: 2.8235



training:  14%|█▎        | 273/2000 [3:44:13<23:06:16, 48.16s/it][A

/n Epoch: 272 |Training loss: 2.8254



training:  14%|█▎        | 274/2000 [3:45:01<23:00:28, 47.99s/it][A

/n Epoch: 273 |Training loss: 2.8183



training:  14%|█▍        | 275/2000 [3:45:49<22:56:14, 47.87s/it][A

/n Epoch: 274 |Training loss: 2.8328
validation loss: 2.9880



training:  14%|█▍        | 276/2000 [3:46:40<23:24:24, 48.88s/it][A

/n Epoch: 275 |Training loss: 2.8223



training:  14%|█▍        | 277/2000 [3:47:27<23:12:32, 48.49s/it][A

/n Epoch: 276 |Training loss: 2.8213



training:  14%|█▍        | 278/2000 [3:48:15<23:05:50, 48.29s/it][A

/n Epoch: 277 |Training loss: 2.8068



training:  14%|█▍        | 279/2000 [3:49:03<22:59:10, 48.08s/it][A

/n Epoch: 278 |Training loss: 2.8010



training:  14%|█▍        | 280/2000 [3:49:51<22:56:38, 48.02s/it][A

/n Epoch: 279 |Training loss: 2.7996
validation loss: 2.9527



training:  14%|█▍        | 281/2000 [3:50:43<23:29:42, 49.20s/it][A

/n Epoch: 280 |Training loss: 2.7943



training:  14%|█▍        | 282/2000 [3:51:30<23:15:00, 48.72s/it][A

/n Epoch: 281 |Training loss: 2.7881



training:  14%|█▍        | 283/2000 [3:52:18<23:04:33, 48.38s/it][A

/n Epoch: 282 |Training loss: 2.7788



training:  14%|█▍        | 284/2000 [3:53:05<22:57:13, 48.15s/it][A

/n Epoch: 283 |Training loss: 2.7755



training:  14%|█▍        | 285/2000 [3:53:53<22:52:10, 48.01s/it][A

/n Epoch: 284 |Training loss: 2.7635
validation loss: 2.9155



training:  14%|█▍        | 286/2000 [3:54:45<23:27:33, 49.27s/it][A

/n Epoch: 285 |Training loss: 2.7650



training:  14%|█▍        | 287/2000 [3:55:37<23:43:02, 49.84s/it][A

/n Epoch: 286 |Training loss: 2.7522



training:  14%|█▍        | 288/2000 [3:56:27<23:44:36, 49.93s/it][A

/n Epoch: 287 |Training loss: 2.7551



training:  14%|█▍        | 289/2000 [3:57:17<23:44:03, 49.94s/it][A

/n Epoch: 288 |Training loss: 2.7465



training:  14%|█▍        | 290/2000 [3:58:07<23:44:12, 49.97s/it][A

/n Epoch: 289 |Training loss: 2.7355
validation loss: 2.8867



training:  15%|█▍        | 291/2000 [3:59:02<24:25:12, 51.44s/it][A

/n Epoch: 290 |Training loss: 2.7437



training:  15%|█▍        | 292/2000 [3:59:54<24:30:29, 51.66s/it][A

/n Epoch: 291 |Training loss: 2.7277



training:  15%|█▍        | 293/2000 [4:00:46<24:37:28, 51.93s/it][A

/n Epoch: 292 |Training loss: 2.7378



training:  15%|█▍        | 294/2000 [4:01:37<24:30:07, 51.70s/it][A

/n Epoch: 293 |Training loss: 2.7259



training:  15%|█▍        | 295/2000 [4:02:27<24:14:37, 51.19s/it][A

/n Epoch: 294 |Training loss: 2.7263
validation loss: 2.8825



training:  15%|█▍        | 296/2000 [4:03:23<24:52:35, 52.56s/it][A

/n Epoch: 295 |Training loss: 2.7214



training:  15%|█▍        | 297/2000 [4:04:15<24:48:49, 52.45s/it][A

/n Epoch: 296 |Training loss: 2.7182



training:  15%|█▍        | 298/2000 [4:05:07<24:37:38, 52.09s/it][A

/n Epoch: 297 |Training loss: 2.7015



training:  15%|█▍        | 299/2000 [4:05:58<24:28:25, 51.80s/it][A

/n Epoch: 298 |Training loss: 2.7105



training:  15%|█▌        | 300/2000 [4:06:49<24:24:36, 51.69s/it][A

/n Epoch: 299 |Training loss: 2.7104
validation loss: 2.8549



training:  15%|█▌        | 301/2000 [4:07:44<24:52:38, 52.71s/it][A

/n Epoch: 300 |Training loss: 2.6889



training:  15%|█▌        | 302/2000 [4:08:34<24:29:05, 51.91s/it][A

/n Epoch: 301 |Training loss: 2.7127



training:  15%|█▌        | 303/2000 [4:09:24<24:07:02, 51.16s/it][A

/n Epoch: 302 |Training loss: 2.7041



training:  15%|█▌        | 304/2000 [4:10:12<23:45:20, 50.42s/it][A

/n Epoch: 303 |Training loss: 2.6934



training:  15%|█▌        | 305/2000 [4:11:01<23:31:14, 49.96s/it][A

/n Epoch: 304 |Training loss: 2.6816
validation loss: 2.8471



training:  15%|█▌        | 306/2000 [4:11:53<23:48:49, 50.61s/it][A

/n Epoch: 305 |Training loss: 2.6929



training:  15%|█▌        | 307/2000 [4:12:42<23:34:01, 50.11s/it][A

/n Epoch: 306 |Training loss: 2.6862



training:  15%|█▌        | 308/2000 [4:13:32<23:25:04, 49.83s/it][A

/n Epoch: 307 |Training loss: 2.6814



training:  15%|█▌        | 309/2000 [4:14:22<23:27:30, 49.94s/it][A

/n Epoch: 308 |Training loss: 2.6679



training:  16%|█▌        | 310/2000 [4:15:13<23:39:17, 50.39s/it][A

/n Epoch: 309 |Training loss: 2.6669
validation loss: 2.8136



training:  16%|█▌        | 311/2000 [4:16:10<24:29:37, 52.21s/it][A

/n Epoch: 310 |Training loss: 2.6674



training:  16%|█▌        | 312/2000 [4:17:02<24:30:57, 52.29s/it][A

/n Epoch: 311 |Training loss: 2.6592



training:  16%|█▌        | 313/2000 [4:17:55<24:31:28, 52.33s/it][A

/n Epoch: 312 |Training loss: 2.6485



training:  16%|█▌        | 314/2000 [4:18:47<24:34:43, 52.48s/it][A

/n Epoch: 313 |Training loss: 2.6542



training:  16%|█▌        | 315/2000 [4:19:40<24:32:57, 52.45s/it][A

/n Epoch: 314 |Training loss: 2.6402
validation loss: 2.7940



training:  16%|█▌        | 316/2000 [4:20:36<25:08:04, 53.73s/it][A

/n Epoch: 315 |Training loss: 2.6425



training:  16%|█▌        | 317/2000 [4:21:31<25:10:04, 53.84s/it][A

/n Epoch: 316 |Training loss: 2.6379



training:  16%|█▌        | 318/2000 [4:22:25<25:14:11, 54.01s/it][A

/n Epoch: 317 |Training loss: 2.6338



training:  16%|█▌        | 319/2000 [4:23:19<25:15:35, 54.10s/it][A

/n Epoch: 318 |Training loss: 2.6338



training:  16%|█▌        | 320/2000 [4:24:14<25:16:36, 54.16s/it][A

/n Epoch: 319 |Training loss: 2.6302
validation loss: 2.8391



training:  16%|█▌        | 321/2000 [4:25:12<25:51:30, 55.44s/it][A

/n Epoch: 320 |Training loss: 2.6113



training:  16%|█▌        | 322/2000 [4:26:06<25:42:18, 55.15s/it][A

/n Epoch: 321 |Training loss: 2.6464



training:  16%|█▌        | 323/2000 [4:27:01<25:34:58, 54.92s/it][A

/n Epoch: 322 |Training loss: 2.6231



training:  16%|█▌        | 324/2000 [4:27:55<25:26:18, 54.64s/it][A

/n Epoch: 323 |Training loss: 2.6826



training:  16%|█▋        | 325/2000 [4:28:49<25:20:10, 54.45s/it][A

/n Epoch: 324 |Training loss: 2.6991
validation loss: 2.8164



training:  16%|█▋        | 326/2000 [4:29:47<25:49:21, 55.53s/it][A

/n Epoch: 325 |Training loss: 2.6468



training:  16%|█▋        | 327/2000 [4:30:41<25:34:56, 55.05s/it][A

/n Epoch: 326 |Training loss: 2.6463



training:  16%|█▋        | 328/2000 [4:31:34<25:21:13, 54.59s/it][A

/n Epoch: 327 |Training loss: 2.6365



training:  16%|█▋        | 329/2000 [4:32:27<25:07:16, 54.12s/it][A

/n Epoch: 328 |Training loss: 2.6200



training:  16%|█▋        | 330/2000 [4:33:20<24:50:48, 53.56s/it][A

/n Epoch: 329 |Training loss: 2.6159
validation loss: 2.7626



training:  17%|█▋        | 331/2000 [4:34:15<25:06:40, 54.16s/it][A

/n Epoch: 330 |Training loss: 2.6098



training:  17%|█▋        | 332/2000 [4:35:06<24:35:07, 53.06s/it][A

/n Epoch: 331 |Training loss: 2.6059



training:  17%|█▋        | 333/2000 [4:35:58<24:29:49, 52.90s/it][A

/n Epoch: 332 |Training loss: 2.5951



training:  17%|█▋        | 334/2000 [4:36:51<24:27:37, 52.86s/it][A

/n Epoch: 333 |Training loss: 2.6064



training:  17%|█▋        | 335/2000 [4:37:43<24:16:51, 52.50s/it][A

/n Epoch: 334 |Training loss: 2.5813
validation loss: 2.7383



training:  17%|█▋        | 336/2000 [4:38:37<24:33:02, 53.11s/it][A

/n Epoch: 335 |Training loss: 2.5991



training:  17%|█▋        | 337/2000 [4:39:28<24:09:04, 52.28s/it][A

/n Epoch: 336 |Training loss: 2.5815



training:  17%|█▋        | 338/2000 [4:40:16<23:37:40, 51.18s/it][A

/n Epoch: 337 |Training loss: 2.5947



training:  17%|█▋        | 339/2000 [4:41:05<23:19:29, 50.55s/it][A

/n Epoch: 338 |Training loss: 2.5957



training:  17%|█▋        | 340/2000 [4:41:54<23:07:19, 50.14s/it][A

/n Epoch: 339 |Training loss: 2.5720
validation loss: 2.7147



training:  17%|█▋        | 341/2000 [4:42:47<23:29:44, 50.99s/it][A

/n Epoch: 340 |Training loss: 2.5749



training:  17%|█▋        | 342/2000 [4:43:37<23:13:45, 50.44s/it][A

/n Epoch: 341 |Training loss: 2.5605



training:  17%|█▋        | 343/2000 [4:44:25<22:57:12, 49.87s/it][A

/n Epoch: 342 |Training loss: 2.5678



training:  17%|█▋        | 344/2000 [4:45:13<22:38:35, 49.22s/it][A

/n Epoch: 343 |Training loss: 2.5657



training:  17%|█▋        | 345/2000 [4:46:00<22:25:05, 48.76s/it][A

/n Epoch: 344 |Training loss: 2.5484
validation loss: 2.6955



training:  17%|█▋        | 346/2000 [4:46:52<22:48:37, 49.65s/it][A

/n Epoch: 345 |Training loss: 2.5549



training:  17%|█▋        | 347/2000 [4:47:41<22:43:55, 49.51s/it][A

/n Epoch: 346 |Training loss: 2.5474



training:  17%|█▋        | 348/2000 [4:48:30<22:35:07, 49.22s/it][A

/n Epoch: 347 |Training loss: 2.5418



training:  17%|█▋        | 349/2000 [4:49:18<22:28:54, 49.02s/it][A

/n Epoch: 348 |Training loss: 2.5343



training:  18%|█▊        | 350/2000 [4:50:08<22:30:59, 49.13s/it][A

/n Epoch: 349 |Training loss: 2.5391
validation loss: 2.6800



training:  18%|█▊        | 351/2000 [4:51:00<22:57:45, 50.13s/it][A

/n Epoch: 350 |Training loss: 2.5333



training:  18%|█▊        | 352/2000 [4:51:49<22:42:32, 49.61s/it][A

/n Epoch: 351 |Training loss: 2.5284



training:  18%|█▊        | 353/2000 [4:52:38<22:36:11, 49.41s/it][A

/n Epoch: 352 |Training loss: 2.5391



training:  18%|█▊        | 354/2000 [4:53:26<22:22:37, 48.94s/it][A

/n Epoch: 353 |Training loss: 2.5200



training:  18%|█▊        | 355/2000 [4:54:13<22:12:28, 48.60s/it][A

/n Epoch: 354 |Training loss: 2.5148
validation loss: 2.6606



training:  18%|█▊        | 356/2000 [4:55:05<22:36:29, 49.51s/it][A

/n Epoch: 355 |Training loss: 2.5209



training:  18%|█▊        | 357/2000 [4:55:53<22:21:31, 48.99s/it][A

/n Epoch: 356 |Training loss: 2.5053



training:  18%|█▊        | 358/2000 [4:56:41<22:13:12, 48.72s/it][A

/n Epoch: 357 |Training loss: 2.5209



training:  18%|█▊        | 359/2000 [4:57:29<22:07:37, 48.54s/it][A

/n Epoch: 358 |Training loss: 2.5090



training:  18%|█▊        | 360/2000 [4:58:17<22:01:14, 48.34s/it][A

/n Epoch: 359 |Training loss: 2.5234
validation loss: 2.6790



training:  18%|█▊        | 361/2000 [4:59:08<22:26:02, 49.28s/it][A

/n Epoch: 360 |Training loss: 2.5105



training:  18%|█▊        | 362/2000 [4:59:56<22:13:34, 48.85s/it][A

/n Epoch: 361 |Training loss: 2.5290



training:  18%|█▊        | 363/2000 [5:00:44<22:05:11, 48.57s/it][A

/n Epoch: 362 |Training loss: 2.5208



training:  18%|█▊        | 364/2000 [5:01:32<21:58:10, 48.34s/it][A

/n Epoch: 363 |Training loss: 2.5210



training:  18%|█▊        | 365/2000 [5:02:19<21:50:45, 48.10s/it][A

/n Epoch: 364 |Training loss: 2.5070
validation loss: 2.6537



training:  18%|█▊        | 366/2000 [5:03:11<22:21:17, 49.25s/it][A

/n Epoch: 365 |Training loss: 2.5122



training:  18%|█▊        | 367/2000 [5:03:59<22:08:59, 48.83s/it][A

/n Epoch: 366 |Training loss: 2.5089



training:  18%|█▊        | 368/2000 [5:04:47<21:58:44, 48.48s/it][A

/n Epoch: 367 |Training loss: 2.4882



training:  18%|█▊        | 369/2000 [5:05:35<21:52:13, 48.27s/it][A

/n Epoch: 368 |Training loss: 2.5108



training:  18%|█▊        | 370/2000 [5:06:23<21:49:08, 48.19s/it][A

/n Epoch: 369 |Training loss: 2.4854
validation loss: 2.6513



training:  19%|█▊        | 371/2000 [5:07:14<22:15:14, 49.18s/it][A

/n Epoch: 370 |Training loss: 2.5078



training:  19%|█▊        | 372/2000 [5:08:03<22:08:18, 48.95s/it][A

/n Epoch: 371 |Training loss: 2.5108



training:  19%|█▊        | 373/2000 [5:08:50<21:54:49, 48.49s/it][A

/n Epoch: 372 |Training loss: 2.4944



training:  19%|█▊        | 374/2000 [5:09:38<21:48:19, 48.28s/it][A

/n Epoch: 373 |Training loss: 2.4849



training:  19%|█▉        | 375/2000 [5:10:27<21:59:43, 48.73s/it][A

/n Epoch: 374 |Training loss: 2.4811
validation loss: 2.6120



training:  19%|█▉        | 376/2000 [5:11:20<22:30:22, 49.89s/it][A

/n Epoch: 375 |Training loss: 2.4709



training:  19%|█▉        | 377/2000 [5:12:09<22:25:25, 49.74s/it][A

/n Epoch: 376 |Training loss: 2.4664



training:  19%|█▉        | 378/2000 [5:12:58<22:18:07, 49.50s/it][A

/n Epoch: 377 |Training loss: 2.4779



training:  19%|█▉        | 379/2000 [5:13:47<22:09:58, 49.23s/it][A

/n Epoch: 378 |Training loss: 2.4588



training:  19%|█▉        | 380/2000 [5:14:35<21:56:29, 48.76s/it][A

/n Epoch: 379 |Training loss: 2.4771
validation loss: 2.6424



training:  19%|█▉        | 381/2000 [5:15:26<22:18:25, 49.60s/it][A

/n Epoch: 380 |Training loss: 2.4987



training:  19%|█▉        | 382/2000 [5:16:14<22:04:16, 49.11s/it][A

/n Epoch: 381 |Training loss: 2.4785



training:  19%|█▉        | 383/2000 [5:17:03<21:58:20, 48.92s/it][A

/n Epoch: 382 |Training loss: 2.4554



training:  19%|█▉        | 384/2000 [5:17:51<21:52:04, 48.72s/it][A

/n Epoch: 383 |Training loss: 2.4695



training:  19%|█▉        | 385/2000 [5:18:39<21:43:43, 48.44s/it][A

/n Epoch: 384 |Training loss: 2.4610
validation loss: 2.6291



training:  19%|█▉        | 386/2000 [5:19:31<22:14:08, 49.60s/it][A

/n Epoch: 385 |Training loss: 2.4417



training:  19%|█▉        | 387/2000 [5:20:21<22:19:50, 49.84s/it][A

/n Epoch: 386 |Training loss: 2.4648



training:  19%|█▉        | 388/2000 [5:21:13<22:29:34, 50.23s/it][A

/n Epoch: 387 |Training loss: 2.4468



training:  19%|█▉        | 389/2000 [5:22:02<22:26:15, 50.14s/it][A

/n Epoch: 388 |Training loss: 2.4623



training:  20%|█▉        | 390/2000 [5:22:52<22:17:39, 49.85s/it][A

/n Epoch: 389 |Training loss: 2.4520
validation loss: 2.6381



training:  20%|█▉        | 391/2000 [5:23:45<22:40:58, 50.75s/it][A

/n Epoch: 390 |Training loss: 2.4974



training:  20%|█▉        | 392/2000 [5:24:34<22:30:37, 50.40s/it][A

/n Epoch: 391 |Training loss: 2.4958



training:  20%|█▉        | 393/2000 [5:25:23<22:15:10, 49.85s/it][A

/n Epoch: 392 |Training loss: 2.4643



training:  20%|█▉        | 394/2000 [5:26:11<22:00:38, 49.34s/it][A

/n Epoch: 393 |Training loss: 2.4670



training:  20%|█▉        | 395/2000 [5:26:59<21:51:03, 49.01s/it][A

/n Epoch: 394 |Training loss: 2.4515
validation loss: 2.5934



training:  20%|█▉        | 396/2000 [5:27:51<22:15:03, 49.94s/it][A

/n Epoch: 395 |Training loss: 2.4415



training:  20%|█▉        | 397/2000 [5:28:40<22:01:31, 49.46s/it][A

/n Epoch: 396 |Training loss: 2.4342



training:  20%|█▉        | 398/2000 [5:29:27<21:48:01, 48.99s/it][A

/n Epoch: 397 |Training loss: 2.4397



training:  20%|█▉        | 399/2000 [5:30:15<21:38:38, 48.67s/it][A

/n Epoch: 398 |Training loss: 2.4329



training:  20%|██        | 400/2000 [5:31:03<21:31:47, 48.44s/it][A

/n Epoch: 399 |Training loss: 2.4378
validation loss: 2.5788



training:  20%|██        | 401/2000 [5:31:55<21:55:27, 49.36s/it][A

/n Epoch: 400 |Training loss: 2.4488



training:  20%|██        | 402/2000 [5:32:43<21:42:45, 48.91s/it][A

/n Epoch: 401 |Training loss: 2.4226



training:  20%|██        | 403/2000 [5:33:30<21:31:15, 48.51s/it][A

/n Epoch: 402 |Training loss: 2.4335



training:  20%|██        | 404/2000 [5:34:18<21:25:46, 48.34s/it][A

/n Epoch: 403 |Training loss: 2.4143



training:  20%|██        | 405/2000 [5:35:06<21:21:53, 48.22s/it][A

/n Epoch: 404 |Training loss: 2.4335
validation loss: 2.5890



training:  20%|██        | 406/2000 [5:35:58<21:50:29, 49.33s/it][A

/n Epoch: 405 |Training loss: 2.4342



training:  20%|██        | 407/2000 [5:36:46<21:37:54, 48.89s/it][A

/n Epoch: 406 |Training loss: 2.4320



training:  20%|██        | 408/2000 [5:37:34<21:27:59, 48.54s/it][A

/n Epoch: 407 |Training loss: 2.4337



training:  20%|██        | 409/2000 [5:38:21<21:20:05, 48.28s/it][A

/n Epoch: 408 |Training loss: 2.4222


KeyboardInterrupt: ignored
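
The log above is easier to read as a loss curve than line by line. The epoch numbers and loss values can be pulled out with regular expressions. This is a minimal sketch that assumes the printed output has been copied into a string (here a short excerpt of the lines above). `parse_training_log` is a hypothetical helper and not part of the notebook:

```python
import re

# "Epoch: 139 |Training loss: 3.4692" (the literal "/n" prefix in the log is ignored)
EPOCH_RE = re.compile(r"Epoch:\s*(\d+)\s*\|Training loss:\s*([\d.]+)")
# "validation loss: 3.6604", printed after every fifth epoch
VAL_RE = re.compile(r"validation loss:\s*([\d.]+)")


def parse_training_log(log_text):
    """Return ({epoch: train_loss}, {epoch: val_loss}).

    Each validation line is attributed to the most recent epoch line before it.
    """
    train, val = {}, {}
    last_epoch = None
    for line in log_text.splitlines():
        m = EPOCH_RE.search(line)
        if m:
            last_epoch = int(m.group(1))
            train[last_epoch] = float(m.group(2))
            continue
        m = VAL_RE.search(line)
        if m and last_epoch is not None:
            val[last_epoch] = float(m.group(1))
    return train, val


excerpt = """training:   7%|          | 140/2000 [1:56:11<25:12:20, 48.79s/it]
/n Epoch: 139 |Training loss: 3.4692
validation loss: 3.6604
/n Epoch: 140 |Training loss: 3.4545
"""
train, val = parse_training_log(excerpt)
print(train)  # {139: 3.4692, 140: 3.4545}
print(val)    # {139: 3.6604}
```

You can then plot the two dictionaries with matplotlib, which the notebook already imports. For example, use `plt.plot(list(train), list(train.values()))` and the same call for `val`.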

In [28]:
model.eval()
avg_loss_val = 0
with torch.no_grad():
    for loss_val, aux_loss_val, is_last_val in model(next(cycle_validation_loader), max_batch_size=batch_size, return_loss=True):
        avg_loss_val += loss_val / num_batches_val

        if is_last_val:
            print(f'validation loss: {avg_loss_val.item():.4f}')


validation loss: 4.2550


**Music generation**

In [None]:
# Optional: restore previously trained weights from the best checkpoint
# (model and optimizer state, plus the stored epoch and loss)
weights = "model_best.pth.tar"
checkpoint = torch.load(os.path.join(output_dir, weights))
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']


In [None]:
# Generate the network input again: integer-encoded windows of
# sequence_length consecutive notes/chords
network_input = []
for i in range(0, len(notes) - sequence_length, 1):
    network_input.append([note_to_int[char] for char in notes[i:i + sequence_length]])
n_patterns = len(network_input)
network_input = np.reshape(network_input, (n_patterns, sequence_length))
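
Each row of `network_input` is a window of `sequence_length` consecutive tokens, and the window slides forward one token at a time. The stdlib-only sketch below applies the same windowing to a toy piece. `toy_notes`, the window length of 3 and the helper `make_windows` are illustrative. It also assumes that `pitchnames` and `note_to_int` were built earlier in the notebook from `sorted(set(notes))`:

```python
def make_windows(notes, sequence_length):
    # vocabulary: every distinct note/chord string, sorted for a stable mapping
    pitchnames = sorted(set(notes))
    note_to_int = {name: number for number, name in enumerate(pitchnames)}
    int_to_note = {number: name for number, name in enumerate(pitchnames)}

    windows = []
    # same bounds as the cell above: the window starting at
    # len(notes) - sequence_length is not produced
    for i in range(0, len(notes) - sequence_length, 1):
        windows.append([note_to_int[n] for n in notes[i:i + sequence_length]])
    return windows, note_to_int, int_to_note


toy_notes = ["C4", "E4", "G4", "4.7", "C4", "E4"]
windows, note_to_int, int_to_note = make_windows(toy_notes, 3)
print(note_to_int)  # {'4.7': 0, 'C4': 1, 'E4': 2, 'G4': 3}
print(windows)      # [[1, 2, 3], [2, 3, 0], [3, 0, 1]]
```

The upper bound `len(notes) - sequence_length` seems to come from the LSTM notebook, where each window also needed the note that follows it as a training target. For generation alone, `range(len(notes) - sequence_length + 1)` would also include the final window.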


The generation workflow is:


1.   Pick a **seed sequence** randomly from the list of inputs (the *pattern* variable)
2.   Pass it to the model to generate a new element (note or chord)
3.   Add the new element to the final song and append it to *pattern*
4.   Remove the first item from *pattern*, so the window keeps the same length
5.   Go back to step 2 until the song is long enough
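
Together, the five steps form a sliding-window loop. This notebook does not write the loop by hand, because `model.generate` in the next cell produces the new tokens. The stdlib-only sketch below only makes the steps concrete. `predict_next` is a hypothetical stand-in for the trained model, and `toy_predict_next` is a toy rule used purely for illustration:

```python
from collections import deque


def generate_sequence(predict_next, seed, n_steps):
    """Sliding-window generation following steps 1-5.

    predict_next: callable that takes the current window (list of ints)
    and returns the next token (int).
    """
    pattern = deque(seed, maxlen=len(seed))  # step 1: the seed window
    song = []
    for _ in range(n_steps):                     # step 5: repeat
        new_token = predict_next(list(pattern))  # step 2: predict the next element
        song.append(new_token)                   # step 3: add it to the song...
        pattern.append(new_token)                # ...and to the window; maxlen makes the
                                                 # deque drop its oldest item (step 4)
    return song


VOCAB_SIZE = 4


def toy_predict_next(window):
    # toy "model": the next token is the last one plus one, modulo the vocabulary size
    return (window[-1] + 1) % VOCAB_SIZE


song = generate_sequence(toy_predict_next, seed=[0, 1, 2], n_steps=5)
print(song)  # [3, 0, 1, 2, 3]
```

A `deque` with `maxlen` keeps the window at the seed length automatically, which is what steps 3 and 4 do by hand.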


In [None]:
""" Generate notes from the neural network based on a sequence of notes """
# pick a random sequence from the input as a starting point for the prediction
start = np.random.randint(0, len(network_input)-1)
int_to_note = dict((number, note) for number, note in enumerate(pitchnames))
pattern = torch.from_numpy(network_input[start]).cuda()

prediction_output = model.generate(pattern, 500)
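
The library decides inside `model.generate` how each next token is picked. A common approach in autoregressive decoders is to keep only the `k` highest-scoring candidates, divide their logits by a temperature, and sample from the resulting softmax. The stdlib-only sketch below shows that generic technique on a single vector of logits. It is not code from `compressive_transformer_pytorch`, so check the library source for the filtering and defaults that `AutoregressiveWrapper.generate` actually uses. `sample_top_k` and its parameter values are illustrative:

```python
import math
import random


def sample_top_k(logits, k=3, temperature=1.0, rng=random):
    """Sample one index from `logits` using top-k filtering and temperature.

    Lower temperatures sharpen the distribution (more conservative choices),
    higher temperatures flatten it (more varied choices).
    """
    # indices of the k highest logits
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    scaled = [logits[i] / temperature for i in top]
    # numerically stable softmax weights over the kept candidates
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]


logits = [2.0, 0.5, 1.0, -1.0]
rng = random.Random(0)
samples = [sample_top_k(logits, k=2, temperature=0.8, rng=rng) for _ in range(1000)]
print(sorted(set(samples)))  # [0, 2]: only the two highest-scoring tokens are ever drawn
```

With `k=1`, this reduces to greedy decoding, which always picks the most likely token.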


In [None]:
# Map each generated integer token back to its note/chord string
result_sample = []
for i, token in enumerate(prediction_output.tolist()):
    result = int_to_note[token]
    print(f'Predicted {i}: {result}')
    result_sample.append(result)

prediction_output = result_sample

Predicted 0: 6
Predicted 1: 4.6
Predicted 2: 6.11
Predicted 3: 6
Predicted 4: 6.11
Predicted 5: A4
Predicted 6: 4.6
Predicted 7: F4
Predicted 8: 6
Predicted 9: 6
Predicted 10: 5.7.9.0
Predicted 11: 2.3.7.10
Predicted 12: D5
Predicted 13: C5
Predicted 14: 5.7.9.0
Predicted 15: C5
Predicted 16: 4.6
Predicted 17: B-1
Predicted 18: 10.2.5
Predicted 19: C5
Predicted 20: 6.11
Predicted 21: 6
Predicted 22: F2
Predicted 23: 6.11
Predicted 24: 4.6
Predicted 25: B-2
Predicted 26: B-1
Predicted 27: A4
Predicted 28: 6
Predicted 29: C5
Predicted 30: E-3
Predicted 31: F2
Predicted 32: 4.6
Predicted 33: 5
Predicted 34: 5.10
Predicted 35: 4.6
Predicted 36: 6
Predicted 37: 4.6
Predicted 38: 4.6
Predicted 39: F2
Predicted 40: 4.6
Predicted 41: B-2
...

The last step is to create a MIDI file from the predictions.

**music21** helps us again here. We create a **Stream** and add the predicted notes and chords to it.

Consecutive elements are spaced by an offset of 0.5 quarter lengths (an eighth note), so each new note or chord starts half a quarter note after the previous one.
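
Before building music21 objects, it helps to see how the cell below interprets the string tokens. If a token contains `.` or consists only of digits, it is treated as a chord of integer pitches. Any other token is treated as a note name such as `C5` or `B-2`. This is a minimal, stdlib-only sketch of that branching. `decode_tokens` and its `(kind, pitches, offset)` tuples are hypothetical and not part of music21:

```python
def decode_tokens(tokens, step=0.5):
    """Map generated tokens to (kind, pitches, offset) tuples."""
    events = []
    offset = 0.0
    for token in tokens:
        if '.' in token or token.isdigit():
            # chord: integer pitches joined by '.', e.g. '4.6' -> [4, 6]
            events.append(('chord', [int(p) for p in token.split('.')], offset))
        else:
            # single note name, e.g. 'A4'
            events.append(('note', [token], offset))
        offset += step  # each element starts `step` quarter lengths after the previous one
    return events


events = decode_tokens(['6', '4.6', 'A4', '5.7.9.0'])
for event in events:
    print(event)
# ('chord', [6], 0.0)
# ('chord', [4, 6], 0.5)
# ('note', ['A4'], 1.0)
# ('chord', [5, 7, 9, 0], 1.5)
```

As in the original code, a single-digit token such as `6` becomes a one-note chord. With 500 generated tokens and a step of 0.5, the last element starts at offset 249.5, so the piece lasts roughly 250 quarter notes.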

In [None]:
offset = 0
output_notes = []
# create note and chord objects based on the values generated by the model
for element in prediction_output:
    # element is a chord: integer pitches joined by '.' (or a single integer)
    if ('.' in element) or element.isdigit():
        notes_in_chord = element.split('.')
        # use a new name: reusing `notes` would overwrite the training-data list
        chord_notes = []
        for current_note in notes_in_chord:
            new_note = note.Note(int(current_note))
            new_note.storedInstrument = instrument.Piano()
            chord_notes.append(new_note)
        new_chord = chord.Chord(chord_notes)
        new_chord.offset = offset
        output_notes.append(new_chord)
    # element is a single note name, e.g. 'C5' or 'B-2'
    else:
        new_note = note.Note(element)
        new_note.offset = offset
        new_note.storedInstrument = instrument.Piano()
        output_notes.append(new_note)

    # increase offset each iteration so that notes do not stack
    offset += 0.5

midi_stream = stream.Stream(output_notes)
midi_stream.write('midi', fp='test_output.mid')

'test_output.mid'