<a href="https://colab.research.google.com/github/GiovanniSorice/Deep_Music_Generator/blob/main/notebooks/Music_Generation_Transformer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Transformer Music Generator 



In this notebook, we use a Transformer to generate some music.


**This notebook was inspired by [Music_Generation_LSTM](https://colab.research.google.com/drive/19TQqekOlnOSW36VCL8CPVEQKBBukmaEQ#scrollTo=DDOBVWULXfpz), and part of the code comes from it.**




**Load dependencies**

In [1]:
pip install compressive_transformer_pytorch

Collecting compressive_transformer_pytorch
  Downloading https://files.pythonhosted.org/packages/30/39/b8caf2671abcb8615977c08766aa9f450addd6949f57c7dda87224e844b5/compressive_transformer_pytorch-0.3.20-py3-none-any.whl
Collecting mogrifier
  Downloading https://files.pythonhosted.org/packages/77/01/62a55d0f8048e788fce435f2ade6478f443e4e53ed9b89b55ba0fc42c198/mogrifier-0.0.3-py3-none-any.whl
Installing collected packages: mogrifier, compressive-transformer-pytorch
Successfully installed compressive-transformer-pytorch-0.3.20 mogrifier-0.0.3


In [2]:
import torch
import tqdm
import numpy as np
import pandas as pd
import tensorflow as tf
import os
from compressive_transformer_pytorch import CompressiveTransformer
from compressive_transformer_pytorch import AutoregressiveWrapper
from torchsummary import summary
from torch.utils.data import DataLoader, Dataset
from tensorflow.keras import utils
from sklearn.metrics import roc_auc_score 
import matplotlib.pyplot as plt
import glob
import pickle
from music21 import converter, instrument, stream, note, chord
import math
import shutil

In [3]:
# Set to False if you are not running
# this notebook in Google Colaboratory
run_on_colab = True

**Set hyperparameters**

In [4]:
# output directory name:
output_dir = '/content/drive/My Drive/ISPR_project/Transformer/'
current_path ='/content/drive/My Drive/ISPR_project/'
# training:
epochs = 2000
batch_size = 64
learning_rate=1e-2
# vector-space embedding: 
n_dim = 64 
sequence_length = 32


VALIDATE_EVERY  = 5

GENERATE_EVERY  = 500



**Save model function**

In [5]:
def save_checkpoint(state, is_best, filename='checkpoint.pth.tar'):
    """Save `state` in output_dir; also copy it to model_best.pth.tar when is_best is true."""
    torch.save(state, output_dir+filename)
    if is_best:
        shutil.copyfile(output_dir+filename, output_dir+'model_best.pth.tar')

**Google drive configuration (only Colab)**

In [6]:
if run_on_colab:
  from google.colab import drive
  # This will prompt for authorization.
  drive.mount('/content/drive')

Mounted at /content/drive


**Load data** \\
The original **MIDI files** come from [The Lakh MIDI Dataset v0.1](https://colinraffel.com/projects/lmd/).

## Processing data

Let's process the files, and load them into **music21**

In [7]:
file = current_path+"midi_songs/small_dataset/Metal/Metallica/Am I Evil?.mid"
midi = converter.parse(file)
notes_to_parse = midi.flat.notes
for element in notes_to_parse[:10]:
  print(element, element.offset)

<music21.chord.Chord E2 E3 B3 E4> 0.0
<music21.chord.Chord E2 E3 B3 E4> 0.0
<music21.note.Note E> 0.0
<music21.chord.Chord C2 C#3> 0.0
<music21.note.Note G#> 2.0
<music21.chord.Chord D3 A3 D4> 3.0
<music21.chord.Chord D3 A3 D4> 3.0
<music21.note.Note D> 3.0
<music21.chord.Chord C#3 C2> 3.0
<music21.chord.Chord B3 E3 E4> 3.5


I will process all MIDI files, extracting data from each note or chord.

- If I process a **note**, I will store a string representing its pitch (the note name) and its octave.

- If I process a **chord** (a set of notes played at the same time), I will store a different type of string: numbers separated by dots. Each number is one entry of the chord's `normalOrder`, that is, the pitch class of one chord note.

As you can see, **I am not yet considering the time offset of each element**. In this first version I ignore offsets, so all notes and chords will have the same duration. I may consider them in a future version.

I build one big list with all the elements of all the compositions.
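
The encoding described above can be sketched without music21. This is only an illustration: `encode_note`, `encode_chord`, and the `elements` list are hypothetical stand-ins for the parsed music21 objects, not part of the notebook.

```python
def encode_note(pitch_name):
    """A single note is stored as its pitch name with octave, e.g. 'E4'."""
    return pitch_name


def encode_chord(pitch_classes):
    """A chord is stored as dot-separated integers, e.g. '4.7.11'."""
    return ".".join(str(pc) for pc in pitch_classes)


# Hypothetical parsed elements: ("note", name) or ("chord", [integers]).
elements = [("note", "E4"), ("chord", [4, 7, 11]), ("note", "G#3")]

tokens = [
    encode_note(value) if kind == "note" else encode_chord(value)
    for kind, value in elements
]
print(tokens)  # ['E4', '4.7.11', 'G#3']
```

Because both kinds of element become plain strings, notes and chords can share a single vocabulary.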

In [8]:
notes = []
for i,file in enumerate(glob.glob(current_path+"midi_songs/small_dataset/*/*/*.mid")):
  midi = converter.parse(file)
  print('Parsing file ', i, " ",file)
  notes_to_parse = None
  try: # file has instrument parts
    s2 = instrument.partitionByInstrument(midi)
    notes_to_parse = s2.recurse() 
  except Exception: # file has notes in a flat structure
    notes_to_parse = midi.flat.notes
  for element in notes_to_parse:
    if isinstance(element, note.Note):
      notes.append(str(element.pitch))
    elif isinstance(element, chord.Chord):
      notes.append('.'.join(str(n) for n in element.normalOrder))
with open('notes', 'wb') as filepath:
  pickle.dump(notes, filepath)

Parsing file  0   /content/drive/My Drive/ISPR_project/midi_songs/small_dataset/Pop_rock/883/Nessun rimpianto.1.mid
Parsing file  1   /content/drive/My Drive/ISPR_project/midi_songs/small_dataset/Pop_rock/883/Grazie mille.1.mid
Parsing file  2   /content/drive/My Drive/ISPR_project/midi_songs/small_dataset/Pop_rock/883/Andra tutto bene ('58).1.mid
Parsing file  3   /content/drive/My Drive/ISPR_project/midi_songs/small_dataset/Pop_rock/883/Andra tutto bene ('58).mid
Parsing file  4   /content/drive/My Drive/ISPR_project/midi_songs/small_dataset/Pop_rock/883/Hanno ucciso l'uomo ragno.1.mid
Parsing file  5   /content/drive/My Drive/ISPR_project/midi_songs/small_dataset/Pop_rock/883/Hanno ucciso l'uomo ragno.mid
Parsing file  6   /content/drive/My Drive/ISPR_project/midi_songs/small_dataset/Pop_rock/test/I'll Be Over You.mid
Parsing file  7   /content/drive/My Drive/ISPR_project/midi_songs/small_dataset/Pop_rock/test/Non ti passa piu.mid
Parsing file  8   /content/drive/My Drive/ISPR_proje

In [9]:
notes_validation = []
for i,file in enumerate(glob.glob(current_path+"midi_songs/test/*.mid")):
  midi = converter.parse(file)
  print( 'Parsing file ', i, " ",file)
  notes_to_parse = None
  try: # file has instrument parts
    s2 = instrument.partitionByInstrument(midi)
    notes_to_parse = s2.recurse() 
  except Exception: # file has notes in a flat structure
    notes_to_parse = midi.flat.notes
  for element in notes_to_parse:
    if isinstance(element, note.Note):
      notes_validation.append(str(element.pitch))
    elif isinstance(element, chord.Chord):
      notes_validation.append('.'.join(str(n) for n in element.normalOrder))
# use a separate file name so the training notes saved earlier are not overwritten
with open('notes_validation', 'wb') as filepath:
  pickle.dump(notes_validation, filepath)

Parsing file  0   /content/drive/My Drive/ISPR_project/midi_songs/test/I Disappear.mid
Parsing file  1   /content/drive/My Drive/ISPR_project/midi_songs/test/Hit the Lights.mid
Parsing file  2   /content/drive/My Drive/ISPR_project/midi_songs/test/Fight Fire With Fire.mid
Parsing file  3   /content/drive/My Drive/ISPR_project/midi_songs/test/Smile.mid
Parsing file  4   /content/drive/My Drive/ISPR_project/midi_songs/test/Another One Bites The Dust.2.mid
Parsing file  5   /content/drive/My Drive/ISPR_project/midi_songs/test/Bicycle Race.1.mid
Parsing file  6   /content/drive/My Drive/ISPR_project/midi_songs/test/Se tornerai.1.mid
Parsing file  7   /content/drive/My Drive/ISPR_project/midi_songs/test/Non ti passa piu.mid
Parsing file  8   /content/drive/My Drive/ISPR_project/midi_songs/test/I'll Be Over You.mid


I count the distinct notes and chords in the training data, because this will be the **number of possible output classes** of the model.
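
As a small sketch of this count, the toy token lists below (invented for illustration, not taken from the dataset) show how the class count is derived. They also show how to list validation tokens that never occur in the training data. Such tokens have no class index, so looking them up in a dictionary built from the training tokens would raise a `KeyError`.

```python
# Toy token lists; the real ones are the `notes` and `notes_validation` lists.
train_tokens = ["E4", "4.7.11", "G#3", "E4", "4.7.11"]
validation_tokens = ["E4", "G#3", "2.5.9"]

# The number of distinct training tokens is the number of output classes.
vocabulary = sorted(set(train_tokens))
n_classes = len(vocabulary)

# Validation tokens missing from the training vocabulary.
unseen = sorted(set(validation_tokens) - set(vocabulary))

print(n_classes)  # 3
print(unseen)     # ['2.5.9']
```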

In [10]:
# Count different possible outputs
n_vocab = len(set(notes))
n_vocab

476

In [11]:
# Count different possible outputs (validation)
print(len(set(notes_validation)))

287


**Preprocess data** \\
Now there is some **data processing** to do:

- I will map each pitch or chord to an integer
- I will create input sequences of *sequence_length* elements; the corresponding output notes are not stored separately in the cells below

I can try different values of **sequence_length** to obtain different results. In this version, I use a sequence_length of 32, as set in the hyperparameters.

The network will make its prediction of the next note (or chord) based on the previous *sequence_length* notes (or chords).
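
A minimal sketch of these two steps on a toy sequence follows. The helper `build_windows` and the token list are hypothetical. The final input/target split assumes the usual shift-by-one setup for next-token training; the notebook does not show how the wrapper builds its targets.

```python
def build_windows(tokens, window):
    """Map tokens to integer ids and cut the id sequence into sliding windows."""
    vocabulary = sorted(set(tokens))
    token_to_int = {tok: idx for idx, tok in enumerate(vocabulary)}
    ids = [token_to_int[tok] for tok in tokens]
    # Same loop bound as the notebook: len(tokens) - window windows.
    windows = [ids[i:i + window] for i in range(len(ids) - window)]
    return token_to_int, windows


tokens = ["C4", "E4", "G4", "C4", "0.4.7", "E4"]
token_to_int, windows = build_windows(tokens, window=3)
print(windows)  # [[1, 2, 3], [2, 3, 1], [3, 1, 0]]

# Assumed shift-by-one framing: predict element t+1 from elements up to t.
first = windows[0]
inputs, targets = first[:-1], first[1:]
print(inputs, targets)  # [1, 2] [2, 3]
```

With the real data, `window` would be `sequence_length` and each row of `network_input` would be one such window.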


In [12]:
# get all pitch names
pitchnames = sorted(set(item for item in notes))
# create a dictionary to map pitches to integers
note_to_int = dict((note, number) for number, note in enumerate(pitchnames))
network_input = []
# create the input sequences (sliding windows of sequence_length tokens)
for i in range(0, len(notes) - sequence_length, 1):
  # Map pitches of the sequence to integers
  network_input.append([note_to_int[char] for char in notes[i:i + sequence_length]])
n_patterns = len(network_input)
# reshape the input into a (n_patterns, sequence_length) array
network_input = np.reshape(network_input, (n_patterns, sequence_length))
# normalize input
#network_input = network_input / float(n_vocab)


In [13]:
# reuse the training vocabulary so validation tokens get the same integer ids
note_to_int_validation = dict((name, number) for number, name in enumerate(pitchnames))
network_input_validation = []
# create the validation input sequences
for i in range(0, len(notes_validation) - sequence_length, 1):
  # Map pitches of the sequence to integers
  network_input_validation.append([note_to_int_validation[char] for char in notes_validation[i:i + sequence_length]])
n_patterns = len(network_input_validation)
# reshape the input into a (n_patterns, sequence_length) array
network_input_validation = np.reshape(network_input_validation, (n_patterns, sequence_length))
# normalize input
#network_input = network_input / float(n_vocab)


Let's see the new network_input size

In [14]:
network_input.shape

(135132, 32)

**Design neural network architecture** 

In [15]:
def create_network(sequence_length, n_vocab):
    """Create the Compressive Transformer and wrap it for autoregressive training."""
    model = CompressiveTransformer(
        num_tokens = n_vocab,      # one output class per distinct note or chord
        dim = sequence_length,     # model width; this reuses sequence_length (32), not n_dim
        depth = 6,
        seq_len = sequence_length,
        mem_len = sequence_length,
        cmem_len = 256,
        cmem_ratio = 4,
        memory_layers = [5,6]
    )

    model = AutoregressiveWrapper(model)
    model.cuda()
    return model

In [16]:
model = create_network(sequence_length,n_vocab)

print(model)


AutoregressiveWrapper(
  (net): CompressiveTransformer(
    (token_emb): Embedding(476, 32)
    (to_model_dim): Identity()
    (to_logits): Sequential(
      (0): Identity()
      (1): Linear(in_features=32, out_features=476, bias=True)
    )
    (attn_layers): ModuleList(
      (0): GRUGating(
        (fn): PreNorm(
          (norm): LayerNorm((32,), eps=1e-05, elementwise_affine=True)
          (fn): SelfAttention(
            (compress_mem_fn): ConvCompress(
              (conv): Conv1d(32, 32, kernel_size=(4,), stride=(4,))
            )
            (to_q): Linear(in_features=32, out_features=32, bias=False)
            (to_kv): Linear(in_features=32, out_features=64, bias=False)
            (to_out): Linear(in_features=32, out_features=32, bias=True)
            (attn_dropout): Dropout(p=0.0, inplace=False)
            (dropout): Dropout(p=0.0, inplace=False)
            (reconstruction_attn_dropout): Dropout(p=0.0, inplace=False)
          )
        )
        (gru): GRUCell(32, 3

In [17]:
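def cycle(loader):
    # Repeat the loader forever so next() never raises StopIteration
    while True:
        for data in loader:
          yield data


data_train = torch.from_numpy(network_input).cuda()
train_loader = torch.utils.data.DataLoader(data_train, batch_size=32) 
# Each item yielded here is the whole training set; the training loop passes max_batch_size to the model call
cycle_train_loader  = cycle(DataLoader(data_train, batch_size = data_train.shape[0]))
num_batches=math.ceil(data_train.shape[0]/batch_size) # Total number of batches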

In [18]:
#Validation
data_validation = torch.from_numpy(network_input_validation).cuda()
validation_loader = torch.utils.data.DataLoader(data_validation, batch_size=32) 
cycle_validation_loader  = cycle(DataLoader(data_validation, batch_size = data_validation.shape[0]))
num_batches_val=math.ceil(data_validation.shape[0]/batch_size) # Total number of batches

In [19]:
# optimizer

optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

If we want to continue training from the point where we left off, we should load the previously trained weights into the model.

This is very useful in Google Colaboratory, which usually kills the virtual machine running the Jupyter notebook after a certain amount of time. If this happens to you, look for the last weights file in your configured Drive account and use it to resume training.
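
Finding the last weights file can be automated. The sketch below assumes checkpoints follow the `Tran_32_Checkpoint<epoch>_<loss>.pth.tar` naming used by the training cell in this notebook. The helper `latest_checkpoint` and the example names are hypothetical; in practice the list could come from `os.listdir(output_dir)`.

```python
import re

# Matches names such as 'Tran_32_Checkpoint415_2.3927.pth.tar'.
CHECKPOINT_PATTERN = re.compile(r"Tran_32_Checkpoint(\d+)_(\d+\.\d+)\.pth\.tar")


def latest_checkpoint(filenames):
    """Return (epoch, filename) for the highest epoch, or None if nothing matches."""
    best = None
    for name in filenames:
        match = CHECKPOINT_PATTERN.fullmatch(name)
        if match is None:
            continue  # skip files that are not per-epoch checkpoints
        epoch = int(match.group(1))
        if best is None or epoch > best[0]:
            best = (epoch, name)
    return best


# Illustrative file names only.
names = [
    "Tran_32_Checkpoint410_2.4259.pth.tar",
    "Tran_32_Checkpoint415_2.3927.pth.tar",
    "model_best.pth.tar",
]
print(latest_checkpoint(names))  # (415, 'Tran_32_Checkpoint415_2.3927.pth.tar')
```

The returned epoch is also a natural starting point for the epoch counter when training resumes.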


In [20]:
# In case we want to use previously trained weights
weights = "model_best.pth.tar"
checkpoint = torch.load("/content/drive/MyDrive/ISPR_project/Transformer/model_32_408_epoche_best.pth.tar")
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']


In [None]:
# training

# Track the lowest average training loss across epochs; resetting it
# inside the loop would mark every checkpoint as the best one
best_loss_value = n_vocab
for i in tqdm.tqdm(range(408,epochs), mininterval=20., desc='training'):
    model.train()
    tot_loss = 0.0
    is_best = False
    avg_loss_val = 0
    for mlm_loss, aux_loss, is_last in model(next(cycle_train_loader), max_batch_size = batch_size, return_loss = True):
        loss = mlm_loss + aux_loss

        loss.backward()

        # detach so the running total does not hold on to the autograd graph
        tot_loss += loss.detach()

        if is_last:
            torch.nn.utils.clip_grad_norm_(model.parameters(), 0.5)
            optimizer.step()
            optimizer.zero_grad()

    if i % VALIDATE_EVERY == 0 or i == epochs-1:
        model.eval()
        with torch.no_grad():
            for loss_val, aux_loss_val, is_last_val in model(next(cycle_validation_loader), max_batch_size = batch_size, return_loss = True):
                avg_loss_val += loss_val/num_batches_val

                if is_last_val:
                    print(f'validation loss: {avg_loss_val.item():.4f}')

    avg_loss = tot_loss/num_batches

    # save a checkpoint every 5 epochs and at the end of training
    if i % 5 == 0 or i == epochs-1:
        if best_loss_value > avg_loss:
            best_loss_value = avg_loss
            is_best = True

        save_checkpoint({
            'epoch': i,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'loss': avg_loss.item(),
        }, is_best, 'Tran_32_Checkpoint'+str(i)+'_'+"{:.4f}".format(avg_loss.item())+'.pth.tar')
    print(f'\n Epoch: {i} |Training loss: {avg_loss.item():.4f}')
print('Training complete.')








training:   0%|          | 0/1592 [00:00<?, ?it/s][A[A

training:   0%|          | 1/1592 [00:56<24:53:28, 56.32s/it][A[A

/n Epoch: 408 |Training loss: 2.4320




training:   0%|          | 2/1592 [01:53<24:57:53, 56.52s/it][A[A

/n Epoch: 409 |Training loss: 2.4461
validation loss: 2.5750




training:   0%|          | 3/1592 [02:53<25:24:45, 57.57s/it][A[A

/n Epoch: 410 |Training loss: 2.4259




training:   0%|          | 4/1592 [03:49<25:14:03, 57.21s/it][A[A

/n Epoch: 411 |Training loss: 2.4198




training:   0%|          | 5/1592 [04:45<25:03:40, 56.85s/it][A[A

/n Epoch: 412 |Training loss: 2.4209




training:   0%|          | 6/1592 [05:41<24:57:06, 56.64s/it][A[A

/n Epoch: 413 |Training loss: 2.4035




training:   0%|          | 7/1592 [06:38<24:52:30, 56.50s/it][A[A

/n Epoch: 414 |Training loss: 2.3995
validation loss: 2.5514




training:   1%|          | 8/1592 [07:38<25:21:41, 57.64s/it][A[A

/n Epoch: 415 |Training loss: 2.3927




training:   1%|          | 9/1592 [08:34<25:10:37, 57.26s/it][A[A

/n Epoch: 416 |Training loss: 2.3963




training:   1%|          | 10/1592 [09:31<25:03:00, 57.00s/it][A[A

/n Epoch: 417 |Training loss: 2.3810




training:   1%|          | 11/1592 [10:27<24:55:43, 56.76s/it][A[A

/n Epoch: 418 |Training loss: 2.3971




training:   1%|          | 12/1592 [11:23<24:50:20, 56.60s/it][A[A

/n Epoch: 419 |Training loss: 2.3812
validation loss: 2.5494




training:   1%|          | 13/1592 [12:23<25:18:19, 57.69s/it][A[A

/n Epoch: 420 |Training loss: 2.4069




training:   1%|          | 14/1592 [13:20<25:07:12, 57.31s/it][A[A

/n Epoch: 421 |Training loss: 2.3967




training:   1%|          | 15/1592 [14:16<24:59:32, 57.05s/it][A[A

/n Epoch: 422 |Training loss: 2.4037




training:   1%|          | 16/1592 [15:12<24:52:44, 56.83s/it][A[A

/n Epoch: 423 |Training loss: 2.3942




training:   1%|          | 17/1592 [16:09<24:45:48, 56.60s/it][A[A

/n Epoch: 424 |Training loss: 2.3911
validation loss: 2.5548




training:   1%|          | 18/1592 [17:08<25:11:24, 57.61s/it][A[A

/n Epoch: 425 |Training loss: 2.3751




training:   1%|          | 19/1592 [18:05<24:59:05, 57.18s/it][A[A

/n Epoch: 426 |Training loss: 2.3942




training:   1%|▏         | 20/1592 [19:01<24:49:57, 56.87s/it][A[A

/n Epoch: 427 |Training loss: 2.3806




training:   1%|▏         | 21/1592 [19:57<24:46:53, 56.79s/it][A[A

/n Epoch: 428 |Training loss: 2.3782




training:   1%|▏         | 22/1592 [20:54<24:42:10, 56.64s/it][A[A

/n Epoch: 429 |Training loss: 2.3717
validation loss: 2.5347




training:   1%|▏         | 23/1592 [21:54<25:09:10, 57.71s/it][A[A

/n Epoch: 430 |Training loss: 2.3739




training:   2%|▏         | 24/1592 [22:50<24:59:04, 57.36s/it][A[A

/n Epoch: 431 |Training loss: 2.3816




training:   2%|▏         | 25/1592 [23:47<24:50:07, 57.06s/it][A[A

/n Epoch: 432 |Training loss: 2.3456




training:   2%|▏         | 26/1592 [24:43<24:45:37, 56.92s/it][A[A

/n Epoch: 433 |Training loss: 2.3644




training:   2%|▏         | 27/1592 [25:40<24:40:38, 56.77s/it][A[A

/n Epoch: 434 |Training loss: 2.3455
validation loss: 2.5197




training:   2%|▏         | 28/1592 [26:40<25:06:22, 57.79s/it][A[A

/n Epoch: 435 |Training loss: 2.3666




training:   2%|▏         | 29/1592 [27:36<24:55:14, 57.40s/it][A[A

/n Epoch: 436 |Training loss: 2.3663




training:   2%|▏         | 30/1592 [28:33<24:46:26, 57.10s/it][A[A

/n Epoch: 437 |Training loss: 2.3356




training:   2%|▏         | 31/1592 [29:29<24:41:17, 56.94s/it][A[A

/n Epoch: 438 |Training loss: 2.3472




training:   2%|▏         | 32/1592 [30:26<24:36:33, 56.79s/it][A[A

/n Epoch: 439 |Training loss: 2.3489
validation loss: 2.5098




training:   2%|▏         | 33/1592 [31:26<25:02:22, 57.82s/it][A[A

/n Epoch: 440 |Training loss: 2.3261




training:   2%|▏         | 34/1592 [32:23<24:52:24, 57.47s/it][A[A

/n Epoch: 441 |Training loss: 2.3497




training:   2%|▏         | 35/1592 [33:19<24:41:27, 57.09s/it][A[A

/n Epoch: 442 |Training loss: 2.3385




training:   2%|▏         | 36/1592 [34:15<24:33:13, 56.81s/it][A[A

/n Epoch: 443 |Training loss: 2.3397




training:   2%|▏         | 37/1592 [35:11<24:26:29, 56.58s/it][A[A

/n Epoch: 444 |Training loss: 2.3436
validation loss: 2.4942




training:   2%|▏         | 38/1592 [36:11<24:53:08, 57.65s/it][A[A

/n Epoch: 445 |Training loss: 2.3454




training:   2%|▏         | 39/1592 [37:07<24:40:21, 57.19s/it][A[A

/n Epoch: 446 |Training loss: 2.3264




training:   3%|▎         | 40/1592 [38:04<24:32:09, 56.91s/it][A[A

/n Epoch: 447 |Training loss: 2.3460




training:   3%|▎         | 41/1592 [39:00<24:23:56, 56.63s/it][A[A

/n Epoch: 448 |Training loss: 2.3570




training:   3%|▎         | 42/1592 [39:56<24:21:21, 56.57s/it][A[A

/n Epoch: 449 |Training loss: 2.3374
validation loss: 2.4789




training:   3%|▎         | 43/1592 [40:56<24:46:24, 57.58s/it][A[A

/n Epoch: 450 |Training loss: 2.3262




training:   3%|▎         | 44/1592 [41:52<24:35:00, 57.17s/it][A[A

/n Epoch: 451 |Training loss: 2.3257




training:   3%|▎         | 45/1592 [42:48<24:26:04, 56.86s/it][A[A

/n Epoch: 452 |Training loss: 2.3133




training:   3%|▎         | 46/1592 [43:44<24:17:25, 56.56s/it][A[A

/n Epoch: 453 |Training loss: 2.3276




training:   3%|▎         | 47/1592 [44:41<24:14:26, 56.48s/it][A[A

/n Epoch: 454 |Training loss: 2.3149
validation loss: 2.4650




training:   3%|▎         | 48/1592 [45:41<24:41:25, 57.57s/it][A[A

/n Epoch: 455 |Training loss: 2.3207




training:   3%|▎         | 49/1592 [46:37<24:30:16, 57.17s/it][A[A

/n Epoch: 456 |Training loss: 2.3098




training:   3%|▎         | 50/1592 [47:33<24:22:13, 56.90s/it][A[A

/n Epoch: 457 |Training loss: 2.3026




training:   3%|▎         | 51/1592 [48:29<24:15:09, 56.66s/it][A[A

/n Epoch: 458 |Training loss: 2.3016




training:   3%|▎         | 52/1592 [49:26<24:12:49, 56.60s/it][A[A

/n Epoch: 459 |Training loss: 2.2918
validation loss: 2.4628




training:   3%|▎         | 53/1592 [50:26<24:39:49, 57.69s/it][A[A

/n Epoch: 460 |Training loss: 2.2871




training:   3%|▎         | 54/1592 [51:22<24:28:04, 57.27s/it][A[A

/n Epoch: 461 |Training loss: 2.2996




training:   3%|▎         | 55/1592 [52:19<24:19:44, 56.98s/it][A[A

/n Epoch: 462 |Training loss: 2.3010




training:   4%|▎         | 56/1592 [53:15<24:15:59, 56.87s/it][A[A

/n Epoch: 463 |Training loss: 2.2897




training:   4%|▎         | 57/1592 [54:12<24:12:00, 56.76s/it][A[A

/n Epoch: 464 |Training loss: 2.3016
validation loss: 2.4730




training:   4%|▎         | 58/1592 [55:12<24:39:28, 57.87s/it][A[A

/n Epoch: 465 |Training loss: 2.2749




training:   4%|▎         | 59/1592 [56:09<24:31:56, 57.61s/it][A[A

/n Epoch: 466 |Training loss: 2.3054




training:   4%|▍         | 60/1592 [57:06<24:21:52, 57.25s/it][A[A

/n Epoch: 467 |Training loss: 2.2968




training:   4%|▍         | 61/1592 [58:02<24:14:42, 57.01s/it][A[A

/n Epoch: 468 |Training loss: 2.2998




training:   4%|▍         | 62/1592 [58:59<24:10:04, 56.87s/it][A[A

/n Epoch: 469 |Training loss: 2.3018
validation loss: 2.4339




training:   4%|▍         | 63/1592 [59:59<24:35:51, 57.91s/it][A[A

/n Epoch: 470 |Training loss: 2.2940




training:   4%|▍         | 64/1592 [1:00:56<24:25:44, 57.56s/it][A[A

/n Epoch: 471 |Training loss: 2.2733




training:   4%|▍         | 65/1592 [1:01:52<24:16:02, 57.21s/it][A[A

/n Epoch: 472 |Training loss: 2.2994




training:   4%|▍         | 66/1592 [1:02:48<24:08:12, 56.94s/it][A[A

/n Epoch: 473 |Training loss: 2.2758




training:   4%|▍         | 67/1592 [1:03:44<24:00:09, 56.66s/it][A[A

/n Epoch: 474 |Training loss: 2.3013
validation loss: 2.4331




training:   4%|▍         | 68/1592 [1:04:44<24:24:41, 57.66s/it][A[A

/n Epoch: 475 |Training loss: 2.2953




training:   4%|▍         | 69/1592 [1:05:40<24:10:59, 57.16s/it][A[A

/n Epoch: 476 |Training loss: 2.2678




training:   4%|▍         | 70/1592 [1:06:36<24:01:33, 56.83s/it][A[A

/n Epoch: 477 |Training loss: 2.2650




training:   4%|▍         | 71/1592 [1:07:32<23:52:51, 56.52s/it][A[A

/n Epoch: 478 |Training loss: 2.2571




training:   5%|▍         | 72/1592 [1:08:28<23:48:34, 56.39s/it][A[A

/n Epoch: 479 |Training loss: 2.2469
validation loss: 2.4027




training:   5%|▍         | 73/1592 [1:09:28<24:12:07, 57.36s/it][A[A

/n Epoch: 480 |Training loss: 2.2607




training:   5%|▍         | 74/1592 [1:10:24<24:01:33, 56.98s/it][A[A

/n Epoch: 481 |Training loss: 2.2398




training:   5%|▍         | 75/1592 [1:11:20<23:53:30, 56.70s/it][A[A

/n Epoch: 482 |Training loss: 2.2847




training:   5%|▍         | 76/1592 [1:12:16<23:44:48, 56.39s/it][A[A

/n Epoch: 483 |Training loss: 2.2881




training:   5%|▍         | 77/1592 [1:13:12<23:41:11, 56.29s/it][A[A

/n Epoch: 484 |Training loss: 2.2434
validation loss: 2.4034




training:   5%|▍         | 78/1592 [1:14:11<24:04:36, 57.25s/it][A[A

/n Epoch: 485 |Training loss: 2.2490




training:   5%|▍         | 79/1592 [1:15:07<23:53:30, 56.85s/it][A[A

/n Epoch: 486 |Training loss: 2.2426




training:   5%|▌         | 80/1592 [1:16:03<23:47:55, 56.66s/it][A[A

/n Epoch: 487 |Training loss: 2.2238




training:   5%|▌         | 81/1592 [1:16:59<23:40:57, 56.42s/it][A[A

/n Epoch: 488 |Training loss: 2.2450




training:   5%|▌         | 82/1592 [1:17:55<23:37:39, 56.33s/it][A[A

/n Epoch: 489 |Training loss: 2.2351
validation loss: 2.3713




training:   5%|▌         | 83/1592 [1:18:55<24:02:16, 57.35s/it][A[A

/n Epoch: 490 |Training loss: 2.2252




training:   5%|▌         | 84/1592 [1:19:51<23:49:57, 56.89s/it][A[A

/n Epoch: 491 |Training loss: 2.2155




training:   5%|▌         | 85/1592 [1:20:47<23:42:12, 56.62s/it][A[A

/n Epoch: 492 |Training loss: 2.2364




training:   5%|▌         | 86/1592 [1:21:43<23:36:52, 56.45s/it][A[A

/n Epoch: 493 |Training loss: 2.2359




training:   5%|▌         | 87/1592 [1:22:39<23:32:24, 56.31s/it][A[A

/n Epoch: 494 |Training loss: 2.2211
validation loss: 2.3932




training:   6%|▌         | 88/1592 [1:23:39<23:56:51, 57.32s/it][A[A

/n Epoch: 495 |Training loss: 2.2078




training:   6%|▌         | 89/1592 [1:24:35<23:45:12, 56.89s/it][A[A

/n Epoch: 496 |Training loss: 2.2250




training:   6%|▌         | 90/1592 [1:25:30<23:35:08, 56.53s/it][A[A

/n Epoch: 497 |Training loss: 2.2045




training:   6%|▌         | 91/1592 [1:26:26<23:29:44, 56.35s/it][A[A

/n Epoch: 498 |Training loss: 2.2298




training:   6%|▌         | 92/1592 [1:27:22<23:25:34, 56.22s/it][A[A

/n Epoch: 499 |Training loss: 2.2385
validation loss: 2.3677




training:   6%|▌         | 93/1592 [1:28:22<23:51:08, 57.28s/it][A[A

/n Epoch: 500 |Training loss: 2.2001




training:   6%|▌         | 94/1592 [1:29:18<23:39:11, 56.84s/it][A[A

/n Epoch: 501 |Training loss: 2.2119




training:   6%|▌         | 95/1592 [1:30:14<23:31:30, 56.57s/it][A[A

/n Epoch: 502 |Training loss: 2.2030




training:   6%|▌         | 96/1592 [1:31:09<23:23:55, 56.31s/it][A[A

/n Epoch: 503 |Training loss: 2.2003




training:   6%|▌         | 97/1592 [1:32:05<23:20:50, 56.22s/it][A[A

/n Epoch: 504 |Training loss: 2.2029
validation loss: 2.3442




training:   6%|▌         | 98/1592 [1:33:05<23:46:46, 57.30s/it][A[A

/n Epoch: 505 |Training loss: 2.1948




training:   6%|▌         | 99/1592 [1:34:01<23:35:34, 56.89s/it][A[A

/n Epoch: 506 |Training loss: 2.1849




training:   6%|▋         | 100/1592 [1:34:57<23:31:13, 56.75s/it][A[A

/n Epoch: 507 |Training loss: 2.1893




training:   6%|▋         | 101/1592 [1:35:54<23:28:37, 56.69s/it][A[A

/n Epoch: 508 |Training loss: 2.1712




training:   6%|▋         | 102/1592 [1:36:51<23:26:40, 56.64s/it][A[A

/n Epoch: 509 |Training loss: 2.1877
validation loss: 2.3383




training:   6%|▋         | 103/1592 [1:37:51<23:54:42, 57.81s/it][A[A

/n Epoch: 510 |Training loss: 2.1758




training:   7%|▋         | 104/1592 [1:38:47<23:42:34, 57.36s/it][A[A

/n Epoch: 511 |Training loss: 2.1783




training:   7%|▋         | 105/1592 [1:39:43<23:30:57, 56.93s/it][A[A

/n Epoch: 512 |Training loss: 2.1774




training:   7%|▋         | 106/1592 [1:40:39<23:21:22, 56.58s/it][A[A

/n Epoch: 513 |Training loss: 2.1647




training:   7%|▋         | 107/1592 [1:41:35<23:15:10, 56.37s/it][A[A

/n Epoch: 514 |Training loss: 2.1701
validation loss: 2.3107




training:   7%|▋         | 108/1592 [1:42:35<23:39:27, 57.39s/it][A[A

/n Epoch: 515 |Training loss: 2.1690




training:   7%|▋         | 109/1592 [1:43:30<23:26:17, 56.90s/it][A[A

/n Epoch: 516 |Training loss: 2.1530




training:   7%|▋         | 110/1592 [1:44:26<23:14:43, 56.47s/it][A[A

/n Epoch: 517 |Training loss: 2.1612




training:   7%|▋         | 111/1592 [1:45:21<23:03:11, 56.04s/it][A[A

/n Epoch: 518 |Training loss: 2.1491




training:   7%|▋         | 112/1592 [1:46:16<22:58:18, 55.88s/it][A[A

/n Epoch: 519 |Training loss: 2.1571
validation loss: 2.3178




training:   7%|▋         | 113/1592 [1:47:16<23:21:55, 56.87s/it][A[A

/n Epoch: 520 |Training loss: 2.1504




training:   7%|▋         | 114/1592 [1:48:11<23:10:00, 56.43s/it][A[A

/n Epoch: 521 |Training loss: 2.1488




training:   7%|▋         | 115/1592 [1:49:07<23:02:07, 56.15s/it][A[A

/n Epoch: 522 |Training loss: 2.1464




training:   7%|▋         | 116/1592 [1:50:02<22:55:28, 55.91s/it][A[A

/n Epoch: 523 |Training loss: 2.1451




training:   7%|▋         | 117/1592 [1:50:57<22:50:24, 55.75s/it][A[A

/n Epoch: 524 |Training loss: 2.1356
validation loss: 2.2942




training:   7%|▋         | 118/1592 [1:51:56<23:14:05, 56.75s/it][A[A

/n Epoch: 525 |Training loss: 2.1376




training:   7%|▋         | 119/1592 [1:52:52<23:05:27, 56.43s/it][A[A

/n Epoch: 526 |Training loss: 2.1261




training:   8%|▊         | 120/1592 [1:53:47<22:56:31, 56.11s/it][A[A

/n Epoch: 527 |Training loss: 2.1355




training:   8%|▊         | 121/1592 [1:54:43<22:53:21, 56.02s/it][A[A

/n Epoch: 528 |Training loss: 2.1369




training:   8%|▊         | 122/1592 [1:55:39<22:52:14, 56.01s/it][A[A

/n Epoch: 529 |Training loss: 2.1219
validation loss: 2.2995




training:   8%|▊         | 123/1592 [1:56:39<23:17:43, 57.09s/it][A[A

/n Epoch: 530 |Training loss: 2.1369




training:   8%|▊         | 124/1592 [1:57:35<23:11:19, 56.87s/it][A[A

/n Epoch: 531 |Training loss: 2.1253




training:   8%|▊         | 125/1592 [1:58:31<23:02:18, 56.54s/it][A[A

/n Epoch: 532 |Training loss: 2.1231




training:   8%|▊         | 126/1592 [1:59:27<22:56:56, 56.36s/it][A[A

/n Epoch: 533 |Training loss: 2.1225




training:   8%|▊         | 127/1592 [2:00:22<22:49:37, 56.09s/it][A[A

/n Epoch: 534 |Training loss: 2.1209
validation loss: 2.2705




training:   8%|▊         | 128/1592 [2:01:22<23:13:36, 57.12s/it][A[A

/n Epoch: 535 |Training loss: 2.1119




training:   8%|▊         | 129/1592 [2:02:18<23:03:12, 56.73s/it][A[A

/n Epoch: 536 |Training loss: 2.1077




training:   8%|▊         | 130/1592 [2:03:13<22:53:52, 56.38s/it][A[A

/n Epoch: 537 |Training loss: 2.1082




training:   8%|▊         | 131/1592 [2:04:08<22:44:21, 56.03s/it][A[A

/n Epoch: 538 |Training loss: 2.1074




training:   8%|▊         | 132/1592 [2:05:04<22:39:11, 55.86s/it][A[A

/n Epoch: 539 |Training loss: 2.1017
validation loss: 2.2690




training:   8%|▊         | 133/1592 [2:06:03<23:00:45, 56.78s/it][A[A

/n Epoch: 540 |Training loss: 2.0965




training:   8%|▊         | 134/1592 [2:06:58<22:49:15, 56.35s/it][A[A

/n Epoch: 541 |Training loss: 2.0999




training:   8%|▊         | 135/1592 [2:07:54<22:42:18, 56.10s/it][A[A

/n Epoch: 542 |Training loss: 2.0936




training:   9%|▊         | 136/1592 [2:08:49<22:35:41, 55.87s/it][A[A

/n Epoch: 543 |Training loss: 2.0893




training:   9%|▊         | 137/1592 [2:09:45<22:32:55, 55.79s/it][A[A

/n Epoch: 544 |Training loss: 2.0927
validation loss: 2.2536




training:   9%|▊         | 138/1592 [2:10:44<22:54:21, 56.71s/it][A[A

/n Epoch: 545 |Training loss: 2.0808




training:   9%|▊         | 139/1592 [2:11:39<22:43:48, 56.32s/it][A[A

/n Epoch: 546 |Training loss: 2.0894




training:   9%|▉         | 140/1592 [2:12:34<22:35:21, 56.01s/it][A[A

/n Epoch: 547 |Training loss: 2.0751




training:   9%|▉         | 141/1592 [2:13:29<22:28:49, 55.78s/it][A[A

/n Epoch: 548 |Training loss: 2.0896




training:   9%|▉         | 142/1592 [2:14:25<22:26:01, 55.70s/it][A[A

/n Epoch: 549 |Training loss: 2.0782
validation loss: 2.2460




training:   9%|▉         | 143/1592 [2:15:24<22:48:25, 56.66s/it][A[A

/n Epoch: 550 |Training loss: 2.0890




training:   9%|▉         | 144/1592 [2:16:19<22:38:10, 56.28s/it][A[A

/n Epoch: 551 |Training loss: 2.0782




training:   9%|▉         | 145/1592 [2:17:15<22:30:48, 56.01s/it][A[A

/n Epoch: 552 |Training loss: 2.0798




training:   9%|▉         | 146/1592 [2:18:10<22:25:09, 55.82s/it][A[A

/n Epoch: 553 |Training loss: 2.0663




training:   9%|▉         | 147/1592 [2:19:05<22:20:39, 55.67s/it][A[A

/n Epoch: 554 |Training loss: 2.0856
validation loss: 2.2565




training:   9%|▉         | 148/1592 [2:20:04<22:43:36, 56.66s/it][A[A

/n Epoch: 555 |Training loss: 2.0653




training:   9%|▉         | 149/1592 [2:21:00<22:33:01, 56.26s/it][A[A

/n Epoch: 556 |Training loss: 2.0950




training:   9%|▉         | 150/1592 [2:21:55<22:22:58, 55.88s/it][A[A

/n Epoch: 557 |Training loss: 2.0849




training:   9%|▉         | 151/1592 [2:22:50<22:17:16, 55.68s/it][A[A

/n Epoch: 558 |Training loss: 2.0788




training:  10%|▉         | 152/1592 [2:23:45<22:13:00, 55.54s/it][A[A

/n Epoch: 559 |Training loss: 2.0821
validation loss: 2.2427




training:  10%|▉         | 153/1592 [2:24:44<22:35:09, 56.50s/it][A[A

/n Epoch: 560 |Training loss: 2.0684




training:  10%|▉         | 154/1592 [2:25:39<22:25:17, 56.13s/it][A[A

/n Epoch: 561 |Training loss: 2.0757




training:  10%|▉         | 155/1592 [2:26:34<22:17:36, 55.85s/it][A[A

/n Epoch: 562 |Training loss: 2.0779




training:  10%|▉         | 156/1592 [2:27:29<22:11:24, 55.63s/it][A[A

/n Epoch: 563 |Training loss: 2.0584




training:  10%|▉         | 157/1592 [2:28:25<22:07:17, 55.50s/it][A[A

/n Epoch: 564 |Training loss: 2.0707
validation loss: 2.2254




training:  10%|▉         | 158/1592 [2:29:24<22:32:14, 56.58s/it][A[A

/n Epoch: 565 |Training loss: 2.0681




training:  10%|▉         | 159/1592 [2:30:19<22:21:48, 56.18s/it][A[A

/n Epoch: 566 |Training loss: 2.0591




training:  10%|█         | 160/1592 [2:31:14<22:13:56, 55.89s/it][A[A

/n Epoch: 567 |Training loss: 2.0615




training:  10%|█         | 161/1592 [2:32:09<22:08:22, 55.70s/it][A[A

/n Epoch: 568 |Training loss: 2.0510




training:  10%|█         | 162/1592 [2:33:05<22:03:36, 55.54s/it][A[A

/n Epoch: 569 |Training loss: 2.0544
validation loss: 2.2256




training:  10%|█         | 163/1592 [2:34:03<22:25:57, 56.51s/it][A[A

/n Epoch: 570 |Training loss: 2.0491




training:  10%|█         | 164/1592 [2:34:59<22:16:07, 56.14s/it][A[A

/n Epoch: 571 |Training loss: 2.0503




training:  10%|█         | 165/1592 [2:35:54<22:08:27, 55.86s/it][A[A

/n Epoch: 572 |Training loss: 2.0363




training:  10%|█         | 166/1592 [2:36:49<22:01:10, 55.59s/it][A[A

/n Epoch: 573 |Training loss: 2.0481




training:  10%|█         | 167/1592 [2:37:44<21:58:37, 55.52s/it][A[A

/n Epoch: 574 |Training loss: 2.0394
validation loss: 2.2042




training:  11%|█         | 168/1592 [2:38:43<22:20:08, 56.47s/it][A[A

/n Epoch: 575 |Training loss: 2.0403




training:  11%|█         | 169/1592 [2:39:38<22:11:39, 56.15s/it][A[A

/n Epoch: 576 |Training loss: 2.0394




training:  11%|█         | 170/1592 [2:40:33<22:03:14, 55.83s/it][A[A

/n Epoch: 577 |Training loss: 2.0307




training:  11%|█         | 171/1592 [2:41:28<21:56:01, 55.57s/it][A[A

/n Epoch: 578 |Training loss: 2.0306




training:  11%|█         | 172/1592 [2:42:24<21:53:04, 55.48s/it][A[A

/n Epoch: 579 |Training loss: 2.0282
validation loss: 2.1925




training:  11%|█         | 173/1592 [2:43:22<22:14:18, 56.42s/it][A[A

/n Epoch: 580 |Training loss: 2.0195




training:  11%|█         | 174/1592 [2:44:17<22:05:53, 56.10s/it][A[A

/n Epoch: 581 |Training loss: 2.0231




training:  11%|█         | 175/1592 [2:45:13<21:58:09, 55.81s/it][A[A

/n Epoch: 582 |Training loss: 2.0140




training:  11%|█         | 176/1592 [2:46:08<21:51:36, 55.58s/it][A[A

/n Epoch: 583 |Training loss: 2.0226




training:  11%|█         | 177/1592 [2:47:03<21:46:50, 55.41s/it][A[A

/n Epoch: 584 |Training loss: 2.0151
validation loss: 2.1800




training:  11%|█         | 178/1592 [2:48:01<22:09:08, 56.40s/it][A[A

/n Epoch: 585 |Training loss: 2.0106




training:  11%|█         | 179/1592 [2:48:56<21:58:46, 56.00s/it][A[A

/n Epoch: 586 |Training loss: 2.0182




training:  11%|█▏        | 180/1592 [2:49:52<21:52:48, 55.78s/it][A[A

/n Epoch: 587 |Training loss: 2.0057




training:  11%|█▏        | 181/1592 [2:50:47<21:46:32, 55.56s/it][A[A

/n Epoch: 588 |Training loss: 2.0150




training:  11%|█▏        | 182/1592 [2:51:42<21:41:50, 55.40s/it][A[A

/n Epoch: 589 |Training loss: 2.0079
validation loss: 2.1758




training:  11%|█▏        | 183/1592 [2:52:40<22:04:05, 56.38s/it][A[A

/n Epoch: 590 |Training loss: 2.0095




training:  12%|█▏        | 184/1592 [2:53:36<21:54:51, 56.03s/it][A[A

/n Epoch: 591 |Training loss: 1.9993




training:  12%|█▏        | 185/1592 [2:54:31<21:47:36, 55.76s/it][A[A

/n Epoch: 592 |Training loss: 1.9999




training:  12%|█▏        | 186/1592 [2:55:26<21:41:54, 55.56s/it][A[A

/n Epoch: 593 |Training loss: 2.0031




training:  12%|█▏        | 187/1592 [2:56:21<21:37:35, 55.41s/it][A[A

/n Epoch: 594 |Training loss: 2.0027
validation loss: 2.1697




training:  12%|█▏        | 188/1592 [2:57:20<22:00:12, 56.42s/it][A[A

/n Epoch: 595 |Training loss: 2.0003




training:  12%|█▏        | 189/1592 [2:58:15<21:49:35, 56.01s/it][A[A

/n Epoch: 596 |Training loss: 1.9963




training:  12%|█▏        | 190/1592 [2:59:10<21:41:45, 55.71s/it][A[A

/n Epoch: 597 |Training loss: 1.9882




training:  12%|█▏        | 191/1592 [3:00:05<21:38:16, 55.60s/it][A[A

/n Epoch: 598 |Training loss: 1.9970




training:  12%|█▏        | 192/1592 [3:01:00<21:33:26, 55.43s/it][A[A

/n Epoch: 599 |Training loss: 1.9879
validation loss: 2.1550




training:  12%|█▏        | 193/1592 [3:01:59<21:56:02, 56.44s/it][A[A

/n Epoch: 600 |Training loss: 2.0036




training:  12%|█▏        | 194/1592 [3:02:54<21:46:41, 56.08s/it][A[A

/n Epoch: 601 |Training loss: 1.9899




training:  12%|█▏        | 195/1592 [3:03:49<21:39:14, 55.80s/it][A[A

/n Epoch: 602 |Training loss: 2.0001




training:  12%|█▏        | 196/1592 [3:04:44<21:32:13, 55.54s/it][A[A

/n Epoch: 603 |Training loss: 1.9879




training:  12%|█▏        | 197/1592 [3:05:40<21:31:06, 55.53s/it][A[A

/n Epoch: 604 |Training loss: 1.9938
validation loss: 2.1685




training:  12%|█▏        | 198/1592 [3:06:39<21:53:30, 56.54s/it][A[A

/n Epoch: 605 |Training loss: 1.9821




training:  12%|█▎        | 199/1592 [3:07:34<21:43:06, 56.13s/it][A[A

/n Epoch: 606 |Training loss: 1.9955




training:  13%|█▎        | 200/1592 [3:08:29<21:35:56, 55.86s/it][A[A

/n Epoch: 607 |Training loss: 1.9794




training:  13%|█▎        | 201/1592 [3:09:24<21:29:43, 55.63s/it][A[A

/n Epoch: 608 |Training loss: 1.9980




training:  13%|█▎        | 202/1592 [3:10:20<21:28:46, 55.63s/it][A[A

/n Epoch: 609 |Training loss: 1.9841
validation loss: 2.1447




training:  13%|█▎        | 203/1592 [3:11:19<21:51:10, 56.64s/it][A[A

/n Epoch: 610 |Training loss: 1.9810




training:  13%|█▎        | 204/1592 [3:12:14<21:42:55, 56.32s/it][A[A

/n Epoch: 611 |Training loss: 1.9771




training:  13%|█▎        | 205/1592 [3:13:10<21:34:29, 56.00s/it][A[A

/n Epoch: 612 |Training loss: 1.9774




training:  13%|█▎        | 206/1592 [3:14:05<21:27:49, 55.75s/it][A[A

/n Epoch: 613 |Training loss: 1.9762




training:  13%|█▎        | 207/1592 [3:15:00<21:22:14, 55.55s/it][A[A

/n Epoch: 614 |Training loss: 1.9724
validation loss: 2.1331




training:  13%|█▎        | 208/1592 [3:15:59<21:45:15, 56.59s/it][A[A

/n Epoch: 615 |Training loss: 1.9768




training:  13%|█▎        | 209/1592 [3:16:54<21:33:25, 56.11s/it][A[A

/n Epoch: 616 |Training loss: 1.9627




training:  13%|█▎        | 210/1592 [3:17:49<21:27:09, 55.88s/it][A[A

/n Epoch: 617 |Training loss: 1.9736




training:  13%|█▎        | 211/1592 [3:18:45<21:22:02, 55.70s/it][A[A

/n Epoch: 618 |Training loss: 1.9591




training:  13%|█▎        | 212/1592 [3:19:40<21:17:36, 55.55s/it][A[A

/n Epoch: 619 |Training loss: 1.9771
validation loss: 2.1559




training:  13%|█▎        | 213/1592 [3:20:39<21:40:24, 56.58s/it][A[A

/n Epoch: 620 |Training loss: 1.9592




training:  13%|█▎        | 214/1592 [3:21:34<21:29:57, 56.17s/it][A[A

/n Epoch: 621 |Training loss: 1.9832




training:  14%|█▎        | 215/1592 [3:22:29<21:21:49, 55.85s/it][A[A

/n Epoch: 622 |Training loss: 1.9672




training:  14%|█▎        | 216/1592 [3:23:24<21:16:01, 55.64s/it][A[A

/n Epoch: 623 |Training loss: 1.9730




training:  14%|█▎        | 217/1592 [3:24:19<21:10:20, 55.43s/it][A[A

/n Epoch: 624 |Training loss: 1.9711
validation loss: 2.1432




training:  14%|█▎        | 218/1592 [3:25:18<21:33:38, 56.49s/it][A[A

/n Epoch: 625 |Training loss: 1.9641




training:  14%|█▍        | 219/1592 [3:26:13<21:24:42, 56.14s/it][A[A

/n Epoch: 626 |Training loss: 1.9763




training:  14%|█▍        | 220/1592 [3:27:09<21:16:50, 55.84s/it][A[A

/n Epoch: 627 |Training loss: 1.9586




training:  14%|█▍        | 221/1592 [3:28:03<21:09:35, 55.56s/it][A[A

/n Epoch: 628 |Training loss: 1.9749




training:  14%|█▍        | 222/1592 [3:28:59<21:06:13, 55.46s/it][A[A

/n Epoch: 629 |Training loss: 1.9577
validation loss: 2.1334




training:  14%|█▍        | 223/1592 [3:29:57<21:28:20, 56.47s/it][A[A

/n Epoch: 630 |Training loss: 1.9629




training:  14%|█▍        | 224/1592 [3:30:53<21:20:42, 56.17s/it][A[A

/n Epoch: 631 |Training loss: 1.9655




training:  14%|█▍        | 225/1592 [3:31:48<21:12:43, 55.86s/it][A[A

/n Epoch: 632 |Training loss: 1.9603




training:  14%|█▍        | 226/1592 [3:32:43<21:06:03, 55.61s/it][A[A

/n Epoch: 633 |Training loss: 1.9662




training:  14%|█▍        | 227/1592 [3:33:38<21:03:25, 55.54s/it][A[A

/n Epoch: 634 |Training loss: 1.9517
validation loss: 2.1247




training:  14%|█▍        | 228/1592 [3:34:37<21:24:20, 56.50s/it][A[A

/n Epoch: 635 |Training loss: 1.9551




training:  14%|█▍        | 229/1592 [3:35:32<21:14:07, 56.09s/it][A[A

/n Epoch: 636 |Training loss: 1.9560




training:  14%|█▍        | 230/1592 [3:36:28<21:07:52, 55.85s/it][A[A

/n Epoch: 637 |Training loss: 1.9457




training:  15%|█▍        | 231/1592 [3:37:23<21:00:43, 55.58s/it][A[A

/n Epoch: 638 |Training loss: 1.9503




training:  15%|█▍        | 232/1592 [3:38:18<20:57:38, 55.48s/it][A[A

/n Epoch: 639 |Training loss: 1.9420
validation loss: 2.1000




training:  15%|█▍        | 233/1592 [3:39:17<21:19:14, 56.48s/it][A[A

/n Epoch: 640 |Training loss: 1.9437




training:  15%|█▍        | 234/1592 [3:40:12<21:09:30, 56.09s/it][A[A

/n Epoch: 641 |Training loss: 1.9323




training:  15%|█▍        | 235/1592 [3:41:07<21:04:07, 55.89s/it][A[A

/n Epoch: 642 |Training loss: 1.9358




training:  15%|█▍        | 236/1592 [3:42:03<20:59:20, 55.72s/it][A[A

/n Epoch: 643 |Training loss: 1.9323




training:  15%|█▍        | 237/1592 [3:42:58<20:55:38, 55.60s/it][A[A

/n Epoch: 644 |Training loss: 1.9387
validation loss: 2.1066




training:  15%|█▍        | 238/1592 [3:43:57<21:17:17, 56.60s/it][A[A

/n Epoch: 645 |Training loss: 1.9292




training:  15%|█▌        | 239/1592 [3:44:52<21:07:39, 56.22s/it][A[A

/n Epoch: 646 |Training loss: 1.9333




training:  15%|█▌        | 240/1592 [3:45:47<21:00:39, 55.95s/it][A[A

/n Epoch: 647 |Training loss: 1.9276




training:  15%|█▌        | 241/1592 [3:46:43<20:57:02, 55.83s/it][A[A

/n Epoch: 648 |Training loss: 1.9286




training:  15%|█▌        | 242/1592 [3:47:38<20:53:34, 55.71s/it][A[A

/n Epoch: 649 |Training loss: 1.9200
validation loss: 2.0786




training:  15%|█▌        | 243/1592 [3:48:38<21:15:22, 56.73s/it][A[A

/n Epoch: 650 |Training loss: 1.9231




training:  15%|█▌        | 244/1592 [3:49:33<21:06:29, 56.37s/it][A[A

/n Epoch: 651 |Training loss: 1.9094




training:  15%|█▌        | 245/1592 [3:50:29<21:00:19, 56.14s/it][A[A

/n Epoch: 652 |Training loss: 1.9326




training:  15%|█▌        | 246/1592 [3:51:24<20:55:47, 55.98s/it][A[A

/n Epoch: 653 |Training loss: 1.9133




training:  16%|█▌        | 247/1592 [3:52:20<20:52:01, 55.85s/it][A[A

/n Epoch: 654 |Training loss: 1.9441
validation loss: 2.1024




training:  16%|█▌        | 248/1592 [3:53:19<21:14:19, 56.89s/it][A[A

/n Epoch: 655 |Training loss: 1.9260




training:  16%|█▌        | 249/1592 [3:54:15<21:04:49, 56.51s/it][A[A

/n Epoch: 656 |Training loss: 1.9379




training:  16%|█▌        | 250/1592 [3:55:11<20:58:42, 56.28s/it][A[A

/n Epoch: 657 |Training loss: 1.9362




training:  16%|█▌        | 251/1592 [3:56:06<20:53:40, 56.09s/it][A[A

/n Epoch: 658 |Training loss: 1.9289




training:  16%|█▌        | 252/1592 [3:57:02<20:50:39, 56.00s/it][A[A

/n Epoch: 659 |Training loss: 1.9304
validation loss: 2.1034




training:  16%|█▌        | 253/1592 [3:58:01<21:12:11, 57.01s/it][A[A

/n Epoch: 660 |Training loss: 1.9273




training:  16%|█▌        | 254/1592 [3:58:57<21:02:57, 56.64s/it][A[A

/n Epoch: 661 |Training loss: 1.9303




training:  16%|█▌        | 255/1592 [3:59:53<20:55:29, 56.34s/it][A[A

/n Epoch: 662 |Training loss: 1.9246




training:  16%|█▌        | 256/1592 [4:00:48<20:49:13, 56.10s/it][A[A

/n Epoch: 663 |Training loss: 1.9167




training:  16%|█▌        | 257/1592 [4:01:44<20:47:03, 56.05s/it][A[A

/n Epoch: 664 |Training loss: 1.9140
validation loss: 2.0720




training:  16%|█▌        | 258/1592 [4:02:44<21:08:21, 57.05s/it][A[A

/n Epoch: 665 |Training loss: 1.9126




training:  16%|█▋        | 259/1592 [4:03:39<20:58:34, 56.65s/it][A[A

/n Epoch: 666 |Training loss: 1.9012




training:  16%|█▋        | 260/1592 [4:04:35<20:51:29, 56.37s/it][A[A

/n Epoch: 667 |Training loss: 1.9219




training:  16%|█▋        | 261/1592 [4:05:31<20:45:59, 56.17s/it][A[A

/n Epoch: 668 |Training loss: 1.9041




training:  16%|█▋        | 262/1592 [4:06:27<20:44:33, 56.15s/it][A[A

/n Epoch: 669 |Training loss: 1.9289
validation loss: 2.0753




training:  17%|█▋        | 263/1592 [4:07:26<21:06:01, 57.16s/it][A[A

/n Epoch: 670 |Training loss: 1.9082




training:  17%|█▋        | 264/1592 [4:08:22<20:54:45, 56.69s/it][A[A

/n Epoch: 671 |Training loss: 1.9126




training:  17%|█▋        | 265/1592 [4:09:18<20:48:07, 56.43s/it][A[A

/n Epoch: 672 |Training loss: 1.9136




training:  17%|█▋        | 266/1592 [4:10:14<20:43:21, 56.26s/it][A[A

/n Epoch: 673 |Training loss: 1.9068




training:  17%|█▋        | 267/1592 [4:11:10<20:39:59, 56.15s/it][A[A

/n Epoch: 674 |Training loss: 1.9128
validation loss: 2.0805




training:  17%|█▋        | 268/1592 [4:12:09<21:00:02, 57.10s/it][A[A

/n Epoch: 675 |Training loss: 1.8996




training:  17%|█▋        | 269/1592 [4:13:04<20:48:36, 56.63s/it][A[A

/n Epoch: 676 |Training loss: 1.9080




training:  17%|█▋        | 270/1592 [4:14:00<20:41:19, 56.34s/it][A[A

/n Epoch: 677 |Training loss: 1.8974




training:  17%|█▋        | 271/1592 [4:14:56<20:35:47, 56.13s/it][A[A

/n Epoch: 678 |Training loss: 1.9030




training:  17%|█▋        | 272/1592 [4:15:51<20:31:36, 55.98s/it][A[A

/n Epoch: 679 |Training loss: 1.8968
validation loss: 2.0603




training:  17%|█▋        | 273/1592 [4:16:50<20:51:02, 56.91s/it][A[A

/n Epoch: 680 |Training loss: 1.8974




training:  17%|█▋        | 274/1592 [4:17:46<20:42:07, 56.55s/it][A[A

/n Epoch: 681 |Training loss: 1.8923




training:  17%|█▋        | 275/1592 [4:18:42<20:34:46, 56.25s/it][A[A

/n Epoch: 682 |Training loss: 1.8888




training:  17%|█▋        | 276/1592 [4:19:37<20:30:35, 56.11s/it][A[A

/n Epoch: 683 |Training loss: 1.8865




training:  17%|█▋        | 277/1592 [4:20:33<20:25:37, 55.92s/it][A[A

/n Epoch: 684 |Training loss: 1.8868
validation loss: 2.0579




training:  17%|█▋        | 278/1592 [4:21:32<20:47:50, 56.98s/it][A[A

/n Epoch: 685 |Training loss: 1.8847




training:  18%|█▊        | 279/1592 [4:22:28<20:38:16, 56.58s/it][A[A

/n Epoch: 686 |Training loss: 1.8847




training:  18%|█▊        | 280/1592 [4:23:24<20:32:31, 56.37s/it][A[A

/n Epoch: 687 |Training loss: 1.8878




training:  18%|█▊        | 281/1592 [4:24:19<20:26:28, 56.13s/it][A[A

/n Epoch: 688 |Training loss: 1.8762




training:  18%|█▊        | 282/1592 [4:25:15<20:24:00, 56.06s/it][A[A

/n Epoch: 689 |Training loss: 1.8771
validation loss: 2.0438




training:  18%|█▊        | 283/1592 [4:26:15<20:44:55, 57.06s/it][A[A

/n Epoch: 690 |Training loss: 1.8783




training:  18%|█▊        | 284/1592 [4:27:11<20:39:36, 56.86s/it][A[A

/n Epoch: 691 |Training loss: 1.8751




training:  18%|█▊        | 285/1592 [4:28:08<20:37:20, 56.80s/it][A[A

/n Epoch: 692 |Training loss: 1.8694




training:  18%|█▊        | 286/1592 [4:29:04<20:30:57, 56.55s/it][A[A

/n Epoch: 693 |Training loss: 1.8727




training:  18%|█▊        | 287/1592 [4:30:00<20:29:33, 56.53s/it][A[A

/n Epoch: 694 |Training loss: 1.8721
validation loss: 2.0446




training:  18%|█▊        | 288/1592 [4:31:01<20:52:48, 57.64s/it][A[A

/n Epoch: 695 |Training loss: 1.8670




training:  18%|█▊        | 289/1592 [4:31:57<20:43:22, 57.25s/it][A[A

/n Epoch: 696 |Training loss: 1.8695




training:  18%|█▊        | 290/1592 [4:32:53<20:34:37, 56.90s/it][A[A

/n Epoch: 697 |Training loss: 1.8557




training:  18%|█▊        | 291/1592 [4:33:48<20:24:52, 56.49s/it][A[A

/n Epoch: 698 |Training loss: 1.8848




training:  18%|█▊        | 292/1592 [4:34:44<20:19:14, 56.27s/it][A[A

/n Epoch: 699 |Training loss: 1.8632
validation loss: 2.0306




training:  18%|█▊        | 293/1592 [4:35:43<20:37:13, 57.15s/it][A[A

/n Epoch: 700 |Training loss: 1.8914




training:  18%|█▊        | 294/1592 [4:36:39<20:26:32, 56.70s/it][A[A

/n Epoch: 701 |Training loss: 1.8577




training:  19%|█▊        | 295/1592 [4:37:35<20:18:19, 56.36s/it][A[A

/n Epoch: 702 |Training loss: 1.9304




training:  19%|█▊        | 296/1592 [4:38:30<20:14:01, 56.20s/it][A[A

/n Epoch: 703 |Training loss: 1.9014




training:  19%|█▊        | 297/1592 [4:39:26<20:07:46, 55.96s/it][A[A

/n Epoch: 704 |Training loss: 1.9102
validation loss: 2.0544




training:  19%|█▊        | 298/1592 [4:40:25<20:29:59, 57.03s/it][A[A

/n Epoch: 705 |Training loss: 1.9145




training:  19%|█▉        | 299/1592 [4:41:21<20:20:48, 56.65s/it][A[A

/n Epoch: 706 |Training loss: 1.8894




training:  19%|█▉        | 300/1592 [4:42:17<20:15:49, 56.46s/it][A[A

/n Epoch: 707 |Training loss: 1.8989




training:  19%|█▉        | 301/1592 [4:43:13<20:11:37, 56.31s/it][A[A

/n Epoch: 708 |Training loss: 1.8972




training:  19%|█▉        | 302/1592 [4:44:09<20:07:43, 56.17s/it][A[A

/n Epoch: 709 |Training loss: 1.8759
validation loss: 2.0625




training:  19%|█▉        | 303/1592 [4:45:08<20:25:22, 57.04s/it][A[A

/n Epoch: 710 |Training loss: 1.9109




training:  19%|█▉        | 304/1592 [4:46:03<20:13:51, 56.55s/it][A[A

/n Epoch: 711 |Training loss: 1.8902




training:  19%|█▉        | 305/1592 [4:46:59<20:05:17, 56.19s/it][A[A

/n Epoch: 712 |Training loss: 1.8746




training:  19%|█▉        | 306/1592 [4:47:54<19:58:23, 55.91s/it][A[A

/n Epoch: 713 |Training loss: 1.8916




training:  19%|█▉        | 307/1592 [4:48:50<19:54:22, 55.77s/it][A[A

/n Epoch: 714 |Training loss: 1.8774
validation loss: 2.0481




training:  19%|█▉        | 308/1592 [4:49:48<20:12:24, 56.65s/it][A[A

/n Epoch: 715 |Training loss: 1.8831




training:  19%|█▉        | 309/1592 [4:50:43<20:01:11, 56.17s/it][A[A

/n Epoch: 716 |Training loss: 1.8773




training:  19%|█▉        | 310/1592 [4:51:38<19:53:35, 55.86s/it][A[A

/n Epoch: 717 |Training loss: 1.8772




training:  20%|█▉        | 311/1592 [4:52:34<19:48:00, 55.64s/it][A[A

/n Epoch: 718 |Training loss: 1.8593




training:  20%|█▉        | 312/1592 [4:53:29<19:43:30, 55.48s/it][A[A

/n Epoch: 719 |Training loss: 1.8664
validation loss: 2.0313




training:  20%|█▉        | 313/1592 [4:54:27<20:03:48, 56.47s/it][A[A

/n Epoch: 720 |Training loss: 1.8652




training:  20%|█▉        | 314/1592 [4:55:23<19:54:21, 56.07s/it][A[A

/n Epoch: 721 |Training loss: 1.8570




training:  20%|█▉        | 315/1592 [4:56:18<19:47:05, 55.78s/it][A[A

/n Epoch: 722 |Training loss: 1.8562




training:  20%|█▉        | 316/1592 [4:57:13<19:42:05, 55.58s/it][A[A

/n Epoch: 723 |Training loss: 1.8640




training:  20%|█▉        | 317/1592 [4:58:08<19:41:26, 55.60s/it][A[A

/n Epoch: 724 |Training loss: 1.8528
validation loss: 2.0177




training:  20%|█▉        | 318/1592 [4:59:09<20:09:55, 56.98s/it][A[A

/n Epoch: 725 |Training loss: 1.8557




training:  20%|██        | 319/1592 [5:00:06<20:09:09, 56.99s/it][A[A

/n Epoch: 726 |Training loss: 1.8493




training:  20%|██        | 320/1592 [5:01:03<20:09:25, 57.05s/it][A[A

/n Epoch: 727 |Training loss: 1.8491




training:  20%|██        | 321/1592 [5:02:00<20:08:53, 57.07s/it][A[A

/n Epoch: 728 |Training loss: 1.8467




training:  20%|██        | 322/1592 [5:02:58<20:12:15, 57.27s/it][A[A

/n Epoch: 729 |Training loss: 1.8495
validation loss: 2.0186




training:  20%|██        | 323/1592 [5:03:59<20:39:13, 58.59s/it][A[A

/n Epoch: 730 |Training loss: 1.8311




training:  20%|██        | 324/1592 [5:04:57<20:32:53, 58.34s/it][A[A

/n Epoch: 731 |Training loss: 1.8457




training:  20%|██        | 325/1592 [5:05:55<20:25:57, 58.06s/it][A[A

/n Epoch: 732 |Training loss: 1.8270




training:  20%|██        | 326/1592 [5:06:51<20:15:47, 57.62s/it][A[A

/n Epoch: 733 |Training loss: 1.8515




training:  21%|██        | 327/1592 [5:07:47<20:05:13, 57.16s/it][A[A

/n Epoch: 734 |Training loss: 1.8299
validation loss: 2.0039




training:  21%|██        | 328/1592 [5:08:47<20:18:33, 57.84s/it][A[A

/n Epoch: 735 |Training loss: 1.8515




training:  21%|██        | 329/1592 [5:09:42<20:04:00, 57.20s/it][A[A

/n Epoch: 736 |Training loss: 1.8350




training:  21%|██        | 330/1592 [5:10:38<19:51:53, 56.67s/it][A[A

/n Epoch: 737 |Training loss: 1.8564




training:  21%|██        | 331/1592 [5:11:33<19:42:26, 56.26s/it][A[A

/n Epoch: 738 |Training loss: 1.8488




training:  21%|██        | 332/1592 [5:12:29<19:36:45, 56.04s/it][A[A

/n Epoch: 739 |Training loss: 1.8507
validation loss: 2.0017




training:  21%|██        | 333/1592 [5:13:28<19:54:29, 56.93s/it][A[A

/n Epoch: 740 |Training loss: 1.8676




training:  21%|██        | 334/1592 [5:14:23<19:44:21, 56.49s/it][A[A

/n Epoch: 741 |Training loss: 1.8379




training:  21%|██        | 335/1592 [5:15:18<19:35:19, 56.10s/it][A[A

/n Epoch: 742 |Training loss: 1.8450




training:  21%|██        | 336/1592 [5:16:13<19:28:21, 55.81s/it][A[A

/n Epoch: 743 |Training loss: 1.8322




training:  21%|██        | 337/1592 [5:17:08<19:21:41, 55.54s/it][A[A

/n Epoch: 744 |Training loss: 1.8467
validation loss: 2.0045




training:  21%|██        | 338/1592 [5:18:07<19:42:51, 56.60s/it][A[A

/n Epoch: 745 |Training loss: 1.8288




training:  21%|██▏       | 339/1592 [5:19:02<19:32:16, 56.13s/it][A[A

/n Epoch: 746 |Training loss: 1.8366




training:  21%|██▏       | 340/1592 [5:19:58<19:26:11, 55.89s/it][A[A

/n Epoch: 747 |Training loss: 1.8387




training:  21%|██▏       | 341/1592 [5:20:53<19:19:22, 55.61s/it][A[A

/n Epoch: 748 |Training loss: 1.8200




training:  21%|██▏       | 342/1592 [5:21:48<19:16:37, 55.52s/it][A[A

/n Epoch: 749 |Training loss: 1.8554
validation loss: 2.0355




training:  22%|██▏       | 343/1592 [5:22:47<19:36:29, 56.52s/it][A[A

/n Epoch: 750 |Training loss: 1.8270




training:  22%|██▏       | 344/1592 [5:23:42<19:27:25, 56.13s/it][A[A

/n Epoch: 751 |Training loss: 1.8730




training:  22%|██▏       | 345/1592 [5:24:37<19:20:33, 55.84s/it][A[A

/n Epoch: 752 |Training loss: 1.8591




training:  22%|██▏       | 346/1592 [5:25:32<19:14:19, 55.59s/it][A[A

/n Epoch: 753 |Training loss: 1.8538




training:  22%|██▏       | 347/1592 [5:26:28<19:11:30, 55.49s/it][A[A

/n Epoch: 754 |Training loss: 1.8412
validation loss: 1.9975




training:  22%|██▏       | 348/1592 [5:27:26<19:30:43, 56.47s/it][A[A

/n Epoch: 755 |Training loss: 1.8462




training:  22%|██▏       | 349/1592 [5:28:21<19:21:32, 56.07s/it][A[A

/n Epoch: 756 |Training loss: 1.8386




training:  22%|██▏       | 350/1592 [5:29:16<19:14:28, 55.77s/it][A[A

/n Epoch: 757 |Training loss: 1.8381




training:  22%|██▏       | 351/1592 [5:30:11<19:08:56, 55.55s/it][A[A

/n Epoch: 758 |Training loss: 1.8176




training:  22%|██▏       | 352/1592 [5:31:07<19:06:14, 55.46s/it][A[A

/n Epoch: 759 |Training loss: 1.8417
validation loss: 1.9951




training:  22%|██▏       | 353/1592 [5:32:06<19:26:39, 56.50s/it][A[A

/n Epoch: 760 |Training loss: 1.8295




training:  22%|██▏       | 354/1592 [5:33:01<19:17:43, 56.11s/it][A[A

/n Epoch: 761 |Training loss: 1.8267




training:  22%|██▏       | 355/1592 [5:33:56<19:09:58, 55.78s/it][A[A

/n Epoch: 762 |Training loss: 1.8326




training:  22%|██▏       | 356/1592 [5:34:51<19:04:33, 55.56s/it][A[A

/n Epoch: 763 |Training loss: 1.8219




training:  22%|██▏       | 357/1592 [5:35:46<19:01:23, 55.45s/it][A[A

/n Epoch: 764 |Training loss: 1.8155
validation loss: 1.9791




training:  22%|██▏       | 358/1592 [5:36:45<19:20:11, 56.41s/it][A[A

/n Epoch: 765 |Training loss: 1.8210




training:  23%|██▎       | 359/1592 [5:37:40<19:11:05, 56.01s/it][A[A

/n Epoch: 766 |Training loss: 1.8141




training:  23%|██▎       | 360/1592 [5:38:35<19:04:22, 55.73s/it][A[A

/n Epoch: 767 |Training loss: 1.8046




training:  23%|██▎       | 361/1592 [5:39:30<18:59:28, 55.54s/it][A[A

/n Epoch: 768 |Training loss: 1.8054




training:  23%|██▎       | 362/1592 [5:40:25<18:56:06, 55.42s/it][A[A

/n Epoch: 769 |Training loss: 1.8029
validation loss: 1.9635




training:  23%|██▎       | 363/1592 [5:41:24<19:14:48, 56.38s/it][A[A

/n Epoch: 770 |Training loss: 1.7984




training:  23%|██▎       | 364/1592 [5:42:19<19:06:26, 56.02s/it][A[A

/n Epoch: 771 |Training loss: 1.7915




training:  23%|██▎       | 365/1592 [5:43:14<18:59:50, 55.74s/it][A[A

/n Epoch: 772 |Training loss: 1.7920




training:  23%|██▎       | 366/1592 [5:44:09<18:54:53, 55.54s/it][A[A

/n Epoch: 773 |Training loss: 1.7899




training:  23%|██▎       | 367/1592 [5:45:04<18:49:58, 55.35s/it][A[A

/n Epoch: 774 |Training loss: 1.7890
validation loss: 1.9563




training:  23%|██▎       | 368/1592 [5:46:03<19:11:32, 56.45s/it][A[A

/n Epoch: 775 |Training loss: 1.7916




training:  23%|██▎       | 369/1592 [5:46:58<19:01:42, 56.01s/it][A[A

/n Epoch: 776 |Training loss: 1.7912




training:  23%|██▎       | 370/1592 [5:47:53<18:55:41, 55.76s/it][A[A

/n Epoch: 777 |Training loss: 1.7952




training:  23%|██▎       | 371/1592 [5:48:48<18:49:11, 55.49s/it][A[A

/n Epoch: 778 |Training loss: 1.7875




training:  23%|██▎       | 372/1592 [5:49:43<18:46:15, 55.39s/it][A[A

/n Epoch: 779 |Training loss: 1.7878
validation loss: 1.9498




training:  23%|██▎       | 373/1592 [5:50:42<19:07:09, 56.46s/it][A[A

/n Epoch: 780 |Training loss: 1.7815




training:  23%|██▎       | 374/1592 [5:51:38<18:59:26, 56.13s/it][A[A

/n Epoch: 781 |Training loss: 1.7879




training:  24%|██▎       | 375/1592 [5:52:33<18:53:09, 55.87s/it][A[A

/n Epoch: 782 |Training loss: 1.7675




training:  24%|██▎       | 376/1592 [5:53:28<18:47:50, 55.65s/it][A[A

/n Epoch: 783 |Training loss: 1.7771




training:  24%|██▎       | 377/1592 [5:54:23<18:45:43, 55.59s/it][A[A

/n Epoch: 784 |Training loss: 1.7686
validation loss: 1.9374




training:  24%|██▎       | 378/1592 [5:55:22<19:05:00, 56.59s/it][A[A

/n Epoch: 785 |Training loss: 1.7933




training:  24%|██▍       | 379/1592 [5:56:18<18:57:05, 56.25s/it][A[A

/n Epoch: 786 |Training loss: 1.7667




training:  24%|██▍       | 380/1592 [5:57:13<18:49:18, 55.91s/it][A[A

/n Epoch: 787 |Training loss: 1.7964




training:  24%|██▍       | 381/1592 [5:58:08<18:42:58, 55.64s/it][A[A

/n Epoch: 788 |Training loss: 1.7700




training:  24%|██▍       | 382/1592 [5:59:03<18:40:59, 55.59s/it][A[A

/n Epoch: 789 |Training loss: 1.8185
validation loss: 2.0072




training:  24%|██▍       | 383/1592 [6:00:02<18:59:42, 56.56s/it][A[A

/n Epoch: 790 |Training loss: 1.7863




training:  24%|██▍       | 384/1592 [6:00:57<18:50:21, 56.14s/it][A[A

/n Epoch: 791 |Training loss: 1.8400




training:  24%|██▍       | 385/1592 [6:01:53<18:43:26, 55.85s/it][A[A

/n Epoch: 792 |Training loss: 1.8168




training:  24%|██▍       | 386/1592 [6:02:48<18:38:31, 55.65s/it][A[A

/n Epoch: 793 |Training loss: 1.8223




training:  24%|██▍       | 387/1592 [6:03:43<18:34:44, 55.51s/it][A[A

/n Epoch: 794 |Training loss: 1.8344
validation loss: 1.9956




training:  24%|██▍       | 388/1592 [6:04:42<18:53:12, 56.47s/it][A[A

/n Epoch: 795 |Training loss: 1.7996




training:  24%|██▍       | 389/1592 [6:05:37<18:45:11, 56.12s/it][A[A

/n Epoch: 796 |Training loss: 1.8255




training:  24%|██▍       | 390/1592 [6:06:32<18:38:37, 55.84s/it][A[A

/n Epoch: 797 |Training loss: 1.8002




training:  25%|██▍       | 391/1592 [6:07:27<18:33:13, 55.61s/it][A[A

/n Epoch: 798 |Training loss: 1.8240




training:  25%|██▍       | 392/1592 [6:08:22<18:29:18, 55.47s/it][A[A

/n Epoch: 799 |Training loss: 1.8351
validation loss: 2.0252




training:  25%|██▍       | 393/1592 [6:09:21<18:47:39, 56.43s/it][A[A

/n Epoch: 800 |Training loss: 1.7933




training:  25%|██▍       | 394/1592 [6:10:16<18:38:49, 56.03s/it][A[A

/n Epoch: 801 |Training loss: 1.8477




training:  25%|██▍       | 395/1592 [6:11:11<18:32:02, 55.74s/it][A[A

/n Epoch: 802 |Training loss: 1.8045




training:  25%|██▍       | 396/1592 [6:12:06<18:28:00, 55.59s/it][A[A

/n Epoch: 803 |Training loss: 1.8389




training:  25%|██▍       | 397/1592 [6:13:01<18:23:26, 55.40s/it][A[A

/n Epoch: 804 |Training loss: 1.8707
validation loss: 1.9889




training:  25%|██▌       | 398/1592 [6:14:00<18:42:41, 56.42s/it][A[A

/n Epoch: 805 |Training loss: 1.8331




training:  25%|██▌       | 399/1592 [6:14:55<18:33:59, 56.03s/it][A[A

/n Epoch: 806 |Training loss: 1.8208




training:  25%|██▌       | 400/1592 [6:15:50<18:27:18, 55.74s/it][A[A

/n Epoch: 807 |Training loss: 1.8456




training:  25%|██▌       | 401/1592 [6:16:45<18:21:44, 55.50s/it][A[A

/n Epoch: 808 |Training loss: 1.8034




training:  25%|██▌       | 402/1592 [6:17:41<18:19:23, 55.43s/it][A[A

/n Epoch: 809 |Training loss: 1.8081
validation loss: 1.9568




training:  25%|██▌       | 403/1592 [6:18:39<18:37:55, 56.41s/it][A[A

/n Epoch: 810 |Training loss: 1.8132




training:  25%|██▌       | 404/1592 [6:19:34<18:28:53, 56.00s/it][A[A

/n Epoch: 811 |Training loss: 1.7967




training:  25%|██▌       | 405/1592 [6:20:29<18:22:26, 55.73s/it][A[A

/n Epoch: 812 |Training loss: 1.7855




training:  26%|██▌       | 406/1592 [6:21:24<18:16:04, 55.45s/it][A[A

/n Epoch: 813 |Training loss: 1.7985




training:  26%|██▌       | 407/1592 [6:22:20<18:14:41, 55.43s/it][A[A

/n Epoch: 814 |Training loss: 1.7832
validation loss: 1.9501




training:  26%|██▌       | 408/1592 [6:23:18<18:32:31, 56.38s/it][A[A

/n Epoch: 815 |Training loss: 1.7797




training:  26%|██▌       | 409/1592 [6:24:13<18:24:29, 56.02s/it][A[A

/n Epoch: 816 |Training loss: 1.7823




training:  26%|██▌       | 410/1592 [6:25:08<18:17:37, 55.72s/it][A[A

/n Epoch: 817 |Training loss: 1.7699




training:  26%|██▌       | 411/1592 [6:26:03<18:11:38, 55.46s/it][A[A

/n Epoch: 818 |Training loss: 1.7652




training:  26%|██▌       | 412/1592 [6:26:59<18:09:57, 55.42s/it][A[A

/n Epoch: 819 |Training loss: 1.7728
validation loss: 1.9153




training:  26%|██▌       | 413/1592 [6:27:57<18:28:34, 56.42s/it][A[A

/n Epoch: 820 |Training loss: 1.7642




training:  26%|██▌       | 414/1592 [6:28:53<18:22:55, 56.18s/it][A[A

/n Epoch: 821 |Training loss: 1.7575




training:  26%|██▌       | 415/1592 [6:29:48<18:16:19, 55.89s/it][A[A

/n Epoch: 822 |Training loss: 1.7589




training:  26%|██▌       | 416/1592 [6:30:43<18:10:32, 55.64s/it][A[A

/n Epoch: 823 |Training loss: 1.7518




training:  26%|██▌       | 417/1592 [6:31:38<18:06:04, 55.46s/it][A[A

/n Epoch: 824 |Training loss: 1.7457
validation loss: 1.9087




training:  26%|██▋       | 418/1592 [6:32:37<18:25:07, 56.48s/it][A[A

/n Epoch: 825 |Training loss: 1.7475




training:  26%|██▋       | 419/1592 [6:33:32<18:15:59, 56.06s/it][A[A

/n Epoch: 826 |Training loss: 1.7402




training:  26%|██▋       | 420/1592 [6:34:27<18:09:34, 55.78s/it][A[A

/n Epoch: 827 |Training loss: 1.7426




training:  26%|██▋       | 421/1592 [6:35:22<18:04:05, 55.55s/it][A[A

/n Epoch: 828 |Training loss: 1.7414




training:  27%|██▋       | 422/1592 [6:36:17<18:00:25, 55.41s/it][A[A

/n Epoch: 829 |Training loss: 1.7467
validation loss: 1.9096




training:  27%|██▋       | 423/1592 [6:37:16<18:19:32, 56.43s/it][A[A

/n Epoch: 830 |Training loss: 1.7379




training:  27%|██▋       | 424/1592 [6:38:11<18:10:51, 56.04s/it][A[A

/n Epoch: 831 |Training loss: 1.7389




training:  27%|██▋       | 425/1592 [6:39:06<18:04:44, 55.77s/it][A[A

/n Epoch: 832 |Training loss: 1.7344




training:  27%|██▋       | 426/1592 [6:40:01<17:59:49, 55.57s/it][A[A

/n Epoch: 833 |Training loss: 1.7421




training:  27%|██▋       | 427/1592 [6:40:57<17:55:45, 55.40s/it][A[A

/n Epoch: 834 |Training loss: 1.7310
validation loss: 1.8935




training:  27%|██▋       | 428/1592 [6:41:55<18:14:10, 56.40s/it]

/n Epoch: 835 |Training loss: 1.7586




training:  27%|██▋       | 429/1592 [6:42:51<18:07:15, 56.09s/it][A[A

/n Epoch: 836 |Training loss: 1.7360




training:  27%|██▋       | 430/1592 [6:43:46<18:00:20, 55.78s/it][A[A

/n Epoch: 837 |Training loss: 1.7567




training:  27%|██▋       | 431/1592 [6:44:41<17:55:44, 55.59s/it][A[A

/n Epoch: 838 |Training loss: 1.7256




training:  27%|██▋       | 432/1592 [6:45:36<17:51:46, 55.44s/it][A[A

/n Epoch: 839 |Training loss: 1.7694
validation loss: 1.9910




training:  27%|██▋       | 433/1592 [6:46:35<18:09:22, 56.40s/it][A[A

/n Epoch: 840 |Training loss: 1.7255




training:  27%|██▋       | 434/1592 [6:47:30<18:01:13, 56.02s/it][A[A

/n Epoch: 841 |Training loss: 1.8328




training:  27%|██▋       | 435/1592 [6:48:25<17:56:00, 55.80s/it][A[A

/n Epoch: 842 |Training loss: 1.7779




training:  27%|██▋       | 436/1592 [6:49:20<17:50:06, 55.54s/it][A[A

/n Epoch: 843 |Training loss: 1.8568




training:  27%|██▋       | 437/1592 [6:50:15<17:47:35, 55.46s/it][A[A

/n Epoch: 844 |Training loss: 1.8893
validation loss: 1.9992




training:  28%|██▊       | 438/1592 [6:51:14<18:06:38, 56.50s/it][A[A

/n Epoch: 845 |Training loss: 1.7912




training:  28%|██▊       | 439/1592 [6:52:09<17:59:07, 56.16s/it][A[A

/n Epoch: 846 |Training loss: 1.8427




training:  28%|██▊       | 440/1592 [6:53:05<17:54:40, 55.97s/it][A[A

/n Epoch: 847 |Training loss: 1.8291




training:  28%|██▊       | 441/1592 [6:54:00<17:49:20, 55.74s/it][A[A

/n Epoch: 848 |Training loss: 1.7891




training:  28%|██▊       | 442/1592 [6:54:56<17:46:16, 55.63s/it][A[A

/n Epoch: 849 |Training loss: 1.8233
validation loss: 1.9455




training:  28%|██▊       | 443/1592 [6:55:55<18:05:01, 56.66s/it][A[A

/n Epoch: 850 |Training loss: 1.7879




training:  28%|██▊       | 444/1592 [6:56:50<17:56:32, 56.27s/it][A[A

/n Epoch: 851 |Training loss: 1.7806




training:  28%|██▊       | 445/1592 [6:57:45<17:50:01, 55.97s/it][A[A

/n Epoch: 852 |Training loss: 1.8073




training:  28%|██▊       | 446/1592 [6:58:41<17:45:29, 55.79s/it][A[A

/n Epoch: 853 |Training loss: 1.7650




training:  28%|██▊       | 447/1592 [6:59:36<17:41:04, 55.60s/it][A[A

/n Epoch: 854 |Training loss: 1.7792
validation loss: 1.9193




training:  28%|██▊       | 448/1592 [7:00:35<17:58:10, 56.55s/it][A[A

/n Epoch: 855 |Training loss: 1.7955




training:  28%|██▊       | 449/1592 [7:01:30<17:49:29, 56.14s/it][A[A

/n Epoch: 856 |Training loss: 1.7555




training:  28%|██▊       | 450/1592 [7:02:25<17:42:59, 55.85s/it][A[A

/n Epoch: 857 |Training loss: 1.7808




training:  28%|██▊       | 451/1592 [7:03:20<17:38:48, 55.68s/it][A[A

/n Epoch: 858 |Training loss: 1.7743




training:  28%|██▊       | 452/1592 [7:04:15<17:35:12, 55.54s/it][A[A

/n Epoch: 859 |Training loss: 1.7506
validation loss: 1.9153




training:  28%|██▊       | 453/1592 [7:05:14<17:52:31, 56.50s/it][A[A

/n Epoch: 860 |Training loss: 1.7769




training:  29%|██▊       | 454/1592 [7:06:09<17:43:33, 56.08s/it][A[A

/n Epoch: 861 |Training loss: 1.7546




training:  29%|██▊       | 455/1592 [7:07:04<17:37:10, 55.79s/it][A[A

/n Epoch: 862 |Training loss: 1.7499




training:  29%|██▊       | 456/1592 [7:07:59<17:32:28, 55.59s/it][A[A

/n Epoch: 863 |Training loss: 1.7644




training:  29%|██▊       | 457/1592 [7:08:54<17:28:12, 55.41s/it][A[A

/n Epoch: 864 |Training loss: 1.7432
validation loss: 1.9042




training:  29%|██▉       | 458/1592 [7:09:53<17:46:50, 56.45s/it][A[A

/n Epoch: 865 |Training loss: 1.7496




training:  29%|██▉       | 459/1592 [7:10:48<17:38:16, 56.04s/it][A[A

/n Epoch: 866 |Training loss: 1.7493




training:  29%|██▉       | 460/1592 [7:11:44<17:31:48, 55.75s/it][A[A

/n Epoch: 867 |Training loss: 1.7330




training:  29%|██▉       | 461/1592 [7:12:38<17:26:36, 55.52s/it][A[A

/n Epoch: 868 |Training loss: 1.7363




training:  29%|██▉       | 462/1592 [7:13:34<17:24:34, 55.46s/it][A[A

/n Epoch: 869 |Training loss: 1.7388
validation loss: 1.8935




training:  29%|██▉       | 463/1592 [7:14:33<17:42:25, 56.46s/it][A[A

/n Epoch: 870 |Training loss: 1.7178




training:  29%|██▉       | 464/1592 [7:15:28<17:34:04, 56.07s/it][A[A

/n Epoch: 871 |Training loss: 1.7286




training:  29%|██▉       | 465/1592 [7:16:23<17:28:08, 55.80s/it][A[A

/n Epoch: 872 |Training loss: 1.7206




training:  29%|██▉       | 466/1592 [7:17:18<17:21:49, 55.51s/it][A[A

/n Epoch: 873 |Training loss: 1.7189




training:  29%|██▉       | 467/1592 [7:18:13<17:19:05, 55.42s/it][A[A

/n Epoch: 874 |Training loss: 1.7274
validation loss: 1.8748




training:  29%|██▉       | 468/1592 [7:19:12<17:37:06, 56.43s/it][A[A

/n Epoch: 875 |Training loss: 1.7152




training:  29%|██▉       | 469/1592 [7:20:07<17:29:09, 56.06s/it][A[A

/n Epoch: 876 |Training loss: 1.7150




training:  30%|██▉       | 470/1592 [7:21:02<17:22:20, 55.74s/it][A[A

/n Epoch: 877 |Training loss: 1.7052




training:  30%|██▉       | 471/1592 [7:21:57<17:16:59, 55.50s/it][A[A

/n Epoch: 878 |Training loss: 1.7141




training:  30%|██▉       | 472/1592 [7:22:52<17:14:13, 55.40s/it][A[A

/n Epoch: 879 |Training loss: 1.7056
validation loss: 1.8522




training:  30%|██▉       | 473/1592 [7:23:51<17:32:09, 56.42s/it][A[A

/n Epoch: 880 |Training loss: 1.7125




training:  30%|██▉       | 474/1592 [7:24:46<17:24:14, 56.04s/it][A[A

/n Epoch: 881 |Training loss: 1.6925




training:  30%|██▉       | 475/1592 [7:25:41<17:17:48, 55.75s/it][A[A

/n Epoch: 882 |Training loss: 1.7147




training:  30%|██▉       | 476/1592 [7:26:36<17:13:15, 55.55s/it][A[A

/n Epoch: 883 |Training loss: 1.6929




training:  30%|██▉       | 477/1592 [7:27:31<17:09:59, 55.43s/it][A[A

/n Epoch: 884 |Training loss: 1.7358
validation loss: 1.8917




training:  30%|███       | 478/1592 [7:28:30<17:26:39, 56.37s/it][A[A

/n Epoch: 885 |Training loss: 1.7054




training:  30%|███       | 479/1592 [7:29:25<17:19:11, 56.02s/it][A[A

/n Epoch: 886 |Training loss: 1.7423




training:  30%|███       | 480/1592 [7:30:20<17:12:38, 55.72s/it][A[A

/n Epoch: 887 |Training loss: 1.7196




training:  30%|███       | 481/1592 [7:31:15<17:08:11, 55.53s/it][A[A

/n Epoch: 888 |Training loss: 1.7304




training:  30%|███       | 482/1592 [7:32:10<17:04:20, 55.37s/it][A[A

/n Epoch: 889 |Training loss: 1.7215
validation loss: 1.8861




training:  30%|███       | 483/1592 [7:33:09<17:20:51, 56.31s/it][A[A

/n Epoch: 890 |Training loss: 1.7272




training:  30%|███       | 484/1592 [7:34:04<17:13:01, 55.94s/it][A[A

/n Epoch: 891 |Training loss: 1.7293




training:  30%|███       | 485/1592 [7:34:59<17:07:57, 55.72s/it][A[A

/n Epoch: 892 |Training loss: 1.7105




training:  31%|███       | 486/1592 [7:35:54<17:03:12, 55.51s/it][A[A

/n Epoch: 893 |Training loss: 1.7353




training:  31%|███       | 487/1592 [7:36:49<16:58:47, 55.32s/it][A[A

/n Epoch: 894 |Training loss: 1.7075
validation loss: 1.8839




training:  31%|███       | 488/1592 [7:37:48<17:17:26, 56.38s/it][A[A

/n Epoch: 895 |Training loss: 1.7059




training:  31%|███       | 489/1592 [7:38:43<17:09:10, 55.98s/it][A[A

/n Epoch: 896 |Training loss: 1.7214




training:  31%|███       | 490/1592 [7:39:38<17:03:37, 55.73s/it][A[A

/n Epoch: 897 |Training loss: 1.6921




training:  31%|███       | 491/1592 [7:40:33<16:58:06, 55.48s/it][A[A

/n Epoch: 898 |Training loss: 1.7120




training:  31%|███       | 492/1592 [7:41:28<16:55:49, 55.41s/it][A[A

/n Epoch: 899 |Training loss: 1.7163
validation loss: 1.8678




training:  31%|███       | 493/1592 [7:42:27<17:13:25, 56.42s/it][A[A

/n Epoch: 900 |Training loss: 1.6971




training:  31%|███       | 494/1592 [7:43:22<17:05:26, 56.04s/it][A[A

/n Epoch: 901 |Training loss: 1.7058




training:  31%|███       | 495/1592 [7:44:17<16:59:04, 55.74s/it][A[A

/n Epoch: 902 |Training loss: 1.6940




training:  31%|███       | 496/1592 [7:45:12<16:54:23, 55.53s/it][A[A

/n Epoch: 903 |Training loss: 1.7000




training:  31%|███       | 497/1592 [7:46:07<16:52:02, 55.45s/it][A[A

/n Epoch: 904 |Training loss: 1.6930
validation loss: 1.8470




training:  31%|███▏      | 498/1592 [7:47:06<17:08:30, 56.41s/it][A[A

/n Epoch: 905 |Training loss: 1.6885




training:  31%|███▏      | 499/1592 [7:48:01<17:00:13, 56.01s/it][A[A

/n Epoch: 906 |Training loss: 1.6886




training:  31%|███▏      | 500/1592 [7:48:56<16:54:21, 55.73s/it][A[A

/n Epoch: 907 |Training loss: 1.6804




training:  31%|███▏      | 501/1592 [7:49:51<16:49:20, 55.51s/it][A[A

/n Epoch: 908 |Training loss: 1.6802




training:  32%|███▏      | 502/1592 [7:50:46<16:47:19, 55.45s/it][A[A

/n Epoch: 909 |Training loss: 1.6825
validation loss: 1.8333




training:  32%|███▏      | 503/1592 [7:51:45<17:04:20, 56.44s/it][A[A

/n Epoch: 910 |Training loss: 1.6804




training:  32%|███▏      | 504/1592 [7:52:40<16:56:28, 56.06s/it][A[A

/n Epoch: 911 |Training loss: 1.6717




training:  32%|███▏      | 505/1592 [7:53:35<16:50:27, 55.78s/it][A[A

/n Epoch: 912 |Training loss: 1.6795




training:  32%|███▏      | 506/1592 [7:54:31<16:45:50, 55.57s/it][A[A

/n Epoch: 913 |Training loss: 1.6690




training:  32%|███▏      | 507/1592 [7:55:26<16:43:31, 55.49s/it][A[A

/n Epoch: 914 |Training loss: 1.6719
validation loss: 1.8234




training:  32%|███▏      | 508/1592 [7:56:25<16:59:58, 56.46s/it][A[A

/n Epoch: 915 |Training loss: 1.6642




training:  32%|███▏      | 509/1592 [7:57:20<16:51:42, 56.05s/it][A[A

/n Epoch: 916 |Training loss: 1.6680




training:  32%|███▏      | 510/1592 [7:58:15<16:45:50, 55.78s/it][A[A

/n Epoch: 917 |Training loss: 1.6590




training:  32%|███▏      | 511/1592 [7:59:10<16:41:19, 55.58s/it][A[A

/n Epoch: 918 |Training loss: 1.6734




training:  32%|███▏      | 512/1592 [8:00:05<16:37:44, 55.43s/it][A[A

/n Epoch: 919 |Training loss: 1.6633
validation loss: 1.8196




training:  32%|███▏      | 513/1592 [8:01:04<16:55:12, 56.45s/it][A[A

/n Epoch: 920 |Training loss: 1.6688




training:  32%|███▏      | 514/1592 [8:01:59<16:47:04, 56.05s/it][A[A

/n Epoch: 921 |Training loss: 1.6599




training:  32%|███▏      | 515/1592 [8:02:54<16:40:57, 55.76s/it][A[A

/n Epoch: 922 |Training loss: 1.6774




training:  32%|███▏      | 516/1592 [8:03:49<16:36:02, 55.54s/it][A[A

/n Epoch: 923 |Training loss: 1.6584




training:  32%|███▏      | 517/1592 [8:04:44<16:32:53, 55.42s/it][A[A

/n Epoch: 924 |Training loss: 1.6807
validation loss: 1.8383




training:  33%|███▎      | 518/1592 [8:05:43<16:50:37, 56.46s/it][A[A

/n Epoch: 925 |Training loss: 1.6550




training:  33%|███▎      | 519/1592 [8:06:38<16:42:36, 56.06s/it][A[A

/n Epoch: 926 |Training loss: 1.6829




training:  33%|███▎      | 520/1592 [8:07:33<16:36:46, 55.79s/it][A[A

/n Epoch: 927 |Training loss: 1.6554




training:  33%|███▎      | 521/1592 [8:08:29<16:32:45, 55.62s/it][A[A

/n Epoch: 928 |Training loss: 1.7294




training:  33%|███▎      | 522/1592 [8:09:24<16:31:13, 55.58s/it][A[A

/n Epoch: 929 |Training loss: 1.6842
validation loss: 1.8494




training:  33%|███▎      | 523/1592 [8:10:24<16:52:47, 56.84s/it][A[A

/n Epoch: 930 |Training loss: 1.6899




training:  33%|███▎      | 524/1592 [8:11:19<16:43:02, 56.35s/it][A[A

/n Epoch: 931 |Training loss: 1.6936




training:  33%|███▎      | 525/1592 [8:12:14<16:35:47, 56.00s/it][A[A

/n Epoch: 932 |Training loss: 1.6814




training:  33%|███▎      | 526/1592 [8:13:09<16:28:42, 55.65s/it][A[A

/n Epoch: 933 |Training loss: 1.6903




training:  33%|███▎      | 527/1592 [8:14:04<16:25:51, 55.54s/it][A[A

/n Epoch: 934 |Training loss: 1.6658
validation loss: 1.8247




training:  33%|███▎      | 528/1592 [8:15:03<16:42:02, 56.51s/it][A[A

/n Epoch: 935 |Training loss: 1.6919




training:  33%|███▎      | 529/1592 [8:15:59<16:35:09, 56.17s/it][A[A

/n Epoch: 936 |Training loss: 1.6673




training:  33%|███▎      | 530/1592 [8:16:54<16:28:46, 55.86s/it][A[A

/n Epoch: 937 |Training loss: 1.6939




training:  33%|███▎      | 531/1592 [8:17:49<16:23:16, 55.60s/it][A[A

/n Epoch: 938 |Training loss: 1.6882




training:  33%|███▎      | 532/1592 [8:18:44<16:20:36, 55.51s/it][A[A

/n Epoch: 939 |Training loss: 1.6673
validation loss: 1.8193




training:  33%|███▎      | 533/1592 [8:19:43<16:37:27, 56.51s/it][A[A

/n Epoch: 940 |Training loss: 1.6892




training:  34%|███▎      | 534/1592 [8:20:38<16:30:02, 56.15s/it][A[A

/n Epoch: 941 |Training loss: 1.6622




training:  34%|███▎      | 535/1592 [8:21:33<16:24:36, 55.89s/it][A[A

/n Epoch: 942 |Training loss: 1.6841




training:  34%|███▎      | 536/1592 [8:22:29<16:19:51, 55.67s/it][A[A

/n Epoch: 943 |Training loss: 1.6785




training:  34%|███▎      | 537/1592 [8:23:24<16:16:13, 55.52s/it][A[A

/n Epoch: 944 |Training loss: 1.6795
validation loss: 1.8188




training:  34%|███▍      | 538/1592 [8:24:23<16:32:54, 56.52s/it][A[A

/n Epoch: 945 |Training loss: 1.6860




training:  34%|███▍      | 539/1592 [8:25:18<16:25:07, 56.13s/it][A[A

/n Epoch: 946 |Training loss: 1.6598




training:  34%|███▍      | 540/1592 [8:26:13<16:20:35, 55.93s/it][A[A

/n Epoch: 947 |Training loss: 1.6742




training:  34%|███▍      | 541/1592 [8:27:09<16:16:57, 55.77s/it][A[A

/n Epoch: 948 |Training loss: 1.6715




training:  34%|███▍      | 542/1592 [8:28:04<16:13:22, 55.62s/it][A[A

/n Epoch: 949 |Training loss: 1.6503
validation loss: 1.8094




training:  34%|███▍      | 543/1592 [8:29:03<16:29:42, 56.61s/it]

/n Epoch: 950 |Training loss: 1.6766




training:  34%|███▍      | 544/1592 [8:29:58<16:21:20, 56.18s/it][A[A

/n Epoch: 951 |Training loss: 1.6499




training:  34%|███▍      | 545/1592 [8:30:53<16:15:03, 55.88s/it][A[A

/n Epoch: 952 |Training loss: 1.6623




training:  34%|███▍      | 546/1592 [8:31:49<16:11:12, 55.71s/it][A[A

/n Epoch: 953 |Training loss: 1.6663




training:  34%|███▍      | 547/1592 [8:32:44<16:07:51, 55.57s/it][A[A

/n Epoch: 954 |Training loss: 1.6523
validation loss: 1.8122




training:  34%|███▍      | 548/1592 [8:33:43<16:23:40, 56.53s/it][A[A

/n Epoch: 955 |Training loss: 1.6549




training:  34%|███▍      | 549/1592 [8:34:38<16:15:39, 56.13s/it][A[A

/n Epoch: 956 |Training loss: 1.6534




training:  35%|███▍      | 550/1592 [8:35:33<16:09:42, 55.84s/it][A[A

/n Epoch: 957 |Training loss: 1.6454




training:  35%|███▍      | 551/1592 [8:36:28<16:05:59, 55.68s/it][A[A

/n Epoch: 958 |Training loss: 1.6498




training:  35%|███▍      | 552/1592 [8:37:23<16:02:21, 55.52s/it][A[A

/n Epoch: 959 |Training loss: 1.6365
validation loss: 1.7969




training:  35%|███▍      | 553/1592 [8:38:22<16:18:23, 56.50s/it][A[A

/n Epoch: 960 |Training loss: 1.6606




training:  35%|███▍      | 554/1592 [8:39:17<16:10:36, 56.10s/it][A[A

/n Epoch: 961 |Training loss: 1.6416




training:  35%|███▍      | 555/1592 [8:40:12<16:04:40, 55.82s/it][A[A

/n Epoch: 962 |Training loss: 1.6515




training:  35%|███▍      | 556/1592 [8:41:07<15:59:30, 55.57s/it][A[A

/n Epoch: 963 |Training loss: 1.6474




training:  35%|███▍      | 557/1592 [8:42:03<15:58:02, 55.54s/it][A[A

/n Epoch: 964 |Training loss: 1.6477
validation loss: 1.7963




training:  35%|███▌      | 558/1592 [8:43:02<16:13:58, 56.52s/it][A[A

/n Epoch: 965 |Training loss: 1.6392




training:  35%|███▌      | 559/1592 [8:43:57<16:06:09, 56.12s/it][A[A

/n Epoch: 966 |Training loss: 1.6401




training:  35%|███▌      | 560/1592 [8:44:52<16:00:36, 55.85s/it][A[A

/n Epoch: 967 |Training loss: 1.6328




training:  35%|███▌      | 561/1592 [8:45:47<15:55:16, 55.59s/it][A[A

/n Epoch: 968 |Training loss: 1.6304




training:  35%|███▌      | 562/1592 [8:46:43<15:53:49, 55.56s/it][A[A

/n Epoch: 969 |Training loss: 1.6357
validation loss: 1.7861




training:  35%|███▌      | 563/1592 [8:47:42<16:10:15, 56.58s/it][A[A

/n Epoch: 970 |Training loss: 1.6330




training:  35%|███▌      | 564/1592 [8:48:37<16:02:26, 56.17s/it][A[A

/n Epoch: 971 |Training loss: 1.6292




training:  35%|███▌      | 565/1592 [8:49:32<15:56:18, 55.87s/it][A[A

/n Epoch: 972 |Training loss: 1.6352




training:  36%|███▌      | 566/1592 [8:50:27<15:51:58, 55.67s/it][A[A

/n Epoch: 973 |Training loss: 1.6303




training:  36%|███▌      | 567/1592 [8:51:23<15:49:26, 55.58s/it][A[A

/n Epoch: 974 |Training loss: 1.6318
validation loss: 1.7785




training:  36%|███▌      | 568/1592 [8:52:21<16:05:42, 56.58s/it][A[A

/n Epoch: 975 |Training loss: 1.6302




training:  36%|███▌      | 569/1592 [8:53:17<15:57:42, 56.17s/it][A[A

/n Epoch: 976 |Training loss: 1.6168




training:  36%|███▌      | 570/1592 [8:54:12<15:51:56, 55.89s/it][A[A

/n Epoch: 977 |Training loss: 1.6204




training:  36%|███▌      | 571/1592 [8:55:07<15:48:09, 55.72s/it][A[A

/n Epoch: 978 |Training loss: 1.6171




training:  36%|███▌      | 572/1592 [8:56:02<15:44:20, 55.55s/it][A[A

/n Epoch: 979 |Training loss: 1.6191
validation loss: 1.7782




training:  36%|███▌      | 573/1592 [8:57:01<16:00:37, 56.56s/it][A[A

/n Epoch: 980 |Training loss: 1.6204




training:  36%|███▌      | 574/1592 [8:57:57<15:53:31, 56.20s/it][A[A

/n Epoch: 981 |Training loss: 1.6221




training:  36%|███▌      | 575/1592 [8:58:52<15:47:10, 55.88s/it][A[A

/n Epoch: 982 |Training loss: 1.6223




training:  36%|███▌      | 576/1592 [8:59:47<15:42:43, 55.67s/it][A[A

/n Epoch: 983 |Training loss: 1.6242




training:  36%|███▌      | 577/1592 [9:00:42<15:38:08, 55.46s/it][A[A

/n Epoch: 984 |Training loss: 1.6216
validation loss: 1.7671




training:  36%|███▋      | 578/1592 [9:01:41<15:54:50, 56.50s/it][A[A

/n Epoch: 985 |Training loss: 1.6250




training:  36%|███▋      | 579/1592 [9:02:36<15:48:18, 56.17s/it][A[A

/n Epoch: 986 |Training loss: 1.6119




training:  36%|███▋      | 580/1592 [9:03:31<15:42:09, 55.86s/it][A[A

/n Epoch: 987 |Training loss: 1.6295




training:  36%|███▋      | 581/1592 [9:04:26<15:36:10, 55.56s/it][A[A

/n Epoch: 988 |Training loss: 1.6107




training:  37%|███▋      | 582/1592 [9:05:22<15:34:02, 55.49s/it][A[A

/n Epoch: 989 |Training loss: 1.6314
validation loss: 1.8311




training:  37%|███▋      | 583/1592 [9:06:20<15:50:12, 56.50s/it][A[A

/n Epoch: 990 |Training loss: 1.6012




training:  37%|███▋      | 584/1592 [9:07:16<15:42:46, 56.12s/it][A[A

/n Epoch: 991 |Training loss: 1.6661




training:  37%|███▋      | 585/1592 [9:08:11<15:37:38, 55.87s/it][A[A

/n Epoch: 992 |Training loss: 1.6142




training:  37%|███▋      | 586/1592 [9:09:06<15:31:54, 55.58s/it][A[A

/n Epoch: 993 |Training loss: 1.7351




training:  37%|███▋      | 587/1592 [9:10:01<15:29:29, 55.49s/it][A[A

/n Epoch: 994 |Training loss: 1.6873
validation loss: 1.8781




training:  37%|███▋      | 588/1592 [9:11:00<15:45:24, 56.50s/it][A[A

/n Epoch: 995 |Training loss: 1.6965




training:  37%|███▋      | 589/1592 [9:11:55<15:39:21, 56.19s/it][A[A

/n Epoch: 996 |Training loss: 1.7078




training:  37%|███▋      | 590/1592 [9:12:51<15:34:16, 55.94s/it][A[A

/n Epoch: 997 |Training loss: 1.6683




training:  37%|███▋      | 591/1592 [9:13:46<15:30:13, 55.76s/it][A[A

/n Epoch: 998 |Training loss: 1.6789




training:  37%|███▋      | 592/1592 [9:14:42<15:28:36, 55.72s/it][A[A

/n Epoch: 999 |Training loss: 1.6693
validation loss: 1.8225




training:  37%|███▋      | 593/1592 [9:15:41<15:43:46, 56.68s/it][A[A

/n Epoch: 1000 |Training loss: 1.6632




training:  37%|███▋      | 594/1592 [9:16:36<15:36:38, 56.31s/it][A[A

/n Epoch: 1001 |Training loss: 1.6690




training:  37%|███▋      | 595/1592 [9:17:32<15:31:01, 56.03s/it][A[A

/n Epoch: 1002 |Training loss: 1.6626




training:  37%|███▋      | 596/1592 [9:18:27<15:27:48, 55.89s/it][A[A

/n Epoch: 1003 |Training loss: 1.6594




training:  38%|███▊      | 597/1592 [9:19:22<15:23:42, 55.70s/it][A[A

/n Epoch: 1004 |Training loss: 1.6619
validation loss: 1.8096




training:  38%|███▊      | 598/1592 [9:20:21<15:37:58, 56.62s/it][A[A

/n Epoch: 1005 |Training loss: 1.6486




training:  38%|███▊      | 599/1592 [9:21:16<15:30:17, 56.21s/it][A[A

/n Epoch: 1006 |Training loss: 1.6493




training:  38%|███▊      | 600/1592 [9:22:12<15:24:37, 55.92s/it][A[A

/n Epoch: 1007 |Training loss: 1.6542




training:  38%|███▊      | 601/1592 [9:23:07<15:21:38, 55.80s/it][A[A

/n Epoch: 1008 |Training loss: 1.6362




training:  38%|███▊      | 602/1592 [9:24:02<15:18:19, 55.66s/it][A[A

/n Epoch: 1009 |Training loss: 1.6438
validation loss: 1.7985




training:  38%|███▊      | 603/1592 [9:25:01<15:33:25, 56.63s/it][A[A

/n Epoch: 1010 |Training loss: 1.6288




training:  38%|███▊      | 604/1592 [9:25:57<15:25:51, 56.23s/it][A[A

/n Epoch: 1011 |Training loss: 1.6411




training:  38%|███▊      | 605/1592 [9:26:52<15:20:48, 55.98s/it][A[A

/n Epoch: 1012 |Training loss: 1.6239




training:  38%|███▊      | 606/1592 [9:27:47<15:16:58, 55.80s/it][A[A

/n Epoch: 1013 |Training loss: 1.6396




training:  38%|███▊      | 607/1592 [9:28:43<15:14:25, 55.70s/it][A[A

/n Epoch: 1014 |Training loss: 1.6348
validation loss: 1.7952




training:  38%|███▊      | 608/1592 [9:29:42<15:28:59, 56.65s/it][A[A

/n Epoch: 1015 |Training loss: 1.6244




training:  38%|███▊      | 609/1592 [9:30:37<15:21:57, 56.27s/it][A[A

/n Epoch: 1016 |Training loss: 1.6249




training:  38%|███▊      | 610/1592 [9:31:32<15:16:09, 55.98s/it][A[A

/n Epoch: 1017 |Training loss: 1.6271




training:  38%|███▊      | 611/1592 [9:32:28<15:12:13, 55.79s/it][A[A

/n Epoch: 1018 |Training loss: 1.6216




training:  38%|███▊      | 612/1592 [9:33:23<15:09:22, 55.68s/it][A[A

/n Epoch: 1019 |Training loss: 1.6127
validation loss: 1.7634




training:  39%|███▊      | 613/1592 [9:34:22<15:24:28, 56.66s/it][A[A

/n Epoch: 1020 |Training loss: 1.6090




training:  39%|███▊      | 614/1592 [9:35:17<15:16:44, 56.24s/it][A[A

/n Epoch: 1021 |Training loss: 1.6040




training:  39%|███▊      | 615/1592 [9:36:13<15:10:47, 55.93s/it][A[A

/n Epoch: 1022 |Training loss: 1.6107




training:  39%|███▊      | 616/1592 [9:37:08<15:05:50, 55.69s/it][A[A

/n Epoch: 1023 |Training loss: 1.6034




training:  39%|███▉      | 617/1592 [9:38:03<15:03:28, 55.60s/it][A[A

/n Epoch: 1024 |Training loss: 1.6117
validation loss: 1.7625




training:  39%|███▉      | 618/1592 [9:39:02<15:19:15, 56.63s/it][A[A

/n Epoch: 1025 |Training loss: 1.5933


**Music generation**

In [None]:
# Optionally restore previously trained weights from a saved checkpoint
weights = "model_best.pth.tar"
checkpoint = torch.load(os.path.join(output_dir, weights))
# Restore the model and optimizer state, plus the epoch and loss at save time
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']


In [None]:
# Rebuild the network input: each row is a window of sequence_length
# consecutive notes, encoded as integers
network_input = []
network_output = []
for i in range(len(notes) - sequence_length):
    network_input.append([note_to_int[n] for n in notes[i:i + sequence_length]])
n_patterns = len(network_input)
network_input = np.reshape(network_input, (n_patterns, sequence_length))
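
To make the windowing concrete, here is a minimal sketch on toy data. The names `toy_notes`, `toy_note_to_int`, `toy_sequence_length` and `build_windows` are hypothetical stand-ins for illustration and are not part of the notebook. Like the cell above, it stops one window short of the end of the list.

```python
# Toy stand-ins (assumptions) for the notebook's notes, note_to_int and sequence_length
toy_notes = ["C4", "E4", "G4", "C5", "G4", "E4"]
toy_note_to_int = {name: i for i, name in enumerate(sorted(set(toy_notes)))}
toy_sequence_length = 3


def build_windows(notes, note_to_int, sequence_length):
    """Return every run of sequence_length consecutive notes, encoded as integers."""
    return [
        [note_to_int[n] for n in notes[i:i + sequence_length]]
        for i in range(len(notes) - sequence_length)
    ]


windows = build_windows(toy_notes, toy_note_to_int, toy_sequence_length)
for row in windows:
    print(row)
```

Each row overlaps the previous one by all but one element, which is why any row can serve as a seed for generation.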


The workflow now is:


1.   Pick a **seed sequence** at random from the list of inputs (the *pattern* variable)
2.   Pass it to the model to generate a new element (a note or a chord)
3.   Append the new element to the final song and to *pattern*
4.   Remove the first item from *pattern*, so that its length stays fixed
5.   Go back to step 2
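
The steps above can be sketched as a plain Python loop. This is only an illustration under stated assumptions: `generate_sequence` and `toy_predict_next` are hypothetical names, and the toy predictor stands in for the trained model. In this notebook the actual generation is done by a single call to `model.generate`.

```python
def generate_sequence(predict_next, seed, n_steps):
    """Sliding-window generation: predict, append, drop the oldest item, repeat."""
    pattern = list(seed)   # step 1: the seed sequence
    song = []
    for _ in range(n_steps):
        next_item = predict_next(pattern)  # step 2: generate a new element
        song.append(next_item)             # step 3: add it to the song...
        pattern.append(next_item)          # ...and to the pattern
        pattern.pop(0)                     # step 4: drop the first item
    return song                            # step 5 is the loop itself


def toy_predict_next(pattern):
    """Hypothetical stand-in for the model: last token plus one, modulo 5."""
    return (pattern[-1] + 1) % 5


song = generate_sequence(toy_predict_next, seed=[0, 1, 2], n_steps=4)
print(song)
```

Because one item is appended and one removed per step, the pattern always has the same length as the seed.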


In [None]:
# Generate notes from the neural network based on a seed sequence of notes.
# Pick a random sequence from the input as the starting point for the prediction.
# np.random.randint excludes the upper bound, so every index can be chosen.
start = np.random.randint(0, len(network_input))
# Map integer tokens back to note/chord names
int_to_note = dict(enumerate(pitchnames))
pattern = torch.from_numpy(network_input[start]).cuda()

# Generate 500 new elements that continue the seed sequence
prediction_output = model.generate(pattern, 500)


In [None]:
result_sample = []

# Convert each generated integer token back to its note/chord name
for i, token in enumerate(prediction_output):
    result = int_to_note[token.item()]
    print('Predicted', i, result)
    result_sample.append(result)

prediction_output = result_sample

Predicted 0 6
Predicted 1 4.6
Predicted 2 6.11
Predicted 3 6
Predicted 4 6.11
Predicted 5 A4
Predicted 6 4.6
Predicted 7 F4
Predicted 8 6
Predicted 9 6
Predicted 10 5.7.9.0
Predicted 11 2.3.7.10
Predicted 12 D5
Predicted 13 C5
Predicted 14 5.7.9.0
Predicted 15 C5
Predicted 16 4.6
Predicted 17 B-1
Predicted 18 10.2.5
Predicted 19 C5
Predicted 20 6.11
Predicted 21 6
Predicted 22 F2
Predicted 23 6.11
Predicted 24 4.6
Predicted 25 B-2
Predicted 26 B-1
Predicted 27 A4
Predicted 28 6
Predicted 29 C5
Predicted 30 E-3
Predicted 31 F2
Predicted 32 4.6
Predicted 33 5
Predicted 34 5.10
Predicted 35 4.6
Predicted 36 6
Predicted 37 4.6
Predicted 38 4.6
Predicted 39 F2
Predicted 40 4.6
Predicted 41 B-2
Predicted 42

The last step is to create a MIDI file from the predictions.

**music21** helps again here: we create a **Stream** and add the predicted notes and chords to it.

Each element is placed at an offset 0.5 (music21 measures offsets in quarter lengths) after the previous one, so that the elements do not all start at the same time.
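
Before building music21 objects, it may help to see how a predicted token is classified and scheduled. The sketch below uses only plain Python. `parse_token` and `schedule` are hypothetical helper names, and the rule it applies (a token containing a dot, or made only of digits, is a chord; anything else is a note name) mirrors the condition used in the next cell.

```python
def parse_token(token):
    """Classify a predicted token as a chord (list of ints) or a single note name."""
    if "." in token or token.isdigit():
        # Chords are encoded as dot-separated integers, e.g. "5.7.9.0"
        return ("chord", [int(p) for p in token.split(".")])
    # Anything else is a note name such as "A4" or "B-1"
    return ("note", token)


def schedule(tokens, step=0.5):
    """Pair each parsed token with its start offset, spaced `step` apart."""
    return [(i * step, *parse_token(tok)) for i, tok in enumerate(tokens)]


for offset, kind, value in schedule(["4.6", "A4", "6"]):
    print(offset, kind, value)
```

The next cell follows the same logic, but wraps each value in a `note.Note` or `chord.Chord` and sets its `offset`.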

In [None]:
offset = 0
output_notes = []
# Create note and chord objects based on the values generated by the model
for element in prediction_output:
    # element is a chord: dot-separated integers, or a single integer
    if ('.' in element) or element.isdigit():
        notes_in_chord = element.split('.')
        chord_notes = []
        for current_note in notes_in_chord:
            new_note = note.Note(int(current_note))
            new_note.storedInstrument = instrument.Piano()
            chord_notes.append(new_note)
        new_chord = chord.Chord(chord_notes)
        new_chord.offset = offset
        output_notes.append(new_chord)
    # element is a single note name, e.g. 'A4'
    else:
        new_note = note.Note(element)
        new_note.offset = offset
        new_note.storedInstrument = instrument.Piano()
        output_notes.append(new_note)

    # Increase the offset each iteration so that elements do not start together
    offset += 0.5

# Collect everything in a Stream and write it out as a MIDI file
midi_stream = stream.Stream(output_notes)
midi_stream.write('midi', fp='test_output.mid')

'test_output.mid'