<a href="https://colab.research.google.com/github/GiovanniSorice/Deep_Music_Generator/blob/main/notebooks/Music_Generation_Transformer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Transformer Music Generator 



In this notebook, we use an Transformer to generate some music.


**This notebook was inspired (and part of the code comes from it) by [Music_Generation_LSTM](https://colab.research.google.com/drive/19TQqekOlnOSW36VCL8CPVEQKBBukmaEQ#scrollTo=DDOBVWULXfpz)**




**Load dependencies**

In [1]:
pip install compressive_transformer_pytorch

Collecting compressive_transformer_pytorch
  Downloading https://files.pythonhosted.org/packages/30/39/b8caf2671abcb8615977c08766aa9f450addd6949f57c7dda87224e844b5/compressive_transformer_pytorch-0.3.20-py3-none-any.whl
Collecting mogrifier
  Downloading https://files.pythonhosted.org/packages/77/01/62a55d0f8048e788fce435f2ade6478f443e4e53ed9b89b55ba0fc42c198/mogrifier-0.0.3-py3-none-any.whl
Installing collected packages: mogrifier, compressive-transformer-pytorch
Successfully installed compressive-transformer-pytorch-0.3.20 mogrifier-0.0.3


In [2]:
import torch
import tqdm
import numpy as np
import pandas as pd
import tensorflow as tf
import os
from compressive_transformer_pytorch import CompressiveTransformer
from compressive_transformer_pytorch import AutoregressiveWrapper
from torchsummary import summary
from torch.utils.data import DataLoader, Dataset
from tensorflow.keras import utils
from sklearn.metrics import roc_auc_score 
import matplotlib.pyplot as plt
import glob
import pickle
from music21 import converter, instrument, stream, note, chord
import math
import shutil

In [3]:
# Set to false if you are not running
# this notebook in Google Colaboratory
run_on_colab = True

**Set hyperparameters**

In [4]:
# output directory name:
output_dir = '/content/drive/My Drive/ISPR_project/Transformer/'
current_path ='/content/drive/My Drive/ISPR_project/'
# training:
epochs = 2000
batch_size = 64
learning_rate=1e-2
# vector-space embedding: 
n_dim = 64 
sequence_length = 64


VALIDATE_EVERY  = 5

GENERATE_EVERY  = 500



**Save model function**

In [5]:
def save_checkpoint(state, is_best, filename='checkpoint.pth.tar'):
    torch.save(state, output_dir+filename)
    if is_best:
        shutil.copyfile(output_dir+filename, output_dir+'model_best.pth.tar')

**Google drive configuration (only Colab)**

In [6]:
if(run_on_colab):
  from google.colab import drive
  # This will prompt for authorization.
  drive.mount('/content/drive')

Mounted at /content/drive


**Load data** \\
Original MIDI files
 I have obtained  **MIDI files** from [The Lakh MIDI Dataset v0.1](https://colinraffel.com/projects/lmd/). 

## Processing data

Let's process the files, and load them into **music21**

In [7]:
file = current_path+"midi_songs/small_dataset/Metal/Metallica/Am I Evil?.mid"
midi = converter.parse(file)
notes_to_parse = midi.flat.notes
for element in notes_to_parse[:10]:
  print(element, element.offset)

<music21.chord.Chord E2 E3 B3 E4> 0.0
<music21.chord.Chord E2 E3 B3 E4> 0.0
<music21.note.Note E> 0.0
<music21.chord.Chord C2 C#3> 0.0
<music21.note.Note G#> 2.0
<music21.chord.Chord D3 A3 D4> 3.0
<music21.chord.Chord D3 A3 D4> 3.0
<music21.note.Note D> 3.0
<music21.chord.Chord C#3 C2> 3.0
<music21.chord.Chord B3 E3 E4> 3.5


I will process all MIDI files obtaining data from each note of chord.

- If I process a **note**, I will store in the list a string representing the pitch (the note name) and the octave.

- If I process a **chord** (Remember that chords are set of notes that are played at the same time) I will store a different type of string with numbers separated by dots. Each number represents the pitch of a chord note. 

As you can see, **I are not considering yet time offsets of each element**. In this first version, we won't consider them, so all the notes and chords will have the same duration. Maybe, in the future, I will consider them.

I are creating a big list with all the elements of all the compositions.

In [None]:
notes_for_instruments = []
for i,file in enumerate(glob.glob(current_path+"midi_songs/small_dataset/*/*/*.mid")):
      midi = converter.parse(file)
      print('Parsing file ', i, " ", file)
      notes_to_parse = None
      try:  # file has instrument parts
          s2 = instrument.partitionByInstrument(midi)
          notes_to_parse = s2.recurse()
      except:  # file has notes in a flat structure
          notes_to_parse = midi.flat.notes
      notes_instrument = []
      for element in notes_to_parse:
          if isinstance(element, note.Note):
              notes_instrument.append(str(element.pitch))
          elif isinstance(element, chord.Chord):
              notes_instrument.append('.'.join(str(n) for n in element.normalOrder))
      notes_for_instruments.append(notes_instrument)
with open(current_path + 'SMALL_notes_for_instruments', 'wb') as filepath:
    pickle.dump(notes_for_instruments, filepath)


Parsing file  0   /content/drive/My Drive/ISPR_project/midi_songs/small_dataset/Pop_rock/883/Nessun rimpianto.1.mid
Parsing file  1   /content/drive/My Drive/ISPR_project/midi_songs/small_dataset/Pop_rock/883/Grazie mille.1.mid
Parsing file  2   /content/drive/My Drive/ISPR_project/midi_songs/small_dataset/Pop_rock/883/Andra tutto bene ('58).1.mid
Parsing file  3   /content/drive/My Drive/ISPR_project/midi_songs/small_dataset/Pop_rock/883/Andra tutto bene ('58).mid
Parsing file  4   /content/drive/My Drive/ISPR_project/midi_songs/small_dataset/Pop_rock/883/Hanno ucciso l'uomo ragno.1.mid
Parsing file  5   /content/drive/My Drive/ISPR_project/midi_songs/small_dataset/Pop_rock/883/Hanno ucciso l'uomo ragno.mid
Parsing file  6   /content/drive/My Drive/ISPR_project/midi_songs/small_dataset/Pop_rock/test/I'll Be Over You.mid
Parsing file  7   /content/drive/My Drive/ISPR_project/midi_songs/small_dataset/Pop_rock/test/Non ti passa piu.mid
Parsing file  8   /content/drive/My Drive/ISPR_proje

In [None]:
notes_for_instruments_validation = []
for i,file in enumerate(glob.glob(current_path+"midi_songs/test/*.mid")):
      midi = converter.parse(file)
      print('Parsing file ', i, " ", file)
      notes_to_parse = None
      try:  # file has instrument parts
          s2 = instrument.partitionByInstrument(midi)
          notes_to_parse = s2.recurse()
      except:  # file has notes in a flat structure
          notes_to_parse = midi.flat.notes
      notes_instrument = []
      for element in notes_to_parse:
          if isinstance(element, note.Note):
              notes_instrument.append(str(element.pitch))
          elif isinstance(element, chord.Chord):
              notes_instrument.append('.'.join(str(n) for n in element.normalOrder))
      notes_for_instruments_validation.append(notes_instrument)
with open(current_path + 'SMALL_VALIDATION_notes_for_instruments', 'wb') as filepath:
    pickle.dump(notes_for_instruments_validation, filepath)


Parsing file  0   /content/drive/My Drive/ISPR_project/midi_songs/test/I Disappear.mid
Parsing file  1   /content/drive/My Drive/ISPR_project/midi_songs/test/Hit the Lights.mid
Parsing file  2   /content/drive/My Drive/ISPR_project/midi_songs/test/Fight Fire With Fire.mid
Parsing file  3   /content/drive/My Drive/ISPR_project/midi_songs/test/Smile.mid
Parsing file  4   /content/drive/My Drive/ISPR_project/midi_songs/test/Another One Bites The Dust.2.mid
Parsing file  5   /content/drive/My Drive/ISPR_project/midi_songs/test/Bicycle Race.1.mid
Parsing file  6   /content/drive/My Drive/ISPR_project/midi_songs/test/Se tornerai.1.mid
Parsing file  7   /content/drive/My Drive/ISPR_project/midi_songs/test/Non ti passa piu.mid
Parsing file  8   /content/drive/My Drive/ISPR_project/midi_songs/test/I'll Be Over You.mid


In [8]:
with open(current_path + 'SMALL_notes_for_instruments', 'rb') as f:
    notes_for_instruments = pickle.load(f)

In [9]:
with open(current_path + 'SMALL_VALIDATION_notes_for_instruments', 'rb') as f:
    notes_for_instruments_validation = pickle.load(f)

I obtain the number of different notes in our dataset, because this will be the **number of possible output classes**  of our model.

In [10]:
# Count different possible outputs
n_vocab = (len(set(item for notes_for_instrument in notes_for_instruments for item in notes_for_instrument)))
n_vocab

476

In [11]:
# Count different possible outputs valifation
print(len(set(item for notes_for_instrument in notes_for_instruments_validation for item in notes_for_instrument)))

287


**Preprocess data** \\
Now, there is some **data processing** that I have to do:

- I will map each pitch or chord to an integer
- I will create pairs of input sequences and its corresponding output note

I can try different **sequence_length** to obtain different results. In this first version, I will use a sequence_length of 100.

The network will made its prediction of the next note (or chord), based on the previous *sequence_length* notes (or chords). 


In [12]:
# get all pitch names
pitchnames_training = set(item for notes_for_instrument in notes_for_instruments for item in notes_for_instrument)
pitchnames_validation = set(item for notes_for_instrument in notes_for_instruments_validation for item in notes_for_instrument)
pitchnames = sorted(pitchnames_training.union(pitchnames_validation))

In [13]:
# create a dictionary to map pitches to integers
note_to_int = dict((note, number) for number, note in enumerate(pitchnames))
network_input = []
for notes in notes_for_instruments:
    if len(notes) - sequence_length<=0:
        print("canzone troppo corta")
    # create input sequences and the corresponding outputs
    for i in range(0, len(notes) - sequence_length, 1):
      # Map pitches of sequence_in to integers
      network_input.append([note_to_int[char] for char in notes[i:i + sequence_length]])
n_patterns = len(network_input)
# reshape the input into a format compatible with Transormer layers
network_input = np.reshape(network_input, (n_patterns, sequence_length))

In [14]:
# create a dictionary to map pitches to integers
note_to_int_validation = dict((notes_validation, number) for number, notes_validation in enumerate(pitchnames))
network_input_validation = []
network_output_validation = []
for notes_validation in notes_for_instruments:
    if len(notes) - sequence_length<=0:
        print("canzone troppo corta")
    # create input sequences and the corresponding outputs
    for i in range(0, len(notes_validation) - sequence_length, 1):
      # Map pitches of sequence_in to integers
      network_input_validation.append([note_to_int_validation[char] for char in notes_validation[i:i + sequence_length]])
n_patterns = len(network_input_validation)
# reshape the input into a format compatible with Transormer layers
network_input_validation = np.reshape(network_input_validation, (n_patterns, sequence_length))

Let's see the new metwork_input size

In [15]:
network_input.shape

(132668, 64)

**Design neural network architecture** 

In [16]:
def create_network(sequence_length, n_vocab):
    """ create the structure of the neural network """
    model = CompressiveTransformer(
    num_tokens = n_vocab,
    dim = sequence_length,
    depth = 6,
    seq_len = sequence_length,
    mem_len = sequence_length,
    cmem_len = 256,
    cmem_ratio = 4,
    memory_layers = [5,6]
    )

    model = AutoregressiveWrapper(model)
    model.cuda()
    return model

In [17]:
model = create_network(sequence_length,n_vocab)

print(model)


AutoregressiveWrapper(
  (net): CompressiveTransformer(
    (token_emb): Embedding(476, 64)
    (to_model_dim): Identity()
    (to_logits): Sequential(
      (0): Identity()
      (1): Linear(in_features=64, out_features=476, bias=True)
    )
    (attn_layers): ModuleList(
      (0): GRUGating(
        (fn): PreNorm(
          (norm): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
          (fn): SelfAttention(
            (compress_mem_fn): ConvCompress(
              (conv): Conv1d(64, 64, kernel_size=(4,), stride=(4,))
            )
            (to_q): Linear(in_features=64, out_features=64, bias=False)
            (to_kv): Linear(in_features=64, out_features=128, bias=False)
            (to_out): Linear(in_features=64, out_features=64, bias=True)
            (attn_dropout): Dropout(p=0.0, inplace=False)
            (dropout): Dropout(p=0.0, inplace=False)
            (reconstruction_attn_dropout): Dropout(p=0.0, inplace=False)
          )
        )
        (gru): GRUCell(64, 

In [18]:
def cycle(loader):
    while True:
        for data in loader:
          yield data


data_train = torch.from_numpy(network_input).cuda()
train_loader = torch.utils.data.DataLoader(data_train, batch_size=32) 
cycle_train_loader  = cycle(DataLoader(data_train, batch_size = data_train.shape[0]))
num_batches=math.ceil(data_train.shape[0]/batch_size) # Total number of batches

In [19]:
#Validation
data_validation = torch.from_numpy(network_input_validation).cuda()
validation_loader = torch.utils.data.DataLoader(data_validation, batch_size=32) 
cycle_validation_loader  = cycle(DataLoader(data_validation, batch_size = data_validation.shape[0]))
num_batches_val=math.ceil(data_validation.shape[0]/batch_size) # Total number of batches

In [20]:
# optimizer

optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

In case we want to use previously trained weights, to continue the training in the point we left it, we should load them into the model.

This is very useful in Google Colaboratory, that usually kills the virtual machine that is executing the Jupyter notework after a certime amount of time. If this happens to you, you should have to look for the last weights file in your configured Drive account and use it to train the network.


In [21]:
# In case we want to use previously trained weights
weights = "model_best.pth.tar"
checkpoint = torch.load("/content/drive/MyDrive/ISPR_project/Transformer/model_best.pth.tar")
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']


In [None]:
# training

for i in tqdm.tqdm(range(416,epochs), mininterval=20., desc='training'):
    model.train()
    tot_loss = 0.0
    is_best=0
    best_loss_value=n_vocab
    avg_loss_val=0
    for mlm_loss, aux_loss, is_last in model(next(cycle_train_loader), max_batch_size = batch_size, return_loss = True):
        loss = mlm_loss + aux_loss

        loss.backward()

        tot_loss+=loss;

        if is_last:
            torch.nn.utils.clip_grad_norm_(model.parameters(), 0.5)
            optimizer.step()
            optimizer.zero_grad()
    
    if i % VALIDATE_EVERY == 0 or i==epochs-1:
      model.eval()
      with torch.no_grad():
          for loss_val, aux_loss_val, is_last_val in model(next(cycle_validation_loader), max_batch_size = batch_size, return_loss = True):
            avg_loss_val+=loss_val/num_batches_val;

            if is_last_val:
              print(f'\n validation loss: {avg_loss_val.item():.4f}')


    avg_loss=tot_loss/num_batches

    if i%5==0 or i==epochs-1:
      if best_loss_value>avg_loss:
        best_loss_value=avg_loss;
        is_best=1

      save_checkpoint({
      'epoch': i,
      'model_state_dict': model.state_dict(),
      'optimizer_state_dict' : optimizer.state_dict(),
      'loss':avg_loss.item(),
     }, is_best, 'Tran_64_Checkpoint'+str(i)+'_'+"{:.4f}".format(avg_loss.item())+'.pth.tar')
      is_best=0
    print(f'\n Epoch: {i} |Training loss: {avg_loss.item():.4f}')
print('\nTraining complete.')






training:   0%|          | 1/1584 [01:12<32:04:56, 72.96s/it]


 Epoch: 416 |Training loss: 2.0246


training:   0%|          | 2/1584 [02:27<32:14:41, 73.38s/it]


 Epoch: 417 |Training loss: 2.0686


training:   0%|          | 3/1584 [03:41<32:22:09, 73.71s/it]


 Epoch: 418 |Training loss: 2.0392


training:   0%|          | 4/1584 [04:56<32:27:07, 73.94s/it]


 Epoch: 419 |Training loss: 2.0738

 validation loss: 2.0536


training:   0%|          | 5/1584 [06:36<35:54:22, 81.86s/it]


 Epoch: 420 |Training loss: 2.0560


training:   0%|          | 6/1584 [07:51<34:55:44, 79.69s/it]


 Epoch: 421 |Training loss: 2.0536


training:   0%|          | 7/1584 [09:05<34:14:41, 78.17s/it]


 Epoch: 422 |Training loss: 2.0498


training:   1%|          | 8/1584 [10:20<33:45:56, 77.13s/it]


 Epoch: 423 |Training loss: 2.0479


training:   1%|          | 9/1584 [11:35<33:26:54, 76.45s/it]


 Epoch: 424 |Training loss: 2.0496

 validation loss: 2.0389


training:   1%|          | 10/1584 [13:15<36:34:18, 83.65s/it]


 Epoch: 425 |Training loss: 2.0339


training:   1%|          | 11/1584 [14:30<35:22:23, 80.96s/it]


 Epoch: 426 |Training loss: 2.0389


training:   1%|          | 12/1584 [15:45<34:31:33, 79.07s/it]


 Epoch: 427 |Training loss: 2.0224


training:   1%|          | 13/1584 [16:59<33:53:11, 77.65s/it]


 Epoch: 428 |Training loss: 2.0277


training:   1%|          | 14/1584 [18:14<33:27:05, 76.70s/it]


 Epoch: 429 |Training loss: 2.0266

 validation loss: 2.0342


training:   1%|          | 15/1584 [19:54<36:32:22, 83.84s/it]


 Epoch: 430 |Training loss: 2.0132


training:   1%|          | 16/1584 [21:09<35:20:32, 81.14s/it]


 Epoch: 431 |Training loss: 2.0342


training:   1%|          | 17/1584 [22:24<34:29:31, 79.24s/it]


 Epoch: 432 |Training loss: 2.0151


training:   1%|          | 18/1584 [23:38<33:47:15, 77.67s/it]


 Epoch: 433 |Training loss: 2.0187


training:   1%|          | 19/1584 [24:51<33:15:16, 76.50s/it]


 Epoch: 434 |Training loss: 2.0148

 validation loss: 2.0088


training:   1%|▏         | 20/1584 [26:31<36:14:30, 83.42s/it]


 Epoch: 435 |Training loss: 2.0069


training:   1%|▏         | 21/1584 [27:45<34:55:57, 80.46s/it]


 Epoch: 436 |Training loss: 2.0088


training:   1%|▏         | 22/1584 [28:58<33:56:57, 78.24s/it]


 Epoch: 437 |Training loss: 1.9970


training:   1%|▏         | 23/1584 [30:11<33:17:30, 76.78s/it]


 Epoch: 438 |Training loss: 2.0082


training:   2%|▏         | 24/1584 [31:25<32:50:55, 75.80s/it]


 Epoch: 439 |Training loss: 1.9896

 validation loss: 1.9951


training:   2%|▏         | 25/1584 [33:03<35:49:21, 82.72s/it]


 Epoch: 440 |Training loss: 2.0022


training:   2%|▏         | 26/1584 [34:17<34:35:28, 79.93s/it]


 Epoch: 441 |Training loss: 1.9951


training:   2%|▏         | 27/1584 [35:30<33:43:17, 77.97s/it]


 Epoch: 442 |Training loss: 1.9927


training:   2%|▏         | 28/1584 [36:43<33:05:12, 76.55s/it]


 Epoch: 443 |Training loss: 1.9871


training:   2%|▏         | 29/1584 [37:57<32:39:47, 75.62s/it]


 Epoch: 444 |Training loss: 1.9835

 validation loss: 1.9802


training:   2%|▏         | 30/1584 [39:36<35:40:56, 82.66s/it]


 Epoch: 445 |Training loss: 1.9805


training:   2%|▏         | 31/1584 [40:49<34:27:31, 79.88s/it]


 Epoch: 446 |Training loss: 1.9802


training:   2%|▏         | 32/1584 [42:03<33:36:12, 77.95s/it]


 Epoch: 447 |Training loss: 1.9746


training:   2%|▏         | 33/1584 [43:16<32:58:35, 76.54s/it]


 Epoch: 448 |Training loss: 1.9750


training:   2%|▏         | 34/1584 [44:30<32:33:33, 75.62s/it]


 Epoch: 449 |Training loss: 1.9663

 validation loss: 1.9658


training:   2%|▏         | 35/1584 [46:09<35:33:22, 82.64s/it]


 Epoch: 450 |Training loss: 1.9629


training:   2%|▏         | 36/1584 [47:22<34:22:12, 79.93s/it]


 Epoch: 451 |Training loss: 1.9658


training:   2%|▏         | 37/1584 [48:36<33:30:31, 77.98s/it]


 Epoch: 452 |Training loss: 1.9611


training:   2%|▏         | 38/1584 [49:49<32:53:17, 76.58s/it]


 Epoch: 453 |Training loss: 1.9664


training:   2%|▏         | 39/1584 [51:02<32:27:51, 75.65s/it]


 Epoch: 454 |Training loss: 1.9556

 validation loss: 1.9538


training:   3%|▎         | 40/1584 [52:42<35:28:02, 82.70s/it]


 Epoch: 455 |Training loss: 1.9658


training:   3%|▎         | 41/1584 [53:55<34:14:37, 79.89s/it]


 Epoch: 456 |Training loss: 1.9538


training:   3%|▎         | 42/1584 [55:08<33:23:38, 77.96s/it]


 Epoch: 457 |Training loss: 1.9583


training:   3%|▎         | 43/1584 [56:22<32:48:53, 76.66s/it]


 Epoch: 458 |Training loss: 1.9489


training:   3%|▎         | 44/1584 [57:35<32:20:27, 75.60s/it]


 Epoch: 459 |Training loss: 1.9535

 validation loss: 1.9531


training:   3%|▎         | 45/1584 [59:15<35:25:46, 82.88s/it]


 Epoch: 460 |Training loss: 1.9660


training:   3%|▎         | 46/1584 [1:00:29<34:14:14, 80.14s/it]


 Epoch: 461 |Training loss: 1.9531


training:   3%|▎         | 47/1584 [1:01:42<33:23:43, 78.22s/it]


 Epoch: 462 |Training loss: 1.9432


training:   3%|▎         | 48/1584 [1:02:57<32:50:25, 76.97s/it]


 Epoch: 463 |Training loss: 1.9429


training:   3%|▎         | 49/1584 [1:04:10<32:24:30, 76.01s/it]


 Epoch: 464 |Training loss: 1.9444

 validation loss: 1.9359


training:   3%|▎         | 50/1584 [1:05:49<35:21:16, 82.97s/it]


 Epoch: 465 |Training loss: 1.9353


training:   3%|▎         | 51/1584 [1:07:03<34:07:33, 80.14s/it]


 Epoch: 466 |Training loss: 1.9359


training:   3%|▎         | 52/1584 [1:08:16<33:14:19, 78.11s/it]


 Epoch: 467 |Training loss: 1.9300


training:   3%|▎         | 53/1584 [1:09:30<32:35:43, 76.64s/it]


 Epoch: 468 |Training loss: 1.9344


training:   3%|▎         | 54/1584 [1:10:44<32:14:38, 75.87s/it]


 Epoch: 469 |Training loss: 1.9311

 validation loss: 1.9193


training:   3%|▎         | 55/1584 [1:12:23<35:14:58, 82.99s/it]


 Epoch: 470 |Training loss: 1.9303


training:   4%|▎         | 56/1584 [1:13:37<33:59:31, 80.09s/it]


 Epoch: 471 |Training loss: 1.9193


training:   4%|▎         | 57/1584 [1:14:50<33:05:12, 78.00s/it]


 Epoch: 472 |Training loss: 1.9273


training:   4%|▎         | 58/1584 [1:16:03<32:27:20, 76.57s/it]


 Epoch: 473 |Training loss: 1.9175


training:   4%|▎         | 59/1584 [1:17:16<31:59:59, 75.54s/it]


 Epoch: 474 |Training loss: 1.9386

 validation loss: 1.9409


training:   4%|▍         | 60/1584 [1:18:56<35:01:09, 82.72s/it]


 Epoch: 475 |Training loss: 1.9131


training:   4%|▍         | 61/1584 [1:20:09<33:50:28, 79.99s/it]


 Epoch: 476 |Training loss: 1.9409


training:   4%|▍         | 62/1584 [1:21:22<32:57:51, 77.97s/it]


 Epoch: 477 |Training loss: 1.9071


training:   4%|▍         | 63/1584 [1:22:36<32:22:59, 76.65s/it]


 Epoch: 478 |Training loss: 1.9904


training:   4%|▍         | 64/1584 [1:23:50<31:58:56, 75.75s/it]


 Epoch: 479 |Training loss: 1.9692

 validation loss: 1.9881


training:   4%|▍         | 65/1584 [1:25:29<34:56:47, 82.82s/it]


 Epoch: 480 |Training loss: 1.9567


training:   4%|▍         | 66/1584 [1:26:43<33:49:23, 80.21s/it]


 Epoch: 481 |Training loss: 1.9881


training:   4%|▍         | 67/1584 [1:27:57<32:57:21, 78.21s/it]


 Epoch: 482 |Training loss: 1.9411


training:   4%|▍         | 68/1584 [1:29:11<32:23:57, 76.94s/it]


 Epoch: 483 |Training loss: 1.9823


training:   4%|▍         | 69/1584 [1:30:24<31:58:54, 76.00s/it]


 Epoch: 484 |Training loss: 1.9808

 validation loss: 1.9717


training:   4%|▍         | 70/1584 [1:32:04<34:58:12, 83.15s/it]


 Epoch: 485 |Training loss: 1.9414


training:   4%|▍         | 71/1584 [1:33:18<33:47:16, 80.39s/it]


 Epoch: 486 |Training loss: 1.9717


training:   5%|▍         | 72/1584 [1:34:32<32:55:14, 78.38s/it]


 Epoch: 487 |Training loss: 1.9676


training:   5%|▍         | 73/1584 [1:35:46<32:20:00, 77.04s/it]


 Epoch: 488 |Training loss: 1.9418


training:   5%|▍         | 74/1584 [1:37:00<31:58:26, 76.23s/it]


 Epoch: 489 |Training loss: 1.9546

 validation loss: 1.9333


training:   5%|▍         | 75/1584 [1:38:40<34:57:17, 83.39s/it]


 Epoch: 490 |Training loss: 1.9424


training:   5%|▍         | 76/1584 [1:39:54<33:44:05, 80.53s/it]


 Epoch: 491 |Training loss: 1.9333


training:   5%|▍         | 77/1584 [1:41:08<32:53:16, 78.56s/it]


 Epoch: 492 |Training loss: 1.9321


training:   5%|▍         | 78/1584 [1:42:22<32:18:23, 77.23s/it]


 Epoch: 493 |Training loss: 1.9283


training:   5%|▍         | 79/1584 [1:43:36<31:53:14, 76.28s/it]


 Epoch: 494 |Training loss: 1.9126

 validation loss: 1.9195


training:   5%|▌         | 80/1584 [1:45:16<34:49:00, 83.34s/it]


 Epoch: 495 |Training loss: 1.9215


training:   5%|▌         | 81/1584 [1:46:30<33:36:27, 80.50s/it]


 Epoch: 496 |Training loss: 1.9195


training:   5%|▌         | 82/1584 [1:47:44<32:44:05, 78.46s/it]


 Epoch: 497 |Training loss: 1.9130


training:   5%|▌         | 83/1584 [1:48:58<32:10:42, 77.18s/it]


 Epoch: 498 |Training loss: 1.9083


training:   5%|▌         | 84/1584 [1:50:12<31:45:21, 76.21s/it]


 Epoch: 499 |Training loss: 1.9070

 validation loss: 1.9023


training:   5%|▌         | 85/1584 [1:51:52<34:41:28, 83.31s/it]


 Epoch: 500 |Training loss: 1.8956


training:   5%|▌         | 86/1584 [1:53:05<33:28:47, 80.46s/it]


 Epoch: 501 |Training loss: 1.9023


training:   5%|▌         | 87/1584 [1:54:19<32:37:47, 78.47s/it]


 Epoch: 502 |Training loss: 1.8891


training:   6%|▌         | 88/1584 [1:55:33<32:00:43, 77.03s/it]


 Epoch: 503 |Training loss: 1.8962


training:   6%|▌         | 89/1584 [1:56:47<31:36:03, 76.10s/it]


 Epoch: 504 |Training loss: 1.8800

 validation loss: 1.8709


training:   6%|▌         | 90/1584 [1:58:27<34:32:37, 83.24s/it]


 Epoch: 505 |Training loss: 1.9031


training:   6%|▌         | 91/1584 [1:59:41<33:21:58, 80.45s/it]


 Epoch: 506 |Training loss: 1.8709


training:   6%|▌         | 92/1584 [2:00:55<32:32:08, 78.50s/it]


 Epoch: 507 |Training loss: 1.9038


training:   6%|▌         | 93/1584 [2:02:08<31:54:36, 77.05s/it]


 Epoch: 508 |Training loss: 1.8841


training:   6%|▌         | 94/1584 [2:03:22<31:29:01, 76.07s/it]


 Epoch: 509 |Training loss: 1.9179

 validation loss: 1.8859


training:   6%|▌         | 95/1584 [2:05:02<34:24:55, 83.21s/it]


 Epoch: 510 |Training loss: 1.9083


training:   6%|▌         | 96/1584 [2:06:16<33:15:57, 80.48s/it]


 Epoch: 511 |Training loss: 1.8859


training:   6%|▌         | 97/1584 [2:07:30<32:25:47, 78.51s/it]


 Epoch: 512 |Training loss: 1.8992


training:   6%|▌         | 98/1584 [2:08:44<31:51:18, 77.17s/it]


 Epoch: 513 |Training loss: 1.8879


training:   6%|▋         | 99/1584 [2:09:58<31:27:48, 76.27s/it]


 Epoch: 514 |Training loss: 1.8835

 validation loss: 1.8796


training:   6%|▋         | 100/1584 [2:11:38<34:19:00, 83.25s/it]


 Epoch: 515 |Training loss: 1.8806


training:   6%|▋         | 101/1584 [2:12:52<33:09:53, 80.51s/it]


 Epoch: 516 |Training loss: 1.8796


training:   6%|▋         | 102/1584 [2:14:06<32:20:19, 78.56s/it]


 Epoch: 517 |Training loss: 1.8630


training:   7%|▋         | 103/1584 [2:15:20<31:48:14, 77.31s/it]


 Epoch: 518 |Training loss: 1.8781


training:   7%|▋         | 104/1584 [2:16:34<31:22:42, 76.33s/it]


 Epoch: 519 |Training loss: 1.8644

 validation loss: 1.8809


training:   7%|▋         | 105/1584 [2:18:14<34:15:47, 83.40s/it]


 Epoch: 520 |Training loss: 1.8589


training:   7%|▋         | 106/1584 [2:19:28<33:05:58, 80.62s/it]


 Epoch: 521 |Training loss: 1.8809


training:   7%|▋         | 107/1584 [2:20:42<32:16:09, 78.65s/it]


 Epoch: 522 |Training loss: 1.8515


training:   7%|▋         | 108/1584 [2:21:56<31:37:51, 77.15s/it]


 Epoch: 523 |Training loss: 1.8670


training:   7%|▋         | 109/1584 [2:23:10<31:13:22, 76.21s/it]


 Epoch: 524 |Training loss: 1.8690

 validation loss: 1.8579


training:   7%|▋         | 110/1584 [2:24:50<34:07:53, 83.36s/it]


 Epoch: 525 |Training loss: 1.8579


training:   7%|▋         | 111/1584 [2:26:04<32:56:32, 80.51s/it]


 Epoch: 526 |Training loss: 1.8579


training:   7%|▋         | 112/1584 [2:27:18<32:08:02, 78.59s/it]


 Epoch: 527 |Training loss: 1.8549


training:   7%|▋         | 113/1584 [2:28:32<31:31:09, 77.14s/it]


 Epoch: 528 |Training loss: 1.8512


training:   7%|▋         | 114/1584 [2:29:46<31:09:37, 76.31s/it]


 Epoch: 529 |Training loss: 1.8507

 validation loss: 1.8552


training:   7%|▋         | 115/1584 [2:31:26<34:02:31, 83.42s/it]


 Epoch: 530 |Training loss: 1.8423


training:   7%|▋         | 116/1584 [2:32:41<32:54:10, 80.69s/it]


 Epoch: 531 |Training loss: 1.8552


training:   7%|▋         | 117/1584 [2:33:55<32:04:44, 78.72s/it]


 Epoch: 532 |Training loss: 1.8311


training:   7%|▋         | 118/1584 [2:35:09<31:30:46, 77.39s/it]


 Epoch: 533 |Training loss: 1.8530


training:   8%|▊         | 119/1584 [2:36:23<31:04:49, 76.37s/it]


 Epoch: 534 |Training loss: 1.8330

 validation loss: 1.8270


training:   8%|▊         | 120/1584 [2:38:03<33:55:38, 83.43s/it]


 Epoch: 535 |Training loss: 1.8488


training:   8%|▊         | 121/1584 [2:39:17<32:44:08, 80.55s/it]


 Epoch: 536 |Training loss: 1.8270


training:   8%|▊         | 122/1584 [2:40:30<31:49:11, 78.35s/it]


 Epoch: 537 |Training loss: 1.8533


training:   8%|▊         | 123/1584 [2:41:44<31:13:53, 76.96s/it]


 Epoch: 538 |Training loss: 1.8467


training:   8%|▊         | 124/1584 [2:42:57<30:47:02, 75.91s/it]


 Epoch: 539 |Training loss: 1.8442

 validation loss: 1.8275


training:   8%|▊         | 125/1584 [2:44:36<33:35:53, 82.90s/it]


 Epoch: 540 |Training loss: 1.8451


training:   8%|▊         | 126/1584 [2:45:50<32:25:18, 80.05s/it]


 Epoch: 541 |Training loss: 1.8275


training:   8%|▊         | 127/1584 [2:47:03<31:36:37, 78.10s/it]


 Epoch: 542 |Training loss: 1.8293


training:   8%|▊         | 128/1584 [2:48:17<31:00:58, 76.69s/it]


 Epoch: 543 |Training loss: 1.8249


training:   8%|▊         | 129/1584 [2:49:29<30:30:38, 75.49s/it]


 Epoch: 544 |Training loss: 1.8173

 validation loss: 1.8171


training:   8%|▊         | 130/1584 [2:51:09<33:23:19, 82.67s/it]


 Epoch: 545 |Training loss: 1.8200


training:   8%|▊         | 131/1584 [2:52:22<32:15:36, 79.93s/it]


 Epoch: 546 |Training loss: 1.8171


training:   8%|▊         | 132/1584 [2:53:36<31:27:25, 77.99s/it]


 Epoch: 547 |Training loss: 1.8069


training:   8%|▊         | 133/1584 [2:54:49<30:54:57, 76.70s/it]


 Epoch: 548 |Training loss: 1.8157


training:   8%|▊         | 134/1584 [2:56:03<30:30:45, 75.76s/it]


 Epoch: 549 |Training loss: 1.8019

 validation loss: 1.8107


training:   9%|▊         | 135/1584 [2:57:42<33:18:44, 82.76s/it]


 Epoch: 550 |Training loss: 1.8147


training:   9%|▊         | 136/1584 [2:58:56<32:10:14, 79.98s/it]


 Epoch: 551 |Training loss: 1.8107


training:   9%|▊         | 137/1584 [3:00:09<31:21:40, 78.02s/it]


 Epoch: 552 |Training loss: 1.7990


training:   9%|▊         | 138/1584 [3:01:23<30:47:43, 76.67s/it]


 Epoch: 553 |Training loss: 1.8078


training:   9%|▉         | 139/1584 [3:02:36<30:22:37, 75.68s/it]


 Epoch: 554 |Training loss: 1.8006

 validation loss: 1.7847


training:   9%|▉         | 140/1584 [3:04:15<33:10:58, 82.73s/it]


 Epoch: 555 |Training loss: 1.7961


training:   9%|▉         | 141/1584 [3:05:28<32:01:10, 79.88s/it]


 Epoch: 556 |Training loss: 1.7847


training:   9%|▉         | 142/1584 [3:06:42<31:13:34, 77.96s/it]


 Epoch: 557 |Training loss: 1.7964


training:   9%|▉         | 143/1584 [3:07:56<30:41:39, 76.68s/it]


 Epoch: 558 |Training loss: 1.7879


training:   9%|▉         | 144/1584 [3:09:09<30:18:03, 75.75s/it]


 Epoch: 559 |Training loss: 1.7992

 validation loss: 1.8061


training:   9%|▉         | 145/1584 [3:10:49<33:06:46, 82.84s/it]


 Epoch: 560 |Training loss: 1.7819


training:   9%|▉         | 146/1584 [3:12:02<32:01:34, 80.18s/it]


 Epoch: 561 |Training loss: 1.8061


training:   9%|▉         | 147/1584 [3:13:16<31:14:57, 78.29s/it]


 Epoch: 562 |Training loss: 1.7711


training:   9%|▉         | 148/1584 [3:14:30<30:41:56, 76.96s/it]


 Epoch: 563 |Training loss: 1.8145


training:   9%|▉         | 149/1584 [3:15:44<30:19:24, 76.07s/it]


 Epoch: 564 |Training loss: 1.7894

 validation loss: 1.8038


training:   9%|▉         | 150/1584 [3:17:24<33:09:44, 83.25s/it]


 Epoch: 565 |Training loss: 1.8166


training:  10%|▉         | 151/1584 [3:18:38<32:01:06, 80.44s/it]


 Epoch: 566 |Training loss: 1.8038


training:  10%|▉         | 152/1584 [3:19:52<31:12:27, 78.46s/it]


 Epoch: 567 |Training loss: 1.8130


training:  10%|▉         | 153/1584 [3:21:06<30:39:11, 77.12s/it]


 Epoch: 568 |Training loss: 1.8186


training:  10%|▉         | 154/1584 [3:22:20<30:13:05, 76.07s/it]


 Epoch: 569 |Training loss: 1.7973

 validation loss: 1.7959


training:  10%|▉         | 155/1584 [3:24:00<33:02:37, 83.25s/it]


 Epoch: 570 |Training loss: 1.8160


training:  10%|▉         | 156/1584 [3:25:14<31:57:24, 80.56s/it]


 Epoch: 571 |Training loss: 1.7959


training:  10%|▉         | 157/1584 [3:26:27<31:06:36, 78.48s/it]


 Epoch: 572 |Training loss: 1.7944


training:  10%|▉         | 158/1584 [3:27:42<30:37:19, 77.31s/it]


 Epoch: 573 |Training loss: 1.7982


training:  10%|█         | 159/1584 [3:28:56<30:13:35, 76.36s/it]


 Epoch: 574 |Training loss: 1.7780

 validation loss: 1.7790


training:  10%|█         | 160/1584 [3:30:36<32:59:54, 83.42s/it]


 Epoch: 575 |Training loss: 1.7976


training:  10%|█         | 161/1584 [3:31:50<31:53:25, 80.68s/it]


 Epoch: 576 |Training loss: 1.7790


training:  10%|█         | 162/1584 [3:33:05<31:06:52, 78.77s/it]


 Epoch: 577 |Training loss: 1.7786


training:  10%|█         | 163/1584 [3:34:19<30:31:14, 77.32s/it]


 Epoch: 578 |Training loss: 1.7900


training:  10%|█         | 164/1584 [3:35:33<30:07:09, 76.36s/it]


 Epoch: 579 |Training loss: 1.7643

 validation loss: 1.7724


training:  10%|█         | 165/1584 [3:37:13<32:54:56, 83.51s/it]


 Epoch: 580 |Training loss: 1.7843


training:  10%|█         | 166/1584 [3:38:27<31:46:24, 80.67s/it]


 Epoch: 581 |Training loss: 1.7724


training:  11%|█         | 167/1584 [3:39:41<30:58:28, 78.69s/it]


 Epoch: 582 |Training loss: 1.7790


training:  11%|█         | 168/1584 [3:40:55<30:24:58, 77.33s/it]


 Epoch: 583 |Training loss: 1.7752


training:  11%|█         | 169/1584 [3:42:09<29:57:42, 76.23s/it]


 Epoch: 584 |Training loss: 1.7645

 validation loss: 1.7815


training:  11%|█         | 170/1584 [3:43:49<32:45:07, 83.39s/it]


 Epoch: 585 |Training loss: 1.7671


training:  11%|█         | 171/1584 [3:45:03<31:39:35, 80.66s/it]


 Epoch: 586 |Training loss: 1.7814


training:  11%|█         | 172/1584 [3:46:17<30:49:15, 78.58s/it]


 Epoch: 587 |Training loss: 1.7616


training:  11%|█         | 173/1584 [3:47:31<30:14:46, 77.17s/it]


 Epoch: 588 |Training loss: 1.7664


training:  11%|█         | 174/1584 [3:48:45<29:51:23, 76.23s/it]


 Epoch: 589 |Training loss: 1.7585

 validation loss: 1.7508


training:  11%|█         | 175/1584 [3:50:24<32:34:26, 83.23s/it]


 Epoch: 590 |Training loss: 1.7534


training:  11%|█         | 176/1584 [3:51:39<31:30:10, 80.55s/it]


 Epoch: 591 |Training loss: 1.7508


training:  11%|█         | 177/1584 [3:52:53<30:42:35, 78.58s/it]


 Epoch: 592 |Training loss: 1.7469


training:  11%|█         | 178/1584 [3:54:07<30:09:50, 77.23s/it]


 Epoch: 593 |Training loss: 1.7398


training:  11%|█▏        | 179/1584 [3:55:21<29:46:00, 76.27s/it]


 Epoch: 594 |Training loss: 1.7445

 validation loss: 1.7350


training:  11%|█▏        | 180/1584 [3:57:01<32:31:40, 83.40s/it]


 Epoch: 595 |Training loss: 1.7466


training:  11%|█▏        | 181/1584 [3:58:15<31:23:08, 80.53s/it]


 Epoch: 596 |Training loss: 1.7350


training:  11%|█▏        | 182/1584 [3:59:29<30:35:32, 78.55s/it]


 Epoch: 597 |Training loss: 1.7470


training:  12%|█▏        | 183/1584 [4:00:43<30:05:30, 77.32s/it]


 Epoch: 598 |Training loss: 1.7225


training:  12%|█▏        | 184/1584 [4:01:57<29:40:25, 76.30s/it]


 Epoch: 599 |Training loss: 1.7496

 validation loss: 1.7567


training:  12%|█▏        | 185/1584 [4:03:37<32:24:51, 83.41s/it]


 Epoch: 600 |Training loss: 1.7216


training:  12%|█▏        | 186/1584 [4:04:51<31:19:42, 80.67s/it]


 Epoch: 601 |Training loss: 1.7567


training:  12%|█▏        | 187/1584 [4:06:05<30:30:28, 78.62s/it]


 Epoch: 602 |Training loss: 1.7271


training:  12%|█▏        | 188/1584 [4:07:19<29:57:15, 77.25s/it]


 Epoch: 603 |Training loss: 1.7515


training:  12%|█▏        | 189/1584 [4:08:33<29:33:25, 76.28s/it]


 Epoch: 604 |Training loss: 1.7371

 validation loss: 1.7338


training:  12%|█▏        | 190/1584 [4:10:13<32:15:02, 83.29s/it]


 Epoch: 605 |Training loss: 1.7340


training:  12%|█▏        | 191/1584 [4:11:27<31:08:46, 80.49s/it]


 Epoch: 606 |Training loss: 1.7338


training:  12%|█▏        | 192/1584 [4:12:41<30:22:25, 78.55s/it]


 Epoch: 607 |Training loss: 1.7340


training:  12%|█▏        | 193/1584 [4:13:55<29:50:06, 77.22s/it]


 Epoch: 608 |Training loss: 1.7274


training:  12%|█▏        | 194/1584 [4:15:09<29:25:40, 76.22s/it]


 Epoch: 609 |Training loss: 1.7238

 validation loss: 1.7134


training:  12%|█▏        | 195/1584 [4:16:49<32:08:24, 83.30s/it]


 Epoch: 610 |Training loss: 1.7273


training:  12%|█▏        | 196/1584 [4:18:03<31:02:25, 80.51s/it]


 Epoch: 611 |Training loss: 1.7134


training:  12%|█▏        | 197/1584 [4:19:16<30:14:35, 78.50s/it]


 Epoch: 612 |Training loss: 1.7142


training:  12%|█▎        | 198/1584 [4:20:31<29:43:33, 77.21s/it]


 Epoch: 613 |Training loss: 1.7030


training:  13%|█▎        | 199/1584 [4:21:45<29:21:50, 76.33s/it]


 Epoch: 614 |Training loss: 1.7082

 validation loss: 1.6993


training:  13%|█▎        | 200/1584 [4:23:26<32:08:46, 83.62s/it]


 Epoch: 615 |Training loss: 1.7027


training:  13%|█▎        | 201/1584 [4:24:40<31:02:36, 80.81s/it]


 Epoch: 616 |Training loss: 1.6994


training:  13%|█▎        | 202/1584 [4:25:54<30:17:39, 78.91s/it]


 Epoch: 617 |Training loss: 1.7057


training:  13%|█▎        | 203/1584 [4:27:09<29:46:43, 77.63s/it]


 Epoch: 618 |Training loss: 1.7045


training:  13%|█▎        | 204/1584 [4:28:23<29:22:38, 76.64s/it]


 Epoch: 619 |Training loss: 1.7018

 validation loss: 1.6928


training:  13%|█▎        | 205/1584 [4:30:03<32:01:13, 83.59s/it]


 Epoch: 620 |Training loss: 1.6988


training:  13%|█▎        | 206/1584 [4:31:17<30:56:21, 80.83s/it]


 Epoch: 621 |Training loss: 1.6928


training:  13%|█▎        | 207/1584 [4:32:31<30:06:50, 78.73s/it]


 Epoch: 622 |Training loss: 1.6873


training:  13%|█▎        | 208/1584 [4:33:45<29:33:56, 77.35s/it]


 Epoch: 623 |Training loss: 1.6962


training:  13%|█▎        | 209/1584 [4:35:00<29:10:22, 76.38s/it]


 Epoch: 624 |Training loss: 1.6798


training:  13%|█▎        | 210/1584 [4:36:39<31:50:59, 83.45s/it]


 validation loss: 1.6744

 Epoch: 625 |Training loss: 1.6880


training:  13%|█▎        | 211/1584 [4:37:53<30:44:06, 80.59s/it]


 Epoch: 626 |Training loss: 1.6744


training:  13%|█▎        | 212/1584 [4:39:07<29:55:45, 78.53s/it]


 Epoch: 627 |Training loss: 1.6808


training:  13%|█▎        | 213/1584 [4:40:21<29:20:47, 77.06s/it]


 Epoch: 628 |Training loss: 1.6704


training:  14%|█▎        | 214/1584 [4:41:34<28:55:24, 76.00s/it]


 Epoch: 629 |Training loss: 1.6865


training:  14%|█▎        | 215/1584 [4:43:13<31:31:22, 82.89s/it]


 validation loss: 1.6694

 Epoch: 630 |Training loss: 1.6663


training:  14%|█▎        | 216/1584 [4:44:27<30:26:07, 80.09s/it]


 Epoch: 631 |Training loss: 1.6694


training:  14%|█▎        | 217/1584 [4:45:41<29:42:51, 78.25s/it]


 Epoch: 632 |Training loss: 1.6669


training:  14%|█▍        | 218/1584 [4:46:54<29:08:07, 76.78s/it]


 Epoch: 633 |Training loss: 1.6621


training:  14%|█▍        | 219/1584 [4:48:07<28:43:12, 75.75s/it]


 Epoch: 634 |Training loss: 1.6672

 validation loss: 1.6607


training:  14%|█▍        | 220/1584 [4:49:47<31:22:18, 82.80s/it]


 Epoch: 635 |Training loss: 1.6622


training:  14%|█▍        | 221/1584 [4:51:00<30:19:20, 80.09s/it]


 Epoch: 636 |Training loss: 1.6607


training:  14%|█▍        | 222/1584 [4:52:14<29:31:30, 78.04s/it]


 Epoch: 637 |Training loss: 1.6548


training:  14%|█▍        | 223/1584 [4:53:27<28:58:33, 76.64s/it]


 Epoch: 638 |Training loss: 1.6630


training:  14%|█▍        | 224/1584 [4:54:41<28:37:14, 75.76s/it]


 Epoch: 639 |Training loss: 1.6476

 validation loss: 1.6466


training:  14%|█▍        | 225/1584 [4:56:20<31:17:00, 82.87s/it]


 Epoch: 640 |Training loss: 1.6667


training:  14%|█▍        | 226/1584 [4:57:34<30:12:53, 80.10s/it]


 Epoch: 641 |Training loss: 1.6466


training:  14%|█▍        | 227/1584 [4:58:47<29:26:31, 78.11s/it]


 Epoch: 642 |Training loss: 1.6704


training:  14%|█▍        | 228/1584 [5:00:01<28:56:32, 76.84s/it]


 Epoch: 643 |Training loss: 1.6491


training:  14%|█▍        | 229/1584 [5:01:15<28:32:28, 75.83s/it]


 Epoch: 644 |Training loss: 1.6595

 validation loss: 1.6487


training:  15%|█▍        | 230/1584 [5:02:54<31:09:49, 82.86s/it]


 Epoch: 645 |Training loss: 1.6521


training:  15%|█▍        | 231/1584 [5:04:07<30:04:25, 80.02s/it]


 Epoch: 646 |Training loss: 1.6487


training:  15%|█▍        | 232/1584 [5:05:21<29:18:54, 78.06s/it]


 Epoch: 647 |Training loss: 1.6560


training:  15%|█▍        | 233/1584 [5:06:34<28:43:45, 76.55s/it]


 Epoch: 648 |Training loss: 1.6449


training:  15%|█▍        | 234/1584 [5:07:47<28:19:42, 75.54s/it]


 Epoch: 649 |Training loss: 1.6568

 validation loss: 1.6547


training:  15%|█▍        | 235/1584 [5:09:27<30:59:34, 82.71s/it]


 Epoch: 650 |Training loss: 1.6516


training:  15%|█▍        | 236/1584 [5:10:40<29:55:32, 79.92s/it]


 Epoch: 651 |Training loss: 1.6547


training:  15%|█▍        | 237/1584 [5:11:53<29:11:29, 78.02s/it]


 Epoch: 652 |Training loss: 1.6508


training:  15%|█▌        | 238/1584 [5:13:07<28:41:18, 76.73s/it]


 Epoch: 653 |Training loss: 1.6466


training:  15%|█▌        | 239/1584 [5:14:21<28:20:59, 75.88s/it]


 Epoch: 654 |Training loss: 1.6545

 validation loss: 1.6568


training:  15%|█▌        | 240/1584 [5:16:01<30:57:46, 82.94s/it]


 Epoch: 655 |Training loss: 1.6297


training:  15%|█▌        | 241/1584 [5:17:14<29:54:54, 80.19s/it]


 Epoch: 656 |Training loss: 1.6568


training:  15%|█▌        | 242/1584 [5:18:29<29:14:19, 78.44s/it]


 Epoch: 657 |Training loss: 1.6345


training:  15%|█▌        | 243/1584 [5:19:43<28:42:46, 77.08s/it]


 Epoch: 658 |Training loss: 1.6471


training:  15%|█▌        | 244/1584 [5:20:57<28:22:23, 76.23s/it]


 Epoch: 659 |Training loss: 1.6475

 validation loss: 1.6370


training:  15%|█▌        | 245/1584 [5:22:37<31:01:16, 83.40s/it]


 Epoch: 660 |Training loss: 1.6412


training:  16%|█▌        | 246/1584 [5:23:51<29:54:29, 80.47s/it]


 Epoch: 661 |Training loss: 1.6370


training:  16%|█▌        | 247/1584 [5:25:05<29:11:32, 78.60s/it]


 Epoch: 662 |Training loss: 1.6513


training:  16%|█▌        | 248/1584 [5:26:19<28:38:56, 77.20s/it]


 Epoch: 663 |Training loss: 1.6287


training:  16%|█▌        | 249/1584 [5:27:33<28:15:51, 76.22s/it]


 Epoch: 664 |Training loss: 1.6564

 validation loss: 1.6459


training:  16%|█▌        | 250/1584 [5:29:12<30:51:10, 83.26s/it]


 Epoch: 665 |Training loss: 1.6386


training:  16%|█▌        | 251/1584 [5:30:26<29:47:44, 80.47s/it]


 Epoch: 666 |Training loss: 1.6459


training:  16%|█▌        | 252/1584 [5:31:40<29:02:15, 78.48s/it]


 Epoch: 667 |Training loss: 1.6428


training:  16%|█▌        | 253/1584 [5:32:54<28:30:08, 77.09s/it]


 Epoch: 668 |Training loss: 1.6313


training:  16%|█▌        | 254/1584 [5:34:08<28:07:59, 76.15s/it]


 Epoch: 669 |Training loss: 1.6449

 validation loss: 1.6409


training:  16%|█▌        | 255/1584 [5:35:48<30:43:00, 83.21s/it]


 Epoch: 670 |Training loss: 1.6273


training:  16%|█▌        | 256/1584 [5:37:02<29:41:33, 80.49s/it]


 Epoch: 671 |Training loss: 1.6409


training:  16%|█▌        | 257/1584 [5:38:16<28:59:27, 78.65s/it]


 Epoch: 672 |Training loss: 1.6306


training:  16%|█▋        | 258/1584 [5:39:30<28:28:18, 77.30s/it]


 Epoch: 673 |Training loss: 1.6261


training:  16%|█▋        | 259/1584 [5:40:44<28:05:03, 76.30s/it]


 Epoch: 674 |Training loss: 1.6171

 validation loss: 1.6268


training:  16%|█▋        | 260/1584 [5:42:24<30:38:14, 83.30s/it]


 Epoch: 675 |Training loss: 1.6160


training:  16%|█▋        | 261/1584 [5:43:38<29:36:14, 80.56s/it]


 Epoch: 676 |Training loss: 1.6268


training:  17%|█▋        | 262/1584 [5:44:52<28:49:18, 78.49s/it]


 Epoch: 677 |Training loss: 1.6185


training:  17%|█▋        | 263/1584 [5:46:06<28:18:10, 77.13s/it]


 Epoch: 678 |Training loss: 1.6149


training:  17%|█▋        | 264/1584 [5:47:20<27:56:24, 76.20s/it]


 Epoch: 679 |Training loss: 1.6141

 validation loss: 1.6069


training:  17%|█▋        | 265/1584 [5:48:59<30:29:09, 83.21s/it]


 Epoch: 680 |Training loss: 1.6022


training:  17%|█▋        | 266/1584 [5:50:13<29:27:29, 80.46s/it]


 Epoch: 681 |Training loss: 1.6069


training:  17%|█▋        | 267/1584 [5:51:27<28:43:08, 78.50s/it]


 Epoch: 682 |Training loss: 1.5987


training:  17%|█▋        | 268/1584 [5:52:41<28:13:16, 77.20s/it]


 Epoch: 683 |Training loss: 1.6016


training:  17%|█▋        | 269/1584 [5:53:55<27:48:04, 76.11s/it]


 Epoch: 684 |Training loss: 1.5946

 validation loss: 1.5939


training:  17%|█▋        | 270/1584 [5:55:35<30:23:30, 83.27s/it]


 Epoch: 685 |Training loss: 1.5961


training:  17%|█▋        | 271/1584 [5:56:49<29:19:20, 80.40s/it]


 Epoch: 686 |Training loss: 1.5939


training:  17%|█▋        | 272/1584 [5:58:02<28:34:27, 78.40s/it]


 Epoch: 687 |Training loss: 1.5965


training:  17%|█▋        | 273/1584 [5:59:16<28:03:06, 77.03s/it]


 Epoch: 688 |Training loss: 1.5895


training:  17%|█▋        | 274/1584 [6:00:30<27:41:01, 76.08s/it]


 Epoch: 689 |Training loss: 1.5964

 validation loss: 1.5905


training:  17%|█▋        | 275/1584 [6:02:10<30:14:28, 83.17s/it]


 Epoch: 690 |Training loss: 1.5902


training:  17%|█▋        | 276/1584 [6:03:24<29:13:52, 80.45s/it]


 Epoch: 691 |Training loss: 1.5905


training:  17%|█▋        | 277/1584 [6:04:38<28:31:18, 78.56s/it]


 Epoch: 692 |Training loss: 1.5826


training:  18%|█▊        | 278/1584 [6:05:52<27:58:40, 77.12s/it]


 Epoch: 693 |Training loss: 1.5734


training:  18%|█▊        | 279/1584 [6:07:06<27:37:14, 76.20s/it]


 Epoch: 694 |Training loss: 1.5879

 validation loss: 1.5864


training:  18%|█▊        | 280/1584 [6:08:45<30:08:10, 83.20s/it]


 Epoch: 695 |Training loss: 1.5787


training:  18%|█▊        | 281/1584 [6:10:00<29:08:31, 80.52s/it]


 Epoch: 696 |Training loss: 1.5864


training:  18%|█▊        | 282/1584 [6:11:14<28:26:19, 78.63s/it]


 Epoch: 697 |Training loss: 1.5666


training:  18%|█▊        | 283/1584 [6:12:28<27:57:02, 77.34s/it]


 Epoch: 698 |Training loss: 1.5875


training:  18%|█▊        | 284/1584 [6:13:43<27:36:49, 76.47s/it]


 Epoch: 699 |Training loss: 1.5700

 validation loss: 1.5749


training:  18%|█▊        | 285/1584 [6:15:23<30:08:55, 83.55s/it]


 Epoch: 700 |Training loss: 1.6026


training:  18%|█▊        | 286/1584 [6:16:37<29:06:23, 80.73s/it]


 Epoch: 701 |Training loss: 1.5749


training:  18%|█▊        | 287/1584 [6:17:51<28:21:55, 78.73s/it]


 Epoch: 702 |Training loss: 1.5936


training:  18%|█▊        | 288/1584 [6:19:05<27:51:20, 77.38s/it]


 Epoch: 703 |Training loss: 1.5733


training:  18%|█▊        | 289/1584 [6:20:19<27:27:25, 76.33s/it]


 Epoch: 704 |Training loss: 1.6005

 validation loss: 1.5977


training:  18%|█▊        | 290/1584 [6:21:59<29:59:57, 83.46s/it]


 Epoch: 705 |Training loss: 1.5881


training:  18%|█▊        | 291/1584 [6:23:14<29:00:24, 80.76s/it]


 Epoch: 706 |Training loss: 1.5977


training:  18%|█▊        | 292/1584 [6:24:28<28:16:56, 78.80s/it]


 Epoch: 707 |Training loss: 1.5982


training:  18%|█▊        | 293/1584 [6:25:42<27:43:09, 77.30s/it]


 Epoch: 708 |Training loss: 1.5874


training:  19%|█▊        | 294/1584 [6:26:56<27:21:00, 76.33s/it]


 Epoch: 709 |Training loss: 1.5900

 validation loss: 1.5816


training:  19%|█▊        | 295/1584 [6:28:36<29:52:14, 83.42s/it]


 Epoch: 710 |Training loss: 1.5801


training:  19%|█▊        | 296/1584 [6:29:49<28:48:44, 80.53s/it]


 Epoch: 711 |Training loss: 1.5816


training:  19%|█▉        | 297/1584 [6:31:03<28:05:31, 78.58s/it]


 Epoch: 712 |Training loss: 1.5870


training:  19%|█▉        | 298/1584 [6:32:17<27:34:27, 77.19s/it]


 Epoch: 713 |Training loss: 1.5808


training:  19%|█▉        | 299/1584 [6:33:31<27:09:18, 76.08s/it]


 Epoch: 714 |Training loss: 1.6083

 validation loss: 1.6006


training:  19%|█▉        | 300/1584 [6:35:11<29:41:40, 83.26s/it]


 Epoch: 715 |Training loss: 1.5851


training:  19%|█▉        | 301/1584 [6:36:25<28:41:21, 80.50s/it]


 Epoch: 716 |Training loss: 1.6006


training:  19%|█▉        | 302/1584 [6:37:39<27:56:10, 78.45s/it]


 Epoch: 717 |Training loss: 1.5787


training:  19%|█▉        | 303/1584 [6:38:53<27:26:06, 77.10s/it]


 Epoch: 718 |Training loss: 1.6175


training:  19%|█▉        | 304/1584 [6:40:07<27:04:34, 76.15s/it]


 Epoch: 719 |Training loss: 1.5794

 validation loss: 1.6507


training:  19%|█▉        | 305/1584 [6:41:46<29:33:26, 83.19s/it]


 Epoch: 720 |Training loss: 1.6846


training:  19%|█▉        | 306/1584 [6:43:00<28:34:12, 80.48s/it]


 Epoch: 721 |Training loss: 1.6507


training:  19%|█▉        | 307/1584 [6:44:14<27:50:39, 78.50s/it]


 Epoch: 722 |Training loss: 1.6360


training:  19%|█▉        | 308/1584 [6:45:28<27:18:05, 77.03s/it]


 Epoch: 723 |Training loss: 1.6511


training:  20%|█▉        | 309/1584 [6:46:41<26:54:46, 75.99s/it]


 Epoch: 724 |Training loss: 1.6332

 validation loss: 1.6226


training:  20%|█▉        | 310/1584 [6:48:21<29:22:13, 82.99s/it]


 Epoch: 725 |Training loss: 1.6260


training:  20%|█▉        | 311/1584 [6:49:34<28:19:01, 80.08s/it]


 Epoch: 726 |Training loss: 1.6226


training:  20%|█▉        | 312/1584 [6:50:47<27:36:03, 78.12s/it]


 Epoch: 727 |Training loss: 1.6044


training:  20%|█▉        | 313/1584 [6:52:01<27:04:18, 76.68s/it]


 Epoch: 728 |Training loss: 1.6123


training:  20%|█▉        | 314/1584 [6:53:14<26:41:59, 75.68s/it]


 Epoch: 729 |Training loss: 1.6030

 validation loss: 1.5917


training:  20%|█▉        | 315/1584 [6:54:54<29:11:29, 82.81s/it]


 Epoch: 730 |Training loss: 1.5959


training:  20%|█▉        | 316/1584 [6:56:08<28:13:48, 80.15s/it]


 Epoch: 731 |Training loss: 1.5917


training:  20%|██        | 317/1584 [6:57:21<27:29:40, 78.12s/it]


 Epoch: 732 |Training loss: 1.5771


training:  20%|██        | 318/1584 [6:58:35<27:00:15, 76.79s/it]


 Epoch: 733 |Training loss: 1.5911


training:  20%|██        | 319/1584 [6:59:48<26:39:07, 75.85s/it]


 Epoch: 734 |Training loss: 1.5809

 validation loss: 1.5668


training:  20%|██        | 320/1584 [7:01:28<29:06:07, 82.89s/it]


 Epoch: 735 |Training loss: 1.5748


training:  20%|██        | 321/1584 [7:02:41<28:04:54, 80.04s/it]


 Epoch: 736 |Training loss: 1.5668


training:  20%|██        | 322/1584 [7:03:54<27:22:07, 78.07s/it]


 Epoch: 737 |Training loss: 1.5646


training:  20%|██        | 323/1584 [7:05:08<26:52:08, 76.71s/it]


 Epoch: 738 |Training loss: 1.5750


training:  20%|██        | 324/1584 [7:06:21<26:29:20, 75.68s/it]


 Epoch: 739 |Training loss: 1.5641

 validation loss: 1.5562


training:  21%|██        | 325/1584 [7:08:01<28:56:47, 82.77s/it]


 Epoch: 740 |Training loss: 1.5556


training:  21%|██        | 326/1584 [7:09:14<27:58:45, 80.07s/it]


 Epoch: 741 |Training loss: 1.5562


training:  21%|██        | 327/1584 [7:10:28<27:13:53, 77.99s/it]


 Epoch: 742 |Training loss: 1.5426


training:  21%|██        | 328/1584 [7:11:41<26:45:01, 76.67s/it]


 Epoch: 743 |Training loss: 1.5534


training:  21%|██        | 329/1584 [7:12:55<26:24:18, 75.74s/it]


 Epoch: 744 |Training loss: 1.5472

 validation loss: 1.5312


training:  21%|██        | 330/1584 [7:14:34<28:51:11, 82.83s/it]


 Epoch: 745 |Training loss: 1.5484


training:  21%|██        | 331/1584 [7:15:47<27:50:45, 80.00s/it]


 Epoch: 746 |Training loss: 1.5312


training:  21%|██        | 332/1584 [7:17:01<27:09:02, 78.07s/it]


 Epoch: 747 |Training loss: 1.5338


training:  21%|██        | 333/1584 [7:18:15<26:39:49, 76.73s/it]


 Epoch: 748 |Training loss: 1.5216


training:  21%|██        | 334/1584 [7:19:28<26:20:27, 75.86s/it]


 Epoch: 749 |Training loss: 1.5377

 validation loss: 1.5427


training:  21%|██        | 335/1584 [7:21:08<28:44:30, 82.84s/it]


 Epoch: 750 |Training loss: 1.5214


training:  21%|██        | 336/1584 [7:22:21<27:47:19, 80.16s/it]


 Epoch: 751 |Training loss: 1.5427


training:  21%|██▏       | 337/1584 [7:23:35<27:03:29, 78.12s/it]


 Epoch: 752 |Training loss: 1.5310


training:  21%|██▏       | 338/1584 [7:24:49<26:34:52, 76.80s/it]


 Epoch: 753 |Training loss: 1.5355


training:  21%|██▏       | 339/1584 [7:26:02<26:13:45, 75.84s/it]


 Epoch: 754 |Training loss: 1.5279

 validation loss: 1.5253


training:  21%|██▏       | 340/1584 [7:27:42<28:39:49, 82.95s/it]


 Epoch: 755 |Training loss: 1.5297


training:  22%|██▏       | 341/1584 [7:28:55<27:37:35, 80.01s/it]


 Epoch: 756 |Training loss: 1.5253


training:  22%|██▏       | 342/1584 [7:30:09<26:56:46, 78.11s/it]


 Epoch: 757 |Training loss: 1.5285


training:  22%|██▏       | 343/1584 [7:31:22<26:27:31, 76.75s/it]


 Epoch: 758 |Training loss: 1.5234


training:  22%|██▏       | 344/1584 [7:32:36<26:07:58, 75.87s/it]


 Epoch: 759 |Training loss: 1.5129

 validation loss: 1.5039


training:  22%|██▏       | 345/1584 [7:34:15<28:32:18, 82.92s/it]


 Epoch: 760 |Training loss: 1.5245


training:  22%|██▏       | 346/1584 [7:35:29<27:35:25, 80.23s/it]


 Epoch: 761 |Training loss: 1.5039


training:  22%|██▏       | 347/1584 [7:36:43<26:53:42, 78.27s/it]


 Epoch: 762 |Training loss: 1.5222


training:  22%|██▏       | 348/1584 [7:37:56<26:21:19, 76.76s/it]


 Epoch: 763 |Training loss: 1.5121


training:  22%|██▏       | 349/1584 [7:39:10<25:59:29, 75.77s/it]


 Epoch: 764 |Training loss: 1.5195

 validation loss: 1.5050


training:  22%|██▏       | 350/1584 [7:40:49<28:24:08, 82.86s/it]


 Epoch: 765 |Training loss: 1.5002


training:  22%|██▏       | 351/1584 [7:42:03<27:25:54, 80.09s/it]


 Epoch: 766 |Training loss: 1.5050


training:  22%|██▏       | 352/1584 [7:43:16<26:43:23, 78.09s/it]


 Epoch: 767 |Training loss: 1.4961


training:  22%|██▏       | 353/1584 [7:44:29<26:13:06, 76.67s/it]


 Epoch: 768 |Training loss: 1.5091


training:  22%|██▏       | 354/1584 [7:45:43<25:52:55, 75.75s/it]


 Epoch: 769 |Training loss: 1.5064

 validation loss: 1.4943


training:  22%|██▏       | 355/1584 [7:47:22<28:16:32, 82.83s/it]


 Epoch: 770 |Training loss: 1.4848


training:  22%|██▏       | 356/1584 [7:48:36<27:18:24, 80.05s/it]


 Epoch: 771 |Training loss: 1.4943


training:  23%|██▎       | 357/1584 [7:49:50<26:37:29, 78.12s/it]


 Epoch: 772 |Training loss: 1.4826


training:  23%|██▎       | 358/1584 [7:51:03<26:08:31, 76.76s/it]


 Epoch: 773 |Training loss: 1.4996


training:  23%|██▎       | 359/1584 [7:52:17<25:47:52, 75.81s/it]


 Epoch: 774 |Training loss: 1.4839

 validation loss: 1.4842


training:  23%|██▎       | 360/1584 [7:53:56<28:11:28, 82.92s/it]


 Epoch: 775 |Training loss: 1.5136


training:  23%|██▎       | 361/1584 [7:55:10<27:13:42, 80.15s/it]


 Epoch: 776 |Training loss: 1.4842


training:  23%|██▎       | 362/1584 [7:56:23<26:30:12, 78.08s/it]


 Epoch: 777 |Training loss: 1.4943


training:  23%|██▎       | 363/1584 [7:57:36<25:59:14, 76.62s/it]


 Epoch: 778 |Training loss: 1.4905


training:  23%|██▎       | 364/1584 [7:58:50<25:40:15, 75.75s/it]


 Epoch: 779 |Training loss: 1.4996

 validation loss: 1.4821


training:  23%|██▎       | 365/1584 [8:00:30<28:04:11, 82.90s/it]


 Epoch: 780 |Training loss: 1.4901


training:  23%|██▎       | 366/1584 [8:01:43<27:03:39, 79.98s/it]


 Epoch: 781 |Training loss: 1.4821


training:  23%|██▎       | 367/1584 [8:02:57<26:23:40, 78.08s/it]


 Epoch: 782 |Training loss: 1.4806


training:  23%|██▎       | 368/1584 [8:04:10<25:53:28, 76.65s/it]


 Epoch: 783 |Training loss: 1.4741


training:  23%|██▎       | 369/1584 [8:05:24<25:33:58, 75.75s/it]


 Epoch: 784 |Training loss: 1.4732

 validation loss: 1.4722


training:  23%|██▎       | 370/1584 [8:07:03<27:54:23, 82.75s/it]


 Epoch: 785 |Training loss: 1.4788


training:  23%|██▎       | 371/1584 [8:08:16<26:57:05, 79.99s/it]


 Epoch: 786 |Training loss: 1.4722


training:  23%|██▎       | 372/1584 [8:09:30<26:16:54, 78.06s/it]


 Epoch: 787 |Training loss: 1.4804


training:  24%|██▎       | 373/1584 [8:10:43<25:49:26, 76.77s/it]


 Epoch: 788 |Training loss: 1.4669


training:  24%|██▎       | 374/1584 [8:11:57<25:27:28, 75.74s/it]


 Epoch: 789 |Training loss: 1.4646

 validation loss: 1.4840


training:  24%|██▎       | 375/1584 [8:13:36<27:48:56, 82.83s/it]


 Epoch: 790 |Training loss: 1.4625


training:  24%|██▎       | 376/1584 [8:14:50<26:52:35, 80.10s/it]


 Epoch: 791 |Training loss: 1.4840


training:  24%|██▍       | 377/1584 [8:16:03<26:11:41, 78.13s/it]


 Epoch: 792 |Training loss: 1.4677


training:  24%|██▍       | 378/1584 [8:17:17<25:43:12, 76.78s/it]


 Epoch: 793 |Training loss: 1.4800


training:  24%|██▍       | 379/1584 [8:18:30<25:21:39, 75.77s/it]


 Epoch: 794 |Training loss: 1.4631

 validation loss: 1.4602


training:  24%|██▍       | 380/1584 [8:20:09<27:39:32, 82.70s/it]


 Epoch: 795 |Training loss: 1.4752


training:  24%|██▍       | 381/1584 [8:21:23<26:44:39, 80.03s/it]


 Epoch: 796 |Training loss: 1.4602


training:  24%|██▍       | 382/1584 [8:22:37<26:04:02, 78.07s/it]


 Epoch: 797 |Training loss: 1.4701


training:  24%|██▍       | 383/1584 [8:23:50<25:35:46, 76.73s/it]


 Epoch: 798 |Training loss: 1.4543


training:  24%|██▍       | 384/1584 [8:25:04<25:13:45, 75.69s/it]


 Epoch: 799 |Training loss: 1.4776

 validation loss: 1.4945


training:  24%|██▍       | 385/1584 [8:26:43<27:33:56, 82.77s/it]


 Epoch: 800 |Training loss: 1.4540


training:  24%|██▍       | 386/1584 [8:27:56<26:36:05, 79.94s/it]


 Epoch: 801 |Training loss: 1.4945


training:  24%|██▍       | 387/1584 [8:29:10<25:57:04, 78.05s/it]


 Epoch: 802 |Training loss: 1.4614


training:  24%|██▍       | 388/1584 [8:30:23<25:28:55, 76.70s/it]


 Epoch: 803 |Training loss: 1.4904


training:  25%|██▍       | 389/1584 [8:31:37<25:09:52, 75.81s/it]


 Epoch: 804 |Training loss: 1.4763

 validation loss: 1.4848


training:  25%|██▍       | 390/1584 [8:33:16<27:28:23, 82.83s/it]


 Epoch: 805 |Training loss: 1.4638


training:  25%|██▍       | 391/1584 [8:34:30<26:30:54, 80.01s/it]


 Epoch: 806 |Training loss: 1.4848


training:  25%|██▍       | 392/1584 [8:35:43<25:49:42, 78.01s/it]


 Epoch: 807 |Training loss: 1.4555


training:  25%|██▍       | 393/1584 [8:36:56<25:21:04, 76.63s/it]


 Epoch: 808 |Training loss: 1.4770


training:  25%|██▍       | 394/1584 [8:38:10<24:59:32, 75.61s/it]


 Epoch: 809 |Training loss: 1.4598

 validation loss: 1.4582


training:  25%|██▍       | 395/1584 [8:39:49<27:19:17, 82.72s/it]


 Epoch: 810 |Training loss: 1.4645


training:  25%|██▌       | 396/1584 [8:41:02<26:21:32, 79.88s/it]


 Epoch: 811 |Training loss: 1.4582


training:  25%|██▌       | 397/1584 [8:42:16<25:43:50, 78.04s/it]


 Epoch: 812 |Training loss: 1.4537


training:  25%|██▌       | 398/1584 [8:43:29<25:15:26, 76.67s/it]


 Epoch: 813 |Training loss: 1.4594


training:  25%|██▌       | 399/1584 [8:44:43<24:54:51, 75.69s/it]


 Epoch: 814 |Training loss: 1.4404

 validation loss: 1.4469


training:  25%|██▌       | 400/1584 [8:46:22<27:14:10, 82.81s/it]


 Epoch: 815 |Training loss: 1.4491


training:  25%|██▌       | 401/1584 [8:47:36<26:18:35, 80.06s/it]


 Epoch: 816 |Training loss: 1.4469


training:  25%|██▌       | 402/1584 [8:48:49<25:38:51, 78.11s/it]


 Epoch: 817 |Training loss: 1.4573


training:  25%|██▌       | 403/1584 [8:50:03<25:08:13, 76.62s/it]


 Epoch: 818 |Training loss: 1.4406


training:  26%|██▌       | 404/1584 [8:51:16<24:49:18, 75.73s/it]


 Epoch: 819 |Training loss: 1.4570

 validation loss: 1.4361


training:  26%|██▌       | 405/1584 [8:52:55<27:06:29, 82.77s/it]


 Epoch: 820 |Training loss: 1.4526


training:  26%|██▌       | 406/1584 [8:54:09<26:11:51, 80.06s/it]


 Epoch: 821 |Training loss: 1.4361


training:  26%|██▌       | 407/1584 [8:55:23<25:32:03, 78.10s/it]


 Epoch: 822 |Training loss: 1.4486


training:  26%|██▌       | 408/1584 [8:56:36<25:04:31, 76.76s/it]


 Epoch: 823 |Training loss: 1.4250


training:  26%|██▌       | 409/1584 [8:57:49<24:41:40, 75.66s/it]


 Epoch: 824 |Training loss: 1.4690


training:  26%|██▌       | 410/1584 [8:59:29<27:00:14, 82.81s/it]


 validation loss: 1.4667

 Epoch: 825 |Training loss: 1.4385


training:  26%|██▌       | 411/1584 [9:00:43<26:05:34, 80.08s/it]


 Epoch: 826 |Training loss: 1.4667


training:  26%|██▌       | 412/1584 [9:01:56<25:24:00, 78.02s/it]


 Epoch: 827 |Training loss: 1.4638


training:  26%|██▌       | 413/1584 [9:03:10<24:58:05, 76.76s/it]


 Epoch: 828 |Training loss: 1.4427


training:  26%|██▌       | 414/1584 [9:04:23<24:39:25, 75.87s/it]


 Epoch: 829 |Training loss: 1.4652

 validation loss: 1.5004


training:  26%|██▌       | 415/1584 [9:06:03<26:54:57, 82.89s/it]


 Epoch: 830 |Training loss: 1.5128


training:  26%|██▋       | 416/1584 [9:07:17<26:00:37, 80.17s/it]


 Epoch: 831 |Training loss: 1.5004


training:  26%|██▋       | 417/1584 [9:08:30<25:22:04, 78.26s/it]


 Epoch: 832 |Training loss: 1.5159


training:  26%|██▋       | 418/1584 [9:09:44<24:53:35, 76.86s/it]


 Epoch: 833 |Training loss: 1.4966


training:  26%|██▋       | 419/1584 [9:10:58<24:33:27, 75.89s/it]


 Epoch: 834 |Training loss: 1.5022

 validation loss: 1.4706


training:  27%|██▋       | 420/1584 [9:12:37<26:51:17, 83.06s/it]


 Epoch: 835 |Training loss: 1.5098


training:  27%|██▋       | 421/1584 [9:13:51<25:53:38, 80.15s/it]


 Epoch: 836 |Training loss: 1.4706


training:  27%|██▋       | 422/1584 [9:15:04<25:14:11, 78.19s/it]


 Epoch: 837 |Training loss: 1.4983


training:  27%|██▋       | 423/1584 [9:16:18<24:46:10, 76.80s/it]


 Epoch: 838 |Training loss: 1.4621


training:  27%|██▋       | 424/1584 [9:17:31<24:25:36, 75.81s/it]


 Epoch: 839 |Training loss: 1.4934

 validation loss: 1.4977


training:  27%|██▋       | 425/1584 [9:19:11<26:43:10, 82.99s/it]


 Epoch: 840 |Training loss: 1.5093


training:  27%|██▋       | 426/1584 [9:20:25<25:47:27, 80.18s/it]


 Epoch: 841 |Training loss: 1.4977


training:  27%|██▋       | 427/1584 [9:21:38<25:05:59, 78.10s/it]


 Epoch: 842 |Training loss: 1.4848


training:  27%|██▋       | 428/1584 [9:22:52<24:40:00, 76.82s/it]


 Epoch: 843 |Training loss: 1.4848


training:  27%|██▋       | 429/1584 [9:24:06<24:21:41, 75.93s/it]


 Epoch: 844 |Training loss: 1.4747

 validation loss: 1.4630


training:  27%|██▋       | 430/1584 [9:25:45<26:35:22, 82.95s/it]


 Epoch: 845 |Training loss: 1.4701


training:  27%|██▋       | 431/1584 [9:26:59<25:41:11, 80.20s/it]


 Epoch: 846 |Training loss: 1.4630


training:  27%|██▋       | 432/1584 [9:28:12<25:00:41, 78.16s/it]


 Epoch: 847 |Training loss: 1.4644


training:  27%|██▋       | 433/1584 [9:29:25<24:31:05, 76.69s/it]


 Epoch: 848 |Training loss: 1.4559


training:  27%|██▋       | 434/1584 [9:30:39<24:12:17, 75.77s/it]


 Epoch: 849 |Training loss: 1.4472

 validation loss: 1.4504


training:  27%|██▋       | 435/1584 [9:32:19<26:27:31, 82.90s/it]


 Epoch: 850 |Training loss: 1.4471


training:  28%|██▊       | 436/1584 [9:33:32<25:30:08, 79.97s/it]


 Epoch: 851 |Training loss: 1.4504


training:  28%|██▊       | 437/1584 [9:34:45<24:51:12, 78.01s/it]


 Epoch: 852 |Training loss: 1.4409


training:  28%|██▊       | 438/1584 [9:35:59<24:24:27, 76.67s/it]


 Epoch: 853 |Training loss: 1.4330


training:  28%|██▊       | 439/1584 [9:37:12<24:03:27, 75.64s/it]


 Epoch: 854 |Training loss: 1.4308

 validation loss: 1.4177


training:  28%|██▊       | 440/1584 [9:38:51<26:17:32, 82.74s/it]


 Epoch: 855 |Training loss: 1.4238


training:  28%|██▊       | 441/1584 [9:40:05<25:24:44, 80.04s/it]


 Epoch: 856 |Training loss: 1.4177


training:  28%|██▊       | 442/1584 [9:41:18<24:43:33, 77.94s/it]


 Epoch: 857 |Training loss: 1.4177


training:  28%|██▊       | 443/1584 [9:42:32<24:18:09, 76.68s/it]


 Epoch: 858 |Training loss: 1.4167


training:  28%|██▊       | 444/1584 [9:43:46<23:59:56, 75.79s/it]


 Epoch: 859 |Training loss: 1.4107

 validation loss: 1.4126


training:  28%|██▊       | 445/1584 [9:45:25<26:12:27, 82.83s/it]


 Epoch: 860 |Training loss: 1.4188


training:  28%|██▊       | 446/1584 [9:46:39<25:20:23, 80.16s/it]


 Epoch: 861 |Training loss: 1.4126


training:  28%|██▊       | 447/1584 [9:47:52<24:40:17, 78.12s/it]


 Epoch: 862 |Training loss: 1.4135


training:  28%|██▊       | 448/1584 [9:49:06<24:15:00, 76.85s/it]


 Epoch: 863 |Training loss: 1.4089


training:  28%|██▊       | 449/1584 [9:50:19<23:54:18, 75.82s/it]


 Epoch: 864 |Training loss: 1.4076

 validation loss: 1.4096


training:  28%|██▊       | 450/1584 [9:51:59<26:07:50, 82.95s/it]


 Epoch: 865 |Training loss: 1.4096


training:  28%|██▊       | 451/1584 [9:53:13<25:12:57, 80.12s/it]


 Epoch: 866 |Training loss: 1.4096


training:  29%|██▊       | 452/1584 [9:54:26<24:32:55, 78.07s/it]


 Epoch: 867 |Training loss: 1.3984


training:  29%|██▊       | 453/1584 [9:55:40<24:06:57, 76.76s/it]


 Epoch: 868 |Training loss: 1.3997


training:  29%|██▊       | 454/1584 [9:56:53<23:49:46, 75.92s/it]


 Epoch: 869 |Training loss: 1.4028

 validation loss: 1.3993


training:  29%|██▊       | 455/1584 [9:58:33<26:00:42, 82.94s/it]


 Epoch: 870 |Training loss: 1.3852


training:  29%|██▉       | 456/1584 [9:59:46<25:04:49, 80.04s/it]


 Epoch: 871 |Training loss: 1.3993


training:  29%|██▉       | 457/1584 [10:01:00<24:27:23, 78.12s/it]


 Epoch: 872 |Training loss: 1.3816


**Music generation**

In [None]:
# In case we want to use previously trained weights
weights = "model_best.pth.tar"
checkpoint = torch.load(output_dir+weights)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']


In [None]:
# Generate network input again
network_input = []
network_output = []
for i in range(0, len(notes) - sequence_length, 1):
  network_input.append([note_to_int[char] for char in notes[i:i + sequence_length]])
n_patterns = len(network_input)
network_input = np.reshape(network_input, (n_patterns, sequence_length))


The workflow now is:


1.   Pick a **seed sequence** randomly from your list of inputs (*pattern* variable)
2.   Pass it as input for your model to generate a new element (note or chord)
3.   Add the new element to your final song and to your *pattern* list
4.   Remove the first item from *pattern*
5.   Go to step 2


In [None]:
""" Generate notes from the neural network based on a sequence of notes """
# pick a random sequence from the input as a starting point for the prediction
start = np.random.randint(0, len(network_input)-1)
int_to_note = dict((number, note) for number, note in enumerate(pitchnames))
pattern = torch.from_numpy(network_input[start]).cuda()

prediction_output = model.generate(pattern, 500)


In [None]:
result_sample=[]

for i in range(500):
  print(i)
  result = int_to_note[prediction_output[i].item()]
  print('\r', 'Predicted ', i, " ",result, end='')
  result_sample.append(result)

prediction_output=result_sample

0
 Predicted  0   61
 Predicted  1   4.62
 Predicted  2   6.113
 Predicted  3   64
 Predicted  4   6.115
 Predicted  5   A46
 Predicted  6   4.67
 Predicted  7   F48
 Predicted  8   69
 Predicted  9   610
 Predicted  10   5.7.9.011
 Predicted  11   2.3.7.1012
 Predicted  12   D513
 Predicted  13   C514
 Predicted  14   5.7.9.015
 Predicted  15   C516
 Predicted  16   4.617
 Predicted  17   B-118
 Predicted  18   10.2.519
 Predicted  19   C520
 Predicted  20   6.1121
 Predicted  21   622
 Predicted  22   F223
 Predicted  23   6.1124
 Predicted  24   4.625
 Predicted  25   B-226
 Predicted  26   B-127
 Predicted  27   A428
 Predicted  28   629
 Predicted  29   C530
 Predicted  30   E-331
 Predicted  31   F232
 Predicted  32   4.633
 Predicted  33   534
 Predicted  34   5.1035
 Predicted  35   4.636
 Predicted  36   637
 Predicted  37   4.638
 Predicted  38   4.639
 Predicted  39   F240
 Predicted  40   4.641
 Predicted  41   B-242
 Predicted  42

The last step is creating a MIDI file from the predictions.

**music21** will help us again for this task. We should create a **Stream** and add to it the predicted notes and chords.

We are adding an offset of 0.5 between elements.

In [None]:
offset = 0
output_notes = []
# create note and chord objects based on the values generated by the model
for pattern in prediction_output:
    # pattern is a chord
    if ('.' in pattern) or pattern.isdigit():
        notes_in_chord = pattern.split('.')
        notes = []
        for current_note in notes_in_chord:
            new_note = note.Note(int(current_note))
            new_note.storedInstrument = instrument.Piano()
            notes.append(new_note)
        new_chord = chord.Chord(notes)
        new_chord.offset = offset
        output_notes.append(new_chord)
    # pattern is a note
    else:
        new_note = note.Note(pattern)
        new_note.offset = offset
        new_note.storedInstrument = instrument.Piano()
        output_notes.append(new_note)

    # increase offset each iteration so that notes do not stack
    offset += 0.5

midi_stream = stream.Stream(output_notes)
midi_stream.write('midi', fp='test_output.mid')

'test_output.mid'