
``` 
ROMEO:
And from the embracement be spokes to stand,
As we shall breathest to the market-fairly maid
So month in my father, I may see thee not my side
And love the prisoner like a cradist of my daughter.
```

The text above is not a lost work of Shakespeare but text generated entirely by a GPT-2-like model that I trained on my laptop in less than 20 minutes. In this tutorial, we will walk through an implementation of the "Attention Is All You Need" paper so that you can generate your own Shakespeare at home.

In [1]:
import os
import torch
from dataset import getData, getVocabSize
import pickle
from contextlib import nullcontext
from utils import train, inference
import math
import torch.nn as nn
from torch.nn import functional as F

Below, we define all the parameters used for training and for describing the model. Feel free to modify any of them except those marked with ``DO NOT MODIFY``.

In [2]:
class TrainConfig:

    # Parameters to modify:
    batch_size: int = 64  # Number of sequences per batch (one batch per training step)
    max_iters: int = 2000  # Total number of training iterations
    learning_rate: float=1e-3 # Learning rate
    grad_clip: float=1.0 # Maximum magnitude of gradient
    eval_interval: int=50 # How often to evaluate the model
    eval_iters: int=10 # Number of iterations to average for evaluation
    seed: int=1337 # Random seed (can change the results)
    device: str = 'cuda' if torch.cuda.is_available() else 'cpu'

    # These are responsible for correct training given GPU (DO NOT MODIFY)
    dtype: str =  'bfloat16' if torch.cuda.is_available() and torch.cuda.is_bf16_supported() else 'float16'
    ptdtype = {'float32': torch.float32, 'bfloat16': torch.bfloat16, 'float16': torch.float16}[dtype]
    ctx = nullcontext() if device == 'cpu' else torch.amp.autocast(device_type=device, dtype=ptdtype)
    scaler = torch.cuda.amp.GradScaler(enabled=(dtype == 'float16'))
    
    # Populated by the script (DO NOT MODIFY)
    train_dataloader: None
    test_dataloader: None
    optimizer: None

class ModelConfig:
    context_length: int = 256 # Number of tokens used for prediction
    vocab_size: int = -1 # Number of tokens in the vocabulary (DO NOT MODIFY; it is set from the data, and changing it here can leave the model unable to represent part of the vocabulary)
    n_layer: int = 6 # Depth of the Transformer model (here: 6 Transformer Blocks)
    n_head: int = 6 # Number of heads in the Multi-Head Attention
    n_embd: int = 384 # Embedding dimension
    dropout: float = 0.2 # Dropout fraction; a higher fraction regularizes more strongly but usually needs longer training (adjust max_iters accordingly)
    bias: bool = False # Whether or not to use a bias in the attention projections (Wq, Wk, Wv)
    compile: bool = False # Whether to use torch.compile (slow start while compiling; faster training afterwards)
    attn_dim: int = n_embd//n_head # Attention dimension (DO NOT MODIFY; changing the number here can break the model)


model_config = ModelConfig()
train_config = TrainConfig()
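
One subtlety in `ModelConfig` is worth spelling out: `attn_dim` is computed once, when the class body is executed, so reassigning `n_embd` or `n_head` on an instance afterwards does not update it. The sketch below uses a hypothetical `MiniConfig` class (not part of the tutorial code) to illustrate the effect and one possible way to recompute the derived value.

```python
class MiniConfig:
    # Hypothetical stand-in for ModelConfig, reduced to the relevant fields.
    n_head: int = 6
    n_embd: int = 384
    attn_dim: int = n_embd // n_head  # evaluated once, at class-definition time


cfg = MiniConfig()
cfg.n_embd = 768                # changing the instance attribute afterwards...
stale_attn_dim = cfg.attn_dim   # ...leaves the derived value at 384 // 6

# Recompute the derived value by hand and check that the heads tile n_embd exactly.
assert cfg.n_embd % cfg.n_head == 0, "n_embd must be divisible by n_head"
cfg.attn_dim = cfg.n_embd // cfg.n_head
fresh_attn_dim = cfg.attn_dim

print(stale_attn_dim, fresh_attn_dim)
```

In practice, it is probably safest to edit the values directly in the class definition and re-run the cell, rather than patching them on `model_config` later.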



Below, we fix the random seed and enable CUDA optimizations. The two flags control whether TensorFloat-32 (TF32) tensor cores may be used for matrix multiplications and cuDNN operations on Ampere or newer GPUs. TF32 offers a significant speed-up, but it is not available on older GPUs.

In [3]:
torch.manual_seed(train_config.seed)
torch.backends.cuda.matmul.allow_tf32 = True # allow tf32 on matmul
torch.backends.cudnn.allow_tf32 = True # allow tf32 on cudnn

Next, we load the data. Here, we obtain the vocabulary size needed for training and perform a simple train/test split. No need to change anything here.

In [4]:
# Load data
data_dir = os.path.join('data', 'Shakespeare')
model_config.vocab_size = getVocabSize(data_dir)
train_config.train_dataloader, train_config.test_dataloader = getData(data_dir,model_config,train_config)

A simple definition of a feed-forward layer. No need to change anything here.

In [5]:
# Define feed forward network
class FeedForwardNetwork(nn.Module):
    def __init__(self, config:ModelConfig):
        super().__init__()
        self.ffn = nn.Sequential(
            nn.Linear(config.n_embd, config.n_embd * 4),
            nn.ReLU(),
            nn.Linear(config.n_embd * 4, config.n_embd),
            nn.Dropout(config.dropout)
        )

    def forward(self, x):
        return self.ffn(x)

### IMPLEMENTATION REQUIRED - Implement ``attention(self,q,k,v,T)`` of the Attention Module

Below, we define the attention layer of the Transformer model. Here, you need to implement the attention mechanism. We define the attention as:
$$\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$$
However, plain attention can easily overfit to the data. To alleviate that, we introduce an additional dropout layer. In addition, because the model predicts each token from the preceding ones only, the implementation masks out future positions (sets their weights to $-\infty$) before the softmax. For your convenience, we split the implementation into two steps:
$$\text{weights} = \frac{QK^T}{\sqrt{d_k}}$$
$$\text{attention} = \text{dropout}(\text{softmax}(\text{weights}))V$$

In [6]:
class Attention(nn.Module):
    def __init__(self, config:ModelConfig):
        super().__init__()
        self.Wq = nn.Linear(config.n_embd, config.attn_dim, bias=config.bias)
        self.Wk = nn.Linear(config.n_embd, config.attn_dim, bias=config.bias)
        self.Wv = nn.Linear(config.n_embd, config.attn_dim, bias=config.bias)
        self.dropout = nn.Dropout(config.dropout)
        self.register_buffer("mask", torch.tril(torch.ones(config.context_length, config.context_length, requires_grad=False)))

    def forward(self, x):
        B, T, C = x.shape
        q = self.Wq(x)
        k = self.Wk(x)
        v = self.Wv(x)
        return self.attention(q, k, v, T)

    def attention(self, q, k, v, T):
        dk = k.size(-1)
        weights = (q @ k.transpose(-2, -1)) / torch.sqrt(torch.tensor(dk, dtype=q.dtype, device=q.device))
        weights = weights.masked_fill(self.mask[:T, :T] == 0, float('-inf'))
        attn = torch.softmax(weights, dim=-1)
        attn = self.dropout(attn)
        out = attn @ v
        return out
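
To make the effect of the causal mask concrete, here is a minimal worked example in plain Python (no PyTorch, so the arithmetic stays visible). The function name `causal_attention_weights` and the score values are made up for illustration, and the dropout step is left out; it only sketches the masking and softmax stages of `attention`.

```python
import math


def causal_attention_weights(scores):
    """Row-wise softmax of scaled scores where position t may only see positions <= t."""
    T = len(scores)
    weights = []
    for t in range(T):
        # Future positions are set to -inf, so they contribute exp(-inf) == 0.0.
        row = [scores[t][s] if s <= t else float("-inf") for s in range(T)]
        row_max = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - row_max) for v in row]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Hypothetical, already-scaled scores QK^T / sqrt(d_k) for a sequence of 3 tokens.
scores = [
    [0.0, 2.0, 1.0],
    [1.0, 1.0, 3.0],
    [0.5, 0.5, 0.5],
]
weights = causal_attention_weights(scores)
for row in weights:
    print([round(w, 3) for w in row])
```

Every row sums to 1, and everything above the diagonal is exactly zero: the first token can only attend to itself, the second splits its attention between the first two positions, and so on.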


### IMPLEMENTATION REQUIRED - Implement ``forward(self,x)`` of the MultiHeadAttention Module

Below, we define the multi-head attention layer of the Transformer model. Here, you need to implement the multi-head attention mechanism defined as:
$$MultiHead(x) = \text{Dropout}(\text{Concat}(\text{head}_1, ..., \text{head}_{\text{heads}})W^O),$$
$$ \text{where head}_i = \text{Attention}(x)$$ 

In [7]:
# Define Multi-head Attention
class MultiHeadAttention(nn.Module):
    def __init__(self, config:ModelConfig):
        super().__init__()
        self.config = config
        self.heads = nn.ModuleList([Attention(config) for _ in range(self.config.n_head)])
        self.projection_layer = nn.Linear(self.config.n_embd, self.config.n_embd)
        self.dropout = nn.Dropout(self.config.dropout)

    def forward(self, x):
        head_outputs = [head(x) for head in self.heads]
        concat = torch.cat(head_outputs, dim=-1)
        projected = self.projection_layer(concat)
        projected = self.dropout(projected)
        return projected


Finally, we are able to define the standard Transformer Block. No changes required here.

In [8]:
# Define Transformer Block
class TransformerBlock(nn.Module):
    def __init__(self, config:ModelConfig):
        super().__init__()
        self.ln1 = nn.LayerNorm(config.n_embd)
        self.ln2 = nn.LayerNorm(config.n_embd)
        self.mha = MultiHeadAttention(config)
        self.ffn = FeedForwardNetwork(config)

    def forward(self, x):
        x = x + self.mha(self.ln1(x))
        x = x + self.ffn(self.ln2(x))
        return x

### IMPLEMENTATION REQUIRED - Implement ``__init__`` of Positional Encoding

Below, we define the Positional Encoding of the Transformer architecture. The positional encoding gives a specific value based on the token position in the input data. Therefore, a positional encoding can be seen as a feature defined only based on the position of each token. We can precompute it as:
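$$PE(\text{pos}, 2i) = \sin(\text{pos}/div)$$
$$PE(\text{pos}, 2i+1) = \cos(\text{pos}/div),$$
where $div = 10000^{2i/d_{model}}$, $\text{pos}$ is the position of the token in the sequence, and $i$ indexes pairs of embedding dimensions. The first equation defines the encoding for the even embedding dimensions, and the second one defines it for the odd embedding dimensions.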

In [9]:
class PositionalEncoding(nn.Module):

    def __init__(self, config:ModelConfig):
        super().__init__()
        pos = torch.arange(0, config.context_length, requires_grad=False).unsqueeze(1)
        div = torch.exp(torch.arange(0, config.n_embd, 2) * (math.log(10000.0) / config.n_embd))
        pe = torch.zeros(config.context_length, config.n_embd, requires_grad=False)
        pe[:, 0::2] = torch.sin(pos / div)
        pe[:, 1::2] = torch.cos(pos / div)
        self.register_buffer('pe', pe)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pe[:x.size(1),:]
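
As a sanity check on the formulas, the following plain-Python sketch evaluates them directly for a tiny table. The helper name `positional_encoding` and the sizes (3 positions, 4 dimensions) are chosen only for illustration and assume an even `d_model`.

```python
import math


def positional_encoding(context_length, d_model):
    """Return a context_length x d_model table of sinusoidal encodings (d_model even)."""
    pe = [[0.0] * d_model for _ in range(context_length)]
    for pos in range(context_length):
        for i in range(d_model // 2):
            div = 10000 ** (2 * i / d_model)
            pe[pos][2 * i] = math.sin(pos / div)      # even dimension
            pe[pos][2 * i + 1] = math.cos(pos / div)  # odd dimension
    return pe


pe = positional_encoding(context_length=3, d_model=4)
for row in pe:
    print([round(v, 4) for v in row])
```

Position 0 always encodes to alternating 0 and 1 (since $\sin(0)=0$ and $\cos(0)=1$), and later dimension pairs vary more slowly with the position because their divisor is larger.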


Now, we define our model. We combine all our blocks into the final Transformer model, which consists of multiple Transformer blocks.

In [10]:
# Define the model
class Model(nn.Module):
    def __init__(self, config:ModelConfig):
        super().__init__()
        self.tok_embedding = nn.Embedding(config.vocab_size, config.n_embd)
        self.pos_embedding = PositionalEncoding(config)
        self.transformer_blocks = nn.Sequential(*(
                [TransformerBlock(config) for _ in range(config.n_layer)] +
                [nn.LayerNorm(config.n_embd)]
        ))
        self.model_out_linear_layer = nn.Linear(config.n_embd, config.vocab_size)
        self.drop = nn.Dropout(config.dropout)
        self.context_length = config.context_length

    def forward(self, idx:torch.Tensor):
        _, T = idx.shape
        pos_emb = self.pos_embedding(idx)
        tok_emb = self.tok_embedding(idx)

        x = self.transformer_blocks(self.drop(tok_emb+pos_emb))
        logits = self.model_out_linear_layer(x)
        return logits
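
To get a feeling for the size of this model, the sketch below counts the trainable parameters by hand from the layer shapes defined above. The helper functions are hypothetical, and `vocab_size=65` is only a placeholder, since the real value is read from the data by `getVocabSize`. The count assumes PyTorch's defaults for `nn.Linear` (with bias) and `nn.LayerNorm` (learnable scale and shift); the causal mask and the sinusoidal positional encoding are buffers, so they add no parameters.

```python
def linear_params(n_in, n_out, bias=True):
    """Parameters of a fully connected layer: weight matrix plus optional bias."""
    return n_in * n_out + (n_out if bias else 0)


def block_params(n_embd, n_head, attn_bias=False):
    """Parameters of one Transformer block as defined in this tutorial."""
    attn_dim = n_embd // n_head
    # Each head has three projections (Wq, Wk, Wv) of shape n_embd x attn_dim.
    heads = n_head * 3 * linear_params(n_embd, attn_dim, bias=attn_bias)
    projection = linear_params(n_embd, n_embd)  # output projection of the multi-head layer
    ffn = linear_params(n_embd, 4 * n_embd) + linear_params(4 * n_embd, n_embd)
    layer_norms = 2 * (2 * n_embd)              # two LayerNorms, each with scale and shift
    return heads + projection + ffn + layer_norms


def model_params(vocab_size, n_embd, n_head, n_layer):
    """Parameters of the full model: embedding, blocks, final norm, output layer."""
    embedding = vocab_size * n_embd
    blocks = n_layer * block_params(n_embd, n_head)
    final_norm = 2 * n_embd
    output_layer = linear_params(n_embd, vocab_size)
    return embedding + blocks + final_norm + output_layer


per_block = block_params(n_embd=384, n_head=6)
total = model_params(vocab_size=65, n_embd=384, n_head=6, n_layer=6)  # 65 is a placeholder
print(per_block, total)
```

With the default configuration, almost all parameters sit in the six Transformer blocks, and about two thirds of each block belong to the feed-forward network.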

Now, we can initialize the model and, optionally, compile it.

In [11]:
# Initialize the model
model = Model(model_config).to(train_config.device)
if model_config.compile:
    model = torch.compile(model)

Finally, we can start the optimization process and start our training! This will take a bit...

In [12]:
# Create the optimizer and train; Losses updated every eval_interval steps
train_config.optimizer = torch.optim.AdamW(model.parameters(), lr=train_config.learning_rate)
train(model,train_config)

  0%|          | 0/2000 [00:52<?, ?it/s, Training Loss: 4.42 Validation Loss: 4.391]

  2%|▎         | 50/2000 [10:26<5:48:58, 10.74s/it, Training Loss: 2.487 Validation Loss: 2.732]

  5%|▌         | 100/2000 [19:37<5:14:14,  9.92s/it, Training Loss: 2.396 Validation Loss: 2.641]

  8%|▊         | 150/2000 [28:38<4:59:45,  9.72s/it, Training Loss: 2.265 Validation Loss: 2.476]

 10%|█         | 200/2000 [37:37<4:56:48,  9.89s/it, Training Loss: 2.076 Validation Loss: 2.254]

 11%|█▏        | 227/2000 [42:04<5:01:44, 10.21s/it, Training Loss: 2.076 Validation Loss: 2.254]

 11%|█▏        | 228/2000 [42:14<5:01:22, 10.20s/it, Training Loss: 2.076 Validation Loss: 2.254]

 11%|█▏        | 229/2000 [42:23<4:53:43,  9.95s/it, Training Loss: 2.076 Validation Loss: 2.254]

 12%|█▏        | 230/2000 [42:33<4:50:06,  9.83s/it, Training Loss: 2.076 Validation Loss: 2.254]

 12%|█▏        | 231/2000 [42:42<4:48:06,  9.77s/it, Training Loss: 2.076 Validation Loss: 2.254]

 12%|█▏        | 232/2000 [42:52<4:46:38,  9.73s/it, Training Loss: 2.076 Validation Loss: 2.254]

 12%|█▏        | 233/2000 [43:01<4:44:11,  9.65s/it, Training Loss: 2.076 Validation Loss: 2.254]

 12%|█▏        | 234/2000 [43:12<4:48:57,  9.82s/it, Training Loss: 2.076 Validation Loss: 2.254]

 12%|█▏        | 235/2000 [43:21<4:46:21,  9.73s/it, Training Loss: 2.076 Validation Loss: 2.254]

 12%|█▏        | 236/2000 [43:31<4:42:58,  9.62s/it, Training Loss: 2.076 Validation Loss: 2.254]

 12%|█▏        | 237/2000 [43:40<4:43:26,  9.65s/it, Training Loss: 2.076 Validation Loss: 2.254]

 12%|█▏        | 238/2000 [43:51<4:48:35,  9.83s/it, Training Loss: 2.076 Validation Loss: 2.254]

 12%|█▏        | 239/2000 [44:00<4:48:39,  9.84s/it, Training Loss: 2.076 Validation Loss: 2.254]

 12%|█▏        | 240/2000 [44:10<4:47:05,  9.79s/it, Training Loss: 2.076 Validation Loss: 2.254]

 12%|█▏        | 241/2000 [44:20<4:49:33,  9.88s/it, Training Loss: 2.076 Validation Loss: 2.254]

 12%|█▏        | 242/2000 [44:30<4:48:31,  9.85s/it, Training Loss: 2.076 Validation Loss: 2.254]

 12%|█▏        | 243/2000 [44:39<4:45:11,  9.74s/it, Training Loss: 2.076 Validation Loss: 2.254]

 12%|█▏        | 244/2000 [44:49<4:43:16,  9.68s/it, Training Loss: 2.076 Validation Loss: 2.254]

 12%|█▏        | 245/2000 [45:00<4:55:43, 10.11s/it, Training Loss: 2.076 Validation Loss: 2.254]

 12%|█▏        | 246/2000 [45:10<4:54:31, 10.07s/it, Training Loss: 2.076 Validation Loss: 2.254]

 12%|█▏        | 247/2000 [45:19<4:47:54,  9.85s/it, Training Loss: 2.076 Validation Loss: 2.254]

 12%|█▏        | 248/2000 [45:29<4:45:01,  9.76s/it, Training Loss: 2.076 Validation Loss: 2.254]

 12%|█▏        | 249/2000 [45:40<4:52:21, 10.02s/it, Training Loss: 2.076 Validation Loss: 2.254]

 12%|█▎        | 250/2000 [45:50<4:53:46, 10.07s/it, Training Loss: 2.076 Validation Loss: 2.254]

 12%|█▎        | 250/2000 [46:51<4:53:46, 10.07s/it, Training Loss: 1.929 Validation Loss: 2.106]

 13%|█▎        | 251/2000 [47:03<14:07:50, 29.09s/it, Training Loss: 1.929 Validation Loss: 2.106]

 13%|█▎        | 252/2000 [47:15<11:35:40, 23.88s/it, Training Loss: 1.929 Validation Loss: 2.106]

 13%|█▎        | 253/2000 [47:25<9:30:47, 19.60s/it, Training Loss: 1.929 Validation Loss: 2.106] 

 13%|█▎        | 254/2000 [47:34<8:02:29, 16.58s/it, Training Loss: 1.929 Validation Loss: 2.106]

 13%|█▎        | 255/2000 [47:44<7:00:50, 14.47s/it, Training Loss: 1.929 Validation Loss: 2.106]

 13%|█▎        | 256/2000 [47:54<6:28:16, 13.36s/it, Training Loss: 1.929 Validation Loss: 2.106]

 13%|█▎        | 257/2000 [48:04<5:52:56, 12.15s/it, Training Loss: 1.929 Validation Loss: 2.106]

 13%|█▎        | 258/2000 [48:13<5:31:14, 11.41s/it, Training Loss: 1.929 Validation Loss: 2.106]

 13%|█▎        | 259/2000 [48:23<5:16:52, 10.92s/it, Training Loss: 1.929 Validation Loss: 2.106]

 13%|█▎        | 260/2000 [48:33<5:08:00, 10.62s/it, Training Loss: 1.929 Validation Loss: 2.106]

 13%|█▎        | 261/2000 [48:43<5:01:03, 10.39s/it, Training Loss: 1.929 Validation Loss: 2.106]

 13%|█▎        | 262/2000 [48:53<4:54:38, 10.17s/it, Training Loss: 1.929 Validation Loss: 2.106]

 13%|█▎        | 263/2000 [49:02<4:50:06, 10.02s/it, Training Loss: 1.929 Validation Loss: 2.106]

 13%|█▎        | 264/2000 [49:12<4:51:16, 10.07s/it, Training Loss: 1.929 Validation Loss: 2.106]

 13%|█▎        | 265/2000 [49:23<4:52:11, 10.10s/it, Training Loss: 1.929 Validation Loss: 2.106]

 13%|█▎        | 266/2000 [49:32<4:47:55,  9.96s/it, Training Loss: 1.929 Validation Loss: 2.106]

 13%|█▎        | 267/2000 [49:42<4:45:38,  9.89s/it, Training Loss: 1.929 Validation Loss: 2.106]

 13%|█▎        | 268/2000 [49:52<4:43:50,  9.83s/it, Training Loss: 1.929 Validation Loss: 2.106]

 13%|█▎        | 269/2000 [50:01<4:40:39,  9.73s/it, Training Loss: 1.929 Validation Loss: 2.106]

 14%|█▎        | 270/2000 [50:11<4:38:25,  9.66s/it, Training Loss: 1.929 Validation Loss: 2.106]

 14%|█▎        | 271/2000 [50:20<4:38:02,  9.65s/it, Training Loss: 1.929 Validation Loss: 2.106]

 14%|█▎        | 272/2000 [50:31<4:43:18,  9.84s/it, Training Loss: 1.929 Validation Loss: 2.106]

 14%|█▎        | 273/2000 [50:40<4:40:38,  9.75s/it, Training Loss: 1.929 Validation Loss: 2.106]

 14%|█▎        | 274/2000 [50:50<4:39:21,  9.71s/it, Training Loss: 1.929 Validation Loss: 2.106]

 14%|█▍        | 275/2000 [50:59<4:38:06,  9.67s/it, Training Loss: 1.929 Validation Loss: 2.106]

 14%|█▍        | 276/2000 [51:10<4:44:58,  9.92s/it, Training Loss: 1.929 Validation Loss: 2.106]

 14%|█▍        | 277/2000 [51:19<4:41:28,  9.80s/it, Training Loss: 1.929 Validation Loss: 2.106]

 14%|█▍        | 278/2000 [51:29<4:37:31,  9.67s/it, Training Loss: 1.929 Validation Loss: 2.106]

 14%|█▍        | 279/2000 [51:38<4:36:31,  9.64s/it, Training Loss: 1.929 Validation Loss: 2.106]

 14%|█▍        | 280/2000 [51:49<4:43:10,  9.88s/it, Training Loss: 1.929 Validation Loss: 2.106]

 14%|█▍        | 281/2000 [51:58<4:39:51,  9.77s/it, Training Loss: 1.929 Validation Loss: 2.106]

 14%|█▍        | 282/2000 [52:08<4:37:45,  9.70s/it, Training Loss: 1.929 Validation Loss: 2.106]

 14%|█▍        | 283/2000 [52:17<4:35:25,  9.62s/it, Training Loss: 1.929 Validation Loss: 2.106]

 14%|█▍        | 284/2000 [52:27<4:40:33,  9.81s/it, Training Loss: 1.929 Validation Loss: 2.106]

 14%|█▍        | 285/2000 [52:37<4:36:42,  9.68s/it, Training Loss: 1.929 Validation Loss: 2.106]

 14%|█▍        | 286/2000 [52:46<4:35:29,  9.64s/it, Training Loss: 1.929 Validation Loss: 2.106]

 14%|█▍        | 287/2000 [52:56<4:32:41,  9.55s/it, Training Loss: 1.929 Validation Loss: 2.106]

 14%|█▍        | 288/2000 [53:06<4:37:00,  9.71s/it, Training Loss: 1.929 Validation Loss: 2.106]

 14%|█▍        | 289/2000 [53:16<4:37:06,  9.72s/it, Training Loss: 1.929 Validation Loss: 2.106]

 14%|█▍        | 290/2000 [53:25<4:35:03,  9.65s/it, Training Loss: 1.929 Validation Loss: 2.106]

 15%|█▍        | 291/2000 [53:34<4:33:15,  9.59s/it, Training Loss: 1.929 Validation Loss: 2.106]

 15%|█▍        | 292/2000 [53:45<4:38:30,  9.78s/it, Training Loss: 1.929 Validation Loss: 2.106]

 15%|█▍        | 293/2000 [53:54<4:34:54,  9.66s/it, Training Loss: 1.929 Validation Loss: 2.106]

 15%|█▍        | 294/2000 [54:04<4:35:30,  9.69s/it, Training Loss: 1.929 Validation Loss: 2.106]

 15%|█▍        | 295/2000 [54:14<4:35:17,  9.69s/it, Training Loss: 1.929 Validation Loss: 2.106]

 15%|█▍        | 296/2000 [54:23<4:33:25,  9.63s/it, Training Loss: 1.929 Validation Loss: 2.106]

 15%|█▍        | 297/2000 [54:33<4:33:56,  9.65s/it, Training Loss: 1.929 Validation Loss: 2.106]

 15%|█▍        | 298/2000 [54:42<4:33:43,  9.65s/it, Training Loss: 1.929 Validation Loss: 2.106]

 15%|█▍        | 299/2000 [54:52<4:32:06,  9.60s/it, Training Loss: 1.929 Validation Loss: 2.106]

 15%|█▌        | 300/2000 [55:01<4:31:22,  9.58s/it, Training Loss: 1.929 Validation Loss: 2.106]

 15%|█▌        | 300/2000 [55:55<4:31:22,  9.58s/it, Training Loss: 1.811 Validation Loss: 2.003]

 15%|█▌        | 301/2000 [56:06<12:14:48, 25.95s/it, Training Loss: 1.811 Validation Loss: 2.003]

 15%|█▌        | 302/2000 [56:16<10:03:49, 21.34s/it, Training Loss: 1.811 Validation Loss: 2.003]

 15%|█▌        | 303/2000 [56:26<8:29:46, 18.02s/it, Training Loss: 1.811 Validation Loss: 2.003] 

 15%|█▌        | 304/2000 [56:37<7:24:27, 15.72s/it, Training Loss: 1.811 Validation Loss: 2.003]

 15%|█▌        | 305/2000 [56:48<6:46:09, 14.38s/it, Training Loss: 1.811 Validation Loss: 2.003]

 15%|█▌        | 306/2000 [56:58<6:04:46, 12.92s/it, Training Loss: 1.811 Validation Loss: 2.003]

 15%|█▌        | 307/2000 [57:07<5:33:56, 11.83s/it, Training Loss: 1.811 Validation Loss: 2.003]

 15%|█▌        | 308/2000 [57:17<5:16:13, 11.21s/it, Training Loss: 1.811 Validation Loss: 2.003]

 15%|█▌        | 309/2000 [57:27<5:07:02, 10.89s/it, Training Loss: 1.811 Validation Loss: 2.003]

 16%|█▌        | 310/2000 [57:39<5:16:42, 11.24s/it, Training Loss: 1.811 Validation Loss: 2.003]

 16%|█▌        | 311/2000 [57:50<5:16:20, 11.24s/it, Training Loss: 1.811 Validation Loss: 2.003]

 16%|█▌        | 312/2000 [58:00<5:01:42, 10.72s/it, Training Loss: 1.811 Validation Loss: 2.003]

 16%|█▌        | 313/2000 [58:11<5:04:55, 10.85s/it, Training Loss: 1.811 Validation Loss: 2.003]

 16%|█▌        | 314/2000 [58:20<4:53:25, 10.44s/it, Training Loss: 1.811 Validation Loss: 2.003]

 16%|█▌        | 315/2000 [58:30<4:45:01, 10.15s/it, Training Loss: 1.811 Validation Loss: 2.003]

 16%|█▌        | 316/2000 [58:39<4:40:06,  9.98s/it, Training Loss: 1.811 Validation Loss: 2.003]

 16%|█▌        | 317/2000 [58:50<4:45:05, 10.16s/it, Training Loss: 1.811 Validation Loss: 2.003]

 16%|█▌        | 318/2000 [59:02<5:04:05, 10.85s/it, Training Loss: 1.811 Validation Loss: 2.003]

 16%|█▌        | 319/2000 [59:18<5:48:00, 12.42s/it, Training Loss: 1.811 Validation Loss: 2.003]

 16%|█▌        | 320/2000 [59:31<5:52:38, 12.59s/it, Training Loss: 1.811 Validation Loss: 2.003]

 16%|█▌        | 321/2000 [59:45<5:58:25, 12.81s/it, Training Loss: 1.811 Validation Loss: 2.003]

 16%|█▌        | 322/2000 [59:57<5:58:05, 12.80s/it, Training Loss: 1.811 Validation Loss: 2.003]

 16%|█▌        | 323/2000 [1:00:10<5:58:31, 12.83s/it, Training Loss: 1.811 Validation Loss: 2.003]

 16%|█▌        | 324/2000 [1:00:23<5:57:37, 12.80s/it, Training Loss: 1.811 Validation Loss: 2.003]

 16%|█▋        | 325/2000 [1:00:50<7:52:55, 16.94s/it, Training Loss: 1.811 Validation Loss: 2.003]

 16%|█▋        | 326/2000 [1:01:07<7:59:50, 17.20s/it, Training Loss: 1.811 Validation Loss: 2.003]

 16%|█▋        | 327/2000 [1:01:21<7:31:01, 16.18s/it, Training Loss: 1.811 Validation Loss: 2.003]

 16%|█▋        | 328/2000 [1:01:33<6:56:38, 14.95s/it, Training Loss: 1.811 Validation Loss: 2.003]

 16%|█▋        | 329/2000 [1:01:48<6:50:23, 14.74s/it, Training Loss: 1.811 Validation Loss: 2.003]

 16%|█▋        | 330/2000 [1:02:00<6:29:23, 13.99s/it, Training Loss: 1.811 Validation Loss: 2.003]

 17%|█▋        | 331/2000 [1:02:13<6:20:23, 13.67s/it, Training Loss: 1.811 Validation Loss: 2.003]

 17%|█▋        | 332/2000 [1:02:26<6:12:42, 13.41s/it, Training Loss: 1.811 Validation Loss: 2.003]

 17%|█▋        | 333/2000 [1:02:38<6:02:42, 13.05s/it, Training Loss: 1.811 Validation Loss: 2.003]

 17%|█▋        | 334/2000 [1:02:50<5:55:49, 12.81s/it, Training Loss: 1.811 Validation Loss: 2.003]

 17%|█▋        | 335/2000 [1:03:03<5:59:17, 12.95s/it, Training Loss: 1.811 Validation Loss: 2.003]

 17%|█▋        | 336/2000 [1:03:17<6:04:27, 13.14s/it, Training Loss: 1.811 Validation Loss: 2.003]

 17%|█▋        | 337/2000 [1:03:30<6:01:55, 13.06s/it, Training Loss: 1.811 Validation Loss: 2.003]

 17%|█▋        | 338/2000 [1:03:45<6:22:16, 13.80s/it, Training Loss: 1.811 Validation Loss: 2.003]

 17%|█▋        | 339/2000 [1:04:00<6:28:45, 14.04s/it, Training Loss: 1.811 Validation Loss: 2.003]

 17%|█▋        | 340/2000 [1:04:14<6:30:31, 14.12s/it, Training Loss: 1.811 Validation Loss: 2.003]

 17%|█▋        | 341/2000 [1:04:26<6:15:17, 13.57s/it, Training Loss: 1.811 Validation Loss: 2.003]

 17%|█▋        | 342/2000 [1:04:39<6:05:33, 13.23s/it, Training Loss: 1.811 Validation Loss: 2.003]

 17%|█▋        | 343/2000 [1:04:54<6:18:01, 13.69s/it, Training Loss: 1.811 Validation Loss: 2.003]

 17%|█▋        | 344/2000 [1:05:06<6:06:34, 13.28s/it, Training Loss: 1.811 Validation Loss: 2.003]

 17%|█▋        | 345/2000 [1:05:18<5:58:40, 13.00s/it, Training Loss: 1.811 Validation Loss: 2.003]

 17%|█▋        | 346/2000 [1:05:31<5:53:52, 12.84s/it, Training Loss: 1.811 Validation Loss: 2.003]

 17%|█▋        | 347/2000 [1:05:43<5:51:02, 12.74s/it, Training Loss: 1.811 Validation Loss: 2.003]

 17%|█▋        | 348/2000 [1:05:56<5:53:09, 12.83s/it, Training Loss: 1.811 Validation Loss: 2.003]

 17%|█▋        | 349/2000 [1:06:09<5:50:27, 12.74s/it, Training Loss: 1.811 Validation Loss: 2.003]

 18%|█▊        | 350/2000 [1:06:21<5:47:24, 12.63s/it, Training Loss: 1.811 Validation Loss: 2.003]

 18%|█▊        | 350/2000 [1:08:02<5:47:24, 12.63s/it, Training Loss: 1.724 Validation Loss: 1.88]

 18%|█▊        | 351/2000 [1:08:18<20:01:23, 43.71s/it, Training Loss: 1.724 Validation Loss: 1.88]

 18%|█▊        | 352/2000 [1:08:31<15:55:07, 34.77s/it, Training Loss: 1.724 Validation Loss: 1.88]

 18%|█▊        | 353/2000 [1:08:45<13:01:03, 28.45s/it, Training Loss: 1.724 Validation Loss: 1.88]

 18%|█▊        | 354/2000 [1:09:00<11:06:13, 24.29s/it, Training Loss: 1.724 Validation Loss: 1.88]

 18%|█▊        | 355/2000 [1:09:14<9:39:51, 21.15s/it, Training Loss: 1.724 Validation Loss: 1.88] 

 18%|█▊        | 356/2000 [1:09:27<8:40:06, 18.98s/it, Training Loss: 1.724 Validation Loss: 1.88]

 18%|█▊        | 357/2000 [1:09:40<7:48:24, 17.11s/it, Training Loss: 1.724 Validation Loss: 1.88]

 18%|█▊        | 358/2000 [1:09:53<7:16:58, 15.97s/it, Training Loss: 1.724 Validation Loss: 1.88]

 18%|█▊        | 359/2000 [1:10:10<7:19:25, 16.07s/it, Training Loss: 1.724 Validation Loss: 1.88]

 18%|█▊        | 360/2000 [1:10:36<8:43:18, 19.15s/it, Training Loss: 1.724 Validation Loss: 1.88]

 18%|█▊        | 361/2000 [1:10:51<8:07:02, 17.83s/it, Training Loss: 1.724 Validation Loss: 1.88]

 18%|█▊        | 362/2000 [1:11:05<7:37:23, 16.75s/it, Training Loss: 1.724 Validation Loss: 1.88]

 18%|█▊        | 363/2000 [1:11:18<7:02:13, 15.48s/it, Training Loss: 1.724 Validation Loss: 1.88]

 18%|█▊        | 364/2000 [1:11:32<6:52:28, 15.13s/it, Training Loss: 1.724 Validation Loss: 1.88]

 18%|█▊        | 365/2000 [1:11:46<6:43:58, 14.82s/it, Training Loss: 1.724 Validation Loss: 1.88]

 18%|█▊        | 366/2000 [1:11:59<6:31:42, 14.38s/it, Training Loss: 1.724 Validation Loss: 1.88]

 18%|█▊        | 367/2000 [1:12:13<6:26:15, 14.19s/it, Training Loss: 1.724 Validation Loss: 1.88]

 18%|█▊        | 368/2000 [1:12:26<6:14:22, 13.76s/it, Training Loss: 1.724 Validation Loss: 1.88]

 18%|█▊        | 369/2000 [1:12:38<6:02:41, 13.34s/it, Training Loss: 1.724 Validation Loss: 1.88]

 18%|█▊        | 370/2000 [1:12:51<5:57:51, 13.17s/it, Training Loss: 1.724 Validation Loss: 1.88]

 19%|█▊        | 371/2000 [1:13:04<5:55:08, 13.08s/it, Training Loss: 1.724 Validation Loss: 1.88]

 19%|█▊        | 372/2000 [1:13:18<6:04:44, 13.44s/it, Training Loss: 1.724 Validation Loss: 1.88]

 19%|█▊        | 373/2000 [1:13:32<6:08:18, 13.58s/it, Training Loss: 1.724 Validation Loss: 1.88]

 19%|█▊        | 374/2000 [1:13:50<6:40:50, 14.79s/it, Training Loss: 1.724 Validation Loss: 1.88]

 19%|█▉        | 375/2000 [1:14:06<6:53:18, 15.26s/it, Training Loss: 1.724 Validation Loss: 1.88]

 19%|█▉        | 376/2000 [1:14:22<6:54:33, 15.32s/it, Training Loss: 1.724 Validation Loss: 1.88]

 19%|█▉        | 377/2000 [1:14:37<6:54:59, 15.34s/it, Training Loss: 1.724 Validation Loss: 1.88]

 19%|█▉        | 378/2000 [1:14:54<7:09:26, 15.89s/it, Training Loss: 1.724 Validation Loss: 1.88]

 19%|█▉        | 379/2000 [1:15:12<7:22:45, 16.39s/it, Training Loss: 1.724 Validation Loss: 1.88]

 19%|█▉        | 380/2000 [1:15:26<7:09:23, 15.90s/it, Training Loss: 1.724 Validation Loss: 1.88]

 19%|█▉        | 381/2000 [1:15:40<6:53:07, 15.31s/it, Training Loss: 1.724 Validation Loss: 1.88]

 19%|█▉        | 382/2000 [1:15:55<6:45:14, 15.03s/it, Training Loss: 1.724 Validation Loss: 1.88]

 19%|█▉        | 383/2000 [1:16:08<6:32:46, 14.57s/it, Training Loss: 1.724 Validation Loss: 1.88]

 19%|█▉        | 384/2000 [1:16:22<6:22:57, 14.22s/it, Training Loss: 1.724 Validation Loss: 1.88]

 19%|█▉        | 385/2000 [1:16:36<6:24:42, 14.29s/it, Training Loss: 1.724 Validation Loss: 1.88]

 19%|█▉        | 386/2000 [1:16:49<6:12:32, 13.85s/it, Training Loss: 1.724 Validation Loss: 1.88]

 19%|█▉        | 387/2000 [1:17:03<6:10:27, 13.78s/it, Training Loss: 1.724 Validation Loss: 1.88]

 19%|█▉        | 388/2000 [1:17:18<6:21:58, 14.22s/it, Training Loss: 1.724 Validation Loss: 1.88]

 19%|█▉        | 389/2000 [1:17:34<6:34:53, 14.71s/it, Training Loss: 1.724 Validation Loss: 1.88]

 20%|█▉        | 390/2000 [1:17:50<6:50:41, 15.31s/it, Training Loss: 1.724 Validation Loss: 1.88]

 20%|█▉        | 391/2000 [1:18:07<7:01:04, 15.70s/it, Training Loss: 1.724 Validation Loss: 1.88]

 20%|█▉        | 392/2000 [1:18:23<7:04:02, 15.82s/it, Training Loss: 1.724 Validation Loss: 1.88]

 20%|█▉        | 393/2000 [1:18:39<7:07:01, 15.94s/it, Training Loss: 1.724 Validation Loss: 1.88]

 20%|█▉        | 394/2000 [1:18:53<6:46:31, 15.19s/it, Training Loss: 1.724 Validation Loss: 1.88]

 20%|█▉        | 395/2000 [1:19:08<6:45:20, 15.15s/it, Training Loss: 1.724 Validation Loss: 1.88]

 20%|█▉        | 396/2000 [1:19:22<6:41:14, 15.01s/it, Training Loss: 1.724 Validation Loss: 1.88]

 20%|█▉        | 397/2000 [1:19:36<6:26:32, 14.47s/it, Training Loss: 1.724 Validation Loss: 1.88]

 20%|█▉        | 398/2000 [1:19:47<5:59:23, 13.46s/it, Training Loss: 1.724 Validation Loss: 1.88]

 20%|█▉        | 399/2000 [1:19:59<5:48:44, 13.07s/it, Training Loss: 1.724 Validation Loss: 1.88]

 20%|██        | 400/2000 [1:20:10<5:33:16, 12.50s/it, Training Loss: 1.724 Validation Loss: 1.88]

 20%|██        | 400/2000 [1:21:07<5:33:16, 12.50s/it, Training Loss: 1.648 Validation Loss: 1.737]

 20%|██        | 401/2000 [1:21:19<13:08:04, 29.57s/it, Training Loss: 1.648 Validation Loss: 1.737]

 20%|██        | 402/2000 [1:21:29<10:30:36, 23.68s/it, Training Loss: 1.648 Validation Loss: 1.737]

 20%|██        | 403/2000 [1:21:41<8:56:50, 20.17s/it, Training Loss: 1.648 Validation Loss: 1.737] 

 20%|██        | 404/2000 [1:21:53<7:48:08, 17.60s/it, Training Loss: 1.648 Validation Loss: 1.737]

 20%|██        | 405/2000 [1:22:03<6:47:16, 15.32s/it, Training Loss: 1.648 Validation Loss: 1.737]

 20%|██        | 406/2000 [1:22:14<6:14:14, 14.09s/it, Training Loss: 1.648 Validation Loss: 1.737]

 20%|██        | 407/2000 [1:22:26<5:55:11, 13.38s/it, Training Loss: 1.648 Validation Loss: 1.737]

 20%|██        | 408/2000 [1:22:36<5:31:03, 12.48s/it, Training Loss: 1.648 Validation Loss: 1.737]

 20%|██        | 409/2000 [1:22:50<5:39:47, 12.81s/it, Training Loss: 1.648 Validation Loss: 1.737]

 20%|██        | 410/2000 [1:23:01<5:25:28, 12.28s/it, Training Loss: 1.648 Validation Loss: 1.737]

 21%|██        | 411/2000 [1:23:11<5:09:49, 11.70s/it, Training Loss: 1.648 Validation Loss: 1.737]

 21%|██        | 412/2000 [1:23:22<5:04:35, 11.51s/it, Training Loss: 1.648 Validation Loss: 1.737]

 21%|██        | 413/2000 [1:23:35<5:12:11, 11.80s/it, Training Loss: 1.648 Validation Loss: 1.737]

 21%|██        | 414/2000 [1:23:47<5:14:05, 11.88s/it, Training Loss: 1.648 Validation Loss: 1.737]

 21%|██        | 415/2000 [1:23:57<4:59:13, 11.33s/it, Training Loss: 1.648 Validation Loss: 1.737]

 21%|██        | 416/2000 [1:24:09<5:02:51, 11.47s/it, Training Loss: 1.648 Validation Loss: 1.737]

 21%|██        | 417/2000 [1:24:24<5:29:04, 12.47s/it, Training Loss: 1.648 Validation Loss: 1.737]

 21%|██        | 418/2000 [1:24:34<5:16:37, 12.01s/it, Training Loss: 1.648 Validation Loss: 1.737]

 21%|██        | 419/2000 [1:24:46<5:13:52, 11.91s/it, Training Loss: 1.648 Validation Loss: 1.737]

 21%|██        | 420/2000 [1:24:57<5:03:12, 11.51s/it, Training Loss: 1.648 Validation Loss: 1.737]

 21%|██        | 421/2000 [1:25:13<5:41:33, 12.98s/it, Training Loss: 1.648 Validation Loss: 1.737]

 21%|██        | 422/2000 [1:25:26<5:40:01, 12.93s/it, Training Loss: 1.648 Validation Loss: 1.737]

 21%|██        | 423/2000 [1:25:40<5:50:52, 13.35s/it, Training Loss: 1.648 Validation Loss: 1.737]

 21%|██        | 424/2000 [1:25:56<6:05:40, 13.92s/it, Training Loss: 1.648 Validation Loss: 1.737]

 21%|██▏       | 425/2000 [1:26:07<5:42:32, 13.05s/it, Training Loss: 1.648 Validation Loss: 1.737]

 21%|██▏       | 426/2000 [1:26:17<5:23:13, 12.32s/it, Training Loss: 1.648 Validation Loss: 1.737]

 21%|██▏       | 427/2000 [1:26:30<5:25:18, 12.41s/it, Training Loss: 1.648 Validation Loss: 1.737]

 21%|██▏       | 428/2000 [1:26:43<5:34:17, 12.76s/it, Training Loss: 1.648 Validation Loss: 1.737]

 21%|██▏       | 429/2000 [1:27:00<6:07:43, 14.04s/it, Training Loss: 1.648 Validation Loss: 1.737]

 22%|██▏       | 430/2000 [1:27:12<5:51:23, 13.43s/it, Training Loss: 1.648 Validation Loss: 1.737]

 22%|██▏       | 431/2000 [1:27:27<5:58:42, 13.72s/it, Training Loss: 1.648 Validation Loss: 1.737]

 22%|██▏       | 432/2000 [1:27:43<6:14:16, 14.32s/it, Training Loss: 1.648 Validation Loss: 1.737]

 22%|██▏       | 433/2000 [1:27:55<6:00:58, 13.82s/it, Training Loss: 1.648 Validation Loss: 1.737]

 22%|██▏       | 434/2000 [1:28:06<5:38:04, 12.95s/it, Training Loss: 1.648 Validation Loss: 1.737]

 22%|██▏       | 435/2000 [1:28:17<5:18:45, 12.22s/it, Training Loss: 1.648 Validation Loss: 1.737]

 22%|██▏       | 436/2000 [1:28:30<5:31:31, 12.72s/it, Training Loss: 1.648 Validation Loss: 1.737]

 22%|██▏       | 437/2000 [1:28:43<5:26:09, 12.52s/it, Training Loss: 1.648 Validation Loss: 1.737]

 22%|██▏       | 438/2000 [1:28:56<5:35:40, 12.89s/it, Training Loss: 1.648 Validation Loss: 1.737]

 22%|██▏       | 439/2000 [1:29:07<5:18:24, 12.24s/it, Training Loss: 1.648 Validation Loss: 1.737]

 22%|██▏       | 440/2000 [1:29:17<5:03:12, 11.66s/it, Training Loss: 1.648 Validation Loss: 1.737]

 22%|██▏       | 441/2000 [1:29:28<4:54:16, 11.33s/it, Training Loss: 1.648 Validation Loss: 1.737]

 22%|██▏       | 442/2000 [1:29:39<4:52:51, 11.28s/it, Training Loss: 1.648 Validation Loss: 1.737]

 22%|██▏       | 443/2000 [1:29:50<4:50:14, 11.18s/it, Training Loss: 1.648 Validation Loss: 1.737]

 22%|██▏       | 444/2000 [1:30:00<4:44:33, 10.97s/it, Training Loss: 1.648 Validation Loss: 1.737]

 22%|██▏       | 445/2000 [1:30:12<4:46:37, 11.06s/it, Training Loss: 1.648 Validation Loss: 1.737]

 22%|██▏       | 446/2000 [1:30:22<4:39:52, 10.81s/it, Training Loss: 1.648 Validation Loss: 1.737]

 22%|██▏       | 447/2000 [1:30:34<4:49:26, 11.18s/it, Training Loss: 1.648 Validation Loss: 1.737]

 22%|██▏       | 448/2000 [1:30:47<4:59:50, 11.59s/it, Training Loss: 1.648 Validation Loss: 1.737]

 22%|██▏       | 449/2000 [1:31:01<5:23:44, 12.52s/it, Training Loss: 1.648 Validation Loss: 1.737]

 22%|██▎       | 450/2000 [1:31:14<5:27:08, 12.66s/it, Training Loss: 1.648 Validation Loss: 1.737]

 22%|██▎       | 450/2000 [1:32:24<5:27:08, 12.66s/it, Training Loss: 1.593 Validation Loss: 1.643]

 23%|██▎       | 451/2000 [1:32:38<14:36:52, 33.97s/it, Training Loss: 1.593 Validation Loss: 1.643]

 23%|██▎       | 452/2000 [1:32:53<12:11:59, 28.37s/it, Training Loss: 1.593 Validation Loss: 1.643]

 23%|██▎       | 453/2000 [1:33:07<10:17:00, 23.93s/it, Training Loss: 1.593 Validation Loss: 1.643]

 23%|██▎       | 454/2000 [1:33:20<8:51:24, 20.62s/it, Training Loss: 1.593 Validation Loss: 1.643] 

 23%|██▎       | 455/2000 [1:33:33<7:52:45, 18.36s/it, Training Loss: 1.593 Validation Loss: 1.643]

 23%|██▎       | 456/2000 [1:33:46<7:15:52, 16.94s/it, Training Loss: 1.593 Validation Loss: 1.643]

 23%|██▎       | 457/2000 [1:33:58<6:30:34, 15.19s/it, Training Loss: 1.593 Validation Loss: 1.643]

 23%|██▎       | 458/2000 [1:34:10<6:10:33, 14.42s/it, Training Loss: 1.593 Validation Loss: 1.643]

 23%|██▎       | 459/2000 [1:34:22<5:47:47, 13.54s/it, Training Loss: 1.593 Validation Loss: 1.643]

 23%|██▎       | 460/2000 [1:34:36<5:50:33, 13.66s/it, Training Loss: 1.593 Validation Loss: 1.643]

 23%|██▎       | 461/2000 [1:34:49<5:51:50, 13.72s/it, Training Loss: 1.593 Validation Loss: 1.643]

 23%|██▎       | 462/2000 [1:35:02<5:45:46, 13.49s/it, Training Loss: 1.593 Validation Loss: 1.643]

 23%|██▎       | 463/2000 [1:35:13<5:23:37, 12.63s/it, Training Loss: 1.593 Validation Loss: 1.643]

 23%|██▎       | 464/2000 [1:35:24<5:10:16, 12.12s/it, Training Loss: 1.593 Validation Loss: 1.643]

 23%|██▎       | 465/2000 [1:35:35<5:02:10, 11.81s/it, Training Loss: 1.593 Validation Loss: 1.643]

 23%|██▎       | 466/2000 [1:35:47<5:05:08, 11.94s/it, Training Loss: 1.593 Validation Loss: 1.643]

 23%|██▎       | 467/2000 [1:36:00<5:09:30, 12.11s/it, Training Loss: 1.593 Validation Loss: 1.643]

 23%|██▎       | 468/2000 [1:36:13<5:18:12, 12.46s/it, Training Loss: 1.593 Validation Loss: 1.643]

 23%|██▎       | 469/2000 [1:36:24<5:08:43, 12.10s/it, Training Loss: 1.593 Validation Loss: 1.643]

 24%|██▎       | 470/2000 [1:36:36<5:06:05, 12.00s/it, Training Loss: 1.593 Validation Loss: 1.643]

 24%|██▎       | 471/2000 [1:36:48<5:07:00, 12.05s/it, Training Loss: 1.593 Validation Loss: 1.643]

 24%|██▎       | 472/2000 [1:37:01<5:12:32, 12.27s/it, Training Loss: 1.593 Validation Loss: 1.643]

 24%|██▎       | 473/2000 [1:37:14<5:19:24, 12.55s/it, Training Loss: 1.593 Validation Loss: 1.643]

 24%|██▎       | 474/2000 [1:37:27<5:18:54, 12.54s/it, Training Loss: 1.593 Validation Loss: 1.643]

 24%|██▍       | 475/2000 [1:37:39<5:16:02, 12.43s/it, Training Loss: 1.593 Validation Loss: 1.643]

 24%|██▍       | 476/2000 [1:37:51<5:15:40, 12.43s/it, Training Loss: 1.593 Validation Loss: 1.643]

 24%|██▍       | 477/2000 [1:38:02<5:05:07, 12.02s/it, Training Loss: 1.593 Validation Loss: 1.643]

 24%|██▍       | 478/2000 [1:38:14<4:58:48, 11.78s/it, Training Loss: 1.593 Validation Loss: 1.643]

 24%|██▍       | 479/2000 [1:38:26<5:00:17, 11.85s/it, Training Loss: 1.593 Validation Loss: 1.643]

 24%|██▍       | 480/2000 [1:38:38<5:06:39, 12.10s/it, Training Loss: 1.593 Validation Loss: 1.643]

 24%|██▍       | 481/2000 [1:38:51<5:09:00, 12.21s/it, Training Loss: 1.593 Validation Loss: 1.643]

 24%|██▍       | 482/2000 [1:39:02<5:03:26, 11.99s/it, Training Loss: 1.593 Validation Loss: 1.643]

 24%|██▍       | 483/2000 [1:39:14<4:58:02, 11.79s/it, Training Loss: 1.593 Validation Loss: 1.643]

 24%|██▍       | 484/2000 [1:39:24<4:49:35, 11.46s/it, Training Loss: 1.593 Validation Loss: 1.643]

 24%|██▍       | 485/2000 [1:39:36<4:47:34, 11.39s/it, Training Loss: 1.593 Validation Loss: 1.643]

 24%|██▍       | 486/2000 [1:39:47<4:45:40, 11.32s/it, Training Loss: 1.593 Validation Loss: 1.643]

 24%|██▍       | 487/2000 [1:39:59<4:55:08, 11.70s/it, Training Loss: 1.593 Validation Loss: 1.643]

 24%|██▍       | 488/2000 [1:40:11<4:58:31, 11.85s/it, Training Loss: 1.593 Validation Loss: 1.643]

 24%|██▍       | 489/2000 [1:40:24<5:04:12, 12.08s/it, Training Loss: 1.593 Validation Loss: 1.643]

 24%|██▍       | 490/2000 [1:40:35<4:58:45, 11.87s/it, Training Loss: 1.593 Validation Loss: 1.643]

 25%|██▍       | 491/2000 [1:40:47<4:58:15, 11.86s/it, Training Loss: 1.593 Validation Loss: 1.643]

 25%|██▍       | 492/2000 [1:40:59<4:55:45, 11.77s/it, Training Loss: 1.593 Validation Loss: 1.643]

 25%|██▍       | 493/2000 [1:41:11<5:00:20, 11.96s/it, Training Loss: 1.593 Validation Loss: 1.643]

 25%|██▍       | 494/2000 [1:41:23<5:00:18, 11.96s/it, Training Loss: 1.593 Validation Loss: 1.643]

 25%|██▍       | 495/2000 [1:41:35<4:55:35, 11.78s/it, Training Loss: 1.593 Validation Loss: 1.643]

 25%|██▍       | 496/2000 [1:41:47<5:00:47, 12.00s/it, Training Loss: 1.593 Validation Loss: 1.643]

 25%|██▍       | 497/2000 [1:42:00<5:04:15, 12.15s/it, Training Loss: 1.593 Validation Loss: 1.643]

 25%|██▍       | 498/2000 [1:42:10<4:54:14, 11.75s/it, Training Loss: 1.593 Validation Loss: 1.643]

 25%|██▍       | 499/2000 [1:42:22<4:55:12, 11.80s/it, Training Loss: 1.593 Validation Loss: 1.643]

 25%|██▌       | 500/2000 [1:42:33<4:47:16, 11.49s/it, Training Loss: 1.593 Validation Loss: 1.643]

 25%|██▌       | 500/2000 [1:43:35<4:47:16, 11.49s/it, Training Loss: 1.551 Validation Loss: 1.627]

 28%|██▊       | 550/2000 [1:54:41<4:53:38, 12.15s/it, Training Loss: 1.495 Validation Loss: 1.509]

 30%|███       | 600/2000 [2:06:33<5:02:16, 12.95s/it, Training Loss: 1.468 Validation Loss: 1.455]

 32%|███▎      | 650/2000 [2:18:30<4:46:16, 12.72s/it, Training Loss: 1.446 Validation Loss: 1.347]

 35%|███▌      | 700/2000 [2:28:22<4:05:12, 11.32s/it, Training Loss: 1.429 Validation Loss: 1.295]

 38%|███▊      | 750/2000 [2:38:00<3:28:03,  9.99s/it, Training Loss: 1.406 Validation Loss: 1.297]

 40%|████      | 800/2000 [2:47:22<3:21:53, 10.09s/it, Training Loss: 1.386 Validation Loss: 1.256]


 42%|████▏     | 835/2000 [2:53:18<3:27:40, 10.70s/it, Training Loss: 1.386 Validation Loss: 1.256]

 42%|████▏     | 836/2000 [2:53:28<3:22:13, 10.42s/it, Training Loss: 1.386 Validation Loss: 1.256]

 42%|████▏     | 837/2000 [2:53:38<3:19:29, 10.29s/it, Training Loss: 1.386 Validation Loss: 1.256]

 42%|████▏     | 838/2000 [2:53:49<3:24:38, 10.57s/it, Training Loss: 1.386 Validation Loss: 1.256]

 42%|████▏     | 839/2000 [2:53:59<3:20:06, 10.34s/it, Training Loss: 1.386 Validation Loss: 1.256]

 42%|████▏     | 840/2000 [2:54:09<3:16:38, 10.17s/it, Training Loss: 1.386 Validation Loss: 1.256]

 42%|████▏     | 841/2000 [2:54:19<3:19:24, 10.32s/it, Training Loss: 1.386 Validation Loss: 1.256]

 42%|████▏     | 842/2000 [2:54:30<3:19:39, 10.35s/it, Training Loss: 1.386 Validation Loss: 1.256]

 42%|████▏     | 843/2000 [2:54:40<3:16:31, 10.19s/it, Training Loss: 1.386 Validation Loss: 1.256]

 42%|████▏     | 844/2000 [2:54:50<3:15:46, 10.16s/it, Training Loss: 1.386 Validation Loss: 1.256]

 42%|████▏     | 845/2000 [2:55:00<3:15:14, 10.14s/it, Training Loss: 1.386 Validation Loss: 1.256]

 42%|████▏     | 846/2000 [2:55:10<3:14:40, 10.12s/it, Training Loss: 1.386 Validation Loss: 1.256]

 42%|████▏     | 847/2000 [2:55:20<3:12:18, 10.01s/it, Training Loss: 1.386 Validation Loss: 1.256]

 42%|████▏     | 848/2000 [2:55:29<3:11:18,  9.96s/it, Training Loss: 1.386 Validation Loss: 1.256]

 42%|████▏     | 849/2000 [2:55:40<3:12:02, 10.01s/it, Training Loss: 1.386 Validation Loss: 1.256]

 42%|████▎     | 850/2000 [2:55:50<3:12:43, 10.06s/it, Training Loss: 1.386 Validation Loss: 1.256]

 42%|████▎     | 850/2000 [2:56:46<3:12:43, 10.06s/it, Training Loss: 1.371 Validation Loss: 1.239]

 43%|████▎     | 851/2000 [2:56:56<8:33:42, 26.83s/it, Training Loss: 1.371 Validation Loss: 1.239]

 43%|████▎     | 852/2000 [2:57:06<6:55:55, 21.74s/it, Training Loss: 1.371 Validation Loss: 1.239]

 43%|████▎     | 853/2000 [2:57:17<5:54:39, 18.55s/it, Training Loss: 1.371 Validation Loss: 1.239]

 43%|████▎     | 854/2000 [2:57:27<5:04:29, 15.94s/it, Training Loss: 1.371 Validation Loss: 1.239]

 43%|████▎     | 855/2000 [2:57:36<4:29:18, 14.11s/it, Training Loss: 1.371 Validation Loss: 1.239]

 43%|████▎     | 856/2000 [2:57:47<4:09:14, 13.07s/it, Training Loss: 1.371 Validation Loss: 1.239]

 43%|████▎     | 857/2000 [2:57:57<3:53:41, 12.27s/it, Training Loss: 1.371 Validation Loss: 1.239]

 43%|████▎     | 858/2000 [2:58:08<3:45:23, 11.84s/it, Training Loss: 1.371 Validation Loss: 1.239]

 43%|████▎     | 859/2000 [2:58:21<3:49:19, 12.06s/it, Training Loss: 1.371 Validation Loss: 1.239]

 43%|████▎     | 860/2000 [2:58:31<3:40:22, 11.60s/it, Training Loss: 1.371 Validation Loss: 1.239]

 43%|████▎     | 861/2000 [2:58:42<3:32:27, 11.19s/it, Training Loss: 1.371 Validation Loss: 1.239]

 43%|████▎     | 862/2000 [2:58:52<3:28:05, 10.97s/it, Training Loss: 1.371 Validation Loss: 1.239]

 43%|████▎     | 863/2000 [2:59:02<3:23:15, 10.73s/it, Training Loss: 1.371 Validation Loss: 1.239]

 43%|████▎     | 864/2000 [2:59:12<3:18:05, 10.46s/it, Training Loss: 1.371 Validation Loss: 1.239]

 43%|████▎     | 865/2000 [2:59:22<3:17:34, 10.44s/it, Training Loss: 1.371 Validation Loss: 1.239]

 43%|████▎     | 866/2000 [2:59:34<3:21:11, 10.65s/it, Training Loss: 1.371 Validation Loss: 1.239]

 43%|████▎     | 867/2000 [2:59:44<3:20:11, 10.60s/it, Training Loss: 1.371 Validation Loss: 1.239]

 43%|████▎     | 868/2000 [2:59:54<3:16:57, 10.44s/it, Training Loss: 1.371 Validation Loss: 1.239]

 43%|████▎     | 869/2000 [3:00:04<3:14:01, 10.29s/it, Training Loss: 1.371 Validation Loss: 1.239]

 44%|████▎     | 870/2000 [3:00:14<3:12:58, 10.25s/it, Training Loss: 1.371 Validation Loss: 1.239]

 44%|████▎     | 871/2000 [3:00:24<3:12:31, 10.23s/it, Training Loss: 1.371 Validation Loss: 1.239]

 44%|████▎     | 872/2000 [3:00:34<3:10:57, 10.16s/it, Training Loss: 1.371 Validation Loss: 1.239]

 44%|████▎     | 873/2000 [3:00:45<3:10:59, 10.17s/it, Training Loss: 1.371 Validation Loss: 1.239]

 44%|████▎     | 874/2000 [3:00:55<3:12:59, 10.28s/it, Training Loss: 1.371 Validation Loss: 1.239]

 44%|████▍     | 875/2000 [3:01:05<3:11:36, 10.22s/it, Training Loss: 1.371 Validation Loss: 1.239]

 44%|████▍     | 876/2000 [3:01:15<3:09:34, 10.12s/it, Training Loss: 1.371 Validation Loss: 1.239]

 44%|████▍     | 877/2000 [3:01:25<3:09:06, 10.10s/it, Training Loss: 1.371 Validation Loss: 1.239]

 44%|████▍     | 878/2000 [3:01:36<3:12:17, 10.28s/it, Training Loss: 1.371 Validation Loss: 1.239]

 44%|████▍     | 879/2000 [3:01:46<3:09:04, 10.12s/it, Training Loss: 1.371 Validation Loss: 1.239]

 44%|████▍     | 880/2000 [3:01:56<3:10:22, 10.20s/it, Training Loss: 1.371 Validation Loss: 1.239]

 44%|████▍     | 881/2000 [3:02:06<3:11:14, 10.25s/it, Training Loss: 1.371 Validation Loss: 1.239]

 44%|████▍     | 882/2000 [3:02:16<3:09:33, 10.17s/it, Training Loss: 1.371 Validation Loss: 1.239]

 44%|████▍     | 883/2000 [3:02:26<3:08:08, 10.11s/it, Training Loss: 1.371 Validation Loss: 1.239]

 44%|████▍     | 884/2000 [3:02:37<3:08:37, 10.14s/it, Training Loss: 1.371 Validation Loss: 1.239]

 44%|████▍     | 885/2000 [3:02:47<3:08:18, 10.13s/it, Training Loss: 1.371 Validation Loss: 1.239]

 44%|████▍     | 886/2000 [3:02:56<3:06:02, 10.02s/it, Training Loss: 1.371 Validation Loss: 1.239]

 44%|████▍     | 887/2000 [3:03:07<3:10:41, 10.28s/it, Training Loss: 1.371 Validation Loss: 1.239]

 44%|████▍     | 888/2000 [3:03:17<3:09:53, 10.25s/it, Training Loss: 1.371 Validation Loss: 1.239]

 44%|████▍     | 889/2000 [3:03:27<3:07:05, 10.10s/it, Training Loss: 1.371 Validation Loss: 1.239]

 44%|████▍     | 890/2000 [3:03:37<3:06:51, 10.10s/it, Training Loss: 1.371 Validation Loss: 1.239]

 45%|████▍     | 891/2000 [3:03:48<3:10:47, 10.32s/it, Training Loss: 1.371 Validation Loss: 1.239]

 45%|████▍     | 892/2000 [3:03:58<3:08:05, 10.19s/it, Training Loss: 1.371 Validation Loss: 1.239]

 45%|████▍     | 893/2000 [3:04:08<3:06:38, 10.12s/it, Training Loss: 1.371 Validation Loss: 1.239]

 45%|████▍     | 894/2000 [3:04:18<3:05:42, 10.07s/it, Training Loss: 1.371 Validation Loss: 1.239]

 45%|████▍     | 895/2000 [3:04:28<3:06:44, 10.14s/it, Training Loss: 1.371 Validation Loss: 1.239]

 45%|████▍     | 896/2000 [3:04:38<3:05:23, 10.08s/it, Training Loss: 1.371 Validation Loss: 1.239]

 45%|████▍     | 897/2000 [3:04:48<3:04:00, 10.01s/it, Training Loss: 1.371 Validation Loss: 1.239]

 45%|████▍     | 898/2000 [3:04:58<3:03:46, 10.01s/it, Training Loss: 1.371 Validation Loss: 1.239]

 45%|████▍     | 899/2000 [3:05:08<3:03:36, 10.01s/it, Training Loss: 1.371 Validation Loss: 1.239]

 45%|████▌     | 900/2000 [3:05:18<3:03:41, 10.02s/it, Training Loss: 1.371 Validation Loss: 1.239]

 45%|████▌     | 900/2000 [3:06:13<3:03:41, 10.02s/it, Training Loss: 1.346 Validation Loss: 1.186]

 45%|████▌     | 901/2000 [3:06:24<8:09:25, 26.72s/it, Training Loss: 1.346 Validation Loss: 1.186]

 45%|████▌     | 902/2000 [3:06:33<6:35:07, 21.59s/it, Training Loss: 1.346 Validation Loss: 1.186]

 45%|████▌     | 903/2000 [3:06:44<5:35:17, 18.34s/it, Training Loss: 1.346 Validation Loss: 1.186]

 45%|████▌     | 904/2000 [3:06:54<4:49:57, 15.87s/it, Training Loss: 1.346 Validation Loss: 1.186]

 45%|████▌     | 905/2000 [3:07:04<4:17:01, 14.08s/it, Training Loss: 1.346 Validation Loss: 1.186]

 45%|████▌     | 906/2000 [3:07:14<3:54:20, 12.85s/it, Training Loss: 1.346 Validation Loss: 1.186]

 45%|████▌     | 907/2000 [3:07:25<3:42:49, 12.23s/it, Training Loss: 1.346 Validation Loss: 1.186]

 45%|████▌     | 908/2000 [3:07:35<3:29:14, 11.50s/it, Training Loss: 1.346 Validation Loss: 1.186]

 45%|████▌     | 909/2000 [3:07:45<3:20:32, 11.03s/it, Training Loss: 1.346 Validation Loss: 1.186]

 46%|████▌     | 910/2000 [3:07:55<3:17:08, 10.85s/it, Training Loss: 1.346 Validation Loss: 1.186]

 46%|████▌     | 911/2000 [3:08:06<3:14:49, 10.73s/it, Training Loss: 1.346 Validation Loss: 1.186]

 46%|████▌     | 912/2000 [3:08:16<3:10:47, 10.52s/it, Training Loss: 1.346 Validation Loss: 1.186]

 46%|████▌     | 913/2000 [3:08:25<3:06:40, 10.30s/it, Training Loss: 1.346 Validation Loss: 1.186]

 46%|████▌     | 914/2000 [3:08:35<3:03:58, 10.16s/it, Training Loss: 1.346 Validation Loss: 1.186]

 46%|████▌     | 915/2000 [3:08:46<3:07:14, 10.35s/it, Training Loss: 1.346 Validation Loss: 1.186]

 46%|████▌     | 916/2000 [3:08:56<3:06:33, 10.33s/it, Training Loss: 1.346 Validation Loss: 1.186]

 46%|████▌     | 917/2000 [3:09:06<3:03:14, 10.15s/it, Training Loss: 1.346 Validation Loss: 1.186]

 46%|████▌     | 918/2000 [3:09:16<3:03:20, 10.17s/it, Training Loss: 1.346 Validation Loss: 1.186]

 46%|████▌     | 919/2000 [3:09:27<3:08:42, 10.47s/it, Training Loss: 1.346 Validation Loss: 1.186]

 46%|████▌     | 920/2000 [3:09:37<3:05:39, 10.31s/it, Training Loss: 1.346 Validation Loss: 1.186]

 46%|████▌     | 921/2000 [3:09:48<3:04:48, 10.28s/it, Training Loss: 1.346 Validation Loss: 1.186]

 46%|████▌     | 922/2000 [3:09:58<3:03:53, 10.24s/it, Training Loss: 1.346 Validation Loss: 1.186]

 46%|████▌     | 923/2000 [3:10:08<3:03:43, 10.24s/it, Training Loss: 1.346 Validation Loss: 1.186]

 46%|████▌     | 924/2000 [3:10:18<3:01:02, 10.10s/it, Training Loss: 1.346 Validation Loss: 1.186]

 46%|████▋     | 925/2000 [3:10:28<2:59:59, 10.05s/it, Training Loss: 1.346 Validation Loss: 1.186]

 46%|████▋     | 926/2000 [3:10:37<2:58:49,  9.99s/it, Training Loss: 1.346 Validation Loss: 1.186]

 46%|████▋     | 927/2000 [3:10:48<2:59:13, 10.02s/it, Training Loss: 1.346 Validation Loss: 1.186]

 46%|████▋     | 928/2000 [3:10:58<3:02:26, 10.21s/it, Training Loss: 1.346 Validation Loss: 1.186]

 46%|████▋     | 929/2000 [3:11:08<2:59:31, 10.06s/it, Training Loss: 1.346 Validation Loss: 1.186]

 46%|████▋     | 930/2000 [3:11:18<2:59:14, 10.05s/it, Training Loss: 1.346 Validation Loss: 1.186]

 47%|████▋     | 931/2000 [3:11:28<2:58:07, 10.00s/it, Training Loss: 1.346 Validation Loss: 1.186]

 47%|████▋     | 932/2000 [3:11:38<2:59:18, 10.07s/it, Training Loss: 1.346 Validation Loss: 1.186]

 47%|████▋     | 933/2000 [3:11:49<3:01:26, 10.20s/it, Training Loss: 1.346 Validation Loss: 1.186]

 47%|████▋     | 934/2000 [3:11:58<2:59:25, 10.10s/it, Training Loss: 1.346 Validation Loss: 1.186]

 47%|████▋     | 935/2000 [3:12:08<2:58:07, 10.04s/it, Training Loss: 1.346 Validation Loss: 1.186]

 47%|████▋     | 936/2000 [3:12:18<2:58:16, 10.05s/it, Training Loss: 1.346 Validation Loss: 1.186]

 47%|████▋     | 937/2000 [3:12:28<2:57:44, 10.03s/it, Training Loss: 1.346 Validation Loss: 1.186]

 47%|████▋     | 938/2000 [3:12:38<2:56:35,  9.98s/it, Training Loss: 1.346 Validation Loss: 1.186]

 47%|████▋     | 939/2000 [3:12:48<2:57:03, 10.01s/it, Training Loss: 1.346 Validation Loss: 1.186]

 47%|████▋     | 940/2000 [3:12:58<2:56:29,  9.99s/it, Training Loss: 1.346 Validation Loss: 1.186]

 47%|████▋     | 941/2000 [3:13:09<2:58:21, 10.11s/it, Training Loss: 1.346 Validation Loss: 1.186]

 47%|████▋     | 942/2000 [3:13:19<2:58:51, 10.14s/it, Training Loss: 1.346 Validation Loss: 1.186]

 47%|████▋     | 943/2000 [3:13:29<2:57:28, 10.07s/it, Training Loss: 1.346 Validation Loss: 1.186]

 47%|████▋     | 944/2000 [3:13:39<2:56:25, 10.02s/it, Training Loss: 1.346 Validation Loss: 1.186]

 47%|████▋     | 945/2000 [3:13:49<3:00:14, 10.25s/it, Training Loss: 1.346 Validation Loss: 1.186]

 47%|████▋     | 946/2000 [3:13:59<2:58:11, 10.14s/it, Training Loss: 1.346 Validation Loss: 1.186]

 47%|████▋     | 947/2000 [3:14:09<2:57:30, 10.11s/it, Training Loss: 1.346 Validation Loss: 1.186]

 47%|████▋     | 948/2000 [3:14:19<2:55:15, 10.00s/it, Training Loss: 1.346 Validation Loss: 1.186]

 47%|████▋     | 949/2000 [3:14:29<2:56:36, 10.08s/it, Training Loss: 1.346 Validation Loss: 1.186]

 48%|████▊     | 950/2000 [3:14:39<2:54:50,  9.99s/it, Training Loss: 1.346 Validation Loss: 1.186]

 48%|████▊     | 950/2000 [3:15:33<2:54:50,  9.99s/it, Training Loss: 1.341 Validation Loss: 1.19]

 48%|████▊     | 951/2000 [3:15:43<7:37:04, 26.14s/it, Training Loss: 1.341 Validation Loss: 1.19]

 48%|████▊     | 952/2000 [3:15:54<6:16:06, 21.53s/it, Training Loss: 1.341 Validation Loss: 1.19]

 48%|████▊     | 953/2000 [3:16:04<5:17:30, 18.20s/it, Training Loss: 1.341 Validation Loss: 1.19]

 48%|████▊     | 954/2000 [3:16:14<4:34:00, 15.72s/it, Training Loss: 1.341 Validation Loss: 1.19]

 48%|████▊     | 955/2000 [3:16:24<4:02:26, 13.92s/it, Training Loss: 1.341 Validation Loss: 1.19]

 48%|████▊     | 956/2000 [3:16:34<3:42:33, 12.79s/it, Training Loss: 1.341 Validation Loss: 1.19]

 48%|████▊     | 957/2000 [3:16:44<3:29:40, 12.06s/it, Training Loss: 1.341 Validation Loss: 1.19]

 48%|████▊     | 958/2000 [3:16:55<3:19:52, 11.51s/it, Training Loss: 1.341 Validation Loss: 1.19]

 48%|████▊     | 959/2000 [3:17:05<3:11:39, 11.05s/it, Training Loss: 1.341 Validation Loss: 1.19]

 48%|████▊     | 960/2000 [3:17:15<3:05:43, 10.72s/it, Training Loss: 1.341 Validation Loss: 1.19]

 48%|████▊     | 961/2000 [3:17:25<3:04:31, 10.66s/it, Training Loss: 1.341 Validation Loss: 1.19]

 48%|████▊     | 962/2000 [3:17:35<3:00:37, 10.44s/it, Training Loss: 1.341 Validation Loss: 1.19]

 48%|████▊     | 963/2000 [3:17:45<2:56:42, 10.22s/it, Training Loss: 1.341 Validation Loss: 1.19]

 48%|████▊     | 964/2000 [3:17:55<2:59:18, 10.38s/it, Training Loss: 1.341 Validation Loss: 1.19]

 48%|████▊     | 965/2000 [3:18:06<2:57:44, 10.30s/it, Training Loss: 1.341 Validation Loss: 1.19]

 48%|████▊     | 966/2000 [3:18:15<2:54:29, 10.12s/it, Training Loss: 1.341 Validation Loss: 1.19]

 48%|████▊     | 967/2000 [3:18:25<2:53:03, 10.05s/it, Training Loss: 1.341 Validation Loss: 1.19]

 48%|████▊     | 968/2000 [3:18:35<2:52:41, 10.04s/it, Training Loss: 1.341 Validation Loss: 1.19]

 48%|████▊     | 969/2000 [3:18:46<2:54:13, 10.14s/it, Training Loss: 1.341 Validation Loss: 1.19]

 48%|████▊     | 970/2000 [3:18:56<2:54:25, 10.16s/it, Training Loss: 1.341 Validation Loss: 1.19]

 49%|████▊     | 971/2000 [3:19:06<2:53:43, 10.13s/it, Training Loss: 1.341 Validation Loss: 1.19]

 49%|████▊     | 972/2000 [3:19:16<2:51:35, 10.02s/it, Training Loss: 1.341 Validation Loss: 1.19]

 49%|████▊     | 973/2000 [3:19:26<2:53:05, 10.11s/it, Training Loss: 1.341 Validation Loss: 1.19]

 49%|████▊     | 974/2000 [3:19:36<2:52:15, 10.07s/it, Training Loss: 1.341 Validation Loss: 1.19]

 49%|████▉     | 975/2000 [3:19:46<2:49:56,  9.95s/it, Training Loss: 1.341 Validation Loss: 1.19]

 49%|████▉     | 976/2000 [3:19:56<2:50:42, 10.00s/it, Training Loss: 1.341 Validation Loss: 1.19]

 49%|████▉     | 977/2000 [3:20:06<2:51:04, 10.03s/it, Training Loss: 1.341 Validation Loss: 1.19]

 49%|████▉     | 978/2000 [3:20:16<2:49:58,  9.98s/it, Training Loss: 1.341 Validation Loss: 1.19]

 49%|████▉     | 979/2000 [3:20:26<2:52:07, 10.12s/it, Training Loss: 1.341 Validation Loss: 1.19]

 49%|████▉     | 980/2000 [3:20:36<2:52:10, 10.13s/it, Training Loss: 1.341 Validation Loss: 1.19]

 49%|████▉     | 981/2000 [3:20:46<2:49:49, 10.00s/it, Training Loss: 1.341 Validation Loss: 1.19]

 49%|████▉     | 982/2000 [3:20:56<2:52:28, 10.17s/it, Training Loss: 1.341 Validation Loss: 1.19]

 49%|████▉     | 983/2000 [3:21:07<2:52:14, 10.16s/it, Training Loss: 1.341 Validation Loss: 1.19]

 49%|████▉     | 984/2000 [3:21:16<2:49:58, 10.04s/it, Training Loss: 1.341 Validation Loss: 1.19]

 49%|████▉     | 985/2000 [3:21:27<2:51:03, 10.11s/it, Training Loss: 1.341 Validation Loss: 1.19]

 49%|████▉     | 986/2000 [3:21:36<2:49:20, 10.02s/it, Training Loss: 1.341 Validation Loss: 1.19]

 49%|████▉     | 987/2000 [3:21:47<2:53:24, 10.27s/it, Training Loss: 1.341 Validation Loss: 1.19]

 49%|████▉     | 988/2000 [3:21:58<2:53:20, 10.28s/it, Training Loss: 1.341 Validation Loss: 1.19]

 49%|████▉     | 989/2000 [3:22:07<2:50:34, 10.12s/it, Training Loss: 1.341 Validation Loss: 1.19]

 50%|████▉     | 990/2000 [3:22:17<2:48:25, 10.01s/it, Training Loss: 1.341 Validation Loss: 1.19]

 50%|████▉     | 991/2000 [3:22:28<2:51:13, 10.18s/it, Training Loss: 1.341 Validation Loss: 1.19]

 50%|████▉     | 992/2000 [3:22:38<2:50:28, 10.15s/it, Training Loss: 1.341 Validation Loss: 1.19]

 50%|████▉     | 993/2000 [3:22:48<2:49:10, 10.08s/it, Training Loss: 1.341 Validation Loss: 1.19]

 50%|████▉     | 994/2000 [3:22:58<2:48:33, 10.05s/it, Training Loss: 1.341 Validation Loss: 1.19]

 50%|████▉     | 995/2000 [3:23:08<2:51:44, 10.25s/it, Training Loss: 1.341 Validation Loss: 1.19]

 50%|████▉     | 996/2000 [3:23:18<2:48:39, 10.08s/it, Training Loss: 1.341 Validation Loss: 1.19]

 50%|████▉     | 997/2000 [3:23:28<2:48:14, 10.06s/it, Training Loss: 1.341 Validation Loss: 1.19]

 50%|████▉     | 998/2000 [3:23:38<2:46:57, 10.00s/it, Training Loss: 1.341 Validation Loss: 1.19]

 50%|████▉     | 999/2000 [3:23:48<2:49:30, 10.16s/it, Training Loss: 1.341 Validation Loss: 1.19]

 50%|█████     | 1000/2000 [3:23:59<2:48:57, 10.14s/it, Training Loss: 1.341 Validation Loss: 1.19]

 50%|█████     | 1000/2000 [3:24:54<2:48:57, 10.14s/it, Training Loss: 1.324 Validation Loss: 1.217]

 50%|█████     | 1001/2000 [3:25:04<7:25:47, 26.77s/it, Training Loss: 1.324 Validation Loss: 1.217]

 50%|█████     | 1002/2000 [3:25:14<6:01:34, 21.74s/it, Training Loss: 1.324 Validation Loss: 1.217]

 50%|█████     | 1003/2000 [3:25:24<5:02:19, 18.19s/it, Training Loss: 1.324 Validation Loss: 1.217]

 50%|█████     | 1004/2000 [3:25:34<4:19:46, 15.65s/it, Training Loss: 1.324 Validation Loss: 1.217]

 50%|█████     | 1005/2000 [3:25:44<3:50:36, 13.91s/it, Training Loss: 1.324 Validation Loss: 1.217]

 50%|█████     | 1006/2000 [3:25:54<3:35:15, 12.99s/it, Training Loss: 1.324 Validation Loss: 1.217]

 50%|█████     | 1007/2000 [3:26:04<3:19:39, 12.06s/it, Training Loss: 1.324 Validation Loss: 1.217]

 50%|█████     | 1008/2000 [3:26:14<3:08:48, 11.42s/it, Training Loss: 1.324 Validation Loss: 1.217]

 50%|█████     | 1009/2000 [3:26:24<3:01:13, 10.97s/it, Training Loss: 1.324 Validation Loss: 1.217]

 50%|█████     | 1010/2000 [3:26:35<2:58:12, 10.80s/it, Training Loss: 1.324 Validation Loss: 1.217]

 51%|█████     | 1011/2000 [3:26:45<2:54:33, 10.59s/it, Training Loss: 1.324 Validation Loss: 1.217]

 51%|█████     | 1012/2000 [3:26:55<2:52:00, 10.45s/it, Training Loss: 1.324 Validation Loss: 1.217]

 51%|█████     | 1013/2000 [3:27:05<2:51:00, 10.40s/it, Training Loss: 1.324 Validation Loss: 1.217]

 51%|█████     | 1014/2000 [3:27:15<2:48:39, 10.26s/it, Training Loss: 1.324 Validation Loss: 1.217]

 51%|█████     | 1015/2000 [3:27:25<2:47:23, 10.20s/it, Training Loss: 1.324 Validation Loss: 1.217]

 51%|█████     | 1016/2000 [3:27:35<2:45:29, 10.09s/it, Training Loss: 1.324 Validation Loss: 1.217]

 51%|█████     | 1017/2000 [3:27:45<2:46:04, 10.14s/it, Training Loss: 1.324 Validation Loss: 1.217]

 51%|█████     | 1018/2000 [3:27:56<2:47:31, 10.24s/it, Training Loss: 1.324 Validation Loss: 1.217]

 51%|█████     | 1019/2000 [3:28:05<2:45:09, 10.10s/it, Training Loss: 1.324 Validation Loss: 1.217]

 51%|█████     | 1020/2000 [3:28:15<2:43:47, 10.03s/it, Training Loss: 1.324 Validation Loss: 1.217]

 51%|█████     | 1021/2000 [3:28:26<2:44:55, 10.11s/it, Training Loss: 1.324 Validation Loss: 1.217]

 51%|█████     | 1022/2000 [3:28:36<2:45:09, 10.13s/it, Training Loss: 1.324 Validation Loss: 1.217]

 51%|█████     | 1023/2000 [3:28:46<2:43:44, 10.06s/it, Training Loss: 1.324 Validation Loss: 1.217]

 51%|█████     | 1024/2000 [3:28:56<2:44:22, 10.11s/it, Training Loss: 1.324 Validation Loss: 1.217]

 51%|█████▏    | 1025/2000 [3:29:06<2:44:01, 10.09s/it, Training Loss: 1.324 Validation Loss: 1.217]

 51%|█████▏    | 1026/2000 [3:29:16<2:44:21, 10.12s/it, Training Loss: 1.324 Validation Loss: 1.217]

 51%|█████▏    | 1027/2000 [3:29:26<2:43:49, 10.10s/it, Training Loss: 1.324 Validation Loss: 1.217]

 51%|█████▏    | 1028/2000 [3:29:36<2:41:52,  9.99s/it, Training Loss: 1.324 Validation Loss: 1.217]

 51%|█████▏    | 1029/2000 [3:29:46<2:41:08,  9.96s/it, Training Loss: 1.324 Validation Loss: 1.217]

 52%|█████▏    | 1030/2000 [3:29:56<2:42:15, 10.04s/it, Training Loss: 1.324 Validation Loss: 1.217]

 52%|█████▏    | 1031/2000 [3:30:06<2:40:17,  9.93s/it, Training Loss: 1.324 Validation Loss: 1.217]

 52%|█████▏    | 1032/2000 [3:30:16<2:40:16,  9.93s/it, Training Loss: 1.324 Validation Loss: 1.217]

 52%|█████▏    | 1033/2000 [3:30:26<2:40:58,  9.99s/it, Training Loss: 1.324 Validation Loss: 1.217]

 52%|█████▏    | 1034/2000 [3:30:35<2:39:08,  9.88s/it, Training Loss: 1.324 Validation Loss: 1.217]

 52%|█████▏    | 1035/2000 [3:30:45<2:38:46,  9.87s/it, Training Loss: 1.324 Validation Loss: 1.217]

 52%|█████▏    | 1036/2000 [3:30:56<2:43:08, 10.15s/it, Training Loss: 1.324 Validation Loss: 1.217]

 52%|█████▏    | 1037/2000 [3:31:06<2:41:36, 10.07s/it, Training Loss: 1.324 Validation Loss: 1.217]

 52%|█████▏    | 1038/2000 [3:31:16<2:41:01, 10.04s/it, Training Loss: 1.324 Validation Loss: 1.217]

 52%|█████▏    | 1039/2000 [3:31:26<2:40:53, 10.04s/it, Training Loss: 1.324 Validation Loss: 1.217]

 52%|█████▏    | 1040/2000 [3:31:36<2:38:56,  9.93s/it, Training Loss: 1.324 Validation Loss: 1.217]

 52%|█████▏    | 1041/2000 [3:31:46<2:41:55, 10.13s/it, Training Loss: 1.324 Validation Loss: 1.217]

 52%|█████▏    | 1042/2000 [3:31:56<2:42:04, 10.15s/it, Training Loss: 1.324 Validation Loss: 1.217]

 52%|█████▏    | 1043/2000 [3:32:06<2:38:53,  9.96s/it, Training Loss: 1.324 Validation Loss: 1.217]

 52%|█████▏    | 1044/2000 [3:32:17<2:41:51, 10.16s/it, Training Loss: 1.324 Validation Loss: 1.217]

 52%|█████▏    | 1045/2000 [3:32:27<2:42:29, 10.21s/it, Training Loss: 1.324 Validation Loss: 1.217]

 52%|█████▏    | 1046/2000 [3:32:36<2:39:03, 10.00s/it, Training Loss: 1.324 Validation Loss: 1.217]

 52%|█████▏    | 1047/2000 [3:32:47<2:39:24, 10.04s/it, Training Loss: 1.324 Validation Loss: 1.217]

 52%|█████▏    | 1048/2000 [3:32:57<2:39:11, 10.03s/it, Training Loss: 1.324 Validation Loss: 1.217]

 52%|█████▏    | 1049/2000 [3:33:07<2:40:22, 10.12s/it, Training Loss: 1.324 Validation Loss: 1.217]

 52%|█████▎    | 1050/2000 [3:33:17<2:41:04, 10.17s/it, Training Loss: 1.324 Validation Loss: 1.217]

 52%|█████▎    | 1050/2000 [3:34:12<2:41:04, 10.17s/it, Training Loss: 1.309 Validation Loss: 1.119]

 53%|█████▎    | 1051/2000 [3:34:23<7:05:53, 26.93s/it, Training Loss: 1.309 Validation Loss: 1.119]

 53%|█████▎    | 1052/2000 [3:34:33<5:44:22, 21.80s/it, Training Loss: 1.309 Validation Loss: 1.119]

 53%|█████▎    | 1053/2000 [3:34:43<4:48:07, 18.25s/it, Training Loss: 1.309 Validation Loss: 1.119]

 53%|█████▎    | 1054/2000 [3:34:53<4:09:52, 15.85s/it, Training Loss: 1.309 Validation Loss: 1.119]

 53%|█████▎    | 1055/2000 [3:35:03<3:39:46, 13.95s/it, Training Loss: 1.309 Validation Loss: 1.119]

 53%|█████▎    | 1056/2000 [3:35:13<3:21:00, 12.78s/it, Training Loss: 1.309 Validation Loss: 1.119]

 53%|█████▎    | 1057/2000 [3:35:23<3:08:09, 11.97s/it, Training Loss: 1.309 Validation Loss: 1.119]

 53%|█████▎    | 1058/2000 [3:35:33<2:56:49, 11.26s/it, Training Loss: 1.309 Validation Loss: 1.119]

 53%|█████▎    | 1059/2000 [3:35:43<2:50:37, 10.88s/it, Training Loss: 1.309 Validation Loss: 1.119]

 53%|█████▎    | 1060/2000 [3:35:53<2:49:44, 10.83s/it, Training Loss: 1.309 Validation Loss: 1.119]

 53%|█████▎    | 1061/2000 [3:36:03<2:45:39, 10.59s/it, Training Loss: 1.309 Validation Loss: 1.119]

 53%|█████▎    | 1062/2000 [3:36:13<2:41:47, 10.35s/it, Training Loss: 1.309 Validation Loss: 1.119]

 53%|█████▎    | 1063/2000 [3:36:23<2:40:33, 10.28s/it, Training Loss: 1.309 Validation Loss: 1.119]

 53%|█████▎    | 1064/2000 [3:36:33<2:39:50, 10.25s/it, Training Loss: 1.309 Validation Loss: 1.119]

 53%|█████▎    | 1065/2000 [3:36:43<2:39:01, 10.20s/it, Training Loss: 1.309 Validation Loss: 1.119]

 53%|█████▎    | 1066/2000 [3:36:54<2:38:19, 10.17s/it, Training Loss: 1.309 Validation Loss: 1.119]

 53%|█████▎    | 1067/2000 [3:37:04<2:38:31, 10.19s/it, Training Loss: 1.309 Validation Loss: 1.119]

 53%|█████▎    | 1068/2000 [3:37:14<2:38:44, 10.22s/it, Training Loss: 1.309 Validation Loss: 1.119]

 53%|█████▎    | 1069/2000 [3:37:24<2:37:39, 10.16s/it, Training Loss: 1.309 Validation Loss: 1.119]

 54%|█████▎    | 1070/2000 [3:37:34<2:35:39, 10.04s/it, Training Loss: 1.309 Validation Loss: 1.119]

 54%|█████▎    | 1071/2000 [3:37:44<2:37:10, 10.15s/it, Training Loss: 1.309 Validation Loss: 1.119]

 54%|█████▎    | 1072/2000 [3:37:55<2:38:32, 10.25s/it, Training Loss: 1.309 Validation Loss: 1.119]

 54%|█████▎    | 1073/2000 [3:38:04<2:35:50, 10.09s/it, Training Loss: 1.309 Validation Loss: 1.119]

 54%|█████▎    | 1074/2000 [3:38:15<2:35:51, 10.10s/it, Training Loss: 1.309 Validation Loss: 1.119]

 54%|█████▍    | 1075/2000 [3:38:25<2:37:31, 10.22s/it, Training Loss: 1.309 Validation Loss: 1.119]

 54%|█████▍    | 1076/2000 [3:38:35<2:34:56, 10.06s/it, Training Loss: 1.309 Validation Loss: 1.119]

 54%|█████▍    | 1077/2000 [3:38:45<2:34:55, 10.07s/it, Training Loss: 1.309 Validation Loss: 1.119]

 54%|█████▍    | 1078/2000 [3:38:56<2:38:08, 10.29s/it, Training Loss: 1.309 Validation Loss: 1.119]

 54%|█████▍    | 1079/2000 [3:39:06<2:37:45, 10.28s/it, Training Loss: 1.309 Validation Loss: 1.119]

 54%|█████▍    | 1080/2000 [3:39:16<2:37:39, 10.28s/it, Training Loss: 1.309 Validation Loss: 1.119]

 54%|█████▍    | 1081/2000 [3:39:26<2:36:39, 10.23s/it, Training Loss: 1.309 Validation Loss: 1.119]

 54%|█████▍    | 1082/2000 [3:39:36<2:34:16, 10.08s/it, Training Loss: 1.309 Validation Loss: 1.119]

 54%|█████▍    | 1083/2000 [3:39:46<2:34:09, 10.09s/it, Training Loss: 1.309 Validation Loss: 1.119]

 54%|█████▍    | 1084/2000 [3:39:56<2:33:32, 10.06s/it, Training Loss: 1.309 Validation Loss: 1.119]

 54%|█████▍    | 1085/2000 [3:40:06<2:32:22,  9.99s/it, Training Loss: 1.309 Validation Loss: 1.119]

 54%|█████▍    | 1086/2000 [3:40:16<2:32:50, 10.03s/it, Training Loss: 1.309 Validation Loss: 1.119]

 54%|█████▍    | 1087/2000 [3:40:26<2:31:54,  9.98s/it, Training Loss: 1.309 Validation Loss: 1.119]

 54%|█████▍    | 1088/2000 [3:40:36<2:30:06,  9.88s/it, Training Loss: 1.309 Validation Loss: 1.119]

 54%|█████▍    | 1089/2000 [3:40:45<2:29:47,  9.87s/it, Training Loss: 1.309 Validation Loss: 1.119]

 55%|█████▍    | 1090/2000 [3:40:56<2:34:02, 10.16s/it, Training Loss: 1.309 Validation Loss: 1.119]

 55%|█████▍    | 1091/2000 [3:41:06<2:31:49, 10.02s/it, Training Loss: 1.309 Validation Loss: 1.119]

 55%|█████▍    | 1092/2000 [3:41:16<2:31:10,  9.99s/it, Training Loss: 1.309 Validation Loss: 1.119]

 55%|█████▍    | 1093/2000 [3:41:26<2:33:19, 10.14s/it, Training Loss: 1.309 Validation Loss: 1.119]

 55%|█████▍    | 1094/2000 [3:41:36<2:31:53, 10.06s/it, Training Loss: 1.309 Validation Loss: 1.119]

 55%|█████▍    | 1095/2000 [3:41:46<2:31:40, 10.06s/it, Training Loss: 1.309 Validation Loss: 1.119]

 55%|█████▍    | 1096/2000 [3:41:56<2:31:52, 10.08s/it, Training Loss: 1.309 Validation Loss: 1.119]

 55%|█████▍    | 1097/2000 [3:42:07<2:33:26, 10.19s/it, Training Loss: 1.309 Validation Loss: 1.119]

 55%|█████▍    | 1098/2000 [3:42:17<2:31:59, 10.11s/it, Training Loss: 1.309 Validation Loss: 1.119]

 55%|█████▍    | 1099/2000 [3:42:27<2:31:20, 10.08s/it, Training Loss: 1.309 Validation Loss: 1.119]

 55%|█████▌    | 1100/2000 [3:42:37<2:31:01, 10.07s/it, Training Loss: 1.309 Validation Loss: 1.119]

 55%|█████▌    | 1100/2000 [3:43:32<2:31:01, 10.07s/it, Training Loss: 1.3 Validation Loss: 1.125]

 57%|█████▊    | 1150/2000 [3:52:54<2:23:37, 10.14s/it, Training Loss: 1.278 Validation Loss: 1.141]

 60%|██████    | 1200/2000 [4:02:24<2:15:38, 10.17s/it, Training Loss: 1.277 Validation Loss: 1.144]

 62%|██████▎   | 1250/2000 [4:11:48<2:06:54, 10.15s/it, Training Loss: 1.267 Validation Loss: 1.102]

 65%|██████▌   | 1300/2000 [4:21:13<1:56:52, 10.02s/it, Training Loss: 1.255 Validation Loss: 1.083]

 68%|██████▊   | 1350/2000 [4:30:36<1:49:51, 10.14s/it, Training Loss: 1.246 Validation Loss: 1.096]

 70%|███████   | 1400/2000 [4:39:59<1:41:34, 10.16s/it, Training Loss: 1.244 Validation Loss: 1.084]


 72%|███████▏  | 1437/2000 [4:46:13<1:34:49, 10.11s/it, Training Loss: 1.244 Validation Loss: 1.084]

 72%|███████▏  | 1438/2000 [4:46:23<1:33:29,  9.98s/it, Training Loss: 1.244 Validation Loss: 1.084]

 72%|███████▏  | 1439/2000 [4:46:34<1:35:23, 10.20s/it, Training Loss: 1.244 Validation Loss: 1.084]

 72%|███████▏  | 1440/2000 [4:46:43<1:34:06, 10.08s/it, Training Loss: 1.244 Validation Loss: 1.084]

 72%|███████▏  | 1441/2000 [4:46:54<1:33:50, 10.07s/it, Training Loss: 1.244 Validation Loss: 1.084]

 72%|███████▏  | 1442/2000 [4:47:04<1:34:57, 10.21s/it, Training Loss: 1.244 Validation Loss: 1.084]

 72%|███████▏  | 1443/2000 [4:47:14<1:34:10, 10.14s/it, Training Loss: 1.244 Validation Loss: 1.084]

 72%|███████▏  | 1444/2000 [4:47:24<1:33:08, 10.05s/it, Training Loss: 1.244 Validation Loss: 1.084]

 72%|███████▏  | 1445/2000 [4:47:34<1:33:12, 10.08s/it, Training Loss: 1.244 Validation Loss: 1.084]

 72%|███████▏  | 1446/2000 [4:47:44<1:33:33, 10.13s/it, Training Loss: 1.244 Validation Loss: 1.084]

 72%|███████▏  | 1447/2000 [4:47:55<1:33:45, 10.17s/it, Training Loss: 1.244 Validation Loss: 1.084]

 72%|███████▏  | 1448/2000 [4:48:05<1:33:31, 10.16s/it, Training Loss: 1.244 Validation Loss: 1.084]

 72%|███████▏  | 1449/2000 [4:48:15<1:32:51, 10.11s/it, Training Loss: 1.244 Validation Loss: 1.084]

 72%|███████▎  | 1450/2000 [4:48:25<1:33:43, 10.22s/it, Training Loss: 1.244 Validation Loss: 1.084]

 72%|███████▎  | 1450/2000 [4:49:21<1:33:43, 10.22s/it, Training Loss: 1.239 Validation Loss: 1.03] 

 72%|███████▎  | 1450/2000 [4:49:21<1:33:43, 10.22s/it, Training Loss: 1.239 Validation Loss: 1.03]

 75%|███████▌  | 1500/2000 [4:58:42<1:24:25, 10.13s/it, Training Loss: 1.228 Validation Loss: 1.058]

 78%|███████▊  | 1550/2000 [5:08:06<1:16:26, 10.19s/it, Training Loss: 1.219 Validation Loss: 1.083]

 80%|████████  | 1600/2000 [5:17:26<1:06:48, 10.02s/it, Training Loss: 1.214 Validation Loss: 1.063]

 82%|████████▎ | 1650/2000 [5:26:52<1:00:08, 10.31s/it, Training Loss: 1.208 Validation Loss: 1.04]

 85%|████████▌ | 1700/2000 [5:36:10<49:30,  9.90s/it, Training Loss: 1.192 Validation Loss: 1.051]


 87%|████████▋ | 1738/2000 [5:42:32<44:03, 10.09s/it, Training Loss: 1.192 Validation Loss: 1.051]

 87%|████████▋ | 1739/2000 [5:42:41<43:27,  9.99s/it, Training Loss: 1.192 Validation Loss: 1.051]

 87%|████████▋ | 1740/2000 [5:42:51<43:10,  9.96s/it, Training Loss: 1.192 Validation Loss: 1.051]

 87%|████████▋ | 1741/2000 [5:43:01<42:50,  9.92s/it, Training Loss: 1.192 Validation Loss: 1.051]

 87%|████████▋ | 1742/2000 [5:43:12<43:48, 10.19s/it, Training Loss: 1.192 Validation Loss: 1.051]

 87%|████████▋ | 1743/2000 [5:43:21<42:44,  9.98s/it, Training Loss: 1.192 Validation Loss: 1.051]

 87%|████████▋ | 1744/2000 [5:43:31<42:36,  9.99s/it, Training Loss: 1.192 Validation Loss: 1.051]

 87%|████████▋ | 1745/2000 [5:43:42<42:56, 10.10s/it, Training Loss: 1.192 Validation Loss: 1.051]

 87%|████████▋ | 1746/2000 [5:43:52<42:34, 10.06s/it, Training Loss: 1.192 Validation Loss: 1.051]

 87%|████████▋ | 1747/2000 [5:44:02<42:38, 10.11s/it, Training Loss: 1.192 Validation Loss: 1.051]

 87%|████████▋ | 1748/2000 [5:44:12<42:08, 10.03s/it, Training Loss: 1.192 Validation Loss: 1.051]

 87%|████████▋ | 1749/2000 [5:44:22<41:50, 10.00s/it, Training Loss: 1.192 Validation Loss: 1.051]

 88%|████████▊ | 1750/2000 [5:44:32<41:55, 10.06s/it, Training Loss: 1.192 Validation Loss: 1.051]

 88%|████████▊ | 1750/2000 [5:45:26<41:55, 10.06s/it, Training Loss: 1.19 Validation Loss: 1.031] 

 88%|████████▊ | 1750/2000 [5:45:26<41:55, 10.06s/it, Training Loss: 1.19 Validation Loss: 1.031]

 88%|████████▊ | 1751/2000 [5:45:36<1:49:11, 26.31s/it, Training Loss: 1.19 Validation Loss: 1.031]

 88%|████████▊ | 1752/2000 [5:45:46<1:28:09, 21.33s/it, Training Loss: 1.19 Validation Loss: 1.031]

 88%|████████▊ | 1753/2000 [5:45:57<1:14:33, 18.11s/it, Training Loss: 1.19 Validation Loss: 1.031]

 88%|████████▊ | 1754/2000 [5:46:07<1:04:39, 15.77s/it, Training Loss: 1.19 Validation Loss: 1.031]

 88%|████████▊ | 1755/2000 [5:46:17<57:06, 13.99s/it, Training Loss: 1.19 Validation Loss: 1.031]  

 88%|████████▊ | 1756/2000 [5:46:27<51:59, 12.79s/it, Training Loss: 1.19 Validation Loss: 1.031]

 88%|████████▊ | 1757/2000 [5:46:36<48:08, 11.89s/it, Training Loss: 1.19 Validation Loss: 1.031]

 88%|████████▊ | 1758/2000 [5:46:47<46:21, 11.50s/it, Training Loss: 1.19 Validation Loss: 1.031]

 88%|████████▊ | 1759/2000 [5:46:57<44:26, 11.06s/it, Training Loss: 1.19 Validation Loss: 1.031]

 88%|████████▊ | 1760/2000 [5:47:07<42:36, 10.65s/it, Training Loss: 1.19 Validation Loss: 1.031]

 88%|████████▊ | 1761/2000 [5:47:17<42:33, 10.68s/it, Training Loss: 1.19 Validation Loss: 1.031]

 88%|████████▊ | 1762/2000 [5:47:28<41:49, 10.54s/it, Training Loss: 1.19 Validation Loss: 1.031]

 88%|████████▊ | 1763/2000 [5:47:37<40:30, 10.26s/it, Training Loss: 1.19 Validation Loss: 1.031]

 88%|████████▊ | 1764/2000 [5:47:48<40:18, 10.25s/it, Training Loss: 1.19 Validation Loss: 1.031]

 88%|████████▊ | 1765/2000 [5:47:58<40:47, 10.42s/it, Training Loss: 1.19 Validation Loss: 1.031]

 88%|████████▊ | 1766/2000 [5:48:08<39:43, 10.19s/it, Training Loss: 1.19 Validation Loss: 1.031]

 88%|████████▊ | 1767/2000 [5:48:18<39:19, 10.12s/it, Training Loss: 1.19 Validation Loss: 1.031]

 88%|████████▊ | 1768/2000 [5:48:28<39:01, 10.09s/it, Training Loss: 1.19 Validation Loss: 1.031]

 88%|████████▊ | 1769/2000 [5:48:38<38:24,  9.97s/it, Training Loss: 1.19 Validation Loss: 1.031]

 88%|████████▊ | 1770/2000 [5:48:48<39:12, 10.23s/it, Training Loss: 1.19 Validation Loss: 1.031]

 89%|████████▊ | 1771/2000 [5:48:59<38:51, 10.18s/it, Training Loss: 1.19 Validation Loss: 1.031]

 89%|████████▊ | 1772/2000 [5:49:08<37:57,  9.99s/it, Training Loss: 1.19 Validation Loss: 1.031]

 89%|████████▊ | 1773/2000 [5:49:19<38:36, 10.21s/it, Training Loss: 1.19 Validation Loss: 1.031]

 89%|████████▊ | 1774/2000 [5:49:29<38:19, 10.18s/it, Training Loss: 1.19 Validation Loss: 1.031]

 89%|████████▉ | 1775/2000 [5:49:39<37:38, 10.04s/it, Training Loss: 1.19 Validation Loss: 1.031]

 89%|████████▉ | 1776/2000 [5:49:49<37:39, 10.09s/it, Training Loss: 1.19 Validation Loss: 1.031]

 89%|████████▉ | 1777/2000 [5:49:59<37:10, 10.00s/it, Training Loss: 1.19 Validation Loss: 1.031]

 89%|████████▉ | 1778/2000 [5:50:08<36:41,  9.92s/it, Training Loss: 1.19 Validation Loss: 1.031]

 89%|████████▉ | 1779/2000 [5:50:18<36:40,  9.96s/it, Training Loss: 1.19 Validation Loss: 1.031]

 89%|████████▉ | 1780/2000 [5:50:28<36:04,  9.84s/it, Training Loss: 1.19 Validation Loss: 1.031]

 89%|████████▉ | 1781/2000 [5:50:38<35:57,  9.85s/it, Training Loss: 1.19 Validation Loss: 1.031]

 89%|████████▉ | 1782/2000 [5:50:48<36:02,  9.92s/it, Training Loss: 1.19 Validation Loss: 1.031]

 89%|████████▉ | 1783/2000 [5:50:58<36:16, 10.03s/it, Training Loss: 1.19 Validation Loss: 1.031]

 89%|████████▉ | 1784/2000 [5:51:09<36:23, 10.11s/it, Training Loss: 1.19 Validation Loss: 1.031]

 89%|████████▉ | 1785/2000 [5:51:18<35:50, 10.00s/it, Training Loss: 1.19 Validation Loss: 1.031]

 89%|████████▉ | 1786/2000 [5:51:28<35:21,  9.91s/it, Training Loss: 1.19 Validation Loss: 1.031]

 89%|████████▉ | 1787/2000 [5:51:39<35:56, 10.12s/it, Training Loss: 1.19 Validation Loss: 1.031]

 89%|████████▉ | 1788/2000 [5:51:49<35:55, 10.17s/it, Training Loss: 1.19 Validation Loss: 1.031]

 89%|████████▉ | 1789/2000 [5:51:58<35:03,  9.97s/it, Training Loss: 1.19 Validation Loss: 1.031]

 90%|████████▉ | 1790/2000 [5:52:09<35:16, 10.08s/it, Training Loss: 1.19 Validation Loss: 1.031]

 90%|████████▉ | 1791/2000 [5:52:19<35:22, 10.16s/it, Training Loss: 1.19 Validation Loss: 1.031]

 90%|████████▉ | 1792/2000 [5:52:29<34:40, 10.00s/it, Training Loss: 1.19 Validation Loss: 1.031]

 90%|████████▉ | 1793/2000 [5:52:39<34:46, 10.08s/it, Training Loss: 1.19 Validation Loss: 1.031]

 90%|████████▉ | 1794/2000 [5:52:49<34:30, 10.05s/it, Training Loss: 1.19 Validation Loss: 1.031]

 90%|████████▉ | 1795/2000 [5:52:59<33:52,  9.91s/it, Training Loss: 1.19 Validation Loss: 1.031]

 90%|████████▉ | 1796/2000 [5:53:09<34:30, 10.15s/it, Training Loss: 1.19 Validation Loss: 1.031]

 90%|████████▉ | 1797/2000 [5:53:19<34:01, 10.06s/it, Training Loss: 1.19 Validation Loss: 1.031]

 90%|████████▉ | 1798/2000 [5:53:29<33:21,  9.91s/it, Training Loss: 1.19 Validation Loss: 1.031]

 90%|████████▉ | 1799/2000 [5:53:39<33:54, 10.12s/it, Training Loss: 1.19 Validation Loss: 1.031]

 90%|█████████ | 1800/2000 [5:53:50<33:57, 10.19s/it, Training Loss: 1.19 Validation Loss: 1.031]

 90%|█████████ | 1800/2000 [5:54:44<33:57, 10.19s/it, Training Loss: 1.176 Validation Loss: 1.026]

 90%|█████████ | 1800/2000 [5:54:44<33:57, 10.19s/it, Training Loss: 1.176 Validation Loss: 1.026]

 90%|█████████ | 1801/2000 [5:54:54<1:28:03, 26.55s/it, Training Loss: 1.176 Validation Loss: 1.026]

 90%|█████████ | 1802/2000 [5:55:05<1:11:38, 21.71s/it, Training Loss: 1.176 Validation Loss: 1.026]

 90%|█████████ | 1803/2000 [5:55:15<59:49, 18.22s/it, Training Loss: 1.176 Validation Loss: 1.026]  

 90%|█████████ | 1804/2000 [5:55:25<51:22, 15.73s/it, Training Loss: 1.176 Validation Loss: 1.026]

 90%|█████████ | 1805/2000 [5:55:34<45:13, 13.91s/it, Training Loss: 1.176 Validation Loss: 1.026]

 90%|█████████ | 1806/2000 [5:55:44<41:12, 12.75s/it, Training Loss: 1.176 Validation Loss: 1.026]

 90%|█████████ | 1807/2000 [5:55:55<38:30, 11.97s/it, Training Loss: 1.176 Validation Loss: 1.026]

 90%|█████████ | 1808/2000 [5:56:05<36:46, 11.49s/it, Training Loss: 1.176 Validation Loss: 1.026]

 90%|█████████ | 1809/2000 [5:56:15<35:05, 11.02s/it, Training Loss: 1.176 Validation Loss: 1.026]

 90%|█████████ | 1810/2000 [5:56:25<33:59, 10.73s/it, Training Loss: 1.176 Validation Loss: 1.026]

 91%|█████████ | 1811/2000 [5:56:35<32:43, 10.39s/it, Training Loss: 1.176 Validation Loss: 1.026]

 91%|█████████ | 1812/2000 [5:56:45<32:51, 10.49s/it, Training Loss: 1.176 Validation Loss: 1.026]

 91%|█████████ | 1813/2000 [5:56:55<32:22, 10.39s/it, Training Loss: 1.176 Validation Loss: 1.026]

 91%|█████████ | 1814/2000 [5:57:05<31:21, 10.12s/it, Training Loss: 1.176 Validation Loss: 1.026]

 91%|█████████ | 1815/2000 [5:57:15<31:12, 10.12s/it, Training Loss: 1.176 Validation Loss: 1.026]

 91%|█████████ | 1816/2000 [5:57:26<31:26, 10.25s/it, Training Loss: 1.176 Validation Loss: 1.026]

 91%|█████████ | 1817/2000 [5:57:35<30:47, 10.09s/it, Training Loss: 1.176 Validation Loss: 1.026]

 91%|█████████ | 1818/2000 [5:57:45<30:30, 10.06s/it, Training Loss: 1.176 Validation Loss: 1.026]

 91%|█████████ | 1819/2000 [5:57:56<30:55, 10.25s/it, Training Loss: 1.176 Validation Loss: 1.026]

 91%|█████████ | 1820/2000 [5:58:06<30:24, 10.13s/it, Training Loss: 1.176 Validation Loss: 1.026]

 91%|█████████ | 1821/2000 [5:58:16<30:14, 10.14s/it, Training Loss: 1.176 Validation Loss: 1.026]

 91%|█████████ | 1822/2000 [5:58:26<29:42, 10.01s/it, Training Loss: 1.176 Validation Loss: 1.026]

 91%|█████████ | 1823/2000 [5:58:35<29:19,  9.94s/it, Training Loss: 1.176 Validation Loss: 1.026]

 91%|█████████ | 1824/2000 [5:58:46<29:38, 10.11s/it, Training Loss: 1.176 Validation Loss: 1.026]

 91%|█████████▏| 1825/2000 [5:58:56<29:32, 10.13s/it, Training Loss: 1.176 Validation Loss: 1.026]

 91%|█████████▏| 1826/2000 [5:59:06<29:18, 10.11s/it, Training Loss: 1.176 Validation Loss: 1.026]

 91%|█████████▏| 1827/2000 [5:59:19<31:09, 10.81s/it, Training Loss: 1.176 Validation Loss: 1.026]

 91%|█████████▏| 1828/2000 [5:59:30<31:46, 11.08s/it, Training Loss: 1.176 Validation Loss: 1.026]

 91%|█████████▏| 1829/2000 [5:59:40<30:31, 10.71s/it, Training Loss: 1.176 Validation Loss: 1.026]

 92%|█████████▏| 1830/2000 [5:59:50<29:48, 10.52s/it, Training Loss: 1.176 Validation Loss: 1.026]

 92%|█████████▏| 1831/2000 [6:00:00<29:20, 10.42s/it, Training Loss: 1.176 Validation Loss: 1.026]

 92%|█████████▏| 1832/2000 [6:00:10<28:42, 10.25s/it, Training Loss: 1.176 Validation Loss: 1.026]

 92%|█████████▏| 1833/2000 [6:00:21<28:29, 10.24s/it, Training Loss: 1.176 Validation Loss: 1.026]

 92%|█████████▏| 1834/2000 [6:00:31<28:44, 10.39s/it, Training Loss: 1.176 Validation Loss: 1.026]

 92%|█████████▏| 1835/2000 [6:00:41<28:04, 10.21s/it, Training Loss: 1.176 Validation Loss: 1.026]

 92%|█████████▏| 1836/2000 [6:00:51<27:54, 10.21s/it, Training Loss: 1.176 Validation Loss: 1.026]

 92%|█████████▏| 1837/2000 [6:01:02<28:01, 10.32s/it, Training Loss: 1.176 Validation Loss: 1.026]

 92%|█████████▏| 1838/2000 [6:01:12<27:29, 10.18s/it, Training Loss: 1.176 Validation Loss: 1.026]

 92%|█████████▏| 1839/2000 [6:01:21<26:50, 10.00s/it, Training Loss: 1.176 Validation Loss: 1.026]

 92%|█████████▏| 1840/2000 [6:01:32<27:12, 10.20s/it, Training Loss: 1.176 Validation Loss: 1.026]

 92%|█████████▏| 1841/2000 [6:01:42<26:54, 10.15s/it, Training Loss: 1.176 Validation Loss: 1.026]

 92%|█████████▏| 1842/2000 [6:01:52<26:28, 10.05s/it, Training Loss: 1.176 Validation Loss: 1.026]

 92%|█████████▏| 1843/2000 [6:02:02<26:18, 10.05s/it, Training Loss: 1.176 Validation Loss: 1.026]

 92%|█████████▏| 1844/2000 [6:02:13<26:35, 10.23s/it, Training Loss: 1.176 Validation Loss: 1.026]

 92%|█████████▏| 1845/2000 [6:02:22<25:57, 10.05s/it, Training Loss: 1.176 Validation Loss: 1.026]

 92%|█████████▏| 1846/2000 [6:02:32<25:50, 10.07s/it, Training Loss: 1.176 Validation Loss: 1.026]

 92%|█████████▏| 1847/2000 [6:02:42<25:34, 10.03s/it, Training Loss: 1.176 Validation Loss: 1.026]

 92%|█████████▏| 1848/2000 [6:02:52<25:16,  9.98s/it, Training Loss: 1.176 Validation Loss: 1.026]

 92%|█████████▏| 1849/2000 [6:03:02<25:23, 10.09s/it, Training Loss: 1.176 Validation Loss: 1.026]

 92%|█████████▎| 1850/2000 [6:03:13<25:20, 10.14s/it, Training Loss: 1.176 Validation Loss: 1.026]

 92%|█████████▎| 1850/2000 [6:04:08<25:20, 10.14s/it, Training Loss: 1.181 Validation Loss: 1.058]

 92%|█████████▎| 1850/2000 [6:04:08<25:20, 10.14s/it, Training Loss: 1.181 Validation Loss: 1.058]

 93%|█████████▎| 1851/2000 [6:04:17<1:05:48, 26.50s/it, Training Loss: 1.181 Validation Loss: 1.058]

 93%|█████████▎| 1852/2000 [6:04:27<53:11, 21.57s/it, Training Loss: 1.181 Validation Loss: 1.058]  

 93%|█████████▎| 1853/2000 [6:04:37<44:20, 18.10s/it, Training Loss: 1.181 Validation Loss: 1.058]

 93%|█████████▎| 1854/2000 [6:04:47<37:56, 15.59s/it, Training Loss: 1.181 Validation Loss: 1.058]

 93%|█████████▎| 1855/2000 [6:04:57<33:30, 13.87s/it, Training Loss: 1.181 Validation Loss: 1.058]

 93%|█████████▎| 1856/2000 [6:05:07<30:35, 12.74s/it, Training Loss: 1.181 Validation Loss: 1.058]

 93%|█████████▎| 1857/2000 [6:05:17<28:14, 11.85s/it, Training Loss: 1.181 Validation Loss: 1.058]

 93%|█████████▎| 1858/2000 [6:05:27<26:36, 11.24s/it, Training Loss: 1.181 Validation Loss: 1.058]

 93%|█████████▎| 1859/2000 [6:05:37<25:30, 10.85s/it, Training Loss: 1.181 Validation Loss: 1.058]

 93%|█████████▎| 1860/2000 [6:05:46<24:33, 10.53s/it, Training Loss: 1.181 Validation Loss: 1.058]

 93%|█████████▎| 1861/2000 [6:05:57<24:25, 10.54s/it, Training Loss: 1.181 Validation Loss: 1.058]

 93%|█████████▎| 1862/2000 [6:06:07<23:57, 10.42s/it, Training Loss: 1.181 Validation Loss: 1.058]

 93%|█████████▎| 1863/2000 [6:06:17<23:09, 10.15s/it, Training Loss: 1.181 Validation Loss: 1.058]

 93%|█████████▎| 1864/2000 [6:06:27<22:52, 10.09s/it, Training Loss: 1.181 Validation Loss: 1.058]

 93%|█████████▎| 1865/2000 [6:06:37<23:01, 10.24s/it, Training Loss: 1.181 Validation Loss: 1.058]

 93%|█████████▎| 1866/2000 [6:06:47<22:29, 10.07s/it, Training Loss: 1.181 Validation Loss: 1.058]

 93%|█████████▎| 1867/2000 [6:06:57<22:23, 10.11s/it, Training Loss: 1.181 Validation Loss: 1.058]

 93%|█████████▎| 1868/2000 [6:07:07<22:10, 10.08s/it, Training Loss: 1.181 Validation Loss: 1.058]

 93%|█████████▎| 1869/2000 [6:07:17<22:12, 10.17s/it, Training Loss: 1.181 Validation Loss: 1.058]

 94%|█████████▎| 1870/2000 [6:07:27<21:48, 10.07s/it, Training Loss: 1.181 Validation Loss: 1.058]

 94%|█████████▎| 1871/2000 [6:07:37<21:39, 10.07s/it, Training Loss: 1.181 Validation Loss: 1.058]

 94%|█████████▎| 1872/2000 [6:07:48<21:43, 10.18s/it, Training Loss: 1.181 Validation Loss: 1.058]

 94%|█████████▎| 1873/2000 [6:07:58<21:27, 10.14s/it, Training Loss: 1.181 Validation Loss: 1.058]

 94%|█████████▎| 1874/2000 [6:08:08<21:12, 10.10s/it, Training Loss: 1.181 Validation Loss: 1.058]

 94%|█████████▍| 1875/2000 [6:08:18<20:52, 10.02s/it, Training Loss: 1.181 Validation Loss: 1.058]

 94%|█████████▍| 1876/2000 [6:08:28<20:59, 10.16s/it, Training Loss: 1.181 Validation Loss: 1.058]

 94%|█████████▍| 1877/2000 [6:08:38<20:43, 10.11s/it, Training Loss: 1.181 Validation Loss: 1.058]

 94%|█████████▍| 1878/2000 [6:08:48<20:23, 10.03s/it, Training Loss: 1.181 Validation Loss: 1.058]

 94%|█████████▍| 1879/2000 [6:08:58<20:29, 10.16s/it, Training Loss: 1.181 Validation Loss: 1.058]

 94%|█████████▍| 1880/2000 [6:09:09<20:21, 10.18s/it, Training Loss: 1.181 Validation Loss: 1.058]

 94%|█████████▍| 1881/2000 [6:09:18<19:49, 10.00s/it, Training Loss: 1.181 Validation Loss: 1.058]

 94%|█████████▍| 1882/2000 [6:09:29<19:51, 10.10s/it, Training Loss: 1.181 Validation Loss: 1.058]

 94%|█████████▍| 1883/2000 [6:09:38<19:31, 10.02s/it, Training Loss: 1.181 Validation Loss: 1.058]

 94%|█████████▍| 1884/2000 [6:09:48<19:17,  9.98s/it, Training Loss: 1.181 Validation Loss: 1.058]

 94%|█████████▍| 1885/2000 [6:09:58<19:06,  9.97s/it, Training Loss: 1.181 Validation Loss: 1.058]

 94%|█████████▍| 1886/2000 [6:10:08<18:52,  9.94s/it, Training Loss: 1.181 Validation Loss: 1.058]

 94%|█████████▍| 1887/2000 [6:10:18<18:31,  9.84s/it, Training Loss: 1.181 Validation Loss: 1.058]

 94%|█████████▍| 1888/2000 [6:10:29<18:57, 10.16s/it, Training Loss: 1.181 Validation Loss: 1.058]

 94%|█████████▍| 1889/2000 [6:10:38<18:36, 10.06s/it, Training Loss: 1.181 Validation Loss: 1.058]

 94%|█████████▍| 1890/2000 [6:10:48<18:13,  9.94s/it, Training Loss: 1.181 Validation Loss: 1.058]

 95%|█████████▍| 1891/2000 [6:10:59<18:32, 10.21s/it, Training Loss: 1.181 Validation Loss: 1.058]

 95%|█████████▍| 1892/2000 [6:11:09<18:11, 10.11s/it, Training Loss: 1.181 Validation Loss: 1.058]

 95%|█████████▍| 1893/2000 [6:11:19<17:58, 10.08s/it, Training Loss: 1.181 Validation Loss: 1.058]

 95%|█████████▍| 1894/2000 [6:11:29<17:47, 10.07s/it, Training Loss: 1.181 Validation Loss: 1.058]

 95%|█████████▍| 1895/2000 [6:11:39<17:27,  9.97s/it, Training Loss: 1.181 Validation Loss: 1.058]

 95%|█████████▍| 1896/2000 [6:11:50<17:49, 10.28s/it, Training Loss: 1.181 Validation Loss: 1.058]

 95%|█████████▍| 1897/2000 [6:11:59<17:21, 10.12s/it, Training Loss: 1.181 Validation Loss: 1.058]

 95%|█████████▍| 1898/2000 [6:12:09<17:03, 10.04s/it, Training Loss: 1.181 Validation Loss: 1.058]

 95%|█████████▍| 1899/2000 [6:12:20<17:14, 10.24s/it, Training Loss: 1.181 Validation Loss: 1.058]

 95%|█████████▌| 1900/2000 [6:12:30<16:51, 10.12s/it, Training Loss: 1.181 Validation Loss: 1.058]

 95%|█████████▌| 1900/2000 [6:13:24<16:51, 10.12s/it, Training Loss: 1.173 Validation Loss: 1.047]

 95%|█████████▌| 1900/2000 [6:13:24<16:51, 10.12s/it, Training Loss: 1.173 Validation Loss: 1.047]

 95%|█████████▌| 1901/2000 [6:13:34<43:40, 26.47s/it, Training Loss: 1.173 Validation Loss: 1.047]

 95%|█████████▌| 1902/2000 [6:13:45<35:22, 21.66s/it, Training Loss: 1.173 Validation Loss: 1.047]

 95%|█████████▌| 1903/2000 [6:13:55<29:25, 18.21s/it, Training Loss: 1.173 Validation Loss: 1.047]

 95%|█████████▌| 1904/2000 [6:14:05<25:10, 15.73s/it, Training Loss: 1.173 Validation Loss: 1.047]

 95%|█████████▌| 1905/2000 [6:14:15<22:22, 14.13s/it, Training Loss: 1.173 Validation Loss: 1.047]

 95%|█████████▌| 1906/2000 [6:14:26<20:31, 13.10s/it, Training Loss: 1.173 Validation Loss: 1.047]

 95%|█████████▌| 1907/2000 [6:14:36<18:48, 12.13s/it, Training Loss: 1.173 Validation Loss: 1.047]

 95%|█████████▌| 1908/2000 [6:14:46<17:36, 11.48s/it, Training Loss: 1.173 Validation Loss: 1.047]

 95%|█████████▌| 1909/2000 [6:14:56<16:48, 11.08s/it, Training Loss: 1.173 Validation Loss: 1.047]

 96%|█████████▌| 1910/2000 [6:15:06<15:57, 10.64s/it, Training Loss: 1.173 Validation Loss: 1.047]

 96%|█████████▌| 1911/2000 [6:15:16<15:26, 10.41s/it, Training Loss: 1.173 Validation Loss: 1.047]

 96%|█████████▌| 1912/2000 [6:15:26<15:08, 10.33s/it, Training Loss: 1.173 Validation Loss: 1.047]

 96%|█████████▌| 1913/2000 [6:15:35<14:39, 10.11s/it, Training Loss: 1.173 Validation Loss: 1.047]

 96%|█████████▌| 1914/2000 [6:15:45<14:21, 10.02s/it, Training Loss: 1.173 Validation Loss: 1.047]

 96%|█████████▌| 1915/2000 [6:15:56<14:25, 10.18s/it, Training Loss: 1.173 Validation Loss: 1.047]

 96%|█████████▌| 1916/2000 [6:16:06<14:11, 10.14s/it, Training Loss: 1.173 Validation Loss: 1.047]

 96%|█████████▌| 1917/2000 [6:16:16<13:55, 10.06s/it, Training Loss: 1.173 Validation Loss: 1.047]

 96%|█████████▌| 1918/2000 [6:16:26<13:44, 10.05s/it, Training Loss: 1.173 Validation Loss: 1.047]

 96%|█████████▌| 1919/2000 [6:16:36<13:42, 10.15s/it, Training Loss: 1.173 Validation Loss: 1.047]

 96%|█████████▌| 1920/2000 [6:16:46<13:24, 10.05s/it, Training Loss: 1.173 Validation Loss: 1.047]

 96%|█████████▌| 1921/2000 [6:16:56<13:18, 10.10s/it, Training Loss: 1.173 Validation Loss: 1.047]

 96%|█████████▌| 1922/2000 [6:17:06<13:01, 10.02s/it, Training Loss: 1.173 Validation Loss: 1.047]

 96%|█████████▌| 1923/2000 [6:17:16<13:05, 10.20s/it, Training Loss: 1.173 Validation Loss: 1.047]

 96%|█████████▌| 1924/2000 [6:17:27<12:52, 10.17s/it, Training Loss: 1.173 Validation Loss: 1.047]

 96%|█████████▋| 1925/2000 [6:17:36<12:29, 10.00s/it, Training Loss: 1.173 Validation Loss: 1.047]

 96%|█████████▋| 1926/2000 [6:17:46<12:16,  9.95s/it, Training Loss: 1.173 Validation Loss: 1.047]

 96%|█████████▋| 1927/2000 [6:17:57<12:28, 10.25s/it, Training Loss: 1.173 Validation Loss: 1.047]

 96%|█████████▋| 1928/2000 [6:18:06<12:01, 10.02s/it, Training Loss: 1.173 Validation Loss: 1.047]

 96%|█████████▋| 1929/2000 [6:18:16<11:50, 10.01s/it, Training Loss: 1.173 Validation Loss: 1.047]

 96%|█████████▋| 1930/2000 [6:18:26<11:38,  9.97s/it, Training Loss: 1.173 Validation Loss: 1.047]

 97%|█████████▋| 1931/2000 [6:18:37<11:38, 10.12s/it, Training Loss: 1.173 Validation Loss: 1.047]

 97%|█████████▋| 1932/2000 [6:18:47<11:23, 10.06s/it, Training Loss: 1.173 Validation Loss: 1.047]

 97%|█████████▋| 1933/2000 [6:18:57<11:15, 10.08s/it, Training Loss: 1.173 Validation Loss: 1.047]

 97%|█████████▋| 1934/2000 [6:19:07<10:59,  9.99s/it, Training Loss: 1.173 Validation Loss: 1.047]

 97%|█████████▋| 1935/2000 [6:19:17<11:04, 10.22s/it, Training Loss: 1.173 Validation Loss: 1.047]

 97%|█████████▋| 1936/2000 [6:19:27<10:47, 10.11s/it, Training Loss: 1.173 Validation Loss: 1.047]

 97%|█████████▋| 1937/2000 [6:19:37<10:30, 10.01s/it, Training Loss: 1.173 Validation Loss: 1.047]

 97%|█████████▋| 1938/2000 [6:19:47<10:21, 10.02s/it, Training Loss: 1.173 Validation Loss: 1.047]

 97%|█████████▋| 1939/2000 [6:19:57<10:06,  9.94s/it, Training Loss: 1.173 Validation Loss: 1.047]

 97%|█████████▋| 1940/2000 [6:20:07<09:56,  9.94s/it, Training Loss: 1.173 Validation Loss: 1.047]

 97%|█████████▋| 1941/2000 [6:20:17<09:50, 10.01s/it, Training Loss: 1.173 Validation Loss: 1.047]

 97%|█████████▋| 1942/2000 [6:20:27<09:44, 10.09s/it, Training Loss: 1.173 Validation Loss: 1.047]

 97%|█████████▋| 1943/2000 [6:20:37<09:34, 10.07s/it, Training Loss: 1.173 Validation Loss: 1.047]

 97%|█████████▋| 1944/2000 [6:20:47<09:20, 10.00s/it, Training Loss: 1.173 Validation Loss: 1.047]

 97%|█████████▋| 1945/2000 [6:20:57<09:09,  9.98s/it, Training Loss: 1.173 Validation Loss: 1.047]

 97%|█████████▋| 1946/2000 [6:21:08<09:15, 10.29s/it, Training Loss: 1.173 Validation Loss: 1.047]

 97%|█████████▋| 1947/2000 [6:21:18<08:58, 10.17s/it, Training Loss: 1.173 Validation Loss: 1.047]

 97%|█████████▋| 1948/2000 [6:21:27<08:40, 10.00s/it, Training Loss: 1.173 Validation Loss: 1.047]

 97%|█████████▋| 1949/2000 [6:21:37<08:31, 10.02s/it, Training Loss: 1.173 Validation Loss: 1.047]

 98%|█████████▊| 1950/2000 [6:21:48<08:31, 10.24s/it, Training Loss: 1.173 Validation Loss: 1.047]

 98%|█████████▊| 1950/2000 [6:22:45<08:31, 10.24s/it, Training Loss: 1.163 Validation Loss: 1.061]

 98%|█████████▊| 1950/2000 [6:22:45<08:31, 10.24s/it, Training Loss: 1.163 Validation Loss: 1.061]

 98%|█████████▊| 1951/2000 [6:22:55<22:09, 27.14s/it, Training Loss: 1.163 Validation Loss: 1.061]

 98%|█████████▊| 1952/2000 [6:23:05<17:38, 22.05s/it, Training Loss: 1.163 Validation Loss: 1.061]

 98%|█████████▊| 1953/2000 [6:23:15<14:31, 18.55s/it, Training Loss: 1.163 Validation Loss: 1.061]

 98%|█████████▊| 1954/2000 [6:23:25<12:11, 15.90s/it, Training Loss: 1.163 Validation Loss: 1.061]

 98%|█████████▊| 1955/2000 [6:23:35<10:31, 14.04s/it, Training Loss: 1.163 Validation Loss: 1.061]

 98%|█████████▊| 1956/2000 [6:23:46<09:34, 13.05s/it, Training Loss: 1.163 Validation Loss: 1.061]

 98%|█████████▊| 1957/2000 [6:23:55<08:41, 12.12s/it, Training Loss: 1.163 Validation Loss: 1.061]

 98%|█████████▊| 1958/2000 [6:24:05<07:58, 11.40s/it, Training Loss: 1.163 Validation Loss: 1.061]

 98%|█████████▊| 1959/2000 [6:24:16<07:38, 11.18s/it, Training Loss: 1.163 Validation Loss: 1.061]

 98%|█████████▊| 1960/2000 [6:24:26<07:16, 10.91s/it, Training Loss: 1.163 Validation Loss: 1.061]

 98%|█████████▊| 1961/2000 [6:24:36<06:52, 10.57s/it, Training Loss: 1.163 Validation Loss: 1.061]

 98%|█████████▊| 1962/2000 [6:24:46<06:35, 10.40s/it, Training Loss: 1.163 Validation Loss: 1.061]

 98%|█████████▊| 1963/2000 [6:24:56<06:21, 10.30s/it, Training Loss: 1.163 Validation Loss: 1.061]

 98%|█████████▊| 1964/2000 [6:25:06<06:04, 10.14s/it, Training Loss: 1.163 Validation Loss: 1.061]

 98%|█████████▊| 1965/2000 [6:25:16<05:54, 10.12s/it, Training Loss: 1.163 Validation Loss: 1.061]

 98%|█████████▊| 1966/2000 [6:25:25<05:38,  9.97s/it, Training Loss: 1.163 Validation Loss: 1.061]

 98%|█████████▊| 1967/2000 [6:25:35<05:27,  9.91s/it, Training Loss: 1.163 Validation Loss: 1.061]

 98%|█████████▊| 1968/2000 [6:25:45<05:18,  9.96s/it, Training Loss: 1.163 Validation Loss: 1.061]

 98%|█████████▊| 1969/2000 [6:25:55<05:09,  9.99s/it, Training Loss: 1.163 Validation Loss: 1.061]

 98%|█████████▊| 1970/2000 [6:26:06<05:01, 10.06s/it, Training Loss: 1.163 Validation Loss: 1.061]

 99%|█████████▊| 1971/2000 [6:26:16<04:51, 10.05s/it, Training Loss: 1.163 Validation Loss: 1.061]

 99%|█████████▊| 1972/2000 [6:26:25<04:38,  9.94s/it, Training Loss: 1.163 Validation Loss: 1.061]

 99%|█████████▊| 1973/2000 [6:26:35<04:28,  9.96s/it, Training Loss: 1.163 Validation Loss: 1.061]

 99%|█████████▊| 1974/2000 [6:26:46<04:23, 10.14s/it, Training Loss: 1.163 Validation Loss: 1.061]

 99%|█████████▉| 1975/2000 [6:26:56<04:14, 10.18s/it, Training Loss: 1.163 Validation Loss: 1.061]

 99%|█████████▉| 1976/2000 [6:27:06<04:02, 10.10s/it, Training Loss: 1.163 Validation Loss: 1.061]

 99%|█████████▉| 1977/2000 [6:27:16<03:50, 10.00s/it, Training Loss: 1.163 Validation Loss: 1.061]

 99%|█████████▉| 1978/2000 [6:27:26<03:42, 10.12s/it, Training Loss: 1.163 Validation Loss: 1.061]

 99%|█████████▉| 1979/2000 [6:27:36<03:32, 10.14s/it, Training Loss: 1.163 Validation Loss: 1.061]

 99%|█████████▉| 1980/2000 [6:27:46<03:19,  9.99s/it, Training Loss: 1.163 Validation Loss: 1.061]

 99%|█████████▉| 1981/2000 [6:27:56<03:12, 10.12s/it, Training Loss: 1.163 Validation Loss: 1.061]

 99%|█████████▉| 1982/2000 [6:28:07<03:04, 10.23s/it, Training Loss: 1.163 Validation Loss: 1.061]

 99%|█████████▉| 1983/2000 [6:28:17<02:50, 10.04s/it, Training Loss: 1.163 Validation Loss: 1.061]

 99%|█████████▉| 1984/2000 [6:28:27<02:40, 10.06s/it, Training Loss: 1.163 Validation Loss: 1.061]

 99%|█████████▉| 1985/2000 [6:28:36<02:29,  9.97s/it, Training Loss: 1.163 Validation Loss: 1.061]

 99%|█████████▉| 1986/2000 [6:28:47<02:22, 10.16s/it, Training Loss: 1.163 Validation Loss: 1.061]

 99%|█████████▉| 1987/2000 [6:28:57<02:12, 10.20s/it, Training Loss: 1.163 Validation Loss: 1.061]

 99%|█████████▉| 1988/2000 [6:29:07<02:00, 10.03s/it, Training Loss: 1.163 Validation Loss: 1.061]

 99%|█████████▉| 1989/2000 [6:29:17<01:51, 10.16s/it, Training Loss: 1.163 Validation Loss: 1.061]

100%|█████████▉| 1990/2000 [6:29:28<01:42, 10.22s/it, Training Loss: 1.163 Validation Loss: 1.061]

100%|█████████▉| 1991/2000 [6:29:37<01:30, 10.01s/it, Training Loss: 1.163 Validation Loss: 1.061]

100%|█████████▉| 1992/2000 [6:29:48<01:20, 10.08s/it, Training Loss: 1.163 Validation Loss: 1.061]

100%|█████████▉| 1993/2000 [6:29:57<01:10, 10.03s/it, Training Loss: 1.163 Validation Loss: 1.061]

100%|█████████▉| 1994/2000 [6:30:07<00:59,  9.89s/it, Training Loss: 1.163 Validation Loss: 1.061]

100%|█████████▉| 1995/2000 [6:30:17<00:49,  9.93s/it, Training Loss: 1.163 Validation Loss: 1.061]

100%|█████████▉| 1996/2000 [6:30:27<00:39,  9.91s/it, Training Loss: 1.163 Validation Loss: 1.061]

100%|█████████▉| 1997/2000 [6:30:37<00:29,  9.85s/it, Training Loss: 1.163 Validation Loss: 1.061]

100%|█████████▉| 1998/2000 [6:30:47<00:19,  9.90s/it, Training Loss: 1.163 Validation Loss: 1.061]

100%|█████████▉| 1999/2000 [6:30:57<00:10, 10.08s/it, Training Loss: 1.163 Validation Loss: 1.061]

100%|█████████▉| 1999/2000 [6:31:52<00:10, 10.08s/it, Training Loss: 1.155 Validation Loss: 0.988]

100%|█████████▉| 1999/2000 [6:31:52<00:10, 10.08s/it, Training Loss: 1.155 Validation Loss: 0.988]

100%|██████████| 2000/2000 [6:32:02<00:00, 26.59s/it, Training Loss: 1.155 Validation Loss: 0.988]

100%|██████████| 2000/2000 [6:32:02<00:00, 11.76s/it, Training Loss: 1.155 Validation Loss: 0.988]
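
One way to read the final numbers in the log: if the reported loss is the mean cross-entropy per token in nats (an assumption here, since the loss function is defined elsewhere), then its exponential is the perplexity, roughly the number of equally likely tokens the model is choosing between at each step. The helper name `perplexity` below is only for illustration.

```python
import math

def perplexity(mean_cross_entropy):
    """Convert a mean cross-entropy loss (in nats per token) to perplexity."""
    return math.exp(mean_cross_entropy)

# Final values reported in the training log above.
final_train_loss = 1.155
final_val_loss = 0.988

print(f"train perplexity: {perplexity(final_train_loss):.2f}")
print(f"validation perplexity: {perplexity(final_val_loss):.2f}")
```

A lower perplexity means the model is less "surprised" by the next token. Keep in mind that the validation loss is averaged over only a few batches per evaluation, so small differences between evaluations are mostly noise.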




Next, save the trained model so it can be reused. The cells below reload it from disk, which is also how you would load it in another application.

In [13]:
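# Save the model weights and configuration (create the directory if it does not exist)
os.makedirs('model', exist_ok=True)
torch.save(model.state_dict(), 'model/model.ckpt')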
with open('model/model_config.pkl','wb') as f:
    pickle.dump(model_config, f)

Below is the configuration used for inference. Feel free to modify it to your liking!

In [14]:
class InferenceConfig():
    seed:int=0 # Random seed (impacts the output)
    start:str="ROMEO:" # Starting prompt to generate from
    temperature:float = 0.7 # Sampling temperature: 1.0 = unchanged, < 1.0 = less random, > 1.0 = more random
    max_new_tokens:int=250 # Number of new tokens to generate
    top_k:int=None  # Keep only the k most likely tokens and give all others 0 probability (None = no limit)
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
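
To make `temperature` and `top_k` more concrete, here is a minimal pure-Python sketch of how these two settings are commonly applied when sampling the next token. It is an illustration under assumptions, not the code inside `utils.inference`; the function name `sample_next_token` and the example scores are made up.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None, rng=random):
    """Sample an index from raw scores using temperature and optional top-k."""
    # Temperature rescales the scores: < 1.0 sharpens the distribution,
    # > 1.0 flattens it, and 1.0 leaves it unchanged.
    scaled = [x / temperature for x in logits]
    if top_k is not None:
        # Keep the k largest scores. Everything below the k-th largest is set
        # to -inf, which becomes probability 0 after the softmax.
        # (Scores tied with the k-th largest are kept as well.)
        k = min(top_k, len(scaled))
        cutoff = sorted(scaled, reverse=True)[k - 1]
        scaled = [x if x >= cutoff else float("-inf") for x in scaled]
    # Numerically stable softmax: subtract the maximum before exponentiating.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one index according to the resulting probabilities.
    index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return index, probs

# Hypothetical scores for a 3-token vocabulary.
index, probs = sample_next_token([2.0, 1.0, 0.1], temperature=0.7, top_k=2,
                                 rng=random.Random(0))
print(index, [round(p, 3) for p in probs])
```

With `top_k=2`, the least likely of the three tokens always ends up with probability 0, and lowering the temperature shifts more probability onto the highest-scoring token.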

As before, we enable TF32 for CUDA matrix multiplications where available and seed the random number generator. Use the same CUDA settings as in the training section above.

In [15]:
inference_config = InferenceConfig()
torch.backends.cuda.matmul.allow_tf32 = True # allow tf32 on matmul
torch.backends.cudnn.allow_tf32 = True # allow tf32 on cudnn
torch.manual_seed(inference_config.seed)

<torch._C.Generator at 0x10c7ba950>

Here we load the model and optionally compile it. We also set `meta_path` to the file that describes the vocabulary the model was trained on, which is used during generation.

In [16]:
# Load the model and hyperparameters
with open('model/model_config.pkl', 'rb') as f:
    model_config = pickle.load(f)

model = Model(model_config)
if model_config.compile:
    model = torch.compile(model)
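# map_location lets a checkpoint saved on a GPU load on a CPU-only machine
state_dict = torch.load('model/model.ckpt', map_location=inference_config.device, weights_only=True)
model.load_state_dict(state_dict, strict=False)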
model.eval()
model.to(inference_config.device)

inference_config.meta_path = os.path.join('data', 'Shakespeare', 'meta.pkl')
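
A note on `strict=False`: with this setting, `load_state_dict` ignores keys that do not match instead of raising an error, so a mismatch can leave parts of the model at their random initial values without any warning. One common cause is `torch.compile`: a compiled module typically stores its parameters under keys prefixed with `_orig_mod.`, so a checkpoint saved from a compiled model will not match an uncompiled one. The cell above avoids this by compiling before loading. If you want to load such a checkpoint into an uncompiled model, a sketch like the following can rename the keys first. The helper name and the example keys are hypothetical, and plain integers stand in for tensors.

```python
def strip_compile_prefix(state_dict, prefix="_orig_mod."):
    """Return a copy of state_dict with the torch.compile key prefix removed."""
    cleaned = {}
    for key, value in state_dict.items():
        # Drop the prefix only where it is present; leave other keys untouched.
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned

# Stand-in for a loaded checkpoint (hypothetical key names, ints instead of tensors).
saved = {"_orig_mod.lm_head.weight": 1, "_orig_mod.lm_head.bias": 2}
print(sorted(strip_compile_prefix(saved)))
```

Whichever route you take, it is worth inspecting the value returned by `load_state_dict`: its `missing_keys` and `unexpected_keys` fields should both be empty if every weight was loaded.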

Now, you can generate your text here!

In [17]:
# Generate text
print(inference(model, inference_config))

ROMEO:
Many is what loss to do it.

BENVOLIO:
I would he were it were any live a man in it
doing in this lamentation.

BRUTUS:
I wot ruled for the ground of gracious lady,
I would not say the new to: I can could not know
The renowned to death to be a trait


To see how big the model is, you can run the cell below.

In [18]:
# Optionally, print the total number of trainable parameters
total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
total_params

10690625
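
To turn that parameter count into an approximate memory footprint, multiply by the number of bytes per parameter. The sketch below covers only the weights themselves (no optimizer state, activations, or file overhead), and the helper name `param_memory_mib` is made up for this example.

```python
def param_memory_mib(num_params, bytes_per_param):
    """Approximate memory taken by the weights alone, in MiB."""
    return num_params * bytes_per_param / (1024 ** 2)

total_params = 10_690_625  # value printed by the cell above

for name, size in [("float32", 4), ("float16/bfloat16", 2)]:
    print(f"{name}: {param_memory_mib(total_params, size):.1f} MiB")
```

At about 10.7 million parameters, the weights take roughly 41 MiB in float32, which is small enough to fit comfortably on a laptop.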