# Task 5: Language Model [5p]

Build a basic language model using a publicly available text dataset. You'll experiment with RNN-based architectures (Simple RNN, LSTM, GRU) to learn how they model sequences.

### **Part 1: Dataset Download & Preparation (1 point)**

**Tasks:**

* Download a publicly available dataset, e.g., *Alice’s Adventures in Wonderland* from Project Gutenberg.
  * Use requests or a dataset API like torchtext.datasets.
* Preprocess the text:
  * Lowercase, remove non-alphabetic characters.
  * Tokenize into words (use nltk or spaCy).
  * Build a vocabulary, keeping frequent words (e.g., top 10k).
* Use **pretrained word embeddings** (e.g., GloVe 100d or FastText):
  * Load with torchtext.vocab, gensim, or similar.
  * Initialize the embedding layer with pretrained vectors.


### **Part 2: Build a Recurrent Language Model (1 point)**

**Tasks:**

* Implement a word-level language model using:
  * Pretrained embedding layer (frozen or trainable).
  * A single-layer **Simple RNN**.
  * A fully connected output layer with softmax.

### **Part 3: Train the Model (1 point)**

**Tasks:**

* Use cross-entropy loss.
* Predict the next word from a sequence.
* Use teacher forcing and batching.
* Plot training loss over time.

### **Part 4: Generate Text (1 point)**

**Tasks:**

* Given a seed sequence, generate text of specified length.
* Use **temperature sampling** to vary creativity.
* Try different temperatures and compare.

### **Part 5: Evaluation & Reflection (1 point) -> W&B report**

**Tasks:**

* Evaluate model outputs: does it learn sentence structure?
* Reflect on limitations of the Simple RNN and its behavior on longer sequences.

### **Bonus Section (Up to +2 Points): Model Comparison**

Compare the performance of three models:


1. Simple RNN
2. LSTM
3. GRU

**Tasks:**

* Implement the same model architecture but switch out the recurrent layer.
* Train all three models under the same conditions.
* Record and compare:
  * Training time
  * Final loss
  * Generated text quality
* (Optional) Add dropout to recurrent layers and observe effects.
* Summarize findings in a table or chart.

In [11]:
URL = "https://www.gutenberg.org/cache/epub/11/pg11.txt"
MAX_VOCAB_SIZE = 10000
EMBEDDING_DIM = 100
SEQ_LEN = 30
DATA_PATH = "alice_ds.pt"
MODEL_PATH = "alice_model.pt"

### 1

In [12]:
import os
import re
import nltk
import torch
import requests
import gensim.downloader as api
from collections import Counter
from nltk.tokenize import word_tokenize

# Download required NLTK resources
nltk.download('punkt')

# Download text
response = requests.get(URL)
raw_text = response.text

start = raw_text.find("CHAPTER I.")
end = raw_text.find("End of the Project Gutenberg EBook")
text = raw_text[start:end]

# Clean and tokenize text
text = text.lower()
text = re.sub(r'[^a-z\s]', '', text)
tokens = word_tokenize(text)

# Build vocabulary
word_counts = Counter(tokens)
most_common = word_counts.most_common(MAX_VOCAB_SIZE - 2)  # Reserve 2 for <pad> and <unk>

vocab = {'<pad>': 0, '<unk>': 1}
for idx, (word, _) in enumerate(most_common, start=2):
    vocab[word] = idx

# Encode tokens
def encode_tokens(tokens, vocab):
    return [vocab.get(token, vocab['<unk>']) for token in tokens]

encoded = encode_tokens(tokens, vocab)

# Prepare input and target sequences
inputs, targets = [], []
for i in range(len(encoded) - SEQ_LEN):
    seq_in = torch.tensor(encoded[i:i+SEQ_LEN])
    seq_out = torch.tensor(encoded[i+1:i+SEQ_LEN+1])
    inputs.append(seq_in)
    targets.append(seq_out)

# Load GloVe embeddings
glove = api.load("glove-wiki-gigaword-100")

embedding_matrix = torch.zeros((len(vocab), EMBEDDING_DIM))
for word, idx in vocab.items():
    if word in glove.key_to_index:
        embedding_matrix[idx] = torch.tensor(glove[word])
    else:
        embedding_matrix[idx] = torch.randn(EMBEDDING_DIM)

# Save processed data
torch.save({
    'vocab': vocab,
    'inputs': inputs,
    'targets': targets,
    'embedding_matrix': embedding_matrix
}, DATA_PATH)

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\patry\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


### 2

In [13]:
import re
import torch
import torch.nn as nn
import torch.nn.functional as F

class RNNLanguageModel(nn.Module):
    def __init__(self, embedding_matrix, hidden_dim, vocab, dropout=0.6):
        super().__init__()
        num_embeddings, embedding_dim = embedding_matrix.shape
        self.embedding = nn.Embedding.from_pretrained(embedding_matrix, freeze=False)
        self.rnn = nn.RNN(embedding_dim, hidden_dim, batch_first=True)
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(hidden_dim, len(vocab))
        self.vocab = vocab

    def forward(self, x):
        embedded = self.embedding(x)  # (batch, seq_len, embed_dim)
        output, _ = self.rnn(embedded)  # (batch, seq_len, hidden_dim)
        output = self.dropout(output)
        return self.fc(output)  # (batch, seq_len, vocab_size)

    def _clean_sequence(self, text):
        text = text.lower()
        text = re.sub(r'[^a-z\s]', '', text)
        return text.split()

    def generate(self, prompt, length=20, temperature=1.0):
        """
        Generate a sequence of tokens given a prompt string.
        Args:
            prompt (str): Seed text.
            length (int): Number of tokens to generate.
            temperature (float): Sampling temperature.
        Returns:
            str: Generated text.
        """
        prompt_words = self._clean_sequence(prompt)
        prompt_ids = [self.vocab.get(w, self.vocab["<unk>"]) for w in prompt_words]

        device = next(self.parameters()).device
        input_seq = torch.tensor(prompt_ids, dtype=torch.long, device=device).unsqueeze(0)

        idx_to_word = {idx: word for word, idx in self.vocab.items()}
        generated = prompt_ids.copy()

        self.eval()
        with torch.no_grad():
            for _ in range(length):
                output = self.forward(input_seq)  # (1, seq_len, vocab_size)
                logits = output[0, -1] / temperature
                probs = F.softmax(logits, dim=0)
                next_token = torch.multinomial(probs, num_samples=1).item()
                generated.append(next_token)
                input_seq = torch.tensor(generated[-len(prompt_ids):], dtype=torch.long, device=device).unsqueeze(0)

        generated_text = [idx_to_word.get(idx, "<unk>") for idx in generated]
        return " ".join(generated_text)

### 3

In [9]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset, random_split
from torch.nn.utils.rnn import pad_sequence
import wandb
from tqdm import tqdm
import matplotlib.pyplot as plt

def train(model, l1_lambda, l2_lambda, patience=5):
    best_loss = float('inf')
    epochs_no_improve = 0

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=wandb.config.learning_rate, weight_decay=l2_lambda)

    for epoch in tqdm(range(wandb.config.epochs), desc="Training Epochs"):
        model.train()
        epoch_loss = 0

        for batch_inputs, batch_targets in train_loader:
            optimizer.zero_grad()
            outputs = model(batch_inputs)
            outputs = outputs[:, :-1, :].contiguous().view(-1, outputs.size(-1))
            targets = batch_targets[:, 1:].contiguous().view(-1)
            ce_loss = criterion(outputs, targets)
            l1_loss = sum(param.abs().sum() for param in model.parameters())
            loss = ce_loss + l1_lambda * l1_loss
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
            wandb.log({"batch_loss": loss.item()})

        model.eval()
        val_loss = 0
        with torch.no_grad():
            for val_inputs, val_targets in val_loader:
                outputs = model(val_inputs)
                outputs = outputs[:, :-1, :].contiguous().view(-1, outputs.size(-1))
                targets = val_targets[:, 1:].contiguous().view(-1)
                val_loss += criterion(outputs, targets)
        val_loss /= len(val_loader)
        avg_loss = epoch_loss / len(train_loader)

        if val_loss < best_loss:
            best_loss = val_loss
            epochs_no_improve = 0
            torch.save(model.state_dict(), MODEL_PATH)
            tqdm.write(f"Saved best model at epoch {epoch+1} with loss {best_loss:.4f}")
            wandb.log({"best_model_saved": True, "best_epoch": epoch + 1})
        else:
            epochs_no_improve += 1
            if epochs_no_improve >= patience:
                tqdm.write(f"Early stopping triggered at epoch {epoch+1}.")
                break

        wandb.log({
            "avg_loss": avg_loss,
            "val_loss": val_loss
        })
        tqdm.write(f"Epoch {epoch+1}/{wandb.config.epochs} - loss: {avg_loss:.4f}")

    model.load_state_dict(torch.load(MODEL_PATH))
    model.eval()
    test_loss = 0
    with torch.no_grad():
        for test_inputs, test_targets in test_loader:
            outputs = model(test_inputs)
            outputs = outputs[:, :-1, :].contiguous().view(-1, outputs.size(-1))
            targets = test_targets[:, 1:].contiguous().view(-1)
            test_loss += criterion(outputs, targets)
    test_loss /= len(test_loader)
    print(f"Test loss: {test_loss:.4f}")
    wandb.log({"test_loss": test_loss})

# W&B initialization
wandb.init(
    project="NNList5",
    config={
        "architecture": "SimpleRNN",
        "dataset": "Alice in Wonderland",
        "epochs": 100,
        "batch_size": 128,
        "embedding_dim": 100,
        "hidden_dim": 128,
        "seq_len": 30,
        "learning_rate": 5e-3,
        "dropout_rate": 0.5,
        "l1_lambda": 1e-5,
        "l2_lambda": 1e-4
    }
)


# Data loading
data = torch.load(DATA_PATH)
vocab = data['vocab']
inputs = pad_sequence(data['inputs'], batch_first=True, padding_value=0)
targets = pad_sequence(data['targets'], batch_first=True, padding_value=0)
embedding_matrix = data['embedding_matrix']

# Dataset and DataLoader
dataset = TensorDataset(inputs, targets)
total_size = len(dataset)
test_size = int(0.1 * total_size)
val_size = int(0.2 * total_size)
train_size = total_size - test_size - val_size

train_dataset, val_dataset, test_dataset = random_split(dataset, [train_size, val_size, test_size])
train_loader = DataLoader(train_dataset, batch_size=wandb.config.batch_size, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=wandb.config.batch_size)
test_loader = DataLoader(test_dataset, batch_size=wandb.config.batch_size)

rnn_model = RNNLanguageModel(embedding_matrix, wandb.config.hidden_dim, vocab, wandb.config["dropout_rate"])
train(rnn_model, wandb.config["l1_lambda"], wandb.config["l2_lambda"])

wandb.finish()

0,1
avg_loss,█▃▁
batch_loss,█▇▇▆▅▅▅▅▄▄▄▄▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁
best_epoch,▁▅█
val_loss,█▄▁

0,1
avg_loss,5.08225
batch_loss,4.63924
best_epoch,3
best_model_saved,True
val_loss,4.07597


Training Epochs:   0%|          | 0/100 [00:00<?, ?it/s]
Epoch 1:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 1:   1%|          | 1/161 [00:00<00:20,  7.90it/s]
Epoch 1:   2%|▏         | 3/161 [00:00<00:15,  9.90it/s]
Epoch 1:   2%|▏         | 4/161 [00:00<00:16,  9.46it/s]
Epoch 1:   3%|▎         | 5/161 [00:00<00:16,  9.26it/s]
Epoch 1:   4%|▍         | 7/161 [00:00<00:16,  9.33it/s]
Epoch 1:   6%|▌         | 9/161 [00:00<00:15, 10.03it/s]
Epoch 1:   7%|▋         | 11/161 [00:01<00:14, 10.48it/s]
Epoch 1:   8%|▊         | 13/161 [00:01<00:13, 10.73it/s]
Epoch 1:   9%|▉         | 15/161 [00:01<00:13, 11.16it/s]
Epoch 1:  11%|█         | 17/161 [00:01<00:12, 11.41it/s]
Epoch 1:  12%|█▏        | 19/161 [00:01<00:12, 11.65it/s]
Epoch 1:  13%|█▎        | 21/161 [00:01<00:12, 11.62it/s]
Epoch 1:  14%|█▍        | 23/161 [00:02<00:11, 11.67it/s]
Epoch 1:  16%|█▌        | 25/161 [00:02<00:11, 11.62it/s]
Epoch 1:  17%|█▋        | 27/161 [00:02<00:11, 11.64it/s]
Epoch 1:  18%|█▊        | 29/1

Saved best model at epoch 1 with loss 5.4501


                                                        

Epoch 1/100 - loss: 6.8458


Training Epochs:   1%|          | 1/100 [00:16<27:31, 16.69s/it]
Epoch 2:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 2:   1%|          | 1/161 [00:00<00:16,  9.62it/s]
Epoch 2:   2%|▏         | 3/161 [00:00<00:13, 11.90it/s]
Epoch 2:   3%|▎         | 5/161 [00:00<00:12, 12.75it/s]
Epoch 2:   4%|▍         | 7/161 [00:00<00:12, 12.27it/s]
Epoch 2:   6%|▌         | 9/161 [00:00<00:12, 12.51it/s]
Epoch 2:   7%|▋         | 11/161 [00:00<00:13, 11.47it/s]
Epoch 2:   8%|▊         | 13/161 [00:01<00:13, 11.26it/s]
Epoch 2:   9%|▉         | 15/161 [00:01<00:12, 11.94it/s]
Epoch 2:  11%|█         | 17/161 [00:01<00:12, 11.61it/s]
Epoch 2:  12%|█▏        | 19/161 [00:01<00:12, 11.43it/s]
Epoch 2:  13%|█▎        | 21/161 [00:01<00:12, 11.34it/s]
Epoch 2:  14%|█▍        | 23/161 [00:01<00:11, 11.53it/s]
Epoch 2:  16%|█▌        | 25/161 [00:02<00:12, 11.29it/s]
Epoch 2:  17%|█▋        | 27/161 [00:02<00:11, 11.47it/s]
Epoch 2:  18%|█▊        | 29/161 [00:02<00:11, 11.74it/s]
Epoch 2:  19%|█▉     

Saved best model at epoch 2 with loss 4.7131


                                                                

Epoch 2/100 - loss: 5.6446


Training Epochs:   2%|▏         | 2/100 [00:31<25:44, 15.76s/it]
Epoch 3:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 3:   1%|          | 2/161 [00:00<00:14, 11.05it/s]
Epoch 3:   2%|▏         | 4/161 [00:00<00:14, 11.09it/s]
Epoch 3:   4%|▎         | 6/161 [00:00<00:12, 12.16it/s]
Epoch 3:   5%|▍         | 8/161 [00:00<00:12, 12.63it/s]
Epoch 3:   6%|▌         | 10/161 [00:00<00:12, 12.32it/s]
Epoch 3:   7%|▋         | 12/161 [00:00<00:11, 12.48it/s]
Epoch 3:   9%|▊         | 14/161 [00:01<00:14, 10.35it/s]
Epoch 3:  10%|▉         | 16/161 [00:01<00:14, 10.19it/s]
Epoch 3:  11%|█         | 18/161 [00:01<00:13, 10.45it/s]
Epoch 3:  12%|█▏        | 20/161 [00:01<00:12, 11.29it/s]
Epoch 3:  14%|█▎        | 22/161 [00:01<00:11, 11.69it/s]
Epoch 3:  15%|█▍        | 24/161 [00:02<00:11, 11.94it/s]
Epoch 3:  16%|█▌        | 26/161 [00:02<00:11, 11.54it/s]
Epoch 3:  17%|█▋        | 28/161 [00:02<00:11, 11.65it/s]
Epoch 3:  19%|█▊        | 30/161 [00:02<00:10, 11.93it/s]
Epoch 3:  20%|█▉    

Saved best model at epoch 3 with loss 4.0374


                                                                

Epoch 3/100 - loss: 5.0455


Training Epochs:   3%|▎         | 3/100 [00:47<25:37, 15.85s/it]
Epoch 4:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 4:   1%|          | 2/161 [00:00<00:13, 11.85it/s]
Epoch 4:   2%|▏         | 4/161 [00:00<00:12, 12.72it/s]
Epoch 4:   4%|▎         | 6/161 [00:00<00:12, 12.34it/s]
Epoch 4:   5%|▍         | 8/161 [00:00<00:13, 11.63it/s]
Epoch 4:   6%|▌         | 10/161 [00:00<00:12, 12.07it/s]
Epoch 4:   7%|▋         | 12/161 [00:00<00:11, 12.63it/s]
Epoch 4:   9%|▊         | 14/161 [00:01<00:11, 12.74it/s]
Epoch 4:  10%|▉         | 16/161 [00:01<00:11, 13.02it/s]
Epoch 4:  11%|█         | 18/161 [00:01<00:10, 13.30it/s]
Epoch 4:  12%|█▏        | 20/161 [00:01<00:10, 13.39it/s]
Epoch 4:  14%|█▎        | 22/161 [00:01<00:10, 13.50it/s]
Epoch 4:  15%|█▍        | 24/161 [00:01<00:10, 12.65it/s]
Epoch 4:  16%|█▌        | 26/161 [00:02<00:11, 11.83it/s]
Epoch 4:  17%|█▋        | 28/161 [00:02<00:11, 11.22it/s]
Epoch 4:  19%|█▊        | 30/161 [00:02<00:11, 11.72it/s]
Epoch 4:  20%|█▉    

Saved best model at epoch 4 with loss 3.5400


                                                                

Epoch 4/100 - loss: 4.5941


Training Epochs:   4%|▍         | 4/100 [01:01<24:05, 15.06s/it]
Epoch 5:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 5:   1%|          | 2/161 [00:00<00:12, 12.51it/s]
Epoch 5:   2%|▏         | 4/161 [00:00<00:12, 12.86it/s]
Epoch 5:   4%|▎         | 6/161 [00:00<00:11, 12.95it/s]
Epoch 5:   5%|▍         | 8/161 [00:00<00:11, 13.26it/s]
Epoch 5:   6%|▌         | 10/161 [00:00<00:11, 13.64it/s]
Epoch 5:   7%|▋         | 12/161 [00:00<00:10, 13.66it/s]
Epoch 5:   9%|▊         | 14/161 [00:01<00:11, 12.80it/s]
Epoch 5:  10%|▉         | 16/161 [00:01<00:13, 11.01it/s]
Epoch 5:  11%|█         | 18/161 [00:01<00:15,  9.50it/s]
Epoch 5:  12%|█▏        | 20/161 [00:01<00:14,  9.52it/s]
Epoch 5:  14%|█▎        | 22/161 [00:01<00:14,  9.85it/s]
Epoch 5:  15%|█▍        | 24/161 [00:02<00:13, 10.10it/s]
Epoch 5:  16%|█▌        | 26/161 [00:02<00:13,  9.80it/s]
Epoch 5:  17%|█▋        | 28/161 [00:02<00:14,  9.26it/s]
Epoch 5:  19%|█▊        | 30/161 [00:02<00:13,  9.95it/s]
Epoch 5:  20%|█▉    

Saved best model at epoch 5 with loss 3.2156


                                                                

Epoch 5/100 - loss: 4.2922


Training Epochs:   5%|▌         | 5/100 [01:18<24:45, 15.64s/it]
Epoch 6:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 6:   1%|          | 2/161 [00:00<00:13, 11.83it/s]
Epoch 6:   2%|▏         | 4/161 [00:00<00:12, 12.47it/s]
Epoch 6:   4%|▎         | 6/161 [00:00<00:13, 11.32it/s]
Epoch 6:   5%|▍         | 8/161 [00:00<00:13, 11.77it/s]
Epoch 6:   6%|▌         | 10/161 [00:00<00:12, 11.96it/s]
Epoch 6:   7%|▋         | 12/161 [00:01<00:12, 12.07it/s]
Epoch 6:   9%|▊         | 14/161 [00:01<00:14, 10.16it/s]
Epoch 6:  10%|▉         | 16/161 [00:01<00:13, 10.43it/s]
Epoch 6:  11%|█         | 18/161 [00:01<00:13, 10.77it/s]
Epoch 6:  12%|█▏        | 20/161 [00:01<00:12, 10.88it/s]
Epoch 6:  14%|█▎        | 22/161 [00:01<00:12, 11.27it/s]
Epoch 6:  15%|█▍        | 24/161 [00:02<00:11, 11.55it/s]
Epoch 6:  16%|█▌        | 26/161 [00:02<00:11, 12.14it/s]
Epoch 6:  17%|█▋        | 28/161 [00:02<00:10, 12.28it/s]
Epoch 6:  19%|█▊        | 30/161 [00:02<00:16,  8.05it/s]
Epoch 6:  20%|█▉    

Saved best model at epoch 6 with loss 2.9662


                                                                

Epoch 6/100 - loss: 4.0798


Training Epochs:   6%|▌         | 6/100 [01:34<24:34, 15.69s/it]
Epoch 7:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 7:   1%|          | 1/161 [00:00<01:19,  2.02it/s]
Epoch 7:   1%|          | 2/161 [00:00<00:43,  3.66it/s]
Epoch 7:   2%|▏         | 3/161 [00:00<00:30,  5.10it/s]
Epoch 7:   3%|▎         | 5/161 [00:00<00:21,  7.11it/s]
Epoch 7:   4%|▍         | 7/161 [00:01<00:18,  8.53it/s]
Epoch 7:   5%|▍         | 8/161 [00:01<00:17,  8.85it/s]
Epoch 7:   6%|▌         | 9/161 [00:01<00:18,  8.19it/s]
Epoch 7:   6%|▌         | 10/161 [00:01<00:21,  7.05it/s]
Epoch 7:   7%|▋         | 11/161 [00:01<00:20,  7.26it/s]
Epoch 7:   8%|▊         | 13/161 [00:01<00:17,  8.54it/s]
Epoch 7:   9%|▊         | 14/161 [00:02<00:19,  7.53it/s]
Epoch 7:  10%|▉         | 16/161 [00:02<00:16,  8.71it/s]
Epoch 7:  11%|█         | 17/161 [00:02<00:17,  8.29it/s]
Epoch 7:  11%|█         | 18/161 [00:02<00:16,  8.59it/s]
Epoch 7:  12%|█▏        | 20/161 [00:02<00:15,  9.00it/s]
Epoch 7:  14%|█▎       

Saved best model at epoch 7 with loss 2.7755


                                                                

Epoch 7/100 - loss: 3.9196


Training Epochs:   7%|▋         | 7/100 [01:49<24:10, 15.60s/it]
Epoch 8:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 8:   1%|          | 2/161 [00:00<00:12, 12.69it/s]
Epoch 8:   2%|▏         | 4/161 [00:00<00:12, 12.14it/s]
Epoch 8:   4%|▎         | 6/161 [00:00<00:14, 10.74it/s]
Epoch 8:   5%|▍         | 8/161 [00:00<00:13, 11.37it/s]
Epoch 8:   6%|▌         | 10/161 [00:00<00:12, 12.18it/s]
Epoch 8:   7%|▋         | 12/161 [00:00<00:11, 12.65it/s]
Epoch 8:   9%|▊         | 14/161 [00:01<00:11, 12.91it/s]
Epoch 8:  10%|▉         | 16/161 [00:01<00:11, 12.91it/s]
Epoch 8:  11%|█         | 18/161 [00:01<00:11, 12.49it/s]
Epoch 8:  12%|█▏        | 20/161 [00:01<00:13, 10.21it/s]
Epoch 8:  14%|█▎        | 22/161 [00:01<00:14,  9.28it/s]
Epoch 8:  15%|█▍        | 24/161 [00:02<00:13, 10.15it/s]
Epoch 8:  16%|█▌        | 26/161 [00:02<00:12, 11.04it/s]
Epoch 8:  17%|█▋        | 28/161 [00:02<00:11, 11.48it/s]
Epoch 8:  19%|█▊        | 30/161 [00:02<00:11, 11.12it/s]
Epoch 8:  20%|█▉    

Saved best model at epoch 8 with loss 2.6266


                                                                

Epoch 8/100 - loss: 3.7966


Training Epochs:   8%|▊         | 8/100 [02:04<23:47, 15.52s/it]
Epoch 9:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 9:   1%|          | 2/161 [00:00<00:13, 12.21it/s]
Epoch 9:   2%|▏         | 4/161 [00:00<00:12, 12.73it/s]
Epoch 9:   4%|▎         | 6/161 [00:00<00:12, 12.68it/s]
Epoch 9:   5%|▍         | 8/161 [00:00<00:12, 12.46it/s]
Epoch 9:   6%|▌         | 10/161 [00:00<00:12, 12.52it/s]
Epoch 9:   7%|▋         | 12/161 [00:00<00:11, 12.50it/s]
Epoch 9:   9%|▊         | 14/161 [00:01<00:11, 12.47it/s]
Epoch 9:  10%|▉         | 16/161 [00:01<00:11, 12.31it/s]
Epoch 9:  11%|█         | 18/161 [00:01<00:11, 11.97it/s]
Epoch 9:  12%|█▏        | 20/161 [00:01<00:11, 12.03it/s]
Epoch 9:  14%|█▎        | 22/161 [00:01<00:15,  9.15it/s]
Epoch 9:  15%|█▍        | 24/161 [00:02<00:13, 10.04it/s]
Epoch 9:  16%|█▌        | 26/161 [00:02<00:13, 10.20it/s]
Epoch 9:  17%|█▋        | 28/161 [00:02<00:12, 10.66it/s]
Epoch 9:  19%|█▊        | 30/161 [00:02<00:11, 11.03it/s]
Epoch 9:  20%|█▉    

Saved best model at epoch 9 with loss 2.4793


                                                                

Epoch 9/100 - loss: 3.6784


Training Epochs:   9%|▉         | 9/100 [02:19<23:17, 15.36s/it]
Epoch 10:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 10:   1%|          | 2/161 [00:00<00:14, 10.80it/s]
Epoch 10:   2%|▏         | 4/161 [00:00<00:14, 10.76it/s]
Epoch 10:   4%|▎         | 6/161 [00:00<00:14, 10.92it/s]
Epoch 10:   5%|▍         | 8/161 [00:00<00:13, 11.05it/s]
Epoch 10:   6%|▌         | 10/161 [00:00<00:13, 10.96it/s]
Epoch 10:   7%|▋         | 12/161 [00:01<00:17,  8.52it/s]
Epoch 10:   8%|▊         | 13/161 [00:01<00:19,  7.45it/s]
Epoch 10:   9%|▊         | 14/161 [00:01<00:18,  7.76it/s]
Epoch 10:  10%|▉         | 16/161 [00:01<00:16,  8.89it/s]
Epoch 10:  11%|█         | 18/161 [00:01<00:15,  9.43it/s]
Epoch 10:  12%|█▏        | 19/161 [00:02<00:15,  9.26it/s]
Epoch 10:  12%|█▏        | 20/161 [00:02<00:19,  7.31it/s]
Epoch 10:  14%|█▎        | 22/161 [00:02<00:16,  8.27it/s]
Epoch 10:  15%|█▍        | 24/161 [00:02<00:15,  9.05it/s]
Epoch 10:  16%|█▌        | 25/161 [00:02<00:15,  8.63it/s]
Epoc

Saved best model at epoch 10 with loss 2.3620


                                                                

Epoch 10/100 - loss: 3.5765


Training Epochs:  10%|█         | 10/100 [02:35<23:19, 15.55s/it]
Epoch 11:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 11:   1%|          | 2/161 [00:00<00:14, 11.25it/s]
Epoch 11:   2%|▏         | 4/161 [00:00<00:13, 11.43it/s]
Epoch 11:   4%|▎         | 6/161 [00:00<00:12, 11.92it/s]
Epoch 11:   5%|▍         | 8/161 [00:00<00:13, 11.25it/s]
Epoch 11:   6%|▌         | 10/161 [00:00<00:12, 11.63it/s]
Epoch 11:   7%|▋         | 12/161 [00:01<00:12, 11.98it/s]
Epoch 11:   9%|▊         | 14/161 [00:01<00:16,  8.66it/s]
Epoch 11:  10%|▉         | 16/161 [00:01<00:15,  9.45it/s]
Epoch 11:  11%|█         | 18/161 [00:01<00:14, 10.15it/s]
Epoch 11:  12%|█▏        | 20/161 [00:01<00:14,  9.81it/s]
Epoch 11:  14%|█▎        | 22/161 [00:02<00:13, 10.15it/s]
Epoch 11:  15%|█▍        | 24/161 [00:02<00:12, 10.57it/s]
Epoch 11:  16%|█▌        | 26/161 [00:02<00:12, 10.67it/s]
Epoch 11:  17%|█▋        | 28/161 [00:02<00:11, 11.16it/s]
Epoch 11:  19%|█▊        | 30/161 [00:02<00:11, 11.20it/s]
Epo

Saved best model at epoch 11 with loss 2.2546


                                                                 

Epoch 11/100 - loss: 3.4934


Training Epochs:  11%|█         | 11/100 [02:52<23:26, 15.81s/it]
Epoch 12:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 12:   1%|          | 1/161 [00:00<00:17,  9.36it/s]
Epoch 12:   1%|          | 2/161 [00:00<00:17,  8.91it/s]
Epoch 12:   2%|▏         | 4/161 [00:00<00:16,  9.32it/s]
Epoch 12:   3%|▎         | 5/161 [00:00<00:16,  9.48it/s]
Epoch 12:   4%|▍         | 7/161 [00:00<00:14, 10.59it/s]
Epoch 12:   6%|▌         | 9/161 [00:00<00:13, 11.30it/s]
Epoch 12:   7%|▋         | 11/161 [00:01<00:13, 11.19it/s]
Epoch 12:   8%|▊         | 13/161 [00:01<00:12, 11.75it/s]
Epoch 12:   9%|▉         | 15/161 [00:01<00:12, 11.94it/s]
Epoch 12:  11%|█         | 17/161 [00:01<00:12, 11.55it/s]
Epoch 12:  12%|█▏        | 19/161 [00:01<00:12, 11.59it/s]
Epoch 12:  13%|█▎        | 21/161 [00:01<00:12, 11.64it/s]
Epoch 12:  14%|█▍        | 23/161 [00:02<00:12, 11.40it/s]
Epoch 12:  16%|█▌        | 25/161 [00:02<00:11, 11.72it/s]
Epoch 12:  17%|█▋        | 27/161 [00:02<00:11, 11.86it/s]
Epoch

Saved best model at epoch 12 with loss 2.1589


                                                                 

Epoch 12/100 - loss: 3.4054


Training Epochs:  12%|█▏        | 12/100 [03:10<24:12, 16.51s/it]
Epoch 13:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 13:   1%|          | 2/161 [00:00<00:13, 12.16it/s]
Epoch 13:   2%|▏         | 4/161 [00:00<00:13, 11.32it/s]
Epoch 13:   4%|▎         | 6/161 [00:00<00:14, 10.58it/s]
Epoch 13:   5%|▍         | 8/161 [00:00<00:14, 10.53it/s]
Epoch 13:   6%|▌         | 10/161 [00:00<00:15, 10.02it/s]
Epoch 13:   7%|▋         | 12/161 [00:01<00:18,  8.06it/s]
Epoch 13:   9%|▊         | 14/161 [00:01<00:16,  8.82it/s]
Epoch 13:  10%|▉         | 16/161 [00:01<00:15,  9.56it/s]
Epoch 13:  11%|█         | 18/161 [00:01<00:14, 10.20it/s]
Epoch 13:  12%|█▏        | 20/161 [00:02<00:13, 10.38it/s]
Epoch 13:  14%|█▎        | 22/161 [00:02<00:16,  8.28it/s]
Epoch 13:  15%|█▍        | 24/161 [00:02<00:15,  8.85it/s]
Epoch 13:  16%|█▌        | 26/161 [00:02<00:14,  9.34it/s]
Epoch 13:  17%|█▋        | 28/161 [00:02<00:13,  9.84it/s]
Epoch 13:  19%|█▊        | 30/161 [00:03<00:12, 10.20it/s]
Epo

Saved best model at epoch 13 with loss 2.0756


                                                                 

Epoch 13/100 - loss: 3.3394


Training Epochs:  13%|█▎        | 13/100 [03:26<23:58, 16.54s/it]
Epoch 14:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 14:   1%|          | 1/161 [00:00<00:21,  7.54it/s]
Epoch 14:   1%|          | 2/161 [00:00<00:19,  8.13it/s]
Epoch 14:   2%|▏         | 3/161 [00:00<00:17,  8.83it/s]
Epoch 14:   3%|▎         | 5/161 [00:00<00:15,  9.82it/s]
Epoch 14:   4%|▎         | 6/161 [00:00<00:16,  9.59it/s]
Epoch 14:   4%|▍         | 7/161 [00:00<00:18,  8.32it/s]
Epoch 14:   5%|▍         | 8/161 [00:01<00:21,  6.96it/s]
Epoch 14:   6%|▌         | 10/161 [00:01<00:22,  6.57it/s]
Epoch 14:   7%|▋         | 12/161 [00:01<00:18,  7.97it/s]
Epoch 14:   9%|▊         | 14/161 [00:01<00:16,  9.08it/s]
Epoch 14:  10%|▉         | 16/161 [00:01<00:14, 10.00it/s]
Epoch 14:  11%|█         | 18/161 [00:01<00:13, 10.66it/s]
Epoch 14:  12%|█▏        | 20/161 [00:02<00:13, 10.24it/s]
Epoch 14:  14%|█▎        | 22/161 [00:02<00:13, 10.00it/s]
Epoch 14:  15%|█▍        | 24/161 [00:02<00:13, 10.39it/s]
Epoch 

Saved best model at epoch 14 with loss 1.9935


                                                                 

Epoch 14/100 - loss: 3.2667


Training Epochs:  14%|█▍        | 14/100 [03:43<23:33, 16.43s/it]
Epoch 15:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 15:   1%|          | 1/161 [00:00<00:30,  5.33it/s]
Epoch 15:   2%|▏         | 3/161 [00:00<00:18,  8.77it/s]
Epoch 15:   3%|▎         | 5/161 [00:00<00:15, 10.10it/s]
Epoch 15:   4%|▍         | 7/161 [00:00<00:13, 11.06it/s]
Epoch 15:   6%|▌         | 9/161 [00:00<00:13, 11.69it/s]
Epoch 15:   7%|▋         | 11/161 [00:01<00:12, 11.98it/s]
Epoch 15:   8%|▊         | 13/161 [00:01<00:12, 11.71it/s]
Epoch 15:   9%|▉         | 15/161 [00:01<00:12, 11.74it/s]
Epoch 15:  11%|█         | 17/161 [00:01<00:11, 12.06it/s]
Epoch 15:  12%|█▏        | 19/161 [00:01<00:11, 12.31it/s]
Epoch 15:  13%|█▎        | 21/161 [00:01<00:11, 12.38it/s]
Epoch 15:  14%|█▍        | 23/161 [00:01<00:11, 12.47it/s]
Epoch 15:  16%|█▌        | 25/161 [00:02<00:10, 12.63it/s]
Epoch 15:  17%|█▋        | 27/161 [00:02<00:11, 12.17it/s]
Epoch 15:  18%|█▊        | 29/161 [00:02<00:10, 12.36it/s]
Epoc

Saved best model at epoch 15 with loss 1.9189


                                                                 

Epoch 15/100 - loss: 3.2069


Training Epochs:  15%|█▌        | 15/100 [03:58<22:53, 16.16s/it]
Epoch 16:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 16:   1%|          | 1/161 [00:00<00:19,  8.17it/s]
Epoch 16:   1%|          | 2/161 [00:00<00:19,  8.15it/s]
Epoch 16:   2%|▏         | 3/161 [00:00<00:22,  7.14it/s]
Epoch 16:   2%|▏         | 4/161 [00:00<00:25,  6.18it/s]
Epoch 16:   3%|▎         | 5/161 [00:00<00:25,  6.21it/s]
Epoch 16:   4%|▎         | 6/161 [00:00<00:27,  5.58it/s]
Epoch 16:   4%|▍         | 7/161 [00:01<00:25,  6.13it/s]
Epoch 16:   6%|▌         | 9/161 [00:01<00:19,  7.98it/s]
Epoch 16:   6%|▌         | 10/161 [00:01<00:18,  8.25it/s]
Epoch 16:   7%|▋         | 12/161 [00:01<00:15,  9.49it/s]
Epoch 16:   9%|▊         | 14/161 [00:01<00:13, 10.52it/s]
Epoch 16:  10%|▉         | 16/161 [00:01<00:13, 10.53it/s]
Epoch 16:  11%|█         | 18/161 [00:02<00:13, 10.43it/s]
Epoch 16:  12%|█▏        | 20/161 [00:02<00:12, 11.13it/s]
Epoch 16:  14%|█▎        | 22/161 [00:02<00:11, 11.75it/s]
Epoch 1

Saved best model at epoch 16 with loss 1.8485


                                                                 

Epoch 16/100 - loss: 3.1486


Training Epochs:  16%|█▌        | 16/100 [04:13<22:16, 15.91s/it]
Epoch 17:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 17:   1%|          | 1/161 [00:00<00:18,  8.52it/s]
Epoch 17:   2%|▏         | 3/161 [00:00<00:15, 10.50it/s]
Epoch 17:   3%|▎         | 5/161 [00:00<00:16,  9.69it/s]
Epoch 17:   4%|▍         | 7/161 [00:00<00:14, 10.80it/s]
Epoch 17:   6%|▌         | 9/161 [00:00<00:13, 11.40it/s]
Epoch 17:   7%|▋         | 11/161 [00:00<00:12, 11.90it/s]
Epoch 17:   8%|▊         | 13/161 [00:01<00:12, 12.28it/s]
Epoch 17:   9%|▉         | 15/161 [00:01<00:12, 11.97it/s]
Epoch 17:  11%|█         | 17/161 [00:01<00:12, 11.60it/s]
Epoch 17:  12%|█▏        | 19/161 [00:01<00:12, 11.52it/s]
Epoch 17:  13%|█▎        | 21/161 [00:01<00:12, 10.87it/s]
Epoch 17:  14%|█▍        | 23/161 [00:02<00:12, 11.15it/s]
Epoch 17:  16%|█▌        | 25/161 [00:02<00:12, 11.32it/s]
Epoch 17:  17%|█▋        | 27/161 [00:02<00:12, 10.80it/s]
Epoch 17:  18%|█▊        | 29/161 [00:02<00:12, 10.65it/s]
Epoc

Saved best model at epoch 17 with loss 1.7957


                                                                 

Epoch 17/100 - loss: 3.0976


Training Epochs:  17%|█▋        | 17/100 [04:28<21:30, 15.55s/it]
Epoch 18:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 18:   1%|          | 1/161 [00:00<00:18,  8.77it/s]
Epoch 18:   2%|▏         | 3/161 [00:00<00:14, 10.93it/s]
Epoch 18:   3%|▎         | 5/161 [00:00<00:12, 12.01it/s]
Epoch 18:   4%|▍         | 7/161 [00:00<00:12, 12.28it/s]
Epoch 18:   6%|▌         | 9/161 [00:00<00:12, 12.54it/s]
Epoch 18:   7%|▋         | 11/161 [00:00<00:11, 12.60it/s]
Epoch 18:   8%|▊         | 13/161 [00:01<00:11, 12.61it/s]
Epoch 18:   9%|▉         | 15/161 [00:01<00:13, 10.89it/s]
Epoch 18:  11%|█         | 17/161 [00:01<00:12, 11.33it/s]
Epoch 18:  12%|█▏        | 19/161 [00:01<00:11, 11.85it/s]
Epoch 18:  13%|█▎        | 21/161 [00:01<00:11, 12.11it/s]
Epoch 18:  14%|█▍        | 23/161 [00:01<00:11, 12.24it/s]
Epoch 18:  16%|█▌        | 25/161 [00:02<00:13,  9.82it/s]
Epoch 18:  17%|█▋        | 27/161 [00:02<00:12, 10.47it/s]
Epoch 18:  18%|█▊        | 29/161 [00:02<00:11, 11.25it/s]
Epoc

Saved best model at epoch 18 with loss 1.7428


                                                                 

Epoch 18/100 - loss: 3.0481


Training Epochs:  18%|█▊        | 18/100 [04:43<21:05, 15.43s/it]
Epoch 19:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 19:   1%|          | 1/161 [00:00<00:18,  8.78it/s]
Epoch 19:   1%|          | 2/161 [00:00<00:17,  9.09it/s]
Epoch 19:   2%|▏         | 3/161 [00:00<00:16,  9.39it/s]
Epoch 19:   3%|▎         | 5/161 [00:00<00:14, 10.72it/s]
Epoch 19:   4%|▍         | 7/161 [00:00<00:14, 10.91it/s]
Epoch 19:   6%|▌         | 9/161 [00:00<00:13, 11.02it/s]
Epoch 19:   7%|▋         | 11/161 [00:01<00:13, 11.34it/s]
Epoch 19:   8%|▊         | 13/161 [00:01<00:13, 11.04it/s]
Epoch 19:   9%|▉         | 15/161 [00:01<00:13, 11.16it/s]
Epoch 19:  11%|█         | 17/161 [00:01<00:12, 11.51it/s]
Epoch 19:  12%|█▏        | 19/161 [00:01<00:12, 11.80it/s]
Epoch 19:  13%|█▎        | 21/161 [00:01<00:11, 11.85it/s]
Epoch 19:  14%|█▍        | 23/161 [00:02<00:11, 11.55it/s]
Epoch 19:  16%|█▌        | 25/161 [00:02<00:11, 11.54it/s]
Epoch 19:  17%|█▋        | 27/161 [00:02<00:11, 11.91it/s]
Epoch

Saved best model at epoch 19 with loss 1.6885


                                                                 

Epoch 19/100 - loss: 3.0096


Training Epochs:  19%|█▉        | 19/100 [04:59<20:50, 15.44s/it]
Epoch 20:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 20:   1%|          | 2/161 [00:00<00:14, 10.68it/s]
Epoch 20:   2%|▏         | 4/161 [00:00<00:14, 10.60it/s]
Epoch 20:   4%|▎         | 6/161 [00:00<00:14, 10.71it/s]
Epoch 20:   5%|▍         | 8/161 [00:00<00:13, 11.01it/s]
Epoch 20:   6%|▌         | 10/161 [00:00<00:13, 11.00it/s]
Epoch 20:   7%|▋         | 12/161 [00:01<00:13, 10.66it/s]
Epoch 20:   9%|▊         | 14/161 [00:01<00:15,  9.27it/s]
Epoch 20:   9%|▉         | 15/161 [00:01<00:15,  9.21it/s]
Epoch 20:  10%|▉         | 16/161 [00:01<00:15,  9.14it/s]
Epoch 20:  11%|█         | 17/161 [00:01<00:15,  9.07it/s]
Epoch 20:  12%|█▏        | 19/161 [00:01<00:14,  9.82it/s]
Epoch 20:  13%|█▎        | 21/161 [00:02<00:13, 10.22it/s]
Epoch 20:  14%|█▍        | 23/161 [00:02<00:13, 10.12it/s]
Epoch 20:  16%|█▌        | 25/161 [00:02<00:12, 10.51it/s]
Epoch 20:  17%|█▋        | 27/161 [00:02<00:11, 11.19it/s]
Epo

Saved best model at epoch 20 with loss 1.6660


                                                                 

Epoch 20/100 - loss: 2.9749


Training Epochs:  20%|██        | 20/100 [05:14<20:21, 15.26s/it]
Epoch 21:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 21:   1%|          | 2/161 [00:00<00:13, 11.98it/s]
Epoch 21:   2%|▏         | 4/161 [00:00<00:13, 12.03it/s]
Epoch 21:   4%|▎         | 6/161 [00:00<00:13, 11.79it/s]
Epoch 21:   5%|▍         | 8/161 [00:00<00:12, 11.77it/s]
Epoch 21:   6%|▌         | 10/161 [00:00<00:12, 11.72it/s]
Epoch 21:   7%|▋         | 12/161 [00:01<00:12, 11.75it/s]
Epoch 21:   9%|▊         | 14/161 [00:01<00:13, 10.59it/s]
Epoch 21:  10%|▉         | 16/161 [00:01<00:13, 10.49it/s]
Epoch 21:  11%|█         | 18/161 [00:01<00:13, 10.89it/s]
Epoch 21:  12%|█▏        | 20/161 [00:01<00:12, 11.23it/s]
Epoch 21:  14%|█▎        | 22/161 [00:01<00:12, 10.76it/s]
Epoch 21:  15%|█▍        | 24/161 [00:02<00:12, 10.77it/s]
Epoch 21:  16%|█▌        | 26/161 [00:02<00:12, 11.05it/s]
Epoch 21:  17%|█▋        | 28/161 [00:02<00:12, 10.92it/s]
Epoch 21:  19%|█▊        | 30/161 [00:02<00:12, 10.77it/s]
Epo

Saved best model at epoch 21 with loss 1.6164


                                                                 

Epoch 21/100 - loss: 2.9368


Training Epochs:  21%|██        | 21/100 [05:29<20:11, 15.33s/it]
Epoch 22:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 22:   1%|          | 2/161 [00:00<00:14, 11.04it/s]
Epoch 22:   2%|▏         | 4/161 [00:00<00:13, 11.61it/s]
Epoch 22:   4%|▎         | 6/161 [00:00<00:13, 11.73it/s]
Epoch 22:   5%|▍         | 8/161 [00:00<00:13, 11.64it/s]
Epoch 22:   6%|▌         | 10/161 [00:00<00:12, 12.32it/s]
Epoch 22:   7%|▋         | 12/161 [00:00<00:11, 12.63it/s]
Epoch 22:   9%|▊         | 14/161 [00:01<00:11, 12.66it/s]
Epoch 22:  10%|▉         | 16/161 [00:01<00:11, 12.71it/s]
Epoch 22:  11%|█         | 18/161 [00:01<00:10, 13.03it/s]
Epoch 22:  12%|█▏        | 20/161 [00:01<00:11, 12.53it/s]
Epoch 22:  14%|█▎        | 22/161 [00:01<00:10, 12.78it/s]
Epoch 22:  15%|█▍        | 24/161 [00:01<00:10, 13.09it/s]
Epoch 22:  16%|█▌        | 26/161 [00:02<00:10, 13.31it/s]
Epoch 22:  17%|█▋        | 28/161 [00:02<00:09, 13.49it/s]
Epoch 22:  19%|█▊        | 30/161 [00:02<00:10, 13.07it/s]
Epo

Saved best model at epoch 22 with loss 1.5855


                                                                 

Epoch 22/100 - loss: 2.8962


Training Epochs:  22%|██▏       | 22/100 [05:44<19:39, 15.12s/it]
Epoch 23:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 23:   1%|          | 2/161 [00:00<00:13, 12.07it/s]
Epoch 23:   2%|▏         | 4/161 [00:00<00:14, 11.02it/s]
Epoch 23:   4%|▎         | 6/161 [00:00<00:14, 10.98it/s]
Epoch 23:   5%|▍         | 8/161 [00:00<00:12, 11.86it/s]
Epoch 23:   6%|▌         | 10/161 [00:00<00:12, 12.38it/s]
Epoch 23:   7%|▋         | 12/161 [00:01<00:12, 11.89it/s]
Epoch 23:   9%|▊         | 14/161 [00:01<00:12, 11.39it/s]
Epoch 23:  10%|▉         | 16/161 [00:01<00:12, 11.73it/s]
Epoch 23:  11%|█         | 18/161 [00:01<00:11, 12.00it/s]
Epoch 23:  12%|█▏        | 20/161 [00:01<00:11, 12.52it/s]
Epoch 23:  14%|█▎        | 22/161 [00:01<00:11, 12.61it/s]
Epoch 23:  15%|█▍        | 24/161 [00:02<00:11, 11.94it/s]
Epoch 23:  16%|█▌        | 26/161 [00:02<00:17,  7.94it/s]
Epoch 23:  17%|█▋        | 27/161 [00:02<00:16,  7.97it/s]
Epoch 23:  17%|█▋        | 28/161 [00:02<00:17,  7.81it/s]
Epo

Saved best model at epoch 23 with loss 1.5490


                                                                 

Epoch 23/100 - loss: 2.8698


Training Epochs:  23%|██▎       | 23/100 [05:58<19:03, 14.85s/it]
Epoch 24:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 24:   1%|          | 2/161 [00:00<00:14, 10.71it/s]
Epoch 24:   2%|▏         | 4/161 [00:00<00:12, 12.67it/s]
Epoch 24:   4%|▎         | 6/161 [00:00<00:13, 11.50it/s]
Epoch 24:   5%|▍         | 8/161 [00:00<00:18,  8.31it/s]
Epoch 24:   6%|▌         | 10/161 [00:01<00:16,  9.39it/s]
Epoch 24:   7%|▋         | 12/161 [00:01<00:16,  9.19it/s]
Epoch 24:   9%|▊         | 14/161 [00:01<00:14,  9.84it/s]
Epoch 24:  10%|▉         | 16/161 [00:01<00:15,  9.32it/s]
Epoch 24:  11%|█         | 18/161 [00:01<00:14,  9.94it/s]
Epoch 24:  12%|█▏        | 20/161 [00:01<00:13, 10.77it/s]
Epoch 24:  14%|█▎        | 22/161 [00:02<00:12, 11.46it/s]
Epoch 24:  15%|█▍        | 24/161 [00:02<00:18,  7.43it/s]
Epoch 24:  16%|█▌        | 26/161 [00:02<00:16,  8.27it/s]
Epoch 24:  17%|█▋        | 28/161 [00:03<00:15,  8.56it/s]
Epoch 24:  19%|█▊        | 30/161 [00:03<00:13,  9.41it/s]
Epo

Saved best model at epoch 24 with loss 1.5229


                                                                 

Epoch 24/100 - loss: 2.8393


Training Epochs:  24%|██▍       | 24/100 [06:19<21:18, 16.82s/it]
Epoch 25:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 25:   1%|          | 1/161 [00:00<01:37,  1.64it/s]
Epoch 25:   1%|          | 2/161 [00:00<00:56,  2.80it/s]
Epoch 25:   2%|▏         | 3/161 [00:00<00:45,  3.50it/s]
Epoch 25:   2%|▏         | 4/161 [00:01<00:50,  3.10it/s]
Epoch 25:   3%|▎         | 5/161 [00:01<00:40,  3.82it/s]
Epoch 25:   4%|▎         | 6/161 [00:01<00:36,  4.26it/s]
Epoch 25:   4%|▍         | 7/161 [00:01<00:30,  5.00it/s]
Epoch 25:   5%|▍         | 8/161 [00:01<00:26,  5.79it/s]
Epoch 25:   6%|▌         | 9/161 [00:02<00:23,  6.48it/s]
Epoch 25:   6%|▌         | 10/161 [00:02<00:21,  6.96it/s]
Epoch 25:   7%|▋         | 11/161 [00:02<00:19,  7.58it/s]
Epoch 25:   7%|▋         | 12/161 [00:02<00:18,  7.93it/s]
Epoch 25:   8%|▊         | 13/161 [00:02<00:18,  7.98it/s]
Epoch 25:   9%|▊         | 14/161 [00:02<00:17,  8.39it/s]
Epoch 25:   9%|▉         | 15/161 [00:02<00:18,  8.08it/s]
Epoch 25

Saved best model at epoch 25 with loss 1.4941


                                                                 

Epoch 25/100 - loss: 2.8125


Training Epochs:  25%|██▌       | 25/100 [06:36<20:59, 16.80s/it]
Epoch 26:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 26:   1%|          | 1/161 [00:00<00:16,  9.91it/s]
Epoch 26:   2%|▏         | 3/161 [00:00<00:13, 11.70it/s]
Epoch 26:   3%|▎         | 5/161 [00:00<00:13, 11.48it/s]
Epoch 26:   4%|▍         | 7/161 [00:00<00:12, 12.21it/s]
Epoch 26:   6%|▌         | 9/161 [00:00<00:12, 12.43it/s]
Epoch 26:   7%|▋         | 11/161 [00:00<00:12, 12.31it/s]
Epoch 26:   8%|▊         | 13/161 [00:01<00:11, 12.59it/s]
Epoch 26:   9%|▉         | 15/161 [00:01<00:13, 10.46it/s]
Epoch 26:  11%|█         | 17/161 [00:01<00:13, 10.92it/s]
Epoch 26:  12%|█▏        | 19/161 [00:01<00:12, 11.35it/s]
Epoch 26:  13%|█▎        | 21/161 [00:01<00:12, 11.11it/s]
Epoch 26:  14%|█▍        | 23/161 [00:02<00:12, 11.18it/s]
Epoch 26:  16%|█▌        | 25/161 [00:02<00:11, 11.74it/s]
Epoch 26:  17%|█▋        | 27/161 [00:02<00:11, 12.04it/s]
Epoch 26:  18%|█▊        | 29/161 [00:02<00:10, 12.22it/s]
Epoc

Saved best model at epoch 26 with loss 1.4674


                                                                 

Epoch 26/100 - loss: 2.7926


Training Epochs:  26%|██▌       | 26/100 [06:52<20:18, 16.47s/it]
Epoch 27:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 27:   1%|          | 2/161 [00:00<00:13, 11.79it/s]
Epoch 27:   2%|▏         | 4/161 [00:00<00:12, 12.41it/s]
Epoch 27:   4%|▎         | 6/161 [00:00<00:12, 12.39it/s]
Epoch 27:   5%|▍         | 8/161 [00:00<00:12, 12.09it/s]
Epoch 27:   6%|▌         | 10/161 [00:00<00:12, 11.69it/s]
Epoch 27:   7%|▋         | 12/161 [00:01<00:13, 11.44it/s]
Epoch 27:   9%|▊         | 14/161 [00:01<00:14, 10.14it/s]
Epoch 27:  10%|▉         | 16/161 [00:01<00:13, 10.51it/s]
Epoch 27:  11%|█         | 18/161 [00:01<00:12, 11.10it/s]
Epoch 27:  12%|█▏        | 20/161 [00:01<00:12, 10.85it/s]
Epoch 27:  14%|█▎        | 22/161 [00:01<00:12, 11.43it/s]
Epoch 27:  15%|█▍        | 24/161 [00:02<00:11, 11.84it/s]
Epoch 27:  16%|█▌        | 26/161 [00:02<00:11, 11.81it/s]
Epoch 27:  17%|█▋        | 28/161 [00:02<00:11, 11.94it/s]
Epoch 27:  19%|█▊        | 30/161 [00:02<00:10, 12.11it/s]
Epo

Saved best model at epoch 27 with loss 1.4434


                                                                 

Epoch 27/100 - loss: 2.7710


Training Epochs:  27%|██▋       | 27/100 [07:06<19:09, 15.74s/it]
Epoch 28:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 28:   1%|          | 1/161 [00:00<00:31,  5.10it/s]
Epoch 28:   2%|▏         | 3/161 [00:00<00:17,  9.05it/s]
Epoch 28:   3%|▎         | 5/161 [00:00<00:18,  8.56it/s]
Epoch 28:   4%|▍         | 7/161 [00:00<00:15,  9.66it/s]
Epoch 28:   6%|▌         | 9/161 [00:00<00:14, 10.39it/s]
Epoch 28:   7%|▋         | 11/161 [00:01<00:13, 11.10it/s]
Epoch 28:   8%|▊         | 13/161 [00:01<00:13, 10.66it/s]
Epoch 28:   9%|▉         | 15/161 [00:01<00:13, 10.85it/s]
Epoch 28:  11%|█         | 17/161 [00:01<00:12, 11.19it/s]
Epoch 28:  12%|█▏        | 19/161 [00:01<00:12, 11.16it/s]
Epoch 28:  13%|█▎        | 21/161 [00:01<00:11, 11.71it/s]
Epoch 28:  14%|█▍        | 23/161 [00:02<00:12, 11.26it/s]
Epoch 28:  16%|█▌        | 25/161 [00:02<00:12, 10.91it/s]
Epoch 28:  17%|█▋        | 27/161 [00:02<00:12, 10.43it/s]
Epoch 28:  18%|█▊        | 29/161 [00:02<00:12, 10.42it/s]
Epoc

Saved best model at epoch 28 with loss 1.4323


                                                                 

Epoch 28/100 - loss: 2.7479


Training Epochs:  28%|██▊       | 28/100 [07:22<19:05, 15.90s/it]
Epoch 29:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 29:   1%|          | 1/161 [00:00<00:31,  5.04it/s]
Epoch 29:   2%|▏         | 3/161 [00:00<00:17,  9.17it/s]
Epoch 29:   3%|▎         | 5/161 [00:00<00:21,  7.40it/s]
Epoch 29:   4%|▎         | 6/161 [00:00<00:25,  6.14it/s]
Epoch 29:   4%|▍         | 7/161 [00:01<00:23,  6.65it/s]
Epoch 29:   5%|▍         | 8/161 [00:01<00:22,  6.85it/s]
Epoch 29:   6%|▌         | 9/161 [00:01<00:20,  7.46it/s]
Epoch 29:   7%|▋         | 11/161 [00:01<00:17,  8.78it/s]
Epoch 29:   8%|▊         | 13/161 [00:01<00:15,  9.83it/s]
Epoch 29:   9%|▉         | 15/161 [00:01<00:13, 10.80it/s]
Epoch 29:  11%|█         | 17/161 [00:01<00:12, 11.52it/s]
Epoch 29:  12%|█▏        | 19/161 [00:02<00:12, 11.75it/s]
Epoch 29:  13%|█▎        | 21/161 [00:02<00:11, 11.96it/s]
Epoch 29:  14%|█▍        | 23/161 [00:02<00:11, 12.07it/s]
Epoch 29:  16%|█▌        | 25/161 [00:02<00:11, 12.28it/s]
Epoch 

Saved best model at epoch 29 with loss 1.4127


                                                                 

Epoch 29/100 - loss: 2.7267


Training Epochs:  29%|██▉       | 29/100 [07:39<19:12, 16.23s/it]
Epoch 30:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 30:   1%|          | 2/161 [00:00<00:11, 13.32it/s]
Epoch 30:   2%|▏         | 4/161 [00:00<00:11, 13.39it/s]
Epoch 30:   4%|▎         | 6/161 [00:00<00:11, 13.49it/s]
Epoch 30:   5%|▍         | 8/161 [00:00<00:11, 13.55it/s]
Epoch 30:   6%|▌         | 10/161 [00:00<00:11, 13.15it/s]
Epoch 30:   7%|▋         | 12/161 [00:00<00:11, 12.79it/s]
Epoch 30:   9%|▊         | 14/161 [00:01<00:12, 12.14it/s]
Epoch 30:  10%|▉         | 16/161 [00:01<00:11, 12.43it/s]
Epoch 30:  11%|█         | 18/161 [00:01<00:11, 12.64it/s]
Epoch 30:  12%|█▏        | 20/161 [00:01<00:11, 12.78it/s]
Epoch 30:  14%|█▎        | 22/161 [00:01<00:10, 12.97it/s]
Epoch 30:  15%|█▍        | 24/161 [00:01<00:10, 12.71it/s]
Epoch 30:  16%|█▌        | 26/161 [00:02<00:10, 12.82it/s]
Epoch 30:  17%|█▋        | 28/161 [00:02<00:10, 13.05it/s]
Epoch 30:  19%|█▊        | 30/161 [00:02<00:10, 12.99it/s]
Epo

Saved best model at epoch 30 with loss 1.3948


                                                                 

Epoch 30/100 - loss: 2.7153


Training Epochs:  30%|███       | 30/100 [07:55<18:40, 16.01s/it]
Epoch 31:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 31:   1%|          | 2/161 [00:00<00:12, 12.61it/s]
Epoch 31:   2%|▏         | 4/161 [00:00<00:12, 12.59it/s]
Epoch 31:   4%|▎         | 6/161 [00:00<00:12, 12.65it/s]
Epoch 31:   5%|▍         | 8/161 [00:00<00:14, 10.43it/s]
Epoch 31:   6%|▌         | 10/161 [00:00<00:13, 11.12it/s]
Epoch 31:   7%|▋         | 12/161 [00:01<00:13, 11.09it/s]
Epoch 31:   9%|▊         | 14/161 [00:01<00:14, 10.11it/s]
Epoch 31:  10%|▉         | 16/161 [00:01<00:13, 10.48it/s]
Epoch 31:  11%|█         | 18/161 [00:01<00:13, 10.77it/s]
Epoch 31:  12%|█▏        | 20/161 [00:01<00:14, 10.06it/s]
Epoch 31:  14%|█▎        | 22/161 [00:02<00:13, 10.43it/s]
Epoch 31:  15%|█▍        | 24/161 [00:02<00:13, 10.27it/s]
Epoch 31:  16%|█▌        | 26/161 [00:02<00:12, 10.57it/s]
Epoch 31:  17%|█▋        | 28/161 [00:02<00:12, 10.85it/s]
Epoch 31:  19%|█▊        | 30/161 [00:02<00:11, 11.49it/s]
Epo

Saved best model at epoch 31 with loss 1.3850


                                                                 

Epoch 31/100 - loss: 2.7017


Training Epochs:  31%|███       | 31/100 [08:11<18:27, 16.05s/it]
Epoch 32:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 32:   1%|          | 2/161 [00:00<00:15, 10.47it/s]
Epoch 32:   2%|▏         | 4/161 [00:00<00:14, 10.88it/s]
Epoch 32:   4%|▎         | 6/161 [00:00<00:13, 11.82it/s]
Epoch 32:   5%|▍         | 8/161 [00:00<00:12, 12.10it/s]
Epoch 32:   6%|▌         | 10/161 [00:00<00:12, 11.68it/s]
Epoch 32:   7%|▋         | 12/161 [00:01<00:12, 11.64it/s]
Epoch 32:   9%|▊         | 14/161 [00:01<00:14, 10.44it/s]
Epoch 32:  10%|▉         | 16/161 [00:01<00:13, 10.77it/s]
Epoch 32:  11%|█         | 18/161 [00:01<00:12, 11.41it/s]
Epoch 32:  12%|█▏        | 20/161 [00:01<00:11, 11.99it/s]
Epoch 32:  14%|█▎        | 22/161 [00:01<00:13, 10.62it/s]
Epoch 32:  15%|█▍        | 24/161 [00:02<00:12, 11.15it/s]
Epoch 32:  16%|█▌        | 26/161 [00:02<00:11, 11.41it/s]
Epoch 32:  17%|█▋        | 28/161 [00:02<00:12, 10.89it/s]
Epoch 32:  19%|█▊        | 30/161 [00:02<00:11, 11.52it/s]
Epo

Saved best model at epoch 32 with loss 1.3715


                                                                 

Epoch 32/100 - loss: 2.6864


Training Epochs:  32%|███▏      | 32/100 [08:28<18:37, 16.44s/it]
Epoch 33:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 33:   1%|          | 1/161 [00:00<00:39,  4.06it/s]
Epoch 33:   1%|          | 2/161 [00:00<00:31,  5.01it/s]
Epoch 33:   2%|▏         | 3/161 [00:00<00:29,  5.41it/s]
Epoch 33:   2%|▏         | 4/161 [00:00<00:26,  6.03it/s]
Epoch 33:   3%|▎         | 5/161 [00:00<00:24,  6.27it/s]
Epoch 33:   4%|▎         | 6/161 [00:00<00:23,  6.69it/s]
Epoch 33:   4%|▍         | 7/161 [00:01<00:22,  6.77it/s]
Epoch 33:   5%|▍         | 8/161 [00:01<00:22,  6.91it/s]
Epoch 33:   6%|▌         | 9/161 [00:01<00:20,  7.41it/s]
Epoch 33:   6%|▌         | 10/161 [00:01<00:19,  7.66it/s]
Epoch 33:   7%|▋         | 11/161 [00:01<00:19,  7.82it/s]
Epoch 33:   7%|▋         | 12/161 [00:01<00:19,  7.82it/s]
Epoch 33:   8%|▊         | 13/161 [00:01<00:17,  8.35it/s]
Epoch 33:   9%|▉         | 15/161 [00:02<00:14, 10.04it/s]
Epoch 33:  11%|█         | 17/161 [00:02<00:15,  9.60it/s]
Epoch 33

Saved best model at epoch 33 with loss 1.3548


                                                                 

Epoch 33/100 - loss: 2.6648


Training Epochs:  33%|███▎      | 33/100 [08:46<18:51, 16.88s/it]
Epoch 34:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 34:   1%|          | 1/161 [00:00<00:28,  5.68it/s]
Epoch 34:   2%|▏         | 3/161 [00:00<00:15,  9.93it/s]
Epoch 34:   3%|▎         | 5/161 [00:00<00:15,  9.88it/s]
Epoch 34:   4%|▍         | 7/161 [00:00<00:14, 10.77it/s]
Epoch 34:   6%|▌         | 9/161 [00:01<00:18,  8.21it/s]
Epoch 34:   7%|▋         | 11/161 [00:01<00:16,  9.21it/s]
Epoch 34:   8%|▊         | 13/161 [00:01<00:14, 10.11it/s]
Epoch 34:   9%|▉         | 15/161 [00:01<00:13, 10.67it/s]
Epoch 34:  11%|█         | 17/161 [00:01<00:12, 11.15it/s]
Epoch 34:  12%|█▏        | 19/161 [00:01<00:12, 11.00it/s]
Epoch 34:  13%|█▎        | 21/161 [00:02<00:13, 10.05it/s]
Epoch 34:  14%|█▍        | 23/161 [00:02<00:12, 10.80it/s]
Epoch 34:  16%|█▌        | 25/161 [00:02<00:11, 11.43it/s]
Epoch 34:  17%|█▋        | 27/161 [00:02<00:11, 11.90it/s]
Epoch 34:  18%|█▊        | 29/161 [00:02<00:10, 12.23it/s]
Epoc

Saved best model at epoch 34 with loss 1.3408


                                                                 

Epoch 34/100 - loss: 2.6602


Training Epochs:  34%|███▍      | 34/100 [09:02<18:25, 16.75s/it]
Epoch 35:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 35:   1%|          | 1/161 [00:00<00:24,  6.56it/s]
Epoch 35:   2%|▏         | 3/161 [00:00<00:15, 10.32it/s]
Epoch 35:   3%|▎         | 5/161 [00:00<00:13, 11.28it/s]
Epoch 35:   4%|▍         | 7/161 [00:00<00:13, 11.39it/s]
Epoch 35:   6%|▌         | 9/161 [00:00<00:12, 11.83it/s]
Epoch 35:   7%|▋         | 11/161 [00:00<00:12, 12.13it/s]
Epoch 35:   8%|▊         | 13/161 [00:01<00:13, 10.96it/s]
Epoch 35:   9%|▉         | 15/161 [00:01<00:15,  9.28it/s]
Epoch 35:  11%|█         | 17/161 [00:01<00:14, 10.16it/s]
Epoch 35:  12%|█▏        | 19/161 [00:01<00:12, 10.97it/s]
Epoch 35:  13%|█▎        | 21/161 [00:02<00:17,  8.23it/s]
Epoch 35:  14%|█▍        | 23/161 [00:02<00:14,  9.21it/s]
Epoch 35:  16%|█▌        | 25/161 [00:02<00:13, 10.28it/s]
Epoch 35:  17%|█▋        | 27/161 [00:02<00:12, 10.82it/s]
Epoch 35:  18%|█▊        | 29/161 [00:02<00:12, 10.92it/s]
Epoc

Saved best model at epoch 35 with loss 1.3303


                                                                 

Epoch 35/100 - loss: 2.6466


Training Epochs:  35%|███▌      | 35/100 [09:18<17:37, 16.26s/it]
Epoch 36:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 36:   1%|          | 1/161 [00:00<00:36,  4.34it/s]
Epoch 36:   1%|          | 2/161 [00:00<00:24,  6.39it/s]
Epoch 36:   2%|▏         | 4/161 [00:00<00:18,  8.36it/s]
Epoch 36:   3%|▎         | 5/161 [00:00<00:18,  8.36it/s]
Epoch 36:   4%|▍         | 7/161 [00:00<00:15,  9.84it/s]
Epoch 36:   6%|▌         | 9/161 [00:00<00:14, 10.65it/s]
Epoch 36:   7%|▋         | 11/161 [00:01<00:13, 11.38it/s]
Epoch 36:   8%|▊         | 13/161 [00:01<00:14, 10.36it/s]
Epoch 36:   9%|▉         | 15/161 [00:01<00:13, 10.84it/s]
Epoch 36:  11%|█         | 17/161 [00:01<00:12, 11.47it/s]
Epoch 36:  12%|█▏        | 19/161 [00:01<00:11, 11.91it/s]
Epoch 36:  13%|█▎        | 21/161 [00:01<00:11, 12.26it/s]
Epoch 36:  14%|█▍        | 23/161 [00:02<00:11, 12.11it/s]
Epoch 36:  16%|█▌        | 25/161 [00:02<00:11, 11.97it/s]
Epoch 36:  17%|█▋        | 27/161 [00:02<00:11, 12.12it/s]
Epoch

Saved best model at epoch 36 with loss 1.3150


                                                                 

Epoch 36/100 - loss: 2.6314


Training Epochs:  36%|███▌      | 36/100 [09:34<17:26, 16.36s/it]
Epoch 37:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 37:   1%|          | 1/161 [00:00<00:28,  5.60it/s]
Epoch 37:   2%|▏         | 3/161 [00:00<00:16,  9.79it/s]
Epoch 37:   3%|▎         | 5/161 [00:00<00:14, 11.01it/s]
Epoch 37:   4%|▍         | 7/161 [00:00<00:13, 11.71it/s]
Epoch 37:   6%|▌         | 9/161 [00:00<00:12, 11.96it/s]
Epoch 37:   7%|▋         | 11/161 [00:00<00:12, 12.46it/s]
Epoch 37:   8%|▊         | 13/161 [00:01<00:11, 12.49it/s]
Epoch 37:   9%|▉         | 15/161 [00:01<00:13, 10.65it/s]
Epoch 37:  11%|█         | 17/161 [00:01<00:13, 10.60it/s]
Epoch 37:  12%|█▏        | 19/161 [00:01<00:12, 11.24it/s]
Epoch 37:  13%|█▎        | 21/161 [00:01<00:11, 11.72it/s]
Epoch 37:  14%|█▍        | 23/161 [00:02<00:11, 12.25it/s]
Epoch 37:  16%|█▌        | 25/161 [00:02<00:10, 12.46it/s]
Epoch 37:  17%|█▋        | 27/161 [00:02<00:10, 12.22it/s]
Epoch 37:  18%|█▊        | 29/161 [00:02<00:11, 11.88it/s]
Epoc

Saved best model at epoch 37 with loss 1.3093


                                                                 

Epoch 37/100 - loss: 2.6174


Training Epochs:  37%|███▋      | 37/100 [09:50<16:53, 16.09s/it]
Epoch 38:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 38:   1%|          | 1/161 [00:00<00:38,  4.19it/s]
Epoch 38:   2%|▏         | 3/161 [00:00<00:20,  7.53it/s]
Epoch 38:   3%|▎         | 5/161 [00:00<00:17,  8.77it/s]
Epoch 38:   4%|▍         | 7/161 [00:00<00:15,  9.74it/s]
Epoch 38:   6%|▌         | 9/161 [00:00<00:14, 10.42it/s]
Epoch 38:   7%|▋         | 11/161 [00:01<00:14, 10.62it/s]
Epoch 38:   8%|▊         | 13/161 [00:01<00:16,  9.09it/s]
Epoch 38:   9%|▉         | 15/161 [00:01<00:14, 10.02it/s]
Epoch 38:  11%|█         | 17/161 [00:01<00:13, 10.39it/s]
Epoch 38:  12%|█▏        | 19/161 [00:01<00:12, 10.97it/s]
Epoch 38:  13%|█▎        | 21/161 [00:02<00:11, 11.67it/s]
Epoch 38:  14%|█▍        | 23/161 [00:02<00:12, 10.89it/s]
Epoch 38:  16%|█▌        | 25/161 [00:02<00:12, 11.23it/s]
Epoch 38:  17%|█▋        | 27/161 [00:02<00:11, 11.38it/s]
Epoch 38:  18%|█▊        | 29/161 [00:02<00:11, 11.53it/s]
Epoc

Saved best model at epoch 38 with loss 1.2905


                                                                 

Epoch 38/100 - loss: 2.6102


Training Epochs:  38%|███▊      | 38/100 [10:06<16:40, 16.14s/it]
Epoch 39:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 39:   1%|          | 2/161 [00:00<00:12, 12.51it/s]
Epoch 39:   2%|▏         | 4/161 [00:00<00:12, 12.98it/s]
Epoch 39:   4%|▎         | 6/161 [00:00<00:11, 13.40it/s]
Epoch 39:   5%|▍         | 8/161 [00:00<00:13, 11.71it/s]
Epoch 39:   6%|▌         | 10/161 [00:00<00:13, 11.37it/s]
Epoch 39:   7%|▋         | 12/161 [00:01<00:16,  9.15it/s]
Epoch 39:   8%|▊         | 13/161 [00:01<00:16,  9.05it/s]
Epoch 39:   9%|▉         | 15/161 [00:01<00:14,  9.88it/s]
Epoch 39:  11%|█         | 17/161 [00:01<00:13, 10.71it/s]
Epoch 39:  12%|█▏        | 19/161 [00:01<00:12, 11.18it/s]
Epoch 39:  13%|█▎        | 21/161 [00:01<00:12, 11.01it/s]
Epoch 39:  14%|█▍        | 23/161 [00:02<00:15,  9.05it/s]
Epoch 39:  15%|█▍        | 24/161 [00:02<00:14,  9.20it/s]
Epoch 39:  16%|█▌        | 25/161 [00:02<00:14,  9.24it/s]
Epoch 39:  16%|█▌        | 26/161 [00:02<00:17,  7.52it/s]
Epo

Epoch 39/100 - loss: 2.5983


Training Epochs:  39%|███▉      | 39/100 [10:22<16:14, 15.98s/it]
Epoch 40:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 40:   1%|          | 2/161 [00:00<00:11, 13.46it/s]
Epoch 40:   2%|▏         | 4/161 [00:00<00:11, 13.77it/s]
Epoch 40:   4%|▎         | 6/161 [00:00<00:11, 13.65it/s]
Epoch 40:   5%|▍         | 8/161 [00:00<00:11, 13.39it/s]
Epoch 40:   6%|▌         | 10/161 [00:00<00:11, 13.63it/s]
Epoch 40:   7%|▋         | 12/161 [00:00<00:11, 13.34it/s]
Epoch 40:   9%|▊         | 14/161 [00:01<00:10, 13.51it/s]
Epoch 40:  10%|▉         | 16/161 [00:01<00:10, 13.54it/s]
Epoch 40:  11%|█         | 18/161 [00:01<00:10, 13.19it/s]
Epoch 40:  12%|█▏        | 20/161 [00:01<00:13, 10.33it/s]
Epoch 40:  14%|█▎        | 22/161 [00:01<00:12, 10.90it/s]
Epoch 40:  15%|█▍        | 24/161 [00:02<00:15,  8.68it/s]
Epoch 40:  16%|█▌        | 26/161 [00:02<00:14,  9.58it/s]
Epoch 40:  17%|█▋        | 28/161 [00:02<00:13, 10.04it/s]
Epoch 40:  19%|█▊        | 30/161 [00:02<00:12, 10.39it/s]
Epo

Epoch 40/100 - loss: 2.5928


Training Epochs:  40%|████      | 40/100 [10:37<15:49, 15.83s/it]
Epoch 41:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 41:   1%|          | 2/161 [00:00<00:12, 12.49it/s]
Epoch 41:   2%|▏         | 4/161 [00:00<00:12, 12.60it/s]
Epoch 41:   4%|▎         | 6/161 [00:00<00:12, 12.68it/s]
Epoch 41:   5%|▍         | 8/161 [00:00<00:13, 11.59it/s]
Epoch 41:   6%|▌         | 10/161 [00:00<00:12, 11.96it/s]
Epoch 41:   7%|▋         | 12/161 [00:00<00:12, 11.94it/s]
Epoch 41:   9%|▊         | 14/161 [00:01<00:13, 11.01it/s]
Epoch 41:  10%|▉         | 16/161 [00:01<00:14, 10.25it/s]
Epoch 41:  11%|█         | 18/161 [00:01<00:13, 10.71it/s]
Epoch 41:  12%|█▏        | 20/161 [00:01<00:12, 11.29it/s]
Epoch 41:  14%|█▎        | 22/161 [00:01<00:11, 11.64it/s]
Epoch 41:  15%|█▍        | 24/161 [00:02<00:11, 11.42it/s]
Epoch 41:  16%|█▌        | 26/161 [00:02<00:11, 11.83it/s]
Epoch 41:  17%|█▋        | 28/161 [00:02<00:10, 12.26it/s]
Epoch 41:  19%|█▊        | 30/161 [00:02<00:10, 12.17it/s]
Epo

Saved best model at epoch 41 with loss 1.2716


                                                                 

Epoch 41/100 - loss: 2.5870


Training Epochs:  41%|████      | 41/100 [10:52<15:14, 15.49s/it]
Epoch 42:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 42:   1%|          | 2/161 [00:00<00:12, 13.14it/s]
Epoch 42:   2%|▏         | 4/161 [00:00<00:13, 11.96it/s]
Epoch 42:   4%|▎         | 6/161 [00:00<00:12, 12.41it/s]
Epoch 42:   5%|▍         | 8/161 [00:00<00:12, 12.16it/s]
Epoch 42:   6%|▌         | 10/161 [00:01<00:23,  6.53it/s]
Epoch 42:   7%|▋         | 11/161 [00:01<00:21,  6.88it/s]
Epoch 42:   8%|▊         | 13/161 [00:01<00:18,  8.16it/s]
Epoch 42:   9%|▊         | 14/161 [00:01<00:17,  8.38it/s]
Epoch 42:   9%|▉         | 15/161 [00:01<00:18,  7.80it/s]
Epoch 42:  10%|▉         | 16/161 [00:01<00:20,  6.98it/s]
Epoch 42:  11%|█         | 18/161 [00:02<00:16,  8.73it/s]
Epoch 42:  12%|█▏        | 19/161 [00:02<00:16,  8.73it/s]
Epoch 42:  13%|█▎        | 21/161 [00:02<00:14,  9.50it/s]
Epoch 42:  14%|█▎        | 22/161 [00:02<00:14,  9.28it/s]
Epoch 42:  14%|█▍        | 23/161 [00:02<00:15,  8.99it/s]
Epo

Saved best model at epoch 42 with loss 1.2706


                                                                 

Epoch 42/100 - loss: 2.5767


Training Epochs:  42%|████▏     | 42/100 [11:08<15:10, 15.71s/it]
Epoch 43:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 43:   1%|          | 1/161 [00:00<00:31,  5.10it/s]
Epoch 43:   2%|▏         | 3/161 [00:00<00:16,  9.50it/s]
Epoch 43:   3%|▎         | 5/161 [00:00<00:13, 11.16it/s]
Epoch 43:   4%|▍         | 7/161 [00:00<00:14, 10.88it/s]
Epoch 43:   6%|▌         | 9/161 [00:00<00:13, 11.57it/s]
Epoch 43:   7%|▋         | 11/161 [00:00<00:12, 12.19it/s]
Epoch 43:   8%|▊         | 13/161 [00:01<00:11, 12.55it/s]
Epoch 43:   9%|▉         | 15/161 [00:01<00:14, 10.18it/s]
Epoch 43:  11%|█         | 17/161 [00:01<00:13, 10.84it/s]
Epoch 43:  12%|█▏        | 19/161 [00:01<00:12, 11.40it/s]
Epoch 43:  13%|█▎        | 21/161 [00:01<00:12, 11.64it/s]
Epoch 43:  14%|█▍        | 23/161 [00:02<00:11, 11.91it/s]
Epoch 43:  16%|█▌        | 25/161 [00:02<00:11, 11.63it/s]
Epoch 43:  17%|█▋        | 27/161 [00:02<00:11, 11.81it/s]
Epoch 43:  18%|█▊        | 29/161 [00:02<00:10, 12.23it/s]
Epoc

Saved best model at epoch 43 with loss 1.2615


                                                                 

Epoch 43/100 - loss: 2.5647


Training Epochs:  43%|████▎     | 43/100 [11:23<14:41, 15.47s/it]
Epoch 44:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 44:   1%|          | 2/161 [00:00<00:12, 12.56it/s]
Epoch 44:   2%|▏         | 4/161 [00:00<00:12, 12.75it/s]
Epoch 44:   4%|▎         | 6/161 [00:00<00:12, 12.76it/s]
Epoch 44:   5%|▍         | 8/161 [00:00<00:16,  9.28it/s]
Epoch 44:   6%|▌         | 10/161 [00:00<00:15,  9.53it/s]
Epoch 44:   7%|▋         | 12/161 [00:01<00:15,  9.56it/s]
Epoch 44:   9%|▊         | 14/161 [00:01<00:15,  9.56it/s]
Epoch 44:  10%|▉         | 16/161 [00:01<00:14, 10.35it/s]
Epoch 44:  11%|█         | 18/161 [00:01<00:13, 10.85it/s]
Epoch 44:  12%|█▏        | 20/161 [00:01<00:12, 11.39it/s]
Epoch 44:  14%|█▎        | 22/161 [00:02<00:11, 11.85it/s]
Epoch 44:  15%|█▍        | 24/161 [00:02<00:11, 12.19it/s]
Epoch 44:  16%|█▌        | 26/161 [00:02<00:11, 12.17it/s]
Epoch 44:  17%|█▋        | 28/161 [00:02<00:11, 11.89it/s]
Epoch 44:  19%|█▊        | 30/161 [00:02<00:11, 11.54it/s]
Epo

Epoch 44/100 - loss: 2.5587


Training Epochs:  44%|████▍     | 44/100 [11:37<14:10, 15.20s/it]
Epoch 45:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 45:   1%|          | 1/161 [00:00<00:28,  5.52it/s]
Epoch 45:   2%|▏         | 3/161 [00:00<00:16,  9.62it/s]
Epoch 45:   3%|▎         | 5/161 [00:00<00:14, 10.90it/s]
Epoch 45:   4%|▍         | 7/161 [00:00<00:13, 11.82it/s]
Epoch 45:   6%|▌         | 9/161 [00:00<00:12, 12.44it/s]
Epoch 45:   7%|▋         | 11/161 [00:00<00:11, 12.80it/s]
Epoch 45:   8%|▊         | 13/161 [00:01<00:11, 12.97it/s]
Epoch 45:   9%|▉         | 15/161 [00:01<00:11, 12.36it/s]
Epoch 45:  11%|█         | 17/161 [00:01<00:11, 12.76it/s]
Epoch 45:  12%|█▏        | 19/161 [00:01<00:11, 12.68it/s]
Epoch 45:  13%|█▎        | 21/161 [00:01<00:10, 12.95it/s]
Epoch 45:  14%|█▍        | 23/161 [00:01<00:10, 12.85it/s]
Epoch 45:  16%|█▌        | 25/161 [00:02<00:10, 13.07it/s]
Epoch 45:  17%|█▋        | 27/161 [00:02<00:10, 13.29it/s]
Epoch 45:  18%|█▊        | 29/161 [00:02<00:09, 13.38it/s]
Epoc

Saved best model at epoch 45 with loss 1.2417


                                                                 

Epoch 45/100 - loss: 2.5546


Training Epochs:  45%|████▌     | 45/100 [11:52<13:39, 14.89s/it]
Epoch 46:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 46:   1%|          | 2/161 [00:00<00:12, 12.97it/s]
Epoch 46:   2%|▏         | 4/161 [00:00<00:12, 12.83it/s]
Epoch 46:   4%|▎         | 6/161 [00:00<00:12, 12.75it/s]
Epoch 46:   5%|▍         | 8/161 [00:00<00:11, 13.01it/s]
Epoch 46:   6%|▌         | 10/161 [00:00<00:11, 13.19it/s]
Epoch 46:   7%|▋         | 12/161 [00:00<00:11, 13.48it/s]
Epoch 46:   9%|▊         | 14/161 [00:01<00:11, 13.30it/s]
Epoch 46:  10%|▉         | 16/161 [00:01<00:12, 11.82it/s]
Epoch 46:  11%|█         | 18/161 [00:01<00:11, 12.21it/s]
Epoch 46:  12%|█▏        | 20/161 [00:01<00:11, 12.55it/s]
Epoch 46:  14%|█▎        | 22/161 [00:01<00:10, 12.71it/s]
Epoch 46:  15%|█▍        | 24/161 [00:02<00:13, 10.29it/s]
Epoch 46:  16%|█▌        | 26/161 [00:02<00:13,  9.81it/s]
Epoch 46:  17%|█▋        | 28/161 [00:02<00:12, 10.43it/s]
Epoch 46:  19%|█▊        | 30/161 [00:02<00:11, 11.02it/s]
Epo

Saved best model at epoch 46 with loss 1.2405


                                                                 

Epoch 46/100 - loss: 2.5435


Training Epochs:  46%|████▌     | 46/100 [12:07<13:31, 15.02s/it]
Epoch 47:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 47:   1%|          | 2/161 [00:00<00:12, 12.38it/s]
Epoch 47:   2%|▏         | 4/161 [00:00<00:13, 11.94it/s]
Epoch 47:   4%|▎         | 6/161 [00:00<00:12, 12.42it/s]
Epoch 47:   5%|▍         | 8/161 [00:00<00:12, 12.72it/s]
Epoch 47:   6%|▌         | 10/161 [00:00<00:11, 12.88it/s]
Epoch 47:   7%|▋         | 12/161 [00:00<00:12, 12.29it/s]
Epoch 47:   9%|▊         | 14/161 [00:01<00:15,  9.36it/s]
Epoch 47:  10%|▉         | 16/161 [00:01<00:14,  9.96it/s]
Epoch 47:  11%|█         | 18/161 [00:01<00:13, 10.27it/s]
Epoch 47:  12%|█▏        | 20/161 [00:01<00:14,  9.90it/s]
Epoch 47:  14%|█▎        | 22/161 [00:02<00:13, 10.41it/s]
Epoch 47:  15%|█▍        | 24/161 [00:02<00:12, 10.78it/s]
Epoch 47:  16%|█▌        | 26/161 [00:02<00:12, 10.60it/s]
Epoch 47:  17%|█▋        | 28/161 [00:02<00:11, 11.17it/s]
Epoch 47:  19%|█▊        | 30/161 [00:02<00:12, 10.46it/s]
Epo

Saved best model at epoch 47 with loss 1.2393


                                                                 

Epoch 47/100 - loss: 2.5411


Training Epochs:  47%|████▋     | 47/100 [12:25<14:05, 15.95s/it]
Epoch 48:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 48:   1%|          | 2/161 [00:00<00:12, 13.15it/s]
Epoch 48:   2%|▏         | 4/161 [00:00<00:11, 13.61it/s]
Epoch 48:   4%|▎         | 6/161 [00:00<00:11, 13.72it/s]
Epoch 48:   5%|▍         | 8/161 [00:00<00:11, 13.66it/s]
Epoch 48:   6%|▌         | 10/161 [00:00<00:11, 13.71it/s]
Epoch 48:   7%|▋         | 12/161 [00:00<00:11, 13.31it/s]
Epoch 48:   9%|▊         | 14/161 [00:01<00:11, 12.54it/s]
Epoch 48:  10%|▉         | 16/161 [00:01<00:11, 12.20it/s]
Epoch 48:  11%|█         | 18/161 [00:01<00:11, 12.76it/s]
Epoch 48:  12%|█▏        | 20/161 [00:01<00:10, 12.99it/s]
Epoch 48:  14%|█▎        | 22/161 [00:01<00:10, 12.95it/s]
Epoch 48:  15%|█▍        | 24/161 [00:01<00:10, 12.96it/s]
Epoch 48:  16%|█▌        | 26/161 [00:02<00:10, 12.80it/s]
Epoch 48:  17%|█▋        | 28/161 [00:02<00:10, 12.46it/s]
Epoch 48:  19%|█▊        | 30/161 [00:02<00:11, 11.29it/s]
Epo

Saved best model at epoch 48 with loss 1.2360


                                                                 

Epoch 48/100 - loss: 2.5340


Training Epochs:  48%|████▊     | 48/100 [12:41<13:48, 15.94s/it]
Epoch 49:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 49:   1%|          | 1/161 [00:00<00:22,  6.97it/s]
Epoch 49:   1%|          | 2/161 [00:00<00:21,  7.32it/s]
Epoch 49:   2%|▏         | 3/161 [00:00<00:19,  8.25it/s]
Epoch 49:   3%|▎         | 5/161 [00:00<00:16,  9.73it/s]
Epoch 49:   4%|▍         | 7/161 [00:00<00:13, 11.06it/s]
Epoch 49:   6%|▌         | 9/161 [00:00<00:12, 11.81it/s]
Epoch 49:   7%|▋         | 11/161 [00:01<00:12, 12.19it/s]
Epoch 49:   8%|▊         | 13/161 [00:01<00:12, 12.30it/s]
Epoch 49:   9%|▉         | 15/161 [00:01<00:11, 12.32it/s]
Epoch 49:  11%|█         | 17/161 [00:01<00:11, 12.60it/s]
Epoch 49:  12%|█▏        | 19/161 [00:01<00:11, 12.87it/s]
Epoch 49:  13%|█▎        | 21/161 [00:01<00:10, 12.88it/s]
Epoch 49:  14%|█▍        | 23/161 [00:01<00:10, 13.13it/s]
Epoch 49:  16%|█▌        | 25/161 [00:02<00:10, 13.34it/s]
Epoch 49:  17%|█▋        | 27/161 [00:02<00:09, 13.48it/s]
Epoch

Saved best model at epoch 49 with loss 1.2314


                                                                 

Epoch 49/100 - loss: 2.5306


Training Epochs:  49%|████▉     | 49/100 [12:55<13:03, 15.36s/it]
Epoch 50:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 50:   1%|          | 2/161 [00:00<00:13, 11.58it/s]
Epoch 50:   2%|▏         | 4/161 [00:00<00:19,  8.23it/s]
Epoch 50:   3%|▎         | 5/161 [00:00<00:18,  8.60it/s]
Epoch 50:   4%|▍         | 7/161 [00:00<00:15,  9.82it/s]
Epoch 50:   6%|▌         | 9/161 [00:00<00:14, 10.74it/s]
Epoch 50:   7%|▋         | 11/161 [00:01<00:13, 11.34it/s]
Epoch 50:   8%|▊         | 13/161 [00:01<00:12, 11.99it/s]
Epoch 50:   9%|▉         | 15/161 [00:01<00:12, 11.75it/s]
Epoch 50:  11%|█         | 17/161 [00:01<00:11, 12.17it/s]
Epoch 50:  12%|█▏        | 19/161 [00:01<00:11, 12.40it/s]
Epoch 50:  13%|█▎        | 21/161 [00:01<00:11, 12.48it/s]
Epoch 50:  14%|█▍        | 23/161 [00:01<00:10, 12.90it/s]
Epoch 50:  16%|█▌        | 25/161 [00:02<00:10, 12.81it/s]
Epoch 50:  17%|█▋        | 27/161 [00:02<00:10, 13.10it/s]
Epoch 50:  18%|█▊        | 29/161 [00:02<00:09, 13.26it/s]
Epoc

Saved best model at epoch 50 with loss 1.2206


                                                                 

Epoch 50/100 - loss: 2.5248


Training Epochs:  50%|█████     | 50/100 [13:10<12:37, 15.16s/it]
Epoch 51:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 51:   1%|          | 2/161 [00:00<00:13, 12.17it/s]
Epoch 51:   2%|▏         | 4/161 [00:00<00:12, 12.71it/s]
Epoch 51:   4%|▎         | 6/161 [00:00<00:12, 12.80it/s]
Epoch 51:   5%|▍         | 8/161 [00:00<00:11, 13.18it/s]
Epoch 51:   6%|▌         | 10/161 [00:00<00:11, 13.38it/s]
Epoch 51:   7%|▋         | 12/161 [00:00<00:11, 13.47it/s]
Epoch 51:   9%|▊         | 14/161 [00:01<00:11, 13.33it/s]
Epoch 51:  10%|▉         | 16/161 [00:01<00:11, 12.75it/s]
Epoch 51:  11%|█         | 18/161 [00:01<00:11, 12.97it/s]
Epoch 51:  12%|█▏        | 20/161 [00:01<00:10, 13.17it/s]
Epoch 51:  14%|█▎        | 22/161 [00:01<00:11, 12.58it/s]
Epoch 51:  15%|█▍        | 24/161 [00:01<00:11, 12.45it/s]
Epoch 51:  16%|█▌        | 26/161 [00:02<00:10, 12.77it/s]
Epoch 51:  17%|█▋        | 28/161 [00:02<00:10, 12.97it/s]
Epoch 51:  19%|█▊        | 30/161 [00:02<00:09, 13.18it/s]
Epo

Saved best model at epoch 51 with loss 1.2121


                                                                 

Epoch 51/100 - loss: 2.5150


Training Epochs:  51%|█████     | 51/100 [13:23<11:56, 14.63s/it]
Epoch 52:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 52:   1%|          | 2/161 [00:00<00:12, 13.04it/s]
Epoch 52:   2%|▏         | 4/161 [00:00<00:11, 13.35it/s]
Epoch 52:   4%|▎         | 6/161 [00:00<00:11, 13.40it/s]
Epoch 52:   5%|▍         | 8/161 [00:00<00:11, 13.25it/s]
Epoch 52:   6%|▌         | 10/161 [00:00<00:11, 13.41it/s]
Epoch 52:   7%|▋         | 12/161 [00:00<00:11, 13.35it/s]
Epoch 52:   9%|▊         | 14/161 [00:01<00:10, 13.44it/s]
Epoch 52:  10%|▉         | 16/161 [00:01<00:12, 11.35it/s]
Epoch 52:  11%|█         | 18/161 [00:01<00:12, 11.83it/s]
Epoch 52:  12%|█▏        | 20/161 [00:01<00:11, 12.35it/s]
Epoch 52:  14%|█▎        | 22/161 [00:01<00:11, 12.52it/s]
Epoch 52:  15%|█▍        | 24/161 [00:01<00:10, 12.57it/s]
Epoch 52:  16%|█▌        | 26/161 [00:02<00:12, 11.01it/s]
Epoch 52:  17%|█▋        | 28/161 [00:02<00:11, 11.61it/s]
Epoch 52:  19%|█▊        | 30/161 [00:02<00:11, 11.82it/s]
Epo

Epoch 52/100 - loss: 2.5086


Training Epochs:  52%|█████▏    | 52/100 [13:37<11:30, 14.38s/it]
Epoch 53:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 53:   1%|          | 2/161 [00:00<00:11, 13.55it/s]
Epoch 53:   2%|▏         | 4/161 [00:00<00:13, 12.01it/s]
Epoch 53:   4%|▎         | 6/161 [00:00<00:13, 11.86it/s]
Epoch 53:   5%|▍         | 8/161 [00:00<00:12, 11.97it/s]
Epoch 53:   6%|▌         | 10/161 [00:00<00:12, 11.92it/s]
Epoch 53:   7%|▋         | 12/161 [00:01<00:12, 11.70it/s]
Epoch 53:   9%|▊         | 14/161 [00:01<00:13, 11.25it/s]
Epoch 53:  10%|▉         | 16/161 [00:01<00:12, 11.41it/s]
Epoch 53:  11%|█         | 18/161 [00:01<00:12, 11.76it/s]
Epoch 53:  12%|█▏        | 20/161 [00:01<00:11, 12.21it/s]
Epoch 53:  14%|█▎        | 22/161 [00:01<00:11, 12.36it/s]
Epoch 53:  15%|█▍        | 24/161 [00:02<00:11, 12.09it/s]
Epoch 53:  16%|█▌        | 26/161 [00:02<00:10, 12.59it/s]
Epoch 53:  17%|█▋        | 28/161 [00:02<00:10, 12.57it/s]
Epoch 53:  19%|█▊        | 30/161 [00:02<00:11, 11.48it/s]
Epo

Saved best model at epoch 53 with loss 1.2020


                                                                 

Epoch 53/100 - loss: 2.5039


Training Epochs:  53%|█████▎    | 53/100 [13:52<11:25, 14.58s/it]
Epoch 54:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 54:   1%|          | 1/161 [00:00<00:16,  9.61it/s]
Epoch 54:   1%|          | 2/161 [00:00<00:16,  9.78it/s]
Epoch 54:   2%|▏         | 4/161 [00:00<00:15, 10.17it/s]
Epoch 54:   4%|▎         | 6/161 [00:00<00:13, 11.23it/s]
Epoch 54:   5%|▍         | 8/161 [00:00<00:12, 11.93it/s]
Epoch 54:   6%|▌         | 10/161 [00:00<00:12, 11.75it/s]
Epoch 54:   7%|▋         | 12/161 [00:01<00:12, 11.64it/s]
Epoch 54:   9%|▊         | 14/161 [00:01<00:14, 10.20it/s]
Epoch 54:  10%|▉         | 16/161 [00:01<00:13, 10.96it/s]
Epoch 54:  11%|█         | 18/161 [00:01<00:12, 11.19it/s]
Epoch 54:  12%|█▏        | 20/161 [00:01<00:12, 10.88it/s]
Epoch 54:  14%|█▎        | 22/161 [00:02<00:16,  8.22it/s]
Epoch 54:  14%|█▍        | 23/161 [00:02<00:17,  8.09it/s]
Epoch 54:  16%|█▌        | 25/161 [00:02<00:16,  8.47it/s]
Epoch 54:  17%|█▋        | 27/161 [00:02<00:14,  9.19it/s]
Epoc

Epoch 54/100 - loss: 2.5004


Training Epochs:  54%|█████▍    | 54/100 [14:07<11:18, 14.75s/it]
Epoch 55:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 55:   1%|          | 2/161 [00:00<00:12, 12.28it/s]
Epoch 55:   2%|▏         | 4/161 [00:00<00:13, 11.70it/s]
Epoch 55:   4%|▎         | 6/161 [00:00<00:12, 11.97it/s]
Epoch 55:   5%|▍         | 8/161 [00:00<00:15,  9.58it/s]
Epoch 55:   6%|▌         | 10/161 [00:00<00:14, 10.24it/s]
Epoch 55:   7%|▋         | 12/161 [00:01<00:14, 10.53it/s]
Epoch 55:   9%|▊         | 14/161 [00:01<00:12, 11.33it/s]
Epoch 55:  10%|▉         | 16/161 [00:01<00:12, 11.72it/s]
Epoch 55:  11%|█         | 18/161 [00:01<00:11, 12.03it/s]
Epoch 55:  12%|█▏        | 20/161 [00:01<00:11, 12.51it/s]
Epoch 55:  14%|█▎        | 22/161 [00:01<00:11, 12.47it/s]
Epoch 55:  15%|█▍        | 24/161 [00:02<00:11, 12.43it/s]
Epoch 55:  16%|█▌        | 26/161 [00:02<00:10, 12.65it/s]
Epoch 55:  17%|█▋        | 28/161 [00:02<00:10, 12.69it/s]
Epoch 55:  19%|█▊        | 30/161 [00:02<00:10, 12.11it/s]
Epo

Saved best model at epoch 55 with loss 1.2003


                                                                 

Epoch 55/100 - loss: 2.4940


Training Epochs:  55%|█████▌    | 55/100 [14:22<11:03, 14.75s/it]
Epoch 56:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 56:   1%|          | 1/161 [00:00<00:30,  5.19it/s]
Epoch 56:   2%|▏         | 3/161 [00:00<00:18,  8.69it/s]
Epoch 56:   3%|▎         | 5/161 [00:00<00:16,  9.41it/s]
Epoch 56:   4%|▍         | 7/161 [00:00<00:14, 10.33it/s]
Epoch 56:   6%|▌         | 9/161 [00:00<00:13, 11.15it/s]
Epoch 56:   7%|▋         | 11/161 [00:01<00:12, 11.55it/s]
Epoch 56:   8%|▊         | 13/161 [00:01<00:13, 10.74it/s]
Epoch 56:   9%|▉         | 15/161 [00:01<00:12, 11.39it/s]
Epoch 56:  11%|█         | 17/161 [00:01<00:12, 11.91it/s]
Epoch 56:  12%|█▏        | 19/161 [00:01<00:11, 11.95it/s]
Epoch 56:  13%|█▎        | 21/161 [00:01<00:11, 11.73it/s]
Epoch 56:  14%|█▍        | 23/161 [00:02<00:11, 11.64it/s]
Epoch 56:  16%|█▌        | 25/161 [00:02<00:11, 11.39it/s]
Epoch 56:  17%|█▋        | 27/161 [00:02<00:11, 11.55it/s]
Epoch 56:  18%|█▊        | 29/161 [00:02<00:11, 11.49it/s]
Epoc

Saved best model at epoch 56 with loss 1.1952


                                                                 

Epoch 56/100 - loss: 2.4891


Training Epochs:  56%|█████▌    | 56/100 [14:37<10:56, 14.92s/it]
Epoch 57:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 57:   1%|          | 1/161 [00:00<00:36,  4.39it/s]
Epoch 57:   1%|          | 2/161 [00:00<00:26,  6.02it/s]
Epoch 57:   2%|▏         | 3/161 [00:00<00:22,  7.06it/s]
Epoch 57:   3%|▎         | 5/161 [00:00<00:16,  9.28it/s]
Epoch 57:   4%|▍         | 7/161 [00:00<00:15, 10.00it/s]
Epoch 57:   6%|▌         | 9/161 [00:00<00:14, 10.73it/s]
Epoch 57:   7%|▋         | 11/161 [00:01<00:13, 11.35it/s]
Epoch 57:   8%|▊         | 13/161 [00:01<00:15,  9.77it/s]
Epoch 57:   9%|▉         | 15/161 [00:01<00:14, 10.24it/s]
Epoch 57:  11%|█         | 17/161 [00:01<00:13, 10.93it/s]
Epoch 57:  12%|█▏        | 19/161 [00:01<00:12, 11.25it/s]
Epoch 57:  13%|█▎        | 21/161 [00:02<00:11, 11.69it/s]
Epoch 57:  14%|█▍        | 23/161 [00:02<00:11, 11.87it/s]
Epoch 57:  16%|█▌        | 25/161 [00:02<00:11, 12.21it/s]
Epoch 57:  17%|█▋        | 27/161 [00:02<00:12, 10.38it/s]
Epoch

Epoch 57/100 - loss: 2.4860


Training Epochs:  57%|█████▋    | 57/100 [14:54<11:09, 15.57s/it]
Epoch 58:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 58:   1%|          | 1/161 [00:00<00:33,  4.84it/s]
Epoch 58:   2%|▏         | 3/161 [00:00<00:18,  8.72it/s]
Epoch 58:   2%|▏         | 4/161 [00:00<00:17,  8.75it/s]
Epoch 58:   4%|▎         | 6/161 [00:00<00:16,  9.56it/s]
Epoch 58:   4%|▍         | 7/161 [00:00<00:16,  9.21it/s]
Epoch 58:   6%|▌         | 9/161 [00:00<00:15,  9.78it/s]
Epoch 58:   7%|▋         | 11/161 [00:01<00:14, 10.41it/s]
Epoch 58:   8%|▊         | 13/161 [00:01<00:14, 10.57it/s]
Epoch 58:   9%|▉         | 15/161 [00:01<00:12, 11.24it/s]
Epoch 58:  11%|█         | 17/161 [00:01<00:12, 11.37it/s]
Epoch 58:  12%|█▏        | 19/161 [00:01<00:12, 11.73it/s]
Epoch 58:  13%|█▎        | 21/161 [00:01<00:11, 11.86it/s]
Epoch 58:  14%|█▍        | 23/161 [00:02<00:11, 12.06it/s]
Epoch 58:  16%|█▌        | 25/161 [00:02<00:11, 11.81it/s]
Epoch 58:  17%|█▋        | 27/161 [00:02<00:11, 11.69it/s]
Epoch

Saved best model at epoch 58 with loss 1.1920


                                                                 

Epoch 58/100 - loss: 2.4829


Training Epochs:  58%|█████▊    | 58/100 [15:09<10:45, 15.36s/it]
Epoch 59:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 59:   1%|          | 2/161 [00:00<00:12, 12.85it/s]
Epoch 59:   2%|▏         | 4/161 [00:00<00:11, 13.34it/s]
Epoch 59:   4%|▎         | 6/161 [00:00<00:11, 13.33it/s]
Epoch 59:   5%|▍         | 8/161 [00:00<00:11, 13.34it/s]
Epoch 59:   6%|▌         | 10/161 [00:00<00:11, 12.64it/s]
Epoch 59:   7%|▋         | 12/161 [00:00<00:11, 12.97it/s]
Epoch 59:   9%|▊         | 14/161 [00:01<00:11, 13.12it/s]
Epoch 59:  10%|▉         | 16/161 [00:01<00:13, 11.05it/s]
Epoch 59:  11%|█         | 18/161 [00:01<00:14,  9.88it/s]
Epoch 59:  12%|█▏        | 20/161 [00:01<00:14,  9.95it/s]
Epoch 59:  14%|█▎        | 22/161 [00:01<00:13, 10.54it/s]
Epoch 59:  15%|█▍        | 24/161 [00:02<00:12, 10.73it/s]
Epoch 59:  16%|█▌        | 26/161 [00:02<00:12, 11.18it/s]
Epoch 59:  17%|█▋        | 28/161 [00:02<00:11, 11.55it/s]
Epoch 59:  19%|█▊        | 30/161 [00:02<00:11, 11.12it/s]
Epo

Epoch 59/100 - loss: 2.4746


Training Epochs:  59%|█████▉    | 59/100 [15:24<10:29, 15.36s/it]
Epoch 60:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 60:   1%|          | 1/161 [00:00<00:33,  4.84it/s]
Epoch 60:   2%|▏         | 3/161 [00:00<00:18,  8.48it/s]
Epoch 60:   3%|▎         | 5/161 [00:00<00:16,  9.74it/s]
Epoch 60:   4%|▍         | 7/161 [00:00<00:15,  9.90it/s]
Epoch 60:   6%|▌         | 9/161 [00:00<00:14, 10.31it/s]
Epoch 60:   7%|▋         | 11/161 [00:01<00:14, 10.55it/s]
Epoch 60:   8%|▊         | 13/161 [00:01<00:13, 10.74it/s]
Epoch 60:   9%|▉         | 15/161 [00:01<00:12, 11.27it/s]
Epoch 60:  11%|█         | 17/161 [00:01<00:12, 11.24it/s]
Epoch 60:  12%|█▏        | 19/161 [00:01<00:12, 11.44it/s]
Epoch 60:  13%|█▎        | 21/161 [00:01<00:12, 11.33it/s]
Epoch 60:  14%|█▍        | 23/161 [00:02<00:11, 11.79it/s]
Epoch 60:  16%|█▌        | 25/161 [00:02<00:11, 12.03it/s]
Epoch 60:  17%|█▋        | 27/161 [00:02<00:11, 12.13it/s]
Epoch 60:  18%|█▊        | 29/161 [00:02<00:11, 11.69it/s]
Epoc

Saved best model at epoch 60 with loss 1.1864


                                                                 

Epoch 60/100 - loss: 2.4746


Training Epochs:  60%|██████    | 60/100 [15:39<10:09, 15.25s/it]
Epoch 61:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 61:   1%|          | 1/161 [00:00<00:18,  8.43it/s]
Epoch 61:   2%|▏         | 3/161 [00:00<00:14, 10.71it/s]
Epoch 61:   3%|▎         | 5/161 [00:00<00:15, 10.16it/s]
Epoch 61:   4%|▍         | 7/161 [00:00<00:14, 10.83it/s]
Epoch 61:   6%|▌         | 9/161 [00:00<00:13, 11.36it/s]
Epoch 61:   7%|▋         | 11/161 [00:01<00:13, 11.15it/s]
Epoch 61:   8%|▊         | 13/161 [00:01<00:15,  9.76it/s]
Epoch 61:   9%|▉         | 15/161 [00:01<00:16,  8.69it/s]
Epoch 61:  11%|█         | 17/161 [00:01<00:15,  9.57it/s]
Epoch 61:  12%|█▏        | 19/161 [00:01<00:14, 10.05it/s]
Epoch 61:  13%|█▎        | 21/161 [00:02<00:13, 10.53it/s]
Epoch 61:  14%|█▍        | 23/161 [00:02<00:12, 11.19it/s]
Epoch 61:  16%|█▌        | 25/161 [00:02<00:11, 11.61it/s]
Epoch 61:  17%|█▋        | 27/161 [00:02<00:11, 12.01it/s]
Epoch 61:  18%|█▊        | 29/161 [00:02<00:11, 11.57it/s]
Epoc

Epoch 61/100 - loss: 2.4671


Training Epochs:  61%|██████    | 61/100 [15:54<09:47, 15.06s/it]
Epoch 62:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 62:   1%|          | 1/161 [00:00<00:35,  4.48it/s]
Epoch 62:   2%|▏         | 3/161 [00:00<00:18,  8.53it/s]
Epoch 62:   3%|▎         | 5/161 [00:00<00:15,  9.91it/s]
Epoch 62:   4%|▍         | 7/161 [00:00<00:14, 10.68it/s]
Epoch 62:   6%|▌         | 9/161 [00:00<00:13, 11.19it/s]
Epoch 62:   7%|▋         | 11/161 [00:01<00:13, 10.91it/s]
Epoch 62:   8%|▊         | 13/161 [00:01<00:14, 10.50it/s]
Epoch 62:   9%|▉         | 15/161 [00:01<00:13, 10.68it/s]
Epoch 62:  11%|█         | 17/161 [00:01<00:13, 10.66it/s]
Epoch 62:  12%|█▏        | 19/161 [00:01<00:12, 10.96it/s]
Epoch 62:  13%|█▎        | 21/161 [00:02<00:12, 10.88it/s]
Epoch 62:  14%|█▍        | 23/161 [00:02<00:12, 10.95it/s]
Epoch 62:  16%|█▌        | 25/161 [00:02<00:12, 11.08it/s]
Epoch 62:  17%|█▋        | 27/161 [00:02<00:11, 11.60it/s]
Epoch 62:  18%|█▊        | 29/161 [00:02<00:11, 11.97it/s]
Epoc

Saved best model at epoch 62 with loss 1.1792


                                                                 

Epoch 62/100 - loss: 2.4651


Training Epochs:  62%|██████▏   | 62/100 [16:10<09:37, 15.19s/it]
Epoch 63:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 63:   1%|          | 2/161 [00:00<00:12, 12.79it/s]
Epoch 63:   2%|▏         | 4/161 [00:00<00:11, 13.17it/s]
Epoch 63:   4%|▎         | 6/161 [00:00<00:11, 13.40it/s]
Epoch 63:   5%|▍         | 8/161 [00:00<00:12, 12.66it/s]
Epoch 63:   6%|▌         | 10/161 [00:00<00:11, 13.07it/s]
Epoch 63:   7%|▋         | 12/161 [00:00<00:11, 13.28it/s]
Epoch 63:   9%|▊         | 14/161 [00:01<00:10, 13.42it/s]
Epoch 63:  10%|▉         | 16/161 [00:01<00:12, 11.82it/s]
Epoch 63:  11%|█         | 18/161 [00:01<00:12, 11.64it/s]
Epoch 63:  12%|█▏        | 20/161 [00:01<00:11, 11.84it/s]
Epoch 63:  14%|█▎        | 22/161 [00:01<00:11, 11.62it/s]
Epoch 63:  15%|█▍        | 24/161 [00:01<00:11, 11.81it/s]
Epoch 63:  16%|█▌        | 26/161 [00:02<00:10, 12.39it/s]
Epoch 63:  17%|█▋        | 28/161 [00:02<00:10, 12.64it/s]
Epoch 63:  19%|█▊        | 30/161 [00:02<00:10, 12.67it/s]
Epo

Saved best model at epoch 63 with loss 1.1724


                                                                 

Epoch 63/100 - loss: 2.4611


Training Epochs:  63%|██████▎   | 63/100 [16:23<09:03, 14.70s/it]
Epoch 64:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 64:   1%|          | 1/161 [00:00<00:29,  5.47it/s]
Epoch 64:   2%|▏         | 3/161 [00:00<00:16,  9.87it/s]
Epoch 64:   3%|▎         | 5/161 [00:00<00:13, 11.43it/s]
Epoch 64:   4%|▍         | 7/161 [00:00<00:12, 12.09it/s]
Epoch 64:   6%|▌         | 9/161 [00:00<00:12, 12.62it/s]
Epoch 64:   7%|▋         | 11/161 [00:00<00:11, 13.13it/s]
Epoch 64:   8%|▊         | 13/161 [00:01<00:11, 13.35it/s]
Epoch 64:   9%|▉         | 15/161 [00:01<00:11, 12.71it/s]
Epoch 64:  11%|█         | 17/161 [00:01<00:12, 11.90it/s]
Epoch 64:  12%|█▏        | 19/161 [00:01<00:11, 12.32it/s]
Epoch 64:  13%|█▎        | 21/161 [00:01<00:11, 12.49it/s]
Epoch 64:  14%|█▍        | 23/161 [00:01<00:10, 12.89it/s]
Epoch 64:  16%|█▌        | 25/161 [00:02<00:10, 12.58it/s]
Epoch 64:  17%|█▋        | 27/161 [00:02<00:10, 12.62it/s]
Epoch 64:  18%|█▊        | 29/161 [00:02<00:10, 12.93it/s]
Epoc

Epoch 64/100 - loss: 2.4632


Training Epochs:  64%|██████▍   | 64/100 [16:36<08:32, 14.25s/it]
Epoch 65:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 65:   1%|          | 2/161 [00:00<00:12, 13.12it/s]
Epoch 65:   2%|▏         | 4/161 [00:00<00:11, 13.20it/s]
Epoch 65:   4%|▎         | 6/161 [00:00<00:11, 13.37it/s]
Epoch 65:   5%|▍         | 8/161 [00:00<00:11, 13.37it/s]
Epoch 65:   6%|▌         | 10/161 [00:00<00:11, 13.34it/s]
Epoch 65:   7%|▋         | 12/161 [00:00<00:11, 13.43it/s]
Epoch 65:   9%|▊         | 14/161 [00:01<00:10, 13.38it/s]
Epoch 65:  10%|▉         | 16/161 [00:01<00:10, 13.49it/s]
Epoch 65:  11%|█         | 18/161 [00:01<00:10, 13.51it/s]
Epoch 65:  12%|█▏        | 20/161 [00:01<00:10, 13.23it/s]
Epoch 65:  14%|█▎        | 22/161 [00:01<00:10, 12.80it/s]
Epoch 65:  15%|█▍        | 24/161 [00:01<00:10, 12.50it/s]
Epoch 65:  16%|█▌        | 26/161 [00:01<00:10, 12.46it/s]
Epoch 65:  17%|█▋        | 28/161 [00:02<00:10, 12.61it/s]
Epoch 65:  19%|█▊        | 30/161 [00:02<00:10, 12.63it/s]
Epo

Epoch 65/100 - loss: 2.4537


Training Epochs:  65%|██████▌   | 65/100 [16:49<08:06, 13.90s/it]
Epoch 66:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 66:   1%|          | 2/161 [00:00<00:11, 13.67it/s]
Epoch 66:   2%|▏         | 4/161 [00:00<00:11, 13.46it/s]
Epoch 66:   4%|▎         | 6/161 [00:00<00:11, 13.65it/s]
Epoch 66:   5%|▍         | 8/161 [00:00<00:11, 13.73it/s]
Epoch 66:   6%|▌         | 10/161 [00:00<00:11, 13.65it/s]
Epoch 66:   7%|▋         | 12/161 [00:00<00:10, 13.61it/s]
Epoch 66:   9%|▊         | 14/161 [00:01<00:10, 13.69it/s]
Epoch 66:  10%|▉         | 16/161 [00:01<00:10, 13.84it/s]
Epoch 66:  11%|█         | 18/161 [00:01<00:10, 13.87it/s]
Epoch 66:  12%|█▏        | 20/161 [00:01<00:10, 13.33it/s]
Epoch 66:  14%|█▎        | 22/161 [00:01<00:10, 13.25it/s]
Epoch 66:  15%|█▍        | 24/161 [00:01<00:10, 13.42it/s]
Epoch 66:  16%|█▌        | 26/161 [00:01<00:10, 13.41it/s]
Epoch 66:  17%|█▋        | 28/161 [00:02<00:09, 13.58it/s]
Epoch 66:  19%|█▊        | 30/161 [00:02<00:09, 13.49it/s]
Epo

Epoch 66/100 - loss: 2.4539


Training Epochs:  66%|██████▌   | 66/100 [17:02<07:42, 13.61s/it]
Epoch 67:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 67:   1%|          | 2/161 [00:00<00:12, 12.77it/s]
Epoch 67:   2%|▏         | 4/161 [00:00<00:11, 13.41it/s]
Epoch 67:   4%|▎         | 6/161 [00:00<00:14, 10.39it/s]
Epoch 67:   5%|▍         | 8/161 [00:00<00:13, 11.54it/s]
Epoch 67:   6%|▌         | 10/161 [00:00<00:12, 12.10it/s]
Epoch 67:   7%|▋         | 12/161 [00:00<00:11, 12.56it/s]
Epoch 67:   9%|▊         | 14/161 [00:01<00:11, 12.87it/s]
Epoch 67:  10%|▉         | 16/161 [00:01<00:11, 13.17it/s]
Epoch 67:  11%|█         | 18/161 [00:01<00:11, 12.50it/s]
Epoch 67:  12%|█▏        | 20/161 [00:01<00:10, 12.89it/s]
Epoch 67:  14%|█▎        | 22/161 [00:01<00:10, 13.21it/s]
Epoch 67:  15%|█▍        | 24/161 [00:01<00:10, 13.30it/s]
Epoch 67:  16%|█▌        | 26/161 [00:02<00:10, 13.40it/s]
Epoch 67:  17%|█▋        | 28/161 [00:02<00:09, 13.38it/s]
Epoch 67:  19%|█▊        | 30/161 [00:02<00:09, 13.54it/s]
Epo

Saved best model at epoch 67 with loss 1.1696


                                                                 

Epoch 67/100 - loss: 2.4478


Training Epochs:  67%|██████▋   | 67/100 [17:15<07:25, 13.49s/it]
Epoch 68:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 68:   1%|          | 2/161 [00:00<00:11, 13.27it/s]
Epoch 68:   2%|▏         | 4/161 [00:00<00:11, 13.54it/s]
Epoch 68:   4%|▎         | 6/161 [00:00<00:11, 13.54it/s]
Epoch 68:   5%|▍         | 8/161 [00:00<00:11, 13.67it/s]
Epoch 68:   6%|▌         | 10/161 [00:00<00:11, 13.68it/s]
Epoch 68:   7%|▋         | 12/161 [00:00<00:11, 13.49it/s]
Epoch 68:   9%|▊         | 14/161 [00:01<00:10, 13.46it/s]
Epoch 68:  10%|▉         | 16/161 [00:01<00:12, 11.46it/s]
Epoch 68:  11%|█         | 18/161 [00:01<00:11, 12.00it/s]
Epoch 68:  12%|█▏        | 20/161 [00:01<00:11, 12.11it/s]
Epoch 68:  14%|█▎        | 22/161 [00:01<00:11, 11.63it/s]
Epoch 68:  15%|█▍        | 24/161 [00:01<00:11, 12.16it/s]
Epoch 68:  16%|█▌        | 26/161 [00:02<00:10, 12.52it/s]
Epoch 68:  17%|█▋        | 28/161 [00:02<00:10, 12.79it/s]
Epoch 68:  19%|█▊        | 30/161 [00:02<00:10, 13.00it/s]
Epo

Saved best model at epoch 68 with loss 1.1693


                                                                 

Epoch 68/100 - loss: 2.4460


Training Epochs:  68%|██████▊   | 68/100 [17:29<07:08, 13.40s/it]
Epoch 69:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 69:   1%|          | 2/161 [00:00<00:11, 13.26it/s]
Epoch 69:   2%|▏         | 4/161 [00:00<00:11, 13.59it/s]
Epoch 69:   4%|▎         | 6/161 [00:00<00:11, 12.95it/s]
Epoch 69:   5%|▍         | 8/161 [00:00<00:11, 12.94it/s]
Epoch 69:   6%|▌         | 10/161 [00:00<00:12, 12.45it/s]
Epoch 69:   7%|▋         | 12/161 [00:00<00:11, 12.64it/s]
Epoch 69:   9%|▊         | 14/161 [00:01<00:11, 12.60it/s]
Epoch 69:  10%|▉         | 16/161 [00:01<00:12, 11.48it/s]
Epoch 69:  11%|█         | 18/161 [00:01<00:11, 11.98it/s]
Epoch 69:  12%|█▏        | 20/161 [00:01<00:11, 12.43it/s]
Epoch 69:  14%|█▎        | 22/161 [00:01<00:10, 12.76it/s]
Epoch 69:  15%|█▍        | 24/161 [00:01<00:10, 13.02it/s]
Epoch 69:  16%|█▌        | 26/161 [00:02<00:10, 13.18it/s]
Epoch 69:  17%|█▋        | 28/161 [00:02<00:09, 13.37it/s]
Epoch 69:  19%|█▊        | 30/161 [00:02<00:09, 13.44it/s]
Epo

Saved best model at epoch 69 with loss 1.1609


                                                                 

Epoch 69/100 - loss: 2.4384


Training Epochs:  69%|██████▉   | 69/100 [17:42<06:53, 13.34s/it]
Epoch 70:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 70:   1%|          | 2/161 [00:00<00:12, 12.87it/s]
Epoch 70:   2%|▏         | 4/161 [00:00<00:11, 13.28it/s]
Epoch 70:   4%|▎         | 6/161 [00:00<00:12, 12.23it/s]
Epoch 70:   5%|▍         | 8/161 [00:00<00:11, 12.85it/s]
Epoch 70:   6%|▌         | 10/161 [00:00<00:11, 13.28it/s]
Epoch 70:   7%|▋         | 12/161 [00:00<00:11, 13.34it/s]
Epoch 70:   9%|▊         | 14/161 [00:01<00:10, 13.54it/s]
Epoch 70:  10%|▉         | 16/161 [00:01<00:12, 11.84it/s]
Epoch 70:  11%|█         | 18/161 [00:01<00:11, 12.19it/s]
Epoch 70:  12%|█▏        | 20/161 [00:01<00:11, 12.52it/s]
Epoch 70:  14%|█▎        | 22/161 [00:01<00:10, 12.93it/s]
Epoch 70:  15%|█▍        | 24/161 [00:01<00:10, 13.14it/s]
Epoch 70:  16%|█▌        | 26/161 [00:02<00:10, 13.29it/s]
Epoch 70:  17%|█▋        | 28/161 [00:02<00:09, 13.52it/s]
Epoch 70:  19%|█▊        | 30/161 [00:02<00:09, 13.56it/s]
Epo

Epoch 70/100 - loss: 2.4418


Training Epochs:  70%|███████   | 70/100 [17:55<06:37, 13.26s/it]
Epoch 71:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 71:   1%|          | 2/161 [00:00<00:11, 13.33it/s]
Epoch 71:   2%|▏         | 4/161 [00:00<00:12, 12.15it/s]
Epoch 71:   4%|▎         | 6/161 [00:00<00:12, 12.87it/s]
Epoch 71:   5%|▍         | 8/161 [00:00<00:11, 13.16it/s]
Epoch 71:   6%|▌         | 10/161 [00:00<00:11, 13.37it/s]
Epoch 71:   7%|▋         | 12/161 [00:00<00:11, 13.53it/s]
Epoch 71:   9%|▊         | 14/161 [00:01<00:10, 13.63it/s]
Epoch 71:  10%|▉         | 16/161 [00:01<00:10, 13.75it/s]
Epoch 71:  11%|█         | 18/161 [00:01<00:10, 13.57it/s]
Epoch 71:  12%|█▏        | 20/161 [00:01<00:10, 13.51it/s]
Epoch 71:  14%|█▎        | 22/161 [00:01<00:10, 13.18it/s]
Epoch 71:  15%|█▍        | 24/161 [00:01<00:10, 13.28it/s]
Epoch 71:  16%|█▌        | 26/161 [00:01<00:10, 13.48it/s]
Epoch 71:  17%|█▋        | 28/161 [00:02<00:09, 13.47it/s]
Epoch 71:  19%|█▊        | 30/161 [00:02<00:09, 13.39it/s]
Epo

Saved best model at epoch 71 with loss 1.1502


                                                                 

Epoch 71/100 - loss: 2.4440


Training Epochs:  71%|███████   | 71/100 [18:08<06:24, 13.27s/it]
Epoch 72:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 72:   1%|          | 1/161 [00:00<00:28,  5.71it/s]
Epoch 72:   2%|▏         | 3/161 [00:00<00:15,  9.90it/s]
Epoch 72:   3%|▎         | 5/161 [00:00<00:13, 11.55it/s]
Epoch 72:   4%|▍         | 7/161 [00:00<00:12, 12.24it/s]
Epoch 72:   6%|▌         | 9/161 [00:00<00:12, 12.59it/s]
Epoch 72:   7%|▋         | 11/161 [00:00<00:11, 12.71it/s]
Epoch 72:   8%|▊         | 13/161 [00:01<00:11, 12.75it/s]
Epoch 72:   9%|▉         | 15/161 [00:01<00:12, 11.86it/s]
Epoch 72:  11%|█         | 17/161 [00:01<00:12, 11.58it/s]
Epoch 72:  12%|█▏        | 19/161 [00:01<00:11, 11.90it/s]
Epoch 72:  13%|█▎        | 21/161 [00:01<00:11, 12.42it/s]
Epoch 72:  14%|█▍        | 23/161 [00:01<00:11, 12.41it/s]
Epoch 72:  16%|█▌        | 25/161 [00:02<00:11, 12.36it/s]
Epoch 72:  17%|█▋        | 27/161 [00:02<00:10, 12.74it/s]
Epoch 72:  18%|█▊        | 29/161 [00:02<00:10, 13.01it/s]
Epoc

Epoch 72/100 - loss: 2.4269


Training Epochs:  72%|███████▏  | 72/100 [18:21<06:10, 13.24s/it]
Epoch 73:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 73:   1%|          | 2/161 [00:00<00:11, 13.62it/s]
Epoch 73:   2%|▏         | 4/161 [00:00<00:11, 13.61it/s]
Epoch 73:   4%|▎         | 6/161 [00:00<00:11, 13.64it/s]
Epoch 73:   5%|▍         | 8/161 [00:00<00:11, 13.42it/s]
Epoch 73:   6%|▌         | 10/161 [00:00<00:11, 13.44it/s]
Epoch 73:   7%|▋         | 12/161 [00:00<00:11, 13.48it/s]
Epoch 73:   9%|▊         | 14/161 [00:01<00:10, 13.57it/s]
Epoch 73:  10%|▉         | 16/161 [00:01<00:10, 13.67it/s]
Epoch 73:  11%|█         | 18/161 [00:01<00:10, 13.72it/s]
Epoch 73:  12%|█▏        | 20/161 [00:01<00:10, 13.61it/s]
Epoch 73:  14%|█▎        | 22/161 [00:01<00:10, 13.67it/s]
Epoch 73:  15%|█▍        | 24/161 [00:01<00:10, 12.92it/s]
Epoch 73:  16%|█▌        | 26/161 [00:01<00:10, 12.79it/s]
Epoch 73:  17%|█▋        | 28/161 [00:02<00:10, 13.06it/s]
Epoch 73:  19%|█▊        | 30/161 [00:02<00:09, 13.33it/s]
Epo

Epoch 73/100 - loss: 2.4292


Training Epochs:  73%|███████▎  | 73/100 [18:34<05:55, 13.16s/it]
Epoch 74:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 74:   1%|          | 2/161 [00:00<00:11, 13.42it/s]
Epoch 74:   2%|▏         | 4/161 [00:00<00:11, 13.53it/s]
Epoch 74:   4%|▎         | 6/161 [00:00<00:11, 13.70it/s]
Epoch 74:   5%|▍         | 8/161 [00:00<00:10, 13.93it/s]
Epoch 74:   6%|▌         | 10/161 [00:00<00:10, 13.74it/s]
Epoch 74:   7%|▋         | 12/161 [00:00<00:10, 13.77it/s]
Epoch 74:   9%|▊         | 14/161 [00:01<00:10, 13.82it/s]
Epoch 74:  10%|▉         | 16/161 [00:01<00:10, 13.83it/s]
Epoch 74:  11%|█         | 18/161 [00:01<00:10, 13.70it/s]
Epoch 74:  12%|█▏        | 20/161 [00:01<00:10, 13.77it/s]
Epoch 74:  14%|█▎        | 22/161 [00:01<00:10, 13.10it/s]
Epoch 74:  15%|█▍        | 24/161 [00:01<00:10, 13.18it/s]
Epoch 74:  16%|█▌        | 26/161 [00:01<00:10, 13.40it/s]
Epoch 74:  17%|█▋        | 28/161 [00:02<00:09, 13.57it/s]
Epoch 74:  19%|█▊        | 30/161 [00:02<00:09, 13.66it/s]
Epo

Epoch 74/100 - loss: 2.4252


Training Epochs:  74%|███████▍  | 74/100 [18:47<05:41, 13.14s/it]
Epoch 75:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 75:   1%|          | 1/161 [00:00<00:28,  5.60it/s]
Epoch 75:   2%|▏         | 3/161 [00:00<00:15,  9.90it/s]
Epoch 75:   3%|▎         | 5/161 [00:00<00:13, 11.59it/s]
Epoch 75:   4%|▍         | 7/161 [00:00<00:12, 12.23it/s]
Epoch 75:   6%|▌         | 9/161 [00:00<00:11, 12.80it/s]
Epoch 75:   7%|▋         | 11/161 [00:00<00:11, 13.17it/s]
Epoch 75:   8%|▊         | 13/161 [00:01<00:11, 13.30it/s]
Epoch 75:   9%|▉         | 15/161 [00:01<00:10, 13.37it/s]
Epoch 75:  11%|█         | 17/161 [00:01<00:10, 13.25it/s]
Epoch 75:  12%|█▏        | 19/161 [00:01<00:11, 12.87it/s]
Epoch 75:  13%|█▎        | 21/161 [00:01<00:10, 13.08it/s]
Epoch 75:  14%|█▍        | 23/161 [00:01<00:10, 13.32it/s]
Epoch 75:  16%|█▌        | 25/161 [00:01<00:10, 13.23it/s]
Epoch 75:  17%|█▋        | 27/161 [00:02<00:09, 13.41it/s]
Epoch 75:  18%|█▊        | 29/161 [00:02<00:09, 13.25it/s]
Epoc

Saved best model at epoch 75 with loss 1.1474


                                                                 

Epoch 75/100 - loss: 2.4277


Training Epochs:  75%|███████▌  | 75/100 [19:01<05:29, 13.17s/it]
Epoch 76:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 76:   1%|          | 2/161 [00:00<00:12, 13.09it/s]
Epoch 76:   2%|▏         | 4/161 [00:00<00:11, 13.45it/s]
Epoch 76:   4%|▎         | 6/161 [00:00<00:11, 13.56it/s]
Epoch 76:   5%|▍         | 8/161 [00:00<00:11, 13.47it/s]
Epoch 76:   6%|▌         | 10/161 [00:00<00:11, 13.57it/s]
Epoch 76:   7%|▋         | 12/161 [00:00<00:10, 13.68it/s]
Epoch 76:   9%|▊         | 14/161 [00:01<00:11, 12.70it/s]
Epoch 76:  10%|▉         | 16/161 [00:01<00:12, 11.48it/s]
Epoch 76:  11%|█         | 18/161 [00:01<00:11, 11.99it/s]
Epoch 76:  12%|█▏        | 20/161 [00:01<00:11, 12.44it/s]
Epoch 76:  14%|█▎        | 22/161 [00:01<00:10, 12.68it/s]
Epoch 76:  15%|█▍        | 24/161 [00:01<00:10, 13.07it/s]
Epoch 76:  16%|█▌        | 26/161 [00:02<00:10, 13.28it/s]
Epoch 76:  17%|█▋        | 28/161 [00:02<00:10, 13.27it/s]
Epoch 76:  19%|█▊        | 30/161 [00:02<00:09, 13.33it/s]
Epo

Saved best model at epoch 76 with loss 1.1433


                                                                 

Epoch 76/100 - loss: 2.4227


Training Epochs:  76%|███████▌  | 76/100 [19:14<05:15, 13.17s/it]
Epoch 77:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 77:   1%|          | 2/161 [00:00<00:12, 12.41it/s]
Epoch 77:   2%|▏         | 4/161 [00:00<00:11, 13.13it/s]
Epoch 77:   4%|▎         | 6/161 [00:00<00:11, 13.39it/s]
Epoch 77:   5%|▍         | 8/161 [00:00<00:11, 13.46it/s]
Epoch 77:   6%|▌         | 10/161 [00:00<00:11, 12.90it/s]
Epoch 77:   7%|▋         | 12/161 [00:00<00:11, 12.83it/s]
Epoch 77:   9%|▊         | 14/161 [00:01<00:11, 13.25it/s]
Epoch 77:  10%|▉         | 16/161 [00:01<00:12, 11.70it/s]
Epoch 77:  11%|█         | 18/161 [00:01<00:11, 12.20it/s]
Epoch 77:  12%|█▏        | 20/161 [00:01<00:11, 12.55it/s]
Epoch 77:  14%|█▎        | 22/161 [00:01<00:10, 12.78it/s]
Epoch 77:  15%|█▍        | 24/161 [00:01<00:10, 12.91it/s]
Epoch 77:  16%|█▌        | 26/161 [00:02<00:12, 11.04it/s]
Epoch 77:  17%|█▋        | 28/161 [00:02<00:11, 11.70it/s]
Epoch 77:  19%|█▊        | 30/161 [00:02<00:10, 12.24it/s]
Epo

Epoch 77/100 - loss: 2.4251


Training Epochs:  77%|███████▋  | 77/100 [19:27<05:03, 13.19s/it]
Epoch 78:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 78:   1%|          | 2/161 [00:00<00:11, 13.35it/s]
Epoch 78:   2%|▏         | 4/161 [00:00<00:11, 13.39it/s]
Epoch 78:   4%|▎         | 6/161 [00:00<00:12, 12.46it/s]
Epoch 78:   5%|▍         | 8/161 [00:00<00:12, 12.70it/s]
Epoch 78:   6%|▌         | 10/161 [00:00<00:11, 12.90it/s]
Epoch 78:   7%|▋         | 12/161 [00:00<00:11, 13.19it/s]
Epoch 78:   9%|▊         | 14/161 [00:01<00:11, 13.31it/s]
Epoch 78:  10%|▉         | 16/161 [00:01<00:10, 13.53it/s]
Epoch 78:  11%|█         | 18/161 [00:01<00:10, 13.52it/s]
Epoch 78:  12%|█▏        | 20/161 [00:01<00:10, 13.46it/s]
Epoch 78:  14%|█▎        | 22/161 [00:01<00:10, 13.65it/s]
Epoch 78:  15%|█▍        | 24/161 [00:01<00:09, 13.72it/s]
Epoch 78:  16%|█▌        | 26/161 [00:01<00:09, 13.70it/s]
Epoch 78:  17%|█▋        | 28/161 [00:02<00:09, 13.71it/s]
Epoch 78:  19%|█▊        | 30/161 [00:02<00:09, 13.74it/s]
Epo

Saved best model at epoch 78 with loss 1.1420


                                                                 

Epoch 78/100 - loss: 2.4183


Training Epochs:  78%|███████▊  | 78/100 [19:40<04:48, 13.11s/it]
Epoch 79:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 79:   1%|          | 1/161 [00:00<00:36,  4.37it/s]
Epoch 79:   1%|          | 2/161 [00:00<00:27,  5.85it/s]
Epoch 79:   2%|▏         | 4/161 [00:00<00:17,  9.08it/s]
Epoch 79:   4%|▎         | 6/161 [00:00<00:14, 10.77it/s]
Epoch 79:   5%|▍         | 8/161 [00:00<00:13, 11.70it/s]
Epoch 79:   6%|▌         | 10/161 [00:00<00:12, 12.31it/s]
Epoch 79:   7%|▋         | 12/161 [00:01<00:12, 12.17it/s]
Epoch 79:   9%|▊         | 14/161 [00:01<00:12, 11.58it/s]
Epoch 79:  10%|▉         | 16/161 [00:01<00:12, 12.03it/s]
Epoch 79:  11%|█         | 18/161 [00:01<00:11, 12.54it/s]
Epoch 79:  12%|█▏        | 20/161 [00:01<00:10, 12.95it/s]
Epoch 79:  14%|█▎        | 22/161 [00:01<00:10, 13.09it/s]
Epoch 79:  15%|█▍        | 24/161 [00:02<00:10, 13.30it/s]
Epoch 79:  16%|█▌        | 26/161 [00:02<00:10, 13.31it/s]
Epoch 79:  17%|█▋        | 28/161 [00:02<00:10, 13.25it/s]
Epoc

Epoch 79/100 - loss: 2.4122


Training Epochs:  79%|███████▉  | 79/100 [19:53<04:36, 13.15s/it]
Epoch 80:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 80:   1%|          | 2/161 [00:00<00:12, 12.96it/s]
Epoch 80:   2%|▏         | 4/161 [00:00<00:11, 13.27it/s]
Epoch 80:   4%|▎         | 6/161 [00:00<00:11, 13.40it/s]
Epoch 80:   5%|▍         | 8/161 [00:00<00:11, 13.62it/s]
Epoch 80:   6%|▌         | 10/161 [00:00<00:11, 13.71it/s]
Epoch 80:   7%|▋         | 12/161 [00:00<00:11, 13.30it/s]
Epoch 80:   9%|▊         | 14/161 [00:01<00:11, 13.18it/s]
Epoch 80:  10%|▉         | 16/161 [00:01<00:10, 13.39it/s]
Epoch 80:  11%|█         | 18/161 [00:01<00:10, 13.62it/s]
Epoch 80:  12%|█▏        | 20/161 [00:01<00:12, 11.32it/s]
Epoch 80:  14%|█▎        | 22/161 [00:01<00:11, 12.02it/s]
Epoch 80:  15%|█▍        | 24/161 [00:01<00:10, 12.50it/s]
Epoch 80:  16%|█▌        | 26/161 [00:02<00:10, 12.79it/s]
Epoch 80:  17%|█▋        | 28/161 [00:02<00:10, 12.29it/s]
Epoch 80:  19%|█▊        | 30/161 [00:02<00:10, 12.70it/s]
Epo

Epoch 80/100 - loss: 2.4154


Training Epochs:  80%|████████  | 80/100 [20:06<04:22, 13.14s/it]
Epoch 81:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 81:   1%|          | 2/161 [00:00<00:11, 13.53it/s]
Epoch 81:   2%|▏         | 4/161 [00:00<00:11, 13.60it/s]
Epoch 81:   4%|▎         | 6/161 [00:00<00:11, 13.84it/s]
Epoch 81:   5%|▍         | 8/161 [00:00<00:11, 13.64it/s]
Epoch 81:   6%|▌         | 10/161 [00:00<00:11, 13.67it/s]
Epoch 81:   7%|▋         | 12/161 [00:00<00:11, 13.52it/s]
Epoch 81:   9%|▊         | 14/161 [00:01<00:10, 13.67it/s]
Epoch 81:  10%|▉         | 16/161 [00:01<00:10, 13.55it/s]
Epoch 81:  11%|█         | 18/161 [00:01<00:10, 13.65it/s]
Epoch 81:  12%|█▏        | 20/161 [00:01<00:10, 13.76it/s]
Epoch 81:  14%|█▎        | 22/161 [00:01<00:10, 13.83it/s]
Epoch 81:  15%|█▍        | 24/161 [00:01<00:09, 13.75it/s]
Epoch 81:  16%|█▌        | 26/161 [00:01<00:10, 13.26it/s]
Epoch 81:  17%|█▋        | 28/161 [00:02<00:10, 13.16it/s]
Epoch 81:  19%|█▊        | 30/161 [00:02<00:09, 13.31it/s]
Epo

Saved best model at epoch 81 with loss 1.1401


                                                                 

Epoch 81/100 - loss: 2.4140


Training Epochs:  81%|████████  | 81/100 [20:20<04:10, 13.17s/it]
Epoch 82:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 82:   1%|          | 2/161 [00:00<00:12, 12.48it/s]
Epoch 82:   2%|▏         | 4/161 [00:00<00:11, 13.15it/s]
Epoch 82:   4%|▎         | 6/161 [00:00<00:11, 13.23it/s]
Epoch 82:   5%|▍         | 8/161 [00:00<00:11, 13.51it/s]
Epoch 82:   6%|▌         | 10/161 [00:00<00:11, 13.67it/s]
Epoch 82:   7%|▋         | 12/161 [00:00<00:10, 13.71it/s]
Epoch 82:   9%|▊         | 14/161 [00:01<00:10, 13.63it/s]
Epoch 82:  10%|▉         | 16/161 [00:01<00:12, 11.91it/s]
Epoch 82:  11%|█         | 18/161 [00:01<00:11, 12.46it/s]
Epoch 82:  12%|█▏        | 20/161 [00:01<00:11, 12.24it/s]
Epoch 82:  14%|█▎        | 22/161 [00:01<00:11, 12.47it/s]
Epoch 82:  15%|█▍        | 24/161 [00:01<00:10, 12.84it/s]
Epoch 82:  16%|█▌        | 26/161 [00:02<00:10, 13.15it/s]
Epoch 82:  17%|█▋        | 28/161 [00:02<00:10, 13.18it/s]
Epoch 82:  19%|█▊        | 30/161 [00:02<00:09, 13.44it/s]
Epo

Saved best model at epoch 82 with loss 1.1342


                                                                 

Epoch 82/100 - loss: 2.4121


Training Epochs:  82%|████████▏ | 82/100 [20:33<03:56, 13.13s/it]
Epoch 83:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 83:   1%|          | 1/161 [00:00<00:29,  5.48it/s]
Epoch 83:   2%|▏         | 3/161 [00:00<00:16,  9.60it/s]
Epoch 83:   3%|▎         | 5/161 [00:00<00:13, 11.23it/s]
Epoch 83:   4%|▍         | 7/161 [00:00<00:12, 12.27it/s]
Epoch 83:   6%|▌         | 9/161 [00:00<00:11, 12.81it/s]
Epoch 83:   7%|▋         | 11/161 [00:00<00:11, 13.01it/s]
Epoch 83:   8%|▊         | 13/161 [00:01<00:11, 13.34it/s]
Epoch 83:   9%|▉         | 15/161 [00:01<00:11, 12.61it/s]
Epoch 83:  11%|█         | 17/161 [00:01<00:12, 11.21it/s]
Epoch 83:  12%|█▏        | 19/161 [00:01<00:12, 11.73it/s]
Epoch 83:  13%|█▎        | 21/161 [00:01<00:11, 12.34it/s]
Epoch 83:  14%|█▍        | 23/161 [00:01<00:10, 12.62it/s]
Epoch 83:  16%|█▌        | 25/161 [00:02<00:10, 12.87it/s]
Epoch 83:  17%|█▋        | 27/161 [00:02<00:10, 13.21it/s]
Epoch 83:  18%|█▊        | 29/161 [00:02<00:09, 13.37it/s]
Epoc

Epoch 83/100 - loss: 2.4026


Training Epochs:  83%|████████▎ | 83/100 [20:46<03:44, 13.19s/it]
Epoch 84:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 84:   1%|          | 2/161 [00:00<00:12, 12.94it/s]
Epoch 84:   2%|▏         | 4/161 [00:00<00:11, 13.34it/s]
Epoch 84:   4%|▎         | 6/161 [00:00<00:11, 13.39it/s]
Epoch 84:   5%|▍         | 8/161 [00:00<00:11, 13.48it/s]
Epoch 84:   6%|▌         | 10/161 [00:00<00:11, 13.53it/s]
Epoch 84:   7%|▋         | 12/161 [00:00<00:10, 13.64it/s]
Epoch 84:   9%|▊         | 14/161 [00:01<00:11, 12.91it/s]
Epoch 84:  10%|▉         | 16/161 [00:01<00:11, 13.09it/s]
Epoch 84:  11%|█         | 18/161 [00:01<00:10, 13.44it/s]
Epoch 84:  12%|█▏        | 20/161 [00:01<00:10, 13.49it/s]
Epoch 84:  14%|█▎        | 22/161 [00:01<00:10, 13.48it/s]
Epoch 84:  15%|█▍        | 24/161 [00:01<00:10, 13.45it/s]
Epoch 84:  16%|█▌        | 26/161 [00:01<00:09, 13.64it/s]
Epoch 84:  17%|█▋        | 28/161 [00:02<00:09, 13.47it/s]
Epoch 84:  19%|█▊        | 30/161 [00:02<00:09, 13.43it/s]
Epo

Epoch 84/100 - loss: 2.4025


Training Epochs:  84%|████████▍ | 84/100 [20:59<03:30, 13.13s/it]
Epoch 85:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 85:   1%|          | 2/161 [00:00<00:11, 13.79it/s]
Epoch 85:   2%|▏         | 4/161 [00:00<00:11, 13.85it/s]
Epoch 85:   4%|▎         | 6/161 [00:00<00:11, 13.87it/s]
Epoch 85:   5%|▍         | 8/161 [00:00<00:11, 13.82it/s]
Epoch 85:   6%|▌         | 10/161 [00:00<00:10, 13.80it/s]
Epoch 85:   7%|▋         | 12/161 [00:00<00:11, 13.47it/s]
Epoch 85:   9%|▊         | 14/161 [00:01<00:11, 13.00it/s]
Epoch 85:  10%|▉         | 16/161 [00:01<00:11, 13.13it/s]
Epoch 85:  11%|█         | 18/161 [00:01<00:10, 13.27it/s]
Epoch 85:  12%|█▏        | 20/161 [00:01<00:10, 13.29it/s]
Epoch 85:  14%|█▎        | 22/161 [00:01<00:10, 13.42it/s]
Epoch 85:  15%|█▍        | 24/161 [00:01<00:10, 13.59it/s]
Epoch 85:  16%|█▌        | 26/161 [00:01<00:09, 13.60it/s]
Epoch 85:  17%|█▋        | 28/161 [00:02<00:09, 13.38it/s]
Epoch 85:  19%|█▊        | 30/161 [00:02<00:09, 13.53it/s]
Epo

Epoch 85/100 - loss: 2.3990


Training Epochs:  85%|████████▌ | 85/100 [21:13<03:18, 13.26s/it]
Epoch 86:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 86:   1%|          | 1/161 [00:00<00:22,  7.01it/s]
Epoch 86:   2%|▏         | 3/161 [00:00<00:16,  9.50it/s]
Epoch 86:   3%|▎         | 5/161 [00:00<00:14, 10.71it/s]
Epoch 86:   4%|▍         | 7/161 [00:00<00:13, 11.28it/s]
Epoch 86:   6%|▌         | 9/161 [00:00<00:12, 11.76it/s]
Epoch 86:   7%|▋         | 11/161 [00:00<00:12, 11.97it/s]
Epoch 86:   8%|▊         | 13/161 [00:01<00:12, 12.23it/s]
Epoch 86:   9%|▉         | 15/161 [00:01<00:11, 12.31it/s]
Epoch 86:  11%|█         | 17/161 [00:01<00:11, 12.35it/s]
Epoch 86:  12%|█▏        | 19/161 [00:01<00:11, 12.28it/s]
Epoch 86:  13%|█▎        | 21/161 [00:01<00:11, 12.39it/s]
Epoch 86:  14%|█▍        | 23/161 [00:01<00:11, 12.43it/s]
Epoch 86:  16%|█▌        | 25/161 [00:02<00:10, 12.45it/s]
Epoch 86:  17%|█▋        | 27/161 [00:02<00:10, 12.37it/s]
Epoch 86:  18%|█▊        | 29/161 [00:02<00:10, 12.00it/s]
Epoc

Epoch 86/100 - loss: 2.4012


Training Epochs:  86%|████████▌ | 86/100 [21:27<03:09, 13.57s/it]
Epoch 87:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 87:   1%|          | 2/161 [00:00<00:13, 12.22it/s]
Epoch 87:   2%|▏         | 4/161 [00:00<00:13, 11.79it/s]
Epoch 87:   4%|▎         | 6/161 [00:00<00:13, 11.47it/s]
Epoch 87:   5%|▍         | 8/161 [00:00<00:13, 11.27it/s]
Epoch 87:   6%|▌         | 10/161 [00:00<00:13, 11.45it/s]
Epoch 87:   7%|▋         | 12/161 [00:01<00:12, 11.47it/s]
Epoch 87:   9%|▊         | 14/161 [00:01<00:12, 11.51it/s]
Epoch 87:  10%|▉         | 16/161 [00:01<00:12, 11.59it/s]
Epoch 87:  11%|█         | 18/161 [00:01<00:12, 11.60it/s]
Epoch 87:  12%|█▏        | 20/161 [00:01<00:11, 11.82it/s]
Epoch 87:  14%|█▎        | 22/161 [00:01<00:11, 11.89it/s]
Epoch 87:  15%|█▍        | 24/161 [00:02<00:11, 12.05it/s]
Epoch 87:  16%|█▌        | 26/161 [00:02<00:11, 12.04it/s]
Epoch 87:  17%|█▋        | 28/161 [00:02<00:11, 11.98it/s]
Epoch 87:  19%|█▊        | 30/161 [00:02<00:10, 11.99it/s]
Epo

Saved best model at epoch 87 with loss 1.1316


                                                                 

Epoch 87/100 - loss: 2.4038


Training Epochs:  87%|████████▋ | 87/100 [21:41<02:59, 13.78s/it]
Epoch 88:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 88:   1%|          | 2/161 [00:00<00:11, 13.38it/s]
Epoch 88:   2%|▏         | 4/161 [00:00<00:11, 13.77it/s]
Epoch 88:   4%|▎         | 6/161 [00:00<00:11, 13.78it/s]
Epoch 88:   5%|▍         | 8/161 [00:00<00:11, 13.68it/s]
Epoch 88:   6%|▌         | 10/161 [00:00<00:11, 13.70it/s]
Epoch 88:   7%|▋         | 12/161 [00:00<00:10, 13.67it/s]
Epoch 88:   9%|▊         | 14/161 [00:01<00:10, 13.62it/s]
Epoch 88:  10%|▉         | 16/161 [00:01<00:12, 11.94it/s]
Epoch 88:  11%|█         | 18/161 [00:01<00:11, 12.31it/s]
Epoch 88:  12%|█▏        | 20/161 [00:01<00:11, 12.53it/s]
Epoch 88:  14%|█▎        | 22/161 [00:01<00:10, 12.92it/s]
Epoch 88:  15%|█▍        | 24/161 [00:01<00:11, 12.33it/s]
Epoch 88:  16%|█▌        | 26/161 [00:02<00:10, 12.80it/s]
Epoch 88:  17%|█▋        | 28/161 [00:02<00:10, 13.10it/s]
Epoch 88:  19%|█▊        | 30/161 [00:02<00:09, 13.16it/s]
Epo

Epoch 88/100 - loss: 2.3987


Training Epochs:  88%|████████▊ | 88/100 [21:54<02:43, 13.60s/it]
Epoch 89:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 89:   1%|          | 2/161 [00:00<00:11, 13.38it/s]
Epoch 89:   2%|▏         | 4/161 [00:00<00:11, 13.62it/s]
Epoch 89:   4%|▎         | 6/161 [00:00<00:11, 13.73it/s]
Epoch 89:   5%|▍         | 8/161 [00:00<00:11, 13.62it/s]
Epoch 89:   6%|▌         | 10/161 [00:00<00:11, 13.68it/s]
Epoch 89:   7%|▋         | 12/161 [00:00<00:10, 13.65it/s]
Epoch 89:   9%|▊         | 14/161 [00:01<00:10, 13.69it/s]
Epoch 89:  10%|▉         | 16/161 [00:01<00:10, 13.77it/s]
Epoch 89:  11%|█         | 18/161 [00:01<00:10, 13.83it/s]
Epoch 89:  12%|█▏        | 20/161 [00:01<00:10, 13.46it/s]
Epoch 89:  14%|█▎        | 22/161 [00:01<00:10, 12.95it/s]
Epoch 89:  15%|█▍        | 24/161 [00:01<00:10, 13.24it/s]
Epoch 89:  16%|█▌        | 26/161 [00:01<00:10, 13.44it/s]
Epoch 89:  17%|█▋        | 28/161 [00:02<00:09, 13.51it/s]
Epoch 89:  19%|█▊        | 30/161 [00:02<00:09, 13.55it/s]
Epo

Saved best model at epoch 89 with loss 1.1264


                                                                 

Epoch 89/100 - loss: 2.3951


Training Epochs:  89%|████████▉ | 89/100 [22:07<02:27, 13.45s/it]
Epoch 90:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 90:   1%|          | 2/161 [00:00<00:12, 13.21it/s]
Epoch 90:   2%|▏         | 4/161 [00:00<00:11, 13.50it/s]
Epoch 90:   4%|▎         | 6/161 [00:00<00:11, 13.63it/s]
Epoch 90:   5%|▍         | 8/161 [00:00<00:11, 13.45it/s]
Epoch 90:   6%|▌         | 10/161 [00:00<00:11, 13.55it/s]
Epoch 90:   7%|▋         | 12/161 [00:00<00:10, 13.66it/s]
Epoch 90:   9%|▊         | 14/161 [00:01<00:10, 13.61it/s]
Epoch 90:  10%|▉         | 16/161 [00:01<00:10, 13.67it/s]
Epoch 90:  11%|█         | 18/161 [00:01<00:11, 12.89it/s]
Epoch 90:  12%|█▏        | 20/161 [00:01<00:10, 13.03it/s]
Epoch 90:  14%|█▎        | 22/161 [00:01<00:10, 13.18it/s]
Epoch 90:  15%|█▍        | 24/161 [00:01<00:10, 13.26it/s]
Epoch 90:  16%|█▌        | 26/161 [00:01<00:10, 12.89it/s]
Epoch 90:  17%|█▋        | 28/161 [00:02<00:10, 13.22it/s]
Epoch 90:  19%|█▊        | 30/161 [00:02<00:09, 13.35it/s]
Epo

Saved best model at epoch 90 with loss 1.1257


                                                                 

Epoch 90/100 - loss: 2.3964


Training Epochs:  90%|█████████ | 90/100 [22:21<02:13, 13.35s/it]
Epoch 91:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 91:   1%|          | 1/161 [00:00<00:29,  5.39it/s]
Epoch 91:   2%|▏         | 3/161 [00:00<00:16,  9.78it/s]
Epoch 91:   3%|▎         | 5/161 [00:00<00:13, 11.55it/s]
Epoch 91:   4%|▍         | 7/161 [00:00<00:12, 12.36it/s]
Epoch 91:   6%|▌         | 9/161 [00:00<00:11, 12.88it/s]
Epoch 91:   7%|▋         | 11/161 [00:00<00:11, 13.12it/s]
Epoch 91:   8%|▊         | 13/161 [00:01<00:11, 12.67it/s]
Epoch 91:   9%|▉         | 15/161 [00:01<00:12, 11.33it/s]
Epoch 91:  11%|█         | 17/161 [00:01<00:12, 11.95it/s]
Epoch 91:  12%|█▏        | 19/161 [00:01<00:11, 12.47it/s]
Epoch 91:  13%|█▎        | 21/161 [00:01<00:10, 12.79it/s]
Epoch 91:  14%|█▍        | 23/161 [00:01<00:10, 13.04it/s]
Epoch 91:  16%|█▌        | 25/161 [00:02<00:10, 13.27it/s]
Epoch 91:  17%|█▋        | 27/161 [00:02<00:10, 13.29it/s]
Epoch 91:  18%|█▊        | 29/161 [00:02<00:09, 13.42it/s]
Epoc

Epoch 91/100 - loss: 2.3922


Training Epochs:  91%|█████████ | 91/100 [22:34<01:59, 13.29s/it]
Epoch 92:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 92:   1%|          | 2/161 [00:00<00:11, 13.30it/s]
Epoch 92:   2%|▏         | 4/161 [00:00<00:11, 13.61it/s]
Epoch 92:   4%|▎         | 6/161 [00:00<00:11, 13.67it/s]
Epoch 92:   5%|▍         | 8/161 [00:00<00:11, 13.81it/s]
Epoch 92:   6%|▌         | 10/161 [00:00<00:10, 13.76it/s]
Epoch 92:   7%|▋         | 12/161 [00:00<00:11, 12.89it/s]
Epoch 92:   9%|▊         | 14/161 [00:01<00:11, 13.28it/s]
Epoch 92:  10%|▉         | 16/161 [00:01<00:10, 13.37it/s]
Epoch 92:  11%|█         | 18/161 [00:01<00:10, 13.49it/s]
Epoch 92:  12%|█▏        | 20/161 [00:01<00:10, 13.65it/s]
Epoch 92:  14%|█▎        | 22/161 [00:01<00:10, 13.62it/s]
Epoch 92:  15%|█▍        | 24/161 [00:01<00:10, 13.63it/s]
Epoch 92:  16%|█▌        | 26/161 [00:01<00:09, 13.71it/s]
Epoch 92:  17%|█▋        | 28/161 [00:02<00:09, 13.69it/s]
Epoch 92:  19%|█▊        | 30/161 [00:02<00:09, 13.62it/s]
Epo

Saved best model at epoch 92 with loss 1.1162


                                                                 

Epoch 92/100 - loss: 2.3921


Training Epochs:  92%|█████████▏| 92/100 [22:47<01:45, 13.22s/it]
Epoch 93:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 93:   1%|          | 2/161 [00:00<00:12, 13.14it/s]
Epoch 93:   2%|▏         | 4/161 [00:00<00:11, 13.60it/s]
Epoch 93:   4%|▎         | 6/161 [00:00<00:11, 13.47it/s]
Epoch 93:   5%|▍         | 8/161 [00:00<00:11, 13.63it/s]
Epoch 93:   6%|▌         | 10/161 [00:00<00:11, 12.77it/s]
Epoch 93:   7%|▋         | 12/161 [00:00<00:11, 13.07it/s]
Epoch 93:   9%|▊         | 14/161 [00:01<00:11, 13.16it/s]
Epoch 93:  10%|▉         | 16/161 [00:01<00:12, 11.56it/s]
Epoch 93:  11%|█         | 18/161 [00:01<00:11, 12.11it/s]
Epoch 93:  12%|█▏        | 20/161 [00:01<00:11, 12.62it/s]
Epoch 93:  14%|█▎        | 22/161 [00:01<00:10, 12.81it/s]
Epoch 93:  15%|█▍        | 24/161 [00:01<00:10, 12.93it/s]
Epoch 93:  16%|█▌        | 26/161 [00:02<00:10, 13.08it/s]
Epoch 93:  17%|█▋        | 28/161 [00:02<00:10, 13.22it/s]
Epoch 93:  19%|█▊        | 30/161 [00:02<00:09, 13.32it/s]
Epo

Epoch 93/100 - loss: 2.3911


Training Epochs:  93%|█████████▎| 93/100 [23:00<01:32, 13.17s/it]
Epoch 94:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 94:   1%|          | 1/161 [00:00<00:29,  5.49it/s]
Epoch 94:   2%|▏         | 3/161 [00:00<00:16,  9.83it/s]
Epoch 94:   3%|▎         | 5/161 [00:00<00:14, 10.93it/s]
Epoch 94:   4%|▍         | 7/161 [00:00<00:13, 11.29it/s]
Epoch 94:   6%|▌         | 9/161 [00:00<00:12, 12.12it/s]
Epoch 94:   7%|▋         | 11/161 [00:00<00:12, 12.32it/s]
Epoch 94:   8%|▊         | 13/161 [00:01<00:11, 12.83it/s]
Epoch 94:   9%|▉         | 15/161 [00:01<00:11, 13.16it/s]
Epoch 94:  11%|█         | 17/161 [00:01<00:10, 13.21it/s]
Epoch 94:  12%|█▏        | 19/161 [00:01<00:10, 13.29it/s]
Epoch 94:  13%|█▎        | 21/161 [00:01<00:10, 13.56it/s]
Epoch 94:  14%|█▍        | 23/161 [00:01<00:10, 13.70it/s]
Epoch 94:  16%|█▌        | 25/161 [00:01<00:09, 13.70it/s]
Epoch 94:  17%|█▋        | 27/161 [00:02<00:09, 13.77it/s]
Epoch 94:  18%|█▊        | 29/161 [00:02<00:09, 13.77it/s]
Epoc

Epoch 94/100 - loss: 2.3890


Training Epochs:  94%|█████████▍| 94/100 [23:13<01:19, 13.18s/it]
Epoch 95:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 95:   1%|          | 2/161 [00:00<00:12, 12.29it/s]
Epoch 95:   2%|▏         | 4/161 [00:00<00:13, 11.52it/s]
Epoch 95:   4%|▎         | 6/161 [00:00<00:12, 12.25it/s]
Epoch 95:   5%|▍         | 8/161 [00:00<00:11, 12.80it/s]
Epoch 95:   6%|▌         | 10/161 [00:00<00:11, 13.17it/s]
Epoch 95:   7%|▋         | 12/161 [00:00<00:11, 13.45it/s]
Epoch 95:   9%|▊         | 14/161 [00:01<00:10, 13.47it/s]
Epoch 95:  10%|▉         | 16/161 [00:01<00:10, 13.62it/s]
Epoch 95:  11%|█         | 18/161 [00:01<00:10, 13.58it/s]
Epoch 95:  12%|█▏        | 20/161 [00:01<00:10, 13.63it/s]
Epoch 95:  14%|█▎        | 22/161 [00:01<00:10, 13.55it/s]
Epoch 95:  15%|█▍        | 24/161 [00:01<00:10, 13.62it/s]
Epoch 95:  16%|█▌        | 26/161 [00:01<00:09, 13.74it/s]
Epoch 95:  17%|█▋        | 28/161 [00:02<00:09, 13.79it/s]
Epoch 95:  19%|█▊        | 30/161 [00:02<00:09, 13.75it/s]
Epo

Epoch 95/100 - loss: 2.3843


Training Epochs:  95%|█████████▌| 95/100 [23:26<01:05, 13.11s/it]
Epoch 96:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 96:   1%|          | 2/161 [00:00<00:13, 11.93it/s]
Epoch 96:   2%|▏         | 4/161 [00:00<00:12, 12.15it/s]
Epoch 96:   4%|▎         | 6/161 [00:00<00:12, 12.90it/s]
Epoch 96:   5%|▍         | 8/161 [00:00<00:11, 13.22it/s]
Epoch 96:   6%|▌         | 10/161 [00:00<00:11, 13.33it/s]
Epoch 96:   7%|▋         | 12/161 [00:00<00:11, 13.50it/s]
Epoch 96:   9%|▊         | 14/161 [00:01<00:11, 13.18it/s]
Epoch 96:  10%|▉         | 16/161 [00:01<00:10, 13.18it/s]
Epoch 96:  11%|█         | 18/161 [00:01<00:10, 13.37it/s]
Epoch 96:  12%|█▏        | 20/161 [00:01<00:10, 13.62it/s]
Epoch 96:  14%|█▎        | 22/161 [00:01<00:10, 13.62it/s]
Epoch 96:  15%|█▍        | 24/161 [00:01<00:10, 13.61it/s]
Epoch 96:  16%|█▌        | 26/161 [00:01<00:10, 13.42it/s]
Epoch 96:  17%|█▋        | 28/161 [00:02<00:09, 13.49it/s]
Epoch 96:  19%|█▊        | 30/161 [00:02<00:09, 13.57it/s]
Epo

Epoch 96/100 - loss: 2.3834


Training Epochs:  96%|█████████▌| 96/100 [23:39<00:52, 13.06s/it]
Epoch 97:   0%|          | 0/161 [00:00<?, ?it/s]
Epoch 97:   1%|          | 1/161 [00:00<00:34,  4.68it/s]
Epoch 97:   2%|▏         | 3/161 [00:00<00:18,  8.75it/s]
Epoch 97:   3%|▎         | 5/161 [00:00<00:14, 10.87it/s]
Epoch 97:   4%|▍         | 7/161 [00:00<00:12, 11.98it/s]
Epoch 97:   6%|▌         | 9/161 [00:00<00:12, 12.64it/s]
Epoch 97:   7%|▋         | 11/161 [00:00<00:11, 12.92it/s]
Epoch 97:   8%|▊         | 13/161 [00:01<00:11, 13.22it/s]
Epoch 97:   9%|▉         | 15/161 [00:01<00:10, 13.35it/s]
Epoch 97:  11%|█         | 17/161 [00:01<00:10, 13.44it/s]
Epoch 97:  12%|█▏        | 19/161 [00:01<00:10, 13.59it/s]
Epoch 97:  13%|█▎        | 21/161 [00:01<00:10, 13.52it/s]
Epoch 97:  14%|█▍        | 23/161 [00:01<00:10, 13.47it/s]
Epoch 97:  16%|█▌        | 25/161 [00:01<00:10, 13.41it/s]
Epoch 97:  17%|█▋        | 27/161 [00:02<00:09, 13.45it/s]
Epoch 97:  18%|█▊        | 29/161 [00:02<00:10, 13.15it/s]
Epoc

Early stopping triggered at epoch 97.


Training Epochs:  96%|█████████▌| 96/100 [23:52<00:59, 14.92s/it]


Test loss: 1.1369


0,1
avg_loss,█▇▆▅▄▃▃▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
batch_loss,█▅▄▃▄▃▃▃▃▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
best_epoch,▁▁▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▃▃▃▄▄▄▄▄▄▄▅▅▅▅▅▆▆▆▇▇▇██
test_loss,▁
val_loss,█▅▄▄▄▃▃▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
avg_loss,2.38344
batch_loss,2.3819
best_epoch,92
best_model_saved,True
test_loss,1.13685
val_loss,1.1238


### 4

In [14]:
import torch

PROMPT = "Alice from the wonderland"
GEN_LENGTH = 50
TEMPERATURES = (0.1, 0.7, 1.0, 5.0, 10.0, 25.0)

# Load data and model
checkpoint = torch.load(DATA_PATH)
vocab = checkpoint["vocab"]
embedding_matrix = checkpoint["embedding_matrix"]

hidden_dim = 128  # Must match training
model = RNNLanguageModel(embedding_matrix, hidden_dim, vocab)
model.load_state_dict(torch.load(MODEL_PATH))
model.eval()

print("=" * 60)
print(f"Prompt: {PROMPT}\nGenerated length: {GEN_LENGTH}\n")
for t in TEMPERATURES:
    generated_text = model.generate(PROMPT, length=GEN_LENGTH, temperature=t)
    print("=" * 60)
    print(f"Temp={t}:\n")
    print(generated_text, "\n")

FileNotFoundError: [Errno 2] No such file or directory: 'alice_model.pt'

### Sandbox

In [None]:
import torch

data = torch.load("alice_dataset.pt")
vocab = data['vocab']
print(data["inputs"])