<a href="https://colab.research.google.com/github/Daniel-Perry-J/Daniel-Perry-J/blob/main/intro_to_machine_learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# What the *heck* even is machine learning ❓❗
Well, that's a great (but big) question! Machine learning comes in all shapes and sizes, but the main categories are:

- <u>Supervised</u>: You have input data paired up with "labels", the "true" answers you want the model to predict. For example, your data could be pairs that match a person's height with that person's favourite colour (Ex: {61 inches, blue}).

  - <u>Regression</u>: The thing you are predicting is one or more continuous numbers (scalar values). Using the previous example, the question we would ask the model is: given a person's favourite colour, what is their height?

  - <u>Classification</u>: The thing you are predicting is one or more categories. Using the previous example, the question we would ask the model is: given a person's height, what is their favourite colour?

- <u>Unsupervised</u>: There are no labels; instead you try to discover structure in the data itself. This is where algorithms like clustering come into play, for example grouping similar data points together.

- <u>Reinforcement</u>: You have probably seen those YouTube videos titled something like "Teaching AI How to Play MarioKart"; this is that! Here an agent learns by trial and error, taking actions and receiving rewards. These algorithms are a little out of scope for this pseudo-lecture, but if you want to chat offline about them, they are actually the focus of a lot of my work!

<p align="center">
  <img src="https://www.researchgate.net/publication/354960266/figure/fig1/AS:1075175843983363@1633353305883/The-main-types-of-machine-learning-Main-approaches-include-classification-and.png" width=700 height=auto/>
</p>


# Neural Networks 🔥

## Introduction 💡

Neural networks have been the 🌟*star child*🌟 of machine learning as of late, and for good reason! Neural networks are what are known as "universal function approximators": with enough hidden units and a nonlinear activation, a network can approximate any continuous relationship between inputs and outputs (over a bounded range of inputs) as closely as you like. That is a statement about what is possible in principle; actually *finding* good weights is what training is for. This includes functions that you could never write down by hand, like the percent of cat a dog is! So neural networks will be the focus of this pseudo-lecture and tutorial. This is going to be more of an "engineering"-style approach to machine learning that will hopefully give you the tools to start making models from data today!


## Basic Mathematics 🤔💭📊🧮📝

The basic mathematics of neural networks is actually really simple! Do you remember that very simple equation for a line: $y=mx+b$? Well, great news! Neural networks are just that, but a whole bunch of times! If you have taken Math 204 here at Western, then you are probably familiar with matrices and matrix multiplication; that is how we write many of these linear equations at once: $\mathbf{y} = W\mathbf{x} + \mathbf{b}$, where each row of the matrix $W$ holds one neuron's slopes and each entry of $\mathbf{b}$ holds its intercept.
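To make the matrix view concrete, here is a minimal NumPy sketch of a layer with three "neurons" reading two inputs. The weights, biases, and input are made-up numbers for illustration. Each row of `W` holds one neuron's slopes and each entry of `b` its intercept, so a single matrix multiplication evaluates three $y = mx + b$ style equations at once.

```python
import numpy as np

# Hypothetical layer: 2 inputs -> 3 outputs ("neurons")
W = np.array([[1.0, 2.0],     # neuron 0: y0 = 1*x0 + 2*x1 + b0
              [0.5, -1.0],    # neuron 1: y1 = 0.5*x0 - 1*x1 + b1
              [3.0, 0.0]])    # neuron 2: y2 = 3*x0 + 0*x1 + b2
b = np.array([0.0, 1.0, -2.0])
x = np.array([2.0, 1.0])

# All three linear equations at once: y = W x + b
y = W @ x + b

# The same computation written out neuron by neuron
y_loop = np.array([sum(W[i, j] * x[j] for j in range(2)) + b[i] for i in range(3)])

print(y)       # values 4, 1, 4
print(y_loop)  # same values
```

PyTorch's `nn.Linear` performs the same computation: it stores a weight matrix of shape `(out_features, in_features)` and computes `x @ W.T + b` for a batch of row vectors.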

One important note here, though, is that composing linear functions just gives you another linear function: $m_2(m_1 x + b_1)+b_2 = m_3 x + b_3$, with $m_3 = m_2 m_1$ and $b_3 = m_2 b_1 + b_2$. So no matter how many linear layers we stack, we still only get a straight line! To approximate functions of any real complexity, we need to put some nonlinear function, which we will denote $\phi$, smack dab in the middle. A popular choice is the ReLU, $\phi(z) = \max(0, z)$. Our new function looks like this: $m_2\,\phi(m_1 x + b_1)+b_2$, which in general can no longer be written as $m_3 x + b_3$. Not so bad, right? This is literally the fundamental building block of all neural networks.
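Both claims can be checked numerically. Below is a minimal NumPy sketch using illustrative scalar weights (our own choice, not from the text) and ReLU, $\max(0, z)$, as the nonlinearity $\phi$. Stacking two linear maps matches the single collapsed line $m_3 x + b_3$ exactly. Putting ReLU in the middle produces a kink, and a kink is something no straight line can have.

```python
import numpy as np

# Illustrative scalar weights, playing the roles of m1, b1, m2, b2
m1, b1 = 2.0, -1.0
m2, b2 = -3.0, 0.5

def relu(z):
    """A common choice of phi: max(0, z), applied elementwise."""
    return np.maximum(0.0, z)

def linear_stack(x):
    return m2 * (m1 * x + b1) + b2          # no nonlinearity in between

def relu_stack(x):
    return m2 * relu(m1 * x + b1) + b2      # phi in the middle

# The collapsed line: m3 = m2*m1, b3 = m2*b1 + b2
m3, b3 = m2 * m1, m2 * b1 + b2              # -6.0 and 3.5

xs = np.array([0.0, 1.0, 2.0])
print(linear_stack(xs), m3 * xs + b3)       # both give 3.5, -2.5, -8.5
print(relu_stack(xs))                       # 0.5, -2.5, -8.5

# A straight line has equal steps between equally spaced points; the ReLU version does not
print(np.diff(linear_stack(xs)))            # steps of -6 and -6
print(np.diff(relu_stack(xs)))              # steps of -3 and -6
```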

<p align="center">
  <img src="https://www.knime.com/sites/default/files/3-intro-deep-neural-networks.png" width=500 height=auto/>
</p>

## Code 🤖💻

Alright all of this is fine and dandy, but how the _heck_ do I write code for this stuff??
Lucky for us, there is a plethora of packages and frameworks in all different languages that make this stuff easy.
For this we are going to stick with Python and the PyTorch library. There are other similar frameworks, like TensorFlow, and higher-level wrappers built on top of PyTorch, like PyTorch Lightning, but to keep things simple I am only going to use one library for the demonstration; most of what you learn here transfers easily to other frameworks.

## What are We Cookin'?? 🥓🍳🥪

For this tutorial we are going to train a handwritten-digit classifier on the MNIST dataset: the model will take in an image of someone's hand-drawn digit and predict which number (0-9) they wrote!

<p align="center">
  <img src="https://production-media.paperswithcode.com/datasets/MNIST-0000000001-2e09631a_09liOmx.jpg" width=500 height=auto/>
</p>

<p align="center">
  <b>Let's get to coding!</b>
</p>

In [None]:
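import torch
from torch.utils.data import DataLoader
from torchvision.datasets import MNIST  # Lucky for us, someone already built this!
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F  # functional versions of layers/activations (not used below)

# 60,000 training images; ToTensor() turns each PIL image into a [1, 28, 28] float tensor in [0, 1]
train_ds = MNIST(
    root='/content/MNIST/',
    train=True,
    download=True,
    transform=transforms.ToTensor(),
)

# 10,000 held-out test images, used here as our "dev" (validation) set
dev_ds = MNIST(
    root='/content/MNIST/',
    train=False,
    download=True,
    transform=transforms.ToTensor(),
)

# Each item is an (image, label) pair: inspect the first dev example
print(f"{dev_ds[0][0].shape=}")  # image tensor shape
print(f"{dev_ds[0][1]=}")        # integer label (the digit)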

dev_ds[0][0].shape=torch.Size([1, 28, 28])
dev_ds[0][1]=7
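Each MNIST image comes out of `ToTensor()` as a `[1, 28, 28]` tensor (1 grayscale channel, 28x28 pixels), and each label is a plain integer. A fully connected network wants a flat vector, so the model in the next cell flattens each image into 28 * 28 = 784 numbers. Below is a minimal sketch of that reshape on a dummy batch. The random values stand in for real images, and the batch size of 64 matches the `DataLoader` used later.

```python
import torch

# A fake batch shaped like what the MNIST DataLoader yields: [batch, channels, height, width]
images = torch.rand(64, 1, 28, 28)

# Flatten everything but the batch dimension: each image becomes a 784-long vector
flat = images.reshape(-1, 28 * 28)
print(flat.shape)  # torch.Size([64, 784])

# Flattening is lossless: row 0 is just image 0's pixels read row by row
assert torch.equal(flat[0], images[0, 0].flatten())
```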





In [None]:
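import torch.nn as nn

# Map a name to an activation *class*; we instantiate it inside each model so that
# activations with learnable parameters (like PReLU) are not shared between models.
activations = {
    'ReLU': nn.ReLU,
    'ReLU6': nn.ReLU6,
    'LeakyReLU': nn.LeakyReLU,
    'Sigmoid': nn.Sigmoid,
    'Tanh': nn.Tanh,
    'ELU': nn.ELU,
    'PReLU': nn.PReLU,
    'SELU': nn.SELU,
}

class NeuralNetwork(nn.Module):
    """Fully connected network: input layer -> n_layers hidden layers -> output layer,
    with the chosen nonlinearity (phi) applied after every layer except the last."""

    def __init__(self, in_size, out_size, n_layers=3, hidden_size=128, act="ReLU"):
        super().__init__()
        self.in_size = in_size
        self.input_layer = nn.Linear(in_size, hidden_size)
        self.hidden_layers = nn.ModuleList(
            [nn.Linear(hidden_size, hidden_size) for _ in range(n_layers)]
        )
        self.output_layer = nn.Linear(hidden_size, out_size)
        self.activation = activations[act]()

    def forward(self, x):
        # Flatten everything except the batch dimension, e.g. [B, 1, 28, 28] -> [B, 784]
        x = x.reshape(-1, self.in_size)
        x = self.activation(self.input_layer(x))
        for layer in self.hidden_layers:
            x = self.activation(layer(x))
        # No activation on the output: these are raw class scores (logits) or a raw regression value
        return self.output_layer(x)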
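How big is this network? A quick sanity check is to count its learnable numbers by hand. An `nn.Linear(n_in, n_out)` layer has an `n_in x n_out` weight matrix plus `n_out` biases. The helper below (a hypothetical name, not part of PyTorch) applies that rule to the input layer, the `n_layers` hidden layers, and the output layer. For the MNIST settings used next (784 inputs, 10 outputs, 3 hidden layers of 128), it gives 151,306 parameters. With a parameter-free activation such as ReLU, `sum(p.numel() for p in model.parameters())` on a real model should report the same number.

```python
def count_parameters(in_size, out_size, n_layers=3, hidden_size=128):
    """Learnable weights + biases in the NeuralNetwork architecture above (ReLU-style activation)."""
    def linear(n_in, n_out):
        return n_in * n_out + n_out  # weight matrix + bias vector

    return (
        linear(in_size, hidden_size)                   # input layer
        + n_layers * linear(hidden_size, hidden_size)  # hidden layers
        + linear(hidden_size, out_size)                # output layer
    )

print(count_parameters(784, 10))  # 151306 for the MNIST model
print(count_parameters(6, 1))     # 50561 for the housing model later on
```

Most of the MNIST model's parameters (100,480 of them) sit in the first layer, because every one of the 784 pixels connects to every hidden unit.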

In [None]:
from tqdm import tqdm
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

epochs = 25        # number of full passes over the training set
lr = 0.001         # learning rate: the step size for each SGD update
activation = "ReLU"
hidden_size = 128
n_layers = 3
batch_size = 64

model = NeuralNetwork(
    in_size=784,              # 28*28, because our images are 28x28 pixels
    out_size=10,              # one score per class (digits 0-9)
    n_layers=n_layers,
    hidden_size=hidden_size,
    act=activation,
)

train_loader = DataLoader(train_ds, batch_size=batch_size, shuffle=True)
dev_loader = DataLoader(dev_ds, batch_size=batch_size, shuffle=False)

criterion = nn.CrossEntropyLoss()  # expects raw logits and integer class labels
optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)

for i in range(epochs):
    model.train()
    running_loss = 0
    for image, label in tqdm(train_loader, desc="Training"):
        optimizer.zero_grad()              # clear gradients from the previous step
        prediction = model(image)          # forward pass: [batch, 10] logits
        loss = criterion(prediction, label)
        running_loss += loss.item()
        loss.backward()                    # backpropagate to compute gradients
        optimizer.step()                   # update the weights
    print(f"Train loss at epoch {i}: {running_loss / len(train_loader)}")

    model.eval()
    running_loss = 0
    with torch.no_grad():                  # no gradients needed for evaluation
        for image, label in tqdm(dev_loader, desc="Evaluating"):
            prediction = model(image)
            loss = criterion(prediction, label)
            running_loss += loss.item()
    print(f"Dev loss at epoch {i}: {running_loss / len(dev_loader)}")


Train loss at epoch 0: 2.2942448511306663
Dev loss at epoch 0: 2.276914879015297
Train loss at epoch 1: 2.056138999172365
Dev loss at epoch 1: 1.4741058273679892
Train loss at epoch 2: 0.9578287062614457
Dev loss at epoch 2: 0.6238277495666674
Train loss at epoch 3: 0.5088676068066026
Dev loss at epoch 3: 0.41429149385565406
Train loss at epoch 4: 0.3840575257915932
Dev loss at epoch 4: 0.342394120350575
Train loss at epoch 5: 0.31698101439646315
Dev loss at epoch 5: 0.28641897288097695
Train loss at epoch 6: 0.26768564959483615
Dev loss at epoch 6: 0.24703061032541998
Train loss at epoch 7: 0.2307183224556924
Dev loss at epoch 7: 0.20889245096451728
Train loss at epoch 8: 0.19987013329590944
Dev loss at epoch 8: 0.19642961562676414
Train loss at epoch 9: 0.17767383936625808
Dev loss at epoch 9: 0.1704745136088674
Train loss at epoch 10: 0.1596874135540429
Dev loss at epoch 10: 0.1719545063764379
Train loss at epoch 11: 0.14479894192813875
Dev loss at epoch 11: 0.1487658514142938
Train loss at epoch 12: 0.13232479218258533
Dev loss at epoch 12: 0.14483269356386913
Train loss at epoch 13: 0.12177592862460977
Dev loss at epoch 13: 0.12948170529120856
Train loss at epoch 14: 0.11228434737724091
Dev loss at epoch 14: 0.12633488765787215
Train loss at epoch 15: 0.10521238947958389
Dev loss at epoch 15: 0.1232182874594916
Train loss at epoch 16: 0.0971516982735649
Dev loss at epoch 16: 0.11793287667137375
Train loss at epoch 17: 0.09133957540179526
Dev loss at epoch 17: 0.11003833351943013
Train loss at epoch 18: 0.08519997828598343
Dev loss at epoch 18: 0.1142650038325732
Train loss at epoch 19: 0.07966120137638454
Dev loss at epoch 19: 0.10833332276603527
Train loss at epoch 20: 0.07436916283689249
Dev loss at epoch 20: 0.1043359097979435
Train loss at epoch 21: 0.07038093594013095
Dev loss at epoch 21: 0.11081973040457162
Train loss at epoch 22: 0.06598933570921969
Dev loss at epoch 22: 0.10295322468240337
Train loss at epoch 23: 0.06246972111051779
Dev loss at epoch 23: 0.1053439792055432
Train loss at epoch 24: 0.05904187341687331
Dev loss at epoch 24: 0.10047767925796182
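Two quick sanity checks help when reading these numbers. First, `nn.CrossEntropyLoss` computes, for each example, -log of the softmax probability assigned to the correct class. A model that has not learned anything yet gives roughly equal scores to all 10 digits, so its loss should be about ln 10 ≈ 2.303. That is right where epoch 0 starts (2.29). Second, loss is hard to interpret on its own, so it is common to also report accuracy: the fraction of examples whose highest-scoring class matches the label. The NumPy sketch below re-implements both for illustration only. The function names are our own, and in practice you would keep using PyTorch's built-in loss.

```python
import math
import numpy as np

def cross_entropy(logits, labels):
    """Mean of -log softmax(logits)[label], like nn.CrossEntropyLoss with its default 'mean' reduction."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def accuracy(logits, labels):
    """Fraction of rows whose highest score is at the correct class index."""
    return (logits.argmax(axis=1) == labels).mean()

# An "untrained" model: identical scores for all 10 classes
uniform = np.zeros((4, 10))
labels = np.array([7, 2, 1, 0])
print(cross_entropy(uniform, labels), math.log(10))  # both about 2.3026

# A confident model: a big score on its predicted class, and the last prediction is wrong
confident = np.full((4, 10), -5.0)
confident[np.arange(4), [7, 2, 1, 9]] = 5.0
print(accuracy(confident, labels))                   # 0.75
```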





# This is great but what now??

That's a great question! Once you have a fully trained machine learning model, it's time to deploy it!
One catch is that these models need new data to have the same "matching shape" they were trained on: our model was trained on 28x28-pixel images (784 numbers), so any new image has to be resized to 28x28 before the model can even read it, and it should look like the training data too (a light digit on a dark background, with pixel values between 0 and 1).
Another thing these models cannot do is generalize to data that is "out of distribution", which basically just means that since we trained it on images of digits, it won't generalize to smiley faces or letters (but I implore you to try it in the next section).
<p align="center">
  <img src="https://files.oaiusercontent.com/file-OlScYgSdL0bgKvRKUnXKOGqF?se=2023-12-02T00%3A17%3A16Z&sp=r&sv=2021-08-06&sr=b&rscc=max-age%3D31536000%2C%20immutable&rscd=attachment%3B%20filename%3D99a3fb82-b725-4e41-ba21-4a3e7c938093.webp&sig=OdCIDd04xyXzuYJupD0dYm3tFoUnAIOt0zQMYvzKo9Q%3D" width=300 height=auto/>
</p>
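Matching the shape (and the style) of the training data is a preprocessing job. The sketch below assumes a dark digit on a light background, like a photo of paper. It converts the image to grayscale, inverts it so the digit is white on black like MNIST, resizes it to 28x28, scales the pixels to [0, 1], and flattens it into the 784 inputs the model expects. The helper name `to_mnist_input` is hypothetical. Real MNIST digits are also size-normalized and centred in the frame, which this sketch does not attempt.

```python
import numpy as np
from PIL import Image, ImageDraw, ImageOps

def to_mnist_input(image: Image.Image) -> np.ndarray:
    """Turn an arbitrary dark-on-light image into a flat, MNIST-like vector of 784 floats."""
    gray = image.convert("L")                             # 1 channel, like MNIST
    inverted = ImageOps.invert(gray)                      # white strokes on a black background
    small = inverted.resize((28, 28))                     # the only size the model understands
    pixels = np.asarray(small, dtype=np.float32) / 255.0  # [0, 255] -> [0, 1], like ToTensor()
    return pixels.reshape(784)

# Stand-in for a user's photo: a thick black "1" stroke on a white 200x300 RGB canvas
photo = Image.new("RGB", (200, 300), "white")
ImageDraw.Draw(photo).line([(100, 40), (100, 260)], fill="black", width=20)

x = to_mnist_input(photo)
print(x.shape, x.min(), x.max())  # (784,) with values between 0 and 1
```

Note that resizing a 200x300 image straight to 28x28 also squashes its aspect ratio. Cropping or padding to a square first would distort the digit less.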


In [None]:
import ipywidgets as widgets
from IPython.display import display, clear_output
import matplotlib.pyplot as plt
from PIL import Image, ImageOps
import numpy as np
import torch
import torch.nn.functional as F
from torchvision import transforms
import io

canvas_html = """
<canvas width="280px" height="280px" style="border:1px solid black;"></canvas>
<button id="save_button">Save</button>
<button id="clear_button">Clear</button>
<script>
var canvas = document.querySelector('canvas');
var ctx = canvas.getContext('2d');
var save_button = document.getElementById('save_button');
var clear_button = document.getElementById('clear_button');

var mouse = {x: 0, y: 0}
var last_mouse = {x: 0, y: 0}

/* Mouse Capturing Work */
canvas.addEventListener('mousemove', function(e) {
    last_mouse.x = mouse.x;
    last_mouse.y = mouse.y;

    mouse.x = e.pageX - this.offsetLeft;
    mouse.y = e.pageY - this.offsetTop;
}, false);

/* Drawing on Paint App */
ctx.lineWidth = 10;
ctx.lineJoin = 'round';
ctx.lineCap = 'round';
ctx.strokeStyle = 'black';

canvas.addEventListener('mousedown', function(e) {
    canvas.addEventListener('mousemove', onPaint, false);
}, false);

canvas.addEventListener('mouseup', function() {
    canvas.removeEventListener('mousemove', onPaint, false);
}, false);

var onPaint = function() {
    ctx.beginPath();
    ctx.moveTo(last_mouse.x, last_mouse.y);
    ctx.lineTo(mouse.x, mouse.y);
    ctx.closePath();
    ctx.stroke();
};

save_button.addEventListener('click', function() {
    var image_data = canvas.toDataURL('image/png').replace("image/png", "image/octet-stream");
    var link = document.createElement('a');
    link.download = 'drawing.jpg';
    link.href = image_data;
    link.click();
});

clear_button.addEventListener('click', function() {
    ctx.clearRect(0, 0, canvas.width, canvas.height);
});

</script>
"""
drawing_pad = widgets.HTML(value=canvas_html)
display(drawing_pad)

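output = widgets.Output()

# Upload the drawing saved from the pad above. The Save button names the file
# 'drawing.jpg', but its contents are actually a PNG (with a transparent background).
uploader = widgets.FileUpload(
    accept='.jpg',   # matches the 'drawing.jpg' filename used by the Save button
    multiple=False   # allow only a single file
)
predict_button = widgets.Button(description="Predict")

def on_predict_button_clicked(b):
    if not uploader.value:
        return
    # ipywidgets 7 stores uploads as {filename: {...}}; ipywidgets 8 uses a tuple of dicts
    value = uploader.value
    uploaded_file = value[0] if isinstance(value, (list, tuple)) else next(iter(value.values()))
    image = Image.open(io.BytesIO(bytes(uploaded_file['content'])))

    # MNIST digits are white strokes on a black background
    if image.mode in ("RGBA", "LA"):
        # The canvas draws on a transparent background, so the alpha channel holds the strokes
        image = image.getchannel("A")
    else:
        # An ordinary dark-on-light image: convert to grayscale and invert it
        image = ImageOps.invert(image.convert("L"))

    image_resized = transforms.Resize((28, 28))(image)
    image_tensor = transforms.ToTensor()(image_resized)  # [1, 28, 28], values in [0, 1]

    # Make a prediction (the model flattens the input to [1, 784] itself)
    model.eval()
    with torch.no_grad():
        prediction = model(image_tensor)
        predicted_label = prediction.argmax(dim=1)

    # Display what the model sees and its prediction
    with output:
        clear_output(wait=True)
        plt.clf()
        plt.imshow(image_tensor.squeeze(0), cmap='gray')
        plt.title("What the model sees")
        plt.xticks([])
        plt.yticks([])
        plt.xlabel(f"Model's prediction: {predicted_label.item()}")
        plt.show()
        print("Predicted digit:", predicted_label.item())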
predict_button.on_click(on_predict_button_clicked)
display(uploader, predict_button, output)


# This is great, but I want to learn to approximate more functions

That's so great to hear! You have just learned some of the fundamentals of machine learning.
The main thing we would need to change is the dataset class, which for MNIST was the `MNIST` class we borrowed from `torchvision` at the beginning.
I borrowed it so as not to overload people with information they may not have needed or wanted.
These dataset classes are basically just big "array-likes", which just means that we can index into them like a list!
So if I want to get the first item out of a dataset `ds`, I can do something like `input, label = ds[0]`.

## The Main Functions

- `__init__(self, **kwargs)`: The constructor, which sets everything up for us.
This could contain a lot of different things, but it generally entails loading our custom data into memory and saving some of our preferences for how to view this data (for example, normalization statistics).

- `__getitem__(self, index)`: The method that our `DataLoader` calls behind the scenes to fetch one item. So we just need to set up a way to grab each unique element of our custom data, usually as an `(input, label)` pair.
If our data is tabular, then our first element will be the first row, and the last element will be the last row of the table.

- `__len__(self)`: Tells us how many items are in our dataset, which allows us to do `len(ds)`. This is important because if we don't know how many elements are in our array-like, how would we know when to stop trying to get new data points?
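Here is a minimal sketch of those three methods on made-up in-memory data: a hypothetical `SquaresDataset` that maps x to x². Any object with `__getitem__` and `__len__` like this can be handed to a `DataLoader`, which calls `__getitem__` for each index and stacks the results into batches.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SquaresDataset(Dataset):
    """Toy regression data: input x, label x**2 (purely illustrative)."""

    def __init__(self, n=10):
        # Build all of the data once and keep it in memory
        self.x = torch.arange(n, dtype=torch.float32).unsqueeze(1)  # shape [n, 1]
        self.y = self.x ** 2                                         # shape [n, 1]

    def __getitem__(self, idx):
        # Return ONE (input, label) pair
        return self.x[idx], self.y[idx]

    def __len__(self):
        # How many items there are, so the DataLoader knows when to stop
        return len(self.x)

ds = SquaresDataset(n=10)
features, label = ds[3]
print(len(ds), features, label)  # 10 tensor([3.]) tensor([9.])

loader = DataLoader(ds, batch_size=4, shuffle=False)
for batch_x, batch_y in loader:
    print(batch_x.shape, batch_y.shape)  # [4, 1] twice, then [2, 1] for the last batch
```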

___

Cool! Let's write a basic dataset for tabular data, using the California housing CSV files that come in Colab's `sample_data` folder!

In [None]:
import pandas as pd
import torch

# Input columns we feed the model (longitude/latitude are left out) and the column we predict
FEATURES = [
    "housing_median_age",
    "total_rooms",
    "total_bedrooms",
    "population",
    "households",
    "median_income",
]
TARGET = "median_house_value"

class CaliforniaHousingDataSet(torch.utils.data.Dataset):
    def __init__(self, split: str, means=None, stds=None):
        # Colab ships these CSVs in sample_data/ (split is "train" or "test")
        self.data = pd.read_csv(f"/content/sample_data/california_housing_{split}.csv")

        # Standardize every column (including the target) to mean 0 / std 1.
        # The dev set must reuse the *training* statistics so both splits are scaled identically.
        if means is None and stds is None:
            self.means = self.data.mean()
            self.stds = self.data.std()
        else:
            self.means = means
            self.stds = stds

        self.data = (self.data - self.means) / self.stds

        # Convert to tensors once up front instead of on every __getitem__ call
        self.x = torch.tensor(self.data[FEATURES].to_numpy(), dtype=torch.float32)
        self.y = torch.tensor(self.data[[TARGET]].to_numpy(), dtype=torch.float32)

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]  # ([6] features, [1] target)

    def __len__(self):
        return len(self.data)

    def get_stats(self):
        return {"means": self.means, "stds": self.stds}

train_ds = CaliforniaHousingDataSet(split="train")
dev_ds = CaliforniaHousingDataSet(split="test", **train_ds.get_stats())
print(dev_ds[0])


(tensor([-0.1263,  0.5694,  0.2885,  0.0936,  0.2725,  1.4280]), tensor([1.1846]))
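The dataset standardizes every column as z = (x - mean) / std, and the dev split reuses the training means and standard deviations. That matters for two reasons. The model only ever sees numbers on the training set's scale, and computing statistics on the dev set would leak information about it into preprocessing. Because the target is standardized too, the model's outputs come back in "standard deviations from the training mean", and undoing that takes one line. Here is a small pandas sketch on made-up numbers; the helper names are hypothetical.

```python
import pandas as pd

# Made-up miniature train/dev splits with two of the housing columns
train = pd.DataFrame({"median_income": [2.0, 4.0, 6.0], "median_house_value": [100.0, 200.0, 300.0]})
dev = pd.DataFrame({"median_income": [4.0, 8.0], "median_house_value": [200.0, 500.0]})

# Statistics come from the training split only (pandas .std() uses ddof=1, as in the dataset class)
means, stds = train.mean(), train.std()

def standardize(df):
    return (df - means) / stds

def unstandardize(z, column):
    """Map a standardized value (e.g. a model prediction) back to the column's original units."""
    return z * stds[column] + means[column]

dev_z = standardize(dev)
print(dev_z)  # median_income -> 0.0, 2.0 ; median_house_value -> 0.0, 3.0

print(unstandardize(dev_z["median_house_value"], "median_house_value").tolist())  # [200.0, 500.0]
```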


In [None]:
from tqdm import tqdm
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

epochs = 25        # number of full passes over the training set
lr = 0.001         # learning rate: the step size for each SGD update
activation = "ReLU"
hidden_size = 128
n_layers = 3
batch_size = 64

model = NeuralNetwork(
    in_size=6,               # 6 input features
    out_size=1,              # 1 value to predict (median_house_value)
    n_layers=n_layers,
    hidden_size=hidden_size,
    act=activation,
)

train_loader = DataLoader(train_ds, batch_size=batch_size, shuffle=True)
dev_loader = DataLoader(dev_ds, batch_size=batch_size, shuffle=False)

criterion = nn.MSELoss()  # mean squared error, the standard regression loss
optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)

for i in range(epochs):
    model.train()
    running_loss = 0
    for features, label in tqdm(train_loader, desc="Training"):
        optimizer.zero_grad()
        prediction = model(features)       # [batch, 1]
        loss = criterion(prediction, label)
        running_loss += loss.item()
        loss.backward()
        optimizer.step()
    print(f"Train loss at epoch {i}: {running_loss / len(train_loader)}")

    model.eval()
    running_loss = 0
    with torch.no_grad():
        for features, label in tqdm(dev_loader, desc="Evaluating"):
            prediction = model(features)
            loss = criterion(prediction, label)
            running_loss += loss.item()
    print(f"Dev loss at epoch {i}: {running_loss / len(dev_loader)}")

Train loss at epoch 0: 0.8747154740910781
Dev loss at epoch 0: 0.5306431828661168
Train loss at epoch 1: 0.4345493591705659
Dev loss at epoch 1: 0.40021150257993254
Train loss at epoch 2: 0.39166920973842306
Dev loss at epoch 2: 0.3783680910759784
Train loss at epoch 3: 0.3711742468010214
Dev loss at epoch 3: 0.36138272887848794
Train loss at epoch 4: 0.3548066226723499
Dev loss at epoch 4: 0.3486702283646198
Train loss at epoch 5: 0.34186562044279917
Dev loss at epoch 5: 0.3399134401945358
Train loss at epoch 6: 0.3349866613857728
Dev loss at epoch 6: 0.33200565202439086
Train loss at epoch 7: 0.32993333937978386
Dev loss at epoch 7: 0.328103908515991
Train loss at epoch 8: 0.32489247272785443
Dev loss at epoch 8: 0.3267031578307456
Train loss at epoch 9: 0.32230657848872635
Dev loss at epoch 9: 0.324058398287347
Train loss at epoch 10: 0.3204279836911456
Dev loss at epoch 10: 0.32493494292523
Train loss at epoch 11: 0.3192096860673195
Dev loss at epoch 11: 0.32103194613406
Train loss at epoch 12: 0.3172427890332122
Dev loss at epoch 12: 0.31901441879094916
Train loss at epoch 13: 0.31524862466674103
Dev loss at epoch 13: 0.31827542217487986
Train loss at epoch 14: 0.3122770813174714
Dev loss at epoch 14: 0.3158277598150233
Train loss at epoch 15: 0.311990766821051
Dev loss at epoch 15: 0.31962089684415373
Train loss at epoch 16: 0.30940981164462583
Dev loss at epoch 16: 0.31868485662531343
Train loss at epoch 17: 0.31028182104341967
Dev loss at epoch 17: 0.3184686417275287
Train loss at epoch 18: 0.3091461539436552
Dev loss at epoch 18: 0.3309156923217976
Train loss at epoch 19: 0.30942546458620773
Dev loss at epoch 19: 0.3130246939811301
Train loss at epoch 20: 0.30746845001901
Dev loss at epoch 20: 0.3133827788715667
Train loss at epoch 21: 0.30560012671508285
Dev loss at epoch 21: 0.31730280578770537
Train loss at epoch 22: 0.3055685411830594
Dev loss at epoch 22: 0.3136649323587722
Train loss at epoch 23: 0.3062130804908903
Dev loss at epoch 23: 0.310473843774897
Train loss at epoch 24: 0.30399475493153233
Dev loss at epoch 24: 0.31875840954958123
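Because `median_house_value` was standardized with the training statistics, these losses are in squared standard deviations of the target, not squared dollars. A rough reference point: a model that always predicts the training mean (0 in standardized units) would score an MSE equal to the variance of the standardized targets. That is 1.0 on the training split, and probably close to 1 on the dev split if the two splits are similarly distributed. Under that assumption, a dev MSE near 0.31 means the network explains very roughly two thirds of the variance. To express the error in the target's original units, take the square root and multiply by the training standard deviation of the target (available here as `train_ds.get_stats()["stds"]["median_house_value"]`). The helpers below are a hedged sketch with names of our own choosing, shown on illustrative numbers rather than this run's.

```python
import math

def rmse_in_original_units(standardized_mse, target_std):
    """sqrt(MSE) in standardized units, scaled back by the target's training std."""
    return math.sqrt(standardized_mse) * target_std

def fraction_of_variance_explained(model_mse, baseline_mse=1.0):
    """1 - MSE / MSE of always predicting the mean (roughly an R^2 score)."""
    return 1.0 - model_mse / baseline_mse

# Illustrative numbers only: a standardized MSE of 0.25 with a target std of 100
print(rmse_in_original_units(0.25, 100.0))   # 50.0
print(fraction_of_variance_explained(0.25))  # 0.75
```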



