# MNIST Model From "Scratch" -- Attempt 4

Goal: To reproduce SGD "from scratch", starting from the point in the book where the data has been loaded.

This means creating the following:
- parameter initialization
- linear net function
- loss function
- metric function (accuracy)
- step function
- function to train a single epoch

I'm starting from the loaded data because it's a more efficient way to learn this part. In earlier attempts I got caught up in the data ingestion step.

### Copied Piece
Mostly written my own way, but checking shapes against the reference along the way.

In [1]:
from fastai.data.all import untar_data, URLs
from pathlib import Path
from PIL import Image
import torch
from numpy import array  # only array() is used; avoids shadowing builtins such as sum/any/all

path = untar_data(URLs.MNIST_SAMPLE)
Path.BASE_PATH = path
threes = (path/'train'/'3').ls().sorted()
sevens = (path/'train'/'7').ls().sorted()
seven_tensors = [torch.as_tensor(array(Image.open(o))) for o in sevens]
three_tensors = [torch.as_tensor(array(Image.open(o))) for o in threes]

In [2]:
stacked_sevens = torch.stack(seven_tensors).float() / 255
stacked_threes = torch.stack(three_tensors).float() / 255
stacked_sevens.shape, stacked_threes.shape

(torch.Size([6265, 28, 28]), torch.Size([6131, 28, 28]))

In [3]:
train_x = torch.cat([stacked_threes, stacked_sevens]).view(-1, 28*28)
train_x.shape

torch.Size([12396, 784])

In [4]:
# This might be where I messed up last time...

train_y = torch.as_tensor(
    array([1]*len(threes) + [0]*len(sevens))
).unsqueeze(1)
train_x.shape,train_y.shape

(torch.Size([12396, 784]), torch.Size([12396, 1]))

In [5]:
dset = list(zip(train_x,train_y))
x,y = dset[0]
x.shape,y

(torch.Size([784]), tensor([1]))

In [6]:
valid_3_tens = torch.stack([torch.as_tensor(array(Image.open(o))) 
                            for o in (path/'valid'/'3').ls()])
valid_3_tens = valid_3_tens.float()/255
valid_7_tens = torch.stack([torch.as_tensor(array(Image.open(o))) 
                            for o in (path/'valid'/'7').ls()])
valid_7_tens = valid_7_tens.float()/255
valid_3_tens.shape,valid_7_tens.shape

(torch.Size([1010, 28, 28]), torch.Size([1028, 28, 28]))

In [7]:
valid_x = torch.cat([valid_3_tens, valid_7_tens]).view(-1, 28*28)
valid_y = torch.as_tensor(array([1]*len(valid_3_tens) + [0]*len(valid_7_tens))).unsqueeze(1)
valid_dset = list(zip(valid_x,valid_y))

### Starting Without Reference Here

In [8]:
def param_init(shape): return torch.randn(shape).requires_grad_()

In [144]:
weights = param_init((28*28, 1))
bias = param_init(1)
weights.shape, bias.shape

(torch.Size([784, 1]), torch.Size([1]))

In [146]:
weights[0:5], bias

(tensor([[-0.3417],
         [-0.5647],
         [ 0.2852],
         [ 0.7432],
         [ 0.6148]], grad_fn=<SliceBackward0>),
 tensor([0.0722], requires_grad=True))

In [10]:
def mnist_loss(preds, tars): return torch.where(tars == 1, 1-preds, preds).mean()

In [11]:
def accuracy(preds, tars): return ((preds > 0.5) == tars).float().mean()

In [129]:
test_preds = torch.as_tensor([0.4, 0.7, 0.1])
test_tars = torch.as_tensor([0, 0, 1])
test_loss = mnist_loss(test_preds, test_tars)
test_acc = accuracy(test_preds, test_tars)
test_loss.item(), test_acc.item()

(0.6666666865348816, 0.3333333432674408)
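
The two numbers above can be checked by hand. The sketch below redoes the same arithmetic in plain Python, without PyTorch, so each step is visible. The helper names `loss_by_hand` and `accuracy_by_hand` are made up for this illustration and are not part of the notebook.

```python
# Plain-Python re-derivation of the check above; the helper names are hypothetical.
def loss_by_hand(preds, tars):
    # For a target of 1 the per-item loss is 1 - pred; for a target of 0 it is pred.
    per_item = [1 - p if t == 1 else p for p, t in zip(preds, tars)]
    return sum(per_item) / len(per_item)

def accuracy_by_hand(preds, tars):
    # A prediction counts as class 1 when it is above 0.5.
    hits = [(p > 0.5) == (t == 1) for p, t in zip(preds, tars)]
    return sum(hits) / len(hits)

test_preds = [0.4, 0.7, 0.1]
test_tars = [0, 0, 1]
print(loss_by_hand(test_preds, test_tars))      # (0.4 + 0.7 + 0.9) / 3
print(accuracy_by_hand(test_preds, test_tars))  # only the first item is right: 1/3
```

Only the first prediction lands on the correct side of 0.5, which is why the accuracy is one third while the loss stays high.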

In [92]:
def step(lr=1):
    for p in (weights, bias):
        p.data -= p.grad * lr
        p.grad = None
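
To see the update rule from `step` in isolation, here is a toy sketch in plain Python. It is not the notebook's model: the objective `f(w) = (w - 3)**2` and the names `grad_f` and `sgd_step` are assumptions chosen so the gradient can be written by hand.

```python
# Toy illustration of the update rule used in step(); not the notebook's model.
def grad_f(w):
    # Gradient of the hypothetical objective f(w) = (w - 3)**2.
    return 2 * (w - 3)

def sgd_step(w, lr):
    # Same rule as step(): move against the gradient, scaled by the learning rate.
    return w - grad_f(w) * lr

w = 0.0
for _ in range(50):
    w = sgd_step(w, lr=0.1)
print(w)  # approaches the minimum at w = 3
```

In the PyTorch version the gradient also has to be cleared after each update (`p.grad = None`), because `backward()` adds into `.grad` rather than overwriting it. The toy version recomputes the gradient from scratch each time, so it has nothing to reset.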

In [31]:
def linNet(xb): return xb@weights + bias
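
As a shape check on `xb@weights + bias`, the sketch below does the same computation in plain Python on a tiny made-up batch. The function name `lin_net_by_hand` and all the numbers are illustrative assumptions. The real batch has 784 features per item, not 3.

```python
# Plain-Python version of xb @ weights + bias for a tiny made-up batch.
def lin_net_by_hand(xb, weights, bias):
    # xb: list of rows (batch x features); weights: one value per feature.
    # Each row is reduced to a single activation: dot(row, weights) + bias.
    return [[sum(x * w for x, w in zip(row, weights)) + bias] for row in xb]

xb = [[1.0, 2.0, 3.0],
      [0.0, 1.0, 0.0]]          # 2 items, 3 features each
weights = [0.5, -1.0, 2.0]      # stands in for the (784, 1) weight column
bias = 0.25
out = lin_net_by_hand(xb, weights, bias)
print(out)  # one activation per item, so the result has shape (2, 1)
```

The same bias is added to every row, which mirrors how the single-element `bias` tensor is broadcast across the batch.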

In [32]:
from fastai.data.load import DataLoader

dset = DataLoader(dset, bs=256)
dset.one_batch()[0].shape, dset.one_batch()[1].shape

(torch.Size([256, 784]), torch.Size([256, 1]))
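
A quick sketch of what batching at `bs=256` means for the 12396 training rows, in plain Python. It assumes the final partial batch is kept rather than dropped. The `batches` helper is a hypothetical stand-in, not the fastai `DataLoader`.

```python
import math

n_items = 12396   # rows in train_x, from the shape printed earlier
bs = 256          # batch size passed to DataLoader

# Assumes the last, shorter batch is kept rather than dropped.
n_batches = math.ceil(n_items / bs)
last_batch = n_items - (n_batches - 1) * bs
print(n_batches, last_batch)

def batches(items, bs):
    # Yield consecutive slices of length bs; the last one may be shorter.
    for i in range(0, len(items), bs):
        yield items[i:i + bs]

sizes = [len(b) for b in batches(list(range(n_items)), bs)]
print(sizes[0], sizes[-1], len(sizes))
```

Under that assumption, one pass over the data gives one parameter update per batch, so an epoch applies a few dozen updates instead of one.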

In [100]:
def one_epoch():
    for xb, yb in dset:
        preds = linNet(xb).sigmoid_()
        loss = mnist_loss(preds, yb)
        # print(f"preds: {preds[10:12]}, yb: {yb[10:12]}, loss: {loss}")
        loss.backward()
        step()

In [134]:
def get_accuracy():
    with torch.no_grad():
        acc = torch.as_tensor([accuracy(linNet(xb).sigmoid_(), yb) for xb, yb in valid_dset]).mean()
    return acc

In [135]:
get_accuracy()

tensor(0.9637)

In [136]:
one_epoch()
get_accuracy()

tensor(0.9642)

In [137]:
def run_n_epochs(n):
    for i in range(n):
        one_epoch()
        print(f"acc: {round(get_accuracy().item(), 4)}, loss: {round(calc_loss().item(), 4)}")

In [143]:
def calc_loss():
    with torch.no_grad():
        return torch.as_tensor([mnist_loss(linNet(xb).sigmoid_(), yb) for xb, yb in dset]).mean()

In [145]:
run_n_epochs(10)

acc: 0.55, loss: 0.4154
acc: 0.7738, loss: 0.2103
acc: 0.8945, loss: 0.1075
acc: 0.9264, loss: 0.0759
acc: 0.9401, loss: 0.0614
acc: 0.947, loss: 0.0526
acc: 0.9524, loss: 0.0467
acc: 0.9588, loss: 0.0425
acc: 0.9617, loss: 0.0393
acc: 0.9652, loss: 0.0369


In [133]:
# Figuring out why the loss is negative...

pred = linNet(valid_dset[0][0])
tar = valid_dset[0][1]
pred, tar, mnist_loss(pred, tar)

(tensor([14.8719], grad_fn=<AddBackward0>),
 tensor([1]),
 tensor(-13.8719, grad_fn=<MeanBackward0>))

### Done!

`calc_loss` was returning a negative value because I was applying the sigmoid only during training, not in the loss and accuracy checks. Without the sigmoid the raw activations fall outside (0, 1), so `1 - pred` can go negative. I think that's the only major thing I missed here.
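
The debugging cell's numbers can be reproduced in plain Python. The sketch below takes the raw activation printed there (14.8719, target 1) and compares the per-item loss with and without a sigmoid. The helper names `sigmoid` and `item_loss` are illustrative only.

```python
import math

def sigmoid(z):
    # Squash any real activation into the open interval (0, 1).
    return 1 / (1 + math.exp(-z))

def item_loss(pred, tar):
    # Per-item version of mnist_loss: 1 - pred for target 1, pred for target 0.
    return 1 - pred if tar == 1 else pred

raw = 14.8719   # raw linNet output from the debugging cell
target = 1

loss_without_sigmoid = item_loss(raw, target)
loss_with_sigmoid = item_loss(sigmoid(raw), target)
print(loss_without_sigmoid)  # about -13.8719, matching the negative loss seen earlier
print(loss_with_sigmoid)     # small and non-negative
```

Because the sigmoid output always lies strictly between 0 and 1, both branches of the loss stay in that range once it is applied.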

Good session!

Next time: Working out the last bit -- using a Learner and PyTorch built-in nets.