# Chapter 4, Exercise 1: Implement your own Learner

> Create your own implementation of Learner from scratch, based on the training loop shown in this chapter.

As a reminder, the loop is:

- Init
- Predict
- Loss
- Gradient
- Step
- Stop
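
To make those steps concrete before wiring them into a class, here is a minimal sketch of the same loop on a toy problem. The data (`y = 3x + 2`), the learning rate, and the epoch count are all made up for illustration; none of it comes from the book's MNIST example.

```python
import torch

torch.manual_seed(0)

# Toy data: y = 3x + 2 (hypothetical, not the MNIST data used below)
x = torch.linspace(-1, 1, 100).unsqueeze(1)
y = 3 * x + 2

# Init: parameters that track gradients
w = torch.randn(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
lr = 0.1

losses = []
for epoch in range(200):                 # Stop: after a fixed number of epochs
    preds = x * w + b                    # Predict
    loss = ((preds - y) ** 2).mean()     # Loss (mean squared error)
    loss.backward()                      # Gradient
    with torch.no_grad():                # Step: plain gradient descent
        w -= lr * w.grad
        b -= lr * b.grad
        w.grad.zero_()                   # reset so gradients don't accumulate
        b.grad.zero_()
    losses.append(loss.item())

print(w.item(), b.item())
```

The learner class further down is meant to do the same thing, with the model, the loss function, and the optimizer taking over the hand-written parts.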

Let's start with the boilerplate:

In [1]:
%matplotlib inline
from matplotlib import pyplot as plt
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.set_printoptions(edgeitems=2)
torch.manual_seed(42) # Life, the Universe, and Everything

<torch._C.Generator at 0x7fc2d6eb8c90>

I'll use the signature from the book; however, for now I'm going to leave out metrics.  I may come back to this later.

Let's create our model.  [Weights are initialized for us](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html#torch.nn.Linear).

In [19]:
my_model = nn.Sequential(
    nn.Linear(in_features=28*28, out_features=30),
    nn.ReLU(),
    nn.Linear(30, 1),
    nn.Sigmoid()
)
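
As a quick sanity check on that initialization, this sketch builds a standalone layer with the same dimensions as the first one above and looks at its parameters. The bound of `1/sqrt(in_features)` is my reading of the linked docs; the variable names are just for illustration.

```python
import math
import torch
import torch.nn as nn

# Standalone layer with the same dimensions as the model's first layer
layer = nn.Linear(in_features=28*28, out_features=30)

# Per the nn.Linear docs, weights and biases are drawn uniformly from
# (-sqrt(k), sqrt(k)) with k = 1 / in_features
bound = 1 / math.sqrt(layer.in_features)

print(layer.weight.shape, layer.bias.shape)
print(layer.weight.abs().max().item() <= bound)
```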

Next, the data loader...which I guess means we'll need some data.  We'll use the fastai `MNIST_SAMPLE` dataset, which contains only images of 3s and 7s.

In [3]:
from fastai.vision.all import *
path = untar_data(URLs.MNIST_SAMPLE)
path.ls()

(#3) [Path('/home/aardvark/.fastai/data/mnist_sample/labels.csv'),Path('/home/aardvark/.fastai/data/mnist_sample/valid'),Path('/home/aardvark/.fastai/data/mnist_sample/train')]

Let's load those into tensors:

In [4]:
training_3 = torch.stack([tensor(Image.open(o)) for o in (path /'train/3').ls().sorted()]).float() / 255.0
training_7 = torch.stack([tensor(Image.open(o)) for o in (path /'train/7').ls().sorted()]).float() / 255.0
len(training_3), len(training_7)

(6131, 6265)

In [5]:
train_x = torch.cat([training_3, training_7]).view(-1, 28*28)
train_x.shape

torch.Size([12396, 784])

Time for some labels.

In [6]:
train_y = tensor([1] * len(training_3) + [0] * len(training_7))
train_y.shape

torch.Size([12396])

Now time for the loader:

In [7]:
dset = list(zip(train_x, train_y))

In [8]:
valid_3 = torch.stack([tensor(Image.open(o)) for o in (path / 'valid/3').ls().sorted()]).float() / 255.0
valid_7 = torch.stack([tensor(Image.open(o)) for o in (path / 'valid/7').ls().sorted()]).float() / 255.0
valid_x = torch.cat([valid_3, valid_7]).view(-1, 28*28)
valid_y = tensor([1] * len(valid_3) + [0] * len(valid_7))
valid_dset = list(zip(valid_x, valid_y))
valid_x.shape, valid_y.shape

(torch.Size([2038, 784]), torch.Size([2038]))

In [9]:
dl = DataLoader(dset, batch_size=256)
xb, yb = first(dl)
xb.shape, yb.shape

(torch.Size([256, 784]), torch.Size([256]))
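
One thing worth keeping an eye on: `yb` has shape `[256]`, while the model above ends in `nn.Linear(30, 1)` and so should produce output of shape `[256, 1]`. Comparing tensors of those two shapes broadcasts instead of matching element by element. A small sketch with random stand-in tensors (not the real data):

```python
import torch

torch.manual_seed(0)

preds = torch.rand(256, 1)               # stand-in for model output, shape [256, 1]
targets = torch.randint(0, 2, (256,))    # stand-in for yb, shape [256]

# [256, 1] against [256] broadcasts to a full 256 x 256 grid of comparisons
mismatched = (preds > 0.5) == targets
print(mismatched.shape)

# Giving the targets a trailing dimension lines them up one-to-one
aligned = (preds > 0.5) == targets.unsqueeze(1)
print(aligned.shape)
```

If this turns out to matter, one option would be to build the labels with `.unsqueeze(1)` so they have shape `[N, 1]`.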

In [10]:
valid_dl = DataLoader(valid_dset, batch_size=256)

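Next up is the optimizer.  I'm going to use the PyTorch SGD optimizer here.  Note that this stores the class itself; the learner instantiates it later with the model's parameters: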

In [11]:
my_optimizer = optim.SGD
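
For reference, here is roughly how an optimizer class like this gets instantiated and used for a single step. The tiny `nn.Linear(4, 1)` model and the all-ones input are hypothetical stand-ins, not part of the exercise.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

model = nn.Linear(4, 1)                      # hypothetical tiny model
opt_cls = optim.SGD                          # the class, not yet an instance
opt = opt_cls(model.parameters(), lr=0.1)    # instantiate with the parameters to update

before = model.weight.detach().clone()

loss = model(torch.ones(2, 4)).sum()
loss.backward()      # populate .grad on each parameter
opt.step()           # update parameters using those gradients
opt.zero_grad()      # clear gradients before the next batch

changed = not torch.equal(before, model.weight)
print(changed)
```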

Now it's time to try some training!

In [20]:
class MyLearner():
    
    def __init__(self, dl, model, opt):
        self.dl_train = dl[0]
        self.dl_valid = dl[1]
        self.model = model
        self.opt = opt(self.model.parameters(), lr=0.1)
        
    def mnist_loss(self, preds, targets):
        preds = preds.sigmoid()
        return torch.where(targets==1, 1-preds, preds).mean()
    
    def batch_accuracy(self, xb, yb):
        preds = xb.sigmoid()
        correct = (preds > 0.5) == yb
        return correct.float().mean()
    
    def validate_epoch(self):
        accs = [self.batch_accuracy(self.model(x), y) for x, y in self.dl_valid]
        return round(torch.stack(accs).mean().item(), 4)
    
    def cal_grad(self, x, y):
        preds = self.model(x)
        loss = self.mnist_loss(preds, y)
        loss.backward()  # must be called; a bare `loss.backward` computes no gradients
    
    def train_epoch(self):
        for x, y in self.dl_train:
            self.cal_grad(x, y)
            self.opt.step()
            self.opt.zero_grad()
            
    def fit(self, epochs):
        for i in range(epochs):
            self.train_epoch()
            print(self.validate_epoch(), end=" ")

And now to put it all together:

In [21]:
my_learner = MyLearner([dl, valid_dl], my_model, my_optimizer)

Now let's try it out!

In [22]:
my_learner.fit(20)

0.4932 0.4932 0.4932 0.4932 0.4932 0.4932 0.4932 0.4932 0.4932 0.4932 0.4932 0.4932 0.4932 0.4932 0.4932 0.4932 0.4932 0.4932 0.4932 0.4932 

# Status

This approach came from [here](https://forums.fast.ai/t/chapter-4-further-research-building-a-learner-from-scratch/78474), and seems to be the only discussion of this exercise on the forum.  The implementation is a great deal simpler than mine, so I thought it was worth trying.

The poster was asking about one problem they encountered:

> I have used the SGD directly as my optimizer, when I fit the learner without the “self.opt.zero_grad” step in the “train_epoch” method it works fine(getting me a score above 0.96 - 0.97- which it should actually do) but when I run it with the “self.opt.zero_grad” step it kind of sticks at one point getting a value eg:0.4957 for “n” number of epochs.

That is:

- using SGD directly, with the `opt.zero_grad()` line in `train_epoch` commented out, accuracy is above 0.96, as expected;
- with `opt.zero_grad()` uncommented, accuracy gets stuck at a single value (0.4957 in their case).
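
For context on what `zero_grad()` is for: PyTorch adds each new gradient to whatever is already stored in `.grad`, so without a reset the gradients from successive batches pile up. A minimal sketch with a single made-up parameter:

```python
import torch

w = torch.tensor([1.0], requires_grad=True)

(w * 2).sum().backward()
first = w.grad.clone()          # gradient of 2*w is 2

(w * 2).sum().backward()
accumulated = w.grad.clone()    # no reset in between: 2 + 2 = 4

w.grad.zero_()                  # what opt.zero_grad() does for each parameter
(w * 2).sum().backward()
after_reset = w.grad.clone()    # back to 2

print(first, accumulated, after_reset)
```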

However, I'm seeing the stuck accuracy no matter which of those two I use.  Not sure what's going wrong.

I did note what looks like a double sigmoid: one in the last layer of the model and another in the class methods.  Some brief experimentation with that did not appear to change things.
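
One possible reading of the stuck value, offered as a sketch rather than a diagnosis: a sigmoid applied to something already in (0, 1) always lands above 0.5, so `preds > 0.5` would be true for every image and the "accuracy" would just be the per-batch fraction of 3s, averaged over batches. The 1010/1028 split of the validation set below is an assumption taken from the fastai book, not something shown in this notebook; it does match the 2038 total above.

```python
import torch

# Anything that has been through one sigmoid lies in (0, 1);
# a second sigmoid then maps it to something strictly above 0.5.
once = torch.linspace(-10, 10, 101).sigmoid()
twice = once.sigmoid()
print(twice.min().item() > 0.5)

# Assumed validation labels: 1010 threes then 1028 sevens, unshuffled,
# in batches of 256 (as in valid_dl above).
labels = torch.cat([torch.ones(1010), torch.zeros(1028)])

# If every prediction is "3", per-batch accuracy is the fraction of 3s in the batch.
always_three = [batch.mean() for batch in labels.split(256)]
stuck_value = round(torch.stack(always_three).mean().item(), 4)
print(stuck_value)   # 0.4932
```

If that assumption holds, this reproduces the 0.4932 printed above regardless of what the weights are, which would also explain why the number never moves between epochs.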