# That TensorFlow playground

Who will win?

https://playground.tensorflow.org/

Use the simple dataset in the bottom left, and give your network **no hidden layers**.

# Review

* What does a linear regression look like (function)? $f(x) = X\vec{w} + b$ or $y = w_1 x_1 + w_2 x_2 + ... + b$
* When we fit a linear regression using a simple neural network:
  * What is the width of the input layer? for $n$ features, $n + 1$ (+ 1 for the bias)
  * What is the width of the output layer? 1 (because simple linear regression)
  * What type of neural network is it, in terms of connections? feedforward
  * What are the parameters? the weights and the bias
  * What does the loss function look like (function)? MSE (mean squared error), or $\frac{1}{n}\sum_{i=1}^{n} \frac{1}{2}(y_i-\hat{y}_i)^2$ (we could use an exponent of 4, 6, etc., or the absolute value, but not odd exponents, because the sign of the error shouldn't matter)
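
As a quick check on the two formulas above, here is a minimal sketch in plain Python (no PyTorch, so the arithmetic stays visible). The weights, bias, and data points are made-up values for illustration only, and `predict` and `half_mse` are hypothetical helper names.

```python
# Made-up parameters and data, for illustration only.
w = [2.0, -1.0]   # one weight per feature
b = 0.5           # bias

X = [[1.0, 2.0], [3.0, 0.0], [0.0, 1.0]]  # 3 examples, 2 features each
y = [1.0, 6.0, -1.0]                      # targets


def predict(x, w, b):
    """Linear model: w_1 * x_1 + w_2 * x_2 + ... + b."""
    return sum(w_j * x_j for w_j, x_j in zip(w, x)) + b


def half_mse(y_hat, y):
    """Mean over the examples of 1/2 * (y - y_hat)^2."""
    return sum(0.5 * (y_i - p_i) ** 2 for y_i, p_i in zip(y, y_hat)) / len(y)


y_hat = [predict(x, w, b) for x in X]
loss = half_mse(y_hat, y)
print(y_hat, loss)
```

Every prediction here is off by 0.5 in one direction or the other, and squaring makes the sign irrelevant, so each example contributes the same amount to the loss.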



## Deep Dive on Gradient Descent

First, we install and import required libraries.

In [None]:
!pip install torch torchvision
!pip install d2l==1.0.0b0

In [None]:
%matplotlib inline
import torch
from d2l import torch as d2l

Alex Strick van Linschoten has a great overview of [gradient descent](https://mlops.systems/posts/2022-05-12-seven-steps-gradient-calculations.html). Let's take a look.
1. Initialise a set of weights
2. Use the weights to make a prediction
3. Loss: see how well we did with our predictions
4. Calculate the gradients across all the weights
5. ‘Step’: Update the weights
6. Repeat starting at step 2
7. Iterate until we decide to stop
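
The seven steps above can be sketched on a tiny one-feature linear regression. This is a minimal illustration in plain Python, not the d2l implementation: it uses the whole dataset for every update (plain, full-batch gradient descent), and the data, learning rate, and step count are assumed values chosen so that the loop converges.

```python
import random

random.seed(0)

# Made-up data generated from y = 3x + 1 (no noise), for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3.0 * x + 1.0 for x in xs]

# Step 1: initialise the weights (here one weight and one bias).
w = random.gauss(0.0, 0.01)
b = 0.0
lr = 0.05          # learning rate (assumed value)
max_steps = 2000   # Step 7: we decide to stop after a fixed number of steps

for step in range(max_steps):            # Step 6: repeat starting at step 2
    # Step 2: use the weights to make predictions.
    preds = [w * x + b for x in xs]
    # Step 3: loss, the mean of 1/2 * squared error.
    loss = sum(0.5 * (p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)
    # Step 4: gradients of the loss with respect to w and b.
    grad_w = sum((p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
    grad_b = sum(p - y for p, y in zip(preds, ys)) / len(xs)
    # Step 5: take a step against the gradient.
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b, loss)
```

With these settings `w` and `b` end up very close to the values 3 and 1 used to generate the data. In the classes below, Step 4 is handled by PyTorch's `backward()` rather than by hand-written gradient formulas.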







With this in mind, let's take a deeper look at those Module and Trainer classes we have been subclassing.

1. Annotate (comment) these classes with the location of each of these seven steps in basic gradient descent.

2. Annotate the code that implements the *stochastic* part of what we are doing.

3. Annotate the code that implements *minibatch*. (What are the alternatives to minibatch that we have discussed so far?)

In [None]:
class Trainer(d2l.HyperParameters):  
    """The base class for training models with data. From https://d2l.ai/chapter_linear-regression/oo-design.html. Augmented with prepare_batch and fit_epoch for minibatch SGD."""
    def __init__(self, max_epochs, num_gpus=0, gradient_clip_val=0):
        self.save_hyperparameters()
        assert num_gpus == 0, 'No GPU support yet'

    def prepare_data(self, data):
        # pytorch defines a dataloader class
        self.train_dataloader = data.train_dataloader()
        self.val_dataloader = data.val_dataloader()
        self.num_train_batches = len(self.train_dataloader)
        self.num_val_batches = (len(self.val_dataloader)
                                if self.val_dataloader is not None else 0)

    def prepare_model(self, model):
        # a model has a trainer and a trainer has a model
        model.trainer = self
        # set up the plot
        model.board.xlim = [0, self.max_epochs]
        self.model = model

    def fit(self, model, data):
        # to fit, we need a model and data
        self.prepare_data(data)
        self.prepare_model(model)
        self.optim = model.configure_optimizers()
        self.epoch = 0
        self.train_batch_idx = 0
        self.val_batch_idx = 0
        # Step 6 and Step 7
        # no early stopping
        for self.epoch in range(self.max_epochs):
            # truly stochastic: shuffle the training data
            self.fit_epoch()
            # early stopping: if loss hasn't changed return

    def prepare_batch(self, batch):
        return batch

    def fit_epoch(self):
        self.model.train()
        # Minibatch
        for batch in self.train_dataloader:
            loss = self.model.training_step(self.prepare_batch(batch))
            self.optim.zero_grad()
            with torch.no_grad():
                # Step 4 happens in backward; also Step 5
                loss.backward()
                if self.gradient_clip_val > 0:  # To be discussed later
                    self.clip_gradients(self.gradient_clip_val, self.model)
                self.optim.step()
            self.train_batch_idx += 1
        # validation (or testing)
        if self.val_dataloader is None:
            return
        self.model.eval()
        for batch in self.val_dataloader:
            with torch.no_grad():
                self.model.validation_step(self.prepare_batch(batch))
            self.val_batch_idx += 1

class SGD(d2l.HyperParameters):  #@save
    "Our SGD class, from https://d2l.ai/chapter_linear-regression/linear-regression-scratch.html."
    def __init__(self, params, lr):
        """Minibatch stochastic gradient descent."""
        self.save_hyperparameters()

    # Step 5: update each parameter using its gradient
    def step(self):
        for param in self.params:
            param -= self.lr * param.grad

    def zero_grad(self):
        for param in self.params:
            if param.grad is not None:
                param.grad.zero_()
                
class LinearRegressionScratch(d2l.Module):  #@save
    "Our linear regression class, from https://d2l.ai/chapter_linear-regression/linear-regression-scratch.html, with SGD as the optimizer and with all the functions from the base Module class added."
    def __init__(self, num_inputs, lr, sigma=0.01):
        super().__init__()
        self.save_hyperparameters()
        self.board = d2l.ProgressBoard()
        # Step 1 initialize weights and bias
        self.w = torch.normal(0, sigma, (num_inputs, 1), requires_grad=True)
        print(self.w)
        self.b = torch.zeros(1, requires_grad=True)
        print(self.b)

    def plot(self, key, value, train):
        """Plot a point in animation."""
        assert hasattr(self, 'trainer'), 'Trainer is not inited'
        self.board.xlabel = 'epoch'
        if train:
            x = self.trainer.train_batch_idx / \
                self.trainer.num_train_batches
            n = self.trainer.num_train_batches / \
                self.plot_train_per_epoch
        else:
            x = self.trainer.epoch + 1
            n = self.trainer.num_val_batches / \
                self.plot_valid_per_epoch
        self.board.draw(x, value.to(d2l.cpu()).detach().numpy(),
                        ('train_' if train else 'val_') + key,
                        every_n=int(n))

    def configure_optimizers(self):
        return SGD([self.w, self.b], self.lr)
        
    def training_step(self, batch):
        l = self.loss(self(*batch[:-1]), batch[-1])
        self.plot('loss', l, train=True)
        return l

    def validation_step(self, batch):
        l = self.loss(self(*batch[:-1]), batch[-1])
        self.plot('loss', l, train=False)

    # Step 2 use the weights to make a prediction
    def forward(self, X):
        return torch.matmul(X, self.w) + self.b

    # Step 3 calculate the loss
    def loss(self, y_hat, y):
        l = (y_hat - y) ** 2 / 2
        return l.mean()



Let's run it!

We implement a reader for CSV data.

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split

class CsvData(d2l.DataModule):  #@save
    def __init__(self, labelColIndex, path, batch_size=32):
        super().__init__()
        self.save_hyperparameters()
        # read the data
        df = pd.read_csv(path)
        # drop any non-numeric columns
        df = df.select_dtypes(include='number')
        # drop the label column from the features
        colIndices = list(range(df.shape[1]))
        colIndices.pop(labelColIndex)
        features = df.iloc[:, colIndices]
        # keep it in the label, obviously :)
        labels = df.iloc[:, labelColIndex]
        # split the dataset
        self.train, self.val, self.train_y, self.val_y = train_test_split(features, labels, test_size=0.2, shuffle=True)

    def get_dataloader(self, train):
        features = self.train if train else self.val
        labels = self.train_y if train else self.val_y
        get_tensor = lambda x: torch.tensor(x.values, dtype=torch.float32)
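        # labels need shape (n, 1) to match the model output; a flat (n,)
        # tensor would silently broadcast to (n, n) inside the loss
        tensors = (get_tensor(features), get_tensor(labels).reshape(-1, 1))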
        return self.get_tensorloader(tensors, train)
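
In the d2l implementation, `get_tensorloader` (inherited from `d2l.DataModule`) wraps the tensors in a PyTorch `DataLoader` that shuffles the training data and serves it in chunks of `batch_size`. The plain-Python sketch below imitates that behaviour to show what one epoch of minibatches looks like. `iter_minibatches` is a hypothetical helper written for this illustration, not part of d2l or PyTorch, and the data is made up.

```python
import random


def iter_minibatches(features, labels, batch_size, shuffle=True, seed=None):
    """Yield (features, labels) chunks that cover the data exactly once.

    Hypothetical helper that imitates a shuffling data loader.
    """
    indices = list(range(len(features)))
    if shuffle:
        # Visiting the examples in a random order is the "stochastic" part.
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        chunk = indices[start:start + batch_size]
        yield [features[i] for i in chunk], [labels[i] for i in chunk]


# Made-up data: 10 examples with one feature each.
features = [[float(i)] for i in range(10)]
labels = [2.0 * i for i in range(10)]

batches = list(iter_minibatches(features, labels, batch_size=4, seed=0))
for X_batch, y_batch in batches:
    print(len(X_batch), y_batch)
```

Ten examples with a batch size of 4 give two full batches and one smaller final batch of 2. Each example appears exactly once per epoch, and features stay paired with their labels.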

You can get this data from https://archive-beta.ics.uci.edu/dataset/53/iris.

In [None]:
model = LinearRegressionScratch(3, lr=1)
data = CsvData(1,"data/iris.data", 32)

In [None]:
trainer = d2l.Trainer(max_epochs=20)
trainer.fit(model, data)

*What happens when you change the number of epochs?*

*What happens when you change the learning rate?*

*How does all this compare to your project 1 adaline class?*

4. Now write a short explanation of minibatch SGD suitable for a non-CS major. You can write a paragraph, make a diagram, or use another format. The method is up to you!