<img src="https://i.imgur.com/gb6B4ig.png" width="400" alt="Weights & Biases" />
<!--- @wandbcode{edu_lit_mlp} -->

# A Multilayer Perceptron for MNIST

## Installing and Importing Libraries

In [None]:
%%capture
!pip install pytorch-lightning==1.3.8 torchviz wandb
!git clone https://github.com/wandb/lit_utils
!cd "/content/lit_utils" && git pull

import math

import pytorch_lightning as pl
import torch
import wandb

import lit_utils as lu

lu.utils.filter_warnings()

In [None]:
wandb.login()

## Defining the `Model`

In [None]:
class LitMLP(lu.nn.modules.LoggedImageClassifierModule):
  """A simple MLP Model with under-the-hood logging features."""

  def __init__(self, config):  # make the model
    super().__init__()

    self.layers = torch.nn.Sequential(*[  # specify our LEGOs. edit this by adding to the list!
      lu.nn.fc.FullyConnected(
          in_features=28 * 28, activation=config["activation"](),
          out_features=config["fc1.size"]),  # hidden layer
      lu.nn.fc.FullyConnected(
          in_features=config["fc1.size"], activation=config["activation"](),
          out_features=config["fc2.size"]),  # hidden layer
      lu.nn.fc.FullyConnected(
          in_features=config["fc2.size"],  # "read-out" layer
          out_features=10),
    ])

    self.loss = config["loss_fn"]
    self.optimizer = config["optimizer"]
    self.optimizer_params = config["optimizer.params"]

  def forward(self, x):  # produce outputs
    x = torch.flatten(x, start_dim=1)
    for layer in self.layers:  # snap together the LEGOs
      x = layer(x)
    return x

## Choosing hyperparameters

In [None]:
config = {
  "batch_size": 32,
  "train_size": 1024,  # reducing to a small subset to observe overfitting; set to 50000 for full dataset
  "max_epochs": 35,
  "fc1.size": 128,
  "fc2.size": 64,
  "activation": torch.nn.ReLU,
  "loss_fn": torch.nn.CrossEntropyLoss(),
  "optimizer": torch.optim.Adam,
  "optimizer.params": {"lr": 3e-3},
}

lmlp = LitMLP(config)
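
To get a feel for how the `fc1.size` and `fc2.size` choices affect model size, here is a rough parameter count. It is a sketch that assumes each `FullyConnected` layer is a plain linear map with a bias term (as in `torch.nn.Linear`) and that the activation adds no parameters. The helper `count_parameters` is illustrative and not part of `lit_utils`.

```python
# Layer widths for the MLP above: 28*28 inputs, two hidden layers, 10 classes.
layer_sizes = [28 * 28, 128, 64, 10]


def count_parameters(sizes):
    """Count weights and biases for a stack of fully-connected layers.

    Assumes every layer is linear-with-bias; this is an assumption about
    lu.nn.fc.FullyConnected, not something confirmed by its source.
    """
    total = 0
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        total += n_in * n_out  # weight matrix
        total += n_out         # bias vector
    return total


n_params = count_parameters(layer_sizes)
print(f"Approximate parameter count: {n_params}")
```

Under these assumptions, almost all of the parameters sit in the first layer, because the input has 784 features.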

## Loading the data

In [None]:
dmodule = lu.datamodules.MNISTDataModule(batch_size=config["batch_size"])
dmodule.prepare_data()
dmodule.setup()
dmodule.training_data = torch.utils.data.Subset(
  dmodule.training_data, indices=range(config["train_size"]))
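
The subset size and batch size together determine how many optimizer steps happen in each epoch. The arithmetic below is a small sketch using the values from `config`. It assumes the data loader keeps a final partial batch rather than dropping it, and the names `steps_per_epoch` and `total_steps` are only illustrative.

```python
import math

# Values copied from the config above.
batch_size = 32
train_size = 1024
max_epochs = 35

# Number of batches needed to see every training example once.
# Assumes a final partial batch is kept (rounding up).
steps_per_epoch = math.ceil(train_size / batch_size)
total_steps = steps_per_epoch * max_epochs

# The same calculation for the full 50000-example training set.
full_steps_per_epoch = math.ceil(50000 / batch_size)

print(f"Steps per epoch: {steps_per_epoch}")
print(f"Total training steps: {total_steps}")
print(f"Steps per epoch on the full dataset: {full_steps_per_epoch}")
```

This is worth keeping in mind when changing `train_size` or `max_epochs`: training time scales with the total number of steps.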

## Building and Training the `Model`

### Debugging Code

In [None]:
# for debugging purposes (checking shapes, etc.), make these available
dloader = dmodule.train_dataloader()  # set up the Loader

example_batch = next(iter(dloader))  # grab a batch from the Loader
example_x, example_y = example_batch[0].to("cuda"), example_batch[1].to("cuda")

print(f"Input Shape: {example_x.shape}")
print(f"Target Shape: {example_y.shape}")

lmlp.to("cuda")
outputs = lmlp.forward(example_x)
print(f"Output Shape: {outputs.shape}")
print(f"Loss : {lmlp.loss(outputs, example_y)}")
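
A useful sanity check for the loss printed above: `torch.nn.CrossEntropyLoss` uses the natural logarithm, so a model that spreads its predictions evenly across the 10 classes scores a loss of ln(10), about 2.30. The pure-Python sketch below reproduces that number. The `cross_entropy` helper is a hypothetical stand-in for a single example, and an untrained network will only be roughly uniform, so expect a value near this one rather than an exact match.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one example from raw scores (log-softmax, natural log)."""
    largest = max(logits)  # subtract the max for numerical stability
    log_sum_exp = largest + math.log(sum(math.exp(z - largest) for z in logits))
    return log_sum_exp - logits[target]


# Equal scores for all 10 digit classes, i.e. a uniform prediction.
uniform_logits = [0.0] * 10
chance_loss = cross_entropy(uniform_logits, target=3)

print(f"Loss at chance: {chance_loss:.4f}")
```

If the loss on the example batch is far above this value, it is worth checking the shapes and the input scaling before training.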

### Running `.fit`

In [None]:
with wandb.init(project="lit-mlp", entity="wandb", config=config):
  
  # 🪵 configure logging
  cbs=[lu.callbacks.WandbCallback(),  # callbacks add extra features, like better logging
       lu.callbacks.FilterLogCallback(image_size=(28, 28), log_input=True),  # this one logs the weights as images
       lu.callbacks.ImagePredLogCallback(labels=dmodule.classes, on_train=True)  # and this one logs the inputs and outputs
       ]
  wandblogger = pl.loggers.WandbLogger(save_code=True)
  if hasattr(lmlp, "_wandb_watch_called") and lmlp._wandb_watch_called:
    wandblogger.watch(lmlp)  # track gradients

  # 👟 configure trainer
  trainer = pl.Trainer(gpus=1,  # use the GPU for .forward
                      logger=wandblogger,  # log to Weights & Biases
                      callbacks=cbs,  # use callbacks to log lots of run data
                      max_epochs=config["max_epochs"], log_every_n_steps=1,
                      progress_bar_refresh_rate=50)

  # 🏃‍♀️ run the Trainer on the model
  trainer.fit(lmlp, datamodule=dmodule)

  # 🧪 test the model on unseen data
  trainer.test(lmlp, datamodule=dmodule)

  # 💾 save the model
  lmlp.to_onnx("model.onnx", example_x, export_params=True)
  wandb.save("model.onnx", ".") 

## Exercises


For the exercises below, you'll want to review the results
of your training runs on [Weights & Biases](https://wandb.ai/site).
The link to each run's dashboard will be printed in the cell output above,
with the name "Run Page".

You should be able to find your run,
along with other runs created using this notebook,
in [this Weights & Biases dashboard](http://wandb.me/lit-mlp-workspace),
which shows results across many runs.

You can see an example run [here](https://wandb.ai/wandb/lit-mlp/runs/3h3iu4ec?workspace=user-charlesfrye).

 > _Tip_: to launch new training runs,
 restart the Colab notebook and run all the cells at once
 ("Runtime > Restart and run all").
 That way, you can always be sure what code
 was run, in case you hit an error.


### Set A.

#### **Exercise**: Deep learning models are built by snapping "LEGOs" together: modular, combinable pieces. What are the LEGOs of this model?

#### **Exercise**: How "wide" are the layers in this network? How would you make it wider or skinnier? (_Hint_: check out the `lu.nn.fc.FullyConnected` layers).

#### **Exercise**: How "deep" is this network? How would you make it deeper? (_Hint_: check out the `torch.nn.Sequential` inside the `__init__` of the `LitMLP` class). Try making the network deeper and see what happens to the training, validation, and test accuracies.

#### **Exercise**: What happens when you decrease the value of `max_epochs` (by, say, a factor of 10)? Does training take more or less time? What happens to the training accuracy?

#### **Exercise**: At the end of training, compare the accuracy on the training set with the accuracies on the validation and test sets. Which is lowest and which is highest?

### Set B.

#### **Exercise**: Add a `torch.nn.Dropout` layer to the network. (_Hint_: it goes in the `Sequential`. But where?) What effect does this have on the validation accuracy after a large number (say, 50) of epochs? What about after only a few epochs? You may want to increase the `train_size` in the `config` to `50000`.

#### **Exercise**: Add a `torch.nn.Dropout` layer after the final fully-connected layer -- the one that produces the un-normalized class probabilities. What impact does this have on training accuracy? What about validation accuracy? [Read about `model.eval`](https://stackoverflow.com/questions/60018578/what-does-model-eval-do-in-pytorch) and use what you've learned to explain this difference.