# Homework: PyTorch Multi-Layer Perceptron

This set of homework questions builds on the `SimpleDNN.ipynb` tutorial and asks you to run some experiments and draw some insights about neural networks. We will use the same datasets as before, so make sure you still have those! It may also help to refer back to that notebook if you're ever confused about the physics task at hand.

As you work, feel free to insert cells into this notebook as needed. Please keep the overall structure (order of the problems, etc.) consistent, but don't feel overly bound by the scaffolding laid out!

To get started, we'll import the same tools we've used before.

In [None]:
import os
from itertools import cycle
import numpy as np
import vector
import torch
from matplotlib import pyplot as plt

To start, you should load in the datasets (and build `DataLoader`s) and re-create the model architecture you built in the tutorial. While I have no doubt you'll reference that code, try to make sure you're actually typing out at least a significant portion of it here. I've found that copy/pasting is often the enemy of knowledge retention!
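If the `DataLoader` mechanics have gone a bit fuzzy, here is a minimal sketch of the pattern. It uses randomly generated stand-in tensors, so the shapes (1000 events, 16 features, one binary label) and the batch size are assumptions made for illustration, not the tutorial's actual values.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in data (an assumption): 1000 events with 16 input
# features each, plus one binary label per event. Swap in the real arrays.
features = torch.randn(1000, 16)
labels = torch.randint(0, 2, (1000, 1)).float()

# TensorDataset pairs each row of features with the matching label.
dataset = TensorDataset(features, labels)

# Shuffling each epoch is typical for training data; a validation
# loader would usually set shuffle=False.
train_loader = DataLoader(dataset, batch_size=64, shuffle=True)

# Pull a single batch to check the shapes the model will receive.
x_batch, y_batch = next(iter(train_loader))
print(x_batch.shape, y_batch.shape)
```

Checking one batch like this before training is a cheap way to catch shape mismatches early.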

In [None]:
# Your code here - load datasets and create dataloaders

In [None]:
# Your code here - define model class as subclass of torch.nn.Module

### Problem 1

Now that you have a model class defined and some datasets, you should implement training! Here, we're going to do things a little differently than in the tutorial: you'll be wrapping all of the training process up neatly into a function! This will allow you to run multiple trainings easily within the notebook, which you'll be doing in later problems. Your function should accept as parameters:
- an instance of your model class (the model to train)
- an optimizer (instance of a class in the [`torch.optim`](https://docs.pytorch.org/docs/stable/optim.html) module)
- a loss function (this can be an instance of any subclass of `torch.nn.Module`, though it will most likely be one of the losses that PyTorch implements)
- a number of epochs to train
- anything else you feel you'll need (or just want to include)

It should then return the trained model.

A couple of pointers for the design of this function:
- Try to make it as self-contained as possible. This will likely involve adding additional parameters beyond the four mentioned above (what else did we need to know in order to set up our training?), but will pay off in the long run, as you can easily run trainings with a variety of configurations.
- This function will likely spend more than a minute running when called (it takes time to train)! During that time, it might be nice to get some feedback that things are running as expected. Feel free to add `print` statements (or use other tools) to monitor your training progress!

A simple signature has been included for you.

In [None]:
def train_my_model(model, optimizer, loss_fn, epochs=5):
    
    # Your code here - train the model!
    
    return model
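For reference, here is one possible shape such a function could take. Treat it as a sketch under assumptions rather than the expected answer: the name `train_sketch`, the extra `train_loader` and `device` parameters, and the choice to also return a list of per-epoch losses (the problem only asks for the model) are all illustrative. The toy data and tiny model at the bottom exist only so the sketch runs on its own.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def train_sketch(model, optimizer, loss_fn, train_loader, epochs=5, device="cpu"):
    """Train `model` in place; return it with the mean loss of each epoch."""
    model.to(device)
    history = []
    for epoch in range(epochs):
        model.train()  # enable training-mode behavior (dropout, batch norm)
        running_loss, n_seen = 0.0, 0
        for inputs, targets in train_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()            # clear gradients from the last step
            loss = loss_fn(model(inputs), targets)
            loss.backward()                  # backpropagate
            optimizer.step()                 # update the weights
            # Weight by batch size so a smaller final batch is averaged fairly.
            running_loss += loss.item() * inputs.shape[0]
            n_seen += inputs.shape[0]
        history.append(running_loss / n_seen)
        print(f"epoch {epoch + 1}/{epochs}: mean training loss = {history[-1]:.4f}")
    return model, history


# Toy usage on synthetic data (an assumption, not the homework dataset).
torch.manual_seed(0)
toy_x = torch.randn(256, 4)
toy_y = (toy_x.sum(dim=1, keepdim=True) > 0).float()
toy_loader = DataLoader(TensorDataset(toy_x, toy_y), batch_size=32, shuffle=True)
toy_model = torch.nn.Sequential(
    torch.nn.Linear(4, 8), torch.nn.ReLU(), torch.nn.Linear(8, 1)
)
toy_optimizer = torch.optim.Adam(toy_model.parameters(), lr=1e-2)
toy_model, toy_history = train_sketch(
    toy_model, toy_optimizer, torch.nn.BCEWithLogitsLoss(), toy_loader, epochs=3
)
```

Passing the data loader in as a parameter, rather than reading a global, is what makes it easy to rerun the same function with different datasets or batch sizes later on.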

### Problem 2: Model Size

Now that we can create and train a variety of models easily, let's investigate a major aspect (and sometimes pain point) in machine learning: how does the performance of our model vary as its size changes? Train at least 4 different models (for at least 5 epochs each) and examine their performance on the validation (or testing) dataset. Which one performed the best? Write down your observations in markdown cells as you go, and try to compare performance directly (perhaps even in a plot?) among the different trainings you ran.
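One way to put a number on "size" is to count trainable parameters. The sketch below builds fully connected networks of different widths and depths and counts their parameters. The helpers `make_mlp` and `count_parameters`, the 16 input features, and the particular hidden-layer sizes are all assumptions chosen for illustration; your own model class and input dimension may differ.

```python
import torch


def make_mlp(n_inputs, hidden_sizes, n_outputs=1):
    """Build a plain MLP with a ReLU after each hidden layer."""
    layers = []
    previous = n_inputs
    for width in hidden_sizes:
        layers += [torch.nn.Linear(previous, width), torch.nn.ReLU()]
        previous = width
    layers.append(torch.nn.Linear(previous, n_outputs))
    return torch.nn.Sequential(*layers)


def count_parameters(model):
    """Total number of trainable weights and biases in the model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


# Hypothetical configurations spanning a few orders of magnitude in size.
configs = {
    "tiny": [8],
    "small": [32, 32],
    "medium": [128, 128],
    "large": [512, 512, 512],
}
sizes = {name: count_parameters(make_mlp(16, hidden)) for name, hidden in configs.items()}
for name, n_params in sizes.items():
    print(f"{name:>6}: {n_params} trainable parameters")
```

Plotting a validation metric against the parameter count (on a logarithmic x-axis, say) is one direct way to compare the trainings.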

In [None]:
# Training 1

In [None]:
# Evaluation 1

In [None]:
# Training 2

In [None]:
# Evaluation 2

In [None]:
# Training 3

In [None]:
# Evaluation 3

In [None]:
# Training 4

In [None]:
# Evaluation 4

In [None]:
# Comparison and summary

### Problem 3: Our Input Data

To understand our problem a little better, let's look a bit more closely at what we're giving the model.

#### Part 3A: Taking a Look

Plot histograms of several of the observables that serve as inputs to our model. You should at least plot the $p_x$, $p_z$, and $E$ of the _first jet_ and _lepton_, and the $p_x$ and $p_y$ of the missing transverse energy. What do you see? Do all the variables lie in about the same range, or are their domains different? What about their distributions? Summarize your findings.
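As a starting point for the plotting, here is a small sketch that draws one histogram per named variable and records each variable's minimum and maximum. The helper `plot_feature_histograms` is hypothetical, and the two arrays it is run on are randomly generated placeholders, not the physics data; you would pass in the real observables instead.

```python
import numpy as np
from matplotlib import pyplot as plt


def plot_feature_histograms(features, bins=50):
    """Draw one histogram per named array; return (fig, axes, ranges)."""
    n = len(features)
    fig, axes = plt.subplots(1, n, figsize=(4 * n, 3), squeeze=False)
    ranges = {}
    for ax, (name, values) in zip(axes[0], features.items()):
        values = np.asarray(values)
        ax.hist(values, bins=bins, histtype="step")
        ax.set_xlabel(name)
        ax.set_ylabel("events")
        # Keep the numeric range so variables can be compared directly.
        ranges[name] = (float(values.min()), float(values.max()))
    fig.tight_layout()
    return fig, axes, ranges


# Placeholder data (an assumption): a symmetric and a one-sided variable.
rng = np.random.default_rng(0)
fake = {
    "jet 1 $p_x$": rng.normal(0.0, 50.0, size=5000),
    "jet 1 $E$": rng.exponential(100.0, size=5000),
}
fig, axes, ranges = plot_feature_histograms(fake)
for name, (low, high) in ranges.items():
    print(f"{name}: min = {low:.1f}, max = {high:.1f}")
```

Printing the ranges alongside the plots makes it easier to answer the question about whether the variables share a common scale.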

In [None]:
# Your code here - make some plots!