<small>Journal of Efficient Machine Learning Practice, Vol. 1 (2022)</small><br>
<small>Submitted 1/22; Published 2/22</small>

<h1><center>Journal of Efficient Machine Learning Practice<br>Notebook Format</center></h1>

<b>Jonathan S. Kent</b>, jonathan.s.kent@lmco.com<br>
*Advanced Technology Center*<br>
*Lockheed Martin*<br>
*Sunnyvale, CA 94089, USA*<br>

<b>Jonathan S. Kent</b>, jskent2@illinois.edu<br>
*Department of Mathematics*<br>
*University of Illinois at Urbana-Champaign*<br>
*Urbana, IL 61801, USA*<br>

<b>Editor: </b>Jonathan S. Kent

<h2>README</h2>
    
Welcome to the Journal of Efficient Machine Learning Practice. This journal focuses on the efficiency, robustness, and real-world application of Machine Learning, and promotes the writing of reader-friendly articles. If that seems up your alley, you are welcome to make an account and submit a paper to the [Journal of Efficient Machine Learning Practice](https://www.jemlp.org).

The purpose of this README is similar to that of the README of a given software package. It should explain in broad strokes how your code functions, and what a reader should know before running it, such as the hardware and software it was written for, e.g. "This code was written for a machine running Ubuntu 22.04, with an RTX 3070, a Ryzen 5 3600, and 16 GB of RAM."
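
One way to collect the details for such a statement is to query them from within the notebook itself. The following is a minimal sketch, not a required part of the format; the helper name `describe_environment` is hypothetical, and it reports PyTorch details only if `torch` happens to be installed.

```python
import platform
import sys


def describe_environment():
    """Collect basic hardware and software details for the README (illustrative helper)."""
    info = {
        "os": platform.platform(),
        "python": sys.version.split()[0],
        "processor": platform.processor() or platform.machine(),
    }
    try:
        import torch  # optional: only reported if PyTorch is installed

        info["torch"] = torch.__version__
        if torch.cuda.is_available():
            info["gpu"] = torch.cuda.get_device_name(0)
    except ImportError:
        pass
    return info


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

The printed values can then be copied into the README text by hand, alongside anything the sketch does not cover, such as the amount of RAM.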

Also, do not take the structure of this example notebook as gospel. If your code is cleaner and more sensible when organized in some other way, then structure it as is appropriate. However, your notebook should be structured so that selecting "Kernel -> Restart & Run All" completes one entire experimental run, and produces graphs and outputs as they would appear in your manuscript, e.g. accuracy metrics, training loss over time, etc. 

In [None]:
# Short explanations, like "importing packages," should be given at the start of code cells with an
# obvious purpose.

# A requirements .yml file listing packages and versions in standard Anaconda format must be included
# with your submission

# This .yml file is produced by running
# `conda env export --from-history > ENV.yml`
# and can be installed by the reader by running
# `conda env create -n ENVNAME --file ENV.yml`

import numpy as np
import torch
import torchvision
import progressbar
from matplotlib import pyplot as plt

In [None]:
# Hyperparameters and environmental constants should be given immediately after the import cells,
# and should have clear, readable names. If, in the context of reading the manuscript, the name alone
# is not enough to explain what the hyperparameter is, a short explanatory comment should be given. 
# Feel free to directly reference the manuscript that will be submitted alongside this notebook.
# Being constants, hyperparameters should be in block capital snake case. For the sake of
# consistency, even if something listed here is a function, e.g. a choice of loss function,
# it should still be in block capital snake case, to represent that it was just something arbitrarily
# chosen.

# Avoid leaving configuration-specific environmental constants, like directory names, in the submitted
# version of the notebook

LEARNING_RATE = 1e-4
NUM_EPOCHS = 10
DATA_LOC = '/path/to/dataset'
BATCH_SIZE = 32

DEPTH = 5 # Number of convolutional layers in the CNN
CHANNELS = 32 # Number of channels in the convolutional layers

DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
LOSS_FN = torch.nn.CrossEntropyLoss()

TRAINING_TRANSFORMS = torchvision.transforms.Compose([ # Transforms used during model training
    torchvision.transforms.Resize([32, 32]),
    torchvision.transforms.RandomHorizontalFlip(),
    torchvision.transforms.RandomRotation(10),
    torchvision.transforms.ToTensor(),
])

TESTING_TRANSFORMS = torchvision.transforms.Compose([ # Transforms used during testing
    torchvision.transforms.Resize([32, 32]),
    torchvision.transforms.ToTensor(),
])

In [None]:
# Important hyperparameters and environmental constants that are immediately derived from those in the
# preceding cells should be defined immediately after them, and should be written in lowercase
# snake case

training_dataset = torchvision.datasets.MNIST(DATA_LOC, train = True, download = True, 
                                                 transform = TRAINING_TRANSFORMS)
training_dataloader = torch.utils.data.DataLoader(training_dataset, batch_size = BATCH_SIZE, shuffle = True,
                                                 pin_memory = True, num_workers = 4)
testing_dataset = torchvision.datasets.MNIST(DATA_LOC, train = False, download = True,
                                                 transform = TESTING_TRANSFORMS)
testing_dataloader = torch.utils.data.DataLoader(testing_dataset, batch_size = BATCH_SIZE)

<h2>1. Model</h2>

Section headers should have a number associated with them, as well as a title, for easy reference. Feel free to include longer-form commentary on what you're about to do in the next few cells, if you think that it will help to explain things.

<h3>1.1. Model Definition</h3>

Section and subsection headers should be written according to the following example:
```
<h2>3. Section name</h2>
<h3>3.1. Subsection name</h3>
<h4>3.1.4. Subsubsection name</h4>
```
etc. This will make it easy to reference these sections in the manuscript, e.g. "This algorithm, which is implemented in notebook section X.Y.Z." New sections and subsections should be used to divide up your code into segments that are relevant and related within themselves, and should be ordered in a coherent, sensible fashion.
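
As an illustration of this numbering scheme, the sketch below builds a header string from a section number and a title. The helper `section_header` is hypothetical and not part of any submission tooling; it assumes that a section numbered with n components uses the HTML heading level n + 1, as in the example above.

```python
def section_header(numbers, title):
    """Build a header such as '<h3>3.1. Subsection name</h3>' (illustrative helper)."""
    level = len(numbers) + 1  # one number -> <h2>, two -> <h3>, three -> <h4>
    label = ".".join(str(n) for n in numbers)
    return f"<h{level}>{label}. {title}</h{level}>"


print(section_header((3,), "Section name"))
print(section_header((3, 1), "Subsection name"))
print(section_header((3, 1, 4), "Subsubsection name"))
```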

In [None]:
# Defining the classifier model

# Class names should be in capital camel case, while both functions and variables should be in lower snake case

class ClassifierModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Creating the convolutional layers
        cnn_layers = [torch.nn.Conv2d(1, CHANNELS, 3, padding = 1, bias = False)]
        for i in range(1, DEPTH):
            cnn_layers += [torch.nn.BatchNorm2d(CHANNELS), torch.nn.ReLU(), 
                           torch.nn.Conv2d(CHANNELS, CHANNELS, 3, padding = 1, bias = False)]
        self.cnn = torch.nn.Sequential(*cnn_layers)
        
        # Averaging over each channel after the convolutional layers, and a fully connected layer
        self.pool = torch.nn.AdaptiveAvgPool2d(1)
        self.fcn = torch.nn.Linear(CHANNELS, 10)
        
    def forward(self, x):
        z = self.cnn(x)
        z = self.pool(z)
        z = torch.flatten(z, 1) # Flattening a dimension so that the fully connected layer can take z as input
        return self.fcn(z)
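
As a quick check on the architecture above, the standard output-size formula for a convolution shows why the feature maps stay at 32 x 32 through every layer: with a 3 x 3 kernel, a padding of 1, and the default stride of 1, each convolution preserves the spatial size. The sketch below works through that arithmetic in plain Python; the function name `conv_output_size` is hypothetical, and the input size of 32 is taken from the `Resize([32, 32])` transform.

```python
def conv_output_size(size, kernel_size, padding=0, stride=1):
    """Spatial output size of a convolution without dilation."""
    return (size + 2 * padding - kernel_size) // stride + 1


size = 32  # images are resized to 32 x 32 by the transforms
for layer in range(5):  # DEPTH = 5 convolutional layers
    size = conv_output_size(size, kernel_size=3, padding=1)
print(size)  # prints 32
```

Since the spatial size never changes, `AdaptiveAvgPool2d(1)` averages each 32 x 32 feature map down to a single value, leaving a vector of length `CHANNELS` per image for the fully connected layer.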

<h2>2. Training</h2>

In [None]:
# Creating and training the model
model = ClassifierModel().to(DEVICE)

# Setting up the optimizer and gradscaler, to make use of FP16 hardware on modern Nvidia GPUs
opt = torch.optim.Adam(model.parameters(), lr = LEARNING_RATE)
scaler = torch.cuda.amp.GradScaler()

losses = []

for e in progressbar.progressbar(range(NUM_EPOCHS)):
    for i, (x_, y_) in enumerate(training_dataloader):
        x, y = x_.to(DEVICE), y_.to(DEVICE)
        opt.zero_grad()

        with torch.cuda.amp.autocast():
            y_hat = model(x)
            loss = LOSS_FN(y_hat, y)

        # Using the gradscaler on the loss value and the optimizer
        scaler.scale(loss).backward()
        scaler.step(opt)
        scaler.update()
        
        losses.append(loss.item())

# Graphing the training loss
plt.plot(losses)
plt.title("Training loss per step")
plt.xlabel("Training step")
plt.ylabel("Loss (cross entropy)")
plt.show()

<h2>3. Testing</h2>

Overall, your notebook should be relatively light on text; that's what the manuscript is for. However, please leave an appropriate number of comments, and explain any discrepancies between your implementation and your mathematical formulation, as well as any interesting practical considerations you made during implementation.

In [None]:
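# Evaluating the model on the testing set

# Switching to evaluation mode, so that the batch norm layers use their running statistics
model.eval()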

correct = 0
total = 0

for i, (x_, y_) in enumerate(testing_dataloader):
    # Performing inference without holding gradients
    with torch.no_grad(), torch.cuda.amp.autocast():
        y_hat = model(x_.to(DEVICE)).to('cpu')
    
    total += len(y_)
    correct += (y_hat.argmax(axis = 1) == y_).to(torch.float32).sum().item()
    
print("Accuracy:", correct / total)