# Homework 5 Q2

Up to this point, we have created our own models and practiced implementing existing machine learning techniques and derivations. It is now time to move on to a more recent advancement in machine learning: deep learning. In this homework, we will demonstrate how PyTorch is used and walk through some of its common and useful functionality. At the end, you will be required to create your own model and train it on a dataset.

## PyTorch Tensors

We have made you practice NumPy extensively over the past few weeks to understand its functions, dimensions, and axes. PyTorch does not operate directly on NumPy arrays. Instead, it works with its own array type, `torch.Tensor`. A `torch.Tensor` behaves very much like a NumPy array, as we will show here.

Let's make a random matrix to play with.

In [None]:
import torch
import numpy as np

X = torch.rand(5, 8)
X

We can easily get a NumPy array from the tensor using the `.numpy()` method. For a CPU tensor, the returned array shares memory with the tensor.



In [None]:
X_np = X.numpy()    # create a NumPy version of X
print(X_np)
print(type(X_np))

We can change it back to a `torch.Tensor` by passing it to `torch.tensor()`, which copies the data.

In [None]:
X = torch.tensor(X_np)
print(X)
print("Shape:\t", X.shape)
print("X.shape is still represented by a tuple:\t", X.shape == (5, 8))
print("Type:\t", type(X))
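The two conversions above differ in whether they copy the data. The following is a minimal sketch on a small made-up tensor (the names `t`, `shared`, `copied`, and `linked` are illustrative only): `.numpy()` and `torch.from_numpy()` share memory with their source, while `torch.tensor()` makes a copy.

```python
import numpy as np
import torch

t = torch.zeros(3)

shared = t.numpy()                  # NumPy array backed by the same memory as t (CPU tensors only)
copied = torch.tensor(shared)       # torch.tensor() copies its input
linked = torch.from_numpy(shared)   # torch.from_numpy() shares memory with the array

t[0] = 1.0                          # modify the original tensor in place

print(shared)   # first entry changed along with t
print(copied)   # still all zeros, because it is a copy
print(linked)   # first entry changed, because it shares memory
```

If you want a tensor that is safe to modify independently of the NumPy array, `torch.tensor()` is the safer choice.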

Slicing works the same way as in NumPy.

In [None]:
print(X[1:3, 2:5])      # certain section of the matrix
print(X[[0, 2, 4], :])  # row 0, 2, and 4

Boolean indexing works the same way as in NumPy as well.

In [None]:
bool_idx = X < 0.5
print(bool_idx)
X_bool_filter = torch.clone(X)  # torch.clone() makes a deep copy of the tensor so we don't accidentally modify the original
X_bool_filter[bool_idx] = 99    # using the boolean mask, we change the values less than 0.5 to 99
print(torch.round(X_bool_filter, decimals=4))
print(X)                        # notice how the original tensor is unchanged - that's why torch.clone() is important
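To see why the copy matters, here is a small sketch on a made-up tensor (`A`, `B`, and `row_view` are illustrative names). As in NumPy, basic slicing returns a view that shares memory with the original, so writing through the view changes the original, whereas a clone is independent.

```python
import torch

A = torch.zeros(2, 3)

row_view = A[0]          # basic slicing returns a view that shares memory with A
row_view[0] = 5.0        # ...so this write is visible in A

B = torch.clone(A)       # an independent copy of A
B[1, 1] = 7.0            # this write does not touch A

print(A)
print(B)
```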

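Of course, we have our favorite `mean`, `var`, and `std` functions.
The main difference from NumPy is that tensors use the keyword **dim** in place of NumPy's **axis**, like so. Also note that `torch.var` and `torch.std` apply Bessel's correction (dividing by N - 1) by default, whereas `np.var` and `np.std` divide by N, so the two libraries give slightly different values unless you change one of the defaults.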

In [None]:
print(torch.mean(X))
print(torch.var(X))
print(torch.std(X))

In [None]:
print(torch.mean(X, dim=0))
print(torch.var(X, dim=0))
print(torch.std(X, dim=0))
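As a quick check of the `dim`/`axis` correspondence and of the different variance defaults, here is a small sketch on a made-up 2x2 matrix (`M` and the other names are illustrative only):

```python
import numpy as np
import torch

M = torch.tensor([[1.0, 2.0],
                  [3.0, 6.0]])

col_mean = torch.mean(M, dim=0)                   # same reduction as np.mean(M.numpy(), axis=0)
var_torch = torch.var(M, dim=0)                   # divides by N - 1 by default
var_numpy = np.var(M.numpy(), axis=0)             # divides by N by default
var_match = np.var(M.numpy(), axis=0, ddof=1)     # ddof=1 reproduces the PyTorch default

print(col_mean)    # column means: 2 and 4
print(var_torch)   # 2 and 8
print(var_numpy)   # 1 and 4
print(var_match)   # 2 and 8
```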

Some linear algebra operations:


In [None]:
Z = torch.rand(8, 3)
# matrix multiplication - three equivalent ways of doing so (NumPy also uses @ for matrix multiplication)
print(X @ Z)
print(torch.matmul(X, Z))
print(torch.einsum("ij,jk->ik", X, Z))      # this is torch.einsum - not important for this class but a cool function regardless
print((X @ Z).shape)

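Calculate eigenvectors and eigenvalues with PyTorch! Note that `torch.linalg.eig` returns complex-valued tensors, because a general real matrix can have complex eigenvalues.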

In [None]:
X = torch.rand(4, 4)
print(torch.linalg.eig(X))
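For symmetric matrices the eigenvalues are real, and `torch.linalg.eigh` can be used instead. The sketch below (with made-up matrices `B` and `S`, and `float64` chosen only to keep the numerical check tight) verifies the defining relation S v = λ v for every eigenpair.

```python
import torch

torch.manual_seed(0)
B = torch.rand(4, 4, dtype=torch.float64)
S = B + B.T                                   # a symmetric matrix has real eigenvalues

eigvals, eigvecs = torch.linalg.eigh(S)       # eigenvectors are the columns of eigvecs

# check S v = lambda v for all eigenpairs at once:
# eigvecs * eigvals scales column j by eigvals[j] through broadcasting
residual = S @ eigvecs - eigvecs * eigvals
print(eigvals)
print(residual.abs().max())                   # should be close to zero

general = torch.linalg.eig(B)                 # a general matrix gives complex-valued results
print(general.eigenvalues.dtype)
```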

## Neural Network 101

Now, we will learn how to create our own neural networks using PyTorch. Our model is still fairly lightweight since this is our first exposure to PyTorch. Next week, we will learn how to train models on GPUs to speed up training!

First, we will mount Google Drive so that you can access your Drive files the same way you would access files on your local computer.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


Some standard imports!

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F

import pandas as pd
import numpy as np

We will be working with Fashion-MNIST, a very popular deep learning dataset. Each data point is a 28x28 image of an item of clothing.

In [None]:
file_path = "/content/drive/MyDrive/bioeng245_sp2023/hw5/"      # ENTER YOUR OWN FILE PATH
data = pd.read_csv(file_path + "fashion_mnist_data.csv", sep=',')
data.head()

In [None]:
print(data.shape)   # (N, D + 1) (one column is the label column)

In the following cell, transform the DataFrame `data` into two NumPy arrays, `X` and `y`. From the cell above, we see that there are 8,000 samples to work with. Each row has 785 columns: the first column, "label", holds the label of that sample, and the remaining 784 columns hold its features, or pixels. `X` should be the data matrix with 784 pixels per row, and `y` should be a NumPy vector of the labels. The final shape of `X` should be `(8000, 784)` and the final shape of `y` should be `(8000,)`.

In [None]:
### YOUR CODE HERE
...
print(X.shape)
print(y.shape)

Let's quickly visualize some samples. You should see some images of clothes and the correct labels if you implemented the above cell correctly.

In [None]:
import matplotlib.pyplot as plt

labels = ["T-Shirt/Top", "Trouser", "Pullover", "Dress", "Coat", "Sandal", "Shirt", "Sneaker", "Bag", "Boot"]

def show_image_labels(img, label):
    plt.imshow(img.reshape(28, 28))
    plt.title(labels[label])
    plt.show()

for i in range(5):
    show_image_labels(X[i], y[i])

Now, we will build our first neural network!

In [None]:
class Sample_Network(nn.Module):
    def __init__(self):
        super().__init__()
        # TODO
        input_dim =  ...    # what should be the input dimension? what are we feeding the network?
        output_dim = ...    # what should be the output dimension? what do we want from the network after running model(X), also known as model.forward(X)?
        self.input_layer = nn.Linear(input_dim, 126)
        self.hidden_layer = nn.Linear(126, 256)
        self.output_layer = nn.Linear(256, output_dim)

    # forward() is the main function that you will run your model with!
    # model(X) calls model.forward(X) for you.
    # you will see how it's used in just a bit!
    # nothing to implement here - this is an example of how the forward method works
    # as you can see, X is first passed into the input layer, then through a ReLU non-linearity,
    # then into the hidden layer, followed by another ReLU non-linearity
    # in the end, the output layer produces a matrix of logits with one column per class
    # we want to classify
    def forward(self, X):
        X = self.input_layer(X)
        X = F.relu(X)
        X = self.hidden_layer(X)
        X = F.relu(X)
        logits = self.output_layer(X)
        return logits

    # we will create our own function to help with prediction
    # we will utilize F.softmax and torch.argmax!
    def classify(self, X):
        """
        Q:  create a function classify that will take in X data points and produce
            the predicted classification of these points.

        HINT: use torch.softmax() and torch.argmax()! you NEED them.

        Inputs
        - X: the torch.tensor matrix to be classified with shape (N, D)

        Outputs
        - labels: a torch.tensor with the shape (N, ), each item being X[i]'s 
                  classification prediction
        """
        X = torch.as_tensor(X, dtype=torch.float32)     # accept NumPy arrays or tensors and make sure the dtype is float32
        logits = self(X)
        ...
        return labels.type(torch.long)

Write code to transform `X` and `y` into tensors. `X` needs to be of type `torch.float32` and `y` needs to be of type `torch.long`. Look up how to cast a tensor to a given type in PyTorch.

In [None]:
### YOUR CODE HERE
X, y = ..., ...

Let's instantiate our model and see what the outputs look like! The output shape should be (2, 10) because we fed in 2 samples and the model produces one score for each of the 10 possible classes!

In [None]:
model = Sample_Network()
out = model(X[:2])
print(out)
print(out.shape)

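If we apply softmax to the output, we get a probability distribution over the classes for each sample! Taking the argmax of that distribution gives the predicted class, which is what `classify` returns.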

In [None]:
preds = model.classify(X[:2])
print(preds)    # the model thinks that the first 2 samples are these classes
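To make the softmax step concrete, here is a small sketch on made-up logits for 2 samples and 3 classes (the values and the name `toy_logits` are purely illustrative and are not produced by the model). Each row of the softmax output sums to 1, and the largest logit always receives the largest probability.

```python
import torch

toy_logits = torch.tensor([[2.0, 0.5, -1.0],
                           [0.1, 0.2, 3.0]])    # 2 samples, 3 classes (made-up values)

probs = torch.softmax(toy_logits, dim=1)        # normalize across the class dimension
print(probs)
print(probs.sum(dim=1))                         # each row sums to 1
print(torch.argmax(probs, dim=1))               # index of the most likely class per sample
```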

In [None]:
# let's see if the model is doing a good job
show_image_labels(X[0], preds[0])       # this should be a T-shirt, not a shirt - they are two different classes
show_image_labels(X[1], preds[1])

Technically, the first image is supposed to be a T-shirt, not a shirt, so the model most likely got both wrong (and if it somehow got one right, it got lucky)... Why? Well, we haven't trained it yet! And that's our next section!

## Writing the Training Loop

This is arguably one of the most important components of deep learning with PyTorch. We need to write our own training loop that trains our model for a given number of epochs, based on our own specifications.

A typical training loop will have the following structure:


*   Take in the model and the data X_train, y_train, X_val, and y_val.
*   Take in hyperparameters such as learning rate, batch size, optimizer, schedulers, etc.
*   Iterate through the epochs
*   Each epoch, iterate through the batches and keep track of the losses and the metrics of choice (accuracy in our case).

In [None]:
from sklearn.metrics import accuracy_score

def train(model, X_train, y_train, X_val, y_val, epochs=15, batch_size=32, lr=1e-3):
    """
    Q:  write the training loop following the schema shown above.

    Inputs
    - model: the model to be trained - a PyTorch nn.Module class object
    - X_train, y_train, X_val, y_val: training and validation data
    - epochs: num epochs, or the number of times we want to run through the entire training data
    - batch_size: number of data points per batch
    - lr: learning rate used by the Adam optimizer created below

    Outputs
    - losses: a list of losses
    - accuracies: a list of validation accuracies
    - train_accs: a list of training accuracies
    """

    batches = ...   # using batch_size, determine the number of batches needed

    loss_fn = nn.CrossEntropyLoss()                             # read the write-up for an explanation on CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)     # read the write-up for an explanation on Adam

    losses = []
    train_accs = []
    accuracies = []

    for epoch in range(epochs):
        for i in range(batches):
            X_batch = ...
            y_batch = ...

            logits = ...

            loss = loss_fn(logits, y_batch)

            # these 3 calls will follow you whenever you train a model with PyTorch
            optimizer.zero_grad()   # clears the gradients left over from the previous batch (otherwise they accumulate)
            loss.backward()         # calculates the gradient of the loss with respect to every parameter in the model
            optimizer.step()        # takes ONE learning step with the gradients just calculated

        # feel free to use sklearn's accuracy_score function
        # calculate the training accuracy
        ...

        # calculate the validation accuracy and append the loss of this epoch
        ...

        # print epoch, loss, and current validation accuracy
        print(f"Epoch {epoch}:\tloss {float(loss):.4f} & accuracy {accuracy}")
    
    return losses, accuracies, train_accs
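The `zero_grad()` / `backward()` / `step()` pattern can be seen in isolation on a toy problem. The sketch below is not the homework solution: it fits a single linear layer to made-up regression data using the full dataset at every step (no batching), and all names prefixed with `toy_` are hypothetical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
toy_X = torch.rand(64, 3)
toy_y = toy_X @ torch.tensor([1.0, -2.0, 0.5])    # made-up linear target

toy_model = nn.Linear(3, 1)
toy_loss_fn = nn.MSELoss()
toy_optimizer = torch.optim.Adam(toy_model.parameters(), lr=1e-2)

history = []
for step in range(200):
    pred = toy_model(toy_X).squeeze(1)      # (64, 1) -> (64,)
    loss = toy_loss_fn(pred, toy_y)

    toy_optimizer.zero_grad()               # clear old gradients
    loss.backward()                         # compute new gradients
    toy_optimizer.step()                    # update the parameters

    history.append(loss.item())             # store a plain float, not a tensor

print(history[0], history[-1])              # the loss should go down over the steps
```

Storing `loss.item()` rather than the tensor itself keeps the list free of references to the computation graph.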

If you implemented `Sample_Network` and `train()` correctly, you should see a validation accuracy of roughly 0.79-0.80.

In [None]:
model = Sample_Network()
losses, accuracies, train_accs = train(model, X[:7000], y[:7000], X[7000:], y[7000:], epochs=5)

In [None]:
# Let's see how much the model has learned
preds = model.classify(X[:5])
for i in range(5):
    show_image_labels(X[i], preds[i])

Compared to what we had earlier, the model has clearly learned *something*. It can now distinguish between different images of clothes with about 80% accuracy!

## Build Your Own Model

Now, fill in the following class and make your own neural network model that can classify the clothes with a validation accuracy of 90%!

Using the previous `Sample_Network` as an example, please implement `My_Network`, a class that builds `num_layers` hidden layers (NOT including the input and the output layers), where each hidden layer has `hidden_size` units. Additionally, please **add one or more extra features** to boost your performance. Some ideas: adding `nn.Dropout` layers, adding `nn.BatchNorm1d` layers, normalizing the inputs, using a learning rate scheduler, etc.

When you're done with this section, click on the folder icon to your left (for those using Google Colab), and you should be able to download `my_model.pt` and `predictions.npy` - the 2 files you will submit to Gradescope.

In [None]:
class My_Network(nn.Module):
    def __init__(self, num_layers, hidden_size):
        super().__init__()
        ...
    
    def forward(self, X):
        ...

    def classify(self, X):
        ...
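One thing worth knowing before adding Dropout: it behaves differently during training and evaluation, and the mode is switched with `.train()` and `.eval()`. The sketch below uses a standalone `nn.Dropout` layer on a made-up input (the names are illustrative only) to show the difference.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
ones = torch.ones(1000)

drop.train()                 # training mode: entries are zeroed at random, survivors are scaled by 1 / (1 - p)
train_out = drop(ones)

drop.eval()                  # evaluation mode: Dropout passes the input through unchanged
eval_out = drop(ones)

print(train_out.unique())                # only 0 and 2 appear
print(torch.equal(eval_out, ones))       # True
```

Calling `.eval()` on a whole model switches every such layer inside it, so it is a good habit before computing validation accuracy or test predictions.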

In [None]:
my_model = My_Network(...)
losses, accuracies, train_accs = train(my_model, X[:7000], y[:7000], X[7000:], y[7000:])   # same train/validation split as before

In [None]:
torch.save(my_model.state_dict(), file_path + "my_model.pt")       # save your model - do NOT change the name of 
                                                                   # the model or else the autograder won't recognize it!
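A saved `state_dict` holds only the parameters, so loading it requires first building a model with the same architecture. The following is a minimal round-trip sketch using a stand-in `nn.Linear` model and an in-memory buffer instead of a file on disk (both are assumptions made for illustration, not part of the homework pipeline).

```python
import io

import torch
import torch.nn as nn

tiny = nn.Linear(4, 2)                       # stand-in model, not the homework network

buffer = io.BytesIO()                        # in-memory file so nothing is written to disk
torch.save(tiny.state_dict(), buffer)
buffer.seek(0)                               # rewind before reading

restored = nn.Linear(4, 2)                   # must have the same architecture as the saved model
restored.load_state_dict(torch.load(buffer))

print(torch.equal(tiny.weight, restored.weight))
print(torch.equal(tiny.bias, restored.bias))
```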

In [None]:
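def get_test_classifications(model, file_path):
    test_df = pd.read_csv(file_path, sep=',')
    X = test_df.values
    X = torch.tensor(X).type(torch.float32)
    model.eval()                # switch layers such as Dropout and BatchNorm to inference behavior
    with torch.no_grad():       # no gradients are needed for prediction
        out = model.classify(X)
    np.save("predictions.npy", out.cpu().numpy())   # save the predictions as a plain NumPy array
    print("predictions.npy saved!")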

get_test_classifications(my_model, "/YOUR/PATH/TO/fashion_mnist_test.csv")

predictions.npy saved!


Plot your training vs. validation accuracy. These values should be available after running your training loop.

In [None]:
def graph(accuracies, training_accs):
    """
    Q:  plot the validation accuracies and the training accuracies.
        make sure you label which curve is the validation/training accuracy.
        axis labels and a title are required.

    Inputs
    - accuracies: list of floats with length epochs
    - training_accs: list of floats with length epochs

    Outputs
    - None
    """
    ...

## Short report


1.   What extra feature did you end up implementing? Why? Did it improve model performance?
2.   What did you observe after plotting the training and validation accuracy? Was there any overfitting or underfitting?


Your response:



1.   ...
2.   ...




## Conclusion

We have explored the very fundamentals of PyTorch in this homework, and the tools we have worked with so far are only the tip of the iceberg! In our next homework, we will build some heavier convolutional neural networks and learn how to send our model to the GPU for dramatically faster training.