## Classification with a Multi-layer Perceptron (MLP)

In this problem set, we will *not* be implementing neural networks from scratch. Yesterday, you built a *perceptron* in Python. As discussed in the lecture, multi-layer perceptrons (MLPs) are several layers of these perceptrons stacked together. Here, we will learn how to use one of the most common modules for building neural networks: PyTorch.

In [None]:
# this module contains our dataset
!pip install astronn
#this is pytorch, which we will use to build our nn
import torch
#Standards for plotting, math
import matplotlib.pyplot as plt
import numpy as np
#for our objective function
from sklearn.metrics import accuracy_score, confusion_matrix, ConfusionMatrixDisplay

# Problem 1: Understanding the Data

For this problem set, we will use the Galaxy10 dataset made available via the astroNN module. This dataset is made up of 17736 images of galaxies which have been labelled by hand. See this [link](https://astronn.readthedocs.io/en/latest/galaxy10.html) for more information. 

First we will visualize our data.

**Problem 1a** Show one example of each class as an image.



In [None]:
from astroNN.datasets import load_galaxy10
from astroNN.datasets.galaxy10 import galaxy10cls_lookup
%matplotlib inline

#helpful functions:
#Load the images and labels as numbers
images, labels_original = load_galaxy10()

#convert numbers to a string
galaxy10cls_lookup(labels_original[0])


**Problem 2b** Make a histogram showing the fraction of each class

Keep only the top two classes (i.e., the classes with the most galaxies)
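
One possible way to find and keep the two most populated classes is sketched below. It uses a small made-up label array (`toy_labels`) rather than the real `labels_original`, so the values are purely illustrative.

```python
import numpy as np

# Toy integer labels standing in for `labels_original` (hypothetical values)
toy_labels = np.array([0, 1, 1, 2, 2, 2, 1, 3, 2, 1])

# Count how often each class occurs and convert the counts to fractions
classes, counts = np.unique(toy_labels, return_counts=True)
fractions = counts / counts.sum()

# The two most populated classes are the last two entries of the argsort
top_two = np.sort(classes[np.argsort(counts)[-2:]])

# Boolean mask that selects only samples from those two classes
mask = np.isin(toy_labels, top_two)
kept_labels = toy_labels[mask]
```

The same boolean `mask` could be used to index the image array so that images and labels stay aligned.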

This next block of code converts the data to a format which is more compatible with our neural network.

In [None]:
def to_categorical(y, num_classes):
    """ 1-hot encodes a tensor """
    return np.eye(num_classes, dtype='uint8')[y-1]
labels = to_categorical(labels_original, 2)
labels = labels.astype(np.float32)
images = images.astype(np.float32)
images = torch.tensor(images)
labels = torch.tensor(labels)

# we're going to flatten the images for our MLP
images = images.reshape(len(images),-1)
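
To see what the conversion above does, here is a minimal sketch on toy arrays. It assumes the two remaining classes carry the integer labels 1 and 2, which is what the `y-1` offset in `to_categorical` suggests; the tiny 3x3 "images" are invented for illustration.

```python
import numpy as np

def to_categorical(y, num_classes):
    """One-hot encode integer labels that start at 1."""
    return np.eye(num_classes, dtype='uint8')[y - 1]

# Assumed labels for the two kept classes: 1 and 2
toy_labels = np.array([1, 2, 2, 1])
one_hot = to_categorical(toy_labels, 2)
# Label 1 becomes [1, 0] and label 2 becomes [0, 1]

# Four tiny 3x3 RGB "images" (hypothetical stand-ins for the galaxy images)
toy_images = np.zeros((4, 3, 3, 3), dtype=np.float32)

# Flatten each image into one long vector: 3 * 3 * 3 = 27 features
flat = toy_images.reshape(len(toy_images), -1)
```

Each row of `flat` is one image, which is the input shape a fully connected layer expects.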

**Problem 2c** Split the data into a training and test set (66/33 split) using the `train_test_split` function from sklearn.

In [None]:
from sklearn.model_selection import train_test_split
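
As a rough illustration of the call signature, the sketch below splits a small toy array. The arrays `X` and `y` are made up, and `random_state=0` is only there to make the toy split repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 12 samples with 2 features each, and alternating labels
X = np.arange(24).reshape(12, 2)
y = np.array([0, 1] * 6)

# test_size=0.33 holds out roughly one third of the samples
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)
```

With 12 samples, this holds out 4 for testing and keeps 8 for training.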

The next cell outlines how one can make an MLP with PyTorch.

**Problem 3a** Talk to a partner about how this code works, line by line. Add another hidden layer which is the same size as the first hidden layer. Choose an appropriate final nonlinear layer for this classification problem. Choose the appropriate number of outputs.

In [None]:
class MLP(torch.nn.Module):
      # this defines the model
        def __init__(self, input_size, hidden_size):
            super(MLP, self).__init__()
            self.input_size = input_size
            self.hidden_size  = hidden_size
            self.hiddenlayer = torch.nn.Linear(self.input_size, self.hidden_size)
            self.outputlayer = torch.nn.Linear(self.hidden_size, HOW_MANY_OUTPUTS)
            # some nonlinear options
            self.sigmoid = torch.nn.Sigmoid()
            # dim=1 normalizes across classes for each sample; omitting dim is deprecated
            self.softmax = torch.nn.Softmax(dim=1)
            self.relu = torch.nn.ReLU()
        def forward(self, x):
            layer1 = self.hiddenlayer(x)
            activation = self.sigmoid(layer1)
            layer2 = self.outputlayer(activation)
            output = self.NONLINEAR(layer2)
            return output
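
To help compare the nonlinear options listed in the class, the following sketch applies sigmoid and softmax to the same made-up two-class outputs. The `logits` values are invented for illustration; this is not a recommendation of either option.

```python
import torch

# Hypothetical raw outputs of the last linear layer for two samples
logits = torch.tensor([[2.0, -1.0],
                       [0.5, 0.5]])

# Sigmoid squashes every element independently into (0, 1)
sigmoid_out = torch.sigmoid(logits)

# Softmax normalizes across the class dimension, so each row sums to 1
softmax_out = torch.softmax(logits, dim=1)

print(sigmoid_out.sum(dim=1))  # row sums are not constrained
print(softmax_out.sum(dim=1))  # row sums are 1
```

Both produce values between 0 and 1, but only softmax treats the outputs of one sample as a single probability distribution over classes.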

The next block of code will show how one can train the model for 100 epochs. Note that we use the *binary cross-entropy* as our objective function and *stochastic gradient descent* as our optimization method.
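
For predicted probabilities $p_i$ and targets $y_i$, the binary cross-entropy averaged over $N$ values is $-\frac{1}{N}\sum_i \left[y_i \log p_i + (1-y_i)\log(1-p_i)\right]$. The sketch below checks this by hand against `torch.nn.BCELoss` on two made-up predictions.

```python
import math
import torch

# Two hypothetical predicted probabilities and their true targets
probs = torch.tensor([0.9, 0.2])
targets = torch.tensor([1.0, 0.0])

# By hand: -mean(y*log(p) + (1-y)*log(1-p))
manual = -(math.log(0.9) + math.log(1 - 0.2)) / 2

# The same quantity from PyTorch (default reduction is the mean)
torch_loss = torch.nn.BCELoss()(probs, targets).item()

print(manual, torch_loss)
```

The two numbers agree up to floating-point precision. Confident, correct predictions give a loss near zero.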

**Problem 3b** Edit the code so that the function plots the training loss and the test loss at each epoch.

In [None]:
# train the model
def train_model(training_data,training_labels, test_data,test_labels, model):
  # define the optimization
  criterion = torch.nn.BCELoss()
  optimizer = torch.optim.SGD(model.parameters(), lr=0.1,momentum=0.9)
  for epoch in range(100):
    # clear the gradient
    optimizer.zero_grad()
    # compute the model output
    myoutput = model(training_data)
    # calculate loss
    loss = criterion(myoutput, training_labels)
    # credit assignment
    loss.backward()
    # update model weights
    optimizer.step()

    # ADD PLOT


The next block trains the model, assuming a hidden layer size of 100 neurons.

**Problem 3c** Change the learning rate `lr` to minimize the cross-entropy loss.

In [None]:
model = MLP(np.shape(images_train[0])[0],100)
train_model(images_train, labels_train, images_test, labels_test, model)


Write a function called `evaluate_model` which takes the image data, labels and model as input, and returns the accuracy as output. You can use the `accuracy_score` function.

In [None]:
# evaluate the model
def evaluate_model(data,labels, model):
  return(acc)
# evaluate the model
acc = evaluate_model(images_test,labels_test, model)
print('Accuracy: %.3f' % acc)

**Problem 3d** Make a confusion matrix for the test set using `confusion_matrix` and `ConfusionMatrixDisplay`.
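
Because the labels here are one-hot encoded and the network outputs one score per class, both need to be reduced to integer class indices before they are passed to `confusion_matrix`. A minimal sketch on made-up arrays (not real model output):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical one-hot true labels and predicted class scores for 5 samples
true_one_hot = np.array([[1, 0], [0, 1], [0, 1], [1, 0], [0, 1]])
pred_scores = np.array([[0.8, 0.2], [0.3, 0.7], [0.6, 0.4],
                        [0.9, 0.1], [0.1, 0.9]])

# argmax along the class axis turns each row into a class index
y_true = true_one_hot.argmax(axis=1)
y_pred = pred_scores.argmax(axis=1)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)
print(cm)
```

The resulting array can then be drawn with `ConfusionMatrixDisplay(confusion_matrix=cm).plot()`.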

**Challenge Problem** Add a third class to your classifier and begin accounting for uneven classes. There are several steps to this:

1. Edit the neural network to output 3 classes
2. Change the criterion to a *custom criterion* function, such that the entropy of each class is weighted by the inverse fraction of each class size (e.g., if the galaxy class breakdowns are 1:2:3, the weights would be 6:3:2). 
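
The weighting in step 2 can be sketched in NumPy as follows. The function name `weighted_cross_entropy`, the toy probabilities, and the choice to average over samples are all assumptions made for illustration. A criterion used for training would need to be written with torch operations so that gradients can flow through it.

```python
import numpy as np

# Class sizes in the ratio 1:2:3, as in the example above
class_counts = np.array([1, 2, 3])
fractions = class_counts / class_counts.sum()

# Inverse class fractions: 6, 3, 2
weights = 1.0 / fractions

def weighted_cross_entropy(probs, one_hot, weights):
    """Cross-entropy where each class term is scaled by its class weight."""
    per_sample = -(weights * one_hot * np.log(probs)).sum(axis=1)
    return per_sample.mean()

# Two hypothetical predictions over 3 classes and their one-hot labels
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
one_hot = np.array([[1, 0, 0],
                    [0, 1, 0]])
loss = weighted_cross_entropy(probs, one_hot, weights)
```

Errors on the rarest class are penalized most heavily, which counteracts the tendency to favor the largest class.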