**Homework 18**

**IMPORTANT!!!** Complete this notebook to do your homework, and then transfer your answers to homework18gradescope.ipynb (available on canvas).

In this assignment we'll start working with PyTorch:

In [2]:
import torch
from torch.nn import Linear, ReLU, Sequential

Let's use PyTorch to recreate the Neural Network from Homework 16:

In [3]:
network=Sequential(
    Linear(2,3),
    ReLU(),
    Linear(3,1)
)

You should be able to use it the exact same way, except that we apply PyTorch models to PyTorch tensors, rather than Numpy arrays:

In [4]:
X=torch.randn(15,2) #generate a random feature matrix with 2 features and 15 observations as a torch tensor
network(X) #Predictions of our network

tensor([[-0.4796],
        [-0.4796],
        [-0.4796],
        [-0.8946],
        [-0.7521],
        [-0.6748],
        [-1.3332],
        [-0.7937],
        [-1.6147],
        [-1.8863],
        [-1.9685],
        [-0.4796],
        [-0.8862],
        [-0.4826],
        [-0.5133]], grad_fn=<AddmmBackward0>)

You can see the weights and biases of your network as follows. Note that the first layer has a weight matrix of shape (3,2) and a bias vector of size 3, and the second layer has a weight matrix of shape(1,3) and a single bias.

In [5]:
for param in network.parameters():
    print(param,param.shape)

Parameter containing:
tensor([[ 0.0199,  0.6837],
        [-0.4606,  0.6944],
        [-0.2392,  0.5493]], requires_grad=True) torch.Size([3, 2])
Parameter containing:
tensor([ 0.5159,  0.5420, -0.4616], requires_grad=True) torch.Size([3])
Parameter containing:
tensor([[-0.3584, -0.3875, -0.3233]], requires_grad=True) torch.Size([1, 3])
Parameter containing:
tensor([-0.4796], requires_grad=True) torch.Size([1])


We now use gradient descent to train our network. Let's create a target vector:

In [6]:
target=torch.randn(15,1)

Our training loop now follows the pattern from homework 17:

In [7]:
for i in range(10000): #Do 10000 gradient descent steps
  network.zero_grad() #Zero out the derivative with respect to each network parameter

  prediction=network(X) #Compute the network prediction
  MSEloss=torch.nn.MSELoss()(prediction,target) #Compute MSE loss
  #MSEloss=((prediction-target)**2).mean() #This is the same!!

  MSEloss.backward() #Compute gradient

  for param in network.parameters():
    param.data-=0.01*param.grad  #Take a gradient descent step with learning rate of 0.01

  if i%1000==0:
    print(f"Step: {i}/10000, Loss: {MSEloss.item()}") #Periodic reporting to track progress

Step: 0/10000, Loss: 2.5258545875549316
Step: 1000/10000, Loss: 0.8870121240615845
Step: 2000/10000, Loss: 0.8075227737426758
Step: 3000/10000, Loss: 0.7620988488197327
Step: 4000/10000, Loss: 0.7281748652458191
Step: 5000/10000, Loss: 0.7123755216598511
Step: 6000/10000, Loss: 0.7085890173912048
Step: 7000/10000, Loss: 0.7076688408851624
Step: 8000/10000, Loss: 0.7074926495552063
Step: 9000/10000, Loss: 0.7074695825576782


Let's now use this on real data. We'll use the same three colunns from the `cars` dataset that we usd in Homework 7, and again use the `mpg` column for our target:

In [8]:
import pandas as pd
cars=pd.read_csv('https://vincentarelbundock.github.io/Rdatasets/csv/causaldata/auto.csv')

DWG=torch.tensor([cars.displacement,cars.weight,cars.gear_ratio],dtype=torch.float32).T
mpg=torch.tensor(cars.mpg,dtype=torch.float32)

Define a neural network appropriate for predicting `mpg` from `DWG`. Your network should have two hidden layers with 8 neurons and 4 neurons, and ReLU layers before and after all hidden layers.

In [9]:
cars_network=Sequential(
    Linear(3,8),
    ReLU(),
    Linear(8,4),
    ReLU(),
    Linear(4,1)
)

Write a training loop to train your network to predict `mpg` from `DWG`. Do 10000 gradient descent steps with a learning rate of 0.001, and report your MSE every 10000 steps.

In [10]:
lr = 0.001
report = 1000
for i in range(10000):
    cars_network.zero_grad()
    preds = cars_network(DWG)
    MSEloss = torch.nn.MSELoss()(preds,mpg)

    MSEloss.backward()

    for param in cars_network.parameters():
        param.data-=lr*param.grad

    if i%report == 0:
        print(f"Step: {i}/10000, Loss: {MSEloss.item()}") #Periodic reporting to track progress

  return F.mse_loss(input, target, reduction=self.reduction)


Step: 0/10000, Loss: 1880.758056640625
Step: 1000/10000, Loss: 759792.375
Step: 2000/10000, Loss: 13892.880859375
Step: 3000/10000, Loss: 285.8579406738281
Step: 4000/10000, Loss: 37.63215637207031
Step: 5000/10000, Loss: 33.10386276245117
Step: 6000/10000, Loss: 33.02125930786133
Step: 7000/10000, Loss: 33.019752502441406
Step: 8000/10000, Loss: 33.01972198486328
Step: 9000/10000, Loss: 33.01972198486328


Compute the final MSE for your trained model:

In [11]:
final_mse= torch.nn.MSELoss()(preds,mpg).item()
final_mse

33.01972198486328

For classification problems, here are the only changes:
1. If you are predicting $n$ classes, the final layer should have $n$ neurons.
2. For training, use Categorical Cross Entropy loss (torch.nn.CrossEntropyLoss) instead of MSE. (No need to one-hot encode target variable)
3. If you want to generate predictions of your model (to guage accuracy, for example), use: `torch.argmax(network(X), dim=1)`
4. If you want to see the probability that your model predicts each class, use: `torch.softmax(network(X), dim=1)`.

With these changes in mind, we'll revisit the iris dataset:

In [12]:
from sklearn.datasets import load_iris
iris=load_iris()

X=torch.tensor(iris.data, dtype=torch.float32)
y=torch.tensor(iris.target, dtype=torch.long)

Create a neural network appropriate for predicting `y` from `X`, with one hidden layer that has 10 neurons, and ReLU layers before and after that:

In [13]:
iris_net= Sequential(
    Linear(4,10),
    ReLU(),
    Linear(10,3) # since there are 3 possible classes
)


Train your neural network for 10000 steps, with a learning rate of 0.01:

In [14]:
lr = 0.01
for i in range(10000):
    iris_net.zero_grad()
    preds = iris_net(X)
    CCEloss = torch.nn.CrossEntropyLoss()(preds,y)

    CCEloss.backward()

    for param in iris_net.parameters():
        param.data -= lr*param.grad

    if i%1000==0:
        print(f"Step: {i}/10000, Loss: {CCEloss.item()}") #Periodic reporting to track progress

Step: 0/10000, Loss: 1.3251593112945557
Step: 1000/10000, Loss: 0.22469578683376312
Step: 2000/10000, Loss: 0.12114717066287994
Step: 3000/10000, Loss: 0.09376837313175201
Step: 4000/10000, Loss: 0.08178652077913284
Step: 5000/10000, Loss: 0.07503693550825119
Step: 6000/10000, Loss: 0.0706731379032135
Step: 7000/10000, Loss: 0.06759746372699738
Step: 8000/10000, Loss: 0.06529560685157776
Step: 9000/10000, Loss: 0.06349346786737442


What is the probability that your model assigns to flower 133 being in class 1? class 2?

In [25]:
probs = torch.softmax(iris_net(X), dim=1)

In [26]:
preds = torch.argmax(probs,axis=1)
preds[preds == 1].shape[0]/preds.shape[0]

0.31333333333333335

In [None]:
class1_prob= preds[preds == 1].shape[0]/preds.shape[0]
class2_prob=preds[preds == 2].shape[0]/preds.shape[0]
class1_prob,class2_prob

(0.31333333333333335, 0.35333333333333333)

Create a vector of predictions for your model:

In [18]:
predictions= preds
predictions

tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1,
        2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
        2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
        2, 2, 2, 2, 2, 2])

Compute the accuracy of your model:

In [19]:
accuracy= (preds ==y).sum()/150
accuracy

tensor(0.9800)