**Homework 18**

**IMPORTANT!!!** Complete this notebook to do your homework, and then transfer your answers to homework18gradescope.ipynb (available on Canvas).

In this assignment we'll start working with PyTorch:

In [1]:
import torch
from torch.nn import Linear, ReLU, Sequential

Let's use PyTorch to recreate the Neural Network from Homework 16:

In [2]:
network=Sequential(
    Linear(2,3),
    ReLU(),
    Linear(3,1)
)

You should be able to use it in exactly the same way, except that we apply PyTorch models to PyTorch tensors rather than NumPy arrays:

In [3]:
X=torch.randn(15,2) #generate a random feature matrix with 2 features and 15 observations as a torch tensor
network(X) #Predictions of our network

tensor([[0.5283],
        [0.8579],
        [0.5344],
        [0.5102],
        [0.5273],
        [0.5303],
        [0.5272],
        [0.5669],
        [0.5126],
        [0.7010],
        [0.5180],
        [0.5344],
        [0.7098],
        [0.5264],
        [0.5217]], grad_fn=<AddmmBackward0>)
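If your features start out as a NumPy array, they need to be converted before the network can use them. The following is a minimal sketch, assuming a made-up array `X_np` (not part of the assignment) and a freshly initialized copy of the network above. The dtype is set explicitly because NumPy defaults to 64-bit floats while `Linear` layers store 32-bit weights.

```python
import numpy as np
import torch
from torch.nn import Linear, ReLU, Sequential

network = Sequential(Linear(2, 3), ReLU(), Linear(3, 1))

# Hypothetical NumPy feature matrix: 15 observations, 2 features
X_np = np.random.randn(15, 2)

# Convert to a float32 tensor so the dtype matches the layer weights
X_t = torch.tensor(X_np, dtype=torch.float32)

predictions = network(X_t)                      # shape (15, 1), tracks gradients
predictions_np = predictions.detach().numpy()   # back to a NumPy array
print(predictions_np.shape)
```

Calling `.detach()` first is needed because the output carries gradient information (note the `grad_fn` in the printout above).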

You can see the weights and biases of your network as follows. Note that the first layer has a weight matrix of shape (3,2) and a bias vector of size 3, and the second layer has a weight matrix of shape (1,3) and a single bias.

In [4]:
for param in network.parameters():
    print(param,param.shape)

Parameter containing:
tensor([[-0.4884,  0.1012],
        [-0.6083,  0.4605],
        [-0.1590, -0.5903]], requires_grad=True) torch.Size([3, 2])
Parameter containing:
tensor([-0.1485,  0.3392, -0.2125], requires_grad=True) torch.Size([3])
Parameter containing:
tensor([[ 0.2166, -0.0260,  0.2634]], requires_grad=True) torch.Size([1, 3])
Parameter containing:
tensor([0.5344], requires_grad=True) torch.Size([1])
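To see how these shapes fit together, the sketch below rebuilds the forward pass by hand from the four parameters and compares it with the network's own output. It uses a freshly initialized network and random inputs, so the numbers will differ from those printed above. The names `W1`, `b1`, `W2`, `b2` are labels chosen here for illustration.

```python
import torch
from torch.nn import Linear, ReLU, Sequential

network = Sequential(Linear(2, 3), ReLU(), Linear(3, 1))
X = torch.randn(15, 2)

# parameters() yields the weight and bias of each Linear layer, in order
W1, b1, W2, b2 = network.parameters()

# Each Linear layer computes x @ W.T + b
hidden = torch.relu(X @ W1.T + b1)   # (15, 2) @ (2, 3) + (3,) -> (15, 3)
manual = hidden @ W2.T + b2          # (15, 3) @ (3, 1) + (1,) -> (15, 1)

# Compare with the network's prediction, up to floating-point rounding
print(torch.allclose(manual, network(X), atol=1e-5))
```

The weight matrices are stored as (outputs, inputs), which is why each one is transposed before multiplying.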


We now use gradient descent to train our network. Let's create a target vector:

In [5]:
target=torch.randn(15,1)

Our training loop now follows the pattern from Homework 17:

In [6]:
for i in range(10000): #Do 10000 gradient descent steps
  network.zero_grad() #Zero out the derivative with respect to each network parameter

  prediction=network(X) #Compute the network prediction
  MSEloss=torch.nn.MSELoss()(prediction,target) #Compute MSE loss
  #MSEloss=((prediction-target)**2).mean() #This is the same!!

  MSEloss.backward() #Compute gradient

  for param in network.parameters():
    param.data-=0.01*param.grad  #Take a gradient descent step with learning rate of 0.01

  if i%1000==0:
    print(f"Step: {i}/10000, Loss: {MSEloss.item()}") #Periodic reporting to track progress

Step: 0/10000, Loss: 0.763605535030365
Step: 1000/10000, Loss: 0.2666787803173065
Step: 2000/10000, Loss: 0.23930779099464417
Step: 3000/10000, Loss: 0.22691559791564941
Step: 4000/10000, Loss: 0.21765072643756866
Step: 5000/10000, Loss: 0.20988836884498596
Step: 6000/10000, Loss: 0.20325292646884918
Step: 7000/10000, Loss: 0.1982157677412033
Step: 8000/10000, Loss: 0.19464799761772156
Step: 9000/10000, Loss: 0.1924513876438141
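The commented-out line in the loop claims that `torch.nn.MSELoss` and the hand-written mean of squared differences are the same. Here is a small check of that claim on made-up tensors (the seed and values are arbitrary):

```python
import torch

torch.manual_seed(0)
prediction = torch.randn(15, 1)   # stand-in for a network output
target = torch.randn(15, 1)       # stand-in for a target vector

builtin = torch.nn.MSELoss()(prediction, target)
manual = ((prediction - target) ** 2).mean()

# The two values should agree up to floating-point rounding
print(torch.allclose(builtin, manual))
```

This equality relies on `prediction` and `target` having the same shape.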


Let's now use this on real data. We'll use the same three columns from the `cars` dataset that we used in Homework 7, and again use the `mpg` column for our target:

In [7]:
import pandas as pd
cars=pd.read_csv('https://vincentarelbundock.github.io/Rdatasets/csv/causaldata/auto.csv')

DWG=torch.tensor([cars.displacement,cars.weight,cars.gear_ratio],dtype=torch.float32).T
mpg=torch.tensor(cars.mpg,dtype=torch.float32)

Define a neural network appropriate for predicting `mpg` from `DWG`. Your network should have two hidden layers with 8 neurons and 4 neurons, and ReLU layers before and after all hidden layers.

In [8]:
cars_network=Sequential(
    Linear(3,8),
    ReLU(),
    Linear(8,4),
    ReLU(),
    Linear(4,1)
)

Write a training loop to train your network to predict `mpg` from `DWG`. Do 10000 gradient descent steps with a learning rate of 0.001, and report your MSE every 1000 steps.

In [9]:
lr = 0.001
report = 1000
for i in range(10000):
    cars_network.zero_grad()
    preds = cars_network(DWG)
    MSEloss = torch.nn.MSELoss()(preds,mpg)

    MSEloss.backward()

    for param in cars_network.parameters():
        param.data-=lr*param.grad

    if i%report == 0:
        print(f"Step: {i}/10000, Loss: {MSEloss.item()}") #Periodic reporting to track progress

  return F.mse_loss(input, target, reduction=self.reduction)


Step: 0/10000, Loss: 1962.44921875
Step: 1000/10000, Loss: 41.49964141845703
Step: 2000/10000, Loss: 33.17442321777344
Step: 3000/10000, Loss: 33.022544860839844
Step: 4000/10000, Loss: 33.019775390625
Step: 5000/10000, Loss: 33.01972198486328
Step: 6000/10000, Loss: 33.019718170166016
Step: 7000/10000, Loss: 33.019718170166016
Step: 8000/10000, Loss: 33.019718170166016
Step: 9000/10000, Loss: 33.019718170166016
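The loss above flattens out near 33.02, and the stray `F.mse_loss` line in the output looks like the tail of a PyTorch warning. One possible cause, offered as an assumption because the full warning is not shown: the network output has shape `(N, 1)` while `mpg` has shape `(N,)`, so `MSELoss` broadcasts the pair to an `(N, N)` grid and compares every prediction with every target. The sketch below uses small made-up tensors to show the effect, and a reshape that avoids it.

```python
import warnings
import torch

preds = torch.tensor([[1.0], [2.0], [3.0]])   # shape (3, 1), like a network output
target = torch.tensor([1.0, 2.0, 3.0])        # shape (3,), like a 1-D data column

with warnings.catch_warnings():
    warnings.simplefilter("ignore")           # silence the shape-mismatch warning
    mismatched = torch.nn.MSELoss()(preds, target)

# Reshape the target to (3, 1) so it lines up with the predictions
matched = torch.nn.MSELoss()(preds, target.reshape(-1, 1))

print(mismatched.item())  # nonzero even though every prediction equals its target
print(matched.item())     # 0.0
```

If this is what happened here, reshaping the target with `mpg.reshape(-1, 1)` before computing the loss would remove the mismatch, and the training cell would need to be rerun.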


Compute the final MSE for your trained model:

In [10]:
final_mse= torch.nn.MSELoss()(preds,mpg).item()
final_mse

33.019718170166016

For classification problems, here are the only changes:
1. If you are predicting $n$ classes, the final layer should have $n$ neurons.
2. For training, use Categorical Cross Entropy loss (`torch.nn.CrossEntropyLoss`) instead of MSE. (There is no need to one-hot encode the target variable.)
3. If you want to generate predictions of your model (to gauge accuracy, for example), use: `torch.argmax(network(X), dim=1)`
4. If you want to see the probability that your model predicts each class, use: `torch.softmax(network(X), dim=1)`.
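The four changes above can be sketched together on made-up data. The network `net`, the inputs, and the labels below are hypothetical and only illustrate the shapes involved:

```python
import torch
from torch.nn import Linear, ReLU, Sequential

torch.manual_seed(0)
n_classes = 3
net = Sequential(Linear(4, 10), ReLU(), Linear(10, n_classes))  # final layer: n neurons

X = torch.randn(6, 4)                   # 6 made-up observations, 4 features
y = torch.tensor([0, 2, 1, 1, 0, 2])    # integer class labels, not one-hot

logits = net(X)                         # raw scores, shape (6, 3)
loss = torch.nn.CrossEntropyLoss()(logits, y)   # takes raw scores and integer labels

probs = torch.softmax(logits, dim=1)    # per-class probabilities, rows sum to 1
predicted = torch.argmax(logits, dim=1) # most likely class for each observation

print(loss.item())
print(probs.sum(dim=1))
print(predicted)
```

Note that the loss is given the raw network output rather than the softmax probabilities, since `CrossEntropyLoss` applies the softmax step internally.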

With these changes in mind, we'll revisit the iris dataset:

In [11]:
from sklearn.datasets import load_iris
iris=load_iris()

X=torch.tensor(iris.data, dtype=torch.float32)
y=torch.tensor(iris.target, dtype=torch.long)

Create a neural network appropriate for predicting `y` from `X`, with one hidden layer that has 10 neurons, and ReLU layers before and after that:

In [20]:
iris_net= Sequential(
    Linear(4,10),
    ReLU(),
    Linear(10,10),
    ReLU(),
    Linear(10,3) # since there are 3 possible classes
)


Train your neural network for 10000 steps, with a learning rate of 0.01:

In [21]:
lr = 0.01
for i in range(10000):
    iris_net.zero_grad()
    preds = iris_net(X)
    CCEloss = torch.nn.CrossEntropyLoss()(preds,y)

    CCEloss.backward()

    for param in iris_net.parameters():
        param.data -= lr*param.grad

    if i%1000==0:
        print(f"Step: {i}/10000, Loss: {CCEloss.item()}") #Periodic reporting to track progress

Step: 0/10000, Loss: 1.168735146522522
Step: 1000/10000, Loss: 0.14213278889656067
Step: 2000/10000, Loss: 0.08066406846046448
Step: 3000/10000, Loss: 0.06787962466478348
Step: 4000/10000, Loss: 0.062085557729005814
Step: 5000/10000, Loss: 0.05871378630399704
Step: 6000/10000, Loss: 0.05643817409873009
Step: 7000/10000, Loss: 0.05473622679710388
Step: 8000/10000, Loss: 0.053368695080280304
Step: 9000/10000, Loss: 0.05221503600478172


What is the probability that your model assigns to flower 133 being in class 1? class 2?

In [22]:
probs = torch.softmax(iris_net(X), dim=1)

In [23]:
probs[133]

tensor([3.2060e-07, 5.9528e-01, 4.0472e-01], grad_fn=<SelectBackward0>)

In [24]:
preds = torch.argmax(probs, dim=1) #Predicted class for each flower
preds[preds == 1].shape[0]/preds.shape[0] #Fraction of flowers predicted as class 1

0.32666666666666666

In [25]:
# class1_prob= preds[preds == 1].shape[0]/preds.shape[0]
# class2_prob=preds[preds == 2].shape[0]/preds.shape[0]
# class1_prob,class2_prob


class1_prob= probs[133,1]
class2_prob= probs[133,2]
class1_prob,class2_prob

(tensor(0.5953, grad_fn=<SelectBackward0>),
 tensor(0.4047, grad_fn=<SelectBackward0>))
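The values above still carry a `grad_fn` because they were computed with gradient tracking on. If plain Python numbers are wanted for reporting, one option is sketched below on made-up scores (the tensor `logits` is hypothetical, not the iris model's output):

```python
import torch

logits = torch.tensor([[2.0, 1.0, 0.1]], requires_grad=True)  # made-up class scores

with torch.no_grad():                   # no gradient tracking needed for reporting
    probs = torch.softmax(logits, dim=1)

class1_prob = probs[0, 1].item()        # .item() extracts a plain Python float
class2_prob = probs[0, 2].item()
print(class1_prob, class2_prob)
```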

Create a vector of predictions for your model:

In [26]:
predictions= preds
predictions

tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
        2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
        2, 2, 2, 2, 2, 2])

Compute the accuracy of your model:

In [27]:
accuracy = (preds == y).float().mean() #Fraction of flowers classified correctly
accuracy

tensor(0.9800)