# Group Members : Mansi Sharma, Antra Sinha, Nikhil Nair

# MIS 285N: Homework 1

Submit:

A pdf of your notebook with solutions.
A link to your Colab notebook, or upload your .ipynb file if you are not working on Colab.

# Goals of this Lab

**Fully Connected Models and XOR**


1. How to create data objects that PyTorch can use
2. How to create a dataloader
3. How to define a basic fully connected single layer model
4. How to define a multi-layer fully connected model
5. How to add non-linear activation functions.
6. How to add layers in two different ways

We also see the importance of nonlinear activation functions directly, by experimenting with the simple 4-data-point XOR example that we saw in class.


In [1]:
import torch
import numpy as np
import time
from tqdm.notebook import tqdm

# First we define a linear regressor.
This is the same as a fully connected layer. It will be a building block in making deeper neural networks with fully connected layers.
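
To make the "same as a fully connected layer" statement concrete, here is a minimal sketch checking that `torch.nn.Linear` applies the affine map $xW^\top + b$. The variable names (`layer`, `manual`, `built_in`) and the random input are illustrative assumptions, not part of the homework.

```python
import torch

torch.manual_seed(0)  # fixed seed so the run is repeatable

layer = torch.nn.Linear(3, 1, bias=True)  # same layer type used in LinearRegressor
x = torch.randn(5, 3)                     # 5 example rows with 3 features each

# torch.nn.Linear stores its weight with shape (out_features, in_features)
manual = x @ layer.weight.T + layer.bias  # affine map written out by hand
built_in = layer(x)                       # what the layer computes

print(torch.allclose(manual, built_in, atol=1e-6))
print(built_in.shape, torch.flatten(built_in).shape)
```

The second print shows why the classes below call `torch.flatten`: the layer returns shape `(N, 1)`, while the labels used later have shape `(N,)`.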

In [2]:
# We define our first class: LinearRegressor
#
class LinearRegressor(torch.nn.Module):
    def __init__(self, input_dim, output_dim):
        """
        Define the layer(s) needed for the linear model.
        """
        super().__init__()
        self.linear = torch.nn.Linear(input_dim, output_dim, bias = True) # just linear

    def forward(self, x):
        """
        Compute the regression output for a batch of inputs.
        (The MSE loss itself is computed in the training loop.)

        Input:
            x (float tensor N x d): input rows
        Output:
            y (float tensor N): regression output, flattened
        """
        x = self.linear(x)
        return torch.flatten(x)


    # defining a separate predict function is useful for multi-class
    # classification as we will see later. Here it is
    # unnecessary.

    def predict(self, x):
        """
        Predict the regression label of the input vector.

        Input:
            x (float tensor N x d): input rows
        Output:
            y (float tensor N): regression output, flattened
        """
        x = self.linear(x)
        return torch.flatten(x)




## Problem 1:

Now you will use torch.nn.Sequential (see https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html) to construct a two layer neural network, with two fully connected layers (no non-linearity yet). Thus, you will combine torch.nn.Sequential with torch.nn.Linear that you saw above.


Design your network so that the first layer has as many neurons as the input has features (that is, `input_dim` neurons).

Note: you have only one line to fill in here.

In [3]:
class TwoLayerLinearRegressor(torch.nn.Module):
    def __init__(self, input_dim, output_dim):
        """
        Define a model that stacks two linear fully connected layers.
        """
        super().__init__()
        self.TLL = torch.nn.Sequential(
            torch.nn.Linear(input_dim, input_dim),  # First fully connected layer with input_dim neurons
            torch.nn.Linear(input_dim, output_dim)  # Second fully connected layer with output_dim neurons
        )

    def forward(self, x):
        """
        Compute the regression output for a batch of inputs.
        (The MSE loss itself is computed in the training loop.)

        Input:
            x (float tensor N x d): input rows
        Output:
            y (float tensor N): regression output, flattened
        """
        x = self.TLL(x)
        return torch.flatten(x)

    def predict(self, x):
        """
        Predict the regression label of the input vector.

        Input:
            x (float tensor N x d): input rows
        Output:
            y (float tensor N): regression output, flattened
        """
        x = self.TLL(x)
        return torch.flatten(x)






## Problem 2

Now you will create the same network, but using different syntax: you will not use torch.nn.Sequential. You need to fill in the two lines as noted by the comments.

In [4]:
class TwoLayerLinearRegressor2(torch.nn.Module):
    def __init__(self, input_dim, output_dim):
        """
        Define a model that stacks two linear fully connected layers.
        """
        super().__init__()
        self.fc1 = torch.nn.Linear(input_dim, input_dim)  # First fully connected layer with input_dim neurons
        self.fc2 = torch.nn.Linear(input_dim, output_dim)  # Second fully connected layer with output_dim neurons


    def forward(self, x):
        """
        Compute the regression output for a batch of inputs.
        (The MSE loss itself is computed in the training loop.)

        Input:
            x (float tensor N x d): input rows
        Output:
            y (float tensor N): regression output, flattened
        """

        x = self.fc1(x)
        x = self.fc2(x)
        return torch.flatten(x)

    def predict(self, x):
        """
        Predict the regression label of the input vector.

        Input:
            x (float tensor N x d): input rows
        Output:
            y (float tensor N): regression output, flattened
        """
        x = self.fc1(x)
        x = self.fc2(x)
        return torch.flatten(x)


## Problem 3

Now you will define a 2 layer neural network with ReLU activation at the first layer. In other words:

Let $x$ be the input.
Then writing $z = Wx + c$, $h = \mathrm{ReLU}(z)$ is the first layer's neurons. Then the output is $y = w\cdot h + d$.

Create this neural network using the torch.nn.Sequential command. Conceptually, it may help to realize that this neural network is: a fully connected layer followed by a ReLU, followed by a fully connected layer.

Also see: https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html
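
As a rough illustration of the formulas in this problem, the sketch below computes $z = Wx + c$, $h = \mathrm{ReLU}(z)$, and the output $w \cdot h + d$ by hand and compares the result with a `torch.nn.Sequential` stack. The names `net`, `y_manual`, and the two sample inputs are assumptions chosen for this example only.

```python
import torch

torch.manual_seed(0)  # fixed seed so the run is repeatable

# Linear -> ReLU -> Linear, for 2 input features and 1 output
net = torch.nn.Sequential(
    torch.nn.Linear(2, 2),
    torch.nn.ReLU(),
    torch.nn.Linear(2, 1),
)
x = torch.tensor([[0.0, 1.0], [1.0, 1.0]])  # two sample input rows

W, c = net[0].weight, net[0].bias  # first-layer weights and bias
w, d = net[2].weight, net[2].bias  # second-layer weights and bias

z = x @ W.T + c                    # pre-activation of the first layer
h = torch.clamp(z, min=0.0)        # ReLU(z) = max(z, 0), elementwise
y_manual = h @ w.T + d             # output built from h, not from z

print(torch.allclose(y_manual, net(x), atol=1e-6))
```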

Note: You have only one line to fill in here.

In [5]:
class TwoLayerNonLinearRegressor(torch.nn.Module):
    def __init__(self, input_dim, output_dim):
        """
        Define a model that has a linear layer, a ReLU layer and another linear layer.
        """
        super().__init__()
        self.linear = torch.nn.Sequential(
            torch.nn.Linear(input_dim, input_dim),  # Linear layer with input_dim neurons
            torch.nn.ReLU(),  # ReLU activation function
            torch.nn.Linear(input_dim, output_dim)  # Linear layer with output_dim neurons
        )

    def forward(self, x):
        """
        Compute the regression output for a batch of inputs.
        (The MSE loss itself is computed in the training loop.)

        Input:
            x (float tensor N x d): input rows
        Output:
            y (float tensor N): regression output, flattened
        """
        x = self.linear(x)
        return torch.flatten(x)

    def predict(self, x):
        """
        Predict the regression label of the input vector.

        Input:
            x (float tensor N x d): input rows
        Output:
            y (float tensor N): regression output, flattened
        """
        x = self.linear(x)
        return torch.flatten(x)



## Problem 4

Do this one more time, but now without torch.nn.Sequential.

You have three lines to fill in here.

In [6]:
# We now do this again, without using nn.Sequential
# in order to illustrate different syntax.

class TwoLayerNonLinearRegressor2(torch.nn.Module):
    def __init__(self, input_dim, output_dim):
        """
        Define a model that has a linear layer, a ReLU layer and another linear layer.
        """
        super().__init__()
        self.fc1 = torch.nn.Linear(input_dim, input_dim)  # Linear layer with input_dim neurons
        self.relu = torch.nn.ReLU()  # ReLU activation function
        self.fc2 = torch.nn.Linear(input_dim, output_dim)  # Linear layer with output_dim neurons



    def forward(self, x):
        """
        Compute the regression output for a batch of inputs.
        (The MSE loss itself is computed in the training loop.)

        Input:
            x (float tensor N x d): input rows
        Output:
            y (float tensor N): regression output, flattened
        """

        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return torch.flatten(x)

    def predict(self, x):
        """
        Predict the regression label of the input vector.

        Input:
            x (float tensor N x d): input rows
        Output:
            y (float tensor N): regression output, flattened
        """
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return torch.flatten(x)

## Problem 5 (Nothing to turn in)

Read the documentation at https://pytorch.org/docs/stable/optim.html to see which optimizers PyTorch provides and what parameters each one takes.
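
To get a feel for what an optimizer does, here is a small sketch of a single step of plain gradient descent with `torch.optim.SGD`, checked against the update $w \leftarrow w - \text{lr} \cdot \nabla_w \text{loss}$ written out by hand. The toy loss and the names `w` and `expected` are assumptions for illustration; the training function in this notebook uses `torch.optim.Adam`, which follows the same `zero_grad` / `backward` / `step` pattern but applies a different update rule.

```python
import torch

# A single trainable parameter vector (toy example)
w = torch.nn.Parameter(torch.tensor([1.0, -2.0]))
optimizer = torch.optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()     # toy loss whose gradient is 2 * w

optimizer.zero_grad()     # clear any stale gradients
loss.backward()           # fill w.grad with d(loss)/dw

# The update plain SGD should apply: w - lr * grad
expected = w.detach().clone() - 0.1 * w.grad

optimizer.step()          # apply the update in place
print(w.data)             # tensor([ 0.8000, -1.6000])
```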

In [7]:
# Now we define a function for training
# Note each of the arguments that it takes

def train(model, data_train, data_val, device, lr=0.01, epochs=5000):
    """
    Train the model.

    Input:
      model (torch.nn.Module): the model to train
      data_train (torch.utils.data.Dataloader): yields batches of data
      data_val (torch.utils.data.Dataloader): use this to validate your model
      device (torch.device): which device to use to perform computation

      (optional) lr: learning rate hyperparameter
      (optional) epochs: number of passes over dataloader
    """

    # Setup the loss function to use: mean squared error
    loss_function = torch.nn.MSELoss(reduction = 'sum')

    # Setup the optimizer -- just generic ADAM
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    # Wrap in a progress bar.
    for epoch in tqdm(range(epochs)):
        # Set the model to training mode.
        model.train()

        for x, y in data_train:
            x = x.to(device)
            y = y.to(device)

            # Forward pass through the network
            output = model(x)

            # Compute loss
            loss = loss_function(output, y)

            # update model weights.
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        # Set the model to eval mode and run it on the validation data.
        # The predictions are not scored here; use evaluate_model for that.
        model.eval()

        for x, y in data_val:
            x = x.to(device)
            y = y.to(device)

            y_pred = model.predict(x)

In [8]:
# We write a function that takes a model, evaluates it on the validation
# data set, and returns the predictions

def evaluate_model(model, data_val, device):
    model.eval()
    output_vals = list()
    for x, y in data_val:
        x = x.to(device)
        y = y.to(device)

        # One prediction tensor per batch
        y_pred = model.predict(x)
        output_vals.append(y_pred)

    return output_vals

## Problem 6 (Nothing to turn in)

Read the documentation and try to understand what a dataloader is. You can start here https://pytorch.org/docs/stable/data.html but there are many tutorials out there as well.
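
As a starting point, the sketch below wraps two small tensors in a `TensorDataset` and iterates over them with a `DataLoader`. The tensors and the batch size of 4 are made-up values for illustration only.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Toy data: 6 rows with 2 features each, and one label per row
features = torch.arange(12, dtype=torch.float32).reshape(6, 2)
labels = torch.arange(6, dtype=torch.float32)

dataset = TensorDataset(features, labels)  # pairs up features[i] with labels[i]
loader = DataLoader(dataset, batch_size=4, shuffle=False)

# Each iteration yields one (features, labels) batch
batch_shapes = [(tuple(xb.shape), tuple(yb.shape)) for xb, yb in loader]
print(len(dataset), len(loader))  # 6 2
print(batch_shapes)               # [((4, 2), (4,)), ((2, 2), (2,))]
```

The last batch is smaller because 6 is not a multiple of 4. With `shuffle=True` the rows would be drawn in a new random order on every pass over the loader.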

In [9]:
# Creating the data: Linear Regression on Linear Data
from torch.utils.data import TensorDataset, DataLoader
N = 15
X = np.random.randn(N,3)
beta = np.array([1,-1,2])
Y = np.dot(X,beta)
tensor_x = torch.Tensor(X) # transform to torch tensor
tensor_y = torch.Tensor(Y)
print('These are the labels:\n',Y)
print('These are the features:\n',X)

m = 1 # Batch size
data = TensorDataset(tensor_x,tensor_y) # create your dataset
data_train = DataLoader(data,batch_size = m, shuffle = True) # create your dataloader with training data
data_val = DataLoader(data) # create your dataloader with validation data, here same as training

These are the labels:
 [ 0.33695415 -0.8520051  -1.53992622 -2.34035258 -4.02515816 -0.06205053
  0.56452003 -0.34712337 -3.39307138 -1.32522332 -2.48479914  4.17276431
  0.11809609  2.75087378  0.54158598]
These are the features:
 [[ 8.87631989e-01 -1.84989143e+00 -1.20028464e+00]
 [-1.16233390e+00 -7.83080843e-01 -2.36376021e-01]
 [-9.20701634e-01 -3.52530355e-01 -4.85877472e-01]
 [-4.06412538e-01  6.06986915e-01 -6.63476565e-01]
 [-8.76477046e-01 -1.57940154e-01 -1.65331063e+00]
 [-1.02398981e+00 -2.21225978e-02  4.69908342e-01]
 [ 1.40884426e+00 -1.28106141e+00 -1.06269282e+00]
 [-9.51055481e-02  4.56380599e-01  1.02181390e-01]
 [-1.29968978e+00  1.68917037e+00 -2.02105614e-01]
 [-5.26502808e-01  7.66512971e-01 -1.61037688e-02]
 [-1.64079471e+00  1.51266123e-01 -3.46369154e-01]
 [-3.48491078e-01 -2.01793321e+00  1.25166109e+00]
 [-1.50245433e-03  9.64863868e-01  5.42231208e-01]
 [-6.21924249e-01 -1.10320881e+00  1.13479461e+00]
 [-1.50450213e+00 -1.64204175e-01  9.40941965e-01]]


## Now we train and evaluate the linear model.

In [10]:
# Define the model we wish to use, and train it.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = LinearRegressor(3, 1)
model.to(device)

train(model, data_train, data_val, device)

  0%|          | 0/5000 [00:00<?, ?it/s]

## Here is some code for getting the parameters of the model.

In [11]:
# Now let's get the model parameters.
# We can see that we have succeeded in learning beta: [1,-1,2]
for name, param in model.named_parameters():
  print (name, param.data)

# If we wanted to, we could also only print the ones that we update (may be useful for more complex models)
"""
for name, param in model.named_parameters():
    if param.requires_grad:
        print (name, param.data)
"""

linear.weight tensor([[ 0.9881, -0.9941,  2.0202]])
linear.bias tensor([-0.0087])


'\nfor name, param in model.named_parameters():\n    if param.requires_grad:\n        print (name, param.data)\n'

In [12]:
# Now let's move to our second model: the two layer linear regressor.
# We again define the model using the class we created.
# Then we train the model, as above.

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model2 = TwoLayerLinearRegressor(3, 1)
model2.to(device)

train(model2, data_train, data_val, device)

  0%|          | 0/5000 [00:00<?, ?it/s]

In [13]:
# Let's see how well this model agrees with the training data
output_values = evaluate_model(model2,data_val,device)
print('Ground Truth:\n',Y)
print('Model Output:\n',output_values)

Ground Truth:
 [ 0.33695415 -0.8520051  -1.53992622 -2.34035258 -4.02515816 -0.06205053
  0.56452003 -0.34712337 -3.39307138 -1.32522332 -2.48479914  4.17276431
  0.11809609  2.75087378  0.54158598]
Model Output:
 [tensor([0.3370], grad_fn=<ViewBackward0>), tensor([-0.8520], grad_fn=<ViewBackward0>), tensor([-1.5400], grad_fn=<ViewBackward0>), tensor([-2.3404], grad_fn=<ViewBackward0>), tensor([-4.0252], grad_fn=<ViewBackward0>), tensor([-0.0621], grad_fn=<ViewBackward0>), tensor([0.5646], grad_fn=<ViewBackward0>), tensor([-0.3471], grad_fn=<ViewBackward0>), tensor([-3.3932], grad_fn=<ViewBackward0>), tensor([-1.3253], grad_fn=<ViewBackward0>), tensor([-2.4849], grad_fn=<ViewBackward0>), tensor([4.1729], grad_fn=<ViewBackward0>), tensor([0.1181], grad_fn=<ViewBackward0>), tensor([2.7509], grad_fn=<ViewBackward0>), tensor([0.5416], grad_fn=<ViewBackward0>)]
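
The output above is a Python list of one-element tensors that still carry autograd information (`grad_fn`). One possible way to get a single plain tensor instead is sketched below, using `torch.no_grad()` and `torch.cat`. The stand-in model and random data are assumptions so that the snippet runs on its own; it is not the notebook's `evaluate_model`.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

model = torch.nn.Linear(3, 1)  # stand-in for any of the regressors above
loader = DataLoader(TensorDataset(torch.randn(4, 3), torch.zeros(4)))

model.eval()
with torch.no_grad():          # no autograd graph is recorded in this block
    outputs = [torch.flatten(model(x)) for x, _ in loader]

predictions = torch.cat(outputs)  # one tensor holding all predictions
print(predictions.requires_grad, tuple(predictions.shape))  # False (4,)
```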


# The XOR Data Set
On this data set we will see that linear layers alone do not suffice.
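
One way to see why stacking linear layers does not help: two linear layers with no activation in between compose into a single affine map, $W_2(W_1 x + b_1) + b_2 = (W_2 W_1)x + (W_2 b_1 + b_2)$. The sketch below checks this numerically; the names `stacked`, `W`, and `b` are illustrative assumptions.

```python
import torch

torch.manual_seed(0)  # fixed seed so the run is repeatable

# Two linear layers with no non-linearity in between
stacked = torch.nn.Sequential(torch.nn.Linear(2, 2), torch.nn.Linear(2, 1))
W1, b1 = stacked[0].weight, stacked[0].bias
W2, b2 = stacked[1].weight, stacked[1].bias

# Collapse the two layers into one weight matrix and one bias
W = W2 @ W1
b = W2 @ b1 + b2

x = torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
collapsed = x @ W.T + b

print(torch.allclose(stacked(x), collapsed, atol=1e-6))
```

So a two-layer linear model can represent exactly the same functions as a single linear layer, and it inherits the same limitation on XOR.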

In [14]:
"""
Here we create the simple XOR data set as a numpy array.
Then we make X and Y into tensor objects that torch uses,
and we package it into a Dataset object called data.
Then we create a DataLoader.
"""

Xxor = np.array([[0,0],[0,1],[1,0],[1,1]])
Yxor = np.array([0,1,1,0])
tensor_xxor = torch.Tensor(Xxor) # transform to torch tensor
tensor_yxor = torch.Tensor(Yxor)
print('These are the labels:\n',Yxor)
print('These are the features:\n',Xxor)

dataxor = TensorDataset(tensor_xxor,tensor_yxor) # create your dataset
dataxor_train = DataLoader(dataxor) # create your dataloader with training data
dataxor_val = DataLoader(dataxor) # create your dataloader with validation data, here same as training

These are the labels:
 [0 1 1 0]
These are the features:
 [[0 0]
 [0 1]
 [1 0]
 [1 1]]


## Problem 7

Train your linear regressor on these data. Now see how well you do, by evaluating your solution on the training data. Remember the values we got in class.

Your output here should be predicted values for each of the 4 points in our XOR data set.
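
As a reference point for what to expect, the best linear fit in the least-squares sense can be computed in closed form with NumPy. This is a sketch that bypasses training entirely; the names `A`, `coef`, and `pred` are assumptions for illustration.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

# Append a column of ones so the last coefficient acts as the bias
A = np.hstack([X, np.ones((4, 1))])

# Least-squares solution of A @ coef ~ y
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
pred = A @ coef

print(np.round(coef, 4))  # weights close to 0, bias close to 0.5
print(np.round(pred, 4))  # every prediction close to 0.5
```

A trained linear regressor that has converged on the squared-error loss should therefore predict roughly 0.5 for all four points.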

In [15]:
# Now we train a linear classifier on these data.
# We know (and can verify) that this will fail because no linear classifier can succeed

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# model3 = # TO DO
model3 = LinearRegressor(2, 1)
model3.to(device)

# TO DO -- give a command to train your model
# TO DO -- give a command to evaluate your model
# TO DO -- print the ground truth, and then also print what your model predicts for the 4 points in the training set.
# Train the linear classifier
train(model3, dataxor_train, dataxor_val, device)

# Evaluate the linear classifier on the training data
output_values_classifier = evaluate_model(model3, dataxor_train, device)

print("Linear Classifier Predictions:")
print("Ground Truth:", Yxor)
print("Predictions:", output_values_classifier)

  0%|          | 0/5000 [00:00<?, ?it/s]

Linear Classifier Predictions:
Ground Truth: [0 1 1 0]
Predictions: [tensor([0.5012], grad_fn=<ViewBackward0>), tensor([0.5006], grad_fn=<ViewBackward0>), tensor([0.5005], grad_fn=<ViewBackward0>), tensor([0.4999], grad_fn=<ViewBackward0>)]


## Problem 8

Now repeat this, but using both versions of your non-linear two-layer model. Thus: train both versions of your non-linear two layer models, and evaluate them on the data.  

If you did this correctly, the values you compute should equal (approximately) the values of the XOR function.
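
For reference, a two-layer ReLU network of this size can represent XOR exactly. The sketch below sets the weights by hand instead of training them; these particular values are one well-known choice and are an assumption of this example, not something produced by the notebook's `train` function.

```python
import torch

net = torch.nn.Sequential(
    torch.nn.Linear(2, 2),
    torch.nn.ReLU(),
    torch.nn.Linear(2, 1),
)

with torch.no_grad():
    # Hidden units: h1 = ReLU(x1 + x2), h2 = ReLU(x1 + x2 - 1)
    net[0].weight.copy_(torch.tensor([[1.0, 1.0], [1.0, 1.0]]))
    net[0].bias.copy_(torch.tensor([0.0, -1.0]))
    # Output: y = h1 - 2 * h2
    net[2].weight.copy_(torch.tensor([[1.0, -2.0]]))
    net[2].bias.copy_(torch.tensor([0.0]))

    x = torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
    y = torch.flatten(net(x))

print(y)  # tensor([0., 1., 1., 0.])
```

Gradient-based training from a random initialization is not guaranteed to reach weights like these. If a run does not reproduce the XOR values, re-initializing the model and training again is a reasonable thing to try.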

In [16]:


device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model4 = TwoLayerNonLinearRegressor(2, 1)
model4.to(device)

train(model4, dataxor_train, dataxor_val, device)

# Evaluate the non linear classifier on the training data
output_values_classifier_4 = evaluate_model(model4, dataxor_train, device)

print("Non-Linear Model (nn.Sequential) Predictions:")
print("Ground Truth:", Yxor)
print("Predictions:", output_values_classifier_4)

  0%|          | 0/5000 [00:00<?, ?it/s]

Non-Linear Model (nn.Sequential) Predictions:
Ground Truth: [0 1 1 0]
Predictions: [tensor([0.6675], grad_fn=<ViewBackward0>), tensor([0.6675], grad_fn=<ViewBackward0>), tensor([0.6675], grad_fn=<ViewBackward0>), tensor([0.0006], grad_fn=<ViewBackward0>)]


In [17]:


device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model5 = TwoLayerNonLinearRegressor2(2, 1)
model5.to(device)

train(model5, dataxor_train, dataxor_val, device)

# Evaluate the non linear classifier on the training data
output_values_classifier_5 = evaluate_model(model5, dataxor_train, device)

print("Non-Linear Model (without nn.Sequential) Predictions:")
print("Ground Truth:", Yxor)
print("Predictions:", output_values_classifier_5)

  0%|          | 0/5000 [00:00<?, ?it/s]

Non-Linear Model (without nn.Sequential) Predictions:
Ground Truth: [0 1 1 0]
Predictions: [tensor([0.6675], grad_fn=<ViewBackward0>), tensor([0.6675], grad_fn=<ViewBackward0>), tensor([0.6675], grad_fn=<ViewBackward0>), tensor([0.0006], grad_fn=<ViewBackward0>)]


## Problem 9

Print the parameters of one of your non-linear models. Thus, you should print: 4 weights plus 2 bias values for the first layer, and then 2 weights plus 1 bias value for the second: 9 parameters in total.
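
The count of 9 can be checked programmatically. The sketch below builds a fresh, untrained network of the same shape and sums `numel()` over its parameters; `net`, `counts`, and `total` are names assumed for this example.

```python
import torch

# Same architecture as the non-linear models: 2 inputs, 2 hidden units, 1 output
net = torch.nn.Sequential(
    torch.nn.Linear(2, 2),
    torch.nn.ReLU(),
    torch.nn.Linear(2, 1),
)

# Number of scalar entries in each parameter tensor
counts = {name: p.numel() for name, p in net.named_parameters()}
total = sum(counts.values())

print(counts)  # {'0.weight': 4, '0.bias': 2, '2.weight': 2, '2.bias': 1}
print(total)   # 9
```

The ReLU layer contributes nothing to the count because it has no trainable parameters.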

In [18]:
# Print the parameters of the non-linear model with nn.Sequential
print("Non-Linear Model with nn.Sequential Parameters:")
for name, param in model4.named_parameters():
    print(name, param.data)


Non-Linear Model with nn.Sequential Parameters:
linear.0.weight tensor([[ 0.6549,  0.6381],
        [ 0.1486, -0.3962]])
linear.0.bias tensor([-0.7620, -0.2441])
linear.2.weight tensor([[-1.2559, -0.4097]])
linear.2.bias tensor([0.6675])


In [19]:
# Print the parameters of the non-linear model without nn.Sequential
print("Non-Linear Model without nn.Sequential Parameters:")
for name, param in model5.named_parameters():
    print(name, param.data)


Non-Linear Model without nn.Sequential Parameters:
fc1.weight tensor([[-0.6871,  0.1491],
        [ 0.7178,  0.6934]])
fc1.bias tensor([-0.4360, -0.7681])
fc2.weight tensor([[ 0.1822, -1.0369]])
fc2.bias tensor([0.6675])
