# Introduction to PyTorch
In this first homework you will familiarize yourself with the basics of differentiable programming with PyTorch. At the end of this assignment you should be able to build and train your own multi-layer perceptron! By the end of the course we expect you to be comfortable with PyTorch, which you will also use in all the upcoming practical sessions.


## 0. Prerequisite

You will notice that the assignment comes in different flavours. Sometimes you will be provided with already running code, which comes with appropriate explanations. Other times you will have to write code yourself; when this is the case, the cells of the notebook will be marked in red with the following instruction: <span style="color:red; font-style: italic">Your code comes below</span>. Sometimes, in addition to writing code yourself, you will also have to motivate why you programmed certain things or guide us quickly through the results that you obtained. If this is required you will see the following green instruction: <span style="color:green; font-style:italic">Your discussion comes below</span>.

Please note that you will **not** have to hand in any sort of written report by the end of the assignment. We do however expect **the notebook** with the solutions to the exercises.

Now that you have survived the boring part you are ready to start this first assignment!
Our first step is very easy: we simply import the libraries we need.

In [1]:
import torch
import numpy
import matplotlib.pyplot as plt


## 1. Tensors and basic operations

Tensors are one of the main ingredients of modern deep learning frameworks. Almost all deep learning computations can be expressed as tensor operations, which makes them fast and efficient. We will now see how to manipulate tensors within the PyTorch library.

### 1.1 A guided tour

**What is a tensor?** There are many valid answers but to keep it simple you can see them as multidimensional arrays. 

Wikipedia says: *'A tensor may be represented as a (potentially multidimensional) array. Just as a vector in an n-dimensional space is represented by a one-dimensional array with n components with respect to a given basis, any tensor with respect to a basis is represented by a multidimensional array.'*

*In PyTorch, a tensor is an object of the class `torch.Tensor`.*
This class handles almost any operation you might want to perform on a tensor, as shown below.

There are many ways to create tensors; two examples are given below (for further information you can check the [documentation](https://pytorch.org/docs/stable/tensors.html)).

In [None]:
# We can create a tensor from an array:
t1 = torch.tensor([[1, 2], [7, 9]])
# Or we can also create a 2x2 tensor filled with zeros:
t2 = torch.zeros(2, 2)

If you want to check what a tensor looks like you can simply print it:

In [None]:
print(t1)
print(t2)

*Just as an array may contain different types of data (e.g. string, float, int, boolean), so can a tensor (although it is restricted to numerical and boolean types).*

Let's check what type of data is in t1 and t2:

In [None]:
print(t1.type(), t2.type())

The type `LongTensor` stores (64-bit) integer values and `FloatTensor` stores (32-bit) real values. You can also convert tensors from one type to another:

In [None]:
t1_real = t1.float()
print(t1, t1.type(), t1_real, t1_real.type())
t2_int = t2.long()
print(t2, t2.type(), t2_int, t2_int.type())
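
The `.dtype` attribute is another way to inspect the element type, and `.to()` converts to an arbitrary type. The short sketch below assumes PyTorch's default settings (the variable names `t_int`, `t_float` and `t_double` are only illustrative):

```python
import torch

# With default settings, integer data becomes int64 and torch.zeros gives float32.
t_int = torch.tensor([[1, 2], [7, 9]])
t_float = torch.zeros(2, 2)
print(t_int.dtype, t_float.dtype)

# .to() converts to any dtype, here double precision.
t_double = t_int.to(torch.float64)
print(t_double.dtype)
```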

*Another important property of a tensor is its shape: you can use the `.shape` attribute or the `.size()` method to check the dimensions of a tensor.*

Both return a `torch.Size` object that can be manipulated with its own operations. Most of the time you will just use `torch.Size` objects like a list (e.g. to check the size of a tensor along one of its dimensions).

In [None]:
print(t1.shape, t1.size())
# if we want to check for the length of the tensor along its first dimension:
print(t1.shape[0])

**1D (vectors) and 2D (matrices) tensors**

Let us now familiarize ourselves with some simple tensor operations. If you know the NumPy library this will look familiar; similarly, if you know MATLAB this should remind you of what you can do in that language. If none of the above, you might be reconsidering your life choices!

In [None]:
# Let's create 2 1D tensors of 5 random integers:
v1 = torch.randint(low=-100, high=100, size=[5])
v2 = torch.randint(low=-100, high=100, size=[5])
v1, v2

Let's review some basic operations:

In [None]:
# addition
v_sum = v1 + v2
print(v_sum)

Subtraction (`-`), element-wise multiplication (`*`) and element-wise division (`/`) follow the same logic.

<span style="color:red; font-style:italic">Your code comes below</span>

In [None]:
# Subtract v1 from v2
v_sub = #your code
print(v_sub)

# Multiply the elements of v1 and v2
v_mul = #your code
print(v_mul)

# Divide the elements of v1 by v2
v_div = #your code
print(v_div)

Sometimes you would like to extract a subvector from the tensor. We call this operation *slicing* and it is as simple as this:

In [None]:
# Slicing: extract sub-Tensor [from:to)
print(v_sum[0:3])

You may also extract specific elements:

In [None]:
# Retrieving first, fourth and fifth elements
print(v_sum[[0, 3, 4]])

Within deep learning, the fun really starts when the dimensionality of the tensors increases. So let's have a look at matrices.
Let's create two of them. Again, there are many ways of instantiating matrices, and Google is your friend whenever you look for [one of them](https://letmegooglethat.com/?q=Instantiate+matrix+in+pytorch).

In [None]:
# The first matrix will be a 5x10 matrix full of normally distributed random values:
m1 = torch.randn(5, 10)

# The second is the 10x10 identity matrix:
m2 = torch.eye(10)

m1, m2

**Slicing**

Slicing in matrices (or any tensor) is performed with `:`; negative indices count from the end. Feel free to play with this operator and to visualize the results to make sure you understand what slicing does.

In [None]:
m1[:4, 2:-1]

**Squeeze and Unsqueeze**

Squeezing removes a dimension of size 1 from the tensor; conversely, unsqueezing adds a dimension of size 1.

In [None]:
print('Before unsqueeze: ', m1.shape)
m1 = m1.unsqueeze(0)
print('After unsqueeze: ', m1.shape)
m1 = m1.squeeze(0)
print('After squeeze: ', m1.shape)
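
As a small complement, the sketch below illustrates two details of these operations: `squeeze` leaves dimensions whose size is not 1 untouched, and `unsqueeze` accepts negative indices. The tensor `m` is only an illustrative stand-in for `m1`.

```python
import torch

m = torch.randn(5, 10)

# Dimension 0 has size 5, not 1, so squeeze(0) leaves the shape unchanged.
same = m.squeeze(0)
print(same.shape)      # torch.Size([5, 10])

# A negative index counts from the end: -1 appends a trailing dimension.
trailing = m.unsqueeze(-1)
print(trailing.shape)  # torch.Size([5, 10, 1])
```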

**Expand**

Expand repeats the tensor along the expanded dimensions (which must have size 1), see the [docs](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.expand).
For example, if we want to create a 6x4 matrix where each column is a vector of integers that goes from 1 to 6, we could do the following:

In [None]:
v = torch.arange(1, 7)
m_v = v.unsqueeze(1).expand(-1, 4)
m_v
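
One way to see that `expand` does not copy any data is to look at the strides of the result. The sketch below compares it with `repeat`, which does allocate a new tensor; the name `m_copy` is introduced here only for illustration.

```python
import torch

v = torch.arange(1, 7)

# expand: the new dimension has stride 0, so every column reads the same memory.
m_v = v.unsqueeze(1).expand(-1, 4)
print(m_v.shape)     # torch.Size([6, 4])
print(m_v.stride())  # (1, 0)

# repeat: same values, but stored in a freshly allocated contiguous tensor.
m_copy = v.unsqueeze(1).repeat(1, 4)
print(m_copy.stride())  # (4, 1)
print(torch.equal(m_v, m_copy))
```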

**View**

Sometimes a tensor does not have the right shape, for example when you manipulate images you sometimes need to process them as vectors. The `.view` operator creates a view that has the new shape. By *view* we mean that nothing has changed in memory, but the tensor can now be manipulated as if it had this new shape.

Let us imagine that we have 10 RGB images of size 256x256 that we would like to manipulate as vectors.

In [None]:
images = torch.randn(10, 3, 256, 256)
images_as_vectors = images.view(10, -1)
print(images.shape, images_as_vectors.shape)

Both variables model the same values though:

In [None]:
print(torch.norm(images - images_as_vectors.view(10, 3, 256, 256)))

We see below that if we change the values inside one of the views, it modifies the underlying tensor (which is common to both views), so the norm of the difference remains zero:

In [None]:
images_as_vectors[0, :] = torch.zeros_like(images_as_vectors[0, :])
print(torch.norm(images - images_as_vectors.view(10, 3, 256, 256)))

**Permute**

The permute operation can be used to reorder the dimensions of a tensor. For example you can transpose a matrix as follows:

In [None]:
m1.permute(1, 0), m1
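
A practical consequence worth knowing: `permute` returns a view whose memory layout is generally no longer contiguous, and `.view` then refuses to flatten it. The sketch below (with an illustrative tensor `m`) shows this behaviour and uses `.reshape`, which copies the data when needed.

```python
import torch

m = torch.randn(5, 10)
m_t = m.permute(1, 0)        # shape (10, 5), same memory as m
print(m_t.is_contiguous())   # False

# .view needs a compatible memory layout, so flattening the permuted tensor fails.
view_failed = False
try:
    m_t.view(-1)
except RuntimeError as err:
    view_failed = True
    print('view failed:', err)

# .reshape works in both cases (it falls back to a copy when a view is impossible).
flat = m_t.reshape(-1)
print(flat.shape)            # torch.Size([50])
```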

**Matrix multiplication**

The `@` operator can be used to do matrix multiplication.

In [None]:
m1 @ m2
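
As a quick sanity check of the shape rule (an `(n, k)` matrix times a `(k, p)` matrix gives an `(n, p)` matrix), the sketch below multiplies an illustrative 5x10 matrix `a` by the 10x10 identity and confirms that the result has the shape and values of `a`.

```python
import torch

a = torch.randn(5, 10)
identity = torch.eye(10)

# (5, 10) @ (10, 10) -> (5, 10); multiplying by the identity leaves a unchanged.
prod = a @ identity
print(prod.shape)
print(torch.allclose(prod, a))

# torch.matmul is the function form of the @ operator.
prod_fn = torch.matmul(a, identity)
```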

**Other predefined primitives**

Tensors in PyTorch already implement many useful methods such as `.mean()`, `.std()`, `.sum()` and many others. These functions may take many different arguments; however, you will often see the argument `dim` in the docs (`axis`, its NumPy name, is usually accepted as an alias). This argument specifies along which dimension of the tensor the method operates. As an example we can create a $5 \times 10$ matrix of normally distributed random values and compute the global mean value, the mean of each row or the mean of each column.

In [None]:
m_g = torch.randn(5, 10)
m_g

In [None]:
m_g.mean(0) # Computes the mean along axis 0, that is, the mean of each column.

In [None]:
m_g.mean(1) # Computes the mean along axis 1, that is, the mean of each row.

In [None]:
m_g.mean() # Computes the global mean value.
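
To make the role of the dimension argument concrete, here is a small sketch on a hand-checkable 2x3 matrix (the names below are illustrative). It also shows the `keepdim=True` option, which keeps the reduced dimension with size 1 so that the result can be broadcast against the original tensor.

```python
import torch

m = torch.arange(6.).reshape(2, 3)  # [[0., 1., 2.], [3., 4., 5.]]

col_means = m.mean(dim=0)           # one value per column: tensor([1.5, 2.5, 3.5])
row_means = m.mean(dim=1)           # one value per row:    tensor([1., 4.])

# keepdim=True keeps a size-1 dimension, giving shape (2, 1) instead of (2,).
row_means_kept = m.mean(dim=1, keepdim=True)
centered = m - row_means_kept       # each row now has zero mean
print(centered)
```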

### 1.2 Spot and solve the bug

Assume we would like to compute the mean of the following tensor, but we are not yet familiar enough with PyTorch and therefore encounter the following error.

In [None]:
a = torch.tensor([8, 12, 80, 100])
a.mean()

<span style="color:red; font-style:italic">Fix it with one operation for us, your code comes below.</span>

In [None]:
# Your code

### 1.3 You practice!

Create a tensor with $10^{5}$ i.i.d. normally (mean 5 and standard deviation 2) distributed values.

<span style="color:red; font-style:italic">Your code comes below</span>

In [None]:
# Your code

Compute the mean value of the tensor with a loop:

In [None]:
%%timeit
# Your code 

Compute the mean and the standard deviation with the correct PyTorch operators:

In [None]:
%%timeit
# Your code

What do you observe when you compare the running time of the two code snippets?

<span style="color:green; font-style:italic"> Explain in one sentence below</span>

To get even more familiar with tensor manipulations, plot the curve defined by this set of equations:


$t \in [0, 2\pi[$

$x = 16 \sin(t)^3$

$y = 13 \cos(t) - 5 \cos(2t) - 2 \cos(3t) - \cos(4t)$

For plotting purposes you can use the matplotlib library and the math library for the value of $\pi$.

In [None]:
from math import pi
# your code

Now you can send the result to your favourite Tinder date.

## 2. The autograd package

The autograd package is what really sets PyTorch apart from other numerical languages and libraries such as MATLAB or NumPy. Whenever you perform operations on tensors that require gradients, PyTorch builds a computation graph that can later be used to compute derivatives of the output quantities with respect to the inputs or to intermediate computation steps.

The official documentation says: *'torch.autograd provides classes and functions implementing automatic differentiation of arbitrary scalar valued functions. It requires minimal changes to the existing code - you only need to declare Tensors for which gradients should be computed with the requires_grad=True keyword.'*

### 2.1 A guided tour

Let's see how autograd may be used to compute the derivative of an analytical function.

In [None]:
def f(x):
    return x**2 + 3*x - 2*torch.sin(x/5)

In [None]:
def hand_df(x):
    return 2*x + 3 - 2/5*torch.cos(x/5)

In [None]:
x = torch.randn(1, requires_grad=True)
y = f(x)
df = torch.autograd.grad(y, x)

x, y, df, hand_df(x)
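
The same check can be done for several inputs at once. The sketch below repeats the definitions of `f` and `hand_df` so that it runs on its own, sums the outputs to obtain a scalar, and compares the autograd gradient with the hand-written derivative. Summing works here because each output only depends on its own input.

```python
import torch

def f(x):
    return x**2 + 3*x - 2*torch.sin(x/5)

def hand_df(x):
    return 2*x + 3 - 2/5*torch.cos(x/5)

# Five evaluation points; autograd needs a scalar output, so we sum f(x).
x = torch.linspace(-2, 2, 5, requires_grad=True)
y = f(x).sum()
(grad,) = torch.autograd.grad(y, x)

# Entry i of grad is df/dx evaluated at x[i].
print(torch.allclose(grad, hand_df(x.detach())))
```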

You see how autograd is able to compute the gradient of a scalar value with respect to a vector automatically for you. We can exploit this for example to perform least squares curve fitting with a simple parametric model:

In [None]:
def unknown_function(x):
    return (5*x + 3*x**2)*(x - 1)/2 + 3

# generate some data
x = torch.arange(-3, 3, .1)
y = unknown_function(x)

In [None]:
# We define a parametric model
def parametric_function(w, x):
    x = x.unsqueeze(1)
    return torch.cat((torch.ones_like(x), x, x**2, x**3), 1) @ w

# random initialization of the parameters
param = torch.randn(4, requires_grad=True)

Let's take a look at what happens before training

In [None]:
plt.figure()
plt.title('Before Training')
plt.plot(x, y, label='Observed')
plt.plot(x, parametric_function(param, x).detach(), label='Predicted')
plt.legend()

We perform gradient descent on the mean squared error loss function in order to fit the observed curve.
Note how the `torch.autograd.grad` function automatically gives us access to the gradients, which we then use during the optimization step. For more information about autograd you can check this short [tutorial](https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html) and the [documentation](https://pytorch.org/docs/stable/autograd.html).

In [None]:
lr = .001
for i in range(1000):
    y_pred = parametric_function(param, x)
    mse = ((y - y_pred)**2).mean()
    grad = torch.autograd.grad(mse, param)[0]
    param = param - lr * grad
plt.figure()
plt.title('After Training')
plt.plot(x, y, label='Observed')
plt.plot(x, parametric_function(param, x).detach(), label='Predicted')
plt.legend()

Storing the gradient values in new variables can quickly become cumbersome. Fortunately, PyTorch offers a solution: it **accumulates** the gradient values directly in the corresponding tensor (in the `.grad` attribute). To use this option you call the `backward` method on the quantity of interest, and it computes the gradient of this quantity with respect to each tensor that has the `requires_grad` flag set to True. Later on in this notebook we will see how this, together with Optimizer objects, makes gradient descent very simple.

In [None]:
x = torch.randn(10)
w = torch.arange(10).float()
w.requires_grad = True
y = x @ w # Simple linear function (dot product of two 1D tensors)
y.backward()

x, w, w.grad
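
The word *accumulates* matters: calling `backward` a second time adds the new gradient to the one already stored in `.grad`. The toy sketch below (with illustrative values) makes this visible and shows the in-place `.zero_()` reset.

```python
import torch

w = torch.ones(3, requires_grad=True)
x = torch.tensor([1., 2., 3.])

(x @ w).backward()
g1 = w.grad.clone()
print(g1)   # tensor([1., 2., 3.])

# A second backward pass adds to the stored gradient instead of replacing it.
(x @ w).backward()
g2 = w.grad.clone()
print(g2)   # tensor([2., 4., 6.])

# In-place reset of the accumulated gradient.
w.grad.zero_()
print(w.grad)
```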

### 2.2 You practice!

Now we would like to perform the same kind of curve fitting that we did a few cells above, but this time we want to use the `.backward()` method instead of `torch.autograd.grad()`. Remember that PyTorch will keep accumulating the gradient values in the `.grad` property of each tensor. To deal with this you can reset the values of the gradients to 0 with the [`.zero_()`](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.zero_) method. To make your life easier, you can start from the learning loop two cells above; be careful to reset the parameters before training and not to use the `torch.autograd.grad` function.

<span style="color:red; font-style:italic">Your code comes below</span>

## 3. The `torch.nn` and `torch.optim` packages

You should now be able to define your own neural network and train it with tensor operations and the autograd package. However, this would require you to explicitly define every operation in the neural network and to keep track of all the parameters for performing gradient descent. Fortunately, PyTorch provides the `torch.nn` and `torch.optim` packages which implement everything you need to define and train a neural network efficiently.

In [None]:
# Importing both packages
import torch.nn as nn
import torch.optim as optim

### 3.1 A guided tour

**`nn.functional`**

The `nn.functional` module implements many predefined functions that should simplify your life when building your own neural networks; the complete list can be accessed [here](https://pytorch.org/docs/stable/nn.functional.html).

Let's play with some of them, below we see some commonly used functions:

In [None]:
x = torch.arange(-5, 5, .1)

plt.figure(figsize=(10, 3))
y1 = nn.functional.relu(x)
plt.subplot(1, 3, 1)
plt.title('ReLU')
plt.plot(x, y1)

y2 = nn.functional.leaky_relu(x, negative_slope=.2)
plt.subplot(1, 3, 2)
plt.title('Leaky ReLU')
plt.plot(x, y2)

y3 = nn.functional.linear(x.unsqueeze(1), weight=torch.tensor([.5]))
plt.subplot(1, 3, 3)
plt.title('Linear')
plt.plot(x, y3)

**`nn.Module`**


PyTorch provides a very important base class named `nn.Module`. This class is used to build complex neural networks. In fact any class which inherits from it will automatically keep track of the parameters of its components (or properties). To define your own `nn.Module` sub-class you only need to implement the `__init__` constructor and the forward function. Let's see an example:

In [None]:
#This means we create a class named "MySimpleParametricModel" which inherits from nn.Module.
class MySimpleParametricModel(nn.Module): 
    # We could add arguments to the constructor however we do not need that here.
    def __init__(self):
        # We need to call the parent's constructor.
        super(MySimpleParametricModel, self).__init__()
        
        # Here we add a property 'w' to the module which is itself a module that implements a linear layer.
        self.w = nn.Linear(in_features=3, out_features=1, bias=True)
        
    def forward(self, x):
        return self.w(x)

We can now instantiate an object of the `MySimpleParametricModel` class and have a look at its parameters!

In [None]:
model = MySimpleParametricModel()
for param in model.parameters():
    print(param)

You can see that, by inheriting from the `nn.Module` class, the parameters of any module used as a property of a `MySimpleParametricModel` object are now accessible as parameters of this object.
The `nn.Module` class implements other useful features, such as user-friendly printing of the module, which summarizes the modules it contains:

In [None]:
print(model)
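
If you also want the names and shapes of the parameters, `named_parameters()` is a convenient companion to `parameters()`. The sketch below uses a standalone `nn.Linear` layer with the same sizes as the one in the example above and counts its trainable values.

```python
import torch.nn as nn

layer = nn.Linear(in_features=3, out_features=1, bias=True)

# Each parameter comes with a name, a shape and a requires_grad flag.
for name, p in layer.named_parameters():
    print(name, tuple(p.shape), p.requires_grad)

# 3 weights + 1 bias = 4 trainable values.
n_params = sum(p.numel() for p in layer.parameters())
print(n_params)  # 4
```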

PyTorch already implements many `nn.Module` subclasses that are often used as neural network components, such as convolutions, activations and linear layers.
For example let's imagine we want to build a simple 3-layer MLP:

In [None]:
class MySimpleMLP(nn.Module):
    def __init__(self, in_size, hidden_units, out_size):
        super(MySimpleMLP, self).__init__()
        
        # Let us now define the linear layers we need:
        self.fc1 = nn.Linear(in_size, hidden_units)
        self.fc2 = nn.Linear(hidden_units, hidden_units)
        self.fc3 = nn.Linear(hidden_units, out_size)
        
    # We also have to define the forward pass of this module:
    def forward(self, x):
        h1 = nn.functional.relu(self.fc1(x))
        h2 = nn.functional.relu(self.fc2(h1))
        out = self.fc3(h2)
        return out

In [None]:
# Instantiate a 3-layer MLP (with 10 neurons in each hidden layer) that computes a scalar quantity from a scalar input.
my_net = MySimpleMLP(1, 10, 1)

# We usually give batches of values to nn modules, where the first dimension is the batch dimension while
# the others represent your data (here a scalar).
x = torch.arange(-2, 2, .1).unsqueeze(1)

# detach() detaches the tensor from its computation graph; this is required to
# convert the tensor to a NumPy array (which is done implicitly when you plot a tensor).
y = my_net(x).detach()
plt.plot(x, y)

**`nn.Sequential`**

Writing the forward pass can become inconvenient and messy if you want to add many layers. For that you can use the `nn.Sequential` module, which automatically chains modules one after another.

In [None]:
class MyElegantSimpleMLP(nn.Module):
    def __init__(self, in_size, hidden_units, out_size):
        super(MyElegantSimpleMLP, self).__init__()
        
        self.net = nn.Sequential(nn.Linear(in_size, hidden_units), nn.ReLU(),
                                nn.Linear(hidden_units, hidden_units), nn.ReLU(),
                                nn.Linear(hidden_units, out_size))
        
    # We also have to define the forward pass of this module:
    def forward(self, x):
        out = self.net(x)
        return out

In [None]:
# Instantiate a 3-layer MLP (with 30 neurons in each hidden layer)
# that computes a scalar quantity from a scalar input.
my_net = MyElegantSimpleMLP(1, 30, 1)

x = torch.arange(-2, 2, .1).unsqueeze(1) 
y = my_net(x).detach() 

plt.plot(x, y)

**Optimizer**

You should now be able to define any kind of neural network you like. However, if you want to optimize it you still need to iterate through all the parameters of the net with the `.parameters()` iterator. In principle you could do that to update the values according to their gradients and your update rule; however, this would require some code and would be prone to bugs. Instead, the `torch.optim` package implements optimizer classes that handle that for you!
For example let's say you would like to learn the function $y := f(x) = x^2$ with a neural network with mean squared error and stochastic gradient descent:

In [None]:
# We create an object from the class SGD that will make the updates for us.
sgd_optimizer = optim.SGD(params=my_net.parameters(), lr=.001)

# Let's do some learning steps with randomly generated x values:
for i in range(5000):
    x = torch.randn(100, 1)
    y = x**2
    y_pred = my_net(x)
    
    # We have to reset the gradients of all the parameters of our net to zero; zero_grad does this for us.
    sgd_optimizer.zero_grad()
    
    # Let's compute the loss and the gradients with respect to it.
    loss = ((y - y_pred)**2).mean()
    loss.backward()
    
    # And now we update the parameters:
    sgd_optimizer.step()
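
To demystify what `step()` does, the sketch below compares one update of `optim.SGD` (in its plain form, without momentum or weight decay) with the hand-written rule $\theta \leftarrow \theta - \text{lr} \cdot \nabla_\theta \text{loss}$ applied to an identical copy of a small model. The names `net_a`, `net_b` and the toy data are assumptions made for this illustration only.

```python
import copy
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
net_a = nn.Linear(1, 1)
net_b = copy.deepcopy(net_a)   # identical starting parameters
lr = 0.1
x = torch.randn(8, 1)
y = x**2

# One update performed by the optimizer.
opt = optim.SGD(net_a.parameters(), lr=lr)
opt.zero_grad()
((y - net_a(x))**2).mean().backward()
opt.step()

# The same update written by hand: p <- p - lr * grad.
((y - net_b(x))**2).mean().backward()
with torch.no_grad():          # the update itself must not be tracked by autograd
    for p in net_b.parameters():
        p -= lr * p.grad

same = all(torch.allclose(pa, pb)
           for pa, pb in zip(net_a.parameters(), net_b.parameters()))
print(same)
```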

Let's check the result:

In [None]:
x = torch.arange(-2, 2, .1).unsqueeze(1) 

y = my_net(x).detach() 
plt.plot(x, y, label='Neural network')
plt.plot(x, x**2, label='$x^2$')
plt.legend()

This is not perfect but we can see that the network learned to model something that looks like a quadratic function.

### Training our first neural network classifier

We now have all the necessary knowledge to build and train our very first neural network on a simple binary classification task. To do this we will start by importing a toy example dataset from the sklearn library, and then create and use the resulting splits for training.

In [None]:
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
import numpy as np

X, Y = make_moons(500, noise=0.1) # create artificial data

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, random_state=73) # create splits

plt.scatter(X_train[:,0], X_train[:,1], c=Y_train) # visualize the data
plt.title('Moon Data')
plt.show()

When we train a neural network in PyTorch, no matter how complex the model is, we always go through a training loop. In this loop we feed the data to the model and get its predictions. We then compare the predictions of the network to the ground truth and adjust the parameters of the model by performing gradient descent. We have already seen all the components that are necessary for going through this process, so the only thing that remains to be done is to put all the pieces of this notebook together and train our **first awesome neural network!**

During the training stage we would like to keep track of whether our model improves over the iterations. It is therefore good practice to monitor whether the loss we are minimizing decreases over time, and whether the overall performance of the model increases with the number of training iterations. Remember that tensors are one of the key components of deep learning, and our data might not always come in this format. It is therefore necessary to convert it if needed.

Once the data is in an appropriate format it can be given to the model, and we can obtain its predictions. This is what we usually call the **forward pass**. Once we obtain our predictions, we can compare how close they are to what we would like the network to predict: to do this we feed our predictions together with the true labels through the loss function which we are minimizing. In its early training stages the network will very likely perform poorly: we can improve its performance by adjusting its weights by gradient descent. We can do this very easily by obtaining the gradients of the loss function we are minimizing with respect to the parameters (**backward pass**) and adjusting the weights with the optimizer we defined previously.

Once all of this is done we can measure the performance of our model, which in this case will be reflected by its accuracy in classifying the synthetic dataset we created beforehand. 

In [None]:
net = nn.Sequential(nn.Linear(2, 50), nn.ReLU(),
                   nn.Linear(50, 50), nn.ReLU(),
                   nn.Linear(50, 1), nn.Sigmoid())
optimizer = optim.Adam(net.parameters(), lr=.01)

In [None]:
def loss_func(y_hat, y):
    return nn.BCELoss()(y_hat, y)
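
For reference, with its default settings `nn.BCELoss` averages the binary cross-entropy $-\left[y \log \hat{y} + (1 - y) \log(1 - \hat{y})\right]$ over the samples. The sketch below checks this against a manual computation on three made-up predictions.

```python
import torch
import torch.nn as nn

# Illustrative predicted probabilities and binary targets, both of shape (3, 1).
y_hat = torch.tensor([[0.9], [0.2], [0.6]])
y = torch.tensor([[1.], [0.], [1.]])

# Binary cross-entropy written out by hand and averaged over the samples.
manual = -(y * torch.log(y_hat) + (1 - y) * torch.log(1 - y_hat)).mean()
builtin = nn.BCELoss()(y_hat, y)
print(manual.item(), builtin.item())
print(torch.allclose(manual, builtin))
```

Note that `nn.BCELoss` expects probabilities in $[0, 1]$, which is why the network above ends with `nn.Sigmoid()`.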

In [None]:
train_loss = [] # where we keep track of the loss
train_accuracy = [] # where we keep track of the accuracy of the model
iters = 1000 # number of training iterations

Y_train_t = torch.FloatTensor(Y_train).reshape(-1, 1) # re-arrange the data to an appropriate tensor

for i in range(iters):
    X_train_t = torch.FloatTensor(X_train)
    y_hat = net(X_train_t) # forward pass
    
    loss = loss_func(y_hat, Y_train_t) # compute the loss
    loss.backward() # obtain the gradients of the loss with respect to the parameters
    optimizer.step() # perform one step of gradient descent
    optimizer.zero_grad() # reset the gradients to 0
    
    y_hat_class = np.where(y_hat.detach().numpy()<0.5, 0, 1) # we assign an appropriate label based on the network's prediction
    accuracy = np.sum(Y_train.reshape(-1,1)==y_hat_class) / len(Y_train) # compute final accuracy
    
    train_accuracy.append(accuracy)
    train_loss.append(loss.item())

In [None]:
plt.figure(figsize=(13, 5))
plt.subplot(1, 2, 1)
plt.title('Training Loss')
plt.plot(train_loss)
plt.xlabel('Iterations')
plt.ylabel('Loss (Binary Cross Entropy)')

plt.subplot(1, 2, 2)
plt.title('Training Accuracy')
plt.plot(train_accuracy)
plt.xlabel('Iterations')
plt.ylabel('Accuracy')
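
The curves above only reflect the training data. To measure accuracy on any split (for instance the held-out `X_test`, `Y_test` created earlier), a small helper can be handy. The function `accuracy` below is a hypothetical helper, not part of PyTorch; it assumes a model that outputs one probability per sample with shape `(N, 1)` and is demonstrated on a stand-in model so that the sketch runs on its own.

```python
import torch

def accuracy(model, X, Y, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    with torch.no_grad():  # no gradients are needed for evaluation
        probs = model(torch.as_tensor(X, dtype=torch.float32))
    preds = (probs >= threshold).long().squeeze(1)
    labels = torch.as_tensor(Y).long()
    return (preds == labels).float().mean().item()

# Stand-in "model": the sigmoid of the first feature, returned with shape (N, 1).
def demo_model(x):
    return torch.sigmoid(x[:, :1])

X_demo = torch.tensor([[1., 0.], [-1., 0.], [2., 0.], [-2., 0.]])
Y_demo = [1, 0, 0, 1]
demo_acc = accuracy(demo_model, X_demo, Y_demo)
print(demo_acc)  # 0.5: the first two samples are classified correctly
```

With the trained classifier from above, `accuracy(net, X_test, Y_test)` would give the accuracy on the test split.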

In [None]:
def plot_decision_boundary(X, y, model, steps=1000, cmap='Paired'):
    """
    Function to plot the decision boundary and data points of a model.
    Data points are colored based on their actual label.
    """
    cmap = plt.get_cmap(cmap)

    # Define region of interest by data limits
    xmin, xmax = X[:,0].min() - .5, X[:,0].max() + .5
    ymin, ymax = X[:,1].min() - .5, X[:,1].max() + .5
    # `steps` sets the resolution of the evaluation grid
    x_span = np.linspace(xmin, xmax, steps)
    y_span = np.linspace(ymin, ymax, steps)
    xx, yy = np.meshgrid(x_span, y_span)

    # Make predictions across region of interest
    with torch.no_grad():
        labels = model(torch.FloatTensor(np.c_[xx.ravel(), yy.ravel()]))

    # Plot decision boundary in region of interest
    z = labels.reshape(xx.shape)

    fig, ax = plt.subplots()
    ax.contourf(xx, yy, z, cmap=cmap, alpha=0.5)

    # Plot the data points, colored by their actual label
    ax.scatter(X[:,0], X[:,1], c=y.ravel(), cmap=cmap, lw=0)

    return fig, ax

plot_decision_boundary(X_train,Y_train, net, cmap = 'RdBu')

### 3.2 Discuss the bug!

A common loss function one can use when dealing with classification problems is the Cross-Entropy loss. However when implementing it in PyTorch a problem occurs in the following code.

What do you think is the cause of this bug? You may first want to check the [documentation about `torch.nn.CrossEntropyLoss`](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss).
Briefly discuss your idea and, as a **bonus**, can you come up with a solution (this should be feasible with 3 modifications)?

In [None]:
n_input_dim = X_train.shape[1]
n_hidden = 4 
n_output = 1

net = nn.Sequential(
    nn.Linear(n_input_dim, n_hidden),
    nn.ELU(),
    nn.Linear(n_hidden, n_output))

loss_func = nn.CrossEntropyLoss()
learning_rate = 0.01
optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate)

train_loss = [] 
train_accuracy = [] 
iters = 1000 

Y_train_t = torch.FloatTensor(Y_train).reshape(-1, 1) 
for i in range(iters):
    X_train_t = torch.FloatTensor(X_train)
    y_hat = net(X_train_t) 
    loss = loss_func(y_hat, Y_train_t) 
    loss.backward() 
    optimizer.step() 
    optimizer.zero_grad() 

<span style="color:green; font-style:italic"> Your explanation comes below</span>

### 3.3 You practice!

Now try to play around with the neural network yourself: you are free to experiment with whatever you think is worth exploring. Your goal is to come up with three modifications to the network we initially provided and to compare the performance of your modified models to that of the original. As potential modifications you could, for example, change the depth or width of the model, or modify the activation function, learning rate, optimizer, etc.

Report your results with some appropriate plots and with a brief presentation of what you observe.

<span style="color:red; font-style:italic"> Your code comes below</span>

In [None]:
# your code

<span style="color:green; font-style:italic"> Your discussion comes below </span>

### Feedback

We will now ask you a few questions to help us improve the content of this homework for the coming years.

<span style="color:blue">How long did you spend on this homework?</span>

<span style="color:blue">Did you learn something?</span>

<span style="color:blue">Do you now feel comfortable with writing simple mathematical operations (like you would do in matlab or with numpy) in PyTorch?</span>

<span style="color:blue">Do you now feel comfortable using PyTorch for performing gradient descent?</span>

<span style="color:blue">Do you now feel comfortable with building and learning neural networks with PyTorch?</span>