# Artificial Neural Networks

In this second chapter, we delve deeper into Artificial Neural Networks, learning how to train them with real datasets.

# (1) Activation functions

## Motivation

<img src="image/Screenshot 2021-01-26 171633.png">

```
import torch

# Input with 2 features and two 2x2 weight matrices
input_layer = torch.tensor([2., 1.])
weight_1 = torch.tensor([[0.45, 0.32], [-0.12, 0.29]])
hidden_layer = torch.matmul(input_layer, weight_1)
weight_2 = torch.tensor([[0.48, -0.12], [0.64, 0.91]])
output_layer = torch.matmul(hidden_layer, weight_2)
print(output_layer)
```

## Matrix multiplication is a linear transformation

```
import torch

input_layer = torch.tensor([2., 1.])
weight_1 = torch.tensor([[0.45, 0.32], [-0.12, 0.29]])
hidden_layer = torch.matmul(input_layer, weight_1)
weight_2 = torch.tensor([[0.48, -0.12], [0.64, 0.91]])
output_layer = torch.matmul(hidden_layer, weight_2)

# The two weight matrices collapse into a single one
weight = torch.matmul(weight_1, weight_2)
print(output_layer)
print(weight)
print(torch.matmul(input_layer, weight))  # same result as output_layer
```
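A quick, self-contained check of the claim in the heading, reusing the values above: multiplying by `weight_1` and then by `weight_2` gives the same result as a single multiplication by their product, so stacking linear layers without an activation adds no expressive power.

```python
import torch

input_layer = torch.tensor([2., 1.])
weight_1 = torch.tensor([[0.45, 0.32], [-0.12, 0.29]])
weight_2 = torch.tensor([[0.48, -0.12], [0.64, 0.91]])

# Two-layer network without activation: two successive matrix products
two_layer_output = torch.matmul(torch.matmul(input_layer, weight_1), weight_2)

# One-layer network using the composed weight matrix
composed_weight = torch.matmul(weight_1, weight_2)
one_layer_output = torch.matmul(input_layer, composed_weight)

# Matrix multiplication is associative, so both outputs match (up to float rounding)
print(torch.allclose(two_layer_output, one_layer_output))  # True
```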

## Non-linearly separable datasets

<img src="image/Screenshot 2021-01-26 172104.png">

## Activation functions

<img src="image/Screenshot 2021-01-26 172208.png">

## ReLU activation function

```
ReLU(x) = max(0, x)
```

```
import torch
import torch.nn as nn

relu = nn.ReLU()

tensor_1 = torch.tensor([2., 4.])
print(relu(tensor_1))

tensor_2 = torch.tensor([[2., -4.], [1.2, 0.]])
print(relu(tensor_2))
```
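Since ReLU is just an element-wise `max(0, x)`, it can also be written with `torch.clamp(x, min=0)`. This small sketch compares the two on the tensor used above; the module form (`nn.ReLU`) is what the exercises use.

```python
import torch
import torch.nn as nn

relu = nn.ReLU()
x = torch.tensor([[2., -4.], [1.2, 0.]])

# ReLU(x) = max(0, x), applied element-wise; clamping from below at 0 does the same
manual = torch.clamp(x, min=0)

print(relu(x))
print(torch.equal(relu(x), manual))  # True
```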

# Exercise I: Neural networks

Let us see the differences between neural networks which apply `ReLU` and those which do not apply `ReLU`. We have already initialized the input called `input_layer`, and three sets of weights, called `weight_1`, `weight_2` and `weight_3`.

We are going to convince ourselves that networks with multiple layers which do not contain non-linearity can be expressed as neural networks with one layer.

The network and the shapes of the layers and weights are shown below.

<img src="image/net-ex.jpg">

### Instructions

- Calculate the first and second hidden layer by multiplying the appropriate inputs with the corresponding weights.
- Calculate and print the results of the output.
- Set `weight_composed_1` to the product of `weight_1` with `weight_2`, then set `weight` to the product of `weight_composed_1` with `weight_3`.
- Calculate and print the output.


```
# Calculate the first and second hidden layer
hidden_1 = torch.matmul(input_layer, weight_1)
hidden_2 = torch.matmul(hidden_1, weight_2)

# Calculate the output
print(torch.matmul(hidden_2, weight_3))

# Calculate weight_composed_1 and weight
weight_composed_1 = torch.matmul(weight_1, weight_2)
weight = torch.matmul(weight_composed_1, weight_3)

# Multiply input_layer with weight
print(torch.matmul(input_layer, weight))
```

# Exercise II: ReLU activation

In this exercise, we have the same setup as in the previous exercise. In addition, we have instantiated the `ReLU` activation function as `relu()`.

Now we are going to build a neural network with non-linearity, and by doing so convince ourselves that networks with multiple layers and non-linear activation functions cannot be expressed as a neural network with one layer.

<img src="image/net-ex.jpg">

### Instructions

- Apply non-linearity on `hidden_1` and `hidden_2`.
- Apply non-linearity to the product of the first two weights.
- Multiply the result of the previous step with `weight_3`.
- Multiply `input_layer` with `weight` and print the results.


```
# Apply non-linearity on hidden_1 and hidden_2
hidden_1_activated = relu(torch.matmul(input_layer, weight_1))
hidden_2_activated = relu(torch.matmul(hidden_1_activated, weight_2))
print(torch.matmul(hidden_2_activated, weight_3))

# Apply non-linearity in the product of first two weights
weight_composed_1_activated = relu(torch.matmul(weight_1, weight_2))

# Multiply `weight_composed_1_activated` with `weight_3`
weight = torch.matmul(weight_composed_1_activated, weight_3)

# Multiply input_layer with weight
print(torch.matmul(input_layer, weight))
```
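To see concretely why the two results differ, here is a small sketch with two layers only, reusing the 2x2 weights from the start of this chapter and a hypothetical input `[-2., 1.]` chosen so that the hidden units are negative before activation. ReLU zeroes them in the layer-by-layer network, but applying ReLU to the weight product cannot reproduce that, because the product no longer depends on the input.

```python
import torch
import torch.nn as nn

relu = nn.ReLU()

# Weights from the motivation example; the input is a hypothetical choice
input_layer = torch.tensor([-2., 1.])
weight_1 = torch.tensor([[0.45, 0.32], [-0.12, 0.29]])
weight_2 = torch.tensor([[0.48, -0.12], [0.64, 0.91]])

# Layer by layer: ReLU acts on the hidden activations, which depend on the input
hidden_activated = relu(torch.matmul(input_layer, weight_1))  # pre-activation is [-1.02, -0.35]
layered_output = torch.matmul(hidden_activated, weight_2)

# "Collapsed" attempt: ReLU applied to the weight product, independent of the input
collapsed_output = torch.matmul(input_layer, relu(torch.matmul(weight_1, weight_2)))

print(layered_output)    # all zeros
print(collapsed_output)  # approximately [-0.7136, -0.1961]
```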

# Exercise III: ReLU activation again

Neural networks don't need to have the same number of units in each layer. Here, you are going to experiment with the `ReLU` activation function again, but this time we are going to have a different number of units in the layers of the neural network. The input layer will still have `4` features, but then the first hidden layer will have `6` units and the output layer will have `2` units.

<img src="image/net-ex2.jpg">

### Instructions

- Instantiate the `ReLU()` activation function as `relu` (the function is part of the `nn` module).
- Initialize `weight_1` and `weight_2` with random numbers.
- Multiply the `input_layer` with `weight_1`, storing results in `hidden_1`.
- Apply the `relu` activation function over `hidden_1`, and then multiply the output of it with `weight_2`.


```
# Instantiate ReLU activation function as relu
relu = nn.ReLU()

# Initialize weight_1 and weight_2 with random numbers
weight_1 = torch.rand(4, 6)
weight_2 = torch.rand(6, 2)

# Multiply input_layer with weight_1
hidden_1 = torch.matmul(input_layer, weight_1)

# Apply ReLU activation function over hidden_1 and multiply with weight_2
hidden_1_activated = relu(hidden_1)
print(torch.matmul(hidden_1_activated, weight_2))
```
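The same 4 → 6 → 2 network can be sketched with `nn.Linear` layers, which the later sections use. Two assumptions to note: `bias=False` is set so each layer is exactly a matrix multiplication as in the exercise, and `nn.Linear` stores its weight with shape `(out_features, in_features)`, the transpose of `weight_1` above, so the manual check multiplies by `.t()`. The input is a hypothetical random tensor.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# 4 input features -> 6 hidden units -> 2 output units, without bias terms
model = nn.Sequential(
    nn.Linear(4, 6, bias=False),
    nn.ReLU(),
    nn.Linear(6, 2, bias=False),
)

input_layer = torch.rand(4)  # hypothetical input with 4 features
output = model(input_layer)
print(output.shape)  # torch.Size([2])

# Same computation with explicit matrix products (note the transposed weights)
hidden = torch.relu(torch.matmul(input_layer, model[0].weight.t()))
manual = torch.matmul(hidden, model[2].weight.t())
print(torch.allclose(output, manual))  # True
```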

# (2) Loss Function

## Loss function

- Initialize neural networks with random weights
- Do a forward pass
- Calculate loss function (1 number)
- Change the weights based on gradients
- For regression: least squares (mean squared error) loss
- For classification: softmax cross-entropy loss
- For more complicated problems (like object detection), more complicated losses
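For the regression case, PyTorch provides the least-squares loss as `nn.MSELoss`, which by default averages the squared differences over all elements. A minimal sketch with made-up predictions and targets:

```python
import torch
import torch.nn as nn

predictions = torch.tensor([2.5, 0.0])  # hypothetical network outputs
targets = torch.tensor([3.0, -0.5])     # hypothetical ground truth

criterion = nn.MSELoss()  # reduction='mean' by default
loss = criterion(predictions, targets)

# Same value by hand: ((-0.5)^2 + (0.5)^2) / 2
manual = ((predictions - targets) ** 2).mean()
print(loss.item(), manual.item())  # 0.25 0.25
```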

## Softmax Cross-Entropy Loss

<img src="image/Screenshot 2021-01-26 204059.png">
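For reference, the softmax cross-entropy loss for a single example with scores (logits) $s_1, \dots, s_C$ over $C$ classes and true class $y$ is the negative logarithm of the softmax probability assigned to the true class:

$$
L = -\log\left(\frac{e^{s_y}}{\sum_{j=1}^{C} e^{s_j}}\right) = -s_y + \log \sum_{j=1}^{C} e^{s_j}
$$

`nn.CrossEntropyLoss` takes the raw scores and computes this quantity, averaging it over the minibatch by default.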

## CE loss in PyTorch

```
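import torch
import torch.nn as nn

logits = torch.tensor([[3.2, 5.1, -1.7]])  # raw scores for 3 classes, one sample
ground_truth = torch.tensor([0])           # index of the correct class
criterion = nn.CrossEntropyLoss()

loss = criterion(logits, ground_truth)
print(loss)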
```

```
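# A higher score for the true class (0) gives a lower loss
logits = torch.tensor([[10.2, 5.1, -1.7]])
loss = criterion(logits, ground_truth)
print(loss)
```

```
# A very low score for the true class gives a high loss
logits = torch.tensor([[-10., 5.1, -1.7]])
loss = criterion(logits, ground_truth)
print(loss)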
```
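To connect these numbers to the formula, this sketch recomputes each of the three losses as the negative log-softmax of the true class and compares it with `nn.CrossEntropyLoss`. The two agree, and the loss grows as the score of class `0` drops relative to the others.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
ground_truth = torch.tensor([0])

results = []
for logits in (torch.tensor([[3.2, 5.1, -1.7]]),
               torch.tensor([[10.2, 5.1, -1.7]]),
               torch.tensor([[-10., 5.1, -1.7]])):
    # Built-in loss: applies log-softmax to the raw scores internally
    builtin = criterion(logits, ground_truth)
    # Manual loss: negative log-softmax probability of the true class
    manual = -torch.log_softmax(logits, dim=1)[0, ground_truth[0]]
    results.append((builtin.item(), manual.item()))
    print(round(builtin.item(), 4), round(manual.item(), 4))
```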

# Exercise IV: Calculating loss function by hand

Let's start the exercises by calculating the loss function by hand. Don't do this exercise in PyTorch; it is important to first do it using only pen and paper (and a calculator).

We have the same example as before, but now our object is actually a frog, and the predicted scores are `-1.2` for class `0` (cat), `0.12` for class `1` (car) and `4.8` for class `2` (frog).

What is the result of the softmax cross-entropy loss function?

| **Class** | **Predicted Score** |
| :- | :- |
| Cat | -1.2 |
| Car | 0.12 |
| Frog | 4.8 |

### Possible Answers

- 6.0117
- 4.6917
- 0.0117 (correct)
- Score for frog is high, so loss is 0
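The pen-and-paper computation can be checked with a few lines of plain Python: exponentiate each score, normalize to get the softmax probability of the true class (frog), then take the negative logarithm. The loss is small but not zero, because the frog probability is close to, but below, 1.

```python
import math

scores = {"cat": -1.2, "car": 0.12, "frog": 4.8}

# Softmax: exponentiate each score and normalize by the sum
exp_scores = {name: math.exp(s) for name, s in scores.items()}
total = sum(exp_scores.values())
p_frog = exp_scores["frog"] / total

# Cross-entropy loss: negative log-probability of the true class
loss = -math.log(p_frog)
print(round(p_frog, 4), round(loss, 4))  # 0.9884 0.0117
```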

# Exercise V: Calculating loss function in PyTorch

You are going to code the previous exercise and make sure that we computed the loss correctly. The predicted scores are `-1.2` for class `0` (cat), `0.12` for class `1` (car) and `4.8` for class `2` (frog). The ground truth is class `2` (frog). Compute the loss function in PyTorch.

| **Class** | **Predicted Score** |
| :- | :- |
| Cat | -1.2 |
| Car | 0.12 |
| Frog | 4.8 |

### Instructions

- Initialize the tensor of scores with numbers `[[-1.2, 0.12, 4.8]]`, and the tensor of ground truth `[2]`.
- Instantiate the cross-entropy loss and call it `criterion`.
- Compute and print the loss.


```
# Initialize the scores and ground truth
logits = torch.tensor([[-1.2, 0.12, 4.8]])
ground_truth = torch.tensor([2])

# Instantiate cross entropy loss
criterion = nn.CrossEntropyLoss()

# Compute and print the loss
loss = criterion(logits, ground_truth)
print(loss)
```

# Exercise VI: Loss function of random scores

If the neural network predicts random scores, what would its loss be? Let's find out in PyTorch. The neural network is going to have 1000 classes, each with a random score. The ground truth is class `111`. Calculate the loss function.

### Instructions

- Import `torch` and `torch.nn as nn`
- Initialize `logits` with a random tensor of shape `(1, 1000)` and `ground_truth` with a tensor containing the number `111`.
- Instantiate the cross-entropy loss in a variable called `criterion`.
- Calculate and print the loss function.


```
# Import torch and torch.nn
import torch
import torch.nn as nn

# Initialize logits and ground truth
logits = torch.rand(1, 1000)
ground_truth = torch.tensor([111])

# Instantiate cross-entropy loss
criterion = nn.CrossEntropyLoss()

# Calculate and print the loss
loss = criterion(logits, ground_truth)
print(loss)
```
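A useful reference point for this exercise: if all 1000 scores are equal, softmax assigns probability 1/1000 to every class, so the loss is exactly ln(1000) ≈ 6.91. Scores drawn uniformly from [0, 1) by `torch.rand` differ by less than 1, so the loss stays within 1 of that value. A sketch of the comparison (the seed is an arbitrary choice for reproducibility):

```python
import math
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
ground_truth = torch.tensor([111])

# Identical scores: uniform softmax, so the loss equals log(number of classes)
uniform_loss = criterion(torch.zeros(1, 1000), ground_truth)
print(uniform_loss.item(), math.log(1000))  # both about 6.9078

# Random scores in [0, 1): the loss stays close to log(1000)
torch.manual_seed(0)
random_loss = criterion(torch.rand(1, 1000), ground_truth)
print(random_loss.item())
```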

# (3) Preparing a dataset in PyTorch

## MNIST and CIFAR-10

<img src="image/Screenshot 2021-01-26 210140.png">

## Datasets and Dataloaders

```
import torch
import torchvision
import torch.utils.data
import torchvision.transforms as transforms
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.4914, 0.48216, 0.44653), (0.24703, 0.24349, 0.26159))])
```

```
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)

testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)

trainloader = torch.utils.data.DataLoader(trainset, batch_size=32, shuffle=True, num_workers=4)

testloader = torch.utils.data.DataLoader(testset, batch_size=32, shuffle=False, num_workers=4)
```

## Inspecting the dataloader

```
# CIFAR10 stores its images in `.data` (older torchvision releases called them `test_data`/`train_data`)
print(testloader.dataset.data.shape, trainloader.dataset.data.shape)
```

```
print(testloader.batch_size)
```

```
print(trainloader.sampler)
```
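The same loader attributes can be explored without downloading anything by wrapping random, CIFAR-shaped tensors in a `TensorDataset`, a hypothetical stand-in for the real dataset. Iterating over the loader yields minibatches of `batch_size` images; the last batch holds whatever is left over.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for CIFAR-10: 100 random 3x32x32 images with labels 0..9
images = torch.rand(100, 3, 32, 32)
labels = torch.randint(0, 10, (100,))
dataset = TensorDataset(images, labels)

loader = DataLoader(dataset, batch_size=32, shuffle=True)

print(loader.batch_size)  # 32
print(len(loader))        # 4 minibatches: 32 + 32 + 32 + 4

batch_shapes = [inputs.shape for inputs, _ in loader]
print(batch_shapes[0])    # torch.Size([32, 3, 32, 32])
print(batch_shapes[-1])   # torch.Size([4, 3, 32, 32])
```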

# Exercise VII: Preparing MNIST dataset

You are going to prepare dataloaders for the `MNIST` training and testing sets. As we explained in the lecture, `MNIST` differs from `CIFAR-10` in a few ways, the main one being that `MNIST` images are grayscale (1 channel) instead of RGB (3 channels).

### Instructions

- Transform the data to torch tensors and normalize it; the `mean` is `0.1307` and the `std` is `0.3081`.
- Prepare the `trainset` and the `testset`.
- Prepare the dataloaders for training and testing so that only 32 pictures are processed at a time.



```
import torch
import torchvision
import torchvision.transforms as transforms

# Transform the data to torch tensors and normalize it (one mean/std value for the single grayscale channel)
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])

# Prepare training set and testing set
trainset = torchvision.datasets.MNIST('mnist', train=True, download=True, transform=transform)
testset = torchvision.datasets.MNIST('mnist', train=False, download=True, transform=transform)

# Prepare training loader and testing loader
trainloader = torch.utils.data.DataLoader(trainset, batch_size=32, shuffle=True, num_workers=0)
testloader = torch.utils.data.DataLoader(testset, batch_size=32, shuffle=False, num_workers=0)
```
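To see what the transform does to each pixel: `ToTensor` scales 8-bit values from [0, 255] to [0, 1], and `Normalize` then subtracts the mean and divides by the standard deviation. This sketch reproduces that arithmetic on a few hypothetical pixel values in plain PyTorch, without torchvision.

```python
import torch

mean, std = 0.1307, 0.3081

# Hypothetical 8-bit grayscale pixel values: black, mid-gray, white
pixels = torch.tensor([0, 128, 255], dtype=torch.uint8)

scaled = pixels.float() / 255.0      # the value range ToTensor produces
normalized = (scaled - mean) / std   # what Normalize computes for the single channel

print(normalized)  # black maps to about -0.42, white to about 2.82
```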

# Exercise VIII: Inspecting the dataloaders

Now you are going to explore the `dataloaders` you created in the previous exercise a bit. In particular, you will compute the shape of the datasets as well as the minibatch size.

### Instructions

- Compute the shapes of the `trainset` and `testset`.
- Print the computed values.
- Compute the size of the minibatch for both `trainset` and `testset`.
- Print the minibatch size.


```
# Compute the shape of the training set and testing set
# (`.data` holds the images; `train_data`/`test_data` are deprecated aliases for it in MNIST)
trainset_shape = trainloader.dataset.data.shape
testset_shape = testloader.dataset.data.shape

# Print the computed shapes
print(trainset_shape, testset_shape)

# Compute the size of the minibatch for training set and testing set
trainset_batchsize = trainloader.batch_size
testset_batchsize = testloader.batch_size

# Print sizes of the minibatch
print(trainset_batchsize, testset_batchsize)
```

# (4) Training neural networks

## Recipe for training neural networks

- Prepare the dataloaders
- Build a neural network

Loop over:

- Do a forward pass
- Calculate loss function (1 number)
- Calculate the gradients
- Change the weights based on gradients

## Gradient descent

<img src="image/Screenshot 2021-01-26 154814.png">
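The update pictured above, `w = w - learning_rate * dL/dw`, can be written out by hand with autograd. This sketch minimizes a hypothetical one-parameter loss L(w) = (w - 3)², whose minimum is at w = 3; plain `optim.SGD` (without momentum) performs the same step for every parameter of a network.

```python
import torch

w = torch.tensor(0.0, requires_grad=True)  # hypothetical starting weight
learning_rate = 0.1

for step in range(100):
    loss = (w - 3) ** 2      # forward pass: compute the loss
    loss.backward()          # compute dL/dw, stored in w.grad
    with torch.no_grad():
        w -= learning_rate * w.grad  # step against the gradient
    w.grad.zero_()           # reset the gradient before the next step

print(w.item())  # close to 3.0
```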

## Recap - Dataloaders

```
import torch
import torchvision
import torch.utils.data
import torchvision.transforms as transforms

transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.4914, 0.48216, 0.44653), (0.24703, 0.24349, 0.26159))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)

testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
```

## Neural Networks - Recap

```
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(32 * 32 * 3, 500)  # flattened 32x32 RGB image -> 500 hidden units
        self.fc2 = nn.Linear(500, 10)            # 500 hidden units -> 10 class scores

    def forward(self, x):
        x = F.relu(self.fc1(x))
        return self.fc2(x)
```

## Training the Neural Network

```
net = Net()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=3e-4)

for epoch in range(10):     # loop over the data set multiple times
    for i, data in enumerate(trainloader, 0):
        # Get the inputs
        inputs, labels = data
        inputs = inputs.view(-1, 32 * 32 * 3)

        # Zero the parameter gradients
        optimizer.zero_grad()

        # Forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
```
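The loop above needs the CIFAR-10 download to run. To check the mechanics in isolation, here is a sketch that trains the same kind of two-layer network on a hypothetical synthetic batch (random 3x8x8 images with random labels) and confirms that the loss decreases as the network fits it. The smaller sizes, the constructor arguments and the learning rate are assumptions chosen for speed, not values from the course.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(0)

class Net(nn.Module):
    # Same structure as above, but with hypothetical configurable sizes
    def __init__(self, in_features, hidden, classes):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(in_features, hidden)
        self.fc2 = nn.Linear(hidden, classes)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        return self.fc2(x)

# Hypothetical data: 64 random 3x8x8 "images" with labels from 10 classes
inputs = torch.rand(64, 3, 8, 8)
labels = torch.randint(0, 10, (64,))

net = Net(3 * 8 * 8, 50, 10)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=1e-2)

losses = []
for epoch in range(200):
    optimizer.zero_grad()                        # clear old gradients
    outputs = net(inputs.view(-1, 3 * 8 * 8))    # flatten, then forward pass
    loss = criterion(outputs, labels)
    loss.backward()                              # compute gradients
    optimizer.step()                             # update the weights
    losses.append(loss.item())

print(losses[0], losses[-1])  # the final loss is far below the initial one
```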

## Using the net to get predictions

```
correct, total = 0, 0
predictions = []
net.eval()  # switch to evaluation mode
with torch.no_grad():  # no gradients are needed for inference
    for i, data in enumerate(testloader, 0):
        inputs, labels = data
        inputs = inputs.view(-1, 32 * 32 * 3)
        outputs = net(inputs)
        _, predicted = torch.max(outputs.data, 1)  # index of the highest score = predicted class
        predictions.append(outputs)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('The testing set accuracy of the network is: %d %%' % (100 * correct / total))
```
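The accuracy bookkeeping in the loop can be checked on a tiny hypothetical batch: `torch.max(outputs, 1)` returns the maximum values and their indices along dimension 1, and the indices are the predicted classes.

```python
import torch

# Hypothetical scores for 4 images over 3 classes, and their true labels
outputs = torch.tensor([[0.1, 2.0, 0.3],
                        [1.5, 0.2, 0.1],
                        [0.0, 0.1, 3.0],
                        [2.0, 1.0, 0.0]])
labels = torch.tensor([1, 0, 1, 0])

# (max values, indices) along dim 1; the indices are the predicted classes
_, predicted = torch.max(outputs, 1)

correct = (predicted == labels).sum().item()
total = labels.size(0)
print(predicted)              # tensor([1, 0, 2, 0])
print(100 * correct / total)  # 75.0
```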

# Exercise IX: Building a neural network - again

You haven't created a neural network since the end of the first chapter, so this is a good time to build one (practice makes perfect). Build a class for a neural network that will be trained on the `MNIST` dataset. The dataset contains images of shape `(28, 28, 1)`, so you should deduce the size of the input layer from that. For the hidden layer use 200 units, and for the output layer use 10 units (1 for each class). For the activation function, use `relu` in a functional way (`torch.nn.functional` is already imported as `F`).

For context, the same net will be trained and used to make predictions in the next two exercises.

### Instructions

- Define the class called `Net` which inherits from `nn.Module`.
- In the `__init__()` method, define the parameters for the two fully connected layers.
- In the `.forward()` method, do the forward step.


```
# Define the class Net
class Net(nn.Module):
    def __init__(self):
        # Define all the parameters of the net
        super(Net, self).__init__()
        self.fc1 = nn.Linear(28 * 28 * 1, 200)
        self.fc2 = nn.Linear(200, 10)

    def forward(self, x):
        # Do the forward pass
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x
```
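A quick sanity check of the input-size deduction: a hypothetical minibatch of 8 MNIST-sized images, flattened with `view`, goes through the net and comes out as 8 rows of 10 class scores. Counting the parameters also shows where the 784 comes from.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(28 * 28 * 1, 200)
        self.fc2 = nn.Linear(200, 10)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        return self.fc2(x)

model = Net()

# Hypothetical minibatch of 8 images: (batch, channels, height, width)
images = torch.rand(8, 1, 28, 28)
flat = images.view(-1, 28 * 28 * 1)  # flatten to (8, 784)
scores = model(flat)
print(flat.shape, scores.shape)  # torch.Size([8, 784]) torch.Size([8, 10])

# Weights plus biases: 784*200 + 200 + 200*10 + 10
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 159010
```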

# Exercise X: Training a neural network

Given the fully connected neural network (called `model`) that you built in the previous exercise and a train loader called `train_loader` containing the `MNIST` dataset (which we created for you), you will train the net to predict the classes of digits. You will use the Adam optimizer to optimize the network, and since this is a classification problem, you will use cross-entropy as the loss function.

### Instructions

- Instantiate the Adam optimizer with learning rate `3e-4` and instantiate Cross-Entropy as loss function.
- Complete a forward pass on the neural network using the input `data`.
- Using backpropagation, compute the gradients of the weights, and then change the weights using the `Adam` optimizer.


```
# Instantiate the Adam optimizer and Cross-Entropy loss function
model = Net()
optimizer = optim.Adam(model.parameters(), lr=3e-4)
criterion = nn.CrossEntropyLoss()

for batch_idx, data_target in enumerate(train_loader):
    data = data_target[0]
    target = data_target[1]
    data = data.view(-1, 28 * 28 * 1)  # flatten each image into a 784-dimensional vector
    optimizer.zero_grad()

    # Complete a forward pass
    output = model(data)

    # Compute the loss, gradients and change the weights
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()
```

# Exercise XI: Using the network to make predictions

Now that you have trained the network, use it to make predictions for the data in the testing set. The network is called `model` (same as in the previous exercise), and the loader is called `test_loader`. We have already initialized variables `total` and `correct` to `0`.

### Instructions

- Set the network in testing (eval) mode.
- Put each image into a vector using `inputs.view(-1, number_of_features)`, where the number of features is obtained by multiplying the spatial dimensions (shape) of the image.
- Do the forward pass and put the predictions in `output` variable.


```
# Set the model in eval mode
model.eval()

for i, data in enumerate(test_loader, 0):
    inputs, labels = data

    # Put each image into a vector
    inputs = inputs.view(-1, 28 * 28)

    # Do the forward pass and get the predictions
    outputs = model(inputs)
    _, outputs = torch.max(outputs.data, 1)
    total += labels.size(0)
    correct += (outputs == labels).sum().item()
print('The testing set accuracy of the network is: %d %%' % (100 * correct / total))
```