# Neural Networks with PyTorch - Exercises

In [2]:
import torch

### Tensors

PyTorch tensors are multidimensional data containers, conceptually very similar to NumPy arrays
(in many cases, functions with the same name and behavior exist in both frameworks).
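
As a small illustration of this similarity, the following sketch builds the same array in both libraries and converts between them. The variable names are chosen for this example only.

```python
import numpy as np
import torch

np_array = np.arange(6).reshape(2, 3)
torch_tensor = torch.arange(6).reshape(2, 3)

# The same reduction exists in both libraries
np_total = int(np_array.sum())
torch_total = torch_tensor.sum().item()
print(np_total, torch_total)

# torch.from_numpy shares memory with the source array,
# so an in-place change to the array is visible in the tensor
shared = torch.from_numpy(np_array)
np_array[0, 0] = 100
print(shared[0, 0].item())
```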

Use the [PyTorch Documentation](https://pytorch.org/docs/stable/tensors.html) to initialize tensors:
* 1-D Tensor of size 5 containing the natural numbers 1,2,3,4,5
* 2-D Tensor of size [3,3] containing random float numbers
* 3-D Tensor of size [1, 1, 1] containing data of type unsigned integer (torch.uint8)

In [4]:
### YOUR SOLUTION HERE
d1 = torch.tensor([1,2,3,4,5])
d2 = torch.randn(3,3)
d3 = torch.randint(0, 256, [1, 1, 1], dtype=torch.uint8)
print(d3)

tensor([[[4]]], dtype=torch.uint8)


What happens if you add (multiply) the previously created tensors of shape (5) and (1,1,1)?
Try it and assign the results to the variable ``tensor_sum`` (``tensor_product``).

In [5]:
### YOUR SOLUTION HERE
tensor_sum = d1+d3
tensor_product = d1*d3
### END OF SOLUTION
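
The result follows PyTorch's broadcasting rules: the shapes are aligned from the right, and dimensions of size 1 are stretched. A minimal sketch with fixed values (the tensors `a` and `b` are stand-ins for the ones created above):

```python
import torch

a = torch.tensor([1, 2, 3, 4, 5])                 # shape (5), dtype int64
b = torch.full((1, 1, 1), 2, dtype=torch.uint8)   # shape (1, 1, 1)

s = a + b   # b is broadcast along the last dimension
p = a * b

print(s.shape, s.dtype)
print(s)
print(p)
```

Both results have shape (1, 1, 5), and the integer types are promoted to the wider one.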

Have a look at these basic [tensor operations](https://pytorch.org/docs/stable/torch.html#indexing-slicing-joining-mutating-ops).

* Create a tensor of shape (10) by concatenating two copies of the previously created 5-entry tensor.
* Create a tensor of shape (9) from the previously created random tensor of shape (3, 3).
* Create the transposed version of your random tensor of shape (3, 3).

In [7]:
### YOUR SOLUTION HERE
t10 = torch.cat((d1,d1))
t9 = torch.flatten(d2)
t2_t = torch.transpose(d2, 0, 1)
print(t2_t)
### END OF SOLUTION

tensor([[ 2.4403, -2.2917,  2.2799],
        [-0.9781,  1.9756, -0.0719],
        [-1.4433, -0.1874, -0.4242]])
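
Several of these operations have equivalent spellings. The sketch below uses a small tensor with known entries (an assumption for readability, instead of random values) to show that they agree:

```python
import torch

m = torch.arange(9.0).reshape(3, 3)

flat_a = torch.flatten(m)   # function form
flat_b = m.reshape(9)       # explicit target shape
flat_c = m.view(-1)         # -1 lets PyTorch infer the size

transposed_a = torch.transpose(m, 0, 1)
transposed_b = m.T          # shorthand for 2-D tensors

print(flat_a)
print(transposed_b)
```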


### Computing a neuron with PyTorch

Performing the computations for a single neuron with PyTorch and tensors is easy!
The required weighted sum can be computed by using built-in vector products,
and different activation functions are also readily available.

In the following, define a tensor ``weights`` of shape (5) which contains the weights of the neuron to be computed.
Further, instantiate the class [``torch.nn.Sigmoid()``](https://pytorch.org/docs/stable/generated/torch.nn.Sigmoid.html#sigmoid) in a variable ``activation``,
which will be used to compute the neuron's activation value.

Create a tensor variable ``input = torch.ones(5)`` containing the input values to the neuron.

Then, compute the neuron's output according to the lecture. 

In [8]:
weights = torch.randn((5))
activation = torch.nn.Sigmoid()

input = torch.ones((5))

### YOUR SOLUTION HERE
# Compute neuron output: weighted sum of the inputs (dot product), then sigmoid activation
output = activation(torch.dot(weights, input))
### END OF SOLUTION

print(output)
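
A single neuron produces one output value: the sigmoid of the weighted sum of its inputs. The following sketch checks this against the explicit formula, using fixed illustrative weights (not the random ones from the exercise) and no bias term:

```python
import math
import torch

example_weights = torch.tensor([0.5, -1.0, 2.0, 0.0, 1.0])  # illustrative values
example_inputs = torch.ones(5)

z = torch.dot(example_weights, example_inputs)  # weighted sum, a 0-dim tensor
neuron_output = torch.sigmoid(z)

# Same result from the explicit formula 1 / (1 + exp(-z))
manual_output = 1.0 / (1.0 + math.exp(-z.item()))
print(z.item(), neuron_output.item(), manual_output)
```

Note that an element-wise product ``weights * input`` without summation would yield five values instead of one.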


### Neural Network Components

However, computations like these do not need to be implemented manually in most cases.
PyTorch already contains classes and functions for most basic neural network layer types and components.

In the following, we will implement a simple neural network.
The network operates on input data of size 13.
There is one hidden layer within the network consisting of 20 neurons (fully-connected).
The network's output layer contains 3 neurons.

Use the class [``torch.nn.Linear``](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html) to create two fully-connected neural network layers.
The class' constructor takes the number of input units and the number of output units as arguments.
Further, the argument ``bias=True`` activates the bias for all neurons in the layer.
The layer's weights and biases are randomly initialized; by default, they are drawn from a uniform distribution U(-sqrt(k), sqrt(k)) with k = 1 / in_features.
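
To make the behavior of ``torch.nn.Linear`` concrete, the sketch below inspects the parameter shapes of one layer and reproduces its output by hand. The batch size of 4 is an arbitrary choice for this example.

```python
import math
import torch

layer = torch.nn.Linear(in_features=13, out_features=20, bias=True)
print(layer.weight.shape)  # (out_features, in_features)
print(layer.bias.shape)    # (out_features)

# The layer computes x @ W^T + b for every sample in the batch
x = torch.ones(4, 13)
manual = x @ layer.weight.T + layer.bias
same = torch.allclose(layer(x), manual, atol=1e-5)

# Default initialization keeps all weights within 1 / sqrt(in_features)
bound = 1.0 / math.sqrt(13)
within_bound = bool(layer.weight.abs().max() <= bound + 1e-6)
print(same, within_bound)
```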

For each of the layers, create an activation function.
The first layer uses a Sigmoid activation function.
For the second layer, employ an instance of the class [``torch.nn.Softmax``](https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html#softmax) as activation function.

Last, define a function ``neural_net(input_tensor)`` which computes the neural network's output values for the given input tensor of shape (*, 13).
(Note that 2-dimensional input tensors are required, as PyTorch's neural networks are designed to compute outputs for multiple samples (a whole batch) at once.)
Using the function, compute the neural network's output for the input tensor ``input = torch.ones(1,13)``.

In [10]:
# Initialize fully-connected layers and activation functions according to specifications above (dimensions, function types, etc.)
### YOUR SOLUTION HERE
fc1 = torch.nn.Linear(in_features=13, out_features=20, bias=True)
fc2 = torch.nn.Linear(in_features=20, out_features=3, bias=True)
sigmoid = torch.nn.Sigmoid()
softmax = torch.nn.Softmax(dim=1)  # normalize over the class dimension of a (batch, classes) tensor
### END OF SOLUTION

def neural_net(input_tensor):
    # Compute neural network output (pass input through both layers and apply activation functions as defined above)
    ### YOUR SOLUTION HERE
    output = sigmoid(fc1(input_tensor))  # hidden layer with sigmoid activation
    output = softmax(fc2(output))        # output layer with softmax activation
    return output
    ### END OF SOLUTION

input = torch.ones(1,13)
neural_net(input)


tensor([[0.3178, 0.4302, 0.2520]], grad_fn=<SoftmaxBackward0>)
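
``torch.nn.Softmax`` needs to know along which dimension to normalize. For a tensor of shape (batch, classes), ``dim=1`` normalizes each sample separately. A short sketch with made-up scores:

```python
import torch

scores = torch.tensor([[1.0, 2.0, 3.0],
                       [1.0, 1.0, 1.0]])

per_sample = torch.nn.Softmax(dim=1)(scores)  # each row sums to 1
per_column = torch.nn.Softmax(dim=0)(scores)  # each column sums to 1

row_sums = per_sample.sum(dim=1)
col_sums = per_column.sum(dim=0)
print(per_sample)
print(row_sums, col_sums)
```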

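### Simple Neural Network with PyTorch

However, to be able to fully utilize the previously defined neural network inside PyTorch,
we need to instantiate it inside a class derived from the base class ``torch.nn.Module``.

Such a custom model class must implement the method ``forward(self, x)``, which computes the model's forward pass for an input tensor x.

In the following, implement a simple neural network of the same two-layer structure inside the class ``MyNeuralNetwork``.
Note that the solution below uses different sizes than the network described above: 5 input features, 60 hidden neurons with ReLU activation, and 3 output neurons. The dummy data and the parameter counts in the remaining exercises rely on these sizes.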

In [31]:
class MyNeuralNetwork(torch.nn.Module):
    
    def __init__(self):
        super(MyNeuralNetwork, self).__init__()
        ### YOUR SOLUTION HERE
        self.fc1 = torch.nn.Linear(in_features=5, out_features=60, bias = True)
        self.activation1 = torch.nn.ReLU()
        self.fc2 = torch.nn.Linear(in_features=60, out_features=3, bias=True)
        self.activation2 = torch.nn.Softmax(1)
        ### END OF SOLUTION
        
    def forward(self, x):
        ### YOUR SOLUTION HERE
        x = self.fc1(x)
        x = self.activation1(x)
        x = self.fc2(x)
        x = self.activation2(x)
        return x
        ### END OF SOLUTION

Now, we want to test the forward pass of the implemented neural network with a dummy input tensor.

Generally, inputs are processed batch-wise by PyTorch modules. Thus, the network expects a 2-dimensional tensor as input: the first dimension corresponds to the batch size, and the second dimension contains the features of a single sample.

Instantiate a tensor containing two samples (batch size 2) with all feature values equal to ``1``.
Pass the tensor with the two samples through your neural network.

What shape does the output tensor have? How can the dimensions and entries be interpreted?

In [33]:
neural_net_instance = MyNeuralNetwork()
### YOUR SOLUTION HERE
input = torch.ones(2,5)
neural_net_instance(input)
### END OF SOLUTION

tensor([[0.3784, 0.3327, 0.2889],
        [0.3784, 0.3327, 0.2889]], grad_fn=<SoftmaxBackward0>)

### Interpreting Output Shapes

When passing a batch of input samples through the neural network, the output tensor has the shape `(batch_size, num_classes)`. 

- In this example, the input tensor has shape `(2, 5)`, meaning there are 2 samples, each with 5 features.
- The output tensor will have shape `(2, 3)`, corresponding to 2 samples and 3 output values (one for each class).

Each row in the output tensor contains the predicted class probabilities (after the softmax activation) for a single input sample. The highest value in each row indicates the model's predicted class for that sample.

**Example:**
If the output is:

```python
tensor([[0.1, 0.7, 0.2], [0.3, 0.3, 0.4]])
```

- The first sample is most likely class 1 (0-based indexing).
- The second sample is most likely class 2.

This shape convention is standard for classification tasks in PyTorch.
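
The class predictions in the example can be read off programmatically with ``torch.argmax`` along the class dimension. A minimal sketch using the example values from above:

```python
import torch

probs = torch.tensor([[0.1, 0.7, 0.2], [0.3, 0.3, 0.4]])

# dim=1 picks the index of the largest value within each row (sample)
predicted = torch.argmax(probs, dim=1)
print(probs.shape)
print(predicted)
```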

### Model Parameter Counting

The PyTorch class ``Module`` provides a number of very useful functionalities.

For instance, if set up correctly, the trainable network parameters of the model can be accessed by the methods ``parameters()`` or ``named_parameters()``.
In the following, the latter method is used to access all parameter names and values.
Complete the code to compute and output the number of parameters for each named parameter, as well as the total number of parameters in your model.

In [34]:
sum_of_parameters = 0
for par_name, par_values in neural_net_instance.named_parameters():
    print("Named parameter: {}".format(par_name))
    ### YOUR SOLUTION HERE
    print("Number of parameters: " +  str(par_values.numel()))
    sum_of_parameters += par_values.numel()
    ### END OF SOLUTION
print("Total number of parameters: {}".format(sum_of_parameters))

Named parameter: fc1.weight
Number of parameters: 300
Named parameter: fc1.bias
Number of parameters: 60
Named parameter: fc2.weight
Number of parameters: 180
Named parameter: fc2.bias
Number of parameters: 3
Total number of parameters: 543
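
These numbers can be verified by hand: a fully-connected layer with bias has ``in_features * out_features`` weights plus ``out_features`` biases. The sketch below compares this formula with PyTorch's own count. The helper ``linear_param_count`` and the ``Sequential`` stand-in model are introduced here for illustration only.

```python
import torch

def linear_param_count(n_in, n_out):
    # weight matrix entries plus one bias per output unit
    return n_in * n_out + n_out

expected = linear_param_count(5, 60) + linear_param_count(60, 3)

# Stand-in model with the same layer sizes as MyNeuralNetwork
stand_in = torch.nn.Sequential(
    torch.nn.Linear(5, 60),
    torch.nn.ReLU(),
    torch.nn.Linear(60, 3),
)
counted = sum(p.numel() for p in stand_in.parameters())
print(expected, counted)
```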


### Loss functions

Like many important machine learning essentials, standard loss functions are already implemented in PyTorch.

In the lecture, we have seen the frequently used cross-entropy loss. This loss is implemented in the PyTorch class ``torch.nn.CrossEntropyLoss``.
The class can be initialized without arguments.
To compute the loss, call the class instance with the predicted values for each class as first argument and the true class labels (data type ``long``) as second argument.
Note that ``torch.nn.CrossEntropyLoss`` expects unnormalized scores (logits) and applies a log-softmax internally. Values passed in are therefore not interpreted as probabilities directly, which is why the result below differs from -log(0.8) ≈ 0.223.

Using the PyTorch implementation, compute the cross-entropy loss for a single sample in which class probabilities of ``[0.1, 0.8, 0.1]`` were predicted and the true class label is ``1``.

In [40]:
### YOUR SOLUTION HERE
loss = torch.nn.CrossEntropyLoss()
pred = torch.tensor([[0.1, 0.8, 0.1]])  # shape (1, 3): one sample, three classes
true_value = torch.tensor([1]).long()   # shape (1): class index of the sample
loss(pred, true_value)
### END OF SOLUTION

tensor(0.6897)
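
The value 0.6897 results from ``torch.nn.CrossEntropyLoss`` treating its input as unnormalized scores (logits) and applying a log-softmax internally. The following sketch reproduces the value by hand and shows one way to feed actual probabilities, namely by passing their logarithm. The variable names are chosen for this example.

```python
import torch
import torch.nn.functional as F

values = torch.tensor([[0.1, 0.8, 0.1]])
target = torch.tensor([1])
criterion = torch.nn.CrossEntropyLoss()

# Input treated as logits: loss = -log_softmax(values)[target]
as_logits = criterion(values, target)
manual = -F.log_softmax(values, dim=1)[0, 1]

# Input treated as probabilities: pass log-probabilities, giving -log(0.8)
as_probs = criterion(torch.log(values), target)

print(as_logits.item(), manual.item(), as_probs.item())
```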

Which inputs are required to create an extremely low (high) loss value?

Try to create and use inputs which achieve a loss value equal to or close to ``0``.

In [47]:
### YOUR SOLUTION HERE
pred = torch.tensor([[0., 50., 0.]])
loss(pred, true_value)
### END OF SOLUTION

tensor(0.)
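
The loss approaches 0 when the score of the true class is far larger than all other scores, and grows when a wrong class dominates. A small sketch contrasting both cases, with the scores chosen only for illustration:

```python
import torch

criterion = torch.nn.CrossEntropyLoss()
target = torch.tensor([1])

# True class has by far the largest score: loss is (numerically) zero
confident_right = criterion(torch.tensor([[0.0, 50.0, 0.0]]), target)

# A wrong class has by far the largest score: loss is roughly the score gap
confident_wrong = criterion(torch.tensor([[50.0, 0.0, 0.0]]), target)

print(confident_right.item(), confident_wrong.item())
```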

### Optimizers

Similarly, PyTorch also contains implementations of optimization algorithms like gradient descent.
A broad variety of different [optimizers](https://pytorch.org/docs/stable/optim.html) are implemented.
Nearly all of these optimizers are improved or more sophisticated versions of vanilla gradient descent, which we discussed in the lecture.
We will use stochastic gradient descent, which is implemented in the class ``torch.optim.SGD``.

Initialize an SGD optimizer. The constructor takes the parameters to optimize as argument (use the method ``parameters()`` of the model implemented above). Further, the named argument ``lr`` can be used to set the (initial) learning rate. Set that value to ``0.01``.

In [54]:
### YOUR SOLUTION HERE
sgd = torch.optim.SGD(neural_net_instance.parameters(), lr=0.01)
### END OF SOLUTION
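
What the optimizer does in one step can be traced on a single parameter. In this sketch (a toy problem, not part of the exercise), plain SGD without momentum updates ``w`` by subtracting the learning rate times the gradient:

```python
import torch

w = torch.tensor([2.0], requires_grad=True)
toy_optimizer = torch.optim.SGD([w], lr=0.1)

toy_loss = (w ** 2).sum()
toy_loss.backward()        # gradient of w^2 is 2 * w = 4.0
toy_optimizer.step()       # w <- 2.0 - 0.1 * 4.0

print(w.item())
```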

Next, we will create an extremely small dummy dataset on which our previously defined model should be able to overfit.

In [55]:
n_samples = 12
dummy_training_data_x = torch.rand([n_samples, 5])
dummy_training_data_y = torch.Tensor([i%3 for i in range(n_samples)]).long()

First, we want to implement a function ``compute_accuracy()`` which computes the model's accuracy.

To compute the accuracy, complete the following steps:

* Run data through model to get class probabilities
* Find class prediction by picking highest probability for each sample
* Compute the fraction of samples in which the class prediction coincides with the annotated class
* Return the computed accuracy

In [64]:
def compute_accuracy(model, data_x, data_y):
    ### YOUR SOLUTION HERE
    # Run data through the model; no gradients are needed for evaluation
    with torch.no_grad():
        output = model(data_x)  # class probabilities, shape (n_samples, n_classes)
    # Predicted class = index of the highest probability for each sample
    preds = torch.argmax(output, dim=1)
    # Fraction of samples whose prediction matches the annotated class
    return (preds == data_y).float().mean().item()
    ### END OF SOLUTION
print(compute_accuracy(neural_net_instance, dummy_training_data_x, dummy_training_data_y))

Before we start to train the model, i.e. to optimize the model parameters, we want to compute the model's accuracy in an untrained state (random weights).

What accuracy value would you expect?

Use the function ``compute_accuracy`` to compute the model accuracy for the dummy training data and print it to the console.

In [None]:
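### YOUR SOLUTION HERE
# expected accuracy: about 0.33, equivalent to guessing one of three equally frequent classes.
print(compute_accuracy(neural_net_instance, dummy_training_data_x, dummy_training_data_y))
### END OF SOLUTION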

To perform a single training step in PyTorch, the following steps need to be performed:
* Reset all gradients by calling the optimizer's method ``zero_grad()``
* Pass training samples through model (call ``Module`` instance)
* Compute the loss
* Compute the backward pass by calling the method ``backward()`` of the loss tensor
* Update model parameters using the optimizer's method ``step()``

The following function ``training_step`` carries out all of the above for one gradient descent step and returns the computed loss value.

In [None]:
def training_step(model: torch.nn.Module, loss: torch.nn.CrossEntropyLoss, optimizer: torch.optim.Optimizer, data_x: torch.Tensor, data_y: torch.Tensor):
    optimizer.zero_grad()
    prediction = model(data_x)
    opt_target = loss(prediction, data_y)
    opt_target.backward()
    optimizer.step()
    return opt_target.detach().numpy()
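
The gradients are reset at the start of each step because ``backward()`` adds to the gradients already stored in a parameter rather than overwriting them. A minimal sketch on a single scalar (a toy example, independent of the model above):

```python
import torch

v = torch.tensor(1.0, requires_grad=True)

(3 * v).backward()
first = v.grad.item()        # gradient of 3 * v is 3

(3 * v).backward()
accumulated = v.grad.item()  # second call adds another 3

v.grad.zero_()               # reset the stored gradient in place
(3 * v).backward()
after_reset = v.grad.item()

print(first, accumulated, after_reset)
```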

In the following, use all of the above variables and instances (model, loss, optimizer, dummy data, training function) to train your neural network on the dummy data.

In each training epoch, iterate over the shuffled training samples. 
For each training sample, perform one stochastic gradient descent step.
Sum up the losses for each sample to compute the epoch's overall loss and append it to ``epoch_losses``.
When these steps are performed for all data samples, compute the model's accuracy after the training epoch.

If the accuracy reaches 1.0 before all epochs are performed, training may be stopped.

In [None]:
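import numpy as np
indizes = np.arange(dummy_training_data_x.shape[0])
epoch_losses = []

for epoch in range(1000):
    np.random.shuffle(indizes)
    ### YOUR SOLUTION HERE
    epoch_loss = 0.0
    for idx in indizes:
        i = int(idx)
        # Slicing with i:i+1 keeps the batch dimension, giving a batch of size 1
        epoch_loss += float(training_step(neural_net_instance, loss, sgd,
                                          dummy_training_data_x[i:i+1], dummy_training_data_y[i:i+1]))
    epoch_losses.append(epoch_loss)
    # Accuracy on the full dummy dataset after this epoch
    acc = compute_accuracy(neural_net_instance, dummy_training_data_x, dummy_training_data_y)
    ### END OF SOLUTION
    if acc == 1.0:
        break

print("Finished training after {} epochs. Model accuracy: {}".format(epoch+1, acc))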

Use the following lines of code to plot your training progress (the evolution of the loss during training).

In [None]:
from matplotlib import pyplot as plt

plt.figure()
plt.plot(epoch_losses)
plt.show()