<h1 style="color:rgb(0,120,170)">Hands-on AI I</h1>
<h2 style="color:rgb(0,120,170)">Unit 4 (Assignment) -- Your first neural networks </h2>

# Exercise 0
Before tackling all those exciting tasks of this notebook, the neccessary Python modules need to be loaded. Have a look at the notebook discussed during the lecture, and import the following modules/symbols:

- <code>u4_utils</code>
- <code>matplotlib.pyplot</code>
- <code>numpy</code>
- <code>torch</code>
- <code>torch.nn</code>

In [71]:
%matplotlib notebook


In [127]:
import u4_utils as u4
import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn
import warnings
warnings.filterwarnings(r'ignore')


Afterwards, check if the <code>torch</code> module was correctly imported, by computing the <i>sum</i> of <code>[7, 2, 3]</code> and printing the result.

In [73]:
torch.sum(torch.as_tensor([7, 2, 3], dtype=torch.float32))


tensor(12.)

# Exercise 1
Normally, machine learning specific tasks start with digging into some <i>data set</i>. This time, we want to emphasize a different approach by focusing on miscellaneous kinds of <i>functions</i> at the beginning. <b>Exercise 1</b> is all about

- <i>convex</i> functions

and how their <i>derivative</i> can be used for optimizing the same. So, your <b>first task</b> of this exercise requires you to perform the following steps:

- Define&emsp;$y = x^{d}_{0} + x^{d}_{1} + \ldots{} + x^{d}_{n}$&emsp;as a <i>Python</i> function.
- Define the corresponding <i>derivative</i> as a <i>Python</i> function.

Note, that both <i>Python</i> functions should accept <i>exactly one</i> mandatory parameter, namely some one dimensional <i>numpy array</i> consisting of real values. Regardless of this requirement, optional parameters are allowed, though (e.g. to specify the corresponding <i>degree</i> of the current function of interest).

In [75]:
def pythonfunc(xt, degree=1):
    plus = 0
    for x in xt:
        plus += x ** degree
    return plus
def deri_pythonfunc(xtn, degree=1):
    return np.array(degree*xtn**(degree-1))


After you have <i>implemented</i> said function as well as the corresponding derivative, we want to visualize both to get more familiar with them as well as to get some <i>feeling</i> for their behaviour. Most often, some kind of visualization vastly supports problem finding processes (often termed as <i>debugging</i>), so keep this always in mind.

- Create two <i>numpy arrays</i> with values in the range of $[-2, 2]$, with a step size of $0.1$ (<i>hint:</i> look at <code>arange</code> supplied by <i>numpy</i>).
- Visualize the <i>convex</i> function as well as its <i>derivative</i> in $(1.2\ \ 1.5)$.

In [168]:
x0 = np.array([1.2, 1.5])
X = np.arange(-2, 2, 0.1)
Y = np.arange(-2, 2, 0.1)


In [169]:
u4.plot_function(x0, X, Y, pythonfunc, deri_pythonfunc)


<IPython.core.display.Javascript object>

As the <b>second</b> and <b>last task</b> of this exercise, we want to know the <i>exact</i> value of the <i>derivative</i> of some <i>result</i> of the convex function with respect to its <i>input</i>. For this to happen, the following steps are required:

- Transform the list $[1.2, 1.5]$ to a <i>numpy array</i> of type <i>float32</i>.
- Compute the <i>result</i> of the <i>convex</i> function applied to said newly created <i>input</i>
- Compute the <i>derivative</i> of the <i>result</i> with respect to the input.

Print the <i>result</i> as well as all <i>intermediate</i> values to the standard output.

In [78]:
a_list = [1.2, 1.5]
an_array =np.asarray(a_list, dtype=np.float32)


In [79]:
pythonfunc(an_array)


2.700000047683716

In [80]:
deri_pythonfunc(an_array)


array([1., 1.], dtype=float32)

# Exercise 2


This exercise is quite similar to the <i>previous</i> one, with a difference in the type of functions to be analyzed. <b>Exercise 2</b> is all about

- <i>non-convex</i> functions

and how their <i>derivative</i> can be used for optimizing the same. So, your <b>first task</b> of this exercise requires you to perform the following steps:

- Define&emsp;$y = \tanh\left(x^{d}_{0} + x^{d}_{1} + \ldots{} + x^{d}_{n}\right)$&emsp;as a <i>Python</i> function.
- Define the corresponding <i>derivative</i> as a <i>Python</i> function.

Note, that both <i>Python</i> functions should accept <i>exactly one</i> mandatory parameter, namely some one dimensional <i>numpy array</i> consisting of real values. Regardless of this requirement, optional parameters are allowed, though (e.g. to specify the corresponding <i>degree</i> of the current function of interest).

In [81]:
def noncon_func(xt, degree=1):
    plus = 0
    for x in xt:
        plus += x ** degree
    return np.tanh(plus)
 

In [82]:
def deri_noncon_func(xt, degree=1):
    plus = 0
    plus_two = 0
    for x in xt:
        plus += x ** degree
        plus_two += degree * x ** (degree-1)
 
    return np.repeat(1*plus_two/(np.cosh(plus)*2), len(xt))


After you have <i>implemented</i> said function as well as the corresponding derivative, we want to visualize both to get more familiar with them as well as to get some <i>feeling</i> for their behaviour.

- Create two <i>numpy arrays</i> with values in the range of $[-2, 2]$, with a step size of $0.1$ (<i>hint:</i> look at <code>arange</code> supplied by <i>numpy</i>).
- Visualize the <i>non-convex</i> function as well as its <i>derivative</i> in $(0.9\ \ 0.9)$.

The input of the <i>non-convex</i> function is in the same range as the input of the <i>convex</i> one. Nonetheless, their result (and so does their visualization) might differ. Do you notice any major <i>differences</i>? If you do, briefly describe them, otherwise leave a short notice.

In [170]:
x0 = np.array([0.9, 0.9])
X = np.arange(-2, 2, 0.1)
Y = np.arange(-2, 2, 0.1)


In [171]:
u4.plot_function(x0, X, Y, noncon_func, deri_noncon_func)


<IPython.core.display.Javascript object>

Similar to the <i>last tasl</i> of the <i> previous</i> exercise, the <b>second</b> and <b>last task</b> of this one requires you tocompute the <i>exact</i> value of the <i>derivative</i> of some <i>result</i> of the non-convex function with respect to its <i>input</i>. For this to happen, the following steps are necessary:

- Transform the list $[0.9, 0.9]$ to a <i>numpy array</i> of type <i>float32</i>.
- Compute the <i>result</i> of the <i>non-convex</i> function applied to said newly created <i>input</i>
- Compute the <i>derivative</i> of the <i>result</i> with respect to the input.

Print the <i>result</i> as well as all <i>intermediate</i> values to the standard output.

In [85]:
a_list = [0.9, 0.9]
an_array =np.asarray(a_list, dtype=np.float32)


In [86]:
noncon_func(an_array)


0.94680600790822

In [87]:
deri_noncon_func(an_array)


array([0.32180488, 0.32180488])

# Exercise 3

As you are now an expert in <i>convex</i> and <i>non-convex</i> functions, you would for sure happily get your hands dirty by applying your knowledge to some data set. In this exercise you will be working with one composed of various <i>images</i> of fashion items. For curious minds, more information regarding this data set can be found at:

<cite>Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. Han Xiao, Kashif Rasul, Roland Vollgraf. arXiv:1708.07747</cite>

For the <b>first task</b> of this exercise you are required to perform the following steps:

- Set the <i>random seed</i> to $s = 42$ using the <i>PyTorch</i> interface.
- Load the <i>Fashion-MNIST</i> data set (returns the <i>training</i> as well as the <i>test</i> set).
- Display the first <i>eight</i> images of the <i>Fashion-MNIST</i> data set.

Can you identify possible <i>labels</i> of the eight images?

In [88]:
X = torch.rand(42) 


In [89]:
train_loader, test_loader = u4.load_fashion_mnist()


In [158]:
u4.display_FashionMNIST(train_loader, 8)


<IPython.core.display.Javascript object>

Possible labels for pictures:
1. picture: shoe
2. picture: T-shirt
3. picture: dress (here it is not quite clear which label is ordered to the 3. picture)
4. picture: dress
5. picture: dungarees
6. picture: sweater
7. picture: show
8. picture: sweater

In order to define a <i>logistic regression</i> model as well as a <i>dense feedforward neural network</i> for identifying images as visualized above, some minimal knowledge about the <i>structure</i> of the images is required:

- Find out the <i>input dimensionality</i> of the data set.
- Set the output dimensionality to be $d_{out} = 10$

In [91]:
train_shape = train_loader[1][0].shape
input_dim = train_shape[0]*train_shape[1]*train_shape[2]
print("Input dimension: {}.".format(input_dim))


Input dimension: 784.


In [92]:
output_dim = 10


Last time (for <i>assignment 3</i>) you were supplied with an implementation of <i>logistic regression</i> by us. As this would be too simple (and obviosuly no <i>fun</i> at all) for you, the <b>second task</b> of this exercise comprises:

- Implement a <i>Python class</i> <code>LogisticRegression</code> as discussed during the lecture.
- Keep in mind, which <i>activation</i> function a <i>multi-class</i> setting requires.
- Optionally, <i>initialize</i> the parameters of the model in a different way.

In [93]:
class LogisticRegression(nn.Module):
    def __init__(self, input_dim, output_dim): 
        super(LogisticRegression, self).__init__()
        self.n_classes = output_dim
        self.layer = nn.Linear(input_dim, self.n_classes)

    def forward(self, x):
        return self.layer(x)
    

Moreover, define an <i>instance</i> of the type <code>SimpleNamespace</code>, and set the hyperparameters accordingly:

- <code>batch_size = 64</code>
- <code>test_batch_size = 1000</code>
- <code>epochs = 10</code>
- <code>lr = 0.001</code>
- <code>momentum = 0.9</code>

The field <code>log_interval</code> can be chosen freely.

- Set the <i>random seed</i> to $s = 42$ using the <i>PyTorch</i> interface.
- Create additional instances of <code>DataLoader</code> for the <i>training</i> as well as the <i>test set</i> and enable <i>shuffling</i>.
- Create a <i>logistic regression</i> model using your <i>own</i> implementation, using the proper <i>input</i> and <i>output</i> dimensionalities.
- Create an optimizer of the type <code>SGD</code> and initialize it accordingly.

In [94]:
from types import SimpleNamespace
args = SimpleNamespace(batch_size=64, test_batch_size=1000, epochs=10, lr=0.001, momentum=0.9, log_interval=100)

use_cuda = torch.cuda.is_available()
device = torch.device('cuda' if use_cuda else 'cpu')


In [95]:
X = torch.rand(42) 

training_Loader = torch.utils.data.DataLoader(train_loader,batch_size=args.batch_size, shuffle=True)
testing_Loader = torch.utils.data.DataLoader(test_loader,batch_size=args.test_batch_size, shuffle=True)

model = LogisticRegression(input_dim, output_dim).to(device)
optimizer = u4.optim.SGD(model.parameters(), lr=args.lr, momentum=args.momentum)


Train the previously defined <i>logistic regression</i> model by applying the corresponding <i>data loader</i> (keep in mind for which set we want the model to be <i>trained</i>) as well as the <i>optimizer</i>. Report the performance on the <i>test set</i> afterwards. Experiment with different hyperparameter settings, for instance set different values for $\ldots$

- $\ldots$ the learning rate <code>lr</code>.
- $\ldots$ the momentum term <code>momentum</code>.
- $\ldots$ the amount of epochs <code>epochs</code>.

Do you notice any serious differences? If yes, which <i>settings</i> lead to them? If not, try to argue about a <i>possible</i> reason.

In [96]:
for epoch in range(1, args.epochs + 1):
    u4.train(args, model, device, training_Loader, optimizer, epoch, input_dim)
    u4.test(args, model, device, testing_Loader, input_dim)
    


Test set: Average loss: 0.0007, Accuracy: 7600/10000 (76.00%)


Test set: Average loss: 0.0006, Accuracy: 7866/10000 (78.66%)


Test set: Average loss: 0.0006, Accuracy: 8006/10000 (80.06%)


Test set: Average loss: 0.0006, Accuracy: 8087/10000 (80.87%)


Test set: Average loss: 0.0006, Accuracy: 8111/10000 (81.11%)


Test set: Average loss: 0.0005, Accuracy: 8162/10000 (81.62%)


Test set: Average loss: 0.0005, Accuracy: 8206/10000 (82.06%)


Test set: Average loss: 0.0005, Accuracy: 8204/10000 (82.04%)


Test set: Average loss: 0.0005, Accuracy: 8239/10000 (82.39%)


Test set: Average loss: 0.0005, Accuracy: 8258/10000 (82.58%)



In [97]:
from types import SimpleNamespace
args = SimpleNamespace(batch_size=64, test_batch_size=1000, epochs=10, lr=0.1, momentum=0.9, log_interval=100)

use_cuda = torch.cuda.is_available()
device = torch.device('cuda' if use_cuda else 'cpu')

X = torch.rand(42) 

training_Loader = torch.utils.data.DataLoader(train_loader,batch_size=args.batch_size, shuffle=True)
testing_Loader = torch.utils.data.DataLoader(test_loader,batch_size=args.test_batch_size, shuffle=True)

model = LogisticRegression(input_dim, output_dim).to(device)
optimizer = u4.optim.SGD(model.parameters(), lr=args.lr, momentum=args.momentum)

for epoch in range(1, args.epochs + 1):
    u4.train(args, model, device, training_Loader, optimizer, epoch, input_dim)
    u4.test(args, model, device, testing_Loader, input_dim)
    


Test set: Average loss: 0.0009, Accuracy: 7757/10000 (77.57%)


Test set: Average loss: 0.0006, Accuracy: 8077/10000 (80.77%)


Test set: Average loss: 0.0007, Accuracy: 8141/10000 (81.41%)


Test set: Average loss: 0.0008, Accuracy: 7925/10000 (79.25%)


Test set: Average loss: 0.0006, Accuracy: 8098/10000 (80.98%)


Test set: Average loss: 0.0006, Accuracy: 8219/10000 (82.19%)


Test set: Average loss: 0.0007, Accuracy: 8281/10000 (82.81%)


Test set: Average loss: 0.0006, Accuracy: 8073/10000 (80.73%)


Test set: Average loss: 0.0012, Accuracy: 7820/10000 (78.20%)


Test set: Average loss: 0.0006, Accuracy: 8224/10000 (82.24%)



In [98]:
from types import SimpleNamespace
args = SimpleNamespace(batch_size=64, test_batch_size=1000, epochs=10, lr=0.001, momentum=0.1, log_interval=100)

use_cuda = torch.cuda.is_available()
device = torch.device('cuda' if use_cuda else 'cpu')

X = torch.rand(42) 

training_Loader = torch.utils.data.DataLoader(train_loader,batch_size=args.batch_size, shuffle=True)
testing_Loader = torch.utils.data.DataLoader(test_loader,batch_size=args.test_batch_size, shuffle=True)

model = LogisticRegression(input_dim, output_dim).to(device)
optimizer = u4.optim.SGD(model.parameters(), lr=args.lr, momentum=args.momentum)

for epoch in range(1, args.epochs + 1):
    u4.train(args, model, device, training_Loader, optimizer, epoch, input_dim)
    u4.test(args, model, device, testing_Loader, input_dim)
    


Test set: Average loss: 0.0013, Accuracy: 6558/10000 (65.58%)


Test set: Average loss: 0.0011, Accuracy: 6725/10000 (67.25%)


Test set: Average loss: 0.0010, Accuracy: 6957/10000 (69.57%)


Test set: Average loss: 0.0009, Accuracy: 7126/10000 (71.26%)


Test set: Average loss: 0.0008, Accuracy: 7290/10000 (72.90%)


Test set: Average loss: 0.0008, Accuracy: 7389/10000 (73.89%)


Test set: Average loss: 0.0008, Accuracy: 7474/10000 (74.74%)


Test set: Average loss: 0.0008, Accuracy: 7522/10000 (75.22%)


Test set: Average loss: 0.0007, Accuracy: 7577/10000 (75.77%)


Test set: Average loss: 0.0007, Accuracy: 7624/10000 (76.24%)



In [99]:
from types import SimpleNamespace
args = SimpleNamespace(batch_size=64, test_batch_size=1000, epochs=10, lr=0.001, momentum=0.9, log_interval=100)

use_cuda = torch.cuda.is_available()
device = torch.device('cuda' if use_cuda else 'cpu')

X = torch.rand(42) 

training_Loader = torch.utils.data.DataLoader(train_loader,batch_size=args.batch_size, shuffle=True)
testing_Loader = torch.utils.data.DataLoader(test_loader,batch_size=args.test_batch_size, shuffle=True)

model = LogisticRegression(input_dim, output_dim).to(device)
optimizer = u4.optim.SGD(model.parameters(), lr=args.lr, momentum=args.momentum)

for epoch in range(1, 2):
    u4.train(args, model, device, training_Loader, optimizer, epoch, input_dim)
    u4.test(args, model, device, testing_Loader, input_dim)
    


Test set: Average loss: 0.0007, Accuracy: 7570/10000 (75.70%)



In [100]:
from types import SimpleNamespace
args = SimpleNamespace(batch_size=64, test_batch_size=1000, epochs=10, lr=0.1, momentum=0.1, log_interval=100)

use_cuda = torch.cuda.is_available()
device = torch.device('cuda' if use_cuda else 'cpu')

X = torch.rand(42) 

training_Loader = torch.utils.data.DataLoader(train_loader,batch_size=args.batch_size, shuffle=True)
testing_Loader = torch.utils.data.DataLoader(test_loader,batch_size=args.test_batch_size, shuffle=True)

model = LogisticRegression(input_dim, output_dim).to(device)
optimizer = u4.optim.SGD(model.parameters(), lr=args.lr, momentum=args.momentum)

for epoch in range(1, 2):
    u4.train(args, model, device, training_Loader, optimizer, epoch, input_dim)
    u4.test(args, model, device, testing_Loader, input_dim)
    


Test set: Average loss: 0.0006, Accuracy: 7858/10000 (78.58%)



In [101]:
args = SimpleNamespace(batch_size=64, test_batch_size=1000, epochs=10, lr=0.1, momentum=0.1, log_interval=100)

use_cuda = torch.cuda.is_available()
device = torch.device('cuda' if use_cuda else 'cpu')

X = torch.rand(42) 

training_Loader = torch.utils.data.DataLoader(train_loader,batch_size=args.batch_size, shuffle=True)
testing_Loader = torch.utils.data.DataLoader(test_loader,batch_size=args.test_batch_size, shuffle=True)

model = LogisticRegression(input_dim, output_dim).to(device)
optimizer = u4.optim.SGD(model.parameters(), lr=args.lr, momentum=args.momentum)

for epoch in range(1, args.epochs + 1):
    u4.train(args, model, device, training_Loader, optimizer, epoch, input_dim)
    u4.test(args, model, device, testing_Loader, input_dim)
    


Test set: Average loss: 0.0005, Accuracy: 8098/10000 (80.98%)


Test set: Average loss: 0.0005, Accuracy: 8309/10000 (83.09%)


Test set: Average loss: 0.0005, Accuracy: 8295/10000 (82.95%)


Test set: Average loss: 0.0005, Accuracy: 8194/10000 (81.94%)


Test set: Average loss: 0.0005, Accuracy: 8190/10000 (81.90%)


Test set: Average loss: 0.0005, Accuracy: 8369/10000 (83.69%)


Test set: Average loss: 0.0005, Accuracy: 8326/10000 (83.26%)


Test set: Average loss: 0.0005, Accuracy: 8315/10000 (83.15%)


Test set: Average loss: 0.0005, Accuracy: 8379/10000 (83.79%)


Test set: Average loss: 0.0005, Accuracy: 8322/10000 (83.22%)



In [102]:
args = SimpleNamespace(batch_size=64, test_batch_size=1000, epochs=10, lr=0.5, momentum=0.5, log_interval=100)

use_cuda = torch.cuda.is_available()
device = torch.device('cuda' if use_cuda else 'cpu')

X = torch.rand(42) 

training_Loader = torch.utils.data.DataLoader(train_loader,batch_size=args.batch_size, shuffle=True)
testing_Loader = torch.utils.data.DataLoader(test_loader,batch_size=args.test_batch_size, shuffle=True)

model = LogisticRegression(input_dim, output_dim).to(device)
optimizer = u4.optim.SGD(model.parameters(), lr=args.lr, momentum=args.momentum)

for epoch in range(1, args.epochs + 1):
    u4.train(args, model, device, training_Loader, optimizer, epoch, input_dim)
    u4.test(args, model, device, testing_Loader, input_dim)
    


Test set: Average loss: 0.0025, Accuracy: 6431/10000 (64.31%)


Test set: Average loss: 0.0008, Accuracy: 8300/10000 (83.00%)


Test set: Average loss: 0.0010, Accuracy: 7900/10000 (79.00%)


Test set: Average loss: 0.0009, Accuracy: 7948/10000 (79.48%)


Test set: Average loss: 0.0008, Accuracy: 8117/10000 (81.17%)


Test set: Average loss: 0.0011, Accuracy: 7745/10000 (77.45%)


Test set: Average loss: 0.0016, Accuracy: 7497/10000 (74.97%)


Test set: Average loss: 0.0010, Accuracy: 8137/10000 (81.37%)


Test set: Average loss: 0.0008, Accuracy: 8237/10000 (82.37%)


Test set: Average loss: 0.0008, Accuracy: 8333/10000 (83.33%)



In [103]:
args = SimpleNamespace(batch_size=64, test_batch_size=1000, epochs=10, lr=0.001, momentum=2.2, log_interval=100)

use_cuda = torch.cuda.is_available()
device = torch.device('cuda' if use_cuda else 'cpu')

X = torch.rand(42) 

training_Loader = torch.utils.data.DataLoader(train_loader,batch_size=args.batch_size, shuffle=True)
testing_Loader = torch.utils.data.DataLoader(test_loader,batch_size=args.test_batch_size, shuffle=True)

model = LogisticRegression(input_dim, output_dim).to(device)
optimizer = u4.optim.SGD(model.parameters(), lr=args.lr, momentum=args.momentum)

for epoch in range(1, args.epochs + 1):
    u4.train(args, model, device, training_Loader, optimizer, epoch, input_dim)
    u4.test(args, model, device, testing_Loader, input_dim)
    


Test set: Average loss: nan, Accuracy: 1000/10000 (10.00%)


Test set: Average loss: nan, Accuracy: 1000/10000 (10.00%)


Test set: Average loss: nan, Accuracy: 1000/10000 (10.00%)


Test set: Average loss: nan, Accuracy: 1000/10000 (10.00%)


Test set: Average loss: nan, Accuracy: 1000/10000 (10.00%)


Test set: Average loss: nan, Accuracy: 1000/10000 (10.00%)


Test set: Average loss: nan, Accuracy: 1000/10000 (10.00%)


Test set: Average loss: nan, Accuracy: 1000/10000 (10.00%)


Test set: Average loss: nan, Accuracy: 1000/10000 (10.00%)


Test set: Average loss: nan, Accuracy: 1000/10000 (10.00%)



Do you notice any serious differences? If yes, which settings lead to them? If not, try to argue about a possible reason.
0. lr     |        mom        |         epochs        |     accuracy for test set (in %)(values from a previous run)
1. 0.001     |     0.9         |        10           |      82.48
2. 0.1       |     0.9         |        10            |     81.43
3. 0.001     |     0.1          |       10           |      76.62
4. 0.001      |    0.9           |       1           |      75.91
5. 0.1        |    0.1           |       1            |     81.43
6. 0.1         |   0.1          |       10             |    84.13
7. 0.5          |  0.5           |      10             |    75.87
8. 2.2         |   0.9         |        10       |          81.76
9. 0.001       |   2.2      |           10         |        10.00

If the momentum is too high (here 2.2), then the accuracy adds only up to 10.00 % what is in comparison to the remaining result extremely low. It could be that a high momentum collect previous happening too exactly. So, the model can't fit the reality of dataset when it belongs to the training set too much in detail. 
Furthermore, the learning and amount of epochs don't seem to have a lot of influence on the accuracy. The reason for that could be the characteristics of the Fashion-MNIST dataset. 

On the basis of your <i>logistic regression</i> implementation, construct a <i>dense feedforward neural network</i> with the following attributes (to get you started, later on you will modify these settings in order to get a better performance on the corresponding test set):

- <i>One</i> input layer, accepting the same data as the <i>logistic regression</i> model.
- <i>Two</i> hidden layers with a dimensionality of $256$ each.
- <i>One</i> output layer, of the same output dimensionality as the <i>logistic regression</i> model. 

To summarize this task:

- Implement a <i>Python class</i> <code>DenseNeuralNet</code> as discussed during the lecture.
- Keep in mind, which <i>activation</i> function a <i>multi-class</i> setting requires.
- Optionally, <i>initialize</i> the parameters of the model in a different way.

In [104]:
class DenseNeuralNet(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(DenseNeuralNet, self).__init__()
        self.n_classes = output_dim
        self.layer=nn.Linear(input_dim, 256)
        self.layer1=nn.Linear(256, 256)
        self.layer2=nn.Linear(256, self.n_classes)

    def forward(self, x):
        x=torch.sigmoid(self.layer(x))
        y=torch.sigmoid(self.layer1(x))
        return self.layer2(x)
    

Moreover, define an <i>instance</i> of the type <code>SimpleNamespace</code>, and set the hyperparameters accordingly (similar to the settings of the <i>logistic regression</i>):

- <code>batch_size = 64</code>
- <code>test_batch_size = 1000</code>
- <code>epochs = 10</code>
- <code>lr = 0.001</code>
- <code>momentum = 0.9</code>

The field <code>log_interval</code> can be chosen freely.

- Set the <i>random seed</i> to $s = 42$ using the <i>PyTorch</i> interface.
- Create additional instances of <code>DataLoader</code> for the <i>training</i> as well as the <i>test set</i> and enable <i>shuffling</i>.
- Create a <i>dense feedforward neural network</i> model using your <i>own</i> implementation, using the proper <i>input</i> and <i>output</i> dimensionalities.
- Create an optimizer of the type <code>SGD</code> and initialize it accordingly.

In [105]:
args2 = u4.SimpleNamespace(batch_size=64, test_batch_size=1000, epochs=10, lr=0.001, momentum=0.9, log_interval=100)

use_cuda = torch.cuda.is_available()
device = torch.device('cuda' if use_cuda else 'cpu')

X = torch.rand(42) 

training_Loader = torch.utils.data.DataLoader(train_loader,batch_size=args2.batch_size, shuffle=True)
testing_Loader = torch.utils.data.DataLoader(test_loader,batch_size=args2.test_batch_size, shuffle=True)

model2 = DenseNeuralNet(input_dim, output_dim).to(device)
optimizer2 = u4.optim.SGD(model2.parameters(), lr=args2.lr, momentum=args2.momentum)


Train the previously defined <i>dense feedforward neural network</i> model by applying the corresponding <i>data loader</i> (keep in mind for which set we want the model to be <i>trained</i>) as well as the <i>optimizer</i>. Report the performance on the <i>test set</i> afterwards. As this kind of network behaves differently than a <i>logistic regression</i> model, experiment with different hyperparameter settings, for instance set different values for $\ldots$

- $\ldots$ the learning rate <code>lr</code>.
- $\ldots$ the momentum term <code>momentum</code>.
- $\ldots$ the amount of epochs <code>epochs</code>.

Do you notice any serious differences? If yes, which <i>settings</i> lead to them? If not, try to argue about a <i>possible</i> reason.

In [106]:
for epoch in range(1, args2.epochs + 1):
    u4.train(args2, model2, device, training_Loader, optimizer2, epoch, input_dim)
    u4.test(args2, model2, device, testing_Loader, input_dim)
    


Test set: Average loss: 0.0015, Accuracy: 6462/10000 (64.62%)


Test set: Average loss: 0.0010, Accuracy: 6978/10000 (69.78%)


Test set: Average loss: 0.0009, Accuracy: 7182/10000 (71.82%)


Test set: Average loss: 0.0008, Accuracy: 7317/10000 (73.17%)


Test set: Average loss: 0.0007, Accuracy: 7481/10000 (74.81%)


Test set: Average loss: 0.0007, Accuracy: 7551/10000 (75.51%)


Test set: Average loss: 0.0007, Accuracy: 7630/10000 (76.30%)


Test set: Average loss: 0.0006, Accuracy: 7712/10000 (77.12%)


Test set: Average loss: 0.0006, Accuracy: 7781/10000 (77.81%)


Test set: Average loss: 0.0006, Accuracy: 7851/10000 (78.51%)



In [107]:
args2 = u4.SimpleNamespace(batch_size=64, test_batch_size=1000, epochs=10, lr=0.1, momentum=0.9, log_interval=100)

use_cuda = torch.cuda.is_available()
device = torch.device('cuda' if use_cuda else 'cpu')

X = torch.rand(42) 

training_Loader = torch.utils.data.DataLoader(train_loader,batch_size=args2.batch_size, shuffle=True)
testing_Loader = torch.utils.data.DataLoader(test_loader,batch_size=args2.test_batch_size, shuffle=True)

model2 = DenseNeuralNet(input_dim, output_dim).to(device)
optimizer2 = u4.optim.SGD(model2.parameters(), lr=args2.lr, momentum=args2.momentum)

for epoch in range(1, args2.epochs + 1):
    u4.train(args2, model2, device, training_Loader, optimizer2, epoch, input_dim)
    u4.test(args2, model2, device, testing_Loader, input_dim)
    


Test set: Average loss: 0.0005, Accuracy: 8329/10000 (83.29%)


Test set: Average loss: 0.0004, Accuracy: 8495/10000 (84.95%)


Test set: Average loss: 0.0004, Accuracy: 8522/10000 (85.22%)


Test set: Average loss: 0.0004, Accuracy: 8611/10000 (86.11%)


Test set: Average loss: 0.0004, Accuracy: 8577/10000 (85.77%)


Test set: Average loss: 0.0004, Accuracy: 8704/10000 (87.04%)


Test set: Average loss: 0.0004, Accuracy: 8671/10000 (86.71%)


Test set: Average loss: 0.0004, Accuracy: 8671/10000 (86.71%)


Test set: Average loss: 0.0003, Accuracy: 8786/10000 (87.86%)


Test set: Average loss: 0.0004, Accuracy: 8684/10000 (86.84%)



In [108]:
args2 = u4.SimpleNamespace(batch_size=64, test_batch_size=1000, epochs=10, lr=0.001, momentum=0.1, log_interval=100)

use_cuda = torch.cuda.is_available()
device = torch.device('cuda' if use_cuda else 'cpu')

X = torch.rand(42) 

training_Loader = torch.utils.data.DataLoader(train_loader,batch_size=args2.batch_size, shuffle=True)
testing_Loader = torch.utils.data.DataLoader(test_loader,batch_size=args2.test_batch_size, shuffle=True)

model2 = DenseNeuralNet(input_dim, output_dim).to(device)
optimizer2 = u4.optim.SGD(model2.parameters(), lr=args2.lr, momentum=args2.momentum)

for epoch in range(1, args2.epochs + 1):
    u4.train(args2, model2, device, training_Loader, optimizer2, epoch, input_dim)
    u4.test(args2, model2, device, testing_Loader, input_dim)
    


Test set: Average loss: 0.0022, Accuracy: 4540/10000 (45.40%)


Test set: Average loss: 0.0022, Accuracy: 6006/10000 (60.06%)


Test set: Average loss: 0.0021, Accuracy: 5997/10000 (59.97%)


Test set: Average loss: 0.0020, Accuracy: 5881/10000 (58.81%)


Test set: Average loss: 0.0018, Accuracy: 6485/10000 (64.85%)


Test set: Average loss: 0.0017, Accuracy: 6417/10000 (64.17%)


Test set: Average loss: 0.0016, Accuracy: 6078/10000 (60.78%)


Test set: Average loss: 0.0015, Accuracy: 6460/10000 (64.60%)


Test set: Average loss: 0.0015, Accuracy: 6596/10000 (65.96%)


Test set: Average loss: 0.0014, Accuracy: 6720/10000 (67.20%)



In [109]:
args2 = u4.SimpleNamespace(batch_size=64, test_batch_size=1000, epochs=1, lr=0.001, momentum=0.9, log_interval=100)

use_cuda = torch.cuda.is_available()
device = torch.device('cuda' if use_cuda else 'cpu')

X = torch.rand(42) 

training_Loader = torch.utils.data.DataLoader(train_loader,batch_size=args2.batch_size, shuffle=True)
testing_Loader = torch.utils.data.DataLoader(test_loader,batch_size=args2.test_batch_size, shuffle=True)

model2 = DenseNeuralNet(input_dim, output_dim).to(device)
optimizer2 = u4.optim.SGD(model2.parameters(), lr=args2.lr, momentum=args2.momentum)

for epoch in range(1, args2.epochs + 1):
    u4.train(args2, model2, device, training_Loader, optimizer2, epoch, input_dim)
    u4.test(args2, model2, device, testing_Loader, input_dim)
    


Test set: Average loss: 0.0015, Accuracy: 6141/10000 (61.41%)



In [110]:
args2 = u4.SimpleNamespace(batch_size=64, test_batch_size=1000, epochs=1, lr=0.1, momentum=0.1, log_interval=100)

use_cuda = torch.cuda.is_available()
device = torch.device('cuda' if use_cuda else 'cpu')

X = torch.rand(42) 

training_Loader = torch.utils.data.DataLoader(train_loader,batch_size=args2.batch_size, shuffle=True)
testing_Loader = torch.utils.data.DataLoader(test_loader,batch_size=args2.test_batch_size, shuffle=True)

model2 = DenseNeuralNet(input_dim, output_dim).to(device)
optimizer2 = u4.optim.SGD(model2.parameters(), lr=args2.lr, momentum=args2.momentum)

for epoch in range(1, args2.epochs + 1):
    u4.train(args2, model2, device, training_Loader, optimizer2, epoch, input_dim)
    u4.test(args2, model2, device, testing_Loader, input_dim)
    


Test set: Average loss: 0.0006, Accuracy: 7834/10000 (78.34%)



In [111]:
args2 = u4.SimpleNamespace(batch_size=64, test_batch_size=1000, epochs=10, lr=0.1, momentum=0.1, log_interval=100)

use_cuda = torch.cuda.is_available()
device = torch.device('cuda' if use_cuda else 'cpu')

X = torch.rand(42) 

training_Loader = torch.utils.data.DataLoader(train_loader,batch_size=args2.batch_size, shuffle=True)
testing_Loader = torch.utils.data.DataLoader(test_loader,batch_size=args2.test_batch_size, shuffle=True)

model2 = DenseNeuralNet(input_dim, output_dim).to(device)
optimizer2 = u4.optim.SGD(model2.parameters(), lr=args2.lr, momentum=args2.momentum)

for epoch in range(1, args2.epochs + 1):
    u4.train(args2, model2, device, training_Loader, optimizer2, epoch, input_dim)
    u4.test(args2, model2, device, testing_Loader, input_dim)
    


Test set: Average loss: 0.0006, Accuracy: 7790/10000 (77.90%)


Test set: Average loss: 0.0005, Accuracy: 8136/10000 (81.36%)


Test set: Average loss: 0.0005, Accuracy: 7988/10000 (79.88%)


Test set: Average loss: 0.0005, Accuracy: 8296/10000 (82.96%)


Test set: Average loss: 0.0005, Accuracy: 8283/10000 (82.83%)


Test set: Average loss: 0.0004, Accuracy: 8392/10000 (83.92%)


Test set: Average loss: 0.0005, Accuracy: 8371/10000 (83.71%)


Test set: Average loss: 0.0005, Accuracy: 8284/10000 (82.84%)


Test set: Average loss: 0.0005, Accuracy: 8388/10000 (83.88%)


Test set: Average loss: 0.0004, Accuracy: 8461/10000 (84.61%)



In [112]:
args2 = u4.SimpleNamespace(batch_size=64, test_batch_size=1000, epochs=10, lr=0.5, momentum=0.5, log_interval=100)

use_cuda = torch.cuda.is_available()
device = torch.device('cuda' if use_cuda else 'cpu')

X = torch.rand(42) 

training_Loader = torch.utils.data.DataLoader(train_loader,batch_size=args2.batch_size, shuffle=True)
testing_Loader = torch.utils.data.DataLoader(test_loader,batch_size=args2.test_batch_size, shuffle=True)

model2 = DenseNeuralNet(input_dim, output_dim).to(device)
optimizer2 = u4.optim.SGD(model2.parameters(), lr=args2.lr, momentum=args2.momentum)

for epoch in range(1, args2.epochs + 1):
    u4.train(args2, model2, device, training_Loader, optimizer2, epoch, input_dim)
    u4.test(args2, model2, device, testing_Loader, input_dim)
    


Test set: Average loss: 0.0005, Accuracy: 8283/10000 (82.83%)


Test set: Average loss: 0.0005, Accuracy: 8339/10000 (83.39%)


Test set: Average loss: 0.0004, Accuracy: 8341/10000 (83.41%)


Test set: Average loss: 0.0004, Accuracy: 8390/10000 (83.90%)


Test set: Average loss: 0.0004, Accuracy: 8507/10000 (85.07%)


Test set: Average loss: 0.0004, Accuracy: 8682/10000 (86.82%)


Test set: Average loss: 0.0004, Accuracy: 8673/10000 (86.73%)


Test set: Average loss: 0.0004, Accuracy: 8736/10000 (87.36%)


Test set: Average loss: 0.0003, Accuracy: 8769/10000 (87.69%)


Test set: Average loss: 0.0004, Accuracy: 8727/10000 (87.27%)



In [113]:
args2 = u4.SimpleNamespace(batch_size=64, test_batch_size=1000, epochs=10, lr=2.2, momentum=0.9, log_interval=100)

use_cuda = torch.cuda.is_available()
device = torch.device('cuda' if use_cuda else 'cpu')

X = torch.rand(42) 

training_Loader = torch.utils.data.DataLoader(train_loader,batch_size=args2.batch_size, shuffle=True)
testing_Loader = torch.utils.data.DataLoader(test_loader,batch_size=args2.test_batch_size, shuffle=True)

model2 = DenseNeuralNet(input_dim, output_dim).to(device)
optimizer2 = u4.optim.SGD(model2.parameters(), lr=args2.lr, momentum=args2.momentum)

for epoch in range(1, args2.epochs + 1):
    u4.train(args2, model2, device, training_Loader, optimizer2, epoch, input_dim)
    u4.test(args2, model2, device, testing_Loader, input_dim)
    


Test set: Average loss: 0.0011, Accuracy: 5677/10000 (56.77%)


Test set: Average loss: 0.0009, Accuracy: 6724/10000 (67.24%)


Test set: Average loss: 0.0009, Accuracy: 6792/10000 (67.92%)


Test set: Average loss: 0.0010, Accuracy: 6639/10000 (66.39%)


Test set: Average loss: 0.0011, Accuracy: 6738/10000 (67.38%)


Test set: Average loss: 0.0008, Accuracy: 7387/10000 (73.87%)


Test set: Average loss: 0.0008, Accuracy: 7760/10000 (77.60%)


Test set: Average loss: 0.0008, Accuracy: 7752/10000 (77.52%)


Test set: Average loss: 0.0007, Accuracy: 7311/10000 (73.11%)


Test set: Average loss: 0.0007, Accuracy: 7479/10000 (74.79%)



In [114]:
args2 = u4.SimpleNamespace(batch_size=64, test_batch_size=1000, epochs=10, lr=0.001, momentum=2.2, log_interval=100)

use_cuda = torch.cuda.is_available()
device = torch.device('cuda' if use_cuda else 'cpu')

X = torch.rand(42) 

training_Loader = torch.utils.data.DataLoader(train_loader,batch_size=args2.batch_size, shuffle=True)
testing_Loader = torch.utils.data.DataLoader(test_loader,batch_size=args2.test_batch_size, shuffle=True)

model2 = DenseNeuralNet(input_dim, output_dim).to(device)
optimizer2 = u4.optim.SGD(model2.parameters(), lr=args2.lr, momentum=args2.momentum)

for epoch in range(1, args2.epochs + 1):
    u4.train(args2, model2, device, training_Loader, optimizer2, epoch, input_dim)
    u4.test(args2, model2, device, testing_Loader, input_dim)
    


Test set: Average loss: nan, Accuracy: 1000/10000 (10.00%)


Test set: Average loss: nan, Accuracy: 1000/10000 (10.00%)


Test set: Average loss: nan, Accuracy: 1000/10000 (10.00%)


Test set: Average loss: nan, Accuracy: 1000/10000 (10.00%)


Test set: Average loss: nan, Accuracy: 1000/10000 (10.00%)


Test set: Average loss: nan, Accuracy: 1000/10000 (10.00%)


Test set: Average loss: nan, Accuracy: 1000/10000 (10.00%)


Test set: Average loss: nan, Accuracy: 1000/10000 (10.00%)


Test set: Average loss: nan, Accuracy: 1000/10000 (10.00%)


Test set: Average loss: nan, Accuracy: 1000/10000 (10.00%)



Do you notice any serious differences? If yes, which settings lead to them? If not, try to argue about a possible reason.
0. lr     |        mom        |         epochs        |     accuracy for test set (in %)(values from a previous run)
1. 0.001     |     0.9         |        10           |      88.41
2. 0.1       |     0.9         |        10            |     87.18
3. 0.001     |     0.1          |       10           |      65.37
4. 0.001      |    0.9           |       1           |      62.68
5. 0.1        |    0.1           |       1            |     77.32
6. 0.1         |   0.1          |       10             |    84.86
7. 0.5          |  0.5           |      10             |    86.63
8. 2.2         |   0.9         |        10       |          78.91
9. 0.001       |   2.2      |           10         |        10.00

If the momentum is too high (here 2.2), then the accuracy adds only up to 10.00 % what is in comparison to the remaining result extremely low. It could be that a high momentum collect previous happenings too exactly. So, the model can't fit the reality of dataset when it belongs to the training set too much in detail. 
When the momentum is low (here 0.1), then the accuracy adds only up to 65.37 %. This value is low when you compare it to the oters. It could be that a low momomentum collect too few information about previous happenings. That could lead to a lack of information.
Furthermore, the learning doesn't seem to have a lot of influence on the accuracy. The reason for that could be the characteristics of the Fashion-MNIST dataset. 
Finally, the combination of (learning rate 0.001,) a momentum of 0.9 and number of epochs = 1 lead to an accuracy of 62,68 %. When you modify the momentum to 0.1 the accuracy is 77.32 %. 

As already discussing during the lecture and experimented with during the <i>last</i> assignment, simply <i>inverting</i> the original images on which the model is trained, may already be enough to break it. To show this behavior, perform the following steps:

- Set the <i>random seed</i> to $s = 42$ using the <i>PyTorch</i> interface.
- Load the <i>Fashion-MNIST</i> data set with a <code>flip_probability</code> of $p = 1$.
- Display the first <i>eight</i> images of the <i>Fashion-MNIST</i> data set.

Can you identify possible <i>labels</i> of the three images? How do they differ from the previous visualization?

- Evaluate the previously trained <i>logistic regression</i> model on the flipped data set.
- Evaluate the previously trained <i>dense feedforward neural network</i> on the flipped data set.

If you experiment with different <i>hyperparameter settings</i> with respect to the original data set, do the performances differ when tested on the <i>flipped</i> data set?

In [115]:
X = torch.rand(42) 


In [116]:
train_flipped, test_flipped = u4.load_fashion_mnist(1.0)


In [122]:
u4.display_FashionMNIST(train_flipped, 8)


<IPython.core.display.Javascript object>

Can you identify possible labels of the three images? How do they differ from the previous visualization?
When you know, that the pictures are flipped, you can identify the picture labels in the same way as non-flipped pictures. The problem is when you don't recognize that these pictures are flipped. So, I would have problems with the third and fifth picture in labeling. 

The difference to the originial pictures is that these pictures are upside down.

In [139]:
test_flipped_data = torch.utils.data.DataLoader(test_flipped,batch_size=args.test_batch_size, shuffle=True)

args = SimpleNamespace(batch_size=64, test_batch_size=1000, epochs=10, lr=0.001, momentum=0.9, log_interval=100)
model = LogisticRegression(input_dim, output_dim).to(device)

u4.test(args, model, device, test_flipped_data, input_dim)



Test set: Average loss: 0.0024, Accuracy: 1247/10000 (12.47%)



In [134]:
args2 = u4.SimpleNamespace(batch_size=64, test_batch_size=1000, epochs=10, lr=0.001, momentum=0.9, log_interval=100)
model2 = DenseNeuralNet(input_dim, output_dim).to(device)

u4.test(args2, model2, device, test_flipped_data, input_dim)



Test set: Average loss: 0.0023, Accuracy: 999/10000 (9.99%)



If you experiment with different hyperparameter settings with respect to the original data set, do the performances differ when tested on the flipped data set?

In [141]:
args = SimpleNamespace(batch_size=64, test_batch_size=1000, epochs=10, lr=0.1, momentum=0.9, log_interval=100)
model = LogisticRegression(input_dim, output_dim).to(device)

u4.test(args, model, device, test_flipped_data, input_dim)



Test set: Average loss: 0.0023, Accuracy: 933/10000 (9.33%)



In [142]:
args = SimpleNamespace(batch_size=64, test_batch_size=1000, epochs=10, lr=0.001, momentum=0.1, log_interval=100)
model = LogisticRegression(input_dim, output_dim).to(device)

u4.test(args, model, device, test_flipped_data, input_dim)



Test set: Average loss: 0.0023, Accuracy: 859/10000 (8.59%)



In [143]:
args = SimpleNamespace(batch_size=64, test_batch_size=1000, epochs=1, lr=0.001, momentum=0.9, log_interval=100)
model = LogisticRegression(input_dim, output_dim).to(device)

u4.test(args, model, device, test_flipped_data, input_dim)



Test set: Average loss: 0.0023, Accuracy: 1158/10000 (11.58%)



In [144]:
args = SimpleNamespace(batch_size=64, test_batch_size=1000, epochs=10, lr=2.2, momentum=0.9, log_interval=100)
model = LogisticRegression(input_dim, output_dim).to(device)

u4.test(args, model, device, test_flipped_data, input_dim)



Test set: Average loss: 0.0023, Accuracy: 1023/10000 (10.23%)



In [146]:
args = SimpleNamespace(batch_size=64, test_batch_size=1000, epochs=10, lr=0.001, momentum=2.2, log_interval=100)
model = LogisticRegression(input_dim, output_dim).to(device)

u4.test(args, model, device, test_flipped_data, input_dim)



Test set: Average loss: 0.0023, Accuracy: 957/10000 (9.57%)



Neuronal Net:

In [148]:
args2 = u4.SimpleNamespace(batch_size=64, test_batch_size=1000, epochs=10, lr=0.1, momentum=0.9, log_interval=100)
model2 = DenseNeuralNet(input_dim, output_dim).to(device)

u4.test(args2, model2, device, test_flipped_data, input_dim)



Test set: Average loss: 0.0023, Accuracy: 1191/10000 (11.91%)



In [149]:
args2 = u4.SimpleNamespace(batch_size=64, test_batch_size=1000, epochs=10, lr=0.001, momentum=0.1, log_interval=100)
model2 = DenseNeuralNet(input_dim, output_dim).to(device)

u4.test(args2, model2, device, test_flipped_data, input_dim)



Test set: Average loss: 0.0024, Accuracy: 1000/10000 (10.00%)



In [150]:
args2 = u4.SimpleNamespace(batch_size=64, test_batch_size=1000, epochs=1, lr=0.001, momentum=0.9, log_interval=100)
model2 = DenseNeuralNet(input_dim, output_dim).to(device)

u4.test(args2, model2, device, test_flipped_data, input_dim)



Test set: Average loss: 0.0023, Accuracy: 1000/10000 (10.00%)



In [151]:
args2 = u4.SimpleNamespace(batch_size=64, test_batch_size=1000, epochs=10, lr=2.2, momentum=0.9, log_interval=100)
model2 = DenseNeuralNet(input_dim, output_dim).to(device)

u4.test(args2, model2, device, test_flipped_data, input_dim)



Test set: Average loss: 0.0023, Accuracy: 938/10000 (9.38%)



In [157]:
args2 = u4.SimpleNamespace(batch_size=64, test_batch_size=1000, epochs=10, lr=0.001, momentum=2.2, log_interval=100)
model2 = DenseNeuralNet(input_dim, output_dim).to(device)

u4.test(args2, model2, device, test_flipped_data, input_dim)



Test set: Average loss: 0.0023, Accuracy: 1000/10000 (10.00%)



If you experiment with different hyperparameter settings with respect to the original data set, do the performances differ when tested on the flipped data set?
Yes, now it isn't making a huge difference which values are chosen for the hyperparameters. They are always similar bad.