<a href="https://colab.research.google.com/github/bkneussl/Assignments/blob/main/assignment_4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Theory

In the following assignment, your task is to complete the MNIST Basics chapter. It is best to repeat everything from last week and try to answer the following questions. Afterwards you have to summarize the learned facts with two programming tasks.

**What is "torch.cat()" and ".view(-1, 28*28)" doing in the beginning of the "The MNIST Loss Function" chapter?**


The torch.cat function combines two tensors by appending one to the other. On the other hand, torch.view can modify the dimensions of tensors. In the context of the example provided, where we have a tensor of shape (N, 28, 28), using view with a parameter of (-1, 2828) reduces the number of dimensions by one, resulting in a tensor of shape (N, 2828).

**Can you draw the neuronal network, which is manually trained in chapter "The MNIST Loss Function"?**


**Why is it not possible to use the accuracy as loss function?**

The accuracy of the prediction only changes when the model predicts a 3 and the correct answer is 7, or vice versa. If we make small adjustments to the input data, the prediction may not change at all because the gradient is almost zero throughout most of the input space.



**What is the defined `mnist_loss` function doing? **


```
def mnist_loss(predictions, targets):
    return torch.where(targets==1, 1-predictions, predictions).mean()
```


The purpose of this function is to calculate the distance between each prediction and the correct output value, which is either 0 or 1. The function computes the distance from 1 if the correct output is 1, and from 0 if the correct output is 0. The resulting distances are then averaged across all predictions. In essence, this function serves as a type of ternary operator, operating on PyTorch Tensors.

**Why do we need additionaly the sigmoid() function? What is it technically in our TLU?**


The Sigmoid function is utilized in the MNIST Loss function to constrain the values passed to it between 0 and 1. Specifically, it is used as the activation function of a Threshold Linear Unit (TLU) which enables the approximation of non-linear functions by introducing non-linearity into the network.

**Again, what are mini batches, why are we using them and why should they be shuffeld?** 


Mini-batches are smaller subsets of the training data that are utilized to update the model's parameters during training. Rather than using the entire dataset to perform a single update, the dataset is partitioned into smaller, equally-sized batches. These batches are processed, and the model's parameters are updated based on the average loss computed over the mini-batch.

Using mini-batches has several benefits over using the entire dataset. Firstly, it is computationally more efficient on GPUs because parallelization can be utilized to process multiple batches simultaneously. Secondly, mini-batches are a compromise between computing the loss function for a single instance and for every instance, thus speeding up the training process. Selecting an appropriate batch size is a critical decision to make as a deep learning practitioner in order to efficiently and effectively train the model.

It is crucial to shuffle the mini-batches during training to ensure that each mini-batch is a representative sample of the entire dataset. If the data is not shuffled, the model may encounter similar examples in each mini-batch, resulting in poor generalization and overfitting. Shuffling the mini-batches ensures that the model is exposed to a diverse range of examples in each iteration, aiding in better generalization.

#Practical Part

Try to understand all parts of the code needed to manually train a single TLU/Perceptron, so use and copy all parts of the code from "First Try: Pixel Similarity" to the "Putting it all together" chapter. In the second step, use an optimizer, a second layer, and a ReLU as a hidden activation function to train a simple neural network. When copying the code, think carefully about what you really need and how you can summarize it as compactly as possible. (Probably each attempt requires about 15 lines of code.)

In [4]:
#YOUR TASK: Manually train a single layer perceptron without using an optimizer.

# Load data
from fastai.vision.all import *
path = untar_data(URLs.MNIST_SAMPLE)
threes = (path/'train'/'3').ls().sorted()
sevens = (path/'train'/'7').ls().sorted()
three_tensors = [tensor(Image.open(o)) for o in threes]
seven_tensors = [tensor(Image.open(o)) for o in sevens]
stacked_threes = torch.stack(three_tensors).float()/255
stacked_sevens = torch.stack(seven_tensors).float()/255

# Prepare data for training and validation
train_x = torch.cat([stacked_threes[:500], stacked_sevens[:500]]).view(-1, 28*28)
train_y = tensor([1]*500 + [0]*500).unsqueeze(1)
valid_x = torch.cat([stacked_threes[500:], stacked_sevens[500:]]).view(-1, 28*28)
valid_y = tensor([1]*(len(threes)-500) + [0]*(len(sevens)-500)).unsqueeze(1)
train_dset = list(zip(train_x,train_y))
valid_dset = list(zip(valid_x,valid_y))

# Initialize weights and bias
def init_params(size, std=1.0):
    return (torch.randn(size)*std).requires_grad_()
weights = init_params((28*28,1))
bias = init_params(1)

# Define model
def linear1(xb):
    return xb@weights + bias

# Define loss function
def mnist_loss(predictions, targets):
    return torch.where(targets==1, 1-predictions, predictions).mean()

# Calculate accuracy
def accuracy(preds, targets):
    preds = preds.sigmoid()
    return ((preds > 0.5) == targets).float().mean()

# Train model
lr = 1.
for epoch in range(20):
    # Training phase
    for xb,yb in train_dset:
        preds = linear1(xb)
        loss = mnist_loss(preds, yb)
        loss.backward()
        weights.data -= weights.grad*lr
        bias.data -= bias.grad*lr
        weights.grad.zero_()
        bias.grad.zero_()

# Validation phase
valid_preds = [linear1(xb) for xb,yb in valid_dset]
valid_loss = mnist_loss(torch.cat(valid_preds), valid_y)
valid_acc = accuracy(torch.cat(valid_preds), valid_y)
print(f"Epoch {epoch}: Valid Loss: {valid_loss}, Valid Acc: {valid_acc}")
from fastai.vision.all import *
from fastai.vision.core import *
from fastai.vision.data import *
from fastai.vision.learner import *
from fastai.vision.models import *
from fastai.metrics import *

Epoch 19: Valid Loss: 544.1332397460938, Valid Acc: 0.4996914863586426


In [6]:
#YOUR TASK: Train a simple two-layer neural network (two perceptrons + hidden activation function) with built-in functions and an optimizer.

# Load data
path = untar_data(URLs.MNIST_SAMPLE)
train_items = get_image_files(path/'train')
train_labels = [1 if str(item).split('/')[-2] == '3' else 0 for item in train_items]
train_dset = [(PILImage.create(item), label) for item, label in zip(train_items, train_labels)]
dls = DataLoader(train_dset, batch_size=256)

# Define model
class TwoLayerNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(28*28, 50)
        self.layer2 = nn.Linear(50, 1)
        
    def forward(self, x):
        x = F.relu(self.layer1(x))
        x = self.layer2(x)
        return x

# Initialize model and optimizer
model = TwoLayerNet()
opt = SGD(model.parameters(), lr=0.1)

# Define loss function and metric
loss_func = nn.BCEWithLogitsLoss()
metrics = accuracy_multi

# Train model
learn = Learner(dls, model, opt_func=opt, loss_func=loss_func, metrics=metrics)
learn.fit(10)



TypeError: ignored