
# Theory

In this assignment, your task is to complete the MNIST Basics chapter. It is best to review everything from last week first and then try to answer the following questions. Afterwards, you summarize what you have learned in two programming tasks.

**What are "torch.cat()" and ".view(-1, 28*28)" doing at the beginning of the "The MNIST Loss Function" chapter?**

The torch.cat() function concatenates the tensor of images showing the number 3 and the tensor of images showing the number 7 into a single tensor along the first (image) dimension.

The view() function then reshapes this concatenated tensor into a 2D tensor in which each row represents a single image and the number of columns equals the number of pixels per image (28*28 = 784). The -1 tells PyTorch to infer the number of rows from the total number of elements.
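The effect on the tensor shapes can be checked with a minimal sketch. The tensors below are small hypothetical stand-ins (3 "threes" and 2 "sevens" filled with constants), not the real image data.

```python
import torch

# Hypothetical stand-ins for the stacked image tensors:
# 3 "threes" and 2 "sevens", each a 28x28 image.
stacked_threes = torch.zeros(3, 28, 28)
stacked_sevens = torch.ones(2, 28, 28)

# Concatenate along the first (image) dimension: 3 + 2 = 5 images.
both = torch.cat([stacked_threes, stacked_sevens])
print(both.shape)     # torch.Size([5, 28, 28])

# Flatten each image into one row; -1 lets PyTorch infer the row count.
train_x = both.view(-1, 28*28)
print(train_x.shape)  # torch.Size([5, 784])
```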


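**Can you draw the neural network that is trained manually in the chapter "The MNIST Loss Function"?**

Input (1 x 784) => Linear1: input @ weights (784 x 1) + bias (1) => Output (1 x 1)

It is a single linear layer (one neuron) without a hidden layer: each of the 784 pixels is multiplied by its own weight, the products are summed, and the bias is added.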
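The shapes in this drawing can be verified with a short sketch. It assumes a single flattened image stored as a row vector and uses random values in place of real pixels and trained parameters.

```python
import torch

# One flattened image as a row vector (random values as a placeholder).
x = torch.rand(1, 28*28)

# Parameters of the single linear layer, randomly initialized.
weights = torch.randn(28*28, 1)
bias = torch.randn(1)

# Matrix product plus bias: (1 x 784) @ (784 x 1) + (1) -> (1 x 1)
out = x @ weights + bias
print(out.shape)  # torch.Size([1, 1])
```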


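**Why is it not possible to use the accuracy as a loss function?**

Because the accuracy only changes when a prediction flips from a 3 to a 7 or vice versa. A very small change to the weights usually does not flip any prediction, so the accuracy stays the same. Accuracy is therefore a step function of the weights: its gradient is zero almost everywhere (and undefined at the jumps), so gradient descent gets no signal about how to improve the weights.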
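A toy illustration of this flatness, assuming a hypothetical one-feature dataset and a model with a single weight `w` (the names `x`, `y`, and `toy_accuracy` are made up for this sketch):

```python
import torch

# Hypothetical toy data: 4 inputs with one feature each, and their labels.
x = torch.tensor([[-2.0], [-1.0], [1.0], [2.0]])
y = torch.tensor([[0], [0], [1], [1]])

def toy_accuracy(w):
    # Threshold the raw output at 0: a step function of the weight.
    preds = (x * w) > 0
    return (preds == y.bool()).float().mean().item()

print(toy_accuracy(1.0))    # 1.0
print(toy_accuracy(1.001))  # 1.0 (a small change in w changes nothing)
print(toy_accuracy(-1.0))   # 0.0 (only crossing the threshold matters)
```

Between the jumps the accuracy is constant, so its slope with respect to `w` is zero there.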

**What is the defined `mnist_loss` function doing?**


```
def mnist_loss(predictions, targets):
    return torch.where(targets==1, 1-predictions, predictions).mean()
```
The function calculates the loss between the predicted output and the target output. It takes two arguments: predictions and targets.

The predictions argument represents the output of the neural network for a given input. This output is compared to the targets argument.

The torch.where function computes the loss element-wise. Where an element of targets is 1, the result is 1 minus the corresponding prediction; everywhere else (targets of 0), the prediction is returned as is. Each value therefore measures how far a prediction is from its target, assuming the predictions lie between 0 and 1. This creates a tensor of loss values.

Finally, the mean method is called on the tensor of loss values, which calculates the average loss across all elements in the tensor. This average loss is returned as the output of the mnist_loss function.
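As a small worked example, the function as quoted above can be applied to three hand-picked predictions. The values are illustrative only and are assumed to lie between 0 and 1 already.

```python
import torch

def mnist_loss(predictions, targets):
    return torch.where(targets==1, 1-predictions, predictions).mean()

# Three illustrative items: targets and predictions already in [0, 1].
targets = torch.tensor([1, 0, 1])
predictions = torch.tensor([0.9, 0.4, 0.2])

# Per-item loss: distance of each prediction from its target.
per_item = torch.where(targets==1, 1-predictions, predictions)
print(per_item)  # approximately tensor([0.1000, 0.4000, 0.8000])

# The mean of the per-item losses is the final loss.
print(mnist_loss(predictions, targets))  # approximately tensor(0.4333)
```

The confident, correct first prediction contributes little loss (0.1), while the confident, wrong third prediction contributes a lot (0.8).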

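**Why do we additionally need the sigmoid() function? What is it technically in our TLU?**

The sigmoid function maps any real number to a value between 0 and 1, which is what the mnist_loss function assumes for its predictions. It is also smooth and monotonically increasing, so it provides a useful gradient.
In technical terms, it is the activation function of the TLU: the non-linearity applied to the weighted sum of the inputs. Non-linearities like this are what allow networks with several layers to approximate non-linear functions.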
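A quick sketch of the squashing behavior, using three arbitrary raw outputs chosen for illustration:

```python
import torch

# Raw outputs of a linear layer can be any real number.
raw = torch.tensor([-5.0, 0.0, 5.0])

# The sigmoid maps them into the open interval (0, 1).
squashed = torch.sigmoid(raw)
print(squashed)  # approximately tensor([0.0067, 0.5000, 0.9933])
```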

**Again, what are mini-batches, why are we using them, and why should they be shuffled?**

Mini-batches are subsets of the training data that are used to update the parameters of the model during training. Instead of using the entire dataset to perform a single update, the dataset is divided into smaller, equally-sized batches. These batches are processed, and the parameters of the model are updated based on the average loss over the mini-batch.

Using mini-batches has several advantages over using the entire dataset. First, it is computationally efficient on GPUs, because all items in a batch can be processed in parallel. Second, mini-batches are a compromise between two extremes: computing the loss on a single item gives a noisy, unstable gradient, while computing it on the whole dataset for every update takes a long time. Choosing a good batch size is one of the decisions you need to make as a deep learning practitioner to train your model quickly and accurately.

It is important to shuffle the data before forming the mini-batches in every epoch, so that each mini-batch is roughly representative of the entire dataset. If the data is not shuffled, the model sees the same batches in the same order every epoch, and a batch may contain very similar examples; here, for instance, all threes are stored before all sevens, so unshuffled batches would mostly contain a single digit. Shuffling gives the model a varied mix of examples in each update, which can help it generalize better.
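The difference between ordered and shuffled mini-batches can be sketched on a toy dataset. This sketch uses PyTorch's own `DataLoader` and `TensorDataset` rather than the fastai `DataLoader` from the chapter, and the 10-item dataset is a hypothetical stand-in.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy dataset: 10 items, the first 5 labeled 1, the rest 0,
# mirroring how the threes are stored before the sevens.
xs = torch.arange(10).float().unsqueeze(1)
ys = torch.tensor([1]*5 + [0]*5).unsqueeze(1)
dset = TensorDataset(xs, ys)

# Without shuffling, the first batch contains only one class.
ordered = DataLoader(dset, batch_size=5, shuffle=False)
first_xb, first_yb = next(iter(ordered))
print(first_yb.flatten().tolist())  # [1, 1, 1, 1, 1]

# With shuffling, batches are drawn in a new random order every epoch.
torch.manual_seed(0)  # only to make this sketch repeatable
shuffled = DataLoader(dset, batch_size=5, shuffle=True)
for xb, yb in shuffled:
    print(xb.flatten().tolist(), yb.flatten().tolist())
```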

# Practical Part

Try to understand all parts of the code needed to manually train a single TLU/Perceptron, so use and copy all parts of the code from "First Try: Pixel Similarity" to the "Putting it all together" chapter. In the second step, use an optimizer, a second layer, and a ReLU as a hidden activation function to train a simple neural network. When copying the code, think carefully about what you really need and how you can summarize it as compactly as possible. (Probably each attempt requires about 15 lines of code.)

In [None]:
#YOUR TASK: Manually train a single layer perceptron without using an optimizer.

# Load data
from fastai.vision.all import *
path = untar_data(URLs.MNIST_SAMPLE)
threes = (path/'train'/'3').ls().sorted()
sevens = (path/'train'/'7').ls().sorted()
three_tensors = [tensor(Image.open(o)) for o in threes]
seven_tensors = [tensor(Image.open(o)) for o in sevens]
stacked_threes = torch.stack(three_tensors).float()/255
stacked_sevens = torch.stack(seven_tensors).float()/255

# Prepare data for training and validation
train_x = torch.cat([stacked_threes[:500], stacked_sevens[:500]]).view(-1, 28*28)
train_y = tensor([1]*500 + [0]*500).unsqueeze(1)
valid_x = torch.cat([stacked_threes[500:], stacked_sevens[500:]]).view(-1, 28*28)
valid_y = tensor([1]*(len(threes)-500) + [0]*(len(sevens)-500)).unsqueeze(1)
train_dset = list(zip(train_x,train_y))
valid_dset = list(zip(valid_x,valid_y))

# Initialize weights and bias
def init_params(size, std=1.0):
    return (torch.randn(size)*std).requires_grad_()
weights = init_params((28*28,1))
bias = init_params(1)

# Define model
def linear1(xb):
    return xb@weights + bias

# Define loss function (sigmoid squashes the raw outputs into (0, 1) first)
def mnist_loss(predictions, targets):
    predictions = predictions.sigmoid()
    return torch.where(targets==1, 1-predictions, predictions).mean()

# Calculate accuracy
def accuracy(preds, targets):
    preds = preds.sigmoid()
    return ((preds > 0.5) == targets).float().mean()

# Train model
lr = 1.
for epoch in range(20):
    # Training phase
    for xb,yb in train_dset:
        preds = linear1(xb)
        loss = mnist_loss(preds, yb)
        loss.backward()
        weights.data -= weights.grad*lr
        bias.data -= bias.grad*lr
        weights.grad.zero_()
        bias.grad.zero_()

    # Validation phase (once per epoch, without tracking gradients)
    with torch.no_grad():
        valid_preds = linear1(valid_x)  # shape (N, 1), matches valid_y
        valid_loss = mnist_loss(valid_preds, valid_y)
        valid_acc = accuracy(valid_preds, valid_y)
    print(f"Epoch {epoch}: Valid Loss: {valid_loss:.4f}, Valid Acc: {valid_acc:.4f}")


In [None]:
#YOUR TASK: Train a simple two-layer neural network (two perceptrons + hidden activation function) with built-in functions and an optimizer.
from fastai.vision.all import *

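# Load data as flattened float tensors in [0, 1]; float labels: 1. = "3", 0. = "7"
path = untar_data(URLs.MNIST_SAMPLE)
def load_split(split):
    xs, ys = [], []
    for digit, label in (('3', 1.), ('7', 0.)):
        files = (path/split/digit).ls().sorted()
        xs.append(torch.stack([tensor(Image.open(o)) for o in files]).float()/255)
        ys += [label]*len(files)
    return torch.cat(xs).view(-1, 28*28), tensor(ys).unsqueeze(1)
train_x, train_y = load_split('train')
valid_x, valid_y = load_split('valid')
# A Learner needs DataLoaders (training + validation), not a single DataLoader
train_dl = DataLoader(list(zip(train_x, train_y)), batch_size=256, shuffle=True)
valid_dl = DataLoader(list(zip(valid_x, valid_y)), batch_size=256)
dls = DataLoaders(train_dl, valid_dl)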

# Define model
class TwoLayerNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(28*28, 50)
        self.layer2 = nn.Linear(50, 1)
        
    def forward(self, x):
        x = F.relu(self.layer1(x))
        x = self.layer2(x)
        return x

# Initialize model
model = TwoLayerNet()

# Define loss function and metric
loss_func = nn.BCEWithLogitsLoss()
metrics = accuracy_multi

# Train model: pass the optimizer function SGD itself; the Learner creates the optimizer
learn = Learner(dls, model, opt_func=SGD, loss_func=loss_func, metrics=metrics)
learn.fit(10, lr=0.1)

