## Introduction
This exercise looks into machine learning classification with a neural network classifier trained on Fashion-MNIST, an image dataset of clothing items in the MNIST format (loaded with `datasets.FashionMNIST` in the code below). The objective is to train the classifier so that its test loss is as low as possible by tuning three hyperparameters: the number of neurons per hidden layer, the number of epochs and the learning rate. The model has been implemented in Python with PyTorch (the `torch` library). To limit the experiment, we test four values for the number of neurons (256, 512, 1024 and 2048) and four learning rates (10^-2, 10^-3, 10^-4 and 10^-5), and record the results after every epoch from 1 to 20. Finally, the results have been visualized in a 4-dimensional graph that shows the test loss (`teloss`, the cross-entropy on the test set) together with the three hyperparameters. A lower `teloss` value is defined as a better result.
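
The search described above is a full grid: every combination of the four widths and four learning rates is one training run, and a result row is logged after each of its 20 epochs. A minimal standard-library sketch of how the runs and rows add up (the list names mirror the notebook's variables, but the snippet itself is only illustrative):

```python
from itertools import product

num_neuron_list = [256, 512, 1024, 2048]
learning_rate_list = [1e-5, 1e-4, 1e-3, 1e-2]
epochs = 20  # epochs 1..20 are evaluated for every configuration

# Each (width, learning rate) pair is one independent training run.
configs = list(product(num_neuron_list, learning_rate_list))

# One result row is logged per epoch of every run.
rows = [(m, lr, epoch) for m, lr in configs for epoch in range(1, epochs + 1)]

print(len(configs), len(rows))
```

With these settings there are 16 runs, so a complete pass over the grid produces 16 × 20 = 320 result rows.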

In [19]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
import pandas as pd
import plotly.express as px
import os


training_data = datasets.FashionMNIST(root="data", train=True,
    download=True, transform=ToTensor())
test_data = datasets.FashionMNIST(root="data", train=False,
    download=True, transform=ToTensor())

torch.manual_seed(23)
train_dataloader = DataLoader(training_data, batch_size=64,
  shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=64,
  shuffle=True)

class NeuralNetwork(nn.Module):
    """Fully connected network: flatten -> [Linear -> activation] per hidden layer -> Linear."""
    def __init__(self, input_size, layers: list, num_classes):
        super().__init__()
        self.flatten = nn.Flatten()  # (N, 1, 28, 28) -> (N, 784)
        self.layers = nn.ModuleList()
        self.input_size = input_size
        # Each (output_size, activation_function) pair adds one hidden layer
        for output_size, activation_function in layers:
            self.layers.append(nn.Linear(input_size, output_size))
            input_size = output_size
            self.layers.append(activation_function)
        # Output layer: one logit per class (nn.CrossEntropyLoss applies log-softmax internally)
        self.layers.append(nn.Linear(input_size, num_classes))

    def forward(self, x):
        x = self.flatten(x)
        for layer in self.layers:
            x = layer(x)
        return x

torch.manual_seed(23)
C = 10  # number of classes
N,H,W = training_data.data.shape; D = H*W  # D = 28*28 = 784 input features

def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    for batch, (X, y) in enumerate(dataloader):
        pred = model(X); loss = loss_fn(pred, y)
        optimizer.zero_grad(); loss.backward(); optimizer.step() # backprop
        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            # print(f"trloss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

def test_loop(dataloader, model, loss_fn):
    size = len(dataloader.dataset); nbatches = len(dataloader)
    teloss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X); teloss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    teloss /= nbatches; correct /= size
    # print(f"teacc: {(100*correct):>0.1f}%, teloss: {teloss:>8f} \n")
    teacc = 100*correct
    return teacc, teloss
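
`test_loop` reports `teloss` as the average over batches of each batch's mean cross-entropy, and `teacc` as the percentage of argmax predictions that match the label. The pure-Python sketch below reproduces that bookkeeping on made-up logits so both metrics can be checked by hand; it illustrates what `nn.CrossEntropyLoss` with its default `reduction='mean'` computes per batch, and the names `cross_entropy` and `evaluate` are hypothetical, not part of the notebook.

```python
import math

def cross_entropy(logits, target):
    """Negative log-softmax probability of the target class for one sample."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def evaluate(batches):
    """Mirror test_loop: mean over batches of the per-batch mean loss, plus accuracy in %."""
    total_loss, correct, size = 0.0, 0, 0
    for logits_batch, targets in batches:
        losses = [cross_entropy(z, y) for z, y in zip(logits_batch, targets)]
        total_loss += sum(losses) / len(losses)  # like loss_fn(pred, y).item()
        correct += sum(z.index(max(z)) == y for z, y in zip(logits_batch, targets))
        size += len(targets)
    return 100 * correct / size, total_loss / len(batches)

# Two tiny "batches" of 3-class logits (made-up numbers for illustration only).
batches = [
    ([[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0, 1]),
    ([[0.0, 3.0, 0.0]], [2]),
]
teacc, teloss = evaluate(batches)
print(f"teacc: {teacc:.1f}%, teloss: {teloss:.4f}")
```

Here two of three predictions are correct (about 66.7%) and the batch-level average is about 1.745, whereas averaging over the three samples would give about 1.295, because the second batch is smaller. In the notebook only the final test batch is smaller (10,000 = 156 × 64 + 16 images), so the difference there is minor.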

In [21]:
def experiment(new_epochs, new_M, new_learning_rate):
    M1, M2 = new_M, new_M # Main parameter
    model = NeuralNetwork(D, [(M1, nn.ReLU()), (M2, nn.ReLU())], C)

    learning_rate = new_learning_rate # Main parameter
    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
    epochs = new_epochs # Main parameter

    data = pd.DataFrame(columns=['teloss', 'teacc', 'epoch', 'num_neurons', 'learning_rate'])

    for t in range(epochs):
        print(f"Epoch {t+1}, num_neuron {new_M}, learning_rate {new_learning_rate}")
        train_loop(train_dataloader, model, loss_fn, optimizer)
        teacc, teloss = test_loop(test_dataloader, model, loss_fn)
        data.loc[len(data)] = [teloss, teacc, t+1, new_M, new_learning_rate]

    # Save one CSV per (num_neurons, learning_rate) configuration in ./results
    dataFile = f'num_neurons_{new_M}-learning_rate_{new_learning_rate}'
    directory_path = os.path.dirname(os.path.abspath("neural_network_experiment.ipynb"))
    os.makedirs(f'{directory_path}/results', exist_ok=True)  # to_csv fails if the folder is missing
    data.to_csv(f'{directory_path}/results/{dataFile}.csv', index=False)
    print(f"Results successfully saved to the file '/results/{dataFile}.csv'.")
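
`experiment` and `obtain_all_data_merged` each rebuild the CSV file name with the same f-string, so the two must stay in sync. Python prints `1e-5` as `1e-05` but `1e-4` as `0.0001`, so the file names mix notations. A small shared helper, hypothetical and not part of the original notebook, makes the naming explicit:

```python
import os

def result_path(directory_path, num_neurons, learning_rate):
    """Hypothetical helper: path of the per-configuration CSV written by experiment()."""
    data_file = f'num_neurons_{num_neurons}-learning_rate_{learning_rate}'
    return os.path.join(directory_path, 'results', f'{data_file}.csv')

# List the file names produced for one width across the four learning rates.
for lr in [1e-5, 1e-4, 1e-3, 1e-2]:
    print(os.path.basename(result_path('.', 256, lr)))
```

Calling one helper from both functions avoids the case where a formatting change in one place silently breaks the file lookup in the other.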

In [24]:
def obtain_data_from_experiments(epochs, num_neuron_list, learning_rate_list):
    # Full grid search: one training run per (num_neurons, learning_rate) pair
    for num_neuron in num_neuron_list:
        for learning_rate in learning_rate_list:
            experiment(epochs, num_neuron, learning_rate)

epochs = 20  # results are recorded after every epoch, 1..20
num_neuron_list = [256, 512, 1024, 2048]
learning_rate_list = [1e-5, 1e-4, 1e-3, 1e-2]
obtain_data_from_experiments(epochs, num_neuron_list, learning_rate_list)

In [23]:
def obtain_all_data_merged(epochs, num_neuron_list, learning_rate_list):
    # Read the per-configuration CSV files written by experiment() and stack them
    directory_path = os.path.dirname(os.path.abspath("neural_network_experiment.ipynb"))
    frames = []
    for num_neuron in num_neuron_list:
        for learning_rate in learning_rate_list:
            dataFile = f'num_neurons_{num_neuron}-learning_rate_{learning_rate}'
            frames.append(pd.read_csv(f'{directory_path}/results/{dataFile}.csv'))
    all_data = pd.concat(frames, ignore_index=True)

    # Save next to the other results (the plotting cell reads it from there)
    all_data.to_csv(f'{directory_path}/results/all_data.csv', index=False)
    print("Results successfully saved to the file '/results/all_data.csv'.")
    return all_data

all_data = obtain_all_data_merged(epochs, num_neuron_list, learning_rate_list)

In [None]:
display(all_data)

| | teloss | teacc | epoch | num_neurons | learning_rate |
|---|---|---|---|---|---|
| 0 | 1.144522 | 65.54 | 1.0 | 256.0 | 0.00001 |
| 1 | 0.828813 | 70.31 | 2.0 | 256.0 | 0.00001 |
| 2 | 0.724212 | 73.66 | 3.0 | 256.0 | 0.00001 |
| 3 | 0.662784 | 76.45 | 4.0 | 256.0 | 0.00001 |
| 4 | 0.624259 | 78.02 | 5.0 | 256.0 | 0.00001 |
| ... | ... | ... | ... | ... | ... |
| 315 | 0.424950 | 85.33 | 16.0 | 2048.0 | 0.01000 |
| 316 | 0.435432 | 85.01 | 17.0 | 2048.0 | 0.01000 |
| 317 | 0.470281 | 84.67 | 18.0 | 2048.0 | 0.01000 |
| 318 | 0.433393 | 85.33 | 19.0 | 2048.0 | 0.01000 |


In [28]:
notebook_path = os.path.abspath("neural_network_experiment.ipynb")
directory_path = os.path.dirname(notebook_path)
df = pd.read_csv(f'{directory_path}/results/all_data.csv')
df["num_neurons"] = df["num_neurons"].astype(str)
fig = px.scatter_3d(df, x='epoch', y='learning_rate', z='teloss',
              color='num_neurons', log_y=True)

fig.update_traces(marker=dict(size=3))
fig.write_html("all_data_plot.html")
fig.show()

A 4-dimensional graph was made to visualize and analyze the results. It is a 3D scatter plot in which each data point (one epoch of one training run) is drawn as a colored sphere: the x-axis is the epoch, the y-axis is the learning rate (log scale), the z-axis is the test loss (`teloss`), and the color encodes the number of neurons (blue: 256, red: 512, green: 1024 and purple: 2048).

## Analysis
**Number of epochs.**
As the graph shows, the test loss decreases gradually as the number of epochs increases. For the learning rates 10^-2, 10^-3 and 10^-4 the improvement levels off around epoch 15, while for 10^-5 the test loss keeps decreasing through all 20 epochs. Training with 10^-5 for more epochs could therefore have given better results. In conclusion, about 15 epochs was optimal for learning rates 10^-2 to 10^-4, while 10^-5 needed at least 20 epochs (probably more).
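
The plateau can also be read off the saved results instead of the 3D plot. A minimal pandas sketch, assuming a DataFrame with the `teloss`, `teacc`, `epoch`, `num_neurons` and `learning_rate` columns written by `experiment`; for every configuration it picks the epoch with the lowest `teloss`. The function name `best_epoch_per_config` is hypothetical.

```python
import pandas as pd

def best_epoch_per_config(df: pd.DataFrame) -> pd.DataFrame:
    """For each (num_neurons, learning_rate) run, return the row with the lowest teloss."""
    # idxmin gives the row label of the minimum teloss inside each group
    idx = df.groupby(['num_neurons', 'learning_rate'])['teloss'].idxmin()
    best = df.loc[idx, ['num_neurons', 'learning_rate', 'epoch', 'teloss', 'teacc']]
    return best.sort_values(['learning_rate', 'num_neurons']).reset_index(drop=True)
```

Applied to the `df` loaded for the plot, this returns one row per run (16 in total). A best epoch of 20 means that run's loss was still falling when training stopped, which is the pattern described above for 10^-5.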

**Learning rate.**
Going from 10^-2 down to 10^-4, the results generally improved as the learning rate decreased. Lowering it further to 10^-5, however, made the results worse within the 20 epochs tested, so 10^-4 was the best learning rate in this experiment.
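
To compare learning rates across all widths at once, the lowest test loss reached by each run can be arranged as a width × learning-rate table. A sketch with `pandas.DataFrame.pivot_table`, assuming the same columns as the saved results; `best_loss_table` is a hypothetical name.

```python
import pandas as pd

def best_loss_table(df: pd.DataFrame) -> pd.DataFrame:
    """Lowest teloss reached by each run: rows = num_neurons, columns = learning_rate."""
    return df.pivot_table(index='num_neurons', columns='learning_rate',
                          values='teloss', aggfunc='min')
```

The column with the smallest values points to the best learning rate, and reading down a column shows how the number of neurons affects the result at that rate.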

**Number of neurons.**
For the learning rates 10^-2 and 10^-3 the effect of the number of neurons was unclear, while for 10^-4 and 10^-5 the results improved with more neurons. For these two learning rates, 2048 neurons generally gave the best results and 256 the worst.
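
To put the four widths in perspective, the network built in `experiment`, `NeuralNetwork(D, [(M, nn.ReLU()), (M, nn.ReLU())], C)` with D = 784 inputs and C = 10 classes, has (D + 1)M + (M + 1)M + (M + 1)C weights and biases. The plain-Python arithmetic below evaluates that count; it should agree with `sum(p.numel() for p in model.parameters())` for the PyTorch model, although that check is not run here, and `mlp_param_count` is a hypothetical name.

```python
def mlp_param_count(d_in: int, width: int, num_classes: int) -> int:
    """Weights + biases of Linear(d_in, M) -> Linear(M, M) -> Linear(M, C)."""
    first = d_in * width + width           # input -> hidden layer 1
    hidden = width * width + width         # hidden layer 1 -> hidden layer 2
    output = width * num_classes + num_classes  # hidden layer 2 -> class logits
    return first + hidden + output

D, C = 28 * 28, 10  # Fashion-MNIST: 28x28 images, 10 classes
for M in [256, 512, 1024, 2048]:
    print(M, mlp_param_count(D, M, C))
```

The count grows from about 0.27 million parameters at 256 neurons to about 5.8 million at 2048, roughly a 22-fold increase, largely because the M × M weight matrix between the two hidden layers grows quadratically with the width.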

In [None]:
best_row = df.loc[df['teloss'].idxmin()]
display(best_row)

teloss           0.300772
teacc               89.87
epoch                13.0
num_neurons        2048.0
learning_rate      0.0001
Name: 272, dtype: object

## The best result and optimal parameters found from the experiment

The best result was a test loss (`teloss`) of 0.300772, obtained with 13 epochs, 2048 neurons and a learning rate of 10^-4. This configuration gave a test accuracy (`teacc`) of 89.87%.