# Experiment 2: Units vs dropout

Import data

In [23]:
from mads_datasets import DatasetFactoryProvider, DatasetType

from mltrainer.preprocessors import BasePreprocessor
from mltrainer import imagemodels, Trainer, TrainerSettings, ReportTypes, metrics

import torch.optim as optim
import gin

How do you automatically create a model.gin file?
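One possible answer, as a minimal sketch: a gin file is plain text, so it can be generated with ordinary file I/O. The helper `write_gin_config` is hypothetical (it is not part of gin or mltrainer); the parameter names and values mirror the configuration that `gin.config_str()` prints in this notebook.

```python
from pathlib import Path


def write_gin_config(path, imports, bindings):
    """Write a gin config file from a list of imports and a dict of bindings.

    Hypothetical helper: it only formats text and does not validate the names.
    """
    lines = [f"import {module}" for module in imports]
    lines.append("")
    # repr() keeps strings quoted and numbers bare, like Python literals
    lines += [f"{name} = {value!r}" for name, value in bindings.items()]
    target = Path(path)
    target.write_text("\n".join(lines) + "\n")
    return target


config_path = write_gin_config(
    "model.gin",
    imports=["gin.torch.external_configurables"],
    bindings={"NeuralNetwork.num_classes": 10, "NeuralNetwork.units1": 512},
)
print(config_path.read_text())
```

Note that this overwrites any existing `model.gin` in the working directory. The resulting file can be read with `gin.parse_config_file("model.gin")`.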

In [24]:
gin.parse_config_file("model.gin")

ParsedConfigFileIncludesAndImports(filename='model.gin', imports=['gin.torch.external_configurables'], includes=[])

In [25]:
preprocessor = BasePreprocessor()
fashionfactory = DatasetFactoryProvider.create_factory(DatasetType.FASHION)
streamers = fashionfactory.create_datastreamer(batchsize=64, preprocessor=preprocessor)
train = streamers["train"]
valid = streamers["valid"]
trainstreamer = train.stream()
validstreamer = valid.stream()

2024-12-10 18:20:40.607 | INFO     | mads_datasets.base:download_data:121 - Folder already exists at /home/azureuser/.cache/mads_datasets/fashionmnist
2024-12-10 18:20:40.608 | INFO     | mads_datasets.base:download_data:124 - File already exists at /home/azureuser/.cache/mads_datasets/fashionmnist/fashionmnist.pt
  data = torch.load(self.filepath)  # type: ignore


In [26]:
print(gin.config_str())

import gin.torch.external_configurables

# Parameters for NeuralNetwork:
NeuralNetwork.num_classes = 10
NeuralNetwork.units1 = 512



A big advantage is that we can save this config as a file; that way it is easy to track what you changed during your experiments.
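To make the tracking concrete, here is a small sketch that compares two saved configurations with Python's standard `difflib`. The two config texts and the function name `config_diff` are illustrative assumptions; in practice the texts would be the contents of two saved gin files.

```python
import difflib

# Two illustrative config texts (assumed values, not results from this notebook)
config_a = """NeuralNetwork.num_classes = 10
NeuralNetwork.units1 = 512
"""
config_b = """NeuralNetwork.num_classes = 10
NeuralNetwork.units1 = 256
"""


def config_diff(old, new):
    """Return only the changed lines between two config texts."""
    diff = difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="", n=0)
    # Drop the two file header lines and the '@@' hunk markers
    return [line for line in list(diff)[2:] if not line.startswith("@@")]


for line in config_diff(config_a, config_b):
    print(line)
```

Lines starting with `-` come from the first config, lines starting with `+` from the second.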

In [27]:
accuracy = metrics.Accuracy()

Alternatively, you can use gin: it reads the model.gin file, so you do not need to set the parameters by hand.

Call `gin.parse_config_file('model.gin')` and then create the model with `model = NeuralNetwork()`; the parameters are loaded from the gin file.

If you want to combine this with a manual grid search, you can automate it with a nested for loop:

In [28]:
units = [256, 128, 64]
for unit1 in units:
    for unit2 in units:
        if unit1 < unit2:
            continue
        print(f"Units: {unit1}, {unit2}")

Units: 256, 256
Units: 256, 128
Units: 256, 64
Units: 128, 128
Units: 128, 64
Units: 64, 64


Of course, this might not be the best way to search for a model; some configurations will be better than others (can you predict up front which configuration will be best?).

So, feel free to improve upon the grid search by adding your own logic.
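As one example of such logic, the nested loop can be turned into a generator that also sweeps a second hyperparameter. The sketch below adds dropout rates, in line with the title of this experiment; the rates `0.0`, `0.2`, `0.5` and the function name `search_space` are assumptions.

```python
from itertools import product


def search_space(units, dropouts):
    """Yield (unit1, unit2, dropout) combinations, skipping networks that widen."""
    for unit1, unit2, dropout in product(units, units, dropouts):
        if unit1 < unit2:
            continue
        yield unit1, unit2, dropout


configs = list(search_space([256, 128, 64], [0.0, 0.2, 0.5]))
for unit1, unit2, dropout in configs:
    print(f"Units: {unit1}, {unit2}  dropout: {dropout}")
```

Keeping the search space in one function makes it easy to add further filters without touching the training loop.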

!!! train_steps and valid_steps are reduced to keep the runtime on the VM workable. Both are set to 128, so each epoch covers 128 batches.

!!!! Epochs are increased to 50 to see where the tipping point lies: when does the model start to overfit?

!!!! Experiment 1 showed that Adam or AdamW works better than SGD, so the optimizer is set to Adam.
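One way to pin down the tipping point is to look for the epoch with the lowest validation loss: if the validation loss stops improving after that epoch, the model has probably started to overfit. A minimal sketch, assuming the validation losses are available as a list; the function name `tipping_point` is hypothetical and the loss values are synthetic, not results from this experiment.

```python
def tipping_point(valid_losses, patience=3):
    """Return the index of the best epoch once the validation loss has failed
    to improve for `patience` consecutive epochs, or None if that never happens."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(valid_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            return best_epoch
    return None


# Synthetic validation losses (illustrative only)
valid_losses = [0.60, 0.48, 0.42, 0.40, 0.41, 0.43, 0.46, 0.50]
print(tipping_point(valid_losses))
```

The `patience` argument here plays a role similar to the `patience` entry in the `earlystop_kwargs` of the trainer settings.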

In [29]:
import torch
gin.parse_config_file("model.gin")

units = [256, 128, 64]
loss_fn = torch.nn.CrossEntropyLoss()

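settings = TrainerSettings(
    epochs=50,
    metrics=[accuracy],
    # unit1 and unit2 still hold the last values from the loop in the previous cell;
    # this initial logdir is overwritten for every run inside the loop below
    logdir=f"NN_{unit1}_{unit2}",
    train_steps=128,
    valid_steps=128,
    reporttypes=[ReportTypes.TENSORBOARD, ReportTypes.GIN],
)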

for unit1 in units:
    for unit2 in units:
        if unit1 < unit2:
            continue
        gin.bind_parameter("NeuralNetwork.units1", unit1)
        gin.bind_parameter("NeuralNetwork.units2", unit2)

        settings.logdir = f"NN_{unit1}_{unit2}"

        model = imagemodels.NeuralNetwork()
        trainer = Trainer(
            model=model,
            settings=settings,
            loss_fn=loss_fn,
            optimizer=optim.Adam,
            traindataloader=trainstreamer,
            validdataloader=validstreamer,
            scheduler=optim.lr_scheduler.ReduceLROnPlateau
        )

        trainer.loop()

2024-12-10 18:20:40.688 | INFO     | mltrainer.settings:check_path:61 - Created logdir /home/azureuser/MachineLearning/notebooks/2_experiment2/NN_64_64
2024-12-10 18:20:40.701 | INFO     | mltrainer.trainer:dir_add_timestamp:29 - Logging to NN_256_256/20241210-182040
2024-12-10 18:20:43.365 | INFO     | mltrainer.trainer:__init__:72 - Found earlystop_kwargs in settings.Set to None if you dont want earlystopping.
100%|██████████| 128/128 [00:00<00:00, 192.21it/s]
2024-12-10 18:20:44.426 | INFO     | mltrainer.trainer:report:191 - Epoch 0 train 0.8689 test 0.5949 metric ['0.7776']
100%|██████████| 128/128 [00:00<00:00, 201.21it/s]
2024-12-10 18:20:45.439 | INFO     | mltrainer.trainer:report:191 - E

In [30]:
settings

epochs: 50
metrics: [Accuracy]
logdir: NN_64_64
train_steps: 128
valid_steps: 128
reporttypes: [<ReportTypes.TENSORBOARD: 2>, <ReportTypes.GIN: 1>]
optimizer_kwargs: {'lr': 0.001, 'weight_decay': 1e-05}
scheduler_kwargs: {'factor': 0.1, 'patience': 10}
earlystop_kwargs: {'save': False, 'verbose': True, 'patience': 10}

In [31]:
model

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=64, bias=True)
    (1): ReLU()
    (2): Linear(in_features=64, out_features=64, bias=True)
    (3): ReLU()
    (4): Linear(in_features=64, out_features=10, bias=True)
  )
)

!!! There are 10 different fashion categories, so out_features = 10.
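The layer sizes in the printed model also determine how many trainable parameters each configuration has, which helps when comparing runs. A small sketch of the arithmetic (the function name `count_parameters` is an assumption): a `Linear` layer with bias has `in_features * out_features + out_features` parameters.

```python
def count_parameters(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the width of every layer, input first, e.g. [784, 64, 64, 10].
    """
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


for unit1, unit2 in [(256, 256), (64, 64)]:
    print(unit1, unit2, count_parameters([784, unit1, unit2, 10]))
```

For the 64/64 model printed above this gives 55,050 parameters; most of them sit in the first layer, because the input has 784 features.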

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
gin.parse_config_file("model.gin")

units = [256, 128, 64]
loss_fn = torch.nn.CrossEntropyLoss()
num_classes = 10 

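settings = TrainerSettings(
    epochs=10,
    metrics=[accuracy],
    # unit1 and unit2 are leftovers from an earlier loop;
    # this initial logdir is overwritten for every run inside the loop below
    logdir=f"{unit1}_{unit2}",
    train_steps=128,
    valid_steps=128,
    reporttypes=[ReportTypes.TENSORBOARD, ReportTypes.GIN],
)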

for unit1 in units:
    for unit2 in units:
        if unit1 < unit2:
            continue
        gin.bind_parameter("NeuralNetwork.units1", unit1)
        gin.bind_parameter("NeuralNetwork.units2", unit2)

        settings.logdir = f"NN_{unit1}_{unit2}"

        flatten = nn.Flatten()
        linear_relu_stack = nn.Sequential(
            nn.Linear(28 * 28, unit1),
            nn.ReLU(),
            nn.Linear(unit1, unit2),
            nn.ReLU(),
            nn.Linear(unit2, num_classes),
        )
        model = nn.Sequential(flatten, linear_relu_stack)

        trainer = Trainer(
            model=model,
            settings=settings,
            loss_fn=loss_fn,
            optimizer=optim.Adam,
            traindataloader=trainstreamer,
            validdataloader=validstreamer,
            scheduler=optim.lr_scheduler.ReduceLROnPlateau
        )

        trainer.loop()

In [None]:
settings

In [None]:
model
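The title of this experiment mentions dropout, which none of the cells above apply yet. As a sketch of how it could be added, `nn.Dropout` can be placed after each activation in the same kind of `nn.Sequential` stack. The function name `make_model`, the placement after each ReLU, the rate `0.2` and the input shape `(batch, 1, 28, 28)` are assumptions, not settings from this notebook.

```python
import torch
import torch.nn as nn


def make_model(unit1, unit2, dropout=0.2, num_classes=10):
    """Two hidden layers with dropout after each ReLU (assumed placement)."""
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(28 * 28, unit1),
        nn.ReLU(),
        nn.Dropout(dropout),
        nn.Linear(unit1, unit2),
        nn.ReLU(),
        nn.Dropout(dropout),
        nn.Linear(unit2, num_classes),
    )


dropout_model = make_model(128, 64)
dropout_model.eval()  # dropout is only active in training mode
with torch.no_grad():
    # Dummy batch of 4 images; the (1, 28, 28) image shape is an assumption
    logits = dropout_model(torch.zeros(4, 1, 28, 28))
print(logits.shape)
```

Such a model can be passed to `Trainer` in the same way as the `nn.Sequential` model in the cell above, with the dropout rate as an extra loop variable.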

Run the experiment and study the results with TensorBoard.

Locally, this is easy to do from within VS Code. On the server, take these steps:

- in the terminal, `cd` to the location of the repository
- activate the Python environment for the shell. Note how the correct environment is being activated.
- run `tensorboard --logdir=modellogs` in the terminal
- TensorBoard will launch at `localhost:6006` and VS Code will notify you that the port is forwarded
- either press the `launch` button in VS Code or open your local browser at `localhost:6006`


Experiment with things like:

- changing the number of units in units1 and units2 to values between 16 and 1024. Use powers of 2: 16, 32, 64, etc.
- changing the batchsize to values between 4 and 128. Again, use powers of 2.
- all your experiments are saved in the `modellogs` directory, with a timestamp. Inside you find a `saved_config.gin` file that contains all the settings for that experiment. The `events` file is what TensorBoard will show.
- plot the result in a heatmap: units vs batchsize.
- changing the learning rate to values between 1e-2 and 1e-5
- changing the optimizer from SGD to one of the other available algorithms at [torch](https://pytorch.org/docs/stable/optim.html) (scroll down for the algorithms)
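To gather the settings of all runs for a comparison such as the heatmap, the saved config files can be collected with the standard library. This is a sketch under assumptions: it expects `saved_config.gin` files somewhere below the log directory and only picks up simple single-line `name = value` bindings; the function names `parse_bindings` and `collect_configs` are hypothetical.

```python
from pathlib import Path


def parse_bindings(text):
    """Extract simple `name = value` lines from gin config text as strings."""
    bindings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments and import statements
        if not line or line.startswith("#") or line.startswith("import "):
            continue
        name, sep, value = line.partition("=")
        if sep:
            bindings[name.strip()] = value.strip()
    return bindings


def collect_configs(logdir):
    """Map each run directory below logdir to the bindings in its saved_config.gin."""
    return {
        str(config_file.parent): parse_bindings(config_file.read_text())
        for config_file in sorted(Path(logdir).rglob("saved_config.gin"))
    }


# Only runs when a log directory with this (assumed) name exists
if Path("modellogs").exists():
    for run, bindings in collect_configs("modellogs").items():
        print(run, bindings.get("NeuralNetwork.units1"), bindings.get("NeuralNetwork.units2"))
```

The values are kept as strings; convert them with `int` or `float` before plotting.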

A note on train_steps: this setting determines how often you get an update.
Because our complete dataset is 938 (60000 / 64, rounded up) batches long, you need 938 train steps to cover all 60,000 images.

This can be a bit confusing, because every value below 938 changes the meaning of `epoch` slightly: one epoch is no longer
the full dataset, but simply `train_steps` batches. Setting train_steps to 100 means you wait twice as long before you get feedback on the performance,
compared to train_steps=50. You will also see that setting train_steps to 100 improves the learning per epoch, but that is simply because the model has seen twice as
many examples as with train_steps=50.

This implies that it is not useful to compare train_steps=50 and train_steps=100, because after the same number of epochs 100 will always look better.
Just pick a value, and adjust your number of epochs accordingly.
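The arithmetic behind this note can be sketched as follows. The helper names are assumptions; 60000 and 64 are the dataset size and batch size mentioned above, and the 10 epochs in the second call are an arbitrary example.

```python
import math


def batches_per_full_pass(n_images, batchsize):
    """Number of batches needed to see every image once (the last batch may be partial)."""
    return math.ceil(n_images / batchsize)


def epochs_for_same_exposure(train_steps, epochs, new_train_steps):
    """Epochs needed with new_train_steps to see at least as many batches as before."""
    return math.ceil(train_steps * epochs / new_train_steps)


print(batches_per_full_pass(60000, 64))       # 938
print(epochs_for_same_exposure(100, 10, 50))  # 20
```

In other words, a run with train_steps=50 needs 20 epochs to see as many batches as a run with train_steps=100 sees in 10 epochs.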