# Baseline experiments
This notebook contains the code to run a single baseline experiment.

## Set up
To use the custom modules defined in `src`, we first make sure that the working directory is the root folder.

In [None]:
import os

root_folder = 'transformers-on-a-diet'
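while not os.getcwd().endswith(root_folder):
    # Stop at the filesystem root instead of looping forever
    if os.getcwd() == os.path.dirname(os.getcwd()):
        raise FileNotFoundError(f'{root_folder} not found above the starting directory')
    os.chdir('../')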

## Configuration
Specify the dataset, the fraction of the training data to use, and the name under which the model is saved in `results/raw/baseline`. Then load the configuration.

In [None]:
# Change parameters here
dataset = 'mams'
fraction = 0.1
name = f'TEST_RUN_baseline_{dataset}_fr{fraction}'

# Load the configuration
from src.experiments import get_config

config = get_config()
data_config = config[dataset]
model_config = config['baseline']
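
The keys this notebook reads from the configuration can be summarised as a small check. The sketch below is illustrative only: `example_config` and `missing_keys` are hypothetical, every value is a placeholder, and the real structure returned by `get_config()` may differ.

```python
# Keys read from the configuration elsewhere in this notebook.
TOP_LEVEL_KEYS = {'baseline', 'optimizer', 'loss', 'batch_size', 'epochs', 'result_path'}
DATASET_KEYS = {'train', 'val', 'test', 'classes', 'metrics'}


def missing_keys(config, dataset):
    """Return the sorted list of keys the notebook needs but `config` lacks."""
    missing = [key for key in TOP_LEVEL_KEYS if key not in config]
    if dataset not in config:
        missing.append(dataset)
    else:
        missing += [f'{dataset}.{key}' for key in DATASET_KEYS if key not in config[dataset]]
    return sorted(missing)


# Hypothetical configuration; every value here is a placeholder.
example_config = {
    'mams': {'train': '<train file>', 'val': '<val file>', 'test': '<test file>',
             'classes': '<classes>', 'metrics': ['<metric>']},
    'baseline': {},
    'optimizer': '<optimizer>',
    'loss': '<loss>',
    'batch_size': 32,
    'epochs': 10,
    'result_path': 'results/raw/baseline',
}

print(missing_keys(example_config, 'mams'))  # []
```

Running such a check before training would surface a misspelled dataset name or a missing entry early, rather than as a `KeyError` halfway through the notebook.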

## Load the data
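Load the training, validation and test data from `datasets`. Only a `fraction` of the training set is kept; the remainder is split off and discarded.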

In [None]:
from src.data import Preprocessor

preprocessor = Preprocessor()

(trainX, trainY), _ = preprocessor.parse_train(dataset, data_config['train'], validation_split=1 - fraction)
val_data = preprocessor.parse_test(dataset, data_config['val'])
test_data = preprocessor.parse_test(dataset, data_config['test'])
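
Because `validation_split=1 - fraction` is passed and the split-off part is discarded (`_`), only a `fraction` share of the training set reaches the model. The sketch below shows that arithmetic with a hypothetical `keep_fraction` helper. How `parse_train` actually splits the data (shuffling, stratification, rounding) is not shown in this notebook, so the positional split here is an assumption.

```python
def keep_fraction(samples, validation_split):
    """Split `samples` into (kept, held_out) by position.

    Hypothetical stand-in for the split inside `parse_train`; the real
    implementation may shuffle or stratify first.
    """
    # round() guards against float error: 1 - 0.9 is slightly below 0.1.
    n_kept = round(len(samples) * (1 - validation_split))
    return samples[:n_kept], samples[n_kept:]


fraction = 0.1
samples = list(range(1000))
kept, _ = keep_fraction(samples, validation_split=1 - fraction)
print(len(kept))  # 100
```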

## Fit the model
Create, compile and train the model using the data loaded in the previous step.

The model is trained via the regular `model.fit`, with the following additional callbacks to monitor performance and save the model:
* `EvaluateCallback` evaluates the model on an additional dataset (the test dataset).
* `ModelCheckpoint(..., save_weights_only=True, ...)` calls `SavableModel.save_weights()` to save the model weights (`BaselineModel` is a subclass of `SavableModel`).
* `CSVLogger` logs the history to a CSV file. This includes the results from the `EvaluateCallback`.

In [None]:
from src.models import BaselineModel
from src.callbacks import EvaluateCallback
# noinspection PyUnresolvedReferences
from tensorflow.keras.callbacks import CSVLogger, ModelCheckpoint

# Load the model
model = BaselineModel(data_config['classes'])
model.compile(optimizer=config['optimizer'], loss=config['loss'], metrics=data_config['metrics'])

# Train the model
model.fit(trainX, trainY, batch_size=config['batch_size'], epochs=config['epochs'], validation_data=val_data, callbacks=[
    EvaluateCallback(test_data),
    ModelCheckpoint(
        os.path.join(config['result_path'], 'checkpoints', name),
        save_best_only=True,
        save_weights_only=True,
        monitor='val_macro_f1',
        mode='max'
    ),
    CSVLogger(os.path.join(config['result_path'], f'{name}.csv'))
])
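
That `CSVLogger` also records the `EvaluateCallback` results relies on callback ordering: Keras hands the same `logs` dictionary to each callback in list order, so entries added by an earlier callback are visible to later ones. The framework-free sketch below imitates that mechanism. `EvaluateSketch`, `LoggerSketch`, `run_epoch_end` and the `test_` prefix are hypothetical stand-ins, not the classes in `src.callbacks` or Keras, and the metric values are dummy numbers.

```python
class EvaluateSketch:
    """Adds metrics for an extra dataset to the shared logs."""

    def __init__(self, evaluate_fn, prefix='test_'):
        self.evaluate_fn = evaluate_fn  # callable returning {metric_name: value}
        self.prefix = prefix

    def on_epoch_end(self, epoch, logs):
        for metric, value in self.evaluate_fn().items():
            logs[self.prefix + metric] = value


class LoggerSketch:
    """Records whatever is in the logs when its turn comes."""

    def __init__(self):
        self.rows = []

    def on_epoch_end(self, epoch, logs):
        self.rows.append({'epoch': epoch, **logs})


def run_epoch_end(callbacks, epoch, logs):
    # Every callback receives the same dict, in list order.
    for callback in callbacks:
        callback.on_epoch_end(epoch, logs)


logger = LoggerSketch()
callbacks = [EvaluateSketch(lambda: {'macro_f1': 0.5}), logger]
run_epoch_end(callbacks, epoch=0, logs={'loss': 1.0, 'val_macro_f1': 0.4})
print(logger.rows[0])
```

In this sketch, placing the logger before the evaluating callback would leave `test_macro_f1` out of the recorded row, which is presumably why `EvaluateCallback` is listed before `CSVLogger` in the cell above.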

## Quick methods
Alternatively, the following methods are available to quickly perform an experiment.

Note: the second method generates the name automatically (`baseline_{dataset}_fr{fraction}`).

In [None]:
from src.experiments import baseline_experiment, baseline_experiments

baseline_experiment(dataset, fraction, name)

# To perform multiple experiments:
baseline_experiments([dataset], [fraction], range(3))
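
A call such as `baseline_experiments([dataset], [fraction], range(3))` suggests one run per combination of dataset, fraction and run index. The sketch below enumerates those combinations using the name pattern mentioned above. `planned_runs` is hypothetical, and treating the third argument as a sequence of run indices is an assumption about `src.experiments`, as is leaving the run index out of the name.

```python
from itertools import product


def planned_runs(datasets, fractions, runs):
    """List (name, dataset, fraction, run) for every combination."""
    return [
        (f'baseline_{dataset}_fr{fraction}', dataset, fraction, run)
        for dataset, fraction, run in product(datasets, fractions, runs)
    ]


for run_name, run_dataset, run_fraction, run_index in planned_runs(['mams'], [0.1], range(3)):
    print(run_name, run_index)
```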