# Baseline solution

In this notebook we will create a baseline solution to our image classification problem. A notebook is handy for iterating quickly; later, we will refactor this code into a script so that we can run hyperparameter sweeps.

In [None]:
# autoreload modules after editing
# without the need of restarting the kernel
%load_ext autoreload
%autoreload 2

# make modules in the parent directory (e.g. params.py) importable
import sys
sys.path.append('../')

import wandb
import pandas as pd
from fastai.vision.all import *
from fastai.callback.wandb import WandbCallback
import timm

import params

To list all models in the `timm` library that come with pretrained weights, run:
```python
import timm
models_to_benchmark = timm.list_models(pretrained=True)
```
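`timm.list_models` also accepts a wildcard `filter` pattern, for example `timm.list_models('inception*', pretrained=True)`. The sketch below applies the same idea with the standard library's `fnmatch` to a list of names you already have, which can help narrow a long list down to a few model families. `shortlist` is a hypothetical helper, and the model names are only examples.

```python
from fnmatch import fnmatch

def shortlist(model_names, patterns):
    """Keep names matching any shell-style wildcard pattern, preserving order."""
    return [name for name in model_names
            if any(fnmatch(name, pattern) for pattern in patterns)]

# Example names; in practice these would come from timm.list_models(pretrained=True)
all_models = [
    "inception_v3.tv_in1k",
    "res2net101d.in1k",
    "resnet50.a1_in1k",
    "vit_base_patch16_224.augreg_in21k",
]
print(shortlist(all_models, ["inception*", "res2net*"]))
# ['inception_v3.tv_in1k', 'res2net101d.in1k']
```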
Promising candidates were selected in previous experiments, and `Inception` baselines were added as well. We will load the names of those models from a file.

In [None]:
# load list from the file
with open("../models_to_benchmark.txt", "r") as f:
    models_to_benchmark = f.read().splitlines()
models_to_benchmark
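The cell above expects exactly one model name per line. If the file might also contain blank lines or `#` comments (an assumption, not something the original file is known to contain), a slightly more defensive loader could look like this; `load_model_names` is a hypothetical helper.

```python
import tempfile
from pathlib import Path

def load_model_names(path):
    """Read one model name per line, skipping blank lines and '#' comments."""
    names = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            names.append(line)
    return names

# Usage with a throwaway file standing in for models_to_benchmark.txt
with tempfile.TemporaryDirectory() as tmp:
    models_file = Path(tmp) / "models_to_benchmark.txt"
    models_file.write_text("# baselines\ninception_v3.tv_in1k\n\nres2net101d.in1k\n")
    print(load_model_names(models_file))  # ['inception_v3.tv_in1k', 'res2net101d.in1k']
```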

Let's now create a `train_config` that we'll pass as `config` to `wandb.init` to control the training hyperparameters.

In [None]:
train_config = SimpleNamespace(
    framework="fastai",
    img_size=(224, 224),
    batch_size=8,
    augment=True, # use data augmentation
    epochs=10, 
    lr=None, # select learning rate automatically
    arch="res2net101d.in1k",
    pretrained=True,  # whether to use pretrained encoder
    seed=42,
)
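Because this code will later become a script driven by hyperparameter sweeps, the defaults in `train_config` will need to be overridable from the command line (W&B sweep agents that launch a program pass hyperparameters as `--name=value` flags by default). One possible sketch uses `argparse` and a hypothetical `parse_args` helper; the flag names simply mirror the config fields and are not taken from the original script.

```python
import argparse
from types import SimpleNamespace

# Defaults copied from train_config above
default_config = SimpleNamespace(
    framework="fastai",
    img_size=(224, 224),
    batch_size=8,
    augment=True,
    epochs=10,
    lr=None,
    arch="res2net101d.in1k",
    pretrained=True,
    seed=42,
)

def parse_args(argv=None, defaults=default_config):
    """Override selected fields of `defaults` from command-line flags."""
    parser = argparse.ArgumentParser(description="Training hyperparameters")
    parser.add_argument("--batch_size", type=int, default=defaults.batch_size)
    parser.add_argument("--epochs", type=int, default=defaults.epochs)
    parser.add_argument("--lr", type=float, default=defaults.lr)
    parser.add_argument("--arch", type=str, default=defaults.arch)
    parser.add_argument("--seed", type=int, default=defaults.seed)
    # --augment / --no-augment (BooleanOptionalAction requires Python 3.9+)
    parser.add_argument("--augment", action=argparse.BooleanOptionalAction,
                        default=defaults.augment)
    args = parser.parse_args(argv)
    # Start from every default, then overwrite with the parsed values
    return SimpleNamespace(**{**vars(defaults), **vars(args)})

config = parse_args(["--arch", "resnet50.a1_in1k", "--epochs=5", "--no-augment"])
print(config.arch, config.epochs, config.augment, config.batch_size)
# resnet50.a1_in1k 5 False 8
```

Fields without a flag (such as `img_size` and `pretrained`) keep their defaults, so the returned namespace can be passed to `wandb.init` exactly like `train_config`.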

We set the random seed for reproducibility.

In [None]:
set_seed(train_config.seed, reproducible=True)
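fastai's `set_seed` seeds Python's `random`, NumPy and PyTorch, and with `reproducible=True` it additionally makes cuDNN deterministic. The function below is a rough, standalone approximation of that behavior. It is named `seed_everything` to make clear that it is not fastai's code, and it seeds NumPy and PyTorch only if they are installed.

```python
import random

def seed_everything(seed, reproducible=False):
    """Approximate fastai's set_seed: seed random, NumPy and PyTorch when available."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed % (2**32 - 1))  # NumPy seeds must fit in 32 bits
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # no-op if CUDA is unavailable
        if reproducible:
            # Trade speed for determinism in cuDNN convolutions
            torch.backends.cudnn.deterministic = True
            torch.backends.cudnn.benchmark = False
    except ImportError:
        pass

# Re-seeding reproduces the same random draws
seed_everything(42)
first = [random.random() for _ in range(3)]
seed_everything(42)
second = [random.random() for _ in range(3)]
print(first == second)  # True
```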

In [None]:
run = wandb.init(project=params.WANDB_PROJECT, entity=params.ENTITY, job_type="training", config=train_config)

As usual, we will use W&B Artifacts to track the lineage of our models: `run.use_artifact` records the processed dataset as an input to this run.

In [None]:
processed_data_at = run.use_artifact(f'{params.PROCESSED_DATA_AT}:latest')
processed_dataset_dir = Path(processed_data_at.download())
df = pd.read_csv(processed_dataset_dir / 'data_split.csv')

We will not use the held-out test split for now. The `is_valid` column tells the trainer how to split the data between training and validation.

In [None]:
df = df[df.Stage != 'test'].reset_index(drop=True)
df['is_valid'] = df.Stage == 'valid'
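To see what the two lines above do, here is a toy DataFrame with the same `Stage` column. The `image` column and all values are made up for illustration. The last step computes the same train/validation indices that fastai's `ColSplitter` (whose default column is `is_valid`) would derive from this DataFrame.

```python
import pandas as pd

# Toy stand-in for data_split.csv; the `image` column and values are hypothetical
df = pd.DataFrame({
    "image": ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"],
    "Stage": ["train", "train", "valid", "test", "valid"],
})

# Same two steps as in the notebook: drop the test rows, then flag validation rows
df = df[df.Stage != "test"].reset_index(drop=True)
df["is_valid"] = df.Stage == "valid"

# Row positions that end up in each split
train_idx = df.loc[~df["is_valid"]].index.tolist()
valid_idx = df.loc[df["is_valid"]].index.tolist()
print(train_idx, valid_idx)  # [0, 1] [2, 3]
```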