## Baseline solution

In this notebook, we will create a baseline solution to our semantic segmentation problem. A notebook is handy for iterating quickly. Later we will refactor this code into a script so that we can run hyperparameter sweeps.

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
# !pip install wandb

In [7]:
import sys
sys.path.append('/content/drive/MyDrive/wandb_mlOps/lesson2/')

In [9]:
import wandb
import pandas as pd
from fastai.vision.all import *
from fastai.callback.wandb import WandbCallback

import params
from utils import get_predictions, create_iou_table, MIOU, BackgroundIOU, \
                  RoadIOU, TrafficLightIOU, TrafficSignIOU, PersonIOU, VehicleIOU, BicycleIOU


In [10]:
train_config = SimpleNamespace(
    framework="fastai",
    img_size=(180, 320),
    batch_size=8,
    augment=True, # use data augmentation
    epochs=10,
    lr=2e-3,
    pretrained=True,  # whether to use pretrained encoder
    seed=42,
)
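
Since the plan is to move this configuration into a script for sweeps, it may help to see how the same defaults could be overridden from the command line. The following is a minimal sketch, not the final training script: the parse_args helper and the choice of exposed flags are assumptions made for illustration, and only a subset of the fields is shown.

```python
import argparse
from types import SimpleNamespace

# Defaults mirror a subset of the notebook's train_config.
default_config = SimpleNamespace(
    batch_size=8,
    epochs=10,
    lr=2e-3,
    seed=42,
)

def parse_args(argv=None):
    """Return a copy of the defaults with any command-line overrides applied.

    Hypothetical helper: the flag names simply reuse the config field names.
    """
    parser = argparse.ArgumentParser(description="Override training hyperparameters")
    parser.add_argument("--batch_size", type=int, default=default_config.batch_size)
    parser.add_argument("--epochs", type=int, default=default_config.epochs)
    parser.add_argument("--lr", type=float, default=default_config.lr)
    parser.add_argument("--seed", type=int, default=default_config.seed)
    args = parser.parse_args(argv)
    # Merge so that fields without a flag would still keep their default value.
    return SimpleNamespace(**{**vars(default_config), **vars(args)})

# Simulate: python train.py --lr 0.001 --epochs 5
config = parse_args(["--lr", "0.001", "--epochs", "5"])
print(vars(config))
```

Fields that are not passed on the command line keep the notebook's defaults, so a sweep only needs to specify the values it varies.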

We set the random seed for reproducibility.

In [11]:
set_seed(train_config.seed, reproducible=True)

In [12]:
run = wandb.init(project=params.WANDB_PROJECT, entity=params.ENTITY, job_type="training", config=train_config)


wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc


As usual, we will use W&B Artifacts to track the lineage of our models.

In [13]:
processed_data_at = run.use_artifact(f'{params.PROCESSED_DATA_AT}:latest')
processed_dataset_dir = Path(processed_data_at.download())
df = pd.read_csv(processed_dataset_dir / 'data_split.csv')

wandb: Downloading large artifact bdd_simple_1k_split:latest, 813.25MB. 4010 files...
wandb:   4010 of 4010 files downloaded.
Done. 0:0:43.0


We will not use the hold-out (test) stage for now. The is_valid column tells the trainer how to split the data between training and validation.

In [14]:
df = df[df.Stage != 'test'].reset_index(drop=True)
df['is_valid'] = df.Stage == 'valid'
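
To make the role of is_valid concrete, here is a small pure-Python sketch of the same filtering and of how a column-based splitter (such as fastai's ColSplitter, which reads is_valid by default) turns the flag into two index lists. The rows are hypothetical stand-ins for data_split.csv, and the list comprehensions only imitate what the pandas lines and the splitter do.

```python
# Hypothetical rows standing in for data_split.csv; only the Stage column matters here.
rows = [
    {"File_Name": "a.jpg", "Stage": "train"},
    {"File_Name": "b.jpg", "Stage": "valid"},
    {"File_Name": "c.jpg", "Stage": "test"},
    {"File_Name": "d.jpg", "Stage": "train"},
]

# Drop the hold-out rows, then flag validation rows (mirrors the two pandas lines).
kept = [dict(r, is_valid=(r["Stage"] == "valid")) for r in rows if r["Stage"] != "test"]

# A column-based splitter returns two index lists: training first, validation second.
train_idx = [i for i, r in enumerate(kept) if not r["is_valid"]]
valid_idx = [i for i, r in enumerate(kept) if r["is_valid"]]

print(train_idx, valid_idx)
```

Note that the indices refer to the filtered rows, which is why the notebook resets the index after dropping the test stage.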

In [15]:
def label_func(fname):
    return (fname.parent.parent/"labels")/f"{fname.stem}_mask.png"
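
As a quick check of what label_func does, the sketch below applies it to a made-up path. The directory and file names are hypothetical; PurePosixPath is used only so that the printed result looks the same on every operating system.

```python
from pathlib import PurePosixPath

def label_func(fname):
    # Same mapping as the notebook cell:
    # <root>/images/<stem>.<ext> -> <root>/labels/<stem>_mask.png
    return (fname.parent.parent / "labels") / f"{fname.stem}_mask.png"

# Hypothetical file name, used only for illustration.
image_path = PurePosixPath("artifacts/dataset/images/example_0001.jpg")
mask_path = label_func(image_path)

print(mask_path)  # artifacts/dataset/labels/example_0001_mask.png
```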


We will use fastai's DataBlock API to feed data into model training and validation.

In [16]:
# assign paths
df["image_fname"] = [processed_dataset_dir/f'images/{f}' for f in df.File_Name.values]
df["label_fname"] = [label_func(f) for f in df.image_fname.values]

In [17]:
def get_data(df, bs=4, img_size=(180, 320), augment=True):
    block = DataBlock(
        blocks=(ImageBlock, MaskBlock(codes=params.BDD_CLASSES)),
        get_x=ColReader("image_fname"),
        get_y=ColReader("label_fname"),
        splitter=ColSplitter(),
        item_tfms=Resize(img_size),
        batch_tfms=aug_transforms() if augment else None,
    )
    return block.dataloaders(df, bs=bs)

We are using wandb.config to track our training hyperparameters.

In [18]:
config = wandb.config

In [19]:
dls = get_data(df, bs=config.batch_size, img_size=config.img_size, augment=config.augment)

We will use intersection over union (IoU) metrics: the mean across all classes (MIOU) and the IoU for each class separately. Our model will be a U-Net built on a pretrained ResNet-18 backbone.

In [20]:
metrics = [MIOU(), BackgroundIOU(), RoadIOU(), TrafficLightIOU(), TrafficSignIOU(), PersonIOU(), VehicleIOU(), BicycleIOU()]
learn = unet_learner(dls, arch=resnet18, pretrained=config.pretrained, metrics=metrics)

Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /root/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
100%|██████████| 44.7M/44.7M [00:00<00:00, 55.4MB/s]
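
For readers new to the metric, the following is a minimal pure-Python sketch of per-class IoU and its mean. It is not the implementation in utils (which is not shown in this notebook); the per_class_iou helper, the toy label lists, and the convention of scoring an absent class as 0 are all assumptions.

```python
def per_class_iou(pred, target, num_classes):
    """IoU for each class over flat lists of predicted and true class ids."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        # Assumption: a class absent from both prediction and target scores 0.
        ious.append(inter / union if union else 0.0)
    return ious

# Toy example: six "pixels", three classes.
pred   = [0, 0, 1, 1, 1, 2]
target = [0, 1, 1, 1, 2, 2]
ious = per_class_iou(pred, target, num_classes=3)
miou = sum(ious) / len(ious)
print(ious, miou)

# Per-class values from the epoch 0 row of this notebook's training log.
epoch0_class_ious = [0.832415, 0.72058, 0.0, 0.0, 0.0, 0.55906, 0.0]
epoch0_miou = sum(epoch0_class_ious) / len(epoch0_class_ious)
print(round(epoch0_miou, 6))
```

The last lines apply the same plain averaging to the seven per-class values logged for epoch 0. The result agrees with the logged miou of 0.301722, which suggests (but does not prove) that MIOU is an unweighted mean over classes. It also shows why the score stays low: four classes contribute an IoU of 0.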


fastai already ships a callback that integrates tightly with W&B: we only need to pass WandbCallback to the learner (here through the cbs argument of the fit call) and we are ready to go. The callback logs the useful training variables for us. For example, every metric we pass to the learner is tracked by the callback.

In [21]:
callbacks = [
    SaveModelCallback(monitor='miou'),
    WandbCallback(log_preds=False, log_model=True)
]

Let's train our model.

In [22]:
learn.fit_one_cycle(config.epochs, config.lr, cbs=callbacks)

epoch,train_loss,valid_loss,miou,background_iou,road_iou,traffic_light_iou,traffic_sign_iou,person_iou,vehicle_iou,bicycle_iou,time
0,0.49662,0.373687,0.301722,0.832415,0.72058,0.0,0.0,0.0,0.55906,0.0,00:46
1,0.456166,0.509567,0.302361,0.859269,0.709686,0.0,0.0,0.0,0.547574,0.0,00:42
2,0.379105,0.370398,0.282554,0.853718,0.687068,0.0,0.0,0.0,0.437092,0.0,00:43
3,0.305095,0.314512,0.327606,0.882836,0.761731,0.0,0.0,0.0,0.648676,0.0,00:41
4,0.263386,0.262682,0.346347,0.903382,0.826287,0.0,0.0,0.0,0.694762,0.0,00:41
5,0.250771,0.309694,0.331378,0.87516,0.816778,0.0,0.0,0.0,0.627709,0.0,00:42
6,0.225643,0.232735,0.355835,0.914719,0.838309,0.0,0.0,0.0,0.737819,0.0,00:42
7,0.202305,0.252933,0.352899,0.908471,0.819003,0.0,0.0,0.0,0.742818,0.0,00:41
8,0.198206,0.230722,0.361029,0.918891,0.842526,0.014056,0.0,0.0,0.751726,0.0,00:42
9,0.182713,0.233592,0.359686,0.918955,0.843122,0.004049,0.0,0.0,0.751676,0.0,00:43


Better model found at epoch 0 with miou value: 0.3017222245005371.
Better model found at epoch 1 with miou value: 0.3023613506731249.
Better model found at epoch 3 with miou value: 0.32760599104641003.
Better model found at epoch 4 with miou value: 0.34634731225423465.
Better model found at epoch 6 with miou value: 0.3558353912637191.
Better model found at epoch 8 with miou value: 0.36102853029240034.
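
The "Better model found" messages come from tracking the best value of the monitored metric. The sketch below reproduces that bookkeeping on the miou column of the log above. It is a simplified illustration rather than fastai's code: the track_best helper is hypothetical, and it assumes that higher is better and that any strict improvement counts.

```python
# miou per epoch, copied from the training log above.
miou_per_epoch = [0.301722, 0.302361, 0.282554, 0.327606, 0.346347,
                  0.331378, 0.355835, 0.352899, 0.361029, 0.359686]

def track_best(values):
    """Return (epochs where the value improved, best epoch, best value)."""
    best, best_epoch, improved = float("-inf"), None, []
    for epoch, value in enumerate(values):
        if value > best:  # higher is better for IoU
            best, best_epoch = value, epoch
            improved.append(epoch)  # a real callback would save a checkpoint here
    return improved, best_epoch, best

improved, best_epoch, best = track_best(miou_per_epoch)
print(improved)          # [0, 1, 3, 4, 6, 8]
print(best_epoch, best)  # 8 0.361029
```

The improving epochs match the messages in the log, and the best checkpoint comes from epoch 8 rather than from the final epoch.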


We will log a table with model predictions and ground truth to W&B, so that we can do error analysis in the W&B dashboard.

In [23]:
samples, outputs, predictions = get_predictions(learn)
table = create_iou_table(samples, outputs, predictions, params.BDD_CLASSES)
wandb.log({"pred_table":table})

SaveModelCallback reloads the best checkpoint at the end of training, and WandbCallback (with log_model=True) saves it to W&B. To make sure the final metrics are tracked correctly, we validate the model again and store the final loss and metrics in wandb.summary.

In [24]:
scores = learn.validate()  # [loss, metric_1, ..., metric_n] on the validation set
metric_names = ['final_loss'] + [f'final_{x.name}' for x in metrics]
final_results = dict(zip(metric_names, scores))
for k, v in final_results.items():
    wandb.summary[k] = v

In [25]:
wandb.finish()


Run history:
background_iou,▁▃▃▅▇▄█▇██
bicycle_iou,▁▁▁▁▁▁▁▁▁▁
epoch,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
eps_0,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
eps_1,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
eps_2,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
lr_0,▁▂▂▃▄▅▆▇███████▇▇▇▇▆▆▆▅▅▅▄▄▄▃▃▃▂▂▂▁▁▁▁▁▁
lr_1,▁▂▂▃▄▅▆▇███████▇▇▇▇▆▆▆▅▅▅▄▄▄▃▃▃▂▂▂▁▁▁▁▁▁
lr_2,▁▂▂▃▄▅▆▇███████▇▇▇▇▆▆▆▅▅▅▄▄▄▃▃▃▂▂▂▁▁▁▁▁▁
miou,▃▃▁▅▇▅█▇██

Run summary:
background_iou,0.91896
bicycle_iou,0.0
epoch,10.0
eps_0,1e-05
eps_1,1e-05
eps_2,1e-05
final_background_iou,0.91889
final_bicycle_iou,0.0
final_loss,0.23072
final_miou,0.36103
