<a href="https://colab.research.google.com/github/abdulsam/MLOps_wandb/blob/main/Split_Dataset_and_Baseline_Model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# Install dependencies (run once)
!wget https://raw.githubusercontent.com/wandb/edu/main/mlops-001/lesson1/requirements.txt
!wget https://raw.githubusercontent.com/wandb/edu/main/mlops-001/lesson1/params.py
!wget https://raw.githubusercontent.com/wandb/edu/main/mlops-001/lesson1/utils.py
!pip install -r requirements.txt

--2023-08-03 14:34:09--  https://raw.githubusercontent.com/wandb/edu/main/mlops-001/lesson1/requirements.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 82 [text/plain]
Saving to: ‘requirements.txt’


2023-08-03 14:34:09 (4.03 MB/s) - ‘requirements.txt’ saved [82/82]

--2023-08-03 14:34:09--  https://raw.githubusercontent.com/wandb/edu/main/mlops-001/lesson1/params.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 295 [text/plain]
Saving to: ‘params.py’


2023-08-03 14:34:09 (25.8 MB/s) - ‘params.py’ saved [295/295]

--2023-0

In [3]:
WANDB_PROJECT = "MLOps"
ENTITY = None # set this to team name if working in a team
BDD_CLASSES = {i:c for i,c in enumerate(['background', 'road', 'traffic light', 'traffic sign', 'person', 'vehicle', 'bicycle'])}
RAW_DATA_AT = 'bdd_simple_1k'
PROCESSED_DATA_AT = 'bdd_simple_1k_split'

# Data preparation

<!--- @wandbcode{course-lesson1} -->

In this notebook we prepare the data that we will later use to train our deep learning model. To do so, we will:

- start a new W&B `run` and use our raw data artifact
- split the data and save the splits into a new W&B Artifact
- join information about the split with our EDA Table

In [4]:
import os, warnings
import wandb

import pandas as pd
from fastai.vision.all import *
from sklearn.model_selection import StratifiedGroupKFold

import params
warnings.filterwarnings('ignore')

In [None]:
run = wandb.init(project=WANDB_PROJECT, entity=ENTITY, job_type="data_split")

Let's use the artifact we previously saved to W&B (artifact names and other global parameters are stored in `params`).

In [None]:
raw_data_at = run.use_artifact(f'{RAW_DATA_AT}:latest')
path = Path(raw_data_at.download())

[34m[1mwandb[0m: Downloading large artifact bdd_simple_1k:latest, 813.75MB. 4007 files... 
[34m[1mwandb[0m:   4007 of 4007 files downloaded.  
Done. 0:0:40.4


In [None]:
path.ls()

(#5) [Path('artifacts/bdd_simple_1k:latest/images'),Path('artifacts/bdd_simple_1k:latest/eda_table.table.json'),Path('artifacts/bdd_simple_1k:latest/LICENSE.txt'),Path('artifacts/bdd_simple_1k:latest/labels'),Path('artifacts/bdd_simple_1k:latest/media')]

To split the data into training, validation and test sets, we need the file names, the groups (derived from the file names) and the target (here we use our rare class, bicycle, for stratification). We previously saved these columns to the EDA table, so let's retrieve them from the table now.
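
One thing to watch when combining file names listed from disk with a column read from a table is that the two are only usable together if they refer to the same files in the same order. The sketch below shows one way to align them by name; the helper `align_targets` and the toy file names are hypothetical and not part of the course code.

```python
def align_targets(fnames, table_fnames, table_targets):
    """Return the targets reordered to match `fnames`; raise if a file is missing."""
    lookup = dict(zip(table_fnames, table_targets))
    missing = [f for f in fnames if f not in lookup]
    if missing:
        raise KeyError(f"{len(missing)} file(s) missing from the table, e.g. {missing[0]}")
    return [lookup[f] for f in fnames]

# Toy data (hypothetical names): the order on disk differs from the table order.
disk_fnames = ["b-02.jpg", "a-01.jpg", "c-03.jpg"]
table_fnames = ["a-01.jpg", "b-02.jpg", "c-03.jpg"]
table_bicycle = [1, 0, 0]

y_aligned = align_targets(disk_fnames, table_fnames, table_bicycle)
print(y_aligned)  # [0, 1, 0]
```

In this notebook the table side would presumably come from `orig_eda_table.get_column('File_Name')`, assuming those values match the names returned by `os.listdir`.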

In [None]:
fnames = os.listdir(path/'images')
groups = [s.split('-')[0] for s in fnames]

In [None]:
orig_eda_table = raw_data_at.get("eda_table")

[34m[1mwandb[0m: Downloading large artifact bdd_simple_1k:latest, 813.75MB. 4007 files... 
[34m[1mwandb[0m:   4007 of 4007 files downloaded.  
Done. 0:0:12.5


In [None]:
y = orig_eda_table.get_column('bicycle')

Now we will split the data into train (80%), validation (10%) and test (10%) sets. As we do that, we need to be careful to:

- *avoid leakage*: for that reason we are grouping data according to video identifier (we want to make sure our model can generalize to new cars or video frames)
- handle the *label imbalance*: for that reason we stratify data with our target column

We will use sklearn's `StratifiedGroupKFold` to split the data into 10 folds and assign 1 fold for test, 1 for validation and the rest for training.
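
Both requirements can be verified on any finished split. The following is a minimal sketch in plain Python, assuming three parallel lists (group id, stage and binary target per file); the function name `check_split` and the toy values are made up for illustration.

```python
from collections import Counter, defaultdict

def check_split(groups, stages, targets):
    """Return (groups that appear in more than one stage, positive rate per stage)."""
    stages_per_group = defaultdict(set)
    for g, s in zip(groups, stages):
        stages_per_group[g].add(s)
    leaked = sorted(g for g, seen in stages_per_group.items() if len(seen) > 1)

    totals, positives = Counter(), Counter()
    for s, t in zip(stages, targets):
        totals[s] += 1
        positives[s] += int(bool(t))
    rates = {s: positives[s] / totals[s] for s in totals}
    return leaked, rates

# Toy split: each video id lives in exactly one stage.
groups = ["vidA", "vidA", "vidB", "vidB", "vidC", "vidC", "vidD", "vidD"]
stages = ["train", "train", "train", "train", "valid", "valid", "test", "test"]
targets = [1, 0, 0, 1, 1, 0, 0, 1]

leaked, rates = check_split(groups, stages, targets)
print(leaked)  # []
print(rates)   # {'train': 0.5, 'valid': 0.5, 'test': 0.5}
```

An empty `leaked` list means no video contributes frames to more than one stage, and similar rates across stages suggest the stratification worked. With a rare class and few groups, the rates will rarely match exactly.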

In [None]:
df = pd.DataFrame()
df['File_Name'] = fnames
df['fold'] = -1

In [None]:
cv = StratifiedGroupKFold(n_splits=10)
for i, (train_idxs, test_idxs) in enumerate(cv.split(fnames, y, groups)):
    df.loc[test_idxs, ['fold']] = i

In [None]:
df['Stage'] = 'train'
df.loc[df.fold == 0, ['Stage']] = 'test'
df.loc[df.fold == 1, ['Stage']] = 'valid'
del df['fold']
df.Stage.value_counts()

train    800
test     100
valid    100
Name: Stage, dtype: int64

In [None]:
df.to_csv('data_split.csv', index=False)

We will now create a new artifact and add our data there.

In [None]:
processed_data_at = wandb.Artifact(params.PROCESSED_DATA_AT, type="split_data")

In [None]:
processed_data_at.add_file('data_split.csv')
processed_data_at.add_dir(path)

[34m[1mwandb[0m: Adding directory to artifact (./artifacts/bdd_simple_1k:latest)... Done. 10.6s


Finally, the split information may be relevant for our analyses. Rather than uploading the images again, we will save the split information to a new table and join it with the EDA table we created previously.
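
Conceptually, joining two tables on a shared key pairs up the rows that carry the same key value. The sketch below illustrates this with plain dictionaries and an inner join; it is an illustration of the idea only, not how W&B implements or renders joined tables, and `join_on_key` and the toy rows are hypothetical.

```python
def join_on_key(left_rows, right_rows, key):
    """Inner-join two lists of dict rows on `key` (assumes keys are unique on the right)."""
    right_by_key = {row[key]: row for row in right_rows}
    joined = []
    for row in left_rows:
        match = right_by_key.get(row[key])
        if match is not None:
            joined.append({**row, **match})
    return joined

# Toy rows standing in for the EDA table and the split table.
eda_rows = [
    {"File_Name": "a-01.jpg", "bicycle": 1},
    {"File_Name": "b-02.jpg", "bicycle": 0},
]
split_rows = [
    {"File_Name": "b-02.jpg", "Stage": "valid"},
    {"File_Name": "a-01.jpg", "Stage": "train"},
]

joined_rows = join_on_key(eda_rows, split_rows, "File_Name")
for row in joined_rows:
    print(row)
```

Because only the small split table is new, the images already stored with the EDA table do not need to be uploaded a second time.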

In [None]:
data_split_table = wandb.Table(dataframe=df[['File_Name', 'Stage']])

In [None]:
join_table = wandb.JoinedTable(orig_eda_table, data_split_table, "File_Name")

Let's add it to our artifact, log it and finish our `run`.

In [None]:
processed_data_at.add(join_table, "eda_table_data_split")

<wandb.sdk.artifacts.artifact_manifest_entry.ArtifactManifestEntry at 0x7efbc4a28340>

In [None]:
run.log_artifact(processed_data_at)
run.finish()

# utils.py
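
The metric helpers in this file are built on intersection over union (IoU): for a class `c`, the intersection is the number of pixels where both prediction and target equal `c`, and the union is `|pred == c| + |targ == c| - intersection`. The mean IoU averages the per-class scores and skips classes for which the score is undefined. The following pure-Python sketch illustrates that arithmetic on flat label lists; it is not the fastai implementation, and the helper names are made up for illustration. The last lines apply the mean to the per-class scores reported for epoch 8 in the training log later in this notebook.

```python
import math

def iou(pred, targ, c):
    """IoU of class `c` for two equally long label sequences (NaN if the class is absent)."""
    inter = sum(1 for p, t in zip(pred, targ) if p == c and t == c)
    total = sum(1 for p in pred if p == c) + sum(1 for t in targ if t == c)
    union = total - inter
    return inter / union if union > 0 else math.nan

def nan_mean(scores):
    """Mean of the scores that are not NaN (NaN if none are defined)."""
    valid = [s for s in scores if not math.isnan(s)]
    return sum(valid) / len(valid) if valid else math.nan

# Toy example with four classes; class 3 never occurs, so its IoU is undefined.
pred = [0, 0, 1, 1, 2, 2]
targ = [0, 1, 1, 1, 2, 0]
per_class = [iou(pred, targ, c) for c in range(4)]
toy_miou = nan_mean(per_class)

# Per-class IoUs logged for epoch 8 (background, road, traffic light,
# traffic sign, person, vehicle, bicycle).
epoch8 = [0.912149, 0.82566, 0.075743, 0.0, 0.0, 0.756516, 0.0]
epoch8_miou = nan_mean(epoch8)
print(round(epoch8_miou, 4))  # 0.3672
```

Note that a score of 0.0 still counts towards the mean, which is why the classes the baseline never predicts pull the reported `miou` down so strongly.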

In [5]:
import wandb
from sklearn.metrics import ConfusionMatrixDisplay
from IPython.display import display, Markdown
from fastai.vision.all import *

CLASS_INDEX = {v:k for k,v in BDD_CLASSES.items()}

def iou_per_class(inp, targ):
    "Compute iou per class"
    iou_scores = []
    dec_preds = inp.argmax(dim=0)  # decode once; this does not depend on the class
    for c in range(inp.shape[0]):
        p = torch.where(dec_preds == c, 1, 0)
        t = torch.where(targ == c, 1, 0)
        c_inter = (p * t).float().sum().item()
        c_union = (p + t).float().sum().item()
        iou_scores.append(c_inter / (c_union - c_inter) if c_union > 0 else np.nan)
    return iou_scores

def create_row(sample, pred_label, prediction, class_labels):
    """A simple function to create a row of (img, target, prediction, and scores...)"""
    (image, label) = sample
    # compute metrics
    iou_scores = iou_per_class(prediction, label)
    image = image.permute(1, 2, 0)
    row = [wandb.Image(
                image,
                masks={
                    "predictions": {
                        "mask_data": pred_label[0].numpy(),
                        "class_labels": class_labels,
                    },
                    "ground_truths": {
                        "mask_data": label.numpy(),
                        "class_labels": class_labels,
                    },
                },
            ),
            *iou_scores,
    ]
    return row

def create_iou_table(samples, outputs, predictions, class_labels):
    "Creates a wandb table with predictions and targets side by side"

    def _to_str(l):
        return [f'{str(x)} IoU' for x in l]

    items = list(zip(samples, outputs, predictions))

    table = wandb.Table(
        columns=["Image"]
        + _to_str(class_labels.values()),
    )
    # we create one row per sample
    for item in progress_bar(items):
        table.add_data(*create_row(*item, class_labels=class_labels))

    return table

def get_predictions(learner, test_dl=None, max_n=None):
    """Return the samples = (x,y) and outputs (model predictions decoded), and predictions (raw preds)"""
    test_dl = learner.dls.valid if test_dl is None else test_dl
    inputs, predictions, targets, outputs = learner.get_preds(
        dl=test_dl, with_input=True, with_decoded=True
    )
    x, y, samples, outputs = learner.dls.valid.show_results(
        tuplify(inputs) + tuplify(targets), outputs, show=False, max_n=max_n
    )
    return samples, outputs, predictions


class MIOU(DiceMulti):
    @property
    def value(self):
        binary_iou_scores = np.array([])
        for c in self.inter:
            binary_iou_scores = np.append(binary_iou_scores, \
                                          self.inter[c]/(self.union[c]-self.inter[c]) if self.union[c] > 0 else np.nan)
        return np.nanmean(binary_iou_scores)

class IOU(DiceMulti):
    @property
    def value(self):
        c=CLASS_INDEX[self.nm]
        return self.inter[c]/(self.union[c]-self.inter[c]) if self.union[c] > 0 else np.nan

class BackgroundIOU(IOU): nm = 'background'
class RoadIOU(IOU): nm = 'road'
class TrafficLightIOU(IOU): nm = 'traffic light'
class TrafficSignIOU(IOU): nm = 'traffic sign'
class PersonIOU(IOU): nm = 'person'
class VehicleIOU(IOU): nm = 'vehicle'
class BicycleIOU(IOU): nm = 'bicycle'


class IOUMacro(DiceMulti):
    @property
    def value(self):
        c=CLASS_INDEX[self.nm]
        if c not in self.count: return np.nan
        else: return self.macro[c]/self.count[c] if self.count[c] > 0 else np.nan

    def reset(self): self.macro,self.count = {},{}

    def accumulate(self, learn):
        pred,targ = learn.pred.argmax(dim=self.axis), learn.y
        for c in range(learn.pred.shape[self.axis]):
            p = torch.where(pred == c, 1, 0)
            t = torch.where(targ == c, 1, 0)
            c_inter = (p*t).float().sum(dim=(1,2))
            c_union = (p+t).float().sum(dim=(1,2))
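            m = c_inter / (c_union - c_inter)
            # keep only the images where the IoU is defined
            # (NaN means the class is absent from both prediction and target)
            macro = m[~m.isnan()]
            count = macro.shape[0]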

            if count > 0:
                msum = macro.sum().item()
                if c in self.count:
                    self.count[c] += count
                    self.macro[c] += msum
                else:
                    self.count[c] = count
                    self.macro[c] = msum


class MIouMacro(IOUMacro):
    @property
    def value(self):
        binary_iou_scores = np.array([])
        for c in self.count:
            binary_iou_scores = np.append(binary_iou_scores, self.macro[c]/self.count[c] if self.count[c] > 0 else np.nan)
        return np.nanmean(binary_iou_scores)


class BackgroundIouMacro(IOUMacro): nm = 'background'
class RoadIouMacro(IOUMacro): nm = 'road'
class TrafficLightIouMacro(IOUMacro): nm = 'traffic light'
class TrafficSignIouMacro(IOUMacro): nm = 'traffic sign'
class PersonIouMacro(IOUMacro): nm = 'person'
class VehicleIouMacro(IOUMacro): nm = 'vehicle'
class BicycleIouMacro(IOUMacro): nm = 'bicycle'


def display_diagnostics(learner, dls=None, return_vals=False):
    """
    Display a confusion matrix for the unet learner.
    If `dls` is None it will get the validation set from the Learner

    You can create a test dataloader using the `test_dl()` method like so:
    >> dls = ... # You usually create this from the DataBlocks api, in this library it is get_data()
    >> tdls = dls.test_dl(test_dataframe, with_labels=True)

    See: https://docs.fast.ai/tutorial.pets.html#adding-a-test-dataloader-for-inference

    """
    probs, targs = learner.get_preds(dl = dls)
    preds = probs.argmax(dim=1)
    classes = list(params.BDD_CLASSES.values())
    y_true = targs.flatten().numpy()
    y_pred = preds.flatten().numpy()

    tdf, pdf = [pd.DataFrame(r).value_counts().to_frame(c) for r,c in zip((y_true, y_pred) , ['y_true', 'y_pred'])]
    countdf = tdf.join(pdf, how='outer').reset_index(drop=True).fillna(0).astype(int).rename(index= params.BDD_CLASSES)
    countdf = countdf/countdf.sum()
    display(Markdown('### % Of Pixels In Each Class'))
    display(countdf.style.format('{:.1%}'))


    disp = ConfusionMatrixDisplay.from_predictions(y_true=y_true, y_pred=y_pred,
                                                   display_labels=classes,
                                                   normalize='pred')
    fig = disp.ax_.get_figure()
    fig.set_figwidth(10)
    fig.set_figheight(10)
    disp.ax_.set_title('Confusion Matrix (by Pixels)', fontdict={'fontsize': 32, 'fontweight': 'medium'})
    fig.show()

    if return_vals: return countdf, disp



# Baseline Model

In [6]:
import wandb
import pandas as pd
from fastai.vision.all import *
from fastai.callback.wandb import WandbCallback

In [7]:
train_config = SimpleNamespace(
    framework="fastai",
    img_size=(180, 320),
    batch_size=8,
    augment=True, # use data augmentation
    epochs=10,
    lr=2e-3,
    pretrained=True,  # whether to use pretrained encoder
    seed=42,
)

In [8]:
set_seed(train_config.seed, reproducible=True)

In [9]:
run = wandb.init(project=WANDB_PROJECT, entity=ENTITY, job_type="training", config=train_config)


In [10]:
processed_data_at = run.use_artifact(f'{PROCESSED_DATA_AT}:latest')
processed_dataset_dir = Path(processed_data_at.download())
df = pd.read_csv(processed_dataset_dir / 'data_split.csv')

[34m[1mwandb[0m: Downloading large artifact bdd_simple_1k_split:latest, 813.25MB. 4010 files... 
[34m[1mwandb[0m:   4010 of 4010 files downloaded.  
Done. 0:0:40.0


In [11]:
df = df[df.Stage != 'test'].reset_index(drop=True)
df['is_valid'] = df.Stage == 'valid'

In [12]:
def label_func(fname):
    return (fname.parent.parent/"labels")/f"{fname.stem}_mask.png"

In [13]:
# assign paths
df["image_fname"] = [processed_dataset_dir/f'images/{f}' for f in df.File_Name.values]
df["label_fname"] = [label_func(f) for f in df.image_fname.values]

In [14]:
def get_data(df, bs=4, img_size=(180, 320), augment=True):
    block = DataBlock(blocks=(ImageBlock, MaskBlock(codes=BDD_CLASSES)),
                  get_x=ColReader("image_fname"),
                  get_y=ColReader("label_fname"),
                  splitter=ColSplitter(),
                  item_tfms=Resize(img_size),
                  batch_tfms=aug_transforms() if augment else None,
                 )
    return block.dataloaders(df, bs=bs)

In [15]:
config = wandb.config

In [16]:
dls = get_data(df, bs=config.batch_size, img_size=config.img_size, augment=config.augment)

In [17]:
metrics = [MIOU(), BackgroundIOU(), RoadIOU(), TrafficLightIOU(), \
           TrafficSignIOU(), PersonIOU(), VehicleIOU(), BicycleIOU()]

learn = unet_learner(dls, arch=resnet18, pretrained=config.pretrained, metrics=metrics)

Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /root/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
100%|██████████| 44.7M/44.7M [00:00<00:00, 65.9MB/s]


In [18]:
callbacks = [
    SaveModelCallback(monitor='miou'),
    WandbCallback(log_preds=False, log_model=True)
]

In [19]:
learn.fit_one_cycle(config.epochs, config.lr, cbs=callbacks)

epoch,train_loss,valid_loss,miou,background_iou,road_iou,traffic_light_iou,traffic_sign_iou,person_iou,vehicle_iou,bicycle_iou,time
0,0.5035,0.364788,0.302899,0.856045,0.682933,0.0,0.0,0.0,0.581317,0.0,00:44
1,0.44154,0.456096,0.223625,0.804077,0.752603,0.0,0.0,0.0,0.008696,0.0,00:40
2,0.341107,0.31037,0.335007,0.889018,0.781805,0.0,0.0,0.0,0.674224,0.0,00:44
3,0.308571,0.294588,0.335348,0.888502,0.773587,0.0,0.0,0.0,0.685345,0.0,00:42
4,0.284449,0.29517,0.337692,0.890041,0.753821,0.0,0.0,0.0,0.719981,0.0,00:40
5,0.244119,0.291671,0.347504,0.902776,0.799168,0.0,0.0,0.0,0.730585,0.0,00:40
6,0.226735,0.267549,0.35669,0.909136,0.822833,0.019914,0.0,0.0,0.744949,0.0,00:42
7,0.206693,0.256822,0.360256,0.910406,0.823629,0.038498,0.0,0.0,0.749256,0.0,00:41
8,0.19184,0.244747,0.367153,0.912149,0.82566,0.075743,0.0,0.0,0.756516,0.0,00:41
9,0.179307,0.243743,0.365229,0.912416,0.826369,0.063608,0.0,0.0,0.754213,0.0,00:41

Better model found at epoch 0 with miou value: 0.30289944952858733.
Better model found at epoch 2 with miou value: 0.33500671276999844.
Better model found at epoch 3 with miou value: 0.3353476620971991.
Better model found at epoch 4 with miou value: 0.33769184654925394.
Better model found at epoch 5 with miou value: 0.3475041149280384.
Better model found at epoch 6 with miou value: 0.3566903117209734.
Better model found at epoch 7 with miou value: 0.36025550714110466.
Better model found at epoch 8 with miou value: 0.3671525157565793.


In [20]:
samples, outputs, predictions = get_predictions(learn)
table = create_iou_table(samples, outputs, predictions, BDD_CLASSES)
wandb.log({"pred_table":table})

In [21]:
scores = learn.validate()
metric_names = ['final_loss'] + [f'final_{x.name}' for x in metrics]
final_results = {metric_names[i] : scores[i] for i in range(len(scores))}
for k,v in final_results.items():
    wandb.summary[k] = v

In [22]:
wandb.finish()

Run history:
background_iou,▄▁▆▆▇▇████
bicycle_iou,▁▁▁▁▁▁▁▁▁▁
epoch,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
eps_0,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
eps_1,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
eps_2,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
lr_0,▁▂▂▃▄▅▆▇███████▇▇▇▇▆▆▆▅▅▅▄▄▄▃▃▃▂▂▂▁▁▁▁▁▁
lr_1,▁▂▂▃▄▅▆▇███████▇▇▇▇▆▆▆▅▅▅▄▄▄▃▃▃▂▂▂▁▁▁▁▁▁
lr_2,▁▂▂▃▄▅▆▇███████▇▇▇▇▆▆▆▅▅▅▄▄▄▃▃▃▂▂▂▁▁▁▁▁▁
miou,▅▁▆▆▇▇▇███

Run summary:
background_iou,0.91242
bicycle_iou,0.0
epoch,10.0
eps_0,1e-05
eps_1,1e-05
eps_2,1e-05
final_background_iou,0.91215
final_bicycle_iou,0.0
final_loss,0.24475
final_miou,0.36715
