# PyTorch Multi-GPU trainer using 🤗 accelerate + Mixed Precision + W&B Logging
This training notebook uses HuggingFace accelerate with mixed precision to train a ViT model on 2x T4 GPUs. The current performance is quite good, but I am still working on optimizing it.

I am also using the probabilistic F1 score with `beta=0.5` (in the hope that it will penalize false positives more heavily, since `beta < 1` weights precision above recall). The entire training pipeline works well, so you can fork the notebook and play with the model and other hyperparameters.

**Feel free to fork and change the models and do some preprocessing, but if you do please leave an upvote :)**

<center>
<img src="https://img.shields.io/badge/Upvote-If%20you%20like%20my%20work-07b3c8?style=for-the-badge&logo=kaggle">
</center>

## Installation and Imports

In [2]:
%%capture
! pip install timm
! pip install einops
! pip install git+https://github.com/huggingface/accelerate

In [3]:
import os
import sys
import cv2
import glob
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import wandb

import timm
import torch
import torch.nn as nn
from einops import rearrange
from torch.utils.data import DataLoader, Dataset

from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedGroupKFold

from accelerate.tracking import GeneralTracker
from accelerate import Accelerator, notebook_launcher

## Utility functions

In [4]:
def probabilistic_f1(labels, predictions, beta=0.5):
    """Probabilistic F-beta: F-beta computed on soft TP/FP counts."""
    y_true_count = 0
    ctp = 0  # soft true positives
    cfp = 0  # soft false positives

    for idx in range(len(labels)):
        # Clip predictions to [0, 1]
        prediction = min(max(predictions[idx], 0), 1)
        if labels[idx]:
            y_true_count += 1
            ctp += prediction
            # The missed mass (1 - prediction) is a false negative, so it is not added to cfp
        else:
            cfp += prediction

    # No positive labels, or no probability mass on positives: avoid dividing by zero
    if y_true_count == 0 or ctp == 0:
        return 0

    beta_squared = beta * beta
    c_precision = ctp / (ctp + cfp)
    c_recall = ctp / y_true_count
    return (1 + beta_squared) * (c_precision * c_recall) / (beta_squared * c_precision + c_recall)
    
def wandb_log(**kwargs):
    # Log all metrics in a single call so they share one W&B step
    wandb.log(kwargs)
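
To make the metric concrete, here is a small worked example. It is only a sketch, and it assumes the usual probabilistic F-beta definition: predicted probabilities on positive labels accumulate as soft true positives, and predicted probabilities on negative labels accumulate as soft false positives. The helper name `pfbeta_components` and the toy labels are made up for illustration.

```python
def pfbeta_components(labels, predictions, beta=0.5):
    """Return (soft precision, soft recall, pF-beta) for binary labels."""
    ctp = 0.0  # soft true positives: probability mass on positive labels
    cfp = 0.0  # soft false positives: probability mass on negative labels
    n_pos = 0
    for label, pred in zip(labels, predictions):
        pred = min(max(pred, 0.0), 1.0)  # clip to [0, 1]
        if label:
            n_pos += 1
            ctp += pred
        else:
            cfp += pred
    if n_pos == 0 or ctp == 0:
        return 0.0, 0.0, 0.0
    precision = ctp / (ctp + cfp)
    recall = ctp / n_pos
    b2 = beta * beta
    score = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, score


# Toy data (hypothetical): two positives, two negatives.
toy_labels = [1, 0, 1, 0]
toy_preds = [0.9, 0.2, 0.6, 0.1]

precision, recall, score_half = pfbeta_components(toy_labels, toy_preds, beta=0.5)
_, _, score_one = pfbeta_components(toy_labels, toy_preds, beta=1.0)
print(f"precision={precision:.3f} recall={recall:.3f}")
print(f"pF0.5={score_half:.3f} pF1={score_one:.3f}")
```

In this toy case precision is higher than recall, so the `beta=0.5` score comes out above the `beta=1` score: a `beta` below 1 pulls the result towards precision.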

## Config and W&B

In [5]:
Config = {
    'TRAIN_BS': 32,
    'VALID_BS': 32,
    'MODEL_NAME': 'vit_base_patch16_224',
    'NUM_WORKERS': 8,
    'PARENT_PATH': '/kaggle/input/rsna-mammography-images-as-pngs/images_as_pngs_512/train_images_processed_512/',
    'FILE_PATH': '/kaggle/input/rsna-breast-cancer-detection/train.csv',
    'LOSS': 'BCEWithLogitsLoss',
    'EVAL_METRIC': 'F1',
    'NB_EPOCHS': 3,
    'SPLITS': 5,
    'T_0': 20,
    'η_min': 1e-4,
    'fc_dropout': 0.2,
    'betas': (0.9, 0.999),
    'N_LABELS': 1,
    'LR': 2e-4,
    'competition': 'rsna_mammography',
    '_wandb_kernel': 'tanaym',
}
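
One thing to keep in mind when tuning `TRAIN_BS`: with `accelerate`'s default data loader behaviour, the batch size passed to `DataLoader` applies per process, so two GPUs consume two batches per optimizer step. The arithmetic below is a rough sketch; the helper name `steps_per_epoch` and the sample count of 10,000 are placeholders, not values taken from this dataset.

```python
import math


def steps_per_epoch(n_samples, per_device_bs, num_processes):
    """Return (global batch size, optimizer steps per epoch)."""
    global_bs = per_device_bs * num_processes
    return global_bs, math.ceil(n_samples / global_bs)


# Placeholder sample count; per-device batch size and process count as in this notebook.
global_bs, steps = steps_per_epoch(n_samples=10_000, per_device_bs=32, num_processes=2)
print(f"global batch size: {global_bs}, steps per epoch: {steps}")
```

If you change the number of GPUs, you may want to revisit `LR` as well, since the global batch size changes with it.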

### About W&B:
<center><img src="https://i.imgur.com/gb6B4ig.png" width="400" alt="Weights & Biases"/></center><br>
<p style="text-align:center">WandB is a developer tool that helps teams turn deep learning research projects into deployed software by tracking their models, visualizing model performance, and automating training and model improvement.
We will use it to log hyperparameters and output metrics from our runs, then visualize and compare results and quickly share findings with colleagues.<br><br></p>

To log in to W&B, you can use the snippet below.

```python
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
wb_key = user_secrets.get_secret("WANDB_API_KEY")

wandb.login(key=wb_key)
```
Make sure your W&B key is stored as `WANDB_API_KEY` under Add-ons -> Secrets.

You can view [this](https://www.kaggle.com/ayuraj/experiment-tracking-with-weights-and-biases) notebook to learn more about W&B tracking.

If you don't want to log in to W&B, the kernel will still work and log everything to W&B in anonymous mode.

For easy experiment tracking during training, I use the W&B tracker that `accelerate` provides (`log_with='wandb'`).

In [6]:
# Start W&B logging
# W&B Login
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
wb_key = user_secrets.get_secret("WANDB_API_KEY")

wandb.login(key=wb_key)
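
If the secret is missing, `wandb.login` has no key to work with, and the worker processes may stop at an interactive prompt. One possible guard, sketched below, is to export the key through the `WANDB_API_KEY` environment variable (child processes inherit it) and to fall back to `WANDB_MODE=offline` otherwise. The function name `configure_wandb` and the `get_secret` callable are hypothetical stand-ins; on Kaggle you would pass `UserSecretsClient().get_secret`.

```python
import os


def configure_wandb(get_secret):
    """Export the W&B key if it is available, else switch W&B to offline mode."""
    try:
        key = get_secret("WANDB_API_KEY")
    except Exception:
        key = None  # secret not attached to the notebook, or the lookup failed
    if key:
        os.environ["WANDB_API_KEY"] = key
        os.environ.pop("WANDB_MODE", None)
        return "online"
    os.environ["WANDB_MODE"] = "offline"
    return "offline"


# Stand-in for a secrets client with no key attached (hypothetical).
def missing_secret(name):
    raise KeyError(name)


mode = configure_wandb(missing_secret)
print(f"W&B mode: {mode}")
```

Offline runs are written to local files only, so this trades live dashboards for a launch that does not block on a prompt.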

## Dataset

I am only loading images for now, but I am working on adding the metadata features available in the dataset.

In [7]:
class RSNAData(Dataset):
    def __init__(self, df, img_folder, augments=None, is_test=False):
        self.df = df
        self.is_test = is_test
        self.augments = augments
        self.img_folder = img_folder
        
    def __getitem__(self, idx):
        img_path = os.path.join(self.img_folder, self.df['img_name'][idx])
        img = cv2.imread(img_path)  # BGR, uint8, shape (h, w, 3)
        if img is None:
            # cv2.imread returns None instead of raising when the file cannot be read
            raise FileNotFoundError(img_path)
        img = cv2.resize(img, (224, 224))
        if self.augments:
            img = self.augments(image=img)['image']
        img = torch.tensor(img, dtype=torch.float)
        # Move channels first: the ViT model expects (c, h, w) input
        img = rearrange(img, 'h w c -> c h w')
        
        if not self.is_test:
            target = self.df['cancer'][idx]
            target = torch.tensor(target, dtype=torch.float)
            return (img, target)
        return (img)
    
    def __len__(self):
        return len(self.df)

## Model
This is just a simple ViT model; you can fork the notebook and extend it as you like.

In [8]:
class VITModel(nn.Module):
    def __init__(self, config, pretrained=True):
        super(VITModel, self).__init__()
        self.backbone = timm.create_model(config['MODEL_NAME'], pretrained=pretrained)
        self.backbone.head = nn.Linear(self.backbone.head.in_features, config['N_LABELS'])
    def forward(self, x):
        return self.backbone(x)

The function below wraps model creation in `accelerator.main_process_first()`, so the main process downloads the pretrained weights first and the other processes then load them from the cache.

In [9]:
def init_model(accelerator, config, pretrained=True):
    with accelerator.main_process_first():
        model = VITModel(config=config, pretrained=pretrained)
    return model

## Fit function

This function houses the entire training and validation code

In [11]:
def fit(model, fold, epochs, train_loader, valid_loader, optimizer, train_loss_fn, valid_loss_fn, accelerator):
    for epx in range(epochs):
        accelerator.print(f"{'='*20} Epoch: {epx+1} {'='*20}\n")
        # Training part of the model
        model.train()
        avg_loss = 0
        for idx, (images, targets) in enumerate(train_loader):
            outputs = model(images).view(-1)

            loss = train_loss_fn(outputs, targets)

            accelerator.backward(loss)
            optimizer.step()
            optimizer.zero_grad(set_to_none=True)

            avg_loss += loss.item()
            if idx % 100 == 0:
                accelerator.print(f"batch: {idx}, train_loss: {loss.item():.4f}")

        avg_loss = avg_loss / len(train_loader)
        accelerator.log({'train_loss': avg_loss})
        accelerator.print(f"\nEpoch: {epx+1} / {epochs}  |  Training Loss: {avg_loss:.4f}\n")

        # Validation part of the model
        model.eval()
        avg_loss = 0
        all_outputs, all_targets = [], []
        with torch.no_grad():
            for idx, (images, targets) in enumerate(valid_loader):
                outputs = model(images).view(-1)
                loss = valid_loss_fn(outputs, targets)

                if idx % 10 == 0:
                    accelerator.print(f"batch: {idx}, valid_loss: {loss.item():.4f}")
                avg_loss += loss.item()

                outputs, targets = accelerator.gather_for_metrics((
                    outputs, targets
                ))
                all_outputs.extend(torch.sigmoid(outputs).cpu().detach().tolist())
                all_targets.extend(targets.cpu().detach().tolist())
        
        prob_f1_score = probabilistic_f1(all_targets, all_outputs, beta=0.5)
        avg_loss = avg_loss / len(valid_loader)
        accelerator.log({'val_loss': avg_loss, 'val_prob_f1': prob_f1_score})
        accelerator.print(f"\nEpoch: {epx+1} / {epochs}  |  Validation Loss: {avg_loss:.4f}")
        accelerator.print(f"\nF1 Score for epoch: {epx+1} : {prob_f1_score:.4f}\n")
    
    # Save the model
    accelerator.wait_for_everyone() 
    model = accelerator.unwrap_model(model)
    accelerator.save(model, f"fold_{fold}_model.pth")
    
    # Close the trackers started by init_trackers (this ends the W&B run)
    accelerator.end_training()

## Run function + Training

The run function is the wrapping function that will invoke the `fit()` function by passing in the data and all the necessary params

In [12]:
# Training cell
def run(df, config):
    # We split the data naively for now, since StratifiedGroupKFold is giving me a hard time in the multi-GPU setup.
    split = 0.95
    # Fixed (arbitrary) seed so that every process builds the same shuffle and split
    df = df.sample(frac=1, random_state=42).reset_index(drop=True)
    train_samples = int(len(df) * split)
    # End the training slice where the validation slice starts, so no row is in both sets
    train_df = df[:train_samples].reset_index(drop=True)
    valid_df = df[train_samples:].reset_index(drop=True)
    
    # Initialize Accelerator with Mixed Precision for training. Also init optimizer and loss functions
    accelerator = Accelerator(mixed_precision='fp16', log_with='wandb')
    # Use the `config` argument rather than the global `Config`
    accelerator.init_trackers("rsna_mammography_pytorch", config=config)
    model = init_model(accelerator, config)
    optimizer = torch.optim.Adam(params=model.parameters(), lr=config['LR'])
    train_loss_fn, valid_loss_fn = nn.BCEWithLogitsLoss(), nn.BCEWithLogitsLoss()
    
    # Load the data into Datasets and then make DataLoaders out of them for training
    train_dataset = RSNAData(
        df=train_df,
        img_folder=config['PARENT_PATH']
    )
    valid_dataset = RSNAData(
        df=valid_df,
        img_folder=config['PARENT_PATH']
    )
    # Batch sizes are per process; accelerate shards the batches across GPUs
    train_loader = DataLoader(
        train_dataset,
        batch_size=config['TRAIN_BS'],
        shuffle=True
    )
    valid_loader = DataLoader(
        valid_dataset,
        batch_size=config['VALID_BS'],
        shuffle=False
    )
    
    # Send all these things to the prepare function so they can be prepped for Multi-GPU training
    model, optimizer, train_loader, valid_loader = accelerator.prepare(
        model, optimizer, train_loader, valid_loader
    )
    
    # Print out the data sizes we are training on
    accelerator.print(f"Training on {len(train_df)} samples, Validating on {len(valid_df)} samples")
    
    # Train the model now
    fit(
        model=model,
        fold="single",
        epochs=config['NB_EPOCHS'],
        train_loader=train_loader,
        valid_loader=valid_loader,
        optimizer=optimizer,
        train_loss_fn=train_loss_fn,
        valid_loss_fn=valid_loss_fn,
        accelerator=accelerator
    )
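
The row-level split in `run()` can place different images of the same patient in both the training and validation sets. A patient-level split avoids that, and with a fixed seed every process derives the same partition. Below is a minimal standard-library sketch; `split_by_patient`, the seed, the validation fraction applied to patients rather than rows, and the toy IDs are illustrative assumptions, and it does not stratify on `cancer` the way `StratifiedGroupKFold` would.

```python
import random


def split_by_patient(patient_ids, valid_frac=0.05, seed=42):
    """Return (train_idx, valid_idx) so that no patient appears in both."""
    unique_patients = sorted(set(patient_ids))
    rng = random.Random(seed)  # local RNG: same result in every process
    rng.shuffle(unique_patients)
    n_valid = max(1, int(len(unique_patients) * valid_frac))
    valid_patients = set(unique_patients[:n_valid])
    train_idx = [i for i, p in enumerate(patient_ids) if p not in valid_patients]
    valid_idx = [i for i, p in enumerate(patient_ids) if p in valid_patients]
    return train_idx, valid_idx


# Toy example (made-up IDs): 40 patients with 4 images each.
toy_patient_ids = [pid for pid in range(40) for _ in range(4)]
train_idx, valid_idx = split_by_patient(toy_patient_ids, valid_frac=0.05)

train_patients = {toy_patient_ids[i] for i in train_idx}
valid_patients = {toy_patient_ids[i] for i in valid_idx}
print(len(train_idx), len(valid_idx), train_patients & valid_patients)
```

With the DataFrame used here, you could pass `df['patient_id'].tolist()` as input and select the rows with `df.iloc[train_idx]` and `df.iloc[valid_idx]`.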

In [13]:
# Load the data and pass it onto the training function
df = pd.read_csv("/kaggle/input/rsna-breast-cancer-detection/train.csv")
df['img_name'] = df['patient_id'].astype(str) + "/" + df['image_id'].astype(str) + ".png"
df = df.sample(frac=1).reset_index(drop=True)
df.head()

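| | site_id | patient_id | image_id | laterality | view | age | cancer | biopsy | invasive | BIRADS | implant | density | machine_id | difficult_negative_case | img_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 33385 | 334975624 | R | MLO | 59.0 | 0 | 0 | 0 | NaN | 0 | NaN | 29 | False | 33385/334975624.png |
| 1 | 2 | 51497 | 583625522 | L | CC | 68.0 | 0 | 0 | 0 | NaN | 0 | NaN | 48 | False | 51497/583625522.png |
| 2 | 2 | 27563 | 1884309005 | L | MLO | 61.0 | 0 | 0 | 0 | 0.0 | 0 | NaN | 21 | True | 27563/1884309005.png |
| 3 | 1 | 32060 | 1116132612 | R | MLO | 58.0 | 0 | 1 | 0 | 0.0 | 0 | B | 49 | True | 32060/1116132612.png |
| 4 | 2 | 2460 | 1258092255 | R | MLO | 61.0 | 0 | 0 | 0 | NaN | 0 | NaN | 48 | False | 2460/1258092255.png |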


In [None]:
# Launch training using HuggingFace accelerate on 2x T4 GPUs
notebook_launcher(run, args=(df, Config), num_processes=2)

Launching training on 2 GPUs.

wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:
wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:
