<a href="https://colab.research.google.com/github/Armandpl/wandb_jetracer/blob/master/wandb_jetracer_training.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img src="https://i.imgur.com/gb6B4ig.png" width="400" alt="Weights & Biases" />

# 🔥 = W&B ➕ PyTorch ➕ Nvidia jetracer

# 🚀 Install, Import, and Log In

In [None]:
import random

import numpy as np
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
from tqdm.notebook import tqdm

# Device configuration
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Manually set the PyTorch seed to get the same dataset split every time
torch.manual_seed(42)

<torch._C.Generator at 0x7f27ea3e79b0>

### 0️⃣ Step 0: Install W&B

To get started, we'll need to get the library.
`wandb` is easily installed using `pip`.

In [None]:
%%capture
!pip install wandb --upgrade

### 1️⃣ Step 1: Import W&B and Login

In order to log data to our web service,
you'll need to log in.

If this is your first time using W&B,
you'll need to sign up for a free account at the link that appears.

In [None]:
import wandb

wandb.login()

True

# 👩‍🔬 Define the Experiment and Pipeline

## 2️⃣ Step 2: Track metadata and hyperparameters with `wandb.init`

In [None]:
config = dict(
    epochs=60,
    architecture="resnet18",
    pretrained=True,
    batch_size=64,
    learning_rate=1e-4,
    dataset="old-racetrack:latest",
    train_pct=0.8,
    train_augs=False
    )

Now, let's define the overall pipeline,
which is pretty typical for model-training:

1. we first `make` a model, plus associated data and optimizer, then
2. we `train` the model accordingly,
3. we `optimize` the model for inference on the Jetson Nano (this step is left commented out in the pipeline below),
4. we `test` it to see how training went, and
5. finally we `log` both the trained model and the optimized model.

We'll implement these functions below.

In [None]:
def model_pipeline(hyperparameters):

    # tell wandb to get started
    with wandb.init(project="wandb-jetracer", config=hyperparameters, job_type="train") as run:
      # access all HPs through wandb.config, so logging matches execution!

      # make the model, data, and optimization problem
      model, train_loader, test_loader, criterion, optimizer = make(run)
      print(model)

      # and use them to train the model
      train(model, train_loader, test_loader, criterion, optimizer, run)

      # once it's trained we optimize it using tensorRT
      # trt_model = optimize(model)

      # and then test its final performance
      # test(optimized_model, test_loader, run)

      # finally we log both models to wandb
      torch.save(model.state_dict(), 'model.pth')
      artifact = wandb.Artifact('model', type='model')
      artifact.add_file('model.pth')

      #torch.save(trt_model.state_dict(), 'trt_model.pth')
      #trt_artifact = wandb.Artifact('trt-model', type='model')
      #trt_artifact.add_file('trt_model.pth')

      # Save the artifact version to W&B and mark it as the output of this run
      run.log_artifact(artifact)
      #run.log_artifact(trt_artifact)

    return model

In [None]:
!pip install -U git+https://github.com/albu/albumentations --no-cache-dir

Collecting git+https://github.com/albu/albumentations
  Cloning https://github.com/albu/albumentations to /tmp/pip-req-build-qa07lava
  Running command git clone -q https://github.com/albu/albumentations /tmp/pip-req-build-qa07lava
Building wheels for collected packages: albumentations
  Building wheel for albumentations (setup.py) ... done
  Created wheel for albumentations: filename=albumentations-0.5.2-cp37-none-any.whl size=88321 sha256=bf35fd9c12aa70c3060e69a1365163799bf7cdf998e92e6c1498f3ced68656d5
  Stored in directory: /tmp/pip-ephem-wheel-cache-2cm1im9w/wheels/45/8b/e4/2837bbcf517d00732b8e394f8646f22b8723ac00993230188b
Successfully built albumentations
Installing collected packages: albumentations
  Found existing installation: albumentations 0.5.2
    Uninstalling albumentations-0.5.2:
      Successfully uninstalled albumentations-0.5.2
Successfully installed albumentations-0.5.2


In [None]:
import torch
import os
import glob
import uuid
import PIL.Image
import torch.utils.data
import subprocess
import cv2
import numpy as np
import random
import albumentations as A

class XYDataset(torch.utils.data.Dataset):
    def __init__(self, directory, transform=None, train=False):
        super(XYDataset, self).__init__()
        self.directory = directory
        self.transform = A.Compose([
            A.Resize(224, 224),
            A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
        ],
        keypoint_params=A.KeypointParams(format='xy'),
        )
        self.refresh()
        self.train = train

        self.augmentations = A.Compose([
            # A.RandomCrop(width=112, height=112, p=0.3),
            # A.RandomBrightnessContrast(p=0.5),
            A.ColorJitter (brightness=0.2, contrast=0.2, saturation=0.2, hue=0.2, always_apply=False, p=0.5),
            A.HorizontalFlip(p=0.5),
            A.IAAPerspective (scale=(0.05, 0.1), keep_size=True, always_apply=False, p=0.3),
            A.MotionBlur(p=0.3),
            # A.ISONoise (color_shift=(0.01, 0.05), intensity=(0.1, 0.5), always_apply=False, p=0.3)
            #A.OneOf([
            #    A.HueSaturationValue(p=0.5),
            #    A.RGBShift(p=0.7)
            #], p=1),
        ],
        keypoint_params=A.KeypointParams(format='xy'),
        )

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, idx):
        ann = self.annotations[idx]
        image = cv2.imread(ann['image_path'], cv2.IMREAD_COLOR)
        height, width = image.shape[:2]
        # start from the annotated keypoint; augmentations may move it
        x, y = ann['x'], ann['y']

        image = self.transform(image=image, keypoints=[])['image']

        if self.train:
            # retry up to 10 times until the keypoint survives the augmentation
            for i in range(10):
                transformed = self.augmentations(image=image, keypoints=[(ann['x'], ann['y'])])
                if len(transformed['keypoints']) > 0:
                    # keep the augmented image together with its moved keypoint
                    image = transformed['image']
                    x, y = transformed['keypoints'][0]
                    break

        image = np.transpose(image, (2, 0, 1)).astype(np.float32)

        x = 2.0 * (x / width - 0.5) # -1 left, +1 right
        y = 2.0 * (y / height - 0.5) # -1 top, +1 bottom

        return torch.tensor(image, dtype=torch.float), torch.Tensor([x, y])

    def _parse(self, path):
        basename = os.path.basename(path)
        items = basename.split('_')
        x = items[0]
        y = items[1]
        return int(x), int(y)

    def refresh(self):
        self.annotations = []
        for image_path in glob.glob(os.path.join(self.directory, '*.jpg')):
            x, y = self._parse(image_path)
            self.annotations += [{
                'image_path': image_path,
                'x': x,
                'y': y
            }]
        
    def save_entry(self, image, x, y):
        # create the dataset directory if it does not exist yet
        os.makedirs(self.directory, exist_ok=True)

        height, width, _ = image.shape

        x = int(x/width * 224)
        y = int(y/height * 224)
        filename = '%d_%d_%s.jpg' % (x, y, str(uuid.uuid1()))
       
        image = cv2.resize(image, (224, 224), interpolation = cv2.INTER_AREA)  
        image_path = os.path.join(self.directory, filename)
        cv2.imwrite(image_path, image)
        self.refresh()
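
The label handling in `XYDataset` comes down to a pair of coordinate conversions: pixel positions are mapped to the range [-1, 1] for training, and a prediction has to be mapped back to pixels before it can be drawn on an image. A minimal sketch of both directions, assuming 224×224 images as produced by `save_entry`; `normalize_xy` and `denormalize_xy` are hypothetical helper names, not part of the notebook:

```python
def normalize_xy(x, y, width=224, height=224):
    """Map pixel coordinates to [-1, 1] (-1 is left/top, +1 is right/bottom)."""
    return 2.0 * (x / width - 0.5), 2.0 * (y / height - 0.5)


def denormalize_xy(nx, ny, width=224, height=224):
    """Inverse mapping: normalized coordinates back to pixel coordinates."""
    return (nx / 2.0 + 0.5) * width, (ny / 2.0 + 0.5) * height


# The image center maps to the origin of the normalized space
print(normalize_xy(112, 112))      # (0.0, 0.0)
# A point in the lower-left quadrant, and the round trip back to pixels
print(normalize_xy(56, 168))       # (-0.5, 0.5)
print(denormalize_xy(-0.5, 0.5))   # (56.0, 168.0)
```

Keeping the targets in [-1, 1] makes the regression output independent of the image resolution, which is convenient if the camera resolution on the car differs from the training images.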

In [None]:
def make(run):
    # Pull the dataset
    artifact = run.use_artifact(run.config.dataset)
    artifact_dir = artifact.download()

    dataset_aug = XYDataset(artifact_dir, train=run.config.train_augs)
    dataset = XYDataset(artifact_dir, train=False)

    train_len = int(len(dataset)*run.config.train_pct)
    test_len = len(dataset)-train_len

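    # Seed both splits identically so the train and test subsets select
    # complementary indices and never overlap
    train, _ = torch.utils.data.random_split(
        dataset_aug, (train_len, test_len), generator=torch.Generator().manual_seed(42))
    _, test = torch.utils.data.random_split(
        dataset, (train_len, test_len), generator=torch.Generator().manual_seed(42))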

    train_loader = make_loader(train, batch_size=run.config.batch_size)
    test_loader = make_loader(test, batch_size=run.config.batch_size)

    # Make the model
    model = torchvision.models.__dict__[run.config.architecture](pretrained=run.config.pretrained)
    model.fc = nn.Linear(model.fc.in_features, 2)
    model = model.to(device)

    # Make the loss and optimizer
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(
        model.parameters(), lr=run.config.learning_rate)
    
    return model, train_loader, test_loader, criterion, optimizer
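
Since `make` takes the training subset from one `random_split` call and the test subset from another, the two subsets are only guaranteed to be disjoint when both calls shuffle the indices identically. A plain-Python sketch of that idea, using `random.Random` as a stand-in for PyTorch's generator; `split_indices` is a hypothetical helper and the dataset size of 10 is made up for illustration:

```python
import random


def split_indices(n, train_len, seed):
    """Shuffle range(n) with a dedicated seeded RNG and cut it in two."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return indices[:train_len], indices[train_len:]


n, train_len = 10, 8

# Same seed for both calls: the second split reproduces the first,
# so its test part is exactly the complement of the first train part.
train_a, _ = split_indices(n, train_len, seed=42)
_, test_a = split_indices(n, train_len, seed=42)
print("same seed overlap:", set(train_a) & set(test_a))

# Different seeds: the two permutations are unrelated, so test indices
# can also appear in the training subset.
train_b, _ = split_indices(n, train_len, seed=1)
_, test_b = split_indices(n, train_len, seed=2)
print("different seed overlap:", set(train_b) & set(test_b))
```

With PyTorch, the analogous approach would be to pass an identically seeded `generator` to each `random_split` call.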

In [None]:
def make_loader(dataset, batch_size):
    loader = torch.utils.data.DataLoader(dataset=dataset,
                                         batch_size=batch_size, 
                                         shuffle=True,
                                         pin_memory=True, num_workers=8)
    return loader

# 👟 Define Training Logic

In [None]:
def train(model, train_loader, test_loader, criterion, optimizer, run):
    # tell wandb to watch what the model gets up to: gradients, weights, and more!
    run.watch(model, criterion, log="all", log_freq=10)

    # Run training and track with wandb
    example_ct = 0  # number of examples seen
    for epoch in tqdm(range(run.config.epochs)):
        # test() switches the model to eval mode, so switch back every epoch
        model.train()
        for images, labels in train_loader:

            loss = train_batch(images, labels, model, optimizer, criterion)
            example_ct += len(images)

            # Report metrics every batch
            loss = float(loss)
            wandb.log({"epoch": epoch, "train_loss": loss}, step=example_ct)

        # evaluate every epoch
        test(model, test_loader, criterion, example_ct)

def train_batch(images, labels, model, optimizer, criterion):
    images, labels = images.to(device), labels.to(device)
    
    # Forward pass ➡
    outputs = model(images)
    loss = criterion(outputs, labels)
    
    # Backward pass ⬅
    optimizer.zero_grad()
    loss.backward()

    # Step with optimizer
    optimizer.step()

    return loss
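
Because `train` logs with `step=example_ct`, the x-axis of the W&B charts counts examples seen rather than batches. A rough sketch of that arithmetic, assuming 937 training examples (a figure inferred from the final `_step` of 56220 in the run summary divided by the 60 configured epochs, not stated anywhere in the notebook); `epoch_stats` is a hypothetical helper:

```python
import math


def epoch_stats(train_examples, batch_size, epochs):
    """Return (batches per epoch, size of the last batch, final step value)."""
    # DataLoader keeps the final partial batch by default (drop_last=False)
    batches_per_epoch = math.ceil(train_examples / batch_size)
    last_batch = train_examples - (batches_per_epoch - 1) * batch_size
    # example_ct grows by len(images) per batch, so it ends at examples * epochs
    final_step = train_examples * epochs
    return batches_per_epoch, last_batch, final_step


print(epoch_stats(train_examples=937, batch_size=64, epochs=60))
# (15, 41, 56220)
```

Counting examples instead of batches keeps runs with different batch sizes comparable on the same x-axis.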

# 🧪 Define Testing Logic

In [None]:
def test(model, test_loader, criterion, example_ct):
    model.eval()

    # Run the model on the test examples
    with torch.no_grad():

        total_loss = 0.0
        total_examples = 0
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            loss = criterion(outputs, labels)
            # weight each batch by its size so the result is a true mean
            total_loss += float(loss) * len(images)
            total_examples += len(images)

        mean_loss = total_loss / total_examples
        print("{} test loss".format(mean_loss))

        wandb.log({"test_loss": mean_loss}, step=example_ct)
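
How the per-batch losses are folded into a single number affects the reported test loss. A minimal sketch with made-up loss values, comparing a pairwise running average of the form `(loss + mean) / 2` with a sample-weighted mean; both function names are hypothetical:

```python
def pairwise_running_average(losses):
    """Repeatedly average the new loss with the previous result.

    Later batches dominate: each step halves the weight of everything before.
    """
    avg = None
    for loss in losses:
        avg = loss if avg is None else (loss + avg) / 2
    return avg


def weighted_mean(losses, batch_sizes):
    """Mean over all examples: each batch loss counts in proportion to its size."""
    total = sum(loss * n for loss, n in zip(losses, batch_sizes))
    return total / sum(batch_sizes)


losses = [1.0, 2.0, 4.0]   # made-up per-batch mean losses
sizes = [64, 64, 32]       # the last batch is smaller

print(pairwise_running_average(losses))  # 2.75
print(weighted_mean(losses, sizes))      # 2.0
```

The two results differ because the pairwise form gives the last batch half of the total weight, regardless of how many examples it contains.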

# 🏃‍♀️ Run training and watch your metrics live on [wandb.ai](https://wandb.ai)!

In [None]:
# Build, train and analyze the model with the pipeline
model = model_pipeline(config)



ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  




0.34096258878707886 test loss
0.11732679605484009 test loss
0.06136243790388107 test loss
0.027065493166446686 test loss
0.019289672374725342 test loss
0.013117067515850067 test loss
0.0119261359795928 test loss
0.008784420788288116 test loss
0.008017823100090027 test loss
0.006177249364554882 test loss
0.00558243365958333 test loss
0.006223681848496199 test loss
0.00597898755222559 test loss
0.004403924569487572 test loss
0.003941160626709461 test loss
0.00601920485496521 test loss
0.005033294670283794 test loss
0.005099016707390547 test loss
0.004731135442852974 test loss
0.004077061545103788 test loss
0.003266919869929552 test loss
0.004597066901624203 test loss
0.00433465838432312 test loss
0.003947819583117962 test loss
0.0055326796136796474 test loss
0.003501846920698881 test loss
0.0029077078215777874 test loss
0.0030030477792024612 test loss
0.0035431724973022938 test loss
0.00375111261382699 test loss
0.003124529030174017 test loss
0.003972472622990608 test loss
0.003397186519


| Metric | Final value |
|---|---|
| epoch | 59.0 |
| train_loss | 0.0025 |
| _runtime | 420.0 |
| _timestamp | 1619624179.0 |
| _step | 56220.0 |
| test_loss | 0.00406 |


0,1
epoch,▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇▇███
train_loss,█▃▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
_runtime,▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
_timestamp,▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
_step,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
test_loss,█▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
