# SuperGradients Quick Tour Notebook

Hi there and welcome to SuperGradients, a free open-source training library for PyTorch-based deep learning models.   Let's have a quick look at the SuperGradients library features. The library lets you train models from any Computer Vision tasks or import pre-trained SOTA models, such as object detection, calssification of images, and semantic segmentation for videos or images use cases.

Whether you are a beginer or an expert it is likely that you already have your own training script, model, loss function implementation etc.
In this notebook we present the modifications needed in order to launch your training so you can benefit from the various tools the SuperGradients package has to offer.


General requirements:
- Python 3.7, 3.8 or 3.9 installed.

To train on nvidia GPUs:
- Nvidia CUDA Toolkit >= 11.2
- CuDNN >= 8.1.x
- Nvidia Driver with CUDA >= 11.2 support (≥460.x)

In [None]:
# SuperGradients installation
# !pip install super-gradients

# To install from source instead of the last release, comment the command above and uncomment the following one.
# !pip install git+https://github.com/Deci-AI/super_gradients.git

> **NOTE:** All code examples presented in the documentation are in PyTorch framework.

## Getting started with training a model

**Integrating Your Loss Function**

The loss function class must be of torch.nn.module._LOSS type. For example, our LabelSmoothingCrossEntropyLoss implementation

In [1]:
import torch.nn as nn
from super_gradients.training.losses.label_smoothing_cross_entropy_loss import cross_entropy

class LabelSmoothingCrossEntropyLoss(nn.CrossEntropyLoss):
    def __init__(self, weight=None, ignore_index=-100, reduction='mean', smooth_eps=None, smooth_dist=None,
                 from_logits=True):
        super(LabelSmoothingCrossEntropyLoss, self).__init__(weight=weight,
                                                             ignore_index=ignore_index, reduction=reduction)
        self.smooth_eps = smooth_eps
        self.smooth_dist = smooth_dist
        self.from_logits = from_logits

    def forward(self, input, target, smooth_dist=None):
        if smooth_dist is None:
            smooth_dist = self.smooth_dist
        loss = cross_entropy(input, target, weight=self.weight, ignore_index=self.ignore_index,
                             reduction=self.reduction, smooth_eps=self.smooth_eps,
                             smooth_dist=smooth_dist, from_logits=self.from_logits)

        return loss

You did not mention an AWS environment.You can set the environment variable ENVIRONMENT_NAME with one of the values: development,staging,production


**Integrating Your Dataset**

In order to integrate your own dataset with our training scheme, we introduce the *dataset_interface* concept, which wraps the *torch dataloaders* used for training.
The specified dataset interface class must inherit from deci_trainer.trainer.datasets.dataset_interfaces.dataset_interface, which is where data augmentation and data loader configurations are defined.
For instance, a dataset interface for Cifar10:


In [2]:
import torchvision.datasets as datasets
import torchvision.transforms as transforms
from super_gradients.training import utils as core_utils
from super_gradients.training.datasets.dataset_interfaces import DatasetInterface


class UserDataset(DatasetInterface):

    def __init__(self, name="cifar10", dataset_params={}):
        super(UserDataset, self).__init__(dataset_params)
        self.dataset_name = name
        self.lib_dataset_params = {'mean': (0.4914, 0.4822, 0.4465), 'std': (0.2023, 0.1994, 0.2010)}

        crop_size = core_utils.get_param(self.dataset_params, 'crop_size', default_val=32)

        transform_train = transforms.Compose([
            transforms.RandomCrop(crop_size, padding=4),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            transforms.Normalize(self.lib_dataset_params['mean'], self.lib_dataset_params['std']),
        ])

        transform_test = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize(self.lib_dataset_params['mean'], self.lib_dataset_params['std']),
        ])

        self.trainset = datasets.CIFAR10(root=self.dataset_params.dataset_dir, train=True, download=True,
                                         transform=transform_train)

        self.valset = datasets.CIFAR10(root=self.dataset_params.dataset_dir, train=False, download=True,
                                        transform=transform_test)


Required parameters can be passed using the python dataset_params argument. When implementing a dataset interface, the *trainset* and *valset* attributes are required and must be initiated with a torch.utils.data.Dataset type.
 These fields will cause the DeciMode instance to use them accordingly, such as during training, validation, and so on.

**Integrating Your Network Architecture**

This is rather straightforward- the only requirement is that the model must be of torch.nn.Module type. In our case, a simple Lenet implementation (taken from https://github.com/icpm/pytorch-cifar10/blob/master/models/LeNet.py).

In [3]:
import torch.nn as nn
import torch.nn.functional as func


class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, kernel_size=5)
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)
        self.fc1 = nn.Linear(16*5*5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = func.relu(self.conv1(x))
        x = func.max_pool2d(x, 2)
        x = func.relu(self.conv2(x))
        x = func.max_pool2d(x, 2)
        x = x.view(x.size(0), -1)
        x = func.relu(self.fc1(x))
        x = func.relu(self.fc2(x))
        x = self.fc3(x)
        return x

**Putting It All Together**

We instantiate an SgModel and a UserDatasetInterface, then call *connect_dataset_interface* which will initialize the dataloaders and pass additional dataset parameters to the SgModel instance.

In [4]:
from super_gradients.training import SgModel

sg_model = SgModel(experiment_name='LeNet_cifar10_example')
# sg_model.build_model(architecture=network)
dataset_params = {"batch_size": 256}
dataset = UserDataset(dataset_params)
sg_model.connect_dataset_interface(dataset)

Files already downloaded and verified
Files already downloaded and verified


Now, we pass a LeNet instance we defined above to the SgModel:

In [5]:
network = LeNet()
sg_model.build_model(network)

Next, we define metrics in order to valuate our model.
The metrics objects to be logged during training must be of torchmetrics.Metric type. For more information on how to use torchmetric.Metric objects and implement your own metrics. see https://torchmetrics.readthedocs.io/en/latest/pages/overview.html.
During training, the metric's update is called with the model's raw outputs and raw targets. Therefore, any processing of the two must be taken into account and applied in the update.

For most of the familiar cases, an existing torchmetric.Metric implementation exists in super_gradients.training.metrics. Here we simply use the SuperGradients Top1 and Top5 accuracy metrics in order to define the metrics for evaluation on the train set and the validation set.

In [6]:
from super_gradients.training.metrics import Accuracy, Top5

train_metrics_list = [Accuracy(), Top5()]
valid_metrics_list = [Accuracy(), Top5()]

Finally, we can define the training parameters, and simply call *train*:

In [7]:
train_params = {"max_epochs": 250,
                "lr_updates": [100, 150, 200],
                "lr_decay_factor": 0.1,
                "lr_mode": "step",
                "lr_warmup_epochs": 0,
                "initial_lr": 0.1,
                "loss": LabelSmoothingCrossEntropyLoss(),
                "criterion_params": {},
                "optimizer": "SGD",
                "optimizer_params": {"weight_decay": 1e-4, "momentum": 0.9},
                "launch_tensorboard": False,
                "train_metrics_list": train_metrics_list,
                "valid_metrics_list": valid_metrics_list,
                "loss_logging_items_names": ["Loss"],
                "metric_to_watch": "Accuracy",
                "greater_metric_to_watch_is_better": True}

sg_model.train(train_params)

"events.out.tfevents.1637763896.h-MacBook-Pro-sl-Shay.local.47404.0" will not be deleted
"events.out.tfevents.1637763957.h-MacBook-Pro-sl-Shay.local.47404.1" will not be deleted
sg_model -INFO- Started training for 250 epochs (0/249)



Train epoch 0: 100%|██████████| 782/782 [00:12<00:00, 63.50it/s, Accuracy=0.121, Loss=2.28, Top5=0.558, gpu_mem=0]
Validation epoch 0: 100%|██████████| 50/50 [00:01<00:00, 48.59it/s]

sg_model -INFO- Best checkpoint overriden: validation Accuracy: 0.10000000149011612



Train epoch 1: 100%|██████████| 782/782 [00:10<00:00, 73.66it/s, Accuracy=0.0982, Loss=2.31, Top5=0.498, gpu_mem=0]
Validation epoch 1: 100%|██████████| 50/50 [00:01<00:00, 48.20it/s]
Train epoch 2: 100%|██████████| 782/782 [00:10<00:00, 77.19it/s, Accuracy=0.0995, Loss=2.31, Top5=0.503, gpu_mem=0]
Validation epoch 2: 100%|██████████| 50/50 [00:00<00:00, 51.24it/s]
Train epoch 3: 100%|██████████| 782/782 [00:10<00:00, 73.51it/s, Accuracy=0.0997, Loss=2.31, Top5=0.502, gpu_mem=0]
Validation epoch 3: 100%|██████████| 50/50 [00:01<00:00, 49.00it/s]
Train epoch 4: 100%|██████████| 782/782 [00:13<00:00, 59.94it/s, Accuracy=0.101, Loss=2.31, Top5=0.497, gpu_mem=0] 
Validation epoch 4: 100%|██████████| 50/50 [00:01<00:00, 37.44it/s]
Train epoch 5: 100%|██████████| 782/782 [00:11<00:00, 66.17it/s, Accuracy=0.0981, Loss=2.31, Top5=0.496, gpu_mem=0]
Validation epoch 5: 100%|██████████| 50/50 [00:00<00:00, 50.99it/s]
Train epoch 6:  44%|████▍     | 346/782 [00:05<00:06, 68.75it/s, Accuracy=0.1, 

sg_model -INFO- 
[MODEL TRAINING EXECUTION HAS BEEN INTERRUPTED]... Please wait until SOFT-TERMINATION process finishes and saves all of the Model Checkpoints and log files before terminating...
sg_model -INFO- For HARD Termination - Stop the process again





> **Training Parameter Notes:**
\
loss_logging_items_names parameter – Refers to the single item returned by our loss function described above.
*metric_to_watch* – Is the model’s metric that determines the checkpoint to be saved. In our example, this parameter is set to Accuracy, and can be set to any of the following:
A metric name (str) of one of the metric objects from the *valid_metrics_list* or "Loss" (which refers to the validation loss).
*greater_metric_to_watch_is_better* flag – Determines when to save a model's checkpoint according to the value of the metric_to_watch.

##TODO: How to load a pre-trained SOTA model and perform transfer learning

