# Model Regression Free Training

---

This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. 

![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-west-2/introduction_to_amazon_algorithms|jumpstart_regression_free_training|Amazon_JumpStart_Regression_Free_Training.ipynb)

---

## Introduction

Reducing inconsistencies in the behavior of different versions of an AI system can be as important in practice as reducing its overall error. In image classification, sample-wise inconsistencies appear as “negative flips”: a new model incorrectly predicts the output for a test sample that the old (reference) model classified correctly.

In [1] and [2], the authors show that models trained on the same data but with different initial conditions, data augmentations, and hyperparameters can reach similar error rates while making their errors on different samples. Some samples are classified correctly by the old model but incorrectly by the new one. We call such samples negative flips, and their fraction of the total number of test samples the Negative Flip Rate (NFR).
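As a small illustration of this definition, the following sketch counts negative and positive flips on a hypothetical five-sample test set. The helper names `flip_rates` and `accuracy` and the toy predictions are assumptions made for this example; they are not part of the training script. The two toy models have the same accuracy, yet each makes an error the other does not.

```python
def flip_rates(old_pred, new_pred, labels):
    """Return (NFR, PFR) of the new predictions relative to the old ones."""
    n = len(labels)
    # Negative flip: old model correct, new model wrong
    neg = sum(o == y and p != y for o, p, y in zip(old_pred, new_pred, labels))
    # Positive flip: old model wrong, new model correct
    pos = sum(o != y and p == y for o, p, y in zip(old_pred, new_pred, labels))
    return neg / n, pos / n


def accuracy(pred, labels):
    """Top-1 accuracy of a list of predicted labels."""
    return sum(p == y for p, y in zip(pred, labels)) / len(labels)


# Hypothetical toy data (not produced by any model in this notebook)
labels = [0, 1, 2, 1, 0]
old_pred = [0, 1, 2, 0, 1]  # correct on samples 0, 1, 2
new_pred = [0, 2, 2, 1, 1]  # correct on samples 0, 2, 3

nfr, pfr = flip_rates(old_pred, new_pred, labels)
print("NFR:", nfr, "PFR:", pfr)
print("accuracy old:", accuracy(old_pred, labels), "new:", accuracy(new_pred, labels))
```

Because every sample is either unchanged, a negative flip, or a positive flip, the accuracy difference between the new and old model always equals PFR minus NFR. Two models can therefore have identical accuracy and still have a nonzero NFR.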

To reduce the NFR between two models, the authors of [1] propose Focal Distillation (FD), a simple approach that enforces congruence with the reference model by giving more weight to samples that the reference model classified correctly. The authors of [2] propose the Logit Difference Inhibition (LDI) loss, which penalizes changes in the logits between the new and old models without forcing them to coincide as in ordinary distillation.

In this notebook, we show how to train two models from a SageMaker notebook instance and measure regression metrics, such as the negative flip rate, between their outputs. We then show how to apply the FD and LDI losses when training a new model to reduce its NFR against the old model.

## Set up SageMaker

In [None]:
import os
import numpy as np
import torchvision, torch
import sagemaker, boto3, json, logging

from sagemaker import get_execution_role
from sagemaker.local import LocalSession
from time import gmtime, strftime

logging.disable(logging.CRITICAL)
s3 = boto3.client("s3")

In [None]:
# Use remote mode
sagemaker_region = boto3.Session().region_name
sagemaker_session = sagemaker.Session()
bucket_name = sagemaker_session.default_bucket()

role = sagemaker.get_execution_role()
instance_type = "ml.p3.2xlarge"

## Set up SageMaker training env

In [None]:
from sagemaker.pytorch import PyTorch

In [None]:
# git configuration to download regression-free training script
git_config = {
    "repo": "https://github.com/amazon-science/regression-constraint-model-upgrade.git",
    "branch": "main",
}

In [None]:
base_job_name = f"jumpstart-example-regression-free-{strftime('%Y-%m-%d-%H-%M-%S', gmtime())}"

cifar10_estimator = PyTorch(
    base_job_name=base_job_name,
    git_config=git_config,
    entry_point="train.py",
    source_dir="ConstrainedUpgrade/",
    role=role,
    instance_type=instance_type,
    instance_count=1,
    framework_version="1.7.1",
    py_version="py3",
    hyperparameters={
        "use_cifar": True,
        "epochs": 30,
        "arch": "resnet18",
        "batch-size": 128,
        "lr": 0.1,
        "lr_step": 20,
        "bucket_name": bucket_name,
        "seed": 42,
    },
)

In [None]:
cifar10_estimator.fit()

In [None]:
# train another resnet-18 with a different random seed

base_job_name = f"jumpstart-example-regression-free-{strftime('%Y-%m-%d-%H-%M-%S', gmtime())}"

cifar10_estimator = PyTorch(
    base_job_name=base_job_name,
    git_config=git_config,
    entry_point="train.py",
    source_dir="ConstrainedUpgrade/",
    role=role,
    instance_type=instance_type,
    instance_count=1,
    framework_version="1.7.1",
    py_version="py3",
    hyperparameters={
        "use_cifar": True,
        "epochs": 30,
        "arch": "resnet18",
        "batch-size": 128,
        "lr": 0.1,
        "lr_step": 20,
        "bucket_name": bucket_name,
        "seed": 114514,
    },
)

In [None]:
cifar10_estimator.fit()

## Pull the model prediction from S3 for regression testing

In [None]:
# save model outputs from S3 to local machine

with open("model_resnet18_seed_42.result", "wb") as f:
    s3.download_fileobj(bucket_name, "model_resnet18_seed_42.result", f)

with open("model_resnet18_seed_114514.result", "wb") as f:
    s3.download_fileobj(bucket_name, "model_resnet18_seed_114514.result", f)

In [None]:
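# Define regression analyzer


class ModelAnalyzer:
    """Holds one model's predictions and ground-truth labels and computes regression metrics."""

    def __init__(self, model_info):
        # Accept either a path to a saved result file or an already-loaded dict
        if isinstance(model_info, str):
            model_info = torch.load(model_info, map_location=torch.device("cpu"))
        self.pred = model_info["pred"].cpu()
        self.gt = model_info["gt"].cpu()

    def NFR(self, old_model):  # Negative Flip Rate: old model correct, this model wrong
        return float(((old_model.pred == self.gt) & (self.pred != self.gt)).sum()) / len(self.gt)

    def PFR(self, old_model):  # Positive Flip Rate: old model wrong, this model correct
        return float(((old_model.pred != self.gt) & (self.pred == self.gt)).sum()) / len(self.gt)

    def Acc(self):  # Top-1 Accuracy, returned as a Python float
        return float((self.pred == self.gt).sum()) / len(self.gt)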

In [None]:
# regression rate testing
s42_result = ModelAnalyzer(
    torch.load("model_resnet18_seed_42.result", map_location=torch.device("cpu"))
)
s114514_result = ModelAnalyzer(
    torch.load("model_resnet18_seed_114514.result", map_location=torch.device("cpu"))
)

print("NFR between 2 models is {}".format(s42_result.NFR(s114514_result)))

# FD/LDI regression-free training with SageMaker

After training a few models with different random seeds, we can use them as references to guide regression-free training with different losses.

## FD training with trained resnet-18

Focal Distillation (FD) [1] enforces congruence with the reference model by giving more weight to samples that it classified correctly. It is defined as follows:

$L_{\text{focal}} = - \sum_{i=1}^{N} \left[ \alpha + \beta \cdot \mathbf{1}\left(\hat{y}_{\text{old}}(x_i) = y_i\right) \right] \mathcal{D}\left(\phi_{\text{new}}(x_i), \phi_{\text{old}}(x_i)\right)$,

where $\hat{y}_{\text{old}}(x_i)$ is the label predicted for sample $x_i$ by the old model $\phi_{\text{old}}$, and $\mathcal{D}$ is a distance metric (we use KL divergence here). The filter applies a base weight $\alpha$ to all samples in the training set and, through the indicator function $\mathbf{1}$, an additional weight $\beta$ to the samples that the old model predicted correctly. When $\alpha = 1$ and $\beta = 0$, focal distillation reduces to ordinary distillation. When $\alpha = 0$ and $\beta > 0$, the distillation objective is applied only to the training samples that the old model predicted correctly.
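To make the weighting concrete, here is a minimal, framework-free sketch of the weighted sum of distances that appears in the formula above. It is not the implementation used by `train.py`. The function names, the direction of the KL divergence (old distribution against new), and the use of a temperature-scaled softmax are assumptions made for illustration. The default values of `alpha` and `beta` only echo the `filter-base` and `filter-scale` hyperparameters, and whether those map exactly onto $\alpha$ and $\beta$ in the training script is likewise an assumption.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over one logit vector."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def focal_distillation_weights(old_logits, labels, alpha, beta):
    """Per-sample weight: alpha, plus beta if the old model was correct."""
    weights = []
    for logits, y in zip(old_logits, labels):
        old_pred = max(range(len(logits)), key=lambda k: logits[k])
        weights.append(alpha + beta * (old_pred == y))
    return weights


def focal_distillation_term(new_logits, old_logits, labels,
                            alpha=1.0, beta=5.0, temperature=1.0):
    """Weighted sum of KL distances between old and new predictions."""
    weights = focal_distillation_weights(old_logits, labels, alpha, beta)
    total = 0.0
    for w, new, old in zip(weights, new_logits, old_logits):
        # Assumption: D is KL(old || new) on temperature-scaled softmax outputs
        total += w * kl_divergence(softmax(old, temperature), softmax(new, temperature))
    return total


# Hypothetical logits for two samples and three classes
old_logits = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
new_logits = [[0.0, 2.0, 0.0], [0.0, 1.0, 0.0]]
labels = [0, 2]  # the old model is correct on sample 0 only

print(focal_distillation_weights(old_logits, labels, alpha=1.0, beta=5.0))
print(focal_distillation_term(new_logits, old_logits, labels))
```

In this toy case the new model changes its prediction only on the sample the old model got right, so the focal term is six times the plain distillation term obtained with `alpha=1.0, beta=0.0`.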

In [None]:
base_job_name = f"jumpstart-example-regression-free-FD-{strftime('%Y-%m-%d-%H-%M-%S', gmtime())}"

cifar10_FD_estimator = PyTorch(
    base_job_name=base_job_name,
    git_config=git_config,
    entry_point="train.py",
    source_dir="ConstrainedUpgrade/",
    role=role,
    instance_type=instance_type,
    instance_count=1,
    framework_version="1.7.1",
    py_version="py3",
    hyperparameters={
        "use_cifar": True,
        "gpu": 0,
        "epochs": 30,
        "arch": "resnet18",
        "batch-size": 128,
        "lr": 0.01,
        "lr_step": 20,
        "bucket_name": bucket_name,
        "seed": 1,
        "kd_model_num_classes": 10,
        "kd_model_arch": "resnet18",
        "kd_model_path": "best_model_resnet18_seed_114514.pth.tar",
        "load_from_s3": True,
        "kd_loss_weight": 1,
        "kd_alpha": 0.9,
        "kd_loss_mode": "kl",
        "kd_temperature": 100,
        "kd_filter": "old_correct",
        "filter-base": 1,
        "filter-scale": 5,
        "desc": "FD",
    },
)

In [None]:
cifar10_FD_estimator.fit()

In [None]:
with open("model_resnet18_seed_1_loss_kl_FD.result", "wb") as f:
    s3.download_fileobj(bucket_name, "model_resnet18_seed_1_loss_kl_FD.result", f)

s1_fd_result = ModelAnalyzer(
    torch.load("model_resnet18_seed_1_loss_kl_FD.result", map_location=torch.device("cpu"))
)

print("NFR between 2 models is {}".format(s1_fd_result.NFR(s114514_result)))

In [None]:
print("Acc of original model is {}".format(s114514_result.Acc()))
print("Acc of FD model is {}".format(s1_fd_result.Acc()))

## LDI training with trained resnet-18

Another loss term that can be used for regression-free training is the Logit Difference Inhibition (LDI) loss [2], which penalizes changes in the logits between the new and old models without forcing them to coincide as in ordinary distillation.

LDI loss is defined as $L_{\text{LDI}} = - \sum_{i=1}^{N} \max\left(\|\phi_{\text{new}}(x_i) - \phi_{\text{old}}(x_i)\|^p - \xi, \ 0\right)$,

where $\xi$ is a truncation threshold such that differences below $\xi$ are tolerated, and $p$ is normally set to 2.
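The truncation can be illustrated with a short, framework-free sketch that evaluates the term inside the summation for each sample and adds the results. It is not the implementation used by `train.py`. The function name `ldi_term` and the choice of the Euclidean norm over each logit vector are assumptions made for illustration. The defaults `xi=0.5` and `p=2` only echo the `li_margin` and `li_p` hyperparameters used in the training cell, and that correspondence is also an assumption.

```python
import math


def ldi_term(new_logits, old_logits, xi=0.5, p=2):
    """Sum over samples of max(||new - old||^p - xi, 0)."""
    total = 0.0
    for new, old in zip(new_logits, old_logits):
        # Assumption: ||.|| is the Euclidean norm of the logit difference
        norm = math.sqrt(sum((a - b) ** 2 for a, b in zip(new, old)))
        # Differences whose p-th power stays below xi contribute nothing
        total += max(norm ** p - xi, 0.0)
    return total


# Hypothetical logits for two samples and two classes
old_logits = [[1.0, 1.5], [1.0, 0.0]]
new_logits = [[1.0, 2.0], [3.0, 0.0]]

# Sample 0: small logit change, tolerated by the threshold
print(ldi_term(new_logits[:1], old_logits[:1]))
# Sample 1: large logit change, penalized
print(ldi_term(new_logits[1:], old_logits[1:]))
```

Unlike ordinary distillation, the first sample incurs no penalty even though its logits differ, which is what leaves the new model some room to deviate from the old one.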

In [None]:
base_job_name = f"jumpstart-example-regression-free-LDI-{strftime('%Y-%m-%d-%H-%M-%S', gmtime())}"

cifar10_LDI_estimator = PyTorch(
    base_job_name=base_job_name,
    git_config=git_config,
    entry_point="train.py",
    source_dir="ConstrainedUpgrade/",
    role=role,
    instance_type=instance_type,
    instance_count=1,
    framework_version="1.7.1",
    py_version="py3",
    hyperparameters={
        "use_cifar": True,
        "gpu": 0,
        "epochs": 30,
        "arch": "resnet18",
        "batch-size": 128,
        "lr": 0.1,
        "lr_step": 20,
        "bucket_name": bucket_name,
        "seed": 1,
        "kd_model_num_classes": 10,
        "kd_model_arch": "resnet18",
        "kd_model_path": "best_model_resnet18_seed_114514.pth.tar",
        "load_from_s3": True,
        "kd_loss_weight": 1,
        "kd_alpha": 0.5,
        "kd_loss_mode": "li",
        "kd_filter": "all_pass",
        "li_p": 2,
        "li_margin": 0.5,
        "desc": "LDI",
    },
)

In [None]:
cifar10_LDI_estimator.fit()

In [None]:
with open("model_resnet18_seed_1_loss_li_LDI.result", "wb") as f:
    s3.download_fileobj(bucket_name, "model_resnet18_seed_1_loss_li_LDI.result", f)

s1_ldi_result = ModelAnalyzer(
    torch.load("model_resnet18_seed_1_loss_li_LDI.result", map_location=torch.device("cpu"))
)

print("NFR between 2 models is {}".format(s1_ldi_result.NFR(s114514_result)))

In [None]:
print("Acc of original model is {}".format(s114514_result.Acc()))
print("Acc of LDI model is {}".format(s1_ldi_result.Acc()))

## Reference

[1] Yan, Sijie, et al. "Positive-congruent training: Towards regression-free model updates." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.

[2] Zhao, Yue, et al. "ELODI: Ensemble Logit Difference Inhibition for Positive-Congruent Training." arXiv preprint arXiv:2205.06265 (2022).

## Notebook CI Test Results

This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.

![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-east-1/introduction_to_amazon_algorithms|jumpstart_regression_free_training|Amazon_JumpStart_Regression_Free_Training.ipynb)

![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-east-2/introduction_to_amazon_algorithms|jumpstart_regression_free_training|Amazon_JumpStart_Regression_Free_Training.ipynb)

![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/us-west-1/introduction_to_amazon_algorithms|jumpstart_regression_free_training|Amazon_JumpStart_Regression_Free_Training.ipynb)

![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ca-central-1/introduction_to_amazon_algorithms|jumpstart_regression_free_training|Amazon_JumpStart_Regression_Free_Training.ipynb)

![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/sa-east-1/introduction_to_amazon_algorithms|jumpstart_regression_free_training|Amazon_JumpStart_Regression_Free_Training.ipynb)

![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-1/introduction_to_amazon_algorithms|jumpstart_regression_free_training|Amazon_JumpStart_Regression_Free_Training.ipynb)

![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-2/introduction_to_amazon_algorithms|jumpstart_regression_free_training|Amazon_JumpStart_Regression_Free_Training.ipynb)

![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-west-3/introduction_to_amazon_algorithms|jumpstart_regression_free_training|Amazon_JumpStart_Regression_Free_Training.ipynb)

![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-central-1/introduction_to_amazon_algorithms|jumpstart_regression_free_training|Amazon_JumpStart_Regression_Free_Training.ipynb)

![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/eu-north-1/introduction_to_amazon_algorithms|jumpstart_regression_free_training|Amazon_JumpStart_Regression_Free_Training.ipynb)

![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-southeast-1/introduction_to_amazon_algorithms|jumpstart_regression_free_training|Amazon_JumpStart_Regression_Free_Training.ipynb)

![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-southeast-2/introduction_to_amazon_algorithms|jumpstart_regression_free_training|Amazon_JumpStart_Regression_Free_Training.ipynb)

![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-northeast-1/introduction_to_amazon_algorithms|jumpstart_regression_free_training|Amazon_JumpStart_Regression_Free_Training.ipynb)

![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-northeast-2/introduction_to_amazon_algorithms|jumpstart_regression_free_training|Amazon_JumpStart_Regression_Free_Training.ipynb)

![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://prod.us-west-2.tcx-beacon.docs.aws.dev/sagemaker-nb/ap-south-1/introduction_to_amazon_algorithms|jumpstart_regression_free_training|Amazon_JumpStart_Regression_Free_Training.ipynb)
