# Update

Latest Version : Updated the LR scheduler with a custom Step LR schedule similar to what Chris Deotte used in Melanoma Competition. The intuition being the noise present in the labels

Next Update : A better cross validation to simulate the test data

# About this Notebook

Ragnar [here](https://www.kaggle.com/ragnar123/shopee-efficientnetb3-arcmarginproduct) has showed us how to train efficientb3 with arcface and CrossEntropy Loss

With this notebook here you can train any efficientnet b0-b7 with following three Metric Learning techniques above the Cross-Entropy Loss:
* ArcFace : Most Popular in this competition 
* CosFace : https://arxiv.org/abs/1801.09414
* Adacos : https://arxiv.org/abs/1905.00292

Don't worry if these terms feel alien to you , following are few resources to understand them:
* https://www.kaggle.com/c/shopee-product-matching/discussion/226279 --- Beautiful Explanantion by Chris Deotte
* https://www.kaggle.com/slawekbiel/arcface-explained --  Beautiful kernel explaining ArcFace


The training strategy used is as follows:
* Make 5-folds stratified on label groups
* Use one of the metric learning losses to predict the label  groups using cross entropy loss

This notebook basically just converts ragnar's notebook into your beloved Pytorch . 
<font color='red'>Inference notebook for this training notebook can be found </font> [here](https://www.kaggle.com/tanulsingh077/metric-learning-image-tfidf-inference?scriptVersionId=57597192)

<br>I have quick run this notebook because I have exhausted my gpu quota for this week . To see how this pipeline runs you can fork and run it .Hope this helps

### Techniques planned to be added

* Multiface posted by Mobassir [here](https://www.kaggle.com/c/shopee-product-matching/discussion/227383)

You can also have a look at Siamese Type Training Example [here](https://www.kaggle.com/tanulsingh077/siamese-style-training-efficient-net-b0-on-tpu-s)


# Important Note

Please note that you can easily plug and play with this notebook, just by changing the values in the configuration. I have yet to analyze the data , but a lot of people have reported in noise in the dataset and thus using good augmentations like mixup,cutmix might help with training . Also losses like label smoothing , Focal loss,etc can be used. 

All my ideas I will keep on appending in this kernel and all the results will be maintained in [this](https://www.kaggle.com/c/shopee-product-matching/discussion/228148) discussion thread

In [1]:
import sys

package_paths = [
    '../input/pytorch-image-models/pytorch-image-models-master', #'../input/efficientnet-pytorch-07/efficientnet_pytorch-0.7.0'
    '../input/image-fmix/FMix-master'
]
for pth in package_paths:
    sys.path.append(pth)
# sys.path.append('../input/pytorch-image-models/pytorch-image-models-master')

In [2]:
# Preliminaries
from tqdm import tqdm
import math
import random
import os
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Visuals and CV2
import cv2

# albumentations for augs
import albumentations
from albumentations.pytorch.transforms import ToTensorV2
from fmix import sample_mask, make_low_freq_image, binarise_mask

#torch
import torch
import timm
import torch
import torch.nn as nn
from torch.nn import Parameter
from torch.nn import functional as F
from torch.utils.data import Dataset,DataLoader
from torch.optim import Adam
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts, CosineAnnealingLR, ReduceLROnPlateau
from torch.optim import Adam, lr_scheduler
from torch.optim.lr_scheduler import _LRScheduler
from sklearn.metrics import accuracy_score

# Configuration

In [3]:
# DIM = (512,512)
image_size = 512

NUM_WORKERS = 4
TRAIN_BATCH_SIZE = 16
VALID_BATCH_SIZE = 16
EPOCHS = 30
SEED = 42
#LR = 3e-4

device = torch.device('cuda')


################################################# MODEL ####################################################################

model_name = 'efficientnet_b3' #efficientnet_b0-b7

################################################ Metric Loss and its params #######################################################
loss_module = 'arcface' #'cosface' #'adacos'
s = 30.0
m = 0.5 
ls_eps = 0.0
easy_margin = False


####################################### Scheduler and its params ############################################################
# SCHEDULER = 'CosineAnnealingWarmRestarts' #'CosineAnnealingLR'
# factor=0.2 # ReduceLROnPlateau
# patience=4 # ReduceLROnPlateau
# eps=1e-6 # ReduceLROnPlateau
# T_max=10 # CosineAnnealingLR
# T_0=4 # CosineAnnealingWarmRestarts
# min_lr=1e-6


scheduler_params = {
        "lr_start": 1e-5,
        "lr_max": 1e-5 * TRAIN_BATCH_SIZE,
        "lr_min": 1e-6,
        "lr_ramp_ep": 5,
        "lr_sus_ep": 0,
        "lr_decay": 0.8,
    }

############################################## Model Params ###############################################################
model_params = {
    'n_classes':11014,
    'model_name':'efficientnet_b3',
    'use_fc':False,
    'fc_dim':512,
    'dropout':0.0,
    'loss_module':loss_module,
    's':30.0,
    'margin':0.50,
    'ls_eps':0.0,
    'theta_zero':0.785,
    'pretrained':True
}
############################################## Folds ###############################################################
fold_id = 0


# Utils

In [4]:
def seed_torch(seed=42):
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    
seed_torch(SEED)

In [5]:
class AverageMeter(object):
    def __init__(self):
        self.reset()
    
    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0
    
    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count

In [6]:
def fetch_scheduler(optimizer):
        if SCHEDULER =='ReduceLROnPlateau':
            scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=factor, patience=patience, verbose=True, eps=eps)
        elif SCHEDULER =='CosineAnnealingLR':
            scheduler = CosineAnnealingLR(optimizer, T_max=T_max, eta_min=min_lr, last_epoch=-1)
        elif SCHEDULER =='CosineAnnealingWarmRestarts':
            scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=T_0, T_mult=1, eta_min=min_lr, last_epoch=-1)
        return scheduler

In [7]:
def fetch_loss():
    loss = nn.CrossEntropyLoss()
    return loss

# Augmentations

In [8]:
# def get_train_transforms():
#     return albumentations.Compose(
#         [   
#             albumentations.Resize(DIM[0],DIM[1],always_apply=True),
#             albumentations.HorizontalFlip(p=0.5),
#             albumentations.VerticalFlip(p=0.5),
#             albumentations.Rotate(limit=120, p=0.8),
#             albumentations.RandomBrightness(limit=(0.09, 0.6), p=0.5),
#             #albumentations.Cutout(num_holes=8, max_h_size=8, max_w_size=8, fill_value=0, always_apply=False, p=0.5),
#             #albumentations.ShiftScaleRotate(
#               #  shift_limit=0.25, scale_limit=0.1, rotate_limit=0
#             #),
#             albumentations.Normalize(),
#             ToTensorV2(p=1.0),
#         ]
#     )

# def get_valid_transforms():

#     return albumentations.Compose(
#         [
#             albumentations.Resize(DIM[0],DIM[1],always_apply=True),
#             albumentations.Normalize(),
#         ToTensorV2(p=1.0)
#         ]
#     )
# def transforms_train():
#     return albumentations.Compose([
#        albumentations.RandomResizedCrop(CFG['image_size'], CFG['image_size'], scale=(0.9, 1), p=1), 
#        albumentations.HorizontalFlip(p=0.5),
#        albumentations.ShiftScaleRotate(p=0.5),
#        albumentations.HueSaturationValue(hue_shift_limit=0.2, sat_shift_limit=0.2, val_shift_limit=0.2, p=0.7),
#        albumentations.RandomBrightnessContrast(brightness_limit=(-0.2,0.2), contrast_limit=(-0.2, 0.2), p=0.7),
#        albumentations.CLAHE(clip_limit=(1,4), p=0.5),
#        albumentations.OneOf([
#            albumentations.OpticalDistortion(distort_limit=1.0),
# #            albumentations.GridDistortion(num_steps=5, distort_limit=1.),
# #            albumentations.ElasticTransform(alpha=3),
#        ], p=0.2),
# #        albumentations.OneOf([
# #            albumentations.GaussNoise(var_limit=[10, 50]),
# #            albumentations.GaussianBlur(),
# #            albumentations.MotionBlur(),
# #            albumentations.MedianBlur(),
# #        ], p=0.2),
#       albumentations.Resize(CFG['image_size'], CFG['image_size']),
# #       albumentations.OneOf([
# #           JpegCompression(),
# #           Downscale(scale_min=0.1, scale_max=0.15),
# #       ], p=0.2),
#       IAAPiecewiseAffine(p=0.2),
#       IAASharpen(p=0.2),
# #       albumentations.Cutout(p=0.5),
#       albumentations.Cutout(max_h_size=int(CFG['image_size'] * 0.1), max_w_size=int(CFG['image_size'] * 0.1), num_holes=5, p=0.5),
#       albumentations.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], max_pixel_value=255.0, p=1.0),
#     ])

def transforms_train():
    return albumentations.Compose([
        albumentations.Resize(image_size, image_size),
        albumentations.HorizontalFlip(p=0.5),
        albumentations.RandomBrightnessContrast(p=0.5, brightness_limit=(-0.2, 0.2), contrast_limit=(-0.2, 0.2)),
        albumentations.HueSaturationValue(p=0.5, hue_shift_limit=0.2, sat_shift_limit=0.2, val_shift_limit=0.2),
        albumentations.ShiftScaleRotate(p=0.5, shift_limit=0.0625, scale_limit=0.2, rotate_limit=20),
        albumentations.CoarseDropout(p=0.5),
        albumentations.Normalize()
])

# transforms_train = albumentations.Compose([
#     albumentations.Resize(image_size, image_size),
#     albumentations.HorizontalFlip(p=0.5),
#     albumentations.RandomBrightnessContrast(p=0.5, brightness_limit=(-0.2, 0.2), contrast_limit=(-0.2, 0.2)),
#     albumentations.HueSaturationValue(p=0.5, hue_shift_limit=0.2, sat_shift_limit=0.2, val_shift_limit=0.2),
#     albumentations.ShiftScaleRotate(p=0.5, shift_limit=0.0625, scale_limit=0.2, rotate_limit=20),
#     albumentations.CoarseDropout(p=0.5),
#     albumentations.Normalize()
# ])

# transforms_valid = albumentations.Compose([
#     albumentations.Resize(image_size, image_size),
#     albumentations.Normalize()
# ])
def transforms_valid():
    return albumentations.Compose([
        albumentations.Resize(image_size, image_size),
        albumentations.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], max_pixel_value=255.0, p=1.0),
    ])

# Dataset

In [9]:
def rand_bbox(size, lam):
    W = size[0]
    H = size[1]
    cut_rat = np.sqrt(1. - lam)
    cut_w = np.int(W * cut_rat)
    cut_h = np.int(H * cut_rat)

    # uniform
    cx = np.random.randint(W)
    cy = np.random.randint(H)

    bbx1 = np.clip(cx - cut_w // 2, 0, W)
    bby1 = np.clip(cy - cut_h // 2, 0, H)
    bbx2 = np.clip(cx + cut_w // 2, 0, W)
    bby2 = np.clip(cy + cut_h // 2, 0, H)
    return bbx1, bby1, bbx2, bby2

class ShopeeDataset(Dataset):
    def __init__(self, df, mode, transform=None, do_fmix=False, 
                 fmix_params={'alpha': 1.,
                              'decay_power': 3.,
                              'shape': (image_size, image_size),
                              'max_soft': True,
                              'reformulate': False
                },do_cutmix=False,
                 cutmix_params={
                    'alpha': 1,
                }):
        
        self.df = df.reset_index(drop=True)
        self.mode = mode
        self.transform = transform
        self.do_fmix = do_fmix
        self.fmix_params = fmix_params
        self.do_cutmix = do_cutmix
        self.cutmix_params = cutmix_params
        self.labels = df.group.values
        
    def __len__(self):
        return len(self.df)
    
    def __getitem__(self, index):
        label = self.labels[index]
        
        row = self.df.loc[index]
        img = cv2.imread(row.file_path)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        
        
        if self.transform is not None:
            img = self.transform(image=img)['image']
            
        if self.do_fmix and np.random.uniform(0., 1., size=1)[0] > 0.5:
#             with torch.no_grad:
            lam = np.clip(np.random.beta(self.fmix_params['alpha'], self.fmix_params['alpha']), 0.6, 0.7)
            #Make mask, et mean/ std
            mask = make_low_freq_image(self.fmix_params['decay_power'], self.fmix_params['shape'])
            
            mask = binarise_mask(mask, lam, self.fmix_params['shape'], self.fmix_params['max_soft'])
#             print(self.df.index)
            fmix_ix = np.random.choice(self.df.index, size=1)[0]
#             print("==============Starts now===============")
            row = self.df.iloc[fmix_ix]
#             print(row)
            fmix_img = cv2.imread(row.file_path)
# #             print(fmix_img)
            fmix_img = cv2.cvtColor(fmix_img,cv2.COLOR_BGR2RGB)
            

            if self.transform is not None:
                fmix_img = self.transform(image=fmix_img)['image']
            
#             print("Mask shape: ", mask.shape)
#             mask_torch = mask.numpy()
            mask = mask.transpose(1,2,0)
#             print("Transposed mask torch:", mask.shape)
            mask_torch = mask
#             mask_torch = torch.from_numpy(mask)
            
#             print("Mask torch shape: ", mask_torch.shape)
            #mix image
            img = mask_torch*img+(1.-mask_torch)*fmix_img
#             print("Mix shape:", img.shape)
            rate = mask.sum()/image_size/image_size
            label = rate*label + (1.-rate)*self.labels[fmix_ix]
        
        if self.do_cutmix and np.random.uniform(0., 1., size=1)[0] > 0.5:
            #print(img.sum(), img.shape)
            with torch.no_grad():
                cmix_ix = np.random.choice(self.df.index, size=1)[0]
                
                row = self.df.iloc[cmix_ix]
                cmix_img = cv2.imread(row.file_path)
                cmix_img = cv2.cvtColor(cmix_img,cv2.COLOR_BGR2RGB)
                
                if self.transform:
                    cmix_img = self.transform(image=cmix_img)['image']
                    
                lam = np.clip(np.random.beta(self.cutmix_params['alpha'], self.cutmix_params['alpha']),0.3,0.4)
                bbx1, bby1, bbx2, bby2 = rand_bbox((image_size, image_size), lam)

                img[:, bbx1:bbx2, bby1:bby2] = cmix_img[:, bbx1:bbx2, bby1:bby2]

                rate = 1 - ((bbx2 - bbx1) * (bby2 - bby1) / (image_size * image_size))
                label = rate*label + (1.-rate)*self.labels[cmix_ix]
                
        img = img.astype(np.float32)
        img = img.transpose(2,0,1)
        
        if self.mode == 'test':
            return torch.tensor(img)
        else:
            return torch.tensor(img), torch.tensor(row.group)

# Model

In [10]:
class ShopeeNet(nn.Module):

    def __init__(self,
                 n_classes,
                 model_name='efficientnet_b0',
                 use_fc=False,
                 fc_dim=512,
                 dropout=0.0,
                 loss_module='softmax',
                 s=30.0,
                 margin=0.50,
                 ls_eps=0.0,
                 theta_zero=0.785,
                 pretrained=True):
        """
        :param n_classes:
        :param model_name: name of model from pretrainedmodels
            e.g. resnet50, resnext101_32x4d, pnasnet5large
        :param pooling: One of ('SPoC', 'MAC', 'RMAC', 'GeM', 'Rpool', 'Flatten', 'CompactBilinearPooling')
        :param loss_module: One of ('arcface', 'cosface', 'softmax')
        """
        super(ShopeeNet, self).__init__()
        print('Building Model Backbone for {} model'.format(model_name))

        self.backbone = timm.create_model(model_name, pretrained=pretrained)
        final_in_features = self.backbone.classifier.in_features
        
        self.backbone.classifier = nn.Identity()
        self.backbone.global_pool = nn.Identity()
        
        self.pooling =  nn.AdaptiveAvgPool2d(1)
            
        self.use_fc = use_fc
        if use_fc:
            self.dropout = nn.Dropout(p=dropout)
            self.fc = nn.Linear(final_in_features, fc_dim)
            self.bn = nn.BatchNorm1d(fc_dim)
            self._init_params()
            final_in_features = fc_dim

        self.loss_module = loss_module
        if loss_module == 'arcface':
            self.final = ArcMarginProduct(final_in_features, n_classes,
                                          s=s, m=margin, easy_margin=False, ls_eps=ls_eps)
        elif loss_module == 'cosface':
            self.final = AddMarginProduct(final_in_features, n_classes, s=s, m=margin)
        elif loss_module == 'adacos':
            self.final = AdaCos(final_in_features, n_classes, m=margin, theta_zero=theta_zero)
        else:
            self.final = nn.Linear(final_in_features, n_classes)

    def _init_params(self):
        nn.init.xavier_normal_(self.fc.weight)
        nn.init.constant_(self.fc.bias, 0)
        nn.init.constant_(self.bn.weight, 1)
        nn.init.constant_(self.bn.bias, 0)

    def forward(self, x, label):
        feature = self.extract_feat(x)
        if self.loss_module in ('arcface', 'cosface', 'adacos'):
            logits = self.final(feature, label)
        else:
            logits = self.final(feature)
        return logits

    def extract_feat(self, x):
        batch_size = x.shape[0]
        x = self.backbone(x)
        x = self.pooling(x).view(batch_size, -1)

        if self.use_fc:
            x = self.dropout(x)
            x = self.fc(x)
            x = self.bn(x)

        return x

# Metric Learning Losses

* https://github.com/lyakaap/Landmark2019-1st-and-3rd-Place-Solution/blob/master/src/modeling/metric_learning.py -- Code Taken from here

In [11]:
class AdaCos(nn.Module):
    def __init__(self, in_features, out_features, m=0.50, ls_eps=0, theta_zero=math.pi/4):
        super(AdaCos, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.theta_zero = theta_zero
        self.s = math.log(out_features - 1) / math.cos(theta_zero)
        self.m = m
        self.ls_eps = ls_eps  # label smoothing
        self.weight = Parameter(torch.FloatTensor(out_features, in_features))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, input, label):
        # normalize features
        x = F.normalize(input)
        # normalize weights
        W = F.normalize(self.weight)
        # dot product
        logits = F.linear(x, W)
        # add margin
        theta = torch.acos(torch.clamp(logits, -1.0 + 1e-7, 1.0 - 1e-7))
        target_logits = torch.cos(theta + self.m)
        one_hot = torch.zeros_like(logits)
        one_hot.scatter_(1, label.view(-1, 1).long(), 1)
        if self.ls_eps > 0:
            one_hot = (1 - self.ls_eps) * one_hot + self.ls_eps / self.out_features
        output = logits * (1 - one_hot) + target_logits * one_hot
        # feature re-scale
        with torch.no_grad():
            B_avg = torch.where(one_hot < 1, torch.exp(self.s * logits), torch.zeros_like(logits))
            B_avg = torch.sum(B_avg) / input.size(0)
            theta_med = torch.median(theta)
            self.s = torch.log(B_avg) / torch.cos(torch.min(self.theta_zero * torch.ones_like(theta_med), theta_med))
        output *= self.s

        return output

In [12]:
class ArcMarginProduct(nn.Module):
    r"""Implement of large margin arc distance: :
        Args:
            in_features: size of each input sample
            out_features: size of each output sample
            s: norm of input feature
            m: margin
            cos(theta + m)
        """
    def __init__(self, in_features, out_features, s=30.0, m=0.50, easy_margin=False, ls_eps=0.0):
        super(ArcMarginProduct, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.s = s
        self.m = m
        self.ls_eps = ls_eps  # label smoothing
        self.weight = Parameter(torch.FloatTensor(out_features, in_features))
        nn.init.xavier_uniform_(self.weight)

        self.easy_margin = easy_margin
        self.cos_m = math.cos(m)
        self.sin_m = math.sin(m)
        self.th = math.cos(math.pi - m)
        self.mm = math.sin(math.pi - m) * m

    def forward(self, input, label):
        # --------------------------- cos(theta) & phi(theta) ---------------------------
        cosine = F.linear(F.normalize(input), F.normalize(self.weight))
        sine = torch.sqrt(1.0 - torch.pow(cosine, 2))
        phi = cosine * self.cos_m - sine * self.sin_m
        if self.easy_margin:
            phi = torch.where(cosine > 0, phi, cosine)
        else:
            phi = torch.where(cosine > self.th, phi, cosine - self.mm)
        # --------------------------- convert label to one-hot ---------------------------
        # one_hot = torch.zeros(cosine.size(), requires_grad=True, device='cuda')
        one_hot = torch.zeros(cosine.size(), device='cuda')
        one_hot.scatter_(1, label.view(-1, 1).long(), 1)
        if self.ls_eps > 0:
            one_hot = (1 - self.ls_eps) * one_hot + self.ls_eps / self.out_features
        # -------------torch.where(out_i = {x_i if condition_i else y_i) -------------
        output = (one_hot * phi) + ((1.0 - one_hot) * cosine)
        output *= self.s

        return output

In [13]:
class AddMarginProduct(nn.Module):
    r"""Implement of large margin cosine distance: :
    Args:
        in_features: size of each input sample
        out_features: size of each output sample
        s: norm of input feature
        m: margin
        cos(theta) - m
    """

    def __init__(self, in_features, out_features, s=30.0, m=0.40):
        super(AddMarginProduct, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.s = s
        self.m = m
        self.weight = Parameter(torch.FloatTensor(out_features, in_features))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, input, label):
        # --------------------------- cos(theta) & phi(theta) ---------------------------
        cosine = F.linear(F.normalize(input), F.normalize(self.weight))
        phi = cosine - self.m
        # --------------------------- convert label to one-hot ---------------------------
        one_hot = torch.zeros(cosine.size(), device='cuda')
        # one_hot = one_hot.cuda() if cosine.is_cuda else one_hot
        one_hot.scatter_(1, label.view(-1, 1).long(), 1)
        # -------------torch.where(out_i = {x_i if condition_i else y_i) -------------
        output = (one_hot * phi) + ((1.0 - one_hot) * cosine)  # you can use torch.where if your torch.__version__ is 0.4
        output *= self.s
        # print(output)

        return output

# Custom LR

In [14]:
class ShopeeScheduler(_LRScheduler):
    def __init__(self, optimizer, lr_start=5e-6, lr_max=1e-5,
                 lr_min=1e-6, lr_ramp_ep=5, lr_sus_ep=0, lr_decay=0.8,
                 last_epoch=-1):
        self.lr_start = lr_start
        self.lr_max = lr_max
        self.lr_min = lr_min
        self.lr_ramp_ep = lr_ramp_ep
        self.lr_sus_ep = lr_sus_ep
        self.lr_decay = lr_decay
        super(ShopeeScheduler, self).__init__(optimizer, last_epoch)
        
    def get_lr(self):
        if not self._get_lr_called_within_step:
            warnings.warn("To get the last learning rate computed by the scheduler, "
                          "please use `get_last_lr()`.", UserWarning)
        
        if self.last_epoch == 0:
            self.last_epoch += 1
            return [self.lr_start for _ in self.optimizer.param_groups]
        
        lr = self._compute_lr_from_epoch()
        self.last_epoch += 1
        
        return [lr for _ in self.optimizer.param_groups]
    
    def _get_closed_form_lr(self):
        return self.base_lrs
    
    def _compute_lr_from_epoch(self):
        if self.last_epoch < self.lr_ramp_ep:
            lr = ((self.lr_max - self.lr_start) / 
                  self.lr_ramp_ep * self.last_epoch + 
                  self.lr_start)
        
        elif self.last_epoch < self.lr_ramp_ep + self.lr_sus_ep:
            lr = self.lr_max
            
        else:
            lr = ((self.lr_max - self.lr_min) * self.lr_decay**
                  (self.last_epoch - self.lr_ramp_ep - self.lr_sus_ep) + 
                  self.lr_min)
        return lr

# Training Function

In [15]:
def train_fn(dataloader,model,criterion,optimizer,device,scheduler,epoch):
    model.train()
    loss_score = AverageMeter()
    
    tk0 = tqdm(enumerate(dataloader), total=len(dataloader))
    for bi,d in tk0:
        
        batch_size = d[0].shape[0]


        images = d[0]
        targets = d[1]

        images = images.to(device)
        targets = targets.to(device)

        optimizer.zero_grad()

        output = model(images,targets)
        
        loss = criterion(output,targets)
        
        loss.backward()
        optimizer.step()
        
        loss_score.update(loss.detach().item(), batch_size)
        tk0.set_postfix(Train_Loss=loss_score.avg,Epoch=epoch,LR=optimizer.param_groups[0]['lr'])
        
    if scheduler is not None:
            scheduler.step()
        
    return loss_score

# Evaluation Function

In [16]:
def eval_fn(data_loader,model,criterion,device):
    
    loss_score = AverageMeter()
    
    model.eval()
    TARGETS = []
    PREDS = []
    tk0 = tqdm(enumerate(data_loader), total=len(data_loader))
    
    with torch.no_grad():
        
        for bi,d in tk0:
            batch_size = d[0].size()[0]

            image = d[0]
            targets = d[1]

            image = image.to(device)
            targets = targets.to(device)
            
            output = model(image,targets)
            
            PREDS += [torch.argmax(logits, 1).detach().cpu()]
            TARGETS += [targets.detach().cpu()]

            loss = criterion(output,targets)
            
            loss_score.update(loss.detach().item(), batch_size)
            tk0.set_postfix(Eval_Loss=loss_score.avg)
            
        PREDS = torch.cat(PREDS).cpu().numpy()
        TARGETS = torch.cat(TARGETS).cpu().numpy()
        accuracy = (PREDS==TARGETS).mean()
            
    return loss_score, accuracy

# Engine

In [17]:
data = pd.read_csv('../input/shopee-folds/train_folds.csv')
# data['filepath'] = data['image'].apply(lambda x: os.path.join('../input/shopee-product-matching/', 'train_images', x))

In [18]:
data.head()

Unnamed: 0,posting_id,image,image_phash,title,label_group,file_path,group,fold
0,train_129225211,0000a68812bc7e98c42888dfb1c07da0.jpg,94974f937d4c2433,Paper Bag Victoria Secret,249114794,../input/shopee-product-matching/train_images/...,1475,2
1,train_3386243561,00039780dfc94d01db8676fe789ecd05.jpg,af3f9460c2838f0f,"Double Tape 3M VHB 12 mm x 4,5 m ORIGINAL / DO...",2937985045,../input/shopee-product-matching/train_images/...,4395,2
2,train_2288590299,000a190fdd715a2a36faed16e2c65df7.jpg,b94cb00ed3e50f78,Maling TTS Canned Pork Luncheon Meat 397 gr,2395904891,../input/shopee-product-matching/train_images/...,6202,0
3,train_2406599165,00117e4fc239b1b641ff08340b429633.jpg,8514fc58eafea283,Daster Batik Lengan pendek - Motif Acak / Camp...,4093212188,../input/shopee-product-matching/train_images/...,409,0
4,train_3369186413,00136d1cf4edede0203f32f05f660588.jpg,a6f319f924ad708c,Nescafe \xc3\x89clair Latte 220ml,3648931069,../input/shopee-product-matching/train_images/...,3063,3


In [19]:
# encoder = LabelEncoder()
# data['label_group'] = encoder.fit_transform(data['label_group'])

In [20]:
if __name__ == "__main__":
        
    train = data[data['fold'] != fold_id].reset_index(drop=True)
    valid = data[data['fold'] == fold_id].reset_index(drop=True)
    # Defining DataSet
    dataset_train = ShopeeDataset(train, 'train', transform = transforms_train(), do_fmix=True, do_cutmix=False)
    dataset_valid = ShopeeDataset(valid, 'test', transform = transforms_valid())
        
    train_loader = torch.utils.data.DataLoader(
        dataset_train,
        batch_size=TRAIN_BATCH_SIZE,
        pin_memory=True,
        drop_last=True,
        num_workers=NUM_WORKERS
    )
    
    valid_loader = torch.utils.data.DataLoader(
        dataset_valid,
        batch_size=VALID_BATCH_SIZE,
        num_workers=NUM_WORKERS,
        shuffle=False,
        pin_memory=True,
        drop_last=False,
    )
    
    # Defining Device
    device = torch.device("cuda")
    
    # Defining Model for specific fold
    model = ShopeeNet(**model_params)
    model.to(device)
    
    #DEfining criterion
    criterion = fetch_loss()
    criterion.to(device)
        
    # Defining Optimizer with weight decay to params other than bias and layer norms
#     param_optimizer = list(model.named_parameters())
#     no_decay = ["bias", "LayerNorm.bias", "LayerNorm.weight"]
#     optimizer_parameters = [
#         {'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)], 'weight_decay': 0.001},
#         {'params': [p for n, p in param_optimizer if any(nd in n for nd in no_decay)], 'weight_decay': 0.0},
#             ]  
    
    optimizer = torch.optim.Adam(model.parameters(), lr=scheduler_params['lr_start'])
    
    #Defining LR SCheduler
    scheduler = ShopeeScheduler(optimizer,**scheduler_params)
        
    # THE ENGINE LOOP
    best_loss = 10000
    for epoch in range(EPOCHS):
        train_loss = train_fn(train_loader, model,criterion, optimizer, device,scheduler=scheduler,epoch=epoch)
        
        valid_loss = eval_fn(valid_loader, model, criterion,device)
        
        if valid_loss.avg < best_loss:
            best_loss = valid_loss.avg
            torch.save(model.state_dict(),f'model_{model_name}_IMG_SIZE_{image_size}_{loss_module}.pth')
            print(f'best model found for epoch {epoch}')

Building Model Backbone for efficientnet_b3 model


Downloading: "https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/efficientnet_b3_ra2-cf984f9c.pth" to /root/.cache/torch/hub/checkpoints/efficientnet_b3_ra2-cf984f9c.pth
100%|██████████| 1712/1712 [25:21<00:00,  1.13it/s, Epoch=0, LR=1e-5, Train_Loss=23.5]
  0%|          | 0/429 [00:01<?, ?it/s]


RuntimeError: Expected 4-dimensional input for 4-dimensional weight [40, 3, 3, 3], but got 3-dimensional input of size [3, 512, 512] instead

In [21]:
# run()