# Label Smoothing



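The last layer of a classifier turns its input into class probabilities with the softmax function:

## $$ p_k = \frac{e^{x^T w_k}}{\sum_{l=1}^{K} e^{x^T w_l}}$$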

where $p_k$ is the probability the model assigns to the $k$-th class, $w_k$ holds the weights and bias of the last layer for class $k$, $x$ is the input vector to the last layer, and $K$ is the number of classes.
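As a minimal sketch of the softmax above, the following computes $p_k$ by hand and compares it with PyTorch's built-in `torch.softmax`. The sizes, the random `x`, and the matrix `W` (whose column $k$ stands in for $w_k$, with the bias omitted for brevity) are assumptions made for illustration only.

```python
import torch

torch.manual_seed(0)

num_features, num_classes = 4, 3             # hypothetical sizes
x = torch.randn(num_features)                # input vector to the last layer
W = torch.randn(num_features, num_classes)   # column k plays the role of w_k

logits = x @ W                               # x^T w_k for every class k
p = torch.exp(logits) / torch.exp(logits).sum()

# The built-in softmax computes the same quantity in a numerically stable way.
p_builtin = torch.softmax(logits, dim=-1)

print(p)        # one probability per class
print(p.sum())  # the probabilities sum to 1 (up to rounding)
```

In practice the built-in version is preferable, since exponentiating large logits directly can overflow.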


We minimize the expected cross-entropy between the hard targets $y_k$ and the predicted probabilities $p_k$:


## $$H(\mathbf{y},\mathbf{p}) = -\sum_{k=1}^{K} y_k \log(p_k),$$

where $y_k$ is 1 for the correct class and 0 for the rest.
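The cross-entropy with hard targets can be sketched as follows. The logits and class indices are made-up values for illustration; the manual result is compared against `F.cross_entropy`, which takes raw logits and class indices and averages over the batch by default.

```python
import torch
import torch.nn.functional as F

# Hypothetical logits for a batch of 2 examples and K = 3 classes.
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
target_idx = torch.tensor([0, 2])                  # index of the correct class

y = F.one_hot(target_idx, num_classes=3).float()   # hard targets y_k
log_p = F.log_softmax(logits, dim=-1)              # log(p_k)

H = -(y * log_p).sum(dim=-1)                       # H(y, p) for each example
loss = H.mean()                                    # average over the batch

loss_builtin = F.cross_entropy(logits, target_idx)
print(loss, loss_builtin)
```

Because $y_k$ is one-hot, each sum reduces to $-\log(p_c)$ for the correct class $c$.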



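Label smoothing replaces the hard targets with a mixture of the hard targets and the uniform distribution over the $K$ classes:


## $$ y_k^{LS} = y_k (1- \alpha) + \frac{\alpha}{K}$$

where $\alpha \in [0, 1]$ is the smoothing parameter.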
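A small sketch of the smoothing formula, under the assumption of one-hot targets and made-up logits. The helper name `smooth_targets` is hypothetical. The sketch also compares the resulting loss with the `label_smoothing` argument of `F.cross_entropy`, which is assumed to be available (it was added in PyTorch 1.10).

```python
import torch
import torch.nn.functional as F

def smooth_targets(y, alpha):
    """Hypothetical helper: y * (1 - alpha) + alpha / K, with K = number of classes."""
    K = y.size(-1)
    return (1.0 - alpha) * y + alpha / K

alpha = 0.1
target_idx = torch.tensor([0, 2])
y = F.one_hot(target_idx, num_classes=3).float()
y_ls = smooth_targets(y, alpha)
print(y_ls)  # each row still sums to 1

# Cross-entropy against the smoothed targets, computed by hand.
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
loss_manual = -(y_ls * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()

# Assumes PyTorch >= 1.10, where cross_entropy accepts label_smoothing.
loss_builtin = F.cross_entropy(logits, target_idx, label_smoothing=alpha)
print(loss_manual, loss_builtin)
```

With $K = 3$ and $\alpha = 0.1$, the correct class receives $0.9 + 0.1/3$ and every other class receives $0.1/3$.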

# Label Relaxation

TODO:


In [30]:
import torch
import torch.nn as nn
import torch.nn.functional as F

In [37]:
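batch_size, dim = 3, 3
# Random scores in [0, 1) standing in for model predictions.
yhat = torch.rand(batch_size, dim)
# One-hot hard targets: exactly one correct class per row.
y = F.one_hot(torch.randint(dim, (batch_size,)), num_classes=dim).float()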

In [38]:
def label_smoothing(y, alpha):
    # Mix the hard targets with a uniform distribution over the K = y.size(1) classes.
    return (1.0 - alpha) * y + alpha / y.size(1)

In [39]:
label_smoothing(y, alpha=.1)

tensor([[0.9333, 0.0333, 0.0333],
        [0.9333, 0.0333, 0.0333],
        [0.0333, 0.0333, 0.9333]])

In [40]:
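def label_relaxation(yhat, y, gz_threshold, alpha):
    # Per-row prediction mass on the non-target classes; alpha is shared out in proportion to it.
    non_target_mass = torch.sum((torch.ones_like(y) - y) * yhat, dim=-1, keepdim=True)
    return torch.where(y > gz_threshold, torch.ones_like(y) - alpha, alpha * yhat / non_target_mass)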

In [41]:
y_lr=label_relaxation(yhat,y,gz_threshold=.1,alpha=.1)

In [42]:
y_lr

tensor([[0.9000, 0.0315, 0.0685],
        [0.9000, 0.0258, 0.0742],
        [0.0328, 0.0672, 0.9000]])
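Reading the function above, the target class receives $1 - \alpha$ and the remaining $\alpha$ is shared among the other classes in proportion to the predictions. The sketch below checks that property on made-up inputs. The helper `relaxed_targets`, the fixed 0.5 threshold, and the example tensors are assumptions for illustration, not part of the original notebook.

```python
import torch
import torch.nn.functional as F

def relaxed_targets(yhat, y, alpha):
    """Hypothetical re-statement of the notebook's label_relaxation for one-hot y."""
    # Prediction mass on the non-target classes, one value per row.
    non_target_mass = ((1.0 - y) * yhat).sum(dim=-1, keepdim=True)
    return torch.where(y > 0.5,
                       torch.full_like(y, 1.0 - alpha),
                       alpha * yhat / non_target_mass)

torch.manual_seed(0)
yhat = torch.softmax(torch.randn(3, 3), dim=-1)                  # made-up predictions
y = F.one_hot(torch.tensor([0, 0, 2]), num_classes=3).float()    # made-up hard targets

y_lr = relaxed_targets(yhat, y, alpha=0.1)
print(y_lr)
print(y_lr.sum(dim=-1))  # each row sums to 1 when y is one-hot
```

Note that the rows only sum to 1 when each row of `y` has exactly one correct class; a row with no zero entries would make the denominator zero.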