Mixup trains a neural network on convex combinations of pairs of examples and their labels, which regularizes it to favor simple linear behavior in between training examples:

$$\tilde{x} = \lambda x_i +(1-\lambda)x_j$$
$$\tilde{y} = \lambda y_i + (1-\lambda) y_j$$
where $x_i$ and $x_j$ are raw input vectors and $y_i$ and $y_j$ are one-hot label encodings.
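As a concrete illustration of the two formulas, the sketch below mixes two toy examples with a fixed $\lambda = 0.7$. The input vectors, the 3-class labels and the value of $\lambda$ are arbitrary choices for this example, not values from the original.

```python
import numpy as np

# Two toy training examples (arbitrary values) with one-hot labels over 3 classes.
x_i, y_i = np.array([1.0, 2.0]), np.array([1.0, 0.0, 0.0])
x_j, y_j = np.array([3.0, 0.0]), np.array([0.0, 0.0, 1.0])

lam = 0.7  # fixed here for illustration; mixup samples it from Beta(alpha, alpha)

x_mix = lam * x_i + (1 - lam) * x_j  # interpolated input, approximately [1.6, 1.4]
y_mix = lam * y_i + (1 - lam) * y_j  # interpolated soft label, approximately [0.7, 0.0, 0.3]

print(x_mix, y_mix)
```

The mixed label is no longer one-hot, but it is still a valid probability vector (its entries sum to 1).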

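The mixing weight is drawn as $\lambda \sim \mathrm{Beta}(\alpha,\alpha)$, for $\alpha \in (0,\infty)$.
The mixup hyperparameter $\alpha$ controls the strength of interpolation between feature-target pairs, recovering the ERM (empirical risk minimization) principle as $\alpha \rightarrow 0$: in that limit $\mathrm{Beta}(\alpha,\alpha)$ puts almost all of its mass near $\lambda = 0$ and $\lambda = 1$, so each mixed pair is essentially one of the original examples.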
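To see how $\alpha$ shapes the interpolation, the sketch below estimates how often $\lambda \sim \mathrm{Beta}(\alpha,\alpha)$ lands within a small distance of 0 or 1. The helper name `near_endpoint_fraction`, the threshold `eps=0.01` and the sample size are hypothetical choices made only for this illustration.

```python
import numpy as np

def near_endpoint_fraction(alpha, n=10_000, eps=0.01, seed=0):
    """Hypothetical helper: fraction of lambda ~ Beta(alpha, alpha) samples within eps of 0 or 1."""
    rng = np.random.default_rng(seed)
    lam = rng.beta(alpha, alpha, size=n)
    # min(lam, 1 - lam) is the distance to the nearest endpoint of [0, 1].
    return float(np.mean(np.minimum(lam, 1 - lam) < eps))

for alpha in (0.05, 0.2, 1.0, 4.0):
    print(f"alpha={alpha}: {near_endpoint_fraction(alpha):.3f}")
```

Small $\alpha$ makes most samples nearly unmixed (the ERM limit), $\alpha = 1$ gives a uniform $\lambda$, and large $\alpha$ concentrates $\lambda$ around 0.5, i.e. stronger interpolation.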

In [None]:
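import numpy as np
import torch

def mixup_data(x, y, alpha=1.0, use_cuda=True):
    '''Returns mixed inputs, pairs of targets, and lambda.

    use_cuda is kept for backward compatibility; the permutation is created
    on x.device, so the function works for both CPU and GPU tensors.
    '''
    # Sample the mixing weight; alpha <= 0 disables mixup (lam = 1 returns x unchanged).
    if alpha > 0:
        lam = float(np.random.beta(alpha, alpha))
    else:
        lam = 1.0

    batch_size = x.size(0)
    # Pair every example with a random partner from the same batch.
    index = torch.randperm(batch_size, device=x.device)

    mixed_x = lam * x + (1 - lam) * x[index]  # x[index] works for inputs of any rank
    y_a, y_b = y, y[index]
    return mixed_x, y_a, y_b, lam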

In [None]:
def mixup_criterion(criterion, pred, y_a, y_b, lam):
    '''Mixup loss: the lam-weighted combination of the losses against both sets of targets.'''
    return lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b)
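Mixing two hard-label losses instead of building the soft label $\tilde{y}$ works because cross-entropy is linear in its target. The sketch below checks numerically that both forms agree; the batch size, class count and $\lambda = 0.3$ are arbitrary, and it assumes PyTorch 1.10 or later, where `F.cross_entropy` accepts class-probability targets.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_classes, batch_size, lam = 5, 8, 0.3  # arbitrary sizes and mixing weight

logits = torch.randn(batch_size, num_classes)
y_a = torch.randint(num_classes, (batch_size,))
y_b = torch.randint(num_classes, (batch_size,))

# Form used by mixup_criterion: mix two losses computed against hard labels.
loss_pairs = lam * F.cross_entropy(logits, y_a) + (1 - lam) * F.cross_entropy(logits, y_b)

# Form from the equations: one loss against the mixed soft label y_tilde.
y_tilde = (lam * F.one_hot(y_a, num_classes).float()
           + (1 - lam) * F.one_hot(y_b, num_classes).float())
loss_soft = F.cross_entropy(logits, y_tilde)  # probability targets need PyTorch >= 1.10

print(loss_pairs.item(), loss_soft.item())  # equal up to floating-point rounding
```

The pairwise form avoids materializing one-hot vectors and works with any criterion that takes integer class targets.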

In [None]:
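# Training-loop fragment: credit each prediction by the mixing weight of the label it matches.
predicted = outputs.argmax(dim=1)
correct += (lam * predicted.eq(targets_a).sum().item()
            + (1 - lam) * predicted.eq(targets_b).sum().item())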
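The pieces above fit together in one training step roughly as follows. This is a minimal, self-contained sketch: the linear model, SGD optimizer, random batch and `alpha = 1.0` are hypothetical stand-ins for a real model and data loader, and the mixing logic is inlined rather than calling the functions defined earlier.

```python
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)
np.random.seed(0)

# Hypothetical toy setup: a linear classifier on random data.
model = nn.Linear(10, 3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()
inputs = torch.randn(32, 10)
targets = torch.randint(3, (32,))
alpha = 1.0

# Mixup: sample lambda and pair each example with a random partner in the batch.
lam = float(np.random.beta(alpha, alpha))
index = torch.randperm(inputs.size(0), device=inputs.device)
mixed_inputs = lam * inputs + (1 - lam) * inputs[index]
targets_a, targets_b = targets, targets[index]

# Forward and backward pass with the lambda-weighted loss.
outputs = model(mixed_inputs)
loss = lam * criterion(outputs, targets_a) + (1 - lam) * criterion(outputs, targets_b)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Mixup-weighted count of correct predictions for this batch.
predicted = outputs.argmax(dim=1)
correct = (lam * predicted.eq(targets_a).sum().item()
           + (1 - lam) * predicted.eq(targets_b).sum().item())
print(f"loss={loss.item():.4f}, weighted correct={correct:.2f}/{inputs.size(0)}")
```

Because the accuracy is weighted by $\lambda$, it measures agreement with the mixed targets rather than clean-data accuracy; evaluation on unmixed data is still needed for a standard accuracy figure.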