# Finding Mislabelled Samples through ResNet MNIST Training Process

This notebook trains a ResNet model on the MNIST dataset and uses the TrainIng Data analYzer (TIDY) method, which is based on the Forgetting Events algorithm and implemented as `ForgettingEventsInterpreter`, to investigate the training process by recording the model's predictions throughout training. Some samples are deliberately mislabelled, and we recover them by examining how their predictions change during training.
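
To make the idea concrete: a forgetting event is usually described as a sample going from correctly classified at one check to misclassified at the next. The sketch below counts such transitions from a per-epoch correctness history. The function name `count_forgetting_events` and the two toy histories are illustrative assumptions, not InterpretDL's API or data from this run.

```python
def count_forgetting_events(correct_history):
    """Count transitions from correct (True) to incorrect (False)."""
    events = 0
    # Compare each check with the one that follows it.
    for prev, curr in zip(correct_history, correct_history[1:]):
        if prev and not curr:
            events += 1
    return events


# Hypothetical per-epoch correctness records for two samples.
clean_sample = [False, True, True, True, True, True]
noisy_sample = [False, True, False, True, False, False]

clean_events = count_forgetting_events(clean_sample)
noisy_events = count_forgetting_events(noisy_sample)
print(clean_events, noisy_events)
```

Under this definition, a sample whose prediction keeps flipping away from its assigned label accumulates events, which is what makes it a candidate for a closer look.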

In [1]:
import paddle.fluid as fluid
import paddle
import numpy as np
import sys
from PIL import Image
import matplotlib.pyplot as plt
import tarfile, pickle, itertools

sys.path.append('..')
import interpretdl as it

Define a ResNet architecture for MNIST. The code is borrowed from the [PaddlePaddle Official Documentation](https://www.paddlepaddle.org.cn/documentation/docs/en/user_guides/cv_case/image_classification/README.html).

In [2]:
def conv_bn_layer(input,
                  ch_out,
                  filter_size,
                  stride,
                  padding,
                  act='relu',
                  bias_attr=False):
    tmp = fluid.layers.conv2d(
        input=input,
        filter_size=filter_size,
        num_filters=ch_out,
        stride=stride,
        padding=padding,
        act=None,
        bias_attr=bias_attr)
    return fluid.layers.batch_norm(input=tmp, act=act)


def shortcut(input, ch_in, ch_out, stride):
    if ch_in != ch_out:
        return conv_bn_layer(input, ch_out, 1, stride, 0, None)
    else:
        return input


def basicblock(input, ch_in, ch_out, stride):
    tmp = conv_bn_layer(input, ch_out, 3, stride, 1)
    tmp = conv_bn_layer(tmp, ch_out, 3, 1, 1, act=None, bias_attr=True)
    short = shortcut(input, ch_in, ch_out, stride)
    return fluid.layers.elementwise_add(x=tmp, y=short, act='relu')


def layer_warp(block_func, input, ch_in, ch_out, count, stride):
    tmp = block_func(input, ch_in, ch_out, stride)
    for i in range(1, count):
        tmp = block_func(tmp, ch_out, ch_out, 1)
    return tmp


def resnet_mnist(ipt, depth=32):
    # depth should be one of 20, 32, 44, 56, 110, 1202
    assert (depth - 2) % 6 == 0
    n = (depth - 2) // 6
    nStages = {16, 64, 128}
    conv1 = conv_bn_layer(ipt, ch_out=16, filter_size=3, stride=1, padding=1)
    res1 = layer_warp(basicblock, conv1, 16, 16, n, 1)
    res2 = layer_warp(basicblock, res1, 16, 32, n, 2)
    res3 = layer_warp(basicblock, res2, 32, 64, n, 2)
    pool = fluid.layers.pool2d(
        input=res3, pool_size=8, pool_type='avg', pool_stride=1)
    predict = fluid.layers.fc(input=pool, size=10, act='softmax')
    return predict

Use the MNIST dataset generator from the **paddle.dataset** API to get the labels, then manually mislabel 1% of the samples.

In [3]:
dataset =  paddle.dataset.mnist.train()

labels = []
for data in dataset():
    labels.append(data[-1])
    
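# Start at 0 so that every index with i % 100 == 0 gets a wrong label,
# matching the check in the data generator and the recall denominator.
for i in range(0, 60000, 100):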
    labels[i] = np.random.choice(np.delete(np.arange(10), labels[i]))
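
The expression `np.delete(np.arange(10), labels[i])` removes the true class from the candidates before sampling, so the replacement label can never equal the original. The sketch below shows the same idea with the standard library's `random` module as a stand-in; `wrong_label` and the fixed seed are hypothetical names and choices made only for this illustration.

```python
import random

NUM_CLASSES = 10


def wrong_label(true_label, rng):
    """Draw a label uniformly from the classes other than true_label."""
    candidates = [c for c in range(NUM_CLASSES) if c != true_label]
    return rng.choice(candidates)


rng = random.Random(0)  # fixed seed, assumed here only for repeatability
# Draw 100 replacement labels for each of the 10 true classes.
draws = [(t, wrong_label(t, rng)) for t in range(NUM_CLASSES) for _ in range(100)]
all_wrong = all(new != true for true, new in draws)
print(all_wrong)
```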

Define a new data generator on top of the MNIST data generator. It pads the 28 * 28 images to 32 * 32 so that they fit the model, and replaces the true labels of 1% of the samples with wrong ones.

**Important:** the data generator should yield the index of each sample as the first element so that each sample's behavior can be recorded according to its index.

In [4]:
def reader_prepare(datareader, new_labels):
    def reader():
        idx = 0
        for data in datareader():
            data = list(data)
            data.insert(0, idx)
            # replace true labels by wrong ones
            if idx % 100  == 0:
                data[-1] = new_labels[idx]
            # padding
            d = np.ones((32,32,1)) * -1
            d[2:30, 2:30] = data[1].reshape((28,28,1))
            data[1] = d.reshape(-1)
            yield tuple(data)
            idx += 1
    return reader
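
The padding step in the reader can be checked in isolation. The sketch below uses `np.pad` as an equivalent alternative to filling a `-1` array by slicing; the input is a synthetic stand-in for one flattened image, and the `-1` border assumes pixel values are scaled so that `-1` is the background value.

```python
import numpy as np

# Synthetic stand-in for one flattened 28 * 28 MNIST image.
flat = np.arange(28 * 28, dtype=np.float32)
image = flat.reshape(28, 28)

# Add a 2-pixel border of -1 on every side: 28 + 2 + 2 = 32.
padded = np.pad(image, pad_width=2, mode='constant', constant_values=-1)

# Flatten again, as the reader does before yielding the sample.
flat_padded = padded.reshape(-1)
print(padded.shape, flat_padded.shape)
```

The flattened result has 32 * 32 = 1024 values, which is the size a single-channel 32 * 32 model input expects.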

Set up a data loader with a batch size of 128 and an Adam optimizer for training.

In [5]:
BATCH_SIZE = 128
train_reader = paddle.batch(
    reader_prepare(dataset, labels), batch_size=BATCH_SIZE)
optimizer = fluid.optimizer.Adam(learning_rate=0.001)

First initialize the `ForgettingEventsInterpreter`, then `interpret` the training process by training for 100 epochs.

*stats* is a dictionary that maps each image index to its predictions during training and whether they were correct; *noisy_samples* is a list of the image ids flagged as mislabelled. *stats* is saved at "assets/stats.pkl".

In [6]:
# Use the MNIST model defined above; inputs are single-channel 32 * 32 images.
fe = it.ForgettingEventsInterpreter(resnet_mnist, True, [1, 32, 32])

epochs = 100
print('Training %d epochs. This may take some time.' % epochs)
stats, noisy_samples = fe.interpret(
    train_reader,
    optimizer,
    batch_size=BATCH_SIZE,
    epochs=epochs,
    noisy_labels=True,
    save_path='assets')

Training 100 epochs. This may take some time.
| Epoch [100/100] Iter[469]		Loss: 0.0001 Acc@1: 99.915%%

Calculate the recall, precision and F1 score for the noisy samples found.

99.5% of the mislabelled samples are found, and 82.7% of the flagged samples are indeed mislabelled.

In [7]:
recall = np.sum([id_ % 100 == 0 for id_ in noisy_samples]) / (60000 / 100)
precision = np.sum([id_ % 100 == 0 for id_ in noisy_samples]) / len(noisy_samples)
print('Recall: ', recall)
print('Precision: ', precision)
print('F1 Score: ', 2 * (recall * precision) / (recall + precision))

Recall:  0.995
Precision:  0.8268698060941828
F1 Score:  0.9031770045385779
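
As a sanity check on how these three numbers relate, the sketch below recomputes them from sets of ids. The flagged set is hypothetical: its sizes (597 of the 600 noisy ids plus 125 clean ids, 722 in total) are inferred from the printed recall and precision, not taken from the run's logs, and `precision_recall_f1` is an illustrative helper.

```python
def precision_recall_f1(flagged, truly_noisy):
    """Compute precision, recall and F1 for a set of flagged sample ids."""
    flagged, truly_noisy = set(flagged), set(truly_noisy)
    true_positives = len(flagged & truly_noisy)
    precision = true_positives / len(flagged)
    recall = true_positives / len(truly_noisy)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Ids treated as mislabelled: every 100th sample, 600 in total.
truly_noisy = range(0, 60000, 100)

# Hypothetical flagged set: 597 noisy ids plus 125 clean (odd) ids.
flagged = list(range(300, 60000, 100)) + list(range(1, 251, 2))

precision, recall, f1 = precision_recall_f1(flagged, truly_noisy)
print('Recall: ', recall)
print('Precision: ', precision)
print('F1 Score: ', f1)
```

Read this way, the printed scores correspond to roughly 3 mislabelled samples missed and about 125 clean samples flagged, assuming the inferred counts are right.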
