# Eyeglasses presence detection

In this tutorial, we will create and train a simple model for predicting whether a person in an image wears eyeglasses.

We would like the model to be small and fast, so for this simple task, it would be a good decision to train a small model from scratch instead of using transfer learning from a heavy pretrained model.

We will train our network on the famous CelebA dataset. It's a dataset of images with celebrities' faces. Every image is labeled with a set of attributes. One of them is the presence of eyeglasses, so the dataset suits our task perfectly.

# Install libraries
First, we need to prepare our work environment and install the necessary Python packages. If you're using Google Colab, you already have them, and you can skip the next cell.

We added strict version requirements for the packages for better reproducibility.

Note that these versions of packages will replace already installed ones.

In [0]:
%pip install -q \
    numpy==1.18.2 torch==1.4.0 torchvision==0.5.0 \
    tqdm==4.43.0 pillow==7.0.0 matplotlib==3.2.0 \
    pandas==1.0.1

# Import libraries

Let's import the libraries we will use in the project.

In [0]:
import math
import random
import shutil
from datetime import datetime
from pathlib import Path
from urllib.request import urlretrieve

import PIL.Image as Image
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.utils.data as data
from IPython.display import clear_output, display, HTML
from torchvision import transforms
from tqdm import tqdm_notebook as tqdm

pd.options.display.precision = 3
pd.options.display.max_rows = 10

Let's check the PyTorch version.

In [0]:
torch.__version__

For better reproducibility, it is useful to set random seeds for the libraries.

In [0]:
RANDOM_SEED = 123

random.seed(RANDOM_SEED)
torch.manual_seed(RANDOM_SEED)
torch.cuda.manual_seed(RANDOM_SEED)
np.random.seed(RANDOM_SEED)

# Swap these two values for more deterministic results at the cost of slower training.
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.benchmark = True

It's recommended to train neural networks on GPU. However, it's possible to train them on a CPU as well. We will use the 0th GPU if a GPU is available and CPU otherwise.

In [0]:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
device

Let's define the attribute we will predict and the input size of the model, and then the paths where we will store the images from the dataset and metadata.

In [0]:
MAIN_ATTRIBUTE = 'Eyeglasses'                                  # Name of the class we will predict
IMAGE_SIZE = 64                # Size of the model's input

In [0]:
WORKING_DIR_PATH = Path('.')                                   # Base directory for all the content

IMAGES_ZIP_PATH = WORKING_DIR_PATH / 'img_align_celeba.zip'    # Archive with images
IMAGES_DIR_PATH = IMAGES_ZIP_PATH.with_suffix('')              # Directory with images
ANNOTATIONS_PATH = WORKING_DIR_PATH / 'list_attr_celeba.txt'   # Text document with labels for every image.
PARTITION_PATH = WORKING_DIR_PATH / 'list_eval_partition.txt'  # Text document that marks which subset
                                                               # (train, validation or test) an image belongs to.

CHECKPOINTS_PATH = WORKING_DIR_PATH / 'checkpoints'            # Path to the weights of the trained model
ONNX_PATH = WORKING_DIR_PATH / 'classifier.onnx'

CHECKPOINTS_PATH.mkdir(exist_ok=True)

CELEBA_FACE_SIZE = 178

The following constants describe the parameters of our model and the training settings.

In [0]:

BATCH_SIZE = 128               # Batch size
EPOCHS = 20                    # Number of epochs of training

LR = 1e-2                      # Learning rate
LR_DECAY_STEP = 7              # Number of epochs after which we will decrease the learning rate
LR_DECAY_GAMMA = 0.1           # Decaying coefficient
WEIGHT_DECAY = 1e-4            # Weight decay coefficient

CONSISTENCY_WEIGHT = 0.1       # Coefficient for the consistency part of the loss function

NUM_WORKERS = 4                # Number of parallel processes for loading images

# Load the CelebA dataset


To train a model, you have to download images, annotations, and train/test partition lists from [the official CelebA Google Drive](https://drive.google.com/drive/folders/0B7EVK8r0v71pWEZsZE9oNnFzTm8). Unfortunately, we can't redistribute the dataset because of license restrictions.

You need three files.

- `img_align_celeba.zip` — an archive with cropped images,
- `list_attr_celeba.txt` — a list of the image attributes,
- `list_eval_partition.txt` — a file with dataset partition to three subsets.

After downloading, just put them into the working directory you set above or upload them to the CelebA folder of your Google Drive if you are using Google Colab. You can copy the files from your Google Drive to Colab with the following function. For that, uncomment the last line and run the cell.

In [0]:
def copy_files_from_drive():
    DRIVE_PATH = WORKING_DIR_PATH / 'drive' / 'My Drive'  # The mounted Google Drive
    from google.colab import drive
    drive.mount(str(DRIVE_PATH.parent))
    
    shutil.copy(DRIVE_PATH /'CelebA'/ 'Img'/ IMAGES_ZIP_PATH.name, IMAGES_ZIP_PATH)
    shutil.copy(DRIVE_PATH /'CelebA'/ 'Anno' /ANNOTATIONS_PATH.name, ANNOTATIONS_PATH)
    shutil.copy(DRIVE_PATH /'CelebA'/ 'Eval'/ PARTITION_PATH.name, PARTITION_PATH)


## Uncomment for copying files to Colab
# copy_files_from_drive()

Now we can extract the images to a directory. If you encounter the message "Disk is almost full" on Google Colab, please click the "ignore" button.

In [0]:
if not IMAGES_DIR_PATH.exists():
    shutil.unpack_archive(str(IMAGES_ZIP_PATH), str(WORKING_DIR_PATH))

# Load the annotations

Now let's load the image attributes. The format of the text document with the image attributes is quite simple. It's just a plain text file. The first line is the number of rows, the second one is a header, and the others contain file names and attributes.

We can load this file into a so-called dataframe with Pandas. There is a function `read_csv` that can load data from any text file with separators. In our case, we set the separator to a sequence of whitespaces (`\s+`), ignore the first line with the number of rows (we actually don't need it) and use the first column as an index.

In [0]:
df_attr = pd.read_csv(ANNOTATIONS_PATH, sep=r'\s+', skiprows=1, index_col=0)

Jupyter and Google Colab can show the contents of a dataframe. To do that, just run a cell with its name.

In [0]:
df_attr

As you can see, the values are encoded with -1 and 1. For convenience, let's recode them to a binary (0 and 1) representation by replacing -1 with 0. We can also keep only the column with the attribute we are going to train the classifier for.

In [0]:
df_attr = df_attr[[MAIN_ATTRIBUTE]].replace({-1: 0})

Now we have a dataframe with a single column.

In [0]:
df_attr

# Exploratory data analysis

The first step before model training is always exploring and analyzing the data. In this simple tutorial, we won't perform complex analysis but will take a look at the labels and determine if the dataset is balanced or not.

Let's pick a set of random images, both with eyeglasses and without them.

In [0]:
samples_with_glasses = list(df_attr[df_attr[MAIN_ATTRIBUTE] == 1].sample(25, random_state=RANDOM_SEED).index)
samples_without_glasses = list(df_attr[df_attr[MAIN_ATTRIBUTE] == 0].sample(25, random_state=RANDOM_SEED).index)

For convenience, we can write a function for displaying a grid with images and titles above them. It accepts a list of image names and an optional list of titles that defaults to the image names.

In [0]:
def load_and_show(image_names, titles=None, directory=IMAGES_DIR_PATH):
    if titles is None:
        titles = image_names
    N = len(image_names)
    cols = int(math.sqrt(N))
    rows = int(math.ceil(N / cols))
    plt.figure(figsize=(12, 12))
    image_names_iter = iter(image_names)
    titles_iter = iter(titles)
    for r in range(rows):
        for c in range(cols):
            try:
                name = next(image_names_iter)
                title = next(titles_iter)
            except StopIteration:
                plt.show()
                return
            plt.subplot(rows, cols, cols * r + c + 1)
            with Image.open(Path(directory) / name) as image_pil:
                image = np.array(image_pil.convert('RGB'))
            plt.imshow(image)
            plt.axis('off')
            plt.title(title)

Let's take a look at the images!

With eyeglasses:

In [0]:
load_and_show(samples_with_glasses)

Without eyeglasses:

In [0]:
load_and_show(samples_without_glasses)

As we can see, the labels seem to be correct.

Now let's take a look at the percentage of people with eyeglasses. 

In [0]:
ratio = df_attr[MAIN_ATTRIBUTE].mean()
print(f'{MAIN_ATTRIBUTE}: {ratio * 100:0.2f} %')

As we can see, this attribute is unbalanced, so we have to use a balancing technique such as undersampling, oversampling, or weighting.
Undersampling looks like the simplest solution for our case.

Now let's load the list of folds.

In [0]:
df_partition = pd.read_csv(PARTITION_PATH, sep=r'\s+', names=['fold'], index_col=0)
df_partition.fold.value_counts()

We can use the 0th fold as a train set and the union of the 1st and the 2nd folds as a validation set. We won't optimize hyperparameters in this small tutorial, so we don't need a test set.

In [0]:
df_attr = df_attr.join(df_partition)

In [0]:
df_train = df_attr[df_attr.fold == 0]
df_valid = df_attr[df_attr.fold != 0]

For undersampling, we can take an equal number of images of people who wear eyeglasses and of people who don't. To do that, we will use the following handy function.

In [0]:
def equalize(df, attribute):
    N = len(df)
    ones = df[attribute].sum()
    k = min(ones, N - ones)
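    # Fix the random state so that the sampled subsets are reproducible.
    df_ones = df[df[attribute] == 1].sample(k, random_state=RANDOM_SEED)
    df_zeros = df[df[attribute] == 0].sample(k, random_state=RANDOM_SEED)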
    index = list(df_ones.index) + list(df_zeros.index)
    random.shuffle(index)
    return df.loc[index]
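
To make the idea of undersampling concrete, here is a minimal standard-library sketch on a toy list of labels. The helper name `undersample_indices` and the toy data are illustrative assumptions and are not part of the notebook; the logic mirrors the steps described above (find the minority class size, sample that many items from each class, shuffle).

```python
import random


def undersample_indices(labels, seed=123):
    """Return shuffled indices with an equal number of 0 and 1 labels."""
    rng = random.Random(seed)
    ones = [i for i, y in enumerate(labels) if y == 1]
    zeros = [i for i, y in enumerate(labels) if y == 0]
    k = min(len(ones), len(zeros))  # size of the minority class
    picked = rng.sample(ones, k) + rng.sample(zeros, k)
    rng.shuffle(picked)
    return picked


# Toy labels: 2 images with eyeglasses, 8 without.
toy_labels = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
balanced = undersample_indices(toy_labels)
balanced_labels = [toy_labels[i] for i in balanced]
print(len(balanced), sum(balanced_labels))
```

The balanced subset contains four items, two from each class. The price of undersampling is that most images of the majority class are discarded.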

Now we can equalize the sizes of the subsets. If you would like to train a model on the whole dataset, just comment out or skip the next cell.

In [0]:
df_train = equalize(df_train, MAIN_ATTRIBUTE)
df_valid = equalize(df_valid, MAIN_ATTRIBUTE)

# Create dataloaders

For loading the data, we have to create a dataset and a data loader for both of the folds.

A dataset is an iterator-like object that returns an image and its label at every step. A loader is an object that aggregates the output of the dataset and returns batches.

Let's start with image transformations that play an important role in the data loading process. They crop the original image to make it square, resize it to the model's input size, and convert it to a PyTorch tensor.

In the transforms for the train set, we can add augmentations. We will use only horizontal flipping, random affine transformation and random color changing.

Note that we don't normalize input images. By default `ToTensor()` rescales values to the range `[0, 1]`. Often for better convergence, it would be better to rescale pixel values to a different range. However, for our task, the default range is good enough.

If you need to normalize pixel values (e.g. for transfer learning), consider adding `transforms.Normalize(mean, std)` after `transforms.ToTensor()`.

In [0]:
crop_resize = transforms.Compose([
    transforms.CenterCrop(CELEBA_FACE_SIZE),
    transforms.Resize(IMAGE_SIZE),
])

transforms_train = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.5, contrast=0.5, saturation=0.2, hue=0.1),
    transforms.RandomAffine(3, scale=(0.95, 1.05)),
    transforms.ToTensor(),
])

transforms_valid = transforms.Compose([
    transforms.ToTensor(),
])

For faster data loading, we can prepare images and load them to memory. We will put all the cropped images into a list.

If your dataset is big or you don't have enough free memory, this might be a bad idea. In that case, load the images from disk right in the `Dataset` object or consider alternative ways to store your data.

In [0]:
def load_images(df):
    images = []
    for image_name in tqdm(df.index, dynamic_ncols=True, leave=False):
        image_pil = Image.open(IMAGES_DIR_PATH / image_name)
        image = crop_resize(image_pil)
        images.append(image)
    return images

In [0]:
images_train = load_images(df_train)
images_valid = load_images(df_valid)

The dataset object is pretty simple. It takes a preloaded image from memory and applies the transforms to it. It returns the transformed image and the corresponding label.

In [0]:
class CelebaDataset(data.Dataset):
    def __init__(self, df, images, transforms=None):
        self.df = df
        self.images = images
        self.transforms = transforms

    def __getitem__(self, index):
        row = self.df.iloc[index]
        image_name = row.name
        attrs = row[MAIN_ATTRIBUTE]
        image = self.images[index]
        image_tensor = self.transforms(image)
        return image_tensor, attrs

    def __len__(self):
        return len(self.df)

For convenience, we can also write a function that performs the reverse transformation from a tensor to a NumPy array of pixel values that can be displayed.

In [0]:
def decode(tensor):
    return (tensor.cpu()
                  .clamp(0, 1)
                  .numpy()
                  .transpose((1, 2, 0)))

Let's create datasets.

In [0]:
dataset_train = CelebaDataset(df_train, images_train, transforms_train)
dataset_valid = CelebaDataset(df_valid, images_valid, transforms_valid)

Now we can check them by loading the first image from the validation dataset.

In [0]:
image, attrs = next(iter(dataset_valid))

plt.imshow(decode(image))
plt.axis('off')
plt.show()

Looks good enough. The image is small, and at the same time, it keeps enough detail for detecting eyeglasses.

Data loaders can also accept a weighted sampler. We can tune the weights so that the loader returns images with and without eyeglasses with approximately equal probability.

Note that it's useless for a dataset that was already equalized, but we leave this code for the case when you need to train on the whole unbalanced dataset.

For computing weights, we can write a function that assigns weight 1 to every image without eyeglasses and weight `zeros / ones` (the ratio of the class sizes) to every image with eyeglasses.

In [0]:
def get_weights(arr):
    N = len(arr)
    ones = arr.sum()
    zeros = N - ones
    return (1 - arr) + arr * zeros / ones 
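
A small worked example may help to see why these weights balance the classes. The sketch below uses plain Python lists instead of NumPy arrays, and the name `class_balancing_weights` and the toy labels are assumptions made for illustration. It checks that the total sampling weight of each class is the same.

```python
def class_balancing_weights(labels):
    """Weight 1 for label 0 and (zeros / ones) for label 1."""
    ones = sum(labels)
    zeros = len(labels) - ones
    return [zeros / ones if y == 1 else 1.0 for y in labels]


toy_labels = [1, 0, 0, 0, 0, 0, 1, 0]  # 2 positives, 6 negatives
weights = class_balancing_weights(toy_labels)

# Total sampling mass per class: equal masses mean that a weighted
# sampler draws both classes with equal probability.
mass_ones = sum(w for w, y in zip(weights, toy_labels) if y == 1)
mass_zeros = sum(w for w, y in zip(weights, toy_labels) if y == 0)
print(weights, mass_ones, mass_zeros)
```

Each positive example gets weight 3.0, so both classes carry a total weight of 6.0.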

In [0]:
weights_train = get_weights(df_train[MAIN_ATTRIBUTE].values)
weights_valid = get_weights(df_valid[MAIN_ATTRIBUTE].values)

Now we can create the loaders. The train one will sample images randomly according to the weights; the validation one will just return the images from the validation dataset in order.

In [0]:
loader_train = data.DataLoader(
    dataset_train,
    batch_size=BATCH_SIZE,
    sampler=data.WeightedRandomSampler(weights_train, len(df_train)),
    num_workers=NUM_WORKERS,
    drop_last=True)
loader_valid = data.DataLoader(
    dataset_valid,
    batch_size=BATCH_SIZE,
    num_workers=NUM_WORKERS,
    drop_last=False)

# Create a model

Now it's time to create a model. We will use separable convolution with batch normalization as the building blocks of our model. A separable convolution is a block with two convolutions instead of a single one: a depthwise one and a pointwise one. Such convolutions are faster and contain fewer parameters. The idea is borrowed from the paper [Howard A. G. et al. *MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications*](https://arxiv.org/abs/1704.04861).

In [0]:
class SeparableConvBN(nn.Module):
    def __init__(self, channels_in, channels_out):
        super().__init__()
        self.blocks = nn.Sequential(
            nn.Conv2d(channels_in, channels_in,
                      kernel_size=3, padding=1,
                      groups=channels_in, bias=False),
            nn.BatchNorm2d(channels_in),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels_in, channels_out,
                      kernel_size=1, padding=0,
                      bias=False),
            nn.BatchNorm2d(channels_out),
        )

    def forward(self, x):
        return self.blocks(x)
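
To get a feeling for the savings, we can count the weights by hand. The following sketch compares a standard 3x3 convolution with a depthwise plus pointwise pair for 16 input and 32 output channels. The helper `conv_params` is a hypothetical name introduced here; biases and batch normalization parameters are left out of the count.

```python
def conv_params(channels_in, channels_out, kernel_size, groups=1):
    """Number of weights in a bias-free 2D convolution."""
    return (channels_in // groups) * channels_out * kernel_size * kernel_size


c_in, c_out = 16, 32

standard = conv_params(c_in, c_out, kernel_size=3)
depthwise = conv_params(c_in, c_in, kernel_size=3, groups=c_in)
pointwise = conv_params(c_in, c_out, kernel_size=1)
separable = depthwise + pointwise

print(standard, separable, round(standard / separable, 1))
```

For this layer size, the standard convolution has 4608 weights and the separable one has 656, which is about 7 times fewer.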

Also, we will use a downsampling layer introduced in the paper [Zhang R. *Making Convolutional Networks Shift-Invariant Again*](https://richzhang.github.io/antialiased-cnns/). Antialiasing improves the stability of the predictions.

In [0]:
class Downsample(nn.Module):
    def __init__(self, channels):
        super().__init__()
        
        a = np.array([1., 2., 1.], dtype=np.float32)
        a2 = a[:,None] * a[None,:]
        filt = torch.tensor(a2 / a2.sum())[None,None,:,:].repeat((channels,1,1,1))
        self.register_buffer('filt', filt)
        self.pad = nn.ReflectionPad2d([1, 1, 1, 1])
        self.channels = channels

    def forward(self, x):
        return F.conv2d(self.pad(x), self.filt, stride=2, groups=self.channels)
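
The blur filter used by this layer is easy to inspect by hand. The sketch below rebuilds the same 3x3 kernel in plain Python (no NumPy or PyTorch) as the outer product of `[1, 2, 1]` with itself, normalized to sum to one. It is only an illustration of the kernel values, not a replacement for the layer.

```python
a = [1.0, 2.0, 1.0]

# Outer product of the 1D binomial filter with itself gives the 2D kernel.
kernel = [[x * y for y in a] for x in a]
total = sum(sum(row) for row in kernel)
kernel = [[v / total for v in row] for row in kernel]

for row in kernel:
    print(row)
```

The center weight is 0.25, the edge weights are 0.125, and the corner weights are 0.0625. Because the kernel sums to one, the blur keeps the average intensity of the feature map unchanged.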

The model is a sequence of separable antialiased convolutions with a linear layer at the end.

In [0]:
class Classifier(nn.Module):
    def __init__(self):
        super().__init__()

        blocks = []

        blocks.extend([ 
            # 64x64
            nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(16),
            nn.ReLU(inplace=True),
            Downsample(16),

            # 32x32
            SeparableConvBN(16, 32),
            nn.ReLU(inplace=True),
            Downsample(32),
            
            # 16x16
            SeparableConvBN(32, 32),
            nn.ReLU(inplace=True),
            Downsample(32),

            # 8x8
            SeparableConvBN(32, 32),
            nn.ReLU(inplace=True),
            Downsample(32),
            
            # 4x4
            SeparableConvBN(32, 64),
            nn.ReLU(inplace=True),
            Downsample(64),
            
            nn.AdaptiveAvgPool2d(1),

            nn.Conv2d(64, 1, kernel_size=1, stride=1, padding=0),
            nn.Flatten(1),
        ])

        self.blocks = nn.Sequential(*blocks)

    def forward(self, x):
        return self.blocks(x)
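
It can be useful to trace the spatial size of the feature maps through the network. The sketch below applies the standard convolution output-size formula to the five `Downsample` layers (3x3 kernel, stride 2, reflection padding of 1). The helper `conv_output_size` is a name introduced here for illustration. The 3x3 convolutions with `padding=1` and the 1x1 convolutions don't change the spatial size, so they are skipped.

```python
def conv_output_size(size, kernel_size, stride, padding):
    """Standard output-size formula for a convolution along one dimension."""
    return (size + 2 * padding - kernel_size) // stride + 1


size = 64  # IMAGE_SIZE in this tutorial
sizes = [size]
for _ in range(5):  # the classifier contains five Downsample layers
    # Downsample: reflection padding of 1, 3x3 blur kernel, stride 2.
    size = conv_output_size(size, kernel_size=3, stride=2, padding=1)
    sizes.append(size)

print(sizes)
```

The sizes are `[64, 32, 16, 8, 4, 2]`, so the adaptive average pooling at the end reduces a 2x2 map to a single value per channel.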

Now we can create a model instance.

In [0]:
model = Classifier()

# Train the model

One of the most significant parts of the model training process is a loss function. For classification, the natural choice is binary cross-entropy.

Also, we will add a so-called geometric consistency loss that requires the predictions to stay the same after a small shift or a flip of the input. We will minimize the mean squared error between the two predictions.

In [0]:
bce = nn.BCEWithLogitsLoss()
mse = nn.MSELoss()

To speed up computations, let's move the model to the selected device (the GPU, if one is available).

In [0]:
model = model.to(device)

For measuring the quality of the model, we will use the [F1 score](https://en.wikipedia.org/wiki/F1_score). It's the harmonic mean of precision and recall. It works well even with unbalanced datasets.

For computing F1, we will use the following function that accepts a confusion matrix.

In [0]:
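def f1_score(conf_mat):
    # Rows are true labels, columns are predicted labels (see `valid()` below).
    tp = conf_mat[1, 1]  # true 1, predicted 1
    fp = conf_mat[0, 1]  # true 0, predicted 1
    fn = conf_mat[1, 0]  # true 1, predicted 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * (precision * recall) / (precision + recall)
    return precision, recall, f1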
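
Here is a worked example of these metrics on made-up counts, written with nested lists so that it runs without NumPy. The function name `precision_recall_f1` and the numbers are assumptions for illustration. The convention is that rows are true labels and columns are predicted labels, which is how the validation function fills the confusion matrix.

```python
def precision_recall_f1(conf_mat):
    """conf_mat[true][pred] holds counts for labels 0 (no) and 1 (yes)."""
    tp = conf_mat[1][1]  # true 1, predicted 1
    fp = conf_mat[0][1]  # true 0, predicted 1
    fn = conf_mat[1][0]  # true 1, predicted 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy counts: 90 true negatives, 10 false positives,
# 20 false negatives, 80 true positives.
toy_conf_mat = [[90, 10],
                [20, 80]]
toy_precision, toy_recall, toy_f1 = precision_recall_f1(toy_conf_mat)
print(f'{toy_precision:.3f} {toy_recall:.3f} {toy_f1:.3f}')
```

Precision is 80/90, recall is 80/100, and F1 equals 160/190, which is the same as `2 * tp / (2 * tp + fp + fn)`.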

For validation, we will use the following function. It computes the value of the loss function and metrics for the whole validation set.

In [0]:
def valid(loader=loader_valid):
    model.eval()
    confusion_matrix = np.zeros((2, 2), dtype=int)
    loss = 0
    pbar = tqdm(enumerate(loader), total=len(loader), dynamic_ncols=True, leave=False, desc='Validating')
    for i, (images, attrs) in pbar:
        images = images.to(device)
        with torch.no_grad():
            outputs = model(images)
            loss += bce(outputs, attrs.unsqueeze(1).to(device).float()).cpu().item()
        outputs = outputs.squeeze(1)
        preds = (outputs.cpu() > 0).int()
        for y_pred, y_true in zip(preds, attrs):
            confusion_matrix[y_true.item()][y_pred.item()] += 1
        precision, recall, f1 = f1_score(confusion_matrix)
        pbar.set_postfix({
            'Lclass': f'{loss / (i + 1):0.2f}',
            'F1': f'{f1:0.2f}',
            'Precision': f'{precision:0.2f}',
            'Recall': f'{recall:0.2f}',
        }, refresh=False)
    return loss / len(loader), precision, recall, f1, confusion_matrix

Let's run it on an untrained model.

In [0]:
loss, precision, recall, f1, confusion_matrix = valid()
print(f'Loss: {loss:.3f}')

The value of the loss function is close to $\ln 2 \approx 0.693$. This is what binary cross-entropy yields when the model assigns a probability of about $\frac12$ to every image, which is expected from an untrained model.
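
The value $\ln 2$ can be checked with a few lines of plain Python. The sketch below evaluates binary cross-entropy for a single raw logit of 0, which corresponds to a predicted probability of 0.5. The helper `bce_with_logit` is a simplified stand-in written for this example and is not the PyTorch implementation.

```python
import math


def bce_with_logit(logit, target):
    """Binary cross-entropy for a single raw logit and a 0/1 target."""
    prob = 1.0 / (1.0 + math.exp(-logit))
    return -(target * math.log(prob) + (1 - target) * math.log(1 - prob))


# A logit of 0 corresponds to a predicted probability of 0.5.
loss_pos = bce_with_logit(0.0, 1)
loss_neg = bce_with_logit(0.0, 0)
print(loss_pos, loss_neg, math.log(2))
```

Both targets give the same loss, so the average over any mix of labels is also $\ln 2$.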

For optimization, we will use Adam. It's a well-known algorithm that has proved to be a good enough first choice.

Also, we will use a scheduler that will reduce the learning rate periodically.

In [0]:
optimizer = optim.Adam(model.parameters(), lr=LR, weight_decay=WEIGHT_DECAY)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=LR_DECAY_STEP, gamma=LR_DECAY_GAMMA)
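
To see what this schedule does over the whole run, we can compute it in closed form. The sketch below assumes the scheduler is stepped once per epoch and repeats the constants defined earlier so that it runs on its own.

```python
LR = 1e-2
LR_DECAY_STEP = 7
LR_DECAY_GAMMA = 0.1
EPOCHS = 20

# Closed form of a step schedule: multiply by gamma every `step` epochs.
schedule = [LR * LR_DECAY_GAMMA ** (epoch // LR_DECAY_STEP)
            for epoch in range(EPOCHS)]

for epoch in (0, 6, 7, 13, 14, 19):
    print(epoch, f'{schedule[epoch]:.0e}')
```

With these settings, epochs 0 to 6 use a learning rate of 1e-2, epochs 7 to 13 use 1e-3, and epochs 14 to 19 use 1e-4.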

To randomly change a batch, we will either shift each image by one pixel in a random direction or flip it horizontally.

In [0]:
def random_change(images):
    pad = nn.ReflectionPad2d(1)
    images_padded = pad(images)
    outputs = []
    for i in range(images.size(0)):
        dx = random.randint(-1, 1)
        dy = random.randint(-1, 1)
        if dx == 0 and dy == 0:
            outputs.append(torch.flip(images[[i]], [-1]))
        else:
            h, w = images.shape[-2:]  # use the batch argument, not a global variable
            outputs.append(images_padded[[i], :, 1+dy:h+dy+1, 1+dx:w+dx+1])
    return torch.cat(outputs, 0)

To get smoother values for the loss function, we will compute an exponential moving average. `alpha` is a smoothing coefficient.

In [0]:
def ema(new, old, alpha=0.6):
    if old is None or math.isnan(old):
        return new
    return alpha * new + (1 - alpha) * old
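
A short numeric example shows how the smoothing behaves. The sketch below applies the same update rule to a toy sequence of loss values; the function `smooth` and the numbers are made up for illustration.

```python
def smooth(values, alpha=0.6):
    """Exponential moving average; the first value initializes the average."""
    average = None
    history = []
    for value in values:
        if average is None:
            average = value
        else:
            average = alpha * value + (1 - alpha) * average
        history.append(average)
    return history


noisy_losses = [1.0, 0.5, 0.5]
smoothed = smooth(noisy_losses)
print(smoothed)
```

The smoothed values are approximately 1.0, 0.7 and 0.58: the average follows the drop in the loss, but with a lag. A larger `alpha` reacts faster and smooths less.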

It's time to write the main training loop. There are actually two loops, one for epochs and another for batches. We will perform a validation every epoch.

We will store running statistics to the `stats` dataframe. We will be able to use it later to take a look at the training dynamics.

In [0]:
column_names = [
    'L_class_valid', 'L_class', 
    'L_cons', 'L_total',
    'Precision', 'Recall', 'F1',
    'LR'
]
stats = pd.DataFrame(columns=['Epoch', *column_names]).set_index('Epoch')
best_checkpoint = None

for epoch in range(EPOCHS):    
    metrics = pd.Series(index=column_names, dtype=float, name=epoch)
    
    model.train()
    pbar = tqdm(enumerate(loader_train), total=len(loader_train),
                dynamic_ncols=True, leave=False, desc='Training')
    for i, (images, attrs) in pbar:
        images = images.to(device)
        attrs = attrs.to(device).float().unsqueeze(1)

        outputs = model(images)
        outputs_changed = model(random_change(images))
        
        loss_classification = bce(outputs, attrs)
        loss_consistency = mse(outputs, outputs_changed)
        loss_total = loss_classification + CONSISTENCY_WEIGHT * loss_consistency

        optimizer.zero_grad()
        loss_total.backward()
        optimizer.step()

        metrics.L_class = ema(loss_classification.item(), metrics.L_class)
        metrics.L_cons = ema(loss_consistency.item(), metrics.L_cons)
        metrics.L_total = ema(loss_total.item(), metrics.L_total)
        
        lr = scheduler.get_last_lr()[0]
        pbar.set_postfix({
            'Lclass': f'{loss_classification.item():.3f}',
            'Lcons': f'{loss_consistency.item():.3f}',
            'LR': f'{lr:.0e}'
        }, refresh=False)

    loss_valid, precision, recall, f1, confusion_matrix = valid()
    scheduler.step()

    if best_checkpoint is None or f1 > stats.F1.max():
        current_time = datetime.now().isoformat()
        best_checkpoint = CHECKPOINTS_PATH / f'classifier-{epoch}-{f1:0.2f}.pth'
        torch.save(model.state_dict(), best_checkpoint)
    
    metrics.L_class_valid = loss_valid
    metrics.F1 = f1
    metrics.Precision = precision
    metrics.Recall = recall
    metrics.LR = lr
    stats = stats.append(metrics)
    
    clear_output(wait=True)
    display(HTML(stats.to_html()))

model.load_state_dict(torch.load(best_checkpoint))
model.eval()

print('Best checkpoint:', best_checkpoint)

Now we can draw the plots.

In [0]:
stats[['L_class', 'L_class_valid']].plot()
stats[['F1']].plot()
stats[['LR']].plot()
plt.show()

The loss values on the train and the validation sets are close, so we can say that the model isn't overfitted.

Let's draw a confusion matrix.

In [0]:
plt.figure()

plt.imshow(confusion_matrix, interpolation='nearest', cmap=plt.get_cmap('Blues'))

plt.title('Confusion matrix (valid)')
plt.colorbar()

plt.xticks([0, 1], ['no', 'yes'])
plt.yticks([0, 1], ['no', 'yes'])

threshold = confusion_matrix.max() / 2

for i in range(2):
    for j in range(2):
        plt.text(j, i, confusion_matrix[i, j],
                 horizontalalignment='center',
                 color='white' if confusion_matrix[i, j] > threshold else 'black')

plt.tight_layout()
plt.ylabel('True label')
plt.xlabel('Predicted label')
plt.show()

We got the model with an F1 score that is sufficient for practical use. Now you can save the weights to your Google Drive.

# Visualize and analyze the results

Let's see which images are the hardest for the model. For that, we have to prepare a dataframe with predictions.

In [0]:
model.eval()  
predictions = []
pbar = tqdm(enumerate(loader_valid), total=len(loader_valid))
for i, (images, _) in pbar:
    images = images.to(device)
    with torch.no_grad():
        outputs = model(images)
    outputs = outputs.squeeze(1)
    predictions.extend(x.item() for x in (outputs.cpu() > 0).int())

We got a list of predictions and now we can add it as a column to a dataframe with information about the validation dataset.

In [0]:
df_results = pd.DataFrame(
    {'image': df_valid.index, 'y_true': df_valid[MAIN_ATTRIBUTE], 'y_pred': predictions}
).set_index('image')

In [0]:
df_results

We can filter the dataframe to find false-negative results and show the corresponding images.

In [0]:
df_false_negative = df_results[(df_results.y_true == 1) & (df_results.y_pred == 0)]
df_false_negative

In [0]:
load_and_show(df_false_negative.index[:25])

As you can see, in some of these images the person doesn't actually wear eyeglasses or the eyeglasses are nearly invisible, so the model's predictions were in fact correct. This happens because the labels in the dataset are noisy.

What about false positives?

In [0]:
df_false_positive = df_results[(df_results.y_true == 0) & (df_results.y_pred == 1)]
df_false_positive

In [0]:
load_and_show(df_false_positive.index[:25])

For these images, the model gave incorrect answers. For some reason, these images are hard cases for the model. However, the overall result is good enough.

# Convert the model

It's time to convert our model to the universal ONNX format so that the model can be used in a lens.

Remember, we trained our model on images with pixel values from the range `[0, 1]`? By default, Lens Studio passes images with the range `[0, 255]` to the input. There are several options:

- Set scale and bias coefficients in Lens Studio on importing the model.
- Divide the input image by 255 right inside the model or equivalently divide the weights of the first convolution.
- Create a wrapper for the model that prepares inputs (we don't recommend you do that because it adds extra operations to the model).

We will choose the second option, but if you think that the approach is tricky, you can set normalization options in Lens Studio. Anyway, it's very important to use the same normalization both for training and production.

You don't need to fuse batchnorms manually. They will be fused into the convolutions by the converter.

In [0]:
model_weights = torch.load(best_checkpoint)
model_weights['blocks.0.weight'] /= 255.0
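
The reason this trick works is that a convolution without a bias is linear in its input. The sketch below illustrates it with a single dot product, which is what a convolution computes at every spatial position. The pixel values and weights are hypothetical numbers chosen for the example.

```python
# Hypothetical 3-element "patch" and filter; a convolution without bias
# is a dot product like this at every spatial position.
pixels_0_255 = [0.0, 128.0, 255.0]
weights = [0.5, -1.0, 2.0]


def dot(xs, ws):
    return sum(x * w for x, w in zip(xs, ws))


# Option A: rescale the input to [0, 1], keep the trained weights.
scaled_input = dot([p / 255.0 for p in pixels_0_255], weights)

# Option B: keep the [0, 255] input, divide the weights by 255.
scaled_weights = dot(pixels_0_255, [w / 255.0 for w in weights])

print(scaled_input, scaled_weights)
```

Both options give the same result up to floating-point rounding. Note that this relies on the first convolution having `bias=False`; with a bias term, only the weights (not the bias) would have to be divided.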

Also, it would be convenient to add a sigmoid to the model. For that, we will create a wrapper that overrides the `forward()` method of the model. The weights will be compatible with the wrapper because we won't change the model's parameters.

In [0]:
class ClassifierExport(Classifier):
    def forward(self, x):
        y = super().forward(x)
        return torch.sigmoid(y)
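
Adding the sigmoid doesn't change the decisions of the model. During validation we thresholded the raw output at 0; after the sigmoid, the equivalent threshold is 0.5. The following plain-Python sketch checks this on a few arbitrary logit values.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


logits = [-3.0, -0.1, 0.0, 0.1, 3.0]
for logit in logits:
    prob = sigmoid(logit)
    # Thresholding the logit at 0 matches thresholding the probability at 0.5.
    assert (logit > 0) == (prob > 0.5)
    print(f'{logit:+.1f} -> {prob:.3f}')
```

So a consumer of the exported model can treat the `prob` output as a probability and compare it with 0.5.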

Now we can create an instance of the wrapper and save the converted model to a file.

We use names `input` and `prob` in the model because the same names are used in the lens.

In [0]:
model_export = ClassifierExport()
model_export.load_state_dict(model_weights)
model_export.eval()

dummy_input = torch.randn(1, 3, IMAGE_SIZE, IMAGE_SIZE)
torch.onnx.export(model_export, dummy_input,
                  ONNX_PATH, 
                  input_names=['input'],
                  output_names=['prob']) 

If you use Google Colab, you can download the model with the following code.

In [0]:
def download_onnx_from_colab():
    from google.colab import files
    files.download(ONNX_PATH)

## Uncomment for downloading ONNX from Colab
# download_onnx_from_colab()