# Facemask Classifier

This notebook serves as a basic showcase of this project and the results we achieved.

> **Note:** we assume that at this point you have already set up a virtual environment as outlined in the [README](./README.md) and that all necessary dependencies are installed.

First things first: let's get the imports and setup out of the way.

In [None]:
import torch
import sys
import os
import random
import matplotlib.pyplot as plt
from torchvision.transforms.functional import to_pil_image
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score, accuracy_score, f1_score

In [None]:
# for the sake of consistency and reproducibility in this demo, we set a
# fixed seed. But feel free to play around with it or omit this part.
seed=1337
torch.manual_seed(seed)
random.seed(seed)
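Seeding Python's `random` module matters here because the evaluation subsample drawn further below comes from `random.sample`. Re-seeding reproduces exactly the same indices. The stdlib-only check below is a sketch. The population size of 256 is only a stand-in for the test set size, and `draw_subsample` is a hypothetical helper:

```python
import random

def draw_subsample(seed, population_size=256, k=25):
    """Hypothetical helper: seed Python's RNG, then draw k distinct indices."""
    random.seed(seed)
    # random.sample draws without replacement from range(population_size)
    return random.sample(range(population_size), k)

first = draw_subsample(1337)
second = draw_subsample(1337)
print(first == second)  # same seed, same subsample
print(first[:5])
```

`torch.manual_seed` covers the notebook's other source of randomness. That source is the shuffling done by `DataLoader(shuffle=True)`, which draws from torch's global generator when no explicit generator is passed.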

Let's download the evaluation dataset. Note that the model was **not** trained on these images, although they stem from the same dataset. Refer to the project's [README](./README.md) for the original dataset source.

In [None]:
from util import download_dataset, dataset_url
dataset_path='dataset'
download_dataset(dataset_url, out_path=dataset_path)

Now for the pre-trained model.

In [None]:
from util import download_model, model_url
model_path='model.pt'
download_model(model_url, out_path=model_path)

Let's choose a small subsample of images from the dataset and run our model against it.

In [None]:
from architecture import Model1
from torchvision import transforms
from torch.utils.data import DataLoader, random_split, Subset
from torchvision.datasets import ImageFolder

model = Model1()
# map_location='cpu' lets weights saved from a GPU load on CPU-only machines
model.load_state_dict(torch.load(model_path, map_location='cpu'))
model.eval()
shared_transforms = transforms.Compose([transforms.ToTensor(), transforms.Resize((256, 256), antialias=True)])

# collect data for displaying later
eval_results=[]

testset_path = os.path.join(dataset_path, 'test')
all_test_imgs = ImageFolder(root=testset_path, transform=shared_transforms)

# try to use a perfect square, makes plotting calculations nicer :)
num_samples=25
# range() already excludes its end, so this covers every index in the test set
test_img_subset = Subset(all_test_imgs, random.sample(range(len(all_test_imgs)), num_samples))
test_loader = DataLoader(test_img_subset, shuffle=True)

def get_class(idx):
    return all_test_imgs.classes[idx]

num_correct = 0
with torch.no_grad():  # inference only, so skip gradient tracking
    for data, label in test_loader:
        actual = label.item()
        prediction = 1 if torch.sigmoid(model(data)).item() > 0.5 else 0
        print(
            f"True label: {actual} ({get_class(actual)}), Predicted label: {prediction} ({get_class(prediction)})"
        )
        eval_results.append( (data.squeeze(), prediction, actual) )
        if actual == prediction:
            num_correct += 1
print(f"Accuracy: {num_correct/num_samples}")
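The loop turns the model's single output into a label by thresholding `torch.sigmoid(...)` at 0.5. Because the sigmoid is strictly increasing and equals exactly 0.5 at 0, this amounts to checking whether the raw output (the logit) is positive. Below is a pure-Python sketch of that rule. It assumes the model emits one logit per image, as the `.item()` call suggests. `predict_label` is a hypothetical helper that also mirrors the label flip used later for datasets where 'masked' is 0:

```python
import math

def sigmoid(x):
    """Numerically stable logistic function (pure-Python stand-in for torch.sigmoid)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def predict_label(logit, mask_label=1):
    """Hypothetical helper: map one raw model output to a dataset label.

    mask_label is the dataset's index for the 'masked' class; the other
    class is assumed to be 1 - mask_label (binary labels 0/1).
    """
    is_masked = sigmoid(logit) > 0.5  # mathematically the same as logit > 0
    return mask_label if is_masked else 1 - mask_label

for logit in (-3.0, 0.0, 3.0):
    print(logit, round(sigmoid(logit), 3),
          predict_label(logit), predict_label(logit, mask_label=0))
```

A logit of exactly 0 gives a sigmoid of 0.5, which is *not* above the threshold, so it is classified as unmasked.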

In [None]:
# plot each sampled image in a square grid, titled with its true and predicted class
axis = int(np.ceil(np.sqrt(num_samples)))
fig, ax = plt.subplots(axis, axis, figsize=(16, 16), squeeze=False)
for i, ax_cell in enumerate(ax.flat):
    ax_cell.set_xticklabels([])
    ax_cell.set_yticklabels([])
    if i >= len(eval_results):
        # leftover cells stay empty when num_samples isn't a perfect square
        ax_cell.axis('off')
        continue
    img_tensor, predicted, actual = eval_results[i]
    ax_cell.imshow(to_pil_image(img_tensor))
    # green title for a correct prediction, red for a wrong one
    ax_cell.set_title(f"actual: {get_class(actual)}\npredicted: {get_class(predicted)}",
                      color='green' if actual == predicted else 'red')
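The plotting cell sizes its grid as `ceil(sqrt(num_samples))` cells per side. This is why the earlier comment recommends a perfect square: any other count leaves grid cells with no result to show. Below is a small integer-only sketch of that arithmetic. `grid_layout` is a hypothetical helper, and it uses `math.isqrt` to avoid floating-point rounding:

```python
import math

def grid_layout(num_samples):
    """Hypothetical helper: (cells per side, empty cells) for a square grid."""
    side = math.isqrt(num_samples)
    if side * side < num_samples:
        side += 1  # round up, like int(np.ceil(np.sqrt(n)))
    return side, side * side - num_samples

print(grid_layout(25))  # (5, 0)
print(grid_layout(20))  # (5, 5)
```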

Not too bad. Let's run another evaluation loop and collect some metrics, this time on the entire test set (256 images).

In [None]:
def evaluate_model_with_metrics(test_loader, predict_mask_label=1):
    labels_true = []
    labels_predictions = []
    with torch.no_grad():  # inference only, no gradients needed
        for data, label in test_loader:
            actual = label.item()
            labels_true.append(actual)
            # NOTE: some datasets assign 'masked' the label 0. Since our model is
            # trained to equate an output of 1 with 'masked', we map its output
            # onto the dataset's 'masked' label (predict_mask_label) here.
            prediction = predict_mask_label if torch.sigmoid(model(data)).item() > 0.5 else (1-predict_mask_label)
            labels_predictions.append(prediction)

    labels_true = np.array(labels_true)
    labels_predictions = np.array(labels_predictions)

    # calculate metrics, treating 'masked' as the positive class on every dataset
    precision = precision_score(labels_true, labels_predictions, pos_label=predict_mask_label)
    recall = recall_score(labels_true, labels_predictions, pos_label=predict_mask_label)
    accuracy = accuracy_score(labels_true, labels_predictions)
    f1 = f1_score(labels_true, labels_predictions, pos_label=predict_mask_label)

    print(f"""
    Model name: {model.__class__.__name__}
    Total samples: {len(labels_true)}
    Total correct: {np.sum(labels_true == labels_predictions)}
    Accuracy: {accuracy}
    Precision: {precision}
    Recall: {recall}
    F1 Score: {f1}
    """)

    return labels_true, labels_predictions

test_set = all_test_imgs # use all images instead of a subsample
test_loader = DataLoader(test_set, shuffle=True)
labels_true, labels_predictions = evaluate_model_with_metrics(test_loader)
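Precision, recall and F1 all come from counts of true positives, false positives and false negatives for one "positive" class. scikit-learn's `precision_score`, `recall_score` and `f1_score` take that class from `pos_label`, which defaults to 1. Below is a dependency-free sketch of the same definitions. `binary_metrics` is a hypothetical helper, and the labels are made up. It shows that swapping the positive class changes precision and recall but leaves accuracy untouched:

```python
def binary_metrics(y_true, y_pred, pos_label=1):
    """Hypothetical helper: accuracy, precision, recall, F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == pos_label and p == pos_label for t, p in pairs)
    fp = sum(t != pos_label and p == pos_label for t, p in pairs)
    fn = sum(t == pos_label and p != pos_label for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    # undefined ratios fall back to 0.0, similar to scikit-learn's default
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

y_true = [1, 1, 1, 1, 0, 0]  # made-up labels, for illustration only
y_pred = [1, 1, 1, 0, 1, 0]
print(binary_metrics(y_true, y_pred, pos_label=1))
print(binary_metrics(y_true, y_pred, pos_label=0))
```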

In [None]:
# now let's visualize our metrics in a confusion matrix

def plot_confusion_matrix(conf_matrix, title, classes=('Masked', 'Unmasked')):
    # `classes` must follow the dataset's label order: confusion_matrix puts
    # label 0 in the first row/column and label 1 in the second
    plt.imshow(conf_matrix, interpolation='nearest', cmap=plt.cm.Blues)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    thresh = conf_matrix.max() / 2.
    for i in range(conf_matrix.shape[0]):
        for j in range(conf_matrix.shape[1]):
            plt.text(j, i, format(conf_matrix[i, j], 'd'),
                     ha="center", va="center",
                     color="white" if conf_matrix[i, j] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')

# use the labels we just got from our evaluation run above; the class names come
# from the dataset itself, so the axis labels match its label order
conf_matrix = confusion_matrix(labels_true, labels_predictions, labels=[0, 1])
plot_confusion_matrix(conf_matrix, title='Model Confusion Matrix', classes=all_test_imgs.classes)
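scikit-learn's `confusion_matrix` puts true labels on the rows and predictions on the columns. Both are ordered by sorted label value, or by the `labels` argument if one is given. Row and column 0 therefore belong to whichever class the dataset numbered 0, and the tick labels must follow that order. Below is a minimal pure-Python sketch of the same layout. `confusion_counts` is a hypothetical helper, and the labels are made up:

```python
def confusion_counts(y_true, y_pred, labels=(0, 1)):
    """Hypothetical helper: rows = true label, columns = predicted label."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

y_true = [1, 1, 1, 1, 0, 0]  # made-up labels, for illustration only
y_pred = [1, 1, 1, 0, 1, 0]
for row in confusion_counts(y_true, y_pred):
    print(row)
# [1, 1]
# [1, 3]
```

Reading the output: the single `1` at row 1, column 0 is a true label 1 predicted as 0. The `1` at row 0, column 1 is the reverse mistake.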

As you can see, the model performs decently on test images from the same dataset. But what about generalization? During training, we observed some overfitting. With this in mind, how well does our model generalize to other facemask datasets? Let's quickly compare.

For this we will be using Kaggle user **pranavsingaraju**'s facemask dataset, which can be found [here](https://www.kaggle.com/datasets/pranavsingaraju/facemask-detection-dataset-20000-images). We have once again mirrored the dataset on Google Drive to make downloading more convenient, as Kaggle requires API tokens. 
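Before evaluating on a second dataset, it is worth checking how its labels are numbered. torchvision's `ImageFolder` assigns class indices by sorting the class subfolder names and numbering them from 0, and it exposes the result as `class_to_idx`. Whether "masked" ends up as 0 or 1 therefore depends purely on the folder names. Below is a pure-Python imitation of that behavior. `folder_class_to_idx` is a hypothetical helper, and both sets of folder names are invented for illustration:

```python
def folder_class_to_idx(folder_names):
    """Hypothetical helper mimicking ImageFolder: sort names, number from 0."""
    classes = sorted(folder_names)
    return {name: i for i, name in enumerate(classes)}

# invented folder names: 'masked' lands on 0 in one layout and 1 in the other
print(folder_class_to_idx(["without_mask", "with_mask"]))
print(folder_class_to_idx(["mask_on", "mask_off"]))
```

In the notebook itself, inspecting `generalization_imgs.class_to_idx` after loading shows the actual mapping and which value `predict_mask_label` needs.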

In [None]:
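generalization_dataset_path='dataset-generalization'
# use a separate name so the `dataset_url` imported from util isn't overwritten
generalization_dataset_url='https://drive.google.com/uc?id=1zsCPUhyPL6ndkXdVJlF16JoBVpJ5meZT'
download_dataset(generalization_dataset_url, out_path=generalization_dataset_path)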

In [None]:
generalization_imgs = ImageFolder(root=os.path.abspath(generalization_dataset_path), transform=shared_transforms)
generalization_loader = DataLoader(generalization_imgs, shuffle=True)

labels_true, labels_predictions = evaluate_model_with_metrics(generalization_loader, predict_mask_label=0)
conf_matrix = confusion_matrix(labels_true, labels_predictions, labels=[0, 1])
plot_confusion_matrix(conf_matrix, title='Generalization Confusion Matrix')

Still better than random guessing. The model seems to over-eagerly predict that an individual is wearing a mask, although false negatives remain quite low.

This concludes the demo for our model. You can find more information, discussions, etc. in the project [report](./report/report.pdf).