# SpotMask
## A Face Mask Correctness Detection Software

Jacopo Bugini - 207525 - 2020/2021
<br>
*Deep Learning for Computer Vision (20600)*

## Index

```
1. Introduction
   └─ 1.1 Business Idea
   └─ 1.2 Problem Definition
   
2. Methodology
   └─ 2.1 Project Structure & Strategy
   └─ 2.2 Datasets

3. Development
   └─ 3.1 Project Repo Structure
   └─ 3.2 Project Rquirements
   └─ 3.3 Models   
          └─ Face Detection    
          └─ Mask Detection   
          └─ Mask Correctness   
          └─ Mask Suggestions  

4. Improvements & Challenges

5. SpotMask

```

_____________________

## 1. Introductuion

### 1.1 Business Idea

COVID-19 pandemic (as of August 2021) has sickened more than 203 million people globally and claimed the lives of more than 4.2 million people worldwide. With new variants coming out in different places and the vaccination process that is advancing at different rates around the globe the current crisis is still far away to be over.

For this reason, now more than ever, ensuring the respect of hygenic standards and law enforcements is extremely important. Especially in public places, on public transports and in situations where people need to stay close to each other or in a restricted spaces.

Face mask is now compulsory in many different scenarios according to each country / region / city regulations and it is currently very difficult to monitor and expensive in terms of resources and security agents. An automated facemask correctness detector would be really useful in terms of business usage and resource efficient.

Such automatic detector could be placed in different locations where the need of compliance with rules is even more sensible, like airports, public transports, museums and many other places. The deployment costs are limited to the hardware at disposal and at the warning mechanism that the entity would like to implement.

### 1.2 Problem Definition

The problem that we want to address is being able to **determin automatically and quicly if a mask is worn properly or not** and **suggest the user how to wear it correctly** based on how he is keeping it.

In order to be able to assess the correctness of a mask we have three major incremental steps (or goals):

1. **Detect the face/s** in a specific frame.
2. Detecting weather a mask is **worn or not**.
3. Detecting weather a mask is **worn correclty or incorrectly**.

Eventually, we will try to bring it one step further increasing the complexity of step 3 even more:

4. **Classifying why it is not correct** and prompting a suggestion in order to make it properly worn

The project is ment to provide an output in real time due to its business purpose and impact, hence the project outcome will be an executable script that through the device' webcam will detect and prompt the results accordingly. <br>*More details will follow in Section 5*.

The major concerns regarding the project involve the dataset, given the freshness off the topic there are only limited real datasets available at the moment. Many researches are now focusing on generating artificial datasets able to support such developement and researches. As we will see later we will use also one of these *artificial* dataset in our project. <br>*More details will follow in Section 3 and Section 4*.

___________________________

## 2. Methodology

### 2.1 Project Structure & Strategy

The project structure is divided into three main clusters:
1. **Face Detection** -> Detect weather one or more face/s is/are present into a specific frame
2. **Mask Detection** -> Given a detected face, detect weather the subject has a mask or not
3. **Mask Correctness** -> Given a detected subject with a mask, classify weather the mask is properly or improperly worn

For each one of the differen steps we are going to define a specific methodology and adopt a specific model according to the task. The specs and details off each model will be found in section 3.3.

### 2.2 Datasets

- **Flickr-Faces-HQ Dataset (FFHQ)**
    
    >*[A Style-Based Generator Architecture for Generative Adversarial Networks
    Tero Karras (NVIDIA), Samuli Laine (NVIDIA), Timo Aila (NVIDIA)
    https://github.com/NVlabs/ffhq-dataset]*

    Flickr-Faces-HQ Dataset (FFHQ) dataset consist of human faces images which includes considerable variations in terms of age, ethnicity and image background, with also a great coverage of accessories such as eyeglasses, sunglasses, hats, etc. The images were crawled from Flickr and then automatically aligned and cropped.
    <br>
    The original dataset consists of 70,000 high-quality PNG images at 1024×1024 resolution and contains considerable variation in terms of age, ethnicity and image background. However for our scope we are using the thumbnail version already cropped down to 128x128 pixels, both for size matter and computational power.
    
    ![image](https://raw.githubusercontent.com/NVlabs/ffhq-dataset/master/ffhq-teaser.png)
    
    <br>
    <br>
    
- **MaskedFace-Net**

    >*[Cabani A, Hammoudi K, Benhabiles H, Melkemi M. MaskedFace-Net - A dataset of correctly/incorrectly masked face images in the context of COVID-19. Smart Health (Amst). 2021;19:100144. doi:10.1016/j.smhl.2020.100144, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7837194/#fn3]*
    
    A dataset of correctly/incorrectly masked face images in the context of COVID-19. The dataset is generated artificially by the researchers due to the lack of generalized and vast dataset regarding incorrectness and correctness of mask usage.
    <br>
    The datasets consists of 66,000 high quality images of proper worn masks and 66,000 high quality images of improper worn mask. For the purpose of our project we are going to use a subset of those images for computational and memory constraints.
    
     ![image](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7837194/bin/gr1_lrg.jpg)

**Dataset Structure**

The datasets used can be found at this drive link already clustered in sets.
The dataset structure is reflecting the project structure and strategy defined above (3.1) we will go through each one of the dataset in the following section for the developement phase.

```
dataset
│   sources.txt    
│
└─── detect_mask_dataset
│    └─── test
│    │    └─── mask (10,000)
│    │    └─── no_mask (10,000)
│    └─── train
│    │    └─── mask (30,000)
│    │    └─── no_mask (30,000)
│    └─── validation
│         └─── mask (14,000)
│         └─── no_mask (14,000)
│   
└─── mask_correctness_dataset
│    └─── test
│    │    └─── proper (9,000)
│    │    └─── improper (9,000)
│    └─── train
│    │    └─── proper (30,000)
│    │    └─── improper (30,000)
│    └─── validation
│    │    └─── proper (4,000)
│    │    └─── improper (4,000)
```

______________________

## 3. Development

### 3.1 Project Repo Structure

The repository is organized as follows:

- The main executable file is ```SpotMask.py``` that will be explained and utilized in section 5.
- The ```dataset``` folder contains the different dataset used as previously anticipated.
- The ```facemask-correctbess-model.ipynb``` is the jupyter containing the training and testing for each model.
- The ```utils``` folder contains useful functions and templates used for the report (images, graphs,etc.).
- The ```models``` folder contains the different trained models and their variations.

**Repo's Tree**

```
facemask-correctness-detection
│   README.md 
│   Report.ipynb 
│   LICENSE 
│   facemask-correctness-model.ipynb 
│   SpotMask.py 
│   utils.py 
│
└─── dataset
│    └─── detect_mask_dataset
│    └─── mask_correctness_dataset
│    └─── mask_suggestion_dataset
│   
└─── models
│    └─── face-detection
│    │    └─── haar-cascade
│    │    └─── yolo-v3
│    └─── mask-detection
│    │    └─── hyperparameters_tuning
│    │    └─── mask_detection.h5
│    └─── facemask-correctness
│    │    └─── hyperparameters_tuning
│    │    └─── facemask_correctness.h5
```

In order to access the repo here below the cloning snippet:

In [None]:
! git clone https://github.com/JacopoBugini/facemask-correctness-detection.git

**NB** *Please note that the dataset is not present in the repository for storage limits, it can be found at the Google Drive Link shared, or from here.*

### 3.2 Requirements

The project is entirely developed in python with the support of:

- **Keras** for what concerns the models
  > Chollet, F., & others. (2015). Keras. https://keras.io.
  
- **OpenCV** for what concerns the video interface and image preprocessing
  > Bradski, G. (2000). The OpenCV Library. Dr. Dobb's Journal of Software Tools.

- A built-in **GPU** has been used for the training and testing phases, hence in trying to replicate training timings expect a slight increase if you do not have one on your local machine.

### 3.3 Models

Note that the **training** and **testing** for each model has been carried out in the ```facemask-correctness-model.ipynb``` notebook that can be found in the repository or be reached from here:

|[![Open In Notebook](https://drive.google.com/uc?export=view&id=1xfYuNfDKlhQORXi4_oyeFVoWsXX6zbeV)](https://github.com/JacopoBugini/SpotMask/blob/main/facemask-correctness-model.ipynb)|[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JacopoBugini/SpotMask/blob/main/facemask-correctness-model.ipynb)|
|-|-|

**NB** *Note that in order to be executed in colab additional steps are required in order to mount drive and make the dataset available for the training phase*.

Let's import load_model from keras in order to display the trained models below:

In [2]:
from tensorflow.keras.models import load_model

### (A) Face Detection

For the face detection problem we needed to identify and capture the different faces present in a determined frame. The two model that we tested are a pre-trained face detection version of **YOLO (You Only Learn Once)** and the more used **Haar Cascade**.

After a few tests eventually we opted for **YOLO-v3** as during the testing phase was more stable, fast and precise.

The model will be found in ```models/face-detection/yolo-v3``` with the corresponding ```.weights``` and ```.cfg``` files.

References for both models:

- **YOLO-v3** 
> Singh S, Ahuja U, Kumar M, Kumar K, Sachdeva M. Face mask detection using YOLOv3 and faster R-CNN models: COVID-19 environment [published online ahead of print, 2021 Mar 1]. Multimed Tools Appl. 2021;1-16. doi:10.1007/s11042-021-10711-8

- **Haar Cascade** 
> https://docs.opencv.org/3.4/db/d28/tutorial_cascade_classifier.html

![Image](https://drive.google.com/uc?export=view&id=1LDqnsionr3SQmcX0AysVeqfWyq5vjxr3)

### (B) Mask Detection

Mask detection is carried out with a CNN model trained on the ```mask-detection``` dataset:
- **mask**, including a variegate set of images with masks both worn correctly and improperly in order to generalize at best the mask detection also when worn in a bad way. The dataset used is the MaskedFace-Net.

- **no_mask**, a set of images from the Flickr-Faces-HQ Dataset without any sort of mask.

Model:

In [3]:
mask_detection_model = load_model('models/mask-detection/mask_detection_model.h5')
mask_detection_model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 148, 148, 32)      896       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 74, 74, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 72, 72, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 36, 36, 64)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 34, 34, 128)       73856     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 17, 17, 128)       0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 15, 15, 128)       1

Outcome:

|![Image](https://drive.google.com/uc?export=view&id=14vOeTrRH7ygqqVP6Jncm98vkq5qIfy0J)|![Image](https://drive.google.com/uc?export=view&id=1sN_7jtMAWnd6CvU_u0PuGO9JnFZCvWwf)|
|-|-|

### (C) Mask Correctness

Mask correctnes is carried out with a sequential CNN model trained on the ```mask-correctness``` dataset:
- **proper**, including a set of images with mask worn correctly convering accordingly nose, mouthand chin. The dataset used is the MaskedFace-Ne (*FaceDetectionCorrectMask*).

- **improper**, including a variegate set of images with masks worn improperly in order to generalize at best the mask detection also when worn in unusual or strange ways. The dataset used is the MaskedFace-Net (*FaceDetectionImproperMask*).

Model:

In [4]:
mask_correctness_model = load_model('models/facemask-correctness/mask_correctness_model.h5')
mask_correctness_model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_4 (Conv2D)            (None, 148, 148, 128)     3584      
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 74, 74, 128)       0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 72, 72, 128)       147584    
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 36, 36, 128)       0         
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 34, 34, 64)        73792     
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 17, 17, 64)        0         
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 15, 15, 64)       

Outcome:

|![Image](https://drive.google.com/uc?export=view&id=1GyAKkS5fgHZsmCHzEpsISdmRxwvkzbbP)|![Image](https://drive.google.com/uc?export=view&id=1q_lS9AmhTaxW3ljiAfKYNn9ADZQ-8S3_)|
|-|-|

### (D) Mask Suggestions

Mask suggestions is carried out with a CNN model trained on the ```suggestions``` dataset:
- **Proper** - The mask is properly worn
- **Off** - The mask is under the chin not covering any part of the face leaving nose and mouth uncovered
- **Nose** - The mask is worn leaving the nose outside
- **Chion** - The Mask is covering mouth and nose but is worn incorrectly leaving the chin outside

This time we will try to generate a model utilizing a hyperparameter tuner in order to find the most efficient model. You can read more in the tuning section in the tarining notebook.

More on the hyperparameter tuning through ```keras-tuner``` here:
> O'Malley, T., Bursztein, E., Long, J., Chollet, F., Jin, H., Invernizzi, L., & others. (2019). Keras Tuner. https://github.com/keras-team/keras-tuner.

Model:

In [6]:
suggestuions_model = load_model('models/suggestions-detection/suggestions_model.h5')
mask_detection_model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 148, 148, 32)      896       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 74, 74, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 72, 72, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 36, 36, 64)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 34, 34, 128)       73856     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 17, 17, 128)       0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 15, 15, 128)       1

Outcome:

|![Image](https://drive.google.com/uc?export=view&id=1vqlZxV8txfIB191hiqtVIsnimROeLGBg)|![Image](https://drive.google.com/uc?export=view&id=151Ia_JW2YtJ-d-UWUkl0K478R4uIXGE0)|
|-|-|
|![Image](https://drive.google.com/uc?export=view&id=1130Cds3Hx-SzZ31uFwg6bNy6lX4LDkgf)|![Image](https://drive.google.com/uc?export=view&id=1LeA_bmJLwqz6KGwmaKmSQ3C_sQ6IL-TY)|

_______________

## 4. Improvements & Challenges

_______________________

## 5. SpotMask

The final software is executable through the SpotMask python file. Here below a quick example with a fast guide.

```python SpotMask-py```

The only argument needed by the parser is the ```--mode``` argument which allows to decide weather to execute the software with the suggestions model or the simple correctness model.

For the first one you need to execute ```python SpotMask-py --mode 'suggestions'``` while for the second one ```python SpotMask-py --mode 'simple'``` shall be used. 

```--mode 'suggestions'``` is the default one.

______________________

# Thanks!

**Jacopo Bugini** <br>
jacopo.bugini@studbocconi.it