# Face Mask Detection

The coronavirus has become a global concern: people all over the world are suffering badly, and thousands are dying of COVID-19 every day. According to the WHO, face masks, combined with other preventive measures such as frequent hand-washing and social distancing, help slow the spread of the coronavirus.

![](https://images.pexels.com/photos/4472976/pexels-photo-4472976.jpeg?auto=compress&cs=tinysrgb&dpr=1&w=500)

## Loading Libraries
Not all Python capabilities are loaded into the working environment by default (even if they are already installed on your system). So we import each library that we want to use.

We chose alias names for our libraries for convenience (numpy --> np, pandas --> pd, tensorflow --> tf).

Note: You can import all the libraries you expect to need up front, or import them as you go along.

In [None]:
import pandas as pd                                     # Data analysis and manipulation tool
import numpy as np                                      # Fundamental package for linear algebra and multidimensional arrays
import tensorflow as tf                                 # Deep Learning Tool
import os                                               # OS module in Python provides a way of using operating system dependent functionality
import cv2                                              # Library for image processing
from sklearn.model_selection import train_test_split    # For splitting the data into train and validation set

## Loading and preparing training data
The train and test images are stored in two separate folders, 'train' and 'test'. The labels of the train images are given in a CSV file, 'Training_set_face_mask.csv', keyed by image id (i.e. the image file name). Note that the code below reads this file under the name 'train_labels.csv'.

#### Getting the labels of the images

In [None]:
labels = pd.read_csv("../input/face-mask-dataset/train_labels.csv")   # loading the labels
labels.head()           # will display the first five rows in labels dataframe

In [None]:
labels.tail()            # will display the last five rows in labels dataframe

#### Getting images file path

In [None]:
file_paths = [[fname, '/kaggle/input/face-mask-dataset/train/train/' + fname] for fname in labels['filename']]
file_paths

#### Confirming if no. of labels is equal to no. of images

In [None]:
# Confirm if number of images is same as number of labels given
if len(labels) == len(file_paths):
    print('Number of labels i.e. ', len(labels), 'matches the number of filenames i.e. ', len(file_paths))
else:
    print('Number of labels does not match the number of filenames')
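
Because `file_paths` is built directly from `labels['filename']`, the two lengths compared above will always match. A slightly stronger check is to confirm that each path actually exists on disk. The following is a minimal sketch; the helper name `find_missing_files` is hypothetical and not part of the original notebook.

```python
import os

def find_missing_files(paths):
    """Return the paths that do not point to an existing file."""
    return [p for p in paths if not os.path.isfile(p)]

# Possible usage with the notebook's `file_paths` list of [filename, path] pairs:
# missing = find_missing_files([path for _, path in file_paths])
# print(len(missing), 'image files are missing')
```

An empty result means every labelled image can be found in the train folder.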

In [None]:
#viewing any image from the train data.
from IPython.display import Image
Image('/kaggle/input/face-mask-dataset/train/train/Image_1000.jpg')

#### Converting the file_paths to dataframe

In [None]:
images = pd.DataFrame(file_paths, columns=['filename', 'filepaths'])
images.head()

#### Combining the labels with the images

In [None]:
train_data = pd.merge(images, labels, how = 'inner', on = 'filename')
train_data.head()       

The 'train_data' dataframe contains every image id, its location and its label. The training data is now ready.
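
An inner merge silently drops any row that has no partner in the other dataframe. If you want to see whether that happened, one option is an outer merge with `indicator=True`. The sketch below uses two tiny made-up dataframes (`images_demo` and `labels_demo` are hypothetical stand-ins for `images` and `labels`).

```python
import pandas as pd

# Hypothetical miniature versions of the `images` and `labels` dataframes.
images_demo = pd.DataFrame({'filename': ['a.jpg', 'b.jpg'],
                            'filepaths': ['train/a.jpg', 'train/b.jpg']})
labels_demo = pd.DataFrame({'filename': ['a.jpg', 'c.jpg'],
                            'label': ['with_mask', 'without_mask']})

# indicator=True adds a `_merge` column telling where each row came from:
# 'both', 'left_only' or 'right_only'.
check = pd.merge(images_demo, labels_demo, how='outer', on='filename', indicator=True)

# Rows that an inner merge would have dropped.
unmatched = check[check['_merge'] != 'both']
print(unmatched[['filename', '_merge']])
```

In this notebook the image list is derived from the labels, so no rows should be unmatched. The check is more useful when the two tables come from different sources.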

## Data Pre-processing
All the images need to be brought to the same shape and size and converted to their pixel values, because machine learning and deep learning models accept only numerical data. We also need to convert the labels from categorical to numerical values.

In [None]:
data = []             # initialize an empty list that will hold [image, label] pairs
image_size = 100      # target width and height in pixels; other sizes work too
for i in range(len(train_data)):

    # reading the image directly as a single-channel grayscale array
    img_array = cv2.imread(train_data['filepaths'][i], cv2.IMREAD_GRAYSCALE)
    if img_array is None:      # cv2.imread returns None when a file cannot be read
        raise FileNotFoundError(train_data['filepaths'][i])

    new_img_array = cv2.resize(img_array, (image_size, image_size))      # resizing the image array

    # encoding the labels. with_mask = 1 and without_mask = 0
    if train_data['label'][i] == 'with_mask':
        data.append([new_img_array, 1])
    else:
        data.append([new_img_array, 0])
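
Note that the `else` branch above encodes any label other than 'with_mask' as 0, so a misspelled label in the CSV would pass unnoticed. A more defensive way to encode the labels is sketched below, assuming the only valid labels are 'with_mask' and 'without_mask'. `labels_col` is a hypothetical stand-in for `train_data['label']`.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's train_data['label'] column.
labels_col = pd.Series(['with_mask', 'without_mask', 'with_mask'])

label_map = {'with_mask': 1, 'without_mask': 0}
encoded = labels_col.map(label_map)

# Any value missing from label_map becomes NaN, so unexpected labels are caught here.
if encoded.isna().any():
    raise ValueError('Unexpected labels: %s' % labels_col[encoded.isna()].unique())

encoded = encoded.astype(int)
print(encoded.tolist())   # [1, 0, 1]
```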

In [None]:
# pixel values and label of the first image
data[0]

In [None]:
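# The shape of an image array
# dtype=object is needed because each row pairs a 2-D image with an integer label;
# recent NumPy versions raise an error for such mixed-shape rows without it
data = np.array(data, dtype=object)
data[0][0].shape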

#### Shuffle the data
The first half of the images are without a mask and the second half are with a mask. So we shuffle the data: the model must be trained on both categories, because if it never sees one category of images it will not learn to detect it.

In [None]:
np.random.shuffle(data)
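
The call above gives a different order on every run. If you want the shuffle to be repeatable, one option is a seeded NumPy random generator. The sketch below uses a small made-up array (`data_demo` is hypothetical) in which each row pairs a number with its parity, so it is easy to see that rows stay intact.

```python
import numpy as np

rng = np.random.default_rng(42)   # the seed value is arbitrary; any fixed integer works

# Hypothetical stand-in for `data`: each row is [value, value % 2].
data_demo = np.array([[i, i % 2] for i in range(6)])

rng.shuffle(data_demo)            # shuffles the rows in place; each row stays together
print(data_demo)
```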

#### Take a look at some of the images

In [None]:
import matplotlib.pyplot as plt

In [None]:
# code to view the images
num_rows, num_cols = 2, 5
f, ax = plt.subplots(num_rows, num_cols, figsize=(12,5),
                     gridspec_kw={'wspace':0.03, 'hspace':0.01}, 
                     squeeze=True)

for r in range(num_rows):
    for c in range(num_cols):
        image_index = r * 100 + c
        ax[r,c].axis("off")
        ax[r,c].imshow(data[image_index][0], cmap='gray')
        if data[image_index][1] == 0:
            ax[r,c].set_title('without_mask')
        else:
            ax[r,c].set_title('with_mask')
plt.show()
plt.close()

#### Separating the images and labels


In [None]:
x = []
y = []
for image in data:
    x.append(image[0])
    y.append(image[1])

# converting x & y to numpy arrays, since they are lists
x = np.array(x)
y = np.array(y)

In [None]:
np.unique(y, return_counts=True)

#### Normalizing the data
Normalization is a process that changes the range of pixel intensity values; here we rescale them to the range 0 to 1.

But **why normalize?**

The motivation is to give all the images a consistent dynamic range, so that every input value is on the same scale. Normalizing the data can also help improve model performance.

In [None]:
x = x / 255

# Why divide by 255?
# --> The images were read as 8-bit grayscale, so each pixel intensity lies in the range 0 - 255.
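
As a small worked example of the scaling, the sketch below normalizes a made-up 2x2 grayscale image (`img` is hypothetical). Casting to `float32` first is an optional choice shown here for illustration. The notebook's `x / 255` produces `float64`, which also works.

```python
import numpy as np

# Hypothetical 2x2 grayscale image stored as 8-bit unsigned integers.
img = np.array([[0, 64], [128, 255]], dtype=np.uint8)

scaled = img.astype(np.float32) / 255.0   # 0 maps to 0.0 and 255 maps to 1.0

print(scaled.min(), scaled.max())   # 0.0 1.0
```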

#### Splitting the data into Train and Validation Set
We want to check the performance of the model that we build. For this purpose, we split the given data (both independent and dependent) into a training set, which will be used to train the model, and a validation set, which will be used to check how accurately the model predicts outcomes on data it has not seen.

For this purpose we have a function called 'train_test_split' in the 'sklearn.model_selection' module.

In [None]:
# split the data
X_train, X_val, y_train, y_val = train_test_split(x,y,test_size=0.3, random_state = 42)

# X_train: independent/input feature data for training the model
# y_train: dependent/output feature data for training the model
# X_val: independent/input feature data for validating the model; will be used to predict the output values
# y_val: original dependent/output values of X_val; we will compare these values with our predicted values to check the performance of our built model.

# test_size = 0.30: 30% of the data will go to the validation set and 70% of the data will go to the train set
# random_state = 42: this will fix the split, i.e. the split will be the same each time you run the code
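
If the class counts printed earlier by `np.unique` turn out to be imbalanced, a plain random split can give the two sets different class proportions. The `stratify` argument of `train_test_split` keeps the proportions the same in both sets. The sketch below uses made-up data (`x_demo` and `y_demo` are hypothetical) with an 80/20 class ratio.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 tiny "images", 80 without a mask (0) and 20 with one (1).
x_demo = np.zeros((100, 4, 4))
y_demo = np.array([0] * 80 + [1] * 20)

X_tr, X_va, y_tr, y_va = train_test_split(
    x_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo)

# Class counts in each set; both keep the original 80/20 ratio.
print(np.bincount(y_tr), np.bincount(y_va))
```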

## Building Model
Now we are finally ready to train the model.

There are many machine learning and deep learning models to choose from, such as Random Forest, Decision Tree, Multi-Layer Perceptron (MLP) and Convolutional Neural Network (CNN), to name a few.

Here, however, we build a simple Multi-Layer Perceptron model.

We then feed the model both the data (X_train) and the answers for that data (y_train).

In [None]:
# Defining the model
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(100, 100)),    # flattening the image
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(50, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.fit(X_train, y_train, epochs=10, batch_size = 20)
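
To get a feel for the size of this network, the number of trainable parameters can be worked out by hand: each Dense layer has one weight per input-unit pair plus one bias per unit. The sketch below does that arithmetic for the layer sizes used above. The helper `dense_params` is hypothetical, and the total should agree with what `model.summary()` reports.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

flattened = 100 * 100          # Flatten turns a 100x100 image into 10,000 values
layer_sizes = [100, 50, 1]     # units in the three Dense layers

total, n_in = 0, flattened
for n_units in layer_sizes:
    total += dense_params(n_in, n_units)
    n_in = n_units             # the output of this layer feeds the next one

print(total)   # 1005201
```

Almost all of the parameters sit in the first Dense layer, because every one of the 10,000 pixels is connected to each of its 100 units.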

## Validate the model
Wondering 🤔 how well your model learned? Let's check its performance on the validation data (X_val).

In [None]:
model.evaluate(X_val, y_val)

The model gives 86% accuracy on unseen data. We could use other approaches, such as a CNN or transfer learning, to build a better model.
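
Accuracy alone does not show which kind of mistake the model makes. The sketch below shows how the sigmoid outputs could be thresholded and broken down into a confusion matrix. The arrays `probs` and `y_true` are made-up values standing in for `model.predict(X_val).ravel()` and `y_val`. They are not results from this notebook.

```python
import numpy as np

# Hypothetical sigmoid outputs and true labels (1 = with_mask, 0 = without_mask).
probs = np.array([0.9, 0.2, 0.6, 0.4, 0.8, 0.1])
y_true = np.array([1, 0, 0, 1, 1, 0])

y_pred = (probs >= 0.5).astype(int)   # threshold the probabilities at 0.5

tp = int(np.sum((y_pred == 1) & (y_true == 1)))   # masks correctly detected
tn = int(np.sum((y_pred == 0) & (y_true == 0)))   # no-mask correctly detected
fp = int(np.sum((y_pred == 1) & (y_true == 0)))   # predicted mask, actually none
fn = int(np.sum((y_pred == 0) & (y_true == 1)))   # missed masks

accuracy = (tp + tn) / len(y_true)
print(tp, tn, fp, fn, round(accuracy, 3))   # 2 2 1 1 0.667
```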

# **Well Done! 👍**