### Import all libraries

In [28]:
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.models import load_model
import numpy as np
import cv2
import os
from tensorflow.keras.preprocessing.image import load_img
import warnings
warnings.filterwarnings("ignore")

#### Let's load our trained classifier.

In [29]:
model = load_model("save.model4")
warnings.filterwarnings("ignore")

![alt text](images/opencv.png)

OpenCV (Open Source Computer Vision) is a library of functions aimed mainly at real-time computer vision. OpenCV supports the deep learning frameworks **Caffe, TensorFlow, and Torch/PyTorch.**

With OpenCV you can perform face detection using a pre-trained deep learning face detector model that ships with the library. OpenCV’s face detector is based on the **Single Shot Detector framework** with a **ResNet** base network.

For a comparison across various libraries, please see:

[**Face Detection – OpenCV, Dlib and Deep Learning ( C++ / Python )**](https://www.learnopencv.com/face-detection-opencv-dlib-and-deep-learning-c-python/)

### DNN Face Detector in OpenCV

This model has been included in OpenCV since version **3.3**. It is based on the [Single-Shot-Multibox](https://arxiv.org/abs/1512.02325) detector and uses the **ResNet-10** architecture as its backbone. The model was trained using images available from the web, but the source is not disclosed. OpenCV provides two models for this face detector:

   1. Floating point 16 version of the original **Caffe** implementation ( 5.4 MB )
   2. 8 bit quantized version using **Tensorflow** ( 2.7 MB )
   
The method has the following merits:

   1. Most accurate of the four [**methods**](https://www.learnopencv.com/face-detection-opencv-dlib-and-deep-learning-c-python/) in the libraries described above.
   2. Runs in real time on CPU.
   3. Works for different face orientations – up, down, left, right, side-face etc.
   4. Works even under substantial occlusion.
   5. Detects faces across various sizes.

If we want to use floating point model of Caffe, we use the **caffemodel** and **prototxt** files. 

Otherwise, we use the **quantized tensorflow model**. 

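Also note the difference in the way we read the networks for Caffe and TensorFlow: `readNetFromCaffe` takes the config file first and the model file second, while `readNetFromTensorflow` takes the model file first.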

### Function to load our detector files.

In [30]:
def load_detector(DNN):
    if DNN == "Caffe":
        modelFile =  "detector/res10_300x300_ssd_iter_140000_fp16.caffemodel" 
        configFile = "detector/deploy.prototxt" 
        net = cv2.dnn.readNetFromCaffe(configFile, modelFile)
    else:
        modelFile = "detector/opencv_face_detector_uint8.pb"
        configFile = "detector/opencv_face_detector.pbtxt"
        net = cv2.dnn.readNetFromTensorflow(modelFile, configFile)
    return net

### Load the input image

Now let's read our test image. When an image file is read with the OpenCV function `imread()`, the channel order is `(B, G, R)`.

In [31]:
image = cv2.imread(os.path.sep.join([r'testfiles/',"example_01.png"]))

In [32]:
image.shape

(500, 600, 3)

### Constructing blob from images

First we need to construct blobs for our test images. As with the training set, the test images must be pre-processed before they are passed to the network. This helps our deep neural networks perform better. **Pre-processing is handled by OpenCV's blobFromImage function.**

A blob is just an image, or potentially a collection of images, with the same spatial dimensions (i.e., width and height) and the same depth (number of channels), all preprocessed in the same manner.

Now let's convert the image to a blob using OpenCV's [blobFromImage](https://www.pyimagesearch.com/2017/11/06/deep-learning-opencvs-blobfromimage-works/) and pass it through the network using the forward() function.

The **blobFromImage** function of OpenCV performs:
   1. Mean subtraction
   2. Image scaling
   3. Optional swapping of the R and B channels


### Mean subtraction

In order to handle intensity variations and normalization, sometimes we calculate the average pixel value on the training dataset and subtract it from each image during training. If we are doing mean subtraction during training, then we must apply it during inference.
 
Since our models have been trained with weights from **ImageNet** training, we use the mean values for the ImageNet training set, which are approximately **R=123.68, G=116.78, and B=103.94**. We should also check whether a given deep neural network expects mean subtraction at all.

![alt text](images/meansub.png)
Taken from:
[Adrian's blobFromImage explanation](https://www.pyimagesearch.com/2017/11/06/deep-learning-opencvs-blobfromimage-works/)

##### This mean is a tuple with one value per channel. Give it in `(R, G, B)` order when **swapRB = True**, and in the image's own `(B, G, R)` order when **swapRB = False**.
**Mean subtraction** is used to help combat illumination changes in the input image.
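To make the idea concrete, here is a minimal NumPy sketch of computing a per-channel mean over a training set and subtracting it from an image. The tiny random `train_images` array and the helper `subtract_mean` are hypothetical stand-ins for illustration, not part of this notebook's pipeline.

```python
import numpy as np

# Hypothetical stand-in for a training set: 4 images of 2x2 pixels, 3 channels.
rng = np.random.default_rng(0)
train_images = rng.integers(0, 256, size=(4, 2, 2, 3)).astype("float32")

# Average over images, rows and columns, leaving one mean value per channel.
channel_mean = train_images.mean(axis=(0, 1, 2))


def subtract_mean(image, mean):
    """Subtract the per-channel mean; broadcasting applies it to every pixel."""
    return image.astype("float32") - mean


centered = subtract_mean(train_images[0], channel_mean)
print(channel_mean.shape)  # (3,)
print(centered.shape)      # (2, 2, 3)
```

The same `channel_mean` would have to be subtracted again at inference time, which is why a fixed tuple is passed to the blob function.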


### Other parameters:

**scalefactor:** the constant by which the pixel values are multiplied. A common choice is to divide all uint8 images by 255 (that is, `scalefactor=1/255`) so that all the pixels lie between 0 and 1. The default value is 1.0, which means no scaling.

**size:** The spatial size of the output blob. It should equal the input size expected by the neural network that consumes the output of blobFromImage.

**mean:** This is the mean subtraction value from ImageNet.


**swapRB:** Boolean to indicate if we want to swap the first and last channel in a 3-channel image. OpenCV assumes images are in `(B, G, R)` channel order; if the **mean** value is given in `(R, G, B)` order, we can resolve this discrepancy by setting this value to **True**, which swaps the R and B channels of the image. In current OpenCV releases the default is **False**, so the code below, which does not pass `swapRB`, keeps the image in `(B, G, R)` order and applies the mean in that same order.

**crop:** Boolean flag to indicate if we want to center crop our images. If it’s set to True, the input image is resized so that one side matches the corresponding dimension in size and the other is equal or larger, and is then cropped from the center. If it is set to False, the image is resized directly to the dimensions in size without cropping, so the aspect ratio is not preserved.
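The arithmetic behind these parameters can be sketched in NumPy alone. This is only an approximation under stated assumptions: the hypothetical helper `manual_blob` assumes the image is already at the target size (resizing and cropping are left out), and `mean` is given in the channel order the image has after the optional swap.

```python
import numpy as np


def manual_blob(image_bgr, scalefactor=1.0, mean=(0.0, 0.0, 0.0), swap_rb=False):
    """Approximate the blob arithmetic for an image already at the target size.

    Hypothetical helper for illustration; resizing and cropping are omitted.
    """
    img = image_bgr.astype("float32")
    if swap_rb:
        img = img[:, :, ::-1]                      # BGR -> RGB
    img = img - np.array(mean, dtype="float32")    # mean subtraction
    img = img * scalefactor                        # scaling
    blob = np.transpose(img, (2, 0, 1))            # HWC -> CHW
    return blob[np.newaxis, ...]                   # add batch axis -> NCHW


# A flat test frame whose pixels equal the mean, so the blob should be all zeros.
frame = np.full((300, 300, 3), (104, 117, 123), dtype="uint8")
blob = manual_blob(frame, mean=(104, 117, 123))
print(blob.shape)  # (1, 3, 300, 300)
```

Subtracting first and scaling second matters: with `scalefactor=1/255` the mean must still be expressed on the original 0 to 255 scale.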

In [16]:
(h, w) = image.shape[:2]
print(h,w)

blob = cv2.dnn.blobFromImage(image, scalefactor=1.0, size=(300, 300), mean=[104, 117, 123])
blob.shape

500 600


(1, 3, 300, 300)

1. The first dimension is our total number of images.
2. The second dimension is the total number of channels in our image.
3. The third dimension here is our image height.
4. The fourth dimension is our image width. 

Having the second dimension contain the channels is “channels first” ordering. Having the channels as the last dimension is called “channels last” ordering.
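Converting between the two orderings is a matter of permuting axes. A small sketch with NumPy, using a zero-filled array with this notebook's image height and width as placeholder data:

```python
import numpy as np

# One image in "channels last" layout: (batch, height, width, channels).
nhwc = np.zeros((1, 500, 600, 3), dtype="float32")

# Move the channel axis to position 1: (batch, channels, height, width).
nchw = np.transpose(nhwc, (0, 3, 1, 2))

# And back again to "channels last".
restored = np.transpose(nchw, (0, 2, 3, 1))

print(nchw.shape)      # (1, 3, 500, 600)
print(restored.shape)  # (1, 500, 600, 3)
```

OpenCV's blob uses the channels-first layout, while the Keras classifier loaded earlier in this notebook is fed channels-last arrays.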

### Loading our object detector

In [17]:
net = load_detector(DNN='Caffe') 
net.setInput(blob) 
detections = net.forward()
detections.shape

(1, 1, 200, 7)

Calling **net.forward()** runs the blob through the object detection network and returns labels, probabilities, and bounding box coordinates.


The **detections** variable is a 4-D array, where
   1. The third dimension iterates over the detected faces.
   2. The fourth dimension contains information about the bounding box and score for each face.
   **For example:** `detections[0,0,0,2]` gives the confidence score for the first face, and `detections[0,0,0,3:7]` gives the bounding box.
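One way to keep the indices straight is to unpack each 7-element row into named fields. The sketch below is plain Python; the `Detection` type and `parse_detection` helper are hypothetical, and the reading of the first two values as image index and class label follows the usual SSD detection output layout, which is an assumption here. The sample row is the first detection printed in this notebook.

```python
from typing import NamedTuple, Sequence, Tuple


class Detection(NamedTuple):
    image_id: int      # assumed: index of the image within the batch
    class_id: int      # assumed: class label (1 for "face" in this detector)
    confidence: float  # index 2 of the row
    box: Tuple[float, ...]  # indices 3:7, normalized (startX, startY, endX, endY)


def parse_detection(row: Sequence[float]) -> Detection:
    """Split one 7-element detection row into named fields."""
    return Detection(
        int(row[0]),
        int(row[1]),
        float(row[2]),
        tuple(float(v) for v in row[3:7]),
    )


row = [0.0, 1.0, 0.99865437, 0.54847866, 0.12316754, 0.67100626, 0.35501212]
det = parse_detection(row)
print(det.confidence)  # 0.99865437
print(det.box)
```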



#### First face data

In [18]:
detections[0,0,0]

array([0.        , 1.        , 0.99865437, 0.54847866, 0.12316754,
       0.67100626, 0.35501212], dtype=float32)

#### First face probability of detection

In [19]:
detections[0,0,0,2]

0.99865437

#### First face Bounding box coordinates

In [21]:
detections[0,0,0,3:7]

array([0.54847866, 0.12316754, 0.67100626, 0.35501212], dtype=float32)

The output coordinates of the bounding box are normalized to the range [0, 1]. The x coordinates must therefore be multiplied by the width of the original image, and the y coordinates by its height, to get the correct bounding box on the image.

Now we specify a threshold to filter out weak detections and multiply each box by the original image's width and height.

In [22]:
threshold=0.5
for i in range(0, detections.shape[2]):
    box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
    print(box)

[329.08719778  61.58376858 402.60375738 177.50605941]
[2496.37870789 2000.04816055 2903.20243835 2492.21229553]
[524.63046312 193.2027936  537.14622259 210.96256375]
[  96.52560353 1997.42662907  512.41099834 2489.26997185]
[517.99521446 180.28657138 530.17830849 199.18651879]
[505.34330606 170.52203417 517.48627424 187.78061867]
[518.2597518  172.77945578 529.57760096 188.48751485]
[504.58052158 187.5872612  518.67799759 208.47991109]
[ 13.02573681  61.72707677 590.78292847 438.72758746]
[2411.81688309   63.64038587 2985.8622551   438.62167001]
[246.42219543 413.96456957 276.65587664 486.61243916]
[504.37996387 157.14120865 518.36743355 175.35701394]
[487.42647171 173.08168113 499.35772419 189.8663491 ]
[396.21874094  97.25946933 423.03797007 143.57316494]
[494.68964338 156.60759807 508.60630274 176.80442333]
[511.13369465 158.34522247 525.45261383 181.66083097]
[481.2608242  180.06406724 494.19021606 198.79747927]
[501.14843845 148.53717387 516.36164188 166.83028638]
[519.70460415 19

Now we convert to int and make sure that our bounding boxes fall within the dimensions of our frame.

In [23]:
threshold=0.5
for i in range(0, detections.shape[2]):
    box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
    (startX, startY, endX, endY) = box.astype("int")
    print((startX, startY, endX, endY))
    if (startX) < 0:
        startX = 0
    if (endX) > w-1:
        endX = w-1
    if (startY) < 0:
        startY = 0
    if (endY) > h-1:
        endY = h -1
    print((startX, startY, endX, endY))

(329, 61, 402, 177)
(329, 61, 402, 177)
(2496, 2000, 2903, 2492)
(2496, 2000, 599, 499)
(524, 193, 537, 210)
(524, 193, 537, 210)
(96, 1997, 512, 2489)
(96, 1997, 512, 499)
(517, 180, 530, 199)
(517, 180, 530, 199)
(505, 170, 517, 187)
(505, 170, 517, 187)
(518, 172, 529, 188)
(518, 172, 529, 188)
(504, 187, 518, 208)
(504, 187, 518, 208)
(13, 61, 590, 438)
(13, 61, 590, 438)
(2411, 63, 2985, 438)
(2411, 63, 599, 438)
(246, 413, 276, 486)
(246, 413, 276, 486)
(504, 157, 518, 175)
(504, 157, 518, 175)
(487, 173, 499, 189)
(487, 173, 499, 189)
(396, 97, 423, 143)
(396, 97, 423, 143)
(494, 156, 508, 176)
(494, 156, 508, 176)
(511, 158, 525, 181)
(511, 158, 525, 181)
(481, 180, 494, 198)
(481, 180, 494, 198)
(501, 148, 516, 166)
(501, 148, 516, 166)
(519, 199, 535, 224)
(519, 199, 535, 224)
(523, 174, 536, 191)
(523, 174, 536, 191)
(496, 195, 516, 229)
(496, 195, 516, 229)
(506, 204, 526, 230)
(506, 204, 526, 230)
(520, 162, 535, 182)
(520, 162, 535, 182)
(697, 2004, 1098, 2475)
(697, 2004

This can be simplified as follows:

In [24]:
threshold=0.5
for i in range(0, detections.shape[2]):
    box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
    (startX, startY, endX, endY) = box.astype("int")
    print((startX, startY, endX, endY))
    
    (startX, startY) = (max(0, startX), max(0, startY))
    (endX, endY) = (min(w - 1, endX), min(h - 1, endY))
    print((startX, startY, endX, endY))

(329, 61, 402, 177)
(329, 61, 402, 177)
(2496, 2000, 2903, 2492)
(2496, 2000, 599, 499)
(524, 193, 537, 210)
(524, 193, 537, 210)
(96, 1997, 512, 2489)
(96, 1997, 512, 499)
(517, 180, 530, 199)
(517, 180, 530, 199)
(505, 170, 517, 187)
(505, 170, 517, 187)
(518, 172, 529, 188)
(518, 172, 529, 188)
(504, 187, 518, 208)
(504, 187, 518, 208)
(13, 61, 590, 438)
(13, 61, 590, 438)
(2411, 63, 2985, 438)
(2411, 63, 599, 438)
(246, 413, 276, 486)
(246, 413, 276, 486)
(504, 157, 518, 175)
(504, 157, 518, 175)
(487, 173, 499, 189)
(487, 173, 499, 189)
(396, 97, 423, 143)
(396, 97, 423, 143)
(494, 156, 508, 176)
(494, 156, 508, 176)
(511, 158, 525, 181)
(511, 158, 525, 181)
(481, 180, 494, 198)
(481, 180, 494, 198)
(501, 148, 516, 166)
(501, 148, 516, 166)
(519, 199, 535, 224)
(519, 199, 535, 224)
(523, 174, 536, 191)
(523, 174, 536, 191)
(496, 195, 516, 229)
(496, 195, 516, 229)
(506, 204, 526, 230)
(506, 204, 526, 230)
(520, 162, 535, 182)
(520, 162, 535, 182)
(697, 2004, 1098, 2475)
(697, 2004
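
Clamping alone does not guarantee a usable box. In the output above, `(2496, 2000, 2903, 2492)` becomes `(2496, 2000, 599, 499)`, whose start lies beyond its end. Such rows normally have low confidence and are removed by the threshold, but a defensive check is cheap. A minimal sketch in plain Python; `clamp_box` is a hypothetical helper, not part of this notebook:

```python
def clamp_box(box, width, height):
    """Clamp (startX, startY, endX, endY) to the frame.

    Returns None when no area is left after clamping.
    """
    start_x, start_y, end_x, end_y = box
    start_x, start_y = max(0, start_x), max(0, start_y)
    end_x, end_y = min(width - 1, end_x), min(height - 1, end_y)
    if end_x <= start_x or end_y <= start_y:
        return None
    return (start_x, start_y, end_x, end_y)


print(clamp_box((329, 61, 402, 177), 600, 500))       # (329, 61, 402, 177)
print(clamp_box((2496, 2000, 2903, 2492), 600, 500))  # None
print(clamp_box((-5, 10, 50, 700), 600, 500))         # (0, 10, 50, 499)
```

Skipping boxes for which the helper returns `None` would avoid passing an empty crop to `cv2.resize`.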

In [53]:
image = cv2.imread(os.path.sep.join([r'testfiles/',"example_01.png"]))
(h, w) = image.shape[:2]
blob = cv2.dnn.blobFromImage(image, scalefactor=1.0, size=(300, 300), mean=[104, 117, 123])
net = load_detector(DNN='Caffe') 
net.setInput(blob) 
detections = net.forward()
threshold=0.5
for i in range(0, detections.shape[2]):
    confidence = detections[0, 0, i, 2] # extract the confidence (i.e., probability) associated with  the detection
    if confidence > threshold: #filter out weak detections.
        box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])#construct bounding box.
            
        (startX, startY, endX, endY) = box.astype("int") #compute the x-y coordinates.
        (startX, startY) = (max(0, startX), max(0, startY))#  ensure the bounding boxes fall within the dimensions of frame
        (endX, endY) = (min(w - 1, endX), min(h - 1, endY))
        face = image[startY:endY, startX:endX] # extract the face ROI
        face = cv2.cvtColor(face, cv2.COLOR_BGR2RGB) #convert BGR to RGB
        face = cv2.resize(face, (224, 224)) #resize to input of our cnn.
        face = preprocess_input(img_to_array(face)) #pre-process 
        face = np.expand_dims(face, axis=0)    
        (withoutMask, mask) = model.predict(face)[0]
            
        if mask > withoutMask:  # colors below are (B, G, R) tuples
            label = "Mask"
            color = (0, 255, 0)
        else:
            label = "No Mask"
            color = (0, 0, 255)
            
        # draw rectangles and text   
        label = "{}: {:.2f}%".format(label, max(mask, withoutMask) * 100)
        cv2.putText(image, label, (startX, startY - 10),cv2.FONT_HERSHEY_SIMPLEX, 0.45, color, 2)
        cv2.rectangle(image, (startX, startY), (endX, endY), color, 2)    
cv2.imshow("Output", image) # Show image
cv2.waitKey(0)    # waitKey(0) keeps the window open until any key is pressed (suitable for still images)
cv2.destroyAllWindows()    

#### Putting this in a function

In [54]:
def detect_image_mask(threshold,image,net):
    (h, w) = image.shape[:2] #get the height and width of image
    
    blob = cv2.dnn.blobFromImage(image, scalefactor=1.0, size=(300, 300), mean=[104, 117, 123]) #input blob
    net.setInput(blob) #input blob to your net
    detections = net.forward()

    
    for i in range(0, detections.shape[2]):
        confidence = detections[0, 0, i, 2] # extract the confidence (i.e., probability) associated with  the detection
        if confidence > threshold: #filter out weak detections.
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])#construct bounding box.
            
            (startX, startY, endX, endY) = box.astype("int") #compute the x-y coordinates.
            (startX, startY) = (max(0, startX), max(0, startY))#  ensure the bounding boxes fall within the dimensions of frame
            (endX, endY) = (min(w - 1, endX), min(h - 1, endY))
            face = image[startY:endY, startX:endX] # extract the face ROI
            face = cv2.cvtColor(face, cv2.COLOR_BGR2RGB) #convert BGR to RGB
            face = cv2.resize(face, (224, 224)) #resize to input of our cnn.
            face = preprocess_input(img_to_array(face)) #pre-process 
            face = np.expand_dims(face, axis=0)
            
            (withoutMask, mask) = model.predict(face)[0]
            
            if mask > withoutMask:  # colors below are (B, G, R) tuples
                label = "Mask"
                color = (0, 255, 0)
            else:
                label = "No Mask"
                color = (0, 0, 255)
            
            # draw rectangles and text   
            label = "{}: {:.2f}%".format(label, max(mask, withoutMask) * 100)
            cv2.putText(image, label, (startX, startY - 10),cv2.FONT_HERSHEY_SIMPLEX, 0.45, color, 2)
            cv2.rectangle(image, (startX, startY), (endX, endY), color, 2)
    
    cv2.imshow("Output", image) # Show image
    cv2.waitKey(0)    # Display the image infinitely until any keypress  

    
        
    

#### Test Images

In [55]:
threshold=0.2
net = load_detector(DNN='Caffe') # load the Caffe detector once, outside the loop
for i in os.listdir('testfiles'):
    image = cv2.imread(os.path.join('testfiles',i))
    detect_image_mask(threshold,image, net)
cv2.destroyAllWindows()    

## Real Time Detection

Now we perform real-time face detection. Our code is exactly the same except for a few minor changes:

   1. We detect faces in frames from a video stream.
   2. We store all detected faces and their locations in these frames in lists.
   3. If any faces are detected, we run our model trained earlier to give predictions.
   4. We display these predictions and bounding boxes on these frames.
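Steps 2 and 3 amount to stacking the face crops into one batch and pairing each prediction with its location. A rough sketch with NumPy, in which `fake_classifier` is a hypothetical stand-in for `model.predict` and the zero-filled faces and box coordinates are made-up placeholder data:

```python
import numpy as np


def fake_classifier(batch):
    """Hypothetical stand-in for model.predict: one (withoutMask, mask) row per face."""
    n = batch.shape[0]
    return np.tile(np.array([[0.1, 0.9]], dtype="float32"), (n, 1))


# Placeholder data: three pre-processed face crops and their box locations.
faces = [np.zeros((224, 224, 3), dtype="float32") for _ in range(3)]
locs = [(10, 10, 50, 50), (60, 10, 100, 50), (110, 10, 150, 50)]

results = []
if len(faces) > 0:
    batch = np.array(faces, dtype="float32")   # shape (3, 224, 224, 3)
    predictions = fake_classifier(batch)       # shape (3, 2)
    for bbox, (without_mask, mask) in zip(locs, predictions):
        label = "Mask" if mask > without_mask else "No Mask"
        results.append((bbox, label))

print(results[0])  # ((10, 10, 50, 50), 'Mask')
```

Predicting once per frame on the whole batch, rather than once per face, is what keeps the per-frame cost low when several faces are visible.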


In [57]:
from imutils.video import VideoStream
import time

threshold=0.2
net = load_detector(DNN='Caffe')

# initialize the video stream and allow the camera sensor to warm up
vs = VideoStream(src=0).start() # open the default camera
time.sleep(2.0)
while True:
    videoframe = vs.read()
    key=detect_video_mask(threshold,videoframe,net) # defined in the cell below; run that cell first
    if key == ord("q"): # ord returns the integer code of 'q'; key was already masked with 0xFF inside detect_video_mask
        break
cv2.destroyAllWindows()
vs.stop()
vs.stream.release()


In [56]:
def detect_video_mask(threshold,videoframe,net):
    (h, w) = videoframe.shape[:2] #get the height and width of videoframe
    
    blob = cv2.dnn.blobFromImage(videoframe, scalefactor=1.0, size=(300, 300), mean=[104, 117, 123]) #input blob
    net.setInput(blob) #input blob to your net
    detections = net.forward()
    
    # initialize our list of faces, their corresponding locations.
    faces = []
    locs = []
    

    
    for i in range(0, detections.shape[2]):
        confidence = detections[0, 0, i, 2] # extract the confidence (i.e., probability) associated with  the detection
        if confidence > threshold: #filter out weak detections.
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])#construct bounding box.
            
            (startX, startY, endX, endY) = box.astype("int") #compute the x-y coordinates.
            (startX, startY) = (max(0, startX), max(0, startY))#  ensure the bounding boxes fall within the dimensions of frame
            (endX, endY) = (min(w - 1, endX), min(h - 1, endY))
            face = videoframe[startY:endY, startX:endX] # extract the face ROI
            face = cv2.cvtColor(face, cv2.COLOR_BGR2RGB) #convert BGR to RGB
            face = cv2.resize(face, (224, 224)) #resize to input of our cnn.
            face = preprocess_input(img_to_array(face)) #pre-process 
            
            faces.append(face) #store the face images and their locations.
            locs.append((startX, startY, endX, endY))
            
            
            
    predictions=[]  # initialize predictions.
    if len(faces) > 0: #if you detected at least one face.
        faces = np.array(faces, dtype="float32")
        predictions =  model.predict(faces, batch_size=32) #predictions in batches to speed up flow.
        for (bbox,pred) in zip(locs,predictions):
            (startX, startY, endX, endY)=bbox
            (withoutMask, mask)=pred
            if mask > withoutMask:  # colors below are (B, G, R) tuples
                label = "Mask"
                color = (0, 255, 0)
            else:
                label = "No Mask"
                color = (0, 0, 255)
           
            # draw rectangles and text
            label = "{}: {:.2f}%".format(label, max(mask, withoutMask) * 100)
            cv2.putText(videoframe, label, (startX, startY - 10),cv2.FONT_HERSHEY_SIMPLEX, 0.45, color, 2)
            cv2.rectangle(videoframe, (startX, startY), (endX, endY), color, 2)
    cv2.imshow("Output", videoframe) # Show image
    key=cv2.waitKey(1) & 0xFF # waitKey(1) shows the frame and waits 1 ms for a key press before the loop moves on
    return key
    



## cv2.waitKey()

1. **cv2.waitKey()** returns the code of the key that was pressed, or -1 if no key was pressed before the delay elapsed.

2. When you press **'q'**, cv2.waitKey() returns an integer key code, not a string. On some platforms that integer can carry extra bits above the lowest byte, so we perform a bitwise AND operation (&) with **0xFF** to keep only the lowest 8 bits. **0xFF** is a hexadecimal constant, which is **255** in decimal or **11111111** in binary.

**Note it is the same value in different formats**

3. **'&'** in Python is used to perform the **bitwise AND operation.**
 
### AND operation logic:
1. 0&0=0
2. 0&1=0
3. 1&0=0
4. 1&1=1



| **Letter** | **ASCII Code** |  **Binary**  |
|:------:|:----------:|:--------:|
|    q   |     113    | 01110001 |

4. Since the hexadecimal constant **0xFF** is 11111111 in binary, let's perform the bitwise AND operation with the binary value of the letter **'q'**, which is 01110001.

           q = 01110001
        0xFF = 11111111
               --------
               01110001   -----> the bitwise AND leaves the value of q unchanged
               --------

5. The result of the bitwise operation is an integer, and ord('q') also returns the integer (ASCII) value of 'q'. The two are therefore equal, the if condition becomes true, and the loop breaks.
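
The masking step can be checked in plain Python without opening a window. In this minimal sketch, `is_key` is a hypothetical helper, and the value with an extra high bit set is made up to imitate a platform that adds flag bits to the key code.

```python
def is_key(raw_code: int, char: str) -> bool:
    """Compare only the lowest 8 bits of a waitKey-style return value with a character."""
    return (raw_code & 0xFF) == ord(char)


print(ord("q"), format(ord("q"), "08b"))  # 113 01110001
print(is_key(113, "q"))                   # True
print(is_key(0x100000 | 113, "q"))        # True: the higher bits are masked off
print(is_key(-1, "q"))                    # False: -1 & 0xFF is 255
```

The last line also shows why the masked "no key pressed" value never collides with `'q'`.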

## Improvements

Our current method of detecting whether a person is wearing a mask or not is a two-step process:

1. Perform face detection
2. Apply our face mask detector to each face

A face mask obscures part of the face. If enough of the face is obscured, the face cannot be detected, and therefore the face mask detector will not be applied.

We could also train a two-class object detector with mask and no-mask classes. Such a detector could detect people wearing masks more effectively, and our computational pipeline would become a single step that avoids the separate face detection stage.