# Face Detection with OpenCV

In this first example, we'll learn how to apply OpenCV face detection to a video.

First, we import the required libraries:

In [8]:
import os
import cv2
import argparse
import numpy as np
import matplotlib.pyplot as plt

Define parameters:

In [9]:
# DNN selects which pre-trained model OpenCV's dnn (Deep Neural Networks) module loads
DNN = "TF" # or "CAFFE"; the loading code below handles these two frameworks
min_confidence = 0.5 # minimum probability to filter weak detections

---

The pre-trained model files listed below can be downloaded from the Internet, or created and trained manually.

For Caffe:

* res10_300x300_ssd_iter_140000_fp16.caffemodel

* deploy.prototxt

For TensorFlow:

* opencv_face_detector_uint8.pb

* opencv_face_detector.pbtxt
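
Before loading a model, it can help to confirm that the files above are actually present. The following is a minimal sketch, not part of the original notebook: the helper name `missing_model_files` and its `model_dir` parameter are hypothetical, and it assumes the files keep the names listed above.

```python
import os

# File names taken from the lists above; the dictionary itself is illustrative.
EXPECTED_FILES = {
    "CAFFE": ["res10_300x300_ssd_iter_140000_fp16.caffemodel", "deploy.prototxt"],
    "TF": ["opencv_face_detector_uint8.pb", "opencv_face_detector.pbtxt"],
}

def missing_model_files(dnn, model_dir="."):
    """Return the expected files for `dnn` that are not found in `model_dir`.

    Hypothetical helper: `dnn` must be "CAFFE" or "TF".
    """
    return [
        name
        for name in EXPECTED_FILES[dnn]
        if not os.path.isfile(os.path.join(model_dir, name))
    ]

# An empty list means every file for the chosen framework was found.
print(missing_model_files("TF"))
```

If the list is not empty, loading the network would fail, so checking first gives a clearer error message.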

In [10]:
# load our serialized model from disk
print("[INFO] loading model...")

if DNN == "CAFFE":
    modelFile = "res10_300x300_ssd_iter_140000_fp16.caffemodel"
    configFile= "deploy.prototxt"
    
    # Here we need to read our pre-trained neural net created using Caffe
    net = cv2.dnn.readNetFromCaffe(configFile, modelFile)
else:
    modelFile = "opencv_face_detector_uint8.pb"
    configFile= "opencv_face_detector.pbtxt"
    
    # Here we need to read our pre-trained neural net created using Tensorflow
    net = cv2.dnn.readNetFromTensorflow(modelFile, configFile)
    
print("[INFO] model loaded.")

[INFO] loading model...
[INFO] model loaded.


---
**cv2.dnn.blobFromImage**

This function performs:

* Mean subtraction
* Scaling
* Channel swapping (optionally)

**Mean subtraction** is used to help combat illumination changes in the input images in our dataset.

Before we even begin training our deep neural network, we first compute the average pixel intensity across all images in the training set for each of the Red, Green, and Blue channels.

This implies that we end up with three variables:

$\mu_R$, $\mu_G$, and $\mu_B$

Typically the resulting values are a 3-tuple consisting of the mean of the Red, Green, and Blue channels, respectively.

When we are ready to pass an image through our network (whether for training or testing), we subtract the mean, $\mu$, from each input channel of the input image:

$R = R - \mu_R$

$G = G - \mu_G$

$B = B - \mu_B$

We may also have a scaling factor, $\sigma$. The value of $\sigma$ may be the standard deviation across the training set, which normalizes the inputs:

$R = (R - \mu_R) / \sigma$

$G = (G - \mu_G) / \sigma$

$B = (B - \mu_B) / \sigma$
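
The formulas above can be sketched in NumPy as follows. This is an illustration only: the function name `normalize_rgb`, the tiny test image, and the mean and $\sigma$ values are made up for the example and are not taken from any real training set.

```python
import numpy as np

def normalize_rgb(image, mean, sigma=1.0):
    """Subtract a per-channel mean and divide by sigma.

    `image` is an (H, W, 3) array in R, G, B order and `mean` is
    (mu_R, mu_G, mu_B). Broadcasting applies the mean to every pixel.
    """
    image = image.astype(np.float32)
    mean = np.asarray(mean, dtype=np.float32)
    return (image - mean) / sigma

# A 2x2 image where every channel of every pixel is 128 (illustrative values)
image = np.full((2, 2, 3), 128, dtype=np.uint8)
normalized = normalize_rgb(image, mean=(123.0, 117.0, 104.0), sigma=2.0)

print(normalized[0, 0])  # per-channel result for the first pixel
```

For the first pixel this computes (128 - 123) / 2, (128 - 117) / 2, and (128 - 104) / 2 for the R, G, and B channels.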

Typical call:

    blob = cv2.dnn.blobFromImage(image, scalefactor=1.0, size=size, mean=mean, swapRB=True)
    
Where:
* scalefactor - an optional factor by which pixel values are multiplied after mean subtraction. This value defaults to 1.0 (no scaling)
* size - spatial size (width, height) that the Convolutional Neural Network expects
* mean - our mean subtraction values
* swapRB - OpenCV assumes images are in BGR channel order; however, the mean value assumes we are using RGB order. To resolve this discrepancy we can swap the R and B channels in image by setting this value to True (the function's own default is False).
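
To make the effect of these parameters concrete, here is a rough NumPy emulation of the blob layout. It is a sketch under stated assumptions, not OpenCV's implementation: the name `blob_from_image_sketch` is hypothetical, resizing and cropping are omitted, and the input is assumed to already have the target size. It shows the optional channel swap, the mean subtraction followed by scaling, and the reordering into a batch of shape (1, channels, height, width).

```python
import numpy as np

def blob_from_image_sketch(image_bgr, scalefactor=1.0, mean=(0.0, 0.0, 0.0), swap_rb=False):
    """Approximate the preprocessing steps on an (H, W, 3) image.

    Hypothetical helper for illustration; `mean` is given in the channel
    order of the output (after the optional swap).
    """
    img = image_bgr.astype(np.float32)
    if swap_rb:
        img = img[:, :, ::-1]  # BGR -> RGB
    img = (img - np.asarray(mean, dtype=np.float32)) * scalefactor
    # (H, W, C) -> (1, C, H, W): add a batch axis and put channels first
    return img.transpose(2, 0, 1)[np.newaxis, ...]

# Synthetic 300x300 frame whose blue channel is 200 and whose other channels are 0
frame = np.zeros((300, 300, 3), dtype=np.uint8)
frame[:, :, 0] = 200

blob = blob_from_image_sketch(frame, 1.0, (104.0, 177.0, 123.0))
print(blob.shape)  # batch, channels, height, width
```

With the default `swap_rb=False`, the first output channel is still blue, so the first mean value (104.0) is subtracted from it.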

---
**cv2::dnn::Net Class Reference**

This class allows you to create and manipulate artificial neural networks.

A neural network is represented as a directed acyclic graph (DAG), where vertices are Layer instances and edges specify relationships between layer inputs and outputs.

Each network layer has a unique integer id and a unique string name inside its network. LayerId can store either a layer name or a layer id.

This class supports reference counting of its instances, i.e. copies point to the same instance.

In [11]:
# load the input video and, for every frame, resize it to a fixed
# 600x400 pixels and construct a 300x300 input blob from it

cap = cv2.VideoCapture("babies-video2.mp4")

while True:
    # Capture frame-by-frame
    ret, frame = cap.read()

    # stop when the video ends or a frame cannot be read
    if not ret:
        break

    # resize the frame for display; the detector expects a color (BGR)
    # image, so no grayscale conversion is applied
    frame1 = cv2.resize(frame, (600, 400))

    blob = cv2.dnn.blobFromImage(cv2.resize(frame1, (300, 300)), 1.0, (300, 300), (104.0, 177.0, 123.0))
    net.setInput(blob)
    detections = net.forward()
    
    (h, w) = frame1.shape[:2]
    # loop over the detections
    for i in range(0, detections.shape[2]):
        # extract the confidence (probability) associated with the prediction
        confidence = detections[0, 0, i, 2]

        # filter out weak detections by ensuring the `confidence` is
        # greater than the minimum confidence
        if confidence > min_confidence:
            # compute the (x, y)-coordinates of the bounding box for the
            # object
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            (startX, startY, endX, endY) = box.astype("int")

            # draw the bounding box of the face
            cv2.rectangle(frame1, (startX, startY), (endX, endY),(0, 69, 255), 2)

    # show the output frame
    cv2.imshow("Frame", frame1)
    key = cv2.waitKey(1) & 0xFF
 
    # if the `q` key was pressed, break from the loop
    if key == ord("q"):
        break
        
# do a bit of cleanup: release the video file and close the window
cap.release()
cv2.destroyAllWindows()
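
The box-decoding step inside the loop can be tried without a video or a model. The sketch below assumes the output layout used in the loop above: an array of shape (1, 1, N, 7) in which index 2 of each row is the confidence and indices 3 to 6 are box corners normalized to the range 0 to 1. The function name `decode_detections` and the synthetic `fake` array are hypothetical, and the clipping to the frame bounds is an addition that the original loop does not perform.

```python
import numpy as np

def decode_detections(detections, frame_width, frame_height, min_confidence=0.5):
    """Turn a (1, 1, N, 7) detections array into (confidence, box) pairs.

    Hypothetical helper: boxes are returned as pixel coordinates
    (start_x, start_y, end_x, end_y), clipped to the frame.
    """
    scale = np.array([frame_width, frame_height, frame_width, frame_height])
    limit = scale - 1
    results = []
    for i in range(detections.shape[2]):
        confidence = float(detections[0, 0, i, 2])
        # skip weak detections
        if confidence <= min_confidence:
            continue
        box = detections[0, 0, i, 3:7] * scale
        box = np.clip(box, 0, limit).astype("int")
        results.append((confidence, tuple(int(v) for v in box)))
    return results

# Two synthetic detections: one confident, one below the threshold
fake = np.zeros((1, 1, 2, 7), dtype=np.float32)
fake[0, 0, 0] = [0, 1, 0.9, 0.25, 0.25, 0.5, 0.75]
fake[0, 0, 1] = [0, 1, 0.2, 0.1, 0.1, 0.2, 0.2]

boxes = decode_detections(fake, frame_width=600, frame_height=400)
print(boxes)
```

Only the first detection passes the 0.5 threshold; its normalized corners are scaled by the 600x400 frame size to give pixel coordinates.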