# Face recognition

Face recognition using computer vision, OpenCV, and Python.

This is performed in two steps. In step 1, we generate 128-d facial embeddings of known faces. In step 2, we compare the embeddings of new faces with the stored ones to identify them.

These two steps should be run as separate scripts. Step 1 (generating and storing the 128-d facial embeddings) is a one-time process, whereas step 2 (comparing a new face's embedding with the stored ones) runs much more often.



## STEP 1: Generate 128-d facial embeddings for known faces

In [6]:
# import required libraries
import dlib
import face_recognition
import pickle
import cv2
import os
import time
from imutils import paths


In [7]:
# define input arguments
# path to input directory of faces + images
arg_dataset = ".\\DATA\\face_recognition\\dataset\\"
# path to serialized db of facial encoding
arg_encoding = ".\\DATA\\face_recognition\\encodings.pickle"
# face detection model to use : either hog or cnn
# CNN method is more accurate but slower. 
# HOG method is faster but less accurate
arg_detection_method = "cnn"
# better to use the HOG method when running without a GPU (i.e. on CPU),
# because CNN takes much longer.
# If running on a Raspberry Pi, use HOG because it won't have enough memory to run the CNN.


In [8]:
# grab the paths to the input images in dataset
print("[INFO] quantifying faces...")
imagePaths = list(paths.list_images(arg_dataset))

# initialize the list of known face encodings and known names
knownEncodings = []
# list of the corresponding known name for each known face encoding
knownNames = []


[INFO] quantifying faces...


One important thing to note here is that OpenCV orders color channels as BGR, whereas dlib expects RGB. The face_recognition module uses dlib, so before we proceed we need to convert the color space.
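
To make the channel swap concrete, here is a minimal sketch that uses plain nested lists in place of the NumPy array that cv2.imread returns. The helper name bgr_to_rgb is hypothetical; in the notebook itself, cv2.cvtColor(image, cv2.COLOR_BGR2RGB) performs the same reordering.

```python
def bgr_to_rgb(image):
    """Reverse the channel order of every pixel in a rows x cols x 3 nested list."""
    return [[pixel[::-1] for pixel in row] for row in image]


# a 1x2 "image": one pure-blue and one pure-red pixel, written in BGR order
bgr_image = [[[255, 0, 0], [0, 0, 255]]]
rgb_image = bgr_to_rgb(bgr_image)
print(rgb_image)
```

If the swap is skipped, blue and red are exchanged in everything dlib sees, which can hurt detection and encoding quality.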

On each iteration of the loop, we detect the face (or possibly multiple faces) in the image and compute a 128-d encoding for each one.



In [9]:
# capture start time
start = time.time()

# loop over the image paths 
for (i, imagePath) in enumerate(imagePaths):
    # extract the person name from the image path
    #print("[INFO] processing image {}/{}".format(i+1, len(imagePaths)))
    #print("[INFO] imagePath:",imagePath)
    name = imagePath.split(os.path.sep)[-2]
    #print("[INFO] name:", name) 
    # load the input image and convert it from BGR (OpenCV ordering)
    # to dlib ordering (RGB) 
    image = cv2.imread(imagePath)
    rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    
    # NOTE: this is a time-consuming step
    # detect the (x,y)-coordinates of the bounding boxes
    # corresponding to each face in the input image
    # CNN is more accurate but slower
    # HOG is faster but less accurate
    boxes = face_recognition.face_locations(rgb, model = arg_detection_method)
    #print("[INFO] boxes: ",boxes) 
    
    # compute the facial embedding for the face
    # Facial encoding is a 128 number array
    # this is known as encoding the face into a vector 
    encodings = face_recognition.face_encodings(rgb, boxes)
    #print("[INFO] encodings: ",encodings) 
    # loop over the encodings and keep them in the lists
    for encoding in encodings:
        # add each encoding + name to our set of known names and encodings
        knownEncodings.append(encoding) 
        knownNames.append(name) 

print("[INFO] DONE...")
# capture end time
end = time.time()
print("[INFO]: Generating face encodings took {:.5} seconds".format(end - start))

# Construct a dictionary with encodings and names and store them as pickle file
# store the facial encodings and names to disk
print("[INFO] serializing encodings...")
data = {"encodings":knownEncodings, "names":knownNames}
# the with-block closes the file even if writing fails
with open(arg_encoding, "wb") as f:
    f.write(pickle.dumps(data))


[INFO] DONE...
[INFO]: Generating face encodings took 85.449 seconds
[INFO] serializing encodings...
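
The loop above takes each person's name from the parent directory of the image, so it assumes a dataset layout with one sub-folder per person. Below is a minimal sketch of that extraction; the example path and the helper name person_from_path are made up for illustration.

```python
import os


def person_from_path(image_path):
    """Return the name of the directory that directly contains the image."""
    return os.path.basename(os.path.dirname(image_path))


# hypothetical path: dataset/<person name>/<image file>
example = os.path.join("dataset", "alan_grant", "00000001.jpg")
print(person_from_path(example))
```

This gives the same result as imagePath.split(os.path.sep)[-2] for paths of that shape.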


## STEP 2: For new faces, check embedding with stored one

Now that we have created the 128-d face embeddings for each image in our dataset, we are ready to recognize faces in an image using OpenCV, Python and deep learning.

An important thing to note is that the CNN detector is more accurate but slower, whereas HOG is faster but less accurate. On a CPU, use HOG; on a GPU, use CNN. If a GPU is not available, another workaround is to generate the known-face encodings with CNN (which takes time, but only once) and run recognition on new images with HOG, which is fast.
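
That rule of thumb can be written down as a small helper. This is only a sketch: the function name choose_detection_methods is hypothetical, and how you determine has_gpu is left to your own setup.

```python
def choose_detection_methods(has_gpu):
    """Return (method for building encodings, method for recognizing new images)."""
    if has_gpu:
        # a GPU makes the more accurate CNN detector practical for both steps
        return ("cnn", "cnn")
    # no GPU: accept a slow one-time encoding pass, keep recognition fast
    return ("cnn", "hog")


encode_method, recognize_method = choose_detection_methods(has_gpu=False)
print(encode_method, recognize_method)
```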


In [4]:
import face_recognition
import pickle
import cv2
import time


In [36]:
# define input arguments

# path to serialized db of facial encoding
arg_encoding = ".\\DATA\\face_recognition\\encodings.pickle"
# path to input image
arg_image = ".\\DATA\\face_recognition\\examples\\example_03.png"
# face detection model to use : either hog or cnn
#arg_detection_method = "cnn"
arg_detection_method = "hog"


Load the pickled encodings of the known faces. Any new face is checked against these encodings to find out whose face it is.

For a new face image, we compute 128-d encodings and initialize a list of names for each face that is detected. 
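
As a minimal sketch of the save and load round trip, the example below writes a toy dictionary to a temporary file and reads it back. The 3-d vectors and the temporary path are stand-ins chosen for illustration; the real file holds 128-d encodings.

```python
import os
import pickle
import tempfile

# toy stand-in for the real data: two 3-d vectors instead of 128-d encodings
data = {
    "encodings": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    "names": ["alan_grant", "ian_malcolm"],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "encodings.pickle")
    # step 1 side: serialize the dictionary to disk
    with open(path, "wb") as f:
        pickle.dump(data, f)
    # step 2 side: load it back before recognizing new faces
    with open(path, "rb") as f:
        loaded = pickle.load(f)

print(loaded["names"])
```

The two lists are parallel: names[i] is the label for encodings[i], which is what the voting step later relies on.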

In [37]:
# load the known faces and embeddings 
print("[INFO] loading encodings...")
with open(arg_encoding, "rb") as f:
    data = pickle.loads(f.read())

# load the input image and convert it from BGR to RGB
image = cv2.imread(arg_image)
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# capture start time
start = time.time()
# detect the (x,y)-coordinates of the bounding box corresponding
# to each face in the input image, then compute the facial embeddings
# for each face
print("[INFO] recognizing faces...")
boxes = face_recognition.face_locations(rgb, model = arg_detection_method)
encodings = face_recognition.face_encodings(rgb, boxes)
# capture end time
end = time.time()
print("[INFO]: Generating face encodings took {:.5} seconds".format(end - start))


[INFO] loading encodings...
[INFO] recognizing faces...
[INFO]: Generating face encodings took 1.5197 seconds


Now we will attempt to match each face in the input image to our known encodings dataset. The function face_recognition.compare_faces returns a list of True/False values, one for each stored encoding. In our Jurassic Park example there are 218 images in the dataset, so the returned list will have 218 boolean values.

Internally, the compare_faces function computes the Euclidean distance between the candidate embedding and every face in our dataset.
- If the distance is below some tolerance (the smaller the tolerance, the stricter our facial recognition system will be), the function returns True, indicating that the faces match
- Otherwise, if the distance is above the tolerance threshold, the function returns False, as the faces do not match
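
The distance test described above can be sketched in plain Python. This is an illustration, not the library's own code: compare_faces_sketch is a hypothetical name, the 3-d vectors stand in for 128-d encodings, and the 0.6 default is an assumption about the library's default tolerance.

```python
import math


def compare_faces_sketch(known_encodings, candidate, tolerance=0.6):
    """Return one boolean per known encoding: True if it lies within the tolerance."""
    return [math.dist(known, candidate) <= tolerance for known in known_encodings]


known = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
candidate = [0.1, 0.2, 0.2]

# distance to the first vector is 0.3, to the second about 1.45
matches = compare_faces_sketch(known, candidate)
print(matches)
```

Lowering the tolerance makes the check stricter: with tolerance=0.2, the first vector would no longer match either.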

Essentially, we are using a "fancier" k-NN model for classification. The name variable will eventually hold the name string of the person.

From the matches list, we can compute the number of "votes" for each name (the number of True values associated with each name), tally up the votes and select the person's name with the most votes.
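
The tally can also be written with collections.Counter. The sketch below uses a hypothetical helper named vote and a made-up four-entry dataset; it is an alternative formulation of the same idea, not the code used in the cell that follows.

```python
from collections import Counter


def vote(matches, known_names):
    """Return the name with the most True matches, or "Unknown" if nothing matched."""
    votes = Counter(name for name, matched in zip(known_names, matches) if matched)
    if not votes:
        return "Unknown"
    return votes.most_common(1)[0][0]


known_names = ["ellie_sattler", "ellie_sattler", "john_hammond", "alan_grant"]
matches = [True, True, True, False]
print(vote(matches, known_names))
```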


In [38]:
# initialize the list of names for each face detected
names = []

# loop over the facial embeddings
for encoding in encodings:
    # attempt to match each face in the input image to our known encoding 
    matches = face_recognition.compare_faces(data["encodings"], encoding)
    #print("[INFO] matches: ",matches)
    name = "Unknown" 
    # check to see if we have found a match
    if True in matches:
        # find the indexes of all matched faces then initialize a 
        # dictionary to count the total number of times each 
        # face was matched
        matchedIdxs = [i for (i,x) in enumerate(matches) if x]
        counts = {}
        print("[INFO] matchedIdxs:",matchedIdxs)
        # loop over the matched indexes and 
        # maintain a count for each recognized face
        for i in matchedIdxs:
            name = data["names"][i] 
            counts[name] = counts.get(name, 0) + 1
        
        # determine the recognized face with the largest number
        # of votes (note: in the event of an unlikely tie, Python
        # will select first entry in the dictionary)
        print("[INFO] counts:",counts)
        name = max(counts, key=counts.get)
        #print("[INFO] name:",name)
    
    # update the list of names
    names.append(name)


[INFO] matchedIdxs: [35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 71, 72, 73, 74, 75]
[INFO] counts: {'ian_malcolm': 40}
[INFO] matchedIdxs: [4, 24, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 99, 100, 101, 102, 103, 104, 105, 106, 107, 112, 139, 150, 159]
[INFO] counts: {'john_hammond': 2, 'ellie_sattler': 30, 'claire_dearing': 4}
[INFO] matchedIdxs: [162, 164, 167, 168, 169, 170, 171, 172, 174, 175, 176, 177, 178, 180, 181]
[INFO] counts: {'alan_grant': 15}


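For the first face, we got 40 votes for Ian Malcolm, which is a really good score given that we have only 41 pictures of Ian.

For the second face, the votes are split: 30 for Ellie Sattler, 4 for Claire Dearing and 2 for John Hammond, so Ellie Sattler wins the vote.

For the third face, we got only 15 votes for Alan Grant. Still, there is only one name in the dictionary, so we have likely found Alan.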


Draw a bounding box and the predicted name for each person on the output image for visualization purposes.

In [39]:
# loop over the recognized faces
for ((top, right, bottom, left),name) in zip(boxes, names):
    # draw the predicted face name on the image
    cv2.rectangle(image, (left,top),(right,bottom), (0,255,0),2)
    y = top-15 if top-15 > 15 else top + 15
    cv2.putText(image, name, (left,y), cv2.FONT_HERSHEY_SIMPLEX,
               0.75, (0,255,0),2)

# show the output image
cv2.imshow("Recognized faces",image)
cv2.waitKey(0)
cv2.destroyAllWindows()
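
The drawing code above relies on two details: face_recognition returns boxes in (top, right, bottom, left) order, while OpenCV drawing calls take (x, y) points, and the label is moved inside the box when the face is too close to the top edge. A minimal sketch of both, with hypothetical helper names and made-up coordinates:

```python
def box_to_opencv(box):
    """Convert (top, right, bottom, left) to top-left and bottom-right (x, y) points."""
    top, right, bottom, left = box
    return (left, top), (right, bottom)


def label_origin(box, margin=15):
    """Put the label above the box, or just inside it when the box is near the top edge."""
    top, _right, _bottom, left = box
    y = top - margin if top - margin > margin else top + margin
    return (left, y)


box = (100, 300, 250, 150)
print(box_to_opencv(box))
print(label_origin(box))
print(label_origin((10, 300, 250, 150)))
```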

# Recognizing faces in videos

#### Important Performance Note: 
The CNN face detector should only be used in real time if you are working with a GPU (you can use it with a CPU, but expect less than 0.5 FPS, which makes for choppy video). If you are using a CPU, use the HOG method instead (or even OpenCV Haar cascades) and expect adequate speeds.

If a GPU is not available, another workaround is to generate the face encodings of the known faces with CNN (which will obviously take time) and then run face recognition on new frames with HOG, which is fast.


In [1]:
from imutils.video import VideoStream
import face_recognition
import imutils
import pickle
import time
import cv2


In [6]:
# parse input argument
# path to serialized db of facial encodings
arg_encodings = ".\\DATA\\face_recognition\\encodings.pickle"
# path to input video
arg_input_video = ".\\DATA\\face_recognition\\videos\\lunch_scene.mp4"
# path to output video
arg_output = ".\\DATA\\face_recognition\\output\\input_face_output.avi"
#arg_output = ".\\DATA\\face_recognition\\output\\webcam_face_output.avi"
# Whether or not to display output frame to screen
arg_display = 1
# face detection model to use : either hog or cnn
#arg_detection_method = "cnn"
arg_detection_method = "hog"


We use VideoStream to access the camera. If you have multiple cameras on your system (such as a built-in webcam and an external USB camera), you can change src=0 to src=1 and so forth. When arg_input_video is set, frames are read from that file with cv2.VideoCapture instead.

We will optionally write the processed video frames to disk later, so we initialize writer to None. Sleeping for two full seconds allows the camera sensor to warm up.
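
The two kinds of source do not share an interface: VideoStream.read() returns the frame itself and the stream is closed with stop(), whereas cv2.VideoCapture.read() returns an (ok, frame) pair and the capture is closed with release(). One way to hide that difference is a small adapter. The class below is a hypothetical sketch, not part of imutils or OpenCV.

```python
class FrameSource:
    """Hypothetical adapter that gives both kinds of source the same read()/close() calls."""

    def __init__(self, source, is_file):
        self.source = source      # a cv2.VideoCapture or an imutils VideoStream
        self.is_file = is_file    # True when source is a cv2.VideoCapture

    def read(self):
        """Return the next frame, or None when no frame is available."""
        if self.is_file:
            ok, frame = self.source.read()
            return frame if ok else None
        return self.source.read()

    def close(self):
        """Release the underlying source with the call it actually supports."""
        if self.is_file:
            self.source.release()
        else:
            self.source.stop()


# intended usage (not executed here, since it needs OpenCV and a video file):
# source = FrameSource(cv2.VideoCapture(arg_input_video), is_file=True)
```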




In [8]:
# load the known faces and embeddings
print("[INFO] loading encodings...")
with open(arg_encodings, "rb") as f:
    data = pickle.loads(f.read())

# initialize the video stream and pointer to output video files,
# then allow the camera sensor to warm up

if not arg_input_video:
    print("[INFO] starting Webcam stream...")
    vs = VideoStream(src=0).start()
else:
    print("[INFO] starting Input video stream...")
    vs = cv2.VideoCapture(arg_input_video)

writer = None
time.sleep(2.0)

# loop over frames from the video file stream
while True:
    # grab the frame from the threaded video stream
    frame = vs.read()
    frame = frame[1] if arg_input_video else frame
    
    # check to see if we have reached the end of the stream
    if frame is None:
        break
    
    # convert the input frame from BGR to RGB then resize it to have 
    # a width of 750px (to speed up processing); resize the converted
    # image, not the original BGR frame
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    rgb = imutils.resize(rgb, width=750)
    r = frame.shape[1] / float(rgb.shape[1])
    
    # detect the (x,y)-coordinates of the bounding-boxes
    # corresponding to each face in the input frame, then
    # compute the facial embeddings for each face
    boxes = face_recognition.face_locations(rgb, model=arg_detection_method)
    encodings = face_recognition.face_encodings(rgb, boxes)
    names = []
    # loop over the facial embeddings
    for encoding in encodings:
        # attempt to match each face in the input image 
        # to our known encodings
        matches = face_recognition.compare_faces(data["encodings"],encoding)
        name = "Unknown" 
        
        # check to see if we have found a match
        if True in matches:
            # find the indexes of all matched faces then initialize a 
            # dict to count the total number of times each face
            # was matched
            matchedIdxs = [i for (i,b) in enumerate(matches) if b]
            counts = {}
            
            # loop over the matched indexes and maintain a count for
            # each recognized face
            for i in matchedIdxs:
                name = data["names"][i]
                counts[name] = counts.get(name,0)+1
            # determine the recognized face with the largest number of votes
            name = max(counts, key=counts.get)
        
        # update the list of names
        names.append(name)
        
    # loop over the recognized faces
    for ((top,right,bottom,left),name) in zip(boxes,names):
        # rescale the face coordinates
        top = int(top*r)
        right = int(right*r)
        bottom = int(bottom*r)
        left = int(left*r)
        
        # draw the predicted face names on the image
        cv2.rectangle(frame, (left,top),(right,bottom),(0,255,0),2)
        y = top - 15 if top - 15 > 15 else top + 15
        cv2.putText(frame, name, (left,y),cv2.FONT_HERSHEY_SIMPLEX,
                   0.75, (0,255,0),2)
    # if the video write is None and we are supposed to write 
    # the output video to disk, then initialize the writer
    if writer is None and arg_output is not None:
        fourcc = cv2.VideoWriter_fourcc(*"MJPG")
        writer = cv2.VideoWriter(arg_output,fourcc,20,
                                (frame.shape[1],frame.shape[0]), True)
    # if the write is not None, write the frame with recognized
    # faces to disk
    if writer is not None:
        writer.write(frame)
    # check if we are supposed to display the output frame 
    # to the screen
    if arg_display > 0:
        cv2.imshow("Frame",frame)
        # if the q key was pressed, break from the loop
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    
# do cleanup
cv2.destroyAllWindows()
# cv2.VideoCapture is closed with release(), VideoStream with stop()
if arg_input_video:
    vs.release()
else:
    vs.stop()
# check to see if the video writer pointer needs to be released
if writer is not None:
    writer.release()


[INFO] loading encodings...
[INFO] starting Input video stream...
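
The loop above detects faces on a 750px-wide copy of each frame and then scales the boxes back up before drawing on the full-size frame. A minimal sketch of that rescaling; the helper name rescale_box and the frame widths are made up for illustration.

```python
def rescale_box(box, ratio):
    """Scale a (top, right, bottom, left) box from resized to original frame coordinates."""
    top, right, bottom, left = box
    return (int(top * ratio), int(right * ratio), int(bottom * ratio), int(left * ratio))


original_width = 1500   # width of the frame read from the video
resized_width = 750     # width of the copy used for detection
ratio = original_width / float(resized_width)

print(rescale_box((50, 200, 150, 100), ratio))
```

Without this step the boxes would be drawn at the resized coordinates and land in the wrong place on the original frame.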