![](https://img.evbuc.com/https%3A%2F%2Fcdn.evbuc.com%2Fimages%2F33408608%2F38309748108%2F1%2Foriginal.jpg?w=1000&rect=379%2C46%2C1366%2C683&s=8fc2ce5c141e04f89f8125cd6e24ec17)

# Face Detection

---------------------

## A Proprietary API and Open-Source Performance Comparison

How well do cloud service providers' deep learning models detect faces in images, really? How does that compare to an open-source package like OpenCV? And how much is that (potentially) improved accuracy and detail going to cost you?

### Available Features
Below is a table of features from the top three cloud service providers, as of November 2017. These features and prices are subject to change without notice. A designation of `conf` indicates a numeric confidence score; note that the scale differs by provider (Microsoft reports 0 to 1, while AWS Rekognition reports 0 to 100).

_Also, note:_ Google's Vision API lists facial landmarks in three dimensions (x, y, z), whereas Microsoft's Face API, AWS Rekognition, and OpenCV only list two. For the OpenCV analysis, we will only be including the pretrained cascade classifier XML files that ship with it, out of the box.

-----------------------------------

| Feature            | Google Vision API | AWS Rekognition | Microsoft Face API | OpenCV |
| ------------------ |:-----------------:|:---------------:|:------------------:|:------:|
| Custom Face Detect |                   |                 | X                  |        |
| Bounding Box       | 2                 | 1               | 1                  | 1      |
| Age                |                   | int range       | float + conf       |        |
| Smile              | T/F               | T/F + conf      | conf               | conf   |
| Eyeglasses         |                   | T/F + conf      | T/F + conf         |        |
| Sunglasses         |                   | T/F + conf      | T/F + conf         |        |
| Gender             |                   | M/F + conf      | M/F + conf         |        |
| Hair Color         |                   |                 | color + conf       |        |
| Makeup: Eye        |                   |                 | T/F + conf         |        |
| Makeup: Lip        |                   |                 | T/F + conf         |        |
| Hair: Bald         |                   |                 | conf               |        |
| Hair: Invisible    |                   | T/F             | T/F + conf         |        |
| Beard              |                   | T/F + conf      | T/F + conf         |        |
| Moustache          |                   | T/F + conf      | T/F + conf         |        |
| Sideburns          |                   |                 | T/F + conf         |        |
| Eyes Open          |                   | T/F + conf      |                    |        |
| Mouth Open         |                   | T/F + conf      |                    |        |
| Eye Left           | X                 | X               | X                  |  X     |
| Eye Right          | X                 | X               | X                  |  X     |
| Left Pupil         | X                 | X               | X                  |        |
| Right Pupil        | X                 | X               | X                  |        |
| Nose Tip           | X                 | X               | X                  |        |
| Nose Left          | X                 | X               | X                  |        |
| Nose Right         | X                 | X               | X                  |        |
| Nose Bottom Right  | X                 |                 | X                  |        |
| Nose Bottom Left   | X                 |                 | X                  |        |
| Nose Bottom Center | X                 |                 | X                  |        |
| Upper Lip          | X                 |                 | X                  |        |
| Lower Lip          | X                 |                 | X                  |        |
| Upper Lip Top      |                   |                 | X                  |        |
| Upper Lip Bottom   |                   |                 | X                  |        |
| Lower Lip Top      |                   |                 | X                  |        |
| Lower Lip Bottom   |                   |                 | X                  |        |
| Mouth Left         | X                 | X               | X                  |        |
| Mouth Right        | X                 | X               | X                  |        |
| Mouth Center       | X                 |                 |                    |        | 
| Mouth Up           | X                 | X               |                    |        | 
| Mouth Down         | X                 | X               |                    |        | 
| Left Eyebrow Left  | X                 | X               | X                  |        |
| Left Eyebrow Up    | X                 | X               |                    |        | 
| Left Eyebrow Right | X                 | X               | X                  |        |
| Right Eyebrow Left | X                 | X               | X                  |        |
| Right Eyebrow Up   | X                 | X               |                    |        |
| Right Eyebrow Right| X                 | X               | X                  |        |
| Eye Midpoint       | X                 |                 |                    |        | 
| Left Eye Left      | X                 | X               | X                  |        |
| Left Eye Right     | X                 | X               | X                  |        |
| Left Eye Top       | X                 | X               | X                  |        |
| Left Eye Bottom    | X                 | X               | X                  |        |
| Right Eye Left     | X                 | X               | X                  |        |
| Right Eye Right    | X                 | X               | X                  |        |
| Right Eye Top      | X                 | X               | X                  |        |
| Right Eye Bottom   | X                 | X               | X                  |        |
| Left Ear Tragion   | X                 |                 |                    |        |
| Right Ear Tragion  | X                 |                 |                    |        |
| Forehead Glabella  | X                 |                 |                    |        |
| Chin Gnathion      | X                 |                 |                    |        |
| Chin Left Gonion   | X                 |                 |                    |        |
| Chin Right Gonion  | X                 |                 |                    |        |
| Pitch              |                   |  X              |  X                 |        |
| Yaw                |                   |  X              |  X                 |        |
| Roll               | X                 |  X              |  X                 |        |
| Pan                | X                 |                 |                    |        |
| Tilt               | X                 |                 |                    |        |
| Brightness         |                   |  % + conf       |                    |        |
| Sharpness          |                   |  % + conf       | text + conf        |        |
| Joy                | text              |   conf          | conf               |        |
| Sorrow             | text              |                 | conf               |        |
| Contempt           |                   |                 | conf               |        |
| Disgust            |                   |                 | conf               |        |
| Fear               |                   |                 | conf               |        |
| Anger              | text              |   conf          | conf               |        |
| Surprise           | text              |                 | conf               |        |
| Confused           |                   |   conf          |                    |        |
| Neutral            |                   |                 | conf               |        |
| Exposure           | T/F               |                 | text + conf        |        |
| Blur               | T/F               |                 | text + conf        |        |
| Headwear           | T/F               |                 | X                  |        |
| Occlusion          |                   |                 | T/F + conf         |        |

Price structures, as of November 2017, are as follows (first 20 million transactions):

![](http://paigesear.ch/api_costs.JPG)

The graph above shows a monthly cost comparison for OpenCV, Amazon Web Services, Google Cloud Platform, and Microsoft's Azure Cognitive Services.

OpenCV, being open-source, obviously has the lowest up-front cost, but you should also factor in what it would take to keep a machine learning engineer or data scientist on staff who can build, refine, and support products on top of it. The same goes for deep learning frameworks: TensorFlow, CNTK, and Caffe2 could give results similar to the three proprietary APIs, but models built with them would be difficult to train, deploy, and maintain.

### Face Detection - Full-Frontal Performance
Our happy test subjects! Extra points if you can name all four.

![](http://paigesear.ch/heroes.jpg)
![](http://paigesear.ch/myotherdudes_analyzed.png)

As you can see, all faces are detected by all listed tools - though there is some confusion around eye detection with OpenCV. It is also interesting to note that Amazon Web Services' Rekognition places bounding boxes at a tilt, as opposed to a square; this could potentially make custom algorithms or other libraries more difficult to implement.

At any rate: detecting folks who are looking straight at the camera seems to be a bit too easy. Let's try something more challenging, shall we?

### CoolPeopleHangingOutTogether.jpg

![](coolpeoplehangingouttogether.jpg)

### Importing JSON Files

I've made calls to the three APIs listed above, using the picture shown. The responses are stored as `google`, `amazon`, and `microsoft`, respectively, and we will use them for the remainder of the notebook.

In [196]:
import json

with open('C:\\Users\\pabailey\\Documents\\FaceAPITesting\\google.json') as goog_data:
    google = json.load(goog_data)
    #print(google)

with open('C:\\Users\\pabailey\\Documents\\FaceAPITesting\\amazon.json') as aws_data:
    amazon = json.load(aws_data)
    #print(amazon)
    
with open('C:\\Users\\pabailey\\Documents\\FaceAPITesting\\microsoft.json') as azure_data:
    microsoft = json.load(azure_data)
    #print(microsoft)

## Bounding Polygons
It is a frustrating fact that none of the APIs return bounding box coordinates in the same format. Google and Microsoft come closest, since both use pixel locations: Google returns the four vertices of each polygon, while Microsoft returns a top-left corner plus a width and height. AWS Rekognition instead returns each value as a proportion of the picture's width or height. The bounding boxes, with a subset of the facial landmarks, are shown below - and this time, detection performance leaves much to be desired. Let's unpack it.

_Note, again:_ AWS returns bounding polygons that are slightly askew. This could cause difficulty when creating custom facial algorithms, or attempting to integrate or standardize with other APIs.

In [204]:
# Google Cloud Vision response
# fdBoundingPoly is the tighter polygon (skin only); boundingPoly is the looser outer one.
print("// Interior Polygons (fdBoundingPoly)")
for i in range(4):
    print(google['faceAnnotations'][i]['fdBoundingPoly'])

print("\n // Exterior Polygons (boundingPoly)")
for i in range(4):
    print(google['faceAnnotations'][i]['boundingPoly'])

// Interior Polygons (fdBoundingPoly)
{'vertices': [{'x': 500, 'y': 223}, {'x': 689, 'y': 223}, {'x': 689, 'y': 412}, {'x': 500, 'y': 412}]}
{'vertices': [{'x': 839, 'y': 282}, {'x': 1015, 'y': 282}, {'x': 1015, 'y': 459}, {'x': 839, 'y': 459}]}
{'vertices': [{'x': 1184, 'y': 254}, {'x': 1392, 'y': 254}, {'x': 1392, 'y': 462}, {'x': 1184, 'y': 462}]}
{'vertices': [{'x': 116, 'y': 233}, {'x': 319, 'y': 233}, {'x': 319, 'y': 436}, {'x': 116, 'y': 436}]}

 // Exterior Polygons (boundingPoly)
{'vertices': [{'x': 450, 'y': 161}, {'x': 696, 'y': 161}, {'x': 696, 'y': 447}, {'x': 450, 'y': 447}]}
{'vertices': [{'x': 807, 'y': 190}, {'x': 1046, 'y': 190}, {'x': 1046, 'y': 468}, {'x': 807, 'y': 468}]}
{'vertices': [{'x': 1168, 'y': 175}, {'x': 1445, 'y': 175}, {'x': 1445, 'y': 496}, {'x': 1168, 'y': 496}]}
{'vertices': [{'x': 41, 'y': 128}, {'x': 331, 'y': 128}, {'x': 331, 'y': 465}, {'x': 41, 'y': 465}]}


In [198]:
# AWS Rekognition
for i in range(3):
    print(amazon['FaceDetails'][i]['BoundingBox'])

{'Width': 0.13437500596046448, 'Height': 0.18566493690013885, 'Left': 0.7174999713897705, 'Top': 0.21329879760742188}
{'Width': 0.12812499701976776, 'Height': 0.17702935636043549, 'Left': 0.512499988079071, 'Top': 0.22452504932880402}
{'Width': 0.12687499821186066, 'Height': 0.1744386851787567, 'Left': 0.31187498569488525, 'Top': 0.18134714663028717}


In [199]:
# Microsoft Cognitive Services
for i in range(3):
    print(microsoft[i]['faceRectangle'])

{'top': 237, 'left': 521, 'width': 169, 'height': 169}
{'top': 264, 'left': 188, 'width': 165, 'height': 165}
{'top': 294, 'left': 840, 'width': 155, 'height': 155}
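Since the three responses use three different layouts, it can help to convert them to one common form before comparing anything. Below is a minimal sketch that maps each response to `(left, top, right, bottom)` in pixels. The helper names are my own, and the image size of 1600 x 1158 is an assumption inferred from the AWS ratios rather than something the APIs return; substitute the real dimensions of your picture.

```python
# Assumed image size in pixels. It is not part of any response; 1600 x 1158
# is inferred from the AWS ratios and should be replaced with the real size.
IMAGE_WIDTH = 1600
IMAGE_HEIGHT = 1158


def google_box(poly):
    """Convert a Vision API polygon (pixel vertices) to (left, top, right, bottom)."""
    # Defensive default: treat a missing coordinate as 0.
    xs = [v.get('x', 0) for v in poly['vertices']]
    ys = [v.get('y', 0) for v in poly['vertices']]
    return (min(xs), min(ys), max(xs), max(ys))


def aws_box(box, image_width, image_height):
    """Convert a Rekognition BoundingBox (ratios of the image size) to pixel corners."""
    left = box['Left'] * image_width
    top = box['Top'] * image_height
    right = (box['Left'] + box['Width']) * image_width
    bottom = (box['Top'] + box['Height']) * image_height
    return tuple(round(v) for v in (left, top, right, bottom))


def azure_box(rect):
    """Convert a Face API faceRectangle (pixel top/left/width/height) to pixel corners."""
    return (rect['left'], rect['top'],
            rect['left'] + rect['width'], rect['top'] + rect['height'])


# One sample from each response printed above.
google_sample = {'vertices': [{'x': 500, 'y': 223}, {'x': 689, 'y': 223},
                              {'x': 689, 'y': 412}, {'x': 500, 'y': 412}]}
aws_sample = {'Width': 0.12687499821186066, 'Height': 0.1744386851787567,
              'Left': 0.31187498569488525, 'Top': 0.18134714663028717}
azure_sample = {'top': 237, 'left': 521, 'width': 169, 'height': 169}

print(google_box(google_sample))
print(aws_box(aws_sample, IMAGE_WIDTH, IMAGE_HEIGHT))
print(azure_box(azure_sample))
```

Once everything is in pixel corners, the boxes can be drawn or compared with the same code regardless of which provider they came from.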


OpenCV, after a bit of parameter tweaking, and AWS are able to detect everyone except for Claude Shannon; Azure is able to detect everyone except for Joseph Weizenbaum. Google's Vision API is the only tool that is able to detect features for all four, out of the box.
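One hedged way to check which faces two providers have in common is to match their boxes by intersection over union (IoU). The sketch below uses Google's tighter polygons and Microsoft's rectangles from the outputs above, rewritten as `(left, top, right, bottom)` pixel tuples. The function names and the 0.3 threshold are my own choices for illustration, not part of any API.

```python
def iou(a, b):
    """Intersection over union of two (left, top, right, bottom) pixel boxes."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # the boxes do not overlap
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def match_faces(reference, candidates, threshold=0.3):
    """For each reference box, return the index of the best-overlapping
    candidate box, or None when no candidate reaches the threshold."""
    matches = []
    for ref in reference:
        scores = [iou(ref, cand) for cand in candidates]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            matches.append(best)
        else:
            matches.append(None)
    return matches


# Boxes as (left, top, right, bottom), taken from the responses printed above.
google_boxes = [(500, 223, 689, 412), (839, 282, 1015, 459),
                (1184, 254, 1392, 462), (116, 233, 319, 436)]
azure_boxes = [(521, 237, 690, 406), (188, 264, 353, 429), (840, 294, 995, 449)]

print(match_faces(google_boxes, azure_boxes))
```

A `None` in the result marks a face that Google found and Azure did not. The threshold is a judgment call: the two providers draw boxes of different sizes around the same face, so a strict value would reject genuine matches.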

_Note:_ the white and blue dots on faces for both AWS and Azure are just a subset of their available facial landmarks. For a full listing, see the table above.

![](http://paigesear.ch/mydudes_analyzed.png)

## Age Estimates
_Note: only available from AWS and from Microsoft Azure._

#### Correct ages of our dudes in April 1968:

- Ed Fredkin: 34
- Claude Shannon: 52
- John McCarthy: 41
- Joseph Weizenbaum:  45

AWS Rekognition is the most spot-on for age estimates; however, the spreads are **huge** (17 to 21 years for the detected faces). I assume the ranges equate to 95% confidence intervals, but that is not listed in documentation.

In [205]:
print("Amazon Rekognition - Age Detection")
# Labels follow bounding-box position: FaceDetails[0] is the rightmost face AWS detected.
aws_faces = ["Joseph Weizenbaum", "Ed Fredkin", "John McCarthy"]
for i in range(3):
    print(amazon['FaceDetails'][i]['AgeRange'], "for", aws_faces[i])

print("\n")

print("Microsoft Cognitive Services - Age Detection")
# Labels follow bounding-box position: microsoft[1] is the leftmost face in the picture.
azure_faces = ["John McCarthy", "Claude Shannon", "Ed Fredkin"]
for i in range(3):
    print(microsoft[i]['faceAttributes']['age'], "for", azure_faces[i])

Amazon Rekognition - Age Detection
{'Low': 38, 'High': 59} for Joseph Weizenbaum
{'Low': 35, 'High': 52} for Ed Fredkin
{'Low': 48, 'High': 68} for John McCarthy


Microsoft Cognitive Services - Age Detection
47.3 for John McCarthy
52.8 for Claude Shannon
39.4 for Ed Fredkin
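Because AWS returns a range and Microsoft returns a single number, the two are awkward to compare directly. The sketch below shows one possible way to summarize a Rekognition `AgeRange` against a known age; the function name and the returned keys are hypothetical, and the example values are Ed Fredkin's range and age from above.

```python
def summarize_age_range(age_range, true_age):
    """Describe a Rekognition-style {'Low', 'High'} range relative to a known age."""
    low, high = age_range['Low'], age_range['High']
    return {
        'width': high - low,                     # spread of the range in years
        'midpoint': (low + high) / 2,            # a rough point estimate
        'contains_true_age': low <= true_age <= high,
    }


# Ed Fredkin: range from the Rekognition output above, age from the list above.
fredkin = summarize_age_range({'Low': 35, 'High': 52}, 34)
print(fredkin)
```

The midpoint is only a convenience for lining the range up against Microsoft's single value; nothing in the response says the true age is more likely to sit in the middle of the range.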


### How happy is Ed Fredkin?
God knows I'd be if I were standing next to Shannon, McCarthy, and Weizenbaum.

![](http://paigesear.ch/EdFredkin.JPG)

All three APIs agree that Ed's a pretty ecstatic guy, but the values are returned in quite dissimilar ways. Azure's Face API reports a happiness score of 0.99 on a 0-to-1 scale. AWS reports its confidence as a percentage: 99.96% for `HAPPY`, and under 1% each for `CONFUSED` (0.62%) and `DISGUSTED` (0.52%). Google's Vision API merely returns a `VERY_LIKELY`, with no corresponding confidence level.

It should also be noted that the `SMILE` feature from Google, Microsoft, and AWS could simply be derived from facial landmark positions, by checking whether the two corners of a subject's mouth are lifted above the upper lip's location.
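To make that idea concrete, here is a minimal sketch of such a landmark-based check. It is purely an illustration of the heuristic described above, not a description of how any of the three providers work, and the function name and coordinates are hypothetical.

```python
def mouth_corners_lifted(mouth_left, mouth_right, upper_lip):
    """Return True when both mouth corners sit higher in the image than the upper lip.

    Image y coordinates grow downward, so "higher" means a smaller y value.
    """
    return mouth_left['y'] < upper_lip['y'] and mouth_right['y'] < upper_lip['y']


# Hypothetical pixel coordinates, not taken from any API response.
smiling = mouth_corners_lifted({'x': 100, 'y': 195}, {'x': 160, 'y': 194}, {'x': 130, 'y': 200})
neutral = mouth_corners_lifted({'x': 100, 'y': 205}, {'x': 160, 'y': 206}, {'x': 130, 'y': 200})
print(smiling, neutral)
```

A rule this simple ignores head tilt entirely, so a rolled head could trip it in either direction.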

In [174]:
# Microsoft Face API
print(microsoft[2]['faceAttributes']['emotion'])

{'anger': 0.0, 'contempt': 0.0, 'disgust': 0.0, 'fear': 0.0, 'happiness': 0.99, 'neutral': 0.009, 'sadness': 0.0, 'surprise': 0.0}


In [175]:
# Google Cloud Platform - Vision API
print(google['faceAnnotations'][1]['joyLikelihood'])
print(google['faceAnnotations'][1]['sorrowLikelihood'])
print(google['faceAnnotations'][1]['surpriseLikelihood'])
print(google['faceAnnotations'][1]['angerLikelihood'])

VERY_LIKELY
VERY_UNLIKELY
VERY_UNLIKELY
VERY_UNLIKELY


In [183]:
# Amazon Rekognition
print(amazon['FaceDetails'][1]['Emotions'])

[{'Type': 'HAPPY', 'Confidence': 99.95712280273438}, {'Type': 'CONFUSED', 'Confidence': 0.6194353103637695}, {'Type': 'DISGUSTED', 'Confidence': 0.5203723907470703}]
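To compare the three emotion outputs side by side, they first need a common shape. The sketch below is one way to do that, under stated assumptions: the helper names are mine, AWS percentages are divided by 100 to match Azure's 0-to-1 scale, and Google's likelihood strings are mapped to ordinal ranks (0 to 4) purely for ordering, since they are categories rather than probabilities.

```python
# Ordinal ranks for the Vision API likelihood strings. Turning them into
# numbers is an assumption made for comparison; they are not probabilities.
GOOGLE_LIKELIHOOD_RANK = {
    'UNKNOWN': None,
    'VERY_UNLIKELY': 0,
    'UNLIKELY': 1,
    'POSSIBLE': 2,
    'LIKELY': 3,
    'VERY_LIKELY': 4,
}

GOOGLE_EMOTIONS = ('joy', 'sorrow', 'anger', 'surprise')


def aws_emotions(emotions):
    """Rekognition: list of {'Type', 'Confidence'}, with confidence in percent."""
    return {e['Type'].lower(): e['Confidence'] / 100 for e in emotions}


def azure_emotions(emotion):
    """Face API: already a dict of scores between 0 and 1."""
    return dict(emotion)


def google_emotions(face):
    """Vision API: one '<emotion>Likelihood' string per emotion."""
    return {name: GOOGLE_LIKELIHOOD_RANK[face[name + 'Likelihood']]
            for name in GOOGLE_EMOTIONS}


# Values copied from the three outputs above.
aws = aws_emotions([{'Type': 'HAPPY', 'Confidence': 99.95712280273438},
                    {'Type': 'CONFUSED', 'Confidence': 0.6194353103637695},
                    {'Type': 'DISGUSTED', 'Confidence': 0.5203723907470703}])
azure = azure_emotions({'anger': 0.0, 'contempt': 0.0, 'disgust': 0.0, 'fear': 0.0,
                        'happiness': 0.99, 'neutral': 0.009, 'sadness': 0.0,
                        'surprise': 0.0})
goog = google_emotions({'joyLikelihood': 'VERY_LIKELY',
                        'sorrowLikelihood': 'VERY_UNLIKELY',
                        'surpriseLikelihood': 'VERY_UNLIKELY',
                        'angerLikelihood': 'VERY_UNLIKELY'})

print(aws['happy'], azure['happiness'], goog['joy'])
```

Note that the emotion names still differ between providers (`happy`, `happiness`, `joy`), so a real comparison would also need a name mapping.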


## Comparison of Facial Landmarks and Orientation

Google's Vision API lists all of its facial landmarks in three dimensions rather than two - which is excellent! For AWS Rekognition and Microsoft's Face API, the third dimension could potentially be estimated with some 3D geometry, however (a combination of pitch, yaw, and known facial landmark locations).

All three proprietary APIs place facial landmarks in similar locations; OpenCV often struggles with placement for eyes.

In [146]:
print(microsoft[0]['faceLandmarks']['eyebrowLeftOuter'])
print(microsoft[0]['faceLandmarks']['pupilLeft'])
print(microsoft[0]['faceAttributes']['headPose'])
print(microsoft[1]['faceAttributes']['headPose'])
print(microsoft[2]['faceAttributes']['headPose'])

{'x': 535.6, 'y': 267.0}
{'x': 566.3, 'y': 283.6}
{'pitch': 0.0, 'roll': -1.6, 'yaw': 18.3}
{'pitch': 0.0, 'roll': 0.0, 'yaw': 3.3}
{'pitch': 0.0, 'roll': 2.7, 'yaw': -19.5}


In [153]:
print(google['faceAnnotations'][0]['landmarks'][2])
print(google['faceAnnotations'][0]['landmarks'][0])
# Roll, pan, and tilt angles for each of the four detected faces
for face in google['faceAnnotations'][:4]:
    print(face['rollAngle'])
    print(face['panAngle'])
    print(face['tiltAngle'])

{'type': 'LEFT_OF_LEFT_EYEBROW', 'position': {'x': 547.7276, 'y': 267.71582, 'z': -3.303433}}
{'type': 'LEFT_EYE', 'position': {'x': 570.0902, 'y': 283.44037, 'z': -0.0011843754}}
-1.5004579
29.29505
5.0191813
-1.1626552
-8.133945
-17.181002
-3.3079503
-59.581093
-4.2570724
8.72574
69.301544
-12.04517
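The orientation fields also go by different names: Google reports roll, pan, and tilt, while Microsoft and AWS report pitch, roll, and yaw. Here is a small sketch that renames them into one layout, treating pan as yaw and tilt as pitch. The helper names are hypothetical, the AWS sample values are made up (the notebook does not print a Rekognition pose), and I have not verified that the three providers share sign conventions.

```python
def google_pose(face):
    """Vision API face annotation: panAngle is yaw, tiltAngle is pitch."""
    return {'pitch': face['tiltAngle'], 'yaw': face['panAngle'], 'roll': face['rollAngle']}


def aws_pose(face_detail):
    """Rekognition FaceDetail: angles live under 'Pose' with capitalized keys."""
    pose = face_detail['Pose']
    return {'pitch': pose['Pitch'], 'yaw': pose['Yaw'], 'roll': pose['Roll']}


def azure_pose(face):
    """Face API face: angles live under faceAttributes.headPose."""
    pose = face['faceAttributes']['headPose']
    return {'pitch': pose['pitch'], 'yaw': pose['yaw'], 'roll': pose['roll']}


# Google and Azure values are copied from the outputs above (first face of each).
g = google_pose({'rollAngle': -1.5004579, 'panAngle': 29.29505, 'tiltAngle': 5.0191813})
m = azure_pose({'faceAttributes': {'headPose': {'pitch': 0.0, 'roll': -1.6, 'yaw': 18.3}}})
# Hypothetical AWS values, used only to show the expected structure.
a = aws_pose({'Pose': {'Pitch': 1.0, 'Roll': -2.0, 'Yaw': 20.0}})

print(g)
print(m)
print(a)
```

With the names aligned, it becomes easier to see where the providers disagree on the same face.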


In [131]:
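# X and Y are ratios of the image size; FaceDetails[0] is the rightmost face AWS detected, not the face printed in the cells above
print(amazon['FaceDetails'][0]['Landmarks'][7])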

{'Type': 'leftEyeBrowLeft', 'X': 0.7546904683113098, 'Y': 0.26113489270210266}
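To put a number on how similar the landmark placements are, the positions can be compared as pixel distances. The sketch below measures the gap between Microsoft's and Google's versions of the same two landmarks, and converts the AWS landmark from ratios to pixels. The helper names are mine, and the 1600 x 1158 image size is an assumption inferred from the AWS bounding-box ratios, not a value from any response.

```python
import math

# Assumed image size in pixels; replace with the real dimensions of the picture.
IMAGE_WIDTH = 1600
IMAGE_HEIGHT = 1158


def aws_landmark_to_pixels(landmark, image_width, image_height):
    """Convert a Rekognition landmark (ratios of the image size) to pixel coordinates."""
    return {'x': landmark['X'] * image_width, 'y': landmark['Y'] * image_height}


def pixel_distance(a, b):
    """Euclidean distance between two {'x', 'y'} points, ignoring any z value."""
    return math.hypot(a['x'] - b['x'], a['y'] - b['y'])


# Microsoft and Google landmarks for the same face, copied from the outputs above.
azure_brow = {'x': 535.6, 'y': 267.0}
google_brow = {'x': 547.7276, 'y': 267.71582, 'z': -3.303433}
azure_pupil = {'x': 566.3, 'y': 283.6}
google_eye = {'x': 570.0902, 'y': 283.44037, 'z': -0.0011843754}

brow_gap = pixel_distance(azure_brow, google_brow)
eye_gap = pixel_distance(azure_pupil, google_eye)
aws_brow = aws_landmark_to_pixels(
    {'Type': 'leftEyeBrowLeft', 'X': 0.7546904683113098, 'Y': 0.26113489270210266},
    IMAGE_WIDTH, IMAGE_HEIGHT)

print(round(brow_gap, 1), round(eye_gap, 1))
print(aws_brow)
```

Keep in mind that the landmark definitions are not guaranteed to be identical across providers (a pupil and an eye center, for example), so a few pixels of difference does not necessarily mean one of them is wrong.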
