# Facial Keypoint Detection

The objective of this task is to predict keypoint positions on face images. This can be used as a building block in several applications, such as:

- tracking faces in images and video
- analysing facial expressions
- detecting dysmorphic facial signs for medical diagnosis
- biometrics / face recognition
    
Detecting facial keypoints is a very challenging problem. Facial features vary greatly from one individual to another, and even for a single individual, there is a large amount of variation due to 3D pose, size, position, viewing angle, and illumination conditions. Computer vision research has come a long way in addressing these difficulties, but there remain many opportunities for improvement.

First things first: how can we make a facial keypoint detector? At a high level, facial keypoint detection is a regression problem: a single face (the input) corresponds to a set of 15 facial keypoints (15 $(x, y)$ coordinate pairs, i.e., 30 output values). Because our input data are images, we can employ a convolutional neural network (CNN) to recognize patterns in the images and learn to identify these keypoints from sets of labeled data.

At a high level, the network does this by detecting edges, then associating certain patterns of edges with a keypoint. It then applies the relations it has learned to new images by detecting edges there.
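
To make the edge-detection idea concrete, here is a minimal NumPy sketch of what a single convolutional filter computes. The Sobel kernel and the toy 6x6 image are illustrative assumptions; a trained CNN learns its own kernels rather than using a fixed one.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding) and sum the elementwise products.

    This is the cross-correlation that CNN layers compute.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Sobel kernel: responds to changes in brightness from left to right
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

# Toy image: dark left half, bright right half (one vertical edge)
image = np.zeros((6, 6))
image[:, 3:] = 1.0

edges = convolve2d_valid(image, SOBEL_X)
print(edges)
```

The response is non-zero only where the window straddles the boundary between the dark and bright halves, which is the sense in which a filter "detects" an edge.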


For this task, we need:
- pandas for loading and handling the data
- cv2 (OpenCV) for the classifier, resizing, and filtering
- numpy for its arrays
- PIL (Pillow) for image processing
- and some other helper modules

We need to load the data, process it, and scale the grayscale pixel values as necessary (for example, from 0-255 down to the range 0-1).
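
A minimal sketch of the parsing and scaling step is shown below. It assumes each image is stored as a single string of space-separated grayscale values in the range 0-255 and that images are square; the function name `parse_image` and the default size of 96 are hypothetical choices for illustration.

```python
import numpy as np

def parse_image(pixel_string, size=96):
    """Turn a space-separated pixel string into a (size, size) array scaled to [0, 1]."""
    pixels = np.array(pixel_string.split(), dtype=np.float32)
    return pixels.reshape(size, size) / 255.0

# Tiny 2x2 example so the result is easy to check by eye
img = parse_image("0 255 128 64", size=2)
print(img)
```

With pandas, the same function could be applied to a whole column of pixel strings, assuming the column holds one string per image.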

We also need a method for plotting the keypoints of each face onto the face itself, so we can view predicted values and also manually examine our data.
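
One way to do this without any plotting library is to paint small markers directly into a copy of the image array. The sketch below assumes a 2-D grayscale image and keypoints given either as a flat `[x1, y1, x2, y2, ...]` vector or as `(x, y)` pairs; `draw_keypoints` is a hypothetical helper name, and NaN coordinates, if any, are skipped.

```python
import numpy as np

def draw_keypoints(image, keypoints, value=1.0, radius=1):
    """Return a copy of `image` with a square marker at each (x, y) keypoint."""
    marked = image.copy()
    h, w = marked.shape
    # Accept a flat [x1, y1, x2, y2, ...] vector or an (n, 2) array
    points = np.asarray(keypoints, dtype=float).reshape(-1, 2)
    for x, y in points:
        if np.isnan(x) or np.isnan(y):
            continue  # missing coordinate: nothing to draw
        col, row = int(round(x)), int(round(y))
        if not (0 <= col < w and 0 <= row < h):
            continue  # keypoint falls outside the image
        r0, r1 = max(row - radius, 0), min(row + radius + 1, h)
        c0, c1 = max(col - radius, 0), min(col + radius + 1, w)
        marked[r0:r1, c0:c1] = value
    return marked

face = np.zeros((10, 10))
marked = draw_keypoints(face, [5, 3], radius=0)  # x=5, y=3
```

The returned array can then be displayed with any image viewer, for instance matplotlib's `plt.imshow(marked, cmap="gray")`.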

For our model itself, we will use a four-layer Sequential model from Keras, with:
- an input layer
- two middle layers
- one final prediction layer

Apart from the model architecture, we will also use the RMSprop optimization algorithm and the mean squared error loss, so that training minimizes the squared distance between the predicted and true keypoint coordinates.
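
As a rough sketch of what such a model could look like in Keras: the layer types, filter counts, and the 96x96 input size below are assumptions made for illustration, since only the overall four-layer shape, the optimizer, and the loss are specified here. Pooling and flatten layers are treated as glue and not counted among the four.

```python
from keras import layers, models

def build_model(image_size=96, num_keypoints=15):
    """Hypothetical four-layer CNN: one input-facing layer, two middle layers, one output layer."""
    model = models.Sequential([
        layers.Input(shape=(image_size, image_size, 1)),  # grayscale input
        layers.Conv2D(16, 3, activation="relu"),          # input-facing layer
        layers.MaxPooling2D(2),
        layers.Conv2D(32, 3, activation="relu"),          # middle layer 1
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),             # middle layer 2
        layers.Dense(num_keypoints * 2),                  # prediction: one x and one y per keypoint
    ])
    # RMSprop optimizer with mean squared error loss, as described above
    model.compile(optimizer="rmsprop", loss="mean_squared_error")
    return model

model = build_model()
```

The final layer has no activation because the outputs are coordinates, not class probabilities.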

Let's not go into the details of how the RMSprop algorithm works. For a full explanation, see [this link](http://www.ashukumar27.io/optimization-algorithms/) (warning: it is quite confusing).

__On a surface level__, the RMSprop optimization algorithm is similar to gradient descent (our classic algorithm) with momentum. It divides each parameter's step by a running average of that parameter's recent gradient magnitudes, which damps oscillations in the steep (vertical) direction, essentially going from this (blue):
![](http://www.ashukumar27.io/assets/neuralnets/decay1.png)

to this (green):
![](http://www.ashukumar27.io/assets/neuralnets/decay2.png)

This way, the model converges to a minimum loss in fewer epochs, saving both computational power and time.
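
For readers who prefer code to pictures, here is a minimal NumPy sketch of the standard RMSprop update rule. The toy loss function, starting point, and hyperparameter values are assumptions chosen for illustration, not the settings Keras uses internally.

```python
import numpy as np

def rmsprop_step(w, grad, cache, lr=0.01, rho=0.9, eps=1e-8):
    """One RMSprop update: scale each step by a running average of squared gradients."""
    cache = rho * cache + (1.0 - rho) * grad ** 2
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache

# Toy loss with a steep direction (w[1]) and a shallow one (w[0])
def loss(w):
    return w[0] ** 2 + 10.0 * w[1] ** 2

def grad(w):
    return np.array([2.0 * w[0], 20.0 * w[1]])

w = np.array([2.0, -1.5])
cache = np.zeros_like(w)
initial_loss = loss(w)
for _ in range(1000):
    w, cache = rmsprop_step(w, grad(w), cache)
final_loss = loss(w)
print(initial_loss, final_loss)
```

Because each step is divided by the recent gradient magnitude, the steep direction does not take disproportionately large steps, which is what keeps the path from zig-zagging.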


Our program will then train the CNN it has constructed on the data from the Kaggle dataset, and use that same CNN on new data to predict the main keypoints of each face.

One obvious drawback is that the model is trained on grayscale images, while the images we will most likely be predicting on will be in color.

We can fix this by converting the input image to grayscale, predicting the keypoints, and then drawing those same points on the color image.
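
The two ends of that round trip can be sketched in NumPy as follows. The helper names are hypothetical, the channel order is assumed to be RGB, and the grayscale weights are the common ITU-R BT.601 luma coefficients. The sketch also assumes the model predicts on a square image of side `model_size`, so the points must be rescaled to the size of the original color image.

```python
import numpy as np

def rgb_to_gray(rgb):
    """Weighted sum of the R, G, B channels (BT.601 luma weights)."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb[..., :3] @ weights

def map_keypoints(points, model_size, target_shape):
    """Rescale (x, y) points from a model_size x model_size frame to an image of target_shape."""
    h, w = target_shape[:2]
    scale = np.array([w / model_size, h / model_size])  # x scales with width, y with height
    return np.asarray(points, dtype=float).reshape(-1, 2) * scale

# A pure-red 192x288 color image and one keypoint predicted at the centre of a 96x96 frame
color = np.zeros((192, 288, 3))
color[..., 0] = 255.0
gray = rgb_to_gray(color)
mapped = map_keypoints([[48, 48]], model_size=96, target_shape=color.shape)
print(gray.shape, mapped)
```

Between these two steps, the grayscale image would still need to be resized to the model's input size (for example with `cv2.resize`) before prediction.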