<a href="https://colab.research.google.com/github/JWuzyk/ML-Lab-Facial-Keypoint-Detection/blob/master/Machine_Learning_Lab_Project2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Tasks:
  - Preprocessing
  - Data augmentation (partially done), see https://imgaug.readthedocs.io/en/latest/source/examples_keypoints.html (reflection, rotation, contrast jittering?)
  - Different models for each keypoint
  - Transfer learning (partially done): could refine the imported network further (see the Keras documentation), retrain some of its layers, or consider other networks, potentially one pretrained on faces rather than ImageNet, e.g. VGGFace https://github.com/rcmalli/keras-vggface
  - Hyperparameter tuning, either with sklearn https://machinelearningmastery.com/grid-search-hyperparameters-deep-learning-models-python-keras/ or with TensorBoard, see https://www.youtube.com/watch?v=BqgTU7_cBnk (learning rate, optimiser, augmentation), plus saving model weights
  - Do a proper validation split
  - Set up Kaggle submission
  - Use an ensemble model
  - Find more data, e.g. http://www.milbo.org/muct/, https://github.com/soheillll/Facial-Keypoint-Detection, https://www.kaggle.com/selfishgene/youtube-faces-with-facial-keypoints, http://umdfaces.io/

Stretch Goals
  - Combine with face detection and apply to other images
  
Challenge:
  - Many rows are missing values for some keypoints, so we can't train a single model on all of the data, and there is no sensible way to fill in the missing values
  
Solution: 
  - Train individual models for each feature
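
To make the trade-off concrete, here is a minimal sketch on made-up numbers (the array `targets` and the helper `rows_without_nan` are illustrative and not part of the notebook). Dropping every row with any missing value keeps far fewer rows than dropping only the rows that miss the columns a given model needs:

```python
import numpy as np

# Toy stand-in for the keypoint columns: 6 rows, 4 columns, NaN marks a missing value.
targets = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 2.0, np.nan, 4.0],
    [1.0, 2.0, np.nan, np.nan],
    [1.0, 2.0, 3.0, 4.0],
    [np.nan, 2.0, 3.0, 4.0],
    [1.0, 2.0, np.nan, np.nan],
])

def rows_without_nan(values, col1, col2):
    """Boolean mask of the rows that have no NaN in columns col1:col2."""
    return ~np.isnan(values[:, col1:col2]).any(axis=1)

n_all = int(rows_without_nan(targets, 0, 4).sum())     # rows usable by one model for everything
n_first = int(rows_without_nan(targets, 0, 2).sum())   # rows usable by a model for columns 0:2
n_second = int(rows_without_nan(targets, 2, 4).sum())  # rows usable by a model for columns 2:4
print(n_all, n_first, n_second)
```

On this toy data a single model could use 2 rows, while the two per-group models could use 5 and 3 rows respectively.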
  
Resources:
  - http://cs231n.stanford.edu/reports/2016/pdfs/010_Report.pdf: a paper by someone doing essentially what we are doing

## Load data



In [1]:
# Load from github


!git clone https://github.com/JWuzyk/ML-Lab-Facial-Keypoint-Detection/
%cd ML-Lab-Facial-Keypoint-Detection/ 

!unzip data/test.zip 
!unzip data/training.zip

!ls

Cloning into 'ML-Lab-Facial-Keypoint-Detection'...
remote: Enumerating objects: 24, done.
remote: Counting objects: 100% (24/24), done.
remote: Compressing objects: 100% (22/22), done.
remote: Total 24 (delta 5), reused 9 (delta 0), pack-reused 0
Unpacking objects: 100% (24/24), done.
/content/ML-Lab-Facial-Keypoint-Detection
Archive:  data/test.zip
  inflating: test.csv                
Archive:  data/training.zip
  inflating: training.csv            
data  Machine_Learning_Lab_Project.ipynb  README.md  test.csv  training.csv


In [0]:
#Reading in the data from the cloned repository as pandas DataFrames

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

IDLookupTable = pd.read_csv('data/IdLookupTable.csv')
Training = pd.read_csv('training.csv')
Test = pd.read_csv('test.csv')

In [0]:
#Separate the data into different parts, so we can use a different model per keypoint group (some keypoints are missing in many rows)

#separatedata drops the rows that contain NaN in the keypoint columns col1:col2 of df, then returns (images, keys) as numpy arrays
def separatedata(df, col1, col2):
  sel = df.dropna(subset=[*df.columns[col1:col2]])
  sel = sel.reset_index(drop=True)
  
  keys = sel.iloc[:,col1:col2]
  keys = keys.to_numpy()
  
  images = sel.iloc[:,-1]
  images = images.apply(lambda n: np.fromstring(n, dtype = int, sep=' '))
  images = np.vstack(images.values)
  images = images.reshape(images.shape[0],96,96)
  
  # Running the model fails unless the array has shape (-,96,96,3), so repeat the grayscale values over 3 channels
  ret = np.empty((*images.shape, 3), dtype=np.uint8)
  ret[:, :, :,:] = images[:,:, :, np.newaxis]
  images = ret
  
  return images,keys

def normalize(x, y):
  x = x/255
  y = y/96
  
  return (x, y)
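
If the keypoints are trained in normalized form, predictions have to be mapped back to pixels before plotting or submitting. A minimal sketch of that round trip; `denormalize_keys` is a hypothetical helper that the notebook does not define, and the coordinates are made up:

```python
import numpy as np

IMAGE_SIZE = 96  # side length of the images in pixels, as used in normalize()

def denormalize_keys(y_norm, size=IMAGE_SIZE):
    """Map keypoints from the [0, 1] range back to pixel coordinates."""
    return np.asarray(y_norm) * size

# Two keypoints (x_1, y_1, x_2, y_2) in pixel coordinates.
y_pixels = np.array([[66.0, 39.0, 30.0, 36.0]])
y_norm = y_pixels / IMAGE_SIZE      # what normalize() does to the keypoints
y_back = denormalize_keys(y_norm)   # what predictions would need before plotting
```

Note that the training cells further down discard the normalized keypoints (`x_train,_ = normalize(...)`), so their targets and losses stay in pixel units.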
  



In [0]:
#I played around a bit and think we should separate the data as follows; we could also split the eyebrow data further if we want to

x_eye_center, y_eye_center = separatedata(Training, 0, 4)
#x_eye_corner, y_eye_corner = separatedata(Training, 4, 12)
#x_eyebrow, y_eyebrow = separatedata(Training, 12, 20)
#x_nose_tip, y_nose_tip = separatedata(Training, 20, 22)
#x_mouth_corner_top, y_mouth_corner_top = separatedata(Training, 22, 28)
#x_mouth_center_bottom_lip, y_mouth_center_bottom_lip = separatedata(Training, 28, 30)
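
The column ranges above partition the 30 target columns, so predictions from per-group models can be stitched back into one array per image. A minimal sketch under that assumption; `combine_predictions` and the dummy predictions are illustrative only:

```python
import numpy as np

# Column ranges of the keypoint groups, taken from the separatedata calls above.
GROUPS = {
    'eye_center': (0, 4),
    'eye_corner': (4, 12),
    'eyebrow': (12, 20),
    'nose_tip': (20, 22),
    'mouth_corner_top': (22, 28),
    'mouth_center_bottom_lip': (28, 30),
}

def combine_predictions(group_predictions, n_columns=30):
    """Stitch per-group predictions of shape (n, width) into one (n, n_columns) array."""
    n_rows = next(iter(group_predictions.values())).shape[0]
    full = np.full((n_rows, n_columns), np.nan)  # NaN marks columns no model has filled
    for name, pred in group_predictions.items():
        col1, col2 = GROUPS[name]
        full[:, col1:col2] = pred
    return full

# Dummy predictions: group number i outputs the constant i, just to show the layout.
dummy = {
    name: np.full((2, col2 - col1), float(i))
    for i, (name, (col1, col2)) in enumerate(GROUPS.items())
}
full = combine_predictions(dummy)
```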


## Data Augmentation

In [0]:
import imgaug as ia
import imgaug.augmenters as iaa
from imgaug.augmentables import Keypoint, KeypointsOnImage

# Data Augmentation 

def generate(x,y):
  # Returns one randomly augmented copy of every image in x together with its transformed keypoints
  images = (x).astype(np.uint8)
  keys = y

  n_images = images.shape[0]
  n_keys = keys.shape[1]//2

  keys = keys.reshape(n_images,n_keys,2)

  keypoints = []
  for k in range(n_images):
    # shape must be the shape of a single image, not of the whole batch
    kps = KeypointsOnImage([Keypoint(*keys[k,i,:]) for i in range(n_keys)], shape=images.shape[1:])
    keypoints.append(kps)

  # rotate each image by some angle between -30 and 30 degrees and flip horizontally 50% of the time
  # NOTE: a horizontal flip moves the left-eye keypoint onto the right eye and vice versa, so the
  # left/right columns probably need swapping for flipped images; this is not handled here
  seq = iaa.Sequential([
      iaa.Affine(rotate=(-30,30)),
      iaa.Fliplr(0.5)
  ])

  image_aug, kps_aug = seq(images=images, keypoints=keypoints)

  # image with keypoints before/after augmentation (uncomment to display)
  #image_before = keypoints[30].draw_on_image(images[30], size=2)
  #image_after = kps_aug[30].draw_on_image(image_aug[30], size=2)
  #ia.imshow(image_before)
  #ia.imshow(image_after)

  # read the coordinates from the augmented keypoints (kps_aug), not from the untransformed ones
  keys_aug = np.array([kps.to_xy_array() for kps in kps_aug]).reshape(n_images,n_keys*2)

  return image_aug,keys_aug

def augment(x_train,y_train):
  x,y=generate(x_train,y_train)
  x_ret = np.concatenate((x_train,x))
  y_ret = np.concatenate((y_train,y))
  return x_ret,y_ret
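
What a horizontal flip does to keypoints can be sketched in plain NumPy, without imgaug. This is an illustration under stated assumptions, not the library's implementation: `flip_horizontal` and `swap_pairs` are hypothetical names, and the mirroring uses the pixel-index convention `x -> width - 1 - x`, which may differ from the convention imgaug uses internally.

```python
import numpy as np

def flip_horizontal(image, keys, swap_pairs=()):
    """Mirror one image left-right and adjust its keypoints.

    image: array of shape (H, W) or (H, W, C)
    keys: array of shape (n_keypoints, 2) holding (x, y) pixel coordinates
    swap_pairs: index pairs of keypoints that trade places after the flip
    """
    width = image.shape[1]
    flipped_image = image[:, ::-1]
    flipped_keys = keys.astype(float).copy()
    # Pixel-index convention (an assumption): column x maps to column width - 1 - x.
    flipped_keys[:, 0] = width - 1 - flipped_keys[:, 0]
    # After mirroring, "left" and "right" keypoints have traded sides, so swap their rows.
    for a, b in swap_pairs:
        flipped_keys[[a, b]] = flipped_keys[[b, a]]
    return flipped_image, flipped_keys

image = np.zeros((96, 96), dtype=np.uint8)
image[40, 30] = 255   # bright pixel where keypoint 0 sits
image[40, 60] = 128   # dimmer pixel where keypoint 1 sits
keys = np.array([[30.0, 40.0], [60.0, 40.0]])

flipped_image, flipped_keys = flip_horizontal(image, keys, swap_pairs=[(0, 1)])
```

With the swap, keypoint 0 is again the one with the smaller x coordinate, which is what a model trained on unflipped faces would expect.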

# Plotting functions

In [0]:
from PIL import Image

#Turning the data from a dataframe into x_train,y_train, x_test numpy arrays so that we can use it
# x_train has shape (2140,96,96,3) 2140 images each 96x96 with 3 colour channels, y_train has shape (2140,30), in the form (x_1,y_1,...x_15,y_15) for the 15 keypoints

# Copied the getimage method from https://github.com/shichaoji/img_extract/blob/master/img_extract.py; there is definitely a faster way of reading in the data, but it's nice for viewing the images


# Takes a string representing a 96x96 image as in the last column of the training.csv and returns a PIL image
def getimage(each):
    img = Image.new( 'RGB', (96,96), "black") 
    pixels = img.load() # create the pixel map
    
    cot=[int(i) for i in each.split(' ')]
    for i in range(img.size[0]):    # for every pixel:
        for j in range(img.size[1]):
            pixels[i,j] = (cot[i+j*96],cot[i+j*96],cot[i+j*96]) # set the colour accordingly
            
    return img
  
def str_to_np(images):
  # converts a string of integers to a numpy array representing it as an image
   
  n_rows = images.shape[0]
  
  np_images=np.zeros((n_rows,96,96,3))

  for i in range(n_rows):
    im = np.array(getimage(images[i]))
    np_images[i] = im
    
  return np_images
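
As a sketch of the faster route mentioned in the comment above, the pixel string can be parsed straight into an array without going through PIL pixel by pixel. `parse_image` is a hypothetical helper, and the `size` parameter exists only so the example can run on a tiny 2x2 string:

```python
import numpy as np

def parse_image(pixel_string, size=96):
    """Convert a space-separated pixel string into a (size, size, 3) uint8 array."""
    values = [int(v) for v in pixel_string.split()]
    gray = np.array(values, dtype=np.uint8).reshape(size, size)
    # Repeat the single grayscale channel three times to get an RGB-shaped array.
    return np.repeat(gray[:, :, np.newaxis], 3, axis=2)

# Tiny 2x2 example instead of a real 96x96 row from training.csv.
example = parse_image("0 50 100 255", size=2)
```

The result is indexed as `[row, column, channel]`, the same layout that `np.array(getimage(...))` produces.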


#There's also a ready-made method DataFrame.to_numpy(), which I think does exactly what this function does
def keys_to_np(keys):
  n_rows, n_keys = keys.shape
  
  np_keys = np.zeros((n_rows,n_keys))
  for i in range(n_rows):
    np_keys[i] = np.array(keys.iloc[i,:])
  return np_keys


In [0]:

from PIL import ImageDraw

def plotWithKeypoints1(data):

  key = np.array(data.iloc[:-1])
  key = key.astype(int).reshape(15,2)
  
  im = getimage(data['Image'])
  
  draw = ImageDraw.Draw(im)
  for x,y in zip(key[:,0],key[:,1]):
    draw.ellipse((x-1, y-1, x+1, y+1),fill = 'blue')
  return im

def plotWithKeypoints2(im,key):
  
  key = key.astype(int).reshape(-1,2)  # works for any number of keypoints, not only all 15
  img = getimage(im)
  draw = ImageDraw.Draw(img)
  for x,y in zip(key[:,0],key[:,1]):
    draw.ellipse((x-1, y-1, x+1, y+1),fill = 'blue')
  
  return img

plotWithKeypoints1(Training.iloc[100,:])

# Create and Train model

In [0]:
from keras.models import Sequential
from keras.layers import GlobalAveragePooling2D, Dense, Conv2D, MaxPooling2D, Activation, Dropout, Flatten
from keras.preprocessing import image
from keras.applications.xception import Xception
from keras.models import Model
from keras import backend as K

In [0]:
#Code stolen from sheet 4 (was provided to us, not written by us)

def plot_history(history):
    """Create a plot showing the training history of `model.fit`.
    
    Example:
        history = model.fit(...)
        plot_history(history)
    """

    plt.style.use("seaborn-poster")
    x = range(history.params['epochs'])
#     acc, val_acc = history.history['acc'], history.history.get('val_acc')
    f, axarr = plt.subplots(1, sharex=True)
#     axarr[0].set_title('accuracy')
#     axarr[0].plot(x, acc, label='train')
#     if val_acc:
#         axarr[0].plot(x, val_acc, label='validation')
#     axarr[0].legend()
    
    loss, val_loss = history.history['loss'], history.history.get('val_loss')
    axarr.set_title('loss')
    axarr.plot(x, loss, label='train')
    if val_loss:
        axarr.plot(x, val_loss, label='validation')
    axarr.legend()

## Set data

In [0]:
from sklearn.model_selection import train_test_split

# # set data
# x_eye_center, y_eye_center = separatedata(Training, 0, 4)
# x_train, x_valid, y_train, y_valid = train_test_split(x_eye_center, y_eye_center, test_size=0.33, shuffle= True)
# x_train, y_train = augment(x_train, y_train)
# x_train,_ = normalize(x_train, y_train)
# x_valid,_ = normalize(x_valid, y_valid)

# Column range of each keypoint group in training.csv
KEYPOINT_COLUMNS = {
    'eye_center': (0, 4),
    'eye_corner': (4, 12),
    'eyebrow': (12, 20),
    'nose_tip': (20, 22),
    'mouth_corner_top': (22, 28),
    'mouth_center_bottom_lip': (28, 30),
}

def return_data(data_selection):
  
  if data_selection not in KEYPOINT_COLUMNS:
    raise ValueError("data_selection must be one of the following: " + ", ".join(KEYPOINT_COLUMNS))

  # use the columns of the selected keypoint group
  x, y = separatedata(Training, *KEYPOINT_COLUMNS[data_selection])
  x_train, x_valid, y_train, y_valid = train_test_split(x, y, test_size=0.33, shuffle=True)

  # augment only the training part, so the validation set contains original images only
  x_train, y_train = augment(x_train, y_train)
  # only the images are rescaled; the keypoints stay in pixel coordinates
  x_train,_ = normalize(x_train, y_train)
  x_valid,_ = normalize(x_valid, y_valid)
          
  return x_train, y_train, x_valid, y_valid
  
  

## **Xception based model**

In [0]:
# Transfer learning based on https://keras.io/applications/


def make_model_Xception(n_keys):
  # Create a model based on transfer learning from one of the built in keras models, note this section uses the functional API not Sequential model
  # Xception is a cool model see https://www.youtube.com/watch?v=KfV8CJh7hE0
  # Should replace Xception with VGG Face or at the very least retrain some of it
  
  #importing the base model, should double check that the input is fine
  base_model = Xception(weights='imagenet', include_top=False)
  x = base_model.output
  
  # add a Global Average Pooling layer, which collapses each feature map to a single value and so yields a flat feature vector
  x = GlobalAveragePooling2D()(x)
  
  # add a dense layer
  x = Dropout(0.3)(x)
  x = Dense(1024, activation='relu')(x)
  x = Dropout(0.3)(x)
  
  # a dense layer to compute the predicted keypoints
  coords = Dense(n_keys, activation='linear')(x)
  model = Model(inputs=base_model.input, outputs=coords)
  
  return model


# remember to activate the GPU before training (edit -> notebook settings), runs really quick for me ~2 per epoch, loss went down to 20ish but stayed there
# Tried Transfer learning, error went way down to 3ish, I suspect this is overfitting though, can't quite quantify this yet


In [0]:
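# build the training and validation arrays first; the "Set data" cell above only defines return_data
x_train, y_train, x_valid, y_valid = return_data('eye_center')

Xception_model =  make_model_Xception(4)
Xception_model.compile(optimizer = 'adam',loss='mean_squared_error')
historyX = Xception_model.fit(x_train,y_train, batch_size=32, epochs=10, validation_data=(x_valid, y_valid))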

In [0]:
plot_history(historyX)


## **Home-made Model**

In [0]:
# normal learning
def make_model_custom(n_keys):
  model = Sequential()
  model.add(Conv2D(32, (3, 3), padding='same',
                   input_shape=x_train.shape[1:]))
  model.add(Activation('relu'))
  model.add(Conv2D(32, (3, 3)))
  model.add(Activation('relu'))
  model.add(MaxPooling2D(pool_size=(2, 2)))
  model.add(Dropout(0.25))

  model.add(Conv2D(64, (3, 3), padding='same'))
  model.add(Activation('relu'))
  model.add(Conv2D(64, (3, 3)))
  model.add(Activation('relu'))
  model.add(MaxPooling2D(pool_size=(2, 2)))
  model.add(Dropout(0.25))

  model.add(Flatten())
  model.add(Dense(512))
  model.add(Activation('relu'))
  model.add(Dropout(0.5))
  model.add(Dense(n_keys))
  model.add(Activation('linear'))
  return model

In [0]:
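model =  make_model_custom(4)
# accuracy is a classification metric and is not meaningful for keypoint regression, so track mean absolute error instead
model.compile(optimizer = 'adam',loss='mean_squared_error',metrics =['mae'])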

In [0]:
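# validation_split takes the last 30% of x_train, which consists only of augmented images if x_train was built with augment(); use the held-out split instead
history = model.fit(x_train,y_train, batch_size=32, epochs=15, validation_data=(x_valid, y_valid))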

In [0]:
plot_history(history)

## **VGG Model**

In [0]:
!pip install git+https://github.com/rcmalli/keras-vggface.git
  
from keras.models import Model
from keras.layers import Flatten, Dense, Input, MaxPooling2D, Dropout
from keras_vggface.vggface import VGGFace

In [0]:
#function to give a model based on vgg face
def make_model_VGGFace(n_keys,freeze = True):
  
  #importing the base model, should double check that the input is fine
  #base_model = VGGFace(model='resnet50')
  #x = base_model.output
  
  vgg_model = VGGFace(include_top=False, input_shape=(96, 96,3), weights = 'vggface')
  x = vgg_model.get_layer('pool5').output
  x = MaxPooling2D(pool_size=(2,2))(x)
  x = Dropout(0.25)(x)
  x = Flatten()(x)
  x = Dense(512, activation= 'relu')(x)
  x = Dropout(0.5)(x)
  x = Dense(512, activation='relu')(x)
  x = Dropout(0.5)(x)
  x = Dense(512, activation='relu')(x)
  out = Dense(n_keys, activation='linear')(x)
   
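  custom_vgg_model = Model(inputs=vgg_model.input, outputs=out)
  if freeze:
    # freeze only the pretrained VGGFace layers, so that every layer of the new head stays trainable
    # (slicing with layers[:-4] would also freeze the first Dense(512) of the head at its random initial weights)
    for layer in vgg_model.layers:
      layer.trainable = False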
  
  
  return custom_vgg_model

In [0]:
vggmodel =  make_model_VGGFace(4,freeze = False)
vggmodel.compile(optimizer = 'adam',loss='mean_squared_error')

In [0]:
#See how good the model looks

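# validate on the held-out split rather than on the tail of x_train, which may contain only augmented images
history = vggmodel.fit(x_train,y_train, batch_size=32, epochs=5, validation_data=(x_valid, y_valid))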

In [0]:
#I am confused by the loss thing
#I found an article explaining discrepancies between loss and accuracy that's quite nice
#http://www.jussihuotari.com/2018/01/17/why-loss-and-accuracy-metrics-conflict/

plot_history(history)
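
One way to read the loss: with targets in pixel coordinates, mean squared error is measured in squared pixels, and its square root gives a typical error in pixels. An accuracy-style metric is built for class labels and says little about a regression like this one. A small sketch on made-up coordinates (`mse` and `rmse` are illustrative helpers, not Keras functions):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error over all coordinates."""
    return float(np.mean((y_true - y_pred) ** 2))

def rmse(y_true, y_pred):
    """Root mean squared error, in the same unit as the coordinates."""
    return float(np.sqrt(mse(y_true, y_pred)))

# One image with two keypoints (x_1, y_1, x_2, y_2), in pixels.
y_true = np.array([[66.0, 39.0, 30.0, 36.0]])
y_pred = np.array([[62.0, 42.0, 30.0, 36.0]])   # made-up prediction

loss = mse(y_true, y_pred)      # squared pixels
error = rmse(y_true, y_pred)    # pixels
print(loss, error)
```

Here the prediction is off by 4 and 3 pixels on two coordinates and exact on the others, giving a loss of 6.25 and an RMSE of 2.5 pixels.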

# Optimising

## Using hyperas

In [3]:
#Using hyperas and the tutorial from this guy
#https://towardsdatascience.com/keras-hyperparameter-tuning-in-google-colab-using-hyperas-624fa4bbf673

!pip install hyperas
!pip install hyperopt


Collecting hyperas
  Downloading https://files.pythonhosted.org/packages/04/34/87ad6ffb42df9c1fa9c4c906f65813d42ad70d68c66af4ffff048c228cd4/hyperas-0.4.1-py3-none-any.whl
Collecting prompt-toolkit<2.1.0,>=2.0.0 (from jupyter-console->jupyter->hyperas)
  Downloading https://files.pythonhosted.org/packages/f7/a7/9b1dd14ef45345f186ef69d175bdd2491c40ab1dfa4b2b3e4352df719ed7/prompt_toolkit-2.0.9-py3-none-any.whl (337kB)
ERROR: ipython 5.5.0 has requirement prompt-toolkit<2.0.0,>=1.0.4, but you'll have prompt-toolkit 2.0.9 which is incompatible.
Installing collected packages: hyperas, prompt-toolkit
  Found existing installation: prompt-toolkit 1.0.16
    Uninstalling prompt-toolkit-1.0.16:
      Successfully uninstalled prompt-toolkit-1.0.16
Successfully installed hyperas-0.4.1 prompt-toolkit-2.0.9




In [0]:
from __future__ import print_function

from hyperopt import Trials, STATUS_OK, tpe
from keras.datasets import mnist
from keras.layers.core import Dense, Dropout, Activation
from keras.models import Sequential
from keras.utils import np_utils

from hyperas import optim
from hyperas.distributions import choice, uniform

In [0]:
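# NOTE: this is still the MNIST classification template from the hyperas tutorial (784 inputs, 10-class softmax);
# it has to be adapted to keypoint regression before it can be used on our data
def model(X_train, Y_train, X_test, Y_test):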
    '''
    Model providing function:
    Create Keras model with double curly brackets dropped-in as needed.
    Return value has to be a valid python dictionary with two customary keys:
        - loss: Specify a numeric evaluation metric to be minimized
        - status: Just use STATUS_OK and see hyperopt documentation if not feasible
    The last one is optional, though recommended, namely:
        - model: specify the model just created so that we can later use it again.
    '''
    model = Sequential()
    model.add(Dense(512, input_shape=(784,)))
    model.add(Activation('relu'))
    model.add(Dropout({{uniform(0, 1)}}))
    model.add(Dense({{choice([256, 512, 1024])}}))
    model.add(Activation({{choice(['relu', 'sigmoid'])}}))
    model.add(Dropout({{uniform(0, 1)}}))

    # If we choose 'four', add an additional fourth layer
    if {{choice(['three', 'four'])}} == 'four':
        model.add(Dense(100))
        model.add({{choice([Dropout(0.5), Activation('linear')])}})
        model.add(Activation('relu'))

    model.add(Dense(10))
    model.add(Activation('softmax'))

    model.compile(loss='categorical_crossentropy',
                  optimizer={{choice(['rmsprop', 'adam', 'sgd'])}},
                  metrics=['accuracy'])

    model.fit(X_train, Y_train,
              batch_size={{choice([64, 128])}},
              epochs=1,
              verbose=2,
              validation_data=(X_test, Y_test))
    score, acc = model.evaluate(X_test, Y_test, verbose=0)
    print('Test accuracy:', acc)
    return {'loss': -acc, 'status': STATUS_OK, 'model': model}

In [13]:
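# NOTE: hyperas expects `data` to be a function that returns the data (it reads the function's source),
# and it looks for '<notebook_name>.ipynb' in the working directory; the name passed below does not match
# the notebook file name, which is the likely cause of the FileNotFoundError recorded underneath
data = return_data('eye_center')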

best_run, best_model = optim.minimize(model=model,
                                          data=data,
                                          max_evals=10,
                                          algo=tpe.suggest,
                                          notebook_name='Machine Learning Lab Project', # This is important!
                                          trials=Trials())


FileNotFoundError: ignored

## Using sklearn

In [0]:
Parameters:
   - Model [custom, VGGFace, Xception]
   - optimizer ['SGD', 'RMSprop', 'Adagrad', 'Adadelta', 'Adam', 'Adamax', 'Nadam'] 
   - learning rate 
   - batch size
   - data augmentation []
   - custom layers
  
variable learning rate??
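
On the variable learning rate question: one common option is a step decay, where the rate is cut by a fixed factor every few epochs. A minimal sketch; the function name and default values are illustrative choices, not tuned settings:

```python
import math

def step_decay(epoch, initial_lr=0.001, drop=0.5, epochs_per_drop=5):
    """Learning rate for a given epoch: multiplied by `drop` every `epochs_per_drop` epochs."""
    return initial_lr * drop ** math.floor(epoch / epochs_per_drop)

# Learning rate for each of the first 12 epochs.
schedule = [step_decay(epoch) for epoch in range(12)]
print(schedule)
```

A function of this shape (epoch index in, learning rate out) could be handed to Keras' `LearningRateScheduler` callback; check the documentation of the installed Keras version for the exact arguments it passes.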

In [0]:
from sklearn.model_selection import train_test_split

# set data
x_eye_center, y_eye_center = separatedata(Training, 0, 4)
x_train, x_valid, y_train, y_valid = train_test_split(x_eye_center, y_eye_center, test_size=0.33, shuffle= True)
#x_train, y_train = augment(x_train, y_train)
x_train,_ = normalize(x_train, y_train)
x_valid,_ = normalize(x_valid, y_valid)


In [0]:
from keras import optimizers

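def create_model(model_type = 'VGGFace', optimizer = 'RMSprop', lr = 0.001, n_keys=4):
  
  #model type
  if model_type == 'VGGFace':
    model =  make_model_VGGFace(n_keys)
  elif model_type == 'Xception':
    model =  make_model_Xception(n_keys)
  elif model_type == 'custom':
    model =  make_model_custom(n_keys)
  else:
    raise ValueError("unknown model_type: %r" % model_type)
    
  #optimiser (case-insensitive, so both 'adam' and 'Adam' work)
  if optimizer.lower() == 'adam':     
    opt = optimizers.Adam(lr=lr)
  elif optimizer.lower() == 'rmsprop':     
    opt = optimizers.RMSprop(lr=lr)
  else:
    raise ValueError("unknown optimizer: %r" % optimizer)
  
  model.compile(optimizer = opt,loss='mean_squared_error')
  
  return model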




In [0]:
#Apparently it is way better to use https://github.com/maxpumperla/hyperas, feel free to replace this with that


# Careful: running a search over too many parameters at once causes a crash

import numpy
from sklearn.model_selection import GridSearchCV
from keras.wrappers.scikit_learn import KerasRegressor

# create_model (defined above) builds the model, as required by KerasRegressor

# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)

# create model
model = KerasRegressor(build_fn=create_model, epochs=10, verbose=1)
# define the grid search parameters
batch_size = [8,32,64]
param_grid = dict(batch_size=batch_size)
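# n_jobs=1: with n_jobs=-1 every parallel worker builds its own Keras model on the same GPU, which may be behind the crashes mentioned above
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)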
grid_result = grid.fit(x_train, y_train)

# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

We should write something so that the grid search automatically saves its results, but for now:

 Best optimiser after 10 epochs: RMSprop 
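
A minimal sketch of saving the results, assuming a dictionary laid out like `grid_result.cv_results_` (only the `params`, `mean_test_score` and `std_test_score` keys are used). `save_grid_results`, the file name and the numbers are made up for illustration:

```python
import csv
import os
import tempfile

def save_grid_results(cv_results, path):
    """Write one CSV row per parameter setting with its mean and std test score."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["params", "mean_test_score", "std_test_score"])
        rows = zip(cv_results["params"],
                   cv_results["mean_test_score"],
                   cv_results["std_test_score"])
        for params, mean, std in rows:
            writer.writerow([repr(params), mean, std])

# Made-up numbers in the layout of grid_result.cv_results_
fake_results = {
    "params": [{"batch_size": 8}, {"batch_size": 32}],
    "mean_test_score": [-20.5, -18.25],
    "std_test_score": [1.5, 2.0],
}
path = os.path.join(tempfile.gettempdir(), "grid_results_example.csv")
save_grid_results(fake_results, path)

# Read the file back to check what was written.
with open(path, newline="") as f:
    rows = list(csv.reader(f))
```

In the notebook this would be called as `save_grid_results(grid_result.cv_results_, 'some_file.csv')` right after `grid.fit`, so that a later crash does not lose the scores.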




In [0]:
seed = 7
numpy.random.seed(seed)

model =  create_model(optimizer = 'RMSprop')
model.fit(x_train, y_train, epochs = 10, batch_size=10, validation_data=(x_valid, y_valid))
model.evaluate(x=x_valid, y=y_valid)

In [0]:
15.8

In [0]:

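# Predict on the same image that is plotted: train_test_split shuffles the rows, so x_train[100] is not row 100 of Training
img = np.array(getimage(Training['Image'][100])) / 255
keys = model.predict(img.reshape(1,96,96,3))
# the model predicts only the eye centres (x_1, y_1, x_2, y_2), so plot those directly on the image
plt.imshow(img)
plt.scatter(keys[0, 0::2], keys[0, 1::2], c='blue', s=10)
plt.show()
plt.imshow(plotWithKeypoints1(Training.iloc[100,:]))
plt.show()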