# Overview
The objective of this task is to predict keypoint positions on face images. This can be used as a building block in several applications, such as:

    tracking faces in images and video
    analysing facial expressions
    detecting dysmorphic facial signs for medical diagnosis
    biometrics / face recognition

Detecting facial keypoints is a very challenging problem. Facial features vary greatly from one individual to another, and even for a single individual, there is a large amount of variation due to 3D pose, size, position, viewing angle, and illumination conditions. Computer vision research has come a long way in addressing these difficulties, but there remain many opportunities for improvement.



# Evaluation
Root Mean Squared Error (RMSE)

Submissions are scored on the root mean squared error. RMSE is a common, general-purpose error metric. Compared to the Mean Absolute Error, RMSE punishes large errors more heavily:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2},$$

where $\hat{y}_i$ is the predicted value and $y_i$ is the original value.
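
As a minimal sketch of the metric, the snippet below computes RMSE and, for comparison, the Mean Absolute Error on a handful of made-up coordinates. The function names `rmse` and `mae` and the sample values are illustrative assumptions, not part of the competition's scoring code.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two equally shaped arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error, shown only for comparison."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

# Hypothetical coordinates: three errors of 1 pixel and one error of 8 pixels.
y_true = [30.0, 40.0, 50.0, 60.0]
y_pred = [31.0, 39.0, 51.0, 68.0]

rmse_value = rmse(y_true, y_pred)
mae_value = mae(y_true, y_pred)
print(rmse_value, mae_value)
```

The single 8-pixel error dominates the RMSE because it is squared before averaging, while it enters the MAE only linearly.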

# Data Description
Each predicted keypoint is specified by an (x,y) real-valued pair in the space of pixel indices. There are 15 keypoints, which represent the following elements of the face:

left_eye_center, right_eye_center, left_eye_inner_corner, left_eye_outer_corner, right_eye_inner_corner, right_eye_outer_corner, left_eyebrow_inner_end, left_eyebrow_outer_end, right_eyebrow_inner_end, right_eyebrow_outer_end, nose_tip, mouth_left_corner, mouth_right_corner, mouth_center_top_lip, mouth_center_bottom_lip

Left and right here refer to the point of view of the subject.

In some examples, some of the target keypoint positions are missing (encoded as missing entries in the csv, i.e., with nothing between two commas).

The input image is given in the last field of the data files, and consists of a list of pixels (ordered by row), as integers in the range 0 to 255. The images are 96x96 pixels.
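
The row layout described above can be sketched with the standard `csv` module. The two-keypoint row below is a hypothetical miniature of `training.csv` (the real file has 30 coordinate columns), and the synthetic pixel values are made up for illustration.

```python
import csv
import io
import numpy as np

# Hypothetical row: the second coordinate is missing, so there is nothing
# between its two commas. The pixel values are synthetic.
pixels = " ".join(str(i % 256) for i in range(96 * 96))
csv_text = "left_eye_center_x,left_eye_center_y,Image\n66.03,,{}\n".format(pixels)

row = next(csv.DictReader(io.StringIO(csv_text)))

# Empty fields become None; every other keypoint field is parsed as a float.
keypoints = {
    name: (float(value) if value != "" else None)
    for name, value in row.items()
    if name != "Image"
}

# The Image field is a space-separated, row-ordered list of 96 * 96 integers.
image = np.array([int(p) for p in row["Image"].split()], dtype=np.uint8)
image = image.reshape(96, 96)

print(keypoints)
print(image.shape)
```

Because the pixels are ordered by row, `image[r, c]` corresponds to entry `r * 96 + c` of the original list.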

Data files

    training.csv: list of 7049 training images. Each row contains the (x,y) coordinates for 15 keypoints, and image data as a row-ordered list of pixels.
    test.csv: list of 1783 test images. Each row contains ImageId and image data as a row-ordered list of pixels.
    submissionFileFormat.csv: list of 27124 keypoints to predict. Each row contains a RowId, ImageId, FeatureName, Location. FeatureName values are "left_eye_center_x," "right_eyebrow_outer_end_y," etc. Location is what you need to predict.
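
A rough sketch of how the Location column might be filled in from per-image predictions is given below. The `lookup` rows, the `predictions` dictionary, and the helper `fill_locations` are all hypothetical, and clamping values to the image bounds is an assumption rather than a stated competition rule.

```python
# Hypothetical predictions in pixel units, keyed by ImageId then FeatureName.
predictions = {
    1: {"left_eye_center_x": 66.2, "left_eye_center_y": 37.5},
    2: {"left_eye_center_x": 101.3, "nose_tip_y": 58.0},
}

# Hypothetical excerpt of submissionFileFormat.csv with Location still empty.
lookup = [
    {"RowId": 1, "ImageId": 1, "FeatureName": "left_eye_center_x"},
    {"RowId": 2, "ImageId": 1, "FeatureName": "left_eye_center_y"},
    {"RowId": 3, "ImageId": 2, "FeatureName": "left_eye_center_x"},
]

def fill_locations(lookup, predictions, image_size=96):
    """Return a copy of the lookup rows with a Location value added."""
    filled = []
    for row in lookup:
        value = predictions[row["ImageId"]][row["FeatureName"]]
        # Assumption: keep coordinates inside the 96x96 image.
        value = min(max(value, 0.0), float(image_size))
        filled.append({**row, "Location": value})
    return filled

submission = fill_locations(lookup, predictions)
for row in submission:
    print(row)
```

Only the keypoints listed in the lookup file are written out, which is why the file has 27124 rows rather than 30 rows per test image.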


# Reading the data

In [10]:
import pandas as pd
import numpy as np

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.optim import lr_scheduler
from torch.autograd import Variable
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms
from torchvision.utils import make_grid

import math
import random

from PIL import Image, ImageOps, ImageEnhance
import numbers

import matplotlib.pyplot as plt
%matplotlib inline


In [11]:
train_df=pd.read_csv('../data/facial/training.csv')
test_df=pd.read_csv('../data/facial/test.csv')


num_train=len(train_df)
print("Number of train examples:{0}".format(num_train))
print("the shape of train_df{0}x{1}".format(train_df.shape[0],train_df.shape[1]))

num_test=len(test_df)
print("Number of test examples:{0}".format(num_test))
print("the shape of test_df{0}x{1}".format(test_df.shape[0],test_df.shape[1]))


# Drop rows that contain NaN
print("----------------train----dropna():-------------------")
train_df.dropna(axis=0, how='any', inplace=True)
print("Number of train examples:{0}".format(len(train_df)))
print("the shape of train_df{0}x{1}".format(train_df.shape[0],train_df.shape[1]))


print("----------------test----dropna():-------------------")
test_df.dropna(axis=0, how='any', inplace=True)
print("Number of test examples:{0}".format(len(test_df)))
print("the shape of test_df{0}x{1}".format(test_df.shape[0],test_df.shape[1]))

Number of train examples:7049
the shape of train_df7049x31
Number of test examples:1783
the shape of test_df1783x2
----------------train----dropna():-------------------
Number of train examples:2140
the shape of train_df2140x31
----------------test----dropna():-------------------
Number of test examples:1783
the shape of test_df1783x2
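
Dropping every row with any missing keypoint keeps 2140 of the 7049 training rows, roughly 30%. Before discarding that much data it may help to see which columns are responsible. The sketch below does this on a hypothetical four-row frame; the column values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the keypoint columns; NaN marks a missing label.
df = pd.DataFrame({
    "left_eye_center_x": [66.0, 64.3, 65.1, 65.2],
    "nose_tip_y": [57.1, np.nan, 53.5, 54.2],
    "mouth_center_top_lip_y": [72.9, np.nan, np.nan, 70.1],
})

# Number of missing labels in each keypoint column.
missing_per_column = df.isna().sum()

# Rows that survive dropna(how='any'), as in the cell above.
complete_rows = df.dropna(axis=0, how="any")
kept_fraction = len(complete_rows) / len(df)

print(missing_per_column)
print(kept_fraction)
```

If a few columns account for most of the gaps, training on subsets of keypoints is one possible alternative to dropping rows, though this notebook does not pursue it.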


In [12]:
train_df.head()

Unnamed: 0,left_eye_center_x,left_eye_center_y,right_eye_center_x,right_eye_center_y,left_eye_inner_corner_x,left_eye_inner_corner_y,left_eye_outer_corner_x,left_eye_outer_corner_y,right_eye_inner_corner_x,right_eye_inner_corner_y,...,nose_tip_y,mouth_left_corner_x,mouth_left_corner_y,mouth_right_corner_x,mouth_right_corner_y,mouth_center_top_lip_x,mouth_center_top_lip_y,mouth_center_bottom_lip_x,mouth_center_bottom_lip_y,Image
0,66.033564,39.002274,30.227008,36.421678,59.582075,39.647423,73.130346,39.969997,36.356571,37.389402,...,57.066803,61.195308,79.970165,28.614496,77.388992,43.312602,72.935459,43.130707,84.485774,238 236 237 238 240 240 239 241 241 243 240 23...
1,64.332936,34.970077,29.949277,33.448715,58.85617,35.274349,70.722723,36.187166,36.034723,34.361532,...,55.660936,56.421447,76.352,35.122383,76.04766,46.684596,70.266553,45.467915,85.48017,219 215 204 196 204 211 212 200 180 168 178 19...
2,65.057053,34.909642,30.903789,34.909642,59.412,36.320968,70.984421,36.320968,37.678105,36.320968,...,53.538947,60.822947,73.014316,33.726316,72.732,47.274947,70.191789,47.274947,78.659368,144 142 159 180 188 188 184 180 167 132 84 59 ...
3,65.225739,37.261774,32.023096,37.261774,60.003339,39.127179,72.314713,38.380967,37.618643,38.754115,...,54.166539,65.598887,72.703722,37.245496,74.195478,50.303165,70.091687,51.561183,78.268383,193 192 193 194 194 194 193 192 168 111 50 12 ...
4,66.725301,39.621261,32.24481,38.042032,58.56589,39.621261,72.515926,39.884466,36.98238,39.094852,...,64.889521,60.671411,77.523239,31.191755,76.997301,44.962748,73.707387,44.227141,86.871166,147 148 160 196 215 214 216 217 219 220 206 18...


In [13]:
test_df.head()

Unnamed: 0,ImageId,Image
0,1,182 183 182 182 180 180 176 169 156 137 124 10...
1,2,76 87 81 72 65 59 64 76 69 42 31 38 49 58 58 4...
2,3,177 176 174 170 169 169 168 166 166 166 161 14...
3,4,176 174 174 175 174 174 176 176 175 171 165 15...
4,5,50 47 44 101 144 149 120 58 48 42 35 35 37 39 ...


In [14]:
train_df['Image'] = train_df['Image'].apply(lambda im: np.fromstring(im, sep=' '))
len(train_df['Image'].values)

2140

In [15]:
#for label
y_train=train_df[train_df.columns[:-1]].values
print(y_train.shape)

#for image pixel

X_train=np.vstack(train_df['Image'].values) 
print(X_train.shape)

(2140, 30)
(2140, 9216)


In [16]:
test_df['Image'] = test_df['Image'].apply(lambda im: np.fromstring(im, sep=' '))
len(test_df['Image'].values)

1783

In [17]:
#for test image pixel

X_test=np.vstack(test_df['Image'].values) 
print(X_test.shape)

(1783, 9216)


# Loading the data

In [43]:
class facial_data(Dataset):
    def __init__(self,file_path,transform=None,train=True,test=False):
        
        self.test=test  
        df=pd.read_csv(file_path)
        # Drop rows that contain NaN
        df.dropna(axis=0, how='any', inplace=True)
        # Parse the Image column: space-separated pixel string -> float array
        df['Image'] = df['Image'].apply(lambda im: np.fromstring(im, sep=' '))
        
        # Scale pixel values from [0, 255] to [0, 1]
        self.X=np.vstack(df['Image'].values) /255.
        self.X = self.X.astype(np.float32)
        
        if self.test:
            self.y=None
        else:
            # Take labels from the file loaded here, not the global train_df,
            # so they stay aligned with self.X
            self.y=df[df.columns[:-1]].values
            # Scale coordinates from [0, 96] to [-1, 1]
            self.y=(self.y-48)/48
            self.y = self.y.astype(np.float32)
        print("self.X",self.X.shape)
                
    def __len__(self):
        return len(self.X)
    
    def __getitem__(self,idx):
        if self.y is not None:
            return self.X[idx],self.y[idx]
        else:
            return self.X[idx]
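
The dataset class rescales keypoint coordinates with `(y - 48) / 48`, which maps the pixel range 0 to 96 onto -1 to 1. Network outputs therefore need the inverse mapping before they can be written to a submission. The helpers below are a sketch of both directions; the names `normalize_keypoints` and `denormalize_keypoints` are hypothetical.

```python
import numpy as np

IMAGE_SIZE = 96
HALF = IMAGE_SIZE / 2  # 48, the constant used in the dataset class

def normalize_keypoints(y_pixels):
    """Map pixel coordinates in [0, 96] to [-1, 1]."""
    return (np.asarray(y_pixels, dtype=np.float32) - HALF) / HALF

def denormalize_keypoints(y_scaled):
    """Map scaled coordinates in [-1, 1] back to pixel units."""
    return np.asarray(y_scaled, dtype=np.float32) * HALF + HALF

pixels = np.array([0.0, 48.0, 96.0, 66.0], dtype=np.float32)
scaled = normalize_keypoints(pixels)
restored = denormalize_keypoints(scaled)

print(scaled)
print(restored)
```

An error measured on the scaled targets has to be multiplied by 48 to be expressed in pixels.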
    

In [44]:
batch_size = 32

train_dataset = facial_data('../data/facial/training.csv', transform=None,train=True,test=False)
test_dataset = facial_data('../data/facial/test.csv',transform=None,train=False,test=True)


train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                           batch_size=batch_size, shuffle=False)

self.X (2140, 9216)
self.X (1783, 9216)


In [39]:
class Net(nn.Module):
    def __init__(self):
        super(Net,self).__init__()
        
        self.features=nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=0),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2,stride=2),
            nn.Conv2d(32,64,kernel_size=2,stride=1,padding=0),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2,stride=2),
            nn.Conv2d(64,128,kernel_size=2,stride=1,padding=0),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2,stride=2)
        )
        
        self.detection=nn.Sequential(
            nn.Linear(15488,1024),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(1024,1024),
            nn.ReLU(inplace=True),
            nn.Linear(1024,30)
        )
        for m in self.features.children():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
            
        
        for m in self.detection.children():
            if isinstance(m, nn.Linear):
                nn.init.xavier_uniform_(m.weight)
  
        
    def forward(self,x):
        x=self.features(x)
        x=x.view(x.size(0),-1)
        x=self.detection(x)
        
        return x
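
The input size of the first linear layer, 15488, follows from the convolution and pooling arithmetic on a 96x96 input. The check below is a plain-Python sketch using the usual output-size formula for unpadded, undilated layers; `conv_out` and `pool_out` are hypothetical helper names.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution without dilation."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Spatial output size of max pooling without padding."""
    return (size - kernel) // stride + 1

size = 96
sizes = []
# Kernel sizes of the three Conv2d layers in Net.features, each followed by a 2x2 pool.
for kernel in (3, 2, 2):
    size = pool_out(conv_out(size, kernel))
    sizes.append(size)

# The last convolution has 128 output channels.
flattened = 128 * size * size
print(sizes, flattened)
```

The spatial size goes 96 to 47 to 23 to 11, so the flattened feature vector has 128 * 11 * 11 = 15488 entries, matching `nn.Linear(15488,1024)`.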

In [40]:
model=Net()

optimizer=optim.SGD(model.parameters(), lr=0.003,momentum=0.9)

criterion=nn.MSELoss()

exp_lr_scheduler = lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)

if torch.cuda.is_available():
    model = model.cuda()
    criterion = criterion.cuda()
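
With `step_size=7` and `gamma=0.1`, `StepLR` multiplies the learning rate by 0.1 every 7 scheduler steps. The closed form below is a sketch of that schedule in plain Python, assuming the scheduler is stepped once per epoch and epochs are counted from 0; `step_lr` is a hypothetical helper, not a PyTorch function.

```python
def step_lr(base_lr, epoch, step_size=7, gamma=0.1):
    """Learning rate after `epoch` scheduler steps under a step decay."""
    return base_lr * gamma ** (epoch // step_size)

# Learning rate for the first 21 epochs, starting from lr=0.003.
schedule = [step_lr(0.003, epoch) for epoch in range(21)]

for epoch in (0, 6, 7, 14):
    print(epoch, schedule[epoch])
```

The rate stays at 0.003 for epochs 0 to 6, drops to about 0.0003 at epoch 7, and to about 0.00003 at epoch 14.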

In [41]:
def train(epoch):
    model.train()

    for batch_idx, (data, target) in enumerate(train_loader):
        # Tensors track gradients directly, so no Variable wrapper is needed
        if torch.cuda.is_available():
            data = data.cuda()
            target = target.cuda()
        
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        
        loss.backward()
        optimizer.step()
        
        if (batch_idx + 1)% 100 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, (batch_idx + 1) * len(data), len(train_loader.dataset),
                100. * (batch_idx + 1) / len(train_loader), loss.item()))

    # Step the scheduler once per epoch, after the optimizer updates
    exp_lr_scheduler.step()

In [42]:
train(1)

ValueError: Expected 4D tensor as input, got 2D tensor instead.
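
The error occurs because `nn.Conv2d` expects input of shape (batch, channels, height, width), while the loader yields flattened batches of shape (batch, 9216). One possible fix, sketched below with NumPy on a hypothetical stand-in for `self.X`, is to reshape the pixel array to (N, 1, 96, 96) before it reaches the model.

```python
import numpy as np

# Hypothetical stand-in for self.X: 4 flattened 96x96 grayscale images.
X_flat = np.random.rand(4, 96 * 96).astype(np.float32)

# Conv2d expects (batch, channels, height, width); grayscale has 1 channel.
X_images = X_flat.reshape(-1, 1, 96, 96)

print(X_flat.shape, "->", X_images.shape)
```

In the dataset class this would amount to a line such as `self.X = self.X.reshape(-1, 1, 96, 96)` after the `astype` call, so that each item has shape (1, 96, 96) and each batch has shape (batch, 1, 96, 96). This placement is an assumption; reshaping inside `forward` with `x.view(-1, 1, 96, 96)` would work as well.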