## Search By Face Expression with Clustering

Let's continue building our **Search By Face Expression** application, using the same clustering algorithm (k-means) that we used for music.

## Load and Pre-process the Data

The first step is to load the data. As before, we assume that the **feature extraction** step has already been completed. Its results are shown in the table below, with one row per image and one column per facial action unit (AU).

In [1]:
import pandas
import matplotlib.pyplot as plot
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

# Import the data
raw_data = pandas.read_csv('data/facs.csv')
dataset = raw_data[:20]  # Keep only the first 20 photos
dataset

Unnamed: 0,Images,AU1: Inner Brow Raiser,AU2: Outer Brow Raiser,AU4: Brow Lowerer,AU5: Upper Lid Raiser,AU6: Cheek Raiser,AU7: Lid Tightener,AU9: Nose Wrinkler,AU10: Upper Lip Raiser,AU11: Nasolabial Deepener,...,AU38: Nostril Dilator,AU39: Nostril Compressor,AU43: Eyes Closed,AU44: Squint,AU45: Blink,AU54: Head down,AU61: Eyes turn left,AU62: Eyes turn right,AU63: Eyes up,AU64: Eyes down
0,S005_001_00000011,0,0,0,0,0,0,14,0,0,...,0,0,0,0,0,0,0,0,0,0
1,S010_001_00000014,10,10,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,S010_002_00000014,10,10,0,10,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,S010_003_00000018,0,0,10,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,S010_004_00000019,0,0,10,0,0,15,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,S010_005_00000016,0,0,10,0,10,10,15,0,0,...,0,0,0,0,0,0,0,0,0,0
6,S010_006_00000015,0,0,0,0,10,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,S011_001_00000016,10,10,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,S011_002_00000022,10,0,10,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,S011_003_00000014,10,0,10,0,0,10,0,0,0,...,0,0,0,0,0,0,0,0,0,0
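
The cells in this notebook cluster the raw AU values directly, although `MinMaxScaler` is imported in the first cell. If you would like every action unit to contribute on the same 0-to-1 scale, one option is sketched below. This is only an illustration under assumptions: the `sample` frame copies four rows and three columns from the table above rather than loading `data/facs.csv`, and the names `sample`, `au_columns`, and `scaled` are hypothetical.

```python
import pandas
from sklearn.preprocessing import MinMaxScaler

# Four rows and three AU columns copied from the table above (not the full file)
sample = pandas.DataFrame({
    'Images': ['S005_001_00000011', 'S010_003_00000018',
               'S010_004_00000019', 'S010_005_00000016'],
    'AU4: Brow Lowerer': [0, 10, 10, 10],
    'AU7: Lid Tightener': [0, 0, 15, 10],
    'AU9: Nose Wrinkler': [14, 0, 0, 15],
})

# Scale only the numeric AU columns; the image name is an identifier, not a feature
au_columns = [name for name in sample.columns if name.startswith('AU')]
scaler = MinMaxScaler()  # rescales each column independently to the range [0, 1]
scaled = pandas.DataFrame(
    scaler.fit_transform(sample[au_columns]),
    columns=au_columns,
    index=sample.index,
)
print(scaled)
```

Whether scaling helps depends on the data: if all AU columns already share a similar range, the clusters may change very little.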


## Perform K-Means Clustering

In [2]:
# Drop the image names and keep only the numeric AU values as a matrix
dataset_values = dataset.drop(columns='Images')
features = pandas.DataFrame(dataset_values)
mat = features.values

# Fit k-means with 3 clusters using scikit-learn
km = KMeans(n_clusters=3)
km.fit(mat)

# Add each image's cluster label as a new column
dataset = dataset.assign(cluster=km.labels_)
dataset

Unnamed: 0,Images,AU1: Inner Brow Raiser,AU2: Outer Brow Raiser,AU4: Brow Lowerer,AU5: Upper Lid Raiser,AU6: Cheek Raiser,AU7: Lid Tightener,AU9: Nose Wrinkler,AU10: Upper Lip Raiser,AU11: Nasolabial Deepener,...,AU39: Nostril Compressor,AU43: Eyes Closed,AU44: Squint,AU45: Blink,AU54: Head down,AU61: Eyes turn left,AU62: Eyes turn right,AU63: Eyes up,AU64: Eyes down,cluster
0,S005_001_00000011,0,0,0,0,0,0,14,0,0,...,0,0,0,0,0,0,0,0,0,2
1,S010_001_00000014,10,10,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
2,S010_002_00000014,10,10,0,10,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
3,S010_003_00000018,0,0,10,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,2
4,S010_004_00000019,0,0,10,0,0,15,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,S010_005_00000016,0,0,10,0,10,10,15,0,0,...,0,0,0,0,0,0,0,0,0,2
6,S010_006_00000015,0,0,0,0,10,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
7,S011_001_00000016,10,10,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
8,S011_002_00000022,10,0,10,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,2
9,S011_003_00000014,10,0,10,0,0,10,0,0,0,...,0,0,0,0,0,0,0,0,0,2
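
K-means starts from randomly chosen centroids, so the cluster numbers (and occasionally the memberships) shown above can change from run to run. The sketch below shows one way to make a run repeatable and to look at the fitted centroids. It assumes a fixed `random_state` and an explicit `n_init`, neither of which the notebook sets, and it uses only the first seven AU columns of the ten rows displayed above. The names `sample` and `au_columns` are hypothetical.

```python
import pandas
from sklearn.cluster import KMeans

# First seven AU columns of the ten rows displayed above (not the full dataset)
au_columns = ['AU1', 'AU2', 'AU4', 'AU5', 'AU6', 'AU7', 'AU9']
sample = pandas.DataFrame(
    [[0, 0, 0, 0, 0, 0, 14],
     [10, 10, 0, 0, 0, 0, 0],
     [10, 10, 0, 10, 0, 0, 0],
     [0, 0, 10, 0, 0, 0, 0],
     [0, 0, 10, 0, 0, 15, 0],
     [0, 0, 10, 0, 10, 10, 15],
     [0, 0, 0, 0, 10, 0, 0],
     [10, 10, 0, 0, 0, 0, 0],
     [10, 0, 10, 0, 0, 0, 0],
     [10, 0, 10, 0, 0, 10, 0]],
    columns=au_columns,
)

# random_state fixes the random initialisation; n_init is the number of restarts
km = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = km.fit_predict(sample.values)

# A second run with the same settings gives the same assignment
labels_again = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(sample.values)

# One centroid per cluster, expressed in the original AU columns
centroids = pandas.DataFrame(km.cluster_centers_, columns=au_columns)
print(sample.assign(cluster=labels))
print(centroids.round(1))
```

Because this sketch sees only part of the data, its labels need not match the table above.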


## Visualize

To inspect the clusters, let's list the images assigned to each one (without plotting). Can you describe what expression each cluster represents?

In [3]:
# Set an option to show all columns
pandas.set_option('display.max_columns', 100)

# Select the rows assigned to cluster 0
cluster0 = dataset.loc[dataset['cluster'] == 0]
cluster0

Unnamed: 0,Images,AU1: Inner Brow Raiser,AU2: Outer Brow Raiser,AU4: Brow Lowerer,AU5: Upper Lid Raiser,AU6: Cheek Raiser,AU7: Lid Tightener,AU9: Nose Wrinkler,AU10: Upper Lip Raiser,AU11: Nasolabial Deepener,AU12: Lip Corner Puller,AU13: Sharp Lip Puller,AU14: Dimpler,AU15: Lip Corner Depressor,AU16: Lower Lip Depressor,AU17: Chin Raiser,AU18: Lip Puckerer,AU20: Lip stretcher,AU21: Neck Tightener,AU22: Lip Funneler,AU23: Lip Tightener,AU24: Lip Pressor,AU25: Lips part,AU26: Jaw Drop,AU27: Mouth Stretch,AU28: Lip Suck,AU29: Jaw Thrust,AU30: Jaw Sideways,AU31: Jaw Clencher,AU34: Cheek Puff,AU38: Nostril Dilator,AU39: Nostril Compressor,AU43: Eyes Closed,AU44: Squint,AU45: Blink,AU54: Head down,AU61: Eyes turn left,AU62: Eyes turn right,AU63: Eyes up,AU64: Eyes down,cluster
4,S010_004_00000019,0,0,10,0,0,15,0,0,0,0,0,0,0,0,14,0,0,0,0,14,14,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
10,S011_004_00000021,0,0,14,0,0,12,0,0,0,0,0,0,0,0,13,0,0,0,0,14,14,0,0,0,0,0,0,0,0,0,10,0,0,0,0,0,0,0,0,0
15,S014_003_00000030,0,0,10,0,0,10,0,10,0,0,0,0,0,0,10,0,0,0,0,10,10,0,0,0,0,0,0,0,0,0,0,12,0,0,0,0,0,0,0,0


In [4]:
# Select the rows assigned to cluster 1
cluster1 = dataset.loc[dataset['cluster'] == 1]
cluster1

Unnamed: 0,Images,AU1: Inner Brow Raiser,AU2: Outer Brow Raiser,AU4: Brow Lowerer,AU5: Upper Lid Raiser,AU6: Cheek Raiser,AU7: Lid Tightener,AU9: Nose Wrinkler,AU10: Upper Lip Raiser,AU11: Nasolabial Deepener,AU12: Lip Corner Puller,AU13: Sharp Lip Puller,AU14: Dimpler,AU15: Lip Corner Depressor,AU16: Lower Lip Depressor,AU17: Chin Raiser,AU18: Lip Puckerer,AU20: Lip stretcher,AU21: Neck Tightener,AU22: Lip Funneler,AU23: Lip Tightener,AU24: Lip Pressor,AU25: Lips part,AU26: Jaw Drop,AU27: Mouth Stretch,AU28: Lip Suck,AU29: Jaw Thrust,AU30: Jaw Sideways,AU31: Jaw Clencher,AU34: Cheek Puff,AU38: Nostril Dilator,AU39: Nostril Compressor,AU43: Eyes Closed,AU44: Squint,AU45: Blink,AU54: Head down,AU61: Eyes turn left,AU62: Eyes turn right,AU63: Eyes up,AU64: Eyes down,cluster
1,S010_001_00000014,10,10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,10,0,0,0,10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
2,S010_002_00000014,10,10,0,10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,0,10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
6,S010_006_00000015,0,0,0,0,10,0,0,0,0,10,0,0,0,13,0,0,0,0,0,0,0,10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
7,S011_001_00000016,10,10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,0,10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
12,S011_006_00000013,0,0,0,0,10,0,0,0,0,10,0,0,0,0,0,0,0,0,0,0,0,10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
13,S014_001_00000029,10,10,0,10,0,0,0,0,0,10,0,0,0,0,0,0,0,0,0,0,0,10,0,10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
17,S014_005_00000017,0,0,0,0,10,0,0,0,0,10,0,0,0,0,0,0,0,0,0,0,0,10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
18,S022_001_00000030,10,10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,0,10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
19,S022_002_00000017,10,10,10,0,0,0,0,0,0,0,0,0,0,0,0,0,10,0,0,0,0,10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1


In [5]:
# Select the rows assigned to cluster 2
cluster2 = dataset.loc[dataset['cluster'] == 2]
cluster2

Unnamed: 0,Images,AU1: Inner Brow Raiser,AU2: Outer Brow Raiser,AU4: Brow Lowerer,AU5: Upper Lid Raiser,AU6: Cheek Raiser,AU7: Lid Tightener,AU9: Nose Wrinkler,AU10: Upper Lip Raiser,AU11: Nasolabial Deepener,AU12: Lip Corner Puller,AU13: Sharp Lip Puller,AU14: Dimpler,AU15: Lip Corner Depressor,AU16: Lower Lip Depressor,AU17: Chin Raiser,AU18: Lip Puckerer,AU20: Lip stretcher,AU21: Neck Tightener,AU22: Lip Funneler,AU23: Lip Tightener,AU24: Lip Pressor,AU25: Lips part,AU26: Jaw Drop,AU27: Mouth Stretch,AU28: Lip Suck,AU29: Jaw Thrust,AU30: Jaw Sideways,AU31: Jaw Clencher,AU34: Cheek Puff,AU38: Nostril Dilator,AU39: Nostril Compressor,AU43: Eyes Closed,AU44: Squint,AU45: Blink,AU54: Head down,AU61: Eyes turn left,AU62: Eyes turn right,AU63: Eyes up,AU64: Eyes down,cluster
0,S005_001_00000011,0,0,0,0,0,0,14,0,0,0,0,0,0,0,12,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2
3,S010_003_00000018,0,0,10,0,0,0,0,0,0,0,0,0,0,0,10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2
5,S010_005_00000016,0,0,10,0,10,10,15,0,0,0,0,0,0,10,0,0,0,0,0,0,0,10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2
8,S011_002_00000022,10,0,10,0,0,0,0,0,0,0,0,0,10,0,10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2
9,S011_003_00000014,10,0,10,0,0,10,0,0,0,0,0,0,0,0,0,0,10,0,0,0,0,10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2
11,S011_005_00000020,0,0,10,0,0,10,10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2
14,S014_002_00000016,10,0,10,0,0,0,0,0,0,0,0,0,10,0,10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,12,0,0,0,0,0,2
16,S014_004_00000019,0,0,10,0,0,10,10,0,0,0,0,0,0,0,10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2
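
To help describe what expression each cluster represents, one option is to average every action unit within a cluster and see which AUs stand out. The sketch below does this for nine rows and four columns copied from the cluster tables above, with the cluster labels as printed there. The names `sample`, `au_columns`, `means`, and `strongest` are hypothetical, and a mean is only one of several possible summaries.

```python
import pandas

# Three images per cluster, copied from the tables above (four AU columns only)
sample = pandas.DataFrame({
    'Images': ['S010_004_00000019', 'S011_004_00000021', 'S014_003_00000030',
               'S010_001_00000014', 'S010_002_00000014', 'S011_001_00000016',
               'S010_003_00000018', 'S011_002_00000022', 'S011_003_00000014'],
    'AU1: Inner Brow Raiser': [0, 0, 0, 10, 10, 10, 0, 10, 10],
    'AU2: Outer Brow Raiser': [0, 0, 0, 10, 10, 10, 0, 0, 0],
    'AU4: Brow Lowerer': [10, 14, 10, 0, 0, 0, 10, 10, 10],
    'AU7: Lid Tightener': [15, 12, 10, 0, 0, 0, 0, 0, 10],
    'cluster': [0, 0, 0, 1, 1, 1, 2, 2, 2],
})

au_columns = [name for name in sample.columns if name.startswith('AU')]

# Average value of each AU within each cluster
means = sample.groupby('cluster')[au_columns].mean()

# For each cluster, the AU with the highest average (ties go to the first column)
strongest = means.idxmax(axis=1)

print(means.round(1))
print(strongest)
```

On the full `dataset`, the same `groupby('cluster')` call could be applied to all AU columns. Looking at the two or three largest means per cluster usually says more than the single maximum.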


# Your turn

- Explore by changing the number of data rows (images) used and the number of clusters
- Are there any constraints on how many clusters you can create?
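
As a starting point for these experiments, the sketch below fits k-means for several cluster counts and records the inertia (the sum of squared distances from each image to its nearest centroid). It also notes any count for which scikit-learn refuses to fit. It is a sketch under assumptions: `try_cluster_counts` is a hypothetical helper, the fixed `random_state` and `n_init` are choices the notebook does not make, and `sample` holds only six rows and seven columns copied from the first table.

```python
import warnings

import pandas
from sklearn.cluster import KMeans

# First seven AU columns of the first six rows shown earlier (not the full dataset)
au_columns = ['AU1', 'AU2', 'AU4', 'AU5', 'AU6', 'AU7', 'AU9']
sample = pandas.DataFrame(
    [[0, 0, 0, 0, 0, 0, 14],
     [10, 10, 0, 0, 0, 0, 0],
     [10, 10, 0, 10, 0, 0, 0],
     [0, 0, 10, 0, 0, 0, 0],
     [0, 0, 10, 0, 0, 15, 0],
     [0, 0, 10, 0, 10, 10, 15]],
    columns=au_columns,
)


def try_cluster_counts(features, counts, random_state=0):
    """Fit k-means for each k; return {k: inertia, or None if fitting failed}."""
    results = {}
    for k in counts:
        try:
            with warnings.catch_warnings():
                warnings.simplefilter('ignore')  # hide warnings raised on tiny data
                km = KMeans(n_clusters=k, n_init=10, random_state=random_state)
                km.fit(features)
            results[k] = km.inertia_
        except ValueError as error:
            # scikit-learn rejected this cluster count; keep the message for inspection
            print(f'k={k}: {error}')
            results[k] = None
    return results


results = try_cluster_counts(sample.values, counts=[1, 2, 3, 6, 7])
print(results)
```

Try the helper with `mat` from the notebook and different slices of `raw_data`, and compare how the inertia changes as the number of clusters grows.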

# Visualization