# Computer Vision Classification

In this workshop, we'll explore the concept of **supervised learning**, where our dataset isn't just data: it also includes **labels**.

## Face-to-Emoji

Let's imagine that you want to make a Face-To-Emoji converter, so that whenever you smile, a smiley emoji appears. When you're frowning, a frowny emoji appears. 

## This is similar to classification of Farsi

This is similar to the Farsi Flashcard game. How, you might ask? Remember how you trained on a *training dataset* and then got tested on a *test set*?

## How it would work

Here's how the flow would look:

**Step 1: Train with Training Set**
- Get lots of images of people's facial expressions, make sure they're labeled (smile/frown/surprise, etc.)
- Convert the images to feature representations
- Train your classifier (brain) to learn the correspondences between features and labels

**Step 2: Test with Test Set**
- Get a test image (no label)
- Convert it to a feature representation
- Using your trained classifier (brain), guess its label (smile/frown/surprise/other) based on its features

Once you have the label, then you can convert it to the right emoji! 😲😦😊 
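To make the flow concrete, here is a minimal sketch of both steps on a tiny made-up dataset. The two-number features (`mouth_curve`, `eyebrow_raise`), the example values, and the `EMOJI` lookup table are all invented for illustration; they are not part of the real dataset we use later.

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical features for each face: [mouth_curve, eyebrow_raise]
train_features = [
    [0.9, 0.1],    # smile
    [0.8, 0.2],    # smile
    [-0.8, 0.0],   # frown
    [-0.9, 0.1],   # frown
    [0.0, 0.9],    # surprise
    [0.1, 1.0],    # surprise
]
train_labels = ["smile", "smile", "frown", "frown", "surprise", "surprise"]

# Hypothetical lookup table from label to emoji
EMOJI = {"smile": "😊", "frown": "😦", "surprise": "😲"}

# Step 1: train the classifier on labeled features
classifier = KNeighborsClassifier(n_neighbors=1)
classifier.fit(train_features, train_labels)

# Step 2: guess the label of an unlabeled test face from its features
test_features = [[0.85, 0.15]]
predicted_label = classifier.predict(test_features)[0]

# Convert the label to an emoji
emoji = EMOJI[predicted_label]
print(predicted_label, emoji)
```

The test face has an upturned mouth and relaxed eyebrows, so it lands closest to the smiling training faces.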

## Dataset

Let's start with the CK+ (Extended Cohn-Kanade) dataset. As a reminder, the images look like this:

|S078_005_00000013.png | S088_004_00000011.png | S094_004_00000012.png | S098_004_00000012.png | S112_001_00000020.png|
|---|---|---|---|---|
|  <img width="120px" src="img/S078_005_00000013.png"> |  <img width="120px" src="img/S088_004_00000011.png">  | <img width="120px" src="img/S094_004_00000012.png">  |  <img width="120px" src="img/S098_004_00000012.png"> | <img width="120px" src="img/S112_001_00000020.png">  |

But this time, we're going to take advantage of an extra piece of information. 

## Annotation
You see, there were some **annotators** that were hired to look at each photo and give it a label. Now our dataset looks like this.

In [None]:
import pandas
import matplotlib.pyplot as plot
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

# Import the data
raw_data = pandas.read_csv('data/facs-labels.csv') 
dataset = raw_data[:20]  # Let's just get 20 photos
dataset

# How to Classify with K Nearest Neighbours

So let's remember again how we did classification with the Farsi dataset. We could simply say "hey, which one out of these training images looks the most similar to our test image?"

And then you could check the back of the card, find its label, and assume that the test image has the same label. 

This is called **1-Nearest Neighbour**. Can you guess what **2-Nearest Neighbours** might mean?
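If you'd like to check your guess, here is a rough sketch of the idea written by hand with NumPy. The function names (`nearest_neighbour_labels`, `knn_predict`) and the toy points are made up for this example, and it assumes "most similar" means the smallest straight-line (Euclidean) distance between feature vectors.

```python
from collections import Counter
import numpy as np

def nearest_neighbour_labels(train_X, train_y, test_x, k):
    """Return the labels of the k training points closest to test_x."""
    diffs = np.asarray(train_X, dtype=float) - np.asarray(test_x, dtype=float)
    distances = np.linalg.norm(diffs, axis=1)  # one distance per training point
    nearest = np.argsort(distances)[:k]        # indices of the k smallest distances
    return [train_y[i] for i in nearest]

def knn_predict(train_X, train_y, test_x, k=1):
    """Predict by majority vote among the k nearest neighbours."""
    votes = Counter(nearest_neighbour_labels(train_X, train_y, test_x, k))
    return votes.most_common(1)[0][0]

# Toy data: three frowns, three smiles, and one odd "frown" sitting among the smiles
train_X = [[1, 1], [1, 2], [2, 1], [5, 5], [5, 6], [6, 5], [4.6, 4.6]]
train_y = ["frown", "frown", "frown", "smile", "smile", "smile", "frown"]
test_x = [4.5, 4.5]

one_nn = knn_predict(train_X, train_y, test_x, k=1)
three_nn = knn_predict(train_X, train_y, test_x, k=3)
print("1-NN says:", one_nn)
print("3-NN says:", three_nn)
```

With K=1 the single closest card decides the answer, so one odd training example can mislead the classifier. With K=3 the neighbours vote, which can outvote that odd example. An even K such as 2 can end in a tie, which is one reason odd values are a common choice.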



## Prepare the Training and Test data

Now let's split up our big dataset into training and testing data, just like we did with the Farsi dataset.

## 1. Create a training dataset

In [None]:
import pandas

# Read the data
raw_data = pandas.read_csv('data/facs-labels.csv')
train = raw_data[:100]  # Training set: get 100 photos

# Let's see the training data
train

## 2. Create a testing example

In [None]:
# Let's just try it out with one test example for now
test = raw_data[101:102] # Test set: just 1 photo for now
test

## 3. Train a K-Nearest Neighbours Classifier

This classifier is basically just like a piece of memory that remembers reaaaaally well. Like a photographic memory. It can pick out things it has already seen (from the training set) that look similar to your test photo.
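To see the "memory" at work, scikit-learn's `KNeighborsClassifier` has a `kneighbors` method that reports which remembered training examples are closest to a test example. The sketch below uses three made-up points instead of the real photos, so the numbers are only for illustration.

```python
from sklearn.neighbors import KNeighborsClassifier

# Three made-up training examples and their labels
X_toy = [[0, 0], [3, 4], [10, 10]]
y_toy = ["frown", "smile", "surprise"]

memory = KNeighborsClassifier(n_neighbors=1)
memory.fit(X_toy, y_toy)  # "training" mostly means storing the examples

# Ask which stored example is closest to a new point
distances, indices = memory.kneighbors([[3, 3]], n_neighbors=1)
closest_index = indices[0][0]
closest_distance = distances[0][0]
remembered_label = y_toy[closest_index]

print("Closest training example:", X_toy[closest_index])
print("Distance to it:", closest_distance)
print("Its label:", remembered_label)
```

The prediction for the new point is simply the label of the example it pulled out of memory.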

## Let's train and test our K-Nearest Neighbors classifier!

Now, we can run our classifier on our test data to see how well it worked.

In [None]:
import numpy
from sklearn import neighbors

# Prepare the training data and labels
features = pandas.DataFrame(train).loc[:,'AU1: Inner Brow Raiser':'AU64: Eyes down']
X = features.values
y = train['LBL']

# Set parameters for our classifier, in this case set K=1
n_neighbors = 1

######## TRAINING ########
# Create the classifier with 100 training data images
classifier = neighbors.KNeighborsClassifier(n_neighbors)
classifier.fit(X, y)

######## TESTING ##########
# Run the classifier on our single test image
testcase = pandas.DataFrame(test).loc[:, 'AU1: Inner Brow Raiser':'AU64: Eyes down']
# Pass plain values, matching how the classifier was trained on features.values
print(classifier.predict(testcase.values))

# Your Turn

Try testing your classifier with other photos from the dataset. Make sure your testing example is not in the training set! Why?

In [None]:
# Write down whether you were able to successfully recognize/classify each image you tested. Any that didn't work?

- What happens if your training dataset is small? Try it!

In [None]:
# What values did you try? What happened?

## Calculating Accuracy

Now, we did this only on one sample. But how can we tell how good our classifier is? We need to take an average over many different photos.

Let's break up our labeled data into training and test sets. Then we can calculate the classifier's accuracy.

In [None]:
minitrain = train[:50]    # Mini training set: the first 50 photos
minitest = train[50:100]  # Mini test set: the remaining 50 photos
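Accuracy is just the fraction of test photos the classifier labels correctly. Here is a small sketch of that calculation on made-up points, so it runs without the CSV file. The data and the variable names (`X_train_toy`, `y_test_toy`, and so on) are invented for this example; with the real data you would use the feature columns and `LBL` column of your own training and test sets instead.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

# Made-up training data: two groups of points
X_train_toy = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train_toy = ["frown", "frown", "frown", "smile", "smile", "smile"]

# Made-up test data; the last point is labeled "frown" but sits near the smiles
X_test_toy = [[0.5, 0.5], [5.5, 5.5], [1, 1], [4, 4]]
y_test_toy = ["frown", "smile", "frown", "frown"]

classifier_toy = KNeighborsClassifier(n_neighbors=1)
classifier_toy.fit(X_train_toy, y_train_toy)

# Predict a label for every test example at once
predictions = classifier_toy.predict(X_test_toy)

# Accuracy by hand: the average of "was this guess correct?"
accuracy = np.mean(predictions == np.array(y_test_toy))

# The same number using scikit-learn's helper
accuracy_sklearn = accuracy_score(y_test_toy, predictions)

print("Predictions:", predictions)
print("Accuracy:", accuracy)
```

Three of the four guesses match the true labels here, so the accuracy is 0.75.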