In [2]:
import numpy, pandas
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import FunctionTransformer
from sklearn.pipeline import make_pipeline
from sklearn.metrics import classification_report
from skimage.color import rgb2lab, rgb2hsv
from helpers.colour import plot_predictions
import matplotlib.pyplot as plot
%matplotlib inline

# Exercise: colour words

In this exercise, we will try to convert RGB colours (which you might have seen in graphics programs or CSS) to colour words (like "red").

The data we have was collected from SFU students by having them [select a word for a randomly-chosen colour](http://cmpt732.csil.sfu.ca/colour/). Each red/green/blue component in the data file is on a 0–1 scale, rather than the 0–255 scale you might have seen before. (The colour tools we're using expect 0–1, so this will make them happy.)

The labels we have are the [English basic colour terms](https://en.wikipedia.org/wiki/Color_term#Basic_color_terms): "black", "white", "red", "green", "yellow", "blue", "brown", "orange", "pink", "purple", and "grey".

We hope that we can predict that (1,0,0) is called "red", (0,1,0) is called "green", and many other values that are less obvious.

Here is some data.

TODO: 
* Create feature and label arrays `X` and `y`.
* Split into training and testing data.

In [5]:
data = pandas.read_csv('data/colour-data.csv')
data

Unnamed: 0,R,G,B,Label,Confidence
0,0.658824,0.827451,0.952941,blue,good
1,0.145098,0.125490,0.156863,black,perfect
2,0.137255,0.133333,0.149020,black,perfect
3,0.309804,0.290196,0.623529,purple,good
4,0.215686,0.388235,0.133333,green,perfect
5,0.498039,0.498039,0.462745,grey,perfect
6,0.517647,0.607843,0.494118,green,good
7,0.776471,0.611765,0.776471,purple,good
8,0.647059,0.650980,0.749020,grey,poor
9,0.462745,0.525490,0.952941,blue,perfect


In [6]:
# TODO: 
#X =
#y =

In [7]:
# X_train, X_test, y_train, y_test = 
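
One possible way to fill in the two cells above, as a minimal sketch rather than the expected solution. The inline rows are copied from the table displayed earlier and stand in for the full `data` frame so the snippet runs on its own; `random_state=0` is an arbitrary choice made here for repeatability and is not part of the exercise.

```python
import pandas
from sklearn.model_selection import train_test_split

# A few rows copied from the table above, standing in for data/colour-data.csv.
rows = [
    (0.658824, 0.827451, 0.952941, 'blue'),
    (0.145098, 0.125490, 0.156863, 'black'),
    (0.137255, 0.133333, 0.149020, 'black'),
    (0.309804, 0.290196, 0.623529, 'purple'),
    (0.215686, 0.388235, 0.133333, 'green'),
    (0.498039, 0.498039, 0.462745, 'grey'),
    (0.517647, 0.607843, 0.494118, 'green'),
    (0.776471, 0.611765, 0.776471, 'purple'),
]
data = pandas.DataFrame(rows, columns=['R', 'G', 'B', 'Label'])

# Features: one row per colour, columns R, G, B. Labels: the colour word.
X = data[['R', 'G', 'B']].values
y = data['Label'].values

# Hold out part of the data for testing (scikit-learn's default is 25%).
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
print(X_train.shape, X_test.shape)
```

Taking `.values` gives plain NumPy arrays, which matters later: the colour-space transform functions in this notebook call `.reshape` on their input, and a DataFrame does not have that method.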

## Attempt 1: Just Use RGB Colours and Naive Bayes

To start, don't do any work on the features: just train a Gaussian naive Bayes classifier on the raw RGB values, as we have done before, and see what you get.

TODO:
* Create a model. Train it on the training data.
* Calculate an accuracy score on the testing data.
* We have provided a helper to plot the predictions being made. Try: `plot_predictions(model)`
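
The steps above might look roughly like the following sketch. The tiny `X_train`/`y_train` arrays are a stand-in built from rows of the table shown earlier, so the numbers it prints say nothing about the real data set; the name `rgb_model` is chosen only to match the cell that follows.

```python
import numpy
from sklearn.naive_bayes import GaussianNB

# Toy stand-in for the real training split: rows copied from the table above.
X_train = numpy.array([
    [0.658824, 0.827451, 0.952941],
    [0.145098, 0.125490, 0.156863],
    [0.137255, 0.133333, 0.149020],
    [0.309804, 0.290196, 0.623529],
    [0.215686, 0.388235, 0.133333],
    [0.498039, 0.498039, 0.462745],
    [0.517647, 0.607843, 0.494118],
    [0.776471, 0.611765, 0.776471],
])
y_train = numpy.array(['blue', 'black', 'black', 'purple',
                       'green', 'grey', 'green', 'purple'])

# Create the classifier and train it on the raw RGB features.
rgb_model = GaussianNB()
rgb_model.fit(X_train, y_train)

# score() returns the fraction of correct predictions (accuracy).
# Scored on the training rows here only because the toy data is so small;
# in the notebook, use rgb_model.score(X_test, y_test) instead.
accuracy = rgb_model.score(X_train, y_train)
print(accuracy)
```

In the notebook itself, `plot_predictions(rgb_model)` would then draw what the trained model predicts.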

In [10]:
rgb_model.predict([[1,0,0], [0,1,0], [0,0,1]])

array(['red', 'green', 'blue'], dtype='<U6')

## Attempt 2: Convert to HSV Colours in a Pipeline

You probably didn't get great results above. One of the problems: the RGB colour space isn't perfectly arranged to represent the things we call colours. It is designed for computer screens, not human eyes.

The [HSV colour space](https://en.wikipedia.org/wiki/HSL_and_HSV) represents colours using values for "hue", "saturation", and "value". The "hue" value is probably closely related to the thing called "colour" that we're trying to predict. Maybe if we **transform** to that colour space before going to the classifier, it will have more meaningful values to work with.
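
To get a feel for why hue might help, here is a small illustration. It uses Python's standard-library `colorsys` module instead of the `skimage` function used in this notebook, purely so the snippet has no dependencies; the colour names and the "dark red" value are picked here for illustration.

```python
import colorsys

# RGB components on the 0-1 scale, as in the data file.
samples = {
    'red': (1.0, 0.0, 0.0),
    'dark red': (0.5, 0.0, 0.0),
    'green': (0.0, 1.0, 0.0),
    'blue': (0.0, 0.0, 1.0),
}

# Convert each colour to (hue, saturation, value), all on a 0-1 scale.
hsv = {name: colorsys.rgb_to_hsv(*rgb) for name, rgb in samples.items()}

for name, (h, s, v) in hsv.items():
    print(f'{name:>8}: hue={h:.3f} saturation={s:.3f} value={v:.3f}')
```

Red and dark red differ in every RGB component that is non-zero, but in HSV they share the same hue and differ only in "value". That is the kind of structure we hope the classifier can exploit.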

The function `transform_rgb2hsv` provided below will convert an array of RGB colour values to an array of HSV colour values. It's something you can use with a `FunctionTransformer` in a pipeline.

TODO:
* Create a pipeline model containing a `FunctionTransformer` and a `GaussianNB`.
* Train, test, and plot as above.

In [58]:
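def transform_rgb2hsv(Xrgb):
    # rgb2hsv expects an image-shaped array (rows, columns, 3), so treat the
    # n colours as a 1-by-n image, convert it, then flatten back to (n, 3).
    return rgb2hsv(Xrgb.reshape(1, -1, 3)).reshape(-1, 3)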
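
A pipeline built from this function could be sketched as follows. This is one possible arrangement, not the only one: the toy training arrays are rows copied from the table shown earlier, the name `hsv_model` is chosen here to match its use in the ensemble at the end of the notebook, and `validate=True` mirrors the pipelines in the "Others" section.

```python
import numpy
from skimage.color import rgb2hsv
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

def transform_rgb2hsv(Xrgb):
    # Same function as in the cell above, repeated so this sketch runs alone.
    return rgb2hsv(Xrgb.reshape(1, -1, 3)).reshape(-1, 3)

# Toy stand-in for the real training split: rows copied from the table above.
X_train = numpy.array([
    [0.658824, 0.827451, 0.952941],
    [0.145098, 0.125490, 0.156863],
    [0.137255, 0.133333, 0.149020],
    [0.309804, 0.290196, 0.623529],
    [0.215686, 0.388235, 0.133333],
    [0.498039, 0.498039, 0.462745],
    [0.517647, 0.607843, 0.494118],
    [0.776471, 0.611765, 0.776471],
])
y_train = numpy.array(['blue', 'black', 'black', 'purple',
                       'green', 'grey', 'green', 'purple'])

# The pipeline converts RGB to HSV first, then hands the result to naive Bayes.
hsv_model = make_pipeline(
    FunctionTransformer(transform_rgb2hsv, validate=True),
    GaussianNB()
)
hsv_model.fit(X_train, y_train)

# The pipeline still takes RGB input; the conversion happens inside it.
predictions = hsv_model.predict(X_train)
print(predictions)
```

Because the transform lives inside the pipeline, the same RGB arrays can be passed to `fit`, `score`, and `predict` without converting them by hand.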

## Others

I have no intention of talking about these, but in case they're useful later...

In [61]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import VotingClassifier

# LAB colour space: distances are meaningful, so in theory it should be better for kNN.
# Also seems to work better for the neural net. Because reasons.
def vector_rgb2lab(Xrgb):
    lab = rgb2lab(Xrgb.reshape(1, -1, 3)).reshape(-1, 3)
    return lab/100
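
As a rough illustration of the comment about distances, the sketch below converts a few colours with `vector_rgb2lab` and measures Euclidean distances between them, which is the default metric kNN uses. The three colours are chosen here as examples; the snippet only shows how such a distance is computed and does not demonstrate that LAB is better for this data.

```python
import numpy
from skimage.color import rgb2lab

def vector_rgb2lab(Xrgb):
    # Same function as in the cell above, repeated so this sketch runs alone.
    lab = rgb2lab(Xrgb.reshape(1, -1, 3)).reshape(-1, 3)
    return lab / 100

colours = numpy.array([
    [1.0, 1.0, 1.0],  # white
    [0.0, 0.0, 0.0],  # black
    [1.0, 0.0, 0.0],  # red
])
lab = vector_rgb2lab(colours)

# Straight-line distances between the converted colours.
dist_white_black = numpy.linalg.norm(lab[0] - lab[1])
dist_white_red = numpy.linalg.norm(lab[0] - lab[2])

print(lab.round(3))
print(dist_white_black, dist_white_red)
```

Dividing by 100 puts the lightness component on roughly a 0–1 scale, which keeps the inputs in a range that the neural network further down tends to handle comfortably.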

### k Nearest Neighbours

In [62]:
knn_model = make_pipeline(
    FunctionTransformer(vector_rgb2lab, validate=True),
    KNeighborsClassifier(n_neighbors=15)
)
knn_model.fit(X_train, y_train)
knn_model.score(X_test, y_test)

0.75

In [None]:
plot_predictions(knn_model)

### SVM

In [64]:
svc_model = make_pipeline(
    FunctionTransformer(vector_rgb2lab, validate=True),
    SVC(kernel='rbf', C=5, gamma='scale')
)
svc_model.fit(X_train, y_train)
svc_model.score(X_test, y_test)

0.7580971659919028

In [None]:
plot_predictions(svc_model)

### Neural Network

In [66]:
mlp_model = make_pipeline(
    FunctionTransformer(vector_rgb2lab, validate=True),
    MLPClassifier(hidden_layer_sizes=(15, 12), activation='logistic', max_iter=1500)
)
mlp_model.fit(X_train, y_train)
mlp_model.score(X_test, y_test)

0.7540485829959515

In [None]:
plot_predictions(mlp_model)

### Ensemble of the above

In [68]:
ensemble_model = VotingClassifier(estimators=[
    ('hsv', hsv_model),
    ('knn', knn_model),
    ('svc', svc_model),
    ('mlp', mlp_model),
], weights=[1, 2, 3, 2], n_jobs=4)
ensemble_model.fit(X_train, y_train)
ensemble_model.score(X_test, y_test)

0.7621457489878543

In [None]:
plot_predictions(ensemble_model)