# Learning classes from data

## First, get your data: the Iris dataset

You've seen this dataset before (in notebook 6.1): 150 iris plants, 3 species, and someone out in the field measuring the petals and sepals of each one.

In [24]:
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data
Y = iris.target
print("Targets: {}".format(iris['target_names']))
print("Features: {}".format(iris['feature_names']))
print("Y: {}".format(Y))
print("X: {}".format(X))

Targets: ['setosa' 'versicolor' 'virginica']
Features: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Y: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2]
X: [[ 5.1  3.5  1.4  0.2]
 [ 4.9  3.   1.4  0.2]
 [ 4.7  3.2  1.3  0.2]
 [ 4.6  3.1  1.5  0.2]
 [ 5.   3.6  1.4  0.2]
 [ 5.4  3.9  1.7  0.4]
 [ 4.6  3.4  1.4  0.3]
 [ 5.   3.4  1.5  0.2]
 [ 4.4  2.9  1.4  0.2]
 [ 4.9  3.1  1.5  0.1]
 [ 5.4  3.7  1.5  0.2]
 [ 4.8  3.4  1.6  0.2]
 [ 4.8  3.   1.4  0.1]
 [ 4.3  3.   1.1  0.1]
 [ 5.8  4.   1.2  0.2]
 [ 5.7  4.4  1.5  0.4]
 [ 5.4  3.9  1.3  0.4]
 [ 5.1  3.5  1.4  0.3]
 [ 5.7  3.8  1.7  0.3]
 [ 5.1  3.8  1.5  0.3]
 [ 5.4  3.4  1.7  0.2]
 [ 5.1  3.7  1.5  0.4]
 [ 4.6  3.6  1.   0.2]
 [ 5.1  3.3  1.7  0.5

### Split the Iris data into training and test sets

In [25]:
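ntest = 10
np.random.seed(0)  # fix the seed so the shuffle is reproducible
indices = np.random.permutation(len(X))  # shuffled row indices

# All but the last ntest shuffled rows form the training set
iris_X_train = X[indices[:-ntest]]
iris_Y_train = Y[indices[:-ntest]]

# The last ntest shuffled rows are held out for testing
iris_X_test = X[indices[-ntest:]]
iris_Y_test = Y[indices[-ntest:]]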
print("{} training points, {} test points".format(len(iris_X_train), len(iris_X_test)))

140 training points, 10 test points
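
The manual shuffle-and-slice above can also be written with scikit-learn's `train_test_split` helper. This is a sketch of an alternative, not what the notebook does: the variable names are illustrative, and because the helper shuffles differently, the 10 held-out points will generally not be the same ones as above.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, Y = iris.data, iris.target

# An integer test_size is an absolute number of samples;
# random_state fixes the shuffle so the split is reproducible.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=10, random_state=0
)
print("{} training points, {} test points".format(len(X_train), len(X_test)))
```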


## Fit model to your data

We're using the k-nearest-neighbours (kNN) algorithm here, with k = 5 neighbours.
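
As rough intuition for what the classifier does: to label a new point, kNN finds the k training points closest to it and takes a majority vote over their labels. Below is a minimal pure-Python sketch of that idea, assuming Euclidean distance (which is what `metric='minkowski'` with `p=2` amounts to). The function name `knn_predict` and the toy data are made up for illustration, and scikit-learn's own implementation is more sophisticated than this.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=5):
    """Label `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest points and count their labels
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): two well-separated clusters
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(points, labels, (1.1, 1.0), k=3))  # → 0
print(knn_predict(points, labels, (5.1, 5.0), k=3))  # → 1
```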

In [18]:
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=5, metric='minkowski')
knn.fit(iris_X_train, iris_Y_train)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=5, p=2,
           weights='uniform')

## Check your model

In [29]:
predicted_classes = knn.predict(iris_X_test)

print('kNN predicted classes: {}'.format(predicted_classes))
print('Real classes:          {}'.format(iris_Y_test))

kNN predicted classes: [1 2 1 0 0 0 2 1 2 0]
Real classes:          [1 1 1 0 0 0 2 1 2 0]
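
A quick way to summarise this comparison is the fraction of test points labelled correctly. A minimal sketch, with the two arrays from the output above copied into plain lists (the variable names are illustrative):

```python
predicted = [1, 2, 1, 0, 0, 0, 2, 1, 2, 0]
actual    = [1, 1, 1, 0, 0, 0, 2, 1, 2, 0]

# Count the positions where the prediction matches the true label
n_correct = sum(p == a for p, a in zip(predicted, actual))
accuracy = n_correct / len(actual)
print("{} of {} correct, accuracy = {:.2f}".format(n_correct, len(actual), accuracy))
# → 9 of 10 correct, accuracy = 0.90
```

scikit-learn's `sklearn.metrics.accuracy_score` computes the same quantity.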


In [30]:
from sklearn.metrics import confusion_matrix
print(confusion_matrix(iris_Y_test, predicted_classes))

[[4 0 0]
 [0 3 1]
 [0 0 2]]
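
In scikit-learn's convention, rows of the confusion matrix are the true classes and columns are the predicted classes. The single off-diagonal 1 (row 1, column 2) is therefore the one versicolor plant that was labelled virginica. As a sketch, the same matrix can be rebuilt by hand from the two label arrays printed earlier (copied here into plain lists with illustrative names):

```python
predicted = [1, 2, 1, 0, 0, 0, 2, 1, 2, 0]
actual    = [1, 1, 1, 0, 0, 0, 2, 1, 2, 0]
n_classes = 3

# matrix[i][j] counts points whose true class is i and predicted class is j
matrix = [[0] * n_classes for _ in range(n_classes)]
for a, p in zip(actual, predicted):
    matrix[a][p] += 1

for row in matrix:
    print(row)
# → [4, 0, 0]
# → [0, 3, 1]
# → [0, 0, 2]
```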


In [31]:
from sklearn.metrics import classification_report
print(classification_report(iris_Y_test, predicted_classes))

             precision    recall  f1-score   support

          0       1.00      1.00      1.00         4
          1       1.00      0.75      0.86         4
          2       0.67      1.00      0.80         2

avg / total       0.93      0.90      0.90        10
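
The per-class numbers in this report follow directly from the confusion matrix: precision is the diagonal entry divided by its column sum, recall is the diagonal entry divided by its row sum, and the F1 score is their harmonic mean. A minimal sketch, assuming the matrix printed earlier; the helper name `class_scores` is hypothetical, and it does not guard against a class that is never predicted (which would divide by zero):

```python
# Confusion matrix from the earlier cell: rows = true class, columns = predicted class
matrix = [[4, 0, 0],
          [0, 3, 1],
          [0, 0, 2]]

def class_scores(matrix, c):
    """Return (precision, recall, f1) for class index c."""
    true_pos = matrix[c][c]
    predicted_as_c = sum(row[c] for row in matrix)  # column sum
    actually_c = sum(matrix[c])                     # row sum (the "support")
    precision = true_pos / predicted_as_c
    recall = true_pos / actually_c
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

for c in range(len(matrix)):
    p, r, f = class_scores(matrix, c)
    print("class {}: precision={:.2f} recall={:.2f} f1={:.2f}".format(c, p, r, f))
# → class 0: precision=1.00 recall=1.00 f1=1.00
# → class 1: precision=1.00 recall=0.75 f1=0.86
# → class 2: precision=0.67 recall=1.00 f1=0.80
```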

