### SVM Classifier Lab

In this lab, we'll use a linear support vector machine to classify irises.

This dataset ("iris") is one of the most famous in machine learning: it contains 4 measurements (sepal length, sepal width, petal length, and petal width, in cm) for each of 150 iris flowers, along with a label giving each flower's species.
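A quick sanity check of those numbers, as a minimal sketch using scikit-learn's bundled copy of the dataset. It confirms the 150 x 4 shape and that the three species are equally represented.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

# 150 flowers (rows) x 4 measurements (columns)
print(iris.data.shape)  # (150, 4)

# np.bincount counts how often each integer label 0, 1, 2 appears
counts = np.bincount(iris.target)
for name, count in zip(iris.target_names, counts):
    print(name, count)  # 50 flowers of each species
```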

The goal here is to explore the space of possible iris measurements and see where an SVC draws its boundary lines.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.svm import SVC

In [None]:
iris = datasets.load_iris()
dir(iris)

In [None]:
iris.feature_names

In [None]:
iris.data[0]

In [None]:
iris.target_names

In [None]:
iris.target[0]
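The label is just an integer code; `target_names` translates it into a species name. Here is a small sketch of doing that for one flower and, with NumPy fancy indexing, for all 150 at once (the name `species` is ours, not part of the dataset).

```python
from sklearn import datasets

iris = datasets.load_iris()

# target holds integer codes; target_names maps each code to a species name
first_label = iris.target[0]
print(first_label, iris.target_names[first_label])  # 0 setosa

# Indexing target_names with the whole label array converts every label at once
species = iris.target_names[iris.target]
print(species.shape)  # (150,)
```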

In order to make it easy to visualize the classification, we'll just use 2 of the 4 features.

That way, the feature space is 2-dimensional and we can plot it. We'll indicate the classification itself using colors/symbols at each point.

In [None]:
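# The petal length + width (columns 2 and 3):
X = iris["data"][:, (2, 3)]
# Labels as floats; np.float was removed in NumPy 1.24, so use the built-in float
y = iris["target"].astype(float)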
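To double-check which two features we picked, and what kind of array we got back, here is a short sketch. Selecting columns with a tuple of indices is NumPy "advanced" indexing, which returns a copy; a plain slice such as `2:4` would instead return a view sharing memory with the original data.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
cols = (2, 3)
X = iris["data"][:, cols]

# Which features did we pick?
print([iris.feature_names[c] for c in cols])  # ['petal length (cm)', 'petal width (cm)']
print(X.shape)  # (150, 2)

# Advanced indexing (a tuple of column indices) makes a copy
print(np.shares_memory(X, iris["data"]))  # False

# A basic slice returns a view onto the same memory, with the same values
petal_view = iris["data"][:, 2:4]
print(np.shares_memory(petal_view, iris["data"]))  # True
```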

To make a simple linear support-vector classifier, we'll use scikit-learn's SVC class: http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html

In [None]:
svm_clf = SVC(kernel="linear")
svm_clf.fit(X, y)
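With three classes, `SVC` handles multiclass problems with a one-vs-one scheme: it trains one classifier per pair of species, so 3 * 2 / 2 = 3 classifiers here. Because the kernel is linear, each one is a straight line w . x + b = 0 in our 2-D feature space. The sketch below inspects that structure; passing `decision_function_shape="ovo"` is our choice, made so the raw pairwise scores are visible.

```python
from sklearn import datasets
from sklearn.svm import SVC

iris = datasets.load_iris()
X = iris.data[:, (2, 3)]
y = iris.target

# "ovo" makes decision_function return one score per pair of classes
svm_clf = SVC(kernel="linear", decision_function_shape="ovo")
svm_clf.fit(X, y)

print(svm_clf.coef_.shape)       # (3, 2): one weight vector w per pair
print(svm_clf.intercept_.shape)  # (3,): one offset b per pair
print(svm_clf.decision_function(X[:2]).shape)  # (2, 3): 2 samples x 3 pairs
```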

Hmmm... we just trained on all of the data. What happened to splitting it into training, validation, and test sets?

In this case, we are less interested in measuring the accuracy of the classifier, and more interested in visualizing how it sees the "space" (or world) of data (iris) instances.

* We'll still be able to see the accuracy (or inaccuracy) here: the dataset is so small that we can look at every single point!
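If you also want a number to go with the picture, here is a minimal sketch that scores the classifier on the same data it was trained on. Keep in mind that this training accuracy is optimistic and is not a substitute for a held-out test score.

```python
from sklearn import datasets
from sklearn.svm import SVC

iris = datasets.load_iris()
X = iris.data[:, (2, 3)]
y = iris.target

svm_clf = SVC(kernel="linear").fit(X, y)

# Accuracy on the training data itself (optimistic, not a test score)
train_acc = svm_clf.score(X, y)
n_wrong = int((svm_clf.predict(X) != y).sum())
print(f"training accuracy: {train_acc:.3f}, misclassified: {n_wrong} of {len(y)}")
```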

In [None]:
# Make a grid of points throughout the feature space:
# 200 petal lengths from 0 to 8 cm by 200 petal widths from 0 to 3 cm

x0, x1 = np.meshgrid(
    np.linspace(0, 8, 200),
    np.linspace(0, 3, 200)
)

print(x0)
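The printout of `x0` is large, so here is a hedged miniature of the same idea with 3 lengths and 2 widths (the toy values are ours). With the default `indexing="xy"`, each row of the first array repeats the x-values and each column of the second repeats the y-values. In the cell above, that means `x0` and `x1` are both 200 x 200.

```python
import numpy as np

# Tiny grid: 3 values along the first axis, 2 along the second
a, b = np.meshgrid(np.array([0, 1, 2]), np.array([10, 20]))

print(a)
# [[0 1 2]
#  [0 1 2]]
print(b)
# [[10 10 10]
#  [20 20 20]]
```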

In [None]:
X_new = np.c_[x0.ravel(), x1.ravel()]

print(X_new.shape)

print(X_new[:3])

print(X_new[200:203])
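Why does `X_new[200:203]` look different from `X_new[:3]`? `ravel()` reads each grid row by row (C order), so the first coordinate changes fastest, and every block of 200 points shares a single petal width. The same toy grid as before, turned into a list of points, shows the pattern.

```python
import numpy as np

a, b = np.meshgrid(np.array([0, 1, 2]), np.array([10, 20]))

# ravel() walks row by row, so the first coordinate varies fastest
pts = np.c_[a.ravel(), b.ravel()]
print(pts)
# [[ 0 10]
#  [ 1 10]
#  [ 2 10]
#  [ 0 20]
#  [ 1 20]
#  [ 2 20]]
```

Here `pts[3]` starts the second row of the grid, just as `X_new[200]` does in the notebook.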

Now we'll get a prediction for every point in this chunk of feature space, and shape it so a prediction "goes with" each point:

In [None]:
y_predict = svm_clf.predict(X_new)

zz = y_predict.reshape(x0.shape)

zz[0] # predictions as x0 "goes" from 0 to 8 cm
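Because `X_new` was built by raveling the grid in row-major order, reshaping the flat predictions back to `x0.shape` puts each prediction at the same `[i, j]` position as its point. The sketch below checks this on a smaller 50 x 40 grid (our choice, to keep it quick). It compares one grid cell against a direct prediction for that single point.

```python
import numpy as np
from sklearn import datasets
from sklearn.svm import SVC

iris = datasets.load_iris()
X = iris.data[:, (2, 3)]
y = iris.target
svm_clf = SVC(kernel="linear").fit(X, y)

# 50 lengths x 40 widths over the same ranges as the notebook's grid
x0, x1 = np.meshgrid(np.linspace(0, 8, 50), np.linspace(0, 3, 40))
X_new = np.c_[x0.ravel(), x1.ravel()]
zz = svm_clf.predict(X_new).reshape(x0.shape)
print(zz.shape)  # (40, 50): rows follow x1 (width), columns follow x0 (length)

# zz[i, j] is the prediction for the point (x0[i, j], x1[i, j])
i, j = 7, 31
direct = svm_clf.predict([[x0[i, j], x1[i, j]]])[0]
print(zz[i, j] == direct)  # True
```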

Here is how the classifier divides up the space of petal length and petal width:

In [None]:
plt.contourf(x0, x1, zz)
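To see the "boundary lines" themselves, we can draw the straight line behind each pairwise classifier: w[0]*x0 + w[1]*x1 + b = 0, solved for x1. This is a minimal sketch that assumes `w[1]` is non-zero for every pair. SVC combines the pairwise votes, so the color borders should mostly run along pieces of these dashed lines, but not every stretch of every line is a visible border.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.svm import SVC

iris = datasets.load_iris()
X = iris.data[:, (2, 3)]
y = iris.target
svm_clf = SVC(kernel="linear").fit(X, y)

x0, x1 = np.meshgrid(np.linspace(0, 8, 200), np.linspace(0, 3, 200))
zz = svm_clf.predict(np.c_[x0.ravel(), x1.ravel()]).reshape(x0.shape)
plt.contourf(x0, x1, zz)

# Each pairwise line: w[0]*x0 + w[1]*x1 + b = 0  =>  x1 = -(w[0]*x0 + b) / w[1]
xs = np.linspace(0, 8, 100)
lines = []
for w, b in zip(svm_clf.coef_, svm_clf.intercept_):
    ys = -(w[0] * xs + b) / w[1]  # assumes w[1] != 0
    lines.append(ys)
    plt.plot(xs, ys, "k--", linewidth=1)

plt.ylim(0, 3)  # the lines can run past the grid; keep the same view
plt.xlabel("petal length (cm)")
plt.ylabel("petal width (cm)")
```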

Well, ok ... but is it any good? Let's plot the real flowers and their classes onto this visualization and see.

Recall that earlier we set X to the petal length and width columns, and y to the target column:

```
# The petal len + width:
X = iris["data"][:, (2, 3)]
y = iris["target"].astype(float)
```

This lets us use boolean indexing, which works a bit like a SQL "WHERE clause", to find matching records. Then we can give each matching group its own color and symbol.
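Here is the "WHERE clause" trick on its own, as a small sketch with made-up toy values (not taken from the real dataset). Comparing the label array to a class gives a boolean mask, and using that mask as the row index keeps only the matching rows.

```python
import numpy as np

# Illustrative stand-ins: 4 flowers, 2 features each, plus their labels
X_toy = np.array([[1.4, 0.2], [4.7, 1.4], [6.0, 2.5], [1.3, 0.2]])
y_toy = np.array([0, 1, 2, 0])

mask = y_toy == 0       # which rows are class 0?
print(mask)             # [ True False False  True]
print(X_toy[mask, 0])   # column 0 of the matching rows: [1.4 1.3]
print(X_toy[mask, 1])   # column 1 of the matching rows: [0.2 0.2]
```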

In [None]:
plt.contourf(x0, x1, zz)
plt.plot(X[y == 2, 0], X[y == 2, 1], "g^")  # Virginica
plt.plot(X[y == 1, 0], X[y == 1, 1], "bs")  # Versicolor
plt.plot(X[y == 0, 0], X[y == 0, 1], "yo")  # Setosa
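One possible way to make the final plot readable on its own is to label the axes and add a legend, so the marker styles don't have to be remembered from the comments. This sketch keeps the same markers. The `styles` dictionary and the `alpha=0.3` transparency (which keeps the points visible over the colored regions) are our own choices.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.svm import SVC

iris = datasets.load_iris()
X = iris.data[:, (2, 3)]
y = iris.target
svm_clf = SVC(kernel="linear").fit(X, y)

x0, x1 = np.meshgrid(np.linspace(0, 8, 200), np.linspace(0, 3, 200))
zz = svm_clf.predict(np.c_[x0.ravel(), x1.ravel()]).reshape(x0.shape)

plt.contourf(x0, x1, zz, alpha=0.3)

# One marker style per species; label= feeds the legend
styles = {0: "yo", 1: "bs", 2: "g^"}
for cls, style in styles.items():
    plt.plot(X[y == cls, 0], X[y == cls, 1], style, label=iris.target_names[cls])

plt.xlabel(iris.feature_names[2])
plt.ylabel(iris.feature_names[3])
plt.legend(loc="upper left")
```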