### Visualizing a Decision Tree

There are many types of classifiers, such as neural networks and support vector machines. Why did we start with decision trees? Because they are easy to read and understand!

Today we will work on a classic ML problem called "Iris". The task is to identify the species of a flower from measurements such as the length and width of its petals.

The dataset contains 150 examples from 3 species of iris (50 per species):
- Setosa
- Versicolor
- Virginica

There are 4 features to describe them:
- Sepal length
- Sepal width
- Petal length
- Petal width


### Import the Iris dataset

In [11]:
from sklearn.datasets import load_iris
import numpy as np
iris = load_iris()

View the metadata: the feature names and the target (species) names.

In [3]:
print(iris.feature_names)
print(iris.target_names)

['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
['setosa' 'versicolor' 'virginica']
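
As a quick check of the description above, the sketch below prints the shape of the feature matrix and the number of examples per species. The variable name `class_counts` is an illustrative choice and not part of the original notebook.

```python
from sklearn.datasets import load_iris
import numpy as np

iris = load_iris()

# Rows are flowers, columns are the four features
print(iris.data.shape)

# Count how many examples carry each label (0, 1, 2)
class_counts = np.bincount(iris.target)
for name, count in zip(iris.target_names, class_counts):
    print(name, count)
```

If the counts are balanced, each species contributes the same number of rows to the dataset.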


Let's look at the first example in our dataset. Its label is 0, which is the index of the first entry in target_names, so this flower is a setosa.

In [6]:
print(iris.data[0])
print(iris.target[0])

[5.1 3.5 1.4 0.2]
0
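
The label is just an index into `target_names`, so the lookup can be done directly in code. This is a minimal sketch; `first_label` and `first_name` are names introduced here for illustration.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# The integer label of the first example
first_label = iris.target[0]

# Use the label as an index to get the species name
first_name = iris.target_names[first_label]
print(first_label, first_name)
```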


In [8]:
for i in range(len(iris.target)):
    print("Example %d: label %s, features %s" % (i, iris.target[i], iris.data[i]))

Example 0: label 0, features [5.1 3.5 1.4 0.2]
Example 1: label 0, features [4.9 3.  1.4 0.2]
Example 2: label 0, features [4.7 3.2 1.3 0.2]
Example 3: label 0, features [4.6 3.1 1.5 0.2]
Example 4: label 0, features [5.  3.6 1.4 0.2]
Example 5: label 0, features [5.4 3.9 1.7 0.4]
Example 6: label 0, features [4.6 3.4 1.4 0.3]
Example 7: label 0, features [5.  3.4 1.5 0.2]
Example 8: label 0, features [4.4 2.9 1.4 0.2]
Example 9: label 0, features [4.9 3.1 1.5 0.1]
Example 10: label 0, features [5.4 3.7 1.5 0.2]
Example 11: label 0, features [4.8 3.4 1.6 0.2]
Example 12: label 0, features [4.8 3.  1.4 0.1]
Example 13: label 0, features [4.3 3.  1.1 0.1]
Example 14: label 0, features [5.8 4.  1.2 0.2]
Example 15: label 0, features [5.7 4.4 1.5 0.4]
Example 16: label 0, features [5.4 3.9 1.3 0.4]
Example 17: label 0, features [5.1 3.5 1.4 0.3]
Example 18: label 0, features [5.7 3.8 1.7 0.3]
...

In [10]:
for i in range(len(iris.target)):
    print(i, iris.target[i], iris.data[i])

0 0 [5.1 3.5 1.4 0.2]
1 0 [4.9 3.  1.4 0.2]
2 0 [4.7 3.2 1.3 0.2]
3 0 [4.6 3.1 1.5 0.2]
4 0 [5.  3.6 1.4 0.2]
5 0 [5.4 3.9 1.7 0.4]
6 0 [4.6 3.4 1.4 0.3]
7 0 [5.  3.4 1.5 0.2]
8 0 [4.4 2.9 1.4 0.2]
9 0 [4.9 3.1 1.5 0.1]
10 0 [5.4 3.7 1.5 0.2]
11 0 [4.8 3.4 1.6 0.2]
12 0 [4.8 3.  1.4 0.1]
13 0 [4.3 3.  1.1 0.1]
14 0 [5.8 4.  1.2 0.2]
15 0 [5.7 4.4 1.5 0.4]
16 0 [5.4 3.9 1.3 0.4]
17 0 [5.1 3.5 1.4 0.3]
18 0 [5.7 3.8 1.7 0.3]
19 0 [5.1 3.8 1.5 0.3]
20 0 [5.4 3.4 1.7 0.2]
21 0 [5.1 3.7 1.5 0.4]
22 0 [4.6 3.6 1.  0.2]
23 0 [5.1 3.3 1.7 0.5]
24 0 [4.8 3.4 1.9 0.2]
25 0 [5.  3.  1.6 0.2]
26 0 [5.  3.4 1.6 0.4]
27 0 [5.2 3.5 1.5 0.2]
28 0 [5.2 3.4 1.4 0.2]
29 0 [4.7 3.2 1.6 0.2]
30 0 [4.8 3.1 1.6 0.2]
31 0 [5.4 3.4 1.5 0.4]
32 0 [5.2 4.1 1.5 0.1]
33 0 [5.5 4.2 1.4 0.2]
34 0 [4.9 3.1 1.5 0.2]
35 0 [5.  3.2 1.2 0.2]
36 0 [5.5 3.5 1.3 0.2]
37 0 [4.9 3.6 1.4 0.1]
38 0 [4.4 3.  1.3 0.2]
39 0 [5.1 3.4 1.5 0.2]
40 0 [5.  3.5 1.3 0.3]
41 0 [4.5 2.3 1.3 0.3]
42 0 [4.4 3.2 1.3 0.2]
43 0 [5.  3.5 1.6 0.6]
...

### Splitting up our dataset

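Set aside one example of each species for testing. The dataset is ordered by species in blocks of 50, so indices 0, 50 and 100 pick one setosa, one versicolor and one virginica.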

In [16]:
test_idx = [0, 50, 100]


# Training data: remove the three chosen examples
# from both the labels (target) and the features (data)
train_target = np.delete(iris.target, test_idx)
train_data = np.delete(iris.data, test_idx, axis=0)

# Testing data: just the three examples we removed
test_target = iris.target[test_idx]
test_data = iris.data[test_idx]
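
To confirm the split did what we intended, the sketch below repeats the same steps in a self-contained form and prints the resulting shapes. It uses only the names from the cell above; the printed checks are an addition for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
test_idx = [0, 50, 100]

# Training data: everything except the three held-out rows
train_target = np.delete(iris.target, test_idx)
train_data = np.delete(iris.data, test_idx, axis=0)

# Testing data: only the three held-out rows
test_target = iris.target[test_idx]
test_data = iris.data[test_idx]

# Three rows should have moved from training to testing
print(train_data.shape, test_data.shape)

# The held-out labels should cover one example of each species
print(test_target)
```

Note that `axis=0` matters for `train_data`: without it, `np.delete` would flatten the 2-D feature array before deleting.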



Just like before, we can create a decision tree classifier and train it.

In [20]:
from sklearn import tree

# create object instance 
clf = tree.DecisionTreeClassifier()

# train it on our training data
clf.fit(train_data, train_target)

DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',
                       max_depth=None, max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort='deprecated',
                       random_state=None, splitter='best')

In [22]:
print(test_target)

[0 1 2]


In [24]:
print(clf.predict(test_data))

[0 1 2]


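The predicted labels match test_target, so the classifier got all three test examples right. With only three test examples, this is a quick sanity check rather than a thorough evaluation.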
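
One simple way to see how the tree reaches these predictions is to print its learned rules as text. The sketch below assumes a scikit-learn version that provides `tree.export_text` (0.21 or later). `random_state=0` is an assumption added here so the tree is reproducible; the original cell does not set it.

```python
import numpy as np
from sklearn import tree
from sklearn.datasets import load_iris

iris = load_iris()
test_idx = [0, 50, 100]

# Same split as above: hold out one example per species
train_target = np.delete(iris.target, test_idx)
train_data = np.delete(iris.data, test_idx, axis=0)

# random_state is fixed only for reproducibility (an assumption, not in the original)
clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(train_data, train_target)

# Print the tree as nested if/else rules on the feature values
rules = tree.export_text(clf, feature_names=list(iris.feature_names))
print(rules)

# Translate the predicted labels into species names
predicted = clf.predict(iris.data[test_idx])
print(iris.target_names[predicted])
```

Each indented line in the printed rules is a threshold test on one feature, and each leaf shows the class the tree assigns. This readability is the reason decision trees were described as easy to comprehend at the start.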