### CART Implementation: Sklearn


According to the sklearn [documentation](https://scikit-learn.org/stable/modules/tree.html):
>scikit-learn uses an optimised version of the CART algorithm; however, scikit-learn implementation does not support categorical variables for now.

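As the quote shows, the tree module in scikit-learn implements (an optimized version of) CART. The rest of this section briefly shows how to use it.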
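To make the CART connection concrete, here is a minimal sketch, assuming a recent scikit-learn. `DecisionTreeClassifier` uses the Gini impurity by default (`criterion="gini"`) and grows binary trees, so every internal node has exactly two children. The sketch recomputes the root node's Gini impurity by hand and compares it with the fitted tree's `tree_.impurity`; the helper `gini` is hypothetical and exists only for illustration.

```python
import numpy as np
from sklearn import tree
from sklearn.datasets import load_iris


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a label array (hypothetical helper)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


iris = load_iris()
clf = tree.DecisionTreeClassifier(random_state=0)  # criterion="gini" is the default
clf.fit(iris.data, iris.target)

# Node 0 is the root; its impurity is computed over all 150 training samples
root_manual = gini(iris.target)
root_sklearn = clf.tree_.impurity[0]
print(root_manual, root_sklearn)

# CART trees are binary: a node is either a leaf (both children are -1)
# or an internal node with exactly two children
left = clf.tree_.children_left
right = clf.tree_.children_right
is_leaf = (left == -1) & (right == -1)
is_internal = (left != -1) & (right != -1)
print("binary tree:", bool(np.all(is_leaf | is_internal)))
```

With 50 samples in each of the three iris classes, the root impurity is 1 - 3 * (1/3)^2 = 2/3.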

```python
from sklearn import tree
import pandas as pd
from sklearn.datasets import load_iris
```
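The quoted documentation notes that the tree estimators do not accept categorical variables directly, so string-valued features have to be converted to numbers first. One common workaround is one-hot encoding with `pandas.get_dummies`. The sketch below assumes a small hypothetical DataFrame; the `color`, `size` and `label` columns are made up for illustration.

```python
import pandas as pd
from sklearn import tree

# Hypothetical toy data with one categorical and one numeric feature
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green", "red", "blue"],
    "size": [1.0, 2.5, 3.0, 2.0, 1.5, 3.5],
    "label": [0, 1, 1, 1, 0, 1],
})

# One-hot encode the categorical column; the untouched "size" column comes
# first, followed by one indicator column per category (sorted by name)
X = pd.get_dummies(df[["color", "size"]], columns=["color"])
y = df["label"]

clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

print(list(X.columns))
print(clf.predict(X))
```

When predicting on new rows, the same dummy columns must be produced (for example with `reindex(columns=X.columns, fill_value=0)` on the encoded frame); otherwise the feature layout will not match the one the tree was trained on.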

#### Classification

```python
X = [[0, 0], [1, 1]]
Y = [0, 1]
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, Y)
```
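With only two training points, the fitted tree needs exactly one split. A quick way to inspect it, assuming scikit-learn 0.21 or later, is `tree.export_text` together with `get_depth()` and `get_n_leaves()`. The feature names `x0` and `x1` are chosen here for readability.

```python
from sklearn import tree

X = [[0, 0], [1, 1]]
Y = [0, 1]
clf = tree.DecisionTreeClassifier(random_state=0).fit(X, Y)

# One split at the root, two pure leaves
print("depth:", clf.get_depth())
print("leaves:", clf.get_n_leaves())

# Text rendering of the learned rules. Both features separate the two points
# equally well, so which one the root uses depends on the random feature
# order (fixed here by random_state)
rules = tree.export_text(clf, feature_names=["x0", "x1"])
print(rules)
```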

```python
# predict() expects a 2D array-like of shape (n_samples, n_features)
test_x = [[2.0, 2.0]]
clf.predict(test_x)
```

```
array([1])
```

```python
clf.predict_proba(test_x)
```

```
array([[ 0.,  1.]])
```
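For a classification tree, `predict_proba` returns the class distribution of the training samples in the leaf the input falls into, and `predict` returns the most probable class. The sketch below checks this on iris with a depth-limited tree (`max_depth=2` is an arbitrary choice so that some leaves are impure). It uses `apply` to find each sample's leaf and normalizes `tree_.value`, because depending on the scikit-learn version that array stores either class counts or class fractions.

```python
import numpy as np
from sklearn import tree
from sklearn.datasets import load_iris

iris = load_iris()
clf = tree.DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

X_new = iris.data[::30]  # a few samples to query

proba = clf.predict_proba(X_new)
leaves = clf.apply(X_new)  # index of the leaf each sample lands in

# Class distribution stored at each of those leaves, normalized to sum to 1
leaf_values = clf.tree_.value[leaves, 0, :]
leaf_dist = leaf_values / leaf_values.sum(axis=1, keepdims=True)

print(np.allclose(proba, leaf_dist))
# predict() picks the class with the highest probability
print(np.array_equal(clf.predict(X_new), clf.classes_[np.argmax(proba, axis=1)]))
```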

```python
# Visualization demo
iris = load_iris()
# Preview the data: the first 5 samples
print('Features:\n', iris.data[:5])
print('Classes:\n', iris.target[:5])
```

```
Features:
 [[ 5.1  3.5  1.4  0.2]
 [ 4.9  3.   1.4  0.2]
 [ 4.7  3.2  1.3  0.2]
 [ 4.6  3.1  1.5  0.2]
 [ 5.   3.6  1.4  0.2]]
Classes:
 [0 0 0 0 0]
```
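`pandas` is imported in the first cell but not otherwise used; one natural use is to view the same iris data as a table. A minimal sketch, where the `target` column name is chosen here for illustration:

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
# One column per feature, named after iris.feature_names
df = pd.DataFrame(iris.data, columns=iris.feature_names)
# Append the class label as an extra column
df["target"] = iris.target

print(df.head())
print(df.shape)
```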


```python
clf = tree.DecisionTreeClassifier()
clf = clf.fit(iris.data, iris.target)
```

```python
import graphviz

dot_data = tree.export_graphviz(clf, out_file=None,
                                feature_names=iris.feature_names,
                                class_names=iris.target_names,
                                filled=True, rounded=True,
                                special_characters=True)
graph = graphviz.Source(dot_data)
# Render to iris.pdf, which stores the decision tree
graph.render("iris")
```

```
'iris.pdf'
```

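```python
# One way to change the image size: set the DOT graph's "size" attribute
import pydotplus
pydot_graph = pydotplus.graph_from_dot_data(dot_data)
# "10,8!" means 10 x 8 inches; the "!" also scales a smaller drawing up to that size
pydot_graph.set_size('"10,8!"')
pydot_graph.write_png('resized_tree.png')
```

```
True
```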

![](./resized_tree.png)

The resizing approach above is based on [this Stack Overflow discussion](https://stackoverflow.com/questions/51346827/how-can-i-specify-the-figsize-of-a-graphviz-representation-of-decision-tree).

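Note that the following code only changes the size of figures produced by matplotlib; it has no effect on the Graphviz output above:
```python
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = (10, 10)
```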
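As a matplotlib-based alternative to Graphviz, scikit-learn (0.21 or later) provides `tree.plot_tree`, and there the figure size is controlled the usual matplotlib way: globally through `rcParams`, or per figure via `figsize`. A minimal sketch, assuming the non-interactive `Agg` backend and a hypothetical output file name `iris_tree_mpl.png`:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
from sklearn import tree
from sklearn.datasets import load_iris

iris = load_iris()
clf = tree.DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# figsize (in inches) sets the size of this figure only
fig, ax = plt.subplots(figsize=(10, 8))
tree.plot_tree(
    clf,
    feature_names=iris.feature_names,
    class_names=list(iris.target_names),
    filled=True,
    rounded=True,
    ax=ax,
)
fig.savefig("iris_tree_mpl.png")
plt.close(fig)
```

Passing `figsize` to `plt.subplots` keeps the setting local to one plot, whereas changing `rcParams` affects every figure created afterwards.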

#### Regression

```python
from sklearn import tree
X = [[0, 0], [2, 2]]
y = [0.5, 2.5]
clf = tree.DecisionTreeRegressor()
clf = clf.fit(X, y)
clf.predict([[1, 1]])
```

```
array([ 0.5])
```
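For regression, each leaf predicts the mean of the training targets that fall into it, and splits are chosen to minimize squared error (the default criterion, named `"squared_error"` in scikit-learn 1.0 and later and `"mse"` before). This explains the output above: the split threshold lies halfway between 0 and 2, so `[[1, 1]]` lands in the same leaf as `[0, 0]` and gets 0.5. The sketch below checks the leaf-mean property on hypothetical 1-D data with `max_depth=1`:

```python
import numpy as np
from sklearn import tree

# Hypothetical 1-D data with two clear groups
X = np.array([[1], [2], [3], [10], [11], [12]], dtype=float)
y = np.array([1.0, 1.2, 0.8, 5.0, 5.5, 4.5])

reg = tree.DecisionTreeRegressor(max_depth=1, random_state=0).fit(X, y)

# The single split falls halfway between 3 and 10
print("threshold:", reg.tree_.threshold[0])

# Each leaf's stored prediction equals the mean of the training targets in it
leaves = reg.apply(X)
for leaf in np.unique(leaves):
    manual_mean = y[leaves == leaf].mean()
    stored = reg.tree_.value[leaf, 0, 0]
    print(leaf, manual_mean, stored)

print(reg.predict([[2.5], [11.0]]))  # approximately [1.0, 5.0]
```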