## **Decision Trees**

In [1]:
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

In [2]:
iris = load_iris()
X = iris.data[:, 2:]  # keep only petal length and petal width
y = iris.target

tree_clf = DecisionTreeClassifier(max_depth=2)
tree_clf.fit(X, y)

In [3]:
from sklearn.tree import export_graphviz

export_graphviz(
    tree_clf,
    out_file="iris.dot",
    feature_names=iris.feature_names[2:],
    class_names=iris.target_names,
    rounded=True,
    filled=True
)

![image.png](attachment:image.png)

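The Gini impurity of node $i$ is calculated as:<br>
$$G_i = 1 - \sum_{k=1}^nP_{i,k}^2$$

where $P_{i,k}$ is the fraction of training samples in node $i$ that belong to class $k$, and $n$ is the number of classes.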

For the depth-2 left node, which holds 0, 49, and 5 samples of the three classes (54 in total):<br>
$$1 - (0/54)^2 - (49/54)^2 - (5/54)^2 \approx 0.168$$
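
The same arithmetic can be checked in a few lines of plain Python. This is only a sketch: the helper name `gini` is made up for illustration and is not part of scikit-learn, and returning 0 for an empty node is an assumed convention.

```python
def gini(class_counts):
    """Gini impurity of a node, given the number of samples per class."""
    total = sum(class_counts)
    if total == 0:
        return 0.0  # assumed convention for an empty node
    return 1.0 - sum((count / total) ** 2 for count in class_counts)


# Depth-2 left node: 0, 49 and 5 samples of the three classes
left_node_gini = gini([0, 49, 5])
print(round(left_node_gini, 3))  # 0.168
```

A pure node (all samples in one class) gives 0, which is the lowest possible impurity.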

In [4]:
tree_clf.predict_proba([[5, 1.5]])

array([[0.        , 0.90740741, 0.09259259]])

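**Sklearn trees use the CART algorithm**, meaning they "grow" recursively: at each node the algorithm greedily picks the feature and threshold that give the purest pair of child nodes (weighted by their size), then repeats on each child.<br>
I have implemented it from scratch before, so I won't go into more detail here.<br>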
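
As a reminder of what one step of that recursive growth looks like, here is a minimal sketch of a split search on a single feature. It is not scikit-learn's actual implementation (which is optimized and differs in detail); the function names and the toy data are made up for illustration, and using midpoints between distinct values as candidate thresholds is an assumption of this sketch.

```python
from collections import Counter


def gini_from_labels(labels):
    """Gini impurity computed directly from a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best binary split on one feature."""
    m = len(values)
    best_threshold, best_cost = None, float("inf")
    distinct = sorted(set(values))
    # Candidate thresholds: midpoints between consecutive distinct values
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        # Child impurities weighted by the share of samples in each child
        cost = (len(left) / m) * gini_from_labels(left) \
             + (len(right) / m) * gini_from_labels(right)
        if cost < best_cost:
            best_threshold, best_cost = threshold, cost
    return best_threshold, best_cost


# Toy data (made up): one feature, two perfectly separable classes
petal_length = [1.4, 1.3, 1.5, 4.7, 4.5, 5.1]
species = [0, 0, 0, 1, 1, 1]
print(best_split(petal_length, species))  # (3.0, 0.0)
```

A full tree would run this search over every feature, split on the winner, and recurse into both children until a stopping condition such as `max_depth` is reached.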

Decision trees are prone to overfitting, so it's important to constrain them with regularization hyperparameters, such as:<br>

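- min_samples_split - the minimum number of samples a node must have before it can be split
- min_samples_leaf - the minimum number of samples a leaf must have
- min_weight_fraction_leaf - same as min_samples_leaf, but expressed as a fraction of the total number of weighted samples
- max_leaf_nodes - the maximum number of leaf nodes
- and more (for example max_depth and max_features)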
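
A quick way to see the effect of these constraints is to compare the number of leaves in an unconstrained and a constrained tree. This is a sketch only: the hyperparameter values are arbitrary picks for illustration, not tuned recommendations, and the variable names are made up.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data[:, 2:], iris.target  # petal length and width

# Unconstrained tree: splits until leaves are pure or cannot be split further
free_tree = DecisionTreeClassifier(random_state=42).fit(X, y)

# Constrained tree: values below are arbitrary, chosen only for illustration
regularized_tree = DecisionTreeClassifier(
    min_samples_split=10,
    min_samples_leaf=5,
    max_leaf_nodes=4,
    random_state=42,
).fit(X, y)

print(free_tree.get_n_leaves(), regularized_tree.get_n_leaves())
```

The constrained tree can never have more than `max_leaf_nodes` leaves, so it ends up with a simpler decision boundary.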

![image.png](attachment:image.png)

Decision Trees are also used for regression tasks, where the MSE at each node serves as the impurity measure.<br>
Each leaf node predicts the mean target value of the training samples in it.<br>
The MSE is the average squared difference between that mean and each individual sample's target value.
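
The leaf prediction and node MSE can be worked out by hand on a tiny example. This is a plain-Python sketch; the helper names and the target values are made up for illustration.

```python
def leaf_prediction(targets):
    """A regression leaf predicts the mean target of its training samples."""
    return sum(targets) / len(targets)


def node_mse(targets):
    """Mean squared difference between each target and the node's mean."""
    mean = leaf_prediction(targets)
    return sum((t - mean) ** 2 for t in targets) / len(targets)


# Made-up target values of the samples that fall into one node
targets = [1.0, 2.0, 3.0, 6.0]
print(leaf_prediction(targets))  # 3.0
print(node_mse(targets))         # 3.5
```

Here the squared differences are 4, 1, 0 and 9, so the MSE is 14 / 4 = 3.5.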

Decision Trees are very sensitive to small changes in the training data and to outliers.