import IPython.display
IPython.display.display_latex(IPython.display.Latex(filename="../macros.tex"))

### decision trees

<img src="images/dTree.PNG">

Each path from the root to a leaf is a logical rule, e.g.:

("checking account" < 200) AND ("Duration" < 2) AND ("credit history" = Bad) $\Rightarrow$ BAD

#### construction

* a predicate in each internal node, a class label in each leaf
* predicate outcomes (answers) on the edges

_At each step we want the "best" split of the current leaf_

**Entropy**
$$S = -\sum_{i = 1}^{K}p_i \log_2 p_i$$
$p_i$: fraction (probability) of objects of the $i$-th class in the leaf, $K$: number of classes

Example. A leaf contains:
* class 1: 9 objects
* class 2: 11 objects

$p_1 =\frac{9}{20}$

$p_2 =\frac{11}{20}$

$$S_0 = - \frac{9}{20}\log_2\Big(\frac{9}{20}\Big) - \frac{11}{20}\log_2\Big(\frac{11}{20}\Big)\approx 0.99$$
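The entropy formula can be computed directly from the per-class object counts of a leaf. A minimal sketch (the function name `entropy` is our own choice, not from any library):

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of a leaf, given the number of objects per class."""
    total = sum(counts)
    s = 0.0
    for n in counts:
        if n > 0:  # by convention 0 * log2(0) = 0, so empty classes are skipped
            p = n / total
            s -= p * math.log2(p)
    return s

print(round(entropy([9, 11]), 3))  # S_0 for the leaf above: 0.993
```

A pure leaf (all objects in one class) has entropy 0, and two equally sized classes give the maximum of 1 bit.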

Predicate $P = [f_j > c_k]$: "feature $f_j$ is greater than threshold $c_k$"

$P = true$:
* class 1: 8 objects
* class 2: 5 objects

$P = false$:
* class 1: 1 object
* class 2: 6 objects

$S_1 = - \frac{8}{13}\log_2\Big(\frac{8}{13}\Big) - \frac{5}{13}\log_2\Big(\frac{5}{13}\Big)\approx 0.96$

$S_2 = - \frac{1}{7}\log_2\Big(\frac{1}{7}\Big) - \frac{6}{7}\log_2\Big(\frac{6}{7}\Big)\approx 0.59$

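**Information gain** (IG):
$$IG(P) = S_0 - \sum_{i = 1}^{q}\frac{C_i}{C}S_i$$

$q$: number of child leaves produced by $P$ (here $q = 2$), $C_i$: number of objects in the $i$-th child leaf, $C$: number of objects in the parent leaf

$$IG(f_j > c_k) = S_0 - \frac{13}{20} S_1 - \frac{7}{20} S_2\approx 0.16$$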
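The same calculation in code: a minimal sketch where the parent leaf and each child leaf are given as lists of per-class counts (the names `entropy` and `information_gain` are hypothetical helpers, not a library API):

```python
import math

def entropy(counts):
    """Shannon entropy in bits from per-class counts (empty classes contribute 0)."""
    total = sum(counts)
    return -sum(n / total * math.log2(n / total) for n in counts if n > 0)

def information_gain(parent, children):
    """IG = S_0 - sum_i (C_i / C) * S_i for a parent leaf split into children."""
    c = sum(parent)
    return entropy(parent) - sum(sum(ch) / c * entropy(ch) for ch in children)

# Counts from the example: parent (9, 11), P = true -> (8, 5), P = false -> (1, 6)
print(round(information_gain([9, 11], [[8, 5], [1, 6]]), 2))  # 0.16
```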

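**Choosing the predicate in a leaf**: take the predicate with the largest information gain
$$\arg\max_P IG(P)$$

For threshold predicates this is a search over features $j$ and thresholds $c_k$:
$$\arg\max_{j,\, c_k} IG(f_j > c_k)$$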
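The search over $j$ and $c_k$ can be written as an exhaustive double loop. This is a minimal sketch, assuming the candidate thresholds for every feature are already given; the toy data and the name `best_split` are hypothetical:

```python
import math

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def best_split(X, y, thresholds):
    """Return (j, c, ig) maximising IG(f_j > c).

    X: list of feature rows, y: class labels,
    thresholds[j]: candidate thresholds c_k for feature j.
    """
    s0 = entropy(y)
    best = (None, None, -1.0)
    for j, candidates in enumerate(thresholds):
        for c in candidates:
            left = [yi for xi, yi in zip(X, y) if xi[j] > c]    # P = true
            right = [yi for xi, yi in zip(X, y) if xi[j] <= c]  # P = false
            if not left or not right:
                continue  # trivial split: all objects on one side
            ig = (s0 - len(left) / len(y) * entropy(left)
                     - len(right) / len(y) * entropy(right))
            if ig > best[2]:
                best = (j, c, ig)
    return best

X = [[1, 5], [2, 6], [3, 1], [4, 2]]
y = [0, 0, 1, 1]
print(best_split(X, y, [[1.5, 2.5, 3.5], [1.5, 3.5, 5.5]]))  # (0, 2.5, 1.0)
```

Ties are broken in favour of the first predicate found; a real tree builder would apply this search recursively to each child leaf.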

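**Alternative splitting criteria**

Gini impurity (can replace the entropy $S$ in the gain formula; lower values mean a purer leaf)
$$G = 1 - \sum_k p_k^2$$

Donskoy splitting criterion: the number of pairs of objects from different classes that the predicate separates (to be maximized)
$$D(P) = \big|\{(x_i, x_j)\::\:y_i\neq y_j\: AND\: P(x_i)\neq P(x_j)\}\big|$$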
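Both alternative criteria are short to compute. A minimal sketch; the function names are our own, and `donskoy` counts unordered pairs, which is an assumption (counting ordered pairs doubles every value and does not change which predicate is chosen):

```python
from itertools import combinations

def gini(counts):
    """Gini impurity 1 - sum_k p_k^2 from per-class counts."""
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

def donskoy(X, y, predicate):
    """Number of unordered pairs from different classes separated by the predicate."""
    return sum(
        1
        for i, j in combinations(range(len(y)), 2)
        if y[i] != y[j] and predicate(X[i]) != predicate(X[j])
    )

print(round(gini([9, 11]), 3))  # 0.495

X = [[1], [2], [3], [4]]
y = [0, 0, 1, 1]
print(donskoy(X, y, lambda x: x[0] > 2.5))  # 4: every mixed-class pair is separated
print(donskoy(X, y, lambda x: x[0] > 1.5))  # 2
```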

**Numerical features**:

* sort the distinct values of the feature; they split the number line into intervals
* choose one threshold $c$ from each interval

| object | f1  |
|--------|-----|
| 1      | 22  |
| 2      | 36  |
| 3      | 19  |
| 4      | 36  |
| 5      | 54  |

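$f_1$ sorted distinct values: $19, 22, 36, 54$ $\Rightarrow$ one threshold per interval: $c \in \{0, 20, 30, 40, 60\}$ (the outermost thresholds 0 and 60 put all objects on one side)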
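A common variant of this rule takes the midpoint between each pair of consecutive distinct values and omits the outermost thresholds, since they produce trivial splits. A minimal sketch of that variant (our own helper, not the exact thresholds listed above):

```python
def candidate_thresholds(values):
    """One threshold per gap between consecutive distinct sorted values (the midpoint)."""
    v = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(v, v[1:])]

# f1 column of the table above
print(candidate_thresholds([22, 36, 19, 36, 54]))  # [20.5, 29.0, 45.0]
```

With $m$ distinct values there are at most $m - 1$ non-trivial thresholds per feature, so duplicates such as the two 36s do not add candidates.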

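_sklearn.tree.DecisionTreeClassifier_ main parameters:

* max_depth: maximum depth of the tree
* min_samples_leaf: minimum number of objects required in a leaf
* max_features: number of features considered when searching for the best split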
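A sketch of setting these parameters on the iris data. The values are illustrative, not tuned; `criterion="entropy"` makes scikit-learn choose splits by entropy impurity, which is equivalent to maximizing the information gain defined above (the default is `"gini"`):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

clf = DecisionTreeClassifier(
    criterion="entropy",
    max_depth=3,          # at most 3 levels of predicates
    min_samples_leaf=5,   # every leaf keeps at least 5 objects
    max_features=None,    # consider all features at every split
    random_state=0,
)

# 5-fold cross-validated accuracy
scores = cross_val_score(clf, X, y, cv=5)
print(f"mean CV accuracy: {scores.mean():.3f}")

# Fit on all data and inspect the size of the resulting tree
clf.fit(X, y)
print(clf.get_depth(), clf.get_n_leaves())
```

Limiting depth and leaf size is the usual way to fight the overfitting of fully grown trees.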

from sklearn.datasets import load_iris
from sklearn import tree
import graphviz

# Fit a decision tree with default parameters on the iris dataset
iris = load_iris()
clf = tree.DecisionTreeClassifier()
clf = clf.fit(iris.data, iris.target)

# Export the fitted tree in DOT format and render it to ../images/iris.png
dot_data = tree.export_graphviz(clf, out_file=None,
                                feature_names=iris.feature_names,
                                class_names=iris.target_names,
                                filled=True, rounded=True,
                                special_characters=True)
graph = graphviz.Source(dot_data, format='png')
graph.render("../images/iris")


<img src="../images/iris.png" style="width: 600px;">

* interpretable
* simple
* fast classification
* heterogeneous features and missing values are not a problem
* sensitive to noise
* prone to overfitting

### bayes classifier

Bayes' theorem
$$P(x|y) = \frac{P(y|x)P(x)}{P(y)}$$

Classification problem
$$P(c|x_1, x_2, ...x_{\nbFeatures}) = \frac{P(c)P(x_1, x_2, ...x_{\nbFeatures}|c)}{P(x_1, x_2, ...x_{\nbFeatures})}$$

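$$\alg(x) = \arg\max_{c\in C}\frac{P(c)P(x|c)}{P(x)}$$

The denominator $P(x)$ does not depend on $c$, so it can be dropped:

**$$\alg(x) = \arg\max_{c\in C}P(c)P(x|c) = \arg\max_{c\in C}P(c)p_c(x)$$**

where $p_c(x)$ is the likelihood (class-conditional density) of $x$ in class $c$.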
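The decision rule $\arg\max_c P(c)p_c(x)$ only needs the priors and the class densities. A minimal sketch, assuming two hypothetical classes with known priors and one-dimensional Gaussian densities (in practice both would be estimated from data):

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def bayes_classify(x, priors, densities):
    """a(x) = argmax_c P(c) * p_c(x); the evidence P(x) is the same for all classes."""
    return max(priors, key=lambda c: priors[c] * densities[c](x))

# Hypothetical classes: A ~ N(0, 1) with prior 0.7, B ~ N(2, 1) with prior 0.3
priors = {"A": 0.7, "B": 0.3}
densities = {
    "A": lambda x: gaussian_pdf(x, 0.0, 1.0),
    "B": lambda x: gaussian_pdf(x, 2.0, 1.0),
}
print(bayes_classify(0.5, priors, densities))  # A
print(bayes_classify(2.5, priors, densities))  # B
```

Because class A has the larger prior, the decision boundary shifts towards B: it lies at $x \approx 1.42$ instead of the midpoint 1.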

#### naive bayes

_Features are assumed to be conditionally independent of each other given the class_

$$P(c|x_1, x_2, ...x_{\nbFeatures}) = \frac{P(c)P(x_1, x_2, ...x_{\nbFeatures}|c)}{P(x_1, x_2, ...x_{\nbFeatures})}$$

By the chain rule:

$$
P(c)P(x_1, x_2, ...x_{\nbFeatures}|c) = P(c)P(x_1|c)P(x_2, ...x_{\nbFeatures}|c, x_1)
$$

$$
= P(c)P(x_1|c)P(x_2|c, x_1)\dots P(x_{\nbFeatures}|c, x_1, x_2, \dots, x_{\nbFeatures - 1})
$$

Conditional independence of features:<br>
$P(x_i|c, x_j) = P(x_i|c)$ for $i \neq j$

We have

$$
P(c)P(x_1, x_2, ...x_{\nbFeatures}|c) = P(c)P(x_1|c)P(x_2|c)...P(x_{\nbFeatures}|c) = P(c)\prod_{i = 1}^{\nbFeatures}P(x_i|c)
$$
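Plugging this product into the Bayes decision rule gives the naive Bayes classifier $\alg(x) = \arg\max_{c} P(c)\prod_i P(x_i|c)$. Below is a minimal sketch for categorical features, where every probability is estimated by counting. Two additions are not part of the text above and are our own choices: sums of logarithms replace the product (to avoid numeric underflow), and add-$\alpha$ (Laplace) smoothing prevents a single unseen value from zeroing out a class. The toy data is hypothetical.

```python
import math
from collections import Counter, defaultdict

def fit_naive_bayes(X, y):
    """Estimate P(c) and P(x_i = v | c) by counting (categorical features)."""
    class_counts = Counter(y)
    # feature_counts[c][i][v] = number of objects of class c whose feature i equals v
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)  # distinct values of each feature, used for smoothing
    for xs, c in zip(X, y):
        for i, v in enumerate(xs):
            feature_counts[c][i][v] += 1
            values[i].add(v)
    return class_counts, feature_counts, values

def predict_naive_bayes(model, xs, alpha=1.0):
    """argmax_c [log P(c) + sum_i log P(x_i | c)] with add-alpha smoothing."""
    class_counts, feature_counts, values = model
    n = sum(class_counts.values())

    def score(c):
        s = math.log(class_counts[c] / n)
        for i, v in enumerate(xs):
            num = feature_counts[c][i][v] + alpha
            den = class_counts[c] + alpha * len(values[i])
            s += math.log(num / den)
        return s

    return max(class_counts, key=score)

X = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
     ("rainy", "cool"), ("sunny", "cool")]
y = ["no", "no", "yes", "yes", "yes"]
model = fit_naive_bayes(X, y)
print(predict_naive_bayes(model, ("rainy", "mild")))  # yes
print(predict_naive_bayes(model, ("sunny", "hot")))   # no
```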