<h1 align=center> Decision Tree In Depth </h1>

- Supervised ML algorithm
- Used for regression and classification
- Does not require feature scaling
- Fairly robust to outliers, although in a deep tree outliers can still influence the splits and make the tree unstable
- For an imbalanced dataset, balancing the classes before training is recommended
- Fast to train, can be visualized, and is easy to explain

![destree.jpg](attachment:destree.jpg)


### Entropy

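⇒ Measures the impurity of a set of examples $S$; it is used in classification problems. Here $p_i$ is the fraction of examples in $S$ that belong to class $i$, and $c$ is the number of classes.

$$
H(S) = \sum \limits _{i=1} ^{c} -p_{i} \log_2 p_i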
$$
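
As a rough illustration of the formula above, here is a minimal sketch in plain Python. The function name `entropy` and the toy labels are assumptions made for this example, not part of the original notebook.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    # Sum -p_i * log2(p_i) over the classes that actually occur.
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


# Hypothetical toy labels: an evenly mixed node and a pure node.
print(entropy(["cat"] * 5 + ["dog"] * 5))  # 1.0
print(entropy(["cat"] * 10))               # 0.0
```

An evenly mixed two-class node has the maximum entropy of 1 bit, while a pure node has entropy 0.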


### **Choosing a split:** Information Gain

⇒ Measures how much information a feature gives us about the class. The split with the highest information gain is selected. Here $S_v$ is the subset of $S$ for which feature $A$ takes the value $v$.

$$
\text{Information Gain}(S, A) = H(S) - \sum \limits _{v \in \text{Values}(A)} \frac {|S_v|}{|S|} H(S_v)
$$
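
The formula above can be sketched in a few lines of Python. This is only an illustration: the helper names `entropy` and `information_gain` and the toy data are assumptions, not code from the original notebook.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """H(S) minus the size-weighted entropy of each subset S_v."""
    subsets = defaultdict(list)
    for value, label in zip(feature_values, labels):
        subsets[value].append(label)
    n = len(labels)
    weighted = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted


# Hypothetical toy data: feature_a separates the classes, feature_b does not.
labels = [1, 1, 0, 0]
feature_a = ["x", "x", "y", "y"]
feature_b = ["x", "y", "x", "y"]
print(information_gain(feature_a, labels))  # 1.0
print(information_gain(feature_b, labels))  # 0.0
```

A feature that separates the classes perfectly recovers all of the parent entropy, while an uninformative feature gains nothing.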


### Gini Impurity

⇒ Gini impurity serves the same purpose as entropy but is more efficient: it takes less time to compute because it does not require a logarithm

$$
Gini = 1 - \sum \limits _{i=1} ^{c} (p_i)^2
$$
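
For comparison with entropy, the Gini formula above can be sketched as follows. The function name `gini` and the toy labels are assumptions made for this example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_i ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


# Hypothetical toy labels: an evenly mixed node and a pure node.
print(gini(["cat", "dog"]))  # 0.5
print(gini(["cat", "cat"]))  # 0.0
```

In scikit-learn, `DecisionTreeClassifier` accepts `criterion="gini"` or `criterion="entropy"` to choose between the two measures.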



### How a Decision Tree Works:

- Start with all examples at the root node
- Calculate the information gain for all possible features, and pick the one with the highest information gain
- Split the dataset according to the selected feature, and create the left and right branches of the tree
- Keep repeating the splitting process until a stopping criterion is met:
    - When a node is 100% one class
    - When splitting a node would result in the tree exceeding a maximum depth
    - When the information gain from additional splits is less than a threshold
    - When the number of examples in a node is below a threshold
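
The steps above can be sketched as a small recursive function. This is a simplified illustration, not how any particular library implements it: it assumes every feature is binary (0/1), and the names `build_tree`, `split_gain`, `predict`, the nested-dict tree representation, and the default thresholds are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_gain(X, y, feature):
    """Information gain from splitting on one binary (0/1) feature."""
    left = [label for row, label in zip(X, y) if row[feature] == 1]
    right = [label for row, label in zip(X, y) if row[feature] == 0]
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
    return entropy(y) - weighted


def build_tree(X, y, depth=0, max_depth=3, min_gain=1e-9, min_samples=2):
    """Recursively build a tree as nested dicts (hypothetical representation)."""
    majority = Counter(y).most_common(1)[0][0]
    # Stop: the node is pure, the depth limit is reached, or too few examples.
    if len(set(y)) == 1 or depth >= max_depth or len(y) < min_samples:
        return {"leaf": majority}
    # Pick the feature with the highest information gain.
    gains = {f: split_gain(X, y, f) for f in range(len(X[0]))}
    best = max(gains, key=gains.get)
    left = [i for i, row in enumerate(X) if row[best] == 1]
    right = [i for i, row in enumerate(X) if row[best] == 0]
    # Stop: the best split gains too little or leaves one branch empty.
    if gains[best] < min_gain or not left or not right:
        return {"leaf": majority}
    limits = (depth + 1, max_depth, min_gain, min_samples)
    return {
        "feature": best,
        "left": build_tree([X[i] for i in left], [y[i] for i in left], *limits),
        "right": build_tree([X[i] for i in right], [y[i] for i in right], *limits),
    }


def predict(tree, row):
    """Follow the branches until a leaf is reached."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["feature"]] == 1 else tree["right"]
    return tree["leaf"]


# Hypothetical toy data: feature 0 determines the class, feature 1 is noise.
X = [[1, 0], [1, 1], [0, 0], [0, 1]]
y = ["cat", "cat", "dog", "dog"]
tree = build_tree(X, y)
print(tree)                   # {'feature': 0, 'left': {'leaf': 'cat'}, 'right': {'leaf': 'dog'}}
print(predict(tree, [0, 1]))  # dog
```

Each stopping criterion from the list maps to one check in `build_tree`. Real implementations also handle continuous features by searching over split thresholds, which this sketch leaves out.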

### Classification Problem Practical Example:

- Below is our data
![cla_tree.png](attachment:cla_tree.png)

- For each feature, we compute the entropy of the resulting split and its information gain


![info_cla.png](attachment:info_cla.png)

- From the above picture, we see that the ear shape has the highest information gain, so we select it as the root node
- Then, for the right child of the root node, we evaluate the face shape and the whiskers as candidate splits; the one with the higher information gain is selected

![final split.png](<attachment:final split.png>)

- Based on the information gain, we select the split shown above

### Regression Problem Practical Example:
- Below is our data
![reg_tree.png](attachment:reg_tree.png)


- In a regression problem, the target is a continuous value
- Instead of entropy, we measure impurity with the variance of the target values, and we choose the split that reduces the variance the most

![reg_split.png](attachment:reg_split.png)

- From the above picture, the ear shape is selected as the root node because it gives the highest gain, which in regression is the reduction in variance
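
The variance-based criterion can be sketched in the same way as information gain. This is a minimal illustration that assumes a binary (0/1) feature and uses the population variance. The function names and the toy values are hypothetical and are not the data shown in the pictures.

```python
def variance(values):
    """Population variance of a list of numbers (0.0 for an empty list)."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def variance_reduction(feature_values, targets):
    """Parent variance minus the size-weighted variance of the two branches."""
    left = [t for f, t in zip(feature_values, targets) if f == 1]
    right = [t for f, t in zip(feature_values, targets) if f == 0]
    n = len(targets)
    weighted = len(left) / n * variance(left) + len(right) / n * variance(right)
    return variance(targets) - weighted


# Hypothetical toy targets: the first feature groups similar values together.
targets = [10.0, 12.0, 20.0, 22.0]
print(variance_reduction([1, 1, 0, 0], targets))  # 25.0
print(variance_reduction([1, 0, 1, 0], targets))  # 1.0
```

The feature with the larger reduction would be chosen for the split, just as the feature with the highest information gain is chosen in classification.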

In [3]:
# Decision Tree Classification Problem
# Each score is the accuracy on one of 10 cross-validation folds of the Iris dataset
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(random_state=0)
iris = load_iris()
cross_val_score(clf, iris.data, iris.target, cv=10)

array([1.        , 0.93333333, 1.        , 0.93333333, 0.93333333,
       0.86666667, 0.93333333, 1.        , 1.        , 1.        ])
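
Since a decision tree can be visualized and its stopping criteria can be tuned, here is a short sketch that fits a shallow tree on the same Iris data and prints its rules as text. The values `max_depth=2` and `min_samples_leaf=5` are arbitrary choices for illustration, not tuned settings.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# max_depth and min_samples_leaf act as stopping criteria (illustrative values).
clf = DecisionTreeClassifier(max_depth=2, min_samples_leaf=5, random_state=0)
clf.fit(iris.data, iris.target)

# Render the learned splits as indented if/else style rules.
rules = export_text(clf, feature_names=iris.feature_names)
print(rules)
```

Limiting the depth keeps the printed rules short enough to read and explain.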

In [4]:
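# Decision Tree Regression Problem
# Each score is the R^2 on one of 10 cross-validation folds of the diabetes dataset
# (a negative R^2 means the tree does worse than always predicting the mean)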
from sklearn.datasets import load_diabetes
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
regressor = DecisionTreeRegressor(random_state=0)
cross_val_score(regressor, X, y, cv=10)

array([-0.39292219, -0.46749346,  0.02768473,  0.06441362, -0.50323135,
        0.16437202,  0.11242982, -0.73798979, -0.30953155, -0.00137327])