# Classification Algorithms Using Trees

## Decision Trees (From Scratch)


```python
import numpy as np
from sklearn.model_selection import train_test_split

from run_algos import utils

# Black code formatter (Optional)
%load_ext lab_black
```

## Decision Trees

[![image.png](https://i.postimg.cc/RCDcYtgh/image.png)](https://postimg.cc/5j8YYXdW)

<br>

[Source](https://www.youtube.com/watch?v=NxEHSAfFlK8&t=287)

<br>

[![image.png](https://i.postimg.cc/nVBF8bkm/image.png)](https://postimg.cc/vD8F9KC8)


### Decisions To Be Made

1. **Split feature**: Which feature should be used for the splitting?
2. **Split point**: At what point in a numerical variable should we split?
3. **When to stop splitting**: When should you stop splitting to keep the tree from growing too large?

### Steps

> The following steps are used to build the Decision Tree classifier from scratch.

#### Training
Given the entire dataset:
1. Calculate the information gain for each possible split.
2. Split the data on the feature (and, for a numerical feature, the threshold value) that gives the highest information gain.
3. Use the resulting split to create the child branches.
4. Repeat steps 1 through 3 on each branch until a stopping criterion is reached.
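
Steps 1 and 2 can be sketched as an exhaustive search over features and thresholds. This is a minimal illustration, not a reference implementation: the function names `entropy` and `best_split` are hypothetical, and it assumes numerical features, binary splits of the form `x <= threshold`, and candidate thresholds taken from the observed feature values.

```python
import numpy as np


def entropy(y):
    """Shannon entropy (in bits) of an array of class labels."""
    _, counts = np.unique(y, return_counts=True)
    probs = counts / counts.sum()
    return float(-np.sum(probs * np.log2(probs)))


def best_split(X, y):
    """Return (feature_index, threshold, gain) for the highest-gain split.

    Returns (None, None, 0.0) when no split improves on the parent node.
    """
    n_samples, n_features = X.shape
    parent_entropy = entropy(y)
    best_feature, best_threshold, best_gain = None, None, 0.0

    for feature in range(n_features):
        # Assumption: every observed value is a candidate threshold.
        for threshold in np.unique(X[:, feature]):
            left = X[:, feature] <= threshold
            n_left = int(left.sum())
            if n_left == 0 or n_left == n_samples:
                continue  # not a real split: one side is empty

            # Weighted average of the children's entropies.
            child_entropy = (n_left / n_samples) * entropy(y[left]) + (
                (n_samples - n_left) / n_samples
            ) * entropy(y[~left])
            gain = parent_entropy - child_entropy

            if gain > best_gain:
                best_feature, best_threshold, best_gain = (
                    feature,
                    float(threshold),
                    gain,
                )

    return best_feature, best_threshold, best_gain


# Toy data: feature 0 separates the classes perfectly, feature 1 does not.
X = np.array([[1.0, 9.0], [2.0, 3.0], [3.0, 8.0], [4.0, 2.0]])
y = np.array([0, 0, 1, 1])
feature, threshold, gain = best_split(X, y)
print(feature, threshold, gain)
```

On this toy data the search selects feature 0 with threshold 2.0, since that split leaves both children pure.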

#### Making Predictions
Given a data point:

1. Traverse the tree until you reach a leaf node.
2. Return the most common class label at that leaf: if the leaf is pure, return its single class label; otherwise, return the majority vote.
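
The traversal above could look roughly like the following. The `Node` class, its fields, and the helper names are hypothetical and only serve the illustration; it assumes binary splits of the form `x[feature] <= threshold` and that each leaf stores the majority label of the training samples that reached it.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical tree node: internal nodes hold a split, leaves hold a label."""

    feature: Optional[int] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: Optional[int] = None  # only set on leaf nodes

    def is_leaf(self):
        return self.value is not None


def majority_label(labels):
    """Most common label; for a pure leaf this is simply its only label."""
    return Counter(labels).most_common(1)[0][0]


def predict_one(x, node):
    """Walk from the root to a leaf and return the label stored there."""
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value


# A hand-built stump: the left leaf is pure, the right leaf needs a majority vote.
tree = Node(
    feature=0,
    threshold=2.0,
    left=Node(value=majority_label([0, 0])),
    right=Node(value=majority_label([1, 1, 0])),
)
print(predict_one([1.5], tree))  # 0
print(predict_one([3.0], tree))  # 1
```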


#### Important Terms

* **Entropy**: This measures how impure (mixed) the class labels in a node are, i.e. how random or unpredictable the node is. For a binary problem, the entropy is largest (1 bit) when the node holds 50% of each class, and it is 0 when the node is pure.

$$
E = - \sum_{i=1}^{C} p_{i} \log_{2}(p_{i})
$$

where:

$p_{i}$ is the probability of randomly picking an element of class $i$ from the node, i.e. the fraction of the node's samples that belong to class $i$.

$C$ is the total number of classes. For a binary problem, $C = 2$, i.e. $C_{unique} = [0, 1]$.
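
As a quick check of the formula, here is a small sketch that computes the entropy of a label array with NumPy. The function name `entropy` is an assumption made for this example; it estimates each $p_{i}$ as the fraction of labels equal to class $i$.

```python
import numpy as np


def entropy(y):
    """Shannon entropy (in bits) of an array of class labels."""
    _, counts = np.unique(y, return_counts=True)
    probs = counts / counts.sum()  # p_i for every class present in y
    return float(-np.sum(probs * np.log2(probs)))


balanced = entropy(np.array([0, 0, 1, 1]))  # 50/50 split: maximum for 2 classes
skewed = entropy(np.array([0, 0, 0, 1]))  # mostly one class: lower entropy
print(balanced)  # 1.0
print(skewed)
```

Only classes that actually occur in `y` contribute to the sum, so the code never evaluates $\log_{2}(0)$.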


* **Information Gain (IG)**: This measures the quality of the splits. i.e. it measures how much entropy was removed by splitting on a feature.

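$$
IG = E_{parent} - \sum_{k} \frac{n_{k}}{n} E_{child_{k}}
$$

where $n_{k}$ is the number of samples in child $k$ and $n$ is the number of samples in the parent, so the sum is the weighted average of the children's entropies.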
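
To make the weighted average concrete, the sketch below computes the information gain of a binary split. The names `entropy` and `information_gain` are hypothetical, and the example assumes exactly two children (left and right).

```python
import numpy as np


def entropy(y):
    """Shannon entropy (in bits) of an array of class labels."""
    _, counts = np.unique(y, return_counts=True)
    probs = counts / counts.sum()
    return float(-np.sum(probs * np.log2(probs)))


def information_gain(y_parent, y_left, y_right):
    """Parent entropy minus the size-weighted average entropy of the two children."""
    n = len(y_parent)
    weighted_children = (len(y_left) / n) * entropy(y_left) + (
        len(y_right) / n
    ) * entropy(y_right)
    return entropy(y_parent) - weighted_children


y = np.array([0, 0, 1, 1])

# A split that separates the classes removes all of the entropy.
perfect = information_gain(y, y[:2], y[2:])
# A split that leaves both children as mixed as the parent removes none.
useless = information_gain(y, y[[0, 2]], y[[1, 3]])
print(perfect)  # 1.0
print(useless)  # 0.0
```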

### Stopping Criteria

1. **Maximum depth**: The maximum depth the tree is allowed to reach.
2. **Minimum number of samples**: The minimum number of samples a node must have before it can be split.
3. **Minimum impurity decrease**: The minimum decrease in entropy (i.e. the minimum information gain) a split must achieve to take place.
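
These criteria can be checked with a couple of small helpers while the tree is being grown. This is only a sketch: the function names and default values are hypothetical, the parameter names are chosen to echo the criteria above, and a pure node is also treated as a reason to stop since no split can improve it.

```python
import numpy as np


def should_stop(y, depth, max_depth=10, min_samples_split=2):
    """Return True when the node holding labels `y` should become a leaf."""
    is_pure = len(np.unique(y)) == 1
    return bool(depth >= max_depth or len(y) < min_samples_split or is_pure)


def split_is_worthwhile(gain, min_impurity_decrease=0.0):
    """Return True when a candidate split removes enough entropy to be kept."""
    return gain > 0.0 and gain >= min_impurity_decrease


print(should_stop(np.array([0, 1, 1]), depth=3, max_depth=3))  # True: depth limit
print(should_stop(np.array([1, 1, 1]), depth=0))  # True: node is pure
print(should_stop(np.array([0, 1]), depth=0))  # False: keep splitting
print(split_is_worthwhile(0.05, min_impurity_decrease=0.1))  # False: gain too small
```

A recursive tree builder would call `should_stop` before searching for a split and `split_is_worthwhile` on the best gain it finds.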