# Decision Tree

![Image](./image/DecisionTree.png)

- A decision tree model is depicted as a tree-like structure where decision nodes (ovals or rectangles) represent points of feature evaluation that guide the classification path to leaf nodes (rectangles), which make the final prediction.

- The process involves starting at the **root node** and making decisions based on feature values to traverse the tree until reaching a **leaf node** that predicts the subject's classification.

- The feature chosen at each node determines the shape of the tree, so the same data can yield many different trees.
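The root-to-leaf traversal described above can be sketched in a few lines of Python. This is only an illustration: the nested-dict tree layout, the feature names (`pointy_ears`, `round_face`), and the `predict` helper are all assumptions made for this example, not part of the original notes.

```python
# Hypothetical tree: decision nodes are dicts holding a feature name and
# two children; leaf nodes are plain class labels (strings).
tree = {
    "feature": "pointy_ears",
    "yes": {
        "feature": "round_face",
        "yes": "cat",
        "no": "dog",
    },
    "no": "dog",
}

def predict(node, example):
    """Walk from the root to a leaf, following the example's feature values."""
    while isinstance(node, dict):  # still at a decision node
        branch = "yes" if example[node["feature"]] else "no"
        node = node[branch]
    return node  # a leaf holds the predicted class

example = {"pointy_ears": True, "round_face": True}
print(predict(tree, example))
```

Each loop iteration evaluates one feature and moves one level down, so a prediction costs at most as many feature checks as the tree is deep.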

### Training Process

1. **Selecting the Current Node Feature**

2. **Branching Based on Feature Values**

3. **Repeat until a stopping condition is met**
    - All examples in the node belong to a single class, or
    - The depth limit of the tree is reached, or
    - The node reaches the required purity.
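The three steps above can be sketched as a short recursive function. This is a minimal illustration under stated assumptions, not a reference implementation: features are binary (0/1) values stored in dicts, entropy is used as the impurity measure, and the names `entropy`, `split`, `gain`, `build`, the feature names, and `max_depth=3` are all hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def split(rows, labels, feature):
    """Partition (row, label) pairs by whether the binary feature is set."""
    yes = [(r, y) for r, y in zip(rows, labels) if r[feature]]
    no = [(r, y) for r, y in zip(rows, labels) if not r[feature]]
    return yes, no

def gain(rows, labels, feature):
    """Reduction in entropy obtained by splitting on the feature."""
    yes, no = split(rows, labels, feature)
    if not yes or not no:  # the split separates nothing
        return 0.0
    n = len(labels)
    h_yes = entropy([y for _, y in yes])
    h_no = entropy([y for _, y in no])
    return entropy(labels) - (len(yes) / n * h_yes + len(no) / n * h_no)

def build(rows, labels, features, depth=0, max_depth=3):
    majority = Counter(labels).most_common(1)[0][0]
    # Step 3: stop when the node is pure, too deep, or out of features.
    if len(set(labels)) == 1 or depth >= max_depth or not features:
        return majority
    # Step 1: select the feature for the current node.
    best = max(features, key=lambda f: gain(rows, labels, f))
    if gain(rows, labels, best) <= 1e-12:  # no split improves purity
        return majority
    # Step 2: branch on the feature's values and recurse.
    yes, no = split(rows, labels, best)
    rest = [f for f in features if f != best]
    return {
        "feature": best,
        "yes": build([r for r, _ in yes], [y for _, y in yes], rest, depth + 1, max_depth),
        "no": build([r for r, _ in no], [y for _, y in no], rest, depth + 1, max_depth),
    }

# Made-up toy data, for illustration only.
rows = [
    {"pointy_ears": 1, "whiskers": 1},
    {"pointy_ears": 1, "whiskers": 0},
    {"pointy_ears": 0, "whiskers": 1},
    {"pointy_ears": 0, "whiskers": 0},
]
labels = ["cat", "cat", "dog", "dog"]
tree = build(rows, labels, ["pointy_ears", "whiskers"])
print(tree)
```

On this toy data, `pointy_ears` separates the classes perfectly, so the sketch produces a single split with two pure leaves.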

### Key Decisions in Building Decision Trees:

1. **Feature Selection for Splitting**:
   - At each node, decide which feature to split on, aiming to maximize the purity of subclasses. The goal is to achieve subsets of examples that are as homogeneous as possible (all cats or all dogs).

2. **Stopping Criteria**:
   - Determine when to stop splitting, which could be based on achieving pure subsets, reaching a maximum tree depth, achieving minimal improvement in purity from a split, or having too few examples in a node to justify further splitting.
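The four stopping criteria listed above could be combined into a single check, as in the sketch below. The function name `should_stop` and the default thresholds (`max_depth=5`, `min_gain=0.01`, `min_samples=2`) are arbitrary assumptions chosen for illustration; in practice they are tuned per dataset.

```python
def should_stop(labels, depth, best_gain,
                max_depth=5, min_gain=0.01, min_samples=2):
    """Return True if a node should become a leaf.

    labels    : class labels of the examples in the node
    depth     : depth of the node (root = 0)
    best_gain : purity improvement of the best available split
    """
    if len(set(labels)) <= 1:      # subset is already pure
        return True
    if depth >= max_depth:         # maximum tree depth reached
        return True
    if best_gain < min_gain:       # improvement too small to justify a split
        return True
    if len(labels) < min_samples:  # too few examples in the node
        return True
    return False

print(should_stop(["cat", "cat", "cat"], depth=1, best_gain=0.0))
print(should_stop(["cat", "dog", "cat", "dog"], depth=1, best_gain=0.3))
```

The first call stops because the node is pure; the second continues because none of the criteria is met.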

# Entropy:

- Entropy measures the impurity (or uncertainty) of the labels in a dataset.

- **Formula**: Entropy is calculated as $H(p_1) = -p_1 \log_2(p_1) - (1-p_1) \log_2(1-p_1)$, where $p_1$ is the fraction of positive examples in the node. By convention, $0 \log_2(0) = 0$.

- The entropy value ranges from 0 (completely pure, $p_1 = 0$ or $p_1 = 1$) to 1 (completely mixed, $p_1 = 0.5$).
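As a quick check of the formula, the binary entropy can be computed directly. The function name `entropy` and the sample values of $p_1$ are chosen for this sketch only.

```python
import math

def entropy(p1):
    """Binary entropy H(p1) in bits, treating 0 * log2(0) as 0."""
    if p1 == 0 or p1 == 1:
        return 0.0
    return -p1 * math.log2(p1) - (1 - p1) * math.log2(1 - p1)

for p1 in (0.0, 0.2, 0.5, 0.8, 1.0):
    print(p1, round(entropy(p1), 3))
```

The values are symmetric around $p_1 = 0.5$, where the entropy peaks at 1.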


# Choose Feature Based on Entropy

![Image](./image/Entropy.png)

- Choose the feature with the largest information gain: $H(p_1^{root}) - (w^{left}H(p_1^{left}) + w^{right}H(p_1^{right}))$

Here $root$ denotes the node being split, and each weight is the fraction of that node's examples sent to the branch: $w^{branch} = \frac{\text{number of examples in the branch}}{\text{number of examples in the node being split}}$
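The information gain formula can be sketched as follows. The label lists are made-up data for illustration (they are not taken from the figures in these notes), labels are encoded as 1 for the positive class and 0 otherwise, and the helper names are assumptions.

```python
import math

def entropy(p1):
    """Binary entropy H(p1) in bits, treating 0 * log2(0) as 0."""
    if p1 == 0 or p1 == 1:
        return 0.0
    return -p1 * math.log2(p1) - (1 - p1) * math.log2(1 - p1)

def fraction_positive(labels):
    """p1: fraction of positive (1) labels; 0 for an empty branch."""
    return sum(labels) / len(labels) if labels else 0.0

def information_gain(node, left, right):
    """Entropy of the node minus the weighted entropy of its two branches."""
    w_left = len(left) / len(node)
    w_right = len(right) / len(node)
    return entropy(fraction_positive(node)) - (
        w_left * entropy(fraction_positive(left))
        + w_right * entropy(fraction_positive(right))
    )

# Made-up split: 10 examples, 5 positive, divided into two branches of 5.
node = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
left = [1, 1, 1, 1, 0]    # p1 = 0.8
right = [1, 0, 0, 0, 0]   # p1 = 0.2
print(round(information_gain(node, left, right), 3))
```

Running the same function for every candidate feature and keeping the one with the highest value implements the selection rule above.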

![Image](./image/InformationGain.png)

- The entropy of the root node is: $p_1 = \frac{5}{10} = 0.5 \implies H(0.5) = -0.5 \log_2(0.5) - 0.5 \log_2(0.5) = 1$

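## Handling Continuous Features

- **Split the feature's range into intervals**: try candidate thresholds (for example, the midpoints between consecutive sorted values) and keep the threshold that gives the largest information gain.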

![Image](./image/ProcessContinuous.png)
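One common way to split a continuous feature is to test a threshold between each pair of consecutive sorted values and keep the one with the highest information gain. The sketch below assumes that approach; the function name `best_threshold` and the sample values are made up for illustration and do not come from the figure.

```python
import math

def entropy(p1):
    """Binary entropy H(p1) in bits, treating 0 * log2(0) as 0."""
    if p1 == 0 or p1 == 1:
        return 0.0
    return -p1 * math.log2(p1) - (1 - p1) * math.log2(1 - p1)

def best_threshold(values, labels):
    """Return (threshold, gain) for the best split of the form value <= threshold.

    values : continuous feature values
    labels : 0/1 class labels (1 = positive), same length as values
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    h_node = entropy(sum(labels) / n)
    best_t, best_gain = None, 0.0
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:  # no threshold fits between equal values
            continue
        t = (low + high) / 2  # midpoint candidate
        left = [y for v, y in pairs if v <= t]
        right = [y for v, y in pairs if v > t]
        weighted = (len(left) / n * entropy(sum(left) / len(left))
                    + len(right) / n * entropy(sum(right) / len(right)))
        gain = h_node - weighted
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

# Made-up data: a continuous feature and 0/1 labels.
values = [8, 9, 10, 11, 13, 15]
labels = [1, 1, 1, 0, 0, 0]
print(best_threshold(values, labels))
```

With $n$ examples there are at most $n - 1$ candidate thresholds, and the winning threshold then competes with the other features on information gain like any binary split.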

# Decision Tree vs. Neural Network

| Type            | Characteristics |
|-----------------|-----------------|
| Decision Tree   | - Works well on structured (tabular) data; not recommended for unstructured data such as images, audio, and text.<br>- Typically faster to train than neural networks.<br>- Small decision trees can be visually interpreted by humans. |
| Neural Network  | - Efficiently handles all types of data.<br>- Capable of applying transfer learning.<br>- Facilitates easier integration when building systems with
