# <span style="color:green; text-align:center;display:block;">Decision Tree</span>

- A Decision Tree is a supervised machine learning algorithm used for both classification and regression tasks. 
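- It works by recursively splitting the data into subsets, choosing at each step the feature test that best separates the target values, which forms a tree structure.
- At each node, the algorithm asks a question about one feature and follows a branch based on the answer. In binary trees such as CART this is a yes/no question like "is feature ≤ threshold?".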
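
As a minimal sketch of how one such yes/no question can be chosen, the code below tries every threshold `t` on a single numeric feature, asks "x ≤ t?", and keeps the threshold whose two child subsets have the lowest size-weighted Gini impurity (defined further down). The function names `gini` and `best_threshold_split` are hypothetical and for illustration only.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold_split(xs, ys):
    """Try every 'x <= t?' question on one numeric feature and return
    (threshold, weighted Gini impurity of the two children) for the best one."""
    best_t, best_score = None, float("inf")
    n = len(ys)
    # Skip the largest value: 'x <= max' would leave the right child empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


xs = [1, 2, 3, 10, 11, 12]
ys = ["a", "a", "a", "b", "b", "b"]
print(best_threshold_split(xs, ys))  # (3, 0.0): "x <= 3?" separates the classes perfectly
```

Real implementations repeat this search over every feature and often place thresholds at midpoints between consecutive sorted values; this sketch keeps only the core idea.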

## Structure of a Decision Tree
1. <b>Root Node:</b> Represents the entire dataset and the initial decision to be made.
2. <b>Internal Nodes:</b> Represent decisions or tests on attributes. Each internal node has two or more branches (exactly two in binary trees such as CART).
3. <b>Branches:</b> Represent the outcome of a decision or test, leading to another node.
4. <b>Leaf Nodes:</b> Represent the final decision or prediction. No further splits occur at these nodes.
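
The four parts above can be sketched as a small data structure. This assumes a binary tree with "feature ≤ threshold?" tests; the `Node` class and `predict` function are hypothetical names, not taken from any particular library. Prediction starts at the root and follows branches until it reaches a leaf.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Node:
    """One node of a binary decision tree (hypothetical minimal layout).

    Internal nodes store the test 'x[feature] <= threshold?' and two branches;
    leaf nodes store only a prediction."""
    feature: Optional[int] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None    # branch followed when the answer is "yes"
    right: Optional["Node"] = None   # branch followed when the answer is "no"
    prediction: Optional[Union[str, float]] = None  # set only on leaf nodes

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def predict(node: Node, x: list) -> Union[str, float]:
    """Walk from the root, answering each node's test, until a leaf is reached."""
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.prediction


# Root node tests feature 0; its "no" branch leads to an internal node on feature 1.
tree = Node(
    feature=0, threshold=5.0,
    left=Node(prediction="A"),
    right=Node(feature=1, threshold=2.0,
               left=Node(prediction="B"),
               right=Node(prediction="C")),
)
print(predict(tree, [3.0, 9.0]))  # A
print(predict(tree, [7.0, 1.0]))  # B
print(predict(tree, [7.0, 4.0]))  # C
```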

## Metrics for Splitting in Decision Trees

When constructing decision trees, different metrics are used to determine the best way to split the dataset at each node. Here are the most common splitting criteria:

### 1. **Gini Impurity**
   - **Used in:** Classification tasks (e.g., CART - Classification and Regression Trees)
   - **Definition:** The probability of misclassifying a randomly chosen element of the subset if it were labeled at random according to the subset's class distribution.
   - **Formula:**
     \[
     Gini = 1 - \sum_{i=1}^{C} p_i^2
     \]
     Where \( p_i \) is the proportion of samples of class \( i \) in the subset and \( C \) is the number of classes.
   - **Goal:** Choose the split that minimizes the size-weighted Gini impurity of the child nodes.
   - **Range:** 0 (pure) to \( 1 - \frac{1}{C} \) (all classes equally mixed), i.e. 0.5 for binary classification.
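
A direct translation of the Gini formula, as a sketch (the function name `gini_impurity` is hypothetical). The examples show a pure node, an evenly mixed binary node (the 0.5 maximum for two classes), and an evenly mixed three-class node reaching \( 1 - \frac{1}{3} \).

```python
from collections import Counter


def gini_impurity(labels):
    """Gini = 1 - sum_i p_i^2, where p_i is the share of class i in `labels`."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


print(gini_impurity(["a", "a", "a", "a"]))  # 0.0  (pure node)
print(gini_impurity(["a", "a", "b", "b"]))  # 0.5  (two classes, evenly mixed)
print(gini_impurity(["a", "b", "c"]))       # ≈ 0.667 (= 1 - 1/3, three classes evenly mixed)
```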

### 2. **Entropy (Information Gain)**
   - **Used in:** Classification tasks (e.g., ID3, C4.5)
   - **Definition:** A measure of the randomness or disorder in the data. Entropy is used to calculate **information gain**, which is the reduction in entropy after a split.
   - **Formula:**
     \[
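     Entropy = -\sum_{i=1}^{C} p_i \log_2(p_i)
     \]
     Where \( p_i \) is the proportion of samples of class \( i \) in the subset and \( C \) is the number of classes (classes with \( p_i = 0 \) contribute 0).
   - **Information Gain:**
     \[
     Information\ Gain = Entropy(parent) - \sum_{k} \left(\frac{n_k}{n}\right) Entropy(k)
     \]
     Where \( n \) is the number of samples in the parent node and \( n_k \) is the number of samples in child node \( k \).
   - **Goal:** Maximize information gain (reduce entropy).
   - **Range:** 0 (pure) to \( \log_2(C) \) (all classes equally mixed), i.e. 1 for binary classification.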
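
A minimal sketch of both formulas, with hypothetical function names `entropy` and `information_gain`. Only classes that actually occur are summed, which matches the convention that \( p_i = 0 \) terms contribute 0. A split that separates the classes perfectly recovers all of the parent's entropy; a split whose children keep the parent's class mix gains nothing.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy = -sum_i p_i * log2(p_i) over the classes present in `labels`."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy(parent) minus the size-weighted entropy of the child subsets."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["yes"] * 4 + ["no"] * 4
print(entropy(parent))                                              # 1.0
print(information_gain(parent, [["yes"] * 4, ["no"] * 4]))          # 1.0 (perfect split)
print(information_gain(parent, [["yes", "yes", "no", "no"]] * 2))   # 0.0 (useless split)
```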

### 3. **Variance Reduction (Mean Squared Error)**
   - **Used in:** Regression tasks
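   - **Definition:** Each node predicts the mean of its target values, and the split is chosen to maximize the reduction in variance from the parent node to the size-weighted child nodes. Because a node predicts \( \mu \), its variance equals its mean squared error.
   - **Formula:**
     \[
     Variance = \frac{1}{n} \sum_{i=1}^{n} (y_i - \mu)^2
     \]
     Where \( y_i \) is the actual target value, \( \mu \) is the mean of the target values in the node, and \( n \) is the number of samples in the node.
     \[
     Variance\ Reduction = Variance(parent) - \sum_{k} \left(\frac{n_k}{n}\right) Variance(k)
     \]
     Where \( n \) is the number of samples in the parent node and \( n_k \) is the number of samples in child node \( k \).
   - **Goal:** Minimize the weighted variance of the target values within the child nodes (equivalently, maximize variance reduction).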
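
A small sketch of the variance-reduction criterion, with hypothetical function names. It compares two candidate splits of the same parent node. The split that groups similar target values scores a larger reduction, so a tree using this criterion would prefer it.

```python
def variance(values):
    """Population variance: mean squared deviation from the node mean (= node MSE)."""
    mu = sum(values) / len(values)
    return sum((y - mu) ** 2 for y in values) / len(values)


def variance_reduction(parent, children):
    """Variance(parent) minus the size-weighted variance of the child nodes."""
    n = len(parent)
    return variance(parent) - sum(len(c) / n * variance(c) for c in children)


parent = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
good = [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
worse = [[1.0, 2.0, 3.0, 10.0], [11.0, 12.0]]
print(variance(parent))                   # ≈ 20.92
print(variance_reduction(parent, good))   # ≈ 20.25
print(variance_reduction(parent, worse))  # ≈ 12.5
```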

### 4. **Mean Absolute Error (MAE)**
   - **Used in:** Regression tasks
   - **Definition:** MAE measures the average of the absolute differences between predicted values and actual values. Within a node, every sample receives the same prediction, typically the median of the node's target values, because the median minimizes the sum of absolute deviations.
   - **Formula:**
     \[
     MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|
     \]
     Where \( y_i \) is the actual value and \( \hat{y}_i \) is the predicted value.
   - **Goal:** Choose the split that minimizes the size-weighted MAE of the child nodes.
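
A sketch of MAE as a node criterion, assuming each node predicts the median of its targets. scikit-learn's regression trees, for instance, document this for `criterion="absolute_error"`. The function names are hypothetical. The example also shows why the median is used: with an outlier in the node, predicting the mean gives a noticeably larger MAE than predicting the median.

```python
import statistics


def node_mae(values):
    """MAE of a node whose constant prediction is the median of its targets."""
    m = statistics.median(values)
    return sum(abs(y - m) for y in values) / len(values)


def mae_reduction(parent, children):
    """MAE(parent) minus the size-weighted MAE of the child nodes."""
    n = len(parent)
    return node_mae(parent) - sum(len(c) / n * node_mae(c) for c in children)


parent = [1.0, 2.0, 3.0, 100.0]
mean = sum(parent) / len(parent)
print(node_mae(parent))                                     # 25.0 (median = 2.5)
print(sum(abs(y - mean) for y in parent) / len(parent))     # 36.75 (mean = 26.5)
print(mae_reduction(parent, [[1.0, 2.0, 3.0], [100.0]]))    # ≈ 24.5
```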
