## 1. **Definition**

A **Decision Tree Classifier** is a supervised machine learning algorithm that recursively partitions data into subsets based on feature values, resulting in a tree-like model of decisions. Each internal node represents a decision rule on an attribute, each branch represents the outcome of the rule, and each leaf node represents a class label (the final prediction).

---

## 2. **Splitting Criteria: Gini Impurity and Entropy**

At each node, the decision tree chooses the feature and split point that make the resulting groups as pure as possible, i.e., as homogeneous as possible with respect to the target class.

* **Gini Impurity:**
  Measures the probability that a randomly chosen sample from the node would be misclassified if it were labeled at random according to the distribution of labels in the node.

  $$
  Gini = 1 - \sum_{i=1}^{C} p_i^2
  $$

  where $p_i$ is the proportion of class $i$ samples in the node, and $C$ is the number of classes.

* **Entropy:**
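  Measures the disorder or uncertainty of the class distribution in the node (in bits, since the logarithm is base 2). Terms with $p_i = 0$ are taken to be 0.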

  $$
  Entropy = -\sum_{i=1}^{C} p_i \log_2 p_i
  $$

A split is preferred when the size-weighted average impurity (Gini or Entropy) of its child nodes is lower than that of the competing splits.
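
As a minimal sketch, the two impurity formulas can be written directly in Python. The function names `gini` and `entropy` and the sample label lists are illustrative assumptions, not part of any particular library.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 - sum(p_i^2) over the classes present in `labels`."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Entropy in bits: sum(p_i * log2(1 / p_i)), which equals -sum(p_i * log2(p_i)).

    Only classes that actually occur are counted, so p_i = 0 terms never arise.
    """
    n = len(labels)
    return sum((count / n) * log2(n / count) for count in Counter(labels).values())


mixed = ["Dog", "Dog", "Cat", "Cat"]  # 50/50 node: most impure case for two classes
pure = ["Cat", "Cat", "Cat"]          # single-class node

print(gini(mixed), entropy(mixed))  # 0.5 1.0
print(gini(pure), entropy(pure))    # 0.0 0.0
```

Both measures are 0 for a pure node and reach their two-class maximum (0.5 for Gini, 1 bit for Entropy) when the classes are evenly mixed.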

---

## 3. **Tree Depth**

* **Tree Depth** is the maximum number of edges from the root node to a leaf node.
* Deeper trees can capture more complex relationships but are more likely to overfit the training data.
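
The definition of depth can be illustrated with a short recursive function. This is a sketch under an assumed representation: an internal node is a dict with hypothetical keys `"feature"` and `"children"`, and a leaf is a bare class label.

```python
def tree_depth(node):
    """Number of edges on the longest root-to-leaf path; a lone leaf has depth 0."""
    if not isinstance(node, dict):  # a leaf is stored as a bare class label
        return 0
    return 1 + max(tree_depth(child) for child in node["children"].values())


# Hypothetical trees with placeholder feature names and labels.
stump = {"feature": "f2", "children": {"yes": "A", "no": "B"}}
deeper = {"feature": "f1", "children": {"yes": stump, "no": "B"}}

print(tree_depth("A"))     # 0
print(tree_depth(stump))   # 1
print(tree_depth(deeper))  # 2
```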

---

## 4. **Pruning**

* **Pruning** is the process of reducing the size of the tree to avoid overfitting and improve generalization to unseen data.

  * **Pre-pruning:** Limiting the growth of the tree (e.g., maximum depth, minimum samples per leaf).
  * **Post-pruning:** Removing branches from a fully grown tree when they contribute little to predicting the target variable.
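
The two styles of pruning can be sketched as follows. This is a deliberately simplified illustration, not a standard pruning algorithm: `should_stop` is a hypothetical pre-pruning check with arbitrary default limits, and `collapse_redundant` is a stand-in for post-pruning that only removes splits whose branches all predict the same class. Trees use an assumed nested-dict form with bare labels as leaves.

```python
def should_stop(depth, n_samples, max_depth=3, min_samples_leaf=2):
    """Pre-pruning: refuse to split once a growth limit is reached.

    The default limits are illustrative values, not recommendations.
    """
    # A split needs at least min_samples_leaf samples on each side.
    return depth >= max_depth or n_samples < 2 * min_samples_leaf


def collapse_redundant(node):
    """Post-pruning (simplified): working bottom-up, replace a split by a leaf
    when every branch under it predicts the same class."""
    if not isinstance(node, dict):  # leaves are stored as bare labels
        return node
    children = {k: collapse_redundant(c) for k, c in node["children"].items()}
    all_leaves = all(not isinstance(c, dict) for c in children.values())
    if all_leaves and len(set(children.values())) == 1:
        return next(iter(children.values()))
    return {"feature": node["feature"], "children": children}


# Hypothetical fully grown tree: the split on f2 changes no prediction.
grown = {
    "feature": "f1",
    "children": {
        "yes": {"feature": "f2", "children": {"yes": "A", "no": "A"}},
        "no": "B",
    },
}
pruned = collapse_redundant(grown)
print(pruned)  # {'feature': 'f1', 'children': {'yes': 'A', 'no': 'B'}}
```

Practical post-pruning methods also remove branches that change predictions slightly, typically by weighing the error increase against the reduction in tree size.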

---

## 5. **Formal Example**

### **Dataset**

Suppose we have a dataset of animals with two features: `Weight (kg)` and `Sound` (`Bark` or `Purr`). The goal is to classify whether the animal is a **Dog** or a **Cat**.

| Weight (kg) | Sound | Class |
| ----------- | ----- | ----- |
| 8           | Bark  | Dog   |
| 4           | Purr  | Cat   |
| 5           | Purr  | Cat   |
| 12          | Bark  | Dog   |
| 6           | Purr  | Cat   |

### **Tree Construction**

**Step 1:**
The tree algorithm evaluates possible splits—such as splitting on `Sound` or on `Weight`.

Suppose it finds that splitting by `Sound` yields the purest groups:

* If `Sound = Bark` → **Dog**
* If `Sound = Purr` → **Cat**

**Resulting Tree Structure:**

```
           [Sound?]
          /        \
      Bark          Purr
      /               \
    Dog              Cat
```

* **Both Gini impurity and Entropy are 0 for each leaf**, because all samples in each leaf belong to a single class.
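
The tree above can be sketched as a small data structure with a lookup function. The nested-dict layout and the `predict` helper are assumptions made for illustration; the rows are the five animals from the dataset table.

```python
# Internal nodes are dicts; leaves are bare class labels (assumed representation).
tree = {"feature": "Sound", "children": {"Bark": "Dog", "Purr": "Cat"}}


def predict(node, sample):
    """Follow the branch matching the sample's feature value until a leaf is reached."""
    while isinstance(node, dict):
        node = node["children"][sample[node["feature"]]]
    return node


rows = [
    {"Weight": 8, "Sound": "Bark", "Class": "Dog"},
    {"Weight": 4, "Sound": "Purr", "Class": "Cat"},
    {"Weight": 5, "Sound": "Purr", "Class": "Cat"},
    {"Weight": 12, "Sound": "Bark", "Class": "Dog"},
    {"Weight": 6, "Sound": "Purr", "Class": "Cat"},
]

predictions = [predict(tree, row) for row in rows]
print(predictions)  # ['Dog', 'Cat', 'Cat', 'Dog', 'Cat']
```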


## 6. **Summary Table**

| Concept       | Formal Definition / Role                                         |
| ------------- | ---------------------------------------------------------------- |
| Decision Tree | Recursive partitioning, tree of decision nodes and leaves        |
| Gini/Entropy  | Mathematical impurity metrics for selecting best splits          |
| Tree Depth    | Maximum path length from root to leaf, controls model complexity |
| Pruning       | Techniques to reduce tree size and prevent overfitting           |
| Example       | Classifying Dog/Cat by Sound (Bark/Purr) and Weight              |

---

**In summary:**
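A Decision Tree classifier splits data using formal impurity measures (Gini or Entropy) to maximize class purity at each branch, with depth limits and pruning used to trade training accuracy against generalization. The resulting model is a transparent series of decisions mapping features to class labels.

---

## **Step 1: The Dataset**

The steps below work through the Gini calculations for the example dataset from Section 5. An index column is added so that individual rows can be referenced.
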
| Index | Weight | Sound | Class |
| ----- | ------ | ----- | ----- |
| 1     | 8      | Bark  | Dog   |
| 2     | 4      | Purr  | Cat   |
| 3     | 5      | Purr  | Cat   |
| 4     | 12     | Bark  | Dog   |
| 5     | 6      | Purr  | Cat   |

---

## **Step 2: Splitting by “Sound”**

### **Split Groups:**

* **Group 1 (Bark):** Index 1, 4

  * Both are Dog
* **Group 2 (Purr):** Index 2, 3, 5

  * All are Cat

### **Calculate Gini for Each Group**

**Group 1 (Bark):**

* 2 Dogs, 0 Cats
* $p_\text{Dog} = 1,\ p_\text{Cat} = 0$
* $\text{Gini}_{\text{Bark}} = 1 - (1^2 + 0^2) = 0$

**Group 2 (Purr):**

* 3 Cats, 0 Dogs
* $p_\text{Cat} = 1,\ p_\text{Dog} = 0$
* $\text{Gini}_{\text{Purr}} = 1 - (1^2 + 0^2) = 0$

### **Calculate Weighted Gini for Split**

* Each group is weighted by its share of the samples: 2/5 for Group 1 (Bark) and 3/5 for Group 2 (Purr).

$$
\text{Gini}_{\text{split}} = \frac{2}{5} \times 0 + \frac{3}{5} \times 0 = 0
$$
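
The same calculation can be sketched in Python by grouping the class labels by `Sound` and averaging the group impurities by group size. The helper names `gini` and `weighted_gini` are illustrative.

```python
from collections import Counter, defaultdict


def gini(labels):
    """Gini impurity of one group of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(groups):
    """Average of the group impurities, weighted by each group's share of samples."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * gini(g) for g in groups)


sounds = ["Bark", "Purr", "Purr", "Bark", "Purr"]  # rows 1 to 5
classes = ["Dog", "Cat", "Cat", "Dog", "Cat"]

groups = defaultdict(list)
for sound, cls in zip(sounds, classes):
    groups[sound].append(cls)

split_score = weighted_gini(list(groups.values()))
print(dict(groups))  # {'Bark': ['Dog', 'Dog'], 'Purr': ['Cat', 'Cat', 'Cat']}
print(split_score)   # 0.0
```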

---

## **Step 3: Splitting by “Weight”**

For a numeric feature, candidate thresholds are placed midway **between consecutive values** of the sorted data. The weights in sorted order are 4, 5, 6, 8, 12, which gives four possible split points:

* Between 4 and 5 → 4.5
* Between 5 and 6 → 5.5
* Between 6 and 8 → 7
* Between 8 and 12 → 10
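
The candidate thresholds listed above can be generated mechanically. This sketch assumes the midpoint convention used in this example; other implementations may place thresholds differently.

```python
weights = [8, 4, 5, 12, 6]  # rows 1 to 5, in table order

# Sort the distinct values, then take the midpoint of each consecutive pair.
values = sorted(set(weights))
thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])]

print(thresholds)  # [4.5, 5.5, 7.0, 10.0]
```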

Let’s do the math for each:

---

### **A. Split at Weight < 4.5**

* **Left:** Index 2 (weight 4) → Cat
* **Right:** Index 1, 3, 4, 5 (weights 8, 5, 12, 6)

**Left (1 Cat):**

* $p_\text{Cat} = 1,\ p_\text{Dog} = 0$
* Gini = 0

**Right (2 Dogs, 2 Cats):**

* Index 1: Dog
* Index 3: Cat
* Index 4: Dog
* Index 5: Cat
* So, 2 Dogs, 2 Cats
* $p_\text{Dog} = 0.5,\, p_\text{Cat} = 0.5$
* $\text{Gini} = 1 - (0.5^2 + 0.5^2) = 1 - (0.25 + 0.25) = 0.5$

**Weighted Gini:**

* Left: 1/5 × 0 = 0
* Right: 4/5 × 0.5 = 0.4
* **Total:** 0.4

---

### **B. Split at Weight < 5.5**

* **Left:** Index 2, 3 (weights 4, 5) → Cats
* **Right:** Index 1, 4, 5 (weights 8, 12, 6)

**Left (2 Cats):**

* Gini = 0

**Right:**

* Index 1: Dog
* Index 4: Dog
* Index 5: Cat
* 2 Dogs, 1 Cat
* $p_\text{Dog} = 2/3,\, p_\text{Cat} = 1/3$
* $\text{Gini} = 1 - \left((2/3)^2 + (1/3)^2\right) = 1 - (4/9 + 1/9) = 4/9 \approx 0.444$

**Weighted Gini:**

* Left: 2/5 × 0 = 0
* Right: 3/5 × 4/9 = 4/15 ≈ 0.267
* **Total:** ≈ 0.267

---

### **C. Split at Weight < 7**

* **Left:** Index 2, 3, 5 (weights 4, 5, 6) → All Cats
* **Right:** Index 1, 4 (weights 8, 12) → Both Dogs

**Left (3 Cats):**

* Gini = 0

**Right (2 Dogs):**

* Gini = 0

**Weighted Gini:**

* Left: 3/5 × 0 = 0
* Right: 2/5 × 0 = 0
* **Total:** 0

---

### **D. Split at Weight < 10**

* **Left:** Index 1, 2, 3, 5 (weights 8, 4, 5, 6)
* **Right:** Index 4 (weight 12)

**Left:**

* Index 1: Dog
* Index 2: Cat
* Index 3: Cat
* Index 5: Cat
* 1 Dog, 3 Cats
* $p_\text{Cat} = 3/4 = 0.75,\, p_\text{Dog} = 1/4 = 0.25$
* $\text{Gini} = 1 - (0.75^2 + 0.25^2) = 1 - (0.5625 + 0.0625) = 1 - 0.625 = 0.375$

**Right:**

* 1 Dog (Gini = 0)

**Weighted Gini:**

* Left: 4/5 × 0.375 = 0.3
* Right: 1/5 × 0 = 0
* **Total:** 0.3

---

## **Step 4: Compare All Gini Values**

| Split        | Weighted Gini |
| ------------ | ------------- |
| Sound        | 0             |
| Weight < 4.5 | 0.4           |
| Weight < 5.5 | 0.267         |
| Weight < 7   | 0             |
| Weight < 10  | 0.3           |

---

## **Step 5: Which Split is Best?**

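* Both splitting by “Sound” and “Weight < 7” result in perfectly pure groups (weighted Gini = 0).
* The two splits tie at the minimum, so either is an optimal root split for this small dataset; which one is used depends on how the implementation breaks ties.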
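
The whole comparison can be reproduced with a short script. This is a sketch that uses Python's `fractions.Fraction` so that no rounding happens in intermediate steps; the helper names and the `scores` dictionary are illustrative.

```python
from collections import Counter
from fractions import Fraction

# (weight, sound, class) for rows 1 to 5
data = [
    (8, "Bark", "Dog"),
    (4, "Purr", "Cat"),
    (5, "Purr", "Cat"),
    (12, "Bark", "Dog"),
    (6, "Purr", "Cat"),
]


def gini(labels):
    """Exact Gini impurity of one group of class labels."""
    n = len(labels)
    return 1 - sum(Fraction(count, n) ** 2 for count in Counter(labels).values())


def weighted_gini(groups):
    """Group impurities averaged by each group's share of the samples."""
    total = sum(len(g) for g in groups)
    return sum(Fraction(len(g), total) * gini(g) for g in groups)


scores = {}
scores["Sound"] = weighted_gini([
    [cls for _, sound, cls in data if sound == "Bark"],
    [cls for _, sound, cls in data if sound == "Purr"],
])
for threshold in (4.5, 5.5, 7, 10):
    left = [cls for weight, _, cls in data if weight < threshold]
    right = [cls for weight, _, cls in data if weight >= threshold]
    scores[f"Weight < {threshold}"] = weighted_gini([left, right])

# Collect every split that attains the minimum, so ties are visible.
best = min(scores.values())
best_splits = [name for name, score in scores.items() if score == best]

for name, score in scores.items():
    print(f"{name}: {score} ≈ {float(score):.3f}")
print("Best:", best_splits)
```

Collecting all minimizers, rather than calling `min` on the dictionary keys, makes the tie between the two pure splits explicit.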

---

## **Step 6: What Would the Tree Look Like?**

### **If Split by Sound:**

```
     Sound?
    /      \
 Bark     Purr
 Dog      Cat
```

### **If Split by Weight < 7:**

```
      Weight < 7?
      /        \
   Yes         No
  Cat         Dog
```
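
Each of the two trees is equivalent to a single rule, sketched below as plain functions. Both classify the five training rows correctly, but they are not the same model. The `small_barker` sample is a hypothetical animal that is not part of the dataset, included only to show that the rules can disagree on unseen inputs.

```python
def predict_by_sound(animal):
    """Tree that splits on Sound: Bark -> Dog, otherwise Cat."""
    return "Dog" if animal["Sound"] == "Bark" else "Cat"


def predict_by_weight(animal):
    """Tree that splits on Weight < 7: yes -> Cat, no -> Dog."""
    return "Cat" if animal["Weight"] < 7 else "Dog"


training = [
    {"Weight": 8, "Sound": "Bark", "Class": "Dog"},
    {"Weight": 4, "Sound": "Purr", "Class": "Cat"},
    {"Weight": 5, "Sound": "Purr", "Class": "Cat"},
    {"Weight": 12, "Sound": "Bark", "Class": "Dog"},
    {"Weight": 6, "Sound": "Purr", "Class": "Cat"},
]

# Both rules reproduce every training label.
agree_on_training = all(
    predict_by_sound(a) == predict_by_weight(a) == a["Class"] for a in training
)
print(agree_on_training)  # True

# Hypothetical unseen sample (not in the dataset): a light animal that barks.
small_barker = {"Weight": 3, "Sound": "Bark"}
print(predict_by_sound(small_barker))   # Dog
print(predict_by_weight(small_barker))  # Cat
```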

---

## **Summary Table of Gini Values**

| Feature | Split Point | Weighted Gini | Pure Leaves? |
| ------- | ----------- | ------------- | ------------ |
| Sound   | --          | 0             | Yes          |
| Weight  | 4.5         | 0.4           | No           |
| Weight  | 5.5         | 0.267         | No           |
| Weight  | 7           | 0             | Yes          |
| Weight  | 10          | 0.3           | No           |
