### **Calculating Entropy, Weighted Entropy, and Information Gain for Decision Tree Splitting in Python**

In [22]:
import numpy as np

#### **1. Entropy**

$$
\text{Entropy}(S) = - \sum_{i=1}^n p_i \log_2(p_i)
$$

Where:

- $S$ is the current dataset.
- $p_i$ is the proportion of instances in class $i$ relative to the total number of instances.
- $n$ is the number of different classes.


In [23]:
def calculate_entropy(class_counts: list[int]) -> float:
    """Return the Shannon entropy (in bits) of a class-count distribution."""
    total_instances = sum(class_counts)
    entropy = 0.0
    for count in class_counts:
        if count > 0:  # Skip empty classes: log2(0) is undefined
            probability = count / total_instances
            entropy -= probability * np.log2(probability)
    return entropy

In [24]:
class_counts = [9, 5]  # 9 instances of class Yes, 5 instances of class No
entropy = calculate_entropy(class_counts)
print(f"Entropy: {entropy:.4f}")

Entropy: 0.9403
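
To put the value 0.9403 in context, it helps to look at the two extremes of the formula: a pure node has entropy 0, and a perfectly balanced two-class node has entropy 1 bit. The following is a minimal sketch under the assumption of two classes; `entropy_from_counts` is a hypothetical helper that mirrors the formula above using only the standard library, and the `[14, 0]` and `[7, 7]` counts are made up for illustration.

```python
import math


def entropy_from_counts(class_counts):
    """Shannon entropy in bits; classes with zero count contribute nothing."""
    total = sum(class_counts)
    return sum(-(c / total) * math.log2(c / total) for c in class_counts if c > 0)


pure = entropy_from_counts([14, 0])      # every instance in one class
balanced = entropy_from_counts([7, 7])   # classes split evenly
mixed = entropy_from_counts([9, 5])      # the counts used in the cell above

print(f"pure={pure:.4f}, balanced={balanced:.4f}, mixed={mixed:.4f}")
```

The `[9, 5]` split sits close to the balanced end, which is why its entropy is near 1.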


---


#### **2. Average Information (Weighted Entropy)**


$$
\text{Average Information}(S, A) = \sum_{v \in \text{Values}(A)} \frac{|S_v|}{|S|} \times \text{Entropy}(S_v)
$$

Where:

- $S$ is the original dataset.
- $A$ is the attribute being considered for the split.
- $S_v$ is the subset of $S$ for which attribute $A$ has value $v$.
- $|S_v|$ is the number of elements in subset $S_v$.
- $|S|$ is the total number of elements in the original set $S$.
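
The definitions above work on subsets $S_v$ of the data, while the code in this notebook works on per-subset class counts. As a rough sketch of how those counts could be derived from raw rows, the snippet below groups labels by attribute value. The `rows` data and the variable names are hypothetical and serve only as an illustration.

```python
from collections import Counter, defaultdict

# Hypothetical (attribute value, class label) pairs, for illustration only.
rows = [
    ("weak", "yes"), ("weak", "yes"), ("weak", "no"),
    ("strong", "yes"), ("strong", "no"), ("strong", "no"),
]

# Fix one class order so every subset lists its counts in the same positions.
classes = sorted({label for _, label in rows})

# Count class labels separately for each attribute value v (one Counter per S_v).
by_value = defaultdict(Counter)
for value, label in rows:
    by_value[value][label] += 1

# One list of class counts per attribute value, ordered by value name.
subsets = [[by_value[v][c] for c in classes] for v in sorted(by_value)]
print(subsets)
```

Each inner list plays the role of one $S_v$, and its sum is $|S_v|$.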


In [25]:
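def calculate_weighted_entropy(subsets: list[list[int]]) -> float:
    """Return the size-weighted average entropy of the subsets after a split."""
    # Plain Python sums also handle subsets of unequal length
    total_instances = sum(sum(subset) for subset in subsets)
    weighted_entropy = 0.0

    for subset in subsets:
        subset_entropy = calculate_entropy(subset)
        # Weight each subset's entropy by its share of all instances
        weighted_entropy += (sum(subset) / total_instances) * subset_entropy
    return weighted_entropy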

In [26]:
subsets = [[4, 2], [5, 3]]  # Subsets after split: [4 Yes, 2 No] and [5 Yes, 3 No]
weighted_entropy = calculate_weighted_entropy(subsets)
print(f"Weighted Entropy: {weighted_entropy:.4f}")

Weighted Entropy: 0.9389
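
The printed value can be checked by hand. With subsets $[4, 2]$ and $[5, 3]$ and $|S| = 14$, the arithmetic works out as follows (intermediate values rounded to four decimal places):

$$
\text{Entropy}([4, 2]) = -\tfrac{4}{6}\log_2\tfrac{4}{6} - \tfrac{2}{6}\log_2\tfrac{2}{6} \approx 0.9183
$$

$$
\text{Entropy}([5, 3]) = -\tfrac{5}{8}\log_2\tfrac{5}{8} - \tfrac{3}{8}\log_2\tfrac{3}{8} \approx 0.9544
$$

$$
\text{Average Information}(S, A) \approx \tfrac{6}{14} \times 0.9183 + \tfrac{8}{14} \times 0.9544 \approx 0.9389
$$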


---


#### **3. Information Gain**


$$
\text{Information Gain}(S, A) = \text{Entropy}(S) - \text{Average Information}(S, A)
$$

Where:

- $S$ is the original dataset.
- $A$ is the attribute on which the split is based.
- $\text{Entropy}(S)$ is the entropy of the original dataset.
- $\text{Average Information}(S, A)$ is the weighted entropy after splitting.

> Among the candidate attributes, split on the one with the highest information gain.

In [27]:
def calculate_information_gain(
    entropy_before_split: float, subsets: list[list[int]]
) -> float:
    weighted_entropy = calculate_weighted_entropy(subsets)
    information_gain = entropy_before_split - weighted_entropy
    return information_gain


In [28]:
entropy_before_split = calculate_entropy(class_counts)
information_gain = calculate_information_gain(entropy_before_split, subsets)
print(f"Information Gain: {information_gain:.4f}")

Information Gain: 0.0013
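
A gain of 0.0013 means this split barely reduces impurity. In practice the gain is computed for every candidate attribute and the largest one wins. The sketch below shows that selection step in a self-contained form. The attribute names and per-value class counts in `candidate_splits` are illustrative assumptions, not data from this notebook, and `entropy` and `information_gain` are hypothetical helpers that restate the formulas above.

```python
import math


def entropy(counts):
    """Shannon entropy in bits of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(parent_counts, subsets):
    """Parent entropy minus the size-weighted entropy of the subsets."""
    total = sum(parent_counts)
    weighted = sum((sum(s) / total) * entropy(s) for s in subsets)
    return entropy(parent_counts) - weighted


parent = [9, 5]  # 9 Yes, 5 No, as in the cells above

# Illustrative candidate splits: one [Yes, No] count list per attribute value.
candidate_splits = {
    "outlook": [[2, 3], [4, 0], [3, 2]],
    "humidity": [[3, 4], [6, 1]],
    "wind": [[6, 2], [3, 3]],
}

gains = {
    name: information_gain(parent, subsets)
    for name, subsets in candidate_splits.items()
}
best_attribute = max(gains, key=gains.get)

for name, gain in gains.items():
    print(f"{name}: {gain:.4f}")
print(f"Best attribute: {best_attribute}")
```

Under these assumed counts, `outlook` is chosen because one of its values yields a pure subset (`[4, 0]`), which contributes zero entropy to the weighted sum.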


---


#### **Summary**

- **Entropy** measures the impurity of a dataset.
- **Average Information (Weighted Entropy)** is the average entropy of the subsets formed by a split, with each subset weighted by its share of the instances.
- **Information Gain** measures the reduction in impurity due to the split, helping to select the best attribute for splitting at each step in building a decision tree.
