## **Understanding Decision Trees in Depth**

Decision trees are a popular and powerful machine learning algorithm used for both classification and regression tasks. They are intuitive, easy to interpret, and capable of handling both numerical and categorical data. In this detailed exploration, we'll cover the fundamental concepts behind decision trees, how they work, and implement one using dummy data.

### 1. Basic Concepts of Decision Trees
#### 1.1 What is a Decision Tree?

A decision tree is a flowchart-like structure where:

- Each internal node represents a test on a feature (attribute).
- Each branch represents the outcome of the test.
- Each leaf node (terminal node) represents a class label (in classification) or a continuous value (in regression).
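
As a rough illustration of this structure, a tree can be written down as nested dictionaries and traversed from the root to a leaf. The features, thresholds, and the `classify` helper below are hypothetical, chosen only to show the three kinds of elements; this is a sketch, not the tree built later in this walkthrough.

```python
# A hand-written toy tree (hypothetical thresholds, for illustration only).
# Internal nodes hold a test on a feature; leaves hold a class label.
toy_tree = {
    "feature": "Age", "threshold": 30,      # internal node: test Age <= 30
    "left": {"label": "Yes"},               # branch for Age <= 30 ends in a leaf
    "right": {                              # branch for Age > 30 leads to another test
        "feature": "Income", "threshold": 75000,
        "left": {"label": "No"},
        "right": {"label": "Yes"},
    },
}

def classify(node, sample):
    """Follow branches from the root until a leaf is reached, then return its label."""
    while "label" not in node:
        branch = "left" if sample[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["label"]

print(classify(toy_tree, {"Age": 28, "Income": 65000}))  # Yes
print(classify(toy_tree, {"Age": 40, "Income": 50000}))  # No
```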

#### 1.2 Key Concepts

- Entropy: A measure of impurity or disorder in a dataset. The higher the entropy, the more uncertain we are about the class labels.
- Information Gain: The reduction in entropy after splitting the dataset based on a feature. The goal is to choose the feature that results in the highest information gain.
- Tree Pruning: The process of removing branches from the tree that have little importance, which helps to avoid overfitting.

### 2. How Decision Trees Work

The process of building a decision tree can be broken down into the following steps:

1. Select the Best Feature: At each node, evaluate all features and select the one that maximizes information gain (equivalently, the one that minimizes the weighted entropy of the resulting subsets).
2. Split the Data: Based on the selected feature, split the dataset into subsets.
3. Repeat: Recursively apply steps 1 and 2 on the subsets until one of the stopping conditions is met:
    - All samples in a node belong to the same class.
    - No remaining features to split on.
    - The maximum depth of the tree has been reached.
    - The number of samples in a node is less than a specified threshold.
4. Assign Class Labels: Once the tree is built, classify new instances by traversing the tree based on feature values.

### 3. Dummy Data Example

Let's create a simple dummy dataset to illustrate how a decision tree works. We'll create a dataset that predicts whether a person likes a particular type of fruit based on their age and income level.

#### 3.1 Create Dummy Data

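### Terminology

Before creating the data, this section gives a more in-depth explanation of the key concepts introduced above: **entropy**, **information gain**, and **tree pruning**. The **Gini index** is an alternative impurity measure that plays the same role as entropy when choosing splits.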

### 1. Entropy

**Definition**:  
Entropy is a measure of uncertainty or impurity in a dataset. In the context of decision trees, it quantifies the amount of disorder or randomness in the target variable (the variable we are trying to predict).

**Importance**:
- **Impurity Measure**: Helps determine how mixed the classes in a dataset are. A high entropy value indicates a high degree of disorder (e.g., a dataset with an equal distribution of classes), while a low entropy value indicates a more ordered dataset.
- **Information Gain Calculation**: Used to calculate information gain, which helps in selecting the best features to split the data at each node of the tree.

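**How It Works**:  
The entropy \( H \) of a dataset \( S \) with \( c \) classes is given by:

\[
H(S) = -\sum_{i=1}^{c} p_i \log_2 p_i
\]

where \( p_i \) is the proportion of instances in \( S \) that belong to class \( i \).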

#### Example:
Consider a binary classification problem with a dataset of 10 instances where:
- 6 instances are of Class A
- 4 instances are of Class B

The proportions would be:
- \( p_A = \frac{6}{10} = 0.6 \)
- \( p_B = \frac{4}{10} = 0.4 \)

The entropy of this dataset would be calculated as follows:

\[
H(S) = -\left(0.6 \log_2 0.6 + 0.4 \log_2 0.4\right) \approx 0.971
\]
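
The arithmetic above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the helper name `entropy_from_counts` is an assumption made for this example and is separate from the `calculate_entropy` function used later.

```python
import math

def entropy_from_counts(counts):
    """Shannon entropy (in bits) of a class distribution given as raw counts."""
    total = sum(counts)
    # Classes with zero instances contribute nothing, so they are skipped.
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

print(round(entropy_from_counts([6, 4]), 3))  # 0.971 (6 of Class A, 4 of Class B)
print(entropy_from_counts([5, 5]))            # 1.0   (maximum disorder for two classes)
print(entropy_from_counts([10, 0]))           # 0.0   (a pure dataset)
```

The two extra calls show the extremes: an even split gives the highest possible entropy for two classes, while a pure dataset has none.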

### 2. Information Gain

**Definition**:  
Information gain measures the reduction in entropy after a dataset is split on an attribute. It quantifies how much knowing the value of an attribute improves our prediction of the target variable.

**Importance**:  
- **Feature Selection**: Helps in deciding which feature to split the data on at each node of the tree. The feature with the highest information gain is selected to create the branches.

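#### Example:
Continuing from the previous dataset, let's say we split the data based on another feature that divides it into two groups:

1. Group 1: 4 instances of Class A, 1 instance of Class B (Entropy = 0.722)
2. Group 2: 2 instances of Class A, 3 instances of Class B (Entropy = 0.971)

Each group holds 5 of the 10 instances, so the weighted entropy after the split is \( 0.5 \times 0.722 + 0.5 \times 0.971 \approx 0.846 \), and the information gain is \( 0.971 - 0.846 \approx 0.125 \).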
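
Information gain is the parent entropy minus the size-weighted average of the child entropies: \( IG(S, A) = H(S) - \sum_k \frac{|S_k|}{|S|} H(S_k) \). The sketch below reproduces that calculation for this split from class counts alone; the function names are assumptions made for this example.

```python
import math

def entropy_from_counts(counts):
    """Shannon entropy (in bits) of a class distribution given as raw counts."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain_from_counts(parent, children):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    n = sum(parent)
    weighted = sum(sum(child) / n * entropy_from_counts(child) for child in children)
    return entropy_from_counts(parent) - weighted

# Parent: 6 of Class A, 4 of Class B. Children: [4 A, 1 B] and [2 A, 3 B].
print(round(entropy_from_counts([4, 1]), 3))  # 0.722
print(round(entropy_from_counts([2, 3]), 3))  # 0.971
gain = information_gain_from_counts([6, 4], [[4, 1], [2, 3]])
print(round(gain, 3))                         # 0.125
```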

### 3. Tree Pruning

**Definition**:  
Pruning involves removing nodes from a decision tree that provide little power in classifying instances, thereby reducing the complexity of the model.

**Importance**:  
- **Overfitting Prevention**: Pruning helps to prevent overfitting, where the model becomes too complex and performs poorly on unseen data.

**How It Works**:  
There are two types of pruning:
- **Pre-pruning**: Stops the tree from growing once it meets certain criteria (e.g., a minimum number of samples in a node).
- **Post-pruning**: Grows the full tree and then removes nodes that do not provide significant predictive power.
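
Pre-pruning usually amounts to a handful of checks made before a node is split. The sketch below shows one possible form of those checks; the function names and the parameters `max_depth` and `min_samples_split` are hypothetical and only illustrate the idea, not a specific library's behavior.

```python
from collections import Counter

def should_stop(labels, depth, max_depth=3, min_samples_split=2):
    """Return True if a node should become a leaf instead of being split."""
    if len(set(labels)) <= 1:            # node is already pure (or empty)
        return True
    if depth >= max_depth:               # depth limit reached
        return True
    if len(labels) < min_samples_split:  # too few samples to justify a split
        return True
    return False

def majority_label(labels):
    """Label assigned to a leaf: the most common class among its samples."""
    return Counter(labels).most_common(1)[0][0]

labels = ["Yes", "Yes", "No"]
print(should_stop(labels, depth=1))                       # False: mixed and shallow
print(should_stop(labels, depth=3))                       # True: depth limit reached
print(should_stop(labels, depth=1, min_samples_split=5))  # True: too few samples
print(majority_label(labels))                             # Yes
```

Post-pruning works in the opposite direction: the tree is grown first, and a subtree is replaced by a majority-class leaf when doing so does not hurt performance on held-out data.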


With the terminology covered, we return to section 3.1 and create the dummy dataset.

In [1]:
import pandas as pd

# Create a dummy dataset
data = {
    'Age': [22, 25, 47, 35, 32, 26, 60, 45, 41, 30],
    'Income': [50000, 60000, 120000, 80000, 75000, 62000, 30000, 90000, 70000, 55000],
    'Likes_Fruit': ['Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'No', 'Yes', 'No', 'Yes']
}

# Create a DataFrame
df = pd.DataFrame(data)
print(df)


   Age  Income Likes_Fruit
0   22   50000         Yes
1   25   60000         Yes
2   47  120000          No
3   35   80000         Yes
4   32   75000          No
5   26   62000         Yes
6   60   30000          No
7   45   90000         Yes
8   41   70000          No
9   30   55000         Yes


### 4. Step-by-Step Decision Tree Implementation

We will manually build a decision tree based on our dummy dataset.
#### 4.1 Calculate Entropy

To begin, we need to calculate the initial entropy of our dataset.

In [2]:
import numpy as np

def calculate_entropy(y):
    class_labels = np.unique(y)
    entropy = 0
    for label in class_labels:
        p_label = np.sum(y == label) / len(y)
        entropy -= p_label * np.log2(p_label) if p_label > 0 else 0
    return entropy

# Calculate the initial entropy
initial_entropy = calculate_entropy(df['Likes_Fruit'])
print(f'Initial Entropy: {initial_entropy:.4f}')


Initial Entropy: 0.9710


#### 4.2 Information Gain Calculation

Next, we need to calculate the information gain for each feature by splitting the dataset on each feature.

In [3]:
def information_gain(y, x_feature, threshold):
    parent_entropy = calculate_entropy(y)
    
    # Create masks for left and right splits
    left_mask = x_feature <= threshold
    right_mask = x_feature > threshold
    n = len(y)

    # Check if splits are possible
    if np.sum(left_mask) == 0 or np.sum(right_mask) == 0:
        return 0  # No split possible

    n_left, n_right = np.sum(left_mask), np.sum(right_mask)
    
    # Calculate child entropy
    child_entropy = (n_left / n) * calculate_entropy(y[left_mask]) + (n_right / n) * calculate_entropy(y[right_mask])
    
    # Calculate Information Gain
    return parent_entropy - child_entropy

# Calculate information gain for Age and Income
info_gain_age = information_gain(df['Likes_Fruit'], df['Age'], 35)
info_gain_income = information_gain(df['Likes_Fruit'], df['Income'], 70000)

print(f'Information Gain for Age: {info_gain_age:.4f}')
print(f'Information Gain for Income: {info_gain_income:.4f}')


Information Gain for Age: 0.2564
Information Gain for Income: 0.0200
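
The thresholds 35 and 70000 used above are only two of many possible split points. As a sketch of how a full search might look, the standard-library code below scans every observed value of each feature as a candidate threshold. The data is copied from the dummy dataset; the helper names `entropy`, `gain_at`, and `best_threshold` are assumptions made for this example.

```python
import math
from collections import Counter

ages = [22, 25, 47, 35, 32, 26, 60, 45, 41, 30]
incomes = [50000, 60000, 120000, 80000, 75000, 62000, 30000, 90000, 70000, 55000]
labels = ['Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'No', 'Yes', 'No', 'Yes']

def entropy(values):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(values)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(values).values())

def gain_at(feature, labels, threshold):
    """Information gain of splitting on feature <= threshold."""
    left = [y for x, y in zip(feature, labels) if x <= threshold]
    right = [y for x, y in zip(feature, labels) if x > threshold]
    if not left or not right:  # the threshold does not separate the data
        return 0.0
    n = len(labels)
    children = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - children

def best_threshold(feature, labels):
    """Observed value with the highest gain (the smallest one wins ties)."""
    return max(sorted(set(feature)), key=lambda t: gain_at(feature, labels, t))

for name, column in [("Age", ages), ("Income", incomes)]:
    t = best_threshold(column, labels)
    print(f"{name}: best threshold {t}, gain {gain_at(column, labels, t):.4f}")
```

On this data the scan favors an Age threshold of 30 with a gain of about 0.42, noticeably higher than the 0.2564 obtained at 35, because every person aged 30 or younger likes the fruit.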


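#### 4.3 Building the Tree

Based on the information gain, we can decide which feature to split on. Of the two candidate splits evaluated above (Age ≤ 35 and Income ≤ 70000), Age has the higher information gain, so we would choose it. A full implementation evaluates every candidate threshold for every feature and recursively repeats the process until a stopping criterion is reached.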

Here's how we would implement this step:

In [4]:
class DecisionTreeNode:
    def __init__(self, feature=None, threshold=None, left=None, right=None, output=None):
        self.feature = feature
        self.threshold = threshold
        self.left = left
        self.right = right
        self.output = output

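def build_tree(X, y, depth=0, max_depth=3):
    # Stop when the node is pure: every sample has the same label
    if len(np.unique(y)) == 1:
        return DecisionTreeNode(output=np.unique(y)[0])
    # Stop at the depth limit and predict the majority class.
    # The labels are strings, so use mode(); np.bincount only accepts
    # non-negative integers and would raise an error here.
    if depth >= max_depth:
        return DecisionTreeNode(output=y.mode().iloc[0])

    best_feature, best_threshold, best_gain = None, None, -1

    # Try every observed value of every feature as a candidate threshold
    for feature in X.columns:
        thresholds = np.unique(X[feature])
        for threshold in thresholds:
            gain = information_gain(y, X[feature], threshold)
            if gain > best_gain:
                best_gain = gain
                best_feature = feature
                best_threshold = threshold

    # No candidate split reduces entropy: return a majority-class leaf
    if best_gain <= 0:
        return DecisionTreeNode(output=y.mode().iloc[0])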

    left_mask = X[best_feature] <= best_threshold
    right_mask = X[best_feature] > best_threshold

    left_child = build_tree(X[left_mask], y[left_mask], depth + 1, max_depth)
    right_child = build_tree(X[right_mask], y[right_mask], depth + 1, max_depth)

    return DecisionTreeNode(feature=best_feature, threshold=best_threshold, left=left_child, right=right_child)

# Build the decision tree
tree = build_tree(df[['Age', 'Income']], df['Likes_Fruit'])


#### 4.4 Making Predictions

Now that we have built our decision tree, we can use it to make predictions for new data.

In [6]:
def predict(tree, x):
    if tree.output is not None:
        return tree.output
    if x[tree.feature] <= tree.threshold:
        return predict(tree.left, x)
    else:
        return predict(tree.right, x)

# Test prediction
test_data = {'Age': 28, 'Income': 65000}
prediction = predict(tree, test_data)
print(f'Prediction for {test_data}: {prediction}')


Prediction for {'Age': 28, 'Income': 65000}: Yes

#### 4.5 Comparing with scikit-learn

For comparison, we can fit scikit-learn's `DecisionTreeClassifier` on the same data and print the learned rules. Its default split criterion is the Gini index rather than entropy, so the chosen splits need not match the manual tree exactly.


In [7]:
from sklearn.tree import DecisionTreeClassifier, export_text

# Preparing the data
X = df[['Age', 'Income']]
y = df['Likes_Fruit']

# Fit the Decision Tree Classifier
classifier = DecisionTreeClassifier(random_state=42)
classifier.fit(X, y)

# Visualize the tree
tree_rules = export_text(classifier, feature_names=list(X.columns))
print(tree_rules)


|--- Age <= 31.00
|   |--- class: Yes
|--- Age >  31.00
|   |--- Income <= 77500.00
|   |   |--- class: No
|   |--- Income >  77500.00
|   |   |--- Income <= 105000.00
|   |   |   |--- class: Yes
|   |   |--- Income >  105000.00
|   |   |   |--- class: No
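
The thresholds in this output (31.00, 77500.00, 105000.00) are not values from the dataset. scikit-learn places candidate thresholds halfway between adjacent sorted feature values, whereas the manual implementation above uses the observed values themselves. The sketch below, with a hypothetical helper `candidate_midpoints`, lists those midpoints for the dummy data.

```python
def candidate_midpoints(values):
    """Midpoints between consecutive distinct values, in ascending order."""
    ordered = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]

ages = [22, 25, 47, 35, 32, 26, 60, 45, 41, 30]
incomes = [50000, 60000, 120000, 80000, 75000, 62000, 30000, 90000, 70000, 55000]

print(candidate_midpoints(ages))     # includes 31.0, halfway between 30 and 32
print(candidate_midpoints(incomes))  # includes 77500.0 and 105000.0
```

Both conventions separate the training samples identically; they can differ only for new inputs that fall between two observed values.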

