# Decision Tree

A decision tree is a tree-structured model used for both classification and regression. Each internal node tests a feature (attribute), each branch corresponds to an outcome of that test, and each leaf holds a prediction: a class label for classification or a numeric value for regression.

## History

Decision trees have been used for decades in various fields, including data mining, pattern recognition, and artificial intelligence. Some of the earliest algorithms for decision tree construction include the Iterative Dichotomiser 3 (ID3), Classification and Regression Trees (CART), and Chi-squared Automatic Interaction Detection (CHAID).

## Mathematical Concepts

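At each node, a decision tree chooses the split that most improves a purity criterion. The most common criteria are:

- Gini impurity: The probability that a sample drawn at random from the node would be misclassified if it were labeled according to the node's class distribution. For class proportions p_k it equals 1 - sum_k p_k^2, which is 0 for a pure node.
- Information gain: The entropy (disorder) of the parent node minus the size-weighted average entropy of its children.
- Variance reduction: The variance of the target in the parent node minus the size-weighted average variance in its children; used for regression tasks.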
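
The three criteria can be written in a few lines of NumPy. The following is a minimal sketch, not a reference implementation; the function names (`gini`, `entropy`, `weighted_child_score`, `information_gain`, `variance_reduction`) and the toy arrays are chosen here for illustration only.

```python
import numpy as np

def gini(y):
    """Gini impurity: 1 - sum_k p_k^2 over the class proportions p_k."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(y):
    """Shannon entropy in bits: -sum_k p_k * log2(p_k)."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def weighted_child_score(score, children):
    """Average of score(child), weighted by the number of samples in each child."""
    n = sum(len(c) for c in children)
    return sum(len(c) / n * score(c) for c in children)

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the children."""
    return entropy(parent) - weighted_child_score(entropy, children)

def variance_reduction(parent, children):
    """Parent variance minus the size-weighted variance of the children."""
    return np.var(parent) - weighted_child_score(np.var, children)

# Toy example: a perfectly mixed node that one split separates completely.
labels = np.array([0, 0, 1, 1])
targets = np.array([1.0, 1.0, 3.0, 3.0])

print(gini(labels))                                              # 0.5
print(entropy(labels))                                           # 1.0
print(information_gain(labels, [labels[:2], labels[2:]]))        # 1.0
print(variance_reduction(targets, [targets[:2], targets[2:]]))   # 1.0
```

In this toy case the split yields pure children, so the gain equals the full impurity of the parent.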

## Learning Algorithm

The learning algorithm for decision trees involves recursively partitioning the data into subsets based on the values of the input features. The main steps are:

1. Start with the entire dataset at the root node.
2. Select the best attribute to split the data based on a splitting criterion (e.g., Gini impurity, information gain, or variance reduction).
3. Create child nodes for each outcome of the split: one per category for a multiway split (as in ID3 or CHAID), or two for a binary threshold split (as in CART).
4. Recursively apply steps 2 and 3 to each child node until a stopping criterion is met (e.g., maximum tree depth, minimum samples per leaf, or no further improvement in the splitting criterion).
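
Step 2 is where most of the work happens. As a rough illustration, the sketch below searches every feature and every midpoint between consecutive distinct values for the binary split with the lowest size-weighted Gini impurity. The helper name `best_split` and the toy data are assumptions made for this example; real libraries use more efficient search strategies.

```python
import numpy as np

def gini(y):
    """Gini impurity of a label array."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Return (feature, threshold, weighted_gini) for the best binary split,
    or None if no split is purer than the unsplit node."""
    n_samples, n_features = X.shape
    best = None
    best_score = gini(y)  # a split must beat the impurity of the unsplit node
    for feature in range(n_features):
        values = np.unique(X[:, feature])  # sorted distinct values
        # Midpoints between consecutive values keep both children non-empty.
        for threshold in (values[:-1] + values[1:]) / 2:
            left = y[X[:, feature] <= threshold]
            right = y[X[:, feature] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n_samples
            if score < best_score:
                best, best_score = (feature, threshold, score), score
    return best

# Toy data: feature 0 separates the classes cleanly, feature 1 does not.
X_toy = np.array([[1.0, 5.0], [2.0, 3.0], [3.0, 4.0], [4.0, 6.0]])
y_toy = np.array([0, 0, 1, 1])

result = best_split(X_toy, y_toy)
print(result)
```

On this toy data the search selects feature 0 at threshold 2.5, which yields two pure children. Applying `best_split` recursively to each child, with a stopping rule, gives the full algorithm.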

## Pros and Cons

**Pros:**

- Easy to understand, interpret, and visualize.
- Can handle both numerical and categorical data.
- Requires little data preprocessing (e.g., no need for scaling or normalization).
- Can be easily combined with other models to create ensemble methods (e.g., Random Forest, Gradient Boosting).

**Cons:**

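- Prone to overfitting, especially when the tree is allowed to grow deep or to have many leaves.
- Can be unstable: small changes in the training data may produce a very different tree.
- Standard splits test one feature at a time, so a single tree may need many splits to approximate smooth or diagonal decision boundaries, and it may perform poorly on high-dimensional data.
- Training can be computationally expensive on large datasets, since an exhaustive search evaluates candidate thresholds for every feature at every node.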
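
The overfitting risk can be observed directly by comparing an unrestricted tree with a depth-limited one. The sketch below uses scikit-learn on a synthetic dataset with deliberately noisy labels; the dataset parameters and the `results` dictionary are illustrative choices, and the exact scores will depend on the data and library version.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data; flip_y randomly reassigns about 20% of labels.
X_demo, y_demo = make_classification(
    n_samples=500, n_features=10, n_informative=4, flip_y=0.2, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0
)

results = {}
for max_depth in (None, 3):  # None lets the tree grow until every leaf is pure
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    tree.fit(X_tr, y_tr)
    results[max_depth] = {
        "depth": tree.get_depth(),
        "leaves": tree.get_n_leaves(),
        "train": tree.score(X_tr, y_tr),
        "test": tree.score(X_te, y_te),
    }
    print(max_depth, results[max_depth])
```

The unrestricted tree memorizes the training set, including its label noise, so its training accuracy says little about how it generalizes. Comparing the training and test scores of the two trees shows how much of that fit carries over to unseen data.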

## Suitable Tasks and Datasets

Decision trees can be applied to a wide range of classification and regression tasks in which the target can be predicted from a sequence of if-then tests on the input features. Some examples include:

- Diagnosing diseases based on symptoms.
- Predicting customer churn.
- Determining the optimal marketing strategy for a product.

## References

1. Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81-106.
2. Breiman, L., Friedman, J., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. CRC Press.
3. Kass, G. V. (1980). An exploratory technique for investigating large quantities of categorical data. Journal of the Royal Statistical Society: Series C (Applied Statistics), 29(2), 119-127.


In [None]:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix

# Helper function to calculate entropy
def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    probabilities = counts / len(y)
    return -np.sum(probabilities * np.log2(probabilities))

class DecisionTreeNode:
    def __init__(self, feature=None, threshold=None, left=None, right=None, value=None):
        self.feature = feature
        self.threshold = threshold
        self.left = left
        self.right = right
        self.value = value

class DecisionTree:
    def __init__(self, min_samples_split=2, max_depth=None):
        self.min_samples_split = min_samples_split
        self.max_depth = max_depth

    def fit(self, X, y):
        self.root = self._build_tree(X, y, depth=0)

    def predict(self, X):
        return [self._predict_sample(x, self.root) for x in X]

    def _build_tree(self, X, y, depth):
        n_samples, n_features = X.shape
        n_labels = len(np.unique(y))

        # Stopping criteria
        if (n_samples < self.min_samples_split
                or n_labels == 1
                or (self.max_depth is not None and depth >= self.max_depth)):
            value = np.argmax(np.bincount(y))
            return DecisionTreeNode(value=value)

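        # Find the best split; fall back to a majority-class leaf if no split
        # improves on the current node (e.g., identical rows with different labels)
        best_feature, best_threshold = self._find_best_split(X, y)
        if best_feature is None:
            return DecisionTreeNode(value=np.argmax(np.bincount(y)))
        left_indices, right_indices = self._split(X[:, best_feature], best_threshold)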

        # Recursively build the left and right subtrees
        left = self._build_tree(X[left_indices, :], y[left_indices], depth + 1)
        right = self._build_tree(X[right_indices, :], y[right_indices], depth + 1)

        return DecisionTreeNode(feature=best_feature, threshold=best_threshold, left=left, right=right)

    def _find_best_split(self, X, y):
        n_samples, n_features = X.shape
        best_feature, best_threshold, best_info_gain = None, None, 0
        parent_entropy = entropy(y)  # constant for this node, so compute it once

        for feature in range(n_features):
            # Skip the largest value: splitting there would leave the right child empty
            thresholds = np.unique(X[:, feature])[:-1]
            for threshold in thresholds:
                left_indices, right_indices = self._split(X[:, feature], threshold)
                # Information gain = parent entropy - size-weighted child entropy
                children_entropy = (len(left_indices) * entropy(y[left_indices])
                                    + len(right_indices) * entropy(y[right_indices])) / n_samples
                info_gain = parent_entropy - children_entropy

                if info_gain > best_info_gain:
                    best_feature, best_threshold, best_info_gain = feature, threshold, info_gain

        return best_feature, best_threshold

    def _split(self, X_feature, threshold):
        left_indices = np.where(X_feature <= threshold)[0]
        right_indices = np.where(X_feature > threshold)[0]
        return left_indices, right_indices

    def _predict_sample(self, x, node):
        if node.value is not None:
            return node.value
        if x[node.feature] <= node.threshold:
            return self._predict_sample(x, node.left)
        return self._predict_sample(x, node.right)

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the decision tree model
model = DecisionTree(min_samples_split=2, max_depth=3)
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Calculate the accuracy and confusion matrix
accuracy = accuracy_score(y_test, y_pred)
conf_mat = confusion_matrix(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print("Confusion Matrix:")
print(conf_mat)

# Visualize the decision tree using the print_tree function
def print_tree(node, feature_names, class_names, prefix=""):
    if node.value is not None:
        class_name = class_names[node.value]
        print(f"{prefix}Class: {class_name}")
    else:
        feature_name = feature_names[node.feature]
        threshold = node.threshold
        print(f"{prefix}{feature_name} <= {threshold:.2f}")
        print_tree(node.left, feature_names, class_names, prefix + "\t")
        print(f"{prefix}{feature_name} > {threshold:.2f}")
        print_tree(node.right, feature_names, class_names, prefix + "\t")

print_tree(model.root, iris.feature_names, iris.target_names)


In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.tree import DecisionTreeClassifier, plot_tree

class DecisionTree:
    def __init__(self, max_depth=None):
        self.max_depth = max_depth
        self.tree = None

    def fit(self, X, y):
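        # Fix random_state so that ties between equally good splits are broken reproducibly
        self.tree = DecisionTreeClassifier(criterion='entropy', max_depth=self.max_depth, random_state=42)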
        self.tree.fit(X, y)

    def predict(self, X):
        return self.tree.predict(X)

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the decision tree model
model = DecisionTree(max_depth=3)
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Calculate the accuracy and confusion matrix
accuracy = accuracy_score(y_test, y_pred)
conf_mat = confusion_matrix(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print("Confusion Matrix:")
print(conf_mat)

# Visualize the decision tree
plt.figure(figsize=(12, 8))
# Pass class names as a list; some scikit-learn releases reject a NumPy array here
plot_tree(model.tree, filled=True, feature_names=iris.feature_names, class_names=list(iris.target_names))
plt.show()
