# Classification and Regression Tree (CART): Decision Tree

## Introduction to Decision Trees
Decision Trees are supervised learning models that split data into regions by asking a sequence of if/else questions on features. Each internal node applies a rule (e.g., petal length ≤ 2.45), and leaves hold final predictions (a class label for classification or a value for regression).

- Greedy construction: At each node, choose the best split to reduce impurity (classification) or variance (regression).
- Advantages: Interpretable, handles mixed feature types, little preprocessing needed, captures nonlinearities and interactions.
- Limitations: Prone to overfitting without constraints; unstable under small changes in the training data; axis-aligned splits only, so diagonal boundaries need many splits to approximate.
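
The greedy step for regression can be sketched in a few lines of NumPy. The helper name `best_variance_split` and the toy arrays are illustrative assumptions, not part of any library; a full CART implementation would scan every feature and then recurse on each child node.

```python
import numpy as np

def best_variance_split(x, y):
    """Return (threshold, weighted_variance) for the best single split on x.

    Hypothetical helper for illustration: candidate thresholds are the
    midpoints between consecutive sorted unique values of x, and the
    winner minimizes the sample-weighted variance of the two children.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    values = np.unique(x)  # sorted unique feature values
    if len(values) < 2:
        raise ValueError("need at least two distinct feature values to split")
    candidates = (values[:-1] + values[1:]) / 2.0
    n = len(y)
    best_t, best_score = None, np.inf
    for t in candidates:
        left, right = y[x <= t], y[x > t]
        # Weighted average of the child variances (lower is better)
        score = (len(left) * left.var() + len(right) * right.var()) / n
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Toy data (assumed): the target jumps between x = 3 and x = 4
x_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y_demo = np.array([1.0, 1.2, 0.8, 5.0, 5.1, 4.9])
t_best, var_best = best_variance_split(x_demo, y_demo)
print(f"best threshold: {t_best:.2f}, weighted child variance: {var_best:.4f}")
```

The same loop with Gini or entropy in place of variance gives the classification version of the split search.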

## Interpreting a Decision Tree
- Root: The first split, chosen as the rule that most reduces impurity on the full training set.
- Internal node: A rule “feature ≤ threshold”. Left branch is True; right branch is False.
- Leaf: Final decision. 
  - Classification: distribution of classes, predicted class, impurity (Gini/Entropy), samples.
  - Regression: predicted value and number of samples.
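- Depth: Number of splits on the path from the root to a leaf; the depth of the tree is the longest such path. Shallow trees are easier to read but may underfit; deep trees fit the training data more closely and may overfit.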
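
As a rough illustration of these terms, the sketch below fits a small tree and prints its rules as text with scikit-learn's `export_text`. The variable names (`iris_demo`, `demo_tree`) and the choice of `max_depth=2` are assumptions made for this example only.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris_demo = load_iris()
demo_tree = DecisionTreeClassifier(max_depth=2, random_state=0)
demo_tree.fit(iris_demo.data, iris_demo.target)

# Text rendering: each indented line is either a rule (internal node) or a leaf
print(export_text(demo_tree, feature_names=list(iris_demo.feature_names)))

# Structural summary: depth is the longest root-to-leaf path
print("depth:", demo_tree.get_depth())
print("leaves:", demo_tree.get_n_leaves())

# Node 0 of the fitted tree_ structure is the root
structure = demo_tree.tree_
root_feature = iris_demo.feature_names[structure.feature[0]]
print(f"root rule: {root_feature} <= {structure.threshold[0]:.2f}")
```

Samples that satisfy a printed rule follow the `<=` branch, which corresponds to the left (True) branch described above.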

In [None]:
# Conceptual: A simple 1D split demonstrating a tree partition
import numpy as np
import matplotlib.pyplot as plt

# Synthetic 1D data and labels by a threshold rule
np.random.seed(42)
X1 = np.linspace(0, 10, 80)
y1 = (X1 > 5).astype(int)  # true rule: class 0 for <=5, class 1 for >5

# Add noise to show imperfect separability
X1_noisy = X1 + np.random.normal(0, 0.5, size=X1.shape)

# Evaluate impurity (Gini) for candidate thresholds
thresholds = np.linspace(1, 9, 17)

def gini_of_split(X, y, t):
    left = y[X <= t]
    right = y[X > t]
    def gini(group):
        if len(group) == 0: return 0.0
        p = np.mean(group)
        return 2 * p * (1 - p)
    n = len(y)
    return (len(left)/n) * gini(left) + (len(right)/n) * gini(right)

scores = [gini_of_split(X1_noisy, y1, t) for t in thresholds]
best_t = thresholds[np.argmin(scores)]

print(f"Best threshold by weighted Gini: {best_t:.2f} (impurity {min(scores):.3f}; lower impurity is better)")

# Visualize split
plt.figure(figsize=(8,3))
plt.scatter(X1_noisy, y1, c=y1, cmap='coolwarm', s=40, alpha=0.8)
plt.axvline(best_t, color='black', linestyle='--', label=f'Threshold = {best_t:.2f}')
plt.yticks([0,1], ['Class 0','Class 1'])
plt.xlabel('Feature X1 (noisy)')
plt.title('Conceptual 1D Split (Gini-optimal Threshold)')
plt.legend(); plt.grid(True, alpha=0.3)
plt.show()

In [None]:
# Decision Tree Classification in Python (Iris)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import matplotlib.pyplot as plt  # imported here so the cell runs on its own
import seaborn as sns

iris = load_iris()
X, y = iris.data, iris.target
feature_names = iris.feature_names
class_names = iris.target_names

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

clf = DecisionTreeClassifier(
    criterion='gini',
    max_depth=3,      # limit depth for interpretability
    random_state=42
)
clf.fit(X_train, y_train)

# Evaluate
y_pred = clf.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print(f"Accuracy: {acc:.3f}")
print("\nClassification Report:\n", classification_report(y_test, y_pred, target_names=class_names))

# Confusion matrix
cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(4,3))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=class_names, yticklabels=class_names)
plt.title('Confusion Matrix - Iris Decision Tree')
plt.xlabel('Predicted'); plt.ylabel('True')
plt.tight_layout(); plt.show()

# Visualize tree
plt.figure(figsize=(12,6))
plot_tree(clf, feature_names=feature_names, class_names=class_names,
          filled=True, rounded=True, impurity=True)
plt.title('Decision Tree (Iris, max_depth=3)')
plt.show()

### Interpreting the Decision Tree Classification Output
- We train a CART classifier (Gini) with max_depth=3 on the Iris dataset, then print test accuracy and a per-class classification report (precision, recall, F1).
- The confusion matrix shows which classes are correctly/incorrectly predicted. Diagonal cells are correct predictions; off-diagonals indicate misclassifications.
- The plotted tree shows:
  - Each node’s splitting rule (feature and threshold)
  - Samples, class distribution, and Gini impurity at the node
  - Leaf nodes’ predicted class (color-coded)
- Try changing `max_depth` or `min_samples_split`, or setting `criterion='entropy'`, to see how tree complexity affects accuracy and interpretability.
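
One way to run that experiment is a small sweep over depth and criterion, sketched below. The grid values and variable names (`Xi_train`, `results`, and so on) are assumptions for illustration; the split mirrors the one used in the cell above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X_iris, y_iris = load_iris(return_X_y=True)
Xi_train, Xi_test, yi_train, yi_test = train_test_split(
    X_iris, y_iris, test_size=0.25, random_state=42, stratify=y_iris
)

results = []
for criterion in ("gini", "entropy"):
    for max_depth in (1, 2, 3, 5, None):  # None lets the tree grow fully
        model = DecisionTreeClassifier(
            criterion=criterion, max_depth=max_depth, random_state=42
        )
        model.fit(Xi_train, yi_train)
        results.append((
            criterion,
            max_depth,
            model.get_n_leaves(),
            model.score(Xi_train, yi_train),  # training accuracy
            model.score(Xi_test, yi_test),    # held-out accuracy
        ))

print(f"{'criterion':<10}{'max_depth':<11}{'leaves':<8}{'train':<8}{'test':<8}")
for criterion, max_depth, leaves, train_acc, test_acc in results:
    print(f"{criterion:<10}{str(max_depth):<11}{leaves:<8}{train_acc:<8.3f}{test_acc:<8.3f}")
```

A growing gap between the training and test columns as depth increases is the usual sign of overfitting; on a dataset as small as Iris the test column is noisy, so treat single-split differences with caution.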

In [None]:
# Decision Tree Regression in Python (Diabetes)
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error, r2_score

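import numpy as np
import matplotlib.pyplot as plt

# Load the regression dataset once and reuse it for the data and feature names
diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target
feature_names = diabetes.feature_names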

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Model: limit depth to reduce overfitting
regr = DecisionTreeRegressor(max_depth=4, random_state=42)
regr.fit(X_train, y_train)

# Predict and evaluate
y_pred = regr.predict(X_test)
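# Take the square root explicitly: the `squared=False` argument was deprecated
# and later removed from mean_squared_error in recent scikit-learn releases
rmse = np.sqrt(mean_squared_error(y_test, y_pred))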
r2 = r2_score(y_test, y_pred)
print(f"RMSE: {rmse:.3f}")
print(f"R^2: {r2:.3f}")

# Feature importance plot
importances = regr.feature_importances_
indices = np.argsort(importances)[::-1]

plt.figure(figsize=(8,4))
plt.bar(range(len(importances)), importances[indices], align='center')
plt.xticks(range(len(importances)), np.array(feature_names)[indices], rotation=45, ha='right')
plt.title('Feature Importances - Decision Tree Regressor (Diabetes)')
plt.tight_layout(); plt.show()

# Predicted vs True
plt.figure(figsize=(5,4))
plt.scatter(y_test, y_pred, alpha=0.7)
lims = [min(y_test.min(), y_pred.min()), max(y_test.max(), y_pred.max())]
plt.plot(lims, lims, 'r--', label='Ideal')
plt.xlabel('True'); plt.ylabel('Predicted')
plt.legend(); plt.title('True vs Predicted - Decision Tree Regressor')
plt.tight_layout(); plt.show()

### Interpreting the Decision Tree Regression Output
- We fit a `DecisionTreeRegressor` with a limited `max_depth` to control variance.
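- RMSE is the square root of the mean squared error, expressed in the same units as the target; it penalizes large errors more heavily than small ones, and lower is better. R^2 is the fraction of target variance the model explains on the test set; it is at most 1 and can be negative when the model does worse than predicting the mean of the test targets.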
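- The feature importance bar chart shows each feature's share of the total impurity (squared-error) reduction across all splits, normalized to sum to 1. A feature the tree never splits on gets an importance of 0.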
- The True vs Predicted scatter should cluster near the red dashed identity line; deviations indicate error.
- Try tuning `max_depth` and `min_samples_leaf` with cross-validation to balance bias and variance.
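
A minimal sketch of that tuning step, using scikit-learn's `GridSearchCV` with 5-fold cross-validation. The grid values and variable names (`Xd_train`, `search`, and so on) are assumptions chosen for illustration, not recommended settings.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

X_diab, y_diab = load_diabetes(return_X_y=True)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_diab, y_diab, test_size=0.25, random_state=42
)

# Assumed search grid: shallower trees and larger leaves both reduce variance
param_grid = {
    "max_depth": [2, 3, 4, 6, None],
    "min_samples_leaf": [1, 5, 10, 20],
}
search = GridSearchCV(
    DecisionTreeRegressor(random_state=42),
    param_grid,
    scoring="neg_root_mean_squared_error",  # scikit-learn maximizes scores, hence the sign
    cv=5,
)
search.fit(Xd_train, yd_train)

print("best params:", search.best_params_)
print(f"cross-validated RMSE: {-search.best_score_:.2f}")

# The search refits the best model on the full training set by default
test_rmse = np.sqrt(np.mean((yd_test - search.predict(Xd_test)) ** 2))
print(f"held-out RMSE: {test_rmse:.2f}")
```

Selecting hyperparameters on cross-validation folds keeps the test set untouched until the final check, so the held-out RMSE remains an honest estimate.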