# Decision Trees - Hands-On Tutorial

This notebook provides a comprehensive guide to Decision Trees with practical examples.

## Contents
1. Import Libraries
2. Load and Explore Dataset
3. Data Preprocessing
4. Decision Tree Classifier
5. Visualizing the Tree
6. Feature Importance
7. Hyperparameter Tuning
8. Preventing Overfitting
9. Decision Tree Regressor
10. Implementation from Scratch

## 1. Import Libraries

In [None]:
# Data manipulation
import numpy as np
import pandas as pd

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Machine Learning
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor, plot_tree
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import LabelEncoder

# Settings
plt.style.use('seaborn-v0_8')
sns.set_palette('husl')
%matplotlib inline

# Ignore warnings
import warnings
warnings.filterwarnings('ignore')

## 2. Load and Explore Dataset

We'll use the Iris dataset for classification.

In [None]:
from sklearn.datasets import load_iris

# Load dataset
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['species'] = iris.target
df['species_name'] = df['species'].map({0: 'setosa', 1: 'versicolor', 2: 'virginica'})

print("Dataset Shape:", df.shape)
df.head()

In [None]:
# Dataset information
print("Dataset Info:")
print(df.info())
print("\nClass Distribution:")
print(df['species_name'].value_counts())

In [None]:
# Statistical summary
df.describe()

In [None]:
# Visualize feature distributions
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
axes = axes.ravel()

for idx, col in enumerate(iris.feature_names):
    for species in df['species_name'].unique():
        subset = df[df['species_name'] == species]
        axes[idx].hist(subset[col], alpha=0.6, label=species, bins=15)
    axes[idx].set_xlabel(col)
    axes[idx].set_ylabel('Frequency')
    axes[idx].legend()
    axes[idx].set_title(f'Distribution of {col}')

plt.tight_layout()
plt.show()

## 3. Data Preprocessing

In [None]:
# Separate features and target
X = iris.data
y = iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

print(f"Training set: {X_train.shape}")
print(f"Test set: {X_test.shape}")
print("\nClass distribution in training set:")
print(pd.Series(y_train).value_counts().sort_index())

## 4. Decision Tree Classifier

### 4.1 Train a Simple Decision Tree

In [None]:
# Create and train decision tree
dt_classifier = DecisionTreeClassifier(random_state=42)
dt_classifier.fit(X_train, y_train)

# Make predictions
y_pred = dt_classifier.predict(X_test)

# Evaluate
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")
print(f"\nTree Depth: {dt_classifier.get_depth()}")
print(f"Number of Leaves: {dt_classifier.get_n_leaves()}")

In [None]:
# Detailed classification report
print("Classification Report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))

In [None]:
# Confusion Matrix
cm = confusion_matrix(y_test, y_pred)

plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
            xticklabels=iris.target_names, yticklabels=iris.target_names)
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix - Decision Tree')
plt.show()

## 5. Visualizing the Tree

One of the biggest advantages of Decision Trees is their interpretability!
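The same interpretability is also available as plain text. The sketch below uses `sklearn.tree.export_text` to print a tree's rules as nested if/else conditions. The names `iris_demo` and `demo_tree` and the choice of `max_depth=2` are illustrative assumptions, picked only to keep the output short.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris_demo = load_iris()

# A shallow tree (hypothetical setting) keeps the printed rule set short
demo_tree = DecisionTreeClassifier(max_depth=2, random_state=42)
demo_tree.fit(iris_demo.data, iris_demo.target)

# Each indented line is one test on a feature; "class:" lines are leaves
rules = export_text(demo_tree, feature_names=list(iris_demo.feature_names))
print(rules)
```

Reading the output top to bottom follows one path from the root to a leaf, which is the same path a sample takes at prediction time.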

In [None]:
# Visualize the decision tree
plt.figure(figsize=(20, 10))
plot_tree(dt_classifier, 
          feature_names=iris.feature_names,
          class_names=iris.target_names,
          filled=True,
          rounded=True,
          fontsize=10)
plt.title('Decision Tree Visualization', fontsize=16, fontweight='bold')
plt.show()

### Understanding the Tree Visualization

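- **Top node (root)**: The first split, chosen as the feature and threshold that reduce impurity the most
- **Gini**: Measure of impurity (0 = pure; the maximum is 0.5 for two classes and about 0.667 for the three Iris classes)
- **Samples**: Number of training samples that reach this node
- **Value**: Number of samples per class at this node
- **Class**: Majority class at this node (the prediction if the node were a leaf)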
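The Gini number shown in each node can be reproduced by hand from the node's **Value** counts as 1 minus the sum of squared class proportions. A minimal sketch follows; the helper name `gini_from_counts` and the example counts are hypothetical.

```python
import numpy as np

def gini_from_counts(counts):
    """Gini impurity, 1 - sum(p_k ** 2), from per-class sample counts."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

# Example counts (made up for illustration, in the style of a node's "value")
print(gini_from_counts([40, 40, 40]))  # balanced three-class node: 2/3
print(gini_from_counts([0, 37, 3]))    # mostly one class: low impurity
print(gini_from_counts([40, 0, 0]))    # pure node: 0
```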

## 6. Feature Importance

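Decision Trees provide impurity-based feature importance scores: each feature's score is its share of the total impurity reduction over all splits that use it, normalized so that the scores sum to 1.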
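As a quick sanity check on how to read these scores, the sketch below fits a small tree and confirms that the importances are non-negative, sum to 1, and are exactly 0 for features the tree never splits on. The names `iris_demo`, `demo_tree`, and `importances_demo` are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris_demo = load_iris()

# Shallow tree (hypothetical setting): it can only split on a few features
demo_tree = DecisionTreeClassifier(max_depth=2, random_state=42)
demo_tree.fit(iris_demo.data, iris_demo.target)

importances_demo = demo_tree.feature_importances_
for name, score in zip(iris_demo.feature_names, importances_demo):
    print(f"{name:>20}: {score:.3f}")

# Scores are normalized, so they sum to 1; unused features score 0
print("Sum of importances:", importances_demo.sum())
```

A score of 0 means only that this particular tree did not use the feature, not that the feature carries no information.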

In [None]:
# Get feature importances
importances = dt_classifier.feature_importances_
feature_importance_df = pd.DataFrame({
    'Feature': iris.feature_names,
    'Importance': importances
}).sort_values('Importance', ascending=False)

print("Feature Importance:")
print(feature_importance_df)

In [None]:
# Visualize feature importance
plt.figure(figsize=(10, 6))
plt.barh(feature_importance_df['Feature'], feature_importance_df['Importance'])
plt.xlabel('Importance')
plt.title('Feature Importance - Decision Tree')
plt.gca().invert_yaxis()
plt.tight_layout()
plt.show()

## 7. Hyperparameter Tuning

### 7.1 Important Hyperparameters

- **max_depth**: Maximum depth of the tree
- **min_samples_split**: Minimum samples required to split a node
- **min_samples_leaf**: Minimum samples required in a leaf node
- **max_features**: Number of features to consider for best split
- **criterion**: Splitting criterion ('gini' or 'entropy')
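To get a feel for how these constraints shape a tree, the sketch below fits one tree per setting and prints the resulting depth and leaf count. The specific values and the names `settings` and `trees` are assumptions chosen for illustration, not recommended defaults.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X_demo, y_demo = load_iris(return_X_y=True)

# Hypothetical settings, each isolating one constraint
settings = {
    "defaults": {},
    "max_depth=2": {"max_depth": 2},
    "min_samples_leaf=10": {"min_samples_leaf": 10},
    "min_samples_split=40": {"min_samples_split": 40},
}

trees = {}
for label, params in settings.items():
    trees[label] = DecisionTreeClassifier(random_state=42, **params).fit(X_demo, y_demo)
    print(f"{label:>22}: depth={trees[label].get_depth()}, "
          f"leaves={trees[label].get_n_leaves()}")
```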

In [None]:
# Grid Search for best hyperparameters
param_grid = {
    'max_depth': [2, 3, 4, 5, 6, 7, 8, None],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'criterion': ['gini', 'entropy']
}

grid_search = GridSearchCV(DecisionTreeClassifier(random_state=42), 
                          param_grid, cv=5, scoring='accuracy', n_jobs=-1)
grid_search.fit(X_train, y_train)

print("Best Parameters:")
print(grid_search.best_params_)
print(f"\nBest Cross-Validation Score: {grid_search.best_score_:.4f}")

In [None]:
# Evaluate best model
best_dt = grid_search.best_estimator_
y_pred_best = best_dt.predict(X_test)

print(f"Test Accuracy: {accuracy_score(y_test, y_pred_best):.4f}")
print(f"Tree Depth: {best_dt.get_depth()}")
print(f"Number of Leaves: {best_dt.get_n_leaves()}")

## 8. Preventing Overfitting

### 8.1 Comparing Different max_depth Values

In [None]:
# Compare different max_depth values
depths = range(1, 15)
train_scores = []
test_scores = []

for depth in depths:
    dt = DecisionTreeClassifier(max_depth=depth, random_state=42)
    dt.fit(X_train, y_train)
    
    train_scores.append(dt.score(X_train, y_train))
    test_scores.append(dt.score(X_test, y_test))

# Plot
plt.figure(figsize=(10, 6))
plt.plot(depths, train_scores, marker='o', label='Training Score', linewidth=2)
plt.plot(depths, test_scores, marker='s', label='Test Score', linewidth=2)
plt.xlabel('Max Depth')
plt.ylabel('Accuracy')
plt.title('Training vs Test Accuracy by Tree Depth')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

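# Note: choosing max_depth from test accuracy leaks test information;
# prefer cross-validation on the training set (Section 8.2)
print(f"max_depth with highest test accuracy: {depths[np.argmax(test_scores)]}")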

### 8.2 Cross-Validation

In [None]:
# Cross-validation scores
dt_cv = DecisionTreeClassifier(max_depth=3, random_state=42)
cv_scores = cross_val_score(dt_cv, X_train, y_train, cv=5, scoring='accuracy')

print(f"Cross-Validation Scores: {cv_scores}")
print(f"Mean CV Score: {cv_scores.mean():.4f} (+/- {cv_scores.std() * 2:.4f})")
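
Cross-validation can also be used to choose `max_depth` without touching the test set, so the final test accuracy stays an honest estimate. A minimal sketch, assuming the same Iris split as earlier; the names `candidate_depths`, `mean_cv`, and `final_model` are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X_demo, y_demo = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Score each candidate depth with 5-fold CV on the training data only
candidate_depths = list(range(1, 8))
mean_cv = []
for d in candidate_depths:
    model = DecisionTreeClassifier(max_depth=d, random_state=42)
    mean_cv.append(cross_val_score(model, X_tr, y_tr, cv=5).mean())

best_depth = candidate_depths[int(np.argmax(mean_cv))]

# Refit on the full training set, then evaluate once on the held-out test set
final_model = DecisionTreeClassifier(max_depth=best_depth, random_state=42)
final_model.fit(X_tr, y_tr)
print(f"Depth chosen by CV: {best_depth}")
print(f"Test accuracy: {final_model.score(X_te, y_te):.4f}")
```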

## 9. Decision Tree Regressor

Decision Trees can also be used for regression tasks.
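With scikit-learn's default squared-error criterion, a regression tree picks the split that most reduces squared error, and each leaf predicts the mean target of its training samples. The sketch below shows this on a tiny made-up dataset; `X_toy`, `y_toy`, and `stump` are illustrative names.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy 1-D data: targets sit near 1.0 for x < 5 and near 3.0 for x >= 5
X_toy = np.arange(10, dtype=float).reshape(-1, 1)
y_toy = np.array([1.0, 1.2, 0.8, 1.1, 0.9, 3.0, 3.2, 2.8, 3.1, 2.9])

# A depth-1 tree (a "stump") makes exactly one split
stump = DecisionTreeRegressor(max_depth=1, random_state=42).fit(X_toy, y_toy)
preds = stump.predict(X_toy)

print("Split threshold:", stump.tree_.threshold[0])
print("Left leaf predicts:", preds[0], "| mean of left targets:", y_toy[:5].mean())
print("Right leaf predicts:", preds[-1], "| mean of right targets:", y_toy[5:].mean())
```

Because every leaf outputs a constant, the fitted function is piecewise constant; deeper trees simply use more, smaller pieces.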

In [None]:
from sklearn.datasets import fetch_california_housing

# Load regression dataset
housing = fetch_california_housing()
X_reg = housing.data[:1000]  # Use the first 1,000 rows for faster training (not a random sample)
y_reg = housing.target[:1000]

# Split data
X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(
    X_reg, y_reg, test_size=0.2, random_state=42
)

# Train decision tree regressor
dt_regressor = DecisionTreeRegressor(max_depth=5, random_state=42)
dt_regressor.fit(X_train_reg, y_train_reg)

# Predictions
y_pred_reg = dt_regressor.predict(X_test_reg)

# Evaluate
mse = mean_squared_error(y_test_reg, y_pred_reg)
rmse = np.sqrt(mse)
r2 = r2_score(y_test_reg, y_pred_reg)

print(f"R² Score: {r2:.4f}")
print(f"RMSE: {rmse:.4f}")

In [None]:
# Visualize predictions
plt.figure(figsize=(10, 6))
plt.scatter(y_test_reg, y_pred_reg, alpha=0.6)
plt.plot([y_test_reg.min(), y_test_reg.max()], 
         [y_test_reg.min(), y_test_reg.max()], 'r--', lw=2)
plt.xlabel('Actual Values')
plt.ylabel('Predicted Values')
plt.title('Decision Tree Regressor: Actual vs Predicted')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

## 10. Implementation from Scratch

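To understand how Decision Trees work internally, the class below implements a minimal classifier that scores every candidate threshold with weighted Gini impurity and splits recursively until `max_depth` is reached or a node is pure.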

In [None]:
class SimpleDecisionTree:
    """Simple Decision Tree implementation for educational purposes."""
    
    def __init__(self, max_depth=5):
        self.max_depth = max_depth
        self.tree = None
    
    def gini_impurity(self, y):
        """Calculate Gini impurity."""
        _, counts = np.unique(y, return_counts=True)
        probabilities = counts / len(y)
        return 1 - np.sum(probabilities ** 2)
    
    def split_data(self, X, y, feature_idx, threshold):
        """Split data based on feature and threshold."""
        left_mask = X[:, feature_idx] < threshold
        right_mask = ~left_mask
        return X[left_mask], X[right_mask], y[left_mask], y[right_mask]
    
    def find_best_split(self, X, y):
        """Find the best feature and threshold to split on."""
        best_gini = float('inf')
        best_feature = None
        best_threshold = None
        
        n_features = X.shape[1]
        
        for feature_idx in range(n_features):
            thresholds = np.unique(X[:, feature_idx])
            
            for threshold in thresholds:
                _, _, y_left, y_right = self.split_data(X, y, feature_idx, threshold)
                
                if len(y_left) == 0 or len(y_right) == 0:
                    continue
                
                # Weighted Gini impurity
                n = len(y)
                gini = (len(y_left) / n) * self.gini_impurity(y_left) + \
                       (len(y_right) / n) * self.gini_impurity(y_right)
                
                if gini < best_gini:
                    best_gini = gini
                    best_feature = feature_idx
                    best_threshold = threshold
        
        return best_feature, best_threshold
    
    def build_tree(self, X, y, depth=0):
        """Recursively build the decision tree."""
        # Stopping criteria
        if depth >= self.max_depth or len(np.unique(y)) == 1:
            return {'class': np.bincount(y).argmax()}
        
        # Find best split
        feature_idx, threshold = self.find_best_split(X, y)
        
        if feature_idx is None:
            return {'class': np.bincount(y).argmax()}
        
        # Split data
        X_left, X_right, y_left, y_right = self.split_data(X, y, feature_idx, threshold)
        
        # Build subtrees
        return {
            'feature': feature_idx,
            'threshold': threshold,
            'left': self.build_tree(X_left, y_left, depth + 1),
            'right': self.build_tree(X_right, y_right, depth + 1)
        }
    
    def fit(self, X, y):
        """Fit the decision tree."""
        self.tree = self.build_tree(X, y)
        return self
    
    def predict_single(self, x, tree):
        """Predict a single sample."""
        if 'class' in tree:
            return tree['class']
        
        if x[tree['feature']] < tree['threshold']:
            return self.predict_single(x, tree['left'])
        else:
            return self.predict_single(x, tree['right'])
    
    def predict(self, X):
        """Predict multiple samples."""
        return np.array([self.predict_single(x, self.tree) for x in X])

# Test custom implementation
custom_dt = SimpleDecisionTree(max_depth=3)
custom_dt.fit(X_train, y_train)
y_pred_custom = custom_dt.predict(X_test)

print(f"Custom Decision Tree Accuracy: {accuracy_score(y_test, y_pred_custom):.4f}")
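
The split scoring inside `find_best_split` can be checked by hand on a toy array. The sketch below recomputes the weighted Gini impurity for a few thresholds, using the same `<` convention as `split_data`; the function names `gini` and `weighted_gini` and the toy data are hypothetical.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def weighted_gini(feature, labels, threshold):
    """Impurity of a split: child impurities weighted by child size."""
    left = labels[feature < threshold]
    right = labels[feature >= threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Toy data (made up): class 0 for small values, class 1 for large values
feature_toy = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
labels_toy = np.array([0, 0, 0, 1, 1, 1])

# Thresholds chosen so that both children are non-empty
for t in [2.0, 4.0, 6.0]:
    print(f"threshold={t}: weighted Gini = {weighted_gini(feature_toy, labels_toy, t):.3f}")
```

The threshold that separates the two classes perfectly scores 0, which is why the search keeps the candidate with the lowest weighted impurity.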

## Summary

### What You Learned:

✅ **Decision Tree Basics**
- How decision trees split data
- Gini impurity and entropy
- Tree visualization and interpretation

✅ **Practical Skills**
- Training decision tree classifiers and regressors
- Feature importance analysis
- Hyperparameter tuning
- Preventing overfitting

✅ **Advanced Topics**
- Cross-validation
- Grid search for optimal parameters
- Implementation from scratch

### Key Takeaways:

1. **Interpretability**: Decision trees are easy to understand and visualize
2. **No Feature Scaling**: Don't need to normalize/standardize features
3. **Overfitting Risk**: Prone to overfitting without proper constraints
4. **Pruning**: Constrain growth with max_depth, min_samples_split, and min_samples_leaf (pre-pruning) to control complexity
5. **Versatile**: Works for both classification and regression

### Next Steps:

1. Try with different datasets (Titanic, Wine Quality, etc.)
2. Experiment with different hyperparameters
3. Compare with Random Forest (ensemble of trees)
4. Learn about pruning techniques
5. Explore gradient boosting algorithms

**Happy Learning! 🌳**