# Decision Trees: Overfit vs Regularize

This notebook explores **model capacity** in decision trees: how depth and related settings move a tree between underfitting and overfitting.

**We will look at:**
- Depth vs performance
- Underfit vs overfit behavior
- Simple feature importance (with caveats)

**Goal:** understand how to control tree complexity in practice.
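As a rough preview of what "controlling tree complexity" can mean, the sketch below compares an unconstrained tree with three constrained ones. The data, the split, the seed, and the specific values (`max_depth=4`, `min_samples_leaf=20`, `ccp_alpha=0.01`) are assumptions chosen for illustration, not settings taken from this notebook; `max_depth`, `min_samples_leaf`, and `ccp_alpha` are standard `DecisionTreeClassifier` parameters.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative data only: sizes and seed are assumptions for this sketch.
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=10, n_informative=5, n_redundant=2, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=0
)

# Three common ways to limit capacity, next to an unconstrained baseline.
# The parameter values are arbitrary examples, not tuned recommendations.
candidates = {
    "unconstrained": DecisionTreeClassifier(random_state=0),
    "max_depth=4": DecisionTreeClassifier(max_depth=4, random_state=0),
    "min_samples_leaf=20": DecisionTreeClassifier(min_samples_leaf=20, random_state=0),
    "ccp_alpha=0.01": DecisionTreeClassifier(ccp_alpha=0.01, random_state=0),
}

results = {}
for name, tree in candidates.items():
    tree.fit(X_tr, y_tr)
    results[name] = {
        "depth": tree.get_depth(),
        "leaves": tree.get_n_leaves(),
        "train_f1": f1_score(y_tr, tree.predict(X_tr)),
        "test_f1": f1_score(y_te, tree.predict(X_te)),
    }
    r = results[name]
    print(
        f"{name:>20}: depth={r['depth']:2d} leaves={r['leaves']:3d} "
        f"train F1={r['train_f1']:.3f} test F1={r['test_f1']:.3f}"
    )
```

A large gap between train and test F1 is the usual symptom of overfitting. Low scores on both suggest the tree is too constrained.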

## 1. Setup and Data

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

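from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import f1_score

# Consistent plot styling for the rest of the notebook
sns.set_style("whitegrid")
plt.rcParams["figure.figsize"] = (10, 6)

# Synthetic binary classification data:
# 10 features = 5 informative + 2 redundant + 3 pure noise
X, y = make_classification(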
    n_samples=1000,
    n_features=10,
    n_informative=5,
    n_redundant=2,
    random_state=42,
)

df = pd.DataFrame(X, columns=[f"feature_{i+1}" for i in range(X.shape[1])])
df["target"] = y

# Sanity checks: overall shape and class balance
print("Shape:", df.shape)
print(df["target"].value_counts())
df.head()
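One detail matters for the feature-importance caveats later: with the default `shuffle=True`, `make_classification` permutes the columns, so `feature_1` to `feature_5` are not necessarily the informative ones. The sketch below regenerates a separate dataset with `shuffle=False`, where the documented column order is informative, then redundant, then noise, and checks that the redundant columns are linear combinations of the informative ones. The names `X_ord`, `y_ord`, and `max_abs_residual` are hypothetical helpers for this illustration, and this dataset is not the same as `X` above.

```python
import numpy as np
from sklearn.datasets import make_classification

# Same sizes as above, but with shuffle=False so the column order is known:
# columns 0-4 informative, 5-6 redundant, 7-9 noise.
X_ord, y_ord = make_classification(
    n_samples=1000,
    n_features=10,
    n_informative=5,
    n_redundant=2,
    shuffle=False,
    random_state=42,
)

informative = X_ord[:, :5]
redundant = X_ord[:, 5:7]
noise = X_ord[:, 7:]

# Design matrix: informative columns plus an intercept term.
A = np.column_stack([informative, np.ones(len(informative))])


def max_abs_residual(target):
    """Largest absolute error when fitting `target` linearly from `A`."""
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return float(np.abs(target - A @ coef).max())


redundant_resid = max_abs_residual(redundant)
noise_resid = max_abs_residual(noise)

print(f"redundant columns, max residual: {redundant_resid:.2e}")
print(f"noise columns, max residual:     {noise_resid:.2e}")
```

The redundant columns should be reproduced almost exactly, while the noise columns should not. Because redundant features carry the same signal as informative ones, a tree can split on either, which is one reason impurity-based importances need careful reading.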