
# Decision Tree – Binary Classification

This notebook trains a decision tree classifier on a synthetic binary classification dataset (100 samples, 4 features). We visualize the fitted tree structure and the classifier's performance.


In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, plot_tree

In [None]:
# Generate 100 synthetic samples with 4 features (3 informative, 1 redundant)
X, y = make_classification(
    n_samples=100, n_features=4, n_informative=3, n_redundant=1,
    n_clusters_per_class=1, class_sep=1.5, random_state=42
)
df = pd.DataFrame(X, columns=["Feature1", "Feature2", "Feature3", "Feature4"])
df["Label"] = y
df.head()

In [None]:
# Train a decision tree limited to a maximum depth of 5 (the fitted tree may be shallower)
model = DecisionTreeClassifier(max_depth=5, random_state=42)
model.fit(X, y)

In [None]:
# Plot decision tree structure
plt.figure(figsize=(8, 6))
plot_tree(
    model, 
    feature_names=["Feature1", "Feature2", "Feature3", "Feature4"],
    class_names=["No", "Yes"], 
    filled=True
)
plt.title("Decision Tree Structure (max_depth = 5, 4 Features)")
plt.show()

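Each node in the plot reports its Gini impurity. Gini impurity is the probability of misclassifying a randomly chosen sample from a node if that sample were labeled at random according to the distribution of labels within the node. It is computed as $\text{Gini} = 1 - \sum_i p_i^2$, where $p_i$ is the fraction of the node's samples that belong to class $i$.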

A Gini impurity of 0 indicates perfect purity, meaning all samples in the node belong to the same class.
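
To make the formula concrete, here is a minimal sketch that computes Gini impurity directly from a list of class labels. The helper name `gini_impurity` is hypothetical and exists only for illustration; it is not part of scikit-learn. Treating an empty node as pure is likewise an assumption of this sketch.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 - sum(p_i^2) for a sequence of class labels.

    Hypothetical helper for illustration; not a scikit-learn function.
    """
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: an empty node is treated as pure
    counts = Counter(labels)
    # p_i is the fraction of samples in the node that belong to class i
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


print(gini_impurity(["Yes", "Yes", "Yes", "Yes"]))  # pure node -> 0.0
print(gini_impurity(["No", "No", "Yes", "Yes"]))    # even split -> 0.5
print(gini_impurity(["No", "Yes", "Yes", "Yes"]))   # 1 vs 3 -> 0.375
```

With two classes the impurity peaks at 0.5 for an even split, so values near 0.5 in the plot mark nodes where the tree is least certain.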