# 📅 Day 10: Decision Trees
Learn how to build and visualize a Decision Tree classifier on the Breast Cancer dataset.

## 🎯 Objective
- Understand the concept of Decision Trees
- Train a Decision Tree classifier
- Visualize the trained tree
- Experiment with hyperparameters

## 🌳 What is a Decision Tree?
A Decision Tree is a flow‑chart‑like structure where each internal node represents a decision on a feature, each branch represents an outcome of the decision, and each leaf node represents a class label.
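To make the node/branch/leaf vocabulary concrete, here is a hypothetical depth-2 tree written by hand as nested `if` statements. The feature names exist in the Breast Cancer dataset, but the thresholds and the function name `predict_one` are made up for illustration. This is not a tree learned from data.

```python
# Hypothetical hand-written tree: feature names are real Breast Cancer
# features, but the thresholds are invented for illustration only.
def predict_one(sample: dict) -> str:
    if sample["worst radius"] <= 16.8:              # root node: test one feature
        if sample["worst concave points"] <= 0.14:  # internal node: another test
            return "benign"                         # leaf: class label
        return "malignant"                          # leaf
    return "malignant"                              # leaf


print(predict_one({"worst radius": 12.0, "worst concave points": 0.08}))  # benign
print(predict_one({"worst radius": 20.0, "worst concave points": 0.20}))  # malignant
```

A learned tree has exactly this shape; training just chooses which feature and threshold to test at each node.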

### 🔑 Key Concepts
- **Splitting Criteria**: Gini Impurity (default) or Entropy/Information Gain
- **Overfitting**: Very deep trees tend to memorize the training data; limit complexity with `max_depth`, `min_samples_leaf`, or cost-complexity pruning (`ccp_alpha`).
- **Visualization**: Shallow trees are easy to interpret once plotted.
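The two splitting criteria can be written in a few lines. Below is a from-scratch NumPy sketch (not scikit-learn's internal implementation) of Gini impurity, $1 - \sum_k p_k^2$, and entropy, $-\sum_k p_k \log_2 p_k$, plus the impurity decrease of a candidate split: the parent's impurity minus the size-weighted impurity of its two children. With entropy, that decrease is the information gain. The function names are illustrative.

```python
import numpy as np


def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


def entropy(labels):
    """Shannon entropy in bits: -sum_k p_k * log2(p_k)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))


def impurity_decrease(parent, left, right, criterion=gini):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * criterion(left) + (len(right) / n) * criterion(right)
    return criterion(parent) - weighted


parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])
left, right = parent[:4], parent[4:]  # a perfect split: each child is pure
print(gini(parent))                                     # 0.5
print(entropy(parent))                                  # 1.0
print(impurity_decrease(parent, left, right, entropy))  # 1.0 (information gain)
```

At each node, the tree evaluates candidate (feature, threshold) pairs and keeps the one with the largest impurity decrease.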

## 📦 Step 1 – Load & Prepare Data

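```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import pandas as pd

# Load dataset into a DataFrame (30 numeric features + binary target)
data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

# Train-test split (stratified so both sets keep the same class ratio)
X = df.drop('target', axis=1)
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Feature scaling: optional for trees, which split on one feature at a time
# and are unaffected by per-feature rescaling. Note that plotted thresholds
# will then be in standardized units rather than the original ones.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit on training data only
X_test_scaled = scaler.transform(X_test)
df.head()
```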
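Since each split compares a single feature against a threshold, a per-feature affine rescaling such as `StandardScaler` preserves the ordering of values and should not change which samples go left or right. The sketch below checks this by training the same tree on raw and on standardized features and measuring how often their test predictions agree. Expect agreement on all or nearly all samples. Tiny differences, if any, would come from floating-point rounding.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

scaler = StandardScaler().fit(X_train)

# Same hyperparameters, once on raw features and once on standardized features
raw_tree = DecisionTreeClassifier(max_depth=4, random_state=42).fit(X_train, y_train)
scaled_tree = DecisionTreeClassifier(max_depth=4, random_state=42).fit(
    scaler.transform(X_train), y_train
)

# Fraction of test samples on which the two trees predict the same class
agreement = np.mean(raw_tree.predict(X_test) == scaled_tree.predict(scaler.transform(X_test)))
print(f"Prediction agreement: {agreement:.3f}")
```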

## 🤖 Step 2 – Train a Decision Tree Classifier

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report

# Cap depth at 4 to curb overfitting; random_state makes tie-breaking reproducible
tree_clf = DecisionTreeClassifier(max_depth=4, random_state=42)
tree_clf.fit(X_train_scaled, y_train)
y_pred = tree_clf.predict(X_test_scaled)

# Overall accuracy plus per-class precision, recall, and F1
print('Accuracy:', accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=data.target_names))
```
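A fitted tree also reports which features its splits relied on most. The sketch below reads `feature_importances_` (impurity-based importances, normalized to sum to 1) and lists the top five. To stay self-contained, it retrains the same depth-4 tree on unscaled features.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42, stratify=data.target
)
tree_clf = DecisionTreeClassifier(max_depth=4, random_state=42).fit(X_train, y_train)

# Total weighted impurity decrease contributed by each feature, normalized to 1
importances = pd.Series(tree_clf.feature_importances_, index=data.feature_names)
top5 = importances.sort_values(ascending=False).head(5)
print(top5)
```

Features that never appear in a split get an importance of exactly 0, which is common for a shallow tree.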

## 🖼 Step 3 – Visualize the Decision Tree

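```python
from sklearn.tree import plot_tree
import matplotlib.pyplot as plt

plt.figure(figsize=(20, 10))
# filled=True colors each node by its majority class; thresholds appear in
# standardized units because the tree was trained on X_train_scaled
plot_tree(tree_clf, filled=True,
          feature_names=list(data.feature_names),
          class_names=list(data.target_names))
plt.show()
```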
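When a figure is too large to read, `sklearn.tree.export_text` prints the same structure as indented text. This sketch fits a depth-2 tree on the unscaled data so the thresholds stay in the original units. Each line is one split, and leaves show the predicted class index (0 = malignant, 1 = benign).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42).fit(data.data, data.target)

# One line per split; "class: k" lines are leaves
report = export_text(tree_clf, feature_names=list(data.feature_names))
print(report)
```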

## 🔄 Step 4 – Experiment with Hyperparameters

```python
# Try a shallower tree that also switches the criterion from Gini to entropy
# (two changes at once, so an accuracy difference can't be pinned on either alone)
shallow_tree = DecisionTreeClassifier(max_depth=2, criterion='entropy', random_state=42)
shallow_tree.fit(X_train_scaled, y_train)
shallow_pred = shallow_tree.predict(X_test_scaled)
print('Shallow Tree Accuracy:', accuracy_score(y_test, shallow_pred))
```
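To separate the effect of depth from the effect of the criterion, you could sweep both independently and compare training and test accuracy. The depth list below is an arbitrary choice. You would typically expect training accuracy to keep rising with depth while test accuracy levels off or drops once the tree starts overfitting. The exact numbers depend on the split.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

results = {}
for depth in [1, 2, 3, 4, 6, 8, None]:  # None = grow until leaves are pure (or too small to split)
    for criterion in ["gini", "entropy"]:
        clf = DecisionTreeClassifier(max_depth=depth, criterion=criterion, random_state=42)
        clf.fit(X_train, y_train)
        # score() returns mean accuracy
        train_acc, test_acc = clf.score(X_train, y_train), clf.score(X_test, y_test)
        results[(depth, criterion)] = (train_acc, test_acc)
        print(f"depth={str(depth):>4}  {criterion:<7}  train={train_acc:.3f}  test={test_acc:.3f}")
```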

## ✅ Summary
- Decision Trees split data based on feature thresholds.
- Limit depth (and related settings such as `min_samples_leaf`) to manage the bias-variance trade-off: very shallow trees underfit, very deep trees overfit.
- Trees are highly interpretable and form the basis for ensemble methods like Random Forests.
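Depth limits are not the only control. The pruning mentioned under Key Concepts is available in scikit-learn as minimal cost-complexity pruning through the `ccp_alpha` parameter. The sketch below computes the pruning path of a fully grown tree and refits at every fifth alpha (an arbitrary stride). Larger alphas prune more aggressively and leave fewer leaves.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Effective alphas at which subtrees of the fully grown tree get pruned away
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(X_train, y_train)
ccp_alphas = path.ccp_alphas[:-1]  # drop the last alpha, which prunes down to the root alone

for alpha in ccp_alphas[::5]:
    clf = DecisionTreeClassifier(random_state=42, ccp_alpha=alpha).fit(X_train, y_train)
    print(f"alpha={alpha:.4f}  leaves={clf.get_n_leaves():>3}  test acc={clf.score(X_test, y_test):.3f}")
```

In practice you would pick `ccp_alpha` with cross-validation on the training set rather than by looking at test accuracy.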