# Decision Trees: Advanced Tutorial

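A **Decision Tree** is a supervised learning algorithm used for both classification and regression tasks.
It models decisions as a tree. Each internal node tests one feature against a condition (for example, "petal length <= 2.5 cm"), each branch corresponds to one outcome of that test, and each leaf holds a prediction (a class label for classification, a numeric value for regression).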
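To make the node, branch, and leaf vocabulary concrete, here is a minimal hand-built tree in plain Python. The `Node`/`Leaf` classes and the thresholds are hypothetical, chosen for illustration rather than learned from data. scikit-learn stores a fitted tree differently, as parallel arrays in `clf.tree_`.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    outcome: str  # prediction returned when a sample reaches this leaf


@dataclass
class Node:
    feature: str                # feature tested at this internal node
    threshold: float            # condition: sample[feature] <= threshold
    left: Union["Node", Leaf]   # branch taken when the condition holds
    right: Union["Node", Leaf]  # branch taken otherwise


def predict(tree, sample: dict) -> str:
    """Walk from the root to a leaf, following exactly one branch per test."""
    while isinstance(tree, Node):
        tree = tree.left if sample[tree.feature] <= tree.threshold else tree.right
    return tree.outcome


# Hypothetical thresholds, NOT fitted to the Iris data.
toy_tree = Node(
    feature="petal length (cm)", threshold=2.5,
    left=Leaf("setosa"),
    right=Node(
        feature="petal width (cm)", threshold=1.7,
        left=Leaf("versicolor"),
        right=Leaf("virginica"),
    ),
)

print(predict(toy_tree, {"petal length (cm)": 1.4, "petal width (cm)": 0.2}))  # setosa
print(predict(toy_tree, {"petal length (cm)": 4.5, "petal width (cm)": 1.5}))  # versicolor
```

Every prediction is a single root-to-leaf walk, which is why a fitted tree can be read as a set of if/else rules.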

## 1. Import Required Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree, export_text
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import classification_report, confusion_matrix

sns.set_theme(style="whitegrid")


## 2. Load Dataset (Iris)

In [None]:
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(iris.target, name="target")

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

X.head()
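A quick sanity check on the split is worth doing before training. The sketch below compares the per-class counts of the test set produced by the plain random split above with a split that passes `stratify=y`, which preserves the class proportions of `y` in both parts. Using `stratify` is an optional variation, not part of the original cell.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(iris.target, name="target")

# Plain random split, as in the cell above.
_, _, _, y_test_plain = train_test_split(X, y, test_size=0.3, random_state=42)

# Stratified split: each class keeps its share of the data in both parts.
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

print("Full dataset:", X.shape)
print("Test class counts (plain):     ", y_test_plain.value_counts().sort_index().tolist())
print("Test class counts (stratified):", y_test_s.value_counts().sort_index().tolist())
```

Iris is perfectly balanced (50 samples per class), so the stratified test set gets 15 samples of each class.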


## 3. Train Decision Tree Classifier

In [None]:
clf = DecisionTreeClassifier(max_depth=3, random_state=42)
clf.fit(X_train, y_train)
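By default, `DecisionTreeClassifier` uses `criterion="gini"` and grows the tree greedily: at each node it chooses the feature and threshold whose two children have the lowest weighted Gini impurity. The sketch below re-implements that search for the root node only, by brute force over midpoints between consecutive distinct feature values, and compares the result with the root split scikit-learn learned. The helpers `gini` and `best_split` are hypothetical names for this illustration, not scikit-learn's implementation.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def gini(labels: np.ndarray) -> float:
    """Gini impurity: 1 - sum_k p_k^2 over the class proportions p_k."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


def best_split(X: np.ndarray, y: np.ndarray):
    """Search all (feature, threshold) pairs for the lowest weighted child Gini."""
    n = len(y)
    best = (None, None, np.inf)  # (feature index, threshold, weighted impurity)
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        # Candidate thresholds: midpoints between consecutive distinct values.
        for t in (values[:-1] + values[1:]) / 2:
            left = X[:, j] <= t
            w = (left.sum() * gini(y[left]) + (~left).sum() * gini(y[~left])) / n
            if w < best[2]:
                best = (j, t, w)
    return best


iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

feature, threshold, impurity = best_split(X_train, y_train)
print(f"Sketch:  {iris.feature_names[feature]} <= {threshold:.2f}, "
      f"weighted Gini = {impurity:.4f}")

# Root split actually learned by scikit-learn, read from the fitted tree arrays.
clf = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)
t = clf.tree_
l, r = t.children_left[0], t.children_right[0]
sk_impurity = (t.n_node_samples[l] * t.impurity[l]
               + t.n_node_samples[r] * t.impurity[r]) / t.n_node_samples[0]
print(f"sklearn: {iris.feature_names[t.feature[0]]} <= {t.threshold[0]:.2f}, "
      f"weighted Gini = {sk_impurity:.4f}")
```

The two weighted impurities should agree. The chosen feature can differ when several splits tie (on Iris, both petal length and petal width separate setosa perfectly), because scikit-learn visits features in a randomized order.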


## 4. Visualize the Tree

In [None]:
plt.figure(figsize=(12, 8))
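# Pass plain lists: some scikit-learn versions reject a pandas Index for feature_names.
plot_tree(clf, filled=True, feature_names=list(X.columns), class_names=list(iris.target_names))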
plt.title("Decision Tree Visualization")
plt.show()
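The numbers drawn in each box of the plot (split test, Gini impurity, sample count) are also available programmatically through the fitted `clf.tree_` object. The sketch below collects them into a DataFrame; the `nodes` table layout is just one convenient choice.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
X_train, X_test, y_train, y_test = train_test_split(
    X, iris.target, test_size=0.3, random_state=42
)
clf = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)

t = clf.tree_
rows = []
for node in range(t.node_count):
    # At a leaf both child pointers are -1, so they are equal.
    is_leaf = t.children_left[node] == t.children_right[node]
    rows.append({
        "node": node,
        "test": "leaf" if is_leaf
                else f"{X.columns[t.feature[node]]} <= {t.threshold[node]:.2f}",
        "gini": round(t.impurity[node], 3),
        "samples": t.n_node_samples[node],
    })

nodes = pd.DataFrame(rows)
print(nodes.to_string(index=False))
print("depth:", clf.get_depth(), "leaves:", clf.get_n_leaves())
```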


## 5. Tree Logic (Text Output)

In [None]:
tree_rules = export_text(clf, feature_names=list(X.columns))
print(tree_rules)
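`export_text` lists every rule in the tree. To see which of those rules fire for one particular sample, `decision_path` returns the nodes the sample visits and `apply` returns the leaf it lands in. The sketch below traces the first test sample; formatting each step as `feature = value <= threshold` is simply a readable choice.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
X_train, X_test, y_train, y_test = train_test_split(
    X, iris.target, test_size=0.3, random_state=42
)
clf = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)

sample = X_test.iloc[[0]]          # double brackets keep a 2-D, one-row frame
path = clf.decision_path(sample)   # sparse indicator matrix of visited nodes
leaf = clf.apply(sample)[0]        # id of the leaf the sample ends in
t = clf.tree_

rules = []
for node in path.indices:          # visited node ids, root first
    if node == leaf:
        continue                   # leaves have no test to report
    name = X.columns[t.feature[node]]
    value = sample.iloc[0][name]
    op = "<=" if value <= t.threshold[node] else ">"
    rules.append(f"{name} = {value} {op} {t.threshold[node]:.2f}")

print("\n".join(rules))
print("predicted:", iris.target_names[clf.predict(sample)[0]])
```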


## 6. Evaluate Model

In [None]:
y_pred = clf.predict(X_test)

print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=iris.target_names))
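The raw confusion matrix printed above has no labels. Wrapping it in a DataFrame makes the layout explicit (rows are true classes, columns are predicted classes), and the diagonal over the total reproduces the accuracy reported by `classification_report`.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)
clf = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Fix the label order so the matrix is always 3x3 in class-index order.
labels = np.arange(len(iris.target_names))
cm = pd.DataFrame(
    confusion_matrix(y_test, y_pred, labels=labels),
    index=[f"true {c}" for c in iris.target_names],
    columns=[f"pred {c}" for c in iris.target_names],
)
print(cm)

# Accuracy = correct predictions (the diagonal) / all predictions.
acc_from_cm = np.trace(cm.to_numpy()) / cm.to_numpy().sum()
print("accuracy:", round(acc_from_cm, 3), "==", round(accuracy_score(y_test, y_pred), 3))
```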


## 7. Cross-Validation

In [None]:
scores = cross_val_score(clf, X, y, cv=5)
print("CV Accuracy:", np.round(scores, 3))
print("Mean Accuracy:", np.mean(scores).round(3))
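With an integer `cv` and a classifier, `cross_val_score` splits the data with `StratifiedKFold` without shuffling, so folds follow the row order of the dataset (Iris is sorted by class). Passing an explicit splitter makes that choice visible and lets you shuffle reproducibly. The seed below is an arbitrary assumption; reporting the standard deviation next to the mean shows how much the score varies across folds.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=42)

# Stratified like the default, but shuffled with a fixed seed.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(clf, X, y, cv=cv)

print("CV Accuracy:", np.round(scores, 3))
print(f"Mean ± std: {scores.mean():.3f} ± {scores.std():.3f}")
```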


## 8. Effect of Tree Depth

In [None]:
depths = range(1, 11)
mean_scores = []

for depth in depths:
    model = DecisionTreeClassifier(max_depth=depth, random_state=42)
    scores = cross_val_score(model, X, y, cv=5)
    mean_scores.append(scores.mean())

plt.plot(depths, mean_scores, marker='o')
plt.xlabel("Tree Depth")
plt.ylabel("CV Accuracy")
plt.title("Accuracy vs Tree Depth")
plt.show()
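CV accuracy alone does not show *why* deeper trees stop helping. Plotting training accuracy next to it makes overfitting visible: training accuracy keeps climbing toward 1.0 as depth grows, while CV accuracy levels off. The sketch below uses `cross_validate` with `return_train_score=True` on the same depth range as the loop above.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
depths = range(1, 11)
train_means, cv_means = [], []

for depth in depths:
    model = DecisionTreeClassifier(max_depth=depth, random_state=42)
    # return_train_score=True also scores the training portion of each fold.
    res = cross_validate(model, X, y, cv=5, return_train_score=True)
    train_means.append(res["train_score"].mean())
    cv_means.append(res["test_score"].mean())

plt.plot(depths, train_means, marker="o", label="Training accuracy")
plt.plot(depths, cv_means, marker="o", label="CV accuracy")
plt.xlabel("Tree Depth")
plt.ylabel("Accuracy")
plt.title("Training vs CV Accuracy by Tree Depth")
plt.legend()
plt.show()
```

A widening gap between the two curves is the usual sign that extra depth is memorizing the training folds rather than improving generalization.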


## 9. Summary

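- Decision trees are interpretable and easy to visualize
- Deep, unconstrained trees tend to overfit: they fit the training data closely but generalize worse
- Limit tree complexity by tuning `max_depth`, `min_samples_split`, `min_samples_leaf`, and related parameters
- They work well on tabular data and need no feature scaling; in scikit-learn, categorical features must first be encoded as numbers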
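The tuning advice above can be automated with `GridSearchCV`, which cross-validates every combination in a parameter grid on the training split and then refits the best one. The grid values below are illustrative assumptions, not recommended defaults; the held-out test set is used only once, at the end.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

# Illustrative search space (an assumption, adjust for your data).
param_grid = {
    "max_depth": [2, 3, 4, 5, None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=42), param_grid, cv=5, scoring="accuracy"
)
search.fit(X_train, y_train)  # tune on the training split only

print("Best params:", search.best_params_)
print("Best CV accuracy:", round(search.best_score_, 3))
print("Test accuracy:", round(search.score(X_test, y_test), 3))
```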