# Random Forest: Advanced Tutorial

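**Random Forest** is an ensemble learning method that trains many decision trees on bootstrap samples of the data and combines their predictions, which typically improves accuracy and reduces overfitting compared with a single tree.
It is widely used for classification (where the trees' predictions are combined by voting or by averaging class probabilities) and regression (where they are averaged).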

## 1. Import Required Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Apply seaborn's whitegrid theme to all plots (set_theme is the current name for sns.set)
sns.set_theme(style='whitegrid')


## 2. Load and Prepare Dataset

In [None]:
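# Load the Wisconsin breast cancer dataset (569 samples, 30 numeric features)
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target, name='target')  # 0 = malignant, 1 = benign

# Hold out 30% of the rows as a test set; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)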

X.head()


## 3. Train Random Forest Model

In [None]:
# 100 trees, each capped at depth 5 to limit overfitting; random_state fixes the bootstrap samples
model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
model.fit(X_train, y_train)
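
To make the "ensemble" idea concrete, the sketch below checks how scikit-learn's forest combines its trees: the forest's class probabilities should equal the mean of the individual trees' probabilities. It retrains an identical model on NumPy arrays so that it runs on its own; the names `forest`, `per_tree_proba`, and `manual_proba` are illustrative and not part of the tutorial above.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Same data, split, and hyperparameters as above, but as plain NumPy arrays
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

forest = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
forest.fit(X_train, y_train)

# Each fitted tree is available in forest.estimators_
per_tree_proba = np.stack(
    [tree.predict_proba(X_test) for tree in forest.estimators_]
)  # shape: (n_trees, n_samples, n_classes)

# Average over the tree axis and compare with the forest's own output
manual_proba = per_tree_proba.mean(axis=0)
print("Matches forest.predict_proba:",
      np.allclose(manual_proba, forest.predict_proba(X_test)))
```

Because the probabilities are averaged rather than the hard votes being counted, a few very confident trees can outweigh several marginal ones.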


## 4. Evaluate the Model

In [None]:
y_pred = model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))
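
The report above labels the classes `0` and `1`, which is easy to misread. As a hedged extension, the sketch below passes the dataset's own class names to `classification_report` and adds ROC AUC, a threshold-independent metric computed from predicted probabilities. It retrains the same model so that it runs on its own; `clf` and `benign_proba` are illustrative names.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42
)

clf = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
clf.fit(X_train, y_train)

# In this dataset, label 0 is 'malignant' and label 1 is 'benign'
print(classification_report(
    y_test, clf.predict(X_test), target_names=data.target_names
))

# ROC AUC needs a score per sample: use the predicted probability of label 1
benign_proba = clf.predict_proba(X_test)[:, 1]
print("ROC AUC:", round(roc_auc_score(y_test, benign_proba), 3))
```

For a screening task like this one, recall on the malignant class is usually the number to watch, since a missed malignant case costs more than a false alarm.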


## 5. Feature Importances

In [None]:
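# Impurity-based importances: one value per feature, summing to 1
importances = model.feature_importances_
# Sort feature indices from most to least important
indices = np.argsort(importances)[::-1]
features = X.columns

# A taller figure keeps all 30 feature labels readable
plt.figure(figsize=(10, 8))
sns.barplot(x=importances[indices], y=features[indices])
plt.xlabel("Mean decrease in impurity")
plt.title("Feature Importances from Random Forest")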
plt.show()
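
The values in `feature_importances_` are impurity-based and computed from the training data, so they can overstate features the trees happened to split on often. One commonly used cross-check is permutation importance on held-out data, sketched below with `sklearn.inspection.permutation_importance`. This is an optional addition rather than part of the workflow above; `rf`, `result`, and `top` are illustrative names, and `n_repeats=5` is an arbitrary choice.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42
)

rf = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
rf.fit(X_train, y_train)

# Shuffle one feature at a time on the test set and measure the accuracy drop
result = permutation_importance(
    rf, X_test, y_test, n_repeats=5, random_state=42
)

# Report the five features whose shuffling hurts accuracy the most
top = result.importances_mean.argsort()[::-1][:5]
for i in top:
    print(f"{data.feature_names[i]:<25} "
          f"{result.importances_mean[i]:.4f} +/- {result.importances_std[i]:.4f}")
```

Many features in this dataset are strongly correlated, so shuffling one of them may barely change accuracy even when the group as a whole matters; small permutation scores here do not prove a feature is useless.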


## 6. Cross-Validation and Tuning

In [None]:
# cross_val_score clones the model and refits it on each of 5 stratified folds
cv_scores = cross_val_score(model, X, y, cv=5)
print("CV Accuracy Scores:", np.round(cv_scores, 3))
print("Mean CV Accuracy:", round(np.mean(cv_scores), 3))
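
The cell above covers cross-validation only. For the tuning half of this section, a minimal sketch with `GridSearchCV` follows. The grid values are illustrative assumptions chosen to keep the search quick, not recommended settings, and the search runs on the training split only so the test set stays untouched until the final score.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Hypothetical small grid: 2 x 2 x 2 = 8 candidate settings
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, None],
    "max_features": ["sqrt", 0.5],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,                # 5-fold cross-validation on the training split
    scoring="accuracy",
)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Best CV accuracy:", round(search.best_score_, 3))
# By default the best setting is refit on the whole training split
print("Test accuracy:", round(search.score(X_test, y_test), 3))
```

On a dataset this small the candidates often score within noise of each other, so a large grid rarely pays for itself; for bigger search spaces, `RandomizedSearchCV` is a cheaper alternative.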


## 7. Summary

- Random Forest reduces overfitting by averaging the predictions of many decorrelated trees
- Each tree is trained on a bootstrap sample of the rows, and each split considers a random subset of the features
- Provides feature importance scores and works well on tabular data
- Tune `n_estimators`, `max_depth`, and `max_features` for better results
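
Since each tree sees only a bootstrap sample, the rows it did not see can serve as a built-in validation set. The sketch below uses scikit-learn's `oob_score=True` option to get this out-of-bag estimate without a separate split; `oob_forest` is an illustrative name and `n_estimators=200` is an arbitrary choice.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

# oob_score=True scores each row using only the trees that did not train on it
# (requires bootstrap=True, which is the default)
oob_forest = RandomForestClassifier(
    n_estimators=200, oob_score=True, random_state=42
)
oob_forest.fit(X, y)

print("OOB accuracy:", round(oob_forest.oob_score_, 3))
```

The out-of-bag score is usually close to a cross-validated estimate and costs no extra fitting, which makes it a convenient sanity check while tuning.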