# Random Forest and Naive Bayes Classifiers

## 📚 Learning Objectives

By completing this notebook, you will:
- Apply random forest classifiers to improve prediction accuracy
- Implement Naive Bayes for text classification tasks
- Evaluate classification models using confusion matrices and ROC curves
- Compare different classification algorithms

## 🔗 Prerequisites

- ✅ Understanding of classification concepts
- ✅ Python 3.8+ installed

---

## Official Structure Reference

This notebook covers practical activities from **Course 04, Unit 3**:
- Applying random forest classifiers to improve prediction accuracy
- Implementing Naive Bayes for text classification tasks like spam detection
- Evaluating classification models using confusion matrices and ROC curves
- **Source:** `DETAILED_UNIT_DESCRIPTIONS.md` - Unit 3 Practical Content


In [None]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import MultinomialNB, GaussianNB
from sklearn.metrics import confusion_matrix, classification_report, roc_curve, auc, roc_auc_score
from sklearn.feature_extraction.text import CountVectorizer
import matplotlib.pyplot as plt
print("✅ Libraries imported successfully!")


## Part 1: Random Forest Classification


In [None]:
# Generate classification dataset
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                          n_redundant=5, n_classes=2, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Random Forest classifier: 100 trees, each limited to depth 10
rf = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42)
rf.fit(X_train, y_train)
rf_pred = rf.predict(X_test)
# Probability of the positive class (column 1), needed for ROC analysis
rf_pred_proba = rf.predict_proba(X_test)[:, 1]

print("=" * 60)
print("Random Forest Classifier:")
print("=" * 60)
print(f"Accuracy: {rf.score(X_test, y_test):.4f}")
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, rf_pred))
print("\nClassification Report:")
print(classification_report(y_test, rf_pred))
print(f"\nROC AUC Score: {roc_auc_score(y_test, rf_pred_proba):.4f}")
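To see what the ensemble adds, it can help to train a single decision tree on the same split and compare. The sketch below is an illustration rather than part of the original activity: `DecisionTreeClassifier` is scikit-learn's standard tree class, the names `single_tree`, `forest`, `tree_acc`, and `forest_acc` are introduced here, and the data is regenerated so the block runs on its own. A forest typically scores at least as well as one tree, but that is not guaranteed on every split.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Same synthetic data and split as the Part 1 cell
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=5, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Baseline: one decision tree with the same depth limit as the forest
single_tree = DecisionTreeClassifier(max_depth=10, random_state=42)
single_tree.fit(X_train, y_train)
tree_acc = single_tree.score(X_test, y_test)

# Ensemble: 100 trees, each fit on a bootstrap sample of the training set
forest = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42)
forest.fit(X_train, y_train)
forest_acc = forest.score(X_test, y_test)

print(f"Single tree accuracy:   {tree_acc:.4f}")
print(f"Random forest accuracy: {forest_acc:.4f}")
```

Keeping `max_depth` and `random_state` identical for both models means the difference in accuracy comes from the ensemble itself rather than from tree size.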


## Part 2: Naive Bayes for Text Classification (Spam Detection)


In [None]:
# Example: Spam detection with Naive Bayes
# Simulated email data
emails = [
    "Free money click here now",
    "Meeting tomorrow at 3pm",
    "You won a prize claim it",
    "Project update attached",
    "Buy now limited offer",
    "Team meeting cancelled",
    "Special discount only today",
    "Please review the document"
]

labels = [1, 0, 1, 0, 1, 0, 1, 0]  # 1 = spam, 0 = ham

# Vectorize text into word-count features
vectorizer = CountVectorizer()
X_email = vectorizer.fit_transform(emails)

# Train Naive Bayes
nb = MultinomialNB()
nb.fit(X_email, labels)

# Predict on unseen emails (reuse the fitted vocabulary with transform)
test_emails = ["Free gift today", "Schedule meeting"]
X_test_email = vectorizer.transform(test_emails)
predictions = nb.predict(X_test_email)
probabilities = nb.predict_proba(X_test_email)
print("=" * 60)
print("Naive Bayes Text Classification (Spam Detection):")
print("=" * 60)
for email, pred, prob in zip(test_emails, predictions, probabilities):
    spam_prob = prob[1]  # column 1 is P(spam), since the classes are [0, 1]
    result = "SPAM" if pred == 1 else "HAM"
    print(f"Email: '{email}'")
    print(f"  Prediction: {result} (Spam probability: {spam_prob:.4f})")
    print()
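The calculation behind a multinomial Naive Bayes prediction can be sketched in plain Python on the same eight emails. This is a minimal illustration, not scikit-learn's implementation: the helpers `tokenize`, `train_nb`, and `predict_nb` are hypothetical names introduced here, and the whitespace tokenizer is a simplification of `CountVectorizer`, so the probabilities may differ slightly from the `MultinomialNB` output above. It assumes Laplace smoothing with `alpha=1.0`, which matches the `MultinomialNB` default.

```python
import math
from collections import Counter

emails = [
    "Free money click here now",
    "Meeting tomorrow at 3pm",
    "You won a prize claim it",
    "Project update attached",
    "Buy now limited offer",
    "Team meeting cancelled",
    "Special discount only today",
    "Please review the document",
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]  # 1 = spam, 0 = ham


def tokenize(text):
    # Simplification: lowercase + whitespace split (not CountVectorizer's tokenizer)
    return text.lower().split()


def train_nb(docs, doc_labels, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log likelihoods per class."""
    classes = sorted(set(doc_labels))
    word_counts = {c: Counter() for c in classes}
    for doc, label in zip(docs, doc_labels):
        word_counts[label].update(tokenize(doc))
    vocab = {word for counts in word_counts.values() for word in counts}

    log_prior = {c: math.log(doc_labels.count(c) / len(doc_labels)) for c in classes}
    log_likelihood = {}
    for c in classes:
        total = sum(word_counts[c].values())
        log_likelihood[c] = {
            word: math.log((word_counts[c][word] + alpha) / (total + alpha * len(vocab)))
            for word in vocab
        }
    return classes, vocab, log_prior, log_likelihood


def predict_nb(model, text):
    """Return (predicted class, P(spam)); words outside the vocabulary are skipped."""
    classes, vocab, log_prior, log_likelihood = model
    scores = {}
    for c in classes:
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][word] for word in tokenize(text) if word in vocab)
    # Turn log scores into normalized probabilities (subtract max for stability)
    top = max(scores.values())
    exp_scores = {c: math.exp(s - top) for c, s in scores.items()}
    spam_prob = exp_scores[1] / sum(exp_scores.values())
    return max(scores, key=scores.get), spam_prob


model = train_nb(emails, labels)
for email in ["Free gift today", "Schedule meeting"]:
    pred, spam_prob = predict_nb(model, email)
    print(f"{email!r}: {'SPAM' if pred == 1 else 'HAM'} (spam probability: {spam_prob:.4f})")
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, and the smoothing term keeps a word seen in only one class from driving the other class's probability to zero.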


## Part 3: Evaluation Metrics - Confusion Matrix and ROC Curve


In [None]:
# ROC Curve for Random Forest
fpr, tpr, thresholds = roc_curve(y_test, rf_pred_proba)
roc_auc = auc(fpr, tpr)
print("=" * 60)
print("ROC Curve Analysis:")
print("=" * 60)
print(f"ROC AUC Score: {roc_auc:.4f}")
print(f"True Positive Rate at FPR=0.2: {np.interp(0.2, fpr, tpr):.4f}")

# Confusion matrix interpretation
cm = confusion_matrix(y_test, rf_pred)
tn, fp, fn, tp = cm.ravel()

print("\nConfusion Matrix Interpretation:")
print(f"True Negatives (TN): {tn}")
print(f"False Positives (FP): {fp}")
print(f"False Negatives (FN): {fn}")
print(f"True Positives (TP): {tp}")
print(f"\nPrecision: {tp/(tp+fp):.4f}")
print(f"Recall: {tp/(tp+fn):.4f}")
print(f"F1-Score: {2*tp/(2*tp+fp+fn):.4f}")
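To make the ROC curve less of a black box, the points and the area under them can be computed by hand on a tiny example. The following is a sketch under stated assumptions, not how `roc_curve` or `auc` are implemented in scikit-learn: `roc_points` and `trapezoid_auc` are hypothetical helper names, and the four labels and scores are made-up toy values chosen so the arithmetic is easy to follow.

```python
def roc_points(y_true, scores):
    """Sweep the decision threshold from high to low and record (FPR, TPR)."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)

    fpr, tpr = [0.0], [0.0]  # threshold above every score: nothing predicted positive
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples tied at the same score cross the threshold together
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr


def trapezoid_auc(fpr, tpr):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
               for k in range(1, len(fpr)))


# Toy data: two negatives and two positives with predicted scores
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]

toy_fpr, toy_tpr = roc_points(toy_labels, toy_scores)
toy_auc = trapezoid_auc(toy_fpr, toy_tpr)
print("FPR:", toy_fpr)
print("TPR:", toy_tpr)
print(f"AUC: {toy_auc:.2f}")  # 0.75
```

Each step lowers the threshold past one more score: a positive sample raises TPR, a negative sample raises FPR. Here one negative (0.4) is ranked above one positive (0.35), which is why the area is 0.75 rather than 1.0.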


## Summary

### Key Concepts:
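1. **Random Forest**: An ensemble of decision trees trained on bootstrap samples; combining their votes reduces the overfitting of any single tree
2. **Naive Bayes**: A probabilistic classifier that assumes features are conditionally independent given the class; a fast, strong baseline for text classification
3. **Confusion Matrix**: Shows TP, TN, FP, and FN counts, from which precision, recall, and F1 are derived
4. **ROC Curve**: Plots TPR against FPR across decision thresholds; the AUC summarizes performance in a single number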

### Applications:
- Random Forest: General-purpose classification on tabular (structured) data
- Naive Bayes: Spam detection, text classification, document categorization

**Reference:** Course 04, Unit 3: "Applying random forest classifiers" and "Implementing Naive Bayes for text classification"
