# Experiment 9: Implementing the Random Forest Algorithm Using Python

## Aim
To implement the Random Forest algorithm using Python for a classification problem and analyze its performance.

## Objectives
- Understand the working of the Random Forest algorithm.
- Implement Random Forest in Python using the scikit-learn (`sklearn`) library.
- Evaluate and visualize the model's performance on a classification dataset.
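To make the first objective concrete, here is a minimal from-scratch sketch of the two ideas a random forest combines. The first is bagging: each tree is trained on a bootstrap sample of the rows. The second is random feature selection at each split, done here through `DecisionTreeClassifier(max_features="sqrt")`. The helper names `fit_simple_forest` and `predict_simple_forest` are hypothetical and exist only for illustration. The aggregation step is plain majority voting; scikit-learn's `RandomForestClassifier` instead averages the trees' predicted class probabilities, so treat this as a teaching sketch rather than a reimplementation of the library.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def fit_simple_forest(X, y, n_trees=25, random_state=0):
    """Hypothetical helper: bootstrap rows + random feature subset at each split."""
    rng = np.random.default_rng(random_state)
    trees = []
    n_samples = X.shape[0]
    for _ in range(n_trees):
        # Bagging: draw a bootstrap sample (row indices sampled with replacement)
        idx = rng.integers(0, n_samples, size=n_samples)
        # max_features="sqrt": each split considers only a random subset of features
        tree = DecisionTreeClassifier(
            max_features="sqrt", random_state=int(rng.integers(0, 2**31 - 1))
        )
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees


def predict_simple_forest(trees, X):
    """Hypothetical helper: majority vote over trees (assumes labels 0..k-1)."""
    votes = np.stack([tree.predict(X) for tree in trees]).astype(int)  # (n_trees, n_samples)
    # For each sample (column), return the most frequent predicted label
    return np.array([np.bincount(col).argmax() for col in votes.T])


X, y = make_classification(n_samples=500, n_features=5, n_informative=3,
                           n_redundant=0, n_classes=2, random_state=42)
trees = fit_simple_forest(X[:400], y[:400])
y_hat = predict_simple_forest(trees, X[400:])
print(f"Hand-rolled forest accuracy: {np.mean(y_hat == y[400:]):.3f}")
```

Because each tree sees different rows and different candidate features, the trees make partly independent errors, and voting averages many of those errors away.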

## Tools Used
- **scikit-learn**: For generating the dataset, implementing the Random Forest algorithm, and evaluating the model.
- **NumPy**: For numerical array operations.
- **Matplotlib** and **Seaborn**: For visualizations.

## Implementation

### Step 1: Import Libraries
```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns
```

### Step 2: Create a Sample Dataset
```python
from sklearn.datasets import make_classification

# Generate synthetic data
X, y = make_classification(
    n_samples=500, n_features=5, n_informative=3, n_redundant=0, n_classes=2, random_state=42
)

# Visualize class distribution
sns.countplot(x=y)
plt.title("Class Distribution")
plt.xlabel("Class")
plt.ylabel("Count")
plt.show()
```
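Before training, it can help to check the generated data numerically as well as visually. With these settings `make_classification` produces 3 informative features and, since `n_redundant=0` and `n_repeated` defaults to 0, 2 pure-noise features. Its default `shuffle=True` also shuffles the feature columns, so the informative features are not necessarily the first three. The default `flip_y=0.01` randomly reassigns about 1% of the labels, so the class counts may not be exactly equal. A short sketch:

```python
import numpy as np
from sklearn.datasets import make_classification

# Same generator settings as Step 2
X, y = make_classification(
    n_samples=500, n_features=5, n_informative=3, n_redundant=0, n_classes=2, random_state=42
)

print("X shape:", X.shape)           # (n_samples, n_features)
counts = np.bincount(y)              # number of samples in each class
print("Samples per class:", counts)
print("Class proportions:", np.round(counts / counts.sum(), 3))
```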

### Step 3: Split the Dataset
```python
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
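An optional variant of this split passes `stratify=y` to `train_test_split`, which keeps the class proportions approximately the same in the training and test sets. This matters little for a nearly balanced synthetic dataset, but it makes the test accuracy more trustworthy on imbalanced data. The sketch below assumes the same data as Step 2:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=500, n_features=5, n_informative=3, n_redundant=0, n_classes=2, random_state=42
)

# stratify=y preserves the class ratio of y in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print("Train:", X_train.shape, "class proportions:", np.bincount(y_train) / len(y_train))
print("Test: ", X_test.shape, "class proportions:", np.bincount(y_test) / len(y_test))
```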

### Step 4: Initialize and Train the Random Forest Classifier
```python
# Initialize the Random Forest classifier:
# 100 trees, each grown until its leaves are pure (max_depth=None, the default)
rf_model = RandomForestClassifier(n_estimators=100, max_depth=None, random_state=42)

# Train the model
rf_model.fit(X_train, y_train)
```
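After fitting, the individual trees are available in `rf_model.estimators_`. The sketch below checks how scikit-learn combines them: `predict_proba` is the average of the trees' class probabilities (soft voting), and `predict` returns the class with the highest averaged probability. It retrains the model from Steps 2 to 4 so that it runs on its own.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=500, n_features=5, n_informative=3, n_redundant=0, n_classes=2, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
rf_model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

# Collect every tree's class probabilities and average them across trees
tree_probas = np.stack([tree.predict_proba(X_test) for tree in rf_model.estimators_])
manual_proba = tree_probas.mean(axis=0)  # shape (n_test_samples, n_classes)
print("Matches predict_proba:", np.allclose(manual_proba, rf_model.predict_proba(X_test)))

# predict() picks the class with the highest averaged probability
manual_pred = rf_model.classes_[manual_proba.argmax(axis=1)]
print("Matches predict:", np.array_equal(manual_pred, rf_model.predict(X_test)))
```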

### Step 5: Evaluate the Model
```python
# Predict on the test set
y_pred = rf_model.predict(X_test)

# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")

# Classification report
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

# Confusion matrix
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=["Class 0", "Class 1"], yticklabels=["Class 0", "Class 1"])
plt.title("Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("True")
plt.show()
```
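A single 80/20 split gives one accuracy number that depends on which rows landed in the test set. Two less split-dependent estimates are sketched below. The first is 5-fold cross-validation with `cross_val_score`. The second is the out-of-bag (OOB) score, enabled with `oob_score=True`: each training row is predicted only by the trees whose bootstrap sample did not include it. Both reuse the Step 2 data; the variable names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(
    n_samples=500, n_features=5, n_informative=3, n_redundant=0, n_classes=2, random_state=42
)

# 5-fold CV: train on 4/5 of the data and score on the held-out 1/5, five times
cv_scores = cross_val_score(
    RandomForestClassifier(n_estimators=100, random_state=42), X, y, cv=5, scoring="accuracy"
)
print(f"CV accuracy: {cv_scores.mean():.4f} +/- {cv_scores.std():.4f}")

# OOB estimate: uses the rows each tree did not see during training
oob_model = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=42)
oob_model.fit(X, y)
print(f"OOB accuracy: {oob_model.oob_score_:.4f}")
```

The OOB score comes essentially for free with bagging, while cross-validation costs five extra fits but works for any model.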

### Step 6: Feature Importance
```python
# Impurity-based feature importances (mean decrease in impurity); they sum to 1
feature_importances = rf_model.feature_importances_

# Plot feature importance
plt.bar(range(X.shape[1]), feature_importances, color='skyblue')
plt.title("Feature Importance")
plt.xlabel("Feature Index")
plt.ylabel("Importance")
plt.show()
```
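The `feature_importances_` attribute is impurity-based and is computed from the training data. The scikit-learn documentation notes that it can be biased toward high-cardinality features. Permutation importance is a common cross-check: it shuffles one feature at a time on held-out data and measures how much the score drops. A sketch with `sklearn.inspection.permutation_importance`, retraining the Step 4 model so it runs on its own:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=500, n_features=5, n_informative=3, n_redundant=0, n_classes=2, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
rf_model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

# Shuffle each feature 10 times on the test set; record the mean drop in accuracy
result = permutation_importance(rf_model, X_test, y_test, n_repeats=10, random_state=42)

# Features from most to least important
for idx in result.importances_mean.argsort()[::-1]:
    print(f"Feature {idx}: {result.importances_mean[idx]:.4f} +/- {result.importances_std[idx]:.4f}")

# Impurity-based importances from Step 6, for comparison
print("Impurity-based:", np.round(rf_model.feature_importances_, 4))
```

Since two of the five generated features are noise, both methods should ideally assign them low importance; large disagreements between the two rankings are worth investigating.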

### Step 7: Visualize Decision Boundaries
```python
from matplotlib.colors import ListedColormap

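# rf_model was trained on all 5 features, so it cannot predict on 2-D grid points.
# Train a separate forest on the first two features only, purely for visualization.
rf_2d = RandomForestClassifier(n_estimators=100, random_state=42)
rf_2d.fit(X_train[:, :2], y_train)

# Decision boundary visualization for a model trained on exactly two features
def plot_decision_boundary(model, X, y):
    # Extend the plotting range by 1 unit beyond the data on each axis
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    # Dense grid of points covering the plotting range
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01),
                         np.arange(y_min, y_max, 0.01))

    # Predict every grid point and reshape the predictions back onto the grid
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    plt.contourf(xx, yy, Z, alpha=0.8, cmap=ListedColormap(['#FFAAAA', '#AAAAFF']))
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor='k', cmap=ListedColormap(['#FF0000', '#0000FF']))
    plt.title("Decision Boundary (features 0 and 1 only)")
    plt.xlabel("Feature 0")
    plt.ylabel("Feature 1")
    plt.show()

# Plot the decision boundary of the two-feature model over all samples
plot_decision_boundary(rf_2d, X[:, :2], y)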
```
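The decision-boundary plot can only show a forest that sees the first two columns, which is a simplification of the five-feature model evaluated in Step 5. Comparing the test accuracy of the two models gives a rough sense of how much the 2-D picture leaves out. The sketch below assumes the Step 2 and Step 3 settings; `rf_all` and `rf_2d` are illustrative names.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=500, n_features=5, n_informative=3, n_redundant=0, n_classes=2, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Forest on all five features (as in Step 4)
rf_all = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)
acc_all = accuracy_score(y_test, rf_all.predict(X_test))

# Forest restricted to the two columns used for the decision-boundary plot
rf_2d = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train[:, :2], y_train)
acc_2d = accuracy_score(y_test, rf_2d.predict(X_test[:, :2]))

print(f"Test accuracy, all 5 features:     {acc_all:.4f}")
print(f"Test accuracy, features 0-1 only:  {acc_2d:.4f}")
```

Because `make_classification` shuffles feature columns by default, the first two columns may include a noise feature, so the two-feature model can be noticeably weaker.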

### Step 8: Summary and Observations
```python
print("\nSummary:")
print("1. A Random Forest classifier was trained and evaluated on a synthetic binary classification dataset.")
print("2. The model achieved an accuracy of {:.2f}% on the test data.".format(accuracy * 100))
print("3. Impurity-based feature importances were computed and plotted to show which features the forest relied on most.")
print("4. The decision-boundary plot (first two features only) shows the piecewise, axis-aligned regions produced by combining many trees.")
```