# Experiment 8: Implementing the AdaBoost Algorithm and the Boosting Ensemble Technique Using Python

## Aim
To implement the AdaBoost algorithm using Python and demonstrate the boosting ensemble technique on a classification problem.

## Objectives
- Understand the AdaBoost algorithm and how it improves model performance.
- Implement AdaBoost using Python's `sklearn` library.
- Evaluate and visualize the performance of the ensemble model.
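
To make the first objective concrete, here is a minimal from-scratch sketch of discrete AdaBoost with decision stumps. It assumes the labels are recoded to -1/+1, and the helper names `fit_adaboost` and `predict_adaboost` are hypothetical (they are not part of scikit-learn). It is meant only to show the reweighting idea: each round trains a stump on weighted samples, then increases the weights of the samples that stump got wrong. The library's `AdaBoostClassifier` may differ in details.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def fit_adaboost(X, y, n_rounds=20):
    """Discrete AdaBoost sketch for labels in {-1, +1}. Returns (stumps, alphas)."""
    n = len(y)
    w = np.full(n, 1.0 / n)  # start with uniform sample weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        # Weighted error of this stump (weights sum to 1)
        err = np.clip(np.sum(w[pred != y]), 1e-10, 1 - 1e-10)
        # Stump weight: larger when the weighted error is smaller
        alpha = 0.5 * np.log((1 - err) / err)
        # Increase weights of misclassified samples, decrease the rest
        w = w * np.exp(-alpha * y * pred)
        w = w / w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas


def predict_adaboost(stumps, alphas, X):
    """Weighted vote of all stumps, mapped back to -1/+1."""
    scores = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.where(scores >= 0, 1, -1)


# Illustrative data only; labels are recoded from {0, 1} to {-1, +1}
X, y = make_classification(
    n_samples=500, n_features=5, n_informative=3, n_redundant=0, random_state=42
)
y_pm = 2 * y - 1

stumps, alphas = fit_adaboost(X, y_pm, n_rounds=20)
y_hat = predict_adaboost(stumps, alphas, X)
train_acc = np.mean(y_hat == y_pm)
print(f"Training accuracy of the from-scratch ensemble: {train_acc:.4f}")
```

The rest of this experiment uses scikit-learn's implementation rather than this sketch.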

## Tools Used
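- **scikit-learn**: For generating the dataset, implementing AdaBoost, and evaluating the model.
- **NumPy**: For array handling and building the grid used in the decision boundary plot.
- **Matplotlib** and **Seaborn**: For data visualization.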

## Implementation

### Step 1: Import Libraries
```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns
```

### Step 2: Create a Sample Dataset
```python
from sklearn.datasets import make_classification

# Generate synthetic data
X, y = make_classification(
    n_samples=500, n_features=5, n_informative=3, n_redundant=0, n_classes=2, random_state=42
)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Visualize class distribution
sns.countplot(x=y)
plt.title("Class Distribution")
plt.xlabel("Class")
plt.ylabel("Count")
plt.show()
```

### Step 3: Initialize and Train the AdaBoost Classifier
```python
# Base estimator: Decision Tree with max depth of 1 (stump)
base_estimator = DecisionTreeClassifier(max_depth=1)

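# AdaBoost Classifier
# Note: scikit-learn 1.2 renamed `base_estimator` to `estimator`, and the old
# name was removed in 1.4. On versions older than 1.2, pass `base_estimator=` instead.
adaboost = AdaBoostClassifier(estimator=base_estimator, n_estimators=50, learning_rate=1.0, random_state=42)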

# Train the model
adaboost.fit(X_train, y_train)
```
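
To see how boosting rounds affect performance, one option is `staged_score`, which yields the ensemble's accuracy after each round. The sketch below is self-contained and regenerates the same synthetic data. It relies on the default base estimator of `AdaBoostClassifier` (a depth-1 decision tree), so it does not depend on the `estimator` keyword name. The variable names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=500, n_features=5, n_informative=3, n_redundant=0, n_classes=2, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Default base estimator is a decision stump (max_depth=1)
model = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=42)
model.fit(X_train, y_train)

# Test accuracy after 1, 2, ..., n boosting rounds
staged_test_acc = list(model.staged_score(X_test, y_test))

# Boosting can stop early, so only report rounds that actually exist
for n_rounds in (1, 10, 25, 50):
    if n_rounds <= len(staged_test_acc):
        print(f"Rounds: {n_rounds:2d}  Test accuracy: {staged_test_acc[n_rounds - 1]:.4f}")
```

If accuracy levels off well before the last round, a smaller `n_estimators` may be enough for this dataset.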

### Step 4: Evaluate the Model
```python
# Predict on test data
y_pred = adaboost.predict(X_test)

# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")

# Classification report
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

# Confusion matrix
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=["Class 0", "Class 1"], yticklabels=["Class 0", "Class 1"])
plt.title("Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("True")
plt.show()
```
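
A single train/test split gives only one estimate of accuracy. As a rough check on whether boosting actually helps, the following sketch compares one decision stump against the boosted ensemble using 5-fold cross-validation. The fold count and variable names are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(
    n_samples=500, n_features=5, n_informative=3, n_redundant=0, n_classes=2, random_state=42
)

# Baseline: a single weak learner
stump = DecisionTreeClassifier(max_depth=1, random_state=42)
# Ensemble: 50 boosted stumps (the default base estimator is a depth-1 tree)
boosted = AdaBoostClassifier(n_estimators=50, random_state=42)

stump_scores = cross_val_score(stump, X, y, cv=5)
boosted_scores = cross_val_score(boosted, X, y, cv=5)

print(f"Single stump: {stump_scores.mean():.4f} +/- {stump_scores.std():.4f}")
print(f"AdaBoost:     {boosted_scores.mean():.4f} +/- {boosted_scores.std():.4f}")
```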

### Step 5: Visualize Feature Importance
```python
# Feature importance
feature_importances = adaboost.feature_importances_

# Plot feature importance
plt.bar(range(X.shape[1]), feature_importances, color='skyblue')
plt.title("Feature Importance")
plt.xlabel("Feature Index")
plt.ylabel("Importance")
plt.show()
```
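
The `feature_importances_` attribute is derived from the trees' split statistics on the training data. As a cross-check, permutation importance measures how much test accuracy drops when one feature's values are shuffled. The sketch below is self-contained. The `n_repeats=10` setting is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=500, n_features=5, n_informative=3, n_redundant=0, n_classes=2, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = AdaBoostClassifier(n_estimators=50, random_state=42)
model.fit(X_train, y_train)

# Shuffle each feature in turn and record the drop in test accuracy
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=42)

# Print features from most to least important by permutation score
for idx in np.argsort(result.importances_mean)[::-1]:
    print(
        f"Feature {idx}: impurity-based = {model.feature_importances_[idx]:.3f}, "
        f"permutation = {result.importances_mean[idx]:.3f}"
    )
```

Features that rank highly under both measures are the most likely to be genuinely informative.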

### Step 6: Visualize Decision Boundaries
```python
from matplotlib.colors import ListedColormap

# Decision boundary visualization
def plot_decision_boundary(model, X, y):
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01),
                         np.arange(y_min, y_max, 0.01))

    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    plt.contourf(xx, yy, Z, alpha=0.8, cmap=ListedColormap(['#FFAAAA', '#AAAAFF']))
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor='k', cmap=ListedColormap(['#FF0000', '#0000FF']))
    plt.title("Decision Boundary")
    plt.xlabel("Feature 1")
    plt.ylabel("Feature 2")
    plt.show()

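# The model from Step 3 was trained on all five features, so it cannot predict
# on a two-column grid. Fit a separate AdaBoost model on the first two features
# only (the default base estimator is a depth-1 decision tree).
adaboost_2d = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=42)
adaboost_2d.fit(X_train[:, :2], y_train)

# Plot decision boundary for the first two features
plot_decision_boundary(adaboost_2d, X[:, :2], y)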
```

### Step 7: Summary and Observations
```python
print("\nSummary:")
print("1. AdaBoost was implemented using DecisionTreeClassifier as the base estimator.")
print(f"2. The model achieved an accuracy of {accuracy * 100:.2f}% on the test data.")
print("3. Visualizations, including feature importance and decision boundaries, were created to better understand the model's behavior.")
```
