## Exploring Ensemble Methods: Random Forest vs. AdaBoost vs. Bagging

In this demonstration project, we use the `scikit-learn` library to train and compare three widely used ensemble learning methods: Random Forest, AdaBoost, and Bagging.

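Each ensemble method can be pictured as a team of experts whose individual predictions are combined:
* **Random Forest** is like a council of experts. Each expert studied a different random sample of the cases and, at every decision point, could consider only a random subset of the evidence (features). Because their opinions are less correlated, their combined decision is typically better than any single expert's.
* **AdaBoost** consults its experts one after another. Each new expert concentrates on the cases the previous ones got wrong, because those samples receive higher weight. The final decision is a weighted vote in which more accurate experts count for more.
* **Bagging** (bootstrap aggregating) asks several experts to solve the problem independently, each working from its own random resample of the data. It then combines their answers by voting or averaging. Random Forest is bagging applied to decision trees, plus the extra feature sampling described above.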
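To make the bagging idea concrete, here is a minimal from-scratch sketch of bootstrap aggregating. It assumes decision trees as the experts and a simple hard majority vote. Note that scikit-learn's `BaggingClassifier` instead averages predicted probabilities when its base estimator supports them. The helpers `fit_bagged_trees` and `predict_majority` are hypothetical names introduced only for this illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def fit_bagged_trees(X, y, n_estimators=25, seed=0):
    """Hypothetical helper: train each tree on its own bootstrap resample."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    trees = []
    for i in range(n_estimators):
        # Draw n_samples row indices *with replacement* (a bootstrap sample).
        idx = rng.integers(0, n_samples, size=n_samples)
        tree = DecisionTreeClassifier(random_state=i)
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees


def predict_majority(trees, X):
    """Hypothetical helper: hard majority vote over 0/1 class predictions."""
    votes = np.stack([tree.predict(X) for tree in trees])  # shape (n_trees, n_samples)
    # Predict class 1 when more than half of the trees vote for it
    # (an odd number of trees avoids ties in this binary setting).
    return (votes.mean(axis=0) > 0.5).astype(int)


X, y = make_classification(n_samples=1000, n_features=20, n_informative=2,
                           n_redundant=10, random_state=42)
trees = fit_bagged_trees(X[:700], y[:700])
y_pred = predict_majority(trees, X[700:])
print(f"Hand-rolled bagging accuracy: {np.mean(y_pred == y[700:]):.4f}")
```

Each tree sees a slightly different dataset, so the trees make partly different mistakes. Voting averages those mistakes out.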

We train and evaluate each method on a synthetic dataset generated by `make_classification`. We compare how well each one combines the decisions of many base models, measured by accuracy on a held-out test set.

```python
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
```

```python
# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=2, n_redundant=10, random_state=42)

# n_samples: the number of samples (rows) to generate.
# n_features: the total number of features (columns) in the dataset.
# n_informative: the number of informative features, i.e. features that actually help classify a sample.
# n_redundant: the number of redundant features, which are random linear combinations of the informative features.
# The remaining 20 - 2 - 10 = 8 features are pure noise (n_repeated defaults to 0).
# random_state: fixes the random seed so the same dataset is generated on every run.
```
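One way to check these parameter descriptions is to regenerate the data with `shuffle=False`. According to the `make_classification` documentation, this keeps the columns in the order informative, redundant, repeated, then noise. The sketch is for inspection only; the notebook itself keeps the default `shuffle=True`, so its columns are permuted. Under that assumption, the first 12 columns should have rank 2, because the 10 redundant columns are linear combinations of the 2 informative ones. The full matrix should have rank 10 (2 informative + 8 noise columns).

```python
import numpy as np
from sklearn.datasets import make_classification

# Same parameters as above, but shuffle=False keeps the documented column order:
# [2 informative | 10 redundant | 0 repeated | 8 noise]
X_ordered, y_ordered = make_classification(
    n_samples=1000, n_features=20, n_informative=2, n_redundant=10,
    shuffle=False, random_state=42,
)

rank_first_12 = np.linalg.matrix_rank(X_ordered[:, :12])  # informative + redundant columns
rank_all = np.linalg.matrix_rank(X_ordered)               # all 20 columns

print("rank of first 12 columns:", rank_first_12)  # redundant columns add no new directions
print("rank of all 20 columns:", rank_all)         # informative directions plus noise columns
print("class counts:", np.bincount(y_ordered))
```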

```python
# Split the dataset into training (70%) and testing (30%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```
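With `test_size=0.3` and 1000 samples, `train_test_split` holds out 300 rows for testing and keeps 700 for training. The sketch below checks those sizes. It also shows that reusing `random_state=42` reproduces exactly the same split, which is what makes the accuracies reported later repeatable.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, n_informative=2,
                           n_redundant=10, random_state=42)

# test_size=0.3 -> 30% of the 1000 rows (300) go to the test set, 700 to training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
print(X_train.shape, X_test.shape)  # (700, 20) (300, 20)

# Re-running with the same random_state reproduces exactly the same split.
X_train2, X_test2, _, _ = train_test_split(X, y, test_size=0.3, random_state=42)
print(np.array_equal(X_test, X_test2))  # True
```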


```python
# Initialize the ensemble models
random_forest_model = RandomForestClassifier(n_estimators=100, random_state=42)
adaboost_model = AdaBoostClassifier(n_estimators=100, random_state=42)
bagging_model = BaggingClassifier(n_estimators=100, random_state=42)

# n_estimators sets the number of base estimators each ensemble builds and combines.
# With scikit-learn's defaults, these are fully grown decision trees for Random Forest
# and Bagging, and depth-1 decision trees ("stumps") for AdaBoost.
```


```python
# Fit the models on the training set
random_forest_model.fit(X_train, y_train)
adaboost_model.fit(X_train, y_train)
bagging_model.fit(X_train, y_train)
```
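AdaBoost fits its estimators one after another, so scikit-learn's `AdaBoostClassifier` provides `staged_score`, which reports the ensemble's accuracy after each boosting round. The sketch below is a standalone re-creation of the setup above that uses it to print test accuracy at a few checkpoints. It makes no claim about the shape of the resulting curve. Random Forest and Bagging have no such stages, because their trees are trained independently of one another.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, n_informative=2,
                           n_redundant=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

adaboost_model = AdaBoostClassifier(n_estimators=100, random_state=42)
adaboost_model.fit(X_train, y_train)

# staged_score yields the test accuracy of the ensemble built from the first 1, 2, ..., T estimators.
staged_accuracy = np.array(list(adaboost_model.staged_score(X_test, y_test)))

n_rounds = len(staged_accuracy)  # may be < 100 if boosting stopped early
for rounds in (1, 10, 50, 100):
    if rounds <= n_rounds:
        print(f"after {rounds:3d} rounds: test accuracy {staged_accuracy[rounds - 1]:.4f}")
```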


```python
# Make predictions on the test set
rf_predictions = random_forest_model.predict(X_test)
ab_predictions = adaboost_model.predict(X_test)
bg_predictions = bagging_model.predict(X_test)
```
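The expert-council picture is slightly loose for scikit-learn's implementation. Per its documentation, `RandomForestClassifier.predict` returns the class with the highest *mean predicted probability* across trees (soft voting) rather than counting hard votes. `BaggingClassifier` does the same when its base estimator supports `predict_proba`. The sketch below checks this for the forest by averaging the trees' probabilities manually.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, n_informative=2,
                           n_redundant=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

random_forest_model = RandomForestClassifier(n_estimators=100, random_state=42)
random_forest_model.fit(X_train, y_train)

# Average the class-probability estimates of the individual trees...
mean_proba = np.mean(
    [tree.predict_proba(X_test) for tree in random_forest_model.estimators_], axis=0
)
# ...and pick the class with the highest mean probability.
manual_predictions = random_forest_model.classes_[np.argmax(mean_proba, axis=1)]

rf_predictions = random_forest_model.predict(X_test)
print("manual soft vote matches predict():", np.array_equal(manual_predictions, rf_predictions))
```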


```python
# Evaluate the models
rf_accuracy = accuracy_score(y_test, rf_predictions)
ab_accuracy = accuracy_score(y_test, ab_predictions)
bg_accuracy = accuracy_score(y_test, bg_predictions)

# Print the accuracies
print(f'Random Forest Accuracy: {rf_accuracy:.4f}')
print(f'AdaBoost Accuracy: {ab_accuracy:.4f}')
print(f'Bagging Accuracy: {bg_accuracy:.4f}')
```

Output:

```text
Random Forest Accuracy: 0.9333
AdaBoost Accuracy: 0.8933
Bagging Accuracy: 0.9200
```
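These figures come from a single 300-sample test set, so each step of about 0.0033 corresponds to one prediction. Accuracies of 0.9333, 0.8933 and 0.9200 mean 280, 268 and 276 correct predictions respectively, and the ranking could shift with a different split or `random_state`. A more robust comparison, sketched below as an addition to the original notebook, uses 5-fold cross-validation via `cross_val_score` and reports the mean and spread of accuracy across folds. No particular numbers are claimed here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, n_informative=2,
                           n_redundant=10, random_state=42)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "AdaBoost": AdaBoostClassifier(n_estimators=100, random_state=42),
    "Bagging": BaggingClassifier(n_estimators=100, random_state=42),
}

cv_results = {}
for name, model in models.items():
    # For classifiers, cv=5 uses stratified 5-fold splitting; the default score is accuracy.
    scores = cross_val_score(model, X, y, cv=5)
    cv_results[name] = scores
    print(f"{name:13s} accuracy: {scores.mean():.4f} ± {scores.std():.4f} over {len(scores)} folds")
```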
