Ensemble learning combines the predictions of several models and often improves on a single model, although a gain is not guaranteed on every dataset. Here are three general ensemble learning techniques:


1. Bagging (Bootstrap Aggregating)
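Trains each model on a bootstrap sample of the training data (rows drawn with replacement) and averages or votes over the predictions. This mainly reduces variance and helps limit overfitting.
Common algorithm: Random Forest
Suitable for both classification and regression tasks.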

In [None]:


from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import make_classification

# Generate a synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Random Forest model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
print("Bagging Accuracy:", accuracy_score(y_test, y_pred))
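To make the "bootstrap, then aggregate" idea concrete, here is a minimal hand-rolled sketch of bagging with plain decision trees. It is an illustration under assumptions, not how `RandomForestClassifier` is implemented: a random forest additionally samples a subset of features at each split. The names `n_trees`, `votes`, and `bagged_pred` are hypothetical, and `n_trees = 25` is an untuned choice (odd, so a binary vote cannot tie).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Same synthetic setup as the cell above
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

rng = np.random.default_rng(42)
n_trees = 25  # illustrative, untuned; odd so a 0/1 majority vote cannot tie
votes = np.zeros((n_trees, len(X_test)), dtype=int)

for i in range(n_trees):
    # Bootstrap: draw len(X_train) row indices with replacement
    idx = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeClassifier(random_state=i)
    tree.fit(X_train[idx], y_train[idx])
    votes[i] = tree.predict(X_test)

# Aggregate: majority vote across trees (labels are 0/1 in this dataset)
bagged_pred = (votes.mean(axis=0) >= 0.5).astype(int)
bagged_acc = accuracy_score(y_test, bagged_pred)

# Reference point: one tree fitted on the full training set
single_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
single_acc = accuracy_score(y_test, single_tree.predict(X_test))

print("Single tree accuracy:", single_acc)
print("Hand-rolled bagging accuracy:", bagged_acc)
```

Comparing the two printed numbers gives a rough feel for how much averaging over bootstrap replicas helps on this particular split; the size of the gap will vary with the data and the seed.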




2. Boosting
Trains models sequentially, with each new model concentrating on the errors left by the previous ones. This mainly reduces bias.

Common algorithms: AdaBoost, Gradient Boosting (GBM), XGBoost, LightGBM, CatBoost

Often works well on structured (tabular) data, and can cope with imbalanced datasets when combined with class or sample weighting.


Implementation using XGBoost:

In [None]:

from xgboost import XGBClassifier

# Train XGBoost model
model = XGBClassifier(n_estimators=100, learning_rate=0.1, random_state=42)
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
print("Boosting Accuracy:", accuracy_score(y_test, y_pred))
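The sequential nature of boosting can be observed by scoring the ensemble after each added tree. The sketch below swaps in scikit-learn's `GradientBoostingClassifier` for XGBoost (an assumption made only so the example has no extra dependency) and uses its `staged_predict` method; `staged_acc` is a hypothetical name.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Same synthetic setup as the bagging cell
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Stand-in for XGBClassifier, with the same n_estimators and learning_rate
gbm = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42
)
gbm.fit(X_train, y_train)

# staged_predict yields test predictions after 1, 2, ..., n_estimators trees
staged_acc = [
    accuracy_score(y_test, y_stage) for y_stage in gbm.staged_predict(X_test)
]

for n in (1, 10, 50, 100):
    print(f"{n:>3} trees: test accuracy = {staged_acc[n - 1]:.3f}")
```

Each stage adds one shallow tree fitted to the residual errors of the current ensemble, scaled by `learning_rate`, so the printed accuracies trace how the fit develops as trees accumulate.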


3. Stacking (Stacked Generalization)
Combines multiple models (base models) and trains a meta-model on their predictions.
Can outperform any single base model when the base models make different kinds of errors.
Implementation using logistic regression as the meta-model:


In [None]:
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Define base models
base_models = [
    ('knn', KNeighborsClassifier(n_neighbors=5)),
    ('svm', SVC(kernel='linear', probability=True, random_state=42)),
    ('dt', DecisionTreeClassifier(max_depth=3, random_state=42))
]

# Define meta-model
meta_model = LogisticRegression()

# Train Stacking Classifier
model = StackingClassifier(estimators=base_models, final_estimator=meta_model)
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
print("Stacking Accuracy:", accuracy_score(y_test, y_pred))
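What "trains a meta-model on their predictions" means in practice can be sketched by hand. The example below is a simplified approximation of what `StackingClassifier` does, not its actual implementation: it builds out-of-fold probabilities with `cross_val_predict`, then fits the meta-model on them. Only two base models are used to keep it short, and `train_meta`, `test_meta`, and `stacked_pred` are hypothetical names.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Same synthetic setup as the bagging cell
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

base_models = [
    KNeighborsClassifier(n_neighbors=5),
    DecisionTreeClassifier(max_depth=3, random_state=42),
]

# Step 1: out-of-fold class-1 probabilities on the training set, so the
# meta-model never sees a prediction from a model fitted on that same row
train_meta = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=5, method="predict_proba")[:, 1]
    for m in base_models
])

# Step 2: refit each base model on all training rows, then score the test set
test_meta = np.column_stack([
    m.fit(X_train, y_train).predict_proba(X_test)[:, 1]
    for m in base_models
])

# Step 3: the meta-model learns how to weight the base models' outputs
meta_model = LogisticRegression()
meta_model.fit(train_meta, y_train)
stacked_pred = meta_model.predict(test_meta)
stacked_acc = accuracy_score(y_test, stacked_pred)

print("Hand-rolled stacking accuracy:", stacked_acc)
```

The out-of-fold step is the important design choice: fitting the meta-model on in-sample base predictions would let it learn from overly optimistic inputs.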

# Ensemble learning on the Iris dataset

**Bagging (Bootstrap Aggregating)**

In [1]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load dataset
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)

# Train Random Forest
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))


Accuracy: 1.0


**Boosting**

In [3]:
from xgboost import XGBClassifier

# Train XGBoost model
model = XGBClassifier(n_estimators=100, learning_rate=0.1, random_state=42)
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
print("Boosting (XGBoost) Accuracy:", accuracy_score(y_test, y_pred))



Boosting (XGBoost) Accuracy: 1.0


**Stacking (Stacked Generalization)**

In [4]:
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Define base models
base_models = [
    ('knn', KNeighborsClassifier(n_neighbors=3)),
    ('dt', DecisionTreeClassifier(max_depth=3))
]

# Define meta-model
meta_model = SVC(kernel='linear', probability=True)

# Train Stacking Classifier
model = StackingClassifier(estimators=base_models, final_estimator=meta_model)
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))


Accuracy: 1.0
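All three models score 1.0 on this split, but the test set holds only 30 samples, so the figure is a coarse estimate. A hedged way to get a steadier comparison is k-fold cross-validation over the full dataset, sketched below. As an assumption, `GradientBoostingClassifier` stands in for XGBoost to avoid the extra dependency, the stacking meta-model omits `probability=True` because only class predictions are needed, and `cv_scores` is a hypothetical name.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

# Five stratified folds: every sample is used for testing exactly once
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

models = {
    "Bagging (Random Forest)": RandomForestClassifier(
        n_estimators=100, random_state=42
    ),
    # Stand-in for XGBClassifier (assumption, see the note above)
    "Boosting (Gradient Boosting)": GradientBoostingClassifier(
        n_estimators=100, learning_rate=0.1, random_state=42
    ),
    "Stacking": StackingClassifier(
        estimators=[
            ("knn", KNeighborsClassifier(n_neighbors=3)),
            ("dt", DecisionTreeClassifier(max_depth=3, random_state=42)),
        ],
        final_estimator=SVC(kernel="linear"),
    ),
}

cv_scores = {}
for name, estimator in models.items():
    scores = cross_val_score(
        estimator, iris.data, iris.target, cv=cv, scoring="accuracy"
    )
    cv_scores[name] = scores
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

The mean and spread across folds indicate how much of the perfect score above depends on the particular 80/20 split.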
