### Ensemble Learning
Use different voting mechanisms, then apply AdaBoost (Adaptive Boosting), Gradient
Tree Boosting (GBM), and XGBoost classification to the Iris dataset and compare the
performance of the three models using different evaluation measures.

Dataset Link: https://www.kaggle.com/datasets/uciml/iris

In [1]:
import pandas as pd

# Load the dataset
df = pd.read_csv("Iris.csv")
df.head()

   Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0   1            5.1           3.5            1.4           0.2  Iris-setosa
1   2            4.9           3.0            1.4           0.2  Iris-setosa
2   3            4.7           3.2            1.3           0.2  Iris-setosa
3   4            4.6           3.1            1.5           0.2  Iris-setosa
4   5            5.0           3.6            1.4           0.2  Iris-setosa


In [2]:
# Encoding the 'Species' column as it is categorical

from sklearn.preprocessing import LabelEncoder
label_encoder = LabelEncoder()
df['Species'] = label_encoder.fit_transform(df['Species'])
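
The classification reports later in the notebook show the classes as 0, 1, and 2. The sketch below shows how `LabelEncoder` assigns those codes and how to map them back to species names. The `species` list is a hypothetical sample of the three names in the `Species` column, not the full dataset.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample containing the three species names from the Species column
species = ["Iris-virginica", "Iris-setosa", "Iris-versicolor", "Iris-setosa"]

encoder = LabelEncoder()
codes = encoder.fit_transform(species)

# classes_ is sorted, so each name's position in it is its integer code
label_mapping = {name: int(code) for code, name in enumerate(encoder.classes_)}
print(label_mapping)
# {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}

# inverse_transform turns integer codes (or predictions) back into names
decoded = encoder.inverse_transform(codes)
print(list(decoded))
```

In the notebook itself, `label_encoder.classes_` and `label_encoder.inverse_transform(...)` would play the same role for the fitted encoder.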

In [3]:
# Dropping the 'Id' column as it is just an identifier
df.drop('Id', axis=1, inplace=True)

In [4]:
X = df.drop("Species", axis=1)  # Features
y = df["Species"]  # Labels

In [5]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [6]:
# Check the shapes of the train/test splits before applying the ensemble methods
(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

((120, 4), (30, 4), (120,), (30,))

In [7]:
# training
from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import GradientBoostingClassifier
import xgboost as xgb

# Create the models
adaboost = AdaBoostClassifier(n_estimators=50, random_state=42)
gbm = GradientBoostingClassifier(n_estimators=100, random_state=42)
xgboost = xgb.XGBClassifier(n_estimators=100, use_label_encoder=False, eval_metric='mlogloss', random_state=42)

# Fit the models
adaboost.fit(X_train, y_train)
gbm.fit(X_train, y_train)
xgboost.fit(X_train, y_train)

In [10]:
# predictions: each model predicts with its own fitted estimator
adaboost_predictions = adaboost.predict(X_test)
gbm_predictions = gbm.predict(X_test)
xgb_predictions = xgboost.predict(X_test)

In [18]:
# evaluation
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

def evaluate_model(model, y_test, y_pred, model_name):
    """Print accuracy, classification report, and confusion matrix for one model's predictions."""
    accuracy = accuracy_score(y_test, y_pred)
    print(f"{model_name} Accuracy: {accuracy:.2f}")
    print(f"{model_name} Classification Report:\n{classification_report(y_test, y_pred)}")
    print(f"{model_name} Confusion Matrix:\n{confusion_matrix(y_test, y_pred)}\n")
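
`evaluate_model` prints one report per model. To compare models side by side on several measures, one option is to collect the metrics into a single table. The sketch below is a possible approach, not part of the original notebook. It assumes scikit-learn's bundled Iris data in place of `Iris.csv` and covers only the two scikit-learn models. An `XGBClassifier` could be added to the `models` dict in the same way when `xgboost` is installed.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split

# Assumption: sklearn's bundled Iris data replaces Iris.csv
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=100, random_state=42),
}

rows = []
for name, model in models.items():
    y_pred = model.fit(X_train, y_train).predict(X_test)
    # Macro averaging weights the three classes equally
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="macro", zero_division=0
    )
    rows.append(
        {
            "model": name,
            "accuracy": accuracy_score(y_test, y_pred),
            "macro_precision": precision,
            "macro_recall": recall,
            "macro_f1": f1,
        }
    )

# One row per model, one column per evaluation measure
summary = pd.DataFrame(rows).set_index("model")
print(summary.round(3))
```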

In [19]:
evaluate_model(adaboost, y_test, adaboost_predictions, "AdaBoost")

AdaBoost Accuracy: 1.00
AdaBoost Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30

AdaBoost Confusion Matrix:
[[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]



In [20]:
evaluate_model(gbm, y_test, gbm_predictions, "Gradient Boosting")

Gradient Boosting Accuracy: 1.00
Gradient Boosting Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30

Gradient Boosting Confusion Matrix:
[[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]



In [21]:
evaluate_model(xgboost, y_test, xgb_predictions, "XGBoost")

XGBoost Accuracy: 1.00
XGBoost Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30

XGBoost Confusion Matrix:
[[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]
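
With only 30 test rows, a single hold-out split gives coarse estimates and may not separate the models. Stratified k-fold cross-validation is one way to get a steadier comparison. The sketch below is a suggestion under stated assumptions rather than part of the original analysis: it uses scikit-learn's bundled Iris data in place of `Iris.csv`, 5 folds, and only the two scikit-learn models (an `XGBClassifier` could be added to the dict when `xgboost` is installed). `cv_summary` is an illustrative name.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Assumption: sklearn's bundled Iris data replaces Iris.csv
X, y = load_iris(return_X_y=True)

# Stratification keeps the class balance the same in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

models = {
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=100, random_state=42),
}

cv_summary = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=["accuracy", "f1_macro"])
    # Average each metric over the 5 held-out folds
    cv_summary[name] = {
        "accuracy": scores["test_accuracy"].mean(),
        "f1_macro": scores["test_f1_macro"].mean(),
    }
    print(
        f"{name}: accuracy={cv_summary[name]['accuracy']:.3f}, "
        f"macro F1={cv_summary[name]['f1_macro']:.3f}"
    )
```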

