# Easy Ensemble AdaBoost Classifier

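In AdaBoost, a model is trained and then evaluated. A second model is then trained on the same data, but the training samples that the first model misclassified are given extra weight.

This weighting pushes each new model to concentrate on its predecessors' mistakes, so similar errors become less likely. The samples misclassified by the second model are then up-weighted for the third model, and so on. The process runs for a fixed maximum number of rounds, stopping early only if the training data is fit perfectly. The final prediction is a weighted vote of all the models, in which more accurate models get larger votes.

The Easy Ensemble classifier from `imbalanced-learn` applies this idea to imbalanced data. It trains several AdaBoost learners, each on a different random under-sample of the training data in which every class is reduced to the size of the smallest class, and then combines their predictions.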
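To make the reweighting loop concrete, here is a minimal pure-Python sketch of discrete AdaBoost for two classes (labels +1 and -1) on a single numeric feature. It uses one-split "decision stumps" as the weak models. The toy data and the function names (`fit_stump`, `adaboost`, `predict`) are hypothetical. The sketch illustrates the algorithm only and is not how scikit-learn or imbalanced-learn implement it.

```python
import math

def fit_stump(xs, ys, w):
    """Return (weighted_error, threshold, polarity) of the best stump.

    A stump predicts `polarity` when x >= threshold and -polarity otherwise.
    """
    candidates = sorted(set(xs)) + [max(xs) + 1]
    best = None
    for thr in candidates:
        for pol in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                      if (pol if xi >= thr else -pol) != yi)
            if best is None or err < best[0]:
                best = (err, thr, pol)
    return best

def stump_predict(thr, pol, x):
    return pol if x >= thr else -pol

def adaboost(xs, ys, n_rounds):
    n = len(xs)
    w = [1.0 / n] * n           # start with uniform sample weights
    ensemble = []               # list of (alpha, threshold, polarity)
    for _ in range(n_rounds):
        err, thr, pol = fit_stump(xs, ys, w)
        if err >= 0.5:          # no better than chance: stop early
            break
        err = max(err, 1e-12)   # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, thr, pol))
        # Misclassified points (y * h(x) = -1) are scaled up by e^alpha,
        # correctly classified points are scaled down by e^-alpha.
        w = [wi * math.exp(-alpha * yi * stump_predict(thr, pol, xi))
             for xi, yi, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]   # renormalize to sum to 1
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(alpha * stump_predict(thr, pol, x) for alpha, thr, pol in ensemble)
    return 1 if score > 0 else -1

# Toy data (made up): no single stump can separate these labels.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]

one_round = adaboost(xs, ys, n_rounds=1)
three_rounds = adaboost(xs, ys, n_rounds=3)
print([predict(one_round, x) for x in xs])     # [1, 1, 1, 1, 1, 1]
print([predict(three_rounds, x) for x in xs])  # [1, 1, -1, -1, 1, 1]
```

After one round, the ensemble is a single stump that misclassifies x = 3 and x = 4. Reweighting makes the later stumps focus on those points, and the weighted vote of three stumps fits all six labels.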

```python
import matplotlib.pyplot as plt
import pandas as pd

business_df = pd.read_csv("../../Data/01_Clean_Business_Data.csv")

# Categorize restaurants by star rating; pd.cut bins are right-closed:
# (0.9, 2] -> Poor, (2, 3] -> Average, (3, 4] -> Good, (4, 5] -> Successful
business_df["Category"] = pd.cut(business_df["Stars_Rating"],
                                 bins=[0.9, 2, 3, 4, 5],
                                 labels=["Poor", "Average", "Good", "Successful"])
```
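`pd.cut` uses right-closed intervals by default (`right=True`), so a rating that falls exactly on a bin edge goes into the lower bin. A rating of 2.0 is "Poor", 4.0 is "Good", and any value at or below 0.9 or above 5 gets no category (NaN). The sketch below mimics that rule in plain Python with `bisect` to make the boundaries explicit. The helper name `categorize` is hypothetical, and the sketch is not a replacement for `pd.cut`.

```python
from bisect import bisect_left

EDGES = [0.9, 2, 3, 4, 5]
LABELS = ["Poor", "Average", "Good", "Successful"]

def categorize(stars):
    """Label of the right-closed bin (EDGES[i], EDGES[i + 1]] containing stars."""
    # EDGES[i + 1] is the first edge >= stars, i.e. the bin's right edge
    i = bisect_left(EDGES, stars) - 1
    if 0 <= i < len(LABELS):
        return LABELS[i]
    return None  # outside (0.9, 5]: pd.cut would produce NaN

for stars in [1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]:
    print(stars, categorize(stars))
```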

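```python
# Define the feature set: drop identifiers, location fields, and the
# rating columns (Category is the target and is derived from Stars_Rating)
X = business_df.drop(columns=['Unnamed: 0', 'Restaurant_ID', 'Restaurants_Name', 'Address', 'City',
                              'State', 'Postal_Code', 'Latitude', 'Longitude', 'Stars_Rating', 'Category'])
```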

```python
# Define the target set
y = business_df["Category"]
```

```python
# Split the data into training and testing sets
# (stratify=y keeps the class proportions about the same in both sets)
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    random_state=1,
                                                    stratify=y)
```
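With `stratify=y`, each class is split in approximately the same proportion, so a rare class such as "Poor" keeps its share in both sets. Because no `test_size` is passed, scikit-learn's default holds out 25% of the rows. Below is a simplified plain-Python sketch of the idea using per-class shuffling. The helper `stratified_split` and the labels are hypothetical, and scikit-learn's actual per-class allocation is more involved.

```python
import random
from collections import Counter

def stratified_split(labels, test_frac=0.25, seed=1):
    """Return (train_idx, test_idx), holding out about test_frac of each class."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)                      # random order within the class
        n_test = round(len(idx) * test_frac)  # same fraction from every class
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

labels = ["Good"] * 8 + ["Average"] * 4 + ["Poor"] * 4
train_idx, test_idx = stratified_split(labels)
print(Counter(labels[i] for i in test_idx))
print(Counter(labels[i] for i in train_idx))
```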

```python
# Train the EasyEnsembleClassifier
# (n_estimators=100: number of AdaBoost learners in the ensemble)
from imblearn.ensemble import EasyEnsembleClassifier

model = EasyEnsembleClassifier(n_estimators=100, random_state=1)

model.fit(X_train, y_train)
```

```
EasyEnsembleClassifier(n_estimators=100, random_state=1)
```
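According to the imbalanced-learn documentation, each AdaBoost learner in `EasyEnsembleClassifier` is trained on a balanced sample obtained by random under-sampling, so every learner sees all classes equally often. The sketch below shows only that sampling step in plain Python. It assumes the default behavior of cutting every class down to the size of the smallest one. The helper names and toy labels are hypothetical, and the library's own sampling details may differ.

```python
import random
from collections import Counter

def balanced_subsample(labels, rng):
    """Indices of a random subset with every class cut to the minority-class size."""
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    chosen = []
    for idx in by_class.values():
        chosen.extend(rng.sample(idx, n_min))  # draw without replacement
    return sorted(chosen)

def easy_ensemble_subsets(labels, n_estimators, seed=1):
    """One balanced subset per ensemble member; each would train its own AdaBoost."""
    rng = random.Random(seed)
    return [balanced_subsample(labels, rng) for _ in range(n_estimators)]

labels = ["Good"] * 6 + ["Average"] * 3 + ["Successful"] * 3 + ["Poor"] * 2
subsets = easy_ensemble_subsets(labels, n_estimators=3)
for s in subsets:
    print(Counter(labels[i] for i in s))
```

Each member sees a different subset of the majority classes, and every member sees all of the minority class. Combining many such learners is how the ensemble avoids discarding most of the majority-class data overall.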

```python
# Calculate the balanced accuracy score (the mean of the per-class recall)
from sklearn.metrics import balanced_accuracy_score

y_pred = model.predict(X_test)
balanced_accuracy_score(y_test, y_pred)
```

```
0.43398127229197286
```

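```python
# Display the confusion matrix (rows: true class, columns: predicted class,
# both in sorted label order: Average, Good, Poor, Successful)
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_pred)
cm
```

```
array([[ 876,  553,  980,  553],
       [1388, 1856,  941, 1817],
       [ 217,   41,  679,  180],
       [ 366,  432,  339, 1247]])
```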
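Balanced accuracy is the unweighted mean of the per-class recall, where a class's recall is its diagonal entry divided by its row total. The check below recomputes it in plain Python from the confusion matrix printed above. It assumes the rows and columns follow sorted label order (Average, Good, Poor, Successful), reproduces the 0.434 score, and also computes plain accuracy for comparison.

```python
# Confusion matrix from the notebook output: rows = true class, columns = predicted class
labels = ["Average", "Good", "Poor", "Successful"]
cm = [
    [876, 553, 980, 553],
    [1388, 1856, 941, 1817],
    [217, 41, 679, 180],
    [366, 432, 339, 1247],
]

# Recall of class k = correct predictions of k / all true members of k (row total)
recall = {label: cm[k][k] / sum(cm[k]) for k, label in enumerate(labels)}
balanced_accuracy = sum(recall.values()) / len(recall)

# Plain accuracy = all correct predictions / all test samples
n_correct = sum(cm[k][k] for k in range(len(labels)))
n_total = sum(sum(row) for row in cm)
accuracy = n_correct / n_total

for label, r in recall.items():
    print(f"{label:<11} recall = {r:.2f}")
print(f"balanced accuracy = {balanced_accuracy:.3f}")  # 0.434
print(f"accuracy          = {accuracy:.3f}")           # 0.374
```

Plain accuracy (0.374) is lower than balanced accuracy here. The largest class, "Good", has one of the lowest recalls (0.31), while the smallest class, "Poor", has the highest (0.61).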

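```python
# Print the imbalanced classification report
from imblearn.metrics import classification_report_imbalanced

print(classification_report_imbalanced(y_test, y_pred))
```

```
                   pre       rec       spe        f1       geo       iba       sup

    Average       0.31      0.30      0.79      0.30      0.48      0.22      2962
       Good       0.64      0.31      0.84      0.42      0.51      0.25      6002
       Poor       0.23      0.61      0.80      0.33      0.70      0.48      1117
 Successful       0.33      0.52      0.75      0.40      0.63      0.38      2384

avg / total       0.47      0.37      0.81      0.38      0.54      0.29     12465
```

Column key: `pre` is precision, `rec` is recall (sensitivity), `spe` is specificity, `f1` is the F1 score, `geo` is the geometric mean of recall and specificity, `iba` is the index balanced accuracy, and `sup` is the support (the number of test samples in each class). The `avg / total` row averages each column weighted by support.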
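The less familiar columns can be derived from one-vs-rest counts for each class. Specificity is TN / (TN + FP), `geo` is the square root of recall × specificity, and `iba` is (1 + α · (recall − specificity)) · geo². With α = 0.1, imbalanced-learn's default, this formula reproduces the `iba` column. The sketch below uses a hypothetical helper, `imbalanced_metrics`, to apply the formulas to the "Poor" and "Average" classes. The TP/FN/FP/TN counts are read off the confusion matrix above (for example, FP for "Poor" is its column total minus the diagonal entry).

```python
import math

def imbalanced_metrics(tp, fn, fp, tn, alpha=0.1):
    """One-vs-rest metrics matching the rec/spe/geo/iba report columns."""
    rec = tp / (tp + fn)                         # recall (sensitivity)
    spe = tn / (tn + fp)                         # specificity
    geo = math.sqrt(rec * spe)                   # geometric mean
    iba = (1 + alpha * (rec - spe)) * geo ** 2   # index balanced accuracy
    return {"rec": rec, "spe": spe, "geo": geo, "iba": iba}

# Counts derived from the confusion matrix (one class vs. the other three)
poor = imbalanced_metrics(tp=679, fn=438, fp=2260, tn=9088)
average = imbalanced_metrics(tp=876, fn=2086, fp=1971, tn=7532)

for name, m in [("Poor", poor), ("Average", average)]:
    print(name, {k: round(v, 2) for k, v in m.items()})
```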

