# Easy Ensemble AdaBoost Classifier

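In AdaBoost, a model is trained and then evaluated. Once its errors have been assessed, a second model is trained.

The second model gives extra weight to the observations that the first model misclassified, so that later models concentrate on the cases earlier ones got wrong. The errors of the second model are in turn given extra weight for the third model, and so on. The process repeats for a set number of rounds, or stops early if the training data are fit perfectly. The `EasyEnsembleClassifier` used below is an ensemble of AdaBoost learners, each trained on a different balanced sample obtained by randomly under-sampling the larger classes.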
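The reweighting step can be made concrete with a small sketch. It assumes the classic two-class AdaBoost update with labels in {-1, +1}; the function name `reweight` and the toy labels are made up for illustration, and this is not the multi-class variant that the library applies to the four categories in this notebook.

```python
import math

def reweight(weights, y_true, y_pred):
    """One round of the classic binary AdaBoost weight update (illustrative)."""
    # Weighted error rate of the current model (assumed to be strictly between 0 and 1)
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / sum(weights)
    # The model's say in the final vote: larger when its error is smaller
    alpha = 0.5 * math.log((1 - err) / err)
    # Raise the weight of misclassified samples, lower the weight of correct ones
    updated = [w * math.exp(alpha if t != p else -alpha)
               for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(updated)
    return alpha, [w / total for w in updated]

# Toy data (hypothetical): five samples, the second one is misclassified
y_true = [1, 1, -1, -1, 1]
y_pred = [1, -1, -1, -1, 1]
weights = [0.2] * 5

alpha, new_weights = reweight(weights, y_true, y_pred)
print(alpha)        # about 0.693 (ln 2)
print(new_weights)  # misclassified sample: about 0.5, the others: about 0.125 each
```

After the update the single misclassified sample carries half of the total weight, which is why the next model is pushed to get it right.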

In [2]:
import pandas as pd

business_df = pd.read_csv("../../Data/01_Clean_Business_Data.csv")

# Categorize restaurants by star rating:
# (0.9, 2] Poor, (2, 3] Average, (3, 4] Good, (4, 5] Successful
business_df["Category"] = pd.cut(business_df["Stars_Rating"], bins=[0.9, 2, 3, 4, 5],
                                 labels=["Poor", "Average", "Good", "Successful"])

# Map each category label to an ordinal code (Poor=0, Average=1, Good=2, otherwise 3)
def changeStatus(status):
    if status == "Poor":
        return 0
    elif status == "Average":
        return 1
    elif status == "Good":
        return 2
    else:
        return 3

business_df['Category_Encoded'] = business_df["Category"].apply(changeStatus)
business_df["Category_Encoded"] = pd.to_numeric(business_df["Category_Encoded"])

In [3]:
X = business_df[['Review_Count', 'Restaurants_Delivery', 'Outdoor_Seating',
       'Restaurants_TakeOut', 'WiFi', 'Restaurants_Reservations',
       'Good_For_Groups', 'Wheelchair_Accessible', 'Happy_Hour',
       'Dietary_Restrictions']]

In [4]:
# Define the target set
y = business_df["Category_Encoded"]

In [5]:
# Split the data into training and testing sets
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    random_state=1,
                                                    stratify=y)

In [6]:
# Train the EasyEnsembleClassifier
from imblearn.ensemble import EasyEnsembleClassifier

model = EasyEnsembleClassifier(n_estimators=100, random_state=1)

model.fit(X_train, y_train)

EasyEnsembleClassifier(n_estimators=100, random_state=1)
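Each estimator in an Easy Ensemble is fit on a class-balanced subset of the training data, produced by randomly under-sampling the larger classes. The sketch below illustrates only that sampling idea with the standard library; `balanced_undersample` and the toy labels are hypothetical, and this is not the library's internal implementation.

```python
import random
from collections import Counter, defaultdict

def balanced_undersample(labels, rng):
    """Return sorted indices of a subset with the same number of samples per class."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    # Every class is cut down to the size of the smallest class
    n_min = min(len(idx) for idx in by_class.values())
    chosen = []
    for idx in by_class.values():
        chosen.extend(rng.sample(idx, n_min))  # sample without replacement
    return sorted(chosen)

# Toy, made-up labels with an imbalance across four classes
labels = [0] * 2 + [1] * 5 + [2] * 10 + [3] * 4
rng = random.Random(1)

# One balanced subset per estimator; three estimators here for brevity
subsets = [balanced_undersample(labels, rng) for _ in range(3)]
for subset in subsets:
    print(Counter(labels[i] for i in subset))  # two samples of each class
```

Because each subset is drawn independently, the estimators see different samples of the larger classes, so less of the discarded data is lost across the ensemble as a whole.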

In [7]:
# Calculate the balanced accuracy score
from sklearn.metrics import balanced_accuracy_score
y_pred = model.predict(X_test)
balanced_accuracy_score(y_test, y_pred)

0.4281340013590954

In [8]:
# Training balanced accuracy
y_pred_train = model.predict(X_train)
balanced_accuracy_score(y_train, y_pred_train)

0.44239330825548157

In [9]:
# Display the confusion matrix
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_pred)
cm

array([[ 671,  238,   40,  168],
       [ 994,  885,  550,  533],
       [ 983, 1428, 2045, 1546],
       [ 352,  359,  547, 1126]])
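Balanced accuracy is the unweighted mean of the per-class recalls, so the test score reported earlier (about 0.428) can be recovered from this confusion matrix alone. The sketch below copies the matrix by hand (rows are true classes, columns are predicted classes) and uses plain Python rather than the library call.

```python
# Confusion matrix from the cell above: rows = true class, columns = predicted class
cm = [
    [671, 238, 40, 168],
    [994, 885, 550, 533],
    [983, 1428, 2045, 1546],
    [352, 359, 547, 1126],
]

# Recall for class k: correct predictions divided by all true members of class k
recalls = [row[k] / sum(row) for k, row in enumerate(cm)]

# Balanced accuracy: every class counts equally, regardless of its size
balanced_acc = sum(recalls) / len(recalls)

# Plain accuracy: dominated by the larger classes
accuracy = sum(cm[k][k] for k in range(len(cm))) / sum(sum(row) for row in cm)

print([round(r, 2) for r in recalls])  # [0.6, 0.3, 0.34, 0.47]
print(round(balanced_acc, 3))          # 0.428
print(round(accuracy, 3))              # 0.379
```

Plain accuracy comes out lower than balanced accuracy here because the largest class (2) has one of the lowest recalls.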

In [11]:
# Print the imbalanced classification report
from imblearn.metrics import classification_report_imbalanced

print(classification_report_imbalanced(y_test, y_pred))



                   pre       rec       spe        f1       geo       iba       sup

          0       0.22      0.60      0.79      0.33      0.69      0.47      1117
          1       0.30      0.30      0.79      0.30      0.48      0.22      2962
          2       0.64      0.34      0.82      0.45      0.53      0.27      6002
          3       0.33      0.47      0.78      0.39      0.61      0.36      2384

avg / total       0.47      0.38      0.80      0.39      0.55      0.29     12465
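The `pre`, `rec`, `spe`, `f1` and `geo` columns can be reproduced from the confusion matrix by treating each class in turn as the positive class. The sketch below is a hand calculation in plain Python, with the matrix values copied from the output above; the `report` dictionary is an illustrative name, and the `iba` column is left out.

```python
import math

# Confusion matrix from the earlier cell: rows = true class, columns = predicted class
cm = [
    [671, 238, 40, 168],
    [994, 885, 550, 533],
    [983, 1428, 2045, 1546],
    [352, 359, 547, 1126],
]
total = sum(sum(row) for row in cm)

report = {}
for k in range(len(cm)):
    tp = cm[k][k]                           # class k predicted as k
    fn = sum(cm[k]) - tp                    # class k predicted as something else
    fp = sum(row[k] for row in cm) - tp     # other classes predicted as k
    tn = total - tp - fn - fp               # everything else
    pre = tp / (tp + fp)                    # precision
    rec = tp / (tp + fn)                    # recall (sensitivity)
    spe = tn / (tn + fp)                    # specificity
    f1 = 2 * pre * rec / (pre + rec)        # harmonic mean of precision and recall
    geo = math.sqrt(rec * spe)              # geometric mean of recall and specificity
    report[k] = {"pre": pre, "rec": rec, "spe": spe, "f1": f1, "geo": geo, "sup": tp + fn}

for k, row in report.items():
    print(k, {name: round(value, 2) for name, value in row.items()})
```

Reading the rows this way shows the trade-off behind the scores: class 0 has the highest recall (0.60) but the lowest precision (0.22), because many samples from the larger classes are predicted as class 0.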

