In [36]:
import pandas as pd
import pickle
from sklearn.metrics import fbeta_score
from sklearn.metrics import confusion_matrix

# Final Project: _Model Comparison_

## Group Members: Jennifer Baker, Rachana Nagaraj Banakar, Devan Kreitzer, Patrick Maggio, Safrin Patil, Sravani Yadavalli

In this notebook, we compare seven machine learning models on how well they predict whether an individual will use a coupon.

By comparing the models' performance, we aim to understand their strengths and weaknesses and select the one best suited to this task. The analysis also shows where a single evaluation score can mislead and what else must be checked before a model is chosen.


### Our Goal

We chose to model this dataset because personalized, profile-based recommendations that encourage engagement with local products and services are a market we consider strong enough to warrant predictive analysis of coupon redemption. If we know that someone will use a coupon, we know that person will become a customer, which leads to more profit even after accounting for the coupon's value.


### Explaining Our Models

We've developed 7 Jupyter Notebooks to identify the optimal model configurations, which you can find in the attached **_"\_Model Gen Notebooks"_** .zip file. In this notebook, we'll evaluate the top-performing models for Neural Network (MLPClassifier), XGBoost, Decision Tree, KNNClassifier, Logistic Regression, Support Vector Classification (SVC), and Random Forest, all of which target a custom `fbeta_score`. Each model was tuned using either GridSearch or RandomSearch for hyperparameter optimization.


### Our Custom F2 Score

We built and optimized our models using an F-beta score with beta = 2 (`fbeta_score` with `beta=2`), that is, the F2 score. The F2 score weights recall more heavily than precision, so it penalizes false negatives (failing to give a coupon to someone who would use it) more than false positives (giving a coupon to someone who does not want it), while still accounting for both.
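The effect of beta can be illustrated with a minimal sketch. The helper `f_beta` and the two models "A" and "B" are hypothetical and exist only for this illustration; the formula is the standard F-beta definition, not code taken from our model notebooks.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """F-beta score: a weighted harmonic mean of precision and recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical model A: high recall, modest precision.
# Hypothetical model B: the same two numbers swapped.
for name, p, r in [("A", 0.60, 0.90), ("B", 0.90, 0.60)]:
    f1 = f_beta(p, r, beta=1)
    f2 = f_beta(p, r, beta=2)
    print(f"Model {name}: precision={p:.2f} recall={r:.2f} F1={f1:.4f} F2={f2:.4f}")
```

Both hypothetical models share the same F1 score, but the F2 score favors the one with higher recall, which is the behavior we want when a missed coupon user is the costlier error.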

---


## Data Preprocessing/Import

The heavy lifting of cleaning and preparing our data set is completed primarily in the notebook **_"Final_Project_Data_Gen"_**.

Before modeling, it's essential to preprocess and understand the data. Here are the steps we followed:

1. **Load Data**: Start by importing the dataset into our environment.
2. **Explore the Data**: Take a preliminary look at the data to understand its structure, columns, and types.
3. **Review NaN Values**: Handle missing values appropriately. In particular, confirm that the `target` column contains no NaN values, and drop any rows where it does.
4. **Separate Features and Target**: Split the dataset into input features (X) and the target variable (y).
5. **Split the Data**: Divide the dataset into training and test sets for model validation.
6. **Export train/test Data**: For the ease of our teammates, we exported the split data into separate CSV files. This let every model generation notebook tune hyperparameters against the same train/test split.

_For the more in-depth data preprocessing steps, see the notebook "Final_Project_Data_Gen"._
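Steps 3 through 6 can be sketched as follows. This is a minimal illustration, not the code from "Final_Project_Data_Gen": the function name `split_and_export`, the 20% test size, the random seed, and the stratified split are all assumptions, and the toy DataFrame stands in for the cleaned data. Only the output file names and the target column `Y` come from this notebook.

```python
import tempfile
from pathlib import Path

import pandas as pd
from sklearn.model_selection import train_test_split


def split_and_export(df, target="Y", out_dir=".", test_size=0.2, seed=42):
    """Drop rows with a missing target, split the data, and write four CSV files."""
    df = df.dropna(subset=[target])      # step 3: no NaN values in the target
    X = df.drop(columns=[target])        # step 4: input features
    y = df[target]                       # step 4: target variable
    # step 5: split (test_size, seed, and stratify are assumed settings)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    # step 6: export the split so every notebook reads identical data
    out = Path(out_dir)
    X_train.to_csv(out / "train_X_In-Car-Rec.csv", index=False)
    y_train.to_csv(out / "train_y_In-Car-Rec.csv", index=False)
    X_test.to_csv(out / "test_X_In-Car-Rec.csv", index=False)
    y_test.to_csv(out / "test_y_In-Car-Rec.csv", index=False)
    return X_train, X_test, y_train, y_test


# Toy stand-in for the cleaned data: 20 rows, one of them missing its target.
toy = pd.DataFrame({"temperature": [55, 80] * 10, "Y": [1, 0] * 9 + [1, None]})
with tempfile.TemporaryDirectory() as tmp:
    X_tr, X_te, y_tr, y_te = split_and_export(toy, out_dir=tmp)
    print(len(X_tr), len(X_te), sorted(p.name for p in Path(tmp).iterdir()))
```

Writing the target with `index=False` produces a one-column CSV, which is why the fitting cells later flatten `y_train` with `.values.ravel()` after reading it back.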


For the sake of better understanding our data, let's review the uncleaned data set.


In [3]:
df = pd.read_csv("./in-vehicle-coupon-recommendation.csv")
df.head(5)

Unnamed: 0,destination,passanger,weather,temperature,time,coupon,expiration,gender,age,maritalStatus,...,CoffeeHouse,CarryAway,RestaurantLessThan20,Restaurant20To50,toCoupon_GEQ5min,toCoupon_GEQ15min,toCoupon_GEQ25min,direction_same,direction_opp,Y
0,No Urgent Place,Alone,Sunny,55,2PM,Restaurant(<20),1d,Female,21,Unmarried partner,...,never,,4~8,1~3,1,0,0,0,1,1
1,No Urgent Place,Friend(s),Sunny,80,10AM,Coffee House,2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,0,0,0,1,0
2,No Urgent Place,Friend(s),Sunny,80,10AM,Carry out & Take away,2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,1
3,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,0
4,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,1d,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,0


Our data consists of 25 attributes covering weather, time, demographics, restaurant type, and location.


Now, we will import our train/test split data.

_Reminder_: These files come from the **_"Final_Project_Data_Gen"_** notebook.


In [4]:
X_train = pd.read_csv("train_X_In-Car-Rec.csv")
y_train = pd.read_csv("train_y_In-Car-Rec.csv")
X_test = pd.read_csv("test_X_In-Car-Rec.csv")
y_test = pd.read_csv("test_y_In-Car-Rec.csv")

## Model Evaluation & Comparison

With preprocessing complete, we now evaluate our top models.

Our aim is to identify which of our seven serialized models delivers the highest F2 score.

The evaluation plan is as follows:

1. **Retrieve Models**: Load each model from its pickle file.
2. **Predict**: Refit each model on the training set, then use it to predict outcomes on the test set.
3. **Score Quantification**: For every set of predictions, compute its F2 score.
4. **Performance Analysis**: Compare the scores to identify the model with the strongest predictive ability.

This structured assessment lets us compare all seven models on the same data and the same metric.


Let's import and retrieve our models.


In [37]:
with open("./_Models/DecisionTree_Model.pkl", "rb") as f:
    DecisionTree_classifier = pickle.load(f)

with open("./_Models/KNNClassifier_Model.pkl", "rb") as f:
    KNN_classifier = pickle.load(f)

with open("./_Models/LogReg_Model.pkl", "rb") as f:
    LogReg_classifier = pickle.load(f)

with open("./_Models/MLP_Model.pkl", "rb") as f:
    MLP_classifier = pickle.load(f)

with open("./_Models/SVC_Model.pkl", "rb") as f:
    SVC_classifier = pickle.load(f)

with open("./_Models/XGB_Model.pkl", "rb") as f:
    XGB_classifier = pickle.load(f)

with open("./_Models/RandomForest_Model.pkl", "rb") as f:
    RandomForest_classifier = pickle.load(f)
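The seven `with open(...)` blocks above can also be written as one loop. The sketch below is an alternative layout, not the code this notebook ran: the `MODEL_FILES` dictionary and the `load_models` helper are hypothetical names, while the file names are the ones used above.

```python
import pickle
from pathlib import Path

# Display name -> pickle file name (file names taken from the cells above).
MODEL_FILES = {
    "Decision Tree": "DecisionTree_Model.pkl",
    "KNN": "KNNClassifier_Model.pkl",
    "Logistic Regression": "LogReg_Model.pkl",
    "MLP": "MLP_Model.pkl",
    "SVC": "SVC_Model.pkl",
    "XGBoost": "XGB_Model.pkl",
    "Random Forest": "RandomForest_Model.pkl",
}


def load_models(model_dir="./_Models", files=None):
    """Unpickle every model file into a {display name: estimator} dict."""
    files = MODEL_FILES if files is None else files
    models = {}
    for name, filename in files.items():
        with open(Path(model_dir) / filename, "rb") as f:
            models[name] = pickle.load(f)
    return models


# Usage in this notebook would look like:
# models = load_models()
# SVC_classifier = models["SVC"]
```

Keeping the models in a dictionary makes later steps (fitting, predicting, scoring) easy to express as loops. Note that pickle files should only be loaded from a trusted source, and ideally with the same library versions that created them.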

Next, we will fit our models to the data.


In [42]:
# Fitting our Decision Tree model
DecisionTree_classifier.fit(X_train, y_train.values.ravel())

In [43]:
# Fitting our KNNClassifier model
KNN_classifier.fit(X_train, y_train.values.ravel())

In [44]:
# Fitting our Logistic Regression model
LogReg_classifier.fit(X_train, y_train.values.ravel())

In [45]:
# Fitting our Neural Network (MLP) model
MLP_classifier.fit(X_train, y_train.values.ravel())

In [46]:
# Fitting our SVC model
SVC_classifier.fit(X_train, y_train.values.ravel())

In [47]:
# Fitting our XGBoost model
XGB_classifier.fit(X_train, y_train.values.ravel())

In [48]:
# Fitting our RandomForest model
RandomForest_classifier.fit(X_train, y_train.values.ravel())

Now that our models are fitted, we can predict on our test set.


In [49]:
y_pred1 = DecisionTree_classifier.predict(X_test)
y_pred2 = KNN_classifier.predict(X_test)
y_pred3 = LogReg_classifier.predict(X_test)
y_pred4 = MLP_classifier.predict(X_test)
y_pred5 = SVC_classifier.predict(X_test)
y_pred6 = XGB_classifier.predict(X_test)
y_pred7 = RandomForest_classifier.predict(X_test)

Let's calculate our F2 scores for each set of predictions using `fbeta_score` from `sklearn.metrics`.


In [50]:
f2_model1 = fbeta_score(y_test, y_pred1, beta=2)
f2_model2 = fbeta_score(y_test, y_pred2, beta=2)
f2_model3 = fbeta_score(y_test, y_pred3, beta=2)
f2_model4 = fbeta_score(y_test, y_pred4, beta=2)
f2_model5 = fbeta_score(y_test, y_pred5, beta=2)
f2_model6 = fbeta_score(y_test, y_pred6, beta=2)
f2_model7 = fbeta_score(y_test, y_pred7, beta=2)

### Comparing our F2 Scores


We will now print each model's F2 score using f-strings.


In [51]:
print(f"Decision Tree F2 Score: {f2_model1:.4f}")
print(f"KNN Classifier F2 Score: {f2_model2:.4f}")
print(f"Logistic Regression F2 Score: {f2_model3:.4f}")
print(f"MLP Classifier F2 Score: {f2_model4:.4f}")
print(f"SVC F2 Score: {f2_model5:.4f}")
print(f"XGBoost F2 Score: {f2_model6:.4f}")
print(f"RandomForest F2 Score: {f2_model7:.4f}")

Decision Tree F2 Score: 0.7868
KNN Classifier F2 Score: 0.7839
Logistic Regression F2 Score: 0.7640
MLP Classifier F2 Score: 0.7931
SVC F2 Score: 0.8713
XGBoost F2 Score: 0.8346
RandomForest F2 Score: 0.8354
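To make the comparison easier to read, the scores can be ranked. The short sketch below copies the seven values from the output above into a dictionary (the name `f2_scores` is introduced here for illustration) and sorts them from highest to lowest.

```python
# F2 scores copied from the printed output above.
f2_scores = {
    "Decision Tree": 0.7868,
    "KNN Classifier": 0.7839,
    "Logistic Regression": 0.7640,
    "MLP Classifier": 0.7931,
    "SVC": 0.8713,
    "XGBoost": 0.8346,
    "RandomForest": 0.8354,
}

# Sort by score, highest first, and print a numbered ranking.
ranked = sorted(f2_scores.items(), key=lambda item: item[1], reverse=True)
for rank, (name, score) in enumerate(ranked, start=1):
    print(f"{rank}. {name:<20} {score:.4f}")
```

The ranking puts SVC first, with RandomForest and XGBoost close together behind it. A ranking by score alone says nothing about how each score was earned, which is why the confusion matrices are reviewed next.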


Let's review the "winning" model.


In [53]:
conf_matrix = confusion_matrix(y_test, y_pred5)
print("SVC Confusion Matrix:")
print(conf_matrix)

SVC Confusion Matrix:
[[   0 1078]
 [   0 1459]]


Our "winning" model (the one with the best F2 score) appears to be the SVC model. The confusion matrix, however, shows the downside of weighting for recall: the model predicts the positive class for all 2,537 test instances. That yields a recall of 1.0 and a precision of only 1459/2537 ≈ 0.575, which is enough to produce an F2 score of 0.8713 without the model distinguishing between the classes at all. Our team acknowledges that the SVC model was not tuned thoroughly; the parameters we used underfit the data and skewed the results. This problem of a high F2 score from a model that always predicts one category appeared in other models as well. XGBoost had to be retrained with parameters designed to discourage underfitting.
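One way to catch single-class predictions before trusting a score is to look at how the predictions are distributed across classes. The sketch below is a hypothetical sanity check, not something our notebooks ran: the helper names and the 0.99 threshold are assumptions, and the two prediction lists are rebuilt from the column totals of the confusion matrices in this notebook rather than from the actual `y_pred` arrays.

```python
from collections import Counter


def predicted_class_share(y_pred):
    """Return {label: fraction of all predictions} for a sequence of labels."""
    counts = Counter(y_pred)
    if not counts:
        raise ValueError("y_pred is empty")
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def is_degenerate(y_pred, threshold=0.99):
    """True when one class makes up at least `threshold` of the predictions."""
    return max(predicted_class_share(y_pred).values()) >= threshold


# Stand-ins built from the confusion matrix column totals:
# SVC predicted 1 for all 2537 rows; XGBoost predicted 0 for 950 and 1 for 1587.
svc_like = [1] * 2537
xgb_like = [0] * 950 + [1] * 1587

print("SVC-like degenerate:", is_degenerate(svc_like))
print("XGBoost-like degenerate:", is_degenerate(xgb_like))
```

In the notebook itself, the same check could be applied directly to each prediction array, for example `is_degenerate(y_pred5)`.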

Our past experience with simple data in class led our assumptions about reasonable parameters astray. In our efforts to avoid overfitting, we prevented the model from creating the necessary complexity to produce a good model.

To ensure that we are using a model with balanced predictions, we will instead use XGBoost as our "winning" model.


In [56]:
conf_matrix1 = confusion_matrix(y_test, y_pred6)
print("XGBoost Confusion Matrix:")
print(conf_matrix1)

XGBoost Confusion Matrix:
[[ 730  348]
 [ 220 1239]]
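The two confusion matrices can be turned back into precision, recall, and F2 to see what sits behind each score. This is a minimal sketch with a hypothetical helper, `metrics_from_confusion`; it assumes the row/column layout of `sklearn.metrics.confusion_matrix` (rows are true labels, columns are predicted labels) and uses the matrices printed in this notebook.

```python
def metrics_from_confusion(matrix, beta=2.0):
    """Precision, recall, and F-beta for the positive class of a 2x2 matrix.

    Expected layout: [[TN, FP], [FN, TP]] (rows = true, columns = predicted).
    """
    (tn, fp), (fn, tp) = matrix
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    b2 = beta ** 2
    denom = b2 * precision + recall
    f_score = (1 + b2) * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


svc_matrix = [[0, 1078], [0, 1459]]
xgb_matrix = [[730, 348], [220, 1239]]

for name, matrix in [("SVC", svc_matrix), ("XGBoost", xgb_matrix)]:
    p, r, f2 = metrics_from_confusion(matrix)
    print(f"{name}: precision={p:.4f} recall={r:.4f} F2={f2:.4f}")
```

Both F2 values reproduce the scores reported earlier. The SVC reaches its score with perfect recall and a precision equal to the share of positives in the test set, whereas XGBoost trades a little recall for substantially higher precision.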


These results represent the F2 scores obtained from evaluating different machine learning models on a classification task, specifically predicting whether or not someone will use a coupon. Here's an interpretation of the scores:

- **Decision Tree Score (F2 Score: 0.7868)**:

  - The Decision Tree model achieved an F2 score of 0.7868. Because F2 weights recall more heavily than precision, this score mainly reflects how well the model avoids false negatives; its precision should be confirmed from its confusion matrix.

- **KNN Classifier F2 Score (F2 Score: 0.7839)**:

  - The K-Nearest Neighbors (KNN) Classifier performs similarly to the Decision Tree, with an F2 score of 0.7839, making it a comparable choice for this task.

- **Logistic Regression F2 Score (F2 Score: 0.7640)**:

  - The Logistic Regression model achieved an F2 score of 0.7640, the lowest of the seven models and slightly below the Decision Tree and KNN models.

- **MLP Classifier F2 Score (F2 Score: 0.7931)**:

  - The Multi-Layer Perceptron (MLP) Classifier obtained an F2 score of 0.7931, slightly above the Decision Tree and KNN models and ahead of Logistic Regression.

- **SVC F2 Score (F2 Score: 0.8713)**:

  - The Support Vector Classifier (SVC) achieved the highest F2 score, 0.8713, but it did so by predicting the positive class for every test instance. This is a symptom of underfitting: the model did not capture the underlying complexity of the data. Further hyperparameter tuning is needed before its score can be compared fairly with the others.

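- **XGBoost F2 Score (F2 Score: 0.8346)**:

  - The XGBoost model achieved an F2 score of 0.8346. Its confusion matrix shows predictions spread across both classes, with a precision of about 0.78 and a recall of about 0.85.

- **RandomForest F2 Score (F2 Score: 0.8354)**:

  - The Random Forest model achieved an F2 score of 0.8354, marginally above XGBoost. Its confusion matrix is not reviewed in this notebook.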

In summary, although the Support Vector Classifier (SVC) achieved the highest F2 score, it did so by predicting a single class, a symptom of underfitting, so its score cannot be taken at face value. Random Forest and XGBoost scored nearly identically (0.8354 and 0.8346), and the Decision Tree, KNN, and MLP models were also competitive. For the sake of our analysis, we select XGBoost, whose confusion matrix confirms that it predicts both classes, as our best-performing model.


### Concluding Insight:

> Our decision to embrace XGBoost as our winning model equips us with a potent tool for precise prediction, adaptability, and robustness. It empowers us to make data-driven decisions, optimize marketing campaigns, and enhance the success of our coupon-related initiatives. With XGBoost's ability to handle complexity and provide actionable insights, we are well-prepared to tackle the intricacies of coupon usage prediction effectively.
