
In [None]:
%pip install lazypredict

In [None]:
# basics
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import lazypredict
from lazypredict.Supervised import LazyClassifier
from sklearn.model_selection import train_test_split

In [None]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

In [None]:
url='https://drive.google.com/file/d/1ew_caB7jx_GjdfpYNEpYmmSop8JI2veg/view?usp=sharing'
file_id=url.split('/')[-2]
dwn_url='https://drive.google.com/uc?id=' + file_id
data = pd.read_csv(dwn_url)
print(data.head())

**Data Description:**

**Y** = target attribute (Y) with values indicating 0 (unhappy) and 1 (happy) customers

**X1** = my order was delivered on time

**X2** = contents of my order was as I expected

**X3** = I ordered everything I wanted to order

**X4** = I paid a good price for my order

**X5** = I am satisfied with my courier

**X6** = the app makes ordering easy for me


Attributes **X1** to **X6** hold the responses to the questions above, each on a scale from 1 to 5, where a smaller number indicates weaker agreement with the statement and a higher number indicates stronger agreement.
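Before modelling, a quick schema check can confirm that a loaded file matches this description. The sketch below is a minimal example under stated assumptions. The helper name check_survey_schema and the sample rows are hypothetical and made up purely for illustration. They are not taken from the dataset.

```python
import pandas as pd

FEATURES = ['X1', 'X2', 'X3', 'X4', 'X5', 'X6']

def check_survey_schema(df):
    """Hypothetical helper: return (ratings_ok, target_ok) for survey-style data."""
    # Every answer X1..X6 should be a whole-number rating from 1 to 5
    ratings_ok = bool(df[FEATURES].isin([1, 2, 3, 4, 5]).all().all())
    # The target should contain only 0 (unhappy) and 1 (happy)
    target_ok = bool(df['Y'].isin([0, 1]).all())
    return ratings_ok, target_ok

# Made-up rows for illustration only; in the notebook, pass the loaded `data` instead
sample = pd.DataFrame({
    'Y':  [0, 1, 1],
    'X1': [3, 5, 4],
    'X2': [3, 2, 4],
    'X3': [3, 3, 5],
    'X4': [4, 5, 1],
    'X5': [2, 4, 3],
    'X6': [4, 5, 5],
})
print(check_survey_schema(sample))  # (True, True)
```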

In [None]:
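# Count each class, ordered by label (0 = unhappy, 1 = happy), so the labels always match the slices
y_counts = data['Y'].value_counts().sort_index()
labels = y_counts.index.map({0: 'Unhappy', 1: 'Happy'}).tolist()

plt.pie(y_counts, labels=labels, autopct='%1.1f%%', startangle=90, colors=['lightcoral', 'skyblue'])
plt.title('Distribution of Customer Happiness')
plt.axis('equal')  # Equal aspect ratio ensures that the pie is drawn as a circle.
plt.show()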

The plot indicates that the number of happy customers exceeded the number of unhappy ones, although the margin between the two groups is relatively small.
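You can compute the class shares directly to put a number on that margin instead of reading it off the chart. This minimal sketch uses a made-up target column as a hypothetical stand-in for data['Y']. With the real data, the values will differ.

```python
import pandas as pd

# Hypothetical target column (stand-in for data['Y']): 1 = happy, 0 = unhappy
y = pd.Series([1, 0, 1, 1, 0, 1, 0, 1, 0, 1])

# Share of each class, indexed by label
shares = y.value_counts(normalize=True).sort_index()
print(shares.rename({0: 'Unhappy', 1: 'Happy'}))

# Gap between the two classes in percentage points
margin_pp = (shares[1] - shares[0]) * 100
print(f'Happy minus unhappy: {margin_pp:.1f} percentage points')  # 20.0
```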

**Splitting the Data for Training:**


To understand model performance, dividing the dataset into a training set and a test set is a good strategy.

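Let's split the dataset with scikit-learn's train_test_split() function. It takes the arrays to split (the features and the target) and a test_size, the fraction of rows held out for testing (15% here). The training and test sets come back in pairs, in the same order as the arrays passed in. Setting random_state fixes the seed of the random shuffle, so every run produces the same split.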
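The sketch below uses made-up toy arrays, not the survey data, to illustrate two points about train_test_split(). First, the outputs come back as train/test pairs in the same order as the inputs, with feature rows and target values still aligned. Second, reusing the same random_state reproduces the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: row i of X is [2i, 2i + 1] and its label is i % 2
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# Outputs are (X_train, X_test, y_train, y_test) because X was passed before y
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)

# Rows stay aligned: each test label still matches its own feature row
print(np.array_equal(y_test, (X_test[:, 0] // 2) % 2))  # True

# Re-running with the same random_state reproduces exactly the same split
_, X_test_again, _, _ = train_test_split(X, y, test_size=0.2, random_state=42)
print(np.array_equal(X_test, X_test_again))  # True
```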

In [None]:
Y = data['Y']
X = data[['X1', 'X2', 'X3', 'X4', 'X5', 'X6']]

# Hold out 15% of the rows as a test set; random_state makes the split reproducible
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.15, random_state=42)

**LazyPredict:**

LazyPredict is a Python library that partially automates the model-selection step of a machine learning workflow. With a few lines of code, it fits a wide range of baseline models using their default settings. You can then compare how different algorithms perform before doing any hyperparameter tuning.
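The following rough sketch shows what such a baseline comparison does, written with plain scikit-learn. It is not LazyPredict's implementation. The four candidate models, the synthetic data from make_classification, and ranking by Balanced Accuracy are all assumptions chosen for illustration.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data: 6 features and a binary target
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# A small, hand-picked set of candidates, all with default hyperparameters
candidates = {
    'LogisticRegression': LogisticRegression(max_iter=1000),
    'DecisionTreeClassifier': DecisionTreeClassifier(random_state=0),
    'RandomForestClassifier': RandomForestClassifier(random_state=0),
    'KNeighborsClassifier': KNeighborsClassifier(),
}

rows = []
for name, model in candidates.items():
    model.fit(X_train, y_train)   # every baseline sees the same training split
    pred = model.predict(X_test)  # and is scored on the same test split
    rows.append({
        'Model': name,
        'Accuracy': accuracy_score(y_test, pred),
        'Balanced Accuracy': balanced_accuracy_score(y_test, pred),
        'F1 Score': f1_score(y_test, pred, average='weighted'),
    })

# One row per model, best first, similar in spirit to LazyPredict's results table
leaderboard = (pd.DataFrame(rows)
               .set_index('Model')
               .sort_values('Balanced Accuracy', ascending=False))
print(leaderboard)
```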

In [None]:
# Fit and score many baseline classifiers on the same train/test split
clf = LazyClassifier(verbose=0, ignore_warnings=True, custom_metric=None)
models, predictions = clf.fit(X_train, X_test, Y_train, Y_test)
models

In [None]:
print("Top Performing Model:\n")
display(models.head(1))

The table generated by LazyPredict gives a quick comparison of many classification models on key performance metrics and computation time:

- **Accuracy** is the share of correct predictions.
- **Balanced Accuracy** averages the recall of the two classes, so it is less flattering when one class dominates.
- **ROC AUC** measures how well a model separates the two classes.
- **F1 Score** balances precision and recall.

Among all tested models, LGBMClassifier achieved the strongest overall performance across these metrics on this test split. That makes it the most promising candidate for this dataset.
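For reference, you can compute these metrics directly with scikit-learn. The labels and probabilities in this sketch are hypothetical, chosen so the arithmetic is easy to follow (4 unhappy and 6 happy customers). ROC AUC is computed from predicted probabilities of class 1. F1 uses scikit-learn's default, which scores class 1.

```python
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             f1_score, roc_auc_score)

# Hypothetical labels: 0 = unhappy, 1 = happy
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
# Hypothetical predicted probabilities of class 1 (happy)
y_prob = [0.1, 0.3, 0.6, 0.7, 0.55, 0.8, 0.9, 0.65, 0.75, 0.95]
# Hard predictions using a 0.5 cut-off on those probabilities
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]

acc = accuracy_score(y_true, y_pred)               # share of correct predictions
bal_acc = balanced_accuracy_score(y_true, y_pred)  # mean of the per-class recalls
auc = roc_auc_score(y_true, y_prob)                # how well the scores rank happy above unhappy
f1 = f1_score(y_true, y_pred)                      # harmonic mean of precision and recall for class 1

print(f'Accuracy:          {acc:.3f}')      # 0.800
print(f'Balanced accuracy: {bal_acc:.3f}')  # 0.750
print(f'ROC AUC:           {auc:.3f}')      # 0.875
print(f'F1 score:          {f1:.3f}')       # 0.857
```

Here the model catches only 2 of the 4 unhappy customers (class-0 recall of 0.5), yet accuracy is still 0.8. Balanced accuracy (0.75) comes out lower because it averages that 0.5 with the perfect recall on class 1.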

In [None]:
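sns.set_theme(style="whitegrid")
plt.figure(figsize=(10, 5))
ax = sns.barplot(x=models.index, y="Accuracy", data=models)
ax.tick_params(axis='x', labelrotation=90)  # rotate model names so they don't overlap
ax.set_title('Test Accuracy by Model (LazyPredict)')
plt.tight_layout()
plt.show()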


LGBMClassifier showed consistently strong results across the major evaluation metrics in the LazyPredict comparison. This suggests it distinguishes well between happy and unhappy customers. The following section trains the model on its own and evaluates its performance.

In [None]:
from lightgbm import LGBMClassifier
from sklearn.metrics import accuracy_score

# Same features, target and split as in the LazyPredict comparison
Y = data['Y']
X = data[['X1', 'X2', 'X3', 'X4', 'X5', 'X6']]
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.15, random_state=42)

# Train LightGBM with its default hyperparameters
clf = LGBMClassifier()
clf.fit(X_train, Y_train)

# Predict the held-out test set
y_pred = clf.predict(X_test)

# accuracy_score expects (y_true, y_pred)
accuracy = accuracy_score(Y_test, y_pred)
print('LightGBM Model accuracy score: {0:0.4f}'.format(accuracy))


Although the classes are close to balanced from a technical standpoint, the business reality is that nearly half of the customers are unhappy. Prioritizing recall for Class 0 helps identify as many unhappy customers as possible, which is where the model delivers the most business value.


**Why Focus on Recall for Class 0:**
- Accuracy alone can look good (say 79%) while hiding the fact that many unhappy customers are being missed.
- Recall for Class 0 measures directly how many unhappy customers the model catches. Tracking it, and tuning toward it, keeps the focus on finding them.
- This directly supports business goals: identifying dissatisfaction → improving operations → reducing churn.
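One common way to act on this priority, which this notebook does not itself do, is to move the decision threshold. The model then predicts "happy" only when it is quite confident, so borderline customers are flagged as unhappy. The sketch below uses LogisticRegression on synthetic data from make_classification purely for illustration, and the threshold values are arbitrary. The same predict_proba approach would apply to the LGBMClassifier trained above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption); the notebook would use the survey features and Y
X, y = make_classification(n_samples=400, n_features=6, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
p_happy = model.predict_proba(X_test)[:, 1]  # probability of class 1 (happy)

recalls = {}
for threshold in [0.5, 0.6, 0.7]:
    # Predict "happy" only when P(happy) >= threshold; a higher threshold
    # labels more customers as unhappy (class 0)
    y_pred = np.where(p_happy >= threshold, 1, 0)
    recalls[threshold] = recall_score(y_test, y_pred, pos_label=0)
    print(f'threshold={threshold:.1f}  recall(class 0)={recalls[threshold]:.3f}')
```

Raising the threshold can only keep or increase recall for class 0. It usually lowers precision for class 0, though, so more happy customers get flagged too. The right threshold depends on how costly each kind of mistake is for the business.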


Instead of relying only on accuracy, it is more useful to look at the recall for Class 0: the fraction of truly unhappy customers that the model correctly predicts as unhappy. This tells us how well the model identifies the group that matters most for improving business results. It therefore shows how effectively the model helps target the customers who need the most attention.
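The sketch below shows the gap between the two views on hypothetical labels (0 = unhappy, 1 = happy), made up for illustration. Accuracy looks acceptable, while recall for Class 0 reveals that most unhappy customers are missed. Passing pos_label=0 to recall_score treats the unhappy class as the positive one.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Hypothetical test labels and predictions (0 = unhappy, 1 = happy)
y_true = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
y_pred = np.array([0, 0, 1, 1, 1, 1, 1, 1, 1, 1])

accuracy = accuracy_score(y_true, y_pred)
# pos_label=0 makes class 0 (unhappy) the "positive" class for recall
recall_unhappy = recall_score(y_true, y_pred, pos_label=0)

# Rows are actual classes, columns are predicted classes, both in the order [0, 1]
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
# Same value by hand: correctly predicted unhappy / all actually unhappy
recall_unhappy_manual = cm[0, 0] / cm[0].sum()

print(f'Accuracy:           {accuracy:.2f}')        # 0.70
print(f'Recall for class 0: {recall_unhappy:.2f}')  # 0.40
print(cm)
```

With the notebook's variables, recall_score(Y_test, y_pred, pos_label=0) gives the same measure for the LightGBM predictions. sklearn.metrics.classification_report(Y_test, y_pred) lists precision and recall for both classes side by side.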
