Scenario: Retail – Customer Churn Prediction
A retail company wants to predict whether customers will churn (1) or stay loyal (0) based on:
Monthly spend (amount spent in store)
Visits per month (frequency of shopping)
Satisfaction score (1–10 scale)
Why XGBoost?
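Handles messy real-world data: missing values are routed along a learned default direction at each split, and tree splits are largely insensitive to skewed feature distributions.
Trains faster than traditional gradient boosting implementations, partly through parallelized split finding.
Built-in L1/L2 regularization penalizes complex trees, which reduces overfitting and makes predictions more robust.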
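To make the regularization point concrete, here is a minimal sketch in plain Python of how an L2 penalty shrinks a single leaf's weight. It uses the standard second-order formula for a boosted tree leaf under log loss, w = -G / (H + lambda), where G and H are the sums of gradients and Hessians in the leaf. The helper `leaf_weight` and the three made-up customers are assumptions for illustration only; in `XGBClassifier` the corresponding setting is `reg_lambda`, and the learning rate scales the result further.

```python
import math


def sigmoid(z):
    """Map a raw score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def leaf_weight(labels, raw_scores, reg_lambda):
    """Leaf weight w = -G / (H + lambda) for binary log loss (illustrative helper)."""
    probs = [sigmoid(s) for s in raw_scores]
    grad_sum = sum(p - y for p, y in zip(probs, labels))   # G: sum of gradients
    hess_sum = sum(p * (1.0 - p) for p in probs)           # H: sum of Hessians
    return -grad_sum / (hess_sum + reg_lambda)


# Hypothetical leaf holding three churners, all currently scored 0.0 (p = 0.5)
labels = [1, 1, 1]
raw_scores = [0.0, 0.0, 0.0]

for lam in (0.0, 1.0, 10.0):
    weight = leaf_weight(labels, raw_scores, lam)
    print(f"lambda={lam:>4}: leaf weight = {weight:.3f}")
```

A larger lambda pulls the leaf weight toward zero, so a leaf supported by only a few customers cannot swing the prediction as far.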
 

In [1]:
import pandas as pd
import numpy as np

In [2]:
# Load the churn dataset and check its dimensions (rows, columns)
df = pd.read_csv("retail_churn_200.csv")
df.shape

(200, 4)

In [3]:
df.head(2)

   Monthly_spend  Visits_per_month  Satisfaction_score  Churn
0           1226                 7                   3      0
1           1559                 7                   4      0


In [5]:
# Features are every column except the target; the target is the Churn label
X = df.drop(columns=["Churn"])
y = df["Churn"]

In [6]:
from sklearn.model_selection import train_test_split
# Hold out 30% for testing; stratify=y keeps the churn ratio similar in both splits
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=42)
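The effect of `stratify=y` can be sketched in plain Python: each class is shuffled and split separately, so the churn rate comes out about the same in the training and test sets. This is only an illustration of the idea, not scikit-learn's implementation; the helper names `stratified_split` and `churn_rate` and the 20% churn rate are assumptions made up for the example.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size, seed=42):
    """Return (train_idx, test_idx), splitting each class in the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                       # shuffle within the class
        n_test = round(len(indices) * test_size)   # test share for this class
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def churn_rate(labels, indices):
    """Fraction of the selected rows labelled as churned (1)."""
    return sum(labels[i] for i in indices) / len(indices)


# Made-up labels: 160 loyal (0) and 40 churned (1) customers
labels = [0] * 160 + [1] * 40
train_idx, test_idx = stratified_split(labels, test_size=0.3)

print("train/test sizes:", len(train_idx), len(test_idx))
print("churn rate train:", churn_rate(labels, train_idx))
print("churn rate test: ", churn_rate(labels, test_idx))
```

Without stratification, a small dataset like this one can end up with noticeably different churn rates in the two splits purely by chance.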


In [7]:
from xgboost import XGBClassifier

In [12]:
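xgb_model = XGBClassifier(
    n_estimators=50,       # number of boosted trees
    learning_rate=0.1,     # shrinkage applied to each tree's contribution
    max_depth=3,           # shallow trees limit model complexity
    random_state=42,
    # use_label_encoder=False is left out: the option is deprecated in recent XGBoost releases
    eval_metric="logloss",
)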
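For reference, `eval_metric="logloss"` refers to binary cross-entropy, which penalizes confident wrong probabilities much more heavily than hesitant ones. The sketch below computes it in plain Python on made-up labels and probabilities; the function name `log_loss` and the sample values are illustrative assumptions. As far as I can tell, XGBoost only reports this metric when an evaluation set is passed to `fit`, which this notebook does not do.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Made-up example: the same four customers scored by two hypothetical models
y_true = [0, 0, 1, 1]
confident_right = [0.1, 0.2, 0.9, 0.8]   # probabilities of churn, mostly correct
confident_wrong = [0.9, 0.8, 0.1, 0.2]   # same confidence, wrong direction

print("log loss (right):", round(log_loss(y_true, confident_right), 4))
print("log loss (wrong):", round(log_loss(y_true, confident_wrong), 4))
```

Lower is better: a model that always predicts 0.5 scores about 0.693 (ln 2), so values well below that indicate useful probability estimates.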

In [13]:
from sklearn.metrics import accuracy_score

# Train on the training split, then score predictions on the held-out test split
xgb_model.fit(X_train, y_train)
ypred = xgb_model.predict(X_test)
print("Accuracy : ", accuracy_score(y_test, ypred))

Accuracy :  0.9833333333333333
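With 60 test rows (30% of 200), an accuracy of 0.9833 is consistent with 59 correct predictions out of 60. Accuracy alone does not show whether the single error was a missed churner or a false alarm, which matters for a churn use case. The sketch below breaks accuracy into confusion-matrix counts, precision, and recall in plain Python. The labels are made up for illustration and are not the notebook's actual predictions; `confusion_counts` is a hypothetical helper. scikit-learn offers `confusion_matrix`, `precision_score`, and `recall_score` in `sklearn.metrics` for the same purpose.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 means churn."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


# Hypothetical test set: 12 churners and 48 loyal customers, one churner missed
y_true = [1] * 12 + [0] * 48
y_pred = [1] * 11 + [0] * 49

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp) if (tp + fp) else 0.0   # share of predicted churners who churned
recall = tp / (tp + fn) if (tp + fn) else 0.0      # share of actual churners that were caught

print("tp, fp, fn, tn:", tp, fp, fn, tn)
print("accuracy :", round(accuracy, 4))
print("precision:", round(precision, 4))
print("recall   :", round(recall, 4))
```

In this made-up case accuracy is 59/60, yet recall is 11/12, so the one error is a churner the model failed to flag.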
