# Customer Churn Prediction

## Setup

### Imports

In [1]:
import pandas as pd
from dotenv import load_dotenv
import kagglehub



### Data

In [2]:
path = kagglehub.dataset_download("dhairyajeetsingh/ecommerce-customer-behavior-dataset")
print("Path to dataset files:", path)

Path to dataset files: C:\Users\kaiod\.cache\kagglehub\datasets\dhairyajeetsingh\ecommerce-customer-behavior-dataset\versions\1


In [3]:
ecommerce_df = pd.read_csv(f"{path}/ecommerce_customer_churn_dataset.csv")
ecommerce_df.head()

| | Age | Gender | Country | City | Membership_Years | Login_Frequency | Session_Duration_Avg | Pages_Per_Session | Cart_Abandonment_Rate | Wishlist_Items | ... | Email_Open_Rate | Customer_Service_Calls | Product_Reviews_Written | Social_Media_Engagement_Score | Mobile_App_Usage | Payment_Method_Diversity | Lifetime_Value | Credit_Balance | Churned | Signup_Quarter |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 43.0 | Male | France | Marseille | 2.9 | 14.0 | 27.4 | 6.0 | 50.6 | 3.0 | ... | 17.9 | 9.0 | 4.0 | 16.3 | 20.8 | 1.0 | 953.33 | 2278.0 | 0 | Q1 |
| 1 | 36.0 | Male | UK | Manchester | 1.6 | 15.0 | 42.7 | 10.3 | 37.7 | 1.0 | ... | 42.8 | 7.0 | 3.0 | NaN | 23.3 | 3.0 | 1067.47 | 3028.0 | 0 | Q4 |
| 2 | 45.0 | Female | Canada | Vancouver | 2.9 | 10.0 | 24.8 | 1.6 | 70.9 | 1.0 | ... | 0.0 | 4.0 | 1.0 | NaN | 8.8 | NaN | 1289.75 | 2317.0 | 0 | Q4 |
| 3 | 56.0 | Female | USA | New York | 2.6 | 10.0 | 38.4 | 14.8 | 41.7 | 9.0 | ... | 41.4 | 2.0 | 5.0 | 85.9 | 31.0 | 3.0 | 2340.92 | 2674.0 | 0 | Q1 |
| 4 | 35.0 | Male | India | Delhi | 3.1 | 29.0 | 51.4 | NaN | 19.1 | 9.0 | ... | 37.9 | 1.0 | 11.0 | 83.0 | 50.4 | 4.0 | 3041.29 | 5354.0 | 0 | Q4 |


## Defining the scenario

SwiftCart, a mid-sized global e-commerce platform specializing in consumer electronics, has observed a steady increase in customer churn. While the company remains effective at acquiring new users, the rising Customer Acquisition Cost (CAC) has become a significant financial challenge.

To stabilize revenue growth and protect the bottom line, SwiftCart established a new Marketing & Retention (M&R) Department. Led by a data-driven executive, the department aims to pivot from intuition-based tactics to evidence-based strategies. Consequently, the Data Science team was tasked with developing a Customer Churn Prediction Model.

Currently, the Marketing & Retention team operates reactively. They typically offer discounts or win-back incentives only after a customer has already stopped using the platform or unsubscribed. At that stage, retention efforts are often too late and yield low conversion rates.

The goal of this project is to get ahead of the problem. By predicting which customers are at high risk of leaving, SwiftCart can intervene with personalized engagement strategies before the churn occurs, maximizing the chance of retention and increasing the overall Customer Lifetime Value (CLV).

### Objectives
- Build a model that predicts whether or not a customer will churn.


### Solution Framework

To meet these business needs, a customer churn prediction model is required. Since the target (`Churned`) is a known binary label, this is a **Classification** problem and a classic **Supervised Learning** scenario.
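As a minimal sketch of this supervised binary classification setup, the example below fits a logistic-regression baseline on synthetic data generated with `make_classification`. Both the synthetic data and the choice of logistic regression are assumptions for illustration, not the project's final model; the point is only that the model learns from labelled examples and outputs one of two classes, like the `Churned` column.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the customer features (assumption, not SwiftCart data):
# 1,000 customers, 10 numeric features, roughly 30% positive (churned) labels.
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=10, weights=[0.7, 0.3], random_state=42
)

# Supervised learning: the model is fitted on features paired with known labels.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_demo, y_demo)

# Binary classification: every prediction is either 0 (stays) or 1 (churns).
y_pred_demo = clf.predict(X_demo)
print(sorted(set(y_pred_demo.tolist())))
```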

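For now, customer churn behavior does not appear to change quickly over time, so the M&R department does not need a model that learns continuously from live data (an online solution). Therefore, we will build an **Offline (batch)** model, trained on historical data and retrained periodically if needed.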
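In an offline (batch) setup, the model is trained once on historical data, saved, and later reloaded by a scheduled job that scores a whole batch of customers at once, instead of updating with every new event. A minimal sketch, assuming `joblib` for persistence, a temporary file path, and a synthetic batch of customers (none of these come from the original project):

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# 1. Offline training on historical data (synthetic here, as an assumption).
X_hist, y_hist = make_classification(n_samples=500, n_features=8, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_hist, y_hist)

# 2. Persist the trained model so a separate scoring job can reuse it.
model_path = os.path.join(tempfile.mkdtemp(), "churn_model.joblib")
joblib.dump(model, model_path)

# 3. Scheduled batch job: reload the model and score all current customers at once.
batch_model = joblib.load(model_path)
rng = np.random.default_rng(1)
X_batch = rng.normal(size=(20, 8))  # hypothetical batch of 20 customers
churn_proba = batch_model.predict_proba(X_batch)[:, 1]  # estimated P(churn) per customer
print(churn_proba.round(3))
```

The M&R team could then sort customers by `churn_proba` and target the highest-risk ones until the next batch run.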

To measure model performance, we are primarily interested in **Recall**: we want to identify as many of the customers who will actually churn as possible. **Precision** also matters, since we don't want to distribute retention benefits to customers who would have stayed anyway. The **F1-Score** combines both metrics as their harmonic mean, so it is only high when Precision and Recall are both reasonably high. Because of that property, we'll evaluate our model with the **F1-Score**.
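To make these definitions concrete, the snippet below computes the three metrics with scikit-learn on a small, made-up set of labels (`1` = churned) and checks that F1 equals the harmonic mean 2 * P * R / (P + R). The labels are hypothetical and only serve to illustrate the arithmetic.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical labels for 10 customers: 1 = churned, 0 = stayed.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]  # 3 TP, 1 FN, 2 FP, 4 TN

precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 3 / 5
recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 3 / 4
f1 = f1_score(y_true, y_pred)

# F1 is the harmonic mean of precision and recall.
harmonic_mean = 2 * precision * recall / (precision + recall)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.3f}")
# → precision=0.60 recall=0.75 f1=0.667
```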

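To achieve a positive ROI, the model targets an F1-Score above 0.65.

- Business Impact: The goal is to capture at least 70% of potential churners (Recall) while maintaining a Precision of at least 60%, preventing the marketing budget from being exhausted on customers who had no intention of leaving. Because F1 is the harmonic mean of Precision and Recall, the F1 target alone does not enforce these floors (a high Precision can offset a low Recall), so Recall and Precision are also checked individually.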
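Since F1 averages Precision and Recall, the F1 target and the two floors are separate conditions. A minimal sketch of an acceptance check using the thresholds stated above; `f1_from` and `meets_targets` are hypothetical helpers, not part of the original project:

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1-Score)."""
    return 2 * precision * recall / (precision + recall)


def meets_targets(precision: float, recall: float) -> bool:
    """Hypothetical acceptance check: F1 > 0.65, Recall >= 0.70, Precision >= 0.60."""
    return f1_from(precision, recall) > 0.65 and recall >= 0.70 and precision >= 0.60


# Exactly at both floors, F1 is about 0.646, just under the 0.65 target.
print(round(f1_from(0.60, 0.70), 3), meets_targets(0.60, 0.70))  # → 0.646 False
# High precision lifts F1 above 0.65 even though recall misses its floor.
print(round(f1_from(0.90, 0.55), 3), meets_targets(0.90, 0.55))  # → 0.683 False
# A model that satisfies all three conditions.
print(round(f1_from(0.65, 0.75), 3), meets_targets(0.65, 0.75))  # → 0.696 True
```

The second case shows why F1 alone is not enough: it passes the F1 target while missing almost a fifth of the churners the Recall floor asks for.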

## Data Split

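To avoid data leakage, the first step is to split the data into training, validation, and test sets, before any cleaning or preprocessing. This way, any statistics used later (for example, values for imputing missing data) are learned from the training set only.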
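Leakage occurs when preprocessing statistics are computed on rows that are later used for evaluation. A minimal sketch of the safe order, assuming median imputation with scikit-learn's `SimpleImputer` and a tiny column of made-up values (the imputation strategy is an illustration, not a choice made in this project): the imputer is fitted on the training rows only and then applied unchanged to the held-out rows.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Made-up values with missing entries; the first 4 rows play the training set.
df = pd.DataFrame({"Pages_Per_Session": [6.0, 10.3, np.nan, 14.8, 2.0, np.nan]})
train, held_out = df.iloc[:4], df.iloc[4:]

imputer = SimpleImputer(strategy="median")
imputer.fit(train)  # median of the training values 6.0, 10.3, 14.8 -> 10.3

train_filled = imputer.transform(train)
held_out_filled = imputer.transform(held_out)  # reuses the training median
print(imputer.statistics_)  # → [10.3]
```

Fitting the imputer on all six rows instead would give a median of 8.15, letting the held-out value 2.0 influence how the training data is prepared.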

In [8]:
from sklearn.model_selection import train_test_split

In [10]:
# Features: every column except the target
X = ecommerce_df.drop('Churned', axis='columns')
# Target: the binary Churned label
y = ecommerce_df['Churned']

In [11]:
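# First split: 70% train, 30% held out; stratify keeps the churn rate equal in both parts
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y
)

# Second split: divide the held-out 30% evenly into validation (15%) and test (15%)
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.50, random_state=42, stratify=y_temp
)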
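As a sanity check on this two-step split, the sketch below repeats the same `train_test_split` calls on a synthetic label column (an assumption, since the real dataset is not loaded here) and confirms the 70/15/15 proportions and that `stratify` keeps the churn rate identical across the three subsets.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 1,000 customers, 20% churned (assumption, not the real data).
rng = np.random.default_rng(42)
X_syn = pd.DataFrame({"feature": rng.normal(size=1000)})
y_syn = pd.Series([1] * 200 + [0] * 800, name="Churned")

X_tr, X_tmp, y_tr, y_tmp = train_test_split(
    X_syn, y_syn, test_size=0.30, random_state=42, stratify=y_syn
)
X_va, X_te, y_va, y_te = train_test_split(
    X_tmp, y_tmp, test_size=0.50, random_state=42, stratify=y_tmp
)

# Size and churn rate of each subset.
for name, labels in [("train", y_tr), ("val", y_va), ("test", y_te)]:
    print(name, len(labels), round(labels.mean(), 3))
```

On the real data, the same loop over `y_train`, `y_val`, and `y_test` shows whether the churn rate is preserved in each subset.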

## Data Cleaning and Preparation