# Optimize Customer Retention with Machine Learning - Economic integrated approach

> ## Introduction 
Customer churn prediction is the process of identifying customers who are likely to stop using a service, particularly in subscription-based business models. Since acquiring a new customer is typically far more expensive than retaining an existing one, identifying at-risk customers early lets companies act proactively (for example with targeted incentives, personalized offers, or service improvements) to reduce churn.

This problem is especially critical in the telecommunications industry, where intense competition leads to high customer acquisition costs and elevated churn rates, often estimated at **20–25% on average across the industry¹**. As a result, effective churn prediction models play a key role in improving customer retention and enhancing companies' profitability.

This project goes beyond churn prediction alone. It links the ML model's outputs to decision-making by addressing two core tasks:

<br>


> 1. **Classification**: ```build a machine learning model that estimates each customer's churn probability Pᶜ``` and


> 2. **Decision-making**: ```build an economic model that determines whether a customer should be offered retention incentives or not based on their Pᶜ```


<br>

This project follows the CRISP-DM (Cross-Industry Standard Process for Data Mining) framework and covers the following steps:
1. Business Understanding 

2. Data Understanding

3. Data Preparation

4. Modelling

5. Evaluation

<br>
  

***


> ## 1. Business Understanding

### 1.1 Business Problem and Objectives
+ Cell2Cell, a US telecommunications company, faces a high annual customer churn rate, resulting in substantial recurring revenue loss each year

+ The company can offer targeted incentives (e.g., a discount offer) to at-risk customers. However, each offer incurs a direct cost, and the company has a limited budget

+ The challenge is to predict each customer's risk level and determine a targeting strategy that maximizes the expected net value of the campaign


### 1.2 Economic model for decision-making
We model churn as an economic loss minimization problem. The baseline outcome is a customer who stays with the company, which yields utility 0. Churn generates a loss of R, representing the economic cost of losing a customer (i.e., forgone future revenue). An at-risk customer might still decide to churn even after being offered an incentive.

+ Cost of offering incentives to one customer: C

+ Loss incurred if one customer churns: R

+ Predicted probability that the customer will churn without intervention: Pᶜ

+ Probability that an incentive offer successfully prevents an at-risk customer from churning: Pᴿ

The utilities for the four possible outcomes are:

+ Targeting a customer who wouldn't churn: EV = - C

+ Not targeting a customer who wouldn't churn: EV = 0

+ Not targeting a customer who would churn: EV = - R

+ Targeting a customer who would churn: EV = (1-Pᴿ) . (-R -C) + Pᴿ . (-C)

**Expected utility of not targeting a customer:**

EVᴺ = Pᶜ . (-R) + (1-Pᶜ) . 0 = Pᶜ . (-R)

**Expected utility of targeting a customer:**

EVᵀ = Pᶜ . ((1-Pᴿ) . (-R -C) + Pᴿ . (-C)) + (1-Pᶜ) . (-C)
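The two expected utilities above can be checked numerically. The following is a minimal sketch, not the project's actual code: the function names and the values chosen for C, R, and Pᴿ are illustrative assumptions and are not derived from the Cell2Cell data.

```python
def ev_no_target(p_churn: float, loss: float) -> float:
    """Expected utility of not targeting: EV_N = Pc * (-R)."""
    return p_churn * (-loss)


def ev_target(p_churn: float, p_retain: float, cost: float, loss: float) -> float:
    """Expected utility of targeting:
    EV_T = Pc * ((1 - Pr) * (-R - C) + Pr * (-C)) + (1 - Pc) * (-C).
    """
    # Customer who would churn: the offer fails with probability (1 - Pr).
    would_churn = (1 - p_retain) * (-loss - cost) + p_retain * (-cost)
    # Customer who would stay anyway: the offer cost is still paid.
    would_stay = -cost
    return p_churn * would_churn + (1 - p_churn) * would_stay


# Illustrative values only (assumptions, not taken from the dataset).
C, R, P_R = 50.0, 500.0, 0.4

for p_c in (0.1, 0.25, 0.5):
    ev_n = ev_no_target(p_c, R)
    ev_t = ev_target(p_c, P_R, C, R)
    print(f"Pc={p_c:.2f}  EV_N={ev_n:8.2f}  EV_T={ev_t:8.2f}")
```

With these assumed values, targeting costs more than it saves for the low-risk customer and pays off for the high-risk one, which is the trade-off the two formulas encode.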

We target a customer when EVᵀ > EVᴺ. Expanding gives EVᵀ = -C - Pᶜ . (1-Pᴿ) . R, so the inequality simplifies to Pᶜ . Pᴿ . R > C, which yields the following decision rule:

```Target a customer if:``` $$ P^{c} > \frac{C}{P^{R} \cdot R} $$

The model is aligned with business intuition. Customers with higher churn risk (Pᶜ) are more likely to be worth targeting. The probability threshold for targeting can be lowered when incentives are cheap (low C), incentives are more effective (high Pᴿ), or churn is more costly (high R). This framework also enables scenario analysis: by varying C, Pᴿ, and R, we can determine the churn probability threshold required for a retention offer to be profitable.
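The scenario analysis described above could be sketched as follows. This is an illustrative example under stated assumptions: the helper names `churn_threshold` and `should_target` and the grid of values for C, Pᴿ, and R are hypothetical and are not estimates from the Cell2Cell data.

```python
from itertools import product


def churn_threshold(cost: float, p_retain: float, loss: float) -> float:
    """Churn probability above which targeting is profitable: C / (Pr * R)."""
    if p_retain <= 0 or loss <= 0:
        raise ValueError("p_retain and loss must be positive")
    return cost / (p_retain * loss)


def should_target(p_churn: float, cost: float, p_retain: float, loss: float) -> bool:
    """Apply the decision rule Pc > C / (Pr * R)."""
    return p_churn > churn_threshold(cost, p_retain, loss)


# Hypothetical scenario grid (assumed values, for illustration only).
costs = (25.0, 50.0)          # C: cost of one incentive offer
success_rates = (0.2, 0.4)    # Pr: probability the offer prevents churn
losses = (500.0, 1000.0)      # R: loss if the customer churns

for c, p_r, r in product(costs, success_rates, losses):
    threshold = churn_threshold(c, p_r, r)
    print(f"C={c:5.0f}  Pr={p_r:.1f}  R={r:6.0f}  threshold={threshold:.3f}")
```

A threshold of 1 or more means that no customer is worth targeting in that scenario, since Pᶜ can never exceed 1.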

<br>

***

> ## 2. Data Understanding

The Cell2Cell churn dataset used in this project is retrieved from [Kaggle](https://www.kaggle.com/datasets/jpacse/datasets-for-churn-telecom/data). It originates from a real U.S. telecommunications provider (operated in the late 1990s) and was released as part of the KDD Cup 2000 competition. While the data reflects real customer behavior and churn outcomes, it has been anonymized for research purposes. 

The dataset contains 71,047 instances, 58 features, and 3,491 missing values. The data is split into two CSV files: one contains 51,048 labeled instances, while the other contains 19,999 unlabeled instances. 

The labeled data is imbalanced: 36,336 instances (71%) belong to the majority “non-churn” class and 14,711 (29%) to the minority “churn” class.
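The class shares can be reproduced from the counts reported above. The sketch below uses only the Python standard library. The `class_balance` helper and the label strings are assumptions for illustration, and the label column is rebuilt from the stated counts rather than read from the actual CSV file.

```python
from collections import Counter


def class_balance(labels):
    """Return {label: (count, share)} for an iterable of class labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, n / total) for label, n in counts.items()}


# Rebuild a label column from the class counts reported above.
# The label strings are placeholders, not the dataset's actual encoding.
labels = ["non-churn"] * 36_336 + ["churn"] * 14_711

balance = class_balance(labels)
for label, (n, share) in balance.items():
    print(f"{label:>9}: {n:>6,} ({share:.1%})")

# Majority-to-minority ratio, a quick indicator of how skewed the classes are.
ratio = balance["non-churn"][0] / balance["churn"][0]
print(f"majority/minority ratio: {ratio:.2f}")
```

One consequence of this skew is that a model that always predicts "non-churn" would already reach about 71% accuracy, so accuracy alone is a weak measure of model quality on this data.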

> ## 3. Data Preparation

### 3.1 Preprocessing

1. Drop irrelevant columns

2. Handle missing data

3. Handle duplicates

4. Handle outliers

5. Encode variables

> Drop irrelevant