# Customer Lifetime Value (CLV) Feature Engineering

## Objective
Create a proxy-based CLV score and prepare features for value-based customer segmentation.

## Dataset
IBM Telco Customer Churn dataset (adapted for SME context)

## CLV Feature Construction

```python
import pandas as pd

# Load the raw dataset
df = pd.read_csv("../data/raw/telco_customer_churn.csv")

df.head()
```


| | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No |
| 1 | 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | ... | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.5 | No |
| 2 | 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | ... | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes |
| 3 | 7795-CFOCW | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | ... | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.3 | 1840.75 | No |
| 4 | 9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 70.7 | 151.65 | Yes |
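Before building the score, it can help to confirm that the columns it depends on (`tenure`, `MonthlyCharges`, `Churn`) look as expected. The sketch below runs on a small inline sample copied from the five rows above, standing in for the full CSV; the helper `check_clv_inputs` and the specific checks it performs are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Inline stand-in for the CSV: the five rows shown above, reduced to the
# columns the CLV proxy uses (illustrative sample, not the full dataset).
df = pd.DataFrame({
    "customerID": ["7590-VHVEG", "5575-GNVDE", "3668-QPYBK", "7795-CFOCW", "9237-HQITU"],
    "tenure": [1, 34, 2, 45, 2],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.3, 70.7],
    "Churn": ["No", "No", "Yes", "No", "Yes"],
})


def check_clv_inputs(frame: pd.DataFrame) -> list[str]:
    """Hypothetical helper: list problems in the columns the CLV score uses."""
    problems = []
    # Both multiplicands must be numeric and non-negative.
    for col in ("tenure", "MonthlyCharges"):
        if not pd.api.types.is_numeric_dtype(frame[col]):
            problems.append(f"{col} is not numeric")
        elif (frame[col] < 0).any():
            problems.append(f"{col} has negative values")
    # The retention mapping only knows the labels "Yes" and "No".
    unexpected = set(frame["Churn"].unique()) - {"Yes", "No"}
    if unexpected:
        problems.append(f"unexpected Churn labels: {sorted(map(str, unexpected))}")
    # One row per customer is assumed downstream.
    if frame["customerID"].duplicated().any():
        problems.append("duplicate customerID values")
    return problems


print(check_clv_inputs(df))  # [] when every check passes
```

Returning a list of messages instead of raising on the first failure lets all problems surface in a single notebook run.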


```python
# Create a binary retention factor: 1 = retained (Churn == "No"), 0 = churned
df['retention_factor'] = df['Churn'].map({'No': 1, 'Yes': 0})

df[['Churn', 'retention_factor']].head()
```

| | Churn | retention_factor |
|---|---|---|
| 0 | No | 1 |
| 1 | No | 1 |
| 2 | Yes | 0 |
| 3 | No | 1 |
| 4 | Yes | 0 |
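`Series.map` with a dict returns `NaN` for any value that is not a key, so a stray label would silently produce a missing retention factor (and later a missing CLV score). A minimal sketch of normalizing the labels first, assuming the only variations are whitespace and letter case; the value `" yes"` below is a hypothetical dirty label added for illustration.

```python
import pandas as pd

# Churn labels from the rows above, plus one hypothetical dirty value
churn = pd.Series(["No", "No", "Yes", "No", "Yes", " yes"])

# Direct mapping: " yes" is not a dict key, so it becomes NaN
raw = churn.map({"No": 1, "Yes": 0})
print(raw.isna().sum())  # 1

# Normalize whitespace and case before mapping
clean = churn.str.strip().str.capitalize().map({"No": 1, "Yes": 0})
if clean.isna().any():
    raise ValueError("Churn contains labels other than Yes/No")
print(clean.tolist())  # [1, 1, 0, 1, 0, 0]
```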


```python
# Compute the proxy CLV score:
# revenue to date (MonthlyCharges × tenure), zeroed out for churned customers
df['clv_score'] = (
    df['MonthlyCharges'] *
    df['tenure'] *
    df['retention_factor']
)

df[['MonthlyCharges', 'tenure', 'retention_factor', 'clv_score']].head()
```

| | MonthlyCharges | tenure | retention_factor | clv_score |
|---|---|---|---|---|
| 0 | 29.85 | 1 | 1 | 29.85 |
| 1 | 56.95 | 34 | 1 | 1936.3 |
| 2 | 53.85 | 2 | 0 | 0.0 |
| 3 | 42.3 | 45 | 1 | 1903.5 |
| 4 | 70.7 | 2 | 0 | 0.0 |
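Written as a formula (the symbols $m_i$, $t_i$, $r_i$ are our notation), the score for customer $i$ is its monthly charge times its tenure in months, multiplied by the retention factor:

$$
\mathrm{CLV}_i = m_i \cdot t_i \cdot r_i,
\qquad
r_i =
\begin{cases}
1 & \text{if } \mathrm{Churn}_i = \text{No} \\
0 & \text{if } \mathrm{Churn}_i = \text{Yes}
\end{cases}
$$

For customer 5575-GNVDE this gives $56.95 \times 34 \times 1 = 1936.3$, and for 3668-QPYBK it gives $53.85 \times 2 \times 0 = 0$, matching the table. Note that $m_i \cdot t_i$ is close to, but not identical with, the reported `TotalCharges` (1936.3 vs. 1889.5 for 5575-GNVDE).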


```python
df['clv_score'].describe()
```

```
count    7043.000000
mean     1873.138513
std      2291.322438
min         0.000000
25%         0.000000
50%       855.000000
75%      3195.325000
max      8550.000000
Name: clv_score, dtype: float64
```
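The summary shows that at least a quarter of all scores are exactly zero (the 25th percentile is 0.0), so equal-count quartiles of the raw score would have duplicate bin edges, and `pd.qcut` raises a `ValueError` in that case unless `duplicates="drop"` is passed. One possible approach for the value-based segmentation named in the objective is to keep zero scores as their own segment and split only the positive scores into equal-count tiers. This is a sketch under assumptions: the function `segment_clv`, the segment names, the choice of three tiers, and the extra sample values 500.0, 4000.0 and 7000.0 are all hypothetical.

```python
import pandas as pd

# Scores from the five rows above plus three hypothetical values
scores = pd.Series(
    [29.85, 1936.3, 0.0, 1903.5, 0.0, 500.0, 4000.0, 7000.0], name="clv_score"
)


def segment_clv(clv: pd.Series, n_tiers: int = 3) -> pd.Series:
    """Hypothetical segmentation: zero scores form their own group,
    positive scores are split into n_tiers equal-count quantile tiers."""
    labels = [f"tier_{i}" for i in range(1, n_tiers + 1)]
    segments = pd.Series("zero_value", index=clv.index, dtype="object")
    positive = clv > 0
    # qcut on the positive scores only, so the mass of zeros cannot
    # collapse the lower bin edges; tier_1 holds the lowest scores.
    segments.loc[positive] = pd.qcut(
        clv[positive], q=n_tiers, labels=labels
    ).astype(str)
    return segments


print(segment_clv(scores).value_counts())
```

Here the two zero scores form `zero_value`, and the six positive scores fall two apiece into `tier_1` to `tier_3`. Zeros in the real data include churned customers and any customers with zero tenure, so this segment mixes two different situations and may need splitting.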