# Customer Lifecycle Analytics  
## Feature Engineering

### Objective
The objective of this notebook is to engineer meaningful features that better
capture customer behavior, engagement, and value in order to improve churn
prediction and customer segmentation.

### Why Feature Engineering Matters
Raw variables often do not fully represent customer behavior. Feature
engineering helps transform existing data into more informative signals
that highlight churn risk and retention drivers.

### Key Feature Categories
- Demographic features
- Engagement & activity indicators
- Financial & value-based metrics
- Behavioral risk flags

### Dataset Used
This notebook uses the cleaned dataset generated in the previous step, `../data/processed/customer_churn_clean.csv`.



In [1]:
import pandas as pd
import numpy as np
import os

df = pd.read_csv("../data/processed/customer_churn_clean.csv")
df.head()


|   | rownumber | customerid | surname | creditscore | geography | gender | age | tenure | balance | numofproducts | has_credit_card | is_active_member | estimatedsalary | churn | complain | satisfaction_score | card_type | point_earned |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 15634602 | Hargrave | 619 | France | Female | 42 | 2 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 | 1 | 2 | DIAMOND | 464 |
| 1 | 2 | 15647311 | Hill | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 | 1 | 3 | DIAMOND | 456 |
| 2 | 3 | 15619304 | Onio | 502 | France | Female | 42 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 | 1 | 3 | DIAMOND | 377 |
| 3 | 4 | 15701354 | Boni | 699 | France | Female | 39 | 1 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 | 0 | 5 | GOLD | 350 |
| 4 | 5 | 15737888 | Mitchell | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 | 0 | 5 | GOLD | 425 |


In [2]:
df.shape, df.isnull().sum()


((10000, 18),
 rownumber             0
 customerid            0
 surname               0
 creditscore           0
 geography             0
 gender                0
 age                   0
 tenure                0
 balance               0
 numofproducts         0
 has_credit_card       0
 is_active_member      0
 estimatedsalary       0
 churn                 0
 complain              0
 satisfaction_score    0
 card_type             0
 point_earned          0
 dtype: int64)

In [3]:
# Bins are right-inclusive: (0, 30], (30, 45], (45, 60], (60, 100]
df["age_group"] = pd.cut(
    df["age"],
    bins=[0, 30, 45, 60, 100],
    labels=["Young", "Mid-Age", "Senior", "Elder"]
)

df["age_group"].value_counts()


age_group
Mid-Age    5921
Young      1968
Senior     1647
Elder       464
Name: count, dtype: int64

In [4]:
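# Bins are right-inclusive: (0, 2], (2, 5], (5, 10], (10, 50].
# A tenure of exactly 0 falls outside (0, 2] and is left as NaN,
# which value_counts() omits by default.
df["tenure_group"] = pd.cut(
    df["tenure"],
    bins=[0, 2, 5, 10, 50],
    labels=["New", "Early", "Established", "Loyal"]
)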

df["tenure_group"].value_counts()


tenure_group
Established    4494
Early          3010
New            2083
Loyal             0
Name: count, dtype: int64
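
The counts above sum to 9,587 rather than 10,000, and `Loyal` is empty. Since `tenure` has no missing values, 413 rows must fall outside the bins. The most likely explanation is a tenure of exactly 0: with the default `right=True`, `pd.cut` builds the intervals (0, 2], (2, 5], (5, 10], (10, 50], so 0 is not covered. The following is a minimal sketch of one way to handle this, using a small made-up series rather than the real data (the `toy` frame is hypothetical). It passes `include_lowest=True` and counts missing values explicitly:

```python
import pandas as pd

# Hypothetical toy data standing in for df["tenure"]; includes a 0-year tenure.
toy = pd.DataFrame({"tenure": [0, 1, 2, 3, 5, 6, 10]})

bins = [0, 2, 5, 10, 50]
labels = ["New", "Early", "Established", "Loyal"]

# Default behaviour: intervals are (0, 2], (2, 5], ... so tenure == 0 is unbinned.
default_cut = pd.cut(toy["tenure"], bins=bins, labels=labels)

# include_lowest=True closes the first interval on the left: [0, 2].
fixed_cut = pd.cut(toy["tenure"], bins=bins, labels=labels, include_lowest=True)

# dropna=False makes any remaining unbinned rows visible in the counts.
print(default_cut.value_counts(dropna=False))
print(fixed_cut.value_counts(dropna=False))
```

The empty `Loyal` group also shows that no customer has a tenure above 10, so the last bin could be dropped or the edges revisited.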

In [5]:
df["high_balance_flag"] = (df["balance"] > df["balance"].median()).astype(int)

df["high_balance_flag"].value_counts()


high_balance_flag
0    5000
1    5000
Name: count, dtype: int64

In [6]:
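# Score from 0 to 3: one point each for being an active member,
# holding a credit card, and having no complaint on record.
df["engagement_score"] = (
    df["is_active_member"] +
    df["has_credit_card"] +
    (df["complain"] == 0).astype(int)
)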
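
Unlike the other cells, this one shows no output, so the distribution of the score is never inspected. As a rough sketch of how it could be checked, the example below rebuilds the score on a few made-up rows (the `toy` frame is hypothetical; only the column names come from the notebook) and lists the score levels in order:

```python
import pandas as pd

# Hypothetical toy rows using the same column names as the notebook.
toy = pd.DataFrame({
    "is_active_member": [1, 0, 1, 0],
    "has_credit_card":  [1, 1, 0, 0],
    "complain":         [0, 0, 1, 1],
})

toy["engagement_score"] = (
    toy["is_active_member"]
    + toy["has_credit_card"]
    + (toy["complain"] == 0).astype(int)
)

# Each component is 0 or 1, so the score must lie between 0 and 3.
assert toy["engagement_score"].between(0, 3).all()

# sort_index() lists the score levels in order rather than by frequency.
print(toy["engagement_score"].value_counts().sort_index())
```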


In [7]:
df["low_satisfaction_flag"] = (df["satisfaction_score"] <= 2).astype(int)

df["low_satisfaction_flag"].value_counts()


low_satisfaction_flag
0    6054
1    3946
Name: count, dtype: int64

In [8]:
df["single_product_flag"] = (df["numofproducts"] == 1).astype(int)

df["single_product_flag"].value_counts()


single_product_flag
1    5084
0    4916
Name: count, dtype: int64

In [9]:
df["salary_segment"] = pd.qcut(
    df["estimatedsalary"],
    q=3,
    labels=["Low", "Medium", "High"]
)

df["salary_segment"].value_counts()


salary_segment
Medium    3334
Low       3333
High      3333
Name: count, dtype: int64
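
The tertiles are balanced by construction, but the salary thresholds that separate them are not shown. A small sketch, assuming made-up salaries in place of `df["estimatedsalary"]`, uses the `retbins=True` option of `pd.qcut` to return the cut points alongside the labels:

```python
import pandas as pd

# Hypothetical salaries; the notebook itself uses df["estimatedsalary"].
salaries = pd.Series([10_000, 20_000, 30_000, 40_000, 50_000, 60_000])

segments, edges = pd.qcut(
    salaries,
    q=3,
    labels=["Low", "Medium", "High"],
    retbins=True,  # also return the bin edges as an array
)

# Four edges delimit the three segments.
print(edges)
print(segments.value_counts().sort_index())
```

Recording the edges makes it possible to apply the same thresholds to new customers instead of recomputing quantiles on each batch.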

In [10]:
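# Score from 0 to 3: one point each for low satisfaction,
# holding a single product, and being an inactive member.
df["risk_score"] = (
    df["low_satisfaction_flag"] +
    df["single_product_flag"] +
    (df["is_active_member"] == 0).astype(int)
)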
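
The risk score is built from three flags but is not compared against the `churn` target anywhere in this notebook. One possible sanity check, sketched here on made-up rows (the `toy` frame and its values are hypothetical), is to group by the score and look at the churn rate and group size at each level:

```python
import pandas as pd

# Hypothetical toy rows; in the notebook these columns come from df.
toy = pd.DataFrame({
    "risk_score": [0, 0, 1, 1, 2, 2, 3, 3],
    "churn":      [0, 0, 0, 1, 0, 1, 1, 1],
})

# The mean of a 0/1 churn column is the churn rate; count gives the group size.
summary = (
    toy.groupby("risk_score")["churn"]
    .agg(customers="count", churn_rate="mean")
)
print(summary)
```

If the churn rate on the real data does not rise with the score, the equal weighting of the three flags may need revisiting.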


In [11]:
df_model = df.drop(columns=[
    "rownumber",
    "customerid",
    "surname"
])


In [12]:
os.makedirs("../data/processed", exist_ok=True)

df_model.to_csv(
    "../data/processed/customer_churn_features.csv",
    index=False
)

print("✅ Feature-engineered dataset saved successfully!")


✅ Feature-engineered dataset saved successfully!
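
One caveat with saving to CSV: the categorical dtype and category order produced by `pd.cut` and `pd.qcut` are not stored in the file, so `age_group`, `tenure_group`, and `salary_segment` come back as plain text when the file is reloaded. A minimal sketch of restoring an ordered category after reloading is shown below; it uses a made-up frame and an in-memory buffer rather than the real files (the `toy` and `age_dtype` names are hypothetical):

```python
import io
import pandas as pd

# Hypothetical toy frame with one binned column, mirroring age_group.
toy = pd.DataFrame({"age": [25, 40, 55, 70]})
toy["age_group"] = pd.cut(
    toy["age"],
    bins=[0, 30, 45, 60, 100],
    labels=["Young", "Mid-Age", "Senior", "Elder"],
)

# Round-trip through CSV in memory instead of touching the real files.
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

# After reloading, age_group is plain text; restore the ordered categories.
age_dtype = pd.CategoricalDtype(
    categories=["Young", "Mid-Age", "Senior", "Elder"], ordered=True
)
reloaded["age_group"] = reloaded["age_group"].astype(age_dtype)
print(reloaded["age_group"].dtype)
```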


## Feature Engineering Summary

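- Binned continuous variables into segments: `age_group`, `tenure_group`, `salary_segment`
- Created engagement and satisfaction indicators: `engagement_score`, `low_satisfaction_flag`
- Added value-focused and behavioral risk signals: `high_balance_flag`, `single_product_flag`, `risk_score`
- Removed the identifier columns `rownumber`, `customerid`, and `surname`
- Saved the result to `../data/processed/customer_churn_features.csv` for the modeling step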

These engineered features are designed to give the churn model stronger
signals and to support more targeted retention strategies. Their actual
predictive value still has to be confirmed in the modeling step.
