# Feature Engineering for Telecom Churn Prediction

This notebook derives customer-behavior features (tenure group, charge level, contract type, and number of subscribed services) intended to improve churn prediction performance.


In [2]:
import pandas as pd

In [3]:
data_path = r"D:\Aman Deep\Deep-Learning\telecom-churn-deep-learning\data\processed\telecom_churn_initial_clean.csv"

df = pd.read_csv(data_path)
df.head()

,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,0,Yes,No,1,No,No phone service,DSL,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,5575-GNVDE,Male,0,No,No,34,Yes,No,DSL,Yes,...,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No
2,3668-QPYBK,Male,0,No,No,2,Yes,No,DSL,Yes,...,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,7795-CFOCW,Male,0,No,No,45,No,No phone service,DSL,Yes,...,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.3,1840.75,No
4,9237-HQITU,Female,0,No,No,2,Yes,No,Fiber optic,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,70.7,151.65,Yes


We use the cleaned dataset from the EDA phase as the base for feature engineering.
Features are derived from the original DataFrame columns, before any scaling or encoding, so that thresholds and category labels remain interpretable.


In [6]:
df.shape

(7043, 21)

In [11]:
df['tenure_group'] = pd.cut(
    df['tenure'],
    bins=[0, 12, 24, 48, 72],
    labels=['0-1 year', '1-2 years', '2-4 years', '4-6 years'],
    include_lowest=True  # keep tenure == 0 in the first bin instead of NaN
)

df[['tenure', 'tenure_group']].head(10)


,tenure,tenure_group
0,1,0-1 year
1,34,2-4 years
2,2,0-1 year
3,45,2-4 years
4,2,0-1 year
5,8,0-1 year
6,22,1-2 years
7,10,0-1 year
8,28,2-4 years
9,62,4-6 years


Customers in the early tenure phase are often more likely to churn, for example because of
onboarding issues, price sensitivity, or unmet service expectations.
Tenure (in months) is therefore binned into four groups to capture lifecycle-based churn behavior.
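
The bin edges decide where boundary values land. As a minimal sketch on made-up tenure values (not rows from the dataset), the snippet below compares `pd.cut` with its default left-open first interval against `include_lowest=True`. The names `toy_tenure`, `default_groups`, and `inclusive_groups` are illustrative only.

```python
import pandas as pd

# Hypothetical tenure values (months), chosen to sit on the bin edges.
toy_tenure = pd.Series([0, 1, 12, 13, 24, 48, 72])

bins = [0, 12, 24, 48, 72]
labels = ['0-1 year', '1-2 years', '2-4 years', '4-6 years']

# Default: intervals are (0, 12], (12, 24], ... so a tenure of 0 is outside every bin.
default_groups = pd.cut(toy_tenure, bins=bins, labels=labels)

# include_lowest=True closes the first interval on the left: [0, 12].
inclusive_groups = pd.cut(toy_tenure, bins=bins, labels=labels, include_lowest=True)

comparison = pd.DataFrame({
    'tenure': toy_tenure,
    'default': default_groups,
    'include_lowest': inclusive_groups,
})
print(comparison)
```

With the default settings a tenure of 0 months becomes NaN, so it may be worth checking `df['tenure_group'].isna().sum()` after binning.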


In [9]:
median_charge = df['MonthlyCharges'].median()

df['high_monthly_charge'] = (df['MonthlyCharges'] > median_charge).astype(int)

df[['MonthlyCharges', 'high_monthly_charge']].head()


,MonthlyCharges,high_monthly_charge
0,29.85,0
1,56.95,0
2,53.85,0
3,42.3,0
4,70.7,1


In [17]:
df['high_monthly_charge'].value_counts()

high_monthly_charge
0    3528
1    3515
Name: count, dtype: int64

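Customers with higher monthly charges represent more revenue at risk if they churn.
The binary flag, split at the median of MonthlyCharges, marks the 3,515 customers above the median
so the model can distinguish these high-value customers, who may warrant proactive retention strategies.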
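
Because the flag uses a strict `>` comparison, customers whose charge equals the median are assigned 0, which is one reason the two classes need not be exactly equal in size. The following sketch illustrates this on invented charges (the `toy` frame is hypothetical, not taken from the dataset):

```python
import pandas as pd

# Hypothetical monthly charges with two values tied at the median.
toy = pd.DataFrame({'MonthlyCharges': [20.0, 50.0, 70.0, 70.0, 90.0, 110.0]})

median_charge = toy['MonthlyCharges'].median()

# Strict comparison: values equal to the median get flag 0.
toy['high_monthly_charge'] = (toy['MonthlyCharges'] > median_charge).astype(int)

flag_counts = toy['high_monthly_charge'].value_counts().sort_index()
print(median_charge)
print(flag_counts)
```

Using `>=` instead would move the tied customers into the high-charge group; either choice is defensible as long as it is applied consistently.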


In [16]:
df['Contract'].value_counts()

Contract
Month-to-month    3875
Two year          1695
One year          1473
Name: count, dtype: int64

In [12]:
df['long_term_contract'] = df['Contract'].isin(
    ['One year', 'Two year']
).astype(int)

In [13]:

df[['Contract', 'long_term_contract']].head()


,Contract,long_term_contract
0,Month-to-month,0
1,One year,1
2,Month-to-month,0
3,One year,1
4,Month-to-month,0


Customers on long-term contracts (one or two years) are expected to show higher commitment and
a lower churn probability than month-to-month subscribers. The flag collapses the three contract
types into a single binary indicator.
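
One way to check such an expectation is to compare the churn rate across the two flag values. The sketch below does this on a small invented table; the rows and the helper column `churned` are assumptions for illustration, and the printed rates say nothing about the real dataset.

```python
import pandas as pd

# Hypothetical customers; values are made up for illustration only.
toy = pd.DataFrame({
    'Contract': ['Month-to-month', 'Month-to-month', 'Month-to-month', 'Month-to-month',
                 'One year', 'Two year', 'One year', 'Two year'],
    'Churn': ['Yes', 'Yes', 'No', 'No', 'No', 'No', 'Yes', 'No'],
})

# Same flag definition as in the notebook.
toy['long_term_contract'] = toy['Contract'].isin(['One year', 'Two year']).astype(int)

# Encode the target as 0/1 so the group mean equals the churn rate.
toy['churned'] = (toy['Churn'] == 'Yes').astype(int)

churn_rate = toy.groupby('long_term_contract')['churned'].mean()
print(churn_rate)
```

The same `groupby(...).mean()` pattern could be applied to `df` for any of the engineered flags.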


In [20]:
service_cols = [
    'OnlineSecurity', 'OnlineBackup',
    'DeviceProtection', 'TechSupport',
    'StreamingTV', 'StreamingMovies'
]

df['num_services'] = (df[service_cols] == 'Yes').sum(axis=1)

df[['num_services']].head()


,num_services
0,1
1,2
2,2
3,3
4,0


In [22]:
df['num_services'].value_counts()

num_services
0    2219
3    1118
2    1033
1     966
4     852
5     571
6     284
Name: count, dtype: int64

num_services counts how many of the six add-on services a customer subscribes to (0 to 6).
Customers using several services are presumably more engaged with the telecom provider,
and therefore less likely to churn than those with few or none.
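
Note that comparing against `'Yes'` treats every other value as "not subscribed". Assuming the service columns can also hold a level such as `'No internet service'` and that `InternetService` can be `'No'` (as in the public Telco churn data, though not visible in the rows shown here), a count of 0 mixes customers who declined the add-ons with customers who cannot have them. A minimal sketch with a hypothetical `has_internet` flag that separates the two cases:

```python
import pandas as pd

# Subset of the service columns, with invented rows for illustration.
service_cols = ['OnlineSecurity', 'TechSupport', 'StreamingTV']
toy = pd.DataFrame({
    'InternetService': ['DSL', 'Fiber optic', 'No'],
    'OnlineSecurity': ['Yes', 'No', 'No internet service'],
    'TechSupport': ['Yes', 'No', 'No internet service'],
    'StreamingTV': ['No', 'No', 'No internet service'],
})

# Same counting rule as in the notebook: only 'Yes' counts as subscribed.
toy['num_services'] = (toy[service_cols] == 'Yes').sum(axis=1)

# Hypothetical companion flag: 1 if the customer has any internet service.
toy['has_internet'] = (toy['InternetService'] != 'No').astype(int)

print(toy[['InternetService', 'num_services', 'has_internet']])
```

Here the second and third customers both have `num_services == 0`, but only the second one could actually have subscribed to an add-on.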
