# Customer Churn Prediction System
### Machine Learning Approach for Customer Retention

## 1. Problem Statement

Customer churn negatively impacts revenue and business growth.
The objective of this project is to build a classification model
to predict whether a customer is likely to churn, enabling proactive retention strategies.

In [1]:
import numpy as np
import pandas as pd

In [3]:
df=pd.read_csv(r"C:\Users\Srija\Downloads\WA_Fn-UseC_-Telco-Customer-Churn.csv")

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   customerID        7043 non-null   object 
 1   gender            7043 non-null   object 
 2   SeniorCitizen     7043 non-null   int64  
 3   Partner           7043 non-null   object 
 4   Dependents        7043 non-null   object 
 5   tenure            7043 non-null   int64  
 6   PhoneService      7043 non-null   object 
 7   MultipleLines     7043 non-null   object 
 8   InternetService   7043 non-null   object 
 9   OnlineSecurity    7043 non-null   object 
 10  OnlineBackup      7043 non-null   object 
 11  DeviceProtection  7043 non-null   object 
 12  TechSupport       7043 non-null   object 
 13  StreamingTV       7043 non-null   object 
 14  StreamingMovies   7043 non-null   object 
 15  Contract          7043 non-null   object 
 16  PaperlessBilling  7043 non-null   object 


In [7]:
df.head()

| | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No |
| 1 | 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | ... | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.5 | No |
| 2 | 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | ... | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes |
| 3 | 7795-CFOCW | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | ... | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.3 | 1840.75 | No |
| 4 | 9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 70.7 | 151.65 | Yes |


## 2. Data Preprocessing
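- Removed the identifier column (`customerID`)
- Mapped the target `Churn` to 1 (Yes) and 0 (No)
- Separated categorical from numerical features (handled automatically by `pd.get_dummies`)
- One-hot encoded the categorical features, dropping the first level of each
- Standardized the features with `StandardScaler`
- Fit the scaler on the training split only, so no test-set statistics leak into training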
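
One way to bundle the encoding and scaling steps listed above, so that they are fit on training data only, is a scikit-learn `Pipeline` wrapping a `ColumnTransformer`. The sketch below is an illustration rather than the notebook's own code: the frame `toy` is made up (it only mimics a few Telco columns), and the names `numeric_cols`, `categorical_cols`, `preprocess`, and `clf` are assumptions.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical mini-frame that mimics a few Telco columns (not the real data).
toy = pd.DataFrame({
    "tenure": [1, 34, 2, 45, 2, 8, 22, 10, 28, 62, 13, 16],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30, 70.70, 99.65,
                       89.10, 29.75, 104.80, 56.15, 49.95, 18.95],
    "Contract": ["Month-to-month", "One year", "Month-to-month", "One year",
                 "Month-to-month", "Month-to-month", "Month-to-month",
                 "Month-to-month", "Month-to-month", "One year",
                 "Month-to-month", "Two year"],
    "Churn": [0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0],
})

numeric_cols = ["tenure", "MonthlyCharges"]
categorical_cols = ["Contract"]

# Scale numeric columns and one-hot encode categorical ones in a single step.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# The pipeline fits the preprocessing on whatever data is passed to fit().
clf = Pipeline([
    ("preprocess", preprocess),
    ("model", LogisticRegression(max_iter=1000, class_weight="balanced")),
])

X_toy = toy.drop(columns="Churn")
y_toy = toy["Churn"]
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.25, random_state=42, stratify=y_toy
)

clf.fit(X_tr, y_tr)                       # encoder and scaler see training rows only
toy_prob = clf.predict_proba(X_te)[:, 1]  # churn probability for held-out rows
```

Because the encoder and scaler live inside the pipeline, the same object can be passed to cross-validation utilities without refitting the preprocessing on held-out folds.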

In [9]:
x=df.drop(columns=["Churn","customerID"])
y=df["Churn"]

In [11]:
y=y.map({"Yes":1,"No":0})

In [13]:
x = pd.get_dummies(x, drop_first=True)  # one-hot encode categorical columns, dropping the first level of each
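
A caveat with `pd.get_dummies`: any column stored as text is expanded into one dummy column per distinct value, including numeric-looking columns. The `df.info()` output above is cut off before `TotalCharges`, so its dtype is not shown here. The sketch below assumes it might be read as text with blank entries, and shows one possible way to coerce it before encoding. The frame `demo` and the fill value of 0.0 are illustrative assumptions.

```python
import pandas as pd

# Hypothetical mini-frame; the blank string stands in for a missing charge.
demo = pd.DataFrame({
    "TotalCharges": ["29.85", "1889.5", " "],
    "Contract": ["Month-to-month", "One year", "Two year"],
})

# Convert text to numbers; values that cannot be parsed become NaN.
demo["TotalCharges"] = pd.to_numeric(demo["TotalCharges"], errors="coerce")

# Filling with 0.0 is one possible choice (assumption); dropping rows is another.
demo["TotalCharges"] = demo["TotalCharges"].fillna(0.0)

# Only the genuinely categorical column is expanded now.
encoded = pd.get_dummies(demo, drop_first=True)
```

Checking `x.dtypes` before encoding is a quick way to confirm whether this step is needed for the actual file.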

In [15]:
from sklearn.model_selection import train_test_split

In [19]:
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=42, stratify=y
)

In [25]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# Fit on the training split only, then reuse the same statistics on the test split
x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.transform(x_test)

In [27]:
from sklearn.linear_model import LogisticRegression
model = LogisticRegression(max_iter=1000,class_weight="balanced")
model.fit(x_train_scaled, y_train)

In [41]:
y_prob = model.predict_proba(x_test_scaled)[:, 1]  # predicted probability of churn (class 1)
threshold = 0.35  # below the default 0.5, to favor recall on churners
y_pred_custom = (y_prob >= threshold).astype(int)
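
To see how the choice of threshold moves precision and recall, the same thresholding step can be repeated for several cut-offs. The following is a minimal sketch on synthetic data from `make_classification`, not on the Telco set. The names `demo_model`, `demo_prob`, and `recalls` and the three thresholds are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary problem (roughly 27% positives).
X_syn, y_syn = make_classification(
    n_samples=2000, weights=[0.73, 0.27], random_state=0
)
X_a, X_b, y_a, y_b = train_test_split(
    X_syn, y_syn, test_size=0.3, random_state=42, stratify=y_syn
)

demo_model = LogisticRegression(max_iter=1000, class_weight="balanced")
demo_model.fit(X_a, y_a)
demo_prob = demo_model.predict_proba(X_b)[:, 1]

recalls = []
for t in (0.35, 0.50, 0.65):
    pred = (demo_prob >= t).astype(int)  # same rule as y_pred_custom
    recalls.append(recall_score(y_b, pred))
    print(f"threshold={t:.2f}  "
          f"precision={precision_score(y_b, pred):.2f}  "
          f"recall={recalls[-1]:.2f}")
```

Raising the threshold can only shrink the set of customers flagged as churners, so recall never increases as the threshold goes up. Precision usually moves the other way, although that is not guaranteed.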

In [43]:
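from sklearn.metrics import classification_report, confusion_matrix
y_pred = model.predict(x_test_scaled)  # predictions at the default 0.5 threshold
# Confusion matrix for the custom-threshold predictions (rows: actual, columns: predicted)
confusion_matrix(y_test, y_pred_custom)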

array([[1257,  295],
       [ 213,  348]], dtype=int64)

In [45]:
print(classification_report(y_test, y_pred))  # report for the default-threshold y_pred, not y_pred_custom

              precision    recall  f1-score   support

           0       0.84      0.84      0.84      1552
           1       0.56      0.56      0.56       561

    accuracy                           0.77      2113
   macro avg       0.70      0.70      0.70      2113
weighted avg       0.77      0.77      0.77      2113
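
Note that the report above is computed from `y_pred`, while the confusion matrix printed earlier is computed from `y_pred_custom`, so the two outputs describe different predictions. As a worked check, the churn-class metrics implied by that confusion matrix can be recomputed by hand from its four counts. The variable names are chosen for illustration.

```python
import numpy as np

# Confusion matrix as printed above (rows: actual 0/1, columns: predicted 0/1).
cm = np.array([[1257, 295],
               [213, 348]])
tn, fp, fn, tp = cm.ravel()

precision = tp / (tp + fp)   # share of flagged customers who actually churn
recall = tp / (tp + fn)      # share of churners who are flagged
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / cm.sum()

print(f"precision={precision:.3f} recall={recall:.3f} "
      f"f1={f1:.3f} accuracy={accuracy:.3f}")
```

These come out to roughly 0.54 precision and 0.62 recall for the churn class, which differ from the 0.56 and 0.56 shown in the report.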



## 3. Conclusion
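In the confusion matrix above, the model flags 348 of the 561 churners in the test set (recall of about 0.62) while raising 295 false alarms (precision of about 0.54).
Lowering the decision threshold below 0.5 trades precision for recall, which pays off when a retention offer costs less than losing the customer.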
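
The cost argument can be made concrete with the counts from the confusion matrix above. The sketch below uses hypothetical figures: `offer_cost` and `churn_loss` are made-up values, and the simplifying assumption is that every flagged customer receives an offer while every missed churner is lost.

```python
def expected_cost(fp, fn, tp, offer_cost, churn_loss):
    """Cost of offers sent to all flagged customers plus losses from missed churners."""
    return (tp + fp) * offer_cost + fn * churn_loss

# Hypothetical unit costs (assumptions, not taken from the dataset).
offer_cost = 10
churn_loss = 100

# Counts from the confusion matrix above.
cost_with_model = expected_cost(fp=295, fn=213, tp=348,
                                offer_cost=offer_cost, churn_loss=churn_loss)

# Baseline: flag nobody, so all 561 churners are missed.
cost_no_model = expected_cost(fp=0, fn=561, tp=0,
                              offer_cost=offer_cost, churn_loss=churn_loss)

print(cost_with_model, cost_no_model)
```

Under these assumed costs the flagged-customer strategy is cheaper than doing nothing. With a more expensive offer or a smaller loss per churner the comparison can flip, which is why the threshold should be chosen against actual business costs.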