
## Problem Statement — Telco Customer Churn Prediction

### **Context & Need**
Subscription-based telecom providers lose significant annual recurring revenue (ARR) to customer churn. Proactive retention (discounts, outreach, plan optimization) works only if we can **identify at-risk customers before they leave** and target them efficiently.

### **Objective**
Build a supervised classification model that predicts whether a customer will **churn** (`Churn`: Yes/No) from account, service, and billing attributes (e.g., **tenure, contract type, payment method, monthly/total charges, internet/phone services, add-ons**). The model’s scores will drive retention actions and offer personalization.


### **Data**
Input features include demographic, service, and billing fields (see the data dictionary in the first code cell). The target is the binary `Churn` column, encoded as Yes=1 / No=0. Identifier fields such as `customerID` carry no predictive signal and are dropped.



```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

# Load the dataset
data = pd.read_csv('/home/Telco-Customer-Churn.csv')  # https://drive.google.com/file/d/1j5Uxw4Qo6RAXkPP9LUFiFsD3CNXjNarG/view?usp=sharing

# Preview the first five rows
print(data.head())

# Data dictionary (kept as comments so the notebook does not echo it as output)
# customerID        Unique identifier for each customer
# gender            Whether the customer is male or female
# SeniorCitizen     Whether the customer is a senior citizen or not (1, 0)
# Partner           Whether the customer has a partner or not (Yes, No)
# Dependents        Whether the customer has dependents or not (Yes, No)
# tenure            Number of months the customer has stayed with the company
# PhoneService      Whether the customer has phone service or not (Yes, No)
# MultipleLines     Whether the customer has multiple lines or not (Yes, No, No phone service)
# InternetService   Customer’s internet service type (DSL, Fiber optic, No)
# OnlineSecurity    Whether the customer has online security or not (Yes, No, No internet service)
# OnlineBackup      Whether the customer has online backup or not (Yes, No, No internet service)
# DeviceProtection  Whether the customer has device protection or not (Yes, No, No internet service)
# TechSupport       Whether the customer has tech support or not (Yes, No, No internet service)
# StreamingTV       Whether the customer has streaming TV or not (Yes, No, No internet service)
# StreamingMovies   Whether the customer has streaming movies or not (Yes, No, No internet service)
# Contract          The contract term of the customer (Month-to-month, One year, Two year)
# PaperlessBilling  Whether the customer has paperless billing or not (Yes, No)
# PaymentMethod     The customer’s payment method
# MonthlyCharges    The amount charged to the customer monthly
# TotalCharges      The total amount charged to the customer
# Churn             Whether the customer churned or not (Yes, No)
```

```
   customerID  gender  SeniorCitizen Partner Dependents  tenure PhoneService  \
0  7590-VHVEG  Female              0     Yes         No       1           No   
1  5575-GNVDE    Male              0      No         No      34          Yes   
2  3668-QPYBK    Male              0      No         No       2          Yes   
3  7795-CFOCW    Male              0      No         No      45           No   
4  9237-HQITU  Female              0      No         No       2          Yes   

      MultipleLines InternetService OnlineSecurity  ... DeviceProtection  \
0  No phone service             DSL             No  ...               No   
1                No             DSL            Yes  ...              Yes   
2                No             DSL            Yes  ...               No   
3  No phone service             DSL            Yes  ...              Yes   
4                No     Fiber optic             No  ...               No   

  TechSupport StreamingTV StreamingMovies        Contract Pape
```
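The preview does not show column types, and one is worth checking before encoding. Assuming this file matches the widely circulated IBM Telco churn sample, `TotalCharges` contains blank strings for a handful of customers with `tenure` of 0, so pandas loads the whole column as text. The encoding loop in the next cell would then treat it as categorical and replace charge amounts with integer codes in string-sort order rather than by value. A minimal sketch of the usual fix, on a small hypothetical frame rather than the real file:

```python
import pandas as pd

# Hypothetical rows mimicking the assumed quirk: a blank string in TotalCharges
# makes pandas store the whole column as text instead of numbers.
df = pd.DataFrame({
    "tenure": [0, 1, 34],
    "TotalCharges": [" ", "29.85", "1889.5"],
})
was_numeric = pd.api.types.is_numeric_dtype(df["TotalCharges"])  # False: text column

# Coerce to numbers; entries that cannot be parsed (the blank) become NaN.
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")

# A customer with tenure 0 has not been billed yet, so 0.0 is one reasonable fill;
# dropping those rows is the other common choice.
df["TotalCharges"] = df["TotalCharges"].fillna(0.0)

print(df.dtypes)
print(df)
```

On the real data, the same coercion would run on `data` right after `read_csv` and before the encoding loop, so `TotalCharges` stays a numeric feature.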


```python
# Drop customerID: it is a unique identifier with no predictive value
data.drop('customerID', axis=1, inplace=True)

# Convert the target column 'Churn' to binary (Yes=1, No=0)
data['Churn'] = data['Churn'].map({'Yes': 1, 'No': 0})

# Label-encode every remaining text (object-dtype) column for simplicity.
# scikit-learn's LogisticRegression accepts only numeric input, so text values
# such as "Female"/"Male" must become numbers (here 0/1).
# LabelEncoder maps each column's categories to integers 0..k-1 in sorted order.
# Caveat: for columns with more than two categories the codes impose an arbitrary
# order that a linear model treats as meaningful
# (e.g. InternetService: DSL=0, Fiber optic=1, No=2).
for col in data.select_dtypes(include='object').columns:
    data[col] = LabelEncoder().fit_transform(data[col])

# Separate features and target
X = data.drop('Churn', axis=1)
y = data['Churn']
```

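The label-encoding shortcut above is workable for two-valued columns, but for columns like `InternetService` the integer codes suggest an ordering that does not exist. scikit-learn's documentation describes `LabelEncoder` as intended for target values rather than input features; a common alternative for nominal features is one-hot encoding. A minimal sketch with `pandas.get_dummies`, on a toy column built from the categories in the data dictionary (the rows are illustrative, not taken from the dataset):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy column using the InternetService categories listed in the data dictionary.
internet = pd.Series(["DSL", "Fiber optic", "No", "DSL"], name="InternetService")

# LabelEncoder: one integer per category, assigned in sorted order (DSL=0, Fiber optic=1, No=2).
# A linear model would read "Fiber optic" as sitting halfway between "DSL" and "No".
codes = LabelEncoder().fit_transform(internet)

# One-hot: one 0/1 column per category. drop_first=True removes the first category
# (DSL), which becomes the baseline absorbed by the intercept.
dummies = pd.get_dummies(internet, prefix="InternetService", drop_first=True, dtype=int)

print(codes)
print(dummies)
```

Applied to the full frame, the same idea would pass the text columns to `pd.get_dummies(data, columns=..., drop_first=True, dtype=int)` instead of looping with `LabelEncoder`; the feature count grows, but each coefficient then describes one category relative to its baseline.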

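```python
# Hold out 20% of customers for testing; stratify=y keeps the churn rate equal in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit a logistic regression; max_iter is raised far above the default (100),
# presumably so the default lbfgs solver converges on these unscaled features
model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)

# Hard class predictions and predicted churn probabilities (column 1 = class 1)
y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)[:, 1]

# Metrics
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))
print(f"ROC AUC Score: {roc_auc_score(y_test, y_proba):.2f}")
```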


```
Confusion Matrix:
 [[925 110]
 [177 197]]

Classification Report:
               precision    recall  f1-score   support

           0       0.84      0.89      0.87      1035
           1       0.64      0.53      0.58       374

    accuracy                           0.80      1409
   macro avg       0.74      0.71      0.72      1409
weighted avg       0.79      0.80      0.79      1409

ROC AUC Score: 0.84
```
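For the stated objective, the churn row (class 1) of the report matters most: a recall of 0.53 means the model flags about half of actual churners and misses 177 of the 374 in the test set. The class-1 figures can be reproduced directly from the printed confusion matrix, whose rows are actual classes and columns are predicted classes, as in scikit-learn's `confusion_matrix`:

```python
import numpy as np

# Confusion matrix printed above: rows = actual (0, 1), columns = predicted (0, 1).
cm = np.array([[925, 110],
               [177, 197]])
tn, fp, fn, tp = cm.ravel()

precision_churn = tp / (tp + fp)  # of customers flagged as churners, share who churned
recall_churn = tp / (tp + fn)     # of actual churners, share the model flagged
f1_churn = 2 * precision_churn * recall_churn / (precision_churn + recall_churn)
accuracy = (tp + tn) / cm.sum()

print(f"precision={precision_churn:.2f} recall={recall_churn:.2f} "
      f"f1={f1_churn:.2f} accuracy={accuracy:.2f}")
```

These match the report: 197/307 ≈ 0.64 precision, 197/374 ≈ 0.53 recall, and 1122/1409 ≈ 0.80 accuracy. The 0.80 accuracy is driven largely by the majority class, so it says little about how well churners are caught.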

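Two adjustments are commonly tried from here, sketched on synthetic data because the real features are not reproduced in this notebook. First, the model above is fit on features with very different scales (tenure in months next to monthly and total charges), which is a likely reason `max_iter` had to be raised; placing `StandardScaler` in front of `LogisticRegression` in a pipeline usually lets the solver converge with default settings. Second, `predict()` effectively applies a 0.5 probability cut-off, and lowering it trades precision for churn recall. The `make_classification` data, the roughly 27% positive rate (chosen to resemble 374/1409), and the thresholds 0.5/0.4/0.3 are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the churn data: roughly 27% positives (illustrative only).
X_demo, y_demo = make_classification(
    n_samples=5000, n_features=15, n_informative=6, weights=[0.73], random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Scale features before the logistic regression; the pipeline fits the scaler on
# training data only and reuses those statistics when predicting.
demo_model = make_pipeline(StandardScaler(), LogisticRegression())
demo_model.fit(X_tr, y_tr)
proba = demo_model.predict_proba(X_te)[:, 1]

# Compare cut-offs: 0.5 is roughly what predict() does; lower values flag more customers.
results = {}
for threshold in (0.5, 0.4, 0.3):
    pred = (proba >= threshold).astype(int)
    results[threshold] = (precision_score(y_te, pred), recall_score(y_te, pred))
    print(f"threshold={threshold:.1f}  precision={results[threshold][0]:.2f}  "
          f"recall={results[threshold][1]:.2f}")
```

Lowering the threshold can only keep or raise recall, because every customer flagged at 0.5 is also flagged at 0.3. In practice the cut-off would be chosen on a validation split or through cross-validation rather than on the test set, weighing the cost of a retention offer against the cost of a lost customer.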