# Business Context & Objectives

Shareholders at the telecom company are concerned about declining customer retention and its effect on recurring revenue.

##### **Key metrics to track**
- Overall churn rate
- Monthly recurring revenue (MRR) at risk
- Customer lifetime value (CLV)
- High-value customer retention rates
- Churn rates by customer segment
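
As a rough illustration of how some of these metrics could be computed from the dataset's `Churn`, `MonthlyCharges`, and `Contract` columns, here is a minimal sketch. The toy DataFrame, the choice of contract type as the segment, and the definition of MRR at risk (the sum of monthly charges billed to churned customers) are assumptions made for illustration, not definitions taken from the analysis itself.

```python
import pandas as pd

# Toy data that mimics three columns of the Telco dataset (values are made up)
toy_df = pd.DataFrame({
    'Contract': ['Month-to-month', 'Month-to-month', 'One year', 'Two year'],
    'MonthlyCharges': [70.0, 50.0, 60.0, 20.0],
    'Churn': ['Yes', 'No', 'Yes', 'No'],
})

# Boolean flag: True for customers who churned
churned = toy_df['Churn'] == 'Yes'

# Overall churn rate: share of customers flagged as churned
overall_churn_rate = churned.mean()

# MRR at risk (assumed definition): monthly charges billed to churned customers
mrr_at_risk = toy_df.loc[churned, 'MonthlyCharges'].sum()

# Churn rate by segment, here using contract type as the segment
churn_by_contract = churned.groupby(toy_df['Contract']).mean()

print(overall_churn_rate, mrr_at_risk)
print(churn_by_contract)
```

The same pattern should extend to other segments (for example `PaymentMethod` or `InternetService`) by swapping the grouping column.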

##### **Questions to Answer:**
1. Who is churning and what segments are most at risk?
2. Why are customers leaving — what patterns emerge?
3. What is the financial impact of churn?
4. Which retention strategies give the best ROI?
5. Can we predict future churn to target customers before they leave?

##### **Our goal is to deliver:**
- A churn risk model
- Clear insights into churn drivers
- Data-driven recommendations to increase retention and profitability


# Data Cleaning

In [1]:
import pandas as pd

In [None]:
churn_df = pd.read_csv('data/Telco-Customer-Churn.csv')

### Understanding the Structure

In [None]:
# Shape of the dataset (rows, columns)
print(churn_df.shape)
# Column names
print(churn_df.columns)
# First 5 rows of the dataset
churn_df.head()

(7043, 21)
Index(['customerID', 'gender', 'SeniorCitizen', 'Partner', 'Dependents',
       'tenure', 'PhoneService', 'MultipleLines', 'InternetService',
       'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport',
       'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling',
       'PaymentMethod', 'MonthlyCharges', 'TotalCharges', 'Churn'],
      dtype='object')


| | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No |
| 1 | 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | ... | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.5 | No |
| 2 | 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | ... | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes |
| 3 | 7795-CFOCW | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | ... | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.3 | 1840.75 | No |
| 4 | 9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 70.7 | 151.65 | Yes |


In [15]:
# Getting the data types of the columns
print(churn_df.dtypes)

customerID           object
gender               object
SeniorCitizen         int64
Partner              object
Dependents           object
tenure                int64
PhoneService         object
MultipleLines        object
InternetService      object
OnlineSecurity       object
OnlineBackup         object
DeviceProtection     object
TechSupport          object
StreamingTV          object
StreamingMovies      object
Contract             object
PaperlessBilling     object
PaymentMethod        object
MonthlyCharges      float64
TotalCharges         object
Churn                object
dtype: object
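
`TotalCharges` is reported as `object` even though it holds monetary amounts, which suggests that at least one entry cannot be parsed as a number. One way to locate such entries is sketched below. The short Series is a made-up stand-in for `churn_df['TotalCharges']`, and the blank entry in it is an assumption about what the offending values might look like.

```python
import pandas as pd

# Made-up stand-in for churn_df['TotalCharges']; the blank entry is an assumption
total_charges = pd.Series(['29.85', '1889.5', ' ', '108.15'])

# errors='coerce' turns anything that cannot be parsed into NaN
parsed = pd.to_numeric(total_charges, errors='coerce')

# Entries that failed to parse (and were not missing to begin with)
unparseable = total_charges[parsed.isna() & total_charges.notna()]
print(unparseable.tolist())
```

Running the same check on the real column would show which raw values block the conversion and how many rows they affect.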


### Fixing Data Type Issues

In [None]:
# TotalCharges is stored as strings, but it needs to be a float.
# Blank entries (' ') are replaced with '0.0' before converting the column.
churn_df['TotalCharges'] = churn_df['TotalCharges'].replace(' ', '0.0').astype(float)

In [52]:
# Drop rows where TotalCharges is 0.0. Only 11 of the 7,043 rows are affected,
# so the impact on the analysis should be negligible.
zero_charge_idx = churn_df[churn_df['TotalCharges'] == 0.0].index
churn_df.drop(zero_charge_idx, inplace=True)
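
Before discarding these rows, it may be worth checking whether the zero totals belong to brand-new customers (a `tenure` of 0), which would explain why nothing has been billed yet. The sketch below runs that check on made-up data; whether the pattern holds in the real dataset is something to verify rather than assume.

```python
import pandas as pd

# Made-up sample using the dataset's tenure and TotalCharges column names
sample_df = pd.DataFrame({
    'tenure': [0, 12, 0, 3],
    'TotalCharges': [0.0, 840.0, 0.0, 150.5],
})

# Boolean mask of rows with a zero total
zero_total = sample_df['TotalCharges'] == 0.0

# True only if every zero-total row also has zero tenure
all_new_customers = (sample_df.loc[zero_total, 'tenure'] == 0).all()

# Non-mutating alternative to drop(..., inplace=True)
cleaned_df = sample_df[~zero_total].reset_index(drop=True)
print(all_new_customers, len(cleaned_df))
```

Filtering with a boolean mask returns a new DataFrame and leaves the original untouched, which can make it easier to re-run notebook cells out of order.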