# Customer Churn Prediction for a Telecom Provider
__Core Problem:__

Proactively identify customers at high risk of canceling their service, allowing the business to implement targeted retention strategies to reduce revenue loss.

## 01-Setup and Initial Exploration

```python
# Import the necessary libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Load the dataset
raw_url = "https://raw.githubusercontent.com/Javadhamidi99/Customer-Churn-Prediction-for-a-Telecome-Company--Business-Marketing/refs/heads/main/Data/WA_Fn-UseC_-Telco-Customer-Churn.csv"
df = pd.read_csv(raw_url)

# Show the first rows of the dataset
df.head()
```

| | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No |
| 1 | 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | ... | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.5 | No |
| 2 | 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | ... | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes |
| 3 | 7795-CFOCW | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | ... | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.3 | 1840.75 | No |
| 4 | 9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 70.7 | 151.65 | Yes |
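Since the prediction target is `Churn`, it is worth checking the class balance early: when churners are a minority, plain accuracy can look high even for a model that never predicts churn. The sketch below is a minimal illustration that uses a hypothetical five-row `sample_df` in place of the real `df`; the proportions it prints describe only that toy data.

```python
import pandas as pd

# Hypothetical stand-in for the Telco DataFrame; only the Churn column matters here.
# In the notebook, apply the same lines to the `df` loaded above.
sample_df = pd.DataFrame({"Churn": ["No", "No", "Yes", "No", "Yes"]})

# Share of customers in each class (the proportions sum to 1.0)
churn_share = sample_df["Churn"].value_counts(normalize=True)
print(churn_share)

# Binary target: 1 = churned, 0 = stayed
y = (sample_df["Churn"] == "Yes").astype(int)
print(f"Churn rate: {y.mean():.1%}")
```

On the real data, the same `value_counts(normalize=True)` call shows how skewed the target is, which helps when choosing evaluation metrics such as recall or F1 for the churn class.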


```python
# Summarize column types and non-null counts
df.info()

# Statistical summary of the numeric columns
df.describe()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   customerID        7043 non-null   object 
 1   gender            7043 non-null   object 
 2   SeniorCitizen     7043 non-null   int64  
 3   Partner           7043 non-null   object 
 4   Dependents        7043 non-null   object 
 5   tenure            7043 non-null   int64  
 6   PhoneService      7043 non-null   object 
 7   MultipleLines     7043 non-null   object 
 8   InternetService   7043 non-null   object 
 9   OnlineSecurity    7043 non-null   object 
 10  OnlineBackup      7043 non-null   object 
 11  DeviceProtection  7043 non-null   object 
 12  TechSupport       7043 non-null   object 
 13  StreamingTV       7043 non-null   object 
 14  StreamingMovies   7043 non-null   object 
 15  Contract          7043 non-null   object 
 16  PaperlessBilling  7043 non-null   object 
```


| | SeniorCitizen | tenure | MonthlyCharges |
|---|---|---|---|
| count | 7043.0 | 7043.0 | 7043.0 |
| mean | 0.162147 | 32.371149 | 64.761692 |
| std | 0.368612 | 24.559481 | 30.090047 |
| min | 0.0 | 0.0 | 18.25 |
| 25% | 0.0 | 9.0 | 35.5 |
| 50% | 0.0 | 29.0 | 70.35 |
| 75% | 0.0 | 55.0 | 89.85 |
| max | 1.0 | 72.0 | 118.75 |
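By default `describe()` summarizes only numeric columns, and `info()` counts a string such as `" "` as non-null, so neither output reveals blank entries hidden in text columns. The sketch below shows one way to check both points on a hypothetical three-row `sample_df` that mimics the raw CSV (values invented for illustration); on the real data, the same lines would run against `df`.

```python
import pandas as pd

# Hypothetical mini-sample mimicking the raw CSV: TotalCharges arrives as text,
# and one brand-new customer has a blank (" ") instead of a number.
sample_df = pd.DataFrame({
    "tenure": [1, 34, 0],
    "Contract": ["Month-to-month", "One year", "Two year"],
    "TotalCharges": ["29.85", "1889.5", " "],
})

# Summarize the non-numeric columns (count, unique, top, freq)
print(sample_df.describe(exclude="number"))

# info() treats " " as a valid value, so look for whitespace-only strings explicitly.
text_cols = [c for c in sample_df.columns
             if not pd.api.types.is_numeric_dtype(sample_df[c])]
blank_counts = {
    col: int(sample_df[col].str.strip().eq("").sum()) for col in text_cols
}
print(blank_counts)
```

A non-zero count flags a column that needs cleaning before it can be converted to a numeric type.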


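## 01-01 Small Corrections to the Dataset
`TotalCharges` holds monetary amounts, yet it is missing from the `describe()` summary above because pandas read it as `object` (text) rather than `float64`. Some rows contain a single space (`" "`) instead of a number, so we replace those blanks with 0 and convert the column to `float64`.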

```python
# Count the blank entries (a single space) that keep TotalCharges from being numeric
print((df['TotalCharges'] == " ").sum())

# Replace the blank values with '0'
df['TotalCharges'] = df['TotalCharges'].replace(" ", '0')

# Convert the column to float64
df['TotalCharges'] = df['TotalCharges'].astype('float64')
```
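The `replace(" ", '0')` step only catches values that are exactly one space. A more defensive variant, sketched below, is `pd.to_numeric(..., errors="coerce")`, which turns any unparseable entry into `NaN`. It also makes it easy to inspect the affected rows first: filling with 0 is a reasonable choice if those customers have `tenure` 0 (the summary above does report a minimum `tenure` of 0). The `sample_df` here is hypothetical, with invented values.

```python
import pandas as pd

# Hypothetical raw values, including a single space and an empty string.
sample_df = pd.DataFrame({
    "tenure": [1, 34, 0, 0],
    "TotalCharges": ["29.85", "1889.5", " ", ""],
})

# errors="coerce" converts anything that is not a valid number into NaN
# instead of raising, so every kind of blank is caught, not only " ".
total = pd.to_numeric(sample_df["TotalCharges"], errors="coerce")
missing = total.isna()

# Inspect the affected rows before deciding how to fill them
print(sample_df.loc[missing, "tenure"].tolist())

# Same choice as above: treat a missing total as 0 (nothing billed yet)
sample_df["TotalCharges"] = total.fillna(0.0)
print(sample_df["TotalCharges"].dtype)
```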

## 02-Exploratory Data Analysis (EDA)
This is the data-understanding and feature-engineering stage. Acting as "data detectives", we uncover the story the data tells, its limitations, and how best to prepare it for analysis. Our goal is to select the most informative features and to remove or refine any parts of the data that might hinder modeling.
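One simple way to gauge whether a categorical feature is informative is to compare the churn rate across its categories: a feature whose groups churn at very different rates is a promising candidate. The sketch below does this for `Contract` on a hypothetical six-row `sample_df` with invented values; the rates it prints say nothing about the real dataset.

```python
import pandas as pd

# Hypothetical mini-sample; in the notebook, use the full `df` instead.
sample_df = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "One year",
                 "Two year", "Month-to-month", "One year"],
    "Churn": ["Yes", "No", "No", "No", "Yes", "Yes"],
})

# Encode the target as 0/1 so that a group mean equals the churn rate
sample_df["ChurnFlag"] = (sample_df["Churn"] == "Yes").astype(int)

# Churn rate and group size per contract type, highest rate first
churn_by_contract = (
    sample_df.groupby("Contract")["ChurnFlag"]
    .agg(churn_rate="mean", customers="size")
    .sort_values("churn_rate", ascending=False)
)
print(churn_by_contract)
```

Reporting the group size next to the rate guards against over-reading a high rate from a handful of customers; the same pattern can be repeated for other categorical columns such as `PaymentMethod` or `InternetService`.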
