<div class="alert alert-block alert-success">
    <h1 align="center">Telecom Customer Churn Modelling</h1>
    <h4 align="center" >Alireza Esmaeilpour</h4>
    <h6 align="center"><a href="https://alireza-esp.ir/">Website</a></h6>
    <h6 align="center"><a href="https://github.com/Alireza-Esp">Github</a></h6>
    <h6 align="center"><a href="https://www.kaggle.com/alirezaesmaeilpour">Kaggle</a></h6>
</div>

# 🔵 Import Libraries

In [1]:
import numpy as np
import pandas as pd

import warnings
warnings.filterwarnings('ignore')

# 🔵 Load the data

In [2]:
data = pd.read_csv("https://github.com/Alireza-Esp/Telecom-Customer-Churn-Modelling/raw/refs/heads/main/data/telecom-customer-churn-v0.csv")

In [3]:
data

Unnamed: 0,Customer ID,City,Latitude,Longitude,Gender,Partner,Dependents,Tenure Months,Phone Service,Multiple Lines,...,Streaming Music,Unlimited Data,Total Refunds,Total Extra Data Charges,Total Long Distance Charges,Total Revenue,Satisfaction Score,Customer Status,Churn Category,Churn Reason
0,3668-QPYBK,Los Angeles,33.964131,-118.272783,Male,No,No,2,Yes,No,...,No,Yes,0.00,0,20.94,129.09,1,Churned,Competitor,Competitor made better offer
1,9237-HQITU,Los Angeles,34.059281,-118.307420,Female,No,Yes,2,Yes,No,...,No,Yes,0.00,0,18.24,169.89,2,Churned,Other,Moved
2,9305-CDSKC,Los Angeles,34.048013,-118.293953,Female,No,Yes,8,Yes,Yes,...,Yes,Yes,0.00,0,97.20,917.70,3,Churned,Other,Moved
3,7892-POOKP,Los Angeles,34.062125,-118.315709,Female,Yes,Yes,28,Yes,Yes,...,Yes,Yes,0.00,0,136.92,3182.97,3,Churned,Other,Moved
4,0280-XJGEX,Los Angeles,34.039224,-118.266293,Male,No,Yes,49,Yes,Yes,...,Yes,Yes,0.00,0,2172.17,7208.47,1,Churned,Competitor,Competitor had better devices
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7038,2569-WGERO,Landers,34.341737,-116.539416,Female,No,No,72,Yes,No,...,No,No,19.31,0,1639.44,3039.53,5,Stayed,,
7039,6840-RESVB,Adelanto,34.667815,-117.536183,Male,Yes,Yes,24,Yes,Yes,...,Yes,Yes,48.23,0,865.20,2807.47,3,Stayed,,
7040,2234-XADUH,Amboy,34.559882,-115.637164,Female,Yes,Yes,72,Yes,Yes,...,Yes,Yes,45.38,0,2135.52,9453.04,4,Stayed,,
7041,4801-JZAZL,Angelus Oaks,34.167800,-116.864330,Female,Yes,Yes,11,No,No phone service,...,No,Yes,27.24,0,0.00,319.21,4,Stayed,,


# 🔵 EDA

In [4]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 39 columns):
 #   Column                             Non-Null Count  Dtype  
---  ------                             --------------  -----  
 0   Customer ID                        7043 non-null   object 
 1   City                               7043 non-null   object 
 2   Latitude                           7043 non-null   float64
 3   Longitude                          7043 non-null   float64
 4   Gender                             7043 non-null   object 
 5   Partner                            7043 non-null   object 
 6   Dependents                         7043 non-null   object 
 7   Tenure Months                      7043 non-null   int64  
 8   Phone Service                      7043 non-null   object 
 9   Multiple Lines                     7043 non-null   object 
 10  Internet Service                   7043 non-null   object 
 11  Online Security                    7043 non-null   objec

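🟣 The "Offer", "Churn Category" and "Churn Reason" columns contain null values. In the rows previewed above, the two churn columns are empty for customers whose status is "Stayed".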
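
A minimal sketch of how the null counts per column could be checked directly, instead of reading them off `data.info()`. The `toy` frame and its values are invented stand-ins for the real table; only the column names come from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real table; the values are invented.
toy = pd.DataFrame({
    "Offer": ["Offer A", np.nan, np.nan],
    "Churn Category": ["Competitor", np.nan, np.nan],
    "Customer Status": ["Churned", "Stayed", "Stayed"],
})

# Count missing values per column and keep only the columns that have any.
null_counts = toy.isna().sum()
cols_with_nulls = null_counts[null_counts > 0]
print(cols_with_nulls)
```

On the real data the same two lines would be applied to `data` in place of `toy`.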

🟣 The "Total Charges" column has dtype object and should be converted to float64. The rows where it holds a single space (" ") have to be removed first.

In [5]:
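# Drop the rows whose "Total Charges" is a single space, then cast to float
data.drop(index=data[data["Total Charges"] == " "].index, inplace=True)
data["Total Charges"] = data["Total Charges"].astype("float64")
# Renumber the rows from 0 without keeping the old index as a column
data.reset_index(drop=True, inplace=True)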
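
The same cleaning step could also be sketched with `pd.to_numeric(..., errors="coerce")`, which turns any unparseable entry into NaN rather than matching only the exact string `" "`. This is an alternative shown under assumptions, not what the cell above does; the `toy` frame and its values are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in: one entry is a single space, as in the real column.
toy = pd.DataFrame({"Total Charges": ["29.85", " ", "1889.5"]})

# Entries that cannot be parsed as numbers become NaN.
toy["Total Charges"] = pd.to_numeric(toy["Total Charges"], errors="coerce")

# Drop the rows that failed to parse and renumber the index from 0.
toy = toy.dropna(subset=["Total Charges"]).reset_index(drop=True)
print(toy.dtypes)
```

The trade-off is that coercion silently drops every malformed entry, so it is worth counting the NaN values before calling `dropna`.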

In [6]:
data.describe()

Unnamed: 0,Latitude,Longitude,Tenure Months,Monthly Charges,Total Charges,CLTV,Number of Referrals,Avg Monthly Long Distance Charges,Avg Monthly GB Download,Total Refunds,Total Extra Data Charges,Total Long Distance Charges,Total Revenue,Satisfaction Score
count,7032.0,7032.0,7032.0,7032.0,7032.0,7032.0,7032.0,7032.0,7032.0,7032.0,7032.0,7032.0,7032.0,7032.0
mean,36.283307,-119.799215,32.421786,64.798208,2283.300441,4401.445108,1.949232,22.963471,20.531712,1.965252,6.871445,749.957096,3038.16373,3.243885
std,2.456118,2.157588,24.54526,30.085974,2266.771362,1182.414266,3.001324,15.449368,20.419561,7.908412,25.123141,847.025001,2865.830234,1.202019
min,32.555828,-124.301372,1.0,18.25,18.8,2003.0,0.0,0.0,0.0,0.0,0.0,0.0,21.36,1.0
25%,34.030915,-121.815412,9.0,35.5875,401.45,3469.75,0.0,9.21,3.0,0.0,0.0,70.5675,607.275,3.0
50%,36.391777,-119.73541,29.0,70.35,1397.475,4527.5,0.0,22.89,17.0,0.0,0.0,403.875,2111.3,3.0
75%,38.227285,-118.043237,55.0,89.8625,3794.7375,5381.0,3.0,36.4125,27.0,0.0,0.0,1192.4325,4808.7975,4.0
max,41.962127,-114.192901,72.0,118.75,8684.8,6500.0,11.0,49.99,85.0,49.79,150.0,3564.72,11979.34,5.0


🟣 Some columns have a minimum of 0.0 (the "min" row); it is not clear whether these zeros are real values.
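
One way to look into the zeros is to measure what share of each numeric column is exactly zero. The sketch below assumes a small invented `toy` frame; only the column names are taken from the dataset.

```python
import pandas as pd

# Hypothetical stand-in for the real table; the values are invented.
toy = pd.DataFrame({
    "Total Refunds": [0.0, 19.31, 0.0],
    "Total Extra Data Charges": [0, 0, 150],
    "Monthly Charges": [18.25, 70.35, 118.75],
    "City": ["Los Angeles", "Landers", "Amboy"],
})

# Keep the numeric columns and compute the fraction of zeros in each.
numeric = toy.select_dtypes(include="number")
zero_share = (numeric == 0).mean()

# Show only the columns that contain at least one zero.
zero_share = zero_share[zero_share > 0]
print(zero_share)
```

A column that is mostly zeros, as the quartiles of "Total Refunds" and "Total Extra Data Charges" suggest, is more likely to mean "no charge" than a data-entry problem, but that reading is an assumption to confirm against the data dictionary.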

In [7]:
# Print the value counts of every object (string) column
for col in data.select_dtypes(include="object").columns:
    print(data[col].value_counts())
    print("---------------------------------------")

Customer ID
3186-AJIEK    1
3668-QPYBK    1
9237-HQITU    1
9305-CDSKC    1
7892-POOKP    1
             ..
5380-WJKOV    1
6047-YHPVI    1
8773-HHUOZ    1
8665-UTDHZ    1
6467-CHFZW    1
Name: count, Length: 7032, dtype: int64
---------------------------------------
City
Los Angeles      304
San Diego        150
San Jose         112
Sacramento       108
San Francisco    104
                ... 
Manton             4
Ben Lomond         3
Cupertino          3
Independence       3
Redcrest           3
Name: count, Length: 1129, dtype: int64
---------------------------------------
Gender
Male      3549
Female    3483
Name: count, dtype: int64
---------------------------------------
Partner
No     3639
Yes    3393
Name: count, dtype: int64
---------------------------------------
Dependents
No     5412
Yes    1620
Name: count, dtype: int64
---------------------------------------
Phone Service
Yes    6352
No      680
Name: count, dtype: int64
---------------------------------------
Multiple L

# 🔵 Save the final raw data

In [8]:
# Let pandas open the file itself so that lineterminator="\n" is written unchanged on every platform
data.to_csv("../data/telecom-customer-churn-v1.csv", index=False, encoding="utf-8", lineterminator="\n")
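
After saving, a quick round-trip check can confirm that the file reads back with the same columns and values. This is a sketch only: it writes an invented `toy` frame to a temporary directory rather than to the `../data/` path used above.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the cleaned table; the values are invented.
toy = pd.DataFrame({
    "Customer ID": ["0001-AAAAA", "0002-BBBBB"],
    "Total Charges": [29.85, 1889.5],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy.csv")
    # Same options as the save step: no index column, "\n" line endings.
    toy.to_csv(path, index=False, lineterminator="\n")
    reloaded = pd.read_csv(path)

# Raises an AssertionError if columns, dtypes or values differ.
pd.testing.assert_frame_equal(toy, reloaded)
```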