# Analysis of Service Churn for Telco Customers

Project objectives and aims
- Identify the key services
- Identify the churn rate of each service
- Create a visual representation of the results using Seaborn

## Information on Dataset

Using the Telco Customer Churn data obtained from Kaggle: https://www.kaggle.com/datasets/blastchar/telco-customer-churn?resource=download.
> The data was downloaded onto a local machine on 27/01/2025.

Features of the dataset 
- Customer ID: Unique code to each customer
- Gender: Whether the customer is a male or a female
- Senior Citizen: Whether the customer is a senior citizen or not (1, 0)
- Partner: Whether the customer has a partner or not (Yes, No)
- Dependents: Whether the customer has dependents or not (Yes, No)
- Tenure: Number of months the customer has stayed with the company
- PhoneService: Whether the customer has a phone service or not (Yes, No)
- MultipleLines: Whether the customer has multiple lines or not (Yes, No, No phone service)
- InternetService: Customer’s internet service provider (DSL, Fiber optic, No)
- OnlineSecurity: Whether the customer has online security or not (Yes, No, No internet service)
- OnlineBackup: Whether the customer has online backup or not (Yes, No, No internet service)
- DeviceProtection: Whether the customer has device protection or not (Yes, No, No internet service)
- TechSupport: Whether the customer has tech support or not (Yes, No, No internet service)
- StreamingTV: Whether the customer has streaming TV or not (Yes, No, No internet service)
- StreamingMovies: Whether the customer has streaming movies or not (Yes, No, No internet service)
- Contract: The contract term of the customer (Month-to-month, One year, Two year)
- PaperlessBilling: Whether the customer has paperless billing or not (Yes, No)
- PaymentMethod: The customer’s payment method (Electronic check, Mailed check, Bank transfer (automatic), Credit card (automatic))
- MonthlyCharges: The amount charged to the customer monthly
- TotalCharges: The total amount charged to the customer
- Churn: Whether the customer churned or not (Yes or No)

Features of interest:  
- PhoneService: Whether the customer has a phone service or not (Yes, No)
- MultipleLines: Whether the customer has multiple lines or not (Yes, No, No phone service)
- InternetService: Customer’s internet service provider (DSL, Fiber optic, No)
- OnlineSecurity: Whether the customer has online security or not (Yes, No, No internet service)
- OnlineBackup: Whether the customer has online backup or not (Yes, No, No internet service)
- DeviceProtection: Whether the customer has device protection or not (Yes, No, No internet service)
- TechSupport: Whether the customer has tech support or not (Yes, No, No internet service)
- StreamingTV: Whether the customer has streaming TV or not (Yes, No, No internet service)
- StreamingMovies: Whether the customer has streaming movies or not (Yes, No, No internet service)
- Churn: Whether the customer churned or not (Yes or No)

## Loading dataset

Import the required libraries.

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

Loading the data

In [3]:
df = pd.read_csv('Telco-Customer-Churn-Data.csv')

## Exploring the dataset

Display the first 5 rows of the dataset to get a feel for how the dataset looks.

In [4]:
df.head()

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,0,Yes,No,1,No,No phone service,DSL,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,5575-GNVDE,Male,0,No,No,34,Yes,No,DSL,Yes,...,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No
2,3668-QPYBK,Male,0,No,No,2,Yes,No,DSL,Yes,...,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,7795-CFOCW,Male,0,No,No,45,No,No phone service,DSL,Yes,...,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.3,1840.75,No
4,9237-HQITU,Female,0,No,No,2,Yes,No,Fiber optic,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,70.7,151.65,Yes


In [5]:
# Number of rows and columns in the dataset.
df.shape

(7043, 21)

### Getting more information on the data

Explore the dataset and identify the data type of each column.

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   customerID        7043 non-null   object 
 1   gender            7043 non-null   object 
 2   SeniorCitizen     7043 non-null   int64  
 3   Partner           7043 non-null   object 
 4   Dependents        7043 non-null   object 
 5   tenure            7043 non-null   int64  
 6   PhoneService      7043 non-null   object 
 7   MultipleLines     7043 non-null   object 
 8   InternetService   7043 non-null   object 
 9   OnlineSecurity    7043 non-null   object 
 10  OnlineBackup      7043 non-null   object 
 11  DeviceProtection  7043 non-null   object 
 12  TechSupport       7043 non-null   object 
 13  StreamingTV       7043 non-null   object 
 14  StreamingMovies   7043 non-null   object 
 15  Contract          7043 non-null   object 
 16  PaperlessBilling  7043 non-null   object 



Note:
* The dataset has 7043 observations and 21 variables.
* No column reports missing (null) values at this stage.
* TotalCharges has the object data type; since it holds charge amounts, it should be a numerical data type (float or int).
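A common reason for a numeric column to load as `object` is that some entries are not parseable numbers, for example blank strings. The sketch below uses a small made-up Series (not rows taken from the dataset) to show how `pd.to_numeric` with `errors='coerce'` turns such entries into `NaN`, which is why missing values can appear only after the conversion.

```python
import pandas as pd

# Hypothetical values for illustration; the blank string stands in for an unparseable entry.
raw_charges = pd.Series(["29.85", "1889.5", " ", "108.15"])

# errors="coerce" replaces anything that cannot be parsed with NaN instead of raising.
numeric_charges = pd.to_numeric(raw_charges, errors="coerce")

print(numeric_charges.dtype)
print(numeric_charges.isna().sum())
```

Counting the `NaN` values right after the conversion is a quick way to see how many entries were affected.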

In [7]:
# Change the data type of TotalCharges from object to float; entries that cannot be parsed as numbers become NaN (errors='coerce').
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors='coerce')
df["TotalCharges"] = df["TotalCharges"].astype("float")

# Print the columns and their data types to confirm that TotalCharges is now a numerical data type.
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   customerID        7043 non-null   object 
 1   gender            7043 non-null   object 
 2   SeniorCitizen     7043 non-null   int64  
 3   Partner           7043 non-null   object 
 4   Dependents        7043 non-null   object 
 5   tenure            7043 non-null   int64  
 6   PhoneService      7043 non-null   object 
 7   MultipleLines     7043 non-null   object 
 8   InternetService   7043 non-null   object 
 9   OnlineSecurity    7043 non-null   object 
 10  OnlineBackup      7043 non-null   object 
 11  DeviceProtection  7043 non-null   object 
 12  TechSupport       7043 non-null   object 
 13  StreamingTV       7043 non-null   object 
 14  StreamingMovies   7043 non-null   object 
 15  Contract          7043 non-null   object 
 16  PaperlessBilling  7043 non-null   object 


## Data Wrangling

### Searching for duplicate values

In [8]:
df.duplicated().value_counts()

False    7043
Name: count, dtype: int64

No duplicates found.

### Searching for missing values

In [9]:
missing_data = df.isnull().sum()
missing_data

customerID           0
gender               0
SeniorCitizen        0
Partner              0
Dependents           0
tenure               0
PhoneService         0
MultipleLines        0
InternetService      0
OnlineSecurity       0
OnlineBackup         0
DeviceProtection     0
TechSupport          0
StreamingTV          0
StreamingMovies      0
Contract             0
PaperlessBilling     0
PaymentMethod        0
MonthlyCharges       0
TotalCharges        11
Churn                0
dtype: int64

Based on the above result, TotalCharges has 11 missing values.
> This has no bearing on the features of interest, so the missing values are largely ignored in this analysis. They are explored below only to find the reason for the missing data.

#### Finding the rows with missing values

In [10]:
null_mask = df.isnull().any(axis=1)
null_rows = df[null_mask]

null_rows

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
488,4472-LVYGI,Female,0,Yes,Yes,0,No,No phone service,DSL,Yes,...,Yes,Yes,Yes,No,Two year,Yes,Bank transfer (automatic),52.55,,No
753,3115-CZMZD,Male,0,No,Yes,0,Yes,No,No,No internet service,...,No internet service,No internet service,No internet service,No internet service,Two year,No,Mailed check,20.25,,No
936,5709-LVOEQ,Female,0,Yes,Yes,0,Yes,No,DSL,Yes,...,Yes,No,Yes,Yes,Two year,No,Mailed check,80.85,,No
1082,4367-NUYAO,Male,0,Yes,Yes,0,Yes,Yes,No,No internet service,...,No internet service,No internet service,No internet service,No internet service,Two year,No,Mailed check,25.75,,No
1340,1371-DWPAZ,Female,0,Yes,Yes,0,No,No phone service,DSL,Yes,...,Yes,Yes,Yes,No,Two year,No,Credit card (automatic),56.05,,No
3331,7644-OMVMY,Male,0,Yes,Yes,0,Yes,No,No,No internet service,...,No internet service,No internet service,No internet service,No internet service,Two year,No,Mailed check,19.85,,No
3826,3213-VVOLG,Male,0,Yes,Yes,0,Yes,Yes,No,No internet service,...,No internet service,No internet service,No internet service,No internet service,Two year,No,Mailed check,25.35,,No
4380,2520-SGTTA,Female,0,Yes,Yes,0,Yes,No,No,No internet service,...,No internet service,No internet service,No internet service,No internet service,Two year,No,Mailed check,20.0,,No
5218,2923-ARZLG,Male,0,Yes,Yes,0,Yes,No,No,No internet service,...,No internet service,No internet service,No internet service,No internet service,One year,Yes,Mailed check,19.7,,No
6670,4075-WKNIU,Female,0,Yes,Yes,0,Yes,Yes,DSL,No,...,Yes,Yes,Yes,No,Two year,No,Mailed check,73.35,,No


From the table above, the missing values belong to new customers at the beginning of their contract: they have zero tenure and consequently no total charge recorded.
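The conclusion above can also be checked programmatically rather than by eye. A minimal sketch, assuming a small hypothetical frame in which only the column names match the dataset: it tests whether the rows with a missing `TotalCharges` are exactly the rows with `tenure == 0`.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-frame; the values are made up for illustration.
sample = pd.DataFrame({
    "tenure": [0, 34, 0, 2],
    "TotalCharges": [np.nan, 1889.5, np.nan, 108.15],
})

missing = sample["TotalCharges"].isna()
zero_tenure = sample["tenure"] == 0

# True only when both conditions select exactly the same rows.
same_rows = bool((missing == zero_tenure).all())
print(same_rows)
```

Running the same comparison on `df` would confirm or refute the observation for the full dataset.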

## Analysis of the Services Aspect of the Dataset

### Selecting features of interest

In [11]:
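# .copy() makes Services_data independent of df, so later .loc assignments do not trigger a SettingWithCopyWarning.
Services_data = df[['PhoneService', 'MultipleLines', 'InternetService', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection',
                    'TechSupport','StreamingTV', 'StreamingMovies', 'Churn']].copy()
Services_data.head()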

Unnamed: 0,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Churn
0,No,No phone service,DSL,No,Yes,No,No,No,No,No
1,Yes,No,DSL,Yes,No,Yes,No,No,No,No
2,Yes,No,DSL,Yes,Yes,No,No,No,No,Yes
3,No,No phone service,DSL,Yes,No,Yes,Yes,No,No,No
4,Yes,No,Fiber optic,No,No,No,No,No,No,Yes


The two core services are Phone and Internet service. Multiple lines depends on Phone service: customers without phone service cannot have multiple lines. Likewise, the other services (Online Security, Online Backup, Device Protection, Tech Support, Streaming TV and Streaming Movies) depend on Internet service.
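The dependency between a core service and its add-ons can be verified with a cross-tabulation. The sketch below uses a few hypothetical rows (only the column names and category labels come from the dataset) to check that `MultipleLines` is 'No phone service' exactly when `PhoneService` is 'No'.

```python
import pandas as pd

# Hypothetical rows for illustration.
sample = pd.DataFrame({
    "PhoneService":  ["No", "Yes", "Yes", "No", "Yes"],
    "MultipleLines": ["No phone service", "No", "Yes", "No phone service", "Yes"],
})

# Rows: PhoneService values; columns: MultipleLines values.
print(pd.crosstab(sample["PhoneService"], sample["MultipleLines"]))

no_phone = sample["PhoneService"] == "No"
flagged = sample["MultipleLines"] == "No phone service"

# True when the two columns agree on every row.
dependency_holds = bool((no_phone == flagged).all())
print(dependency_holds)
```

The same check applies to the internet-dependent columns by comparing `InternetService == 'No'` with the 'No internet service' label.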

### Analysis of Phone Service

In [12]:
category_mapping = {'Yes':'Phone line','No':'No phone service'}
Services_data.loc[:, 'PhoneService'] = Services_data['PhoneService'].map(category_mapping)

In [13]:
Services_data['PhoneService'].value_counts().reset_index()

Unnamed: 0,PhoneService,count
0,Phone line,6361
1,No phone service,682


In [14]:
Services_data['MultipleLines'].value_counts().reset_index()

Unnamed: 0,MultipleLines,count
0,No,3390
1,Yes,2971
2,No phone service,682


In [15]:
# Number of phone services provided
Number_of_phone_services = (Services_data['PhoneService'].value_counts().reset_index().loc[0, 'count'] + 
                            Services_data['MultipleLines'].value_counts().reset_index().loc[1, 'count'])

Number_of_phone_services

9332

In [16]:
Service_Phone = Services_data[['PhoneService','Churn']].value_counts().reset_index()
Service_Phone

Unnamed: 0,PhoneService,Churn,count
0,Phone line,No,4662
1,Phone line,Yes,1699
2,No phone service,No,512
3,No phone service,Yes,170


In [17]:
Service_phone = Services_data[['PhoneService','MultipleLines','Churn']].value_counts().reset_index().sort_values(
    by = ['PhoneService','MultipleLines','Churn'], ignore_index = True)
Service_phone

Unnamed: 0,PhoneService,MultipleLines,Churn,count
0,No phone service,No phone service,No,512
1,No phone service,No phone service,Yes,170
2,Phone line,No,No,2541
3,Phone line,No,Yes,849
4,Phone line,Yes,No,2121
5,Phone line,Yes,Yes,850


In [18]:
Multiple_data = Service_phone[Service_phone['MultipleLines']!='No phone service']
Multiple_data

Unnamed: 0,PhoneService,MultipleLines,Churn,count
2,Phone line,No,No,2541
3,Phone line,No,Yes,849
4,Phone line,Yes,No,2121
5,Phone line,Yes,Yes,850


In [86]:
"""
Finding the percentage of customer that have multiple phone line 
out of the customer that have a phone line
"""

round(((Multiple_data[Multiple_data['MultipleLines'] == 'Yes']['count'].sum())/
(Multiple_data['count'].sum()))*100, 2)

46.71

In [19]:
# Find the churn rate of Phone services
round(((Services_data[['PhoneService','Churn']].value_counts().reset_index().loc[1, 'count'])/
      (Services_data['PhoneService'].value_counts().reset_index().loc[0, 'count']))*100, 2)

26.71

In [20]:
# Find the churn rate of customer with multiple lines
round(((Service_phone.loc[5, 'count'])/
       (Services_data['MultipleLines'].value_counts().reset_index().loc[1,'count']))*100, 2)

28.61

In [21]:
# Find the churn rate of customer without multiple lines
round(((Service_phone.loc[3, 'count'])/
       (Services_data['MultipleLines'].value_counts().reset_index().loc[0,'count']))*100, 2)

25.04

In [22]:
# Find the churn rate of customer without phone services
round(((Service_phone.loc[1, 'count'])/
       (Services_data['PhoneService'].value_counts().reset_index().loc[1, 'count']))*100, 2)

24.93
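The churn rates in this section index into `value_counts()` results by position (for example `.loc[1, 'count']`), which depends on how the counts happen to be ordered. A minimal sketch of a label-based alternative follows; the helper name `churn_rate_by` is hypothetical, and the counts are copied from the Service_phone table above.

```python
import pandas as pd

# Counts copied from the Service_phone table above.
phone_counts = pd.DataFrame({
    "MultipleLines": ["No phone service", "No phone service", "No", "No", "Yes", "Yes"],
    "Churn": ["No", "Yes", "No", "Yes", "No", "Yes"],
    "count": [512, 170, 2541, 849, 2121, 850],
})

def churn_rate_by(counts, group_col):
    """Return the churn rate (%) for each value of group_col in a counts table."""
    totals = counts.groupby(group_col)["count"].sum()
    churned = counts[counts["Churn"] == "Yes"].groupby(group_col)["count"].sum()
    return (churned / totals * 100).round(2)

rates = churn_rate_by(phone_counts, "MultipleLines")
print(rates)
```

Because the rates are looked up by label (`rates["Yes"]`), the result does not change if the row order of the counts table changes.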

### Analysis of Internet Service

In [23]:
Services_data['InternetService'].value_counts().reset_index()

Unnamed: 0,InternetService,count
0,Fiber optic,3096
1,DSL,2421
2,No,1526


In [24]:
Service_internet = (Services_data[['InternetService','Churn']].value_counts().reset_index()
                    .sort_values(by=['InternetService'], ignore_index = True))
Service_internet

Unnamed: 0,InternetService,Churn,count
0,DSL,No,1962
1,DSL,Yes,459
2,Fiber optic,No,1799
3,Fiber optic,Yes,1297
4,No,No,1413
5,No,Yes,113


In [25]:
internet_type = Service_internet[Service_internet['InternetService'] != 'No']
internet_type

Unnamed: 0,InternetService,Churn,count
0,DSL,No,1962
1,DSL,Yes,459
2,Fiber optic,No,1799
3,Fiber optic,Yes,1297


In [26]:
internet_data = {'InternetService': ['Has Internet Service','Has Internet Service', 'No Internet Services', 'No Internet Services'],
                    'Churn': ['No', 'Yes', 'No', 'Yes'],
                    'count': [3761, 1756, 1413, 113]
                   }
Services_Internet = pd.DataFrame(internet_data)

Services_Internet

Unnamed: 0,InternetService,Churn,count
0,Has Internet Service,No,3761
1,Has Internet Service,Yes,1756
2,No Internet Services,No,1413
3,No Internet Services,Yes,113
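The table above is built from hard-coded counts. As a sketch of an alternative, the same table can be derived from the per-type counts by mapping DSL and Fiber optic to a single label and summing. The names `label_map` and `collapsed` are hypothetical; the counts are copied from the Service_internet table above.

```python
import pandas as pd

# Counts copied from the Service_internet table above.
service_internet = pd.DataFrame({
    "InternetService": ["DSL", "DSL", "Fiber optic", "Fiber optic", "No", "No"],
    "Churn": ["No", "Yes", "No", "Yes", "No", "Yes"],
    "count": [1962, 459, 1799, 1297, 1413, 113],
})

# Collapse the two internet types into one "has internet" label.
label_map = {
    "DSL": "Has Internet Service",
    "Fiber optic": "Has Internet Service",
    "No": "No Internet Services",
}

collapsed = (
    service_internet
    .assign(InternetService=service_internet["InternetService"].map(label_map))
    .groupby(["InternetService", "Churn"], as_index=False)["count"]
    .sum()
)
print(collapsed)
```

Deriving the counts this way keeps the collapsed table in step with the data if the source file changes.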


In [27]:
# Find the churn rate of Internet services
round(((Service_internet.loc[1,'count']+Service_internet.loc[3,'count'])/
     (Services_data['InternetService'].value_counts().reset_index().loc[0,'count']+Services_data['InternetService'].value_counts().reset_index().loc[1,'count']))*100
     , 2)

31.83

In [28]:
# Creating/calculating the churn rate of the internet services
churn_table = {'Internet Services' : [], 'Churn_rate' : []}

# In Service_internet each service occupies two consecutive rows:
# row 2*i holds the Churn == 'No' count and row 2*i + 1 the Churn == 'Yes' count.

for i in range(int((len(Service_internet)/2))):
    churn_table['Internet Services'].append(Service_internet.loc[(i*2), 'InternetService'])
    churn_table['Churn_rate'].append(round(((Service_internet.loc[((i*2)+1),'count'])/
                                            (Service_internet.loc[(i*2),'count'] + Service_internet.loc[((i*2)+1),'count']))*100, 2))
    

In [29]:
pd.DataFrame.from_dict(churn_table).sort_values(by=['Churn_rate'], ignore_index = True)

Unnamed: 0,Internet Services,Churn_rate
0,No,7.4
1,DSL,18.96
2,Fiber optic,41.89
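The loop above relies on the 'No' and 'Yes' rows of each service sitting next to each other. A sketch of an order-independent alternative is to pivot the counts so that 'No' and 'Yes' become columns; the variable names `wide` and `churn_rate` are hypothetical, and the counts are copied from the Service_internet table.

```python
import pandas as pd

# Counts copied from the Service_internet table.
service_internet = pd.DataFrame({
    "InternetService": ["DSL", "DSL", "Fiber optic", "Fiber optic", "No", "No"],
    "Churn": ["No", "Yes", "No", "Yes", "No", "Yes"],
    "count": [1962, 459, 1799, 1297, 1413, 113],
})

# One row per service, one column per churn outcome.
wide = service_internet.pivot(index="InternetService", columns="Churn", values="count")

# Churn rate = churned / (churned + retained), as a percentage.
churn_rate = (wide["Yes"] / (wide["Yes"] + wide["No"]) * 100).round(2).sort_values()
print(churn_rate)
```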


#### Analysis of services that depend on Internet Service

In [30]:
NoInternet_data= Services_data[Services_data['DeviceProtection'] == 'No internet service'].reset_index(drop = True)
NoInternet_data = NoInternet_data[['InternetService','OnlineSecurity','OnlineBackup','DeviceProtection','TechSupport','StreamingTV',
                                 'StreamingMovies','Churn']]

NoInternet_data.shape

(1526, 8)

In [31]:
Internet_Dependent_data= Services_data[Services_data['OnlineSecurity'] != 'No internet service'].reset_index(drop = True)
Internet_Dependent_data = Internet_Dependent_data[['InternetService','OnlineSecurity','OnlineBackup','DeviceProtection','TechSupport','StreamingTV',
                                 'StreamingMovies','Churn']]

Internet_Dependent_data.shape

(5517, 8)

The two row counts above, 1526 and 5517, sum to the 7043 observations in the dataset. To avoid re-analyzing the 'No internet service' group while analyzing the internet-dependent services, those observations have been filtered out to create the dataframe Internet_Dependent_data.
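The two filters above use different columns (`DeviceProtection` and `OnlineSecurity`), which is safe only if every dependent column carries the 'No internet service' label on the same rows. A minimal sketch of that consistency check, assuming a few hypothetical rows with the dataset's column names:

```python
import pandas as pd

dependent_cols = ["OnlineSecurity", "OnlineBackup", "DeviceProtection",
                  "TechSupport", "StreamingTV", "StreamingMovies"]

# Hypothetical rows: one DSL customer, one without internet, one fiber customer.
sample = pd.DataFrame({
    "InternetService": ["DSL", "No", "Fiber optic"],
    **{col: ["Yes", "No internet service", "No"] for col in dependent_cols},
})

flags = sample[dependent_cols].eq("No internet service")

# Each row should be flagged in all six columns or in none of them.
consistent = bool((flags.all(axis=1) == flags.any(axis=1)).all())

# The flagged rows should be exactly the rows without internet service.
matches_core = bool((flags.all(axis=1) == sample["InternetService"].eq("No")).all())
print(consistent, matches_core)
```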

In [32]:
Internet_Dependent_data.describe(include = 'object')

Unnamed: 0,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Churn
count,5517,5517,5517,5517,5517,5517,5517,5517
unique,2,2,2,2,2,2,2,2
top,Fiber optic,No,No,No,No,No,No,No
freq,3096,3498,3088,3095,3473,2810,2785,3761


##### Analysis of Online Security

In [33]:
Internet_Dependent_data['OnlineSecurity'].value_counts().reset_index()

Unnamed: 0,OnlineSecurity,count
0,No,3498
1,Yes,2019


In [85]:
"""
Calculate the percentage of customer that have online security out of customer that have internet service.
"""

round(((Internet_Dependent_data[Internet_Dependent_data['OnlineSecurity'] == 'Yes'].shape[0])/
(Internet_Dependent_data.shape[0]))*100, 2)

36.6

In [34]:
OnlineSec_data = Internet_Dependent_data[['OnlineSecurity','Churn']].value_counts().reset_index().sort_values(
    by=['OnlineSecurity'], ignore_index = True
)
OnlineSec_data

Unnamed: 0,OnlineSecurity,Churn,count
0,No,No,2037
1,No,Yes,1461
2,Yes,No,1724
3,Yes,Yes,295


In [35]:
# Calculating churn rate for customer who have online security
round(((OnlineSec_data.loc[3,'count'])/(OnlineSec_data.loc[2,'count']+OnlineSec_data.loc[3,'count']))*100, 2)

14.61

In [36]:
# Calculating churn rate for customers who do not have online security, but have internet services
round(((OnlineSec_data.loc[1,'count'])/(OnlineSec_data.loc[0,'count']+OnlineSec_data.loc[1,'count']))*100, 2)

41.77
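Both churn rates for a service can be obtained in one step with a row-normalized cross-tabulation. The sketch below rebuilds one row per customer from the OnlineSec_data counts above so that `pd.crosstab` with `normalize='index'` can be applied; the variable names are hypothetical.

```python
import pandas as pd

# Counts copied from the OnlineSec_data table above.
counts = pd.DataFrame({
    "OnlineSecurity": ["No", "No", "Yes", "Yes"],
    "Churn": ["No", "Yes", "No", "Yes"],
    "count": [2037, 1461, 1724, 295],
})

# Expand the counts back into one row per customer.
rows = counts.loc[counts.index.repeat(counts["count"].to_numpy()),
                  ["OnlineSecurity", "Churn"]]

# normalize="index" divides each row by its total, giving rates per service status.
rates = (pd.crosstab(rows["OnlineSecurity"], rows["Churn"], normalize="index") * 100).round(2)
print(rates)
```

On the notebook's own data the expansion step is unnecessary: the same call can take the `OnlineSecurity` and `Churn` columns of Internet_Dependent_data directly.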

##### Analysis of Online Backup

In [37]:
Internet_Dependent_data['OnlineBackup'].value_counts().reset_index()

Unnamed: 0,OnlineBackup,count
0,No,3088
1,Yes,2429


In [82]:
"""
Calculate the percentage of customer that have online backup out of customer that have internet service.
"""

round(((Internet_Dependent_data[Internet_Dependent_data['OnlineBackup'] == 'Yes'].shape[0])/
(Internet_Dependent_data.shape[0]))*100,2)

44.03

In [38]:
OnlineBac_data = Internet_Dependent_data[['OnlineBackup','Churn']].value_counts().reset_index().sort_values(
    by=['OnlineBackup'], ignore_index = True
)
OnlineBac_data

Unnamed: 0,OnlineBackup,Churn,count
0,No,No,1855
1,No,Yes,1233
2,Yes,No,1906
3,Yes,Yes,523


In [39]:
# Calculating churn rate for customer who have online backup
round(((OnlineBac_data.loc[3,'count'])/(OnlineBac_data.loc[2,'count']+OnlineBac_data.loc[3,'count']))*100, 2)

21.53

In [40]:
# Calculating churn rate for customer who does not have online backup, but have internet services
round(((OnlineBac_data.loc[1,'count'])/(OnlineBac_data.loc[0,'count']+OnlineBac_data.loc[1,'count']))*100, 2)

39.93

##### Analysis of Device Protection Service

In [41]:
Internet_Dependent_data['DeviceProtection'].value_counts().reset_index()

Unnamed: 0,DeviceProtection,count
0,No,3095
1,Yes,2422


In [81]:
"""
Calculate the percentage of customer that have Device Protection out of customer that have internet service.
"""

round(((Internet_Dependent_data[Internet_Dependent_data['DeviceProtection'] == 'Yes'].shape[0])/
(Internet_Dependent_data.shape[0]))*100, 2)

43.9

In [42]:
Device_data = Internet_Dependent_data[['DeviceProtection','Churn']].value_counts().reset_index().sort_values(
    by=['DeviceProtection'], ignore_index = True
)
Device_data

Unnamed: 0,DeviceProtection,Churn,count
0,No,No,1884
1,No,Yes,1211
2,Yes,No,1877
3,Yes,Yes,545


In [43]:
# Calculating churn rate for customer who have Device Protection
round(((Device_data.loc[3,'count'])/(Device_data.loc[2,'count']+Device_data.loc[3,'count']))*100, 2)

22.5

In [44]:
# Calculating churn rate for customer who does not have Device Protection, but have internet services
round(((Device_data.loc[1,'count'])/(Device_data.loc[0,'count']+Device_data.loc[1,'count']))*100, 2)

39.13

##### Analysis of Tech Support Service

In [45]:
Internet_Dependent_data['TechSupport'].value_counts().reset_index()

Unnamed: 0,TechSupport,count
0,No,3473
1,Yes,2044


In [87]:
"""
Calculate the percentage of customer that have Tech Support out of customer that have internet service.
"""

round(((Internet_Dependent_data[Internet_Dependent_data['TechSupport'] == 'Yes'].shape[0])/
(Internet_Dependent_data.shape[0]))*100, 2)

37.05

In [46]:
Tech_data = Internet_Dependent_data[['TechSupport','Churn']].value_counts().reset_index().sort_values(
    by=['TechSupport'], ignore_index = True
)
Tech_data

Unnamed: 0,TechSupport,Churn,count
0,No,No,2027
1,No,Yes,1446
2,Yes,No,1734
3,Yes,Yes,310


In [47]:
# Calculating churn rate for customer who have Tech Support
round(((Tech_data.loc[3,'count'])/(Tech_data.loc[2,'count']+Tech_data.loc[3,'count']))*100, 2)

15.17

In [48]:
# Calculating churn rate for customer who does not have Tech Support, but have internet services
round(((Tech_data.loc[1,'count'])/(Tech_data.loc[0,'count']+Tech_data.loc[1,'count']))*100, 2)

41.64

##### Analysis of Streaming TV Service

In [49]:
Internet_Dependent_data['StreamingTV'].value_counts().reset_index()

Unnamed: 0,StreamingTV,count
0,No,2810
1,Yes,2707


In [88]:
"""
Calculate the percentage of customer that have Streaming TV out of customer that have internet service.
"""

round(((Internet_Dependent_data[Internet_Dependent_data['StreamingTV'] == 'Yes'].shape[0])/
(Internet_Dependent_data.shape[0]))*100, 2)

49.07

In [50]:
StreamTv_data = Internet_Dependent_data[['StreamingTV','Churn']].value_counts().reset_index().sort_values(
    by=['StreamingTV'], ignore_index = True
)
StreamTv_data

Unnamed: 0,StreamingTV,Churn,count
0,No,No,1868
1,No,Yes,942
2,Yes,No,1893
3,Yes,Yes,814


In [51]:
# Calculating churn rate for customers who have Streaming TV services
round(((StreamTv_data.loc[3,'count'])/(StreamTv_data.loc[2,'count']+StreamTv_data.loc[3,'count']))*100, 2)

30.07

In [52]:
# Calculating churn rate for customers who do not have Streaming TV services, but have internet services
round(((StreamTv_data.loc[1,'count'])/(StreamTv_data.loc[0,'count']+StreamTv_data.loc[1,'count']))*100, 2)

33.52

##### Analysis of Streaming Movies Service

In [53]:
Internet_Dependent_data['StreamingMovies'].value_counts().reset_index()

Unnamed: 0,StreamingMovies,count
0,No,2785
1,Yes,2732


In [90]:
"""
Calculate the percentage of customer that have Streaming Movies out of customer that have internet service.
"""

round(((Internet_Dependent_data[Internet_Dependent_data['StreamingMovies'] == 'Yes'].shape[0])/
(Internet_Dependent_data.shape[0]))*100, 2)

49.52

In [54]:
StreamMovie_data = Internet_Dependent_data[['StreamingMovies','Churn']].value_counts().reset_index().sort_values(
    by=['StreamingMovies'], ignore_index = True
)
StreamMovie_data

Unnamed: 0,StreamingMovies,Churn,count
0,No,No,1847
1,No,Yes,938
2,Yes,No,1914
3,Yes,Yes,818


In [55]:
# Calculating churn rate for customers who have Streaming Movies services
round(((StreamMovie_data.loc[3,'count'])/(StreamMovie_data.loc[2,'count']+StreamMovie_data.loc[3,'count']))*100, 2)

29.94

In [56]:
# Calculating churn rate for customers who do not have Streaming Movies services, but have internet services
round(((StreamMovie_data.loc[1,'count'])/(StreamMovie_data.loc[0,'count']+StreamMovie_data.loc[1,'count']))*100, 2)

33.68

### Create a table that shows service usage and churn rate for each service

In [111]:
Services = ['PhoneService', 'MultipleLines', 'InternetService', 'OnlineSecurity', 'OnlineBackup', 
           'DeviceProtection','TechSupport','StreamingTV', 'StreamingMovies']

Service_summary = {
    'Service' : [],
    'Customer Count Service User' : [],
    'User Percentage (%)' : [],
    'Service User Churn rate (%)' : [],
    'Churned customer count' : [],
    'Customer Count Non Service User' : [],
    'Non_Service User Churn rate (%)' : []
}


for service in Services:
    # InternetService has no 'Yes' value: a user is anyone on Fiber optic or DSL.
    if service == 'InternetService':
        dummy_user = df[(df[service]=='Fiber optic')|(df[service]=='DSL')]
    else:
        dummy_user = df[df[service]=='Yes']
    # Filtering on 'No' leaves out the 'No phone service' / 'No internet service' rows.
    dummy_non_user = df[df[service]=='No']

    yes_count = dummy_user.shape[0]
    churn_count = dummy_user[dummy_user['Churn'] == 'Yes'].shape[0]
    non_user_count = dummy_non_user.shape[0]
    non_user_churn_count = dummy_non_user[dummy_non_user['Churn'] == 'Yes'].shape[0]

    Service_summary['Service'].append(service)
    Service_summary['Customer Count Service User'].append(yes_count)
    Service_summary['User Percentage (%)'].append(round((yes_count/df.shape[0])*100, 2))
    Service_summary['Service User Churn rate (%)'].append(
        round((churn_count/yes_count)*100, 2) if churn_count != 0 else 'N/A')
    Service_summary['Churned customer count'].append(churn_count)
    Service_summary['Customer Count Non Service User'].append(non_user_count)
    Service_summary['Non_Service User Churn rate (%)'].append(
        round((non_user_churn_count/non_user_count)*100, 2) if non_user_churn_count != 0 else 'N/A')

Service_summary_df = pd.DataFrame(Service_summary)
Service_summary_df.to_csv('Telco-Services-Analysis-Summary.csv', index = False)
Service_summary_df_sorted = Service_summary_df.sort_values(by=['Non_Service User Churn rate (%)'], ignore_index = True, ascending = False)
Service_summary_df_sorted

Unnamed: 0,Service,Customer Count Service User,User Percentage (%),Service User Churn rate (%),Churned customer count,Customer Count Non Service User,Non_Service User Churn rate (%)
0,OnlineSecurity,2019,28.67,14.61,295,3498,41.77
1,TechSupport,2044,29.02,15.17,310,3473,41.64
2,OnlineBackup,2429,34.49,21.53,523,3088,39.93
3,DeviceProtection,2422,34.39,22.5,545,3095,39.13
4,StreamingMovies,2732,38.79,29.94,818,2785,33.68
5,StreamingTV,2707,38.44,30.07,814,2810,33.52
6,MultipleLines,2971,42.18,28.61,850,3390,25.04
7,PhoneService,6361,90.32,26.71,1699,682,24.93
8,InternetService,5517,78.33,31.83,1756,1526,7.4


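Note:
- The MultipleLines churn rates for users and non-users exclude customers without a phone line.
- For internet-dependent services, the churn rates for users and non-users exclude customers without internet service.
- User Percentage (%) is calculated over all 7043 customers, so it is lower than the adoption percentages calculated earlier among internet customers only (for example, 28.67% versus 36.6% for OnlineSecurity).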
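The summary code stores the string 'N/A' when a churn count is zero. With this dataset that branch is never reached, but a column mixing strings and floats can make `sort_values` raise a `TypeError`. A minimal sketch of one alternative, using `NaN` as the placeholder so the column stays numeric: the helper `safe_rate`, the service names and the counts are all hypothetical, and unlike the notebook code the helper guards against a zero denominator rather than a zero numerator.

```python
import numpy as np
import pandas as pd

def safe_rate(numerator, denominator):
    """Percentage rounded to 2 decimals, or NaN when the denominator is zero."""
    if denominator == 0:
        return np.nan
    return round(numerator / denominator * 100, 2)

# Hypothetical counts; 'ServiceC' has no non-users at all.
summary = pd.DataFrame({
    "Service": ["ServiceA", "ServiceB", "ServiceC"],
    "Non-user churn rate (%)": [safe_rate(50, 200), safe_rate(30, 300), safe_rate(0, 0)],
})

# NaN values are placed last by default, so sorting still works.
ordered = summary.sort_values("Non-user churn rate (%)", ascending=False, ignore_index=True)
print(ordered)
```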

### Churned customers: services review

In [94]:
Services = ['PhoneService', 'InternetService']

Service_summary = {
    'Service' : [],
    'Churned Customer Count' : [],
    'Sole customer' : [],
    #'Percentage (%)' : [],
    'Churn rate (%)' : [],
    #'Customer Count Non Service User' : [],
    #'Non_Service User Churn rate (%)' : []
}

churned_data = df[df['Churn']=='Yes']

for service in Services:
    if service == 'PhoneService':
        Service_summary['Service'].append(service)
        dummy_user = churned_data[churned_data[service]!='No']
        sole_user = dummy_user[dummy_user['InternetService']=='No']
        yes_count = dummy_user.shape[0] 
        sole_count = sole_user.shape[0]
        Service_summary['Churned Customer Count'].append(yes_count)
        Service_summary['Churn rate (%)'].append(round((yes_count/df[df[service]!='No'].shape[0])*100, 2))
        Service_summary['Sole customer'].append(sole_count)
        
    else:
        Service_summary['Service'].append(service)
        dummy_user = churned_data[churned_data[service]!='No']
        sole_user = dummy_user[dummy_user['PhoneService']=='No']
        yes_count = dummy_user.shape[0] 
        sole_count = sole_user.shape[0]
        Service_summary['Churned Customer Count'].append(yes_count)
        Service_summary['Churn rate (%)'].append(round((yes_count/df[df[service]!='No'].shape[0])*100, 2))
        Service_summary['Sole customer'].append(sole_count)

Service_summary_df = pd.DataFrame(Service_summary)
Service_summary_df.to_csv('Telco-Churned-Services-Analysis-Summary.csv', index = False)
Service_summary_df

Unnamed: 0,Service,Churned Customer Count,Sole customer,Churn rate (%)
0,PhoneService,1699,113,26.71
1,InternetService,1756,170,31.83


## Exploring the data visually

### Phone Services

In [None]:
ax = sns.barplot(Service_Phone, x='PhoneService', y='count', hue='Churn', errorbar=None)

# Iterate through the bars and add labels
for p in ax.patches:
    height = p.get_height()
    width = p.get_width()
    x = p.get_x() + width / 2  # Center of the bar
    y = height  # Top of the bar
    label = format(int(height), ',')
    ax.annotate(label, (x, y), ha='center', va='bottom',
                xytext=(0, 3), textcoords='offset points', color='black')

plt.title('Count of Phone Service by Churn')
plt.xlabel('Phone Service')
plt.ylabel('Count')
#plt.xticks(rotation=45, ha='right')  # Rotate for better readability
plt.tight_layout() # Adjust layout to prevent labels from overlapping
plt.show()
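The bar-labelling loop is repeated in every plotting cell of this section. As a sketch, it could be factored into a small helper that takes the Axes returned by `sns.barplot`. The function name `annotate_bars` is hypothetical, the demo uses a plain Matplotlib bar chart with the phone service totals from earlier, and skipping zero-height patches is an added assumption that is not in the original loop.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display; omit in a notebook
import matplotlib.pyplot as plt

def annotate_bars(ax):
    """Write each bar's height above it and return the number of labels added."""
    labelled = 0
    for p in ax.patches:
        height = p.get_height()
        if height == 0:
            continue  # assumption: empty patches should not get a "0" label
        ax.annotate(format(int(height), ","),
                    (p.get_x() + p.get_width() / 2, height),  # top centre of the bar
                    ha="center", va="bottom",
                    xytext=(0, 3), textcoords="offset points", color="black")
        labelled += 1
    return labelled

# Demo with the phone service totals reported earlier.
fig, ax = plt.subplots()
ax.bar(["Phone line", "No phone service"], [6361, 682])
n_labels = annotate_bars(ax)
print(n_labels)
```

With such a helper, each plotting cell would reduce to the `sns.barplot(...)` call, one `annotate_bars(ax)` call, and the title and axis labels.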

#### Multiple Line Phone Users

In [None]:
ax = sns.barplot(Multiple_data, x='MultipleLines', y='count', hue='Churn', errorbar=None)

# Iterate through the bars and add labels
for p in ax.patches:
    height = p.get_height()
    width = p.get_width()
    x = p.get_x() + width / 2  # Center of the bar
    y = height  # Top of the bar
    label = format(int(height), ',')
    ax.annotate(label, (x, y), ha='center', va='bottom',
                xytext=(0, 3), textcoords='offset points', color='black')

plt.title('Count of Customers with Phone Service by Multiple Lines and Churn')
plt.xlabel('Multiple Line Status')
plt.ylabel('Count')
#plt.xticks(rotation=45, ha='right')  # Rotate for better readability
plt.tight_layout() # Adjust layout to prevent labels from overlapping
plt.show()

### Internet Services

In [None]:
ax = sns.barplot(Services_Internet, x='InternetService', y='count', hue='Churn', errorbar=None)

# Iterate through the bars and add labels
for p in ax.patches:
    height = p.get_height()
    width = p.get_width()
    x = p.get_x() + width / 2  # Center of the bar
    y = height  # Top of the bar
    label = format(int(height), ',')
    ax.annotate(label, (x, y), ha='center', va='bottom',
                xytext=(0, 3), textcoords='offset points', color='black')

plt.title('Count of Internet Service by Churn')
plt.xlabel('Internet Services')
plt.ylabel('Number of Customers')
#plt.xticks(rotation=45, ha='right')  # Rotate for better readability
plt.tight_layout() # Adjust layout to prevent labels from overlapping
plt.show()

#### Type of Internet

In [None]:
ax = sns.barplot(internet_type, x='InternetService', y='count', hue='Churn', errorbar=None)

# Iterate through the bars and add labels
for p in ax.patches:
    height = p.get_height()
    width = p.get_width()
    x = p.get_x() + width / 2  # Center of the bar
    y = height  # Top of the bar
    label = format(int(height), ',')
    ax.annotate(label, (x, y), ha='center', va='bottom',
                xytext=(0, 3), textcoords='offset points', color='black')

plt.title('Count of Internet Type by Churn')
plt.xlabel('Internet Services Type')
plt.ylabel('Number of Customers')
#plt.xticks(rotation=45, ha='right')  # Rotate for better readability
plt.tight_layout() # Adjust layout to prevent labels from overlapping
plt.show()

### Online Security Service

In [None]:
ax = sns.barplot(OnlineSec_data, x='OnlineSecurity', y='count', hue='Churn', errorbar=None)

# Iterate through the bars and add labels
for p in ax.patches:
    height = p.get_height()
    width = p.get_width()
    x = p.get_x() + width / 2  # Center of the bar
    y = height  # Top of the bar
    label = format(int(height), ',')
    ax.annotate(label, (x, y), ha='center', va='bottom',
                xytext=(0, 3), textcoords='offset points', color='black')

plt.title('Count of Customers with Internet and Online Security by Churn')
plt.xlabel('Online Security Status')
plt.ylabel('Number of Customers')
#plt.xticks(rotation=45, ha='right')  # Rotate for better readability
plt.tight_layout() # Adjust layout to prevent labels from overlapping
plt.show()

### Online Backup

In [None]:
ax = sns.barplot(OnlineBac_data, x='OnlineBackup', y='count', hue='Churn', errorbar=None)

# Iterate through the bars and add labels
for p in ax.patches:
    height = p.get_height()
    width = p.get_width()
    x = p.get_x() + width / 2  # Center of the bar
    y = height  # Top of the bar
    label = format(int(height), ',')
    ax.annotate(label, (x, y), ha='center', va='bottom',
                xytext=(0, 3), textcoords='offset points', color='black')

plt.title('Count of Customers with Internet and Online Backup by Churn')
plt.xlabel('Online Backup Status')
plt.ylabel('Number of Customers')
#plt.xticks(rotation=45, ha='right')  # Rotate for better readability
plt.tight_layout() # Adjust layout to prevent labels from overlapping
plt.show()

### Device Protection

In [None]:
ax = sns.barplot(Device_data, x='DeviceProtection', y='count', hue='Churn', errorbar=None)

# Iterate through the bars and add labels
for p in ax.patches:
    height = p.get_height()
    width = p.get_width()
    x = p.get_x() + width / 2  # Center of the bar
    y = height  # Top of the bar
    label = format(int(height), ',')
    ax.annotate(label, (x, y), ha='center', va='bottom',
                xytext=(0, 3), textcoords='offset points', color='black')

plt.title('Count of Customers with Internet and Device Protection by Churn')
plt.xlabel('Device Protection Status')
plt.ylabel('Number of Customers')
#plt.xticks(rotation=45, ha='right')  # Rotate for better readability
plt.tight_layout() # Adjust layout to prevent labels from overlapping
plt.show()

### Tech Support

In [None]:
ax = sns.barplot(Tech_data, x='TechSupport', y='count', hue='Churn', errorbar=None)

# Iterate through the bars and add labels
for p in ax.patches:
    height = p.get_height()
    width = p.get_width()
    x = p.get_x() + width / 2  # Center of the bar
    y = height  # Top of the bar
    label = format(int(height), ',')
    ax.annotate(label, (x, y), ha='center', va='bottom',
                xytext=(0, 3), textcoords='offset points', color='black')

plt.title('Count of Customers with Internet and Tech Support by Churn')
plt.xlabel('Tech Support Status')
plt.ylabel('Number of Customers')
#plt.xticks(rotation=45, ha='right')  # Rotate for better readability
plt.tight_layout() # Adjust layout to prevent labels from overlapping
plt.show()

### Streaming TV

In [None]:
ax = sns.barplot(StreamTv_data, x='StreamingTV', y='count', hue='Churn', errorbar=None)

# Iterate through the bars and add labels
for p in ax.patches:
    height = p.get_height()
    width = p.get_width()
    x = p.get_x() + width / 2  # Center of the bar
    y = height  # Top of the bar
    label = format(int(height), ',')
    ax.annotate(label, (x, y), ha='center', va='bottom',
                xytext=(0, 3), textcoords='offset points', color='black')

plt.title('Count of Customers with Internet and Streaming TV by Churn')
plt.xlabel('Streaming TV Status')
plt.ylabel('Number of Customers')
#plt.xticks(rotation=45, ha='right')  # Rotate for better readability
plt.tight_layout() # Adjust layout to prevent labels from overlapping
plt.show()

### Streaming Movies

In [None]:
ax = sns.barplot(StreamMovie_data, x='StreamingMovies', y='count', hue='Churn', errorbar=None)

# Iterate through the bars and add labels
for p in ax.patches:
    height = p.get_height()
    width = p.get_width()
    x = p.get_x() + width / 2  # Center of the bar
    y = height  # Top of the bar
    label = format(int(height), ',')
    ax.annotate(label, (x, y), ha='center', va='bottom',
                xytext=(0, 3), textcoords='offset points', color='black')

plt.title('Count of Customers with Internet and Streaming Movies by Churn')
plt.xlabel('Streaming Movies Status')
plt.ylabel('Number of Customers')
#plt.xticks(rotation=45, ha='right')  # Rotate for better readability
plt.tight_layout() # Adjust layout to prevent labels from overlapping
plt.show()

## Findings

### Service Popularity

The most widely used service is **Phone Line**, with **90%** of customers subscribed. **Internet Service** is second at **78%**.

Certain services depend on others:
- **Multiple Line** requires **Phone Line**.
- All other services rely on having **Internet Service**.

Among the **dependent services**, **Multiple Line** is the most popular, with **42%** of customers using it. Streaming services also have significant adoption:
- **Streaming Movies** (**38.79%**)
- **Streaming TV** (**38.44%**)

On the **lower end**, **Online Security** (**28.67%**) and **Tech Support** (**29.02%**) are less commonly used.
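
The adoption percentages above are shares of the whole customer base. A minimal sketch of that calculation, assuming pandas and a toy frame with hypothetical rows standing in for the real data (the helper name `adoption_share` is also hypothetical):

```python
import pandas as pd

# Toy stand-in for the Telco data; the rows are hypothetical
toy = pd.DataFrame({
    "PhoneService":   ["Yes", "Yes", "No", "Yes"],
    "MultipleLines":  ["Yes", "No", "No phone service", "No"],
    "OnlineSecurity": ["No", "Yes", "No internet service", "No"],
})


def adoption_share(frame, service):
    """Percentage of all customers whose value for `service` is exactly 'Yes'."""
    return round(float((frame[service] == "Yes").mean()) * 100, 2)


shares = {col: adoption_share(toy, col) for col in toy.columns}
print(shares)
```

`InternetService` takes the values `DSL`, `Fiber optic`, and `No`, so for that column the test would need to be `!= 'No'` rather than `== 'Yes'`.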

### Churn Rate Insights

#### Services with the Highest Customer Attrition Rates:
- **Internet Service** (**31.83%**)
- **Streaming TV** (**30.07%**)
- **Streaming Movies** (**29.94%**)

#### Services with Lower Churn Rates:
- **Online Security** (**14.61%**)
- **Tech Support** (**15.17%**)
- **Online Backup** (**21.53%**)

#### Total Customer Loss:
- **Internet Service** saw the highest churn, with **1,756** customers leaving.
- **Phone Line** followed closely, with **1,699** customers lost.

### Churn Rate Among Non-Service Users

Interestingly, **non-users** of certain services churn at **much higher rates** than those who subscribe:
- **Online Security** (**41.77%**)
- **Tech Support** (**41.64%**)
- **Online Backup** (**39.93%**)

This suggests that customers who **don’t use these services** are more likely to leave than those who do, potentially indicating the **value these services provide** when subscribed.
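
A comparison of this kind can be sketched by grouping on the service's status and taking the share of churned customers in each group. This assumes pandas and a toy frame with hypothetical rows. The helper name `churn_rate_by_status` is hypothetical, and the notebook's own calculation may differ.

```python
import pandas as pd

# Toy stand-in for the Telco data; the rows are hypothetical
toy = pd.DataFrame({
    "OnlineSecurity": ["Yes", "Yes", "No", "No", "No", "No internet service"],
    "Churn":          ["No",  "No",  "Yes", "Yes", "No", "No"],
})


def churn_rate_by_status(frame, service):
    """Churn rate (%) for each distinct value of `service`."""
    churned = frame["Churn"].eq("Yes")  # boolean flag per customer
    return (churned.groupby(frame[service]).mean() * 100).round(2)


rates = churn_rate_by_status(toy, "OnlineSecurity")
print(rates)
```

Keeping `No internet service` as its own group avoids mixing customers who cannot subscribe with those who chose not to.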

For **Internet Service**, the churn rate among **non-users** is only **7.40%**, far below the **31.83%** seen among subscribers. However, since **most other services depend on having Internet**, this low rate is not a reason to steer customers away from Internet Service.

Lastly, the **churn rate difference** between **Phone Line users and non-users** is **relatively small**: **1.78 percentage points**, or about **6.89%** in relative terms.
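
The two Phone Line figures measure different things: the percentage-point gap is a plain subtraction of rates, while the 6.89% figure is a ratio. The sketch below reproduces 6.89% by dividing the gap by the mean of the two rates. That choice of denominator is an inference from the reported numbers, not something the notebook states, and the non-user rate is derived here from the subscriber rate and the reported gap.

```python
phone_user_rate = 26.71  # churn rate (%) of Phone Line subscribers, from the summary table
gap_pp = 1.78            # reported gap in percentage points

# Non-user rate implied by the two reported figures
phone_non_user_rate = round(phone_user_rate - gap_pp, 2)

# Relative gap measured against the mean of the two rates (assumed denominator)
mean_rate = (phone_user_rate + phone_non_user_rate) / 2
relative_gap = round(gap_pp / mean_rate * 100, 2)

print(phone_non_user_rate, relative_gap)
```

Dividing by either rate alone instead of the mean would give a slightly different percentage, so the denominator is worth stating whenever a relative difference is quoted.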

### Conclusion

#### Key Findings
1. **Internet Service Plays a Critical Role in Customer Retention**  
   - **Internet Service has the highest churn rate (31.83%)** and the highest total customer loss (**1,756 customers**).  
   - Since many other services depend on Internet, improving its reliability and customer satisfaction could help **reduce overall churn**.  

2. **Streaming Services Are at High Risk of Churn**  
   - **Streaming TV (30.07%) and Streaming Movies (29.94%)** have some of the highest churn rates.  
   - Customers may not perceive sufficient value in these services, possibly due to pricing, content availability, or competition.  

3. **Non-Service Users Have Higher Churn Rates**  
   - Non-users of **Online Security (41.77%)**, **Tech Support (41.64%)**, and **Online Backup (39.93%)** churn at **higher rates** than subscribers.  
   - This suggests these services may **help retain customers**, making adoption strategies beneficial.  

4. **Phone Line Has a Small but Notable Impact on Churn**  
   - The **churn rate difference** between **Phone Line users and non-users** is **only 1.78 percentage points**, or about **6.89%** in relative terms.  
   - While its churn rate is lower than that of Internet Service, it still contributes **significantly to overall customer loss (1,699 customers lost)**.  

#### Implications & Recommendations  
- **Prioritize Improvements in Internet Service**: Reducing churn in Internet Service will likely have a **cascading effect** on retaining customers in dependent services.  
- **Enhance the Value of Streaming Services**: Addressing **customer concerns** regarding streaming services could prevent further churn.  
- **Leverage Retention Benefits of Security & Support Services**: Encouraging customers to adopt **Online Security and Tech Support** may **increase retention** and stabilize customer engagement.  