#### Question: Does customer churn depend on the customer's contract segment?

##### Expectations:
The analysis should show the telecommunications company which contract segments are most vulnerable to churn.

If contract type turns out to be significantly associated with churn, the company can use that to design retention strategies targeted at the riskiest segments.

##### Information about the data:
The data is stored in an Excel file named `Telco_customer_churn.xlsx`, which contains 7043 rows. Each row represents one customer, and each column holds one customer attribute, as described in the dataset's column metadata. The features we are interested in are:
- `Contract`: Type of contract (Month-to-month, One year, Two year)
- `Churn Value`: Whether the customer churned or not (1 for yes and 0 for no)

#### EDA

In [1]:
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from scipy.stats import t

In [2]:
# read the dataset
dataset = pd.read_excel('../Dataset/Telco_customer_churn.xlsx')

In [3]:
dataset.columns

Index(['CustomerID', 'Count', 'Country', 'State', 'City', 'Zip Code',
       'Lat Long', 'Latitude', 'Longitude', 'Gender', 'Senior Citizen',
       'Partner', 'Dependents', 'Tenure Months', 'Phone Service',
       'Multiple Lines', 'Internet Service', 'Online Security',
       'Online Backup', 'Device Protection', 'Tech Support', 'Streaming TV',
       'Streaming Movies', 'Contract', 'Paperless Billing', 'Payment Method',
       'Monthly Charges', 'Total Charges', 'Churn Label', 'Churn Value',
       'Churn Score', 'CLTV', 'Churn Reason'],
      dtype='object')

In [4]:
# taking only the required columns which are Contract and Churn Value
dataset = dataset[['Contract','Churn Value']]

In [5]:
# checking for null values
dataset.isnull().sum()

Contract       0
Churn Value    0
dtype: int64

In [6]:
dataset.dtypes

Contract       object
Churn Value     int64
dtype: object

In [8]:
# converting the categorical values to numerical values
data = dataset.copy()
data['Contract'] = data['Contract'].map({'Month-to-month':0,'One year':1,'Two year':2})
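`Series.map` with a dict returns NaN for any label that is not a key, so a typo or an unexpected contract category would silently become a missing value after encoding. A minimal sketch of a guard, run on a small hypothetical frame rather than the Telco data (the `toy` and `contract_codes` names are illustrative only):

```python
import pandas as pd

# Hypothetical toy frame standing in for `dataset`; labels follow the Telco file.
toy = pd.DataFrame({'Contract': ['Month-to-month', 'Two year', 'One year', 'Month-to-month'],
                    'Churn Value': [1, 0, 0, 1]})

contract_codes = {'Month-to-month': 0, 'One year': 1, 'Two year': 2}
encoded = toy['Contract'].map(contract_codes)

# Any label missing from the dict would appear here, because map() turned it into NaN.
unmapped = toy.loc[encoded.isna(), 'Contract'].unique()
assert len(unmapped) == 0, f'Unmapped contract labels: {unmapped}'

print(encoded.tolist())
```

On the real data the same check would be `data['Contract'].isna().sum() == 0` right after the mapping.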

In [9]:
# check the correlation between the features
corr = data.corr()
corr.style.background_gradient(cmap='coolwarm')

              Contract  Churn Value
Contract      1.000000    -0.396713
Churn Value  -0.396713     1.000000
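Because `Churn Value` is binary, the Pearson coefficient above is the point-biserial correlation; it also treats the 0/1/2 contract codes as equally spaced, which is an assumption of the encoding rather than a property of the data. A sketch on hypothetical toy data showing that `scipy.stats.pointbiserialr` and `DataFrame.corr()` agree:

```python
import numpy as np
import pandas as pd
from scipy.stats import pointbiserialr

# Hypothetical toy data: contract codes (0, 1, 2) and binary churn flags.
toy = pd.DataFrame({'Contract':    [0, 0, 0, 0, 1, 1, 1, 2, 2, 2],
                    'Churn Value': [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]})

# Pearson's r from pandas, as in the cell above.
pearson_r = toy.corr().loc['Contract', 'Churn Value']

# Point-biserial r: Pearson's r where one of the two variables is binary.
pb_r, pb_p = pointbiserialr(toy['Churn Value'], toy['Contract'])

print(round(pearson_r, 6), round(pb_r, 6))
```

The rank-based Spearman and Kendall coefficients in the next cells depend only on the ordering of the codes, not on their spacing.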


In [11]:
# Correlation with Spearman's Rank Correlation:
corr = data.corr(method='spearman')
corr.style.background_gradient(cmap='coolwarm')

              Contract  Churn Value
Contract      1.000000    -0.406262
Churn Value  -0.406262     1.000000


In [12]:
# Correlation with Kendall's Rank Correlation:
corr = data.corr(method='kendall')
corr.style.background_gradient(cmap='coolwarm')

              Contract  Churn Value
Contract      1.000000    -0.386912
Churn Value  -0.386912     1.000000


In [10]:
# check the covariance between the features
cov = data.cov()
cov.style.background_gradient(cmap='coolwarm')

              Contract  Churn Value
Contract      0.695148    -0.146051
Churn Value  -0.146051     0.194976


In [10]:
# separate each category of contract type with its churn value
month_to_month = dataset[dataset['Contract'] == 'Month-to-month']
one_year = dataset[dataset['Contract'] == 'One year']
two_year = dataset[dataset['Contract'] == 'Two year']
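A direct way to compare the three contract groups is the churn rate per group, which is simply the mean of the 0/1 `Churn Value` inside each group (on the real data, `dataset.groupby('Contract')['Churn Value'].mean()` or `month_to_month['Churn Value'].mean()`). A sketch on hypothetical toy data with the same column names:

```python
import pandas as pd

# Hypothetical toy data using the same column names as the Telco file.
toy = pd.DataFrame({'Contract': ['Month-to-month'] * 4 + ['One year'] * 3 + ['Two year'] * 3,
                    'Churn Value': [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]})

# The mean of a 0/1 column is the fraction of customers in the group who churned.
churn_rate = toy.groupby('Contract')['Churn Value'].mean()
print(churn_rate)
```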

##### Initial observations:
All three coefficients show a moderate negative association between `Contract` and `Churn Value` (Pearson -0.40, Spearman -0.41, Kendall -0.39): customers on longer contracts churn less often.
This suggests that contract type is a useful predictor of churn, but a correlation computed on an ordinal encoding is only descriptive, so we confirm the dependence with a formal statistical test.

### Hypothesis testing

##### Defining Hypotheses:
- Null hypothesis (H0): Churn is independent of the customer's contract segment.
- Alternative hypothesis (H1): Churn depends on the customer's contract segment.

##### Statistical test:
- We will use the `Chi-square test of independence`, which compares the observed contract-by-churn counts with the counts expected if the two variables were independent.
- We will use `0.05` as the significance level.
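For reference, the statistic in its standard form, where $O_{ij}$ are the observed counts in the contract-by-churn table, $E_{ij}$ the counts expected under independence, $R_i$ and $C_j$ the row and column totals, and $N$ the number of customers. With $r = 3$ contract types and $c = 2$ churn values this gives $(3-1)(2-1) = 2$ degrees of freedom, and H0 is rejected when $\chi^2$ exceeds the $1-\alpha$ quantile of the chi-square distribution with 2 degrees of freedom (equivalently, when the p-value is below $\alpha = 0.05$).

$$
\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{R_i \, C_j}{N},
\qquad
\text{dof} = (r-1)(c-1)
$$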

In [14]:
# apply chi-square test to check if there is any relationship between contract type and churn value
from scipy.stats import chi2_contingency
chi2, p, dof, expected = chi2_contingency(pd.crosstab(dataset['Contract'], dataset['Churn Value']))

print('Chi-square statistic %0.3f p_value %0.3f' % (chi2, p))
print('Degrees of freedom %d' % dof)
print('Expected values ', expected)


Chi-square statistic 1184.597 p_value 0.000
Degrees of freedom 2
Expected values  [[2846.69175067 1028.30824933]
 [1082.11018032  390.88981968]
 [1245.198069    449.801931  ]]
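The expected values printed above can be reproduced by hand from the row and column totals, $E_{ij} = R_i C_j / N$. A sketch on a hypothetical 3x2 table (not the Telco counts), checked against `chi2_contingency`; because this table has 2 degrees of freedom, SciPy's Yates continuity correction (applied only when dof is 1) does not come into play:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical observed counts: rows = contract types, columns = churn 0 / 1.
observed = np.array([[30, 20],
                     [25,  5],
                     [18,  2]])

row_totals = observed.sum(axis=1, keepdims=True)  # shape (3, 1)
col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, 2)
n = observed.sum()

# E_ij = R_i * C_j / N, computed for every cell via broadcasting.
expected_manual = row_totals * col_totals / n
chi2_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()

chi2_stat, p_value, dof, expected_scipy = chi2_contingency(observed)
print(chi2_manual, chi2_stat, dof)
```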


In [15]:
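# critical value of the chi-square distribution for a 0.05 significance level and `dof` degrees of freedom
# (chi2 is aliased so it does not overwrite the `chi2` statistic computed above)
from scipy.stats import chi2 as chi2_dist
critical_value = chi2_dist.ppf(1 - 0.05, dof)
print('Critical value %.3f' % critical_value)

Critical value 5.991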
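The reference distribution for this test is the chi-square distribution, not Student's t: `t.ppf(0.95, 2)` is about 2.92, while the chi-square critical value for 2 degrees of freedom is about 5.99. With 2 degrees of freedom the chi-square distribution is exponential with mean 2, so the critical value also has the closed form $-2 \ln \alpha$. A small check:

```python
import math
from scipy.stats import chi2, t

alpha, dof = 0.05, 2

chi2_crit = chi2.ppf(1 - alpha, dof)  # upper-tail critical value of chi-square(2)
closed_form = -2 * math.log(alpha)    # chi-square(2) is exponential with mean 2
t_quantile = t.ppf(1 - alpha, dof)    # Student's t quantile, not the right reference here

print(round(chi2_crit, 3), round(closed_form, 3), round(t_quantile, 3))  # → 5.991 5.991 2.92
```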


In [16]:
# check if the chi-square statistic is greater than the critical value
if chi2 > critical_value:
    print('Reject the null hypothesis')
else:
    print('Fail to reject the null hypothesis')

Reject the null hypothesis
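The same decision can be made from the p-value: it is the upper-tail probability of the chi-square distribution at the observed statistic, which is what `chi2.sf` returns. The `%0.3f` format above prints it as `0.000` only because it is far below 0.001; scientific notation shows its actual size. A sketch reusing the statistic reported by `chi2_contingency` (rounded to 3 decimals):

```python
from scipy.stats import chi2

alpha = 0.05
chi2_stat, dof = 1184.597, 2  # statistic and degrees of freedom reported above

# Survival function = P(X >= chi2_stat) under H0 = p-value
p_value = chi2.sf(chi2_stat, dof)
print('p-value %.3e' % p_value)

decision = 'Reject' if p_value < alpha else 'Fail to reject'
print(decision, 'the null hypothesis')
```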


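##### Conclusion:
The chi-square statistic of 1184.597 (2 degrees of freedom) lies far beyond the critical value at the 0.05 significance level, and the p-value rounds to 0.000, so we reject the null hypothesis and conclude that customer churn depends on the customer's contract segment.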
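The chi-square test establishes that an association exists, but not how strong it is. One common effect-size measure is Cramér's V, $V = \sqrt{\chi^2 / (N \cdot (\min(r, c) - 1))}$. A sketch computed from the figures reported in this notebook (statistic 1184.597, 7043 customers, a 3x2 table); the choice of Cramér's V here is an addition, not part of the original analysis:

```python
import math

chi2_stat = 1184.597   # chi-square statistic reported above
n = 7043               # number of customers (rows) in the dataset
n_rows, n_cols = 3, 2  # contract types x churn values

# Cramér's V scales chi-square to [0, 1] by sample size and table shape.
cramers_v = math.sqrt(chi2_stat / (n * (min(n_rows, n_cols) - 1)))
print(round(cramers_v, 3))
```

This gives a value of roughly 0.41, in line with the rank correlations from the EDA. Recent SciPy versions (1.7 and later) also offer `scipy.stats.contingency.association(observed, method='cramer')`, which computes V directly from a contingency table.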