In [1]:
import pandas as pd
df = pd.read_csv("churn.csv")
df.head(2)

Unnamed: 0,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,Churn
0,DSL,No,Yes,No,No,No
1,DSL,Yes,No,Yes,No,No


## Business Context

Customer churn is one of the user behaviors that need to be tracked. A high churn rate indicates that something in the product needs to be improved. The cause can be many things, for example an error in the customer journey, product pricing, or customer preferences. Retaining an existing customer also costs less than acquiring a new one, which is another reason tracking churn is beneficial. Churn can also reveal how well customers accept our products.

The data here is from a telecom service provider. Only a small subset of the data is presented, and there are certain business questions that need to be answered. In this demo we show how some of these business questions can be answered using hypothesis tests.

In [2]:
df['InternetService'].unique()

array(['DSL', 'Fiber optic', 'No'], dtype=object)

In [3]:
df['OnlineSecurity'].unique()

array(['No', 'Yes', 'No internet service'], dtype=object)

In [4]:
df['OnlineBackup'].unique()

array(['Yes', 'No', 'No internet service'], dtype=object)

In [5]:
df['DeviceProtection'].unique()

array(['No', 'Yes', 'No internet service'], dtype=object)

In [6]:
df['TechSupport'].unique()

array(['No', 'Yes', 'No internet service'], dtype=object)
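The five cells above can be collapsed into a single loop. A minimal sketch, assuming a small hypothetical DataFrame `sample` that stands in for `churn.csv` (same column names, made-up rows); on the real data the same loop would run over `df` instead:

```python
import pandas as pd

# Hypothetical stand-in for churn.csv: same column names, made-up rows.
sample = pd.DataFrame({
    "InternetService": ["DSL", "Fiber optic", "No"],
    "OnlineSecurity": ["No", "Yes", "No internet service"],
    "OnlineBackup": ["Yes", "No", "No internet service"],
    "DeviceProtection": ["No", "Yes", "No internet service"],
    "TechSupport": ["No", "No", "No internet service"],
})

# Collect the distinct values of every categorical column in one pass
# (unique() keeps the order in which values first appear).
levels = {col: list(sample[col].unique()) for col in sample.columns}
for col, values in levels.items():
    print(col, values)
```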

**Business Problem**

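We need to find out whether there is any relationship between churn and the number of add-on features (OnlineSecurity, OnlineBackup, DeviceProtection, TechSupport) that customers buy on top of their internet plan.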



In [7]:
df.head(2)
## Customer 1 (index 0) has bought 1 feature apart from the internet connection, i.e. OnlineBackup

Unnamed: 0,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,Churn
0,DSL,No,Yes,No,No,No
1,DSL,Yes,No,Yes,No,No


In [9]:
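def normalize_internet(x):
    # 1 if the customer has any internet service (DSL or Fiber optic), else 0
    if x == 'No':
        return 0
    return 1


def norm(x):
    # 1 if the add-on feature was bought ('Yes');
    # both 'No' and 'No internet service' map to 0
    if x == 'Yes':
        return 1
    return 0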

In [11]:
df['InternetService']=df['InternetService'].map(normalize_internet)

In [13]:
df['OnlineSecurity']=df['OnlineSecurity'].map(norm)
df['OnlineBackup'] = df['OnlineBackup'].map(norm)
df['DeviceProtection'] = df['DeviceProtection'].map(norm)
df['TechSupport'] = df['TechSupport'].map(norm)
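The two helpers can also be written as vectorized comparisons, which avoids a Python-level function call per row. A sketch that checks the equivalence on hypothetical Series covering every level printed earlier; the compact copies of `normalize_internet` and `norm` below are assumed to behave like the notebook's versions. Note that "No internet service" is encoded as 0, so such customers count as having bought no features.

```python
import pandas as pd

# Compact copies of the notebook's helper functions.
def normalize_internet(x):
    return 0 if x == "No" else 1

def norm(x):
    return 1 if x == "Yes" else 0

# Hypothetical values covering every level seen in the data.
internet = pd.Series(["DSL", "Fiber optic", "No"])
addon = pd.Series(["Yes", "No", "No internet service"])

# Vectorized equivalents: boolean comparison, then cast to int.
internet_vec = internet.ne("No").astype(int)
addon_vec = addon.eq("Yes").astype(int)

# Both approaches give identical 0/1 encodings.
assert internet_vec.tolist() == internet.map(normalize_internet).tolist()
assert addon_vec.tolist() == addon.map(norm).tolist()
print(addon_vec.tolist())  # [1, 0, 0]: "No internet service" counts as not bought
```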

In [14]:
df['NumFeatures'] = df['OnlineSecurity']+df['OnlineBackup']+df['DeviceProtection']+df['TechSupport']

In [15]:
df.head(2)

Unnamed: 0,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,Churn,NumFeatures
0,1,0,1,0,0,No,1
1,1,1,0,1,0,No,2
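The four-way addition that builds `NumFeatures` can also be written as a row-wise sum over a list of column names, which keeps working if more add-ons are introduced. A sketch using the two encoded rows shown in the output above; on the notebook's data the equivalent would be `df[features].sum(axis=1)`:

```python
import pandas as pd

# The 0/1-encoded add-on columns for the two customers shown by df.head(2).
features = ["OnlineSecurity", "OnlineBackup", "DeviceProtection", "TechSupport"]
encoded = pd.DataFrame({
    "OnlineSecurity": [0, 1],
    "OnlineBackup": [1, 0],
    "DeviceProtection": [0, 1],
    "TechSupport": [0, 0],
})

# Summing across columns (axis=1) counts the features each customer bought,
# matching the explicit column-by-column addition used for NumFeatures.
encoded["NumFeatures"] = encoded[features].sum(axis=1)
print(encoded["NumFeatures"].tolist())  # [1, 2]
```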


In [16]:
df[df['NumFeatures']==0]  # Customers with no add-on features: internet-only customers and customers with no internet service (InternetService == 0)

Unnamed: 0,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,Churn,NumFeatures
4,1,0,0,0,0,Yes,0
11,0,0,0,0,0,No,0
16,0,0,0,0,0,No,0
21,0,0,0,0,0,No,0
22,0,0,0,0,0,Yes,0
...,...,...,...,...,...,...,...
7032,1,0,0,0,0,Yes,0
7033,1,0,0,0,0,No,0
7035,1,0,0,0,0,No,0
7037,0,0,0,0,0,No,0


In [17]:
multi = df[df['NumFeatures']!=0]  # keep only customers who bought at least one add-on feature

In [19]:
res = pd.crosstab(multi['NumFeatures'], multi['Churn'])  # contingency table of counts: NumFeatures x Churn
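`pd.crosstab` turns the two columns into a contingency table of counts, which is the input `chi2_contingency` expects. A toy illustration with made-up rows (not from `churn.csv`):

```python
import pandas as pd

# Hypothetical customers: number of add-on features and churn label.
toy = pd.DataFrame({
    "NumFeatures": [1, 1, 2, 2, 2, 3],
    "Churn": ["Yes", "No", "No", "No", "Yes", "No"],
})

# Rows = distinct NumFeatures values, columns = Churn labels, cells = counts.
# Combinations that never occur (e.g. NumFeatures=3 with Churn=Yes) are filled with 0.
table = pd.crosstab(toy["NumFeatures"], toy["Churn"])
print(table)
```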

In [20]:
from scipy.stats import chi2_contingency

In [21]:
stat, p, dof, expected = chi2_contingency(res)

In [22]:
p

1.1403241985467037e-71
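For readers who want to see what `chi2_contingency` computes: under H0 the expected count in cell (i, j) is (row i total) * (column j total) / N, the statistic is the sum of (observed - expected)^2 / expected over all cells, and the p-value is the upper tail of a chi-square distribution with (rows - 1) * (columns - 1) degrees of freedom. A sketch on a made-up 4x2 table (not the notebook's `res`) that reproduces scipy's result by hand. With more than one degree of freedom scipy applies no continuity correction, so the two computations should agree.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Hypothetical observed counts: rows = NumFeatures 1..4, columns = Churn No/Yes.
observed = np.array([
    [300, 150],
    [400, 120],
    [350, 60],
    [200, 20],
])

# Expected counts under H0 (independence): row total * column total / grand total.
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
n = observed.sum()
expected_manual = row_totals * col_totals / n

# Pearson statistic, degrees of freedom (r - 1) * (c - 1), and upper-tail p-value.
stat_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_manual = chi2.sf(stat_manual, dof_manual)

# scipy's result; Yates' correction is only applied when dof == 1.
stat_scipy, p_scipy, dof_scipy, expected_scipy = chi2_contingency(observed)
print(stat_manual, stat_scipy)
print(p_manual, p_scipy)

# Common rule of thumb for the chi-square approximation: all expected counts >= 5.
print("all expected >= 5:", bool((expected_scipy >= 5).all()))
```

In the notebook, `(expected >= 5).all()` on the `expected` array returned by `chi2_contingency(res)` is a quick way to check that the approximation behind the reported p-value is reasonable.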

```
H0: There is no relationship between the number of add-on features and churn
Ha: There is a relationship between the number of add-on features and churn
```

```
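Conclusion
Since the p-value (about 1.1e-71) is far below any conventional significance level such as 0.05, we reject H0 and conclude that there is a statistically significant relationship between the number of add-on features and churn (among customers with at least one add-on feature).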
```
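The test only says that churn and the number of features are associated, not in which direction. Row-normalizing the contingency table shows the churn rate at each feature count; on the notebook's data this would be `pd.crosstab(multi['NumFeatures'], multi['Churn'], normalize='index')`. A sketch on hypothetical rows (the rates it prints say nothing about the real data):

```python
import pandas as pd

# Hypothetical customers, not taken from churn.csv.
toy = pd.DataFrame({
    "NumFeatures": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
    "Churn": ["Yes", "Yes", "No", "No", "Yes", "No", "No", "No", "No", "No"],
})

# normalize="index" divides each row by its total,
# giving the share of churned / retained customers per feature count.
rates = pd.crosstab(toy["NumFeatures"], toy["Churn"], normalize="index")
print(rates["Yes"])
```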