### Telco Churn Data Set

About Dataset

A telecommunications company has collected data on their customers. They want you to analyze this data and provide answers to the questions below.

Note: Churn is the measure of how many customers stop using a product. This can be 
measured based on actual usage or failure to renew a subscription. Companies with high 
churn rates lose a large number of subscribers, resulting in little growth, which 
significantly impacts revenues and profits. 

### Questions
1. Engineer a column that calculates each customer's total charge, where total charge refers to their day, evening, and international call charges.

2. Which customer received the highest number of voicemail messages?
    • Which state do they live in?

3. How many customers have an international plan with an international charge greater than $3.7?

4. How many customers churned?

5. Compare the minutes of calls made during the day versus the night, and report which time of day has the most call time, measured in hours.

6. How many customers in Arizona (AZ) state churned?

7. How many customers in New Jersey (NJ) had both a voice mail plan and an international plan?

8. How much did the company make from churned customers before they churned?

9. Which customer spent the most on evening calls?
    • How many hours did they spend?
    • Which state do they live in?

10. How many customers in Tennessee (TN) never called customer service for assistance?


In [1]:
import pandas as pd
import numpy as np

# Load the dataset

df = pd.read_csv("data/telco_churn.csv")

In [2]:
df.head()

Unnamed: 0,State,Account length,Area code,International plan,Voice mail plan,Number vmail messages,Total day minutes,Total day calls,Total day charge,Total eve minutes,Total eve calls,Total eve charge,Total night minutes,Total night calls,Total night charge,Total intl minutes,Total intl calls,Total intl charge,Customer service calls,Churn
0,KS,128,415,No,Yes,25,265.1,110.0,45.07,197.4,99.0,16.78,244.7,91.0,11.01,10.0,3,2.7,1.0,False
1,OH,107,415,No,Yes,26,161.6,123.0,27.47,195.5,103.0,16.62,254.4,103.0,11.45,13.7,3,3.7,1.0,False
2,NJ,137,415,No,No,0,243.4,114.0,41.38,121.2,110.0,10.3,162.6,104.0,7.32,12.2,5,3.29,0.0,False
3,OH,84,408,Yes,No,0,299.4,71.0,50.9,61.9,88.0,5.26,196.9,89.0,8.86,6.6,7,1.78,2.0,False
4,OK,75,415,Yes,No,0,166.7,113.0,28.34,148.3,122.0,12.61,186.9,121.0,8.41,10.1,3,2.73,3.0,False


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3333 entries, 0 to 3332
Data columns (total 20 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   State                   3333 non-null   object 
 1   Account length          3333 non-null   int64  
 2   Area code               3333 non-null   int64  
 3   International plan      3333 non-null   object 
 4   Voice mail plan         3333 non-null   object 
 5   Number vmail messages   3333 non-null   int64  
 6   Total day minutes       3323 non-null   float64
 7   Total day calls         3323 non-null   float64
 8   Total day charge        3315 non-null   float64
 9   Total eve minutes       3324 non-null   float64
 10  Total eve calls         3325 non-null   float64
 11  Total eve charge        3333 non-null   float64
 12  Total night minutes     3333 non-null   float64
 13  Total night calls       3332 non-null   float64
 14  Total night charge      3333 non-null   float64

In [8]:
df.isna().sum()

State                      0
Account length             0
Area code                  0
International plan         0
Voice mail plan            0
Number vmail messages      0
Total day minutes         10
Total day calls           10
Total day charge          18
Total eve minutes          9
Total eve calls            8
Total eve charge           0
Total night minutes        0
Total night calls          1
Total night charge         0
Total intl minutes         0
Total intl calls           0
Total intl charge          5
Customer service calls     5
Churn                      8
dtype: int64

In [7]:
# Total null cells as a percentage of the row count
# (an upper bound on the share of rows that contain a null)
(df.isna().sum().sum() / df.shape[0]) * 100

2.22022202220222
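
The figure above divides the total number of null cells by the number of rows, so a row with several missing values is counted more than once. A minimal sketch of the difference on a small made-up frame (the `toy` data and the variable names are hypothetical, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical 4-row frame: row 1 has two nulls, row 3 has one.
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [1.0, np.nan, 3.0, 4.0],
})

# Every missing cell, regardless of which row it sits in.
null_cells = int(toy.isna().sum().sum())

# Rows that contain at least one missing cell.
rows_with_null = int(toy.isna().any(axis=1).sum())

cells_per_row_pct = null_cells / len(toy) * 100
rows_pct = rows_with_null / len(toy) * 100

print(null_cells, rows_with_null, cells_per_row_pct, rows_pct)
```

In this notebook the null counts sum to 74 cells, while `dropna` reduces the frame from 3333 to 3307 rows, so only 26 rows (about 0.78%) actually contain a null.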

In [19]:
# Drop every row that contains at least one null value
df.dropna(inplace=True)

In [20]:
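#Question 1
# Note: this sum also includes the night charge, alongside the day, evening, and international charges.
df["total_charge"] = df["Total day charge"] + df["Total eve charge"] + df["Total night charge"] + df["Total intl charge"]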
df.head()

Unnamed: 0,State,Account length,Area code,International plan,Voice mail plan,Number vmail messages,Total day minutes,Total day calls,Total day charge,Total eve minutes,...,Total eve charge,Total night minutes,Total night calls,Total night charge,Total intl minutes,Total intl calls,Total intl charge,Customer service calls,Churn,total_charge
0,KS,128,415,No,Yes,25,265.1,110.0,45.07,197.4,...,16.78,244.7,91.0,11.01,10.0,3,2.7,1.0,False,75.56
1,OH,107,415,No,Yes,26,161.6,123.0,27.47,195.5,...,16.62,254.4,103.0,11.45,13.7,3,3.7,1.0,False,59.24
2,NJ,137,415,No,No,0,243.4,114.0,41.38,121.2,...,10.3,162.6,104.0,7.32,12.2,5,3.29,0.0,False,62.29
3,OH,84,408,Yes,No,0,299.4,71.0,50.9,61.9,...,5.26,196.9,89.0,8.86,6.6,7,1.78,2.0,False,66.8
4,OK,75,415,Yes,No,0,166.7,113.0,28.34,148.3,...,12.61,186.9,121.0,8.41,10.1,3,2.73,3.0,False,52.09
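
An equivalent way to build the column is to list the charge columns once and sum across them row-wise. The sketch below assumes the column names shown above and uses the first two rows of the `head()` output as sample data; `sample` and `charge_cols` are illustrative names.

```python
import pandas as pd

# First two rows of the head() output, charge columns only.
sample = pd.DataFrame({
    "Total day charge": [45.07, 27.47],
    "Total eve charge": [16.78, 16.62],
    "Total night charge": [11.01, 11.45],
    "Total intl charge": [2.70, 3.70],
})

charge_cols = [
    "Total day charge",
    "Total eve charge",
    "Total night charge",
    "Total intl charge",
]

# Row-wise sum over the listed columns, rounded to cents.
sample["total_charge"] = sample[charge_cols].sum(axis=1).round(2)
print(sample["total_charge"].tolist())
```

One difference to keep in mind: `sum(axis=1)` skips NaN by default, whereas `+` propagates it. After `dropna` the two approaches agree.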


In [21]:
df.shape

(3307, 21)

In [31]:
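#Question 2
# Keep every row whose voicemail count equals the maximum;
# .values[0] then reads the first such row (only one is reported if there is a tie)
max_voicemail = df[df['Number vmail messages'] == df['Number vmail messages'].max()]
voicemail = max_voicemail["Number vmail messages"].values[0]
state = max_voicemail["State"].values[0]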

print(f"The highest voicemail received is: {voicemail} and the customer state is: {state}")

The highest voicemail received is: 51 and the customer state is: FL
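
The filter used for Question 2 returns every row that ties for the maximum, but only the first one is printed. A small sketch of how tied customers could all be reported, on made-up data (the `toy` frame is hypothetical and deliberately contains a tie):

```python
import pandas as pd

# Hypothetical data with two customers tied at the maximum.
toy = pd.DataFrame({
    "State": ["FL", "TX", "NY"],
    "Number vmail messages": [51, 12, 51],
})

col = "Number vmail messages"
top = toy[toy[col] == toy[col].max()]

# All states that share the maximum, in row order.
tied_states = top["State"].tolist()
print(f"Max voicemail count: {toy[col].max()}, states: {tied_states}")
```

Checking `len(top)` first shows whether a tie is present at all.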


In [56]:
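#Question 3

# Customers with an international plan and an international charge above $3.7
intl_plans_max37 = df[(df["International plan"] == "Yes") & (df["Total intl charge"] > 3.7)]
# Single quotes inside the f-string avoid a syntax error on Python versions before 3.12
print(f"The number of customers with an international plan and a charge greater than $3.7 is {intl_plans_max37['State'].count()}")


The number of customers with an international plan and a charge greater than $3.7 is 38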


In [65]:
#Question 4
#df["Churn"].unique()
cust_churned = df[df["Churn"] == True]
cust_churned["State"].count()

480
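
Since churn is described above as a measure of how many customers leave, the count is often reported as a rate as well. A minimal sketch on made-up data (the `toy` series is hypothetical); it assumes the `Churn` values are booleans:

```python
import pandas as pd

# Hypothetical churn flags for five customers.
toy = pd.Series([True, False, False, True, False], name="Churn")

# Cast to bool so the sum counts True values even if the dtype is object.
churned = int(toy.astype(bool).sum())
churn_rate_pct = churned / len(toy) * 100

print(f"{churned} of {len(toy)} customers churned ({churn_rate_pct:.1f}%)")
```

With the figures in this notebook, 480 churned customers out of the 3307 rows left after dropping nulls is about 14.5%.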