# Why Are Our Customers Churning?

### Quick Reference
I. Find Baseline Probability  
II. Create a Baseline Model  
III. Create/Test Models and Identify Ω Model  

## Project Plan

### Summary

Zach, my team leader at Telco Corp, wants to find out why our customers are churning.  

Below is a list of questions he wants answers to:

1. Could the month in which customers signed up influence churn? That is, if a cohort is identified by tenure, do some cohorts have a higher churn rate than others? **(Plot the churn rate on a line chart where x is the tenure and y is the churn rate (customers churned / total customers).)**
2. Are there features that indicate a higher propensity to churn, such as type of internet service, type of phone service, online security and backup, senior citizen status, or paying more than x% of customers with the same services?
3. Is there a price threshold for specific services beyond which the likelihood of churn increases? If so, what is that threshold, and for which service(s)?
4. If we looked at churn rate for month-to-month customers after the 12th month and that of 1-year contract customers after the 12th month, are those rates comparable?
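
Question 1 can be approached by grouping customers by `tenure` and dividing churned customers by total customers in each group. Below is a minimal sketch on a toy frame, assuming `tenure` and `churn` columns shaped like those in the Telco data. The values are made up for illustration, and `churn_rate_by_tenure` is a hypothetical helper, not part of this project's modules.

```python
import pandas as pd


def churn_rate_by_tenure(df: pd.DataFrame) -> pd.Series:
    """Return churned / total customers for each tenure value (hypothetical helper)."""
    churned = df["churn"].eq("Yes")  # True where the customer churned
    # The mean of a boolean Series is the share of True values in each group.
    return churned.groupby(df["tenure"]).mean().rename("churn_rate")


# Toy data, made up for illustration only.
toy = pd.DataFrame({
    "tenure": [1, 1, 1, 1, 12, 12, 24, 24],
    "churn": ["Yes", "Yes", "No", "No", "Yes", "No", "No", "No"],
})

rates = churn_rate_by_tenure(toy)
print(rates)
# rates.plot(kind="line") would then put tenure on x and churn rate on y
# (requires matplotlib).
```

The result is indexed by tenure, so it can be passed straight to a line plot.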

### Goals

The goals of the project are to answer the questions above and to deliver the following data products:

1. Report detailing my analysis in an .ipynb format
2. CSV file containing my predictions on a test data set
3. Google Slides explaining my chosen model
4. .py files used throughout the pipeline that contain _reproducible_ Python scripts
5. README file in a GitHub repo containing all files created for this project
  
### Data Dictionary
| Feature 	| Description 	| Table of Origin 	| Notes 	|
|---------	|-------------	|-----------------	|-------	|
|         	|             	|                 	|       	|

### Packages

In [1]:
import warnings

import numpy as np
import pandas as pd
from scipy.stats import binom
from sklearn.model_selection import train_test_split

# Project modules
import model
import prep
import wrangle

# Suppress library warnings to keep the notebook output readable
warnings.filterwarnings("ignore")

## I. Find Baseline Probability

With the available data, what is the proportion of customers who churn?

```sql
SELECT churn,
       COUNT(*) / (SELECT COUNT(*) FROM customers) AS proportion
FROM customers
WHERE churn = 'Yes'
GROUP BY churn;
```

Proportion of people who churned:
0.2654 (out of 7043 observations)

n = no. of trials  
P = probability of success (success = Churned)

In [2]:
will_churn = binom(7043, 0.2654).sf(.5)  # P(X > 0.5), i.e. at least one customer churns; to be revisited
will_churn

1.0
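
The `1.0` above follows from what `sf` computes: `sf(.5)` is P(X > 0.5), the probability that at least one of the 7043 customers churns, which equals 1 - (1 - P)^n. Here is a short sketch using only the standard library, assuming churn events are independent and share the same probability (the usual binomial assumption). It also shows the mean and standard deviation of the churn count, which may be more informative as a baseline.

```python
import math

n = 7043      # number of trials (customers)
p = 0.2654    # probability of success (success = churned)

# P(X >= 1) = 1 - P(X = 0) = 1 - (1 - p)^n
p_at_least_one = 1 - (1 - p) ** n

# Mean and standard deviation of a Binomial(n, p) count
expected_churners = n * p
std_churners = math.sqrt(n * p * (1 - p))

print(p_at_least_one)
print(round(expected_churners), round(std_churners, 1))
```

Because (1 - p)^n is far smaller than the smallest representable float, the first value is exactly 1.0 in floating point.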

## II. Acquire and Split Data

Using the `get_sql_telco` function from `wrangle.py`, acquire the data from the `telco_churn` database on MySQL.

In [3]:
telco = wrangle.get_sql_telco()

Split the data into train and test sets, and set the test data aside.

In [4]:
train, test = train_test_split(telco, train_size=0.7, random_state=123)

print(f"""
Train data size: {len(train)}
Test data size: {len(test)}
""")


Train data size: 4930
Test data size: 2113



## III. Perform Temporary Cleaning and Preliminary Exploration on Train Data

### Temporary Cleaning

In [5]:
train.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4930 entries, 1479 to 3582
Data columns (total 21 columns):
customer_id                 4930 non-null object
gender                      4930 non-null object
senior_citizen              4930 non-null int64
partner                     4930 non-null object
dependents                  4930 non-null object
tenure                      4930 non-null int64
phone_service               4930 non-null object
multiple_lines              4930 non-null object
internet_service_type_id    4930 non-null int64
online_security             4930 non-null object
online_backup               4930 non-null object
device_protection           4930 non-null object
tech_support                4930 non-null object
streaming_tv                4930 non-null object
streaming_movies            4930 non-null object
contract_type_id            4930 non-null int64
paperless_billing           4930 non-null object
payment_type_id             4930 non-null int64
monthly_charg

`total_charges` is an object type. There are 8 observations in the `train` data whose `total_charges` value is a blank string rather than a number. These observations make the attribute an object type instead of a float.

At this phase, I have not yet decided whether to impute these missing values or drop them. To facilitate exploration, I am creating `traindrp`, a copy of the train data without them.

> Action Steps:  
> - Replace blank strings with NaN
> - Drop NaNs using `dropna()`
> - Cast values to a float data type
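
The action steps above can be sketched on a toy frame. This is a minimal illustration, assuming the missing values are stored as a single blank string; the data is made up. It uses `pd.to_numeric(..., errors="coerce")`, which casts and flags unparseable values in one step, as an alternative to the approach used in this notebook.

```python
import pandas as pd

# Toy data, made up for illustration; " " marks a missing total charge.
toy = pd.DataFrame({
    "customer_id": ["a", "b", "c"],
    "total_charges": ["20.05", " ", "1043.4"],
})

# Values that cannot be parsed as numbers (such as " ") become NaN,
# and the column is cast to float in the same step.
toy["total_charges"] = pd.to_numeric(toy["total_charges"], errors="coerce")

# Drop only the rows where total_charges is missing.
cleaned = toy.dropna(subset=["total_charges"])
print(cleaned["total_charges"].dtype, len(cleaned))
```

Restricting `dropna` to a `subset` avoids dropping rows that are missing values in unrelated columns.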

In [6]:
# Blank strings are not NaN, so isnull() does not count them; value_counts() reveals them
train.total_charges.isnull().sum()
train.total_charges.value_counts(ascending=False, dropna=False)

           8
20.05      7
19.65      7
45.3       6
19.75      6
19.9       5
20.25      5
20.2       5
19.55      5
19.45      4
20.3       4
20.4       4
20.35      4
69.9       4
19.5       4
69.65      4
20.45      4
19.95      3
70.3       3
44.4       3
69.1       3
74.6       3
470.2      3
19.25      3
19.3       3
24.4       3
55.7       3
305.55     3
70.15      3
20.5       3
          ..
4543.95    1
1054.8     1
1233.25    1
927.1      1
934.1      1
5175.3     1
6296.75    1
1043.4     1
3824.2     1
4250.1     1
2088.75    1
4735.35    1
1216.35    1
4895.1     1
990.3      1
5032.25    1
3141.7     1
1810.55    1
679.8      1
978.6      1
6017.9     1
2157.5     1
4487.3     1
6689       1
2970.8     1
553.4      1
7413.55    1
564.4      1
3623.95    1
298.45     1
Name: total_charges, Length: 4636, dtype: int64

In [7]:
traindrp = train.copy()
# Blank strings mark missing total_charges; convert them to NaN, then drop those rows
traindrp = traindrp.replace(" ", np.nan).dropna()
traindrp.total_charges = traindrp.total_charges.astype("float")

print(f"""
From the original train count of {len(train)}, traindrp has a reduced observation count of {len(traindrp)}. Also, total charges is now recast as a "{traindrp.total_charges.dtype}" data type.
""")


From the original train count of 4930, traindrp has a reduced observation count of 4922. Also, total charges is now recast as a "float64" data type.



## IV. Create Baseline Model

Because a Decision Tree accepts both discrete and continuous features, I will create the baseline model using a Decision Tree.  
> Action Steps:
> - Encode `churn` into a machine-readable variable with values 0 and 1, such that 0 = No (Stayed) and 1 = Yes (Churned). Use the `yes_no_to_boolean` function from `prep`
   

In [8]:
traindrp["enc_churn"] = traindrp.churn.apply(prep.yes_no_to_boolean)
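
The body of `prep.yes_no_to_boolean` is not shown in this notebook. A hypothetical version, assuming it maps the strings "Yes" and "No" to 1 and 0, might look like this:

```python
def yes_no_to_boolean(value: str) -> int:
    """Hypothetical stand-in for prep.yes_no_to_boolean: "Yes" -> 1, anything else -> 0."""
    return 1 if value.strip().lower() == "yes" else 0


# Example on plain strings; with pandas the function would be passed to Series.apply.
encoded = [yes_no_to_boolean(v) for v in ["No", "Yes", "No"]]
print(encoded)
```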

In [9]:
traindrp.head()

| | customer_id | gender | senior_citizen | partner | dependents | tenure | phone_service | multiple_lines | internet_service_type_id | online_security | ... | tech_support | streaming_tv | streaming_movies | contract_type_id | paperless_billing | payment_type_id | monthly_charges | total_charges | churn | enc_churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1479 | 2187-PKZAY | Male | 0 | No | No | 12 | Yes | No | 2 | No | ... | No | No | Yes | 1 | Yes | 3 | 79.95 | 1043.4 | No | 0 |
| 2377 | 3402-XRIUO | Female | 1 | Yes | No | 22 | Yes | Yes | 1 | Yes | ... | Yes | No | No | 1 | Yes | 2 | 63.55 | 1381.8 | No | 0 |
| 6613 | 9397-TZSHA | Female | 0 | No | No | 69 | Yes | Yes | 3 | No internet service | ... | No internet service | No internet service | No internet service | 3 | No | 4 | 24.6 | 1678.05 | No | 0 |
| 6468 | 9153-BTBVV | Female | 0 | Yes | No | 71 | Yes | Yes | 3 | No internet service | ... | No internet service | No internet service | No internet service | 3 | No | 3 | 25.0 | 1753.0 | No | 0 |
| 2668 | 3793-MMFUH | Female | 1 | No | No | 13 | Yes | Yes | 2 | No | ... | No | Yes | Yes | 1 | Yes | 1 | 95.05 | 1290.0 | Yes | 1 |
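
For the Decision Tree baseline described in this section, here is a minimal sketch with scikit-learn, assuming numeric features such as `tenure` and `monthly_charges` and the `enc_churn` target created above. The toy rows are made up, and the hyperparameters (`max_depth=2`, `random_state=123`) are illustrative choices, not necessarily those used by this project's `model` module.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Toy data, made up for illustration only.
toy = pd.DataFrame({
    "tenure": [1, 2, 3, 40, 50, 60, 5, 70],
    "monthly_charges": [90.0, 85.5, 95.0, 25.0, 30.5, 20.0, 80.0, 24.6],
    "enc_churn": [1, 1, 1, 0, 0, 0, 1, 0],
})

X = toy[["tenure", "monthly_charges"]]   # discrete and continuous features
y = toy["enc_churn"]                     # 0 = stayed, 1 = churned

# A shallow tree keeps the baseline simple and easy to interpret.
tree = DecisionTreeClassifier(max_depth=2, random_state=123)
tree.fit(X, y)

predictions = tree.predict(X)
train_accuracy = tree.score(X, y)        # accuracy on the training rows
print(predictions, train_accuracy)
```

Accuracy on the training rows is only a sanity check; a fair comparison against the baseline probability would use held-out data.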


In [None]:
model_by_cart