## TELCO CHURN INITIAL EXPLORATORY ANALYSIS

#### WORKING PREMISE:

**While each customer will have their own particular reason for parting ways with Telco, the two ingredients needed for customer churn are:**
1. Low or No "Perception of Commitment" to Telco.
2. Minimal Switching Costs associated with leaving Telco.

#### The goals of this initial exploration are as follows:
- Discover the "Driving Factors" which contribute most to customer churn.
- Discover the customer segment which is most likely to churn.
- Discover the customer segment which is most responsible for revenue.
- Identify those customers who exhibit the characteristics of said target customer segment.
- Create a Machine Learning Model which predicts customer churn more accurately than the baseline.


In [1]:
# DS Libraries
import pandas as pd
import numpy as np
from scipy import stats

# Data Acquisition
from pydataset import data
import env
import acquire as acq
import prepare as prp

# Data visualization
import matplotlib.pyplot as plt
import seaborn as sns

# scikit learn submodules
from sklearn.tree import DecisionTreeClassifier, export_text, plot_tree
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix
from sklearn.linear_model import LogisticRegression

In [2]:
# Acquire and prepare the DataFrame
# Load the Telco data via acquire.py (acquire.py must be in the same directory as this notebook)
df = acq.new_telco_data()

In [3]:
# Using the prep_telco function in prepare.py, drop unneeded columns and encode the columns
# that require encoding
df = prp.prep_telco(df)
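
The body of `prep_telco` is not shown in this notebook. As a rough illustration of what "drop unneeded columns and encode" can look like, here is a minimal sketch. The function name `prep_telco_sketch`, the column names (`customer_id`, `contract_type`, `paperless_billing`, `monthly_charges`), and the toy rows are all assumptions for demonstration and are not taken from `prepare.py`.

```python
import pandas as pd


def prep_telco_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for prepare.prep_telco (column names are assumed)."""
    df = df.copy()
    # Drop an identifier column that carries no predictive signal (assumed name).
    df = df.drop(columns=["customer_id"], errors="ignore")
    # Map the Yes/No target to 1/0 so it can be averaged and modeled.
    df["churn"] = (df["churn"] == "Yes").astype(int)
    # One-hot encode every remaining non-numeric column.
    cat_cols = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
    return pd.get_dummies(df, columns=cat_cols, drop_first=True, dtype=int)


# Made-up rows, only to show the shape of the result.
demo = pd.DataFrame({
    "customer_id": ["a1", "b2", "c3", "d4"],
    "contract_type": ["Month-to-month", "One year", "Two year", "Month-to-month"],
    "paperless_billing": ["Yes", "No", "Yes", "Yes"],
    "monthly_charges": [70.5, 55.0, 99.9, 20.1],
    "churn": ["Yes", "No", "No", "Yes"],
})
prepared = prep_telco_sketch(demo)
print(prepared.columns.tolist())
```

With `drop_first=True`, one level of each categorical column is dropped and becomes the implicit reference level, which avoids redundant dummy columns.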

#### Initial questions and hypothesis:
#### What % of Telco Customers have No "Perceived Commitment" to Telco in combination with low "Switching Costs"?

- Do customers with **"Month-to-Month"** contracts churn at a statistically significantly higher rate than the overall population?
- Do customers with **"Paperless Billing"** enabled churn at a statistically significantly higher rate than the overall population?
- Do customers **without "Dependents"** churn at a statistically significantly higher rate than the overall population?
- Do customers **without "Partners"** churn at a statistically significantly higher rate than the overall population?
- Do customers who comprise the largest customer segment churn at a statistically significantly higher rate than the overall population?

#### HYPOTHESIS: Customers in this segment churn at greater rates than the overall Telco customer base
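
One possible way to check questions of this kind is a chi-square test of independence between a segment flag and churn. The sketch below is an illustration under assumptions, not necessarily the test used later in this analysis: the helper `churn_rate_test`, the column name `month_to_month`, the significance level of 0.05, and the toy counts are all made up for demonstration.

```python
import pandas as pd
from scipy import stats

ALPHA = 0.05  # assumed significance level


def churn_rate_test(df: pd.DataFrame, flag_col: str, target_col: str = "churn") -> dict:
    """Chi-square test of independence between a 0/1 segment flag and churn."""
    # Rows: segment flag (0/1); columns: churn (0/1).
    observed = pd.crosstab(df[flag_col], df[target_col])
    chi2, p, dof, expected = stats.chi2_contingency(observed)
    return {
        "segment_rate": df.loc[df[flag_col] == 1, target_col].mean(),
        "overall_rate": df[target_col].mean(),
        "chi2": chi2,
        "p": p,
        "dof": dof,
        "reject_null": p < ALPHA,
    }


# Synthetic counts (NOT Telco results): 1000 flagged customers with 400 churned,
# 1000 unflagged customers with 100 churned.
toy = pd.DataFrame({
    "month_to_month": [1] * 1000 + [0] * 1000,
    "churn": [1] * 400 + [0] * 600 + [1] * 100 + [0] * 900,
})
result = churn_rate_test(toy, "month_to_month")
print(result)
```

The chi-square test only says whether churn and the flag are associated. Comparing `segment_rate` with `overall_rate` gives the direction of the difference. To keep the test set untouched, a check like this would normally be run on the training subset only.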

In [11]:
# Using prepare.py, split the dataset into TRAIN (~56%), VALIDATE (~24%), and TEST (~20%) subsets:
train, validate, test = prp.split_data(df,'churn')
# Quick check: the row counts should be 3943, 1691, and 1409 (7043 rows in total)
train.shape[0],validate.shape[0],test.shape[0]

(3943, 1691, 1409)
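
The body of `split_data` is not shown here. A two-stage stratified split is one common way to get row counts of this shape, sketched below under assumptions: the function name `split_data_sketch`, the fractions (20% held out for test, then 30% of the remainder for validate), the seed, and the random toy data are all hypothetical and are not taken from `prepare.py`.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def split_data_sketch(df: pd.DataFrame, target: str, seed: int = 123):
    """Hypothetical two-stage stratified split into train, validate, and test."""
    # Stage 1: hold out 20% of all rows as the test set.
    train_validate, test = train_test_split(
        df, test_size=0.20, random_state=seed, stratify=df[target]
    )
    # Stage 2: carve 30% of the remaining rows off as the validate set.
    train, validate = train_test_split(
        train_validate, test_size=0.30, random_state=seed,
        stratify=train_validate[target],
    )
    return train, validate, test


# Random toy data with the same number of rows as the counts above (7043).
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "monthly_charges": rng.uniform(20, 120, 7043),
    "churn": rng.integers(0, 2, 7043),
})
train_s, validate_s, test_s = split_data_sketch(toy, "churn")
shapes = (len(train_s), len(validate_s), len(test_s))
print(shapes)
```

Stratifying on the target keeps the churn rate roughly equal across the three subsets, so a baseline computed on train stays comparable to validate and test.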