## Goal:
* Discover drivers of churn within the Telco dataset
* Use these drivers to develop a machine learning model to predict whether or not a customer will churn

#### Imports

In [None]:
# acquire
import wrangle as w
import env
import explore as e

# General DS Imports
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import scipy.stats as stats

# Modeling and Model Evaluation Imports
from sklearn.tree import DecisionTreeClassifier, plot_tree, export_text
from sklearn.metrics import classification_report, confusion_matrix, ConfusionMatrixDisplay
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression

## Acquire
* Data acquired from Codeup Database
* It contained 7043 rows and 24 columns before cleaning
* Each row represents a customer from the Telco company
* Each column represents a feature of those customers

## Prepare
### Prepare Actions:

* Dropped duplicate columns
* Removed columns that did not contain useful information
* Renamed columns to promote readability
* Checked for nulls in the data (none were stored as explicit nulls)
* Dropped rows whose missing values were stored as whitespace
* Checked that column data types were appropriate and converted total charges to a numeric data type
* Encoded categorical variables
* Split data into train, validate and test (approx. 75/12.5/12.5), stratifying on 'churn'
* Outliers have not been removed for this iteration of the project
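
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration on a toy frame, not the contents of `w.prep_telco`; the column names and values are assumptions made for the example.

```python
import pandas as pd

# Hypothetical toy frame standing in for the raw Telco data;
# column names and values are assumptions, not the real schema.
raw = pd.DataFrame({
    "total_charges": ["29.85", " ", "108.15", "1840.75"],
    "contract_type": ["Month-to-month", "Two year", "One year", "Month-to-month"],
    "churn": ["No", "No", "Yes", "Yes"],
})

# isnull() finds nothing here because the blank is stored as a string
explicit_nulls = int(raw["total_charges"].isnull().sum())

# Coerce to numeric: whitespace-only strings cannot be parsed and become NaN
raw["total_charges"] = pd.to_numeric(raw["total_charges"], errors="coerce")

# Drop the rows whose total_charges could not be parsed
clean = raw.dropna(subset=["total_charges"]).reset_index(drop=True)

# Encode the binary target as 0/1 and one-hot encode the multi-level category
clean["churn"] = (clean["churn"] == "Yes").astype(int)
clean = pd.get_dummies(clean, columns=["contract_type"], drop_first=True)
```

Coercing first and dropping second is what exposes the whitespace values: a plain null check on the raw strings reports nothing missing.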

In [None]:
# acquiring telco data from codeup database
df = w.get_telco_data()

#Preparing telco data for exploration
df = w.prep_telco(df)

#Split data and set target variable
target = 'churn'
train, validate, test = w.train_validate_test_split(df, target)
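
A stratified 75/12.5/12.5 split like the one described in the prepare actions could be written with two calls to scikit-learn's `train_test_split`. The function below is a hypothetical stand-in, not the actual `w.train_validate_test_split`; the seed and the toy data are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def split_75_125_125(df, target, seed=123):
    """Hypothetical stand-in for a stratified 75/12.5/12.5 split."""
    # First hold out 25% of the rows for validate + test
    train, rest = train_test_split(
        df, test_size=0.25, random_state=seed, stratify=df[target]
    )
    # Then split the held-out rows in half
    validate, test = train_test_split(
        rest, test_size=0.5, random_state=seed, stratify=rest[target]
    )
    return train, validate, test


# Toy data with a 27% positive rate (made up for illustration)
demo = pd.DataFrame({
    "x": np.arange(800),
    "churn": np.r_[np.ones(216, dtype=int), np.zeros(584, dtype=int)],
})
tr, va, te = split_75_125_125(demo, "churn")
```

Stratifying on the target keeps the churn rate close to the same in all three subsets, which matters when the classes are imbalanced.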

## A brief look at the data

In [None]:
train.head()

## Summary of the data

In [None]:
train.describe()

## Exploration

## What is the percentage of customers who churn?

In [None]:
#Get pie chart for churn
w.get_pie_churn(train)

* **Approximately 27 percent of customers in this dataset "churned" (left the company)**
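
The percentage behind a chart like this can be read directly from the target column. A minimal sketch, assuming `churn` is encoded as 0/1; the toy values below are made up and are not the project data.

```python
import pandas as pd

# Toy target column (assumption: 1 = churned, 0 = stayed)
demo = pd.DataFrame({"churn": [1, 0, 0, 0, 1, 0, 0, 0]})

# Share of each class, expressed as a percentage
churn_pct = demo["churn"].value_counts(normalize=True).mul(100).round(1)
```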

## Is there a relationship between being a Senior Citizen and Churn rate?

In [None]:
#Create bar chart (senior_citizen)
w.get_bar_senior(train)

### Senior Citizen vs. Churn:

* Both variables are categorical, so a Chi-Squared test is required
* I used a 95% confidence level
* The resulting alpha value is .05

Hypotheses:

$H_0$: **Senior citizen status and churn are independent.**

$H_a$: **Senior citizen status and churn are not independent (the churn rates of the two groups differ).**

In [None]:
#Get chi-square test results
w.get_chi_senior(train)

**The p-value is less than alpha. Therefore, there is evidence to support the hypothesis that "Senior Citizen" and "Churn" are related. I believe that including this feature in modeling will likely have a positive impact on model accuracy.**
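
A chi-squared test of independence on two categorical columns can be sketched as below. This is not the body of `w.get_chi_senior`, which is not shown here; the toy counts are invented for illustration and are not results from the Telco data.

```python
import pandas as pd
from scipy import stats

# Toy data (made up): 100 seniors and 400 non-seniors
demo = pd.DataFrame({
    "senior_citizen": [1] * 100 + [0] * 400,
    "churn": [1] * 42 + [0] * 58 + [1] * 94 + [0] * 306,
})

# Contingency table of observed counts
observed = pd.crosstab(demo["senior_citizen"], demo["churn"])

# chi2_contingency returns the statistic, p-value, degrees of freedom,
# and the counts expected if the two variables were independent
chi2, p, dof, expected = stats.chi2_contingency(observed)

alpha = 0.05
reject_null = p < alpha
```

The test only says whether the two variables are associated; the direction of the difference comes from comparing the group churn rates, as in the bar chart.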

## Does a customer having dependents affect churn?

In [None]:
#Get bar chart comparing having dependents with churn
w.get_bar_dependents(train)

#### It appears that customers with dependents churn less than those without dependents.

### Dependents vs. Churn: Testing Significance of Relationship

* Both variables are categorical, so a Chi-Squared test is required
* I used a 95% confidence level
* The resulting alpha value is .05

Hypotheses:

$H_0$: **Having dependents and churn are independent.**

$H_a$: **Having dependents and churn are not independent (the churn rates of the two groups differ).**

In [None]:
#Run chi-squared test on dependents vs churn
w.get_chi_dependents(train)

**The p-value is less than alpha. Therefore, there is evidence to support the hypothesis that a customer having dependents is related to churn rate. I believe that including this feature in modeling will likely have a positive impact on model accuracy.**

## Does a customer having a partner affect churn?

In [None]:
#Visualizing relationship between partner status and churn
w.get_bar_partner(train)

#### It appears that those with partners churn less than those without.

### Partner Status vs. Churn: Looking at the significance of the relationship

* Both variables are categorical, so a Chi-Squared test is required
* I used a 95% confidence level
* The resulting alpha value is .05

Hypotheses:

$H_0$: **Partner status and churn are independent.**

$H_a$: **Partner status and churn are not independent (the churn rates of the two groups differ).**

In [None]:
#Get chi-squared test for partner status vs churn
w.get_chi_partner(train)

**The p-value is less than alpha. Therefore, there is evidence to support the hypothesis that a customer having a partner and churn are related. However, this feature is similar to having dependents, and may not add value when included in modeling.**

## Does customer contract type affect churn?

In [None]:
#Get bar chart comparing contract type with churn
w.get_bar_contract(train)

#### It appears that customers with month-to-month contracts churn at a much higher rate than those with two-year contracts. Customers with one-year contracts churn less than month-to-month customers, but more than two-year contract customers.
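
The comparison in a bar chart like this comes down to a churn rate per contract type. A minimal sketch on toy data, assuming a 0/1 `churn` column and a `contract_type` column; both the column names and the counts are made up for illustration.

```python
import pandas as pd

# Toy data (made up): ten customers per contract type
demo = pd.DataFrame({
    "contract_type": ["Month-to-month"] * 10 + ["One year"] * 10 + ["Two year"] * 10,
    "churn": [1] * 4 + [0] * 6 + [1] * 1 + [0] * 9 + [0] * 10,
})

# The mean of a 0/1 column is the share of customers who churned in each group
churn_rate = (
    demo.groupby("contract_type")["churn"]
    .mean()
    .sort_values(ascending=False)
)
```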

### Contract Type vs. Churn: Testing Significance of Relationship

* Both variables are categorical, so a Chi-Squared test is required
* I used a 95% confidence level
* The resulting alpha value is .05

Hypotheses:

$H_0$: **Contract type and churn are independent.**

$H_a$: **Contract type and churn are not independent (churn rates differ across contract types).**

In [None]:
#Running chi-squared test comparing contract type and churn
w.get_chi_contract(train)

**The p-value is less than alpha. Therefore, there is evidence to support the hypothesis that customer contract type is related to churn rate. I believe that including this feature in modeling will likely have a strong positive impact on model accuracy.**

## Does a customer's monthly charge amount impact churn?

In [None]:
#Get viz of monthly charge vs churn
w.monthly_charges_md(train)

#### It appears that customers who have lower monthly charges churn less.

### Monthly Charges vs. Churn: Testing Significance of Relationship

* Monthly Charges is a continuous variable, and the two groups (churned and not churned) are independent, so an independent two-sample T-Test is required
* I used a 95% confidence level (alpha value .05)
* Variances were tested and found to be unequal, so the test was run without assuming equal variances

Hypotheses:

$H_0$: **The mean monthly charge of customers who churned is less than or equal to that of customers who did not churn.**

$H_a$: **The mean monthly charge of customers who churned is greater than that of customers who did not churn.**

In [None]:
#Run T-test comparing monthly charges and churn status
w.get_t_monthly(train)

**The p-value is less than alpha. Therefore, there is evidence to support the hypothesis that customer monthly charge amount is related to churn rate. I believe that including this feature in modeling will likely have a positive impact on model accuracy.**
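
The variance check and t-test described above can be sketched with SciPy as follows. This is an illustration under assumptions, not the body of `w.get_t_monthly`; the two samples are randomly generated stand-ins for the monthly charges of churned and retained customers.

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for monthly charges (made-up means and spreads)
rng = np.random.default_rng(42)
churned = rng.normal(loc=75, scale=25, size=300)
stayed = rng.normal(loc=61, scale=31, size=800)

alpha = 0.05

# Levene's test: a small p-value suggests the variances are not equal
_, levene_p = stats.levene(churned, stayed)
equal_var = bool(levene_p >= alpha)

# With equal_var=False this is Welch's t-test
t_stat, p_two_sided = stats.ttest_ind(churned, stayed, equal_var=equal_var)

# One-tailed p-value for "churned mean is greater than stayed mean"
p_one_sided = p_two_sided / 2 if t_stat > 0 else 1 - p_two_sided / 2
reject_null = p_one_sided < alpha
```

Halving the two-sided p-value is valid only when the t statistic points in the direction of the alternative hypothesis, which is why the sign of `t_stat` is checked.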

## Do the total charges a customer has accrued impact churn rate?

In [None]:
#Get bar chart comparing total charges with churn
w.total_charges_md(train)

#### It appears that customers with higher total charges churn less.

### Total Charges vs. Churn: Testing Significance of Relationship

* Total Charges is a continuous variable, and the two groups (churned and not churned) are independent, so an independent two-sample T-Test is required
* I used a 95% confidence level (alpha value .05)
* Variances were tested and found to be unequal, so the test was run without assuming equal variances

Hypotheses:

$H_0$: **The mean total charges of customers who churned are greater than or equal to those of customers who did not churn.**

$H_a$: **The mean total charges of customers who churned are less than those of customers who did not churn.**

In [None]:
#Run t-test comparing total charges and churn
w.get_t_total(train)

**The p-value is less than alpha. Therefore, there is evidence to support the hypothesis that customer total charge amount is related to churn rate.**

## Exploration Summary

* Most categorical variables in this dataset had significant relationships with the target variable of "churn"
* Categorical variables that did not impact churn were Gender and Phone Service
* Senior Citizen status is a driver of churn
* Both partner status and having dependents are drivers of churn
* With more time, further exploration (for example, feature engineering) could yield additional insights

## Features I am moving to modeling with

## Features I am not moving to modeling with