# Telco Customer Churn

# Goal:
    * Predict customer churn
    * Find out the key drivers that lead to churn


# Importing Libraries and data

In [2]:
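# Standard library
import os
import warnings

# Silence library warnings to keep notebook output readable
warnings.filterwarnings('ignore')

# Data handling and plotting
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Project helpers for acquiring, cleaning, and splitting the data
import wrangle as w

# Acquire the data, then clean it and add encoded features
df = w.get_telco()
df = w.wrangle_telco_encoded(df)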

# splitting data into train, validate, and test
train, validate, test = w.train_validate_test(df, "churn")
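
The internals of `w.train_validate_test` are not shown in this notebook. As a rough sketch only, a stratified three-way split could look like the following, assuming scikit-learn's `train_test_split`. The helper name `split_three_ways`, the 60/20/20 proportions, and the seed are hypothetical choices, not the project's actual settings.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_three_ways(df, target, seed=123):
    """Hypothetical stand-in for w.train_validate_test: 60/20/20 stratified split."""
    # Hold out 20% as the test set, preserving the target's class balance.
    train_validate, test = train_test_split(
        df, test_size=0.2, random_state=seed, stratify=df[target]
    )
    # 25% of the remaining 80% gives a validate set of 20% of the original data.
    train, validate = train_test_split(
        train_validate,
        test_size=0.25,
        random_state=seed,
        stratify=train_validate[target],
    )
    return train, validate, test


# Toy data for illustration only, not the Telco data.
toy = pd.DataFrame({"tenure": range(100), "churn": [1] * 25 + [0] * 75})
toy_train, toy_validate, toy_test = split_three_ways(toy, "churn")
```

Stratifying on the target keeps the churn rate roughly equal across the three sets, so accuracy on each set is compared against the same class balance.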


# Acquire Data
    * Data is acquired from the Codeup database using the get_telco function from wrangle.py
    * The Telco data has 7,043 rows and 24 columns before cleaning

# Prepare
## Prepare Actions
    * Checked for null values and found none
    * Checked for duplicates and found none
    * Converted total_charges to float
    * Encoded churn, contract_type, internet_service_type, and payment_type
    * Split data into train, validate, and test
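
The prepare actions above can be sketched in pandas roughly as follows. This is not the code in wrangle.py: the toy rows, the blank `total_charges` entry, and the choice to fill unparseable values with 0 are all assumptions made for illustration.

```python
import pandas as pd

# Toy rows for illustration only; column names follow the list above.
raw = pd.DataFrame({
    "total_charges": ["29.85", " ", "108.15"],
    "churn": ["No", "No", "Yes"],
    "contract_type": ["Month-to-month", "Two year", "One year"],
})

# Check for nulls and duplicate rows.
null_count = int(raw.isnull().sum().sum())
duplicate_count = int(raw.duplicated().sum())

# total_charges arrives as text here; entries that cannot be parsed (such as
# blanks) become NaN and are filled with 0, which is an assumed treatment.
clean = raw.copy()
clean["total_charges"] = pd.to_numeric(
    clean["total_charges"], errors="coerce"
).fillna(0.0)

# Encode the binary target as 0/1 and one-hot encode a multi-level category.
clean["churn"] = (clean["churn"] == "Yes").astype(int)
clean = pd.get_dummies(clean, columns=["contract_type"], dtype=int)
```

A blank string is not a null, so a null check alone would not flag it. Converting with `errors="coerce"` is one way to surface such values.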


# Explore

# Is fiber optic a driver of churn?
$H_0$: Customers with fiber optic do not have a higher churn rate than those with DSL.

$H_a$: Customers with fiber optic have a higher churn rate than those with DSL.

In [None]:
# Get Bar Plot of Churn Rate by Internet Service Type
w.plot_churn_rate_by_internet_service_type()

In [None]:
# Get Chi2 Test Results
w.chi2_test_for_churn_and_internet_service_type()
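
The body of `w.chi2_test_for_churn_and_internet_service_type` is not shown here. A minimal sketch of a chi-square test of independence, assuming `scipy.stats.chi2_contingency`, might look like this. The counts are made up and the 0.05 significance level is an assumption.

```python
import pandas as pd
from scipy import stats

# Made-up counts for illustration only, not the Telco results.
toy = pd.DataFrame({
    "internet_service_type": ["Fiber optic"] * 100 + ["DSL"] * 100,
    "churn": [1] * 40 + [0] * 60 + [1] * 20 + [0] * 80,
})

# Contingency table of service type against churn.
observed = pd.crosstab(toy["internet_service_type"], toy["churn"])

# Chi-square test of independence; a small p-value suggests the two are related.
chi2, p_value, dof, expected = stats.chi2_contingency(observed)

# The test is non-directional, so compare churn rates to see the direction.
churn_rates = toy.groupby("internet_service_type")["churn"].mean()
alpha = 0.05  # assumed significance level
reject_null = bool(
    (p_value < alpha) and (churn_rates["Fiber optic"] > churn_rates["DSL"])
)
```

Because the chi-square test only detects an association, the one-sided hypothesis above also needs the rate comparison before concluding that fiber optic customers churn more.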

# Is fiber optic price a driver of churn?

# Does tenure correlate with higher or lower churn?

# Are customers with dependents more likely to churn than those without?

# Exploration Summary

# Selected Features for Modeling
    * tenure
    * monthly_charges
    * total_charges
    * contract_type
    * internet_service_type
    * payment_type
    * senior_citizen
    * partner
    * dependents
    * phone_service
    * multiple_lines
    * online_security
    * online_backup
    * device_protection
    * tech_support
    * streaming_tv
    * streaming_movies
    * paperless_billing
    * churn (target)

# Features that will be dropped
    * gender (no significant relationship with churn)
    * payment_type_id (duplicates payment_type)
    * internet_service_type_id (duplicates internet_service_type)
    * contract_type_id (duplicates contract_type)


# Modeling

# Baseline Prediction
    * Baseline accuracy is 73.46%, from predicting that no customer churns (the most common class)
    * To be useful, a model must beat the baseline accuracy
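
A baseline of this kind can be computed by always predicting the most common class in train. The sketch below uses a toy target, not the Telco data, so its result differs from the figure above.

```python
import pandas as pd

# Toy target for illustration only, not the Telco data.
y_train = pd.Series([0, 0, 0, 1, 0, 1, 0, 0])

# The baseline always predicts the most common class in train.
baseline_class = y_train.mode()[0]

# Baseline accuracy is the share of rows that match that class.
baseline_accuracy = (y_train == baseline_class).mean()
```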

# Model Selection
    * Logistic Regression
    * Random Forest
    * KNN
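
As an illustration of how these three model types can be fit and scored side by side, here is a sketch using scikit-learn on synthetic data. The data, the hyperparameter values, and the seed are placeholders, not the ones used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the encoded Telco features.
X, y = make_classification(n_samples=500, n_features=8, random_state=123)
X_train, X_validate, y_train, y_validate = train_test_split(
    X, y, test_size=0.25, random_state=123, stratify=y
)

# Hyperparameter values are placeholders, not the project's settings.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(max_depth=5, random_state=123),
    "KNN": KNeighborsClassifier(n_neighbors=10),
}

# Record accuracy on train and validate for each fitted model.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = {
        "train": model.score(X_train, y_train),
        "validate": model.score(X_validate, y_validate),
    }
```

Keeping train and validate accuracy together per model makes it easy to see both which model scores highest and which one overfits.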

# Comparison of Models
    * Logistic Regression performed the best with an accuracy of 80.5%
    * Random Forest performed the worst with an accuracy of 77.5%
    * KNN performed in the middle with an accuracy of 79.5%


# Model Evaluation
    * All models performed better than the baseline
    * Models were evaluated on accuracy
    * Tuned hyperparameters for each model to maximize accuracy while keeping the gap between train and validate accuracy small
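
One way to express the selection rule in the last bullet is sketched below: given train and validate accuracy for each hyperparameter setting, keep the settings whose gap is small, then take the one with the highest validate accuracy. The helper `pick_setting`, the 0.05 gap threshold, and the numbers in the table are hypothetical, not results from this project.

```python
import pandas as pd


def pick_setting(results, max_gap=0.05):
    """Hypothetical rule: best validate accuracy among low-gap settings."""
    # Gap between train and validate accuracy flags overfitting.
    results = results.assign(
        gap=(results["train_acc"] - results["validate_acc"]).abs()
    )
    candidates = results[results["gap"] <= max_gap]
    # Fall back to all settings if none meet the gap threshold.
    if candidates.empty:
        candidates = results
    ranked = candidates.sort_values(
        ["validate_acc", "gap"], ascending=[False, True]
    )
    return ranked.iloc[0]


# Made-up sweep results for illustration only.
results = pd.DataFrame({
    "max_depth": [2, 4, 8, 16],
    "train_acc": [0.78, 0.81, 0.90, 0.99],
    "validate_acc": [0.77, 0.80, 0.79, 0.76],
})
best = pick_setting(results)
```

In this toy table the deeper trees score higher on train but not on validate, so the rule settles on a shallower setting.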

