# Predicting Customer Churn

by Jared Godar 2021-11-29

---

## Project Goal

This project uses historical customer data to build a classification model that predicts which customers are likely to churn, and to identify the drivers of churn so that we can devise better customer retention strategies.

## Project Description

Acquiring new customers is expensive. Between marketing costs, initial offers, and promotions, bringing a customer to Telco, Inc. requires a significant investment. We are losing too many customers to our competitors and having to find new ones to replace them. Here we will analyze the attributes of customers who leave and who stay, develop a model that predicts churn from those attributes, make recommendations to increase customer retention, and provide a list of current customers with their predicted likelihood of churning.

## Initial Questions

Are there any demographic categories (age, sex, etc.) more or less likely to churn?

Does the type of service or services the customer subscribes to influence churn?

How do customers' charges influence churn?



---

## Data Dictionary

---

## Import Libraries

Here we import the various functions, modules, and libraries necessary.



```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import math

# Splitting and imputing functions
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer

# Suppress warning output (the pink boxes) for the demo
import warnings
warnings.filterwarnings("ignore")

# Our own acquisition module
import telcoacquire

# Show every column when displaying DataFrames
pd.set_option('display.max_columns', None)
```

---

## Wrangle Telco Data

To acquire the Telco data, I used the `telco_churn` database on the Codeup server, selecting all columns from the `customers` table and joining the `contract_types`, `internet_service_types`, and `payment_types` tables with the following query:

    select * from customers
                join contract_types using (contract_type_id)
                join internet_service_types using (internet_service_type_id)
                join payment_types using (payment_type_id)

```python
# Acquire data from the telco_churn SQL database
telco_df = telcoacquire.new_telco_data()
```
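`telcoacquire.new_telco_data()` is not shown in this notebook, but the join it relies on can be tried out locally. The sketch below runs the same query against a throwaway in-memory SQLite database standing in for the Codeup server; the tables are hypothetical, cut down to a few columns, and filled with made-up rows. Because the joins use `USING`, each shared `_id` column appears only once in the result.

```python
import sqlite3

import pandas as pd

# Hypothetical, cut-down stand-ins for the telco_churn tables (rows are made up)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contract_types (contract_type_id INTEGER, contract_type TEXT);
    CREATE TABLE internet_service_types (internet_service_type_id INTEGER, internet_service_type TEXT);
    CREATE TABLE payment_types (payment_type_id INTEGER, payment_type TEXT);
    CREATE TABLE customers (
        customer_id TEXT, contract_type_id INTEGER,
        internet_service_type_id INTEGER, payment_type_id INTEGER,
        monthly_charges REAL, churn TEXT
    );
    INSERT INTO contract_types VALUES (1, 'Month-to-month'), (2, 'One year'), (3, 'Two year');
    INSERT INTO internet_service_types VALUES (1, 'DSL'), (2, 'Fiber optic'), (3, 'None');
    INSERT INTO payment_types VALUES (1, 'Electronic check'), (2, 'Mailed check');
    INSERT INTO customers VALUES
        ('0001-A', 1, 2, 1, 70.70, 'Yes'),
        ('0002-B', 2, 1, 2, 56.95, 'No');
""")

# The same query used against the Codeup server
query = """
    select * from customers
        join contract_types using (contract_type_id)
        join internet_service_types using (internet_service_type_id)
        join payment_types using (payment_type_id)
"""
telco_toy = pd.read_sql(query, conn)
conn.close()

print(telco_toy[["customer_id", "contract_type", "internet_service_type", "payment_type"]])
```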

---

## Data Cleaning

To clean the data, I did the following:

1. Dropped Columns
    - *Duplicates:* `payment_type_id` duplicates `payment_type`, `internet_service_type_id` duplicates `internet_service_type`, and `contract_type_id` duplicates `contract_type`, so we only need one column of each pair. The `_id` columns were dropped.
    - *Unhelpful:* `customer_id` provides no predictive value for our model, so it was dropped as well.
2. Dropped rows
    - 11 new customers had null values for `total_charges`.
    - Since these customers have not yet had an opportunity to churn, they offer no predictive value and were dropped from the dataset.
3. Changed the `total_charges` column data type from `object` to `float`.
4. Created dummy variables for `object` columns
    - Columns with two options were assigned a new encoded column with the values `0` or `1` (`gender`, `partner`, `dependents`, `phone_service`, `paperless_billing`, `churn`).
    - Columns with more than two options were one-hot encoded (`multiple_lines`, `online_security`, `online_backup`, `device_protection`, `tech_support`, `streaming_tv`, `streaming_movies`, `contract_type`, `internet_service_type`, `payment_type`).
5. Encoding created some redundant columns (`no_phone`, `no_internet`). Since this information is captured elsewhere, these columns were also dropped.

```python
import telco_prepare

# Apply the cleaning steps listed above
telco_df = telco_prepare.prep_telco(telco_df)
```
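`telco_prepare.prep_telco` is not shown here, so the function below, `prep_telco_sketch`, is a hypothetical re-creation of the cleaning steps listed above, run on a made-up four-row DataFrame. Several details are assumptions: missing charges are assumed to arrive as blank strings (turned into nulls with `pd.to_numeric(..., errors="coerce")`), the two-option columns are assumed to hold `Yes`/`No` or `Male`/`Female`, one-hot encoding is done with `pd.get_dummies`, and the redundant columns are identified by their `No phone service` / `No internet service` labels rather than the `no_phone` / `no_internet` names used in the notes.

```python
import pandas as pd

# Column groups taken from the cleaning notes above
BINARY_COLS = ["gender", "partner", "dependents", "phone_service",
               "paperless_billing", "churn"]
MULTI_COLS = ["multiple_lines", "online_security", "online_backup",
              "device_protection", "tech_support", "streaming_tv",
              "streaming_movies", "contract_type", "internet_service_type",
              "payment_type"]
# Assumed 0/1 mapping for the two-option columns
BINARY_MAP = {"Yes": 1, "No": 0, "Male": 1, "Female": 0}


def prep_telco_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical re-creation of the cleaning steps (not telco_prepare.prep_telco)."""
    df = df.copy()

    # 1. Drop the *_type_id duplicates and the non-predictive customer_id
    drop_cols = [c for c in df.columns if c.endswith("_type_id") or c == "customer_id"]
    df = df.drop(columns=drop_cols)

    # 2-3. Blank total_charges become NaN; keep only rows with a total (now float)
    df["total_charges"] = pd.to_numeric(df["total_charges"], errors="coerce")
    df = df.loc[df["total_charges"].notna()].copy()

    # 4a. Two-option columns get an extra 0/1 encoded column
    for col in BINARY_COLS:
        if col in df.columns:
            df[f"{col}_encoded"] = df[col].map(BINARY_MAP)

    # 4b. Multi-option columns are one-hot encoded as integer indicators
    multi = [c for c in MULTI_COLS if c in df.columns]
    dummies = pd.get_dummies(df[multi], dtype=int)

    # 5. Drop indicators that duplicate phone_service / internet_service_type
    redundant = [c for c in dummies.columns
                 if c.endswith("No phone service") or c.endswith("No internet service")]
    dummies = dummies.drop(columns=redundant)

    return pd.concat([df, dummies], axis=1)


# Made-up rows; customer 0003-C is a new customer with no total yet
raw = pd.DataFrame({
    "customer_id": ["0001-A", "0002-B", "0003-C", "0004-D"],
    "contract_type_id": [1, 2, 1, 3],
    "contract_type": ["Month-to-month", "One year", "Month-to-month", "Two year"],
    "gender": ["Female", "Male", "Male", "Female"],
    "phone_service": ["Yes", "No", "Yes", "Yes"],
    "multiple_lines": ["Yes", "No phone service", "No", "No"],
    "total_charges": ["29.85", "1889.5", " ", "108.15"],
    "churn": ["No", "No", "No", "Yes"],
})
clean = prep_telco_sketch(raw)
print(clean.columns.tolist())
```

Keeping the original text columns next to the encoded ones is also an assumption; it leaves readable labels available for exploration while the numeric columns feed the models.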

---

## Split the data

Data will be split into training, validation, and test sets:

- *Train:* used for exploratory analysis and to construct models
- *Validate:* used to verify that models are not overfit to the training data
- *Test:* used only once, with the best model, to approximate how the model will perform on new data

```python
# Split the cleaned data into train, validate, and test sets
telco_train, telco_validate, telco_test = telco_prepare.split_telco(telco_df)
```
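The split itself happens inside `telco_prepare.split_telco`, which is not shown. One common way to get three sets from scikit-learn's `train_test_split` is to call it twice, holding out the test set first so it is never touched during exploration. In the sketch below, the function name `split_telco_sketch`, the 20% test / 30% validate proportions, the `random_state`, and stratifying on `churn` (so each set keeps roughly the same churn rate) are all assumptions, not necessarily what `split_telco` does.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def split_telco_sketch(df, target="churn", test_size=0.2, validate_size=0.3, seed=123):
    """Hypothetical three-way split; proportions and seed are assumptions."""
    # Hold out the test set first, stratified so its churn rate matches the full data
    train_validate, test = train_test_split(
        df, test_size=test_size, random_state=seed, stratify=df[target]
    )
    # Carve the validation set out of the remaining rows
    train, validate = train_test_split(
        train_validate, test_size=validate_size, random_state=seed,
        stratify=train_validate[target],
    )
    return train, validate, test


# Made-up data: 100 customers, 30 of whom churned
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "monthly_charges": rng.uniform(20, 120, size=100).round(2),
    "churn": ["Yes"] * 30 + ["No"] * 70,
})
train, validate, test = split_telco_sketch(toy)
print(len(train), len(validate), len(test))
```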