# Telco Churn

## Table of Contents

* [Package Imports](#packages_import)
* [Acquire](#data_import)
* [Prepare](#prepare)
* [Explore Process](#explore1)
    * [Question 1](#q_1)
        * [Q1 Visualization](#q_1_viz)
        * [Q1 Statistical Test](#q_1_stats)
        * [Q1 Answer](#q_1_ans)
    * [Question 2](#q_2)
        * [Q2 Visualization](#q_2_viz)
        * [Q2 Statistical Test](#q_2_stats)
        * [Q2 Answer](#q_2_ans)    
    * [Question 3](#q_3)
        * [Q3 Visualization](#q_3_viz)
        * [Q3 Statistical Test](#q_3_stats)
        * [Q3 Answer](#q_3_ans)    
    * [Question 4](#q_4)
        * [Q4 Visualization](#q_4_viz)
        * [Q4 Statistical Test](#q_4_stats)
        * [Q4 Answer](#q_4_ans)
* [Explore Summary](#explore2)
* [Modeling](#modeling)
    * [Introduction](#m_intro)
    * [Baseline](#baseline)
    * [Model 1](#mod_1)
    * [Model 2](#mod_2)
    * [Model 3](#mod_3)
* [Conclusion](#conclusion)
    * [Summary](#c_summery)
    * [Recommendations](#c_recs)
    * [Next Steps](#c_steps)
* [ReadMe](#readme)

## Package Imports <a class="anchor" id="packages_import"></a>


In [1]:
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns
from acquire import get_telco_data
from prepare import prep_telco, train_validate

pd.options.display.max_columns = None

## Acquire <a class="anchor" id="data_import"></a>

In [2]:
master_df = get_telco_data()

In [7]:
master_df.head()

| Unnamed: 0 | churn_month | paperless_billing | senior_citizen | partner | dependents | monthly_charges | total_charges | signup_date | churn |
|---|---|---|---|---|---|---|---|---|---|
| 0 | | 1 | 0 | 1 | 1 | 65.6 | 593.3 | 2021-04-21 18:07:34 | 1 |
| 1 | | 0 | 0 | 0 | 0 | 59.9 | 542.4 | 2021-04-21 18:07:34 | 1 |
| 2 | 2022-01-31 | 1 | 0 | 0 | 0 | 73.9 | 280.85 | 2021-09-21 18:07:34 | 0 |
| 3 | 2022-01-31 | 1 | 1 | 1 | 0 | 98.0 | 1237.85 | 2020-12-21 18:07:34 | 0 |
| 4 | 2022-01-31 | 1 | 1 | 1 | 0 | 83.9 | 267.4 | 2021-10-21 18:07:34 | 0 |


## Prepare <a class="anchor" id="prepare"></a>

In [4]:
working_df = prep_telco(master_df)

In [6]:
working_df.head()

| Unnamed: 0 | churn_month | paperless_billing | senior_citizen | partner | dependents | monthly_charges | total_charges | signup_date | churn | gender_Male | streaming_movies_No internet service | streaming_movies_Yes | streaming_tv_No internet service | streaming_tv_Yes | tech_support_No internet service | tech_support_Yes | multiple_lines_No phone service | multiple_lines_Yes | online_backup_No internet service | online_backup_Yes | online_security_No internet service | online_security_Yes | device_protection_No internet service | device_protection_Yes | payment_type_Credit card (automatic) | payment_type_Electronic check | payment_type_Mailed check | internet_service_type_Fiber optic | internet_service_type_None | contract_type_One year | contract_type_Two year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | | 1 | 0 | 1 | 1 | 65.6 | 593.3 | 2021-04-21 18:07:34 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| 1 | | 0 | 0 | 0 | 0 | 59.9 | 542.4 | 2021-04-21 18:07:34 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 2 | 2022-01-31 | 1 | 0 | 0 | 0 | 73.9 | 280.85 | 2021-09-21 18:07:34 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 |
| 3 | 2022-01-31 | 1 | 1 | 1 | 0 | 98.0 | 1237.85 | 2020-12-21 18:07:34 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 |
| 4 | 2022-01-31 | 1 | 1 | 1 | 0 | 83.9 | 267.4 | 2021-10-21 18:07:34 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
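
The source of `prep_telco` is not shown in this notebook, but the encoded column names in the output above (`gender_Male`, `contract_type_One year`, `contract_type_Two year`, and so on) are consistent with one-hot encoding in which the first level of each categorical column is dropped. The following is a minimal sketch of that technique on toy data, not the project's actual implementation. The raw column values, including the `Female` and `Month-to-month` baseline levels, are assumptions made for illustration.

```python
import pandas as pd

# Toy stand-in for the raw data; all values here are assumed for illustration.
raw = pd.DataFrame({
    "gender": ["Female", "Male", "Male"],
    "contract_type": ["One year", "Month-to-month", "Two year"],
    "monthly_charges": [65.6, 59.9, 73.9],
})

# Hypothetical list of categorical columns to encode.
categorical_cols = ["gender", "contract_type"]

# drop_first=True drops the first (alphabetically sorted) level of each column,
# so "Female" and "Month-to-month" become the implicit baselines.
encoded = pd.get_dummies(raw, columns=categorical_cols, drop_first=True, dtype=int)

print(encoded.columns.tolist())
```

Dropping one level per column avoids perfectly collinear dummy columns, which matters for linear models such as logistic regression and is harmless for tree-based models.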


## Explore Process <a class="anchor" id="explore1"></a>

In [5]:
# Split, apply, combine.
# Split on a single column (pass a string, not a one-item list,
# so get_group accepts a plain label):
# new_group = df.groupby('column')
# Retrieve one group as a DataFrame:
# new_group.get_group('group')

# Useful because you don't need to build a filter every time:
# new_group['column'].value_counts().loc['any_group']
# To see proportions instead of counts:
# new_group['column'].value_counts(normalize=True).loc['any_group']

# Alternative: filter with a boolean mask.
# filt = df['column'] == 'group'
# df.loc[filt]
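
The notes above can be tried on a small frame. This is a minimal sketch on made-up data: the `contract_type` and `churn` column names mirror the Telco set, but the values and the Yes/No encoding of `churn` are assumptions for illustration.

```python
import pandas as pd

# Toy data; values are invented for illustration only.
df = pd.DataFrame({
    "contract_type": ["Month-to-month", "Month-to-month", "One year",
                      "Two year", "Month-to-month", "One year"],
    "churn": ["Yes", "No", "No", "No", "Yes", "Yes"],
})

# Split: one group per contract type.
by_contract = df.groupby("contract_type")

# Pull a single group out as its own DataFrame.
month_to_month = by_contract.get_group("Month-to-month")

# Apply and combine: churn counts per contract type.
# The result is indexed by (contract_type, churn), so .loc selects one group.
mtm_counts = by_contract["churn"].value_counts().loc["Month-to-month"]

# normalize=True returns proportions within each group instead of counts.
mtm_rates = by_contract["churn"].value_counts(normalize=True).loc["Month-to-month"]

# Boolean-mask alternative that selects the same rows as get_group.
filt = df["contract_type"] == "Month-to-month"
same_rows = df.loc[filt]

print(mtm_counts)
print(mtm_rates)
```

Grouping once and reusing the grouped object is convenient when comparing several categories; the boolean mask is simpler when only one subset is needed.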

### Question 1 <a class="anchor" id="q_1"></a>

#### Q1 Visualization <a class="anchor" id="q_1_viz"></a>

#### Q1 Statistical Test <a class="anchor" id="q_1_stats"></a>

#### Q1 Answer <a class="anchor" id="q_1_ans"></a>

### Question 2 <a class="anchor" id="q_2"></a>

#### Q2 Visualization <a class="anchor" id="q_2_viz"></a>

#### Q2 Statistical Test <a class="anchor" id="q_2_stats"></a>

#### Q2 Answer <a class="anchor" id="q_2_ans"></a>

### Question 3 <a class="anchor" id="q_3"></a>

#### Q3 Visualization <a class="anchor" id="q_3_viz"></a>

#### Q3 Statistical Test <a class="anchor" id="q_3_stats"></a>

#### Q3 Answer <a class="anchor" id="q_3_ans"></a>

### Question 4 <a class="anchor" id="q_4"></a>

#### Q4 Visualization <a class="anchor" id="q_4_viz"></a>

#### Q4 Statistical Test <a class="anchor" id="q_4_stats"></a>

#### Q4 Answer <a class="anchor" id="q_4_ans"></a>

## Explore Summary <a class="anchor" id="explore2"></a>

## Modeling <a class="anchor" id="modeling"></a>

### Introduction <a class="anchor" id="m_intro"></a>

### Baseline <a class="anchor" id="baseline"></a>

### Model 1 <a class="anchor" id="mod_1"></a>

### Model 2 <a class="anchor" id="mod_2"></a>

### Model 3 <a class="anchor" id="mod_3"></a>

## Conclusion <a class="anchor" id="conclusion"></a>

### Summary <a class="anchor" id="c_summery"></a>

### Recommendations <a class="anchor" id="c_recs"></a>

### Next Steps <a class="anchor" id="c_steps"></a>

## ReadMe <a class="anchor" id="readme"></a>

Your README should contain all of the following elements:

* **Title** Gives the name of your project
* **Project Description** Describes what your project is and why it is important 
* **Project Goal** Clearly states what your project sets out to do and how the information gained can be applied to the real world
* **Initial Hypotheses** Initial questions used to focus your project 
* **Project Plan** Guides the reader through the different stages of the pipeline as they relate to your project
* **Data Dictionary** Gives a definition for each of the features used in your report and the units they are measured in, if applicable
* **Steps to Reproduce** Gives instructions for reproducing your work, e.g. running your notebook on someone else's computer.