# Telco Project
***
### Project Description
Accurately predict customer churn using machine learning classification algorithms.

__Tasks__
- [ ] Introduce __"Telco"__
- [ ] Business model
- [ ] Business problem

### Table of Contents:
1. [Planning](#Planning)
2. [Acquisition](#Acquisition)
3. [Preparation and Processing](#Preparation-and-Processing)
4. [Exploration](#Exploration)
5. [Modeling](#Modeling)
6. [Delivery](#Delivery)

## Planning
---
 - [ ] Goal(s)
     - [ ] Find drivers of customer churn
     - [ ] Accurately predict customer churn at Telco.
 - [ ] Measure(s) of success
     - [ ] Hypothesis testing
     - [ ] Baseline accuracy
     - [ ] 3 classification models
         - [ ] Model performance: train, validate, test
         - [ ] Hyperparameter tuning
 - [ ] Plan to achieve goals 1 & 2
 - [ ] Develop hypotheses
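
The "Baseline accuracy" measure above can be illustrated with a minimal sketch. It assumes the baseline is a model that always predicts the most frequent class; the labels below are toy values, not the real Telco data.

```python
import pandas as pd

# Toy labels for illustration only; not the real Telco data.
churn = pd.Series(["No", "No", "No", "Yes", "No", "Yes", "No", "No"], name="churn")

# The baseline model always predicts the most frequent class.
baseline_class = churn.mode()[0]
baseline_accuracy = (churn == baseline_class).mean()

print(f"Baseline prediction: {baseline_class}")
print(f"Baseline accuracy: {baseline_accuracy:.2%}")
```

Any classification model worth keeping should beat this number on the validate set.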

## Acquisition
---
- [ ] Instructions to acquire data
- [x] Upload `.csv` file to repository - file named `telecom_data.csv`
- [ ] `acquire.py` file
    1. [x] Write functions to acquire telco dataset
    2. [ ] Write docstring for each function
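
The notebook imports `get_telecom_data` from `acquire.py`, but the function body is not shown here. The following is only a guess at what a documented version might look like, assuming it reads `telecom_data.csv` with pandas; the `path` parameter is hypothetical.

```python
import pandas as pd


def get_telecom_data(path="telecom_data.csv"):
    """Read the Telco customer data from a CSV file.

    Parameters
    ----------
    path : str
        Location of the CSV file (assumed default: ``telecom_data.csv``).

    Returns
    -------
    pandas.DataFrame
        One row per customer.
    """
    # Assumption: the data lives in a plain CSV file readable by pandas.
    return pd.read_csv(path)
```

If the CSV was written with its index, passing `index_col=0` to `pd.read_csv` would prevent an extra `Unnamed: 0` column from appearing.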

In [10]:
import warnings
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from acquire import get_telecom_data
from prepare import telcom_data_prep
from sklearn.model_selection import train_test_split
warnings.filterwarnings('ignore')

In [11]:
df_telco = get_telecom_data()

In [12]:
df_telco.head()

| Unnamed: 0 | payment_type_id | contract_type_id | internet_service_type_id | customer_id | gender | senior_citizen | partner | dependents | tenure | phone_service | ... | tech_support | streaming_tv | streaming_movies | paperless_billing | monthly_charges | total_charges | churn | internet_service_type | contract_type | payment_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 1 | 1 | 0003-MKNFE | Male | 0 | No | No | 9 | Yes | ... | No | No | Yes | No | 59.9 | 542.4 | No | DSL | Month-to-month | Mailed check |
| 1 | 4 | 1 | 1 | 0013-MHZWF | Female | 0 | No | Yes | 9 | Yes | ... | Yes | Yes | Yes | Yes | 69.4 | 571.45 | No | DSL | Month-to-month | Credit card (automatic) |
| 2 | 1 | 1 | 1 | 0015-UOCOJ | Female | 1 | No | No | 7 | Yes | ... | No | No | No | Yes | 48.2 | 340.35 | No | DSL | Month-to-month | Electronic check |
| 3 | 1 | 1 | 1 | 0023-HGHWL | Male | 1 | No | No | 1 | No | ... | No | No | No | Yes | 25.1 | 25.1 | Yes | DSL | Month-to-month | Electronic check |
| 4 | 3 | 1 | 1 | 0032-PGELS | Female | 0 | Yes | Yes | 1 | No | ... | No | No | No | No | 30.5 | 30.5 | Yes | DSL | Month-to-month | Bank transfer (automatic) |


In [None]:
df_telco.info()

In [None]:
df_telco.nunique()

In [None]:
# Keep columns with at most 4 unique values (treated as categorical)
n_unique = df_telco.nunique()
object_columns = n_unique[n_unique <= 4]

In [None]:
object_columns = object_columns.index.to_list()

In [None]:
for column in object_columns:
    print(df_telco[column].value_counts().sort_index())
    print('')

In [None]:
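# sns.distplot is deprecated; histplot with a KDE overlay is the closest replacement
sns.histplot(df_telco.tenure, kde=True, stat='density');

In [None]:
sns.histplot(df_telco.monthly_charges, kde=True, stat='density');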

## Preparation and Processing
---
- [ ] Document process
- [ ] Create `prepare.py` file
    - [ ] Create functions to clean data
    - [ ] Store functions in a separate file, `prepare.py`
- [ ] Clean data
    - [ ] __tidy-data__ mindset
    - [ ] Change datatypes
        - [ ] Create encoded variables from categorical variables
    - [ ] Round numeric floating-point values
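
The cleaning steps listed above (change datatypes, encode categorical variables, round floats) might look roughly like the sketch below. `prep_telco_sketch` is a hypothetical helper, not the project's `telcom_data_prep`, and the toy frame only borrows column names from the dataset; the blank `total_charges` value is included purely to show coercion.

```python
import pandas as pd


def prep_telco_sketch(df):
    """Hypothetical cleaning step; not the project's telcom_data_prep."""
    df = df.copy()
    # Change datatypes: coerce total_charges to numeric (unparseable values become NaN).
    df["total_charges"] = pd.to_numeric(df["total_charges"], errors="coerce")
    # Encode a Yes/No categorical as 1/0.
    df["churn"] = df["churn"].map({"Yes": 1, "No": 0})
    # One-hot encode a multi-level categorical.
    df = pd.get_dummies(df, columns=["contract_type"], dtype=int)
    # Round floating-point columns to two decimals.
    float_cols = df.select_dtypes("float").columns
    df[float_cols] = df[float_cols].round(2)
    return df


# Toy rows for illustration only.
toy = pd.DataFrame({
    "monthly_charges": [59.9, 69.4, 25.1],
    "total_charges": ["542.4", " ", "25.1"],
    "churn": ["No", "No", "Yes"],
    "contract_type": ["Month-to-month", "One year", "Month-to-month"],
})
clean = prep_telco_sketch(toy)
print(clean.dtypes)
```

Working on a copy keeps the raw frame intact, so the acquisition step never has to be rerun after a cleaning mistake.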
    



In [None]:
from sklearn.model_selection import train_test_split

from prepare import telcom_data_prep

In [None]:
df = telcom_data_prep()

In [None]:
df.head()

In [None]:
X = df.drop(columns='churn')
y = df[['churn']]

In [None]:
# Hold out 30% of the rows as the test set, stratified on churn
X_train_validate, X_test, y_train_validate, y_test = train_test_split(X,
                                                                      y,
                                                                      test_size=.3,
                                                                      stratify=y.churn)

In [None]:
# Split the remaining 70% into train (about 49% of all rows) and validate (about 21%)
X_train, X_validate, y_train, y_validate = train_test_split(X_train_validate,
                                                            y_train_validate,
                                                            test_size=.3,
                                                            stratify=y_train_validate.churn)

In [None]:
print("Training Set", X_train.shape, y_train.shape)
print("Validation Set", X_validate.shape, y_validate.shape)
print("Test Set", X_test.shape, y_test.shape)
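
The two-step split above could be wrapped in a reusable function. This is a sketch under assumptions: the `split_sketch` name and the `seed` parameter are hypothetical, and fixing `random_state` is a suggestion for reproducibility rather than something the notebook does. The data is a toy frame, not the Telco set.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_sketch(X, y, seed=123):
    """Two-step stratified split into train, validate, and test sets."""
    # First split: hold out 30% as the test set.
    X_tv, X_test, y_tv, y_test = train_test_split(
        X, y, test_size=.3, stratify=y, random_state=seed)
    # Second split: 30% of the remainder becomes the validate set.
    X_train, X_val, y_train, y_val = train_test_split(
        X_tv, y_tv, test_size=.3, stratify=y_tv, random_state=seed)
    return X_train, X_val, X_test, y_train, y_val, y_test


# Toy data: 100 rows with a 70/30 class balance (illustrative only).
X_toy = pd.DataFrame({"tenure": range(100)})
y_toy = pd.Series([0] * 70 + [1] * 30, name="churn")

parts = split_sketch(X_toy, y_toy)
sizes = [len(p) for p in parts[:3]]  # train, validate, test
print(sizes)
```

With `test_size=.3` applied twice, the three sets hold roughly 49%, 21%, and 30% of the rows.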

## Exploration
---
- [ ] Statistical Analysis
    - [ ] Restate hypothesis here
    - [ ] Test hypotheses
    - [ ] Plot distributions
- [ ] Create visuals
- [ ] Present and summarize key findings
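
As one way to carry out the "Test hypotheses" step, a chi-square test of independence can check whether churn is related to a categorical feature. The hypothesis about `contract_type`, the significance level, and the toy data below are all assumptions for illustration; the project's actual hypotheses are not stated yet.

```python
import pandas as pd
from scipy import stats

# Toy data for illustration only; not the real Telco data.
toy = pd.DataFrame({
    "contract_type": ["Month-to-month"] * 6 + ["Two year"] * 6,
    "churn": ["Yes", "Yes", "Yes", "Yes", "No", "No",
              "No", "No", "No", "No", "No", "Yes"],
})

# H0: churn is independent of contract type.
observed = pd.crosstab(toy["contract_type"], toy["churn"])
chi2, p_value, dof, expected = stats.chi2_contingency(observed)

alpha = 0.05  # assumed significance level
print(observed)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}, dof = {dof}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```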

Hypotheses


## Modeling
---
- [ ] scikit-learn classification models
    - [ ] Create 3 classification models
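
A minimal sketch of fitting three classifiers and comparing train and validate accuracy is shown below. The choice of models, their hyperparameters, and the synthetic data are assumptions for illustration; the notebook does not yet say which models the project will use.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared Telco features (illustrative only).
X, y = make_classification(n_samples=500, n_features=8, random_state=123)
X_train, X_validate, y_train, y_validate = train_test_split(
    X, y, test_size=.3, stratify=y, random_state=123)

# Three candidate classifiers; the choice and hyperparameters are assumptions.
models = {
    "decision_tree": DecisionTreeClassifier(max_depth=3, random_state=123),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(n_neighbors=5),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # score() returns mean accuracy for classifiers.
    scores[name] = {
        "train": model.score(X_train, y_train),
        "validate": model.score(X_validate, y_validate),
    }
    print(f"{name}: train={scores[name]['train']:.3f}, "
          f"validate={scores[name]['validate']:.3f}")
```

A large gap between train and validate accuracy suggests overfitting; the test set should be scored only once, on the model selected from the validate results.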

## Delivery
---

- [ ] Summarize/Recap key findings
    - [ ] Drivers

[Return to the top](#Telco-Project)