# Telco Classification Project

![](telco_churn_pic.png)

In [2]:
import os
import warnings

import graphviz
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from graphviz import Graph
from pydataset import data
from scipy import stats
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier, export_graphviz

import env
from acquire_telco import get_telco_data
from data_dictionary import data_dict
from helper import splitting_target_var
from prepare_telco import clean_telco, remove_unwanted_values, train_validate_test_split, num_distributions

warnings.filterwarnings("ignore")

## Planning

- Make a README.md that holds all of the project details, including a data dictionary, key findings, initial hypotheses, and an explanation of how my process can be replicated.
- Create an MVP first, then work through the iterative process of improving it.
- Define at least 2 clear sets of null and alternative hypotheses and set an alpha value.
- Create two .py scripts, one for acquire and one for prepare, to automate the collection and cleaning of the data.
- Create a helper.py for any other functions I need implemented throughout the pipeline.
- Properly annotate my code as I run through the process so that it is easily understood, and document any decisions made when cleaning, creating new columns, or removing rows of data.

### Data Science Pipeline
##### Acquire
- Create an acquire script (mine is named acquire_telco.py).
- Use acquire_telco.py to grab the data from the CodeUp SQL database and cache it to a csv for ease of access.
- Load the csv into a pandas DataFrame in Python.
- Summarize the initial data and plot the distributions of individual variables.
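
The caching step above could be sketched roughly as follows. This is not the code in acquire_telco.py: `get_cached_data` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for the real SQL query with two made-up rows.

```python
import os

import pandas as pd


def get_cached_data(cache_path, fetch):
    """Return the cached csv if it exists; otherwise call fetch() and cache the result."""
    if os.path.exists(cache_path):
        return pd.read_csv(cache_path)
    df = fetch()
    df.to_csv(cache_path, index=False)  # write once so later runs skip the database
    return df


def fake_fetch():
    """Hypothetical stand-in for the SQL query (made-up rows, not telco data)."""
    return pd.DataFrame({"customer_id": ["0001-A", "0002-B"], "tenure": [1, 24]})
```

Calling `get_cached_data("telco.csv", fake_fetch)` twice only runs the fetch on the first call; the second call reads the csv.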
##### Prepare
- Create a prepare script (mine is named prepare_telco.py).
- Clean the data, handling missing values and encoding values as necessary so that the models receive numeric inputs.
- There were 11 customers with no tenure yet, and I decided to remove them. These customers have not paid their first bill, so there is no data on whether they are satisfied with the product.
- Add new columns that might be useful in modeling; this may need insight from the explore stage into which columns, once combined, drive churn.
- I added two new columns: auto_pay (whether the payment type is automatic) and add_ons (the sum of the six additional services).
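
A minimal sketch of the prepare decisions above, on made-up rows rather than the real data. It assumes the six additional services are the six internet add-on columns from the data dictionary and that automatic payment types contain the word "automatic"; the actual logic in prepare_telco.py may differ.

```python
import pandas as pd

# Assumption: these are the six additional services that add_ons sums.
ADD_ON_COLS = [
    "online_security", "online_backup", "device_protection",
    "tech_support", "streaming_tv", "streaming_movies",
]

# Made-up rows for illustration only.
raw = pd.DataFrame({
    "tenure": [0, 5, 40, 12],
    "payment_type": ["Mailed check", "Electronic check",
                     "Bank transfer (automatic)", "Credit card (automatic)"],
    "online_security": ["No", "Yes", "Yes", "No internet service"],
    "online_backup": ["No", "No", "Yes", "No internet service"],
    "device_protection": ["No", "No", "Yes", "No internet service"],
    "tech_support": ["No", "No", "Yes", "No internet service"],
    "streaming_tv": ["No", "Yes", "Yes", "No internet service"],
    "streaming_movies": ["No", "No", "Yes", "No internet service"],
})

# Drop customers who have no tenure yet.
prepared = raw[raw["tenure"] > 0].copy()

# auto_pay: 1 if the payment type is automatic, else 0.
prepared["auto_pay"] = (
    prepared["payment_type"].str.contains("automatic", case=False).astype(int)
)

# add_ons: count of the additional services the customer has.
prepared["add_ons"] = (prepared[ADD_ON_COLS] == "Yes").sum(axis=1)
```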
##### Explore
- Answer the initial hypotheses posed in my planning phase by testing them with statistical tests, either rejecting or failing to reject the null hypothesis.
- Continue using statistical testing and visualizations to discover variable relationships in the data, and attempt to understand "how the data works".
- Summarize my conclusions, giving clear answers to the questions I posed in the planning stage, and note any takeaways that might be useful.
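
As one possible shape for such a test, here is a chi-square test of independence between contract type and churn. The counts are made up, the alpha of 0.05 is an assumed value, and the hypothesis itself is only an example, not one of the project's stated hypotheses.

```python
import pandas as pd
from scipy import stats

alpha = 0.05  # assumed significance level

# H0: churn is independent of contract type.
# Ha: churn depends on contract type.
# Made-up observations for illustration only.
sample = pd.DataFrame({
    "contract_type": ["Month-to-month"] * 40 + ["Two year"] * 40,
    "has_churned": [1] * 25 + [0] * 15 + [1] * 5 + [0] * 35,
})

observed = pd.crosstab(sample["contract_type"], sample["has_churned"])
chi2, p, dof, expected = stats.chi2_contingency(observed)

reject_null = p < alpha
print(f"chi2={chi2:.2f}, p={p:.4f}, dof={dof}, reject H0: {reject_null}")
```

When `p` is not below `alpha`, the conclusion is that the test fails to reject the null hypothesis, not that the null is accepted.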
##### Modeling and Evaluation
- Train and evaluate multiple models, comparing them on different evaluation metrics.
- Validate the models and choose the one that performs best in the validation phase.
- Test the best model, summarize its performance, and document the results using a confusion matrix, predict methods, and classification reports.
- Save the test predictions to a .csv file.
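
The modeling steps above might look roughly like this on synthetic data. The split proportions, the decision tree settings, and the churn rule used to generate the target are all assumptions for illustration; the project's own split lives in prepare_telco.py and may differ.

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic features and target, not telco data.
rng = np.random.default_rng(123)
n = 500
X = pd.DataFrame({
    "tenure": rng.integers(1, 73, n),
    "monthly_charges": rng.uniform(20, 120, n),
})
y = ((X["tenure"] < 12) & (X["monthly_charges"] > 60)).astype(int)

# Hold out 20% for test, then split the remainder 70/30 into train and validate.
X_tv, X_test, y_tv, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123, stratify=y
)
X_train, X_val, y_train, y_val = train_test_split(
    X_tv, y_tv, test_size=0.3, random_state=123, stratify=y_tv
)

# Baseline (always predicts the most frequent class) versus one candidate model.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
tree = DecisionTreeClassifier(max_depth=3, random_state=123).fit(X_train, y_train)

val_scores = {
    "baseline": baseline.score(X_val, y_val),
    "tree": tree.score(X_val, y_val),
}

# Evaluate the chosen model once on the test set.
test_pred = tree.predict(X_test)
print(confusion_matrix(y_test, test_pred))
print(classification_report(y_test, test_pred))

# Collect the test predictions; to_csv would write them to disk.
predictions = pd.DataFrame({"actual": y_test, "predicted": test_pred})
# predictions.to_csv("predictions.csv", index=False)
```

Only the model picked on the validation set touches the test set, which keeps the test score an honest estimate of performance on unseen data.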
##### Delivery
- Deliver my refined Jupyter notebook to the CodeUp data science team.
- Summarize my findings and build a narrative around the data, drawing on my knowledge of storytelling.
- Walk through the notebook, explaining the findings, documentation, and decisions that were made.
- End with key takeaways and recommendations.

### EXECUTIVE SUMMARY

- 
- 
- 

### Data Dictionary

In [2]:
# `df` has to exist before this cell runs; running it first raises a NameError.
# Assumption: get_telco_data() takes no arguments and returns a DataFrame that
# contains every column listed below (including has_churned). Adjust as needed.
df = get_telco_data()

features = [
    'customer_id',
    'gender',
    'senior_citizen',
    'partner',
    'dependents',
    'tenure',
    'phone_service',
    'multiple_lines',
    'internet_service_type_id',
    'online_security',
    'online_backup',
    'device_protection',
    'tech_support',
    'streaming_tv',
    'streaming_movies',
    'contract_type_id',
    'paperless_billing',
    'payment_type_id',
    'monthly_charges',
    'total_charges',
    'churn',
    'contract_type',
    'internet_service_type',
    'payment_type',
    'has_churned'
]

dictionary = {
    'Feature': features,
    # Look up each column's dtype in the same order as the feature list
    'Datatype': [df.dtypes[col] for col in features],
    'Definition': [
        'Identification number for customer',
        'Customer gender, male or female',
        'Yes or no, is the customer a senior citizen',
        'Yes or no, does the customer have a partner',
        'Yes or no, does the customer have dependents',
        'Number of months a customer has been with the company',
        'Yes or no, does the customer have phone service',
        'Yes or no, does the customer have multiple lines',
        '1 for DSL, 2 for Fiber Optic, 3 for None',
        'Yes, no, or no internet service',
        'Yes, no, or no internet service',
        'Yes, no, or no internet service',
        'Yes, no, or no internet service',
        'Yes, no, or no internet service',
        'Yes, no, or no internet service',
        '1 for month-to-month, 2 for year, and 3 for two-year contract',
        'Yes or no, whether or not the customer uses paperless billing',
        '1 for electronic check, 2 for mailed check, 3 for automatic bank transfer, 4 for automatic credit card payment',
        'Monthly charges the customer pays',
        'Total charges the customer has paid',
        'Yes or no, whether or not the customer has churned',
        'Month-to-month, year, or two-year contract',
        'DSL, Fiber Optic, or None',
        'Electronic check, mailed check, automatic bank transfer, or automatic credit card payment',
        '0 for has not churned, 1 for has churned'
    ]
}
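
As a rough sketch, a data dictionary of this shape can be built from a single mapping of column name to definition, so the features, datatypes, and definitions cannot drift out of order. The small `sample_df` is made-up stand-in data, and `definitions` and `data_dictionary` are hypothetical names, not the contents of data_dictionary.py.

```python
import pandas as pd

# Made-up stand-in rows; the notebook would use the telco DataFrame instead.
sample_df = pd.DataFrame({
    "monthly_charges": [29.85, 56.95],
    "total_charges": [29.85, 1889.50],
    "has_churned": [0, 1],
})

# One entry per column keeps each definition next to its feature name.
definitions = {
    "monthly_charges": "Monthly charges the customer pays",
    "total_charges": "Total charges the customer has paid",
    "has_churned": "0 for has not churned, 1 for has churned",
}

data_dictionary = pd.DataFrame({
    "Feature": list(definitions),
    "Datatype": [str(sample_df[col].dtype) for col in definitions],
    "Definition": list(definitions.values()),
})
print(data_dictionary)
```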