# Raw Classification Project notebook 

# Goals:
### - Discover drivers for customer churn at Telco.
### - Use drivers to develop a machine learning model to predict customer churn.
### - Use the model for recommendations on ways to reduce churn

# Questions to guide the goals:
## - Why do customers churn?
### - Is there a pattern to those who churn?
### - What is the most common tenure when customers churn?
### - Who does not churn?
### - Do customers with less than 1 month of tenure impact churn?
### - Is avg_monthly_charges *(total_charges / tenure)* related to churn?

In [1]:
## IMPORTS ##

import pandas as pd
import numpy as np
from pydataset import data

import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
import sklearn.preprocessing
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

from sklearn import preprocessing
from sklearn.impute import SimpleImputer

import scipy as sp
from env import user, password, host

import warnings
warnings.filterwarnings("ignore")

import wrangle as w
import os
directory = os.getcwd()
seed = 3333

## Acquire

#### * Add information about how, where and when you acquired your data
- The Telco database was pulled from the Codeup MySQL server during the Tobias cohort.
#### * How/where did you get your data?
- From the Telco database on the Codeup MySQL server, loaded through `w.new_telco_data()`.
#### * When did you get your data?
- During the Tobias cohort.
#### * What is the size of your data? (columns and rows)
- Initially 7043 rows and 24 columns.
#### * What does each observation represent?
- One Telco customer.
#### * What does each column represent?
- An attribute Telco recorded for that customer.

In [2]:
# Acquiring Telco data
# the main database has 7043 rows of Telco customers
# and 24 columns pertaining to Telco observations on each customer
telco = w.new_telco_data()
telco.shape

(7043, 24)

In [3]:
#check for inconsistencies, possible duplicate columns, wrong dtypes, and null/nan values.
telco.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 24 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   payment_type_id           7043 non-null   int64  
 1   internet_service_type_id  7043 non-null   int64  
 2   contract_type_id          7043 non-null   int64  
 3   customer_id               7043 non-null   object 
 4   gender                    7043 non-null   object 
 5   senior_citizen            7043 non-null   int64  
 6   partner                   7043 non-null   object 
 7   dependents                7043 non-null   object 
 8   tenure                    7043 non-null   int64  
 9   phone_service             7043 non-null   object 
 10  multiple_lines            7043 non-null   object 
 11  online_security           7043 non-null   object 
 12  online_backup             7043 non-null   object 
 13  device_protection         7043 non-null   object 
 14  tech_sup

# Prepare


- perform univariate stats
- clean up your data
- encode your data
- split your data

### Prepare Actions:

- Removed columns that did not contain useful information ('Unnamed: 0' and the three `*_type_id` columns that duplicate the type labels)
- Checked for nulls in the data (there were none, but 11 customers had a blank `total_charges`)
- Dropped the 11 customers with a blank `total_charges`
- Checked that column data types were appropriate and converted `total_charges` from object to float
- Set `customer_id` as the index
- Confirmed the target column 'churn' has only two values (Yes/No)
- Added an additional feature to investigate:
    - avg_tenure_charges (total_charges / tenure)
- Encoded categorical variables
- Split data into train, validate and test (approx. 60/20/20)
- Outliers have not been removed for this iteration of the project

In [4]:
# Cache the Telco data in a .csv file so later loads are faster
telco = w.get_telco_data()
telco.shape

(7043, 25)
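
The shape grows from 24 to 25 columns once the data comes from the cached file, and the extra column is 'Unnamed: 0'. That pattern usually appears when a DataFrame is written to CSV with its index and then read back. A minimal sketch of a caching helper that avoids it follows; `load_cached` and `fake_fetch` are hypothetical names for illustration and are not the contents of `wrangle.py`.

```python
import os
import tempfile

import pandas as pd


def load_cached(path, fetch):
    """Return the cached CSV at `path`, or call `fetch()` and cache its result.

    Hypothetical helper for illustration; the real wrangle functions may differ.
    """
    if os.path.exists(path):
        return pd.read_csv(path)
    df = fetch()
    # index=False keeps the RangeIndex out of the file, so reading it back
    # does not create an extra 'Unnamed: 0' column.
    df.to_csv(path, index=False)
    return df


def fake_fetch():
    """Tiny stand-in for the SQL query result."""
    return pd.DataFrame({"customer_id": ["a", "b"], "tenure": [9, 4]})


with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, "telco.csv")
    first = load_cached(cache, fake_fetch)   # runs fetch and writes the file
    second = load_cached(cache, fake_fetch)  # reads the file back

print(list(second.columns))
```

With this approach the later `drop(columns='Unnamed: 0')` step would not be needed.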

In [5]:
telco.head(3)

,Unnamed: 0,payment_type_id,internet_service_type_id,contract_type_id,customer_id,gender,senior_citizen,partner,dependents,tenure,...,tech_support,streaming_tv,streaming_movies,paperless_billing,monthly_charges,total_charges,churn,contract_type,internet_service_type,payment_type
0,0,2,1,2,0002-ORFBO,Female,0,Yes,Yes,9,...,Yes,Yes,No,Yes,65.6,593.3,No,One year,DSL,Mailed check
1,1,2,1,1,0003-MKNFE,Male,0,No,No,9,...,No,No,Yes,No,59.9,542.4,No,Month-to-month,DSL,Mailed check
2,2,1,2,1,0004-TLHLJ,Male,0,No,No,4,...,No,No,No,Yes,73.9,280.85,Yes,Month-to-month,Fiber optic,Electronic check


In [6]:
list(telco.columns)

['Unnamed: 0',
 'payment_type_id',
 'internet_service_type_id',
 'contract_type_id',
 'customer_id',
 'gender',
 'senior_citizen',
 'partner',
 'dependents',
 'tenure',
 'phone_service',
 'multiple_lines',
 'online_security',
 'online_backup',
 'device_protection',
 'tech_support',
 'streaming_tv',
 'streaming_movies',
 'paperless_billing',
 'monthly_charges',
 'total_charges',
 'churn',
 'contract_type',
 'internet_service_type',
 'payment_type']

In [7]:
# Check the dtypes of the columns to make sure each one is the right type.
# total_charges is wrongly stored as 'object', so it will have to change to a float.

telco.dtypes

Unnamed: 0                    int64
payment_type_id               int64
internet_service_type_id      int64
contract_type_id              int64
customer_id                  object
gender                       object
senior_citizen                int64
partner                      object
dependents                   object
tenure                        int64
phone_service                object
multiple_lines               object
online_security              object
online_backup                object
device_protection            object
tech_support                 object
streaming_tv                 object
streaming_movies             object
paperless_billing            object
monthly_charges             float64
total_charges                object
churn                        object
contract_type                object
internet_service_type        object
payment_type                 object
dtype: object

In [8]:
telco?

Type:        DataFrame
String form:
Unnamed: 0  payment_type_id  internet_service_type_id  contract_type_id  \
           0              0 <...>  DSL      Mailed check
           7042                   DSL  Electronic check
           
           [7043 rows x 25 columns]
Length:      7043
File:        /opt/homebrew/anaconda3/lib/python3.11/site-packages/pandas/core/frame.py
Docstring:  
Two-dimensional, size-mutable, potentially heterogeneous tabular data.

Data structure also contains labeled axes (rows and columns).
Arithmetic operations align on both row and column labels. Can be
thought of as a dict-like container for Series objects. The primary
pandas data structure.

Parameters
----------
data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame
    Dict can contain Series, arrays, constants, dataclass or list-like objects. If
    data is a dict, column order follows insertion-order. If a dict contains Series
 

In [9]:
telco.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 25 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Unnamed: 0                7043 non-null   int64  
 1   payment_type_id           7043 non-null   int64  
 2   internet_service_type_id  7043 non-null   int64  
 3   contract_type_id          7043 non-null   int64  
 4   customer_id               7043 non-null   object 
 5   gender                    7043 non-null   object 
 6   senior_citizen            7043 non-null   int64  
 7   partner                   7043 non-null   object 
 8   dependents                7043 non-null   object 
 9   tenure                    7043 non-null   int64  
 10  phone_service             7043 non-null   object 
 11  multiple_lines            7043 non-null   object 
 12  online_security           7043 non-null   object 
 13  online_backup             7043 non-null   object 
 14  device_p

In [10]:
telco.describe().T

,count,mean,std,min,25%,50%,75%,max
Unnamed: 0,7043.0,3521.0,2033.283305,0.0,1760.5,3521.0,5281.5,7042.0
payment_type_id,7043.0,2.315633,1.148907,1.0,1.0,2.0,3.0,4.0
internet_service_type_id,7043.0,1.872923,0.737796,1.0,1.0,2.0,2.0,3.0
contract_type_id,7043.0,1.690473,0.833755,1.0,1.0,1.0,2.0,3.0
senior_citizen,7043.0,0.162147,0.368612,0.0,0.0,0.0,0.0,1.0
tenure,7043.0,32.371149,24.559481,0.0,9.0,29.0,55.0,72.0
monthly_charges,7043.0,64.761692,30.090047,18.25,35.5,70.35,89.85,118.75


In [11]:
# Count all of the unique values for dtype='object'

for col in telco.columns.to_list():
    if telco[col].dtypes == 'object':
        print(f'{col} has-  {telco[col].nunique()}  -unique values.')

customer_id has-  7043  -unique values.
gender has-  2  -unique values.
partner has-  2  -unique values.
dependents has-  2  -unique values.
phone_service has-  2  -unique values.
multiple_lines has-  3  -unique values.
online_security has-  3  -unique values.
online_backup has-  3  -unique values.
device_protection has-  3  -unique values.
tech_support has-  3  -unique values.
streaming_tv has-  3  -unique values.
streaming_movies has-  3  -unique values.
paperless_billing has-  2  -unique values.
total_charges has-  6531  -unique values.
churn has-  2  -unique values.
contract_type has-  3  -unique values.
internet_service_type has-  3  -unique values.
payment_type has-  4  -unique values.
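
The loop above can also be written as a single expression. This is a small sketch on toy data (not the Telco frame) that selects the non-numeric columns and counts distinct values per column.

```python
import pandas as pd

df = pd.DataFrame({
    "gender": ["Female", "Male", "Male"],
    "tenure": [9, 9, 4],
    "churn": ["No", "No", "Yes"],
})

# Keep only the non-numeric columns, then count distinct values in each.
object_counts = df.select_dtypes(exclude="number").nunique()
print(object_counts)
```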


In [12]:
telco.internet_service_type.value_counts(dropna=False)

Fiber optic    3096
DSL            2421
None           1526
Name: internet_service_type, dtype: int64

In [13]:
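# The "None" option in internet_service_type means the customer has no internet service.
# The intent is to relabel it as 'no_int'. Note that fillna only replaces true missing
# values (NaN); the crosstab further down still shows a 'None' column, which suggests
# the value is stored as the string 'None' and is left unchanged by this line.
telco.loc[:,'internet_service_type'] = telco.internet_service_type.fillna('no_int')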
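
A short sketch of the difference between `fillna` and `replace` on a literal `'None'` string, using toy values rather than the Telco data. It assumes the value is stored as text, which is what the later crosstab output suggests.

```python
import pandas as pd

s = pd.Series(["DSL", "None", "Fiber optic", "None"], name="internet_service_type")

# fillna only touches real missing values, so the string 'None' survives.
after_fillna = s.fillna("no_int")

# replace matches the literal string instead.
after_replace = s.replace("None", "no_int")

print(after_fillna.tolist())
print(after_replace.tolist())
```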

In [14]:
# There appear to be 11 new customers with a blank total_charges, which can become an issue
telco.total_charges.value_counts(ascending=False)

           11
20.2       11
19.75       9
19.9        8
20.05       8
           ..
2387.75     1
6302.8      1
2058.5      1
829.55      1
3707.6      1
Name: total_charges, Length: 6531, dtype: int64

In [15]:
# Drop the 11 customers whose total_charges is " ", since those customers are in a "no churn period"
telco = telco[telco.total_charges != " "]
telco.total_charges.value_counts(ascending=False)

20.2       11
19.75       9
19.9        8
19.65       8
20.05       8
           ..
2387.75     1
6302.8      1
2058.5      1
829.55      1
3707.6      1
Name: total_charges, Length: 6530, dtype: int64

In [16]:
# will change the dtype from an object to a float for 'total_charges'
telco.total_charges = telco.total_charges.astype(float)
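
An alternative way to find and convert the blank entries in one pass is `pd.to_numeric` with `errors='coerce'`. The sketch below uses toy strings, not the Telco column, and shows how unparseable values become NaN so they can be inspected or dropped.

```python
import pandas as pd

raw = pd.Series(["593.3", " ", "280.85"], name="total_charges")

# errors='coerce' turns anything unparseable (here the blank string) into NaN.
as_number = pd.to_numeric(raw, errors="coerce")

# The original strings that failed to parse.
bad_rows = raw[as_number.isna()]

print(as_number.tolist())
print(len(bad_rows))
```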

In [17]:
# have another look at the telco data
telco.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 7032 entries, 0 to 7042
Data columns (total 25 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Unnamed: 0                7032 non-null   int64  
 1   payment_type_id           7032 non-null   int64  
 2   internet_service_type_id  7032 non-null   int64  
 3   contract_type_id          7032 non-null   int64  
 4   customer_id               7032 non-null   object 
 5   gender                    7032 non-null   object 
 6   senior_citizen            7032 non-null   int64  
 7   partner                   7032 non-null   object 
 8   dependents                7032 non-null   object 
 9   tenure                    7032 non-null   int64  
 10  phone_service             7032 non-null   object 
 11  multiple_lines            7032 non-null   object 
 12  online_security           7032 non-null   object 
 13  online_backup             7032 non-null   object 
 14  device_p

In [18]:
# comparing the counts of contract_type_id and contract_type
pd.crosstab(telco.contract_type_id, telco.contract_type)

contract_type_id,Month-to-month,One year,Two year
1,3875,0,0
2,0,1472,0
3,0,0,1685


In [19]:
# checking if the payment_type_id and payment_type are the same
pd.crosstab(telco.payment_type_id, telco.payment_type)

payment_type_id,Bank transfer (automatic),Credit card (automatic),Electronic check,Mailed check
1,0,0,2365,0
2,0,0,0,1604
3,1542,0,0,0
4,0,1521,0,0


In [20]:
# checking if the internet_service_type_id and internet_service_type are the same
pd.crosstab(telco.internet_service_type_id, telco.internet_service_type)

internet_service_type_id,DSL,Fiber optic,None
1,2416,0,0
2,0,3096,0
3,0,0,1520
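
The crosstabs show each id lining up with exactly one label. The same check can be done programmatically; `is_one_to_one` below is a hypothetical helper shown on toy data.

```python
import pandas as pd


def is_one_to_one(df, id_col, label_col):
    """True when every id maps to exactly one label and vice versa."""
    pairs = df[[id_col, label_col]].drop_duplicates()
    return bool(pairs[id_col].is_unique and pairs[label_col].is_unique)


demo = pd.DataFrame({
    "contract_type_id": [1, 2, 3, 1],
    "contract_type": ["Month-to-month", "One year", "Two year", "Month-to-month"],
})
print(is_one_to_one(demo, "contract_type_id", "contract_type"))
```

When the check passes, one column of each pair carries no extra information and can be dropped.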


In [21]:
# After analyzing those 6 columns, only 3 are needed; the 3 id columns will be removed:
# 'internet_service_type_id',
# 'payment_type_id',
# 'contract_type_id'
telco = telco.drop(
    columns=[
        'internet_service_type_id',
        'payment_type_id',
        'contract_type_id'
        ])

In [22]:
# 'Unnamed: 0' will also be dropped and 'customer_id' will be put as the index
telco = telco.drop(columns='Unnamed: 0')
telco = telco.set_index('customer_id')

In [23]:
# Since we are interested in customers who churn, make sure churn has no NaN or null values.
# Also check the overall churn rate and confirm churn takes only two values.
telco.churn.value_counts(dropna=False, normalize=True)

No     0.734215
Yes    0.265785
Name: churn, dtype: float64
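
The class shares above also give a baseline to beat: a model that always predicts the most common class is right as often as that class occurs. A minimal sketch on toy labels (not the Telco data):

```python
import pandas as pd

churn = pd.Series(["No"] * 3 + ["Yes"])  # toy labels for illustration

shares = churn.value_counts(normalize=True)
baseline_label = shares.idxmax()   # most common class
baseline_accuracy = shares.max()   # accuracy of always predicting that class
print(baseline_label, baseline_accuracy)
```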

In [24]:
telco.head()

customer_id,gender,senior_citizen,partner,dependents,tenure,phone_service,multiple_lines,online_security,online_backup,device_protection,tech_support,streaming_tv,streaming_movies,paperless_billing,monthly_charges,total_charges,churn,contract_type,internet_service_type,payment_type
0002-ORFBO,Female,0,Yes,Yes,9,Yes,No,No,Yes,No,Yes,Yes,No,Yes,65.6,593.3,No,One year,DSL,Mailed check
0003-MKNFE,Male,0,No,No,9,Yes,Yes,No,No,No,No,No,Yes,No,59.9,542.4,No,Month-to-month,DSL,Mailed check
0004-TLHLJ,Male,0,No,No,4,Yes,No,No,No,Yes,No,No,No,Yes,73.9,280.85,Yes,Month-to-month,Fiber optic,Electronic check
0011-IGKFF,Male,1,Yes,No,13,Yes,No,No,Yes,Yes,No,Yes,Yes,Yes,98.0,1237.85,Yes,Month-to-month,Fiber optic,Electronic check
0013-EXCHZ,Female,1,Yes,No,3,Yes,No,No,No,No,Yes,Yes,No,Yes,83.9,267.4,Yes,Month-to-month,Fiber optic,Mailed check


In [25]:
# Add a column with the average charge per month of tenure: total_charges divided by tenure.
telco['avg_tenure_charges'] = telco['total_charges'] / telco['tenure']
telco.head()

customer_id,gender,senior_citizen,partner,dependents,tenure,phone_service,multiple_lines,online_security,online_backup,device_protection,...,streaming_tv,streaming_movies,paperless_billing,monthly_charges,total_charges,churn,contract_type,internet_service_type,payment_type,avg_tenure_charges
0002-ORFBO,Female,0,Yes,Yes,9,Yes,No,No,Yes,No,...,Yes,No,Yes,65.6,593.3,No,One year,DSL,Mailed check,65.922222
0003-MKNFE,Male,0,No,No,9,Yes,Yes,No,No,No,...,No,Yes,No,59.9,542.4,No,Month-to-month,DSL,Mailed check,60.266667
0004-TLHLJ,Male,0,No,No,4,Yes,No,No,No,Yes,...,No,No,Yes,73.9,280.85,Yes,Month-to-month,Fiber optic,Electronic check,70.2125
0011-IGKFF,Male,1,Yes,No,13,Yes,No,No,Yes,Yes,...,Yes,Yes,Yes,98.0,1237.85,Yes,Month-to-month,Fiber optic,Electronic check,95.219231
0013-EXCHZ,Female,1,Yes,No,3,Yes,No,No,No,No,...,Yes,No,Yes,83.9,267.4,Yes,Month-to-month,Fiber optic,Mailed check,89.133333


In [26]:
telco['avg_tenure_charges'] = telco['avg_tenure_charges'].round(2)
telco.head()

customer_id,gender,senior_citizen,partner,dependents,tenure,phone_service,multiple_lines,online_security,online_backup,device_protection,...,streaming_tv,streaming_movies,paperless_billing,monthly_charges,total_charges,churn,contract_type,internet_service_type,payment_type,avg_tenure_charges
0002-ORFBO,Female,0,Yes,Yes,9,Yes,No,No,Yes,No,...,Yes,No,Yes,65.6,593.3,No,One year,DSL,Mailed check,65.92
0003-MKNFE,Male,0,No,No,9,Yes,Yes,No,No,No,...,No,Yes,No,59.9,542.4,No,Month-to-month,DSL,Mailed check,60.27
0004-TLHLJ,Male,0,No,No,4,Yes,No,No,No,Yes,...,No,No,Yes,73.9,280.85,Yes,Month-to-month,Fiber optic,Electronic check,70.21
0011-IGKFF,Male,1,Yes,No,13,Yes,No,No,Yes,Yes,...,Yes,Yes,Yes,98.0,1237.85,Yes,Month-to-month,Fiber optic,Electronic check,95.22
0013-EXCHZ,Female,1,Yes,No,3,Yes,No,No,No,No,...,Yes,No,Yes,83.9,267.4,Yes,Month-to-month,Fiber optic,Mailed check,89.13
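
The division works here because the customers with a tenure of 0 were dropped earlier. If such rows were kept, dividing by zero would produce `inf` or NaN. A small sketch on toy numbers that guards against this by treating a tenure of 0 as missing:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"total_charges": [593.3, 0.0], "tenure": [9, 0]})

# Treat a tenure of 0 as missing so the result is NaN rather than inf.
tenure = df["tenure"].replace(0, np.nan)
df["avg_tenure_charges"] = (df["total_charges"] / tenure).round(2)
print(df)
```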


#### Call a function that does all the previous steps and separates the data into train, validate, test.

In [27]:
train, val, test = w.split_data_telco(telco)
train.shape, val.shape, test.shape

((4218, 21), (1407, 21), (1407, 21))
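
The shapes above are consistent with holding out 20% for test and then 25% of the remainder for validate, which gives roughly 60/20/20. The sketch below shows one way such a split could be written; `split_60_20_20` is a hypothetical stand-in for `w.split_data_telco`, and stratifying on the target is an assumption, not something the notebook confirms.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_60_20_20(df, target, seed=3333):
    """Split df into train/validate/test, stratified on `target`.

    Hypothetical helper for illustration; the real wrangle function may differ.
    """
    # Hold out 20% of all rows for test.
    train_val, test = train_test_split(
        df, test_size=0.2, random_state=seed, stratify=df[target]
    )
    # 25% of the remaining 80% is 20% of the original for validate.
    train, val = train_test_split(
        train_val, test_size=0.25, random_state=seed, stratify=train_val[target]
    )
    return train, val, test


demo = pd.DataFrame({"x": range(100), "churn": ["No"] * 70 + ["Yes"] * 30})
demo_train, demo_val, demo_test = split_60_20_20(demo, "churn")
print(demo_train.shape, demo_val.shape, demo_test.shape)
```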

In [28]:
train.head()

customer_id,gender,senior_citizen,partner,dependents,tenure,phone_service,multiple_lines,online_security,online_backup,device_protection,...,streaming_tv,streaming_movies,paperless_billing,monthly_charges,total_charges,churn,contract_type,internet_service_type,payment_type,avg_tenure_charges
6260-ONULR,Male,0,No,No,1,Yes,No,No,No,No,...,Yes,Yes,Yes,62.8,62.8,No,Month-to-month,DSL,Mailed check,62.8
6857-TKDJV,Male,0,Yes,Yes,67,Yes,Yes,No internet service,No internet service,No internet service,...,No internet service,No internet service,No,24.65,1620.45,No,Two year,,Bank transfer (automatic),24.19
1935-IMVBB,Male,0,Yes,No,56,Yes,No,No,No,No,...,Yes,Yes,No,89.7,4952.95,No,Month-to-month,Fiber optic,Mailed check,88.45
6860-YRJZP,Male,1,No,No,9,Yes,Yes,No,No,No,...,No,No,Yes,74.05,678.45,No,Month-to-month,Fiber optic,Electronic check,75.38
0781-LKXBR,Male,1,No,No,9,Yes,Yes,No,No,Yes,...,Yes,Yes,Yes,100.5,918.6,Yes,Month-to-month,Fiber optic,Electronic check,102.07


In [29]:
print(train['total_charges'].dtype)
print(val['total_charges'].dtype)
print(test['total_charges'].dtype)

float64
float64
float64


In [33]:
train_prepped, val_prepped, test_prepped = w.prep_telco_data(telco)

In [34]:
train_prepped.head()

customer_id,gender,senior_citizen,partner,dependents,tenure,phone_service,paperless_billing,monthly_charges,total_charges,churn,...,streaming_tv_Yes,streaming_movies_No internet service,streaming_movies_Yes,contract_type_One year,contract_type_Two year,internet_service_type_Fiber optic,internet_service_type_None,payment_type_Credit card (automatic),payment_type_Electronic check,payment_type_Mailed check
6260-ONULR,Male,0,No,No,1,Yes,Yes,62.8,62.8,No,...,1,0,1,0,0,0,0,0,0,1
6857-TKDJV,Male,0,Yes,Yes,67,Yes,No,24.65,1620.45,No,...,0,1,0,0,1,0,1,0,0,0
1935-IMVBB,Male,0,Yes,No,56,Yes,No,89.7,4952.95,No,...,1,0,1,0,0,1,0,0,0,1
6860-YRJZP,Male,1,No,No,9,Yes,Yes,74.05,678.45,No,...,0,0,0,0,0,1,0,0,1,0
0781-LKXBR,Male,1,No,No,9,Yes,Yes,100.5,918.6,Yes,...,1,0,1,0,0,1,0,0,1,0
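
The prepped columns suggest two kinds of encoding: `*_encoded` columns for two-valued fields and one-hot columns with the first level dropped for multi-valued fields. The sketch below reproduces that pattern on toy data; it is an assumption about what `w.prep_telco_data` does, not its actual code.

```python
import pandas as pd

demo = pd.DataFrame({
    "partner": ["Yes", "No", "No"],
    "contract_type": ["One year", "Month-to-month", "Two year"],
})

# Two-valued column: map Yes/No to 1/0 in a new *_encoded column.
demo["partner_encoded"] = demo["partner"].map({"Yes": 1, "No": 0})

# Multi-valued column: one-hot encode, dropping the first level as the reference.
dummies = pd.get_dummies(demo[["contract_type"]], drop_first=True, dtype=int)
encoded = pd.concat([demo, dummies], axis=1)
print(list(encoded.columns))
```

Dropping the first level ('Month-to-month' here) avoids a redundant column, since it is implied when the other dummy columns are all 0.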


In [35]:
train_prepped.nunique()

gender                                      2
senior_citizen                              2
partner                                     2
dependents                                  2
tenure                                     72
phone_service                               2
paperless_billing                           2
monthly_charges                          1360
total_charges                            4010
churn                                       2
avg_tenure_charges                       3194
gender_encoded                              2
partner_encoded                             2
dependents_encoded                          2
phone_service_encoded                       2
paperless_billing_encoded                   2
churn_encoded                               2
multiple_lines_No phone service             2
multiple_lines_Yes                          2
online_security_No internet service         2
online_security_Yes                         2
online_backup_No internet service 

In [36]:
train_prepped.info()

<class 'pandas.core.frame.DataFrame'>
Index: 4218 entries, 6260-ONULR to 5180-UCIIQ
Data columns (total 38 columns):
 #   Column                                 Non-Null Count  Dtype  
---  ------                                 --------------  -----  
 0   gender                                 4218 non-null   object 
 1   senior_citizen                         4218 non-null   int64  
 2   partner                                4218 non-null   object 
 3   dependents                             4218 non-null   object 
 4   tenure                                 4218 non-null   int64  
 5   phone_service                          4218 non-null   object 
 6   paperless_billing                      4218 non-null   object 
 7   monthly_charges                        4218 non-null   float64
 8   total_charges                          4218 non-null   float64
 9   churn                                  4218 non-null   object 
 10  avg_tenure_charges                     4218 non-null   float64

# Data Dictionary

# A brief look at the data
train.head()

# A summary of the data
train.describe()

# Explore
# How often does churn occur?
#### Pie chart of churn in the train set
# `e.get_pie_upsets` comes from an explore module that is not imported above
e.get_pie_upsets(train)
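
How often the target class occurs, overall and within a category, can be read straight from a split without a custom helper. The sketch below uses toy rows rather than the train set; the numbers it prints are the shares a pie chart would display.

```python
import pandas as pd

demo = pd.DataFrame({
    "contract_type": ["Month-to-month", "Month-to-month", "Two year", "Two year"],
    "churn": ["Yes", "No", "No", "No"],
})

# Overall split between churned and retained customers.
overall = demo["churn"].value_counts(normalize=True)

# Churn rate within each contract type.
by_contract = (demo["churn"] == "Yes").groupby(demo["contract_type"]).mean()

print(overall)
print(by_contract)
```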