### Exploring Customer Segmentation


<center>
    <img src = images/segments.jpeg>
</center>


In this activity, you are tasked with profiling customer groups for a large telecommunications company.  The data provided contains information on customers' purchasing and usage behavior with the telecom products.  Your goal is to use PCA and clustering to segment these customers into meaningful groups and report back your findings.

Because these results need to be interpretable, it is important to keep the number of clusters reasonable.  Think about how you might represent some of the non-numeric features so that they can be included in your segmentation models.  You are to report back your approach and findings to the class.  Be specific about what features were used and how you interpret the resulting clusters.
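
One common way to keep the number of clusters small but still justified is to compare a few candidate values of k with the silhouette score. The sketch below runs on synthetic data from `make_blobs`, not on the telecom data; the candidate range of 2 to 6 and the blob centers are assumptions chosen only for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with three well-separated groups (a stand-in, not the telecom data).
X, _ = make_blobs(
    n_samples=300,
    centers=[(-5, -5), (0, 5), (5, -5)],
    cluster_std=0.6,
    random_state=0,
)

# Score a handful of small k values; a higher silhouette score means
# tighter, better-separated clusters.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores)
print("best k:", best_k)
```

On real customer data the scores are rarely this clear-cut, so the score is best treated as one input alongside how easily each cluster can be described.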

In [13]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.cluster import KMeans, DBSCAN
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

In [80]:
df = pd.read_csv('data/telco_churn_data.csv')

In [46]:
df.head()
#df.describe()

Unnamed: 0,Customer ID,Referred a Friend,Number of Referrals,Tenure in Months,Offer,Phone Service,Avg Monthly Long Distance Charges,Multiple Lines,Internet Service,Internet Type,...,Latitude,Longitude,Population,Churn Value,CLTV,Churn Category,Churn Reason,Total Customer Svc Requests,Product/Service Issues Reported,Customer Satisfaction
0,8779-QRDMV,No,0,1,,No,0.0,No,Yes,Fiber Optic,...,34.02381,-118.156582,68701,1,5433,Competitor,Competitor offered more data,5,0,
1,7495-OOKFY,Yes,1,8,Offer E,Yes,48.85,Yes,Yes,Cable,...,34.044271,-118.185237,55668,1,5302,Competitor,Competitor made better offer,5,0,
2,1658-BYGOY,No,0,18,Offer D,Yes,11.33,Yes,Yes,Fiber Optic,...,34.108833,-118.229715,47534,1,3179,Competitor,Competitor made better offer,1,0,
3,4598-XLKNJ,Yes,1,25,Offer C,Yes,19.76,No,Yes,Fiber Optic,...,33.936291,-118.332639,27778,1,5337,Dissatisfaction,Limited range of services,1,1,2.0
4,4846-WHAFZ,Yes,1,37,Offer C,Yes,6.33,Yes,Yes,Cable,...,33.972119,-118.020188,26265,1,2793,Price,Extra data charges,1,0,2.0


In [None]:
df.info()

## Data Cleaning

1. Turn Yes/No columns to 1/0 columns

2. Other columns appear to be important but also hold non-numeric data. Turn that data into numbers
   too: a column with 3 categories gets 0, 1, 2; one with 4 categories gets 0, 1, 2, 3; and so on.
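
The two steps above can be sketched on a toy frame. Integer codes imply an ordering, which distance-based methods such as KMeans treat as meaningful, so the sketch also shows one-hot encoding with `pd.get_dummies` as an alternative for unordered categories. The column names mirror the dataset, but the rows are made up, and treating `Internet Type` as unordered is an assumption.

```python
import pandas as pd

# Toy frame: column names mirror the dataset, the values are invented for illustration.
toy = pd.DataFrame({
    "Married": ["Yes", "No", "Yes"],
    "Contract": ["Month-to-Month", "One Year", "Two Year"],
    "Internet Type": ["Cable", "DSL", "Fiber Optic"],
})

# Step 1: binary Yes/No columns become 1/0.
toy["Married"] = toy["Married"].map({"Yes": 1, "No": 0})

# Step 2a: a category with a natural order (contract length) takes integer codes.
toy["Contract"] = toy["Contract"].map(
    {"Month-to-Month": 0, "One Year": 1, "Two Year": 2}
)

# Step 2b: an unordered category gets one indicator column per level,
# so no artificial ordering is introduced.
encoded = pd.get_dummies(toy, columns=["Internet Type"], dtype=int)
print(encoded)
```

One-hot encoding adds columns, which PCA can later compress, whereas integer codes keep the frame narrow at the cost of an implied ranking.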

#### Value counts cell
To see which responses were given and should be converted to numbers

In [None]:
# Value counts cell to check the various types of responses for each category
df['Churn Category'].value_counts()
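
Rather than checking one column at a time, the same inspection can be looped over every non-numeric column. This is a minimal sketch on a toy frame; the rows are invented, and `dropna=False` is used so that missing values show up in the counts.

```python
import pandas as pd

# Toy frame standing in for df; the values are invented for illustration.
toy = pd.DataFrame({
    "Offer": ["Offer A", None, "Offer B", "Offer A"],
    "Gender": ["Male", "Female", "Female", "Male"],
    "Tenure in Months": [1, 8, 18, 25],
})

# Collect value counts for every non-numeric column, keeping missing values visible.
counts = {
    col: toy[col].value_counts(dropna=False)
    for col in toy.select_dtypes(exclude="number").columns
}

for col, vc in counts.items():
    print(f"--- {col} ---")
    print(vc)
```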




### Convert categories to numbers
Numerical data, please.

In [77]:
# df = pd.read_csv('data/telco_churn_data.csv')

In [None]:
# Turn all the Yes/No answers to 1/0
yes_no_cols = [
    'Referred a Friend', 'Phone Service', 'Multiple Lines', 'Internet Service',
    'Online Security', 'Online Backup', 'Device Protection Plan',
    'Premium Tech Support', 'Streaming TV', 'Streaming Movies',
    'Streaming Music', 'Unlimited Data', 'Paperless Billing', 'Under 30',
    'Senior Citizen', 'Married', 'Dependents',
]
for col in yes_no_cols:
    df[col] = df[col].map({'Yes': 1, 'No': 0})

# Change columns with string options to numbers
df['Offer'] = df['Offer'].map({'None': 0, 'Offer A': 1, 'Offer B': 2, 'Offer C': 3, 'Offer D': 4, 'Offer E': 5})

df['Internet Type'] = df['Internet Type'].map({'None':0, 'Fiber Optic':1, 'DSL':2, 'Cable':3})

df['Contract'] = df['Contract'].map({'Month-to-Month':0, 'One Year':1, 'Two Year':2})

df['Payment Method'] = df['Payment Method'].map({'Bank Withdrawal':0, 'Credit Card':1, 'Mailed Check':2})

df['Gender'] = df['Gender'].map({'Male':0, 'Female':1})

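# Missing offers load as NaN, and .map() leaves any unmapped value as NaN.
# Fill every remaining NaN with 0 (this applies to all columns, not just 'Offer').
# Note: the `downcast` argument is deprecated in recent pandas releases.
df.fillna(0, inplace=True, downcast='infer')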

# df.to_csv('data/df2.csv')
df.head(30)


## Drop non-numeric columns
Bye!

In [85]:
object_cols = df.select_dtypes('object').columns.tolist()
df_numeric = df.drop(object_cols,axis=1)

In [88]:
df_numeric.head(30)
df_numeric.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 42 columns):
 #   Column                             Non-Null Count  Dtype  
---  ------                             --------------  -----  
 0   Referred a Friend                  7043 non-null   int64  
 1   Number of Referrals                7043 non-null   int64  
 2   Tenure in Months                   7043 non-null   int64  
 3   Offer                              7043 non-null   int64  
 4   Phone Service                      7043 non-null   int64  
 5   Avg Monthly Long Distance Charges  7043 non-null   float64
 6   Multiple Lines                     7043 non-null   int64  
 7   Internet Service                   7043 non-null   int64  
 8   Internet Type                      7043 non-null   int64  
 9   Avg Monthly GB Download            7043 non-null   int64  
 10  Online Security                    7043 non-null   int64  
 11  Online Backup                      7043 non-null   int64

## Scale the data
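
A minimal sketch of the scaling step followed by PCA, using the `StandardScaler` and `PCA` classes imported at the top of the notebook. It runs on a synthetic array rather than `df_numeric`; the three columns and the choice of two components are assumptions for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for df_numeric: three columns on very different scales.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(50, 20, size=200),      # a tenure-like column
    rng.normal(5000, 1500, size=200),  # a CLTV-like column
    rng.integers(0, 2, size=200),      # a 0/1 flag
])

# Standardize each column to zero mean and unit variance so that
# large-valued columns do not dominate the principal components.
X_scaled = StandardScaler().fit_transform(X)

# Project onto the first two principal components.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
print(pca.explained_variance_ratio_)
```

Without scaling, the CLTV-like column would account for nearly all of the variance simply because of its units.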

In [None]:
# Since the Churn Category and Churn Reason are both present for 1870/7043 rows,
# let's treat that dataset separately and see what happens there, in addition to
# the whole set with only those categories dropped
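
The split described in the comment above could look like the following sketch. It uses a toy frame with invented rows, and it assumes that `Churn Value == 1` marks exactly the customers that have a churn category and reason recorded; that assumption should be checked against the real data.

```python
import pandas as pd

# Toy frame standing in for df; the rows are invented for illustration.
toy = pd.DataFrame({
    "Tenure in Months": [1, 8, 40, 60],
    "Churn Value": [1, 1, 0, 0],
    "Churn Category": ["Competitor", "Price", None, None],
    "Churn Reason": ["Competitor offered more data", "Extra data charges", None, None],
})

churn_cols = ["Churn Category", "Churn Reason"]

# Subset 1: only churned customers, keeping the churn text columns.
# Assumes Churn Value == 1 identifies the rows where those columns are filled in.
churned = toy[toy["Churn Value"] == 1].copy()

# Subset 2: every customer, with the two churn text columns dropped.
everyone = toy.drop(columns=churn_cols)

print(churned.shape, everyone.shape)
```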

