# Churn Customers

### Introduction
This IBM sample dataset describes Telco customers and whether they left the company within the last month (churn).

Basic information:

- Customers who left within the last month: the column is called Churn.
- Services that each customer has signed up for: phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies.
- Customer account information: how long they’ve been a customer, contract, payment method, paperless billing, monthly charges, and total charges.
- Demographic info about customers: gender, age range, and whether they have partners and dependents.

There are 21 columns: a customer ID, 19 features, and the Churn target.

#### Objective
I will explore the data to understand customer churn: how many customers are leaving the business, and why.

### Importing Libraries

In [None]:
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
import tensorflow as tf
from tensorflow import keras




### Loading data

In [None]:
df = pd.read_csv("C:/Users/STSC/Desktop/ChurnDataset/Churndata.csv")
df.sample(5)

### Pre-processing

In [None]:
#Check the datatypes
#TotalCharges is stored as an object, so it needs to be converted to float
df.dtypes

In [None]:
df.TotalCharges.values

In [None]:
df['TotalCharges'] = df['TotalCharges'].replace(" ", 0).astype('float64')
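
As a hedged alternative to replacing the blank strings by hand, `pd.to_numeric` with `errors="coerce"` converts anything unparseable to `NaN`, which also makes it easy to count how many values were missing before filling them. The sketch below uses a small made-up Series in place of `df['TotalCharges']`; the names `raw`, `total_charges` and `n_missing` are illustrative only.

```python
import pandas as pd

# Toy column standing in for df['TotalCharges']; the blank string mimics
# rows where the charge is missing in the CSV.
raw = pd.Series(["29.85", " ", "1889.5"])

# errors="coerce" turns unparseable entries into NaN instead of raising.
total_charges = pd.to_numeric(raw, errors="coerce")

# Count the blanks before deciding how to fill them.
n_missing = int(total_charges.isna().sum())

# Filling with 0 gives the same end result as replacing " " with 0.
total_charges = total_charges.fillna(0.0)
```

Whether 0 is the right fill value is a modelling choice; the count in `n_missing` helps judge how much it matters.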

In [None]:
df.dtypes

### Data Visualization 

##### Viz1: Number of customers vs. tenure

In [None]:
tenure_churn_no = df[df.Churn=='No'].tenure
tenure_churn_yes = df[df.Churn=='Yes'].tenure

plt.xlabel("tenure")
plt.ylabel("Number Of Customers")
plt.title("Customer Churn Prediction Visualization")

plt.hist([tenure_churn_yes, tenure_churn_no], rwidth=0.95, color=['green','red'],label=['Churn=Yes','Churn=No'])
plt.legend()

#This graph shows how many customers left and how many stayed, grouped by tenure (the number of months they have been with the company so far).

##### Churn Vs. Count

Churn: No - 72.4%

Churn: Yes - 27.6%

In [None]:
ax = sns.catplot(y="Churn", kind="count", data=df, height=2.6, aspect=2.5, orient='h')
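
The class shares can also be computed directly rather than read off the plot. This is a minimal sketch on a made-up Series (`churn` is a stand-in for `df['Churn']`), so the numbers it produces are not the dataset's.

```python
import pandas as pd

# Toy stand-in for df['Churn'].
churn = pd.Series(["No", "No", "Yes", "No"], name="Churn")

# normalize=True returns fractions of the total instead of raw counts.
shares = churn.value_counts(normalize=True)
```

On the real data, `df['Churn'].value_counts(normalize=True)` gives the proportions to quote next to the plot.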

##### Numeric Features

There are only three numerical columns: tenure, monthly charges and total charges. Their probability density distributions can be estimated using the seaborn kdeplot function.

In [None]:
def kdeplot(feature):
    plt.figure(figsize=(9, 4))
    plt.title("KDE for {}".format(feature))
    sns.kdeplot(df[df['Churn'] == 'No'][feature].dropna(), color= 'navy', label= 'Churn: No')
    sns.kdeplot(df[df['Churn'] == 'Yes'][feature].dropna(), color= 'orange', label= 'Churn: Yes')
    plt.legend()  # show which curve belongs to which churn group

kdeplot('tenure')
kdeplot('MonthlyCharges')
kdeplot('TotalCharges')

From the plots above we can conclude that:

Recent clients are more likely to churn.

Clients with higher MonthlyCharges are also more likely to churn.

Tenure and MonthlyCharges are probably important features.

##### Categorical features
This dataset has 16 categorical features.

###### Gender and Age (SeniorCitizen)

In [None]:
def barplot_percentages(feature, orient='v', axis_name="percentage of customers"):
    # Count customers per (feature value, Churn) pair; naming the counts explicitly
    # keeps the column name stable across pandas versions
    g = df.groupby(feature)["Churn"].value_counts().rename(axis_name).reset_index()
    # Express each count as a share of all customers
    g[axis_name] = g[axis_name]/len(df)
    if orient == 'v':
        ax = sns.barplot(x=feature, y= axis_name, hue='Churn', data=g, orient=orient)
        ax.set_yticklabels(['{:,.0%}'.format(y) for y in ax.get_yticks()])
    else:
        ax = sns.barplot(x= axis_name, y=feature, hue='Churn', data=g, orient=orient)
        ax.set_xticklabels(['{:,.0%}'.format(x) for x in ax.get_xticks()])

barplot_percentages("SeniorCitizen")

In [None]:
# Numeric copy of Churn (No=0, Yes=1) so that bar heights are churn rates
df['churn_rate'] = df['Churn'].map({"No": 0, "Yes": 1})
g = sns.FacetGrid(df, col="SeniorCitizen", height=4, aspect=.9)
ax = g.map(sns.barplot, "gender", "churn_rate", palette = "Blues_d", order= ['Female', 'Male'])

Gender is not indicative of churn.
Senior citizens are only 16% of customers, but they have a much higher churn rate: 42% against 23% for non-senior customers.
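
Note that `barplot_percentages` divides by `len(df)`, so its bars show shares of all customers, not churn rates within each group. A within-group rate like the ones quoted above can be sketched with `pd.crosstab` and `normalize="index"`. The frame `toy` below is made-up data, not the Telco dataset.

```python
import pandas as pd

# Toy stand-in for the SeniorCitizen and Churn columns.
toy = pd.DataFrame({
    "SeniorCitizen": [0, 0, 0, 0, 1, 1],
    "Churn": ["No", "No", "No", "Yes", "Yes", "No"],
})

# normalize="index" makes every row sum to 1, so the "Yes" column
# is the churn rate inside each SeniorCitizen group.
rates = pd.crosstab(toy["SeniorCitizen"], toy["Churn"], normalize="index")

non_senior_rate = rates.loc[0, "Yes"]
senior_rate = rates.loc[1, "Yes"]
```

The same call on `df["SeniorCitizen"]` and `df["Churn"]` should reproduce the per-group rates shown in the FacetGrid plot.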

##### Phone and Internet services
There are only two phone features:
whether the client has phone service and whether they have more than one line.
Both can be summed up in one chart:

In [None]:
plt.figure(figsize=(9, 4.5))
barplot_percentages("MultipleLines", orient='h')

Customers with multiple lines have a higher churn rate.

In [None]:
plt.figure(figsize=(9, 4.5))
barplot_percentages("InternetService", orient="h")

##### Additional service analysis

The first plot shows the total number of customers for each additional service, while the second shows the number of clients that churn.

In [None]:
#There are six additional services for customers with internet:
cols = ["OnlineSecurity", "OnlineBackup", "DeviceProtection", "TechSupport", "StreamingTV", "StreamingMovies"]
df1 = pd.melt(df[df["InternetService"] != "No"][cols]).rename({'value': 'Has service'}, axis=1)
plt.figure(figsize=(10, 4.5))
ax = sns.countplot(data=df1, x='variable', hue='Has service')
ax.set(xlabel='Additional service', ylabel='Num of customers')
plt.show()

In [None]:
plt.figure(figsize=(10, 4.5))
df1 = df[(df.InternetService != "No") & (df.Churn == "Yes")]
df1 = pd.melt(df1[cols]).rename({'value': 'Has service'}, axis=1)
ax = sns.countplot(data=df1, x='variable', hue='Has service', hue_order=['No', 'Yes'])
ax.set(xlabel='Additional service', ylabel='Num of churns')
plt.show()

Customers with OnlineSecurity, OnlineBackup, DeviceProtection and TechSupport are less likely to churn.

##### Contract and Payment

Customers with paperless billing are more likely to churn.
The most common payment method is Electronic check, used by around 35% of customers. It also has a very high churn rate.

In [None]:
g = sns.FacetGrid(df, col="PaperlessBilling", height=4, aspect=.9)
ax = g.map(sns.barplot, "Contract", "churn_rate", palette = "Blues_d", order= ['Month-to-month', 'One year', 'Two year'])

In [None]:
plt.figure(figsize=(9, 4.5))
barplot_percentages("PaymentMethod", orient='h')

##### Correlation between features 

In [None]:
plt.figure(figsize=(12, 6))
df.drop(['customerID', 'churn_rate'],
        axis=1, inplace=True)
corr = df.apply(lambda x: pd.factorize(x)[0]).corr()
ax = sns.heatmap(corr, xticklabels=corr.columns, yticklabels=corr.columns, 
                 linewidths=.2, cmap="YlGnBu")
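
To make the heatmap easier to interpret, here is a small sketch of what `pd.factorize` does to a text column. Codes are assigned in order of first appearance, so for columns with more than two categories the numeric order is arbitrary and the resulting correlations are only a rough guide. The Series `contract` is made-up data.

```python
import pandas as pd

# Toy stand-in for a categorical column such as Contract.
contract = pd.Series(["Month-to-month", "Two year", "Month-to-month", "One year"])

# factorize returns integer codes plus the unique values they refer to.
codes, uniques = pd.factorize(contract)
```

Here "Two year" receives code 1 and "One year" code 2 only because of the row order, which is why the correlation values for such columns should not be over-interpreted.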

### Encoding 

In [None]:
df.head()

In [None]:
#Let's print unique values in object columns to see data values
def print_unique_col_values(df):
    for column in df:
        if df[column].dtypes=='object':
            print(f'{column}: {df[column].unique()}')

In [None]:
print_unique_col_values(df)

In [None]:
#Some columns contain 'No internet service' or 'No phone service'; both can be replaced with a plain 'No'
df.replace('No internet service','No',inplace=True)
df.replace('No phone service','No',inplace=True)

In [None]:
print_unique_col_values(df)

In [None]:
df

In [None]:
#Convert Yes/No columns to 1/0

yes_no_columns = ['Partner','Dependents','PhoneService','MultipleLines','OnlineSecurity','OnlineBackup',
                  'DeviceProtection','TechSupport','StreamingTV','StreamingMovies','PaperlessBilling','Churn']
for col in yes_no_columns:
    # Assign the result back; an inplace replace on df[col] may act on a temporary copy
    df[col] = df[col].replace({'Yes': 1,'No': 0})

In [1]:
for col in df:
    print(f'{col}: {df[col].unique()}')


In [None]:

# Encode gender as Female=1, Male=0 (assign back instead of replacing in place)
df['gender'] = df['gender'].replace({'Female':1,'Male':0})

In [None]:

df.gender.unique()

##### One hot encoding for categorical columns 

In [None]:

# dtype=int keeps the dummy columns as 0/1 integers rather than booleans
df1 = pd.get_dummies(data=df, columns=['InternetService','Contract','PaymentMethod'], dtype=int)
df1.columns
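
As a small illustration of one-hot encoding, the sketch below runs `pd.get_dummies` on a made-up two-row frame. Depending on the pandas version, the dummy columns may come back as booleans by default; passing `dtype=int` is one way to get plain 0/1 values. The frame `toy` and the name `encoded` are illustrative only.

```python
import pandas as pd

# Toy frame with one numeric and one categorical column.
toy = pd.DataFrame({
    "tenure": [1, 34],
    "Contract": ["Month-to-month", "One year"],
})

# Each category becomes its own 0/1 column named <column>_<category>.
encoded = pd.get_dummies(toy, columns=["Contract"], dtype=int)
```

The original `Contract` column is dropped and replaced by `Contract_Month-to-month` and `Contract_One year`, matching the column names visible in the training data preview.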

In [None]:
df1.sample(5)

In [None]:

df1.dtypes

In [None]:
cols_to_scale = ['tenure','MonthlyCharges','TotalCharges']

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
df1[cols_to_scale] = scaler.fit_transform(df1[cols_to_scale])
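
Fitting the scaler on the full dataset lets the test rows influence the minimum and maximum. A common alternative, sketched here with plain pandas on made-up data, is to compute the min and range on the training rows only and reuse them for the test rows. With scikit-learn the equivalent would be calling `fit` on the training split and `transform` on the test split. The frames `train` and `test` are hypothetical stand-ins.

```python
import pandas as pd

cols_to_scale = ["tenure", "MonthlyCharges"]

# Toy stand-ins for the training and test splits.
train = pd.DataFrame({"tenure": [0, 36, 72], "MonthlyCharges": [20.0, 70.0, 120.0]})
test = pd.DataFrame({"tenure": [18], "MonthlyCharges": [95.0]})

# Learn min and range from the training rows only.
col_min = train[cols_to_scale].min()
col_range = train[cols_to_scale].max() - col_min

# Min-max scaling: (x - min) / (max - min), using training statistics for both splits.
train_scaled = (train[cols_to_scale] - col_min) / col_range
test_scaled = (test[cols_to_scale] - col_min) / col_range
```

Test values outside the training range would fall outside [0, 1] under this scheme, which is expected.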

In [None]:

for col in df1:
    print(f'{col}: {df1[col].unique()}')

##### Train test split 

In [4]:
X = df1.drop('Churn',axis='columns')
y = df1['Churn']

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2,random_state=5)


In [5]:

X_train.shape


In [6]:
X_test.shape


In [213]:
X_train[:10]

Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,OnlineSecurity,OnlineBackup,DeviceProtection,...,InternetService_DSL,InternetService_Fiber optic,InternetService_No,Contract_Month-to-month,Contract_One year,Contract_Two year,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check
5860,1,0,0,0,0.027778,1,0,0,0,0,...,0,1,0,1,0,0,1,0,0,0
2458,0,1,1,0,0.694444,1,1,1,0,1,...,0,1,0,0,0,1,0,1,0,0
5879,0,0,1,0,0.458333,1,0,1,1,0,...,1,0,0,0,0,1,0,0,0,1
4708,1,0,1,1,0.777778,1,0,1,1,1,...,1,0,0,0,0,1,0,1,0,0
1293,0,0,1,1,0.930556,1,1,0,1,1,...,0,1,0,0,0,1,1,0,0,0
2242,0,0,1,1,0.611111,1,1,0,0,0,...,0,0,1,0,1,0,0,0,1,0
1444,0,0,0,1,0.569444,1,0,1,1,1,...,0,1,0,0,0,1,0,0,1,0
3269,0,0,0,0,0.902778,1,1,0,0,0,...,0,0,1,0,1,0,0,1,0,0
101,1,0,1,1,0.013889,1,0,0,0,0,...,0,0,1,1,0,0,0,0,1,0
4191,1,0,1,0,0.875,1,1,0,1,1,...,0,1,0,1,0,0,0,0,1,0


In [214]:
len(X_train.columns)

26

### Build a model (ANN) in tensorflow/keras 

In [2]:
model = keras.Sequential([
    keras.layers.Dense(26, input_shape=(26,), activation='relu'),
    keras.layers.Dense(15, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')
])

# opt = keras.optimizers.Adam(learning_rate=0.01)

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.fit(X_train, y_train, epochs=100)


In [1]:
model.evaluate(X_test, y_test)

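
Because churners are the minority class, accuracy alone can look good even when many churners are missed. The sketch below shows one way to turn sigmoid outputs into class labels and compute precision and recall with NumPy. The arrays are made-up; `y_prob` stands in for what `model.predict(X_test)` would return, and the 0.5 threshold is an assumption, not something the notebook specifies.

```python
import numpy as np

# Toy ground truth and predicted probabilities (stand-in for model.predict output).
y_true = np.array([1, 0, 0, 1, 0, 1])
y_prob = np.array([0.9, 0.2, 0.6, 0.4, 0.1, 0.8])

# Assumed decision threshold of 0.5 on the sigmoid output.
y_pred = (y_prob >= 0.5).astype(int)

# Confusion-matrix cells.
tp = int(np.sum((y_pred == 1) & (y_true == 1)))
fp = int(np.sum((y_pred == 1) & (y_true == 0)))
fn = int(np.sum((y_pred == 0) & (y_true == 1)))
tn = int(np.sum((y_pred == 0) & (y_true == 0)))

precision = tp / (tp + fp)  # of predicted churners, how many really churned
recall = tp / (tp + fn)     # of real churners, how many were caught
accuracy = (tp + tn) / len(y_true)
```

Recall on the churn class is usually the number of most interest here, since a missed churner is a customer the business had no chance to retain.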