# **Telco customer churn** - **Quiz Assessment**

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/logicalclocks/hopsworks-tutorials/blob/master/quickstart.ipynb)

# **Context**

The Telco customer churn data contains information about a fictional telco company that provided home phone and Internet services to 7043 customers in California in Q3. It indicates which customers have left, stayed, or signed up for their service. Multiple important demographics are included for each customer, as well as a Satisfaction Score, Churn Score, and Customer Lifetime Value (CLTV) index.

# **Objective**

**What is Churn Analysis ?**

Customer churn analysis is the process of using your churn data to understand :

* Which customers are leaving ?
* Why are they leaving ?
* What can you do to reduce churn ?

As you may have guessed, churn analysis goes beyond just looking at your customer churn rate. It’s about discovering the underlying causes behind your numbers.

Ultimately, successful churn analysis will give you the valuable insights you need to start reducing your business’s customer attrition rate.


**You, as a data scientist at the telco company, have been provided with the following dataset to :**

* **Analyze and build an ML model to help identify which customers are more likely to churn.**
* **Find the factors driving the customer churn process.**
* **Create a profile of the customers which are likely to churn.**

# **Data Description**

The data contains the different attributes of customers and their interaction details with the telco company. The detailed data dictionary is given below.


**Data Dictionary**

Variable | Description
-- | --
CustomerID | A unique ID that identifies each customer.
Gender | The customer’s gender: Male, Female
SeniorCitizen | Indicates if the customer is 65 or older: Yes, No
Married | Indicates if the customer is married: Yes, No
Dependents | Indicates if the customer lives with any dependents: Yes, No. Dependents   could be children, parents, grandparents, etc.
Tenure | Indicates the total amount of months that the customer has been with the   company by the end of the quarter specified above.
PhoneService | Indicates if the customer subscribes to home phone service with the   company: Yes, No
MultipleLines | Indicates if the customer subscribes to multiple telephone lines with the   company: Yes, No
InternetService | Indicates if the customer subscribes to Internet service with the   company: No, DSL, Fiber Optic, Cable.
OnlineSecurity | Indicates if the customer subscribes to an additional online security   service provided by the company: Yes, No
OnlineBackup | Indicates if the customer subscribes to an additional online backup   service provided by the company: Yes, No
DeviceProtection | Indicates if the customer subscribes to an additional device protection   plan for their Internet equipment provided by the company: Yes, No
TechSupport | Indicates if the customer subscribes to an additional technical support   plan from the company with reduced wait times: Yes, No
StreamingTV | Indicates if the customer uses their Internet service to stream television programming from a third party provider: Yes, No. The company does not charge an additional fee for this service.
StreamingMovies | Indicates if the customer uses their Internet service to stream movies   from a third party provider: Yes, No. The company does not charge an   additional fee for this service.
Contract | Indicates the customer’s current contract type: Month-to-Month, One Year,   Two Year.
PaperlessBilling | Indicates if the customer has chosen paperless billing: Yes, No
PaymentMethod | Indicates how the customer pays their bill: Bank Withdrawal, Credit Card,   Mailed Check
MonthlyCharges | Indicates the customer’s current total monthly charge for all their   services from the company.
TotalCharges | Indicates the customer’s total charges, calculated to the end of the   quarter specified above.
Churn | Yes = the customer left the company this quarter. No = the customer   remained with the company.

# **Importing libraries**

In [None]:
import warnings

warnings.filterwarnings("ignore")
from statsmodels.tools.sm_exceptions import ConvergenceWarning

warnings.simplefilter("ignore", ConvergenceWarning)

# Libraries to help with reading and manipulating data

import pandas as pd
import numpy as np

# Library to split data
from sklearn.model_selection import train_test_split

# Libraries to help with data visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Removes the limit for the number of displayed columns
pd.set_option("display.max_columns", None)
# Sets the limit for the number of displayed rows
pd.set_option("display.max_rows", 200)
# setting the precision of floating numbers to 5 decimal points
pd.set_option("display.float_format", lambda x: "%.5f" % x)

# To build model for prediction
import statsmodels.stats.api as sms
from statsmodels.stats.outliers_influence import variance_inflation_factor
import statsmodels.api as sm
from statsmodels.tools.tools import add_constant
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve

# To tune different models
from sklearn.model_selection import GridSearchCV


# To get different metric scores
import sklearn.metrics as metrics
from sklearn.metrics import (
    f1_score,
    accuracy_score,
    recall_score,
    precision_score,
    confusion_matrix,
    classification_report,
    roc_auc_score,
    precision_recall_curve,
    roc_curve,
    make_scorer,
)

# https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html
from sklearn.preprocessing import StandardScaler

# https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html
from sklearn.preprocessing import MinMaxScaler

# https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html
# Encode categorical features as a one-hot numeric array.
from sklearn.preprocessing import OneHotEncoder

# https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html
# Encode target labels with value between 0 and n_classes-1.
# This transformer should be used to encode target values, i.e. y, and not the input X.
from sklearn.preprocessing import LabelEncoder

# https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html
# Univariate imputer for completing missing values with simple strategies.
# Replace missing values using a descriptive statistic (e.g. mean, median, or most frequent) along each column, or using a constant value.
from sklearn.impute import SimpleImputer


from sklearn import set_config
from sklearn.compose import ColumnTransformer

from sklearn.pipeline import Pipeline

import tensorflow as tf
from tensorflow.keras.models import Sequential
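from tensorflow.keras.layers import Dense, Input, Dropout, BatchNormalization

# The scikit-learn wrapper may be missing in recent TensorFlow releases; it is not used below
try:
    from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
except ImportError:
    KerasClassifier = None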
from tensorflow.keras import backend

# **Data Overview**

In [None]:
# Let's mount Google Drive in order to access the dataset
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Read the dataset from the Excel file into a pandas DataFrame
df = pd.read_excel("/content/drive/********************/Telco-Customer-Churn.xlsx")

In [None]:
# Copy the data to another variable to avoid any changes to original data
data = df.copy()

In [None]:
# Let's view the first 5 records from our dataset
data.head().T

In [None]:
data.shape

In [None]:
data.info()

In [None]:
# Checking for missing values in the data
data.isnull().sum()

In [None]:
# Checking unique values in each categorical column
data.select_dtypes('object').nunique()

In [None]:
# Checking for duplicate values
data.duplicated().sum()

In [None]:
# Dropping the CustomerID column
data = data.drop(['CustomerID'], axis=1)

# Creating numerical columns
num_cols = ['Tenure', 'MonthlyCharges', 'TotalCharges']

# Creating categorical variables
cat_cols = ['Gender', 'SeniorCitizen', 'Married', 'Dependents', 'PhoneService', 'MultipleLines',
            'InternetService', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport',
            'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling', 'PaymentMethod']

## **Question (1)**

Update the cell below with your comments on the following points :

* Why has CustomerID been dropped ?
* Why do we need to separate numerical and categorical variables ?
* Are there any issues with the dataset ? If yes, what is the strategy to fix them ?

## **Response (1)**

**Observations**
* ....
* ....

# **Exploratory Data Analysis**

## **Univariate Analysis**

### Categorical variables

In [None]:
data[cat_cols].describe().T

In [None]:
sns.set_style('whitegrid')
# Let's plot the countplot for each categorical variable
# Show 3 variables on each line of the grid
cat_col = [
            ['Gender',	'SeniorCitizen',	'Married'],
            ['Dependents', 'PhoneService',	'MultipleLines'],
            ['InternetService',	'OnlineSecurity', 'OnlineBackup'],
            ['DeviceProtection',	'TechSupport', 'StreamingTV'],
            ['StreamingMovies',	'Contract',	'PaperlessBilling'],
            ['PaymentMethod']
           ]
fig, axes = plt.subplots(6, 3, figsize = (15, 20))
ax_i = 0
for i in cat_col:
  ax_j = 0
  for j in i:
    plot = sns.countplot(ax = axes[ax_i, ax_j], x = j, data = data, order = data[j].value_counts().index, palette='colorblind')
    plot.set_xticklabels(plot.get_xticklabels(), rotation=20, ha="right")
    for p in plot.patches:
      perc = '{:.1f}% ({:.0f})'.format(100 * p.get_height() / len(data[j]), p.get_height()) # Percentage and count of each class of the category
      x = p.get_x() + p.get_width() / 2  # Horizontal center of the bar
      y = p.get_height()                 # Top of the bar
      plot.annotate(perc, (x, y), ha = "center", va = "center", size = 10, xytext = (0, 5), textcoords = "offset points")        # Annotate the bar with the percentage and count
    # Move to the next position in the grid line
    ax_j = ax_j + 1
  # Move to the next line of the grid
  ax_i = ax_i + 1

# set the spacing between subplots
fig.tight_layout()
plt.show()

#### **Question (2)**

Update the cell below with your observations on the distribution of each categorical variable

#### **Response (2)**

**Observations**
* ....
* ....
* ....

### Continuous variables

In [None]:
data[num_cols].describe().T

In [None]:
# Plot the boxplot and histogram for each numerical variable
fig, axes = plt.subplots(2, 3, figsize = (18, 5))
ax = 0
for i in num_cols:
  sns.boxplot(data=data, x=i, ax=axes[0, ax], showmeans = True, color = "violet")
  sns.histplot(data=data, x=i, kde = True, ax = axes[1, ax], palette = "winter")
  # Move to the next position in the same line of the grid
  ax = ax + 1
# set the spacing between subplots
fig.tight_layout()
plt.show()

#### **Question (3)**

Update the cell below with your observations on the distribution of each numerical variable

#### **Response (3)**

**Observations**
* ....
* ....
* ....

## **Bivariate Analysis**

### Categorical variables

In [None]:
sns.set_style('whitegrid')
# Let's plot a stacked barplot for each categorical variable showing how it relates to customer churn
# Show 3 variables on each line of the grid
cat_col = [
            ['Gender',	'SeniorCitizen',	'Married'],
            ['Dependents', 'PhoneService',	'MultipleLines'],
            ['InternetService',	'OnlineSecurity', 'OnlineBackup'],
            ['DeviceProtection',	'TechSupport', 'StreamingTV'],
            ['StreamingMovies',	'Contract',	'PaperlessBilling'],
            ['PaymentMethod']
           ]
fig, axes = plt.subplots(6, 3, figsize = (15, 20))
ax_i = 0
for i in cat_col:
  ax_j = 0
  for j in i:
    plot = (pd.crosstab(data[j], data['Churn'], normalize='index') * 100).plot(kind='bar', stacked=True, ax=axes[ax_i, ax_j], alpha=0.75, rot=0, color=['#d9534f', '#5cb85c'])
    plot.set_xticklabels(plot.get_xticklabels(), rotation=20, ha="right")
    patches, labels = plot.get_legend_handles_labels()
    plot.legend(patches, labels, bbox_to_anchor=(1.2, 0.5))
    for p in plot.patches:
      perc = '{:.1f}%'.format(p.get_height()) # Percentage of each class of the category
      x = p.get_x() + p.get_width() / 2   # Horizontal center of the bar segment
      y = p.get_y() + p.get_height() / 2  # Vertical center of the bar segment
      plot.annotate(perc, (x, y), ha = "center", va = "center", size = 10, xytext = (0, 5), textcoords = "offset points")        # Annotate the percentage
    # Move to the next position in the grid line
    ax_j = ax_j + 1
  # Move to the next line of the grid
  ax_i = ax_i + 1


# set the spacing between subplots
fig.tight_layout()
plt.show()

#### **Question (4)**

<div class="alert-success">Update the cell below with your observations on how each categorical variable relates to the churn of customers.</div>

#### **Response (4)**

**Observations**
* ....
* ....
* ....

### Continuous variables

In [None]:
# Plot the boxplot for each numerical variable, split by churn status
fig, axes = plt.subplots(1, 3, figsize = (18, 5))
ax = 0
for i in num_cols:
  sns.boxplot(data=data, x='Churn', y=i, ax=axes[ax], showmeans=True, color="violet")
  # Move to the next position in the same line of the grid
  ax = ax + 1
# set the spacing between subplots
fig.tight_layout()
plt.show()

#### **Question (5)**

Update the cell below with your observations on how each numerical variable contributes to the churn of customers

#### **Response (5)**

**Observations**
* ....
* ....
* ....

## **Multivariate Analysis**

In [None]:
# Plotting the correlation between numerical variables
plt.figure(figsize=(12, 2))
sns.heatmap(data[num_cols].corr(), annot=True, fmt='0.2f', cmap='YlGnBu')

#### **Question (6)**

Update the cell below with your observations on the correlation matrix

#### **Response (6)**

**Observations**
* ....
* ....
* ....

In [None]:
num_cols

['Tenure', 'MonthlyCharges', 'TotalCharges']

<span style="color:#ff5f27;"> 👾 Uncomment and update the following cell code if required </span>

In [None]:
# Columns to be dropped if required
#col_drop = ['', '', '', '']
#data.drop(col_drop, axis=1, inplace=True)

# **Data Preprocessing**

## **Splitting the Data**

<span style="color:#ff5f27;"> 👾 Uncomment and update the following cell code if required </span>

In [None]:
#data['Churn'] = np.where(data['Churn'] == 'Yes', 1, 0)
#data['Churn'] = data['Churn'].astype(int)

**Separating the independent variables (X) and the dependent variable (Y)**

In [None]:
## Separating Independent and Dependent Columns
X = data.drop(['Churn'], axis=1)
Y = data[['Churn']]

In [None]:
Y.head()

**Splitting the data into 70% train and 30% test set**

In [None]:
# Splitting the dataset into the Training and Testing set.
X_train, X_test, y_train, y_test = train_test_split(X,Y, test_size = 0.3, random_state = 42, stratify = Y)

## **Transforming the dataset**

In [None]:
X_train.isnull().sum()

In [None]:
X_test.isnull().sum()

**Missing Value Imputation** : As you can see we have some variables with missing values :
* Tenure
* MonthlyCharges
* Contract
* PaperlessBilling
* PaymentMethod

We will impute the missing values in these columns using :
* their **mode** for categorical variables
* their **mean** for continuous variables
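
As a minimal sketch of this strategy on a toy frame (the frames `toy_train` and `toy_test` and their values are invented for illustration and are not part of the telco dataset), the imputers can be fitted on the training split and then reused on the test split:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy data, invented for illustration only
toy_train = pd.DataFrame({
    "Tenure": [1.0, np.nan, 12.0, 5.0],
    "Contract": ["Month-to-Month", "One Year", np.nan, "Month-to-Month"],
})
toy_test = pd.DataFrame({
    "Tenure": [np.nan, 30.0],
    "Contract": [np.nan, "Two Year"],
})

num_imputer = SimpleImputer(strategy="mean")
cat_imputer = SimpleImputer(strategy="most_frequent")

# Learn the statistics on the training split only, then apply them to the test split
toy_train[["Tenure"]] = num_imputer.fit_transform(toy_train[["Tenure"]])
toy_test[["Tenure"]] = num_imputer.transform(toy_test[["Tenure"]])
toy_train[["Contract"]] = cat_imputer.fit_transform(toy_train[["Contract"]])
toy_test[["Contract"]] = cat_imputer.transform(toy_test[["Contract"]])

print(toy_train)
print(toy_test)
```

The missing test values are filled with the mean and mode learned from the training rows, so no information from the test split is used to compute them.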

### **Question (7)**

* Write the Python code required to impute the missing values in the continuous and categorical variables.
* Explain why these transformations are applied after splitting the data.

In [None]:
imputer_mode = SimpleImputer(strategy="most_frequent")
imputer_mean = SimpleImputer(strategy="mean")

In [None]:
# Provide code here for Question (7)



In [None]:
X_train.isnull().sum()

In [None]:
X_test.isnull().sum()

Often in machine learning, we want to convert categorical variables into some type of numeric format that can be readily used by algorithms.

There are two common ways to convert categorical variables into numeric variables:

1. Label Encoding: Assign each categorical value an integer value based on alphabetical order.

2. One Hot Encoding: Create new variables that take on values 0 and 1 to represent the original categorical values. When using this approach, we create one new column for each unique value in the original categorical variable.
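
The two approaches can be compared on a small example. This is only a sketch on an invented single-column frame `toy`; it uses pandas category codes for the integer encoding and `pd.get_dummies` for the one-hot encoding, which is one of several possible ways to do it:

```python
import pandas as pd

# Toy column, invented for illustration only
toy = pd.DataFrame({"Contract": ["One Year", "Month-to-Month", "Two Year", "One Year"]})

# Label encoding: one integer per category, assigned in sorted (alphabetical) order
toy["Contract_label"] = toy["Contract"].astype("category").cat.codes

# One-hot encoding: one 0/1 column per unique category
one_hot = pd.get_dummies(toy["Contract"], prefix="Contract", dtype=int)

print(toy)
print(one_hot)
```

Label encoding keeps a single column but imposes an order on the categories, while one-hot encoding avoids that implied order at the cost of one extra column per category.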

In [None]:
# Printing the percentage of each sub-category for each categorical variable
for i in cat_cols:
    print(data[i].value_counts(normalize=True) * 100)
    print('*' * 40)
    print()

### **Question (8)**

In [None]:
col_encoded = ['SeniorCitizen', 'Married', 'Dependents', 'PhoneService', 'MultipleLines', 'InternetService', 'OnlineSecurity',
               'OnlineBackup', 'DeviceProtection', 'TechSupport', 'StreamingTV', 'StreamingMovies', 'PaperlessBilling']

* Analyse the variables in the list `col_encoded` and justify the strategy you will use to encode them.
* Write the Python code required to encode all variables in the list `col_encoded`.

In [None]:
# Provide code here for Question (8)




* Create a new variable called `IsFemale` that will replace the variable `Gender`


In [None]:
# Provide code here for Question (8)



In [None]:
X_train.head().T

In [None]:
X_test.head().T

### **Question (9)**

In [None]:
col_encoded = ['PaymentMethod', 'Contract']

* Analyse the variables in the list `col_encoded` and justify the strategy you will use to encode them.
* Write the Python code required to encode all variables in the list `col_encoded`.

In [None]:
# Provide code here for Question (9)



In [None]:
X_train.head().T

In [None]:
X_test.head().T

# **Model Evaluation Criterion**

The model will make a number of mistakes.
It will predict some customers correctly and some incorrectly. For example, it will mark some customers who will churn as not churning (false negatives), and some customers who will not churn as churning (false positives).

The goal for the telco company is to engage with customers to prevent them from churning. It is OK to engage with customers who are mistakenly flagged as likely to churn, as it does not cause any harm. It could potentially make them even happier for the extra love they are getting. Missing a customer who actually churns is the costlier mistake, so recall on the churn class is the metric to prioritize. This is the kind of model that can add value from day one.
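
To make the two kinds of mistakes concrete, here is a small sketch on invented labels (`y_true` and `y_pred` are made up for illustration, with 1 meaning churn). It counts the false negatives and false positives and derives recall and precision for the churn class:

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Invented labels for illustration only: 1 = churn, 0 = not churn
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Recall: share of actual churners that the model catches
recall = recall_score(y_true, y_pred)
# Precision: share of flagged customers that actually churn
precision = precision_score(y_true, y_pred)

print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
print(f"Recall={recall:.2f}, Precision={precision:.2f}")
```

In this toy case one churner is missed (a false negative) and two loyal customers are contacted unnecessarily (false positives). Under the reasoning above, the missed churner is the error that matters most.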

Let's create a function to calculate and print the classification report and confusion matrix so that we don't have to rewrite the same code repeatedly for each model.

In [None]:
# Creating metric function
def metrics_score(actual, predicted):
    # https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html
    print(classification_report(actual, predicted, target_names=['Not Churn (0)', 'Churn (1)'], digits=4))

    # https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html
    cm = confusion_matrix(actual, predicted)
    tn, fp, fn, tp = cm.ravel()
    group_names = ['TN', 'FP', 'FN', 'TP']
    group_counts = ['{0:0.0f}'.format(value) for value in cm.flatten()]
    group_percentages = ['{0:.2%}'.format(value) for value in cm.flatten() / np.sum(cm)]
    labels = [f'{v1} ({v2}) ({v3})' for v1, v2, v3 in zip(group_names, group_counts, group_percentages)]
    labels = np.asarray(labels).reshape(2,2)
    plt.figure(figsize=(6, 3))
    sns.heatmap(cm, annot=labels,  fmt='', xticklabels=['Not Churn (0)', 'Churn (1)'], yticklabels=['Not Churn (0)', 'Churn (1)'], cmap='Blues')
    plt.ylabel('Actual classes')
    plt.xlabel('Predicted classes')
    plt.show()

# **Decision Tree**

In [None]:
# Building decision tree model
dt = DecisionTreeClassifier(class_weight={0: 0.27, 1: 0.73}, random_state=1)

In [None]:
# Fitting decision tree model
dt.fit(X_train, y_train)

**Let's check the model performance of decision tree**

In [None]:
# Checking performance on the training dataset
y_train_pred_dt = dt.predict(X_train)
metrics_score(y_train, y_train_pred_dt)

In [None]:
# Checking performance on the test dataset
y_test_pred_dt = dt.predict(X_test)
metrics_score(y_test, y_test_pred_dt)

#### **Question (10)**

Update the cell below with your observations on the model performance

#### **Response (10)**

**Observations**
* ....
* ....
* ....

**Let's plot the feature importance and check the most important features.**

In [None]:
# Plot the feature importance
importances = dt.feature_importances_
columns = X_train.columns
importance_df = pd.DataFrame(importances, index = columns, columns = ['Importance']).sort_values(by = 'Importance', ascending = False)
plt.figure(figsize = (10, 5))
sns.barplot(data = importance_df, x = importance_df.Importance, y = importance_df.index)
plt.show()

#### **Question (11)**

Update the cell below with your observations on the feature importance

#### **Response (11)**

**Observations**
* ....
* ....
* ....

**Let's plot the tree** and check:

A decision tree keeps growing until its nodes are homogeneous, i.e., each leaf contains only one class. Because the dataset has many features, the full tree would be hard to read, so we only visualize it up to **max_depth = 4**.
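
The growth behaviour can be checked on synthetic data. This is a sketch under assumptions: `X_toy` and `y_toy` are generated with `make_classification` purely for illustration, and the tree uses scikit-learn's default settings rather than the class weights used in this notebook:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, generated only for this illustration
X_toy, y_toy = make_classification(
    n_samples=500, n_features=10, n_informative=5, flip_y=0.1, random_state=0
)

# With default settings the tree is grown until every leaf is pure
full_tree = DecisionTreeClassifier(random_state=0).fit(X_toy, y_toy)

# Leaves are the nodes without children (children_left == -1)
is_leaf = full_tree.tree_.children_left == -1
leaf_impurity = full_tree.tree_.impurity[is_leaf]

print("Depth of the full tree:", full_tree.get_depth())
print("Number of leaves:", full_tree.get_n_leaves())
print("Largest leaf impurity:", leaf_impurity.max())
print("Training accuracy:", full_tree.score(X_toy, y_toy))
```

Every leaf ends up with zero impurity, which is why an unconstrained tree fits the training data perfectly and why only its top levels are practical to plot.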

In [None]:
features = list(X_train.columns)
plt.figure(figsize = (30, 20))
tree.plot_tree(dt, max_depth=4, feature_names=features, filled=True, fontsize=10, node_ids=True, class_names=True)
plt.show()

#### **Question (12)**

Update the cell below with your observations on the tree

#### **Response (12)**

**Observations**
* ....
* ....
* ....

In [None]:
# Choose the type of classifier
dt_tunned = DecisionTreeClassifier(random_state=1, class_weight={0: 0.27, 1: 0.73}, criterion='entropy', max_depth=5)
# Fit the best algorithm to the data
dt_tunned.fit(X_train, y_train)

In [None]:
# Checking performance on the training dataset
y_train_pred_dt = dt_tunned.predict(X_train)
metrics_score(y_train, y_train_pred_dt)

In [None]:
# Checking performance on the test dataset
y_test_pred_dt = dt_tunned.predict(X_test)
metrics_score(y_test, y_test_pred_dt)

In [None]:
# Plot the feature importance of the tuned model
importances = dt_tunned.feature_importances_
columns = X_train.columns
importance_df = pd.DataFrame(importances, index=columns, columns=['Importance']).sort_values(by='Importance', ascending=False)
plt.figure(figsize = (15, 4))
sns.barplot(data = importance_df, x = importance_df.Importance, y = importance_df.index)
plt.show()

In [None]:
features = list(X_train.columns)
plt.figure(figsize = (30, 20))
tree.plot_tree(dt_tunned, max_depth=4, feature_names=features, filled=True, fontsize=10, node_ids=True, class_names=True)
plt.show()

#### **Question (13)**

Update the cell below with your observations on :
* The performance of the tuned decision tree
* The feature importance
* The tree

#### **Response (13)**

**Observations**
* ....
* ....
* ....

In [None]:
# Choose the type of classifier
dt_tunned_hp = DecisionTreeClassifier(random_state=1, class_weight={0: 0.27, 1: 0.73}, criterion='entropy')

# Grid of parameters to choose from
parameters = {
                'max_depth': np.arange(1, 10),
                'min_samples_leaf': np.arange(1, 10),
                'min_samples_split': np.arange(2, 10),  # must be at least 2
             }
# Type of scoring used to compare parameter combinations - recall score for class 1
scorer = metrics.make_scorer(recall_score, pos_label=1)

# Run the grid search
grid_obj = GridSearchCV(dt_tunned_hp, parameters, scoring=scorer, cv=10)

grid_obj = grid_obj.fit(X_train, y_train)

# Set the classifier to the best combination of parameters
dt_tunned_hp = grid_obj.best_estimator_

# Fit the best algorithm to the data
dt_tunned_hp.fit(X_train, y_train)

In [None]:
# Checking performance of the tuned DT model on the training data
y_pred_train_dt_tunned = dt_tunned_hp.predict(X_train)
metrics_score(y_train, y_pred_train_dt_tunned)

In [None]:
# Checking performance of the tuned DT model on the testing data
y_pred_test_dt_tunned = dt_tunned_hp.predict(X_test)
metrics_score(y_test, y_pred_test_dt_tunned)

In [None]:
# Plot the feature importance of the tuned model
importances = dt_tunned_hp.feature_importances_
columns = X_train.columns
importance_df = pd.DataFrame(importances, index=columns, columns=['Importance']).sort_values(by='Importance', ascending=False)
plt.figure(figsize = (15, 4))
sns.barplot(data = importance_df, x = importance_df.Importance, y = importance_df.index)
plt.show()

In [None]:
features = list(X_train.columns)
plt.figure(figsize = (30, 20))
tree.plot_tree(dt_tunned_hp, max_depth=4, feature_names=features, filled=True, fontsize=10, node_ids=True, class_names=True)
plt.show()

#### **Question (14)**

Update the cell below with your observations on :
* The performance of the tuned decision tree
* The feature importance
* The tree

#### **Response (14)**

**Observations**
* ....
* ....
* ....

# **Neural Network**

## **Question (15)**

This is a bonus/optional question.

* You can apply a new approach to the problem using an **ANN**
* Feel free to be creative here

**Scaling the data**

The independent variables in this dataset have different scales. When features have different scales from each other, there is a chance that a higher weightage will be given to features that have a higher magnitude, and they will dominate over other features whose magnitude changes may be smaller but whose percentage changes may be just as significant or even larger. This will impact the performance of our machine learning algorithm, and we do not want our algorithm to be biased towards one feature.

The solution to this issue is **Feature Scaling**, i.e. scaling the dataset so as to give every transformed variable a comparable scale.

We will use the **Standard Scaler** method, which centers and scales the dataset using the Z-Score. It standardizes features by subtracting the mean and scaling it to have unit variance. The standard score of sample x is calculated as:

> **z = (x - u) / s**

where **u** is the mean of the training samples (or zero if `with_mean=False`) and **s** is the standard deviation of the training samples (or one if `with_std=False`).
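
The formula can be verified on a tiny array. In this sketch, `toy_train` and `toy_test` are invented values; the scaler learns **u** and **s** from the training rows and applies them unchanged to the test row:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Invented values for illustration only: two features on very different scales
toy_train = np.array([[1.0, 20.0], [2.0, 50.0], [3.0, 110.0]])
toy_test = np.array([[2.0, 60.0]])

scaler = StandardScaler()
toy_train_scaled = scaler.fit_transform(toy_train)  # learns u and s, then scales
toy_test_scaled = scaler.transform(toy_test)        # reuses the training u and s

# Manual z-score with the training mean and (population) standard deviation
manual = (toy_train - toy_train.mean(axis=0)) / toy_train.std(axis=0)

print(toy_train_scaled)
print(toy_test_scaled)
print(np.allclose(toy_train_scaled, manual))
```

The test row here equals the training mean of each feature, so it is mapped to zero in both columns.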

In [None]:
# Scaling the data
sc = StandardScaler()

# Complete the code to scale the data




In [None]:
X_train_scaled.head().T

In [None]:
X_test_scaled.head().T

In [None]:
# Initializing the ANN
model = Sequential()

# Rule of thumb: the number of nodes in a hidden layer can be set to the average of the input and output layer sizes
# This adds the input layer (by specifying input dimension) AND the first hidden layer (units)
model.add(Dense(activation='****', input_dim=*****, units=*****))

# Add 2nd hidden layer
model.add(Dense(*****, activation='****'))

# Adding the output layer
# Notice that we do not need to specify the input dimension.
# We have an output of 1 node, which is the desired dimension of our output (Churn or Not)
# We use the **** because we want probability outcomes
model.add(Dense(1, activation = '***'))

In [None]:
# Create optimizer with default learning rate
# Compile the model
model.compile(optimizer='***', loss='***', metrics=['accuracy'])

In [None]:
model.summary()

In [None]:
history = model.fit(X_train_scaled,
                    y_train,
                    validation_split=0.2,
                    epochs=*****,
                    batch_size=****,
                    verbose=1)

In [None]:
# Capturing learning history per epoch
hist = pd.DataFrame(history.history)
hist['epoch'] = history.epoch

# Plotting training and validation loss at each epoch
plt.plot(hist['loss'])
plt.plot(hist['val_loss'])
plt.legend(("Training Set", "Validation Set"), loc=0)

# Evaluating the model on the scaled test set
results = model.evaluate(X_test_scaled, y_test)

In [None]:
# predict probabilities on the scaled test set
yhat = model.predict(X_test_scaled)

# keep probabilities for the positive outcome only
yhat = yhat[:, 0]

# calculate roc curves
fpr, tpr, thresholds = roc_curve(y_test, yhat)

# calculate the g-mean for each threshold
gmeans = np.sqrt(tpr * (1-fpr))

# locate the index of the largest g-mean
ix = np.argmax(gmeans)
print('Best Threshold=%f, G-Mean=%.3f' % (thresholds[ix], gmeans[ix]))

# plot the roc curve for the model
plt.plot([0,1], [0,1], linestyle='--', label='No Skill')
plt.plot(fpr, tpr, marker='.', label='ANN')
plt.scatter(fpr[ix], tpr[ix], marker='o', color='black', label='Best')
# axis labels
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend()
# show the plot
plt.show()

In [None]:
# Classify the scaled test set using the best threshold found above
y_pred_test = model.predict(X_test_scaled)
y_pred_test = (y_pred_test > thresholds[ix])
metrics_score(y_test, y_pred_test)