In [1]:
import pandas as pd 
import numpy as np 

X_train = pd.read_csv('../Datasets/X_train.csv')
X_test = pd.read_csv('../Datasets/X_test.csv')
y_train = pd.read_csv('../Datasets/y_train.csv')
y_test = pd.read_csv('../Datasets/y_test.csv')

#### **Keras Models**  
There are three ways to create Keras models:  
- The *"Sequential model"*, which is very straightforward (a simple list of layers), but is limited to single-input, single-output stacks of layers (as the name gives away).  
- The *"Functional API"*, which is an easy-to-use, fully-featured API that supports arbitrary model architectures. For most people and most use cases, this is what you should be using. This is the Keras "industry strength" model.  
- *"Model subclassing"*, where you implement everything from scratch on your own. Use this if you have complex, out-of-the-box research use cases.  
  
We will use the **Sequential model**.  
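
To make the "simple list of layers" idea concrete, here is a minimal pure-Python sketch. `TinySequential` is a hypothetical stand-in written for this illustration; it is not the Keras class and does not show how Keras works internally. It only shows that the output of each layer becomes the input of the next one.

```python
class TinySequential:
    """Hypothetical illustration of a sequential stack (not the Keras class)."""

    def __init__(self):
        self.layers = []          # the model is just an ordered list of layers

    def add(self, layer):
        self.layers.append(layer)

    def __call__(self, x):
        # Single input, single output: each layer feeds the next one
        for layer in self.layers:
            x = layer(x)
        return x


model = TinySequential()
model.add(lambda x: x * 2)        # stand-in for a first layer
model.add(lambda x: x + 1)        # stand-in for a second layer
result = model(3)                 # (3 * 2) + 1
```

Because the data can only flow straight through the list, a structure like this cannot express branches or multiple inputs, which is where the Functional API comes in.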
  
#### **Keras Layers**  
Layers are the basic building blocks of neural networks in Keras. A layer consists of a tensor-in tensor-out computation function (the layer's call method) and some state, held in TensorFlow variables (the layer's weights).  
  
For more information, see the [Keras API reference][def].

[def]: https://keras.io/api/
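
As a rough sketch of "a computation function plus some state", the NumPy class below mimics a dense layer with a ReLU activation. `TinyDense`, its `seed` argument, and the small uniform range used for the initial weights are assumptions made for this illustration; this is not the Keras implementation.

```python
import numpy as np


class TinyDense:
    """Hypothetical NumPy stand-in for a dense layer (not the Keras implementation)."""

    def __init__(self, input_dim, units, seed=0):
        rng = np.random.default_rng(seed)
        # State: the layer's weights (kernel) and bias
        self.kernel = rng.uniform(-0.05, 0.05, size=(input_dim, units))
        self.bias = np.zeros(units)

    def __call__(self, inputs):
        # Computation: tensor in, tensor out (affine transform, then ReLU)
        return np.maximum(inputs @ self.kernel + self.bias, 0.0)


layer = TinyDense(input_dim=11, units=6)
batch = np.ones((4, 11))          # 4 made-up rows with 11 features each
outputs = layer(batch)            # shape (4, 6)
```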

In [8]:
import tensorflow
import keras
from keras.models import Sequential
from keras.layers import Dense

classifier = Sequential() # this will be our neural network

classifier.add(Dense(6, kernel_initializer='uniform', activation='relu', input_dim=11)) # hidden layer 1: 6 units, fed by the 11 input features
classifier.add(Dense(6, kernel_initializer='uniform', activation='relu')) # hidden layer 2: 6 units

#### **Set Hidden Layers**  
**Warning:** These are not rules. There are many ways to tune a deep learning model; what follows are just some ideas.  
  
Tuning the hidden layers is one of the most discussed topics in neural networks. "How many layers should we use?" is one of the common questions.  
  
<img src="https://miro.medium.com/max/1400/0*fny6vMZG5rJWCmAM" width='600' height='400'>  
  
As you can see from the image, there are some hidden layers between the input layer and the output layer.  
Think about our problem. The input layer has 11 independent variables, such as *'Gender', 'Age', 'Tenure'*, etc.  
The output layer has just 1 unit for the dependent variable, which answers the question "will the customer churn?" with a value between 0 and 1.  
Looking at the image above, we know we have 11 inputs and just 1 output, but we don't know how many hidden layers and units to put in between.  
  
At this point we can use a triangle shape:  
  
<img src="https://miro.medium.com/max/1002/1*gAMNusemlDZOvwTN1WKKhQ.png" width='600' height='400'>  
  
So I'll use 6 units (neurons) in each of the 2 hidden layers. Of course, you could also use 7 units in the 1st hidden layer and 3 units in the 2nd hidden layer; these are not rules.  
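
One way to compare such layouts is to count how many trainable parameters each one has. The helper below is a small sketch (`dense_param_count` is a name made up for this example). It assumes fully connected layers with one bias per unit and 11 inputs, matching `input_dim=11` in the code above.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases for a stack of fully connected layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out   # one weight per connection, one bias per unit
    return total


symmetric = dense_param_count([11, 6, 6, 1])   # 6 units in both hidden layers -> 121
tapered = dense_param_count([11, 7, 3, 1])     # 7 units, then 3 units -> 112
```

Both layouts are tiny, so the choice between them is more a matter of experimentation than of cost.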

In [10]:
# creating output layer
classifier.add(Dense(1, kernel_initializer='uniform', activation='sigmoid')) 
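
The output layer uses a sigmoid activation, which squashes any real number into the range between 0 and 1. The NumPy sketch below shows the function itself; it is only an illustration, not the Keras code.

```python
import numpy as np


def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))


values = sigmoid(np.array([-4.0, 0.0, 4.0]))
# Large negative inputs approach 0, an input of 0 gives exactly 0.5,
# and large positive inputs approach 1.
```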

#### **Optimizers**  
There are many optimization methods for deep learning. As I said, there are no rules on this subject; essentially, deep learning is like an art.  
  
An optimizer is used with `.compile()` and `.fit()`.  
  
An optimizer is one of the two arguments required for compiling a Keras model (the other is the loss function).  
  
**Available Optimizers:**  
- [SGD][def1]
- [RMSprop][def2]
- [Adam][def3]
- [Adadelta][def4]
- [Adagrad][def5]
- [Adamax][def6]
- [Nadam][def7]
- [Ftrl][def8]  
  
I'll use the [*'adam'*][def3] optimizer.
  
Adam:  
Optimizer that implements the Adam algorithm.
  
Adam optimization is a stochastic gradient descent method that is based on adaptive estimation of first-order and second-order moments.
  
According to Kingma et al., 2014, the method is "computationally efficient, has little memory requirement, invariant to diagonal rescaling of gradients, and is well suited for problems that are large in terms of data/parameters".
  
[def1]: https://keras.io/api/optimizers/sgd
[def2]: https://keras.io/api/optimizers/rmsprop
[def3]: https://keras.io/api/optimizers/adam
[def4]: https://keras.io/api/optimizers/adadelta
[def5]: https://keras.io/api/optimizers/adagrad
[def6]: https://keras.io/api/optimizers/adamax
[def7]: https://keras.io/api/optimizers/Nadam
[def8]: https://keras.io/api/optimizers/ftrl
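
To show what "adaptive estimation of first-order and second-order moments" means in practice, here is a minimal NumPy sketch of the Adam update rule applied to a one-dimensional toy problem. The function name `adam_minimize`, the toy objective `(x - 3)**2`, and the hyperparameter values are assumptions chosen for this example; Keras's own implementation and defaults may differ.

```python
import numpy as np


def adam_minimize(grad_fn, x0, steps=1000, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a 1-D function with the Adam update rule, given its gradient."""
    x = float(x0)
    m = 0.0   # running average of the gradient (first moment)
    v = 0.0   # running average of the squared gradient (second moment)
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)   # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return x


# Toy objective f(x) = (x - 3)^2, whose gradient is 2 * (x - 3)
x_min = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

Dividing by the square root of the second moment is what makes the step size adapt to each parameter: parameters with consistently large gradients take proportionally smaller steps.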

In [11]:
# Compile the model: set the optimizer, the loss function and the metric to report
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

In [14]:
classifier.fit(X_train, y_train, epochs=50, verbose=0) # train for 50 epochs (50 passes over the training data)
y_pred = classifier.predict(X_test)



#### **`.fit()` and `.predict()`:**  
  
We can use the `.fit()` and `.predict()` functions just like with machine learning models.  
When fitting the neural network we can set some hyperparameters:  
- *epochs:* how many times the neural network passes over the training data  
- *verbose:* whether to show the fitting progress; use `verbose=1` to see it, or `verbose=0` to hide it  
- etc...   
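
To illustrate what `epochs` and `verbose` control, here is a small NumPy training loop for a one-layer logistic model on made-up data. `fit_logistic`, the toy data, and the learning rate are all assumptions for this sketch. It also updates the weights once per epoch using the whole dataset, whereas Keras by default splits each epoch into mini-batches.

```python
import numpy as np


def fit_logistic(X, y, epochs=50, lr=0.1, verbose=0):
    """Train a one-layer logistic model with full-batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    history = []
    for epoch in range(epochs):                      # one epoch = one full pass over X
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))       # predicted probabilities
        loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
        history.append(loss)
        if verbose:                                  # verbose=1 prints the progress
            print(f"epoch {epoch + 1}/{epochs} - loss: {loss:.4f}")
        error = p - y
        w -= lr * (X.T @ error) / len(y)
        b -= lr * error.mean()
    return w, b, history


rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 3))                    # made-up features
y_toy = (X_toy[:, 0] + X_toy[:, 1] > 0).astype(float)
w, b, history = fit_logistic(X_toy, y_toy, epochs=50, verbose=0)
```

The `history` list holds one loss value per epoch, so comparing its first and last entries shows whether the extra passes over the data actually helped.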

In [15]:
y_pred

array([[0.20791489],
       [0.20791489],
       [0.20791489],
       ...,
       [0.20791489],
       [0.20791489],
       [0.20791489]], dtype=float32)

As you can see, the results are floats, but we expected churn (1) or no churn (0).  
The output is actually the predicted probability that the customer will churn.  
For example, a result of 0.2054 means the model estimates a 20.54% chance that the customer will churn.  
  
Let's convert the output to binary with a 0.5 threshold and calculate the accuracy.

In [24]:
y_pred = (y_pred > 0.5).astype(int) # 1 if the predicted probability is above 0.5, else 0

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
accuracy = ((cm[0][0] + cm[1][1]) / cm.sum())*100
f'{round(accuracy, 2)}%'

'79.79%'

#### **Result**  
Our neural network's accuracy is 79.79%.
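
The predictions printed earlier all show the same value (0.20791489), which is below the 0.5 threshold. If that holds for every row, the model is predicting "no churn" for every customer, and the accuracy would simply equal the share of non-churners in the test set. The sketch below shows one way to check for this, using made-up labels rather than the churn data; in the notebook you would substitute `y_test` and `y_pred`.

```python
import numpy as np

# Hypothetical labels for illustration only (not the churn data)
y_true_demo = np.array([0, 0, 0, 0, 1, 0, 1, 0, 0, 1])
y_pred_demo = np.zeros_like(y_true_demo)     # a model that always predicts "no churn"

# Accuracy of the model on these labels
model_accuracy = np.mean(y_true_demo == y_pred_demo) * 100

# Accuracy you would get by always predicting the most common class
majority_share = max(np.mean(y_true_demo == 0), np.mean(y_true_demo == 1)) * 100

# How many distinct classes the model actually predicted
n_unique_predictions = np.unique(y_pred_demo).size
```

If `model_accuracy` matches `majority_share` and `n_unique_predictions` is 1, the network has not learned to separate the classes, and it may be worth revisiting the preprocessing (for example, feature scaling) or the training setup.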