#**(13)Deep Neural Network(Deep learning Based)**


- Neural networks are used for many complex tasks, including image classification, object detection, face identification, text summarization, and speech recognition.



##**TensorFlow Library**

- TensorFlow is a popular machine learning and deep learning framework developed by the Google Brain team. It is a free, open-source library; its core is implemented in C++, and Python is its primary API.

- Here, a **Tensor is a multidimensional array**, and Flow refers to the **flow of data through operations.**

- A tensor generalizes vectors and matrices to n dimensions and can represent any type of data. All values in a tensor have the same data type, and the tensor has a known shape. The shape gives the size of the array along each dimension.
- TensorFlow offers a broad set of tools for building and training models, and is used to create large-scale neural networks with many layers.
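To make the idea of a tensor's shape concrete, here is a minimal sketch in plain Python rather than TensorFlow. The helper `shape_of` is hypothetical (it is not part of any library) and assumes a regular nested list, where every sub-list at the same level has the same length.

```python
def shape_of(nested):
    """Return the shape of a regular nested list as a tuple (hypothetical helper)."""
    shape = []
    level = nested
    # Walk down the first element of each level, recording its length.
    while isinstance(level, list):
        shape.append(len(level))
        if not level:
            break
        level = level[0]
    return tuple(shape)


matrix = [[1, 2, 3], [1, 2, 3]]   # 2 rows, 3 columns
print(shape_of(matrix))           # (2, 3)
print(shape_of([1, 2, 3]))        # (3,)
print(shape_of(7))                # () for a scalar
```

A 2 x 3 nested list has shape `(2, 3)`, the same shape TensorFlow reports for a constant built from that list.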


In [None]:
import tensorflow as tf

In [None]:
const1 = tf.constant([[1,2,3], [1,2,3]])
const2 = tf.constant([[3,4,5], [3,4,5]])
result = tf.add(const1, const2)
print(result)

tf.Tensor(
[[4 6 8]
 [4 6 8]], shape=(2, 3), dtype=int32)


In [None]:
var1 = tf.Variable([[1, 2], [1, 2]])
var2 = tf.Variable([[3, 4], [3, 4]])
result = tf.multiply(var1, var2)
print(result)

tf.Tensor(
[[3 8]
 [3 8]], shape=(2, 2), dtype=int32)


##**Keras Library**

- Keras is a fast, open-source, and easy-to-use neural network library written in Python that runs on top of a backend such as Theano or TensorFlow. TensorFlow provides both low-level and high-level APIs, whereas Keras provides only a high-level API.

- Keras makes it quick to build a network model: a simple model takes only a few lines of Python. Look at the Keras example below:

In [None]:
from keras.models import Sequential
from keras.layers import Dense, Activation,Conv2D,MaxPooling2D,Flatten,Dropout

model = Sequential()
model.add(Dense(64, activation='relu', input_dim=50))  # Dense layer with 64 units; expects 50 input features
model.add(Dropout(0.5))  # Dropout layer: randomly zeroes 50% of the activations during training


#**Activation Function**

- The choice of activation function in a deep neural network has a significant impact on training dynamics and task performance.
- A neural network without an activation function is essentially just a linear regression model.

- We therefore apply a non-linear transformation to the inputs of each neuron; the activation function is what introduces this non-linearity into the network.


<figure align="center">
<img src="https://drive.google.com/uc?id=1OUu-CoT0f4HzKygZ7cKlohvpdY-IeDeg" height="300px" width="400px">
</figure>


$\text{WeightedSum} = w_1x_1 + w_2x_2 + \dots + w_nx_n + \text{bias}$

After calculating the weighted sum, we pass this value to the activation function, which produces the output value for the given node. This value then acts as an input to the nodes in the next layer, and so on.
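The weighted sum followed by an activation can be sketched for a single neuron as follows. The function names and the example numbers are illustrative assumptions, and sigmoid is used only as one possible activation.

```python
import math


def weighted_sum(inputs, weights, bias):
    """Compute w_1*x_1 + ... + w_n*x_n + bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias, activation):
    """Weighted sum first, then the activation function."""
    return activation(weighted_sum(inputs, weights, bias))


inputs = [1.0, 2.0]
weights = [0.5, -0.25]
bias = 0.5

z = weighted_sum(inputs, weights, bias)               # 0.5*1 + (-0.25)*2 + 0.5
out = neuron_output(inputs, weights, bias, sigmoid)   # value handed to the next layer
print(z, out)
```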







##**(a)Binary Step Function**
- If the input to the activation function is greater than a threshold, the neuron is activated; otherwise it is deactivated, i.e. its output is not passed on to the next hidden layer.

- The threshold is the cut-off value of the function. If you set it to 0.5, any input below it gives an output of 0, and any input above it gives an output of 1.

- The binary step function can be used as an activation function when creating a binary classifier.
- As you can imagine, this function is not useful when the target variable has more than two classes.

\begin{align}
        f(x) = \left\{
        \begin{array}{cl}
        0 & x < 0 \\
        1 & x \geq 0
        \end{array}
        \right.
    \end{align}

- Gradients are calculated to update the weights and biases during the backpropagation process. Since the gradient of the step function is zero, the weights and biases don’t update. We need another activation function to solve this problem.
- f'(x) (gradient) = 0, for all x ≠ 0 (the gradient is undefined at the step itself)
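The zero gradient can be checked numerically. The sketch below uses a central-difference approximation; `numerical_gradient` is a hypothetical helper, and the optional `threshold` argument is an assumption added to illustrate the cut-off value discussed above.

```python
def binary_step(x, threshold=0.0):
    """Return 1 when x reaches the threshold, otherwise 0."""
    return 1 if x >= threshold else 0


def numerical_gradient(f, x, h=1e-3):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


# Away from the jump, the step function is flat, so the estimate is zero.
for x in (-2.0, -0.5, 0.5, 2.0):
    print(x, numerical_gradient(binary_step, x))

# With a threshold of 0.5, inputs below it map to 0 and inputs above it map to 1.
print(binary_step(0.3, threshold=0.5), binary_step(0.7, threshold=0.5))
```

Because every estimate away from the jump is zero, a gradient-based update has nothing to work with.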

In [None]:
def binary_step(x):
    if x<0:
        return 0
    else:
        return 1
binary_step(5)

1

##**(b)Linear Function:**
- We saw the problem with the step function: its gradient is zero. This is because the output of the binary step function is constant on either side of the threshold, so it has no term that varies with x. Instead of a binary function, we can use a linear function.
$$f(x)=ax+b$$

$$f(x) = (weight * input) + bias$$

- When we differentiate the function with respect to x, the result is the coefficient of x, which is a constant.
- f'(x) (gradient) = a

- Although the gradient here is not zero, it is a constant that does not depend on the input value x at all. This implies that the weights and biases will be updated during backpropagation, but the updating factor will always be the same.

- Backpropagation therefore gains nothing from the activation: its derivative is a constant and carries no information about the input x.

- In this scenario, the neural network will not really reduce the error, since the activation contributes the same gradient at every iteration. In addition, a stack of layers with linear activations collapses into a single linear function, however many layers are added.
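One way to see the limitation of a purely linear activation is to compose two linear functions and compare the result with a single one. The coefficients below are arbitrary, and `make_linear` is a hypothetical helper used only for this sketch.

```python
def make_linear(a, b):
    """Build f(x) = a*x + b."""
    return lambda x: a * x + b


layer1 = make_linear(4.0, 1.0)    # f1(x) = 4x + 1
layer2 = make_linear(-2.0, 3.0)   # f2(x) = -2x + 3


def stacked(x):
    """Two 'layers' with linear activations applied one after the other."""
    return layer2(layer1(x))


# Expanding by hand: -2*(4x + 1) + 3 = -8x + 1, which is again linear.
single = make_linear(-8.0, 1.0)

for x in (-1.0, 0.0, 2.5):
    print(x, stacked(x), single(x))
```

The stacked version and the single linear function agree at every input, so the extra layer adds no expressive power.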

In [None]:
def linear_function(x):
    return 4*x
linear_function(4)


16

##**(c)Non-Linear Sigmoid Activation Function**

- Mathematically, the sigmoid function can be expressed as follows:
$$f(x) = \frac{1}{1 + e^{-x}}$$

- The output always lies between 0 and 1.
- The gradient values are significant in the range -3 to 3, but the graph gets much flatter in other regions. This means that inputs greater than 3 or less than -3 have very small gradients. As the gradient approaches zero, the network stops learning effectively.

- f'(x) (gradient) = sigmoid(x) * (1 - sigmoid(x))


- The sigmoid output is also always positive, so it is not centered around zero. This can be addressed by scaling and shifting the sigmoid function, which is exactly what the tanh function does.
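The shrinking gradient can be seen by evaluating sigmoid(x) * (1 - sigmoid(x)) at a few points. This is a minimal sketch; the sample inputs are chosen only for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_gradient(x):
    """Derivative of the sigmoid: s * (1 - s), where s = sigmoid(x)."""
    s = sigmoid(x)
    return s * (1.0 - s)


# The gradient peaks at x = 0 and decays quickly as |x| grows.
for x in (0.0, 3.0, 10.0):
    print(x, sigmoid_gradient(x))
```

The gradient is 0.25 at x = 0, falls below 0.05 at x = 3, and is below 0.0001 at x = 10.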

In [None]:
import numpy as np
def sigmoid_function(x):
    z = (1/(1 + np.exp(-x)))
    return z
sigmoid_function(7)

0.9990889488055994

##**(d)Tanh Function (Hyperbolic Tangent)**

- The tanh function is very similar to the sigmoid function. The main difference is that it is symmetric around the origin, with outputs ranging from -1 to 1. Thus the inputs to the next layer will not always have the same sign.

$$f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

- The output of tanh is zero-centered, with a range from -1 to 1.
- Tanh is usually preferred over the sigmoid function because it is zero-centered.
- The gradient of the tanh function is steeper than that of the sigmoid function: its maximum is 1 at x = 0, compared with 0.25 for sigmoid.
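The relationship between tanh and sigmoid can be written as tanh(x) = 2 * sigmoid(2x) - 1. The sketch below checks this against `math.tanh` and compares the gradients at the origin; the derivative formula 1 - tanh(x)^2 is a standard identity that the text above does not state.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def tanh_from_sigmoid(x):
    """tanh written as a scaled and shifted sigmoid."""
    return 2.0 * sigmoid(2.0 * x) - 1.0


def tanh_gradient(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


def sigmoid_gradient(x):
    s = sigmoid(x)
    return s * (1.0 - s)


for x in (-1.0, 0.0, 2.0):
    print(x, tanh_from_sigmoid(x), math.tanh(x))

# At the origin the tanh gradient is four times the sigmoid gradient.
print(tanh_gradient(0.0), sigmoid_gradient(0.0))
```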


In [None]:
def tanh_function(x):
    z = (2/(1 + np.exp(-2*x))) -1
    return z
tanh_function(-1)


-0.7615941559557649

##**(e)ReLU Function**

- The main advantage of the ReLU function over other activation functions is that it does not activate all the neurons at the same time.

- If the input is negative the function returns 0; for any positive input, it returns the input unchanged.
- The ReLU function is non-linear overall because of the kink at 0, but its slope is always either 0 (for negative inputs) or 1 (for positive inputs).
- The function is very fast to compute compared with sigmoid and tanh, because it does not evaluate an exponential.

- For negative input values the result is zero, which means the neuron is not activated. Since only some of the neurons are activated at a time, the ReLU function is far more computationally efficient than the sigmoid and tanh functions.


$$f(x) = \max(0, x)$$


\begin{align}
        f(x) = \left\{
        \begin{array}{cl}
        0 & x \leq 0 \\
        x & x > 0
        \end{array}
        \right.
    \end{align}
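As a small illustration of how ReLU leaves only some neurons active, the sketch below applies it to a list of made-up pre-activation values. Setting the gradient at exactly 0 to 0 is a convention assumed here, since the slope is not defined at the kink.

```python
def relu(x):
    return max(0.0, x)


def relu_gradient(x):
    """Slope of ReLU: 1 for positive inputs, 0 otherwise (0 at x = 0 by convention)."""
    return 1.0 if x > 0 else 0.0


pre_activations = [-2.0, -0.5, 0.0, 1.5, 3.0]   # illustrative values only
outputs = [relu(z) for z in pre_activations]
gradients = [relu_gradient(z) for z in pre_activations]
active = sum(1 for o in outputs if o > 0)

print(outputs)     # negative inputs are clipped to zero
print(gradients)   # slope is 0 or 1, never anything in between
print(active, "of", len(outputs), "neurons are active")
```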



In [None]:
def relu_function(x):
    if x<0:
        return 0
    else:
        return x

relu_function(7)

7

##**Total Error**

\begin{align}
        E_{total} = \sum \frac{1}{2}\left(\text{target} - \text{output}\right)^2
    \end{align}

Here, target is the expected value and output is the value the network actually produces through its activation function.
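The total error is straightforward to compute by hand. The following sketch sums half the squared difference over all output nodes; the target and output values are made up for illustration.

```python
def total_error(targets, outputs):
    """Sum of 1/2 * (target - output)^2 over all output nodes."""
    return sum(0.5 * (t - o) ** 2 for t, o in zip(targets, outputs))


targets = [1.0, 0.0]   # illustrative expected values
outputs = [0.8, 0.3]   # illustrative network outputs

# 0.5 * (0.2)^2 + 0.5 * (0.3)^2 = 0.02 + 0.045 = 0.065
print(total_error(targets, outputs))
```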

##**Update the Weights to Reduce Total Error**

- To update a weight, we compute how much it contributes to the total error. The error attributed to weight w is obtained by differentiating (taking the partial derivative of) the total error with respect to w.

- After that, we calculate the new weights as follows:

$$w_0 := w_0 - \alpha\frac{\partial E_{total}}{\partial w_0}$$

$$w_1 := w_1 - \alpha\frac{\partial E_{total}}{\partial w_1}$$

$$w_2 := w_2 - \alpha\frac{\partial E_{total}}{\partial w_2}$$

$$\vdots \\
w_n := w_n - \alpha\frac{\partial E_{total}}{\partial w_n}$$



- Here, $\alpha$ is the learning rate.
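The update rule can be tried on the simplest possible case: one weight, one input, and a linear output. This is a sketch under those assumptions, not a full backpropagation implementation; the function name, starting weight, and learning rate are illustrative.

```python
def train_single_weight(x, target, w=0.0, alpha=0.1, steps=50):
    """Gradient descent on E = 1/2 * (target - w*x)^2 for a single weight."""
    history = []
    for _ in range(steps):
        output = w * x
        history.append(0.5 * (target - output) ** 2)
        gradient = -(target - output) * x   # dE/dw
        w = w - alpha * gradient            # the update rule: w := w - alpha * dE/dw
    return w, history


w, history = train_single_weight(x=2.0, target=4.0)
print(w)                         # approaches 2.0, since 2.0 * 2.0 = 4.0
print(history[0], history[-1])   # the error shrinks from step to step
```

With these values each step moves the weight part of the way toward 2.0, so the error keeps falling; a much larger learning rate would overshoot instead.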

##**DNN Model Using Keras**

In [31]:
from google.colab import drive     # Mount your Google Drive in the Colab virtual machine (VM).
drive.mount('/gdrive')             # Colab and Drive run on different servers, so the drive must be mounted to access the data.

Drive already mounted at /gdrive; to attempt to forcibly remount, call drive.mount("/gdrive", force_remount=True).


In [32]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [33]:
pima=pd.read_csv('/gdrive/My Drive/ML Project /Feature Engineering /4.ML Algorithms/diabetes.csv',quoting=3)
                                 # Read the data file, giving its full path starting from My Drive.

In [34]:
pima.head()

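| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |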


In [35]:
# Split the dataset into features and target variable
feature_cols = ['Pregnancies','Glucose','BloodPressure','SkinThickness','Insulin', 'BMI','DiabetesPedigreeFunction','Age']
X = pima[feature_cols] # Features/independent variables
y = pima.Outcome # Target variable/dependent variable


# Equivalently, drop the target column to get the features.
X = pima.drop('Outcome', axis=1)   # Features/independent variables
y = pima['Outcome']               # Target variable/dependent variable

Let's split the dataset into training and test sets using the train_test_split() function.

In [38]:
# split X and y into training and testing sets
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.35,random_state=42)

Since the features have different ranges, we need to apply a feature scaling technique to bring them onto the same scale.

In [39]:
# Apply feature scaling to bring the features onto the same scale.
from sklearn.preprocessing import RobustScaler # StandardScaler or MinMaxScaler can also be used, depending on the dataset.
rb = RobustScaler()
X_train = rb.fit_transform(X_train)  # Fit the scaler on the training data only, then transform it.
X_test = rb.transform(X_test)        # Reuse the training statistics on the test data.

pd.DataFrame(X_train)   # Wrap the NumPy array returned by scikit-learn in a DataFrame for display.
pd.DataFrame(X_test)    # Only this last expression is shown as the cell output.

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| 0 | 0.6 | -0.481928 | -0.8750 | 0.31250 | 1.209486 | 0.210526 | 0.131191 | 0.823529 |
| 1 | -0.2 | -0.144578 | 0.1875 | 0.28125 | -0.292490 | 0.389474 | -0.623829 | -0.470588 |
| 2 | -0.2 | -0.240964 | -0.5000 | -0.71875 | -0.292490 | -0.126316 | -0.597055 | -0.470588 |
| 3 | 1.0 | -0.265060 | 0.5000 | -0.71875 | -0.292490 | -0.778947 | 1.271754 | 0.294118 |
| 4 | 0.8 | 0.433735 | 1.1250 | -0.71875 | -0.292490 | -0.221053 | -0.457831 | 1.235294 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 264 | 1.8 | -0.289157 | 0.5000 | -0.71875 | -0.292490 | -0.884211 | -0.653280 | 0.882353 |
| 265 | -0.2 | -0.433735 | -0.5000 | 0.00000 | -0.292490 | -0.242105 | -0.034806 | -0.470588 |
| 266 | 1.2 | 0.144578 | -0.1250 | 0.31250 | 2.885375 | 0.357895 | -0.265060 | 0.294118 |
| 267 | -0.6 | -0.554217 | -0.5000 | 0.50000 | 0.537549 | 1.326316 | -0.040161 | -0.411765 |
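To show what robust scaling does to a single feature column, here is a plain-Python sketch that subtracts the median and divides by the interquartile range. It is intended to mirror the default behaviour of `RobustScaler` (median centering, 25th to 75th percentile range) but has not been checked against scikit-learn; `robust_scale` and the sample column are hypothetical.

```python
import statistics


def robust_scale(values):
    """Scale one column as (value - median) / (Q3 - Q1)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [(v - median) / iqr for v in values]


column = [1.0, 2.0, 3.0, 4.0, 100.0]   # made-up feature with one outlier
print(robust_scale(column))
```

The outlier stays far from the rest, but it does not distort the scaling of the other values, because the median and the quartiles are barely affected by it.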


###**Define Keras Model**

- A Sequential model in Keras is defined as a sequence of layers: we initialize the model and add the layers one after another, and they are executed in the order in which they were added.

- Make sure the first layer expects the right number of input features, which is specified using the **input_dim** parameter.


- The first layer has 12 neurons and uses the relu activation function.
- The second hidden layer has 8 neurons and uses the relu activation function.
- The third hidden layer has 6 neurons and uses the relu activation function.

- Finally, the output layer has 1 unit with a sigmoid activation, because this is a binary classification problem.


In [41]:
from keras.models import Sequential
from keras.layers import Dense
model = Sequential()                                 # Create a Sequential model, i.e. a linear stack of layers.
model.add(Dense(12, input_dim=8, activation="relu")) # First hidden layer: 12 neurons; input_dim=8 because we have 8 independent features.
model.add(Dense(8, activation="relu"))               # Second hidden layer: 8 neurons with relu activation.
model.add(Dense(6, activation="relu"))               # Third hidden layer: 6 neurons with relu activation.
model.add(Dense(1, activation="sigmoid"))            # Output layer: 1 neuron with sigmoid, because this is a binary classification problem.
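As a rough check on the size of this network, the number of trainable parameters in each dense layer is inputs times units (the weights) plus units (the biases). The sketch below applies that to the layer sizes used in the cell above: 8 inputs, then 12, 8, 6 and 1 units. The helper `dense_param_count` is hypothetical.

```python
def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


layer_sizes = [8, 12, 8, 6, 1]   # input features followed by the units of each Dense layer
per_layer = [dense_param_count(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(per_layer)        # parameters in each Dense layer
print(sum(per_layer))   # total trainable parameters
```

This gives 108, 104, 54 and 7 parameters for the four layers, 273 in total.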

###**Compile Keras Model**
- When we compile the Keras model, it uses a backend numerical library such as TensorFlow.
- When compiling the model we must specify some additional parameters that control how the model is evaluated and how the best set of weights for mapping inputs to outputs is found.

- **Loss Function** – the loss function scores a given set of weights during training. We will use cross-entropy, specifically **binary cross-entropy**, which is the standard loss for binary classification.

- **Optimizer** – the optimizer minimizes the loss. We will use **adam**, a popular variant of gradient descent that works well on many problems.

In [42]:
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
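To give a feel for what the binary cross-entropy loss measures, here is a plain-Python sketch of the usual formula, the mean of -(y log p + (1 - y) log(1 - p)). It is an illustration rather than the Keras implementation; the clipping constant `eps` and the sample values are assumptions.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)   # keep log() away from 0
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Confident and correct predictions give a small loss...
print(binary_cross_entropy([1, 0], [0.9, 0.1]))
# ...while confident but wrong predictions are penalized heavily.
print(binary_cross_entropy([1, 0], [0.1, 0.9]))
```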

###**Start Training (Fit the Model)**

- After successfully compiling the model, we are ready to fit it to the data and start training the neural network.
- **Epoch** – One complete pass of the training data through the network; the epochs argument sets how many such passes are made.
- **Batch size** – The number of training samples passed through the model before each weight update.
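Epochs and batch size together determine how many weight updates take place, since the weights are updated once per batch. The sketch below works this out; the training-set sizes of 500 and 505 are hypothetical figures chosen for the arithmetic, not the size of the dataset used here.

```python
import math


def weight_updates(n_samples, batch_size, epochs):
    """Return (batches per epoch, total weight updates over all epochs)."""
    batches_per_epoch = math.ceil(n_samples / batch_size)   # a final partial batch still counts
    return batches_per_epoch, batches_per_epoch * epochs


print(weight_updates(n_samples=500, batch_size=10, epochs=150))
print(weight_updates(n_samples=505, batch_size=10, epochs=150))
```

With 500 samples, a batch size of 10 gives 50 updates per epoch and 7500 updates over 150 epochs.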

In [43]:
model.fit(X_train,y_train ,epochs=150, batch_size=10)

Epoch 1/150
Epoch 2/150
Epoch 3/150
...

<keras.callbacks.History at 0x7f730a548390>

###**Evaluate the Model**

In [45]:
results = model.evaluate(X_test,y_test)   # Returns [loss, accuracy], matching the metrics passed to compile().
print(results)

[0.634177029132843, 0.732342004776001]
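The first number in this output is the loss on the test set and the second is the accuracy. As a sketch of how an accuracy figure is derived from sigmoid outputs, the code below thresholds predicted probabilities at 0.5 and counts matches; the function name, the threshold default, and the sample values are assumptions for illustration.

```python
def accuracy_from_probabilities(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    predictions = [1 if p > threshold else 0 for p in y_prob]
    correct = sum(1 for y, pred in zip(y_true, predictions) if y == pred)
    return correct / len(y_true)


y_true = [1, 0, 1, 0]           # made-up labels
y_prob = [0.9, 0.2, 0.4, 0.6]   # made-up sigmoid outputs

# Predictions become [1, 0, 0, 1], so two of the four labels are matched.
print(accuracy_from_probabilities(y_true, y_prob))
```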
