# Introduction

Nothing in nature compares to the complex information processing and pattern recognition abilities of our brains. In our quest to advance technology, we are now developing algorithms that mimic the network of our brains; these are called deep neural networks.

Deep neural networks have a unique structure because they have a relatively large and complex hidden component between the input and output layers. To be considered a deep neural network, this hidden component must contain at least two layers.

![](assets/dnn_structure.jpg)


A weight is assigned to each connection from one node to another, signifying the strength of the connection between the two nodes. A weighted sum of all the connections into a specific node is computed and passed through an activation function, which in this illustration squashes it to a number between zero and one. The result is then passed on to the next node in the network.
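As a minimal sketch of what a single node computes, the snippet below forms the weighted sum of three incoming values and squashes it with a sigmoid, one common activation whose output lies between zero and one. The incoming values, the weights, and the helper names `sigmoid` and `node_output` are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    # Squash any real number into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def node_output(incoming_values, weights):
    # Weighted sum of all connections into this node, then the activation
    weighted_sum = np.dot(incoming_values, weights)
    return sigmoid(weighted_sum)

# Hypothetical values arriving from three nodes in the previous layer
incoming_values = np.array([0.5, 0.1, 0.9])
# Hypothetical connection strengths (weights)
weights_into_node = np.array([0.4, -0.6, 0.2])

result = node_output(incoming_values, weights_into_node)
print(result)  # a single number between 0 and 1
```

A larger weight makes the corresponding incoming value count for more in the sum, which is what "strength of the connection" means here.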

This process continues until the output nodes are reached. The output nodes correspond to categories, such as cats, zebras or cars. As the model learns, the weights on the connections are continuously updated.

Because of their structure, deep neural networks have a greater ability to recognize patterns than shallow networks. Deep neural networks classify data based on certain inputs after being trained with labeled data. This means they can learn by being exposed to examples, without having to be programmed with explicit rules for every task.

For example, if we want to build a model that will identify cat pictures, we can train the model by exposing it to labeled pictures of cats. Over time, the model will learn to identify the generic features of cats, such as pointy ears, the general shape, and tail, and it will be able to identify an unlabeled cat picture it has never seen.

## Let's understand with an example

Imagine you work for a loan company, and you need to build a model that predicts whether a user (borrower) should get a loan or not. For each customer you have features such as age, bank balance, salary per annum, whether they are retired, and so on.

![](assets/ex1.jpg)

Suppose you try to solve this problem with a linear regression model. Linear regression assumes that the outcome (whether a customer's loan should be sanctioned or not) is a weighted sum of the individual features: it accounts for the separate effects of age, salary, bank balance, retirement status, and so on. It does not account for the interactions between these features or for how those interactions affect the loan decision.

![](assets/ex2.jpg)

Figure **(A)** on the left shows predictions from a linear regression model with no interactions: it simply adds up the effects of age (below or above 30) and bank balance. The lack of interaction shows up as the two lines being parallel, which is exactly what the linear regression model assumes.

Figure **(B)** on the right, on the other hand, shows predictions from a model that allows interactions, so the lines do not have to be parallel. Neural networks are a modeling approach that captures interactions like the ones in figure (B) well. Deep learning, the term that grew out of this work, refers to the use of these powerful neural networks. Because neural networks account for such interactions, they can perform well on a wide variety of prediction problems.
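The parallel-versus-diverging picture can be sketched numerically. The snippet below is only an illustration with made-up coefficients: `additive_model` sums the separate effects of age group and bank balance, while `interaction_model` adds a hand-written product term so that the effect of balance depends on the age group. Neither function is a neural network; a network would learn such interactions from data rather than having them written in by hand.

```python
import numpy as np

# Hypothetical bank balances (in thousands) at which to compare predictions
bank_balance = np.array([0.0, 10.0, 20.0, 30.0])

def additive_model(is_older, balance):
    # Each feature contributes independently; the effects are simply summed
    return 1.0 + 2.0 * is_older + 0.5 * balance

def interaction_model(is_older, balance):
    # Extra product term: the effect of balance now depends on the age group
    return 1.0 + 2.0 * is_older + 0.5 * balance + 0.3 * is_older * balance

# Gap between the two age groups at every balance level
additive_gap = additive_model(1, bank_balance) - additive_model(0, bank_balance)
interaction_gap = interaction_model(1, bank_balance) - interaction_model(0, bank_balance)

print("Additive gap:   ", additive_gap)     # constant gap: parallel lines
print("Interaction gap:", interaction_gap)  # gap grows with balance: lines diverge
```

A constant gap corresponds to the parallel lines of figure (A); a gap that changes with the balance corresponds to the non-parallel lines of figure (B).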

Because neural networks can handle such complex interactions, they can solve challenging problems involving:

1. Images
2. Text
3. Audio
4. Video
    
This list is only a subset of what neural networks can handle; they have been applied to a wide range of problems across the data science field.

Deep learning models can even learn to write code for you. Isn't that amazing?

## Interactions in Neural Networks


![](assets/ex3.jpg)

The neural network architecture looks similar to the figure above. On the far left you have the **input layer**, which consists of features like age, salary per annum, bank balance, etc. On the far right you have the **output layer**, which outputs the model's prediction; in your case, whether a customer should get a loan or not.

The layers apart from the input and the output layers are called the **hidden layers**.

### Why are they called hidden layers?

One good reason is that the input and output layers correspond to things that are observable in the world and can be stored as data, whereas the values in the hidden layers do not directly correspond to anything in the real world or to anything for which you have data.

Technically, each node in the hidden layer represents an aggregation of information from the input data; hence each node adds to the model's capability to capture interactions between the features. The more nodes there are, the more interactions the model can capture.

# Forward Propagation

Let's start by seeing how neural networks use data to make predictions, which is handled by the forward propagation algorithm.

To understand forward propagation, let's revisit the loan company example. For simplicity, consider only two input features: age and retirement status. Retirement status is a binary number (0 for not retired, 1 for retired). You will make predictions based on these two features.

![](assets/forward_propagation.jpg)

The figure above shows a customer who is 40 years old and not retired. The forward propagation algorithm passes this information through the network to produce a prediction at the output layer. The lines connect each input node to every node of the hidden layer. Each line has a weight associated with it, which indicates how strongly that feature affects the hidden node at the other end of the line.

There are four weights in total between the input and hidden layers. The first pair of weights connects the top node of the input layer to the first and second nodes of the hidden layer; likewise, the second pair connects the bottom node of the input layer to the first and second nodes of the hidden layer.

Remember that these weights are the key to deep learning: they are what you train, or update, when you fit a neural network to the data. These weights are commonly known as **parameters**.

To compute the value of the top node of the hidden layer, you multiply each input node by the weight on its connection to that top node and sum the results, which gives 40 (40 * 1 + 0 * 1 = 40), as shown in the figure above. Repeating the same process for the bottom node of the hidden layer also gives 40 (40 * 1 + 0 * (-1) = 40). Finally, you follow the same process for the output layer and obtain 0 (40 * 1 + 40 * (-1) = 0), so the network predicts a value of zero.

You might wonder what the value zero means. Here the loan problem is treated as binary classification, in which an output of zero means the loan is sanctioned and an output of one means it is refused.

That is essentially all that happens in forward propagation. You start at the input layer, move to the hidden layer, and then to the output layer, which gives you a prediction score. At every step you use the same multiply-add process; in linear algebra, this operation is a dot product. In this walkthrough, forward propagation is done for a single data point at a time.
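To make the dot-product remark concrete, here is a small sketch for the top hidden node, using the input (age 40, not retired) and the weights of 1 and 1 from the figure. The variable names are illustrative only.

```python
import numpy as np

input_data = np.array([40, 0])     # age 40, not retired
node0_weights = np.array([1, 1])   # weights into the top hidden node

# Multiply element by element, then add up the products
multiply_add = (input_data * node0_weights).sum()

# The same computation expressed as a dot product
dot_product = np.dot(input_data, node0_weights)

print(multiply_add, dot_product)   # 40 40
```

Both lines compute 40 * 1 + 0 * 1, so either form can be used.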

## Forward propagation algorithm coding example


In [1]:
import numpy as np

Next, you will create a numpy array of input_data.

In [2]:
input_data = np.array([40,0])


Once you have the input data, you will create a dictionary called weights. Its keys name node0 and node1 of the hidden layer and the output node of the output layer, and its values are the parameters (weight values) feeding into each of those nodes.

In [3]:
weights = {'node0':([1,1]),
          'node1':([1,-1]),
          'output':([1,-1])}

Let's quickly calculate the value of node0 of the hidden layer. You first multiply the input_data with the weights of node0 and then use a sum() function to obtain a scalar value.

In [4]:
node0_value = (input_data * weights['node0']).sum()


You will do the same for the node1 of the hidden layer.



In [5]:
node1_value = (input_data * weights['node1']).sum()


For simplicity let's create a numpy array of the hidden layer values.



In [7]:
hidden_layer_values = np.array([node0_value,node1_value])

hidden_layer_values


array([40, 40])

Finally, you multiply the hidden layer values with the weights of the output layer and again use a sum() function to obtain a prediction.



In [8]:
output = (hidden_layer_values * weights['output']).sum()


Let's print the output and see if it matches the output you should expect!



In [9]:
output


0
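The same forward pass can also be written with one weight matrix per layer. This is only a sketch of an equivalent formulation, not a replacement for the cells above; the names `hidden_weights`, `output_weights`, `hidden_values`, and `network_output` are assumptions introduced here, and the numbers are the ones from the weights dictionary.

```python
import numpy as np

x = np.array([40, 0])  # age 40, not retired

# One row per hidden node: row 0 holds node0's weights, row 1 holds node1's
hidden_weights = np.array([[1, 1],
                           [1, -1]])
output_weights = np.array([1, -1])

# Matrix-vector product computes every hidden node's weighted sum at once
hidden_values = hidden_weights @ x
# A final dot product gives the output node's value
network_output = output_weights @ hidden_values

print(hidden_values, network_output)  # [40 40] 0
```

Stacking the weights into a matrix gives the same result as computing each node separately, and it scales more naturally as layers get wider.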

## Activation Functions


The multiply-add process is only half of how a neural network works; there's more to it!

To utilize the maximum predictive power, a neural network uses an activation function in the hidden layers. An activation function allows the neural network to capture non-linearities present in the data.

![](assets/activation_functions.jpg)

The data you work with is often not linearly separable, and to find a decision boundary that separates the data points you need some non-linearity in your network. For example, having no previous loan record may affect the output differently from having one.

If the relationships in the data aren't linear, you need a non-linear activation function to capture them. An activation function is applied to the value coming into a node and transforms it into the value stored in that node, which is the node's output.
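One way to see why the non-linearity matters: without an activation function, two stacked layers collapse into a single linear layer. The sketch below reuses the hidden and output weights from the loan example and a made-up input `x`; the names `W1`, `W2`, and the result variables are assumptions for illustration.

```python
import numpy as np

W1 = np.array([[1.0, 1.0],
               [1.0, -1.0]])   # input -> hidden weights (one row per hidden node)
W2 = np.array([1.0, -1.0])     # hidden -> output weights
x = np.array([1.0, 2.0])       # hypothetical input, chosen to avoid saturating tanh

# Two layers with no activation in between...
two_layer_output = W2 @ (W1 @ x)
# ...give the same result as one layer whose weights are the product W2 @ W1
single_layer_output = (W2 @ W1) @ x
print(two_layer_output, single_layer_output)  # 4.0 4.0

# With a non-linear activation between the layers, the shortcut no longer holds
with_tanh = W2 @ np.tanh(W1 @ x)
print(with_tanh)  # differs from 4.0
```

Because the purely linear stack is equivalent to a single weighted sum, any extra modeling power from hidden layers has to come from the activation function.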

Let's apply an **s-shaped activation function** called tanh to the nodes of hidden layers.

In [10]:
node0_act = np.tanh(node0_value)


In [11]:
node1_act = np.tanh(node1_value)


In [13]:
hidden_layer_values_act = np.array([node0_act,node1_act])
hidden_layer_values_act

array([1., 1.])

You can observe the difference in the hidden_layer_values and hidden_layer_values_act.

Let's quickly calculate the output using the hidden_layer_values_act.

In [14]:
output = (hidden_layer_values_act * weights['output']).sum()
output

0.0
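Both hidden values come out as exactly 1.0 because tanh flattens out for large inputs, and 40 is far into the flat region. A short sketch with a few hand-picked sample inputs shows the s-shape:

```python
import numpy as np

# Hand-picked sample inputs, from strongly negative to strongly positive
sample_inputs = np.array([-40.0, -1.0, 0.0, 1.0, 40.0])
tanh_values = np.tanh(sample_inputs)

for z, t in zip(sample_inputs, tanh_values):
    print(f"tanh({z:6.1f}) = {t: .4f}")
```

Inputs near zero are passed through almost proportionally, while large positive or negative inputs are pushed to 1 or -1, which is why the raw age of 40 ends up indistinguishable from any other large value.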

Today, an activation function called the Rectified Linear Unit (ReLU) is widely used in both industry and research. Even though it consists of just two linear pieces, it is very powerful when combined across multiple hidden layers. ReLU outputs zero for negative inputs and passes positive inputs through unchanged, as shown in the figure below.

![](assets/relu.jpg)

Now you will apply ReLU as the activation function on the hidden layer nodes and calculate the network's output.

In [15]:
def relu(value):
    '''Return the ReLU activation: value if it is positive, otherwise 0.'''
    # Calculate the value for the output of the relu function: output
    output = max(0, value)

    # Return the value just calculated
    return output

In [16]:
# Calculate node 0 value: node_0_output
node_0_input = (input_data * weights['node0']).sum()
node_0_output = relu(node_0_input)

# Calculate node 1 value: node_1_output
node_1_input = (input_data * weights['node1']).sum()
node_1_output = relu(node_1_input)

# Put node values into array: hidden_layer_outputs
hidden_layer_outputs = np.array([node_0_output, node_1_output])

# Calculate model output (do not apply relu)
model_output = (hidden_layer_outputs * weights['output']).sum()

# Print model output
print("Model's Output:",model_output)

Model's Output: 0
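The relu function above handles one number at a time. As a sketch of an alternative, NumPy's `np.maximum` applies the same rule to a whole array in one call; the names `relu_vectorized`, `hidden_w`, `output_w`, and `vectorized_output` are assumptions introduced for this example.

```python
import numpy as np

def relu_vectorized(values):
    # Element-wise maximum of 0 and each entry of the array
    return np.maximum(0, values)

x = np.array([40, 0])                        # age 40, not retired
hidden_w = np.array([[1, 1], [1, -1]])       # rows: node0 and node1 weights
output_w = np.array([1, -1])

# Apply ReLU to both hidden nodes at once, then take the output dot product
hidden_out = relu_vectorized(hidden_w @ x)
vectorized_output = output_w @ hidden_out
print("Model's Output:", vectorized_output)  # 0

# Negative entries are clipped to zero, positive ones pass through
print(relu_vectorized(np.array([-3, 0, 5])))
```

This produces the same output as the node-by-node version while avoiding a separate relu call per node.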


# Deeper Networks

The main difference between traditional neural networks and modern deep learning is the use of not just one but many successive hidden layers. Adding hidden layers can substantially improve performance, because it lets the network capture more and more interactions.

A network with multiple hidden layers works the same way as a network with a single hidden layer. You forward propagate through the successive hidden layers just as you did in the previous example with one hidden layer.

![](assets/deeper_networks.jpg)

Let's understand some essential facts about these deep networks!

   1. Deep learning networks can internally build up representations of the patterns in the data that are essential for making accurate predictions.

   2. The patterns in the initial layers are simple, but as you move through successive hidden layers, deeper into the network, it starts learning more and more complex patterns.

   3. Deep learning networks largely remove the need for handcrafted features. You do not need to engineer better predictive features to feed to the network; the network itself learns meaningful features from the data and uses them to make predictions.

   4. Deep learning is also called representation learning, since successive layers in the network build increasingly sophisticated representations of the data until the final layer, where the prediction is made.
    

The input to the network above is images of humans. You can see that the initial layers capture conceptually simple patterns of local contrast, such as vertical, horizontal, and diagonal edges, blurry areas, etc. Once the network has identified where these diagonal or horizontal lines are, the successive layers combine that information to find larger patterns like eyes, noses, and lips. A much later layer might combine these patterns to find larger, more abstract patterns, such as a face, as depicted in the figure above.

The appealing thing about deep learning is that you don't explicitly tell the network to look for diagonal lines or where in the image the nose or a lip is. Instead, when you train the network, its weights are learned so that it finds the patterns relevant for making accurate predictions. Learning is of course a gradual process: the network goes through many rounds of training before it makes good predictions.

## Forward propagation in a multi-layer neural network


In [17]:
input_data = np.array([3,5])

In [18]:
weights = {'node0_0':([2,4]),
          'node0_1':([4,-5]),
          'node1_0':([-1,2]),
          'node1_1':([1,2]),
          'output':([2,7])}

![](assets/multi_layer.jpg)

In [19]:
def predict_with_network(input_data):
    # Calculate node 0 in the first hidden layer
    node_0_0_input = (input_data * weights['node0_0']).sum()
    node_0_0_output = relu(node_0_0_input)

    # Calculate node 1 in the first hidden layer
    node_0_1_input = (input_data * weights['node0_1']).sum()
    node_0_1_output = relu(node_0_1_input)

    # Put node values into array: hidden_0_outputs
    hidden_0_outputs = np.array([node_0_0_output, node_0_1_output])
    print("Hidden Layer 1 Output:", hidden_0_outputs)

    # Calculate node 0 in the second hidden layer
    node_1_0_input = (hidden_0_outputs * weights['node1_0']).sum()
    node_1_0_output = relu(node_1_0_input)

    # Calculate node 1 in the second hidden layer
    node_1_1_input = (hidden_0_outputs * weights['node1_1']).sum()
    node_1_1_output = relu(node_1_1_input)

    # Put node values into array: hidden_1_outputs
    hidden_1_outputs = np.array([node_1_0_output, node_1_1_output])
    print("Hidden Layer 2 Output:", hidden_1_outputs)

    # Calculate model output: model_output
    model_output = (hidden_1_outputs * weights['output']).sum()

    # Return model_output
    return model_output

output = predict_with_network(input_data)
print("Model's Prediction:",output)

Hidden Layer 1 Output: [26  0]
Hidden Layer 2 Output: [ 0 26]
Model's Prediction: 182
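Writing out every node by hand becomes tedious as networks grow. The sketch below shows one way the same computation could be expressed as a loop over layers, with one weight matrix per hidden layer. The function `forward_pass` and the names `layer_weights` and `final_weights` are assumptions for illustration; the numbers are the ones from the weights dictionary above.

```python
import numpy as np

def forward_pass(x, layer_weights, final_weights):
    # Apply each hidden layer in turn: weighted sums, then ReLU
    activations = np.asarray(x)
    for W in layer_weights:
        activations = np.maximum(0, W @ activations)
    # Output node: weighted sum only, no activation
    return final_weights @ activations

layer_weights = [
    np.array([[2, 4], [4, -5]]),   # first hidden layer: node0_0, node0_1
    np.array([[-1, 2], [1, 2]]),   # second hidden layer: node1_0, node1_1
]
final_weights = np.array([2, 7])   # weights into the output node

prediction = forward_pass(np.array([3, 5]), layer_weights, final_weights)
print("Model's Prediction:", prediction)  # 182
```

Adding another hidden layer then only requires appending one more matrix to `layer_weights`, which mirrors the point that deeper networks reuse the same forward propagation step.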


## Useful Links
- http://neuralnetworksanddeeplearning.com/index.html
- https://www.investopedia.com/terms/n/neuralnetwork.asp
- https://www.kdnuggets.com/2020/02/deep-neural-networks.html