# Basics of deep learning and neural networks
Linear regression doesn't capture interactions between the features unless we add them by hand. Neural networks can learn those interactions on their own, and that's their big advantage.

A network has an input layer (the features), an output layer (the prediction) and hidden layers in between.

The running example tries to predict the number of bank transactions for a client based on number of kids, age, ...

## Forward Propagation
The weights between layers are the parameters we tune when training our model. They represent how strongly two nodes are related.

Forward propagation = information flows from the input layer through the hidden layers to the output layer. Each node's value is the dot product of the values feeding into it and the corresponding weights.

![title](nn.PNG)

In [4]:
import numpy as np

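input_data = np.array([2, 3])

# Weights into each hidden node, plus the weights from the hidden layer to the output
weights = {'node_0': np.array([1, 1]),
           'node_1': np.array([-1, 1]),
           'output': np.array([2, -1])}

# Each hidden node value is the dot product of the inputs and that node's weights
node_0_value = (input_data * weights['node_0']).sum()

node_1_value = (input_data * weights['node_1']).sum()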

hidden_layer_values = np.array([node_0_value, node_1_value])
hidden_layer_values

array([5, 1])

In [6]:
output = (hidden_layer_values * weights['output']).sum()
output

9
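
The multiply-and-sum used above is a dot product, so the same forward pass can also be written with matrix products. A minimal sketch, assuming we stack the two hidden-node weight vectors into one matrix (`hidden_weights` and `output_weights` are names introduced here, not part of the original cells):

```python
import numpy as np

input_data = np.array([2, 3])

# Same weights as above; each row holds the weights of one hidden node
hidden_weights = np.array([[1, 1],    # node_0
                           [-1, 1]])  # node_1
output_weights = np.array([2, -1])

# One matrix-vector product computes both hidden node values at once
hidden_layer_values = hidden_weights @ input_data

# The output is the dot product of the hidden values and the output weights
output = np.dot(hidden_layer_values, output_weights)

print(hidden_layer_values)  # [5 1]
print(output)               # 9
```

The numbers match the cell outputs above, which is a quick check that the two formulations agree.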

## Activation Functions

These functions allow the model to capture non-linearities. This means we can capture different behaviour when going from 1->2 kids than when going from 4->5 kids.

These functions are applied to the node inputs and produce the node output.
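
To make the kids example concrete, the sketch below compares how much a node's output changes when its input goes from 1 to 2 with how much it changes from 4 to 5. It feeds the number of kids straight in as the node input, which is a simplification made up for illustration.

```python
import numpy as np

kids = np.array([1.0, 2.0, 4.0, 5.0])

# Without an activation function the node output equals its input,
# so every extra kid changes the output by the same amount
linear_out = kids
linear_step_1_to_2 = linear_out[1] - linear_out[0]
linear_step_4_to_5 = linear_out[3] - linear_out[2]

# With tanh the same one-kid increase has a different effect
# depending on where we start
tanh_out = np.tanh(kids)
tanh_step_1_to_2 = tanh_out[1] - tanh_out[0]
tanh_step_4_to_5 = tanh_out[3] - tanh_out[2]

print(linear_step_1_to_2, linear_step_4_to_5)
print(tanh_step_1_to_2, tanh_step_4_to_5)
```

With $\tanh$ the step from 1 to 2 is roughly 0.2, while the step from 4 to 5 is well under 0.001, so the node reacts very differently to the same increase.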

For a long time the standard activation function was $\tanh$, but now the industry standard is ReLU (Rectified Linear Activation, also known as the rectified linear unit).
![](relu.PNG)
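
For reference, the two activation functions mentioned here have these standard definitions:

$$\mathrm{ReLU}(x) = \max(0, x), \qquad \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

ReLU passes positive inputs through unchanged and maps negative inputs to 0, while $\tanh$ squashes any input into the range $(-1, 1)$.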

In [8]:
# Now we add an activation function (tanh) to the previous code
input_data = np.array([2, 3])

weights = {'node_0': np.array([1, 1]),
           'node_1': np.array([-1, 1]),
           'output': np.array([2, -1])}

# The weighted sum is the node input; tanh of it is the node output
node_0_input = (input_data * weights['node_0']).sum()
node_0_output = np.tanh(node_0_input)
node_1_input = (input_data * weights['node_1']).sum()
node_1_output = np.tanh(node_1_input)
hidden_layer_values = np.array([node_0_output, node_1_output])
output = (hidden_layer_values * weights['output']).sum()
output

1.2382242525694254

### Using ReLU

In [9]:
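def relu(input_value):
    '''Return the ReLU of a single number: the input if positive, otherwise 0'''
    # Calculate the value for the output of the relu function: output
    # (the argument is named input_value so it doesn't shadow the built-in input)
    output = max(input_value, 0)
    
    # Return the value just calculated
    return output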
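
The `relu()` above relies on Python's built-in `max`, so it is meant for one number at a time. A possible array-friendly variant, sketched here with `np.maximum` (the name `relu_vectorized` is made up for this example):

```python
import numpy as np

def relu_vectorized(values):
    """Apply ReLU elementwise: negative entries become 0, the rest are unchanged."""
    return np.maximum(values, 0)

node_inputs = np.array([-3, 0, 5])
print(relu_vectorized(node_inputs))  # [0 0 5]
print(relu_vectorized(-2))           # 0
```

This version accepts both scalars and arrays, which helps if a whole layer's inputs are computed in one step.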

### Applying the network to many observations/rows of data
You'll now define a function called predict_with_network() which will generate predictions for multiple data observations, which are pre-loaded as input_data. As before, weights are also pre-loaded. In addition, the relu() function you defined in the previous exercise has been pre-loaded.

In [10]:
# Define predict_with_network()
def predict_with_network(input_data_row, weights):

    # Calculate node 0 value
    node_0_input = (input_data_row*weights['node_0']).sum()
    node_0_output = relu(node_0_input)

    # Calculate node 1 value
    node_1_input = (input_data_row*weights['node_1']).sum()
    node_1_output = relu(node_1_input)

    # Put node values into array: hidden_layer_outputs
    hidden_layer_outputs = np.array([node_0_output, node_1_output])
    
    # Calculate model output
    input_to_final_layer = (hidden_layer_outputs*weights['output']).sum()
    model_output = relu(input_to_final_layer)
    
    # Return model output
    return(model_output)

In [None]:
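# Note: input_data must hold several rows here (e.g. a list of arrays or a 2-D array),
# not the single 1-D observation used in the earlier cells
# Create empty list to store prediction results
results = []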
for input_data_row in input_data:
    # Append prediction to results
    results.append(predict_with_network(input_data_row,weights))

# Print results
print(results)
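
Since `input_data` and `weights` are pre-loaded in the exercise, their values are not shown in these notes. The self-contained sketch below runs the same loop with made-up rows and the weights from the forward propagation example earlier. Both are assumptions for illustration, not the exercise's data.

```python
import numpy as np

def relu(value):
    return max(value, 0)

def predict_with_network(input_data_row, weights):
    # Hidden layer: dot product with each node's weights, then ReLU
    node_0_output = relu((input_data_row * weights['node_0']).sum())
    node_1_output = relu((input_data_row * weights['node_1']).sum())
    hidden_layer_outputs = np.array([node_0_output, node_1_output])
    # Output layer, also passed through ReLU as in the function above
    return relu((hidden_layer_outputs * weights['output']).sum())

# Hypothetical data: each row is one observation with two features
input_data = [np.array([2, 3]), np.array([1, -1]),
              np.array([0, 0]), np.array([4, 1])]

# Assumed weights, reused from the earlier forward propagation example
weights = {'node_0': np.array([1, 1]),
           'node_1': np.array([-1, 1]),
           'output': np.array([2, -1])}

results = [int(predict_with_network(row, weights)) for row in input_data]
print(results)  # [9, 0, 0, 10]
```

The first row reproduces the earlier single-observation result of 9, since ReLU leaves positive values unchanged.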

## Deeper Networks
Stacking many hidden layers is what makes deep networks so powerful.

Deep networks internally build representations of patterns in the data and find increasingly complex patterns as we go deeper into the network.
Neural networks partially replace the need for feature engineering.

Deep learning is sometimes called representation learning because subsequent layers build increasingly sophisticated representations of the raw data until we can make predictions.

The big advantage is that the modeler doesn't need to specify the interactions. Instead, when we train the model, the network gets weights that find the relevant patterns to make better predictions.

### Multi-layer neural networks
In this exercise, you'll write code to do forward propagation for a neural network with 2 hidden layers. Each hidden layer has two nodes. The input data has been preloaded as input_data. The nodes in the first hidden layer are called node_0_0 and node_0_1. Their weights are pre-loaded as weights['node_0_0'] and weights['node_0_1'] respectively.

The nodes in the second hidden layer are called node_1_0 and node_1_1. Their weights are pre-loaded as weights['node_1_0'] and weights['node_1_1'] respectively.

We then create a model output from the hidden nodes using weights pre-loaded as weights['output'].


In [11]:
#meh
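
The cell above was left as a placeholder. One way the forward pass described in this exercise could look is sketched below. The pre-loaded `input_data` and `weights` are not shown in these notes, so the values here are invented for illustration; only the key names come from the exercise text. Leaving the final output without an activation function is also an assumption.

```python
import numpy as np

def relu(value):
    return max(value, 0)

def predict_with_network(input_data, weights):
    # First hidden layer: fed by the raw inputs
    node_0_0_output = relu((input_data * weights['node_0_0']).sum())
    node_0_1_output = relu((input_data * weights['node_0_1']).sum())
    hidden_0_outputs = np.array([node_0_0_output, node_0_1_output])

    # Second hidden layer: fed by the first hidden layer's outputs
    node_1_0_output = relu((hidden_0_outputs * weights['node_1_0']).sum())
    node_1_1_output = relu((hidden_0_outputs * weights['node_1_1']).sum())
    hidden_1_outputs = np.array([node_1_0_output, node_1_1_output])

    # Output: weighted sum of the second hidden layer (no activation, by assumption)
    return (hidden_1_outputs * weights['output']).sum()

# Hypothetical values, not the exercise's pre-loaded data
input_data = np.array([3, 5])
weights = {'node_0_0': np.array([2, 4]),
           'node_0_1': np.array([4, -5]),
           'node_1_0': np.array([-1, 2]),
           'node_1_1': np.array([1, 2]),
           'output': np.array([2, 7])}

output = predict_with_network(input_data, weights)
print(output)  # 182
```

Each layer repeats the same pattern (weighted sum, then ReLU), with the previous layer's outputs taking the place of the input data.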