### Forward Propagation

In [1]:
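import numpy as np
def feed_forward(inputs, outputs, weights):
    # Hidden layer values before activation: inputs x weights[0] plus biases weights[1]
    pre_hidden = np.dot(inputs, weights[0]) + weights[1]
    # Sigmoid activation of the hidden layer values
    hidden = 1 / (1 + np.exp(-pre_hidden))
    # Output layer values: hidden x weights[2] plus bias weights[3]
    pred_out = np.dot(hidden, weights[2]) + weights[3]
    # Mean squared error between predictions and actual outputs
    mean_squared_error = np.mean(np.square(pred_out - outputs))
    return mean_squared_error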

### Code Breakdown

1. Take the input values (inputs), the weights (randomly initialized if this is the first iteration) and the actual outputs
as provided in the dataset as the parameters of the `(feed_forward)` function.

2. Calculate hidden layer values by performing the matrix multiplication
(np.dot) of inputs and weight values (weights[0]) connecting the input
layer to the hidden layer and add the bias terms (weights[1]) associated
with the hidden layer's nodes.

3. Apply the `(sigmoid activation)` function on top of the hidden layer values
obtained in the previous step – pre_hidden

4. Calculate the output layer values by performing the matrix multiplication
(np.dot) of hidden layer activation values (hidden) and weights
connecting the hidden layer to the output layer (weights[2]) and
summing the output with bias associated with the node in the output layer
– weights[3]

5. Calculate the `(mean squared error)` value across the dataset and return the
mean squared error
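
The five steps above can be checked by hand on a tiny example. The sketch below is only an illustration: the toy data, the 2-3-1 layer sizes and the hand-picked weights are assumptions made here (in practice the weights would start out random), and the function is repeated so that the cell runs on its own.

```python
import numpy as np

def feed_forward(inputs, outputs, weights):
    pre_hidden = np.dot(inputs, weights[0]) + weights[1]
    hidden = 1 / (1 + np.exp(-pre_hidden))
    pred_out = np.dot(hidden, weights[2]) + weights[3]
    return np.mean(np.square(pred_out - outputs))

# Hypothetical toy data: one sample with two features and one target value
x = np.array([[1.0, 1.0]])
y = np.array([[1.0]])

# Hand-picked (not random) weights so the result can be verified by hand
weights = [
    np.zeros((2, 3)),  # weights[0]: input layer (2 nodes) -> hidden layer (3 nodes)
    np.zeros(3),       # weights[1]: hidden layer biases
    np.ones((3, 1)),   # weights[2]: hidden layer (3 nodes) -> output layer (1 node)
    np.zeros(1),       # weights[3]: output layer bias
]

# pre_hidden is 0 everywhere, so the sigmoid gives 0.5 for each hidden node.
# The prediction is 0.5 * 3 = 1.5 and the squared error is (1.5 - 1.0)^2.
loss = feed_forward(x, y, weights)
print(loss)  # 0.25
```

With zero weights into the hidden layer the inputs have no effect on the prediction, which is what makes the arithmetic easy to follow here.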

# Some Activation Functions

In [12]:
# Tanh 
def tanh(x):
  return (np.exp(x)-np.exp(-x))/(np.exp(x)+np.exp(-x))

# ReLU - The Rectified Linear Unit
def relu(x):
  return np.where(x>0, x,0)

# Linear - The linear activation of a value is the value itself
def linear(x):
  return x

# Softmax - Unlike the other activations, softmax is applied to an
# array of values rather than to a single value. It is generally used to
# determine the probability of an input belonging to each of the m
# possible output classes in a given scenario, so it provides one
# probability value for each class in the output.

def softmax(x):
  return np.exp(x)/np.sum(np.exp(x))

Softmax applies two operations to the input x: np.exp makes every value positive,
and dividing by np.sum(np.exp(x)), the sum of all those exponentials, forces each
value to lie between 0 and 1 and the values to sum to 1. These are the properties of
a probability distribution over the classes, which is what we mean by returning a probability vector.
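
A quick check of these properties is sketched below. The scores are made-up values, and `stable_softmax` is a helper introduced here (not part of the notebook above) that shows a common variant: subtracting the maximum before exponentiating leaves the result unchanged mathematically but avoids overflow in np.exp for large inputs.

```python
import numpy as np

def softmax(x):
    # Same definition as in the cell above
    return np.exp(x) / np.sum(np.exp(x))

def stable_softmax(x):
    # Hypothetical helper: shifting by the maximum does not change the
    # result, but keeps np.exp from overflowing for large inputs
    exps = np.exp(x - np.max(x))
    return exps / np.sum(exps)

scores = np.array([1.0, 2.0, 3.0])  # made-up raw scores for 3 classes
probs = softmax(scores)
print(probs)        # every entry lies between 0 and 1
print(probs.sum())  # the entries sum to 1 (up to floating-point rounding)
print(np.allclose(probs, stable_softmax(scores)))  # True

# Inputs this large would overflow in the plain version
large = np.array([1000.0, 1001.0, 1002.0])
print(stable_softmax(large))  # finite probabilities
```

The larger a score is relative to the others, the larger its share of the total probability.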

# Loss Functions

Loss values, which the training process of a neural network tries to minimize, are
reduced by updating the weight values. Defining a proper loss function is key to
building a working and reliable neural network model.
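
To see how updating a weight reduces the loss, here is a minimal sketch with a single weight. Everything in it is assumed for illustration: the one-weight model, the data, the learning rate and the small step `eps` used to estimate the slope numerically. It is not a full training procedure.

```python
import numpy as np

# Hypothetical one-weight model: prediction = w * x
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])  # targets generated with w = 2

def loss(w):
    # Mean squared error of the one-weight model
    return np.mean(np.square(w * x - y))

w = 0.0
learning_rate = 0.05  # assumed value, chosen only for this illustration
eps = 1e-4            # small nudge used to estimate the slope of the loss

history = [loss(w)]
for _ in range(20):
    # Estimate how the loss changes when w is nudged slightly
    slope = (loss(w + eps) - loss(w)) / eps
    # Move w in the direction that lowers the loss
    w = w - learning_rate * slope
    history.append(loss(w))

print(history[0], history[-1])  # the loss shrinks as w is updated
print(w)                        # ends up close to 2
```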


**Mean Squared Error** : The mean squared error is the average of the squared
differences between the actual and the predicted values of the output, taken across all data points.
Squaring ensures that positive and negative errors do not offset each other.

The mean squared error is typically used when trying to predict a value that is continuous in nature.

**Mean Absolute Error** : The mean absolute error works in a manner that is
very similar to the mean squared error. The mean absolute error ensures
that positive and negative errors do not offset each other by taking an
average of the absolute difference between the actual and predicted values
across all data points.

**Binary Cross-Entropy** : Cross-entropy is a measure of the difference between two distributions: actual and predicted. Binary cross-entropy is applied to binary output data, unlike the previous two loss functions discussed (which are applied when predicting continuous variables).

**Categorical Cross-Entropy** : Categorical cross-entropy measures the difference between an array of predicted values (p) and an array of actual values (y). It is applied when the output can belong to one of several classes.
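
For reference, the four losses are usually written as shown below in their textbook form. The symbols N (number of data points) and C (number of classes) are introduced here for the formulas and are not used elsewhere in these notes; p is the predicted value and y the actual value, as above.

$$\text{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(p_i - y_i\right)^2$$

$$\text{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|p_i - y_i\right|$$

$$\text{BCE} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i\log p_i + (1 - y_i)\log(1 - p_i)\right]$$

$$\text{CCE} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} y_{i,c}\log p_{i,c}$$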


In [14]:
# 1. MSE - Mean Squared Error
def mse(p,y):
  return np.mean(np.square(p-y))

# 2. MAE - Mean Absolute Error
def mae(p,y):
  return np.mean(np.abs(p-y))

# 3. Binary Cross-Entropy (averaged over all values)
def binary_cross_entropy(p,y):
  return -np.mean(y*np.log(p)+(1-y)*np.log(1-p))

# 4. Categorical Cross-Entropy (summed over classes, averaged over data points)
def categorical_cross_entropy(p,y):
  return -np.mean(np.sum(y*np.log(p), axis=-1))
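
A few made-up numbers make the behaviour of these losses easier to see. The sketch below redefines the functions so that it runs on its own. `clipped_binary_cross_entropy` and its `eps` value are assumptions introduced here: clipping the predictions is a common way to keep np.log away from log(0), which would otherwise give an infinite or undefined loss when a prediction is exactly 0 or 1.

```python
import numpy as np

def mse(p, y):
    return np.mean(np.square(p - y))

def mae(p, y):
    return np.mean(np.abs(p - y))

def clipped_binary_cross_entropy(p, y, eps=1e-7):
    # Hypothetical helper: eps is an assumed small constant; clipping
    # keeps the predictions strictly between 0 and 1 before taking logs
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Continuous targets: the last prediction is off by 4
p = np.array([2.0, 4.0, 9.0])
y = np.array([1.0, 4.0, 5.0])
print(mse(p, y))  # (1 + 0 + 16) / 3, dominated by the large error
print(mae(p, y))  # (1 + 0 + 4) / 3

# Binary targets: confident and right vs. confident and wrong
y_bin = np.array([1.0, 0.0, 1.0])
good = clipped_binary_cross_entropy(np.array([0.9, 0.1, 0.9]), y_bin)
bad = clipped_binary_cross_entropy(np.array([0.1, 0.9, 0.1]), y_bin)
print(good, bad)  # -log(0.9) and -log(0.1): wrong predictions cost far more

# Predictions of exactly 0 or 1 still give a finite loss after clipping
exact = clipped_binary_cross_entropy(np.array([1.0, 0.0, 1.0]), y_bin)
print(exact)
```

Because the errors are squared, the single large error weighs much more heavily in the mean squared error than in the mean absolute error.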