Brief introduction to Neural Networks:
Artificial neural networks (ANNs), usually called neural networks (NNs) or, more simply, neural nets, are computing systems inspired by the biological neural networks that constitute animal brains.

An ANN is based on a collection of connected units or nodes called artificial neurons, which loosely model the neurons in a biological brain. Each connection, like the synapses in a biological brain, can transmit a signal to other neurons. 

These neurons are nothing but mathematical functions which, when given some input, generate an output. The output of neurons depends on the input and the parameters of the neurons. We can update these parameters to get a desired value out of the network.

Here, each neuron is defined using the sigmoid function, which returns an output between zero and one for every input it receives. These sigmoid units are connected to each other to form a neural network.
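As a minimal sketch of a single sigmoid unit, the snippet below computes a weighted sum of the inputs and squashes it with the sigmoid. The input values, weights, and bias are hypothetical and chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical inputs, weights and bias for one neuron (illustrative values only).
x = np.array([0.0, 1.0, 1.0])
w = np.array([0.5, -0.3, 0.8])
b = 0.1

z = np.dot(x, w) + b   # weighted sum of the inputs plus the bias
a = sigmoid(z)         # neuron output, always strictly between 0 and 1
print(z, a)
```

Changing w or b changes the output for the same input, which is what "updating the parameters" refers to.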

By connection we mean that the output of one layer of sigmoid units is given as input to each sigmoid unit of the next layer. This continues layer by layer until the final layer is reached and generates its output. In this way our neural network produces an output for any given input.

This process of a neural network generating an output for a given input is called forward propagation. The output of the final layer is also called the prediction of the neural network.
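The layer-by-layer process can be sketched as a simple loop. The architecture below (3 inputs, 4 hidden units, 1 output) and the random weights are assumptions made for illustration; they are not the network built later in this notebook.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    """Pass x through each (weights, bias) pair in turn and return the final activation."""
    activation = x
    for weights, bias in layers:
        activation = sigmoid(activation @ weights + bias)
    return activation

rng = np.random.default_rng(0)
# Hypothetical architecture: 3 inputs -> 4 hidden units -> 1 output.
layers = [
    (rng.normal(size=(3, 4)), np.zeros(4)),
    (rng.normal(size=(4, 1)), np.zeros(1)),
]

x = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])      # two example rows
prediction = forward(x, layers)      # one prediction per row
print(prediction.shape)
```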

Right after the final layer generates its output, we calculate the cost function. The cost function measures how far the network's predictions are from the desired outputs: its value reflects the difference between the predicted value and the true value.
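One common choice of cost function is the mean squared error. The text does not name a specific cost, so treat the function below as an illustrative assumption; the prediction values are made up.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([[0], [0], [1], [1]])

# Hypothetical predictions: one set close to the targets, one far from them.
good_predictions = np.array([[0.1], [0.2], [0.9], [0.8]])
bad_predictions = np.array([[0.9], [0.8], [0.1], [0.2]])

cost_good = mean_squared_error(y_true, good_predictions)
cost_bad = mean_squared_error(y_true, bad_predictions)
print(cost_good, cost_bad)   # the cost is lower for the better predictions
```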

Our objective here is to minimize the value of the cost function. The process of minimization of the cost function requires an algorithm which can update the values of the parameters in the network in such a way that the cost function achieves its minimum value.

Algorithms such as gradient descent and stochastic gradient descent are used to update the parameters of the neural network. These algorithms update the weights and biases of each layer depending on how a change in each parameter affects the cost function. This effect, the gradient of the cost function with respect to every weight and bias in the network, is computed by backpropagation.
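To make the idea of gradient descent concrete, here is a minimal sketch on a one-parameter toy cost, cost(w) = (w - 3)^2, whose minimum is at w = 3. The toy cost, the learning rate, and the step count are assumptions for illustration; a real network applies the same update rule to every weight and bias.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from w0."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w

# Toy cost: cost(w) = (w - 3)^2, so its gradient is 2 * (w - 3).
w_min = gradient_descent(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_min)   # ends up very close to 3, the minimum of the toy cost
```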

Neural networks are essentially self-optimizing functions that map inputs to the correct outputs. We can then place a new input into the function, where it will predict an output based on the function it created with the training data.

OUR PROBLEM :-
Here, we will solve a simple problem. Suppose we have some information about the obesity, smoking habits, and exercise habits of six people. We also know whether these people have high cholesterol. Our dataset looks like this:
OBESITY    PHYSICAL ACTIVITY/EXERCISES     SMOKING      CHOLESTEROL
  0                    1                      0             0
  0                    1                      1             0     
  0                    0                      0             0
  1                    0                      0             1
  1                    1                      1             1
  1                    0                      1             1

This is a type of supervised learning problem where we are given inputs and corresponding correct outputs and our task is to find the mapping between the inputs and the outputs.

In [1]:
import numpy as np 

In [2]:
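# Each row is one person: [obesity, physical activity, smoking]
input_matrix = np.array([[0, 1, 0], [0, 1, 1], [0, 0, 0], [1, 0, 0], [1, 1, 1], [1, 0, 1]])

# Matching labels: 1 means high cholesterol, 0 means not
output_matrix = np.array([[0], [0], [0], [1], [1], [1]])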

As mentioned earlier, neural networks need data to learn from. We create our input data matrix and the corresponding output matrix with NumPy's np.array() function.

In [3]:
class NeuralNetwork:

    #1
    def __init__(self, input_matrix, output_matrix):
        self.inputs  = input_matrix
        self.outputs = output_matrix
        
        
        np.random.seed(5)                   # fixed seed so the weights are reproducible
        self.weights = np.random.rand(3,1)  # uniform in [0, 1), one weight per feature

    #2    
    def sigmoid(self,x):
        return 1 / (1 + np.exp(-x))
    
    def sigmoid_prime(self, x):
        # x must already be a sigmoid output s, so the derivative is s * (1 - s)
        return x * (1 - x)
    
    #3
    def forward_propagation(self):
        self.hidden = self.sigmoid(np.dot(self.inputs, self.weights))
        
        
    #4   
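    def back_propagation(self):
        # difference between the targets and the current predictions
        self.error  = self.outputs - self.hidden
        # scale each error by the sigmoid derivative at that prediction
        delta = self.error * self.sigmoid_prime(self.hidden)
        # weight update: transposed inputs times the error-weighted derivative
        self.weights += np.dot(self.inputs.T, delta)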
        
    #5    
    def train_neuralnetwork(self, iterations=50000):
        for iteration in range(iterations):
            
            self.forward_propagation()
            
            self.back_propagation()
            
    #6        
    def predict(self, new_input):
        prediction = self.sigmoid(np.dot(new_input, self.weights))
        return prediction       
    


#1: We start by creating a class called “NeuralNetwork”. We initialize the class by defining the __init__ function, which takes the input_matrix and the output_matrix as parameters. We call np.random.seed so that we get the same random values whenever the script is executed. In the next step, we initialize our weights with np.random.rand, which draws uniformly distributed random numbers from the interval [0, 1). Since we have three features in the input, we have a vector of three weights.
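The effect of the seed can be checked directly. The sketch below resets the seed and draws the weights twice; the variable names are illustrative. np.random.rand samples uniformly from [0, 1), whereas np.random.randn would be the call for normally distributed values.

```python
import numpy as np

np.random.seed(5)
first_draw = np.random.rand(3, 1)    # three weights, one per input feature

np.random.seed(5)                    # resetting the seed restarts the sequence
second_draw = np.random.rand(3, 1)

print(np.array_equal(first_draw, second_draw))   # the two draws are identical
print(first_draw.shape)                          # (3, 1)
```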

In [4]:
#2

This neural network uses the sigmoid function as its activation function.
Let us first define what an activation function is: an Activation Function decides whether a neuron should be activated or not. In other words, it uses simple mathematical operations to decide whether the neuron's input is important to the prediction.

The role of the Activation Function is to derive an output from the set of input values fed to a node (or a layer).
This mirrors a biological brain, which processes incoming signals and, depending on their nature and intensity, decides whether a neuron should be activated (“fired”) or not.

In deep learning, the Activation Function plays the same role, which is why it is often referred to as a Transfer Function in artificial neural networks.

The primary role of the Activation Function is to transform the summed weighted input from the node into an output value to be fed to the next hidden layer or as output. 


Why do Neural Networks need it?
The purpose of an activation function is to add non-linearity to the neural network. Without it, any stack of layers would collapse into a single linear transformation of the input.

How Activation Functions work in Neural Networks
Activation functions introduce an additional step at each layer during forward propagation, but the extra computation is worth it.

Here, we use the sigmoid function as our activation function.
The sigmoid function is a popular nonlinear activation function with a range of (0, 1). Its inputs are always squashed to fit between the function's two horizontal asymptotes at y=0 and y=1.

The sigmoid function has some well-known issues that restrict its usage. Towards either end of the sigmoid curve, the derivative becomes very small. When these small derivatives are multiplied together during backpropagation, the product shrinks further with each layer until it is too small to be useful. As the gradients shrink, the weights in the neural network are updated very little, if at all, and training can stall. This is commonly called the vanishing gradient problem.
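The shrinking derivatives can be seen numerically. This is a small sketch, with sample points chosen arbitrarily; the ten-layer product is an illustrative best case in which every unit sits at the derivative's maximum.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    """Derivative of the sigmoid with respect to its input z."""
    s = sigmoid(z)
    return s * (1.0 - s)

# The derivative peaks at z = 0 (value 0.25) and shrinks quickly towards the tails.
for z in (0.0, 2.0, 5.0, 10.0):
    print(z, sigmoid_derivative(z))

# Even in the best case, chaining ten sigmoid derivatives multiplies ten
# factors of at most 0.25, which is already a very small number.
best_case_product = sigmoid_derivative(0.0) ** 10
print(best_case_product)
```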

In [5]:
#3

During our neural network's training process, the input data is fed forward through the network's weights and functions. The result of this feed-forward step is the output of the hidden layer. The forward propagation function can be written like this, where xᵢ and wᵢ are individual features and weights in the matrices:
$$\hat{y} = \frac{1}{1 + e^{-z}}, \qquad z = \sum_i x_i w_i + b$$
Here $\hat{y}$ (y cap) is the prediction and b is the bias term. For simplicity we will consider b = 0.
Each feature in the input data has its own weight for its connection to the hidden layer. We start by taking the sum of every feature multiplied by its corresponding weight. We then feed the results through the sigmoid function to get a value (a probability) between 0 and 1.

The above process results in the hidden layer's prediction. Each row of the sigma(xw) matrix, one per training example, is passed through the sigmoid function.
Note: this calculation represents only one training iteration, so the resulting y cap matrix will not be very accurate. By computing the hidden layer this way and then applying backpropagation over many iterations, the result becomes much more accurate.
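As a sketch of a single forward pass on this dataset, the snippet below repeats the calculation outside the class, using the same seed and weight shape as the NeuralNetwork class. The names X, z, and y_hat are chosen here for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 1, 0], [0, 1, 1], [0, 0, 0],
              [1, 0, 0], [1, 1, 1], [1, 0, 1]])

np.random.seed(5)
weights = np.random.rand(3, 1)   # same initialization as in the class

z = X @ weights        # shape (6, 1): one weighted sum per person
y_hat = sigmoid(z)     # shape (6, 1): one untrained prediction per person
print(y_hat.ravel())

# The all-zero row gives z = 0, so its prediction is exactly 0.5
# whatever the weights are, because the bias is fixed at 0.
print(y_hat[2, 0])
```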


In [6]:
#4

Backpropagation goes back through the layer(s) of the neural network, determines how much each weight contributed to the output and the error, then changes the weights based on the gradient of the hidden layer's output. The whole process can be written like this, where y is the correct output and y cap is the hidden layer's prediction:

$$w \leftarrow w + X^{T}\left[(y - \hat{y}) \odot S(1 - S)\right], \qquad S = \frac{1}{1 + e^{-z}}$$
Here $\hat{y}$ is y cap, $X^{T}$ is the transposed input matrix, and $\odot$ denotes elementwise multiplication.

We can now multiply the error and the derivative of the hidden layer's prediction. We know that the derivative of the sigmoid function is S(x)(1 - S(x)). Therefore, the derivative for each of the hidden layer's predictions is y cap*(1 - y cap).
This step results in the update that will be added to the weights. We get this update by multiplying our “error weighted derivative” from the above step by the transposed inputs. Once we have the update matrix, we add it to our weights matrix, which moves each weight in the direction that reduces the error.
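A single weight update can be traced step by step. This is a sketch that uses hand-picked starting weights (a hypothetical choice, not the seeded values from the class) so that the effect of one step is easy to inspect.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 1, 0], [0, 1, 1], [0, 0, 0],
              [1, 0, 0], [1, 1, 1], [1, 0, 1]])
y = np.array([[0], [0], [0], [1], [1], [1]])

# Hypothetical starting weights, chosen by hand for this illustration.
weights = np.array([[0.2], [0.9], [0.2]])

prediction = sigmoid(X @ weights)                 # forward pass
error = y - prediction                            # y - y cap
delta = error * prediction * (1 - prediction)     # error-weighted derivative
update = X.T @ delta                              # shape (3, 1), one entry per weight

sse_before = np.sum(error ** 2)
weights = weights + update
sse_after = np.sum((y - sigmoid(X @ weights)) ** 2)

print(update.ravel())
print(sse_before, sse_after)   # for these starting weights the error goes down
```

The first weight (obesity) is pushed up and the second (physical activity) is pushed down, which matches the pattern in the table.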




In [7]:
#5
      

The time has come to train the neural network. During the training process, the neural network will “learn” which features in the input data correlate with its output, and it will learn to make accurate predictions. To train our neural network, we create a train_neuralnetwork function with the number of iterations set to 50,000 by default. This means the neural network will repeat the weight-updating process 50,000 times. Within the train function, we call our forward_propagation() function, then the back_propagation() function.
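To see the learning happen, the training loop can be written out with the mean absolute error recorded along the way. This sketch mirrors the class's update rule but uses 10,000 iterations instead of 50,000 to keep it quick; the history list and the logging interval are additions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 1, 0], [0, 1, 1], [0, 0, 0],
              [1, 0, 0], [1, 1, 1], [1, 0, 1]])
y = np.array([[0], [0], [0], [1], [1], [1]])

np.random.seed(5)
weights = np.random.rand(3, 1)

history = []                                   # mean absolute error, sampled over time
for iteration in range(10000):
    prediction = sigmoid(X @ weights)          # forward propagation
    error = y - prediction
    weights += X.T @ (error * prediction * (1 - prediction))   # backpropagation update
    if iteration % 2000 == 0:
        history.append(np.mean(np.abs(error)))

print(history)   # the recorded error is lower at the end than at the start
```

Because the bias is fixed at 0, the all-zero row is always predicted as 0.5, so the mean absolute error cannot drop all the way to zero on this dataset.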

In [8]:
#6                       
    

The prediction function looks similar to the forward propagation step. The forward propagation function essentially makes a prediction as well, then backpropagation checks for the error and updates the weights. Our predict function uses the same method as the feed-forward function: multiply the input matrix and the weights matrix, then feed the results through the sigmoid function to return a value between 0 and 1.

In [9]:
  
NN = NeuralNetwork(input_matrix, output_matrix)
NN.train_neuralnetwork()

We will create our NN object from the NeuralNetwork class and pass in the input matrix and the output matrix. We can then call the .train_neuralnetwork() function on our object.

In [10]:
                                   
first = np.array([[1, 1, 0]])
second = np.array([[0, 1, 1]])

                                   
print(NN.predict(first), ' ANSWER: ', first[0][0])
print(NN.predict(second), ' ANSWER: ', second[0][0])

Now we can create the two new examples that we want our neural network to make predictions for. We will call these “first” and “second”. We can then call the .predict() function and pass in the arrays. We can guess from our original table that the first number in the input determines the output. The first example, “first”, has a 1 in the first column, and therefore the output should be a 1. The second example has a 0 in the first column, and so the output should be a 0.
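The predictions are values between 0 and 1, so a threshold is needed to turn them into class labels. The sketch below retrains the same single-layer model in function form and applies a 0.5 cut-off. The cut-off and the shortened 10,000-iteration training run are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 1, 0], [0, 1, 1], [0, 0, 0],
              [1, 0, 0], [1, 1, 1], [1, 0, 1]])
y = np.array([[0], [0], [0], [1], [1], [1]])

np.random.seed(5)
weights = np.random.rand(3, 1)
for _ in range(10000):                       # same update rule as the class
    out = sigmoid(X @ weights)
    weights += X.T @ ((y - out) * out * (1 - out))

new_inputs = np.array([[1, 1, 0],            # "first"
                       [0, 1, 1]])           # "second"
probabilities = sigmoid(new_inputs @ weights)
labels = (probabilities >= 0.5).astype(int)  # assumed cut-off of 0.5

print(probabilities.ravel())
print(labels.ravel())
```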