<a href="https://colab.research.google.com/github/SD0313/StartOnAI/blob/master/Neural_Net_From_Scratch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Neural Network from Scratch Tutorial in Python**
###### Created by **(Karthik Bhargav, Keshav Shah, Sauman Das)** for [StartOnAI](https://startonai.com/)
---


# Overview

We will cover the following topics in this notebook.


*   Theory of how perceptrons work and learn
*   Coded Walkthrough of a Neural Network
*   Model the Wisconsin Breast Cancer Dataset
*   Review Various Applications of Neural Networks

So stay tuned!



# What Are Neural Networks?


In simple terms, neural networks are loosely modeled on the human brain and are designed specifically to recognize patterns. They interpret data through various models, and the patterns these models detect are all numerical, specifically in the form of vectors.

Neural networks are extremely helpful for tasks involving clustering and classification. Because of a network's similarity to the human brain, it is able to recognize patterns even in unlabeled data.

We will start off by investigating the most basic neural network: **The Perceptron**

## Perceptrons

<img src="https://tinyurl.com/ybcfd78e" alt="perceptron" width="400"/>

[1]



The Perceptron consists of two main components:
1.   Neurons ($x_i$)
2.   Weights ($w_i$)

Perceptrons represent the most basic form of a Neural Network with only two layers, the input and output layer.  As shown in the diagram above, both layers are joined by weights represented by the arrows. Each individual neuron represents a number. For example, if there are three inputs, the input layer will consist of 3 neurons plus an additional bias neuron. The importance of the bias ($b$) will become clear later in this tutorial. The output layer simply consists of one neuron in this scenario which represents the number we are attempting to predict. 




**Forward Propagation**

The process of going from the input layer to the output is known as Forward Propagation. To simplify the computations, we will use vector notation to represent the input features and the weights.

  $\vec{x}=\begin{bmatrix}  x_1 & x_2 & ... & x_n\end{bmatrix}$


  $\vec{w}=\begin{bmatrix}  w_1 & w_2 & ... & w_n \end{bmatrix}$

  Finally, to get the value of the output neuron, we simply take the dot product of these two vectors and add the bias. 

  $z=\vec{x}\cdot\vec{w}+b=x_1\times w_1+x_2\times w_2+...+x_n\times w_n+b$
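The dot product above can be sketched in a few lines of NumPy. The feature values, weights, and bias here are made-up numbers chosen only for illustration; they do not come from the dataset used later.

```python
import numpy as np

# Illustrative values only (assumptions): three input features,
# one weight per feature, and a single bias term.
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.4, 0.3, -0.2])
b = 0.1

# z = x . w + b  (multiply matching entries, sum them, then add the bias)
z = np.dot(x, w) + b
print(z)  # approximately -0.4
```

Writing the sum as a single `np.dot` call keeps the code the same no matter how many input features there are.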






**The Bias Term**

To get a better understanding of this output, let's analyze it with just one input neuron. In other words, our output neuron will store the following.

$z=x_1\times w_1+b$

If we visualize this in two-dimensional space, it represents a line with slope $w_1$ and intercept $b$. We can now easily see the role of the bias. Without it, our model would always pass through the origin. With it, we can shift the line away from the origin, giving us more flexibility while training. However, we are still only able to represent linear models. To add non-linearities to our model, we use an activation function.
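A quick numerical check can make the role of the bias concrete. The slope `w1 = 1.5` and bias `2.0` below are arbitrary values assumed for this sketch.

```python
import numpy as np

x1 = np.array([-2.0, 0.0, 2.0])  # a few sample inputs
w1 = 1.5                         # assumed slope

z_no_bias = x1 * w1          # b = 0: the line passes through the origin
z_with_bias = x1 * w1 + 2.0  # b = 2: same slope, shifted up by 2

print(z_no_bias)
print(z_with_bias)
```

At `x1 = 0` the first line outputs 0 no matter what `w1` is, while the second outputs the bias itself.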



**Activation Functions**

Let's imagine that we are solving a binary classification problem. This means our output $\hat{y}$ (the predicted value) must lie in the range $(0, 1)$, since we are predicting the probability that the input belongs to a certain class. However, the range of a linear equation is $(-\infty, \infty)$. Therefore, we must apply some other function to satisfy this constraint. In binary classification problems, the most common activation function is the sigmoid function.

$\sigma(x)=\frac{1}{1+e^{-x}}$


<img src="https://tinyurl.com/ycggxehs" alt="sigmoid_graph" width="400"/>

As you can see in this graph, $\sigma(x)\in(0, 1)$. This activation function makes it possible to predict a probability for a binary output. As you go further into machine learning, you will see several other activation functions. The most common ones other than sigmoid are ReLU, tanh, and softmax.
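As a minimal sketch, the sigmoid formula translates directly into NumPy. The function name `sigmoid` and the sample inputs are choices made for this example.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid 1 / (1 + e^(-x)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-x))

# Large negative inputs map close to 0, large positive inputs close to 1,
# and an input of 0 maps to exactly 0.5.
values = sigmoid(np.array([-5.0, 0.0, 5.0]))
print(values)
```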


**The Output**

Now that we know all the parts of the perceptron, let's see how to get the final output. After forward propagation, we saw the output was

  $z=\vec{x}\cdot\vec{w}+b=x_1\times w_1+x_2\times w_2+...+x_n\times w_n+b$

Finally, we must apply the activation function to get our final output.

$\hat{y}=\sigma(z)$

That is all there is to getting the output from a perceptron! To sum it up in three simple steps:



1.   Get the dot product of the weights and the input features $(\vec{x}\cdot\vec{w})$.
2.   Add the bias $(\vec{x}\cdot\vec{w}+b)$.
3.   Apply the activation function and that is the predicted value $(\hat{y}=\sigma(\vec{x}\cdot\vec{w}+b))$!
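The three steps above can be sketched as one small function. The name `perceptron_output` and the sample inputs are hypothetical, chosen only for this illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def perceptron_output(x, w, b):
    """Forward pass of a single perceptron with a sigmoid activation."""
    dot = np.dot(x, w)   # step 1: dot product of inputs and weights
    z = dot + b          # step 2: add the bias
    return sigmoid(z)    # step 3: apply the activation function

# Illustrative values (assumptions): here x . w + b = 0.5 - 0.5 + 0 = 0
x = np.array([1.0, 2.0])
w = np.array([0.5, -0.25])
b = 0.0

y_hat = perceptron_output(x, w, b)
print(y_hat)  # sigmoid(0) = 0.5
```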

So far we know how to take the input values, and return the corresponding output. However, we must adjust the weights to make the network fit the training data. The process of making these adjustments is known as **back propagation**.



**Loss Function**

In order to adjust our weights, we must first find a way to quantify the accuracy of our prediction. In other words, we need to measure how close our predicted value is to the actual value. Several functions exist for this task; the most common loss function for binary problems is **Binary Cross-Entropy**.

$\mathcal{L}(y, \hat{y})=-(y\log(\hat{y}) + (1-y)\log(1-\hat{y}))$

Here $y$ is the actual value (0 or 1) and $\hat{y}$ is the predicted probability. Looking closer at this equation, we can see that the first term vanishes if $y=0$ and, similarly, the second term vanishes if $y=1$. Therefore, we can write the same equation as a piecewise function.

$\mathcal{L}(y, \hat{y})=\begin{cases}-\log(1-\hat{y}) & \text{if $y=0$} \\-\log(\hat{y}) & \text{if $y=1$}\end{cases}$

Keep in mind that $\hat{y}$ is a decimal value in the range $(0, 1)$. The $\log$ function returns a negative number for such values. As a result, we must take the negative of the log to return a positive value. 

To see why this function works as the error, try experimenting in the next code cell with different values of $y$ and $\hat{y}$ then analyze the corresponding loss function value.

In [None]:
import numpy as np
def binary_crossentropy(y, yhat):
  #code is derived from the piecewise function
  if y == 0:
    return -np.log(1.0-yhat)

  if y == 1:
    return -np.log(yhat)

y = 1 #@param [0, 1] {type:"raw"}
yhat = 0.47 #@param {type:"slider", min:0, max:1, step:0.01}

print(f'Loss: {binary_crossentropy(y, yhat)}')


Loss: 0.7550225842780328
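The single-formula version of binary cross-entropy given earlier can also be applied to whole arrays of labels and predictions at once. The sketch below is one possible way to do that. The function name `binary_crossentropy_vectorized` and the clipping constant `eps` are assumptions added here so that $\log(0)$ is never evaluated.

```python
import numpy as np

def binary_crossentropy_vectorized(y, yhat, eps=1e-12):
    """Element-wise binary cross-entropy for arrays of labels and predictions.

    eps is an assumed safeguard: predictions are clipped to [eps, 1 - eps]
    so that log(0) is never evaluated.
    """
    y = np.asarray(y, dtype=float)
    yhat = np.clip(np.asarray(yhat, dtype=float), eps, 1.0 - eps)
    return -(y * np.log(yhat) + (1.0 - y) * np.log(1.0 - yhat))

# Three example predictions: the confident, correct one (0.99 for y = 1)
# receives the smallest loss.
losses = binary_crossentropy_vectorized([1, 0, 1], [0.47, 0.47, 0.99])
mean_loss = losses.mean()
print(losses, mean_loss)
```

Averaging the per-example losses, as `mean_loss` does, gives a single number that summarizes how well a set of predictions matches its labels.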



![NN](https://drive.google.com/uc?export=view&id=1EHA2P4kLUQm_FkpYskyJ6QTSskjiaSeo)

[2]

Example of how a neural network can be visualized!


# Code

The following code builds a neural network from scratch for the Wisconsin Breast Cancer dataset.

## Imports

We begin by importing the libraries we need to build the network and to inspect how it behaves during training.

- What is the purpose of each library?

  - From the sklearn library, we import the breast cancer dataset.
  - We import matplotlib and pandas, which help us organize and visualize the data and outputs.
  - We use numpy to build the network itself, and tqdm to display a progress bar.

In [8]:
# Loading in the data
import sklearn
from sklearn.datasets import load_breast_cancer 

# Visualization
import matplotlib as mpl   
import matplotlib.pyplot as plt
import pandas as pd

# Building the network 
import numpy as np

# Progress Bar
from tqdm import tqdm


## Loading Dataset, Preprocessing, Visualizing data 

Adjust the slider to view different portions of the data.

In [9]:
data = load_breast_cancer() #read data from sklearn, documentation: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html
full_data = data.data #input features
full_target = data.target #labels
full_df = pd.DataFrame(full_data, columns=data.feature_names) #convert to a pandas DataFrame
full_df['target'] = full_target
start_index = 163 #@param {type:"slider", min:0, max:564, step:1}
full_df[start_index:start_index+5]

index,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,radius error,texture error,perimeter error,area error,smoothness error,compactness error,concavity error,concave points error,symmetry error,fractal dimension error,worst radius,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension,target
163,12.34,22.22,79.85,464.5,0.1012,0.1015,0.0537,0.02822,0.1551,0.06761,0.2949,1.656,1.955,21.55,0.01134,0.03175,0.03125,0.01135,0.01879,0.005348,13.58,28.68,87.36,553.0,0.1452,0.2338,0.1688,0.08194,0.2268,0.09082,1
164,23.27,22.04,152.1,1686.0,0.08439,0.1145,0.1324,0.09702,0.1801,0.05553,0.6642,0.8561,4.603,97.85,0.00491,0.02544,0.02822,0.01623,0.01956,0.00374,28.01,28.22,184.2,2403.0,0.1228,0.3583,0.3948,0.2346,0.3589,0.09187,0
165,14.97,19.76,95.5,690.2,0.08421,0.05352,0.01947,0.01939,0.1515,0.05266,0.184,1.065,1.286,16.64,0.003634,0.007983,0.008268,0.006432,0.01924,0.00152,15.98,25.82,102.3,782.1,0.1045,0.09995,0.0775,0.05754,0.2646,0.06085,1
166,10.8,9.71,68.77,357.6,0.09594,0.05736,0.02531,0.01698,0.1381,0.064,0.1728,0.4064,1.126,11.48,0.007809,0.009816,0.01099,0.005344,0.01254,0.00212,11.6,12.02,73.66,414.0,0.1436,0.1257,0.1047,0.04603,0.209,0.07699,1
167,16.78,18.8,109.3,886.3,0.08865,0.09182,0.08422,0.06576,0.1893,0.05534,0.599,1.391,4.129,67.34,0.006123,0.0247,0.02626,0.01604,0.02091,0.003493,20.05,26.3,130.7,1260.0,0.1168,0.2119,0.2318,0.1474,0.281,0.07228,0
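The columns above live on very different scales (for example, `mean area` is in the hundreds while `mean smoothness` is around 0.1), so a common preprocessing step is to standardize each feature and hold out part of the data for testing. The sketch below shows one possible way to do this with NumPy alone. The helper names `standardize` and `shuffle_split`, the 80/20 split, and the random seed are all assumptions, and a small synthetic array stands in for the real data. The same functions could be applied to `full_data` and `full_target`.

```python
import numpy as np

def shuffle_split(features, labels, test_fraction=0.2, seed=0):
    """Shuffle the rows, then split them into train and test portions."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(features))
    n_test = int(len(features) * test_fraction)
    test_idx, train_idx = order[:n_test], order[n_test:]
    return features[train_idx], features[test_idx], labels[train_idx], labels[test_idx]

def standardize(train, test):
    """Scale each feature to zero mean and unit variance.

    The statistics come from the training rows only, so no information
    from the test rows leaks into preprocessing.
    """
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    return (train - mean) / std, (test - mean) / std

# Synthetic stand-in data (assumption): 100 rows, 3 features, binary labels
demo_rng = np.random.default_rng(1)
demo_features = demo_rng.normal(loc=50.0, scale=10.0, size=(100, 3))
demo_labels = demo_rng.integers(0, 2, size=100)

x_train, x_test, y_train, y_test = shuffle_split(demo_features, demo_labels)
x_train, x_test = standardize(x_train, x_test)
print(x_train.shape, x_test.shape)
```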


In [None]:
# ReLU activation function for the hidden layers
def relu(x):
  return np.maximum(0, x)

# Derivative of ReLU, used during back propagation
# Works element-wise on both scalars and NumPy arrays
def relu_derivative(x):
  return np.where(x > 0, 1.0, 0.0)

# Binary CrossEntropy for the output layer
def binary_crossentropy(y, yhat):
  if y == 0:
    return -np.log(1.0-yhat)

  if y == 1:
    return -np.log(yhat)





## Forward/Back Prop

In [None]:
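# Placeholder: returns a fixed value; the forward pass is not implemented yet
def forward_prop(x):
  return 0.01

# Placeholder: returns a fixed value; back propagation is not implemented yet
def back_prop(x):
  return 0.99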

## Visualizing the Train/Test loss and Accuracy

# Applications

In this section, we will cover some applications of neural networks, including:


*   Medicine
*   Robotics
*   Finance
*   Understanding Natural Language




# References


[1] 

[2]

[3]

[4]

[5]