## Roi Polanitzer learning how to predict anything using ANNs.

## Before you begin

You can do this. Even if you don’t know the first thing about Neural Networks. Even if you’ve never written a line of code.
It’s a lot easier if you’re good at math. No hassle though, if you’re not. I go through each line of code and try to explain the intuition behind it as simply as I can. Of course, it always helps to have some background knowledge, so if you need more help, check out the links at the end of this tutorial.

To start, all you need is:
- A computer/machine (Windows/Mac/Linux); even Android might work
- Internet to download the requirements (you don’t need it for the actual model)

I would advise you to download Python 3.5+ and get anaconda Navigator if you don’t already have either installed.
https://www.python.org/downloads/
https://www.anaconda.com/distribution/
— Also, get-pip

Once you’re in the environment, pip install these packages:
- Keras — pip install Keras or use tf.keras with Tensorflow 2.0
- Sklearn — pip install -U scikit-learn
- Pandas — Comes with anaconda or pip install pandas
- Numpy — Comes with anaconda or pip install numpy
- Matplotlib —Comes with anaconda or pip install matplotlib

If you have all the above, you’re good to go. Once you have learned everything in this article, you will be able to build your own ANN in less than 10 minutes.
If you’re facing any problems, feel free to contact me or leave a comment below. 

# 1. Introduction

## Perceptron/Neuron Overlap

I created this project months ago when I was learning how various regression algorithms work within an Artificial Neural Network Model.
An artificial neural network is an attempt to simulate the network of neurons that make up a human brain so that a computer will be able to learn things and make decisions in a human-like manner.
Artificial neurons are designed to imitate the neurons in our brains, which have dendrites, a cell body, and an axon. So in essence, we are creating a model that lets machines learn from examples in a way loosely inspired by how humans do.

## Tensorflow Playground: Activation Functions

Artificial Neural Networks, or ANNs (as I will now call them), are the building blocks of deep learning. Deep learning is usually associated with networks that have an input layer with many nodes (equivalent to our senses that gather information), several hidden layers (which connect to the input layer and perform the computations that determine a probability or other value), and an output layer (something to predict). The output values can be continuous, as in this case, or they can be binary (1 or 0), probabilistic, or categorical.

The way ANNs work is that each connection from an input variable carries a weight. Each node in a hidden layer takes the weighted sum of its inputs, adds a bias, and then applies an activation function.
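
To make the weighted sum concrete, here is a minimal sketch of a single neuron in NumPy. The input values, weights, and bias are made-up numbers chosen only for illustration; they are not taken from the diamonds model.

```python
import numpy as np

# Hypothetical values for one neuron with three inputs (illustration only)
inputs = np.array([0.5, -1.0, 2.0])
weights = np.array([0.4, 0.3, -0.2])
bias = 0.1

# Weighted sum of the inputs plus the bias
z = np.dot(inputs, weights) + bias

# Rectifier (ReLU) activation: a negative sum is clipped to 0
output = max(0.0, z)

print("weighted sum:", z)
print("neuron output:", output)
```

Here the weighted sum is negative, so the rectifier outputs 0 and this neuron stays silent for this particular input.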

There are many types of activation functions:
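Threshold (binary output), Sigmoid (continuous, between 0 and 1), Rectifier (0 for negative inputs, linear for positive inputs), Hyperbolic Tangent (continuous, between -1 and 1), SoftMax (generalizes the Sigmoid to more than two output classes).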
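
The activation functions listed above can be sketched in a few lines of NumPy. These are plain textbook definitions written for illustration, not the implementations Keras uses internally.

```python
import numpy as np

def threshold(z):
    """Step function: 1 where z >= 0, otherwise 0."""
    return (np.asarray(z) >= 0).astype(float)

def sigmoid(z):
    """Squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))

def relu(z):
    """Rectifier: 0 for negative inputs, identity for positive inputs."""
    return np.maximum(0.0, z)

def tanh(z):
    """Hyperbolic tangent: squashes into the range (-1, 1)."""
    return np.tanh(z)

def softmax(z):
    """Turns a vector of scores into probabilities that sum to 1."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

z = np.array([-2.0, 0.0, 3.0])
for fn in (threshold, sigmoid, relu, tanh, softmax):
    print(fn.__name__, fn(z))
```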

You can play around with activation functions in the TensorFlow Playground. 

In this ANN model that we’ll be looking at, I used the Rectifier function throughout. Here’s the intuition:

Since my output variable (the diamond price) is continuous and never negative, I use the Rectifier function in my hidden layers and again in the output layer. The Sigmoid function would be the usual choice for the output layer if the target were binary, since it gives the probability of whether the output will be 1 or 0.
The difference between the actual value and the predicted value is measured by a cost function (error).

The goal is to minimize the loss function (cost) since this brings the predicted value closer to the actual value. This is done by changing the weights of the network. Trying every combination of weights to find the global minimum of the cost function would take an enormous amount of time and computational power, so it makes sense to use a gradient descent approach to make this process much faster.

## Gradient Descent

Gradient descent uses the slope of the loss function at a certain point and moves downwards to find the lowest point of the function. However, if my function is not convex (which is typical for a network with many weights), I could end up at a local minimum rather than the global minimum of the function, and the network wouldn’t perform as well.

Therefore, I use the stochastic gradient descent method, which updates the weights after each row (or each small batch of rows) instead of after a pass over the whole dataset. The noise in these updates can help the search escape a local minimum, so I have a better chance of finding the global minimum. It is also usually faster than batch gradient descent, since each update needs only a small part of the data.
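
The difference between the two approaches can be sketched on a toy problem. The data below is synthetic (one feature, a true slope of 3), and the learning rates and epoch counts are arbitrary choices for illustration, not settings from this tutorial’s model.

```python
import numpy as np

# Synthetic data: y is roughly 3 * x (an assumption made for this sketch)
data_rng = np.random.default_rng(0)
x = data_rng.uniform(-1, 1, size=200)
y = 3.0 * x + data_rng.normal(0, 0.1, size=200)

def batch_gradient_descent(x, y, lr=0.1, epochs=200):
    """One weight update per pass, using the gradient over all rows."""
    w = 0.0
    for _ in range(epochs):
        grad = np.mean(2 * (w * x - y) * x)  # slope of the mean squared error
        w -= lr * grad
    return w

def stochastic_gradient_descent(x, y, lr=0.05, epochs=20, seed=1):
    """One weight update per row, visiting the rows in random order."""
    rng = np.random.default_rng(seed)
    w = 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(x)):
            grad = 2 * (w * x[i] - y[i]) * x[i]  # slope for a single row
            w -= lr * grad
    return w

w_batch = batch_gradient_descent(x, y)
w_sgd = stochastic_gradient_descent(x, y)
print("batch gradient descent slope:", w_batch)
print("stochastic gradient descent slope:", w_sgd)
```

Both versions should land near the true slope; the stochastic version gets there with far fewer passes over the data, at the cost of a slightly noisier final value.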

# 2. Importing Data and Preprocessing

In [1]:
# Importing the libraries
import pandas as pd
import numpy as np

In [2]:
df=pd.read_csv('diamonds.csv')

In [3]:
df.head()

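| Unnamed: 0 | carat | cut | color | clarity | depth | table | price | x | y | z |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.23 | Ideal | E | SI2 | 61.5 | 55.0 | 326 | 3.95 | 3.98 | 2.43 |
| 1 | 0.21 | Premium | E | SI1 | 59.8 | 61.0 | 326 | 3.89 | 3.84 | 2.31 |
| 2 | 0.23 | Good | E | VS1 | 56.9 | 65.0 | 327 | 4.05 | 4.07 | 2.31 |
| 3 | 0.29 | Premium | I | VS2 | 62.4 | 58.0 | 334 | 4.2 | 4.23 | 2.63 |
| 4 | 0.31 | Good | J | SI2 | 63.3 | 58.0 | 335 | 4.34 | 4.35 | 2.75 |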


In [4]:
X = df[['carat','cut','color','clarity','depth','table','x','y','z']]

In [5]:
y = df['price']

## 2.1. Encoding categorical variables

In [6]:
# Encoding the Independent Variables
from sklearn.preprocessing import LabelEncoder
# Work on an explicit copy so pandas does not raise SettingWithCopyWarning
X = X.copy()
# Converting string labels into numbers, with a fresh encoder for each column
for col in ['cut', 'color', 'clarity']:
    X[col] = LabelEncoder().fit_transform(X[col])


In [7]:
X.head()

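| Unnamed: 0 | carat | cut | color | clarity | depth | table | x | y | z |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.23 | 2 | 1 | 3 | 61.5 | 55.0 | 3.95 | 3.98 | 2.43 |
| 1 | 0.21 | 3 | 1 | 2 | 59.8 | 61.0 | 3.89 | 3.84 | 2.31 |
| 2 | 0.23 | 1 | 1 | 4 | 56.9 | 65.0 | 4.05 | 4.07 | 2.31 |
| 3 | 0.29 | 3 | 5 | 5 | 62.4 | 58.0 | 4.2 | 4.23 | 2.63 |
| 4 | 0.31 | 1 | 6 | 3 | 63.3 | 58.0 | 4.34 | 4.35 | 2.75 |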
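
One thing worth knowing about the codes above: LabelEncoder numbers the labels in sorted (alphabetical) order, which is why Ideal became 2 and Premium became 3. The sketch below reproduces that mapping in plain Python and contrasts it with a quality-ordered mapping. The quality order used here (Fair < Good < Very Good < Premium < Ideal) is the usual grading for this dataset, but the ordinal mapping itself is a hypothetical alternative and is not what this tutorial’s model was trained on.

```python
# The five cut grades that appear in the diamonds dataset
cuts = ["Ideal", "Premium", "Good", "Very Good", "Fair"]

# Mapping equivalent to LabelEncoder: codes follow sorted (alphabetical) order
alphabetical_codes = {label: code for code, label in enumerate(sorted(cuts))}

# Hypothetical alternative: codes follow the quality of the cut, worst to best
cut_quality_order = ["Fair", "Good", "Very Good", "Premium", "Ideal"]
ordinal_codes = {label: code for code, label in enumerate(cut_quality_order)}

print("alphabetical:", alphabetical_codes)
print("by quality:  ", ordinal_codes)
```

With the alphabetical codes, "Very Good" ends up with the largest number even though it is a mid-range grade, so a quality-ordered mapping may give the network a more meaningful input.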


## 2.2. Splitting the Data into Train and Test sets

In [8]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.33, random_state = 42)

In [9]:
X_train.shape #shape of X_train

(36139, 9)

In [10]:
X_test.shape #shape of X_test

(17801, 9)

In [11]:
y_train.shape #shape of y_train

(36139,)

In [12]:
y_test.shape #shape of y_test

(17801,)
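
The shapes above follow directly from test_size = 0.33. The arithmetic below is a small sketch, assuming scikit-learn rounds the number of test rows up and gives the remainder to the training set; that assumption reproduces the shapes shown above.

```python
import math

n_samples = 36139 + 17801   # total number of rows, from the shapes above
test_size = 0.33

# Assumed rounding rule: test rows are rounded up, the rest go to training
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print("rows in total:", n_samples)
print("test rows:", n_test)
print("train rows:", n_train)
```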

## 2.3. Feature Scaling

In [13]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
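
What the scaler does can be sketched by hand with NumPy. The small arrays below are made-up values for illustration. The key point is that the mean and standard deviation are computed on the training set only and then reused on the test set, which is why the code above calls fit_transform on X_train but only transform on X_test.

```python
import numpy as np

# Made-up training and test data with two features on very different scales
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
test = np.array([[4.0, 40.0]])

# Statistics come from the training set only
mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation, as StandardScaler uses

# The same mean and std are applied to both sets
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled)
print(test_scaled)
```

After scaling, every training column has mean 0 and standard deviation 1, so no single feature dominates the weighted sums just because of its units.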

# 3. Importing Keras and Libraries for the ANN

In this next section, I’m going to import Keras, which is the most important package I need for my ANN. I’ll use two classes that will help us define the ANN throughout the next section: Sequential and Dense. I won’t go into much detail about these classes, as the code will show what they are doing.

Note: This is an old version of Keras that runs with backend Tensorflow 1. So if you installed Tensorflow 2, you can either downgrade or see the documentation to learn about any changes.

In [14]:
import keras
from keras.models import Sequential
from keras.layers import Dense

Using TensorFlow backend.


# 4. Building the ANN

The following steps will show how I went about building this artificial neural network:

1. I will start by initializing the ANN using the Sequential class.
2. I will add the input layer along with the first hidden layer.
3. I will add a second hidden layer.
4. I will add the output layer.
5. After adding the layers, I will compile the ANN model.
6. I will fit the ANN model to the training set. The model will then train itself for the number of epochs I specify.
7. Evaluation: I will create a variable holding the model’s predictions and compare them with the actual results.

## 4.1. How to mathematically create an ANN

1. Randomly initialize the weights to small numbers close to 0 (but not 0).
2. Input the first observation of your dataset in the input layer, each feature in one input node.
3. Forward-Propagation: from left to right, the neurons are activated in a way that the impact of each neuron’s activation is limited by the weights. Propagate the activations until getting the predicted result y.
4. Compare the predicted result to the actual result. Measure the generated error.
5. Back-Propagation: from right to left, the error is back-propagated. Update the weights according to how much they are responsible for the error. The learning rate decides by how much we update the weights.
6. Repeat Steps 2 to 5 and update the weights after each observation (Online Learning). Or: repeat Steps 2 to 5 but update the weights only after a batch of observations (Batch Learning).
7. When the whole training set has passed through the ANN, that makes an epoch. Redo more epochs.
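
The steps above can be sketched from scratch in NumPy. This is a toy network with one hidden layer trained on synthetic data; the data, layer sizes, learning rate, and number of epochs are all assumptions chosen to keep the example short, and it is not the Keras model built in the rest of this tutorial.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (illustration only): 200 observations, 3 features
X = rng.normal(size=(200, 3))
y = (X @ np.array([1.5, -2.0, 0.5]) + 1.0).reshape(-1, 1)

# Step 1: initialize the weights to small random numbers close to 0
W1 = rng.uniform(-0.5, 0.5, size=(3, 4))
b1 = np.zeros((1, 4))
W2 = rng.uniform(-0.5, 0.5, size=(4, 1))
b2 = np.zeros((1, 1))

lr, batch_size, epochs = 0.01, 10, 50
losses = []

for epoch in range(epochs):                        # Step 7: repeat for more epochs
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):     # Step 6: update after each batch
        idx = order[start:start + batch_size]
        xb, yb = X[idx], y[idx]                    # Step 2: feed observations in

        # Step 3: forward propagation
        h_pre = xb @ W1 + b1
        h = np.maximum(0.0, h_pre)                 # rectifier in the hidden layer
        pred = h @ W2 + b2

        # Step 4: compare the prediction with the actual result
        err = pred - yb

        # Step 5: back-propagate the error (gradient of the mean squared error)
        grad_pred = 2 * err / len(xb)
        grad_W2 = h.T @ grad_pred
        grad_b2 = grad_pred.sum(axis=0, keepdims=True)
        grad_h_pre = (grad_pred @ W2.T) * (h_pre > 0)
        grad_W1 = xb.T @ grad_h_pre
        grad_b1 = grad_h_pre.sum(axis=0, keepdims=True)

        # The learning rate decides how far each weight moves
        W2 -= lr * grad_W2
        b2 -= lr * grad_b2
        W1 -= lr * grad_W1
        b1 -= lr * grad_b1

    # Track the loss on the full dataset once per epoch
    full_pred = np.maximum(0.0, X @ W1 + b1) @ W2 + b2
    losses.append(float(np.mean((full_pred - y) ** 2)))

print("loss after first epoch:", losses[0])
print("loss after last epoch: ", losses[-1])
```

Keras does all of this for us behind the add, compile, and fit calls that follow, which is why the rest of the section needs only a handful of lines.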

## 4.2. Initialization

So there are actually 2 ways of initializing a model: either as a sequence of layers, as I do below, or as a graph. The step below initializes the model as a sequence of layers.
I create the object, which is basically the Artificial Neural Network that I’m about to build.

In [15]:
#Initializing the Artificial Neural Network
regressor = Sequential()

## 4.3. Adding the input layer and the first hidden layer

In the steps below, I use the add method of the regressor object to add a Dense layer to it. Dense is essentially what allows us to create the layers for the model.

Now, upon inspecting the Dense class, I can see there are a number of parameters, but as the mathematical steps above show us, I know already which parameters to input for the model.

So I will use the following for the input layer and the first hidden layer:

1. units (called output_dim in older Keras versions):
This is simply the number of nodes I want to add in the hidden layer. I had previously learned that there is no right answer to this, as experimentation is what allows us to choose the right number of nodes. A common starting point is the average of the number of input and output nodes, (9 + 1)/2 = 5; in this project, I use 6 nodes in the first hidden layer and 5 in the second.

2. kernel_initializer (called init in older Keras versions):
This is the first step of the stochastic gradient descent. I need to initialize the weights to small numbers close to 0. The default value for this parameter is "glorot_uniform", which initializes the weights according to a uniform distribution whose range depends on the size of the layer, and that is what I use here.

3. activation:
As the name suggests, this is the activation function. In the first hidden layer, I want to use the rectifier activation function, as I mentioned in the introduction, and that’s why I input relu in this parameter.

4. input_dim (input dimensions):
This is the number of nodes in the input layer, which I already know is 9.
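
The code in this tutorial passes "glorot_uniform" as the initializer. As a rough sketch of what that means, the NumPy function below draws weights from a uniform distribution between -limit and +limit, where limit = sqrt(6 / (fan_in + fan_out)). The function name and the random seed are my own choices for illustration; Keras has its own implementation.

```python
import numpy as np

def glorot_uniform_sketch(fan_in, fan_out, rng):
    """Draw a (fan_in, fan_out) weight matrix the way Glorot uniform describes."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

rng = np.random.default_rng(0)

# First hidden layer of this tutorial: 9 inputs feeding 6 nodes
W = glorot_uniform_sketch(9, 6, rng)
limit = np.sqrt(6.0 / (9 + 6))

print("weight matrix shape:", W.shape)
print("largest absolute weight:", np.abs(W).max(), "limit:", limit)
```

The wider the layer, the smaller the limit, which keeps the weighted sums from growing as more inputs are added together.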

In [16]:
#Adding the input layer and a hidden layer
regressor.add(Dense(6, activation='relu', kernel_initializer='glorot_uniform',input_dim=9))

## 4.4. Adding the second hidden layer

For this hidden layer, I use the add method on the regressor object again.
Using the Dense class, I have a similar line of code, but the only difference is that this time there’s no need to specify the number of inputs, since the model already knows how many to expect from the previous layer.

In [17]:
#Adding second hidden layer
regressor.add(Dense(5, activation='relu', kernel_initializer='glorot_uniform'))

## 4.5. Adding the output layer

The final layer that we need to code into the model is the output layer. This process will again use the same add method with the Dense class.

However, this time the number of nodes is changed to 1, since there is only one output variable (the price). I keep relu as the activation here, which stops the predicted price from going below 0.

In [18]:
#Adding output layer
regressor.add(Dense(1, kernel_initializer='glorot_uniform', activation='relu'))
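
With all three Dense layers added, it can help to count how many weights the network has to learn. Each Dense layer has one weight per input-output pair plus one bias per node. The helper below is a small sketch written for this tutorial; the total should match what the model summary reports for this architecture.

```python
def count_dense_parameters(layer_sizes):
    """Count weights and biases for a stack of fully connected layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weights plus one bias per node
    return total

# 9 inputs -> 6 nodes -> 5 nodes -> 1 output, as built above
layer_sizes = [9, 6, 5, 1]
n_parameters = count_dense_parameters(layer_sizes)
print("trainable parameters:", n_parameters)
```

That is 60 parameters in the first hidden layer, 35 in the second, and 6 in the output layer.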

## 4.6. Compiling the ANN model

This time I use the compile method on the regressor object and I input the following parameters:

1. Optimizer: This is the algorithm I want to use to find the optimal set of weights for the ANN model. The model’s layers have been built, but the weights have only been initialized. Therefore, it is important to use an optimizer to find the right combination of weights. rmsprop is a variant of the stochastic gradient descent algorithm, and that is the one I will use to find the optimal set of weights for this model.

2. Loss: This corresponds to the loss function within the stochastic gradient descent algorithm. The basic idea is that we need to minimize this loss function to find the optimal weights. For example, in linear regression, I use the sum-of-squares loss function to optimize the model. Here, since we have a continuous output layer, I use a custom function named root_mean_squared_error, which takes the square root of the mean squared error.

3. Metrics: Just the criterion I use to monitor the model during training. Accuracy (correct predictions over total predictions) only makes sense for classification, so for this continuous target I input 'mae' (mean absolute error) in the metrics parameter. Since this is expecting a list, I have to put it in square brackets.

In [19]:
#Compiling the artificial neural network
from keras.losses import mean_squared_error
from keras import backend as K

def root_mean_squared_error(y_true, y_pred):
    # Square root of the mean squared error computed by Keras
    return K.sqrt(mean_squared_error(y_true, y_pred))

regressor.compile(optimizer = "rmsprop", loss = root_mean_squared_error, metrics =["mae"])
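
A subtlety about the custom loss is worth a quick numerical check. The sketch below uses NumPy and made-up numbers, and it assumes (as the Keras documentation describes) that mean_squared_error averages over the last axis only and returns one value per sample, which Keras then averages over the batch. Under that assumption, with a single output node the loss as written behaves like the mean absolute error rather than a root mean squared error over the whole batch.

```python
import numpy as np

# Made-up targets and predictions for a batch of two samples, one output each
y_true = np.array([[10.0], [20.0]])
y_pred = np.array([[13.0], [16.0]])

# Assumed Keras behaviour: mean over the last axis gives one MSE per sample
per_sample_mse = np.mean((y_pred - y_true) ** 2, axis=-1)

# Square root per sample, then the average over the batch
loss_as_written = np.mean(np.sqrt(per_sample_mse))

# For comparison: RMSE over the whole batch, and the mean absolute error
batch_rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))
mae = np.mean(np.abs(y_pred - y_true))

print("loss as written:", loss_as_written)
print("batch RMSE:     ", batch_rmse)
print("MAE:            ", mae)
```

If a batch-level RMSE is what you want, one option is to take the mean of the squared errors over the whole batch before applying the square root.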

## 4.7. Fitting the ANN model to the training set

Now I will fit the model to the training dataset and train it for a certain number of epochs.

I start by using the fit method to fit the regressor model to X_train and y_train. Then, I add two more parameters, which are the batch size and the number of epochs. If you look back at the beginning of this section, steps 6 and 7 refer to these parameters.

In step 6, we can choose to update the weights after every observation or every batch. So for this step, I’ll use batches of 10 to update the weights.

Step 7 tells us that we need to pass the whole training set through the ANN for more than just 1 epoch. An epoch refers to one round of the entire dataset going through the ANN. I chose 90 epochs for this, as choosing these values is an experimental process.
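
To get a feel for how much work these two settings imply, the short calculation below counts the weight updates for a training set of 36139 rows, a batch size of 10, and 90 epochs. It is only arithmetic and does not touch the model.

```python
import math

n_train = 36139    # rows in X_train
batch_size = 10
epochs = 90

# The last batch of each epoch is smaller when the rows do not divide evenly
updates_per_epoch = math.ceil(n_train / batch_size)
last_batch_rows = n_train - (updates_per_epoch - 1) * batch_size
total_updates = updates_per_epoch * epochs

print("weight updates per epoch:", updates_per_epoch)
print("rows in the last batch:", last_batch_rows)
print("weight updates in total:", total_updates)
```

A larger batch size would mean fewer, smoother updates per epoch; a smaller one would mean more, noisier updates.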

In [20]:
#Fitting artifical neural network to the training set
regressor.fit(X_train, y_train, batch_size = 10, epochs = 90)

<keras.callbacks.callbacks.History at 0x181640be588>

# 5. Predicting Results & Evaluating the Model

Now that the model has been trained, I will create a variable, y_pred, to store the machine’s predictions. For this, I use the predict method on the X_test dataset to get values corresponding to y_test.

In [21]:
# Predicting the Test set results
y_pred = regressor.predict(X_test)
y_pred[:10]

array([[  432.82385],
       [ 2126.3196 ],
       [ 1311.3177 ],
       [ 1460.663  ],
       [11893.414  ],
       [ 4820.8013 ],
       [ 1542.7994 ],
       [ 1812.9982 ],
       [ 2230.1882 ],
       [ 5264.267  ]], dtype=float32)

## 5.1. The R², MSE, RMSE and MAE of the model.

In [22]:
#R² (coefficient of determination)
from sklearn.metrics import r2_score
r2 = r2_score(y_test, y_pred)
r2

0.9186743904364041

In [23]:
#RMSE
from sklearn.metrics import mean_squared_error
MSE = mean_squared_error(y_test, y_pred)
RMSE = np.sqrt(MSE)
RMSE

1126.8183118546062

In [24]:
#MAE
from sklearn.metrics import mean_absolute_error
MAE = mean_absolute_error(y_test, y_pred)
MAE

566.2885053073293
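
To see what these scores measure, here is a sketch that computes them by hand with NumPy. The four prices below are made-up numbers for illustration, not predictions from the model above.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE and R² from scratch."""
    # Flatten both inputs: Keras predictions have shape (n, 1), the targets (n,),
    # and subtracting those shapes directly would broadcast to an (n, n) matrix
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()

    errors = y_pred - y_true
    mse = np.mean(errors ** 2)
    rmse = np.sqrt(mse)
    mae = np.mean(np.abs(errors))
    # R²: 1 minus (unexplained variation / total variation around the mean)
    r2 = 1.0 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"mse": mse, "rmse": rmse, "mae": mae, "r2": r2}

# Made-up actual and predicted prices
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [[110.0], [190.0], [320.0], [380.0]]

scores = regression_metrics(actual, predicted)
print(scores)
```

RMSE and MAE are in the same units as the price, so they read as a typical error in dollars, while R² is the share of the price variation that the model explains. RMSE punishes large errors more heavily than MAE, which is why it comes out higher than MAE in the results above.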