## Note

It is usually good practice to apply the following rule-of-thumb formula to estimate the total number of hidden neurons (Nh) needed.

Nh = Ns / (α * (Ni + No))

where

Ni = number of input neurons.

No = number of output neurons.

Ns = number of samples in training data set.

α = an arbitrary scaling factor usually 2-10.


## Our Values

Ni = 8 (input features)

No = 1 (1 target variable)

Ns = 109 (rows)

α = 5 (a starting point; this number can be tuned)

so Nh = 109 / (5 * (8 + 1)) ≈ 2.4, i.e. roughly 2 to 3
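
As a quick check of the arithmetic above, the rule of thumb can be written as a small helper. This is only a sketch: the function name `hidden_neuron_estimate` is made up for illustration, and the values 109, 8 and 1 are the ones listed above.

```python
def hidden_neuron_estimate(n_samples, n_inputs, n_outputs, alpha):
    """Rule of thumb: Nh = Ns / (alpha * (Ni + No))."""
    return n_samples / (alpha * (n_inputs + n_outputs))

# Try the usual range of the scaling factor (2-10) with our values
for alpha in (2, 5, 10):
    nh = hidden_neuron_estimate(n_samples=109, n_inputs=8, n_outputs=1, alpha=alpha)
    print(f"alpha = {alpha:2d} -> Nh = {nh:.2f}")
```

Smaller values of α allow more hidden neurons, larger values fewer, so α works as a knob for model capacity relative to the amount of training data.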

In [8]:
import numpy as np
from keras.layers import Dense, Activation
from keras.models import Sequential
from sklearn.model_selection import train_test_split, KFold
import matplotlib.pyplot as plt
from keras.layers import Dropout



# Importing the dataset
dataset = np.genfromtxt('HS_Regents_Sat_Scores_2015.csv', delimiter=',')
X = dataset[1:, :-4]
Y = dataset[1:, -1]
print(Y)

Using TensorFlow backend.


ModuleNotFoundError: No module named 'tensorflow'

In [9]:
# Splitting the dataset into the Training set and Test set
#X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.08, random_state = 0)


counter = 1
d = {1:'first', 2:'second', 3:'third', 4:'fourth', 5:'fifth', 
     6: 'sixth', 7: 'seventh', 8: 'eighth', 9: 'ninth', 10: 'tenth' }

kf = KFold(n_splits=5)
for train, test in kf.split(X):
    
    X_train, X_test, Y_train, Y_test = X[train], X[test], Y[train], Y[test]
    
    # Feature Scaling
    from sklearn.preprocessing import StandardScaler
    sc = StandardScaler()
    X_train = sc.fit_transform(X_train)
    X_test = sc.transform(X_test)

    # Initialising the ANN
    model = Sequential()

    # Dropout on the input features: randomly zeroes 25% of the inputs at each training step
    model.add(Dropout(0.25))
    # Adding the first hidden layer (takes the 8 input features)
    model.add(Dense(32, activation = 'relu', input_dim = 8))

    # Adding the second hidden layer
    model.add(Dense(units = 32, activation = 'relu'))

    # Adding the third hidden layer
    model.add(Dense(units = 32, activation = 'relu'))

    # Adding the output layer

    model.add(Dense(units = 1))

    #model.add(Dense(1))
    # Compiling the ANN
    model.compile(optimizer = 'adam', loss = 'mean_squared_error')

    # Fitting the ANN to the Training set
    model.fit(X_train, Y_train, batch_size = 10, epochs = 100)
    
    print("The", d[counter], "neural network results:")
    counter+=1
    
    Y_pred = model.predict(X_test)
    plt.plot(Y_test, color = 'red', label = 'Real data')
    plt.plot(Y_pred, color = 'blue', label = 'Predicted data')
    plt.title('Prediction')
    plt.legend()
    plt.show()
    


NameError: name 'KFold' is not defined

## Using Dropout Regularization

https://www.analyticsvidhya.com/blog/2018/04/fundamentals-deep-learning-regularization-techniques/

Other regularizers available in Keras:
https://keras.io/regularizers/


Dropout is one of the most interesting regularization techniques. It often produces very good results and is consequently among the most frequently used regularization techniques in deep learning.

To understand dropout, consider a standard fully connected neural network.

So what does dropout do? At every training iteration, it randomly selects some nodes and removes them along with all of their incoming and outgoing connections.



So each iteration has a different set of nodes and this results in a different set of outputs. It can also be thought of as an ensemble technique in machine learning.

Ensemble models usually perform better than a single model because they average over many different sub-models. Similarly, a network trained with dropout often generalizes better than the same network trained without it.

The probability with which each node is dropped is the hyperparameter of the dropout function. Dropout can be applied to the hidden layers as well as to the input layer.

[Figure omitted. Source: chatbotslife]

For these reasons, dropout is usually preferred when we have a large neural network structure, in order to introduce more randomness.
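
The mechanism described above can be sketched in a few lines of NumPy. This is an illustrative stand-in, not the Keras implementation: the function name `dropout` and the toy activations are assumptions. It uses the "inverted" variant, which rescales the surviving units by 1 / (1 - rate) during training so that nothing has to change at prediction time; Keras's `Dropout` layer applies the same rescaling.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return activations            # dropout is switched off when predicting
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob   # True = unit is kept
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
h = np.ones((4, 8))                   # toy hidden-layer activations (4 samples, 8 units)

out_iter1 = dropout(h, rate=0.25, rng=rng)   # one training iteration
out_iter2 = dropout(h, rate=0.25, rng=rng)   # next iteration draws a new random mask
out_predict = dropout(h, rate=0.25, rng=rng, training=False)

print(out_iter1)
print(out_iter2)
```

Each call draws a fresh mask, which is why every training iteration effectively sees a different thinned network.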

In Keras, dropout is available as the `Dropout` layer. Below is the Python code for it:

from keras.models import Sequential
from keras.layers import Dense, Dropout

# input_num_units, hidden1_num_units and output_num_units must be defined beforehand
model = Sequential([
    Dense(units=hidden1_num_units, input_dim=input_num_units, activation='relu'),
    Dropout(0.25),  # drop 25% of the previous layer's activations during training
    # (intermediate hidden layers omitted in the source)
    Dense(units=output_num_units, activation='softmax'),
])

As you can see, we have defined 0.25 as the probability of dropping. We can tune it further for better results using the grid search method.
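
A grid search over the dropout rate simply trains one model per candidate rate and keeps the rate with the lowest validation error. The sketch below shows that loop with a tiny NumPy network so that it runs without TensorFlow. Everything in it is an assumption made for illustration: the helper `val_mse_for_rate`, the synthetic 109 x 8 data set, the candidate rates and the training settings. With Keras, the body of the helper would instead rebuild and fit the model with `Dropout(rate)`.

```python
import numpy as np

def val_mse_for_rate(rate, X_tr, y_tr, X_va, y_va,
                     hidden=16, epochs=300, lr=0.01, seed=0):
    """Train a one-hidden-layer ReLU regressor with dropout; return validation MSE."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.3, (X_tr.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.3, (hidden, 1))
    b2 = np.zeros(1)
    for _ in range(epochs):
        z = X_tr @ W1 + b1
        h = np.maximum(z, 0.0)
        # Inverted dropout mask on the hidden layer (training only)
        mask = (rng.random(h.shape) >= rate) / (1.0 - rate)
        hd = h * mask
        pred = hd @ W2 + b2                              # shape (n, 1)
        grad_pred = 2.0 * (pred - y_tr[:, None]) / len(y_tr)
        grad_h = (grad_pred @ W2.T) * mask * (z > 0)     # back through dropout and ReLU
        W2 -= lr * (hd.T @ grad_pred)
        b2 -= lr * grad_pred.sum(axis=0)
        W1 -= lr * (X_tr.T @ grad_h)
        b1 -= lr * grad_h.sum(axis=0)
    # No dropout at evaluation time
    val_pred = np.maximum(X_va @ W1 + b1, 0.0) @ W2 + b2
    return float(np.mean((val_pred.ravel() - y_va) ** 2))

# Synthetic stand-in data with the same shape as our data set (109 rows, 8 features)
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(109, 8))
y = X @ data_rng.normal(size=8) + 0.1 * data_rng.normal(size=109)
X_tr, X_va, y_tr, y_va = X[:87], X[87:], y[:87], y[87:]

# Grid search: one training run per candidate dropout rate
rates = [0.0, 0.1, 0.25, 0.5]
scores = {r: val_mse_for_rate(r, X_tr, y_tr, X_va, y_va) for r in rates}
best_rate = min(scores, key=scores.get)

for r in rates:
    print(f"dropout rate {r:.2f}: validation MSE {scores[r]:.4f}")
print("best rate:", best_rate)
```

With a data set this small, a single validation split is noisy, so averaging each rate's score over the KFold splits used earlier would likely give a more reliable choice.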