## ANN Regressor

An ANN regressor is a type of artificial neural network (ANN) that is used for regression tasks.

An ANN is composed of layers of interconnected "neurons", which process and transmit information through the network. Each neuron receives inputs from the previous layer, computes a weighted sum of them plus a bias, applies an activation function, and passes the result on to the next layer. The output layer of the ANN regressor produces the prediction for the target value; in this case, the sale price in the Ames Housing dataset.
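
A minimal NumPy sketch of the forward pass just described, assuming a toy network with 3 inputs, 2 ReLU hidden neurons and 1 linear output. All weights are made-up numbers, not values from the trained model.

```python
import numpy as np

def relu(z):
    # ReLU: keeps positive values, replaces negatives with 0
    return np.maximum(0.0, z)

def forward(x, W1, b1, W2, b2):
    """Forward pass of a toy regressor: one ReLU hidden layer, one linear output."""
    h = relu(x @ W1 + b1)   # hidden layer: weighted sum + bias, then activation
    return h @ W2 + b2      # output layer: weighted sum + bias, no activation (linear)

# Toy example: 3 input features, 2 hidden neurons, 1 output (illustrative numbers only)
x = np.array([[0.5, 0.2, 0.8]])
W1 = np.array([[1.0, -1.0],
               [0.5,  0.5],
               [-0.5, 1.0]])
b1 = np.array([0.1, 0.0])
W2 = np.array([[2.0],
               [1.0]])
b2 = np.array([0.3])

y_hat = forward(x, W1, b1, W2, b2)
print(y_hat)  # approximately [[1.3]]
```

Here the hidden activations are about [0.3, 0.4], and the output neuron combines them into a single predicted value.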

The weights and biases of the neurons are adjusted during training, using an optimization algorithm such as Adam.
The goal of training is to find a set of weights and biases that minimizes the error between the predicted output and the true output values in the training data.
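
To make "minimizing the error" concrete, the sketch below fits a single linear neuron to made-up data with plain gradient descent on the mean squared error. It is deliberately simpler than Adam and than the loss used later in this notebook; the data (y = 2x + 1), learning rate and number of steps are illustrative assumptions.

```python
import numpy as np

# Toy data generated from y = 2x + 1 (illustrative only)
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 1.0

w, b = 0.0, 0.0          # weight and bias of a single linear neuron
learning_rate = 0.5

for step in range(2000):
    y_pred = w * x + b
    error = y_pred - y
    loss = np.mean(error ** 2)          # mean squared error
    grad_w = 2.0 * np.mean(error * x)   # dLoss/dw
    grad_b = 2.0 * np.mean(error)       # dLoss/db
    w -= learning_rate * grad_w         # step against the gradient
    b -= learning_rate * grad_b

print(w, b, loss)  # w close to 2, b close to 1, loss close to 0
```

An optimizer such as Adam replaces the fixed-size update in the last two lines of the loop with an adaptive one, but the goal is the same.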

Once trained, the ANN regressor can be used to make predictions on new input data by passing it through the network and using the trained weights and biases to calculate the predicted output value.


In [59]:
from sklearn.model_selection import train_test_split
from tensorflow.keras import models
from tensorflow.keras import layers
from tensorflow.keras import optimizers
import pandas as pd
from tensorflow_addons.metrics import RSquare
import tensorflow as tf

X_train = pd.read_csv('x_train_preprocessed_minmax.csv')
X_test = pd.read_csv('x_test_preprocessed_minmax.csv')
y_train = pd.read_csv('y_train_preprocessed_minmax.csv')
y_test = pd.read_csv('y_test_preprocessed_minmax.csv')

# Flatten y_train to a 1-D array of shape (n_samples,); this worked better in our runs
y_train = y_train.to_numpy().flatten()

# One-hot-encoded neighborhood columns (their names contain 'Neighborhood_b')
oh_neighbor = [col for col in X_train.columns if 'Neighborhood_b' in col]

porch = ['Wood_Deck_SF', 'Open_Porch_SF', 'Enclosed_Porch', 'Three_season_porch', 'Screen_Porch']
surface = ['Total_Finished_Bsmt_SF', 'First_Flr_SF', 'Second_Flr_SF', 'Garage_Area']
baths = ['Full_Bath', 'Half_Bath', 'Bsmt_Full_Bath', 'Bsmt_Half_Bath']

# Drop the same columns from both splits so that their features stay aligned
to_drop = oh_neighbor + porch + surface + baths
X_train.drop(columns=to_drop, inplace=True)
X_test.drop(columns=to_drop, inplace=True)

# Hold out 40% of the training data as a validation set
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.40, random_state=12)




The encoded neighborhood features (columns containing `Neighborhood_b`), together with the individual porch, surface-area and bathroom columns, are removed; this significantly reduces the loss.
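
A small pandas sketch of the same column removal on a hypothetical toy frame (the column names mimic the notebook's, the values are invented). It also shows that `DataFrame.filter(like=...)` selects the same neighborhood columns as the substring test used above.

```python
import pandas as pd

# Hypothetical toy frame: names mimic the notebook's columns, values are made up
df = pd.DataFrame({
    "Neighborhood_b_0": [0, 1],
    "Neighborhood_b_1": [1, 0],
    "Wood_Deck_SF": [0.1, 0.3],
    "Open_Porch_SF": [0.2, 0.0],
    "Lot_Area": [0.5, 0.7],
})

porch = ["Wood_Deck_SF", "Open_Porch_SF"]  # subset of the notebook's porch list

# Every column whose name contains the substring (same test as in the notebook)
oh_neighbor = [col for col in df.columns if "Neighborhood_b" in col]

# Equivalent selection with filter(like=...), which matches substrings in column labels
same_cols = list(df.filter(like="Neighborhood_b").columns)

reduced = df.drop(columns=oh_neighbor + porch)  # returns a new frame (no inplace)
print(list(reduced.columns))  # ['Lot_Area']
```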

## Network initialization and setup

The first `Dense` layer (the "input" layer) has one unit per feature (188) and is followed by a single hidden layer with roughly half as many units (95).
The output layer has a single unit because only the sale price has to be predicted.
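
As a sanity check on this architecture, the number of trainable parameters can be counted by hand: a `Dense` layer with `n_in` inputs and `n_units` units has `n_in * n_units` weights plus `n_units` biases. The sketch below uses the widths from the next code cell (188 features, then 188, 95 and 1 units); the totals should agree with what `model.summary()` reports, although that output is not reproduced here.

```python
def dense_params(n_in, n_units):
    # one weight per (input, unit) pair plus one bias per unit
    return n_in * n_units + n_units

# Widths from the notebook: 188 input features -> 188 -> 95 -> 1
widths = [188, 188, 95, 1]
per_layer = [dense_params(a, b) for a, b in zip(widths[:-1], widths[1:])]
print(per_layer, sum(per_layer))  # [35532, 17955, 96] 53583
```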

For the input and hidden layers the activation function is ReLU, chosen because it is cheap to compute and tends to train quickly.
The output layer instead uses a linear activation (`activation=None`), so it can output an unrestricted real value.
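
A short NumPy illustration of the two activations on made-up pre-activation values: ReLU clips negative values to zero, while the linear activation passes every value through unchanged, so the output layer is not restricted to non-negative numbers.

```python
import numpy as np

z = np.array([-2.0, -0.5, 0.0, 1.5])   # made-up pre-activation values of four neurons

relu_out = np.maximum(0.0, z)   # ReLU: negative values become 0
linear_out = z                  # linear (identity): values pass through unchanged

print(relu_out)
print(linear_out)
```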

The kernel initializer is always `'normal'`, so the initial weights are drawn from a normal distribution.
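
In `tf.keras`, the string `'normal'` resolves, to our knowledge, to the `RandomNormal` initializer, whose defaults are mean 0.0 and standard deviation 0.05. The NumPy sketch below draws weights from that distribution for a 188 → 95 layer; it mimics the distribution only and does not reproduce Keras's random numbers.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Assumption: 'normal' = RandomNormal(mean=0.0, stddev=0.05), the Keras defaults
mean, stddev = 0.0, 0.05
W = rng.normal(loc=mean, scale=stddev, size=(188, 95))  # kernel of a 188 -> 95 Dense layer
b = np.zeros(95)  # Dense biases start at zero by default ('zeros' bias initializer)

print(W.shape, W.mean(), W.std())  # sample mean near 0, sample std near 0.05
```
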
- Parameters of layers
    1) units = number of neurons in the layer
    2) kernel_initializer = distribution used to initialize the layer's weights
    3) activation = function applied to the weighted sum of the inputs (plus bias); it adds the non-linearity that determines how strongly each neuron responds
    4) input_shape = number of features in the dataset (only needed on the first layer)
- Parameters of optimizer
    1) learning_rate = step size at which the algorithm updates the model, i.e. how fast the model learns; a lower learning rate requires more epochs
    2) beta_1 = exponential decay rate of the running average of the gradients (the momentum term in Adam); a larger value puts more emphasis on past gradients
    3) beta_2 = exponential decay rate of the running average of the squared gradients, which scales the step size of each weight
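
The roles of `learning_rate`, `beta_1` and `beta_2` are easiest to see in the update rule itself. Below is a sketch of the textbook Adam update (Kingma & Ba, 2015) on a one-dimensional toy problem. `epsilon=1e-7` matches the Keras default, but Keras places epsilon slightly differently ("epsilon hat" in the paper), and the learning rate of 0.1 is chosen for the toy problem, not taken from the notebook.

```python
import numpy as np

def adam_minimize(grad, theta0, learning_rate=0.1, beta_1=0.9, beta_2=0.999,
                  epsilon=1e-7, steps=1000):
    """Textbook Adam on a single scalar parameter (a sketch, not the Keras source)."""
    theta = theta0
    m = 0.0  # running average of the gradients (first moment)
    v = 0.0  # running average of the squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta_1 * m + (1 - beta_1) * g        # beta_1: how much past gradients are kept
        v = beta_2 * v + (1 - beta_2) * g ** 2   # beta_2: same for squared gradients
        m_hat = m / (1 - beta_1 ** t)            # bias correction, since m and v start at 0
        v_hat = v / (1 - beta_2 ** t)
        theta -= learning_rate * m_hat / (np.sqrt(v_hat) + epsilon)
    return theta

# Minimize f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3)
theta_star = adam_minimize(lambda th: 2.0 * (th - 3.0), theta0=0.0)
print(theta_star)  # close to 3.0
```

Because each step is divided by `sqrt(v_hat)`, the step size is roughly `learning_rate` at the start regardless of the gradient's scale, which is why the learning rate is the main knob for training speed.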

In [60]:
model = models.Sequential()

model.add(layers.Dense(units=188, kernel_initializer='normal', activation='relu', input_shape=[188]))
model.add(layers.Dense(units=95, kernel_initializer='normal', activation='relu'))
model.add(layers.Dense(units=1, kernel_initializer='normal', activation=None)) #last layer should be linear

opt = optimizers.Adam(learning_rate=0.0015, beta_1=0.9, beta_2=0.999)

## Model evaluation

In [61]:
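model.compile(loss='mean_squared_logarithmic_error', optimizer=opt, metrics=[RSquare()])
model.fit(X_train, y_train, epochs=140, validation_data=(X_val, y_val))
# evaluate() returns [loss, metric] because a metric was passed to compile()
test_loss, test_r2 = model.evaluate(X_test, y_test)
print('Test loss:', test_loss, 'Test R2:', test_r2)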

(Training output truncated in this export at epoch 78 of 140; the per-epoch loss and R² values were not preserved.)

## Predicted values

In [62]:
from sklearn.metrics import r2_score
y_pred = model.predict(X_test)
print("Score on test split:",r2_score(y_test, y_pred))

Score on test split: 0.4930839397071828
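
For reference, a NumPy sketch of the two quantities used so far: the mean squared logarithmic error (the training loss) and the R² score computed by `r2_score`. The inputs are invented numbers on a 0 to 1 scale (the notebook's targets are min-max scaled). Keras's MSLE additionally clips its inputs to a small positive epsilon, which is omitted here.

```python
import numpy as np

def msle(y_true, y_pred):
    # Mean squared logarithmic error: squared differences of log(1 + y)
    return np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)

def r2(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Made-up scaled targets and predictions, for illustration only
y_true = np.array([0.10, 0.25, 0.40, 0.80])
y_pred = np.array([0.12, 0.20, 0.45, 0.70])
print(msle(y_true, y_pred), r2(y_true, y_pred))
```

An R² of 1 means perfect predictions and 0 means no better than always predicting the mean, so the score of about 0.49 above leaves clear room for improvement.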


# Adding layers | overfitting detector / auto_LR_reduction
1) Add more layers (five `Dense` layers in total)
2) Add an overfitting detector (`EarlyStopping`) that stops training before all epochs are completed if the validation loss stops improving
3) Add a learning-rate scheduler (`ReduceLROnPlateau`) that automatically reduces the learning rate when the validation loss plateaus
4) Follow the common practice of giving each layer roughly half as many neurons as the previous one
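
The interplay of the two callbacks can be simulated on a list of validation losses. The sketch below is a simplified re-implementation of the patience logic (not the Keras source; cooldown, baseline and weight restoring are omitted), using the notebook's settings and a made-up loss sequence.

```python
def simulate_callbacks(val_losses, lr=0.008, factor=0.4,
                       es_patience=3, es_min_delta=0.0,
                       lr_patience=3, lr_min_delta=1e-4, min_lr=0.0):
    """Simplified EarlyStopping + ReduceLROnPlateau on per-epoch validation losses.
    Returns (epoch, val_loss, lr after that epoch's callbacks) tuples."""
    es_best = lr_best = float("inf")
    es_wait = lr_wait = 0
    history = []
    for epoch, loss in enumerate(val_losses, start=1):
        # ReduceLROnPlateau: an improvement must beat the best by more than lr_min_delta
        if loss < lr_best - lr_min_delta:
            lr_best, lr_wait = loss, 0
        else:
            lr_wait += 1
            if lr_wait >= lr_patience:
                lr = max(lr * factor, min_lr)
                lr_wait = 0
        # EarlyStopping: same counting, but training ends instead of lowering the LR
        if loss < es_best - es_min_delta:
            es_best, es_wait = loss, 0
        else:
            es_wait += 1
        history.append((epoch, loss, lr))
        if es_wait >= es_patience:
            break
    return history

# Made-up validation losses (the last value is never reached)
val_losses = [1.0, 0.8, 0.79995, 0.79992, 0.79991, 0.85, 0.9, 0.95, 0.5]
history = simulate_callbacks(val_losses)
for epoch, loss, lr in history:
    print(f"epoch {epoch}: val_loss={loss}, lr={lr:.5f}")
```

In this sequence the tiny improvements at epochs 3 to 5 reset the early-stopping counter (`min_delta=0`) but not the learning-rate counter (`min_delta=0.0001`), so the learning rate drops to 0.0032 after epoch 5 while training continues; training then stops after epoch 8.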

In [63]:
model = models.Sequential()

overfitting_detector = tf.keras.callbacks.EarlyStopping(monitor='val_loss',
                                                        min_delta=0,  # minimum change that counts as an improvement
                                                        patience=3,  # epochs without improvement before training stops
                                                        verbose=1,
                                                        mode='auto',  # infer from the metric name whether lower or higher is better
                                                        baseline=None,  # no threshold value the metric has to reach
                                                        restore_best_weights=False)  # keep the last epoch's weights; True would restore the best epoch's weights

adjustable_learning_rate = tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss",
                                                               factor=0.4,  # new LR = old LR * factor
                                                               patience=3,  # epochs without improvement before the LR is reduced
                                                               verbose=1,
                                                               mode="auto",
                                                               min_delta=0.0001,  # minimum change that counts as an improvement (Keras default)
                                                               cooldown=0,  # no waiting period after a reduction
                                                               min_lr=0)  # lower bound on the LR (0 = no bound)

model.add(layers.Dense(units=189, kernel_initializer='normal', activation='relu', input_shape=[188]))
model.add(layers.Dense(units=95, kernel_initializer='normal', activation='relu'))
model.add(layers.Dense(units=47, kernel_initializer='normal', activation='relu'))
model.add(layers.Dense(units=28, kernel_initializer='normal', activation='relu'))
model.add(layers.Dense(units=1, kernel_initializer='normal', activation=None))

# Start from a higher learning rate than before (0.008 instead of 0.0015): the overfitting
# detector stops training when the validation loss stops improving, and auto_LR_reduction
# lowers the learning rate automatically when it plateaus
opt = optimizers.Adam(learning_rate=0.008, beta_1=0.9, beta_2=0.999)


model.compile(loss='mean_squared_logarithmic_error', optimizer=opt, metrics=[RSquare()])
model.fit(X_train, y_train, epochs=500,
          callbacks=[overfitting_detector, adjustable_learning_rate],
          workers=50,  # only used for generator / keras.utils.Sequence inputs; ignored for DataFrames
          validation_data=(X_val, y_val))  # validation split monitored by both callbacks

test_loss, test_r2 = model.evaluate(X_test, y_test)
print('Test loss:', test_loss, 'Test R2:', test_r2)

(Training output truncated in this export at epoch 78 of at most 500; the per-epoch loss and R² values were not preserved.)

Note how the new model trains faster and reaches a better score, thanks to auto_LR_reduction and the overfitting detector.

# Adding more layers with the common halving-neurons technique
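
The halving rule can be written as a small helper. The sketch below (a hypothetical helper that rounds up at each step) starts from 189 units; comparing its output with the widths in the next cell shows that the notebook follows the rule only approximately, for example 47 → 28 rather than 47 → 24.

```python
def halving_widths(first_units, min_units=1):
    """Layer widths obtained by repeatedly halving (rounding up) down to min_units."""
    widths = [first_units]
    while widths[-1] > min_units:
        widths.append(max(min_units, -(-widths[-1] // 2)))  # ceiling division by 2
    return widths

print(halving_widths(189))  # [189, 95, 48, 24, 12, 6, 3, 2, 1]
```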

In [64]:
model = models.Sequential()

overfitting_detector = tf.keras.callbacks.EarlyStopping(monitor='val_loss',
                                                        min_delta=0,  # minimum change that counts as an improvement
                                                        patience=3,  # epochs without improvement before training stops
                                                        verbose=1,
                                                        mode='auto',  # infer from the metric name whether lower or higher is better
                                                        baseline=None,  # no threshold value the metric has to reach
                                                        restore_best_weights=False)  # keep the last epoch's weights; True would restore the best epoch's weights

adjustable_learning_rate = tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss",
                                                               factor=0.4,  # new LR = old LR * factor
                                                               patience=3,  # epochs without improvement before the LR is reduced
                                                               verbose=1,
                                                               mode="auto",
                                                               min_delta=0.0001,  # minimum change that counts as an improvement (Keras default)
                                                               cooldown=0,  # no waiting period after a reduction
                                                               min_lr=0)  # lower bound on the LR (0 = no bound)

model.add(layers.Dense(units=189, kernel_initializer='normal', activation='relu', input_shape=[188]))
model.add(layers.Dense(units=95, kernel_initializer='normal', activation='relu'))
model.add(layers.Dense(units=47, kernel_initializer='normal', activation='relu'))
model.add(layers.Dense(units=28, kernel_initializer='normal', activation='relu'))
model.add(layers.Dense(units=14, kernel_initializer='normal', activation='relu'))
model.add(layers.Dense(units=7, kernel_initializer='normal', activation='relu'))
model.add(layers.Dense(units=3, kernel_initializer='normal', activation='relu'))
model.add(layers.Dense(units=1, kernel_initializer='normal', activation=None))

# Start from a higher learning rate than the first model (0.008 instead of 0.0015): the
# overfitting detector stops training when the validation loss stops improving, and
# auto_LR_reduction lowers the learning rate automatically when it plateaus
opt = optimizers.Adam(learning_rate=0.008, beta_1=0.9, beta_2=0.999)


model.compile(loss='mean_squared_logarithmic_error', optimizer=opt, metrics=[RSquare()])
model.fit(X_train, y_train, epochs=500,
          callbacks=[overfitting_detector, adjustable_learning_rate],
          workers=50,  # only used for generator / keras.utils.Sequence inputs; ignored for DataFrames
          validation_data=(X_val, y_val))  # validation split monitored by both callbacks

test_loss, test_r2 = model.evaluate(X_test, y_test)
print('Test loss:', test_loss, 'Test R2:', test_r2)

(Per-epoch loss and R² values were not preserved in this export. The log keeps one callback message, the learning-rate reduction from 0.008 to 0.0032, i.e. a factor of 0.4:)
Epoch 64: ReduceLROnPlateau reducing learning rate to 0.003200000151991844.