# Artificial Neural Networks: Example of use of a Multilayer Perceptron

For this example, we will use the USA housing dataset that we have already worked with. This is the real estate problem: our goal is to help a real estate agent predict housing prices for regions in the USA.

We have used Linear Regression in this context; now, we are going to try Artificial Neural Networks. Let's try using Multilayer Perceptrons.

### The data

We will use a data frame with 5000 observations of the following 7 variables:

* **Avg. Area Income** - Average income of residents of the city the house is located in
* **Avg. Area House Age** - Average age of houses in the same city
* **Avg. Area Number of Rooms** - Average number of rooms for houses in the same city
* **Avg. Area Number of Bedrooms** - Average number of bedrooms for houses in the same city
* **Area Population** - Population of the city the house is located in
* **Price** - Price the house sold at
* **Address** - Address of the house

## Import libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

## Get the data

Create the data frame

In [None]:
USAhousing = pd.read_csv('USA_Housing.csv')

In [None]:
USAhousing.info()

Since <em>Address</em> is the only categorical feature and is not needed for this exercise, let's drop it:

In [None]:
USAhousing.drop('Address', axis = 1, inplace=True)

In [None]:
USAhousing.head()

In [None]:
USAhousing.describe()

## EDA

Create a histogram of the target, the <em>Price</em> column:

In [None]:
sns.histplot(USAhousing['Price'])

Create a heatmap of the features:
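One possible way to obtain the heatmap is sketched below on a small synthetic frame (the column names and values are placeholders, not the housing data). It computes the pairwise Pearson correlations with `DataFrame.corr()`, and the resulting matrix can then be handed to `sns.heatmap`. The plotting call is left as a comment so the sketch runs without a display.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the housing frame (hypothetical columns and values)
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    'income': rng.normal(60000, 10000, 200),
    'rooms': rng.normal(7, 1, 200),
})
# Make the target depend on both features so the correlations are non-trivial
toy['price'] = 20 * toy['income'] + 100000 * toy['rooms'] + rng.normal(0, 50000, 200)

# Pairwise Pearson correlations between all numeric columns
corr = toy.corr()
print(corr.round(2))

# In the notebook, the matrix can then be drawn with:
# sns.heatmap(corr, annot=True, cmap='coolwarm')
```

On the real data, the same two steps would be applied to `USAhousing` after the <em>Address</em> column has been dropped.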

Create a pairplot to visualize relations:

## Train Test Split

Define <em>X</em> and <em>y</em>:

In [None]:
X = USAhousing.drop('Price', axis=1)
y = USAhousing[['Price']]

Divide the subsets of test and training data:

In [None]:
from sklearn.model_selection import GridSearchCV, KFold, train_test_split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=2021)
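As a quick sanity check of what a `test_size` of 0.2 means for 5000 rows, the sketch below splits placeholder arrays of the same shape as the housing data (the `_demo` names are illustrative only).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in arrays with the same number of rows and features as the housing data
X_demo = np.zeros((5000, 5))
y_demo = np.zeros((5000, 1))

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=2021)

print(X_tr.shape, X_te.shape)  # (4000, 5) (1000, 5)
```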

## Artificial Neural Network (in the form of an MLP)

Install <em>tensorflow</em> if needed: <code>pip install tensorflow</code>

Install <em>keras</em> if needed: <code>pip install --upgrade keras</code>

Install <em>scikeras</em> if needed: <code>pip install scikeras[tensorflow]</code>

### Import more libraries

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense#, Dropout, BatchNormalization
from scikeras.wrappers import KerasRegressor#, KerasClassifier

In [None]:
print('TensorFlow version:', tf.__version__)

### Structure the MLP

Define a model with:
- <em>ReLU</em> as activation function
- sequential topology
- three layers
- <em>MAE</em> as loss function
- <em>Adam</em> as optimizer
- learning rate of <em>0.01</em>
- <em>MAE</em> and <em>MSE</em> as metrics

In [15]:
def build_model(activation='relu', learning_rate=0.01):
    model = Sequential()
    model.add(Dense(16, input_dim = 5, activation = activation))
    model.add(Dense(8, activation = activation))
    model.add(Dense(1, activation = activation)) # output
    
    # Compile the model with MAE loss and the Adam optimizer
    model.compile(
        loss = 'mae',
        optimizer = tf.optimizers.Adam(learning_rate = learning_rate),
        metrics = ['mae', 'mse'])
    return model

Build the model:

In [16]:
model = build_model()
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 16)                96        
                                                                 
 dense_1 (Dense)             (None, 8)                 136       
                                                                 
 dense_2 (Dense)             (None, 1)                 9         
                                                                 
Total params: 241 (964.00 Byte)
Trainable params: 241 (964.00 Byte)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
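The parameter counts in the summary can be reproduced by hand. A dense layer has one weight per input-unit pair plus one bias per unit, so the sketch below (the helper name `dense_params` is ours, not part of Keras) recomputes the three counts and their total.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# (inputs, units) for the three Dense layers of the model
layer_sizes = [(5, 16), (16, 8), (8, 1)]
counts = [dense_params(i, u) for i, u in layer_sizes]
print(counts, sum(counts))  # [96, 136, 9] 241
```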


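We will use <em>GridSearchCV</em> to tune the model.

### GridSearchCV

Define the grid parameters using a dictionary. Note that <code>build_model</code> above compiles the network with <em>Adam</em> itself, so the optimizers listed here may have no effect unless <code>build_model</code> is adapted to accept an <code>optimizer</code> argument and pass it to <code>compile</code>: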

In [None]:
optimizer = ['SGD', 'RMSprop', 'Adagrad']
param_grid = dict(optimizer = optimizer)

Define a <em>KFold</em> with <em>5 splits</em>, <em>shuffle</em> and a <em>random state</em>:

In [None]:
kf = KFold(n_splits=5, shuffle=True, random_state=2023)
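To see what such a splitter produces, here is a minimal sketch on ten toy samples (the `_demo` names are illustrative). Each of the 5 folds holds out a different fifth of the rows for validation, and every row is held out exactly once.

```python
import numpy as np
from sklearn.model_selection import KFold

X_demo = np.arange(20).reshape(10, 2)  # 10 toy samples
kf_demo = KFold(n_splits=5, shuffle=True, random_state=2023)

fold_sizes = []
seen = []
for fold, (train_idx, val_idx) in enumerate(kf_demo.split(X_demo)):
    fold_sizes.append((len(train_idx), len(val_idx)))
    seen.extend(val_idx.tolist())
    print(f"fold {fold}: train={len(train_idx)} validation={len(val_idx)}")
```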

Use a <em>KerasRegressor</em> with a <em>batch size</em> of <em>32</em>, <em>validation split</em> of <em>0.2</em> and <em>20 epochs</em>:

In [None]:
model = KerasRegressor(model = build_model, batch_size = 32, validation_split = 0.2, epochs = 20)

Create a <em>GridSearchCV</em> with <em>negative MAE</em> scoring (<code>neg_mean_absolute_error</code>), <em>refit</em> enabled and a <em>verbose</em> of <em>1</em>:

In [None]:
grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=kf, scoring='neg_mean_absolute_error', refit=True, verbose=1)
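It can help to estimate how much training the search will trigger before running it. The sketch below enumerates the candidates with scikit-learn's `ParameterGrid` and multiplies by the number of folds; the `_demo` names are illustrative copies of the objects defined above.

```python
from sklearn.model_selection import ParameterGrid

param_grid_demo = dict(optimizer=['SGD', 'RMSprop', 'Adagrad'])
n_splits = 5  # folds used by the KFold splitter

candidates = list(ParameterGrid(param_grid_demo))
n_cv_fits = len(candidates) * n_splits

print(candidates)
print('cross-validation fits:', n_cv_fits)  # 15
```

With refit enabled, one more fit on the whole training set follows, using the best candidate.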

Fit the model:

In [None]:
grid_search.fit(X_train, y_train)

#### Best results

Find the <em>best score</em> and the <em>best params</em>:

In [None]:
print("Best: %f using %s" % (grid_search.best_score_, grid_search.best_params_))
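The best score is reported as a negative number because scikit-learn always maximizes scores, so the MAE is negated. A small sketch with made-up targets and two hypothetical candidates shows the convention: the candidate whose negated MAE is closest to zero wins, and flipping the sign recovers the MAE.

```python
import numpy as np

y_true = np.array([100.0, 200.0, 300.0])
# Predictions from two hypothetical candidates (illustrative values only)
preds = {
    'candidate_a': np.array([110.0, 190.0, 330.0]),
    'candidate_b': np.array([150.0, 250.0, 350.0]),
}

# Negated MAE: higher (closer to zero) is better
neg_mae = {name: -np.mean(np.abs(y_true - p)) for name, p in preds.items()}
best = max(neg_mae, key=neg_mae.get)

print(neg_mae)
print('best:', best, 'MAE:', -neg_mae[best])
```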

Find the <em>mean test score</em>, <em>std test score</em> and <em>params</em> for each search:

In [23]:
means = grid_search.cv_results_['mean_test_score']
stds = grid_search.cv_results_['std_test_score']
params = grid_search.cv_results_['params']

In [None]:
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Find the best model:

In [None]:
best_mlp_model = grid_search.best_estimator_
print(best_mlp_model)

### Use the best model

Fit the best model:

In [None]:
best_mlp_model.fit(X_train, y_train, epochs = 20, validation_data = (X_test, y_test), verbose = 1)

In [None]:
plt.plot(best_mlp_model.history_['loss'])
plt.plot(best_mlp_model.history_['val_loss'])
plt.title('model performance')
plt.ylabel('loss values')
plt.xlabel('epoch')
plt.legend(['train','val'], loc='upper left')
plt.show()

In [None]:
pd.DataFrame(best_mlp_model.history_).plot(figsize = (8,5))
plt.show()

### Predictions

Obtain the predictions:

In [None]:
predictions = best_mlp_model.predict(X_test)

Print the first five:

### Evaluate the model

In [1]:
from sklearn import metrics

Assess by MAE, MSE and RMSE:

In [None]:
print('MAE:', metrics.mean_absolute_error(y_test, predictions))
print('MSE:', metrics.mean_squared_error(y_test, predictions))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, predictions)))
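For reference, the three metrics can also be computed by hand. The sketch below uses four toy values (not taken from the housing data) and shows that RMSE is the square root of the MSE, which brings the error back to the units of the target.

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 9.0, 9.0])

errors = y_true - y_pred
mae = np.mean(np.abs(errors))   # mean absolute error
mse = np.mean(errors ** 2)      # mean squared error
rmse = np.sqrt(mse)             # root of the MSE, same units as the target

print(mae, mse, rmse)
```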

Scatter the real values with the predictions:

In [None]:
plt.scatter(y_test, predictions)

Create a visualization of the actual and predicted results. Limit it to 200 comparisons:

In [None]:
def real_predicted_viz(limit):
    # Flatten both to 1-D arrays so they share the same 0..limit-1 x-axis
    actual = np.asarray(y_test).ravel()[:limit]
    predicted = np.asarray(predictions).ravel()[:limit]
    plt.figure(figsize=(14,6))
    plt.plot(actual, color='green', label = 'Actual')
    plt.plot(predicted, color = 'red', label = 'Predicted')
    plt.grid(alpha = 0.3)
    plt.xlabel('Houses')
    plt.ylabel('Price')
    plt.title('Real vs Predicted')
    plt.legend()
    plt.show()

In [None]:
real_predicted_viz(200)

# Data scaling

**Data scaling** (or **normalization**) transforms the data to a common range so that training becomes more stable, more accurate, and faster.

Artificial neural networks are "picky": they prefer scaled data!
Since our features and target span very different ranges of values, let's scale the data to the interval [0, 1]:

In [None]:
from sklearn.preprocessing import MinMaxScaler
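As a minimal sketch of what `MinMaxScaler` does, the toy example below (values and `_demo` names are illustrative) scales two columns on very different ranges and checks the result against the formula (x - min) / (max - min) applied column by column.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy feature matrix: two columns on very different scales
X_demo = np.array([[50000.0, 5.0],
                   [60000.0, 6.0],
                   [80000.0, 9.0]])

scaler_X_demo = MinMaxScaler()  # default feature_range is (0, 1)
X_scaled = scaler_X_demo.fit_transform(X_demo)

# Same result by hand: (x - min) / (max - min), column by column
col_min = X_demo.min(axis=0)
col_max = X_demo.max(axis=0)
manual = (X_demo - col_min) / (col_max - col_min)

print(X_scaled)
```

Using a separate scaler object for the target keeps it possible to undo the scaling of the predictions later.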

Visualize the head of <em>X</em>:

And now scaled:

Visualize the head of <em>y</em>:

And now scaled:

## Train Test Split

Divide the subsets of test and training data:

## MLP

Build the model:

#### Best results

Find the <em>best score</em> and the <em>best params</em>:

Find the <em>mean test score</em>, <em>std test score</em> and <em>params</em> for each search:

Find the best model:

### Use the best model

Fit the best model:

### Predictions

Obtain the predictions:

Print the first five:

Unscale the predictions to see the real prices:
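A hedged sketch of the unscaling step, assuming the target was scaled with its own `MinMaxScaler` (called `scaler_y_demo` here, a hypothetical name): `inverse_transform` maps values from [0, 1] back to prices. The numbers are toy values, not model output.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy prices as a column vector (MinMaxScaler expects 2-D input)
y_demo = np.array([[100000.0], [250000.0], [400000.0]])

scaler_y_demo = MinMaxScaler()
y_scaled = scaler_y_demo.fit_transform(y_demo)

# Pretend these are model outputs on the scaled target
scaled_predictions = np.array([0.1, 0.5, 0.9])

# reshape(-1, 1) restores the 2-D shape before undoing the scaling
unscaled = scaler_y_demo.inverse_transform(scaled_predictions.reshape(-1, 1))
print(unscaled.ravel())
```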

Print the first five:

Unscale <em>y_test</em> to get the original values:

Print the first five:

### Evaluate the model

Assess by MAE, MSE and RMSE:

Scatter the real values with the predictions:

Create a visualization of the actual and predicted results. Limit it to 200 comparisons:

#### Compare the results with the ones obtained with the Linear Regression model created in class 4.

#### **Which model performed better?**