# Tutorial 6 (Introduction to AI)

# Neural Networks: MLP (Part 1)

## 1. Neural Networks

**Neural networks** require their implementor to meet a number of conditions. These conditions include:

- an adequately sized dataset to both train and test the network.
- an understanding of the basic nature of the problem to be solved, so that basic first-cut decisions on creating the network can be made. These decisions include the activation function and the learning method.
- an understanding of the development tools.
- adequate processing power (some applications demand real-time processing that exceeds what standard, sequential processing hardware can provide; the development of hardware is key to the future of neural networks).

Once these conditions are met, neural networks offer the opportunity of solving problems in areas where other approaches might lack either the processing power or a step-by-step methodology to come to a solution.

## Introduction to Keras and TensorFlow

## Keras

[Keras](https://keras.io/api/) is a high-level tool for describing neural networks, which ships with TensorFlow.  Keras sits as a layer on top of TensorFlow, making it much easier to create neural networks.  Rather than defining the computation graphs yourself, you define the individual layers of the network with a much higher-level API.  Unless you are performing research into entirely new structures of deep neural networks, it is unlikely that you need to program TensorFlow directly.

## TensorFlow

TensorFlow is a powerful open-source software library for AI and machine learning developed by Google. TensorFlow allows distribution of computation across different computers, as well as multiple CPUs and GPUs within a single machine. Here our examples should run on a single core. TensorFlow provides a Python API (as well as a less documented C++ API).

# Classification or Regression

Like many models, neural networks can function in classification or regression:

* **Regression** - You expect a number as your neural network's prediction.
* **Classification** - You expect a class/category as your neural network's prediction.

The following shows a classification and regression neural network:

![class_2_ann_class_reg.png](attachment:class_2_ann_class_reg.png)

Notice that the output of the regression neural network is numeric and the output of the classification network is a class.  Regression networks usually have a single output neuron (as in this tutorial).  Classification neural networks have an output neuron for each class.
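
The difference between the two output layers can be sketched with NumPy alone. The snippet below is an illustration rather than Keras code: the hidden activations, the weights and the three class names are made-up values. A regression output is a single weighted sum, while a classification output produces one score per class, here turned into probabilities with a softmax (a common choice, assumed here for illustration).

```python
import numpy as np

# Hypothetical activations coming out of a hidden layer with 3 units.
hidden = np.array([0.2, 0.7, 0.1])

# Regression: one output neuron, a plain weighted sum (no activation).
w_reg = np.array([10.0, 20.0, 5.0])   # made-up weights
b_reg = 1.0
regression_output = float(hidden @ w_reg + b_reg)

# Classification: one output neuron per class.
classes = ["setosa", "versicolor", "virginica"]
W_cls = np.array([[1.0, 0.0, -1.0],
                  [0.5, 2.0, 0.0],
                  [0.0, 1.0, 3.0]])   # made-up weights, shape (3 hidden, 3 classes)
scores = hidden @ W_cls

# Softmax turns the scores into probabilities that sum to 1.
exp_scores = np.exp(scores - scores.max())
probabilities = exp_scores / exp_scores.sum()
predicted_class = classes[int(np.argmax(probabilities))]

print(regression_output)
print(probabilities, predicted_class)
```

The regression branch yields one number, whereas the classification branch yields a vector with one entry per class, from which the most probable class is read off.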

First, let's check TensorFlow:

In [None]:
import tensorflow as tf
from tensorflow import keras
print("Tensor Flow Version: {}".format(tf.__version__))

print(f"Keras Version: {keras.__version__}")

Tensor Flow Version: 2.6.0
Keras Version: 2.6.0


## 1. Neural Network regression using MPG

This week we will return to some familiar examples: the auto-mpg dataset and the iris dataset, and use them to illustrate neural networks using Keras.

This example shows how to encode the auto-mpg dataset for regression.  Remember that:

* Input has both numeric and categorical features
* Input has missing values

This example uses some of the functions defined below, the "helpful functions".  You've seen many of these helper functions before and they allow you to work with your data for better application of AI algorithms, such as neural networks. Consider the following:

* Predictors/Inputs
    * Fill any missing inputs with the median for that column.  Use **missing_median**.
    * Encode textual/categorical values with **encode_text_dummy**.
    * Encode numeric values with **encode_numeric_zscore** (we'll also see scaling of the data).
* Output
    * Discard rows with missing outputs.
    * Encode textual/categorical values with **encode_text_index** (which in turn uses sklearn's LabelEncoder), or use the Keras to_categorical function.
    * Do not encode output numeric values.
* Produce final feature vectors (X) and expected output (y) with **to_xy**.

To encode categorical values that are part of the feature vector, use the functions from below. If the categorical value is the target (as was the case with Iris) use the same technique as Iris. The iris technique allows you to decode back to Iris text strings from the predictions.
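
A minimal sketch of these encoding steps on a made-up three-row DataFrame follows. The column values are illustrative, and the calls are the plain pandas operations that the helper functions wrap, not the helpers themselves.

```python
import pandas as pd

# Toy data: one numeric column with a gap, one categorical column.
df = pd.DataFrame({
    "horsepower": [100.0, None, 200.0],
    "origin": ["usa", "japan", "usa"],
})

# Fill the missing numeric value with the column median (as missing_median does).
df["horsepower"] = df["horsepower"].fillna(df["horsepower"].median())

# Z-score the numeric column (as encode_numeric_zscore does).
df["horsepower"] = (df["horsepower"] - df["horsepower"].mean()) / df["horsepower"].std()

# One-hot encode the categorical column (as encode_text_dummy does).
dummies = pd.get_dummies(df["origin"], prefix="origin", prefix_sep="-")
df = pd.concat([df.drop(columns="origin"), dummies], axis=1)

print(df)
```

After these steps every column is numeric, which is the form the network needs for its inputs.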

In [None]:
import base64
import os
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import requests
from sklearn import preprocessing

# Encode text values to dummy variables(i.e. [1,0,0],[0,1,0],[0,0,1] for red,green,blue)
def encode_text_dummy(df, name):
    dummies = pd.get_dummies(df[name])
    for x in dummies.columns:
        dummy_name = f"{name}-{x}"
        df[dummy_name] = dummies[x]
    df.drop(name, axis=1, inplace=True)

# Encode text values to a single dummy variable.  The new columns (which do not replace the old) will have a 1
# at every location where the original column (name) matches each of the target_values.  One column is added for
# each target value.
def encode_text_single_dummy(df, name, target_values):
    for tv in target_values:
        l = list(df[name].astype(str))
        l = [1 if str(x) == str(tv) else 0 for x in l]
        name2 = f"{name}-{tv}"
        df[name2] = l

# Encode text values to indexes (i.e. [0],[1],[2] for blue,green,red; LabelEncoder sorts the classes).
def encode_text_index(df, name):
    le = preprocessing.LabelEncoder()
    df[name] = le.fit_transform(df[name])
    return le.classes_

# Encode a numeric column as zscores
def encode_numeric_zscore(df, name, mean=None, sd=None):
    if mean is None:
        mean = df[name].mean()
    if sd is None:
        sd = df[name].std()
    df[name] = (df[name] - mean) / sd

# Convert all missing values in the specified column to the median
def missing_median(df, name):
    med = df[name].median()
    df[name] = df[name].fillna(med)

# Convert all missing values in the specified column to the default
def missing_default(df, name, default_value):
    df[name] = df[name].fillna(default_value)

# Convert a Pandas dataframe to the X,y inputs that Keras needs
def to_xy(df, target):
    result = []
    for x in df.columns:
        if x != target:
            result.append(x)
    # find out the type of the target column.  Is it really this hard? :(
    target_type = df[target].dtypes
    target_type = target_type[0] if hasattr(
        target_type, '__iter__') else target_type
    # Encode to int for classification, float otherwise. TensorFlow likes 32 bits.
    if target_type in (np.int64, np.int32):
        # Classification
        dummies = pd.get_dummies(df[target])
        return df[result].values.astype(np.float32), dummies.values.astype(np.float32)
    # Regression
    return df[result].values.astype(np.float32), df[[target]].values.astype(np.float32)

# Nicely formatted time string
def hms_string(sec_elapsed):
    h = int(sec_elapsed / (60 * 60))
    m = int((sec_elapsed % (60 * 60)) / 60)
    s = sec_elapsed % 60
    return f"{h}:{m:>02}:{s:>05.2f}"

# Regression chart.
def chart_regression(pred, y, sort=True):
    t = pd.DataFrame({'pred': pred, 'y': y.flatten()})
    if sort:
        t.sort_values(by=['y'], inplace=True)
    plt.plot(t['y'].tolist(), label='expected')
    plt.plot(t['pred'].tolist(), label='prediction')
    plt.ylabel('output')
    plt.legend()
    plt.show()

# Remove all rows where the specified column is sd or more standard deviations from its mean
def remove_outliers(df, name, sd):
    drop_rows = df.index[(np.abs(df[name] - df[name].mean())
                          >= (sd * df[name].std()))]
    df.drop(drop_rows, axis=0, inplace=True)

# Encode a column to a range between normalized_low and normalized_high.
def encode_numeric_range(df, name, normalized_low=-1, normalized_high=1,
                         data_low=None, data_high=None):
    # Fall back to the observed minimum/maximum for whichever bound is not supplied.
    if data_low is None:
        data_low = min(df[name])
    if data_high is None:
        data_high = max(df[name])
    df[name] = ((df[name] - data_low) / (data_high - data_low)) \
        * (normalized_high - normalized_low) + normalized_low

Now build a neural network model for auto-mpg.  Below are several iterations of this, both trying to (informally) tune the network and manipulate the data.  The first attempt at building a neural network to predict MPG from the rest of the data is straightforward, with a two-layer (or single-hidden-layer) network.

**Sequential** is the Keras model class for a plain stack of layers (a standard feedforward network), with **Dense** being used for the layers (that is, each output from one layer is an input to every unit in the next layer).

The code reads the csv file (located in the folder described by path), drops the name column, fills in the missing values in horsepower, and defines X and y (I've included two ways of doing this).

The model has a single hidden layer, defined to have 4 units, each with inputs matching the shape of the features, and a sigmoid activation function.

The output layer has a single unit (since the problem is regression) that takes its input from the output of each unit in the hidden layer.  No activation function is supplied, hence a linear combination of the hidden outputs is computed, which is what is wanted for regression.
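
To make the two layers concrete, here is a rough NumPy sketch of the forward pass that this architecture computes. The weights are random placeholders, not trained values, the five input rows are made up, and the variable names are illustrative; Keras manages all of this internally.

```python
import numpy as np

rng = np.random.default_rng(0)

n_features, n_hidden = 7, 4                 # 7 input columns, 4 hidden units
X_demo = rng.normal(size=(5, n_features))   # 5 made-up rows standing in for cars

# Hidden layer: every unit sees every input feature.
W1 = rng.normal(size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
# Output layer: one unit fed by all 4 hidden units.
W2 = rng.normal(size=(n_hidden, 1))
b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden = sigmoid(X_demo @ W1 + b1)   # shape (5, 4), values between 0 and 1
output = hidden @ W2 + b2            # shape (5, 1), linear (no activation)

print(hidden.shape, output.shape)
```

Training consists of adjusting `W1`, `b1`, `W2` and `b2` so that `output` moves closer to the known MPG values.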

The error function, the loss, is then defined to be mean squared error (as is standard for regression), and learning uses the Adam optimizer (usually a good first choice).
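
As a small worked example of the loss, mean squared error averages the squared differences between predictions and targets. The numbers below are made up for illustration.

```python
import numpy as np

y_true = np.array([18.0, 15.0, 30.0])   # made-up actual MPG values
y_pred = np.array([20.0, 14.0, 27.0])   # made-up predictions

errors = y_pred - y_true                # [ 2., -1., -3.]
mse = np.mean(errors ** 2)              # (4 + 1 + 9) / 3

print(mse)
```

Because the errors are squared, the loss is in squared MPG units, which is why the training losses reported by Keras can look large.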

Finally, the model is fitted to the training data (X_train, y_train); the verbose flag controls how much of the training progress is printed, and epochs sets the number of epochs to train for (here, 10).
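
Each epoch is one pass over the training data, processed in mini-batches. The sketch below assumes the Keras default batch size of 32 (the `fit` calls in this tutorial do not set `batch_size`) and the 398-row auto-mpg data with a 25% test split; it shows where a progress line such as `10/10` in the training output comes from.

```python
import math

n_rows = 398                                   # cars in auto-mpg
n_test = math.ceil(0.25 * n_rows)              # train_test_split rounds the test size up
n_train = n_rows - n_test
batch_size = 32                                # Keras default for model.fit

steps_train = math.ceil(n_train / batch_size)  # batches per epoch on the training split
steps_full = math.ceil(n_rows / batch_size)    # batches per epoch on the full dataset

print(n_train, n_test, steps_train, steps_full)
# prints: 298 100 10 13
```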

### Controlling the amount of output

Use the verbose flag of the fit command to control how much training output is shown:

* **verbose=0** - No progress output (use with Jupyter if you do not want output)
* **verbose=1** - Display progress bar, does not always work well with Jupyter
* **verbose=2** - Summary progress output (use with Jupyter if you want to know the loss at each epoch)

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

import pandas as pd
import io
import os
import requests
import numpy as np
from sklearn import metrics
from sklearn.model_selection import train_test_split
path = "../ex1/" #folder where the code is

filename_read = os.path.join(path,"auto-mpg.csv")
df = pd.read_csv(filename_read,na_values=['NA','?'])

cars = df['name']
df.drop('name', axis=1, inplace=True) # axis must be passed by keyword in recent pandas
missing_median(df, 'horsepower') #method call to missing_median above
#X,y using the to_xy helper method above
X,y = to_xy(df,"mpg")

#X,y directly reading the data out as numpy arrays
#y = df['mpg'].to_numpy()
#X = df.drop(columns=['mpg']).to_numpy()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = Sequential()
model.add(Dense(4, input_shape=X[1].shape, activation='sigmoid')) # Hidden 1
model.add(Dense(1)) # Output
model.summary() #note, only works if input shape specified, or Input layer given

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_4 (Dense)              (None, 4)                 32        
_________________________________________________________________
dense_5 (Dense)              (None, 1)                 5         
Total params: 37
Trainable params: 37
Non-trainable params: 0
_________________________________________________________________
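
The parameter counts in this summary can be checked by hand: a Dense layer has one weight for every input-unit pair plus one bias per unit. A small sketch (the helper name `dense_params` is made up for illustration):

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden_params = dense_params(7, 4)   # 7 input features -> 4 hidden units
output_params = dense_params(4, 1)   # 4 hidden units -> 1 output unit

print(hidden_params, output_params, hidden_params + output_params)
# prints: 32 5 37
```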


Now train the model for 10 epochs.

In [None]:
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X_train,y_train,verbose=2,epochs=10)
model.summary()

Epoch 1/10
10/10 - 0s - loss: 582.3499
Epoch 2/10
10/10 - 0s - loss: 581.4282
Epoch 3/10
10/10 - 0s - loss: 580.5265
Epoch 4/10
10/10 - 0s - loss: 579.6072
Epoch 5/10
10/10 - 0s - loss: 578.7035
Epoch 6/10
10/10 - 0s - loss: 577.7889
Epoch 7/10
10/10 - 0s - loss: 576.8870
Epoch 8/10
10/10 - 0s - loss: 575.9738
Epoch 9/10
10/10 - 0s - loss: 575.0804
Epoch 10/10
10/10 - 0s - loss: 574.1845
Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_4 (Dense)              (None, 4)                 32        
_________________________________________________________________
dense_5 (Dense)              (None, 1)                 5         
Total params: 37
Trainable params: 37
Non-trainable params: 0
_________________________________________________________________


The error is rather high, and this isn't a good result.  There are a number of things that we could change:
- increase the number of epochs (the loss is dropping, so this could be a good choice)
- change the number of units in the hidden layer (maybe there's not enough scope to tune the network to the data)
- change the activation function
- add more layers
- possibly use a different optimizer

First let's try more epochs, with 200.

In [None]:
model = Sequential()
model.add(Dense(4, input_dim=X.shape[1], activation='sigmoid')) # Hidden 1
model.add(Dense(1)) # Output
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X_train,y_train,verbose=2,epochs=200)

Epoch 1/200
10/10 - 0s - loss: 573.5871
Epoch 2/200
10/10 - 0s - loss: 572.2319
Epoch 3/200
10/10 - 0s - loss: 570.8829
Epoch 4/200
10/10 - 0s - loss: 569.5245
Epoch 5/200
10/10 - 0s - loss: 568.1859
Epoch 6/200
10/10 - 0s - loss: 566.8389
Epoch 7/200
10/10 - 0s - loss: 565.4934
Epoch 8/200
10/10 - 0s - loss: 564.1403
Epoch 9/200
10/10 - 0s - loss: 562.8221
Epoch 10/200
10/10 - 0s - loss: 561.4672
Epoch 11/200
10/10 - 0s - loss: 560.1289
Epoch 12/200
10/10 - 0s - loss: 558.7985
Epoch 13/200
10/10 - 0s - loss: 557.4666
Epoch 14/200
10/10 - 0s - loss: 556.1339
Epoch 15/200
10/10 - 0s - loss: 554.8104
Epoch 16/200
10/10 - 0s - loss: 553.4996
Epoch 17/200
10/10 - 0s - loss: 552.1866
Epoch 18/200
10/10 - 0s - loss: 550.8715
Epoch 19/200
10/10 - 0s - loss: 549.5601
Epoch 20/200
10/10 - 0s - loss: 548.2344
Epoch 21/200
10/10 - 0s - loss: 546.9258
Epoch 22/200
10/10 - 0s - loss: 545.6234
Epoch 23/200
10/10 - 0s - loss: 544.3242
Epoch 24/200
10/10 - 0s - loss: 543.0267
...

Epoch 199/200
10/10 - 0s - loss: 351.0349
Epoch 200/200
10/10 - 0s - loss: 350.0991


<keras.callbacks.History at 0x7fa6affb0490>

The result appears to be somewhat better, but not good.  We could add more epochs, but perhaps that isn't the problem.

So next, let's try more units in the hidden layer.

In [None]:
model = Sequential()
model.add(Dense(32, input_dim=X.shape[1], activation='sigmoid')) # Hidden 1
model.add(Dense(1)) # Output
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X_train,y_train,verbose=2,epochs=200)

Epoch 1/200
10/10 - 0s - loss: 620.2162
Epoch 2/200
10/10 - 0s - loss: 605.2075
Epoch 3/200
10/10 - 0s - loss: 594.4656
Epoch 4/200
10/10 - 0s - loss: 585.0262
Epoch 5/200
10/10 - 0s - loss: 576.0746
Epoch 6/200
10/10 - 0s - loss: 567.2525
Epoch 7/200
10/10 - 0s - loss: 558.4871
Epoch 8/200
10/10 - 0s - loss: 549.8723
Epoch 9/200
10/10 - 0s - loss: 541.2425
Epoch 10/200
10/10 - 0s - loss: 532.8395
Epoch 11/200
10/10 - 0s - loss: 524.3641
Epoch 12/200
10/10 - 0s - loss: 516.1589
Epoch 13/200
10/10 - 0s - loss: 508.0172
Epoch 14/200
10/10 - 0s - loss: 499.9906
Epoch 15/200
10/10 - 0s - loss: 492.1070
Epoch 16/200
10/10 - 0s - loss: 484.3630
Epoch 17/200
10/10 - 0s - loss: 476.6146
Epoch 18/200
10/10 - 0s - loss: 469.1113
Epoch 19/200
10/10 - 0s - loss: 461.5978
Epoch 20/200
10/10 - 0s - loss: 454.2456
Epoch 21/200
10/10 - 0s - loss: 446.9401
Epoch 22/200
10/10 - 0s - loss: 439.8298
Epoch 23/200
10/10 - 0s - loss: 432.6684
Epoch 24/200
10/10 - 0s - loss: 425.7211
...

<keras.callbacks.History at 0x7fa6afa97fd0>

Again, this is an improvement.  We could try a lot more units.

In [None]:
model = Sequential()
model.add(Dense(1024, input_dim=X.shape[1], activation='sigmoid')) # Hidden 1
model.add(Dense(1)) # Output
model.compile(loss='mean_squared_error', optimizer='adam')
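# Note: this cell fits on the full X, y (hence 13 batches per epoch) rather than on X_train, y_train.
model.fit(X,y,verbose=2,epochs=200)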

Epoch 1/200
13/13 - 0s - loss: 437.0599
Epoch 2/200
13/13 - 0s - loss: 194.3474
Epoch 3/200
13/13 - 0s - loss: 95.4476
Epoch 4/200
13/13 - 0s - loss: 65.9695
Epoch 5/200
13/13 - 0s - loss: 60.5927
Epoch 6/200
13/13 - 0s - loss: 61.0416
Epoch 7/200
13/13 - 0s - loss: 60.6898
Epoch 8/200
13/13 - 0s - loss: 60.4298
Epoch 9/200
13/13 - 0s - loss: 60.3439
Epoch 10/200
13/13 - 0s - loss: 60.3075
Epoch 11/200
13/13 - 0s - loss: 60.1167
Epoch 12/200
13/13 - 0s - loss: 60.0193
Epoch 13/200
13/13 - 0s - loss: 59.8972
Epoch 14/200
13/13 - 0s - loss: 59.7999
Epoch 15/200
13/13 - 0s - loss: 59.7262
Epoch 16/200
13/13 - 0s - loss: 59.6829
Epoch 17/200
13/13 - 0s - loss: 59.5031
Epoch 18/200
13/13 - 0s - loss: 59.4825
Epoch 19/200
13/13 - 0s - loss: 59.4301
Epoch 20/200
13/13 - 0s - loss: 59.3156
Epoch 21/200
13/13 - 0s - loss: 58.7917
Epoch 22/200
13/13 - 0s - loss: 58.5312
Epoch 23/200
13/13 - 0s - loss: 58.2196
Epoch 24/200
13/13 - 0s - loss: 58.0219
Epoch 25/200
13/13 - 0s - loss: 57.7623
...

<keras.callbacks.History at 0x7fa6afa9b7c0>

This gives another improvement, but there is the danger of overfitting.  Let's experiment with a different activation function.  Let's try rectified linear units.
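
As a quick reference, the two activation functions can be written in a few lines of NumPy. The input values are arbitrary. The last lines illustrate one commonly cited difference: the sigmoid's gradient shrinks towards zero for large inputs, while the ReLU gradient stays at 1 for any positive input. Whether that explains the behaviour seen in this tutorial's runs is an assumption, not something the training logs establish.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])

sigmoid_out = sigmoid(z)
relu_out = relu(z)

# Gradients: sigmoid'(z) = s * (1 - s); relu'(z) = 1 for z > 0, else 0.
sigmoid_grad = sigmoid_out * (1.0 - sigmoid_out)
relu_grad = (z > 0).astype(float)

print(relu_out)        # negative inputs become 0, positive inputs pass through
print(sigmoid_grad)    # largest at z = 0 (0.25), tiny at z = -10 and z = 10
```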

In [None]:
model = Sequential()
model.add(Dense(1024, input_dim=X.shape[1], activation='relu')) # Hidden 1
model.add(Dense(1)) # Output
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X_train,y_train,verbose=2,epochs=200)

Epoch 1/200
10/10 - 0s - loss: 5720.9863
Epoch 2/200
10/10 - 0s - loss: 1754.6990
Epoch 3/200
10/10 - 0s - loss: 935.4927
Epoch 4/200
10/10 - 0s - loss: 438.2071
Epoch 5/200
10/10 - 0s - loss: 413.0261
Epoch 6/200
10/10 - 0s - loss: 297.6965
Epoch 7/200
10/10 - 0s - loss: 158.1299
Epoch 8/200
10/10 - 0s - loss: 105.8655
Epoch 9/200
10/10 - 0s - loss: 88.8117
Epoch 10/200
10/10 - 0s - loss: 75.9348
Epoch 11/200
10/10 - 0s - loss: 66.6568
Epoch 12/200
10/10 - 0s - loss: 65.9367
Epoch 13/200
10/10 - 0s - loss: 54.2123
Epoch 14/200
10/10 - 0s - loss: 49.2195
Epoch 15/200
10/10 - 0s - loss: 44.7212
Epoch 16/200
10/10 - 0s - loss: 41.9780
Epoch 17/200
10/10 - 0s - loss: 39.4044
Epoch 18/200
10/10 - 0s - loss: 38.2757
Epoch 19/200
10/10 - 0s - loss: 36.0393
Epoch 20/200
10/10 - 0s - loss: 35.9760
Epoch 21/200
10/10 - 0s - loss: 37.1670
Epoch 22/200
10/10 - 0s - loss: 30.1459
Epoch 23/200
10/10 - 0s - loss: 37.4788
Epoch 24/200
10/10 - 0s - loss: 32.1469
Epoch 25/200
10/10 - 0s - loss: 29.3595

<keras.callbacks.History at 0x7fa6afa78520>

This seems to be giving a much better answer, but notice that the loss is jumping about.  Given that this is occurring early, it suggests that the number of hidden units may be too high.  So let's bring this number down.

In [None]:
model = Sequential()
model.add(Dense(64, input_dim=X.shape[1], activation='relu')) # Hidden 1
model.add(Dense(1)) # Output
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X_train,y_train,verbose=2,epochs=200)

Epoch 1/200
10/10 - 0s - loss: 16148.8477 - 219ms/epoch - 22ms/step
Epoch 2/200
10/10 - 0s - loss: 5442.0347 - 9ms/epoch - 932us/step
Epoch 3/200
10/10 - 0s - loss: 1148.0663 - 15ms/epoch - 2ms/step
Epoch 4/200
10/10 - 0s - loss: 511.9598 - 34ms/epoch - 3ms/step
Epoch 5/200
10/10 - 0s - loss: 427.9882 - 16ms/epoch - 2ms/step
Epoch 6/200
10/10 - 0s - loss: 278.0034 - 9ms/epoch - 868us/step
Epoch 7/200
10/10 - 0s - loss: 222.4591 - 20ms/epoch - 2ms/step
Epoch 8/200
10/10 - 0s - loss: 212.5614 - 17ms/epoch - 2ms/step
Epoch 9/200
10/10 - 0s - loss: 207.6781 - 9ms/epoch - 879us/step
Epoch 10/200
10/10 - 0s - loss: 201.7340 - 19ms/epoch - 2ms/step
Epoch 11/200
10/10 - 0s - loss: 196.1636 - 22ms/epoch - 2ms/step
Epoch 12/200
10/10 - 0s - loss: 191.0546 - 15ms/epoch - 1ms/step
Epoch 13/200
10/10 - 0s - loss: 191.1834 - 14ms/epoch - 1ms/step
Epoch 14/200
10/10 - 0s - loss: 186.6448 - 21ms/epoch - 2ms/step
Epoch 15/200
10/10 - 0s - loss: 181.8088 - 25ms/epoch - 3ms/step
...

Epoch 128/200
10/10 - 0s - loss: 24.1571 - 18ms/epoch - 2ms/step
Epoch 129/200
10/10 - 0s - loss: 23.7788 - 10ms/epoch - 976us/step
Epoch 130/200
10/10 - 0s - loss: 22.6007 - 13ms/epoch - 1ms/step
Epoch 131/200
10/10 - 0s - loss: 22.3085 - 9ms/epoch - 861us/step
Epoch 132/200
10/10 - 0s - loss: 23.3387 - 10ms/epoch - 1ms/step
Epoch 133/200
10/10 - 0s - loss: 22.3783 - 13ms/epoch - 1ms/step
Epoch 134/200
10/10 - 0s - loss: 24.2633 - 11ms/epoch - 1ms/step
Epoch 135/200
10/10 - 0s - loss: 24.9025 - 8ms/epoch - 802us/step
Epoch 136/200
10/10 - 0s - loss: 25.1157 - 9ms/epoch - 866us/step
Epoch 137/200
10/10 - 0s - loss: 24.6884 - 9ms/epoch - 901us/step
Epoch 138/200
10/10 - 0s - loss: 24.5831 - 10ms/epoch - 1ms/step
Epoch 139/200
10/10 - 0s - loss: 20.9741 - 9ms/epoch - 928us/step
Epoch 140/200
10/10 - 0s - loss: 19.4780 - 10ms/epoch - 981us/step
Epoch 141/200
10/10 - 0s - loss: 22.3112 - 13ms/epoch - 1ms/step
Epoch 142/200
10/10 - 0s - loss: 20.0635 - 12ms/epoch - 1ms/step
...

<keras.callbacks.History at 0x7f96a35b6ac0>

Here the output looks stable, and this might be a good answer (it's comparable to the linear regression answer we computed in Ex3).  You can experiment some more (note that you won't always get the same answer owing to the randomised initial weight values).

### Regression prediction

Next we will perform actual predictions.  These predictions are assigned to the **pred** variable and are all MPG predictions from the neural network.  Notice that this is a 2D array.  You can always see the dimensions of what is returned by printing out **pred.shape**.  Neural networks can return multiple values, so the result is always an array.  Here the neural network returns only 1 value per prediction (the test set holds 100 of the 398 cars, so there are 100 predictions).  However, a 2D array is needed because the neural network has the potential of returning more than one value.
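
A small sketch of working with such a 2D result, using a made-up array in place of the real `model.predict` output:

```python
import numpy as np

# Stand-in for model.predict output: 4 predictions, 1 value each.
pred_demo = np.array([[9.7], [22.4], [8.3], [19.7]])

print(pred_demo.shape)          # (4, 1)
flat = pred_demo.flatten()      # 1D copy, one entry per prediction
print(flat.shape)               # (4,)
first_value = pred_demo[0, 0]   # a single scalar rather than a 1-element array
```

Flattening is convenient when a 1D series is expected, for example when plotting predictions against the actual values.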

In [None]:
pred = model.predict(X_test)
print("Shape: {}".format(pred.shape))
print(pred[:10])

Shape: (100, 1)
[[ 9.716917 ]
 [22.38632  ]
 [ 8.3450775]
 [19.68946  ]
 [14.825657 ]
 [30.287333 ]
 [31.809189 ]
 [21.804083 ]
 [11.960735 ]
 [22.065765 ]]


We would like to see how good these predictions are.  We know what the correct MPG is for each car, so we can measure how close the neural network was.

In [None]:
# Measure RMSE error.  RMSE is common for regression.
score = np.sqrt(metrics.mean_squared_error(y_test, pred)) # arguments are (y_true, y_pred)
print(f"Final score (RMSE): {score}")

Final score (RMSE): 4.723357200622559


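This means that the predictions were typically off by roughly 4.7 MPG.  RMSE squares the errors before averaging, so it weights large misses more heavily than a plain average of the absolute errors would.  This is not great, but we will soon see how to improve it.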
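
The following sketch computes RMSE by hand on made-up numbers and compares it with the mean absolute error (MAE), to show how a single large miss pulls RMSE up.

```python
import numpy as np

y_true = np.array([20.0, 20.0, 20.0, 20.0])
y_pred = np.array([21.0, 19.0, 21.0, 27.0])   # three small misses, one large

errors = y_pred - y_true
rmse = np.sqrt(np.mean(errors ** 2))   # sqrt((1 + 1 + 1 + 49) / 4)
mae = np.mean(np.abs(errors))          # (1 + 1 + 1 + 7) / 4

print(rmse, mae)
```

Here the MAE is 2.5 while the RMSE is about 3.6, because the one error of 7 dominates the squared sum.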

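We can also print out the first 10 cars, with predictions and actual MPG.  Be aware that **pred** holds predictions for the rows of **X_test**, whereas **cars[i]** and **y[i]** index the original, unsplit data, so the names and actual values printed by this cell do not belong to the predictions shown beside them.  To line them up, the car names would need to be split together with X and y, and the actual values read from **y_test**.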

In [None]:
# Sample predictions
for i in range(10):
    print(f"{i+1}. Car name: {cars[i]}, MPG: {y[i]}, predicted MPG: {pred[i]}")

1. Car name: chevrolet chevelle malibu, MPG: [18.], predicted MPG: [9.716917]
2. Car name: buick skylark 320, MPG: [15.], predicted MPG: [22.38632]
3. Car name: plymouth satellite, MPG: [18.], predicted MPG: [8.3450775]
4. Car name: amc rebel sst, MPG: [16.], predicted MPG: [19.68946]
5. Car name: ford torino, MPG: [17.], predicted MPG: [14.825657]
6. Car name: ford galaxie 500, MPG: [15.], predicted MPG: [30.287333]
7. Car name: chevrolet impala, MPG: [14.], predicted MPG: [31.809189]
8. Car name: plymouth fury iii, MPG: [14.], predicted MPG: [21.804083]
9. Car name: pontiac catalina, MPG: [14.], predicted MPG: [11.960735]
10. Car name: amc ambassador dpl, MPG: [15.], predicted MPG: [22.065765]
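
One way to keep names, targets and predictions aligned is to split an array of row positions and use it to look up every related array. The sketch below uses made-up data and illustrative variable names, and needs only NumPy; `train_test_split` also accepts additional arrays and splits them consistently, which would achieve the same thing.

```python
import numpy as np

names = np.array(["car a", "car b", "car c", "car d",
                  "car e", "car f", "car g", "car h"])
y_all = np.array([18.0, 15.0, 30.0, 24.0, 27.0, 14.0, 33.0, 21.0])

rng = np.random.default_rng(0)
shuffled = rng.permutation(len(names))   # shuffled row positions
n_test = 2
test_idx, train_idx = shuffled[:n_test], shuffled[n_test:]

# Every array is indexed with the same positions, so row i of each
# test array refers to the same car.
names_test = names[test_idx]
y_test_demo = y_all[test_idx]

# Stand-in predictions for the test rows (a real model would supply these).
pred_demo = y_test_demo + 1.0

for i in range(n_test):
    print(f"{i+1}. Car name: {names_test[i]}, MPG: {y_test_demo[i]}, "
          f"predicted MPG: {pred_demo[i]}")
```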


At this point we might consider manipulating the data.  Here, outliers (data points whose MPG lies 2 or more standard deviations from the mean) are removed.  Then the sklearn **standard scaler** is used.
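
The outlier rule can be sketched on a made-up array of MPG values. The sample standard deviation (`ddof=1`) is used because that is what pandas `.std()` computes by default.

```python
import numpy as np

mpg_demo = np.array([20.0, 21.0, 19.0, 22.0, 18.0, 20.0, 21.0, 19.0, 20.0, 45.0])

mean = mpg_demo.mean()
std = mpg_demo.std(ddof=1)                  # sample standard deviation
keep = np.abs(mpg_demo - mean) < 2 * std    # drop rows 2 or more SDs from the mean
filtered = mpg_demo[keep]

print(len(mpg_demo), len(filtered))
```

Only the value 45 lies far enough from the mean to be dropped in this toy example.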

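Scaling is important with neural networks, which tend to train better when the input features are on comparable scales.  The standard scaler standardizes each feature to zero mean and unit variance (it does not change the shape of the distribution, so it does not make the data normally distributed), and should allow the network to fit the data better.  Remember to transform your testing data too, using the scaler fitted on the training data!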
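
What the standard scaler computes can be sketched directly in NumPy. The arrays are made up; the point is that the mean and standard deviation come from the training rows only and are then reused for the test rows. `StandardScaler` uses the population standard deviation (`ddof=0`), which the sketch mirrors.

```python
import numpy as np

X_train_demo = np.array([[2000.0, 8.0],
                         [3000.0, 6.0],
                         [4000.0, 4.0]])     # e.g. weight and cylinders (made-up rows)
X_test_demo = np.array([[3500.0, 4.0]])

mean = X_train_demo.mean(axis=0)
std = X_train_demo.std(axis=0)               # ddof=0, as StandardScaler uses

X_train_scaled = (X_train_demo - mean) / std
X_test_scaled = (X_test_demo - mean) / std   # reuse the training statistics

print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
print(X_test_scaled)
```

The scaled training columns have mean 0 and standard deviation 1; the test row generally does not, since it is measured against the training statistics.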

In [None]:
print("Length before MPG outliers dropped: {}".format(len(df)))
remove_outliers(df,'mpg',2) #method call to method defined above
print("Length after MPG outliers dropped: {}".format(len(df)))

X,y = to_xy(df,"mpg")

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
sc.fit(X_train)
X_train= sc.transform(X_train)
X_test = sc.transform(X_test)

Length before MPG outliers dropped: 398
Length after MPG outliers dropped: 388


Finally, we will add a second hidden layer to the network.

In [None]:
model = Sequential()
model.add(Dense(64, input_dim=X.shape[1], activation='relu')) # Hidden 1
model.add(Dense(64,activation='relu')) #Hidden 2
model.add(Dense(1)) # Output
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X_train,y_train,verbose=2,epochs=250)

#With test data
pred = model.predict(X_test)
score = np.sqrt(metrics.mean_squared_error(pred,y_test))
print(f"Final score (RMSE): {score}")

Epoch 1/250
10/10 - 0s - loss: 570.8696
Epoch 2/250
10/10 - 0s - loss: 534.3590
Epoch 3/250
10/10 - 0s - loss: 494.1058
Epoch 4/250
10/10 - 0s - loss: 444.2019
Epoch 5/250
10/10 - 0s - loss: 380.2871
Epoch 6/250
10/10 - 0s - loss: 308.1571
Epoch 7/250
10/10 - 0s - loss: 230.4095
Epoch 8/250
10/10 - 0s - loss: 154.8958
Epoch 9/250
10/10 - 0s - loss: 96.0061
Epoch 10/250
10/10 - 0s - loss: 58.9855
Epoch 11/250
10/10 - 0s - loss: 44.1212
Epoch 12/250
10/10 - 0s - loss: 38.8894
Epoch 13/250
10/10 - 0s - loss: 34.7113
Epoch 14/250
10/10 - 0s - loss: 30.3636
Epoch 15/250
10/10 - 0s - loss: 26.7741
Epoch 16/250
10/10 - 0s - loss: 24.0604
Epoch 17/250
10/10 - 0s - loss: 21.7730
Epoch 18/250
10/10 - 0s - loss: 19.9429
Epoch 19/250
10/10 - 0s - loss: 18.4404
Epoch 20/250
10/10 - 0s - loss: 17.2270
Epoch 21/250
10/10 - 0s - loss: 16.2454
Epoch 22/250
10/10 - 0s - loss: 15.4319
Epoch 23/250
10/10 - 0s - loss: 14.6714
Epoch 24/250
10/10 - 0s - loss: 14.0600
Epoch 25/250
10/10 - 0s - loss: 13.5625
...

10/10 - 0s - loss: 4.5443
Epoch 208/250
10/10 - 0s - loss: 4.4894
Epoch 209/250
10/10 - 0s - loss: 4.4797
Epoch 210/250
10/10 - 0s - loss: 4.4595
Epoch 211/250
10/10 - 0s - loss: 4.4036
Epoch 212/250
10/10 - 0s - loss: 4.4106
Epoch 213/250
10/10 - 0s - loss: 4.3894
Epoch 214/250
10/10 - 0s - loss: 4.3949
Epoch 215/250
10/10 - 0s - loss: 4.4799
Epoch 216/250
10/10 - 0s - loss: 4.3818
Epoch 217/250
10/10 - 0s - loss: 4.4536
Epoch 218/250
10/10 - 0s - loss: 4.3345
Epoch 219/250
10/10 - 0s - loss: 4.3487
Epoch 220/250
10/10 - 0s - loss: 4.3818
Epoch 221/250
10/10 - 0s - loss: 4.3434
Epoch 222/250
10/10 - 0s - loss: 4.4374
Epoch 223/250
10/10 - 0s - loss: 4.3492
Epoch 224/250
10/10 - 0s - loss: 4.4779
Epoch 225/250
10/10 - 0s - loss: 4.3866
Epoch 226/250
10/10 - 0s - loss: 4.3478
Epoch 227/250
10/10 - 0s - loss: 4.3154
Epoch 228/250
10/10 - 0s - loss: 4.2912
Epoch 229/250
10/10 - 0s - loss: 4.2736
Epoch 230/250
10/10 - 0s - loss: 4.2598
Epoch 231/250
10/10 - 0s - loss: 4.2568
...


The end result is an improved RMSE score.