## Lab 6: Evaluating Neural Networks 


#### CSC 180  Intelligent Systems (Spring 2020)

#### Dr. Haiquan Chen, California State University, Sacramento

# Helpful Functions for TensorFlow (little gems)

The following functions will be used with TensorFlow to help preprocess the data.  They allow you to build the feature vector for a neural network. 

* Predictors/Inputs 
    * Fill any missing inputs with the median for that column.  Use **missing_median**.
    * Encode textual/categorical values with **encode_text_dummy**.
    * Encode numeric values with **encode_numeric_zscore**.
* Output
    * Discard rows with missing outputs.
    * Encode textual/categorical values with **encode_text_index**.
    * Do not encode output numeric values.
* Produce final feature vectors (x) and expected output (y) with **to_xy**.
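The ideas behind these steps can be sketched on toy data. The snippet below is only an illustration written with plain NumPy, not the helper functions themselves; the `colors` and `horsepower` arrays are made up for the example.

```python
import numpy as np

# Toy data, made up for illustration only.
colors = np.array(["red", "green", "blue", "green"])
horsepower = np.array([130.0, np.nan, 150.0, 95.0])

# Index encoding: each category becomes one integer (sorted category order).
classes, index = np.unique(colors, return_inverse=True)

# Dummy (one-hot) encoding: one 0/1 column per category.
dummies = np.eye(len(classes), dtype=np.float32)[index]

# Median fill: replace missing values with the median of the observed ones.
median = np.nanmedian(horsepower)
filled = np.where(np.isnan(horsepower), median, horsepower)

print(classes)   # category order used by both encodings
print(index)     # one integer per row
print(dummies)   # one row of 0/1 flags per input row
print(filled)    # missing value replaced by the median
```

Index encoding suits an output column (one class label per row), while dummy encoding suits categorical inputs, because it does not impose an artificial ordering on the categories.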

In [1]:
from collections.abc import Sequence
from sklearn import preprocessing
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import shutil
import os


# Encode text values to dummy variables(i.e. [1,0,0],[0,1,0],[0,0,1] for red,green,blue)
def encode_text_dummy(df, name):
    dummies = pd.get_dummies(df[name])
    for x in dummies.columns:
        dummy_name = "{}-{}".format(name, x)
        df[dummy_name] = dummies[x]
    df.drop(name, axis=1, inplace=True)


# Encode text values to indexes (e.g. [0],[1],[2] for blue,green,red; LabelEncoder assigns indexes in sorted class order).
def encode_text_index(df, name):
    le = preprocessing.LabelEncoder()
    df[name] = le.fit_transform(df[name])
    return le.classes_


# Encode a numeric column as zscores
def encode_numeric_zscore(df, name, mean=None, sd=None):
    if mean is None:
        mean = df[name].mean()

    if sd is None:
        sd = df[name].std()

    df[name] = (df[name] - mean) / sd


# Convert all missing values in the specified column to the median
def missing_median(df, name):
    med = df[name].median()
    df[name] = df[name].fillna(med)


# Convert all missing values in the specified column to the default
def missing_default(df, name, default_value):
    df[name] = df[name].fillna(default_value)


# Convert a Pandas dataframe to the x,y inputs that TensorFlow needs
def to_xy(df, target):
    result = []
    for x in df.columns:
        if x != target:
            result.append(x)
    # find out the type of the target column. 
    target_type = df[target].dtypes
    target_type = target_type[0] if isinstance(target_type, Sequence) else target_type
    # Encode to int for classification, float otherwise. TensorFlow likes 32 bits.
    if target_type in (np.int64, np.int32):
        # Classification
        dummies = pd.get_dummies(df[target])
        return df[result].values.astype(np.float32), dummies.values.astype(np.float32)
    else:
        # Regression
        return df[result].values.astype(np.float32), df[target].values.astype(np.float32)

# Nicely formatted time string
def hms_string(sec_elapsed):
    h = int(sec_elapsed / (60 * 60))
    m = int((sec_elapsed % (60 * 60)) / 60)
    s = sec_elapsed % 60
    return "{}:{:>02}:{:>05.2f}".format(h, m, s)


# Regression chart.
def chart_regression(pred,y,sort=True):
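    # Flatten both arrays so that (n, 1) model outputs are accepted as 1-D columns
    t = pd.DataFrame({'pred' : np.asarray(pred).flatten(), 'y' : np.asarray(y).flatten()})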
    if sort:
        t.sort_values(by=['y'],inplace=True)
    a = plt.plot(t['y'].tolist(),label='expected')
    b = plt.plot(t['pred'].tolist(),label='prediction')
    plt.ylabel('output')
    plt.legend()
    plt.show()

# Remove all rows where the specified column is at least sd standard deviations from the mean
def remove_outliers(df, name, sd):
    drop_rows = df.index[(np.abs(df[name] - df[name].mean()) >= (sd * df[name].std()))]
    df.drop(drop_rows, axis=0, inplace=True)


# Encode a column to a range between normalized_low and normalized_high.
def encode_numeric_range(df, name, normalized_low=-1, normalized_high=1,
                         data_low=None, data_high=None):
    # Fill in each bound independently so a caller may supply only one of them
    if data_low is None:
        data_low = min(df[name])
    if data_high is None:
        data_high = max(df[name])

    df[name] = ((df[name] - data_low) / (data_high - data_low)) \
               * (normalized_high - normalized_low) + normalized_low


# Training with a Test Set with Early Stopping

**Overfitting** occurs when a neural network is trained to the point that it begins to memorize rather than generalize.  

![Training vs Validation Error for Overfitting](https://raw.githubusercontent.com/jeffheaton/t81_558_deep_learning/master/images/class_3_training_val.png "Training vs Validation Error for Overfitting")
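Early stopping counters overfitting by halting training once the validation loss stops improving. The following is a simplified pure-Python sketch of such a rule, not the Keras implementation; the function name `early_stopping_epoch` and the toy loss values are assumptions made for illustration. Its `patience` and `min_delta` arguments play the same role as the arguments of the same names passed to `EarlyStopping` in the next cell.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=1e-3):
    """Return the 0-based epoch index at which training would stop,
    or None if the stopping rule never triggers."""
    best = float("inf")
    wait = 0  # epochs since the last meaningful improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # Improvement larger than min_delta: remember it, reset the counter.
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Toy validation losses (made up): improvement stalls after the third epoch.
toy_losses = [1.0, 0.9, 0.85, 0.8495, 0.8493, 0.84]
stop_at = early_stopping_epoch(toy_losses, patience=2, min_delta=1e-3)
print(stop_at)
```

A larger `patience` tolerates longer plateaus before stopping; a larger `min_delta` treats small improvements as no improvement at all.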


### Split data into training and test using train_test_split

In [2]:
import pandas as pd
import numpy as np
import os

from sklearn.model_selection import train_test_split

from sklearn import metrics

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.callbacks import EarlyStopping

path = "./data/"
    
filename = os.path.join(path,"iris.csv")    
df = pd.read_csv(filename,na_values=['NA','?'])

species = encode_text_index(df,"species")

x,y = to_xy(df,"species")

# Split into train/test
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=42)

model = Sequential()

model.add(Dense(10, input_dim=x.shape[1], activation='relu'))
model.add(Dense(5,activation='relu'))
model.add(Dense(y.shape[1],activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam')

monitor = EarlyStopping(monitor='val_loss', min_delta=1e-3, patience=2, verbose=2, mode='auto')  

# patience: number of epochs with no improvement after which training will be stopped

# The test set is checked during training to monitor progress for early stopping but is never used for gradient descent (model training)

model.fit(x_train, y_train, validation_data=(x_test,y_test), callbacks=[monitor], verbose=2, epochs=1000)  


W0121 15:06:00.807268 15076 deprecation.py:506] From C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\ops\init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor


Train on 112 samples, validate on 38 samples
Epoch 1/1000
112/112 - 0s - loss: 2.4280 - val_loss: 2.5256
Epoch 2/1000
112/112 - 0s - loss: 2.2692 - val_loss: 2.3615
Epoch 3/1000
112/112 - 0s - loss: 2.1156 - val_loss: 2.2044
Epoch 4/1000
112/112 - 0s - loss: 1.9733 - val_loss: 2.0496
Epoch 5/1000
112/112 - 0s - loss: 1.8362 - val_loss: 1.9011
Epoch 6/1000
112/112 - 0s - loss: 1.7087 - val_loss: 1.7627
Epoch 7/1000
112/112 - 0s - loss: 1.5876 - val_loss: 1.6322
Epoch 8/1000
112/112 - 0s - loss: 1.4711 - val_loss: 1.5160
Epoch 9/1000
112/112 - 0s - loss: 1.3757 - val_loss: 1.4115
Epoch 10/1000
112/112 - 0s - loss: 1.2918 - val_loss: 1.3218
Epoch 11/1000
112/112 - 0s - loss: 1.2199 - val_loss: 1.2465
Epoch 12/1000
112/112 - 0s - loss: 1.1574 - val_loss: 1.1843
Epoch 13/1000
112/112 - 0s - loss: 1.1041 - val_loss: 1.1311
Epoch 14/1000
112/112 - 0s - loss: 1.0623 - val_loss: 1.0837
Epoch 15/1000
112/112 - 0s - loss: 1.0180 - val_loss: 1.0440
Epoch 16/1000
112/112 - 0s - loss: 0.9855 - val_l

112/112 - 0s - loss: 0.2715 - val_loss: 0.2341
Epoch 135/1000
112/112 - 0s - loss: 0.2694 - val_loss: 0.2323
Epoch 136/1000
112/112 - 0s - loss: 0.2684 - val_loss: 0.2307
Epoch 137/1000
112/112 - 0s - loss: 0.2659 - val_loss: 0.2290
Epoch 138/1000
112/112 - 0s - loss: 0.2650 - val_loss: 0.2275
Epoch 139/1000
112/112 - 0s - loss: 0.2617 - val_loss: 0.2257
Epoch 140/1000
112/112 - 0s - loss: 0.2599 - val_loss: 0.2241
Epoch 141/1000
112/112 - 0s - loss: 0.2585 - val_loss: 0.2226
Epoch 142/1000
112/112 - 0s - loss: 0.2562 - val_loss: 0.2209
Epoch 143/1000
112/112 - 0s - loss: 0.2544 - val_loss: 0.2195
Epoch 144/1000
112/112 - 0s - loss: 0.2528 - val_loss: 0.2187
Epoch 145/1000
112/112 - 0s - loss: 0.2515 - val_loss: 0.2176
Epoch 146/1000
112/112 - 0s - loss: 0.2504 - val_loss: 0.2156
Epoch 147/1000
112/112 - 0s - loss: 0.2482 - val_loss: 0.2138
Epoch 148/1000
112/112 - 0s - loss: 0.2458 - val_loss: 0.2121
Epoch 149/1000
112/112 - 0s - loss: 0.2439 - val_loss: 0.2105
Epoch 150/1000
112/112 

<tensorflow.python.keras.callbacks.History at 0x218a75a6bc8>

Now that the neural network is trained, we can make predictions on the test set.  The following code predicts the iris type for each test sample and displays the first five predictions.

In [3]:
pred = model.predict(x_test)
print(pred[0:5]) # print first five predictions

[[1.5738672e-03 9.0597928e-01 9.2446834e-02]
 [9.9893099e-01 1.0689563e-03 1.3976594e-08]
 [2.2971378e-09 2.9503466e-03 9.9704957e-01]
 [7.3679601e-04 8.1273001e-01 1.8653323e-01]
 [9.9398976e-04 8.5114437e-01 1.4786167e-01]]


Each row gives, for one test sample, the predicted probability of each of the 3 types of iris in the data set.  Because the output layer uses softmax, the three values in a row sum to 1.
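As a rough illustration of where such probabilities come from, here is a NumPy sketch of the softmax function, which turns raw output-layer scores into values that sum to 1. The `toy_logits` values are made up; this is a sketch of the math, not the Keras internals.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax: exponentiate, then normalize each row to sum to 1."""
    # Subtracting the row maximum avoids overflow and does not change the result.
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=1, keepdims=True)


# Two made-up rows of raw scores for three classes.
toy_logits = np.array([[2.0, 1.0, 0.1],
                       [0.0, 0.0, 0.0]])
probs = softmax(toy_logits)
print(probs)
print(probs.sum(axis=1))  # each row sums to 1
```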

### Saving Best Weights

It is a good idea to keep track of the best weights found during the entire training run.  Because early stopping waits `patience` epochs before halting, the weights at the end of training are not necessarily the best ones seen.


An additional callback, ModelCheckpoint, saves a copy of the neural network to **best_weights.hdf5** each time the validation loss improves.

Once training is done, we just reload this file and we have the optimal training weights that were found.
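The bookkeeping behind this idea can be sketched in a few lines of plain Python. This is a hypothetical illustration, not how ModelCheckpoint is implemented: `track_best` is a made-up name, and the "weights" are stand-in dictionaries rather than real network weights.

```python
import copy


def track_best(val_losses, weights_per_epoch):
    """Keep a snapshot of the weights from the epoch with the lowest validation loss."""
    best_loss = float("inf")
    best_weights = None
    for loss, weights in zip(val_losses, weights_per_epoch):
        if loss < best_loss:
            best_loss = loss
            # Copy, so later training steps cannot modify the saved snapshot.
            best_weights = copy.deepcopy(weights)
    return best_loss, best_weights


# Made-up history: validation loss bottoms out at the third epoch, then rises.
toy_val_losses = [0.9, 0.5, 0.4, 0.45, 0.47]
toy_weights = [{"epoch": i} for i in range(len(toy_val_losses))]

best_loss, best_weights = track_best(toy_val_losses, toy_weights)
print(best_loss, best_weights)
```

In this toy history the final epoch's weights are not the best ones, which is why reloading the saved snapshot after training matters.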

In [5]:
import pandas as pd
import io
import requests
import numpy as np
import os
from sklearn.model_selection import train_test_split
from sklearn import metrics
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.callbacks import ModelCheckpoint

path = "./data/"
    
filename = os.path.join(path,"iris.csv")    
df = pd.read_csv(filename,na_values=['NA','?'])

species = encode_text_index(df,"species")
x,y = to_xy(df,"species")

# Split into train/test
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=42)

model = Sequential()
model.add(Dense(10, input_dim=x.shape[1], activation='relu'))
model.add(Dense(5,activation='relu'))
model.add(Dense(y.shape[1],activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam')

monitor = EarlyStopping(monitor='val_loss', min_delta=1e-3, patience=5, verbose=1, mode='auto')

os.makedirs("dnn", exist_ok=True)  # make sure the checkpoint directory exists

checkpointer = ModelCheckpoint(filepath="dnn/best_weights.hdf5", verbose=0, save_best_only=True) # save best model

model.fit(x_train, y_train,validation_data=(x_test,y_test),callbacks=[monitor,checkpointer],verbose=2,epochs=1000)

model.load_weights('dnn/best_weights.hdf5') # load weights from best model

Train on 112 samples, validate on 38 samples
Epoch 1/1000
112/112 - 0s - loss: 1.5571 - val_loss: 1.4051
Epoch 2/1000
112/112 - 0s - loss: 1.4703 - val_loss: 1.3401
Epoch 3/1000
112/112 - 0s - loss: 1.3884 - val_loss: 1.2813
Epoch 4/1000
112/112 - 0s - loss: 1.3201 - val_loss: 1.2291
Epoch 5/1000
112/112 - 0s - loss: 1.2586 - val_loss: 1.1841
Epoch 6/1000
112/112 - 0s - loss: 1.2055 - val_loss: 1.1473
Epoch 7/1000
112/112 - 0s - loss: 1.1584 - val_loss: 1.1191
Epoch 8/1000
112/112 - 0s - loss: 1.1268 - val_loss: 1.0975
Epoch 9/1000
112/112 - 0s - loss: 1.0942 - val_loss: 1.0816
Epoch 10/1000
112/112 - 0s - loss: 1.0728 - val_loss: 1.0680
Epoch 11/1000
112/112 - 0s - loss: 1.0523 - val_loss: 1.0560
Epoch 12/1000
112/112 - 0s - loss: 1.0397 - val_loss: 1.0460
Epoch 13/1000
112/112 - 0s - loss: 1.0233 - val_loss: 1.0387
Epoch 14/1000
112/112 - 0s - loss: 1.0128 - val_loss: 1.0329
Epoch 15/1000
112/112 - 0s - loss: 1.0072 - val_loss: 1.0288
Epoch 16/1000
112/112 - 0s - loss: 1.0021 - val_l

Epoch 135/1000
112/112 - 0s - loss: 0.4840 - val_loss: 0.4886
Epoch 136/1000
112/112 - 0s - loss: 0.4818 - val_loss: 0.4867
Epoch 137/1000
112/112 - 0s - loss: 0.4797 - val_loss: 0.4838
Epoch 138/1000
112/112 - 0s - loss: 0.4771 - val_loss: 0.4805
Epoch 139/1000
112/112 - 0s - loss: 0.4739 - val_loss: 0.4777
Epoch 140/1000
112/112 - 0s - loss: 0.4710 - val_loss: 0.4749
Epoch 141/1000
112/112 - 0s - loss: 0.4684 - val_loss: 0.4722
Epoch 142/1000
112/112 - 0s - loss: 0.4660 - val_loss: 0.4696
Epoch 143/1000
112/112 - 0s - loss: 0.4637 - val_loss: 0.4671
Epoch 144/1000
112/112 - 0s - loss: 0.4607 - val_loss: 0.4646
Epoch 145/1000
112/112 - 0s - loss: 0.4582 - val_loss: 0.4620
Epoch 146/1000
112/112 - 0s - loss: 0.4568 - val_loss: 0.4592
Epoch 147/1000
112/112 - 0s - loss: 0.4539 - val_loss: 0.4567
Epoch 148/1000
112/112 - 0s - loss: 0.4504 - val_loss: 0.4542
Epoch 149/1000
112/112 - 0s - loss: 0.4479 - val_loss: 0.4517
Epoch 150/1000
112/112 - 0s - loss: 0.4453 - val_loss: 0.4490
Epoch 15

112/112 - 0s - loss: 0.2387 - val_loss: 0.2400
Epoch 268/1000
112/112 - 0s - loss: 0.2386 - val_loss: 0.2388
Epoch 269/1000
112/112 - 0s - loss: 0.2391 - val_loss: 0.2380
Epoch 270/1000
112/112 - 0s - loss: 0.2383 - val_loss: 0.2366
Epoch 271/1000
112/112 - 0s - loss: 0.2354 - val_loss: 0.2356
Epoch 272/1000
112/112 - 0s - loss: 0.2342 - val_loss: 0.2353
Epoch 273/1000
112/112 - 0s - loss: 0.2329 - val_loss: 0.2343
Epoch 274/1000
112/112 - 0s - loss: 0.2320 - val_loss: 0.2334
Epoch 275/1000
112/112 - 0s - loss: 0.2311 - val_loss: 0.2324
Epoch 276/1000
112/112 - 0s - loss: 0.2304 - val_loss: 0.2321
Epoch 277/1000
112/112 - 0s - loss: 0.2295 - val_loss: 0.2298
Epoch 278/1000
112/112 - 0s - loss: 0.2278 - val_loss: 0.2283
Epoch 279/1000
112/112 - 0s - loss: 0.2272 - val_loss: 0.2272
Epoch 280/1000
112/112 - 0s - loss: 0.2263 - val_loss: 0.2263
Epoch 281/1000
112/112 - 0s - loss: 0.2259 - val_loss: 0.2252
Epoch 282/1000
112/112 - 0s - loss: 0.2237 - val_loss: 0.2253
Epoch 283/1000
112/112 

### Potential Keras Issue on Small Networks Regarding Saving Optimal Weights

You might occasionally see this error:

```
OSError: Unable to create file (Unable to open file: name = 'dnn/best_weights.hdf5', errno = 22, error message = 'invalid argument', flags = 13, o_flags = 302)
```

Usually you can just rerun the code and it goes away.  This is an unfortunate result of saving a file each time the validation score improves (as described in the previous section).  If the validation score improves too rapidly, two saves can overlap and the second one fails with this error.  For larger neural networks this will not be a problem because each training step takes longer, leaving plenty of time for the previous save to complete.

## Evaluating Classification Models

### (1) Calculate Classification Accuracy/Precision/Recall/F1-Score

By default, Keras will return the predicted probability for each class. We can change these prediction probabilities into the actual iris predicted with **argmax**.

In [6]:
pred = model.predict(x_test)
pred

array([[5.93001314e-04, 9.41290557e-01, 5.81164062e-02],
       [7.65866816e-01, 1.41933456e-01, 9.21997279e-02],
       [1.88036154e-07, 2.07912788e-04, 9.99791920e-01],
       [1.10954570e-03, 8.49085510e-01, 1.49804980e-01],
       [3.92104965e-04, 9.41108584e-01, 5.84992990e-02],
       [7.65866816e-01, 1.41933456e-01, 9.21997279e-02],
       [5.25685120e-03, 9.48375702e-01, 4.63674329e-02],
       [1.45627346e-04, 7.20251799e-02, 9.27829206e-01],
       [9.16744466e-04, 4.99800533e-01, 4.99282688e-01],
       [2.18676706e-03, 9.64245379e-01, 3.35678160e-02],
       [3.29791103e-04, 2.12707847e-01, 7.86962390e-01],
       [7.65866816e-01, 1.41933456e-01, 9.21997279e-02],
       [7.65866816e-01, 1.41933456e-01, 9.21997279e-02],
       [7.65866816e-01, 1.41933456e-01, 9.21997279e-02],
       [7.65866816e-01, 1.41933456e-01, 9.21997279e-02],
       [7.23839039e-04, 9.12505567e-01, 8.67706016e-02],
       [1.01675541e-05, 3.94493947e-03, 9.96044815e-01],
       [1.93857832e-03, 9.66003

In [7]:
pred = np.argmax(pred,axis=1) # raw probabilities to choose class (highest probability)
print(pred)

[1 0 2 1 1 0 1 2 1 1 2 0 0 0 0 1 2 1 1 2 0 2 0 2 2 2 2 2 0 0 0 0 1 0 0 2 1
 0]


Now that we have the actual iris flower predicted, we can calculate the percent accuracy (how many were correctly classified).

In [8]:
y_true= np.argmax(y_test,axis=1) 

score = metrics.accuracy_score(y_true, pred)

print("Accuracy score: {}".format(score))

Accuracy score: 1.0


In [9]:
score = metrics.precision_score(y_true, pred, average= "weighted")
print("Precision score: {}".format(score))

Precision score: 1.0


In [10]:
score = metrics.recall_score(y_true, pred, average= "weighted")
print("Recall score: {}".format(score))

Recall score: 1.0


In [11]:
score = metrics.f1_score(y_true, pred, average= "weighted")
print("F1 score: {}".format(score))

F1 score: 1.0


In [12]:
print(metrics.classification_report(y_true, pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        15
           1       1.00      1.00      1.00        11
           2       1.00      1.00      1.00        12

    accuracy                           1.00        38
   macro avg       1.00      1.00      1.00        38
weighted avg       1.00      1.00      1.00        38
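To make the columns of this report concrete, the following NumPy sketch computes per-class precision, recall, and F1 from true/false positive counts, then forms the support-weighted average that `average="weighted"` refers to. The toy labels and the helper name `per_class_scores` are assumptions for illustration; the perfect scores above leave little to inspect, so the toy data deliberately contains mistakes.

```python
import numpy as np


def per_class_scores(y_true, y_pred, n_classes):
    """Return one row per class: precision, recall, F1, support."""
    rows = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))  # correctly predicted as c
        fp = np.sum((y_pred == c) & (y_true != c))  # wrongly predicted as c
        fn = np.sum((y_pred != c) & (y_true == c))  # true c that was missed
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        rows.append((precision, recall, f1, np.sum(y_true == c)))
    return np.array(rows, dtype=float)


# Made-up labels with two misclassifications.
y_true_toy = np.array([0, 0, 1, 1, 2, 2])
y_pred_toy = np.array([0, 1, 1, 1, 2, 0])

scores = per_class_scores(y_true_toy, y_pred_toy, n_classes=3)
weights = scores[:, 3] / scores[:, 3].sum()   # class share of the samples
weighted = weights @ scores[:, :3]            # weighted precision, recall, F1
accuracy = np.mean(y_true_toy == y_pred_toy)

print(scores)
print(weighted, accuracy)
```

Note that the support-weighted recall always equals the accuracy, since both count the correctly classified samples divided by the total.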



### (2) Calculate Classification Cross-Entropy Loss (Log Loss)  

Log loss is an error metric that is often used in place of accuracy for classification.  

Log loss allows for "partial credit". For example, a model might be used to classify A, B and C.  The correct answer might be A, however if the classification network chose B as having the highest probability, then accuracy gives the neural network no credit for this classification.  

However, with log loss, the negative log of the probability assigned to the correct answer is added to the score.  For example, the correct answer might be A, but if the neural network only predicted .4 probability of A being correct, then the value -log(.4) is added, which is about 0.92 with the natural log.

$$ logloss = -\frac{1}{N}\sum^N_{i=1}\sum^M_{j=1}y_{ij} \log(\hat{y}_{ij}) $$
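The formula can be written out directly in NumPy. This is a minimal sketch with made-up inputs; the name `log_loss_manual` and the clipping constant `eps` (used only to avoid taking the log of 0) are assumptions of this example.

```python
import math

import numpy as np


def log_loss_manual(y_onehot, probs, eps=1e-15):
    """Average over samples of -sum_j y_ij * log(p_ij), using the natural log."""
    probs = np.clip(probs, eps, 1 - eps)  # assumption: guard against log(0)
    return -np.mean(np.sum(y_onehot * np.log(probs), axis=1))


# Two made-up samples: the correct classes receive probability 0.4 and 0.8.
y_toy = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
p_toy = np.array([[0.4, 0.5, 0.1],
                  [0.1, 0.8, 0.1]])

loss = log_loss_manual(y_toy, p_toy)
print(loss)
```

Because each row of `y_toy` is one-hot, only the probability assigned to the correct class contributes, so the result is the average of -log(0.4) and -log(0.8).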

The following code shows the log loss scores that correspond to the average probability for the correct item. The **pred** column specifies the average probability for the correct class.  The **logloss** column specifies the log loss for that probability.
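A table of this kind can be produced with a few lines of standard-library Python. The probability grid below is an arbitrary choice for illustration.

```python
import math

# Probabilities assigned to the correct class, from certain (1.0) down to 0.1.
probs = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]

# Log loss for a single sample is -log(p) of the correct class (natural log).
table = [(p, -math.log(p)) for p in probs]

print("{:>6} {:>8}".format("pred", "logloss"))
for p, ll in table:
    print("{:>6.1f} {:>8.4f}".format(p, ll))
```

The loss is 0 for a fully confident correct prediction and grows quickly as the probability of the correct class drops toward 0.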


Calculating log loss

In [13]:
# Generate predictions
pred = model.predict(x_test)

print("Numpy array of predictions")
print(pred[0:5])
print()
print("y_test:")
print(y_test[0:5])

score = metrics.log_loss(y_test, pred)
print("Log loss score: {}".format(score))

Numpy array of predictions
[[5.9300131e-04 9.4129056e-01 5.8116406e-02]
 [7.6586682e-01 1.4193346e-01 9.2199728e-02]
 [1.8803615e-07 2.0791279e-04 9.9979192e-01]
 [1.1095457e-03 8.4908551e-01 1.4980498e-01]
 [3.9210496e-04 9.4110858e-01 5.8499299e-02]]

y_test:
[[0. 1. 0.]
 [1. 0. 0.]
 [0. 0. 1.]
 [0. 1. 0.]
 [0. 1. 0.]]
Log loss score: 0.17149743884434238


## Evaluating Regression Models

Regression results are evaluated differently than classification.  Consider the following code. 

In [15]:
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
from sklearn import metrics

path = "./data/"

filename_read = os.path.join(path,"auto-mpg.csv")
df = pd.read_csv(filename_read,na_values=['NA','?'])

cars = df['name']
df.drop('name', axis=1, inplace=True)  # pass axis by keyword; recent pandas versions reject the positional form
missing_median(df, 'horsepower')

encode_text_dummy(df, 'origin')

x,y = to_xy(df,"mpg")

# Split into train/test
x_train, x_test, y_train, y_test = train_test_split (x, y, test_size=0.25, random_state=45)

model = Sequential()
model.add(Dense(10, input_dim=x.shape[1], activation='relu'))
model.add(Dense(10))
model.add(Dense(10))
model.add(Dense(10))

model.add(Dense(1))  # 1 output neuron 


model.compile(loss='mean_squared_error', optimizer='adam')

monitor = EarlyStopping(monitor='val_loss', min_delta=1e-3, patience=5, verbose=1, mode='auto')

model.fit(x_train,y_train, validation_data=(x_test,y_test),callbacks=[monitor],verbose=2,epochs=1000)


Train on 298 samples, validate on 100 samples
Epoch 1/1000
298/298 - 0s - loss: 208221.3250 - val_loss: 10115.8361
Epoch 2/1000
298/298 - 0s - loss: 9003.8951 - val_loss: 24317.8006
Epoch 3/1000
298/298 - 0s - loss: 17779.9382 - val_loss: 3537.2674
Epoch 4/1000
298/298 - 0s - loss: 1308.7129 - val_loss: 2787.2982
Epoch 5/1000
298/298 - 0s - loss: 2606.8421 - val_loss: 1189.9935
Epoch 6/1000
298/298 - 0s - loss: 459.1732 - val_loss: 448.1644
Epoch 7/1000
298/298 - 0s - loss: 548.2226 - val_loss: 321.8992
Epoch 8/1000
298/298 - 0s - loss: 272.9624 - val_loss: 320.8633
Epoch 9/1000
298/298 - 0s - loss: 302.2563 - val_loss: 250.7872
Epoch 10/1000
298/298 - 0s - loss: 256.6703 - val_loss: 237.1166
Epoch 11/1000
298/298 - 0s - loss: 255.5490 - val_loss: 234.5576
Epoch 12/1000
298/298 - 0s - loss: 246.5717 - val_loss: 237.4696
Epoch 13/1000
298/298 - 0s - loss: 244.4998 - val_loss: 228.2866
Epoch 14/1000
298/298 - 0s - loss: 246.2373 - val_loss: 223.8500
Epoch 15/1000
298/298 - 0s - loss: 238

Epoch 128/1000
298/298 - 0s - loss: 79.8874 - val_loss: 65.9041
Epoch 129/1000
298/298 - 0s - loss: 75.3362 - val_loss: 57.7348
Epoch 130/1000
298/298 - 0s - loss: 72.0217 - val_loss: 56.5760
Epoch 131/1000
298/298 - 0s - loss: 71.7904 - val_loss: 59.2968
Epoch 132/1000
298/298 - 0s - loss: 70.9572 - val_loss: 55.5226
Epoch 133/1000
298/298 - 0s - loss: 71.3460 - val_loss: 55.1946
Epoch 134/1000
298/298 - 0s - loss: 71.9735 - val_loss: 55.4375
Epoch 135/1000
298/298 - 0s - loss: 73.4090 - val_loss: 54.3322
Epoch 136/1000
298/298 - 0s - loss: 71.7654 - val_loss: 55.9514
Epoch 137/1000
298/298 - 0s - loss: 72.5575 - val_loss: 63.9051
Epoch 138/1000
298/298 - 0s - loss: 72.4132 - val_loss: 52.0015
Epoch 139/1000
298/298 - 0s - loss: 66.7816 - val_loss: 53.1220
Epoch 140/1000
298/298 - 0s - loss: 68.9172 - val_loss: 56.1651
Epoch 141/1000
298/298 - 0s - loss: 66.8856 - val_loss: 50.4012
Epoch 142/1000
298/298 - 0s - loss: 65.1983 - val_loss: 49.9189
Epoch 143/1000
298/298 - 0s - loss: 64.6

<tensorflow.python.keras.callbacks.History at 0x1b7e55394e0>

### Mean Square Error

The mean square error is the average of the squared differences between the prediction ($\hat{y}$) and the expected value ($y$).  MSE is expressed in the square of the target's units (here, mpg squared), so a single value is hard to interpret on its own.  If the MSE has decreased for a model, that is good. Low MSE values are desired.

$ \text{MSE} = \frac{1}{n} \sum_{i=1}^n \left(\hat{y}_i - y_i\right)^2 $
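As a small worked example of the formula, with made-up numbers:

```python
import numpy as np

# Made-up expected values and predictions.
y_toy = np.array([3.0, 5.0, 2.0])
pred_toy = np.array([2.0, 5.0, 4.0])

squared_errors = (pred_toy - y_toy) ** 2  # errors of -1, 0, 2 become 1, 0, 4
mse = squared_errors.mean()               # (1 + 0 + 4) / 3

print(squared_errors, mse)
```

Squaring makes every error positive and penalizes the single large miss (4) far more than the small one (1).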


In [16]:
# Predict
pred = model.predict(x_test)

# Measure MSE.  scikit-learn's argument order is (y_true, y_pred).
score = metrics.mean_squared_error(y_test, pred)
print("Final score (MSE): {}".format(score))

Final score (MSE): 28.539138793945312


### Root Mean Square Error

The root mean square error (RMSE) is the square root of the MSE.  Because of this, the RMSE is in the same units as the training data outcome (here, mpg). Low RMSE values are desired.

$ \text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^n \left(\hat{y}_i - y_i\right)^2} $
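As a quick arithmetic check, taking the square root of the MSE printed in the previous section gives the error back in mpg:

```python
import math

# MSE value printed by the previous cell (units: mpg squared).
mse_reported = 28.539138793945312

rmse = math.sqrt(mse_reported)  # back in mpg
print(rmse)
```

The result is about 5.34, so this model's predictions are off by roughly 5 mpg in the root-mean-square sense.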

In [17]:
# Measure RMSE.  RMSE is common for regression.
score = np.sqrt(metrics.mean_squared_error(y_test, pred))
print("Final score (RMSE): {}".format(score))

Final score (RMSE): 5.342203617095947


# Performance Improvement by Normalizing Features and Tuning Hyperparameters

There are many different settings that you can use for a neural network.  These can affect performance.  The following code changes some of these, beyond their default values:

* **activation:** relu, sigmoid, tanh
* **Layers and Neuron Counts**
* **optimizer:** adam, sgd, rmsprop, and [others](https://keras.io/optimizers/)
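The section title also mentions normalizing features. The cell below does this with `encode_numeric_zscore`, applied to the whole data frame before the split. The following NumPy sketch shows the z-score arithmetic on made-up numbers and, as one possible variation rather than what the lab code does, reuses the training mean and standard deviation on the test values, which the helper's optional `mean` and `sd` parameters would also allow.

```python
import numpy as np

# Made-up feature values for a training and a test split.
train = np.array([90.0, 110.0, 130.0, 150.0])
test = np.array([100.0, 170.0])

mean = train.mean()
sd = train.std(ddof=1)  # ddof=1 matches the default of pandas Series.std()

train_z = (train - mean) / sd  # mean 0, standard deviation 1
test_z = (test - mean) / sd    # same scale, using the training statistics

print(train_z)
print(test_z)
```

After this transform every feature is on a comparable scale, which tends to make gradient-based training better behaved than feeding in raw values such as weight in pounds next to cylinder counts.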

In [3]:
import pandas as pd
import io
import requests
import numpy as np
import os
from sklearn.model_selection import train_test_split
from sklearn import metrics
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.callbacks import ModelCheckpoint

path = "./data/"
preprocess = True

filename_read = os.path.join(path,"auto-mpg.csv")
df = pd.read_csv(filename_read,na_values=['NA','?'])

# create feature vector
missing_median(df, 'horsepower')
encode_text_dummy(df, 'origin')
df.drop('name', axis=1, inplace=True)  # pass axis by keyword; recent pandas versions reject the positional form

if preprocess:
    encode_numeric_zscore(df, 'horsepower')
    encode_numeric_zscore(df, 'weight')
    encode_numeric_zscore(df, 'cylinders')
    encode_numeric_zscore(df, 'displacement')
    encode_numeric_zscore(df, 'acceleration')
    encode_numeric_zscore(df, 'year')

# Encode to a 2D matrix for training
x,y = to_xy(df,'mpg')

# Split into train/test
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.20, random_state=42)

model = Sequential()
model.add(Dense(100, input_dim=x.shape[1], activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(25, activation='relu'))
model.add(Dense(1))

model.compile(loss='mean_squared_error', optimizer='adam')

monitor = EarlyStopping(monitor='val_loss', min_delta=1e-3, patience=5, verbose=2, mode='auto')

model.fit(x_train,y_train,validation_data=(x_test,y_test),callbacks=[monitor],verbose=2,epochs=1000)


W0925 13:49:07.298168 14948 deprecation.py:506] From C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\ops\init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor


Train on 318 samples, validate on 80 samples
Epoch 1/1000
318/318 - 1s - loss: 603.6747 - val_loss: 558.8735
Epoch 2/1000
318/318 - 0s - loss: 577.1689 - val_loss: 527.4671
Epoch 3/1000
318/318 - 0s - loss: 537.3747 - val_loss: 475.0514
Epoch 4/1000
318/318 - 0s - loss: 466.6835 - val_loss: 381.1661
Epoch 5/1000
318/318 - 0s - loss: 346.2345 - val_loss: 240.4421
Epoch 6/1000
318/318 - 0s - loss: 188.0342 - val_loss: 98.8226
Epoch 7/1000
318/318 - 0s - loss: 60.7253 - val_loss: 37.6777
Epoch 8/1000
318/318 - 0s - loss: 36.9030 - val_loss: 37.2447
Epoch 9/1000
318/318 - 0s - loss: 29.0780 - val_loss: 23.2110
Epoch 10/1000
318/318 - 0s - loss: 22.6403 - val_loss: 19.5023
Epoch 11/1000
318/318 - 0s - loss: 19.6912 - val_loss: 17.0077
Epoch 12/1000
318/318 - 0s - loss: 17.4551 - val_loss: 15.2444
Epoch 13/1000
318/318 - 0s - loss: 15.9408 - val_loss: 13.1529
Epoch 14/1000
318/318 - 0s - loss: 14.6113 - val_loss: 11.6205
Epoch 15/1000
318/318 - 0s - loss: 13.6677 - val_loss: 10.7070
Epoch 16

<tensorflow.python.keras.callbacks.History at 0x1da662a0e80>

In [4]:
# Predict and measure RMSE
pred = model.predict(x_test)
score = np.sqrt(metrics.mean_squared_error(y_test, pred))  # scikit-learn order: (y_true, y_pred)
print("Score (RMSE): {}".format(score))

Score (RMSE): 2.2832961082458496


In [5]:
# print out prediction
df_y = pd.DataFrame(y_test, columns=['ground_truth'])
df_pred = pd.DataFrame(pred, columns=['predicted'])
result = pd.concat([df_y, df_pred],axis=1)
result

   ground_truth  predicted
0     33.000000  32.999001
1     28.000000  31.299675
2     19.000000  20.835340
3     13.000000  15.482175
4     14.000000  13.117922
5     27.000000  25.097567
6     24.000000  28.565420
7     13.000000  12.230333
8     17.000000  19.235163
9     21.000000  19.204615
