## Lab 6: Evaluating Neural Networks and Feature Importance Analysis

#### CSC 215 Artificial Intelligence (Spring 2019)

#### Dr. Haiquan Chen, California State University, Sacramento

# Helpful Functions for TensorFlow (little gems)

The following functions will be used with TensorFlow to help preprocess the data.  They allow you to build the feature vector for a neural network. 

* Predictors/Inputs 
    * Fill any missing inputs with the median for that column.  Use **missing_median**.
    * Encode textual/categorical values with **encode_text_dummy**.
    * Encode numeric values with **encode_numeric_zscore**.
* Output
    * Discard rows with missing outputs.
    * Encode textual/categorical values with **encode_text_index**.
    * Do not encode output numeric values.
* Produce final feature vectors (x) and expected output (y) with **to_xy**.
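
As a rough illustration of the steps listed above, the sketch below applies the same operations directly with pandas to a tiny made-up table. The column names `hp`, `color` and `mpg` and all values are invented for this example, and it deliberately does not use the helper functions defined in the next cell.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only.
toy = pd.DataFrame({
    "hp": [100.0, np.nan, 200.0],        # numeric predictor with a missing value
    "color": ["red", "green", "red"],    # categorical predictor
    "mpg": [30.0, 25.0, 15.0],           # numeric target (left unencoded)
})

# 1. Fill the missing predictor value with the column median (150.0 here).
toy["hp"] = toy["hp"].fillna(toy["hp"].median())

# 2. Replace the categorical predictor with one 0/1 dummy column per category.
dummies = pd.get_dummies(toy["color"], prefix="color", prefix_sep="-", dtype=float)
toy = pd.concat([toy.drop("color", axis=1), dummies], axis=1)

# 3. Z-score the numeric predictor: (value - mean) / standard deviation.
toy["hp"] = (toy["hp"] - toy["hp"].mean()) / toy["hp"].std()

# 4. Split into the feature matrix and the regression target, both as float32.
feature_cols = [c for c in toy.columns if c != "mpg"]
x_toy = toy[feature_cols].values.astype(np.float32)
y_toy = toy["mpg"].values.astype(np.float32)

print(feature_cols)
print(x_toy)
print(y_toy)
```

Each row of `x_toy` is one feature vector; the target `mpg` is kept out of the features and is not normalized.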

In [1]:
import collections
from sklearn import preprocessing
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import shutil
import os


# Encode text values to dummy variables(i.e. [1,0,0],[0,1,0],[0,0,1] for red,green,blue)
def encode_text_dummy(df, name):
    dummies = pd.get_dummies(df[name])
    for x in dummies.columns:
        dummy_name = "{}-{}".format(name, x)
        df[dummy_name] = dummies[x]
    df.drop(name, axis=1, inplace=True)


# Encode text values to indexes (i.e. [0],[1],[2] for blue,green,red; LabelEncoder assigns indexes in sorted order).
def encode_text_index(df, name):
    le = preprocessing.LabelEncoder()
    df[name] = le.fit_transform(df[name])
    return le.classes_


# Encode a numeric column as zscores
def encode_numeric_zscore(df, name, mean=None, sd=None):
    if mean is None:
        mean = df[name].mean()

    if sd is None:
        sd = df[name].std()

    df[name] = (df[name] - mean) / sd


# Convert all missing values in the specified column to the median
def missing_median(df, name):
    med = df[name].median()
    df[name] = df[name].fillna(med)


# Convert all missing values in the specified column to the default
def missing_default(df, name, default_value):
    df[name] = df[name].fillna(default_value)


# Convert a Pandas dataframe to the x,y inputs that TensorFlow needs
def to_xy(df, target):
    result = []
    for x in df.columns:
        if x != target:
            result.append(x)
    # find out the type of the target column. 
    target_type = df[target].dtypes
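    # .dtypes is a Series only when df[target] is a DataFrame (duplicate column names);
    # collections.Sequence was removed in Python 3.10, so check for a Series instead.
    target_type = target_type.iloc[0] if isinstance(target_type, pd.Series) else target_type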
    # Encode to int for classification, float otherwise. TensorFlow likes 32 bits.
    if target_type in (np.int64, np.int32):
        # Classification
        dummies = pd.get_dummies(df[target])
        return df[result].values.astype(np.float32), dummies.values.astype(np.float32)
    else:
        # Regression
        return df[result].values.astype(np.float32), df[target].values.astype(np.float32)

# Nicely formatted time string
def hms_string(sec_elapsed):
    h = int(sec_elapsed / (60 * 60))
    m = int((sec_elapsed % (60 * 60)) / 60)
    s = sec_elapsed % 60
    return "{}:{:>02}:{:>05.2f}".format(h, m, s)


# Regression chart.
def chart_regression(pred,y,sort=True):
    t = pd.DataFrame({'pred' : pred, 'y' : y.flatten()})
    if sort:
        t.sort_values(by=['y'],inplace=True)
    a = plt.plot(t['y'].tolist(),label='expected')
    b = plt.plot(t['pred'].tolist(),label='prediction')
    plt.ylabel('output')
    plt.legend()
    plt.show()

# Remove all rows where the specified column is sd or more standard deviations away from the mean
def remove_outliers(df, name, sd):
    drop_rows = df.index[(np.abs(df[name] - df[name].mean()) >= (sd * df[name].std()))]
    df.drop(drop_rows, axis=0, inplace=True)


# Encode a column to a range between normalized_low and normalized_high.
def encode_numeric_range(df, name, normalized_low=-1, normalized_high=1,
                         data_low=None, data_high=None):
    if data_low is None:
        data_low = min(df[name])
    if data_high is None:
        data_high = max(df[name])

    df[name] = ((df[name] - data_low) / (data_high - data_low)) \
               * (normalized_high - normalized_low) + normalized_low


# Training with a Test Set with Early Stopping

**Overfitting** occurs when a neural network is trained to the point that it begins to memorize rather than generalize.  

![Training vs Validation Error for Overfitting](https://raw.githubusercontent.com/jeffheaton/t81_558_deep_learning/master/images/class_3_training_val.png "Training vs Validation Error for Overfitting")
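
The early-stopping rule used in the code that follows is controlled by `min_delta` and `patience`. The function below is a simplified sketch of that rule in plain Python, not Keras's actual implementation; the name `early_stop_epoch` and the loss values are made up for illustration.

```python
def early_stop_epoch(val_losses, min_delta=1e-3, patience=2):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0  # epochs since the last meaningful improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # Improvement larger than min_delta: remember it and reset the counter.
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

demo_losses = [1.0, 0.9, 0.8, 0.8, 0.8, 0.7]
print(early_stop_epoch(demo_losses))
```

With these invented losses the validation loss stalls at 0.8 for two epochs, so training stops at epoch index 4 and never reaches the later value 0.7. A larger `patience` tolerates longer plateaus at the cost of more training time.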


### Split data into training and test using train_test_split

In [3]:
import pandas as pd
import numpy as np
import os

from sklearn.model_selection import train_test_split

from sklearn import metrics

from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.callbacks import EarlyStopping

path = "./data/"
    
filename = os.path.join(path,"iris.csv")    
df = pd.read_csv(filename,na_values=['NA','?'])

species = encode_text_index(df,"species")

x,y = to_xy(df,"species")

# Split into train/test
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=42)

model = Sequential()

model.add(Dense(10, input_dim=x.shape[1], activation='relu'))
model.add(Dense(5,activation='relu'))
model.add(Dense(y.shape[1],activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam')

monitor = EarlyStopping(monitor='val_loss', min_delta=1e-3, patience=2, verbose=2, mode='auto')  

# patience: number of epochs with no improvement after which training will be stopped

# The test set is checked during training to monitor progress for early stopping but is never used for gradient descent (model training)

model.fit(x_train, y_train, validation_data=(x_test,y_test), callbacks=[monitor], verbose=2, epochs=1000)  


Train on 112 samples, validate on 38 samples
Epoch 1/1000
 - 0s - loss: 1.0967 - val_loss: 1.1294
Epoch 2/1000
 - 0s - loss: 1.0560 - val_loss: 1.0901
Epoch 3/1000
 - 0s - loss: 1.0197 - val_loss: 1.0541
Epoch 4/1000
 - 0s - loss: 0.9873 - val_loss: 1.0215
Epoch 5/1000
 - 0s - loss: 0.9614 - val_loss: 0.9943
Epoch 6/1000
 - 0s - loss: 0.9375 - val_loss: 0.9736
Epoch 7/1000
 - 0s - loss: 0.9223 - val_loss: 0.9575
Epoch 8/1000
 - 0s - loss: 0.9095 - val_loss: 0.9443
Epoch 9/1000
 - 0s - loss: 0.9000 - val_loss: 0.9321
Epoch 10/1000
 - 0s - loss: 0.8897 - val_loss: 0.9218
Epoch 11/1000
 - 0s - loss: 0.8804 - val_loss: 0.9120
Epoch 12/1000
 - 0s - loss: 0.8715 - val_loss: 0.9030
Epoch 13/1000
 - 0s - loss: 0.8626 - val_loss: 0.8936
Epoch 14/1000
 - 0s - loss: 0.8542 - val_loss: 0.8845
Epoch 15/1000
 - 0s - loss: 0.8465 - val_loss: 0.8757
Epoch 16/1000
 - 0s - loss: 0.8385 - val_loss: 0.8667
Epoch 17/1000
 - 0s - loss: 0.8310 - val_loss: 0.8572
Epoch 18/1000
 - 0s - loss: 0.8227 - val_loss:

Epoch 152/1000
 - 0s - loss: 0.2829 - val_loss: 0.2730
Epoch 153/1000
 - 0s - loss: 0.2777 - val_loss: 0.2674
Epoch 154/1000
 - 0s - loss: 0.2736 - val_loss: 0.2621
Epoch 155/1000
 - 0s - loss: 0.2701 - val_loss: 0.2571
Epoch 156/1000
 - 0s - loss: 0.2653 - val_loss: 0.2535
Epoch 157/1000
 - 0s - loss: 0.2625 - val_loss: 0.2497
Epoch 158/1000
 - 0s - loss: 0.2572 - val_loss: 0.2429
Epoch 159/1000
 - 0s - loss: 0.2533 - val_loss: 0.2377
Epoch 160/1000
 - 0s - loss: 0.2501 - val_loss: 0.2331
Epoch 161/1000
 - 0s - loss: 0.2455 - val_loss: 0.2296
Epoch 162/1000
 - 0s - loss: 0.2417 - val_loss: 0.2269
Epoch 163/1000
 - 0s - loss: 0.2382 - val_loss: 0.2240
Epoch 164/1000
 - 0s - loss: 0.2354 - val_loss: 0.2184
Epoch 165/1000
 - 0s - loss: 0.2307 - val_loss: 0.2131
Epoch 166/1000
 - 0s - loss: 0.2275 - val_loss: 0.2082
Epoch 167/1000
 - 0s - loss: 0.2240 - val_loss: 0.2042
Epoch 168/1000
 - 0s - loss: 0.2206 - val_loss: 0.2006
Epoch 169/1000
 - 0s - loss: 0.2175 - val_loss: 0.1977
Epoch 170/

<keras.callbacks.History at 0x219ab818400>

Now that the neural network is trained, we can make predictions on the test set.  The following code predicts the species of each iris in the test set and displays the first five predictions.

In [4]:
pred = model.predict(x_test)
print(pred[0:5]) # print first five predictions

[[1.7158253e-02 8.8477188e-01 9.8069899e-02]
 [9.2713159e-01 6.8635091e-02 4.2333254e-03]
 [9.0311900e-05 3.9192997e-03 9.9599046e-01]
 [1.7526289e-02 7.9978710e-01 1.8268660e-01]
 [1.2922632e-02 9.0696341e-01 8.0114000e-02]]


Each row gives the predicted probability of each of the three iris species in the data set for one flower; the probabilities in a row sum to 1.

### Saving Best Weights

It is a good idea to keep track of the best weights found during the entire training run.

An additional callback, ModelCheckpoint, saves a copy of the neural network to **best_weights.hdf5** each time the validation loss improves.

Once training is done, we reload this file to restore the best weights that were found.

In [4]:
import pandas as pd
import io
import requests
import numpy as np
import os
from sklearn.model_selection import train_test_split
from sklearn import metrics
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.callbacks import EarlyStopping
from keras.callbacks import ModelCheckpoint

path = "./data/"
    
filename = os.path.join(path,"iris.csv")    
df = pd.read_csv(filename,na_values=['NA','?'])

species = encode_text_index(df,"species")
x,y = to_xy(df,"species")

# Split into train/test
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=42)

model = Sequential()
model.add(Dense(10, input_dim=x.shape[1], activation='relu'))
model.add(Dense(5,activation='relu'))
model.add(Dense(y.shape[1],activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam')

monitor = EarlyStopping(monitor='val_loss', min_delta=1e-3, patience=5, verbose=1, mode='auto')

os.makedirs("dnn", exist_ok=True) # make sure the output directory exists before saving
checkpointer = ModelCheckpoint(filepath="dnn/best_weights.hdf5", verbose=0, save_best_only=True) # save best model

model.fit(x_train, y_train,validation_data=(x_test,y_test),callbacks=[monitor,checkpointer],verbose=2,epochs=1000)

model.load_weights('dnn/best_weights.hdf5') # load weights from best model

Train on 112 samples, validate on 38 samples
Epoch 1/1000
 - 0s - loss: 1.1367 - val_loss: 1.1026
Epoch 2/1000
 - 0s - loss: 1.1083 - val_loss: 1.0764
Epoch 3/1000
 - 0s - loss: 1.0801 - val_loss: 1.0553
Epoch 4/1000
 - 0s - loss: 1.0579 - val_loss: 1.0366
Epoch 5/1000
 - 0s - loss: 1.0383 - val_loss: 1.0214
Epoch 6/1000
 - 0s - loss: 1.0234 - val_loss: 1.0082
Epoch 7/1000
 - 0s - loss: 1.0100 - val_loss: 0.9957
Epoch 8/1000
 - 0s - loss: 0.9978 - val_loss: 0.9835
Epoch 9/1000
 - 0s - loss: 0.9846 - val_loss: 0.9713
Epoch 10/1000
 - 0s - loss: 0.9709 - val_loss: 0.9577
Epoch 11/1000
 - 0s - loss: 0.9583 - val_loss: 0.9442
Epoch 12/1000
 - 0s - loss: 0.9472 - val_loss: 0.9336
Epoch 13/1000
 - 0s - loss: 0.9376 - val_loss: 0.9232
Epoch 14/1000
 - 0s - loss: 0.9300 - val_loss: 0.9131
Epoch 15/1000
 - 0s - loss: 0.9196 - val_loss: 0.9029
Epoch 16/1000
 - 0s - loss: 0.9099 - val_loss: 0.8932
Epoch 17/1000
 - 0s - loss: 0.9017 - val_loss: 0.8838
Epoch 18/1000
 - 0s - loss: 0.8918 - val_loss:

Epoch 152/1000
 - 0s - loss: 0.2311 - val_loss: 0.2206
Epoch 153/1000
 - 0s - loss: 0.2296 - val_loss: 0.2182
Epoch 154/1000
 - 0s - loss: 0.2274 - val_loss: 0.2173
Epoch 155/1000
 - 0s - loss: 0.2256 - val_loss: 0.2144
Epoch 156/1000
 - 0s - loss: 0.2230 - val_loss: 0.2115
Epoch 157/1000
 - 0s - loss: 0.2217 - val_loss: 0.2094
Epoch 158/1000
 - 0s - loss: 0.2202 - val_loss: 0.2080
Epoch 159/1000
 - 0s - loss: 0.2180 - val_loss: 0.2074
Epoch 160/1000
 - 0s - loss: 0.2164 - val_loss: 0.2070
Epoch 161/1000
 - 0s - loss: 0.2144 - val_loss: 0.2052
Epoch 162/1000
 - 0s - loss: 0.2127 - val_loss: 0.2037
Epoch 163/1000
 - 0s - loss: 0.2109 - val_loss: 0.2030
Epoch 164/1000
 - 0s - loss: 0.2094 - val_loss: 0.2008
Epoch 165/1000
 - 0s - loss: 0.2078 - val_loss: 0.1989
Epoch 166/1000
 - 0s - loss: 0.2060 - val_loss: 0.1986
Epoch 167/1000
 - 0s - loss: 0.2049 - val_loss: 0.1974
Epoch 168/1000
 - 0s - loss: 0.2035 - val_loss: 0.1933
Epoch 169/1000
 - 0s - loss: 0.2012 - val_loss: 0.1913
Epoch 170/

### Potential Keras Issue on Small Networks Regarding Saving Optimal Weights

You might occasionally see this error:

```
OSError: Unable to create file (Unable to open file: name = 'dnn/best_weights.hdf5', errno = 22, error message = 'invalid argument', flags = 13, o_flags = 302)
```

Usually you can just rerun the code and it goes away.  This is an unfortunate result of saving a file each time the validation score improves (as described in the previous section).  If the validation score improves too rapidly, two saves may overlap and the second one fails with this error.  For larger neural networks this will not be a problem because each training step takes longer, leaving plenty of time for the previous save to complete.

## Evaluating Classification Models

### (1) Calculate Classification Accuracy/Precision/Recall/F1-Score

By default, Keras will return the predicted probability for each class. We can change these prediction probabilities into the actual iris predicted with **argmax**.

In [6]:
pred = model.predict(x_test)

pred = np.argmax(pred,axis=1) # raw probabilities to choose class (highest probability)
print(pred)

[1 0 2 1 1 0 1 2 1 1 2 0 0 0 0 1 2 1 1 2 0 2 0 2 2 2 2 2 0 0 0 0 1 0 0 2 1
 0]


Now that we have the actual iris flower predicted, we can calculate the percent accuracy (how many were correctly classified).

In [7]:
y_true= np.argmax(y_test,axis=1) 

score = metrics.accuracy_score(y_true, pred)

print("Accuracy score: {}".format(score))

Accuracy score: 1.0


In [8]:
score = metrics.precision_score(y_true, pred, average= "weighted")
print("Precision score: {}".format(score))

Precision score: 1.0


In [9]:
score = metrics.recall_score(y_true, pred, average= "weighted")
print("Recall score: {}".format(score))

Recall score: 1.0


In [10]:
score = metrics.f1_score(y_true, pred, average= "weighted")
print("F1 score: {}".format(score))

F1 score: 1.0
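
Because the iris model scores 1.0 on every metric, the numbers above do not show how the metrics differ. The sketch below recomputes them from raw counts with NumPy on invented predictions that contain mistakes. The helper name `weighted_scores` and all data are assumptions made for this example; the support-weighted averaging is intended to mirror `average="weighted"` in scikit-learn.

```python
import numpy as np

# Invented class probabilities for six samples and three classes.
demo_prob = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.6, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
    [0.5, 0.1, 0.4],
])
demo_true = np.array([0, 0, 1, 1, 2, 2])
demo_pred = np.argmax(demo_prob, axis=1)  # index of the most probable class per row

def weighted_scores(y_true, y_pred):
    """Support-weighted precision, recall and F1, computed from raw counts."""
    classes, support = np.unique(y_true, return_counts=True)
    precision, recall, f1 = [], [], []
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))   # true positives for class c
        fp = np.sum((y_pred == c) & (y_true != c))   # false positives
        fn = np.sum((y_pred != c) & (y_true == c))   # false negatives
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        precision.append(p)
        recall.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)
    weights = support / support.sum()
    return (float(np.dot(weights, precision)),
            float(np.dot(weights, recall)),
            float(np.dot(weights, f1)))

demo_accuracy = float(np.mean(demo_pred == demo_true))
demo_precision, demo_recall, demo_f1 = weighted_scores(demo_true, demo_pred)
print(demo_pred, demo_accuracy, demo_precision, demo_recall, demo_f1)
```

Here four of six samples are classified correctly, and precision, recall and F1 each come out slightly different because the errors are distributed unevenly across the classes.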


For more metrics, check this out:

http://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics

### (2) Calculate Classification Cross-Entropy Loss (Log Loss)  

Log loss is an error metric that is often used in place of accuracy for classification.  

Log loss allows for "partial credit". For example, a model might be used to classify A, B and C.  The correct answer might be A, however if the classification network chose B as having the highest probability, then accuracy gives the neural network no credit for this classification.  

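With log loss, the negative logarithm of the probability assigned to the correct answer is added to the score.  For example, if the correct answer is A but the neural network predicted only a .4 probability for A, then the value -log(.4) is added.  A confident correct prediction (probability near 1) adds almost nothing, while a low probability for the correct class adds a large penalty.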

$$ \text{log loss} = -\frac{1}{N}\sum^N_{i=1}\sum^M_{j=1}y_{ij} \log(\hat{y}_{ij}) $$
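
The formula can be written out almost literally in NumPy. The following is a minimal sketch on two invented samples; the function name `log_loss_manual` and the clipping constant `eps` are assumptions for this example, not part of the lab code.

```python
import numpy as np

def log_loss_manual(y_onehot, y_prob, eps=1e-15):
    """Mean over samples of -sum_j y_ij * log(p_ij)."""
    y_prob = np.clip(y_prob, eps, 1.0)  # avoid log(0)
    return float(-np.mean(np.sum(y_onehot * np.log(y_prob), axis=1)))

ll_y = np.array([[1, 0, 0], [0, 1, 0]])               # one-hot expected classes
ll_p = np.array([[0.4, 0.3, 0.3], [0.1, 0.8, 0.1]])   # predicted probabilities
print(log_loss_manual(ll_y, ll_p))  # equals (-log(0.4) - log(0.8)) / 2
```

Only the probability assigned to the correct class of each sample contributes, because the one-hot vector zeroes out every other term of the inner sum.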

The following code shows the log loss scores that correspond to the average probability for the correct item. The **pred** column specifies the average probability for the correct class.  The **logloss** column specifies the log loss for that probability.
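
A table with these two columns can be produced in a few lines. This is a small sketch assuming probabilities from 1.0 down to 0.1 in steps of 0.1; the variable name `ll_table` is made up for this example.

```python
import numpy as np
import pandas as pd

# Probability assigned to the correct class, from certain (1.0) down to 0.1.
ll_table = pd.DataFrame({"pred": np.linspace(1.0, 0.1, 10)})
# -log(p), written as log(1 / p) so that p = 1.0 prints as 0.0 rather than -0.0.
ll_table["logloss"] = np.log(1.0 / ll_table["pred"])
print(ll_table)
```

The penalty grows slowly while the probability of the correct class stays high and rises sharply as that probability approaches zero.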


Calculating log loss

In [11]:
# Generate predictions
pred = model.predict(x_test)

print("Numpy array of predictions")
print(pred[0:5])
print()
print("y_test:")
print(y_test[0:5])

score = metrics.log_loss(y_test, pred)
print("Log loss score: {}".format(score))

Numpy array of predictions
[[1.7158253e-02 8.8477188e-01 9.8069899e-02]
 [9.2713159e-01 6.8635091e-02 4.2333254e-03]
 [9.0311900e-05 3.9192997e-03 9.9599046e-01]
 [1.7526289e-02 7.9978710e-01 1.8268660e-01]
 [1.2922632e-02 9.0696341e-01 8.0114000e-02]]

y_test:
[[0. 1. 0.]
 [1. 0. 0.]
 [0. 0. 1.]
 [0. 1. 0.]
 [0. 1. 0.]]
Log loss score: 0.14241981826183436


## Evaluating Regression Models

Regression results are evaluated differently than classification.  Consider the following code. 

In [17]:
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
from sklearn import metrics

path = "./data/"

filename_read = os.path.join(path,"auto-mpg.csv")
df = pd.read_csv(filename_read,na_values=['NA','?'])

cars = df['name']
df.drop('name', axis=1, inplace=True)
missing_median(df, 'horsepower')

encode_text_dummy(df, 'origin')

x,y = to_xy(df,"mpg")

# Split into train/test
x_train, x_test, y_train, y_test = train_test_split (x, y, test_size=0.25, random_state=45)

model = Sequential()
model.add(Dense(10, input_dim=x.shape[1], activation='relu'))
model.add(Dense(10))
model.add(Dense(10))
model.add(Dense(10))

model.add(Dense(1))  # 1 output neuron 


model.compile(loss='mean_squared_error', optimizer='adam')

monitor = EarlyStopping(monitor='val_loss', min_delta=1e-3, patience=5, verbose=1, mode='auto')

model.fit(x_train,y_train, validation_data=(x_test,y_test),callbacks=[monitor],verbose=2,epochs=1000)


Train on 298 samples, validate on 100 samples
Epoch 1/1000
 - 0s - loss: 38233.7653 - val_loss: 1742.5390
Epoch 2/1000
 - 0s - loss: 2801.7481 - val_loss: 5661.8501
Epoch 3/1000
 - 0s - loss: 3932.0458 - val_loss: 969.2856
Epoch 4/1000
 - 0s - loss: 828.9930 - val_loss: 991.2598
Epoch 5/1000
 - 0s - loss: 1026.3954 - val_loss: 617.9030
Epoch 6/1000
 - 0s - loss: 632.2696 - val_loss: 547.0746
Epoch 7/1000
 - 0s - loss: 644.0163 - val_loss: 511.7147
Epoch 8/1000
 - 0s - loss: 597.9163 - val_loss: 500.4651
Epoch 9/1000
 - 0s - loss: 592.0889 - val_loss: 484.2070
Epoch 10/1000
 - 0s - loss: 576.4058 - val_loss: 476.3443
Epoch 11/1000
 - 0s - loss: 568.5558 - val_loss: 466.7324
Epoch 12/1000
 - 0s - loss: 556.9537 - val_loss: 458.2840
Epoch 13/1000
 - 0s - loss: 552.2378 - val_loss: 451.1141
Epoch 14/1000
 - 0s - loss: 535.3213 - val_loss: 441.8762
Epoch 15/1000
 - 0s - loss: 527.6570 - val_loss: 431.8295
Epoch 16/1000
 - 0s - loss: 516.5360 - val_loss: 422.8655
Epoch 17/1000
 - 0s - loss: 

<keras.callbacks.History at 0x219b103fba8>

### Mean Square Error

The mean square error (MSE) is the average of the squared differences between the prediction ($\hat{y}$) and the expected value ($y$).  Because the differences are squared, MSE is expressed in squared units of the target rather than in the target's own units.  Low MSE values are desired; if the MSE of a model has decreased, that is good.

$ \text{MSE} = \frac{1}{n} \sum_{i=1}^n \left(\hat{y}_i - y_i\right)^2 $


In [18]:
# Predict
pred = model.predict(x_test)

# Measure MSE error.  
score = metrics.mean_squared_error(pred,y_test)
print("Final score (MSE): {}".format(score))

Final score (MSE): 31.153902053833008


### Root Mean Square Error

The root mean square error (RMSE) is the square root of the MSE.  Because of this, the RMSE is in the same units as the training data outcome. Low RMSE values are desired.

$ \text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^n \left(\hat{y}_i - y_i\right)^2} $
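
Both formulas can be checked by hand on a few numbers. The sketch below uses three invented expected values and predictions; the variable names are chosen only for this example.

```python
import numpy as np

err_y = np.array([3.0, 5.0, 2.0])      # expected values (invented)
err_pred = np.array([2.0, 5.0, 4.0])   # predictions (invented)

squared_errors = (err_pred - err_y) ** 2    # [1, 0, 4]
demo_mse = float(np.mean(squared_errors))   # 5 / 3
demo_rmse = float(np.sqrt(demo_mse))        # square root of the MSE
print(demo_mse, demo_rmse)
```

If the targets were miles per gallon, `demo_mse` would be in squared mpg while `demo_rmse` would be in mpg, which is why RMSE is usually easier to interpret.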

In [19]:
# Measure RMSE error.  RMSE is common for regression.
score = np.sqrt(metrics.mean_squared_error(pred,y_test))
print("Final score (RMSE): {}".format(score))

Final score (RMSE): 5.581568241119385


# Training with Cross-Validation

Cross-validation uses a number of folds, and multiple models, to generate out-of-sample predictions on the entire dataset.  It is important to note that there will be one model (neural network) for each fold. Each model is trained on the other folds and contributes the predictions for its own held-out fold to the final out-of-sample prediction.

![K-Fold Crossvalidation](https://raw.githubusercontent.com/jeffheaton/t81_558_deep_learning/master/images/class_1_kfold.png "K-Fold Crossvalidation")
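
To make the fold structure concrete, the sketch below splits ten sample indices into five contiguous folds using only NumPy. It is meant to approximate what an unshuffled `KFold(5)` does; the helper name `kfold_indices` is hypothetical and is not part of scikit-learn.

```python
import numpy as np

def kfold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs for contiguous, unshuffled folds."""
    indices = np.arange(n_samples)
    for test_idx in np.array_split(indices, n_folds):
        # Everything that is not in the held-out fold is used for training.
        train_idx = np.setdiff1d(indices, test_idx)
        yield train_idx, test_idx

demo_folds = list(kfold_indices(10, 5))
for fold, (train_idx, test_idx) in enumerate(demo_folds, start=1):
    print("Fold #{}: train={} test={}".format(fold, train_idx, test_idx))
```

Every index appears in exactly one test fold, so concatenating the per-fold predictions yields one out-of-sample prediction for each row of the dataset.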


## Regression with Cross-Validation

The following code trains the MPG dataset using a 5-fold cross-validation.  The expected performance of a neural network, of the type trained here, would be the score for the generated out-of-sample predictions.

In [22]:
import pandas as pd
import os
import numpy as np
from sklearn import metrics
from scipy.stats import zscore
from sklearn.model_selection import KFold
from keras.models import Sequential
from keras.layers import Dense, Activation

path = "./data/"

filename_read = os.path.join(path,"auto-mpg.csv")
filename_write = os.path.join(path,"auto-mpg-out-of-sample.csv")
df = pd.read_csv(filename_read,na_values=['NA','?'])

# Shuffle
np.random.seed(42)
df = df.reindex(np.random.permutation(df.index))
df.reset_index(inplace=True, drop=True)

# Preprocess
cars = df['name']
df.drop('name', axis=1, inplace=True)
missing_median(df, 'horsepower')

encode_text_dummy(df, 'origin')

# Encode to a 2D matrix for training
x,y = to_xy(df,'mpg')

# Cross-Validate
kf = KFold(5)
    
oos_y = []
oos_pred = []
fold = 0
for train, test in kf.split(x):
    fold+=1
    print("Fold #{}".format(fold))
        
    x_train = x[train]
    y_train = y[train]
    x_test = x[test]
    y_test = y[test]
    
    model = Sequential()
    model.add(Dense(20, input_dim=x.shape[1], activation='relu'))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(1))
    
    model.compile(loss='mean_squared_error', optimizer='adam')
    
    monitor = EarlyStopping(monitor='val_loss', min_delta=1e-3, patience=5, verbose=1, mode='auto')
    model.fit(x_train,y_train,validation_data=(x_test,y_test),callbacks=[monitor],verbose=0,epochs=100)
    
    pred = model.predict(x_test)
    
    oos_y.append(y_test)
    oos_pred.append(pred)        

    # Measure this fold's RMSE
    score = np.sqrt(metrics.mean_squared_error(pred,y_test))
    print("Fold score (RMSE): {}".format(score))


# Build the oos prediction list and calculate the error.
oos_y = np.concatenate(oos_y)
oos_pred = np.concatenate(oos_pred)

score = np.sqrt(metrics.mean_squared_error(oos_pred,oos_y))
print("Final, the average out of sample score (RMSE): {}".format(score))    

Fold #1
Fold score (RMSE): 12.301321029663086
Fold #2
Fold score (RMSE): 8.759023666381836
Fold #3
Fold score (RMSE): 24.48180389404297
Fold #4
Epoch 00023: early stopping
Fold score (RMSE): 10.580692291259766
Fold #5
Fold score (RMSE): 7.073452949523926
Final, the average out of sample score (RMSE): 14.087748527526855


In [23]:
# print out prediction
oos_y = pd.DataFrame(oos_y, columns=['ground_truth'])
oos_pred = pd.DataFrame(oos_pred, columns=['predicted'])
oosDF = pd.concat([df, oos_y, oos_pred],axis=1)
oosDF

| | mpg | cylinders | displacement | horsepower | weight | acceleration | year | origin-1 | origin-2 | origin-3 | ground_truth | predicted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 33.0 | 4 | 91.0 | 53.0 | 1795 | 17.4 | 76 | 0 | 0 | 1 | 33.000000 | 14.854352 |
| 1 | 28.0 | 4 | 120.0 | 79.0 | 2625 | 18.6 | 82 | 1 | 0 | 0 | 28.000000 | 24.851629 |
| 2 | 19.0 | 6 | 232.0 | 100.0 | 2634 | 13.0 | 71 | 1 | 0 | 0 | 19.000000 | 8.395741 |
| 3 | 13.0 | 8 | 318.0 | 150.0 | 3940 | 13.2 | 76 | 1 | 0 | 0 | 13.000000 | 19.235159 |
| 4 | 14.0 | 8 | 318.0 | 150.0 | 4237 | 14.5 | 73 | 1 | 0 | 0 | 14.000000 | 27.605688 |
| 5 | 27.0 | 4 | 97.0 | 88.0 | 2100 | 16.5 | 72 | 0 | 0 | 1 | 27.000000 | 12.673261 |
| 6 | 24.0 | 4 | 140.0 | 92.0 | 2865 | 16.4 | 82 | 1 | 0 | 0 | 24.000000 | 25.702429 |
| 7 | 13.0 | 8 | 440.0 | 215.0 | 4735 | 11.0 | 73 | 1 | 0 | 0 | 13.000000 | 11.084966 |
| 8 | 17.0 | 8 | 260.0 | 110.0 | 4060 | 19.0 | 77 | 1 | 0 | 0 | 17.000000 | 39.394062 |
| 9 | 21.0 | 6 | 200.0 | 93.5 | 2875 | 17.0 | 74 | 1 | 0 | 0 | 21.000000 | 19.426001 |


## Classification with Cross-Validation

The following code trains on the iris dataset with cross-validation.  It also prints out the out-of-sample results (predictions on each fold's test set).

In [24]:
import pandas as pd
import os
import numpy as np
from sklearn import metrics
from scipy.stats import zscore
from sklearn.model_selection import KFold
from keras.models import Sequential
from keras.layers import Dense, Activation

path = "./data/"

filename_read = os.path.join(path,"iris.csv")
filename_write = os.path.join(path,"iris-out-of-sample.csv")
df = pd.read_csv(filename_read,na_values=['NA','?'])

# Shuffle
np.random.seed(42)
df = df.reindex(np.random.permutation(df.index))
df.reset_index(inplace=True, drop=True)

# Encode to a 2D matrix for training
species = encode_text_index(df,"species")

x,y = to_xy(df,"species")

# Cross-validate
kf = KFold(5)
    
oos_y = []
oos_pred = []
fold = 0

for train, test in kf.split(x):
    fold+=1
    print("Fold #{}".format(fold))
        
    x_train = x[train]
    y_train = y[train]
    x_test = x[test]
    y_test = y[test]
    
    model = Sequential()
    model.add(Dense(50, input_dim=x.shape[1], activation='relu')) # Hidden 1
    model.add(Dense(25, activation='relu')) # Hidden 2
    model.add(Dense(y.shape[1],activation='softmax')) # Output
    
    model.compile(loss='categorical_crossentropy', optimizer='adam')
    
    monitor = EarlyStopping(monitor='val_loss', min_delta=1e-3, patience=25, verbose=1, mode='auto')

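    # Train on this fold's training rows only; fitting on all of x,y would leak the test fold into training
    model.fit(x_train,y_train,validation_data=(x_test,y_test),callbacks=[monitor],verbose=0,epochs=100)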
    
    pred = model.predict(x_test)
    
    oos_y.append(y_test)
    pred = np.argmax(pred,axis=1) # raw probabilities to chosen class (highest probability)
    oos_pred.append(pred)        

    # Measure this fold's accuracy
    y_compare = np.argmax(y_test,axis=1) # For accuracy calculation
    score = metrics.accuracy_score(y_compare, pred)
    print("Fold score (accuracy): {}".format(score))


# Build the oos prediction list and calculate the error.
oos_y = np.concatenate(oos_y)
oos_pred = np.concatenate(oos_pred)
oos_y_compare = np.argmax(oos_y,axis=1) # For accuracy calculation

score = metrics.accuracy_score(oos_y_compare, oos_pred)
print("Final score (accuracy): {}".format(score))    

Fold #1
Fold score (accuracy): 1.0
Fold #2
Fold score (accuracy): 1.0
Fold #3
Fold score (accuracy): 0.9666666666666667
Fold #4
Fold score (accuracy): 0.9333333333333333
Fold #5
Fold score (accuracy): 1.0
Final score (accuracy): 0.98


In [25]:
#print out the cross-validated prediction
oos_y = pd.DataFrame(oos_y_compare, columns=['ground_truth'])
oos_pred = pd.DataFrame(oos_pred, columns=['predicted'])
oosDF = pd.concat([df, oos_y, oos_pred],axis=1)
oosDF

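| | sepal_l | sepal_w | petal_l | petal_w | species | ground_truth | predicted |
|---|---|---|---|---|---|---|---|
| 0 | 6.1 | 2.8 | 4.7 | 1.2 | 1 | 1 | 1 |
| 1 | 5.7 | 3.8 | 1.7 | 0.3 | 0 | 0 | 0 |
| 2 | 7.7 | 2.6 | 6.9 | 2.3 | 2 | 2 | 2 |
| 3 | 6.0 | 2.9 | 4.5 | 1.5 | 1 | 1 | 1 |
| 4 | 6.8 | 2.8 | 4.8 | 1.4 | 1 | 1 | 1 |
| 5 | 5.4 | 3.4 | 1.5 | 0.4 | 0 | 0 | 0 |
| 6 | 5.6 | 2.9 | 3.6 | 1.3 | 1 | 1 | 1 |
| 7 | 6.9 | 3.1 | 5.1 | 2.3 | 2 | 2 | 2 |
| 8 | 6.2 | 2.2 | 4.5 | 1.5 | 1 | 1 | 1 |
| 9 | 5.8 | 2.7 | 3.9 | 1.2 | 1 | 1 | 1 |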


# Performance Improvement by Normalizing Features and Tuning Hyperparameters

There are many different settings that you can use for a neural network.  These can affect performance.  The following code changes some of these, beyond their default values:

* **activation:** relu, sigmoid, tanh
* **Layers and Neuron Counts**
* **optimizer:** adam, sgd, rmsprop, and [others](https://keras.io/optimizers/)
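
On the normalization side, `encode_numeric_zscore` accepts optional `mean` and `sd` arguments, which makes it possible to standardize new data with statistics taken from the training data. The sketch below illustrates that idea with plain pandas on invented weights; the function name `zscore_with` is hypothetical, and the lab code that follows instead z-scores each full column before splitting.

```python
import pandas as pd

def zscore_with(values, mean, sd):
    """Standardize values using a mean and standard deviation supplied by the caller."""
    return (values - mean) / sd

train_weight = pd.Series([2000.0, 3000.0, 4000.0])  # invented training values
test_weight = pd.Series([3500.0])                    # invented test value

# Statistics come from the training values only: mean 3000.0, sd 1000.0.
w_mean, w_sd = train_weight.mean(), train_weight.std()
train_z = zscore_with(train_weight, w_mean, w_sd)
test_z = zscore_with(test_weight, w_mean, w_sd)      # reuse the training statistics
print(train_z.tolist(), test_z.tolist())
```

Reusing the training mean and standard deviation keeps test rows on the same scale the network saw during training and avoids using test data when computing the statistics.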

In [30]:
%matplotlib inline
from matplotlib.pyplot import figure, show
from sklearn.model_selection import train_test_split
import pandas as pd
import os
import numpy as np
from sklearn import metrics
from scipy.stats import zscore
import tensorflow as tf

path = "./data/"
preprocess = True

filename_read = os.path.join(path,"auto-mpg.csv")
df = pd.read_csv(filename_read,na_values=['NA','?'])

# create feature vector
missing_median(df, 'horsepower')
encode_text_dummy(df, 'origin')
df.drop('name', axis=1, inplace=True)

if preprocess:
    encode_numeric_zscore(df, 'horsepower')
    encode_numeric_zscore(df, 'weight')
    encode_numeric_zscore(df, 'cylinders')
    encode_numeric_zscore(df, 'displacement')
    encode_numeric_zscore(df, 'acceleration')
    encode_numeric_zscore(df, 'year')

# Encode to a 2D matrix for training
x,y = to_xy(df,'mpg')

# Split into train/test
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.20, random_state=42)

model = Sequential()
model.add(Dense(100, input_dim=x.shape[1], activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(25, activation='relu'))
model.add(Dense(1))

model.compile(loss='mean_squared_error', optimizer='adam')

monitor = EarlyStopping(monitor='val_loss', min_delta=1e-3, patience=5, verbose=2, mode='auto')

model.fit(x_train,y_train,validation_data=(x_test,y_test),callbacks=[monitor],verbose=2,epochs=1000)


Train on 318 samples, validate on 80 samples
Epoch 1/1000
 - 1s - loss: 606.6740 - val_loss: 561.4107
Epoch 2/1000
 - 0s - loss: 577.5006 - val_loss: 523.7796
Epoch 3/1000
 - 0s - loss: 528.0903 - val_loss: 458.9183
Epoch 4/1000
 - 0s - loss: 442.1716 - val_loss: 348.9799
Epoch 5/1000
 - 0s - loss: 302.3197 - val_loss: 191.8019
Epoch 6/1000
 - 0s - loss: 130.5734 - val_loss: 56.7240
Epoch 7/1000
 - 0s - loss: 38.5000 - val_loss: 48.4623
Epoch 8/1000
 - 0s - loss: 37.4356 - val_loss: 31.9376
Epoch 9/1000
 - 0s - loss: 25.3143 - val_loss: 23.6517
Epoch 10/1000
 - 0s - loss: 21.9778 - val_loss: 19.4911
Epoch 11/1000
 - 0s - loss: 18.5742 - val_loss: 17.7465
Epoch 12/1000
 - 0s - loss: 16.5808 - val_loss: 14.8617
Epoch 13/1000
 - 0s - loss: 15.0528 - val_loss: 12.9167
Epoch 14/1000
 - 0s - loss: 14.0329 - val_loss: 11.4424
Epoch 15/1000
 - 0s - loss: 12.9991 - val_loss: 10.5752
Epoch 16/1000
 - 0s - loss: 12.3514 - val_loss: 9.5200
Epoch 17/1000
 - 0s - loss: 11.5277 - val_loss: 8.5620
Epo

<keras.callbacks.History at 0x219b3ccbe10>

In [31]:
# Predict and measure RMSE
pred = model.predict(x_test)
score = np.sqrt(metrics.mean_squared_error(pred,y_test))
print("Score (RMSE): {}".format(score))

Score (RMSE): 2.3105366230010986


In [32]:
# print out prediction
# y_test and pred come from a shuffled split, so their rows do not line up with the rows of df;
# show only the test-set ground truth next to the prediction.
df_y = pd.DataFrame(y_test, columns=['ground_truth'])
df_pred = pd.DataFrame(pred, columns=['predicted'])
result = pd.concat([df_y, df_pred],axis=1)
result

| | ground_truth | predicted |
|---|---|---|
| 0 | 33.0 | 33.606819 |
| 1 | 28.0 | 30.946150 |
| 2 | 19.0 | 21.683954 |
| 3 | 13.0 | 14.853514 |
| 4 | 14.0 | 12.905978 |
| 5 | 27.0 | 25.130396 |
| 6 | 24.0 | 27.979574 |
| 7 | 13.0 | 12.103612 |
| 8 | 17.0 | 19.357462 |
| 9 | 21.0 | 18.417076 |


# Feature Importance Analysis

### Feature importance analysis tells us how important each feature is to the prediction of a model.  

In this class, we will focus on the **Input Perturbation** feature ranking algorithm.  This algorithm will work with any regression or classification network.  

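This input perturbation algorithm works by ***evaluating a model’s error with each of the inputs individually shuffled in the data set.***  Shuffling a column keeps its distribution but breaks its relationship to the target, which effectively removes that input's information.  The more important an input is, the more the error increases when it is shuffled.

***The algorithm will use log loss to evaluate a classification problem and mean squared error (MSE) for regression.***  The code below computes MSE rather than RMSE; because the square root is monotonic, the resulting ranking of features is the same.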
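
Before applying this to a neural network, the idea can be tried on a stand-in model whose true feature importance is known. In the sketch below, `toy_model` is a hypothetical function that depends only on column 0 of invented random data, so shuffling column 1 should not change the error at all.

```python
import numpy as np

def toy_model(x):
    """Stand-in for a trained model: depends only on column 0."""
    return 3.0 * x[:, 0]

rng = np.random.default_rng(0)
x_demo = rng.normal(size=(200, 2))
y_demo = toy_model(x_demo)          # targets the stand-in model predicts perfectly

perturb_errors = []
for i in range(x_demo.shape[1]):
    x_shuffled = x_demo.copy()
    x_shuffled[:, i] = rng.permutation(x_shuffled[:, i])   # scramble one column only
    mse = float(np.mean((toy_model(x_shuffled) - y_demo) ** 2))
    perturb_errors.append(mse)

# Scale so that the most important feature has importance 1.0.
perturb_importance = [e / max(perturb_errors) for e in perturb_errors]
print(perturb_errors, perturb_importance)
```

Column 0 receives importance 1.0 and column 1 receives 0.0, matching how the stand-in model was constructed. With a real network the unimportant features typically show small but non-zero errors.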

In [33]:
import numpy as np
import pandas as pd
from sklearn import metrics

def perturbation_rank(model, x, y, names, regression):
    errors = []

    for i in range(x.shape[1]):
        hold = np.array(x[:, i])       # keep a copy of the original column
        np.random.shuffle(x[:, i])     # shuffle column i in place
        
        if regression:
            pred = model.predict(x)
            error = metrics.mean_squared_error(y, pred)
        else:
            # predict() returns class probabilities for a softmax output layer;
            # predict_proba() is not available in recent Keras versions
            pred = model.predict(x)
            error = metrics.log_loss(y, pred)
            
        errors.append(error)
        x[:, i] = hold                 # restore the original column
        
    # Scale so that the most important feature has importance 1.0
    max_error = np.max(errors)
    importance = [e/max_error for e in errors]

    data = {'name':names,'error':errors,'importance':importance}
    result = pd.DataFrame(data, columns = ['name','error','importance'])
    result.sort_values(by=['importance'], ascending=False, inplace=True)
    result.reset_index(inplace=True, drop=True)
    return result

### Classification Example using Input Perturbation

In [34]:
import pandas as pd
import io
import os
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import metrics
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import EarlyStopping

path = "./data/"
    
filename = os.path.join(path,"iris.csv")    
df = pd.read_csv(filename,na_values=['NA','?'])

species = encode_text_index(df,"species")
x,y = to_xy(df,"species")

# Split into train/test
x_train, x_test, y_train, y_test = train_test_split(    
    x, y, test_size=0.25, random_state=42)

model = Sequential()
model.add(Dense(10, input_dim=x.shape[1], activation='relu'))
model.add(Dense(1))
model.add(Dense(y.shape[1],activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam')

monitor = EarlyStopping(monitor='val_loss', min_delta=1e-3, patience=5, verbose=1, mode='auto')

model.fit(x_train,y_train,validation_data=(x_test,y_test),callbacks=[monitor],verbose=0,epochs=1000)


Epoch 00411: early stopping


<keras.callbacks.History at 0x219bf1643c8>

In [35]:
# Rank the features
# from IPython.display import display, HTML

names = list(df.columns) # x+y column names
names.remove("species") # remove the target(y)
rank = perturbation_rank(model, x_test, y_test, names, False)
rank

Unnamed: 0,name,error,importance
0,petal_l,1.860484,1.0
1,petal_w,0.923193,0.496211
2,sepal_w,0.152292,0.081856
3,sepal_l,0.148735,0.079944
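
The `importance` column is each feature's error divided by the largest error in the table. As a quick check, the sketch below repeats that arithmetic on the error values printed above; because those values are already rounded, the result should agree with the table only up to rounding.

```python
# Error values copied from the iris ranking table above.
errors = {
    "petal_l": 1.860484,
    "petal_w": 0.923193,
    "sepal_w": 0.152292,
    "sepal_l": 0.148735,
}

max_error = max(errors.values())  # the largest error defines importance 1.0

# Importance is the error relative to the largest error.
importance = {name: err / max_error for name, err in errors.items()}

for name, value in importance.items():
    print(name, round(value, 6))
```

Read this way, shuffling `petal_w` produces roughly half the error that shuffling `petal_l` does, while the two sepal measurements contribute less than a tenth.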


### Regression Example using Input Perturbation

In [36]:
# Rank MPG fields

import tensorflow as tf
from sklearn.model_selection import train_test_split
import pandas as pd
import os
import numpy as np
from sklearn import metrics
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import EarlyStopping

path = "./data/"

# Set the desired TensorFlow output level for this example
# tf.logging.set_verbosity(tf.logging.ERROR)

filename_read = os.path.join(path,"auto-mpg.csv")
df = pd.read_csv(filename_read,na_values=['NA','?'])

# create feature vector
missing_median(df, 'horsepower')
df.drop('name', axis=1, inplace=True)
encode_numeric_zscore(df, 'horsepower')
encode_numeric_zscore(df, 'weight')
encode_numeric_zscore(df, 'cylinders')
encode_numeric_zscore(df, 'displacement')
encode_numeric_zscore(df, 'acceleration')
encode_numeric_zscore(df, 'year')

encode_text_dummy(df, 'origin')

# Encode to a 2D matrix for training
x,y = to_xy(df,'mpg')

# Split into train/test
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.20, random_state=42)

model = Sequential()
model.add(Dense(10, input_dim=x.shape[1], activation='relu'))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
monitor = EarlyStopping(monitor='val_loss', min_delta=1e-3, patience=5, verbose=1, mode='auto')
model.fit(x_train,y_train,validation_data=(x_test,y_test),callbacks=[monitor],verbose=0,epochs=1000)
    

Epoch 00378: early stopping


<keras.callbacks.History at 0x219bf575fd0>

In [37]:
# Rank the features
# from IPython.display import display, HTML

names = list(df.columns) # x+y column names
names.remove("mpg") # remove the target(y)
rank = perturbation_rank(model, x_test, y_test, names, True)
rank

Unnamed: 0,name,error,importance
0,origin-2,39.91378,1.0
1,origin-1,37.985298,0.951684
2,weight,33.511951,0.839609
3,origin-3,32.728836,0.819988
4,horsepower,27.429987,0.687231
5,year,20.865454,0.522763
6,cylinders,7.868369,0.197134
7,displacement,6.531706,0.163645
8,acceleration,6.424447,0.160958
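
One caveat when reading this table: `origin-1`, `origin-2`, and `origin-3` are dummy columns created from the single categorical `origin` field. Shuffling one dummy column on its own can produce rows with no origin or with several origins active, which the model never saw during training. A possible variant, which is not part of `perturbation_rank` as defined in this lab, shuffles all columns of a group with the same row permutation. The helper name `shuffle_group` and the toy matrix below are assumptions made for illustration.

```python
import numpy as np

def shuffle_group(x, column_indices, rng):
    """Return a copy of x with the given columns shuffled together.

    The same row permutation is applied to every column in the group,
    so one-hot rows stay valid (exactly one active column per row).
    """
    x_perturbed = x.copy()
    order = rng.permutation(x.shape[0])
    x_perturbed[:, column_indices] = x[order][:, column_indices]
    return x_perturbed

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

# Toy matrix: column 0 is numeric, columns 1-3 are a one-hot group.
x = np.array([[0.1, 1, 0, 0],
              [0.2, 0, 1, 0],
              [0.3, 0, 0, 1],
              [0.4, 1, 0, 0]])

x_grouped = shuffle_group(x, [1, 2, 3], rng)
print(x_grouped)
```

The perturbed matrix could then be passed to `model.predict` in place of a single-column shuffle, giving one error, and therefore one importance value, for the whole `origin` field.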


### References:

* [Google Colab](https://colab.research.google.com/) - Free web-based platform that includes Python, Jupyter Notebooks, and TensorFlow with free GPU support.  No setup needed.
* [IBM Cognitive Class Labs](https://www.datascientistworkbench.com) - Free web-based platform that includes Python, Jupyter Notebooks, and TensorFlow.  No setup needed.
* [Python Anaconda](https://www.continuum.io/downloads) - Python distribution that includes many data science packages, such as Numpy, Scipy, Scikit-Learn, Pandas, and much more.
* [TensorFlow](https://www.tensorflow.org/) - Google's mathematics package for deep learning.
* [Kaggle](https://www.kaggle.com/) - Competitive data science.  Good source of sample data.
* T81-558: Applications of Deep Neural Networks. Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/)