# T81-558: Applications of Deep Neural Networks
* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), School of Engineering and Applied Science, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)
* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/).

**Module 4 Assignment:  Regression Neural Network**

**Student Name: **

# Assignment Instructions

For this assignment you will use the **reg-30-spring-2018.csv** dataset.  This is a dataset that I generated specifically for this semester.  You can find the CSV file in the **data** directory of the class GitHub repository here: [reg-30-spring-2018.csv](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/data/reg-30-spring-2018.csv).

For this assignment you will train a neural network and return the predictions.  You will submit these predictions to the **submit** function.  See [Assignment #1](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/assignments/assignment_yourname_class1.ipynb) for details on how to submit an assignment or check that one was submitted.

Complete the following tasks:

* Normalize all numeric columns to z-scores and encode all text/categorical columns as dummies.  Do not normalize the *target*.
* Your target (y) is the field named *target*.
* If you find any missing values (NAs), replace them with the median value for that column.
* No need for any cross-validation or holdout.  Just train on the entire dataset for 250 epochs.
* You might get a warning, such as **"Warning: The mean of column pred differs from the solution file by 2.39"**.  Do not worry about small differences; it would be very hard to get exactly the same result as I did.
* Your submitted dataframe will have these columns: id, pred.
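
The preprocessing tasks above can be sketched on a small, made-up DataFrame. The column names (`width`, `region`) and all values here are illustrative assumptions, not the actual contents of the assignment file; the helper functions in the next section perform the same steps in place.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values) with one missing numeric entry.
df = pd.DataFrame({
    "width": [1.0, np.nan, 3.0, 5.0],
    "region": ["a", "b", "a", "c"],
    "target": [10.0, 20.0, 30.0, 40.0],
})

# 1. Replace missing values with the column median.
df["width"] = df["width"].fillna(df["width"].median())

# 2. Normalize the numeric feature to z-scores; the target is left untouched.
df["width"] = (df["width"] - df["width"].mean()) / df["width"].std()

# 3. Expand the categorical column into dummy (one-hot) columns.
dummies = pd.get_dummies(df["region"], prefix="region", prefix_sep="-")
df = pd.concat([df.drop("region", axis=1), dummies], axis=1)

print(df.columns.tolist())
```

Filling missing values before computing the z-score matters: the mean and standard deviation are then calculated on the completed column.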


# Helpful Functions

You will see these at the top of every module and assignment.  These are simply a set of reusable functions that we will make use of.  Each of them will be explained in greater detail as the course progresses.  Class 4 contains a complete overview of these functions.

In [15]:
from sklearn import preprocessing
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import shutil
import os
import requests
import base64


# Encode text values to dummy variables(i.e. [1,0,0],[0,1,0],[0,0,1] for red,green,blue)
def encode_text_dummy(df, name):
    dummies = pd.get_dummies(df[name])
    for x in dummies.columns:
        dummy_name = "{}-{}".format(name, x)
        df[dummy_name] = dummies[x]
    df.drop(name, axis=1, inplace=True)


# Encode text values to a single dummy variable.  The new columns (which do not replace the old) will have a 1
# at every location where the original column (name) matches each of the target_values.  One column is added for
# each target value.
def encode_text_single_dummy(df, name, target_values):
    for tv in target_values:
        l = list(df[name].astype(str))
        l = [1 if str(x) == str(tv) else 0 for x in l]
        name2 = "{}-{}".format(name, tv)
        df[name2] = l


# Encode text values to indexes (e.g. [0],[1],[2] for blue,green,red; LabelEncoder sorts the classes).
def encode_text_index(df, name):
    le = preprocessing.LabelEncoder()
    df[name] = le.fit_transform(df[name])
    return le.classes_


# Encode a numeric column as zscores
def encode_numeric_zscore(df, name, mean=None, sd=None):
    if mean is None:
        mean = df[name].mean()

    if sd is None:
        sd = df[name].std()

    df[name] = (df[name] - mean) / sd


# Convert all missing values in the specified column to the median
def missing_median(df, name):
    med = df[name].median()
    df[name] = df[name].fillna(med)


# Convert all missing values in the specified column to the default
def missing_default(df, name, default_value):
    df[name] = df[name].fillna(default_value)


# Convert a Pandas dataframe to the x,y inputs that TensorFlow needs
def to_xy(df, target):
    result = []
    for x in df.columns:
        if x != target:
            result.append(x)
    # find out the type of the target column.  Is it really this hard? :(
    target_type = df[target].dtypes
    target_type = target_type[0] if hasattr(target_type, '__iter__') else target_type
    # Encode to int for classification, float otherwise. TensorFlow likes 32 bits.
    if target_type in (np.int64, np.int32):
        # Classification
        dummies = pd.get_dummies(df[target])
        # DataFrame.as_matrix was removed in pandas 1.0; .values is the equivalent.
        return df[result].values.astype(np.float32), dummies.values.astype(np.float32)
    else:
        # Regression
        return df[result].values.astype(np.float32), df[[target]].values.astype(np.float32)

# Nicely formatted time string
def hms_string(sec_elapsed):
    h = int(sec_elapsed / (60 * 60))
    m = int((sec_elapsed % (60 * 60)) / 60)
    s = sec_elapsed % 60
    return "{}:{:>02}:{:>05.2f}".format(h, m, s)


# Regression chart.
def chart_regression(pred,y,sort=True):
    t = pd.DataFrame({'pred' : pred, 'y' : y.flatten()})
    if sort:
        t.sort_values(by=['y'],inplace=True)
    a = plt.plot(t['y'].tolist(),label='expected')
    b = plt.plot(t['pred'].tolist(),label='prediction')
    plt.ylabel('output')
    plt.legend()
    plt.show()

# Remove all rows where the specified column is sd or more standard deviations away from the mean
def remove_outliers(df, name, sd):
    drop_rows = df.index[(np.abs(df[name] - df[name].mean()) >= (sd * df[name].std()))]
    df.drop(drop_rows, axis=0, inplace=True)


# Encode a column to a range between normalized_low and normalized_high.
def encode_numeric_range(df, name, normalized_low=-1, normalized_high=1,
                         data_low=None, data_high=None):
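    # Fall back to the observed minimum/maximum for whichever bound was not supplied.
    if data_low is None:
        data_low = min(df[name])
    if data_high is None:
        data_high = max(df[name])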

    df[name] = ((df[name] - data_low) / (data_high - data_low)) \
               * (normalized_high - normalized_low) + normalized_low
        
# This function submits an assignment.  You can submit an assignment as many times as you like; only the final
# submission counts.  The parameters are as follows:
# data - Pandas dataframe output.
# key - Your student key that was emailed to you.
# no - The assignment class number (for example, 4 for assignment #4).
# source_file - The full path to your Python or IPYNB file.  This must have "_class" followed by the assignment
#               number as part of its name, for example "_class2" for class assignment #2.
def submit(data,key,no,source_file=None):
    if source_file is None and '__file__' not in globals(): raise Exception('Must specify a filename when running in a Jupyter notebook.')
    if source_file is None: source_file = __file__
    suffix = '_class{}'.format(no)
    if suffix not in source_file: raise Exception('{} must be part of the filename.'.format(suffix))
    with open(source_file, "rb") as image_file:
        encoded_python = base64.b64encode(image_file.read()).decode('ascii')
    ext = os.path.splitext(source_file)[-1].lower()
    if ext not in ['.ipynb','.py']: raise Exception("Source file is {}; must be .py or .ipynb".format(ext))
    r = requests.post("https://api.heatonresearch.com/assignment-submit",
        headers={'x-api-key':key}, json={'csv':base64.b64encode(data.to_csv(index=False).encode('ascii')).decode("ascii"),
        'assignment': no, 'ext':ext, 'py':encoded_python})
    if r.status_code == 200:
        print("Success: {}".format(r.text))
    else: print("Failure: {}".format(r.text))
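
As a quick illustration of what `to_xy` is meant to return for a regression target, the sketch below builds the same pair of arrays by hand on a made-up DataFrame. It uses `.values` directly rather than calling the helper so that it runs on its own; the column names and numbers are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: two feature columns and a floating-point target.
df = pd.DataFrame({
    "a": [0.5, -1.0, 1.5],
    "b": [1, 0, 1],
    "target": [12.5, 7.0, 30.25],
})

features = [c for c in df.columns if c != "target"]

# Regression: x holds every non-target column, y is a single column.
x = df[features].values.astype(np.float32)
y = df[["target"]].values.astype(np.float32)

print(x.shape, y.shape)
```

An integer-typed target would instead be expanded into dummy columns, one per class, which is how the helper distinguishes classification from regression.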

# Assignment #4 Sample Code

The following code provides a starting point for this assignment.

In [54]:
import os
import io
import requests
import numpy as np
import pandas as pd
from scipy.stats import zscore
from sklearn import metrics
from keras.models import Sequential
from keras.layers import Dense, Activation

# This is your student key that I emailed to you at the beginning of the semester.
key = ""  # This is an example key and will not work.

# You must also identify your source file.  (modify for your local setup)
# The filename must contain "_class4" to match this assignment number.
# file='/resources/t81_558_deep_learning/assignment_yourname_class4.ipynb'  # IBM Data Science Workbench
# file='C:\\Users\\jeffh\\projects\\t81_558_deep_learning\\assignment_yourname_class4.ipynb'  # Windows
# file='/Users/jeff/projects/t81_558_deep_learning/assignment_yourname_class4.ipynb'  # Mac/Linux
file = ''

# Begin assignment
path = ""


filename_read = os.path.join(path,"reg-30-spring-2018.csv")
df = pd.read_csv(filename_read)

# Encode the feature vector
ids = df['id']
df.drop('id', axis=1, inplace=True)
encode_text_dummy(df, 'region')
encode_text_dummy(df, 'item')
missing_median(df,'width')
encode_numeric_zscore(df, 'distance', mean=None, sd=None)
encode_numeric_zscore(df, 'landings', mean=None, sd=None)
encode_numeric_zscore(df, 'number', mean=None, sd=None)
encode_numeric_zscore(df, 'pack', mean=None, sd=None)
encode_numeric_zscore(df, 'age', mean=None, sd=None)
encode_numeric_zscore(df, 'weight', mean=None, sd=None)
encode_numeric_zscore(df, 'volume', mean=None, sd=None)
encode_numeric_zscore(df, 'width', mean=None, sd=None)
encode_numeric_zscore(df, 'max', mean=None, sd=None)
encode_numeric_zscore(df, 'power', mean=None, sd=None)
encode_numeric_zscore(df, 'size', mean=None, sd=None)
x,y=to_xy(df, 'target')
model = Sequential()
model.add(Dense(10, input_dim=x.shape[1], activation='relu')) # Hidden 1
model.add(Dense(10, activation='relu')) # Hidden 2
model.add(Dense(1)) # Output
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(x,y,verbose=2,epochs=250)
pred = model.predict(x)

submit_df=pd.DataFrame()
submit_df['id'] = ids
submit_df['pred'] = pred.flatten()  # predict returns shape (n, 1); flatten to one value per row

# Save a copy, if you like
submit_df.to_csv('4.csv',index=False)

# Submit the assignment
submit(source_file=file,data=submit_df,key=key,no=4)




Epoch 1/250
 - 1s - loss: 2702409.1276
Epoch 2/250
 - 0s - loss: 72921.4281
Epoch 3/250
 - 0s - loss: 52698.4228
Epoch 4/250
 - 0s - loss: 41763.3765
Epoch 5/250
 - 0s - loss: 33521.2501
Epoch 6/250
 - 0s - loss: 27540.3695
Epoch 7/250
 - 0s - loss: 23236.8166
Epoch 8/250
 - 0s - loss: 20240.8655
Epoch 9/250
 - 0s - loss: 18252.4092
Epoch 10/250
 - 0s - loss: 16784.8569
Epoch 11/250
 - 0s - loss: 15800.8382
Epoch 12/250
 - 0s - loss: 15146.0887
Epoch 13/250
 - 0s - loss: 14639.0009
Epoch 14/250
 - 0s - loss: 14588.8576
Epoch 15/250
 - 0s - loss: 13910.1269
Epoch 16/250
 - 0s - loss: 13743.8788
Epoch 17/250
 - 0s - loss: 13497.4385
Epoch 18/250
 - 0s - loss: 13204.2999
Epoch 19/250
 - 0s - loss: 13350.8271
Epoch 20/250
 - 0s - loss: 13300.1359
Epoch 21/250
 - 0s - loss: 13074.4141
Epoch 22/250
 - 0s - loss: 13096.2856
Epoch 23/250
 - 0s - loss: 13130.4820
Epoch 24/250
 - 0s - loss: 13089.1085
Epoch 25/250
 - 0s - loss: 12942.4921
Epoch 26/250
 - 0s - loss: 12963.7319
Epoch 27/250
 - 0s 

Epoch 214/250
 - 0s - loss: 13331.7551
Epoch 215/250
 - 0s - loss: 12861.3561
Epoch 216/250
 - 0s - loss: 13015.1422
Epoch 217/250
 - 0s - loss: 13750.6624
Epoch 218/250
 - 0s - loss: 12819.9682
Epoch 219/250
 - 0s - loss: 13292.8981
Epoch 220/250
 - 0s - loss: 12682.7056
Epoch 221/250
 - 0s - loss: 13570.3586
Epoch 222/250
 - 0s - loss: 13571.0293
Epoch 223/250
 - 0s - loss: 13044.2276
Epoch 224/250
 - 0s - loss: 13039.3450
Epoch 225/250
 - 0s - loss: 12827.9934
Epoch 226/250
 - 0s - loss: 13236.1011
Epoch 227/250
 - 0s - loss: 12971.6646
Epoch 228/250
 - 0s - loss: 13651.8674
Epoch 229/250
 - 0s - loss: 13033.6696
Epoch 230/250
 - 0s - loss: 12744.6244
Epoch 231/250
 - 0s - loss: 13218.0569
Epoch 232/250
 - 0s - loss: 12792.9662
Epoch 233/250
 - 0s - loss: 12939.6916
Epoch 234/250
 - 0s - loss: 12982.3597
Epoch 235/250
 - 0s - loss: 12779.1705
Epoch 236/250
 - 0s - loss: 13062.1005
Epoch 237/250
 - 0s - loss: 13214.0081
Epoch 238/250
 - 0s - loss: 12868.9849
Epoch 239/250
 - 0s - los

0          1
1          2
2          3
        ... 
1230    1231
1231    1232
1232    1233
Name: id, Length: 1233, dtype: int64
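
Because the sample code trains and predicts on the same data, one optional sanity check is to compare the predictions with the known targets. The sketch below computes the root mean square error (RMSE) with NumPy on placeholder arrays; in the notebook, `pred` and `y` from the sample code would take their place. The values shown are made up for illustration.

```python
import numpy as np

# Placeholder values standing in for model.predict(x) and the y from to_xy.
pred = np.array([[2.5], [0.0], [2.0], [8.0]], dtype=np.float32)
y = np.array([[3.0], [-0.5], [2.0], [7.0]], dtype=np.float32)

# RMSE: square root of the mean squared difference.
rmse = float(np.sqrt(np.mean((pred - y) ** 2)))
print("RMSE: {:.4f}".format(rmse))
```

The loss that Keras reports during training is the mean squared error, so a loss of about 13,000 corresponds to an RMSE of roughly 114 in the units of the target.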
