Import the necessary modules from NumPy, pandas, scikit-learn, and Keras.

In [7]:
import numpy
from pandas import read_csv
from keras.models import Sequential
from keras.layers import Dense
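# scikit-learn wrapper from Keras 2.x; newer Keras releases drop it in favour of the separate SciKeras package
from keras.wrappers.scikit_learn import KerasRegressor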
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold

Load the dataset. Although the file has a .csv extension, its columns are separated by whitespace rather than commas; pandas' `read_csv` can read whitespace-delimited files directly. We also need to split the data into input (X) and output (Y) variables.

In [2]:
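# load dataset from file: columns are whitespace-separated and there is no header row
# (sep=r"\s+" replaces delim_whitespace=True, which is deprecated in recent pandas)
dataframe = read_csv("housing.csv", sep=r"\s+", header=None)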
dataset = dataframe.values

# split data into input and output variables
X = dataset[:,0:13]
Y = dataset[:,13]
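To see what the whitespace separator does without needing the real file, here is a minimal sketch that parses a tiny sample held in memory. The two rows are made-up values in the 14-column layout the code above assumes for `housing.csv` (13 inputs followed by the target); they are not taken from the dataset, and the names `sample`, `frame`, `X_demo` and `Y_demo` are hypothetical.

```python
from io import StringIO

import pandas as pd

# Two made-up rows with 14 whitespace-separated columns
# (13 input features followed by the target), mimicking housing.csv's layout.
sample = """\
0.1  18.0  2.3  0  0.5  6.5  65.2  4.1  1  296  15.3  396.9  4.9  24.0
0.2   0.0  7.1  0  0.4  6.4  78.9  4.9  2  242  17.8  396.9  9.1  21.6
"""

# sep=r"\s+" treats any run of spaces or tabs as a single delimiter;
# header=None because there is no header row.
frame = pd.read_csv(StringIO(sample), sep=r"\s+", header=None)
data = frame.values

# Same slicing as above: columns 0-12 are inputs, column 13 is the output.
X_demo = data[:, 0:13]
Y_demo = data[:, 13]

print(frame.shape)      # (2, 14)
print(X_demo.shape)     # (2, 13)
print(Y_demo.tolist())  # [24.0, 21.6]
```

Uneven spacing (note the extra spaces before `0.0` in the second row) does not create empty columns, because the whole run of whitespace counts as one separator.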

Define the neural network model. This baseline has a single hidden layer of 13 neurons (one per input feature) with rectifier (ReLU) activation, is trained with the Adam optimizer, and uses mean squared error as the loss function. The output layer is a single neuron with no activation function (i.e. a linear output), because we want to predict the numerical value directly, without transforming it. To make it easy to change the network's attributes, we define the network inside a function.

In [3]:
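def baseline_model():
    # create model: 13 inputs -> 13 ReLU hidden units -> 1 linear output
    # (kernel_initializer is the Keras 2 name for the old `init` argument)
    model = Sequential()
    model.add(Dense(13, input_dim=13, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal'))

    # compile model with mean squared error loss and the Adam optimizer
    model.compile(loss='mean_squared_error', optimizer='adam')

    return model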
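The architecture is small enough to write out by hand. The sketch below is a plain NumPy illustration, not how Keras implements it: it counts the trainable parameters of a 13 → 13 → 1 network (the total should match what `model.summary()` reports for `baseline_model`) and runs one forward pass with ReLU in the hidden layer and no activation at the output. The weights, the standard deviation of 0.05, and the names `W1`, `b1`, `W2`, `b2` and `forward` are hypothetical placeholders chosen for illustration.

```python
import numpy as np

n_inputs, n_hidden, n_outputs = 13, 13, 1

# Each Dense layer has a weight matrix plus one bias per unit.
hidden_params = n_inputs * n_hidden + n_hidden    # 13*13 + 13 = 182
output_params = n_hidden * n_outputs + n_outputs  # 13*1 + 1 = 14
total_params = hidden_params + output_params
print(total_params)  # 196

# Placeholder weights drawn from a small normal distribution (illustrative only).
rng = np.random.default_rng(7)
W1 = rng.normal(0.0, 0.05, size=(n_inputs, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.05, size=(n_hidden, n_outputs))
b2 = np.zeros(n_outputs)

def forward(x):
    """Hidden layer with ReLU, then a linear (identity) output layer."""
    hidden = np.maximum(0.0, x @ W1 + b1)  # ReLU clips negatives to zero
    return hidden @ W2 + b2                # no activation: raw value is the prediction

batch = rng.normal(size=(4, n_inputs))  # 4 fake rows with 13 features each
predictions = forward(batch)
print(predictions.shape)  # (4, 1)
```

Leaving the output linear means predictions are not squashed into a fixed range, which is what a house-price regression needs.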

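We can now create a `KerasRegressor` estimator, which lets scikit-learn use the Keras model like any other estimator. We pass it the `baseline_model` function defined above and train for 100 epochs with a batch size of 5. We also fix NumPy's random number generator to a constant seed so that results are more consistent across training runs; this does not guarantee identical results, because the Keras backend has random number generators of its own.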

In [5]:
# fix random seed
seed = 7
numpy.random.seed(seed)

# create model
# wrap the model for scikit-learn (Keras 2 uses `epochs`; `nb_epoch` is the old Keras 1 name)
estimator = KerasRegressor(build_fn=baseline_model, epochs=100, batch_size=5, verbose=0)
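To get a feel for how much work `epochs=100` and `batch_size=5` imply, the arithmetic below counts gradient updates per cross-validation fold. It assumes the standard 506-row Boston housing file (adjust `n_rows` if your copy differs) and the 10-fold split used in the next step. It also shows what fixing NumPy's seed does: draws taken after reseeding with the same value are identical.

```python
import math

import numpy

n_rows = 506      # assumption: row count of the standard Boston housing file
n_splits = 10     # 10-fold cross-validation, as in the next step
batch_size = 5
epochs = 100

# KFold gives the first (n_rows % n_splits) folds one extra test row.
test_sizes = [n_rows // n_splits + (1 if i < n_rows % n_splits else 0)
              for i in range(n_splits)]
train_sizes = [n_rows - t for t in test_sizes]

# One epoch takes ceil(train_size / batch_size) updates (the last batch may be partial).
updates_per_fold = [math.ceil(n / batch_size) * epochs for n in train_sizes]
print(sorted(set(train_sizes)))       # [455, 456]
print(sorted(set(updates_per_fold)))  # [9100, 9200]

# Reseeding NumPy's global generator reproduces the same draws.
numpy.random.seed(7)
first = numpy.random.rand(3)
numpy.random.seed(7)
second = numpy.random.rand(3)
print(numpy.array_equal(first, second))  # True
```

Roughly 9,000 small updates per fold, repeated for ten folds, is why the evaluation takes minutes rather than seconds.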

Now we're ready to evaluate the model using 10-fold cross-validation. Expect the process to take several minutes on a desktop CPU, since the network is trained from scratch once per fold.

In [6]:
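# shuffle=True is required when passing random_state (recent scikit-learn raises an error otherwise)
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)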
results = cross_val_score(estimator, X, Y, cv=kfold)

print('Baseline: %.2f (%.2f) MSE' % (results.mean(), results.std()))

Baseline: 32.94 (28.03) MSE
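The sketch below spells out what `cross_val_score` does with a `KFold` splitter: fit on nine folds, score on the held-out fold, repeat ten times, then summarise with the mean and standard deviation across folds. To keep it fast and self-contained it swaps the neural network for scikit-learn's `LinearRegression` and uses synthetic data from `make_regression`; both substitutions are purely illustrative. It also shows scikit-learn's "greater is better" convention: with `scoring="neg_mean_squared_error"` the scores come back as negated MSE, so the sign must be flipped before reporting an MSE.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the housing data: 506 rows, 13 features.
X_syn, y_syn = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=7)

# An integer random_state gives the same shuffled folds on every split() call.
cv_demo = KFold(n_splits=10, shuffle=True, random_state=7)

# Manual loop: train on 9 folds, compute MSE on the held-out fold.
manual = []
for train_idx, test_idx in cv_demo.split(X_syn):
    fold_model = LinearRegression().fit(X_syn[train_idx], y_syn[train_idx])
    manual.append(mean_squared_error(y_syn[test_idx], fold_model.predict(X_syn[test_idx])))
manual = np.array(manual)

# The same evaluation via cross_val_score; scores are negative MSE values.
scores = cross_val_score(LinearRegression(), X_syn, y_syn, cv=cv_demo,
                         scoring="neg_mean_squared_error")

print(np.allclose(-scores, manual))  # True
print("MSE: %.2f (%.2f)" % (manual.mean(), manual.std()))
```

The number in parentheses is the standard deviation of the per-fold scores, so a large value (as in the baseline result above) signals that performance varies a lot depending on which rows land in the test fold.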
