We will define a very simple model with one hidden layer and define five hyperparameters to tune. They are:

**n_input**: The number of prior inputs to use as input for the model (e.g. 12 months).

**n_nodes**: The number of nodes to use in the hidden layer (e.g. 50).

**n_epochs**: The number of training epochs (e.g. 1000).

**n_batch**: The number of samples to include in each mini-batch (e.g. 32).

**n_diff**: The difference order (e.g. 0 or 12).


Modern neural networks can often handle raw data with little pre-processing such as scaling or differencing. Nevertheless, for time series data, differencing the series can sometimes make a problem easier to model.

*Differencing is the transform of the data such that a value of a prior observation is subtracted from the current observation, removing trend or seasonality structure.*

We will add support for differencing to the grid search test harness in case it adds value to a specific problem. It does add value for the international airline passengers dataset.

In [1]:
# difference dataset
# will calculate the difference of a given order for the dataset
def difference(data, order):
	return [data[i] - data[i - order] for i in range(order, len(data))]

# Differencing is optional: an order of 0 means no differencing, whereas an
# order of 1 or 12 requires that the data be differenced prior to fitting the
# model and that the differencing be reversed on the model's predictions
# before the forecast is returned.
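
To make the transform and its inverse concrete, here is a minimal sketch on a made-up six-value series. The `difference()` function repeats the logic shown above; `invert_difference()` is a hypothetical helper introduced only for this illustration and is not part of the test harness.

```python
# difference a series at a given lag (same logic as difference() above)
def difference(data, order):
    return [data[i] - data[i - order] for i in range(order, len(data))]

# rebuild the original series from its differences
# (hypothetical helper, used only in this sketch)
def invert_difference(first_values, diffs, order):
    # start from the first `order` original observations
    restored = list(first_values)
    for d in diffs:
        # add each difference back to the value `order` steps earlier
        restored.append(restored[-order] + d)
    return restored

# toy series, invented for illustration
series = [10, 12, 15, 19, 24, 30]
diffs = difference(series, 1)
restored = invert_difference(series[:1], diffs, 1)
print(diffs)
print(restored)
```

Note that the differenced series is shorter than the original by `order` values, which is why the first `order` observations are needed to reverse the transform.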

Fitting a model starts by unpacking the list of hyperparameters:

In [2]:
# unpack config
# n_input, n_nodes, n_epochs, n_batch, n_diff = config

Next, we must prepare the data. This includes differencing, transforming the series into a supervised learning format, and separating the input and output components of each sample.
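
As a rough illustration of the supervised framing, the sketch below builds input windows and their next-step targets in plain Python. The `sliding_windows()` helper is a hypothetical stand-in for the pandas-based `series_to_supervised()` function used in the listings, and the series is made up.

```python
# frame a series as (input window, next value) pairs
# (hypothetical pure-Python stand-in for series_to_supervised())
def sliding_windows(series, n_input):
    inputs, outputs = [], []
    for i in range(n_input, len(series)):
        # the n_input observations before time i are the inputs
        inputs.append(series[i - n_input:i])
        # the observation at time i is the output
        outputs.append(series[i])
    return inputs, outputs

# toy series, invented for illustration
series = [1, 2, 3, 4, 5, 6]
train_x, train_y = sliding_windows(series, n_input=3)
for x, y in zip(train_x, train_y):
    print(x, '->', y)
```

Each row of inputs holds `n_input` lag observations, so a series of length 6 with three lags yields three training samples.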

We then define and fit the model with the given configuration.

Complete implementation of the *model_fit()* function:

In [3]:
# fit a model
def model_fit(train, config):
	# unpack config
	n_input, n_nodes, n_epochs, n_batch, n_diff = config
	# prepare data
	if n_diff > 0:
		train = difference(train, n_diff)
	# transform series into supervised format
	data = series_to_supervised(train, n_in=n_input)
	# separate inputs and outputs
	train_x, train_y = data[:, :-1], data[:, -1]
	# define model
	model = Sequential()
	model.add(Dense(n_nodes, activation='relu', input_dim=n_input))
	model.add(Dense(1))
	model.compile(loss='mse', optimizer='adam')
	# fit model
	model.fit(train_x, train_y, epochs=n_epochs, batch_size=n_batch, verbose=0)
	return model

Once the model is fit, it can be used for making forecasts.
If the data was differenced, the difference must be inverted for the prediction of the model. This involves adding the value at the relative offset from the history back to the value predicted by the model.

It also means that the history must be differenced so that the input data used to make the prediction has the expected form.

Once prepared, we can use the history data to create a single sample as input to the model for making a one-step prediction.

The shape of one sample must be [1, n_input] where n_input is the chosen number of lag observations to use.

Finally, a prediction can be made.

Complete implementation of the model_predict() function:

In [4]:
# forecast with the fit model
def model_predict(model, history, config):
	# unpack config
	n_input, _, _, _, n_diff = config
	# prepare data
	correction = 0.0
	if n_diff > 0:
		correction = history[-n_diff]
		history = difference(history, n_diff)
	# shape input for model
	x_input = array(history[-n_input:]).reshape((1, n_input))
	# make forecast
	yhat = model.predict(x_input, verbose=0)
	# correct forecast if it was differenced
	return correction + yhat[0]
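
The correction step can be checked by hand with a small sketch. Here `toy_predict()` is a hypothetical stand-in for a fitted network (it simply averages its inputs), `predict_one_step()` is an illustrative name, and the history values are made up; only the differencing and correction logic mirrors the function above.

```python
def difference(data, order):
    return [data[i] - data[i - order] for i in range(order, len(data))]

# hypothetical stand-in for a fitted model: predicts the mean of its inputs
def toy_predict(x_input):
    row = x_input[0]
    return sum(row) / len(row)

def predict_one_step(history, n_input, n_diff):
    correction = 0.0
    if n_diff > 0:
        # observation n_diff steps before the time being forecast
        correction = history[-n_diff]
        # the model was trained on differenced data, so difference the history too
        history = difference(history, n_diff)
    # a single sample with shape [1, n_input]
    x_input = [history[-n_input:]]
    # add the correction back to undo the differencing
    return correction + toy_predict(x_input)

# toy history with a constant step of 10, invented for illustration
history = [10, 20, 30, 40, 50, 60]
with_diff = predict_one_step(history, n_input=2, n_diff=1)
without_diff = predict_one_step(history, n_input=2, n_diff=0)
print(with_diff, without_diff)
```

With differencing, the stand-in model sees the constant step of 10 and the correction adds it to the last observation. Without differencing, it averages the last two raw values.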

We can define a model_configs() function that creates a list of the different combinations of parameters to try.

We will define a small subset of configurations to try as an example, including a differencing of 12 months, which we expect will be required.

The grid search can be repeated to narrow in on ranges of values that appear to show better performance.

An implementation of the model_configs() function:

In [5]:
# create a list of configs to try
def model_configs():
	# define scope of configs
	n_input = [12]
	n_nodes = [50, 100]
	n_epochs = [100]
	n_batch = [1, 150]
	n_diff = [0, 12]
	# create configs
	configs = list()
	for i in n_input:
		for j in n_nodes:
			for k in n_epochs:
				for l in n_batch:
					for m in n_diff:
						cfg = [i, j, k, l, m]
						configs.append(cfg)
	print('Total configs: %d' % len(configs))
	return configs
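
As an aside, the nested loops can be written more compactly with `itertools.product` from the standard library. This is an alternative sketch, not the version used in the complete listing; the name `model_configs_product()` is introduced here only for illustration.

```python
from itertools import product

# same grid as model_configs(), built with itertools.product
# (alternative sketch; the function name is hypothetical)
def model_configs_product():
    n_input = [12]
    n_nodes = [50, 100]
    n_epochs = [100]
    n_batch = [1, 150]
    n_diff = [0, 12]
    # product() varies the last list fastest, matching the nested loops
    return [list(cfg) for cfg in product(n_input, n_nodes, n_epochs, n_batch, n_diff)]

configs = model_configs_product()
print('Total configs: %d' % len(configs))
print(configs[0], configs[-1])
```

The grid grows multiplicatively with each added value, so the size of the search is the product of the list lengths (here 1 x 2 x 1 x 2 x 2 = 8).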

Combining all the elements required to grid search MLP models for a univariate time series forecasting problem, we get:

In [7]:
# grid search mlps for airline passengers
from math import sqrt
from numpy import array
from numpy import mean
from pandas import DataFrame
from pandas import concat
from pandas import read_csv
from sklearn.metrics import mean_squared_error
from keras.models import Sequential
from keras.layers import Dense

# split a univariate dataset into train/test sets
def train_test_split(data, n_test):
	return data[:-n_test], data[-n_test:]

# transform list into supervised learning format
def series_to_supervised(data, n_in=1, n_out=1):
	df = DataFrame(data)
	cols = list()
	# input sequence (t-n, ... t-1)
	for i in range(n_in, 0, -1):
		cols.append(df.shift(i))
	# forecast sequence (t, t+1, ... t+n)
	for i in range(0, n_out):
		cols.append(df.shift(-i))
	# put it all together
	agg = concat(cols, axis=1)
	# drop rows with NaN values
	agg.dropna(inplace=True)
	return agg.values

# root mean squared error or rmse
def measure_rmse(actual, predicted):
	return sqrt(mean_squared_error(actual, predicted))

# difference dataset
def difference(data, order):
	return [data[i] - data[i - order] for i in range(order, len(data))]

# fit a model
def model_fit(train, config):
	# unpack config
	n_input, n_nodes, n_epochs, n_batch, n_diff = config
	# prepare data
	if n_diff > 0:
		train = difference(train, n_diff)
	# transform series into supervised format
	data = series_to_supervised(train, n_in=n_input)
	# separate inputs and outputs
	train_x, train_y = data[:, :-1], data[:, -1]
	# define model
	model = Sequential()
	model.add(Dense(n_nodes, activation='relu', input_dim=n_input))
	model.add(Dense(1))
	model.compile(loss='mse', optimizer='adam')
	# fit model
	model.fit(train_x, train_y, epochs=n_epochs, batch_size=n_batch, verbose=0)
	return model

# forecast with the fit model
def model_predict(model, history, config):
	# unpack config
	n_input, _, _, _, n_diff = config
	# prepare data
	correction = 0.0
	if n_diff > 0:
		correction = history[-n_diff]
		history = difference(history, n_diff)
	# shape input for model
	x_input = array(history[-n_input:]).reshape((1, n_input))
	# make forecast
	yhat = model.predict(x_input, verbose=0)
	# correct forecast if it was differenced
	return correction + yhat[0]

# walk-forward validation for univariate data
def walk_forward_validation(data, n_test, cfg):
	predictions = list()
	# split dataset
	train, test = train_test_split(data, n_test)
	# fit model
	model = model_fit(train, cfg)
	# seed history with training dataset
	history = [x for x in train]
	# step over each time-step in the test set
	for i in range(len(test)):
		# make a one-step forecast from the history
		yhat = model_predict(model, history, cfg)
		# store forecast in list of predictions
		predictions.append(yhat)
		# add actual observation to history for the next loop
		history.append(test[i])
	# estimate prediction error
	error = measure_rmse(test, predictions)
	print(' > %.3f' % error)
	return error

# score a config: repeat walk-forward validation and average the RMSE
def repeat_evaluate(data, config, n_test, n_repeats=10):
	# convert config to a key
	key = str(config)
	# fit and evaluate the model n times
	scores = [walk_forward_validation(data, n_test, config) for _ in range(n_repeats)]
	# summarize score
	result = mean(scores)
	print('> Model[%s] %.3f' % (key, result))
	return (key, result)

# grid search configs
def grid_search(data, cfg_list, n_test):
	# evaluate configs
	scores = [repeat_evaluate(data, cfg, n_test) for cfg in cfg_list]
	# sort configs by error, asc
	scores.sort(key=lambda tup: tup[1])
	return scores

# create a list of configs to try
def model_configs():
	# define scope of configs
	n_input = [12]
	n_nodes = [50, 100]
	n_epochs = [100]
	n_batch = [1, 150]
	n_diff = [0, 12]
	# create configs
	configs = list()
	for i in n_input:
		for j in n_nodes:
			for k in n_epochs:
				for l in n_batch:
					for m in n_diff:
						cfg = [i, j, k, l, m]
						configs.append(cfg)
	print('Total configs: %d' % len(configs))
	return configs

# define dataset
series = read_csv('monthly-airline-passengers.csv', header=0, index_col=0)
data = series.values
# data split
n_test = 12
# model configs
cfg_list = model_configs()
# grid search
scores = grid_search(data, cfg_list, n_test)
print('done')
# list top 3 configs
for cfg, error in scores[:3]:
	print(cfg, error)

Total configs: 8
 > 23.091
 > 17.128
 > 21.231
 > 29.980
 > 18.841
 > 18.143
 > 29.214
 > 21.461
 > 17.444
 > 33.589
> Model[[12, 50, 100, 1, 0]] 23.012
 > 20.531
 > 19.559
 > 21.614
 > 19.118
 > 19.001
 > 18.362
 > 21.848
 > 18.842
 > 20.018
 > 20.605
> Model[[12, 50, 100, 1, 12]] 19.950
 > 63.015
 > 42.279
 > 91.787
 > 64.172
 > 51.806
 > 96.806
 > 69.926
 > 39.364
 > 36.523
 > 45.846
> Model[[12, 50, 100, 150, 0]] 60.152
 > 19.593
 > 20.129
 > 18.897
 > 17.921
 > 20.572
 > 19.789
 > 19.861
 > 19.539
 > 20.640
 > 19.409
> Model[[12, 50, 100, 150, 12]] 19.635
 > 17.297
 > 16.129
 > 25.786
 > 19.816
 > 16.393
 > 15.813
 > 21.420
 > 19.012
 > 18.209
 > 20.256
> Model[[12, 100, 100, 1, 0]] 19.013
 > 20.455
 > 19.656
 > 19.982
 > 20.715
 > 18.962
 > 20.510
 > 18.708
 > 21.391
 > 19.751
 > 18.317
> Model[[12, 100, 100, 1, 12]] 19.845
 > 54.982
 > 23.449
 > 69.765
 > 65.526
 > 57.286
 > 69.115
 > 64.826
 > 48.573
 > 73.917
 > 79.937
> Model[[12, 100, 100, 150, 0]] 60.738
 > 19.423
 > 18.275

Running the example, we can see that there are a total of eight configurations to be evaluated by the framework.

Results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

Each config will be evaluated 10 times; that means 10 models will be created and evaluated using walk-forward validation to calculate an RMSE score before an average of those 10 scores is reported and used to score the configuration.
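
The averaging step can be reproduced from the output above. The sketch below copies the ten per-run RMSE values for the first two configurations from that output and summarizes them with the standard library. The standard deviation is an addition for illustration; the harness itself reports only the mean.

```python
from statistics import mean, stdev

# per-run RMSE scores copied from the output above
scores_no_diff = [23.091, 17.128, 21.231, 29.980, 18.841,
                  18.143, 29.214, 21.461, 17.444, 33.589]   # config [12, 50, 100, 1, 0]
scores_diff_12 = [20.531, 19.559, 21.614, 19.118, 19.001,
                  18.362, 21.848, 18.842, 20.018, 20.605]   # config [12, 50, 100, 1, 12]

for label, scores in [('n_diff=0', scores_no_diff), ('n_diff=12', scores_diff_12)]:
    # the mean is what the harness reports; the spread is extra context
    print('%s mean=%.3f stdev=%.3f' % (label, mean(scores), stdev(scores)))
```

The means match the reported 23.012 and 19.950. The scores without differencing are visibly more spread out across runs, which is one reason a single run would be an unreliable basis for comparing configurations.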

The scores are then sorted and the top 3 configurations with the lowest RMSE are reported at the end. A skillful model configuration was found as compared to a naive model that reported an RMSE of 50.70.
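
As a sketch of the final ranking step, the snippet below sorts the seven configuration averages that appear in full in the output above and compares them with the naive RMSE of 50.70. The eighth configuration is left out because its output is cut off, so this ranking is partial and is not the final top 3 from the run.

```python
# mean RMSE per config, copied from the output above
# (the eighth config is omitted because its output is truncated)
results = [
    ('[12, 50, 100, 1, 0]', 23.012),
    ('[12, 50, 100, 1, 12]', 19.950),
    ('[12, 50, 100, 150, 0]', 60.152),
    ('[12, 50, 100, 150, 12]', 19.635),
    ('[12, 100, 100, 1, 0]', 19.013),
    ('[12, 100, 100, 1, 12]', 19.845),
    ('[12, 100, 100, 150, 0]', 60.738),
]
naive_rmse = 50.70

# sort by error, ascending, as grid_search() does
results.sort(key=lambda tup: tup[1])
for key, rmse in results[:3]:
    print(key, rmse)

# configs that beat the naive baseline
skillful = [key for key, rmse in results if rmse < naive_rmse]
print('%d of %d configs beat the naive RMSE' % (len(skillful), len(results)))
```

Among the completed results, the two configurations that do worse than the naive model both combine a batch size of 150 with no differencing.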