In [None]:
import numpy as np
import pandas as pd

Machine learning methods such as classification or regression models can be used for time series forecasting. Before they can be used, however, the forecasting problem must be reframed as a supervised learning problem: the single sequence is converted into pairs of input and output sequences.

A key function for transforming time series data into a supervised learning problem is the pandas `shift()` function (some other libraries call the equivalent operation `lag`). Given a DataFrame, `shift()` creates copies of columns that are pushed forward (NaN values appear at the front) or pulled back (NaN values appear at the end). This operation is what creates the columns of lag observations, as well as the columns of forecast observations, for a time series dataset in a supervised learning format.

Let’s look at some examples of the shift function in action. We start by defining a toy time series dataset as a sequence of 10 numbers, then use the shift function to create the "lagged" time series.

In [4]:
df = pd.DataFrame()
df['t'] = [x for x in range(10)]

# shift all the observations up by one time step
df['t+1'] = df['t'].shift(-1)
df

,t,t+1
0,0,1.0
1,1,2.0
2,2,3.0
3,3,4.0
4,4,5.0
5,5,6.0
6,6,7.0
7,7,8.0
8,8,9.0
9,9,NaN


Running the code chunk above gives us two columns in the dataset: the first contains the original observations and the second contains the shifted observations. Note that the last row has to be discarded because of the NaN value (there is no later value to shift up into it).
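Discarding that last row can be sketched as follows. This is a minimal illustration rather than part of the original notebook; the name `clean` and the cast back to integers are assumptions made for the example.

```python
import pandas as pd

# Rebuild the toy frame from the cell above.
df = pd.DataFrame({'t': range(10)})
df['t+1'] = df['t'].shift(-1)

# Drop the final row, whose shifted value is NaN, then restore the
# integer dtype that was lost when shift() introduced the missing value.
clean = df.dropna().astype(int)
print(clean.shape)  # (9, 2)
```

The shifted column is stored as floats only because NaN cannot be represented in an integer column; once the incomplete row is gone, the cast back is safe.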

From the output above, we can see that shifting the series up by one time step gives us a primitive supervised learning problem: in the first row, the input value 0 in column `t` maps to the output value 1.0 in column `t+1`.
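The same framing can also be built in the other direction, with a positive shift that creates a lag column. The sketch below is illustrative only: the column name `t-1` and the variables `framed`, `X`, and `y` are assumptions, chosen to show how the pairs would be separated into model inputs and targets.

```python
import pandas as pd

series = pd.Series(range(10), name='t')

# shift(1) pushes values down one row, so each row holds the previous
# observation next to the current one. The first row has no previous
# value and is dropped.
framed = pd.DataFrame({'t-1': series.shift(1), 't': series}).dropna()

X = framed[['t-1']].to_numpy()  # 2-D input array, shape (9, 1)
y = framed['t'].to_numpy()      # 1-D target array, shape (9,)
print(X.shape, y.shape)
```

Keeping `X` two-dimensional matches what most supervised learning libraries expect for a feature matrix, even when there is only one feature.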

In [None]:
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
	"""
	Frame a time series as a supervised learning dataset.
	Arguments:
		data: Sequence of observations as a list or NumPy array.
		n_in: Number of lag observations as input (X).
		n_out: Number of observations as output (y).
		dropnan: Boolean whether or not to drop rows with NaN values.
	Returns:
		Pandas DataFrame of series framed for supervised learning.
	"""
	# build the frame first so lists, 1-D arrays and 2-D arrays are all handled
	df = pd.DataFrame(data)
	n_vars = df.shape[1]
	cols, names = list(), list()
	# input sequence (t-n, ... t-1)
	for i in range(n_in, 0, -1):
		cols.append(df.shift(i))
		names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
	# forecast sequence (t, t+1, ... t+n)
	for i in range(0, n_out):
		cols.append(df.shift(-i))
		if i == 0:
			names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
		else:
			names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
	# put it all together
	agg = pd.concat(cols, axis=1)
	agg.columns = names
	# drop rows with NaN values
	if dropnan:
		agg.dropna(inplace=True)
	return agg
 

values = [x for x in range(10)]
data = series_to_supervised(values)
print(data)
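To make the multi-step case concrete, here is a self-contained sketch of the framing that two lag inputs and two outputs would produce for the same 10-value series. It does not call the function above; it rebuilds the columns directly with `shift()` and `pd.concat`, and the window sizes and the dictionary `cols` are assumptions made for the illustration.

```python
import pandas as pd

df = pd.DataFrame({'var1': range(10)})
n_in, n_out = 2, 2  # illustrative window sizes

# Lagged inputs (t-2, t-1), the current step (t), then future steps (t+1).
cols = {f'var1(t-{i})': df['var1'].shift(i) for i in range(n_in, 0, -1)}
cols['var1(t)'] = df['var1']
for i in range(1, n_out):
    cols[f'var1(t+{i})'] = df['var1'].shift(-i)

# The dictionary keys become the column names; incomplete rows are dropped.
framed = pd.concat(cols, axis=1).dropna()
print(framed.head(3))
```

Two rows are lost at the start (no lag values) and one at the end (no future value), so 7 of the 10 rows survive. Wider windows cost more rows, which matters for short series.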

# Reference

- [Blog: How to Convert a Time Series to a Supervised Learning Problem in Python](http://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/)