## The Method Described Below Uses Sliding Windows

### We can define a mock time series dataset as a sequence of 10 numbers, in this case a single column in a DataFrame

In [1]:
from pandas import DataFrame

df = DataFrame()
df['t'] = [x for x in range(10)]
print(df)

   t
0  0
1  1
2  2
3  3
4  4
5  5
6  6
7  7
8  8
9  9


### Shift all observations down by one time step by inserting a new row at the top. The shift() function does this for us, and we can place the shifted column next to the original series.

In [2]:
df['t-1'] = df['t'].shift(1)
print(df)

   t  t-1
0  0  NaN
1  1  0.0
2  2  1.0
3  3  2.0
4  4  3.0
5  5  4.0
6  6  5.0
7  7  6.0
8  8  7.0
9  9  8.0


### Increasing the shift to 2, 3, or more creates longer input sequences (X) that can be used to forecast output sequences (Y)

In [3]:
# a positive shift of 2 gives the observation two steps back, i.e. (t-2)
df['t-2'] = df['t'].shift(2)
print(df)

### The shift() function also accepts negative integers, which pull the observations up by inserting new rows at the end

In [4]:
df = DataFrame()
df['t'] = [x for x in range(10)]

df['t+1'] = df['t'].shift(-1)
print(df)

   t  t+1
0  0  1.0
1  1  2.0
2  2  3.0
3  3  4.0
4  4  5.0
5  5  6.0
6  6  7.0
7  7  8.0
8  8  9.0
9  9  NaN


#### t is the current time; future times (t+1, ..., t+n) are the forecast times and
#### past observations (t-1, ..., t-n) are used to make forecasts.
#### Positive and negative shifts can be combined to create a new DataFrame from a time series, with sequences of input
#### and output patterns for a supervised learning problem.
#### This permits not only classical X => Y prediction but also
#### X => Y prediction where both input and output are sequences.
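As a minimal sketch of combining the two shift directions, the cell below builds one lagged input column and one future output column from the same toy series. The name `framed` is only illustrative and does not appear elsewhere in this notebook.

```python
from pandas import DataFrame

# Toy series of 10 observations, as in the cells above.
df = DataFrame({'t': list(range(10))})

# Positive shift: the previous value becomes an input column.
df['t-1'] = df['t'].shift(1)
# Negative shift: the next value becomes an output column.
df['t+1'] = df['t'].shift(-1)

# The first and last rows contain NaN after shifting, so drop them.
framed = df[['t-1', 't', 't+1']].dropna()
print(framed)
```

Each remaining row now holds a past value, the current value, and a future value side by side.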

## The series_to_supervised() function

#### arguments:
#### 1. data - sequence of observations as a list or 2D NumPy array
#### 2. n_in - number of lag observations as input (X), values in [1..len(data)], default = 1
#### 3. n_out - number of observations as output (Y), values in [0..len(data)-1], default = 1
#### 4. dropnan - boolean, whether to drop rows with NaN values, default = True

#### returns: pandas DataFrame with each column named by variable number and time step; by default (t-1) is X and (t) is Y

In [5]:
from pandas import DataFrame
from pandas import concat

def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    
    n_vars = 1 if type(data) is list else data.shape[1]
    df = DataFrame(data)
    cols, names = list(), list()
    
    # input sequence (t-n,...,t-1)
    
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
        
    # forecast sequence (t,t+1,...,t+n)
    
    for i in range(0, n_out):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
        else:
            names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
    # put it all together
    
    agg = concat(cols, axis=1)
    agg.columns = names
    
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg

### One-Step Univariate Forecasting
#### use lagged observations (e.g. t-1) as input variables to forecast the current time step (t).

In [6]:
values = [x for x in range(10)]
data = series_to_supervised(values)
print(data)

   var1(t-1)  var1(t)
1        0.0        1
2        1.0        2
3        2.0        3
4        3.0        4
5        4.0        5
6        5.0        6
7        6.0        7
8        7.0        8
9        8.0        9


#### The observations are named "var1"; the input observation is named (t-1) and the output time step is named (t).
#### Rows with NaN values have been automatically removed from the DataFrame.
#### We can repeat this example with an input sequence of arbitrary length, such as 3, by passing the length of the input sequence as an argument.

In [7]:
data = series_to_supervised(values, 3)
print(data)

   var1(t-3)  var1(t-2)  var1(t-1)  var1(t)
3        0.0        1.0        2.0        3
4        1.0        2.0        3.0        4
5        2.0        3.0        4.0        5
6        3.0        4.0        5.0        6
7        4.0        5.0        6.0        7
8        5.0        6.0        7.0        8
9        6.0        7.0        8.0        9


#### The input sequence is in the correct left-to-right order, with the output variable to be predicted on the far right.
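Because the target sits in the last column, the frame can be split into inputs and outputs by position. The sketch below rebuilds the 3-lag frame directly with shift() so that it runs on its own; the names `framed`, `X`, and `y` are illustrative assumptions, not part of the notebook.

```python
from pandas import DataFrame, concat

# Rebuild the 3-lag frame with shift() so this cell is self-contained.
series = DataFrame({'var1': list(range(10))})
framed = concat([series.shift(3), series.shift(2), series.shift(1), series], axis=1)
framed.columns = ['var1(t-3)', 'var1(t-2)', 'var1(t-1)', 'var1(t)']
framed = framed.dropna()

# Every column except the last is an input; the last column is the target.
values = framed.values
X, y = values[:, :-1], values[:, -1]
print(X.shape, y.shape)
```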

### Multi-step or Sequence Forecasting
#### Multi-step forecasting uses past observations to forecast a sequence of future observations.
#### Sequence forecasting can be framed by specifying another argument, n_out.
#### For example, an input sequence of 2 past observations can be used to forecast 2 future observations:

In [8]:
values = [x for x in range(10)]
data = series_to_supervised(values, 2, 2)
print(data)

   var1(t-2)  var1(t-1)  var1(t)  var1(t+1)
2        0.0        1.0        2        3.0
3        1.0        2.0        3        4.0
4        2.0        3.0        4        5.0
5        3.0        4.0        5        6.0
6        4.0        5.0        6        7.0
7        5.0        6.0        7        8.0
8        6.0        7.0        8        9.0


### Multivariate Forecasting

#### We may have observations of several different measures and an interest in forecasting one or more of them.
#### For example, we may have 2 sets of time series observations and wish to forecast one or both.

In [9]:
raw = DataFrame()
# the two observations
raw['ob1'] = [x for x in range(10)]
raw['ob2'] = [x for x in range(50, 60)]

values = raw.values

data = series_to_supervised(values)
print(data)

   var1(t-1)  var2(t-1)  var1(t)  var2(t)
1        0.0       50.0        1       51
2        1.0       51.0        2       52
3        2.0       52.0        3       53
4        3.0       53.0        4       54
5        4.0       54.0        5       55
6        5.0       55.0        6       56
7        6.0       56.0        7       57
8        7.0       57.0        8       58
9        8.0       58.0        9       59
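If only one of the two series is to be forecast, the unwanted output column can simply be dropped from the framed data. The sketch below assumes the second variable is the target; that choice, and the names `framed`, `X`, and `y`, are illustrative only. The frame is rebuilt with shift() so the cell runs by itself.

```python
from pandas import DataFrame, concat

raw = DataFrame({'ob1': list(range(10)), 'ob2': list(range(50, 60))})

# One lag of both variables as input, the current step of both as candidate outputs.
framed = concat([raw.shift(1), raw], axis=1)
framed.columns = ['var1(t-1)', 'var2(t-1)', 'var1(t)', 'var2(t)']
framed = framed.dropna()

# Assumption: only var2 is forecast, so the var1 output column is removed.
framed = framed.drop(columns=['var1(t)'])
X = framed[['var1(t-1)', 'var2(t-1)']].values
y = framed['var2(t)'].values
print(framed.head())
```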


#### Reframing with 1 time step as input and 2 time steps as the forecast sequence

In [11]:
data = series_to_supervised(values, 1, 2)
print(data)

   var1(t-1)  var2(t-1)  var1(t)  var2(t)  var1(t+1)  var2(t+1)
1        0.0       50.0        1       51        2.0       52.0
2        1.0       51.0        2       52        3.0       53.0
3        2.0       52.0        3       53        4.0       54.0
4        3.0       53.0        4       54        5.0       55.0
5        4.0       54.0        5       55        6.0       56.0
6        5.0       55.0        6       56        7.0       57.0
7        6.0       56.0        7       57        8.0       58.0
8        7.0       57.0        8       58        9.0       59.0
