<a href="https://colab.research.google.com/github/cagBRT/timeSeries/blob/main/8_2D_to_3D_data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!git clone -l -s https://github.com/cagBRT/timeSeries.git cloned-repo
%cd cloned-repo

Time series forecasting is difficult. Unlike simpler prediction problems, time series problems add the complexity of order dependence between observations.

Time series forecasting has been dominated by linear methods like ARIMA because they are well understood and effective on many problems. 

Classical methods have some limitations:
- Focus on complete data: missing or corrupt data is generally unsupported.
- Focus on linear relationships: assuming a linear relationship excludes more complex joint distributions.
- Focus on fixed temporal dependence: the relationship between observations at different times, and in turn the number of lag observations provided as input, must be diagnosed and specified.
- Focus on univariate data: many real-world problems have multiple input variables.
- Focus on one-step forecasts: many real-world problems require forecasts with a long time horizon.

Source: Deep Learning for Time-Series Analysis, 2017.

# **Using time series data in CNNs and LSTMs**

The convolutions in a CNN are best known for working on spatial (2D) data such as images. <br>

There are also convolutions for 1D data. These allow a CNN to be used with text and time series data. Instead of extracting spatial information, a 1D convolution extracts information along the time dimension.
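
To make the idea of sliding a kernel along the time dimension concrete, here is a minimal NumPy sketch. The series and the fixed width-3 kernel are illustrative assumptions; a trained CNN would learn its kernel weights rather than use hand-picked ones.

```python
import numpy as np

# Toy series with a constant upward slope (illustrative data)
series = np.arange(1, 11, dtype=float)   # [1, 2, ..., 10]

# Hand-picked width-3 kernel that measures the change across each window.
# A trained CNN would learn these weights instead.
kernel = np.array([-1.0, 0.0, 1.0])

# Slide the kernel along the time axis. Deep learning "convolution" layers
# compute a cross-correlation (no kernel flip), so np.correlate is the closer match.
feature_map = np.correlate(series, kernel, mode="valid")

print(feature_map.shape)  # (8,)
print(feature_map)        # every entry is 2.0
```

Each output value summarizes one window of three consecutive time steps, which is why a series of length 10 yields 8 values.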

**Step 1: Convert the data to three dimensions**

To use time series data in a CNN or LSTM, we need to convert it to three-dimensional data.<br>

The input to every 1D convolutional (Conv1D) and LSTM layer must be three-dimensional.<br>

In [None]:
from IPython.display import Image
Image("timeseries.png", width=640)

There are also Conv3D operations, which apply spatial convolutions over volumes.<br>
Conv3D is useful for sequences of images, such as MRI scans or videos.

In [None]:
from numpy import array

A supervised learning algorithm requires that data is provided as a collection of samples,
where each sample has:<br>
- an input component (X)<br>
- an output component (y).

**In this notebook we'll transform 2D data to 3D data**

**Define a function to split the data into samples**

In [None]:
def split_sequence(sequence, n_steps): 
  X, y = list(), list()
  for i in range(len(sequence)):
    # find the end of this pattern
    end_ix = i + n_steps
    # check if we are beyond the sequence
    if end_ix > len(sequence)-1: 
      break
    # gather input and output parts of the pattern
    seq_x, seq_y = sequence[i:end_ix], sequence[end_ix] 
    X.append(seq_x)
    y.append(seq_y)
  return array(X), array(y)
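
As an aside, the same windows can be built without an explicit loop. The sketch below is a hypothetical variant (`split_sequence_windowed` is not part of the notebook) and assumes NumPy 1.20 or newer, where `sliding_window_view` is available.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def split_sequence_windowed(sequence, n_steps):
    """Hypothetical vectorized variant of split_sequence (not from the notebook)."""
    sequence = np.asarray(sequence)
    # Each window holds n_steps inputs followed by the one value to predict
    windows = sliding_window_view(sequence, n_steps + 1)
    X = windows[:, :-1]   # first n_steps values of every window
    y = windows[:, -1]    # the value that follows them
    return X.copy(), y.copy()  # copies, because the view itself is read-only

X_w, y_w = split_sequence_windowed([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3)
print(X_w.shape, y_w.shape)  # (7, 3) (7,)
print(X_w[0], y_w[0])        # [1 2 3] 4
```

One difference to keep in mind: if the sequence is shorter than `n_steps + 1`, this variant raises a `ValueError`, whereas the loop version returns empty arrays.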

**The data**

Begin with one-dimensional data.

In [None]:
series = array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) 
print(series.shape)

Our 10-step univariate series can be expressed as a supervised learning problem with three time steps for input and one step as output, as follows:<br>
X_______y<br>
[1, 2, 3], [4]<br>
[2, 3, 4], [5]<br>
[3, 4, 5], [6]<br>
...<br>

Split the series into samples of three time steps each.

In [None]:
X, y = split_sequence(series, 3)
print("data shape =", X.shape, "label shape =", y.shape)

In [None]:
X

In [None]:
X = X.reshape((X.shape[0], X.shape[1], 1))
print(X.shape)

In [None]:
X

In [None]:
y

**Assignment**<br>
Add more data to the sample dataset and split it into samples

**Another example**

In [None]:
# define univariate time series
series = array([32, 44, 65, 4, 76, 23, 71, 1, 94, 101]) 
print(series.shape)

In [None]:
# transform to a supervised learning problem
X, y = split_sequence(series, 5)
print(X.shape, y.shape)

In [None]:
# show each sample
for i in range(len(X)):
  print(X[i], y[i])

Preparing time series data for CNNs and LSTMs requires one additional step beyond transforming the data into a supervised learning problem: reshaping the input to three dimensions.

*The input layer* for CNN and LSTM models is specified by the *input_shape* argument on the first hidden layer of the network.

# LSTM without an input layer<br>
...<br>
model = Sequential() <br>
model.add(LSTM(32)) <br>
model.add(Dense(1))<br>

In this example, the LSTM() layer does not yet specify the shape of the input data, so that information still has to be supplied.<br>
The input to every Conv1D and LSTM layer must be three-dimensional.<br>
The three dimensions are:<br>
- Samples<br>
- Time steps<br>
- Features<br>

We use the notation: [samples, timesteps, features]<br>
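
To see what each of the three axes holds, here is a small sketch that indexes into a 3D array. The array `X_demo` is illustrative data built for this example only.

```python
import numpy as np

# Illustrative 3D array: 7 samples, 3 time steps, 1 feature
X_demo = np.array([[i, i + 1, i + 2] for i in range(1, 8)]).reshape((7, 3, 1))

samples, timesteps, features = X_demo.shape
print(samples, timesteps, features)  # 7 3 1

print(X_demo[0].shape)   # (3, 1): one sample = 3 time steps x 1 feature
print(X_demo[0, 1])      # [2]: all features at the second time step of the first sample
print(X_demo[0, 1, 0])   # 2: a single feature value
```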

# LSTM with an input layer

When defining the input layer of your LSTM network, the network assumes you have one or more samples and requires that you specify the number of time steps and the number of features. You can do this by passing a tuple to the *input_shape* argument.

...<br>
model = Sequential()<br>
model.add(LSTM(32, input_shape=(3, 1))) <br>
model.add(Dense(1))<br>
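
The tuple given to *input_shape* can be read off the data instead of typed by hand. The sketch below assumes the input has already been reshaped to [samples, timesteps, features]; `X_demo` is a placeholder array used only for illustration.

```python
import numpy as np

# Illustrative input: 7 samples, 3 time steps, 1 feature
X_demo = np.zeros((7, 3, 1))

# Drop the leading samples axis; what remains is (timesteps, features)
input_shape = X_demo.shape[1:]
print(input_shape)  # (3, 1)
```

This value could then be passed as `LSTM(32, input_shape=input_shape)`. Recent Keras releases recommend an explicit `Input(shape=...)` first layer instead, which takes the same tuple.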

For example, if we have 7 samples and 3 time steps per sample for the input element of our time series, we can reshape it into [7, 3, 1] by passing the tuple (7, 3, 1) to the reshape() function. <br>

The array must contain exactly as many elements as the new shape requires. Here it does: [7, 3] and [7, 3, 1] both hold 21 values, so the reshape only adds a features axis of length 1.<br>
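
The element-count requirement can be checked directly. In this sketch, `X_demo` is an illustrative array; the second reshape is expected to fail because it would need 42 elements.

```python
import numpy as np

X_demo = np.arange(21).reshape((7, 3))   # 7 samples x 3 values = 21 elements

# Same 21 elements, with an explicit features axis of length 1
X_3d = X_demo.reshape((7, 3, 1))
print(X_3d.shape)  # (7, 3, 1)

# A shape that needs 42 elements cannot be built from 21
try:
    X_demo.reshape((7, 3, 2))
except ValueError as err:
    print("reshape failed:", err)
```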


To transform input from [samples, features] to [samples, timesteps, features] with hard-coded dimensions:<br>
...

X = X.reshape((7, 3, 1))

To do the same without hard-coding the dimensions, read them from the array's own shape:<br>
...

X = X.reshape((X.shape[0], X.shape[1], 1))
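
NumPy offers other ways to add a trailing axis of length 1. The sketch below compares them with the reshape() call on an illustrative array; which one to use is a matter of taste.

```python
import numpy as np

X_demo = np.arange(21).reshape((7, 3))

a = X_demo.reshape((X_demo.shape[0], X_demo.shape[1], 1))
b = X_demo[..., np.newaxis]          # index with a new trailing axis
c = np.expand_dims(X_demo, axis=-1)  # same result via a function call

print(a.shape, b.shape, c.shape)  # (7, 3, 1) (7, 3, 1) (7, 3, 1)
print(np.array_equal(a, b) and np.array_equal(a, c))  # True
```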

# **Transform univariate data from 2D to 3D**


In [None]:
# define univariate time series
series = array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) 
print(series.shape)

In [None]:
# transform to a supervised learning problem
X, y = split_sequence(series, 3)
print(X.shape, y.shape)

In [None]:
# transform input from [samples, features] to [samples, timesteps, features] 
X = X.reshape((X.shape[0], X.shape[1], 1))
print(X.shape)

The result has 7 samples, 3 time steps, and 1 feature.

**Assignment**<br>
Modify the following array to be three-dimensional.<br>
It should have:<br>
- 500 samples<br>
- 5 time steps

In [None]:
import numpy as np
randomSeries = np.random.randint(0,100, size=1000)
print(randomSeries.shape)

In [None]:
#@title
# Window only the first 505 values: 505 - 5 = 500 samples of 5 time steps each
X, y = split_sequence(randomSeries[:505], 5)
print(X.shape, y.shape)

In [None]:
#@title
# transform input from [samples, features] to [samples, timesteps, features]
X = X.reshape((X.shape[0], X.shape[1], 1))
print(X.shape)

The result has 500 samples, 5 time steps, and 1 feature.