## Data Preparation Example:

### Consider the following situation:
I have two columns in my data file with 5,000 rows: column 1 is time (at 1-hour
intervals) and column 2 is the number of sales. I am trying to forecast the number
of sales for future time steps. Help me set the number of samples, time steps, and
features in this data for an LSTM.

There are a few problems here:

* Data Shape. LSTMs expect 3D input in the form [samples, time steps, features], and it can be challenging to get your head around this the first time.
* Sequence Length. LSTMs tend to struggle with sequences longer than about 200 to 400 time steps, so the data will need to be split into subsamples.
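
To make the target shape concrete, here is a minimal sketch of the arithmetic, assuming a subsequence length of 200 (one possible choice within the range mentioned above, not a recommendation). The names `series`, `n_steps`, and `X` are illustrative only and do not appear elsewhere in this example.

```python
import numpy as np

# Hypothetical univariate series: 5,000 sales values (time column already removed).
series = np.arange(1, 5001) * 10

# Assumed subsequence length; 200 sits at the low end of the range mentioned above.
n_steps = 200
n_samples = len(series) // n_steps  # 5000 // 200 = 25
n_features = 1                      # only the sales value

# Keep only the values that fill complete subsequences, then reshape to 3D.
usable = series[: n_samples * n_steps]
X = usable.reshape(n_samples, n_steps, n_features)

print(X.shape)  # (25, 200, 1)
```

With these assumptions, the 5,000 observations become 25 samples of 200 time steps with 1 feature each. A different subsequence length would change the number of samples accordingly.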

We will work through this example, broken down into the following 4 steps:
1. Load the Data
2. Drop the Time Column
3. Split Into Samples
4. Reshape Subsequences

### 1. Load the Data

For this example, we will mock loading by defining a new dataset in memory with 5,000
time steps.

In [3]:
from numpy import array

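# define the dataset: column 1 is the time step, column 2 is the sales value
data = list()
n = 5000

for i in range(n):
    data.append([i+1, (i+1)*10])
# convert the list of rows into a 2D NumPy array
data = array(data)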

Printing the first 5 rows and the shape of the loaded data shows that we have
5,000 rows and 2 columns: a standard univariate time series dataset.

In [4]:
print(data[:5, :])

[[ 1 10]
 [ 2 20]
 [ 3 30]
 [ 4 40]
 [ 5 50]]


In [5]:
print(data.shape)

(5000, 2)
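
As an aside, the same mock dataset could be built without a Python loop. The sketch below is an alternative under the same assumptions (time steps 1 to 5,000, sales equal to ten times the time step); `time_index`, `sales`, and `data_vec` are illustrative names, not part of the example above.

```python
import numpy as np

n = 5000

# Column 1: time steps 1..n; column 2: mock sales values (10 x time step).
time_index = np.arange(1, n + 1)
sales = time_index * 10

# Stack the two 1D arrays side by side into an (n, 2) array.
data_vec = np.column_stack((time_index, sales))

print(data_vec[:5, :])
print(data_vec.shape)  # (5000, 2)
```

This produces an array with the same shape and values as the loop-based version, which can be handy when the mock series is long.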
