<a href="https://colab.research.google.com/github/SyedHasnat/CNNs-LSTMs-Time-Series/blob/main/To_Prepare_Time_Series_Data_for_CNNs_and_LSTMs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# How to Prepare Time Series Data for CNNs and LSTMs
## After understanding this notebook, you will know:
####  How to transform a time series dataset into a two-dimensional supervised learning format.
####  How to transform a two-dimensional time series dataset into a three-dimensional structure suitable for CNNs and LSTMs.
####  How to step through a worked example of splitting a very long time series into subsequences ready for training a CNN or LSTM model.
## This notebook is divided into three parts
### 1. Time Series to Supervised.
### 2. 3D Data Preparation Basics.
### 3. Univariate Worked Example.

# 1. Time Series to Supervised

In [None]:
import numpy as np

#### Define univariate time series

In [None]:
my_data = np.arange(10)+1
print(my_data)

[ 1  2  3  4  5  6  7  8  9 10]


### The above 10-step univariate series can be expressed as a supervised learning problem with three time steps for input and one step as output.

### Transform to a supervised learning problem 

### Transform input to [samples, features]

In [None]:
x = []
y = []
# equivalently: x, y = [], []  or  x, y = list(), list()
for i in range(len(my_data)):
    endx_i = i + 3  # index of the value that follows the 3-step window
    if endx_i > len(my_data) - 1:
        break
    seq_x = my_data[i:endx_i]
    seq_y = my_data[endx_i]
    x.append(seq_x)
    y.append(seq_y)
for i in range(len(x)):
    print(x[i], y[i])

[1 2 3] 4
[2 3 4] 5
[3 4 5] 6
[4 5 6] 7
[5 6 7] 8
[6 7 8] 9
[7 8 9] 10
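
The same windows can also be built without an explicit loop. The sketch below is an alternative, not the method used in this notebook: it assumes NumPy 1.20 or newer, where `sliding_window_view` is available, and the names `series` and `n_steps` are chosen here for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

series = np.arange(10) + 1   # [1, 2, ..., 10]
n_steps = 3                  # number of input time steps per sample

# Each window holds n_steps inputs followed by the one-step target.
windows = sliding_window_view(series, n_steps + 1)   # shape (7, 4)
X = windows[:, :-1]   # first three values of each window
y = windows[:, -1]    # last value of each window

for inputs, target in zip(X, y):
    print(inputs, target)
```

The views returned by `sliding_window_view` are read-only, so call `.copy()` on `X` and `y` if they need to be modified later.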


## Let's generalize this by writing a function that handles any series length and window size

In [None]:
def supervised_data(length,step_size):
    data=np.arange(length)+1
    x=[]
    y=[]
    for i in range(len(data)):
        end_ix = i + step_size  # index of the target value, e.g. for [1 2 3] the target is 4
        if end_ix > len(data)-1:
            break
        # equivalently: seq_x, seq_y = data[i:end_ix], data[end_ix]
        seq_x = data[i:end_ix]
        seq_y = data[end_ix]
        x.append(seq_x)
        y.append(seq_y)
    for i in range(len(x)):
        print(x[i], y[i])

In [None]:
supervised_data(10,3)

[1 2 3] 4
[2 3 4] 5
[3 4 5] 6
[4 5 6] 7
[5 6 7] 8
[6 7 8] 9
[7 8 9] 10


### Data in this form can be used directly to train a simple neural network, such as a Multilayer Perceptron. The difficulty for beginners comes when trying to prepare this data for CNNs and LSTMs, which require data to have a three-dimensional structure instead of the two-dimensional structure described so far.
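
The functions above only print the pairs. For training, it is usually more convenient to get them back as arrays. The following is a minimal sketch under that assumption; `split_sequence` is a hypothetical helper name, not something defined elsewhere in this notebook.

```python
import numpy as np

def split_sequence(sequence, n_steps):
    """Return (X, y): sliding windows of n_steps values and the value after each window."""
    X, y = [], []
    # Stop early enough that a target value still exists after the window.
    for i in range(len(sequence) - n_steps):
        X.append(sequence[i:i + n_steps])
        y.append(sequence[i + n_steps])
    return np.array(X), np.array(y)

series = np.arange(10) + 1
X, y = split_sequence(series, 3)
print(X.shape, y.shape)  # (7, 3) (7,)
```

Returning NumPy arrays means `X.shape` and `X.reshape` are available straight away, which matters for the 3D preparation that follows.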

# 2. 3D Data Preparation Basics.

### The input to every CNN and LSTM layer must be three-dimensional. The three dimensions of this input are:
####  Samples. One sequence is one sample. A batch is comprised of one or more samples.
####  Time Steps. One time step is one point of observation in the sample. One sample is comprised of multiple time steps.
####  Features. One feature is one observation at a time step. One time step is comprised of one or more features.
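
To make the three dimensions concrete, the sketch below builds a small made-up batch and reads each dimension off its shape. The sizes (4 samples, 3 time steps, 2 features) are illustrative assumptions only.

```python
import numpy as np

# Hypothetical batch: 4 samples, 3 time steps per sample, 2 features per time step.
batch = np.arange(4 * 3 * 2).reshape(4, 3, 2)

n_samples, n_timesteps, n_features = batch.shape
print(n_samples, n_timesteps, n_features)  # 4 3 2

first_sample = batch[0]          # one sequence, shape (3, 2)
first_time_step = batch[0, 0]    # one observation point, shape (2,)
first_feature = batch[0, 0, 0]   # a single value
print(first_sample.shape, first_time_step.shape, first_feature)
```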

In [None]:
from numpy import array 
def supervised_data1(length,step_size):
    data1=np.arange(length)+1
    x=[]
    y=[]
    for i in range(len(data1)):
        end_ix = i + step_size  # index of the target value, e.g. for [1 2 3] the target is 4
        if end_ix > len(data1)-1:
            break
        # equivalently: seq_x, seq_y = data1[i:end_ix], data1[end_ix]
        seq_x = data1[i:end_ix]
        seq_y = data1[end_ix]
        x.append(seq_x)
        y.append(seq_y)
    for i in range(len(x)):
        print(x[i], y[i])

In [None]:
supervised_data1(10,3)

[1 2 3] 4
[2 3 4] 5
[3 4 5] 6
[4 5 6] 7
[5 6 7] 8
[6 7 8] 9
[7 8 9] 10


## Check the type of x and y (the lists built in section 1)

In [None]:
type(x)

list

In [None]:
type(y)

list

In [None]:
#this will not work
x.shape

AttributeError: 'list' object has no attribute 'shape'

## Because a list has no shape, it also cannot be reshaped to 3D:

In [None]:
X = x.reshape((x.shape[0], x.shape[1], 1))

AttributeError: 'list' object has no attribute 'reshape'

## To fix this problem, let's convert the lists x and y to NumPy arrays

In [None]:
x=np.array(x)
type(x)

numpy.ndarray

In [None]:
#now it will work
x.shape

(7, 3)

In [None]:
y=np.array(y)
type(y)

numpy.ndarray

## Now x.shape and y.shape both work

In [None]:
print(y.shape)

(7,)


In [None]:
print(x.shape)

(7, 3)


## Our aim is to convert this 2D array of shape (7, 3) into a 3D array of shape (7, 3, 1)
### That is, to transform the input from [samples, features] to [samples, timesteps, features]

In [None]:
X = x.reshape((x.shape[0], x.shape[1], 1))
print(X.shape)

(7, 3, 1)


In [None]:
print(X)

[[[1]
  [2]
  [3]]

 [[2]
  [3]
  [4]]

 [[3]
  [4]
  [5]]

 [[4]
  [5]
  [6]]

 [[5]
  [6]
  [7]]

 [[6]
  [7]
  [8]]

 [[7]
  [8]
  [9]]]


In [None]:
x.size

21
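
Reshaping does not change the values or how many there are. It only adds a trailing feature axis of length 1. As a hedged aside, the sketch below shows two other NumPy spellings that give the same result as `reshape`; the array `x` is rebuilt here so the example runs on its own.

```python
import numpy as np

# Rebuild the (7, 3) input array from the earlier cells.
x = np.array([[i + 1, i + 2, i + 3] for i in range(7)])

via_reshape = x.reshape((x.shape[0], x.shape[1], 1))
via_newaxis = x[:, :, np.newaxis]          # index with a new axis
via_expand = np.expand_dims(x, axis=-1)    # append an axis at the end

print(via_reshape.shape, via_newaxis.shape, via_expand.shape)
print(x.size == via_reshape.size)  # the element count is unchanged
```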

# 3. Univariate Worked Example: Data Preparation

### Consider the following situation:
### "I have two columns in my data file with 5,000 rows. Column 1 is time (at 1-hour intervals) and column 2 is the number of sales, and I am trying to forecast the number of sales for future time steps. Help me set the number of samples, time steps, and features in this data for an LSTM."

## There are a few problems here:
###  Data Shape. LSTMs expect 3D input, and it can be challenging to get your head around this the first time.
###  Sequence Length. LSTMs tend to train poorly on sequences longer than roughly 200-400 time steps, so the data will need to be split into subsamples.
## We will work through this example, broken down into the following 4 steps:
### 1. Load the Data
### 2. Drop the Time Column
### 3. Split Into Samples
### 4. Reshape Subsequences

# 1. Let's make this dataset

In [None]:
# example of defining a dataset
import numpy as np
# define the dataset
data = []
# 5,000 rows
n = 5000
for i in range(n):
    # the interval is 1 hour; i starts at zero, so use i+1 to start the time column at 1
    # each row has two columns: [time, sales]
    data.append([i+1, (i+1)*10])
# data is a list at this point; let's confirm
print('data is of type: ', type(data))
# it is converted to an array in a later cell with: data = np.array(data)


data is of type:  <class 'list'>


In [None]:
# print the first five rows and all columns
# this raises an error because data is still a list
print(data[:5, :])

TypeError: list indices must be integers or slices, not tuple

# Let's change this list to an array

In [None]:
data = np.array(data)
print('data is of type: ', type(data))

data is of type:  <class 'numpy.ndarray'>


# Now it will work

In [None]:
#print from row 1 to 5 and all columns
print(data[:5, :]) 

[[ 1 10]
 [ 2 20]
 [ 3 30]
 [ 4 40]
 [ 5 50]]


In [None]:
print(data.shape)

(5000, 2)


# Export this data to a CSV file

In [None]:
import pandas as pd
sales_dataset = pd.DataFrame(data)

In [None]:
sales_dataset.head(10)

Unnamed: 0,0,1
0,1,10
1,2,20
2,3,30
3,4,40
4,5,50
5,6,60
6,7,70
7,8,80
8,9,90
9,10,100


In [None]:
# to_csv creates the file itself, so no separate open() call is needed
sales_dataset.to_csv('sales_dataset.csv', index=False, header=False)
# index=False: otherwise the row index (0, 1, 2, and so on) is written to the file as well
# header=False: otherwise the column labels (0, 1, and so on) are written to the file as well
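
One way to confirm that an export like this worked is to read the file back and compare it with the original array. The sketch below is self-contained and rebuilds the two-column dataset. Writing to a temporary directory and the names `rows`, `csv_path`, and `loaded` are assumptions made for illustration.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Rebuild the two-column dataset: [time, sales].
rows = np.array([[i + 1, (i + 1) * 10] for i in range(5000)])
sales_dataset = pd.DataFrame(rows)

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "sales_dataset.csv")
    sales_dataset.to_csv(csv_path, index=False, header=False)
    # header=None tells pandas the file has no header row.
    loaded = pd.read_csv(csv_path, header=None).to_numpy()

print(loaded.shape)
print(np.array_equal(loaded, rows))
```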

# 2. Drop the Time Column

In [None]:

# drop the time column by keeping only the sales column (index 1)
data = data[:, 1]
print(data.shape)

(5000,)


# 3. Split Into Samples

In [None]:
# split into samples (e.g. 5000/200 = 25)
samples = list()
length = 200
# step over the 5,000 in jumps of 200
for i in range(0,n,length):
    # grab from i to i + 200
    sample = data[i:i+length]
    samples.append(sample)
print(len(samples))

25
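
The loop above divides cleanly because 5,000 is a multiple of 200. When the series length is not a multiple of the subsequence length, the last chunk is shorter and the samples no longer stack into a regular 2D array. One option, sketched below as an assumption rather than a rule, is to drop the incomplete tail. The 5,100-row series is made up for illustration.

```python
import numpy as np

# Hypothetical series whose length is not a multiple of 200.
sales = (np.arange(5100) + 1) * 10
length = 200

n_full = len(sales) // length        # number of complete subsequences
trimmed = sales[:n_full * length]    # drop the trailing incomplete chunk
samples = trimmed.reshape(n_full, length)

print(n_full, len(sales) - len(trimmed))  # complete samples, dropped rows
print(samples.shape)
```

Whether to drop, pad, or overlap the remainder depends on the problem. Dropping is simply the shortest to write.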


# 4. Reshape Subsequences

In [None]:
# convert list of arrays into 2d array
data = np.array(samples)
print(data.shape)

(25, 200)


In [None]:
# reshape into [samples, timesteps, features]
data = data.reshape((len(samples), length, 1))
print(data.shape)

(25, 200, 1)
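
Because the subsequences here are contiguous and do not overlap, the split and the reshape can also be done in a single `reshape` call. The sketch below rebuilds the sales column and checks that both routes give the same array. The variable names are chosen for illustration.

```python
import numpy as np

n, length = 5000, 200
sales = (np.arange(n) + 1) * 10   # the sales column, shape (5000,)

# Route 1: the loop-based split, then reshape.
samples = [sales[i:i + length] for i in range(0, n, length)]
looped = np.array(samples).reshape((len(samples), length, 1))

# Route 2: one reshape; -1 lets NumPy infer the number of samples.
direct = sales.reshape(-1, length, 1)

print(direct.shape)
print(np.array_equal(looped, direct))
```

The single-call form only works when the length of the series is an exact multiple of `length`. Otherwise `reshape` raises a `ValueError`.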


# And that is it. The data can now be used as an input (X) to an LSTM model, or even a CNN model.

# Let's combine the code

In [None]:
# example of creating a 3d array of subsequences
from numpy import array
# define the dataset
data1 = list()
n = 5000
for i in range(n):
    data1.append([i+1, (i+1)*10])
data1 = array(data1)
#___________________________________________________________________________________________________
# drop time
data1 = data1[:, 1]
#___________________________________________________________________________________________________
# split into samples (e.g. 5000/200 = 25)
samples = list()
length = 200
# step over the 5,000 in jumps of 200
for i in range(0,n,length):
    # grab from i to i + 200
    sample = data1[i:i+length]
    samples.append(sample)
# convert list of arrays into 2d array
data1 = array(samples)
#___________________________________________________________________________________________________
# reshape into [samples, timesteps, features]
data1 = data1.reshape((len(samples), length, 1))
print(data1.shape)
#___________________________________________________________________________________________________


(25, 200, 1)


In [None]:
type(data1)

numpy.ndarray

# Exporting this 3D data to a CSV file: the attempt below does not work, because a DataFrame only accepts 2D input. The 2D export earlier in this notebook does work, so refer to that portion.

In [None]:
import pandas as pd
# this line raises a ValueError: pd.DataFrame only accepts 2D input
input_dataset_to_CNN_and_LSTM = pd.DataFrame(data1)
input_dataset_to_CNN_and_LSTM.to_csv('input_dataset_to_CNN_and_LSTM.csv', index=False, header=False)

ValueError: Must pass 2-d input. shape=(25, 200, 1)
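
A CSV file is a 2D table, so a 3D array has to be flattened to 2D before it can be written and reshaped again after it is read. The sketch below shows one way to do that, with one row per sample. It rebuilds the (25, 200, 1) array so it runs on its own, and the temporary directory and variable names are assumptions for illustration.

```python
import os
import tempfile

import numpy as np
import pandas as pd

n, length = 5000, 200
data3d = ((np.arange(n) + 1) * 10).reshape(-1, length, 1)   # shape (25, 200, 1)

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "input_dataset_to_CNN_and_LSTM.csv")

    # Flatten to 2D: one row per sample, time steps and features side by side.
    flat = data3d.reshape(data3d.shape[0], -1)               # shape (25, 200)
    pd.DataFrame(flat).to_csv(csv_path, index=False, header=False)

    # Read the file back and restore the original 3D shape.
    loaded = pd.read_csv(csv_path, header=None).to_numpy()
    restored = loaded.reshape(data3d.shape)

print(flat.shape, restored.shape)
print(np.array_equal(restored, data3d))
```

The CSV file does not record the original shape, so the number of time steps and features has to be known when loading. NumPy's own `np.save` and `np.load` keep the shape and may be simpler when a CSV file is not required.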