## Time Window 
This notebook builds a function that takes a time series as input and returns a `tf.data.Dataset` of windows that we can load and feed to a TensorFlow model.

In [0]:
try:
  # %tensorflow_version is a Colab-only magic that selects TensorFlow 2.x.
  %tensorflow_version 2.x
except Exception:
  pass

import tensorflow as tf

We will first train a model to forecast the next step given the previous 20 steps, so we need to create a dataset of 20-step windows for training.
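
To make the target structure concrete before bringing in `tf.data`, here is a minimal pure-Python sketch of the windowing idea. The helper name `sliding_windows` and the toy series are illustrative assumptions, not part of the notebook. The sketch only shows that predicting the next step from the previous 20 means cutting the series into slices of 21 values: 20 inputs plus 1 label.

```python
def sliding_windows(series, window_size):
    """Return (inputs, label) pairs: `window_size` past values and the value that follows.

    Hypothetical helper for illustration only; the notebook builds the same
    structure with tf.data in the cells that follow.
    """
    pairs = []
    # The last usable start index leaves room for window_size inputs plus 1 label.
    for start in range(len(series) - window_size):
        inputs = series[start:start + window_size]
        label = series[start + window_size]
        pairs.append((inputs, label))
    return pairs


series = list(range(100))  # toy series: 0, 1, ..., 99
pairs = sliding_windows(series, window_size=20)

first_inputs, first_label = pairs[0]
print(len(pairs))                                       # 80 training windows
print(first_inputs[0], first_inputs[-1], first_label)   # 0 19 20
```

A series of length `n` yields `n - window_size` such pairs, which is why the toy series of 100 values gives 80 windows.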

In [12]:
# Create a dataset that contains the integers from 0 to 9.
dataset = tf.data.Dataset.range(10)
for val in dataset:
  print(val.numpy())  # each val is a scalar tensor; val.numpy() returns its plain value

0
1
2
3
4
5
6
7
8
9


In [13]:
dataset = tf.data.Dataset.range(10)
# window() takes the window size; shift=1 means each window starts one step after the previous one.
dataset = dataset.window(5, shift=1)
# Each window in the dataset is itself a dataset,
# so we need a nested loop to reach the individual values in a window.
for window_dataset in dataset:
  for val in window_dataset:
    print(val.numpy(), end=" ")
  print()

0 1 2 3 4 
1 2 3 4 5 
2 3 4 5 6 
3 4 5 6 7 
4 5 6 7 8 
5 6 7 8 9 
6 7 8 9 
7 8 9 
8 9 
9 


In [14]:
# Set drop_remainder=True so that every window has the same size;
# the shorter windows at the end of the series are dropped.
dataset = tf.data.Dataset.range(10)
dataset = dataset.window(5, shift=1, drop_remainder=True)
# Each window is still a nested dataset, so we iterate over it as before.
for window_dataset in dataset:
  for val in window_dataset:
    print(val.numpy(), end=" ")
  print()

0 1 2 3 4 
1 2 3 4 5 
2 3 4 5 6 
3 4 5 6 7 
4 5 6 7 8 
5 6 7 8 9 


In [15]:
dataset = tf.data.Dataset.range(10)
dataset = dataset.window(5, shift=1, drop_remainder=True)
# flat_map() with batch(5) turns each window (a nested dataset) into a single tensor of 5 values.
dataset = dataset.flat_map(lambda window: window.batch(5))
for window in dataset:
  print(window.numpy())

[0 1 2 3 4]
[1 2 3 4 5]
[2 3 4 5 6]
[3 4 5 6 7]
[4 5 6 7 8]
[5 6 7 8 9]


> Consider the **first 4** elements in the window to be `input features` and the **last one** as the `output label`

In [16]:
dataset = tf.data.Dataset.range(10)
dataset = dataset.window(5, shift=1, drop_remainder=True)
dataset = dataset.flat_map(lambda window: window.batch(5))
# Split each window into (input features, output label).
dataset = dataset.map(lambda window: (window[:-1], window[-1:]))
for x, y in dataset:
  print(x.numpy(), y.numpy())

[0 1 2 3] [4]
[1 2 3 4] [5]
[2 3 4 5] [6]
[3 4 5 6] [7]
[4 5 6 7] [8]
[5 6 7 8] [9]


> We need to make sure that the instances in the dataset are shuffled.
Gradient descent, which is what we usually train with, works best when the training instances are independent and identically distributed (IID), and shuffling moves the data closer to that.
To do this we call the `shuffle` method and specify the `buffer_size`.

In [17]:
dataset = tf.data.Dataset.range(10)
dataset = dataset.window(5, shift=1, drop_remainder=True)
dataset = dataset.flat_map(lambda window: window.batch(5))
dataset = dataset.map(lambda window: (window[:-1], window[-1:]))
dataset = dataset.shuffle(buffer_size=10)
for x, y in dataset:
  print(x.numpy(), y.numpy())

# Note: shuffling reorders the windows in the dataset; the values inside each window keep their order.

[1 2 3 4] [5]
[3 4 5 6] [7]
[4 5 6 7] [8]
[5 6 7 8] [9]
[2 3 4 5] [6]
[0 1 2 3] [4]
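
The `buffer_size` argument matters because `shuffle` does not need to hold the whole dataset: it keeps a buffer of that many elements and picks the next output at random from the buffer, refilling it from the input as it goes. The pure-Python sketch below mimics that behaviour. `buffered_shuffle` is a hypothetical helper written for illustration and is not TensorFlow's implementation. It suggests why a buffer at least as large as the dataset gives a full shuffle, while a buffer of 1 leaves the order unchanged.

```python
import random


def buffered_shuffle(items, buffer_size, seed=None):
    """Yield `items` in a buffer-limited random order (illustrative sketch only)."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        buffer.append(item)
        # Once the buffer is full, emit one randomly chosen element to make room.
        if len(buffer) >= buffer_size:
            yield buffer.pop(rng.randrange(len(buffer)))
    # Input exhausted: drain whatever is left in the buffer, still at random.
    while buffer:
        yield buffer.pop(rng.randrange(len(buffer)))


windows = list(range(6))  # stand-ins for the 6 windows above

print(list(buffered_shuffle(windows, buffer_size=1)))   # buffer of 1: order is unchanged
print(list(buffered_shuffle(windows, buffer_size=10)))  # buffer >= dataset size: any order is possible
```

With a small buffer, an element can only move a limited distance forward from its original position, so for long series the buffer should be large relative to the number of windows.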


> We then use the `batch` method to group the windows into batches of 2 for each training iteration, and call the `prefetch` method so that TensorFlow loads the next batch of data while it is working on the current batch.

> "This is done so that it never runs out of data and the GPU is kept as busy as possible."

In [18]:
dataset = tf.data.Dataset.range(10)
dataset = dataset.window(5, shift=1, drop_remainder=True)
dataset = dataset.flat_map(lambda window: window.batch(5))
dataset = dataset.map(lambda window: (window[:-1], window[-1:]))
dataset = dataset.shuffle(buffer_size=10)
# Group the windows into batches of 2 and keep 1 batch prepared ahead of time.
dataset = dataset.batch(batch_size=2).prefetch(1)
for x, y in dataset:
  print("X= ", x.numpy())
  print("Y= ", y.numpy())

X=  [[4 5 6 7]
 [2 3 4 5]]
Y=  [[8]
 [6]]
X=  [[3 4 5 6]
 [0 1 2 3]]
Y=  [[7]
 [4]]
X=  [[5 6 7 8]
 [1 2 3 4]]
Y=  [[9]
 [5]]
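
The batch-and-prefetch step above can be sketched in plain Python to show the idea: a background producer prepares the next batch while the consumer is still working on the current one. The helpers `batch` and `prefetch` below are hypothetical stand-ins built on the standard `queue` and `threading` modules. They are a simplified illustration, not how `tf.data` is implemented.

```python
import queue
import threading


def batch(items, batch_size):
    """Group consecutive items into lists of at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def prefetch(iterable, buffer_size=1):
    """Yield from `iterable` while a background thread prepares up to `buffer_size` items ahead."""
    done = object()  # sentinel marking the end of the stream
    buffer = queue.Queue(maxsize=buffer_size)

    def producer():
        for item in iterable:
            buffer.put(item)  # blocks while the buffer is full
        buffer.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buffer.get()
        if item is done:
            return
        yield item


# Six (inputs, label) windows like the ones printed above, left unshuffled for clarity.
windows = [(list(range(i, i + 4)), [i + 4]) for i in range(6)]
batches = list(prefetch(batch(windows, batch_size=2), buffer_size=1))

for b in batches:
    print(b)
print(len(batches))  # 6 windows in batches of 2 -> 3 batches
```

The bounded queue is the key design choice here: with `buffer_size=1`, the producer stays at most one batch ahead, which mirrors the intent of `prefetch(1)`.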


Now we wrap everything in a function that takes a time series, a window size, a batch size (default 32), and a shuffle buffer size (default 1000), so that the whole pipeline runs with a single call.

In [0]:
def window_dataset(series, window_size, batch_size=32, shuffle_buffer=1000):
  dataset = tf.data.Dataset.from_tensor_slices(series)
  # Each window holds window_size inputs plus 1 label and starts one step after the previous one.
  dataset = dataset.window(window_size + 1, shift=1, drop_remainder=True)
  # Flatten each nested window dataset into a single tensor.
  dataset = dataset.flat_map(lambda window: window.batch(window_size + 1))
  dataset = dataset.shuffle(shuffle_buffer)
  # Split each window into (inputs, label).
  dataset = dataset.map(lambda window: (window[:-1], window[-1:]))
  dataset = dataset.batch(batch_size).prefetch(1)
  return dataset
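
As a rough sanity check on what `window_dataset` should return, the sizes can be worked out by hand. The sketch below is plain Python with a hypothetical helper, `expected_sizes`, and does not call TensorFlow. It assumes one-step shifts, dropped partial windows, and a final batch that is kept even when it is smaller than `batch_size`.

```python
import math


def expected_sizes(series_length, window_size, batch_size=32):
    """Return (num_windows, num_batches, last_batch_size) for a windowed dataset.

    Hypothetical helper: assumes shift=1 and drop_remainder=True on the windows,
    and no drop_remainder on the final batch.
    """
    # Each window needs window_size inputs plus 1 label.
    num_windows = max(series_length - window_size, 0)
    num_batches = math.ceil(num_windows / batch_size)
    # The final batch holds whatever is left over (or a full batch if it divides evenly).
    last_batch_size = num_windows - (num_batches - 1) * batch_size if num_batches else 0
    return num_windows, num_batches, last_batch_size


# 10-step series, 4 inputs per window, batches of 2: the 6 windows and 3 batches shown earlier.
print(expected_sizes(10, window_size=4, batch_size=2))   # (6, 3, 2)
# 1000-step series with 20-step windows and the default batch size of 32.
print(expected_sizes(1000, window_size=20))              # (980, 31, 20)
```

Each full batch of inputs then has shape `(batch_size, window_size)` and each batch of labels has shape `(batch_size, 1)`, matching the `X` and `Y` arrays printed earlier.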