# TF Series Dataset Preparation
In this notebook, we will use the TensorFlow library (`tf.data`) to create <br>
training and validation datasets instead of building them manually.

In [3]:
import tensorflow as tf

## Creating Dataset

In [4]:
dataset = tf.data.Dataset.range(10)
print(dataset)

for data in dataset:
    print(data.numpy())

<_RangeDataset element_spec=TensorSpec(shape=(), dtype=tf.int64, name=None)>
0
1
2
3
4
5
6
7
8
9


## Windowing Dataset

In [5]:
WINDOW_SIZE = 5
STEP = 1

windowed_dataset = dataset.window(WINDOW_SIZE, shift=STEP)
for data in windowed_dataset:
    print(data)

<_VariantDataset element_spec=TensorSpec(shape=(), dtype=tf.int64, name=None)>
<_VariantDataset element_spec=TensorSpec(shape=(), dtype=tf.int64, name=None)>
<_VariantDataset element_spec=TensorSpec(shape=(), dtype=tf.int64, name=None)>
<_VariantDataset element_spec=TensorSpec(shape=(), dtype=tf.int64, name=None)>
<_VariantDataset element_spec=TensorSpec(shape=(), dtype=tf.int64, name=None)>
<_VariantDataset element_spec=TensorSpec(shape=(), dtype=tf.int64, name=None)>
<_VariantDataset element_spec=TensorSpec(shape=(), dtype=tf.int64, name=None)>
<_VariantDataset element_spec=TensorSpec(shape=(), dtype=tf.int64, name=None)>
<_VariantDataset element_spec=TensorSpec(shape=(), dtype=tf.int64, name=None)>
<_VariantDataset element_spec=TensorSpec(shape=(), dtype=tf.int64, name=None)>


Each element of the windowed dataset is itself a dataset (one window), so to print the values we have to iterate over every window:

In [6]:
for data in windowed_dataset:
    print([x.numpy() for x in data ])

[0, 1, 2, 3, 4]
[1, 2, 3, 4, 5]
[2, 3, 4, 5, 6]
[3, 4, 5, 6, 7]
[4, 5, 6, 7, 8]
[5, 6, 7, 8, 9]
[6, 7, 8, 9]
[7, 8, 9]
[8, 9]
[9]




To discard the incomplete windows, i.e. windows with fewer than 5 elements, pass `drop_remainder=True`:

In [7]:
window_dataset = dataset.window(size=WINDOW_SIZE, shift=STEP, drop_remainder=True)

for data in window_dataset:
    print([x.numpy() for x in data])

[0, 1, 2, 3, 4]
[1, 2, 3, 4, 5]
[2, 3, 4, 5, 6]
[3, 4, 5, 6, 7]
[4, 5, 6, 7, 8]
[5, 6, 7, 8, 9]
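
As a rough mental model, the windowing shown above (with and without `drop_remainder`) can be imitated in plain Python. This is only an illustrative sketch: `sliding_windows` is a hypothetical helper, not part of TensorFlow, and it works on a list rather than on a lazily evaluated dataset.

```python
def sliding_windows(values, size, shift=1, drop_remainder=False):
    """Plain-Python imitation of the windows printed above (illustrative only)."""
    windows = []
    for start in range(0, len(values), shift):
        window = values[start:start + size]
        if drop_remainder and len(window) < size:
            break  # every later window would be short as well
        windows.append(window)
    return windows


values = list(range(10))
all_windows = sliding_windows(values, size=5)
full_windows = sliding_windows(values, size=5, drop_remainder=True)

# 10 windows in total, of which 6 are complete
print(len(all_windows), len(full_windows))
```

With `drop_remainder=True`, the number of windows is `(n - size) // shift + 1`, which gives `(10 - 5) // 1 + 1 = 6` here.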


To feed the windows into a model, each window (a nested dataset) has to become a single tensor. Batching each window with its own size and flattening the result with `flat_map` does this:

In [8]:
flat_dataset = window_dataset.flat_map(lambda x: x.batch(WINDOW_SIZE))
for window in flat_dataset:
    # Each window here is a tensor
    print(window.numpy())

[0 1 2 3 4]
[1 2 3 4 5]
[2 3 4 5 6]
[3 4 5 6 7]
[4 5 6 7 8]
[5 6 7 8 9]


## Splitting dataset into features & labels

In [9]:
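dataset = tf.data.Dataset.range(10)
w_dataset = dataset.window(WINDOW_SIZE, shift=STEP, drop_remainder=True)

# Turn each window (a nested dataset) into a single tensor of WINDOW_SIZE values
w_dataset_fm = w_dataset.flat_map(lambda w: w.batch(WINDOW_SIZE))
# Features: every value except the last; label: the last value of the window
w_map = w_dataset_fm.map(lambda w_tensor: (w_tensor[:-1], w_tensor[-1]))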

for sample in w_map:
    x, y = sample
    print(x.numpy(), y.numpy())

[0 1 2 3] 4
[1 2 3 4] 5
[2 3 4 5] 6
[3 4 5 6] 7
[4 5 6 7] 8
[5 6 7 8] 9


Shuffle the samples. A buffer size of 10 covers all 6 samples, so the whole dataset is shuffled uniformly:

In [10]:
w_shuffle_map = w_map.shuffle(10)
for x, y in w_shuffle_map:
    print(x.numpy(), y.numpy())

[1 2 3 4] 5
[0 1 2 3] 4
[2 3 4 5] 6
[3 4 5 6] 7
[5 6 7 8] 9
[4 5 6 7] 8
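
The effect of the buffer size can be checked with a small experiment. The sketch below uses a plain `Dataset.range(6)` as a stand-in for the six samples; the seed value and the variable names are arbitrary choices for illustration. A buffer of size 1 leaves the order unchanged, while `seed` together with `reshuffle_each_iteration=False` should make the shuffled order repeatable across iterations.

```python
import tensorflow as tf

samples = tf.data.Dataset.range(6)  # stand-in for the six (x, y) samples

# buffer_size=1: only one element is ever available to pick from, so no shuffling
unshuffled = [int(v) for v in samples.shuffle(1).as_numpy_iterator()]

# Buffer covers the whole dataset; fixed seed, same order on every pass
shuffled = samples.shuffle(6, seed=42, reshuffle_each_iteration=False)
first_pass = [int(v) for v in shuffled.as_numpy_iterator()]
second_pass = [int(v) for v in shuffled.as_numpy_iterator()]

print(unshuffled)
print(first_pass == second_pass)
```

By default the dataset is reshuffled on every iteration, which is usually what we want when training for several epochs.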


Finally, we group the samples into batches of 2 and prefetch one batch. <br>
`tf.data` generates the dataset lazily at run time, which keeps <br>
memory consumption low. The model consumes one batch at a time, <br>
and `prefetch(1)` prepares the next batch in the background while <br>
the current one is being processed, so less time is spent waiting for data.

In [16]:
prepared_dataset = w_shuffle_map.batch(2).prefetch(1)

for (x, y) in prepared_dataset:
    print(x.numpy(), y.numpy())

[[4 5 6 7]
 [0 1 2 3]] [8 4]
[[5 6 7 8]
 [1 2 3 4]] [9 5]
[[2 3 4 5]
 [3 4 5 6]] [6 7]
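
Putting the steps together, one possible way to build the training and validation datasets is sketched below. The helper name `make_windowed_dataset`, the toy series, the split index and the use of `tf.data.Dataset.from_tensor_slices` (instead of `Dataset.range`) are assumptions made for illustration, not something fixed by this notebook.

```python
import tensorflow as tf


def make_windowed_dataset(series, window_size, batch_size, shuffle_buffer):
    """Hypothetical helper that chains the steps from this notebook."""
    ds = tf.data.Dataset.from_tensor_slices(series)
    # Sliding windows of window_size values, incomplete windows dropped
    ds = ds.window(window_size, shift=1, drop_remainder=True)
    # Convert each nested window dataset into one tensor
    ds = ds.flat_map(lambda w: w.batch(window_size))
    # Split into features (all but last value) and label (last value)
    ds = ds.map(lambda w: (w[:-1], w[-1]))
    ds = ds.shuffle(shuffle_buffer)
    return ds.batch(batch_size).prefetch(1)


series = tf.range(20)  # toy series, stands in for real data
split = 14             # assumed split point: earlier values train, later validate

train_ds = make_windowed_dataset(series[:split], window_size=5, batch_size=2, shuffle_buffer=10)
valid_ds = make_windowed_dataset(series[split:], window_size=5, batch_size=2, shuffle_buffer=10)

for x, y in valid_ds:
    print(x.shape, y.shape)
```

Splitting the series before windowing keeps the validation windows free of training values. Shuffling the validation set is harmless but not needed, so a real helper might make that step optional.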
