**Objective:** This _notebook_ shows how to prepare the features and labels for a _time series machine learning model_ using the `tensorflow` library. In a typical time series problem, the features are a sequence of past values, for example $x_0, x_1, ..., x_{n-1}$, and the output is the next value, $y = x_n$, for a forecast of one time step. This can be developed as:

In [30]:
import numpy as np

In [1]:
import tensorflow as tf
print(f"Tensorflow Version: {tf.__version__}")

tf.config.list_physical_devices()

Tensorflow Version: 2.9.0


[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'),
 PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

In [23]:
dataset = tf.data.Dataset.range(10) # create a tf.data.Dataset with elements 0 to 9
type(dataset), [(x.numpy(), type(x)) for x in dataset]

(tensorflow.python.data.ops.dataset_ops.RangeDataset,
 [(0, tensorflow.python.framework.ops.EagerTensor),
  (1, tensorflow.python.framework.ops.EagerTensor),
  (2, tensorflow.python.framework.ops.EagerTensor),
  (3, tensorflow.python.framework.ops.EagerTensor),
  (4, tensorflow.python.framework.ops.EagerTensor),
  (5, tensorflow.python.framework.ops.EagerTensor),
  (6, tensorflow.python.framework.ops.EagerTensor),
  (7, tensorflow.python.framework.ops.EagerTensor),
  (8, tensorflow.python.framework.ops.EagerTensor),
  (9, tensorflow.python.framework.ops.EagerTensor)])

In [24]:
dataset = dataset.window(5, shift = 1, drop_remainder = True)

In [25]:
for window in dataset:
    for array in window:
        print(array.numpy(), end = " ")
    print()

0 1 2 3 4 
1 2 3 4 5 
2 3 4 5 6 
3 4 5 6 7 
4 5 6 7 8 
5 6 7 8 9 
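The effect of `drop_remainder = True` is easiest to see by switching it off. The following is a minimal sketch, assuming eager execution in TensorFlow 2.x; the names `ragged` and `sizes` are illustrative and not part of the notebook. It counts the elements in each window when the trailing partial windows are kept.

```python
import tensorflow as tf

# Same windowing as above, but keeping the incomplete trailing windows.
ragged = tf.data.Dataset.range(10).window(5, shift=1, drop_remainder=False)

# Each window is itself a small dataset, so count its elements by iterating.
sizes = [len(list(window)) for window in ragged]
print(sizes)
```

The windows that start near the end of the range hold fewer than five elements. After `.batch(5)` they would become arrays of unequal length that cannot be stacked into a single batch later, which is why the notebook drops them.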


In [26]:
dataset = dataset.flat_map(lambda window : window.batch(5)) # collate elements
[window.numpy() for window in dataset]

[array([0, 1, 2, 3, 4], dtype=int64),
 array([1, 2, 3, 4, 5], dtype=int64),
 array([2, 3, 4, 5, 6], dtype=int64),
 array([3, 4, 5, 6, 7], dtype=int64),
 array([4, 5, 6, 7, 8], dtype=int64),
 array([5, 6, 7, 8, 9], dtype=int64)]

In [27]:
# now map the corresponding `x` and `y` values
dataset = dataset.map(lambda window : (window[:-1], window[-1]))

for x, y in dataset:
    print(x.numpy(), y.numpy())

[0 1 2 3] 4
[1 2 3 4] 5
[2 3 4 5] 6
[3 4 5 6] 7
[4 5 6 7] 8
[5 6 7 8] 9
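As a cross-check on the feature and label split above, the same windows can be built directly in NumPy. This is only a sketch for verification, assuming NumPy 1.20 or later (where `sliding_window_view` is available); it is not a replacement for the `tf.data` pipeline.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

series = np.arange(10)

# Every contiguous run of 5 values: one row per window, shape (6, 5).
windows = sliding_window_view(series, window_shape=5)

# The first four values of each window are the features, the last is the label.
x, y = windows[:, :-1], windows[:, -1]

for features, label in zip(x, y):
    print(features, label)
```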


In [28]:
# shuffle the dataset using `.shuffle`; the buffer (10) is larger than the dataset (6 windows)
dataset = dataset.shuffle(buffer_size = 10)

for x, y in dataset:
    print(x.numpy(), y.numpy())

[1 2 3 4] 5
[5 6 7 8] 9
[4 5 6 7] 8
[0 1 2 3] 4
[3 4 5 6] 7
[2 3 4 5] 6
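The `buffer_size` argument controls how thoroughly the data is mixed: `tf.data` fills a buffer with `buffer_size` elements, draws one at random, and replaces it with the next incoming element. The sketch below is a simplified pure-Python model of that documented behaviour, not TensorFlow's implementation; the function `buffered_shuffle` and its `seed` parameter are hypothetical names used only for illustration.

```python
import random

def buffered_shuffle(items, buffer_size, seed=None):
    """Simplified model of a shuffle buffer (illustrative, not TensorFlow code)."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    iterator = iter(items)

    # Fill the buffer with the first `buffer_size` elements.
    buffer = []
    for item in iterator:
        buffer.append(item)
        if len(buffer) == buffer_size:
            break

    # Emit a random buffered element, then refill the freed slot.
    out = []
    for item in iterator:
        idx = rng.randrange(len(buffer))
        out.append(buffer[idx])
        buffer[idx] = item

    # Input exhausted: emit the remaining elements in random order.
    rng.shuffle(buffer)
    out.extend(buffer)
    return out

print(buffered_shuffle(range(6), buffer_size=1))            # order unchanged
print(buffered_shuffle(range(6), buffer_size=10, seed=0))   # full permutation
```

With a buffer of 1 nothing is shuffled, and with a small buffer an element can only move a limited distance towards the front. In the cell above the buffer of 10 is larger than the six windows, so every ordering is possible.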


In [29]:
# batch the data using `.batch` followed by `.prefetch`
dataset = dataset.batch(2).prefetch(1)

for x, y in dataset:
    print(f"<x = {x.numpy()}, y = {y.numpy()}>")

<x = [[5 6 7 8]
 [2 3 4 5]], y = [9 6]>
<x = [[4 5 6 7]
 [3 4 5 6]], y = [8 7]>
<x = [[0 1 2 3]
 [1 2 3 4]], y = [4 5]>


Combining the steps above, let's create a general-purpose function that can be reused for any type of analysis.

In [39]:
def create_xy(series : np.ndarray, window_size : int, batch_size : int, shuffle_buffer : int, **kwargs):
    """
    Process a `np.ndarray` into XY labels for AI-ML time series analysis.
    
    Given a 1D array, split it into sliding windows and map each window to
    an `xy` pair, where `x` holds the first `window_size` values (the
    features) and `y` is the last element of the window. Pass `shift` as a
    keyword argument to change the step between windows (default 1). For
    the detailed methodology, see the steps discussed above.
    """
    
    dataset = tf.data.Dataset.from_tensor_slices(series)
    dataset = dataset.window(window_size + 1, shift = kwargs.get("shift", 1), drop_remainder = True)
    dataset = dataset.flat_map(lambda window : window.batch(window_size + 1))
    dataset = dataset.shuffle(shuffle_buffer).map(lambda window : (window[:-1], window[-1]))
    
    return dataset.batch(batch_size).prefetch(1)

In [46]:
WINDOW, BATCH, SHUFFLE = 5, 3, 20

In [47]:
series = np.random.randint(0, 101, size = 30)
dataset = create_xy(series, window_size = WINDOW, batch_size = BATCH, shuffle_buffer = SHUFFLE)

for x, y in dataset:
    print(f"<x = {x.numpy()}, y = {y.numpy()}>")

<x = [[79 67 30 30  9]
 [14 66 76 16  9]
 [17 39 80 53 79]], y = [18  4 67]>
<x = [[72 34 66 44 14]
 [18  3 44 43 57]
 [16  9  4 17 39]], y = [66 97 80]>
<x = [[50  7 72 34 66]
 [76 16  9  4 17]
 [39 80 53 79 67]], y = [44 39 30]>
<x = [[66 44 14 66 76]
 [34 66 44 14 66]
 [ 9 18  3 44 43]], y = [16 76 57]>
<x = [[30 30  9 18  3]
 [44 43 57 97 12]
 [ 3 44 43 57 97]], y = [44 14 12]>
<x = [[53 79 67 30 30]
 [30  9 18  3 44]
 [80 53 79 67 30]], y = [ 9 43 30]>
<x = [[67 30 30  9 18]
 [ 7 72 34 66 44]
 [ 4 17 39 80 53]], y = [ 3 14 79]>
<x = [[66 76 16  9  4]
 [ 9  4 17 39 80]
 [84 50  7 72 34]], y = [17 53 66]>
<x = [[44 14 66 76 16]], y = [9]>
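The number of batches printed above follows from simple arithmetic on the series length, window size, and batch size. The helper below is a hypothetical sketch (the name `count_windows` is not part of the notebook) that assumes `drop_remainder = True` for the windows, as in `create_xy`, and no `drop_remainder` on the final `.batch`.

```python
import math

def count_windows(series_length, window_size, shift=1):
    """Number of full (window_size + 1)-element windows in a series."""
    span = window_size + 1  # features plus one label
    if series_length < span:
        return 0
    return (series_length - span) // shift + 1

n_windows = count_windows(series_length=30, window_size=5)  # 25 windows
n_batches = math.ceil(n_windows / 3)                        # 9 batches
last_batch = n_windows - (n_batches - 1) * 3                # 1 sample left over

print(n_windows, n_batches, last_batch)
```

This agrees with the run above: eight full batches of three samples followed by a final batch holding a single sample.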
