In [1]:
import tensorflow as tf
import keras

# 1 Practice with tf.data.Dataset
## 1.1 Commonly used operations
In this exercise we will practice some commonly used operations of tf.data.Dataset
objects.  
We start with a small Dataset object in which each item contains 4 elements.  
Printing its first two items with the code below yields:  
```
tf.Tensor([ 0 1 2 100], shape=(4,), dtype=int32)
tf.Tensor([ 1 2 3 101], shape=(4,), dtype=int32)
```


In [None]:
# Create list of tuples to be converted into Dataset object
data = [(i, i+1, i+2, 100 + i) for i in range(20)]
ds = tf.data.Dataset.from_tensor_slices(data)
for item in ds.take(2):
    print(item)

1. Create a Dataset where each item is a tuple (X, y), where X holds the first
three elements and y is the last element of the corresponding item in ds.  
This should yield:
```
(<tf.Tensor: shape=(3,), dtype=int32, numpy=array([0, 1, 2], dtype=int32)>, <tf.Tensor: shape=(), dtype=int32, numpy=100>)
(<tf.Tensor: shape=(3,), dtype=int32, numpy=array([1, 2, 3], dtype=int32)>, <tf.Tensor: shape=(), dtype=int32, numpy=101>)
```


In [None]:
ds1 = # YOUR CODE HERE
for item in ds1.take(2):
    print(item)

2. Starting from ds, keep only those items for which the sum of the first three
elements is a multiple of 4. Use functions from keras.ops to achieve this.  
This should yield:
```
tf.Tensor([  3   4   5 103], shape=(4,), dtype=int32)
tf.Tensor([  7   8   9 107], shape=(4,), dtype=int32)
tf.Tensor([ 11  12  13 111], shape=(4,), dtype=int32)
tf.Tensor([ 15  16  17 115], shape=(4,), dtype=int32)
tf.Tensor([ 19  20  21 119], shape=(4,), dtype=int32)
```

In [None]:
ds2 = ds. # YOUR CODE HERE
for item in ds2:
    print(item)

3. (a) Write a function normalise(x) that takes a batch of elements x
as input, where x has shape (batch_size, num_features). It returns
the batch rescaled so that each feature (i.e. each column of x) has
mean zero and standard deviation one within the batch. Use functions
from keras.ops.
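
Written out, the requirement in 3. (a) can be stated as below. The symbols $B$ (the batch size), $\mu_j$ and $\sigma_j$ are notation introduced here for illustration. Dividing by $B$ rather than $B-1$ (the population standard deviation) is an assumption; it appears consistent with the expected output shown for part (b).

$$
\mu_j = \frac{1}{B}\sum_{i=1}^{B} x_{ij}, \qquad
\sigma_j = \sqrt{\frac{1}{B}\sum_{i=1}^{B} \left(x_{ij} - \mu_j\right)^2}, \qquad
\hat{x}_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j}
$$

Here $i$ runs over the rows of the batch and $j$ over the features, so the statistics are taken along the batch axis.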


In [5]:
def normalise(x):
  # YOUR CODE HERE

3. (b) Transform the original dataset ds by taking the following steps:   
* Cast the tensors to tf.float32 tensors.
* Shuffle the dataset. Make sure that the buffer size is large enough
to shuffle the complete dataset, and use seed=42 for reproducibility.
* Create batches of exactly 8 elements each; with 20 items, the 4 left over are dropped.
* Normalise each batch using the normalise function you just
wrote.  

This should yield:
```
tf.Tensor(
[[-0.47733918 -0.47733918 -0.47733918 -0.47733918]
 [ 1.2028948   1.2028948   1.2028948   1.2028948 ]
 ...
 [-0.9355848  -0.9355848  -0.9355848  -0.9355848 ]], shape=(8, 4), dtype=float32)
tf.Tensor(
[[-0.25607374 -0.25607374 -0.25607374 -0.25607374]
 [ 1.5364425   1.5364425   1.5364425   1.5364425 ]
 ...
 [ 1.2803687   1.2803687   1.2803687   1.2803687 ]], shape=(8, 4), dtype=float32)
```

In [None]:
ds3 = # YOUR CODE HERE
for item in ds3:
  print(item)

## 1.2 A Dataset of Datasets
Write a small function create_ds that creates a Dataset object from any object.
The Dataset consists of the input object repeated five times.  
This should yield:
```
tf.Tensor(b'test', shape=(), dtype=string)
tf.Tensor(b'test', shape=(), dtype=string)
tf.Tensor(b'test', shape=(), dtype=string)
tf.Tensor(b'test', shape=(), dtype=string)
tf.Tensor(b'test', shape=(), dtype=string)
```

In [None]:
create_ds = # YOUR CODE HERE
for item in create_ds("test"):
    print(item)


Start with a Dataset consisting of [1,2,...,10]. Apply create_ds to each
element of this Dataset. You get a Dataset where each item is itself a Dataset!
Make sure you understand the output of the following code:

In [None]:
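ds_of_ds = tf.data.Dataset.from_tensor_slices(list(range(1, 11))).map(create_ds)
# Use inner_ds as the loop variable so the original ds from section 1.1 is not overwritten
for inner_ds in ds_of_ds:
  print(f"{type(inner_ds)}, length = {len(inner_ds)}")
  for item in inner_ds:
    print(item.numpy(), end=" ")
  print()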

If you apply flat_map instead of map, the elements of the inner Datasets are
concatenated in order to form a single flat Dataset of integers.
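
As a minimal sketch of this behaviour on different toy data: the helper pair_ds below is hypothetical (it is not the create_ds of the exercise) and maps a scalar x to a two-element Dataset containing x and 10 * x.

```python
import tensorflow as tf

# Hypothetical helper for illustration only (not the exercise's create_ds):
# turns a scalar x into a Dataset with the two elements x and 10 * x.
def pair_ds(x):
    return tf.data.Dataset.from_tensor_slices(tf.stack([x, 10 * x]))

outer = tf.data.Dataset.range(1, 4)  # elements 1, 2, 3

# flat_map exhausts each inner Dataset before moving on to the next one.
flat = outer.flat_map(pair_ds)
flat_values = [int(v) for v in flat.as_numpy_iterator()]
print(flat_values)  # [1, 10, 2, 20, 3, 30]
```

The inner Datasets are consumed one after the other, so the pair produced for each input stays adjacent in the result.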

In [None]:
flat_ds = (tf.data.Dataset.from_tensor_slices(
    [i for i in range(1,11)]).flat_map(create_ds))
# YOUR CODE HERE TO PRINT THE CONTENTS OF THIS DATASET

Finally, replace flat_map(create_ds) by
interleave(create_ds, cycle_length=2) and observe the difference in output!
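
For comparison, here is a small sketch of interleave on toy data rather than on the exercise's Dataset. The helper pair_ds is hypothetical and only serves this illustration; block_length is left at its default of 1.

```python
import tensorflow as tf

# Hypothetical helper for illustration only (not the exercise's create_ds):
# turns a scalar x into a Dataset with the two elements x and 10 * x.
def pair_ds(x):
    return tf.data.Dataset.from_tensor_slices(tf.stack([x, 10 * x]))

outer = tf.data.Dataset.range(1, 4)  # elements 1, 2, 3

# With cycle_length=2, two inner Datasets are open at a time and
# elements are drawn from them alternately, one element per turn.
interleaved = outer.interleave(pair_ds, cycle_length=2)
interleaved_values = [int(v) for v in interleaved.as_numpy_iterator()]
print(interleaved_values)  # [1, 2, 10, 20, 3, 30]
```

The inner Datasets for 1 and 2 are read in alternation; once they are exhausted, the one for 3 takes a free slot and is read to the end.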