<div style="text-align:left;">
  <a href="https://code213.tech/" target="_blank">
    <img src="code213.PNG" alt="Code213 Logo" width="200"/>
  </a>
  <p><em>Prepared by Latreche Sara</em></p>
</div>


# 5.0 — Data Handling (`tf.data.Dataset`)
<img src="https://www.tensorflow.org/images/tf_logo_social.png" alt="TensorFlow Logo" width="200"/>

**Why use `tf.data.Dataset`?**  

- Efficiently **load and preprocess data** for training and evaluation  
- Handle **large datasets** that do not fit into memory  
- Provide **pipelines with batching, shuffling, mapping, and prefetching**  
- Fully compatible with TensorFlow models (`model.fit`) and custom training loops

## Table of Contents  

- [1 - Packages](#1)  
- [2 - Outline of the Notebook](#2)  
- [3 - Creating a Dataset](#3)  
  - [3.1 - From Python lists/arrays](#3-1)  
  - [3.2 - From TensorFlow tensors](#3-2)  
- [4 - Transforming Datasets](#4)  
  - [4.1 - Mapping functions](#4-1)  
  - [4.2 - Shuffling and batching](#4-2)  
  - [4.3 - Prefetching](#4-3)  
- [5 - Iterating Over a Dataset](#5)  
- [6 - Using Dataset with Model](#6)  
- [7 - Exercises](#7)


## 1 - Packages <a name="1"></a>


In [1]:
import tensorflow as tf
import numpy as np


## 2 - Outline of the Notebook <a name="2"></a>

This notebook covers:  

1. Creating datasets from Python lists or tensors  
2. Transforming datasets using `map`, `batch`, `shuffle`, `prefetch`  
3. Iterating through a dataset  
4. Using datasets directly with a TensorFlow model  
5. Exercises for practice


## 3 - Creating a Dataset <a name="3"></a>
### 3.1 - From Python lists/arrays <a name="3-1"></a>


In [2]:
# Example data
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])

# Create dataset
dataset = tf.data.Dataset.from_tensor_slices((x, y))

# Print elements
for element in dataset:
    print(element)


(<tf.Tensor: shape=(), dtype=int64, numpy=1>, <tf.Tensor: shape=(), dtype=int64, numpy=2>)
(<tf.Tensor: shape=(), dtype=int64, numpy=2>, <tf.Tensor: shape=(), dtype=int64, numpy=4>)
(<tf.Tensor: shape=(), dtype=int64, numpy=3>, <tf.Tensor: shape=(), dtype=int64, numpy=6>)
(<tf.Tensor: shape=(), dtype=int64, numpy=4>, <tf.Tensor: shape=(), dtype=int64, numpy=8>)
(<tf.Tensor: shape=(), dtype=int64, numpy=5>, <tf.Tensor: shape=(), dtype=int64, numpy=10>)
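
`from_tensor_slices` slices its inputs along the first axis, so each element is one sample. The sketch below checks this with `element_spec` and `cardinality()`. The `features` and `labels` arrays are made up for illustration and are not part of the notebook's data.

```python
import numpy as np
import tensorflow as tf

# Hypothetical data: 4 samples with 3 features each, plus 4 labels.
features = np.arange(12, dtype=np.float32).reshape(4, 3)
labels = np.array([0, 1, 0, 1], dtype=np.int64)

ds = tf.data.Dataset.from_tensor_slices((features, labels))

# element_spec describes ONE element: the leading axis (4) has been sliced away.
feature_spec, label_spec = ds.element_spec
print(feature_spec.shape, feature_spec.dtype)
print(label_spec.shape, label_spec.dtype)

# cardinality() reports the number of elements when it is statically known.
num_elements = int(ds.cardinality().numpy())
print(num_elements)
```

Inspecting `element_spec` before training is a cheap way to catch shape or dtype mismatches early.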


### 3.2 - From TensorFlow tensors <a name="3-2"></a>


In [3]:
x_tf = tf.constant([10, 20, 30])
y_tf = tf.constant([100, 200, 300])

dataset_tf = tf.data.Dataset.from_tensor_slices((x_tf, y_tf))
for elem in dataset_tf:
    print(elem)


(<tf.Tensor: shape=(), dtype=int32, numpy=10>, <tf.Tensor: shape=(), dtype=int32, numpy=100>)
(<tf.Tensor: shape=(), dtype=int32, numpy=20>, <tf.Tensor: shape=(), dtype=int32, numpy=200>)
(<tf.Tensor: shape=(), dtype=int32, numpy=30>, <tf.Tensor: shape=(), dtype=int32, numpy=300>)
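
A related constructor, `tf.data.Dataset.from_tensors`, is easy to confuse with `from_tensor_slices`. The following minimal comparison reuses the `x_tf` values from the cell above. The names `sliced` and `whole` are illustrative only.

```python
import tensorflow as tf

x_tf = tf.constant([10, 20, 30])

# from_tensor_slices: one element per entry along the first axis.
sliced = tf.data.Dataset.from_tensor_slices(x_tf)
# from_tensors: the whole tensor becomes a single element.
whole = tf.data.Dataset.from_tensors(x_tf)

sliced_values = [int(v) for v in sliced.as_numpy_iterator()]
whole_values = [v.tolist() for v in whole.as_numpy_iterator()]

print(sliced_values)  # [10, 20, 30]
print(whole_values)   # [[10, 20, 30]]
```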


## 4 - Transforming Datasets <a name="4"></a>
### 4.1 - Mapping functions <a name="4-1"></a>


In [4]:
# Apply a function to each element
def square(x, y):
    return x**2, y**2

dataset_mapped = dataset.map(square)
for elem in dataset_mapped:
    print(elem)


(<tf.Tensor: shape=(), dtype=int64, numpy=1>, <tf.Tensor: shape=(), dtype=int64, numpy=4>)
(<tf.Tensor: shape=(), dtype=int64, numpy=4>, <tf.Tensor: shape=(), dtype=int64, numpy=16>)
(<tf.Tensor: shape=(), dtype=int64, numpy=9>, <tf.Tensor: shape=(), dtype=int64, numpy=36>)
(<tf.Tensor: shape=(), dtype=int64, numpy=16>, <tf.Tensor: shape=(), dtype=int64, numpy=64>)
(<tf.Tensor: shape=(), dtype=int64, numpy=25>, <tf.Tensor: shape=(), dtype=int64, numpy=100>)
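
In practice, `map` is often used for preprocessing such as casting or rescaling. The sketch below assumes a hypothetical helper, `to_float_and_scale`, with an arbitrary divisor of 5.0. It also passes `num_parallel_calls=tf.data.AUTOTUNE`, which lets TensorFlow decide how many elements to process in parallel.

```python
import numpy as np
import tensorflow as tf

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])
dataset = tf.data.Dataset.from_tensor_slices((x, y))

def to_float_and_scale(x, y):
    # Cast both values to float32 and rescale x (5.0 is an illustrative choice).
    x = tf.cast(x, tf.float32) / 5.0
    y = tf.cast(y, tf.float32)
    return x, y

# Element order is preserved by default even when the map runs in parallel.
dataset_float = dataset.map(to_float_and_scale, num_parallel_calls=tf.data.AUTOTUNE)

pairs = [(float(a), float(b)) for a, b in dataset_float.as_numpy_iterator()]
print(pairs)
```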


### 4.2 - Shuffling and batching <a name="4-2"></a>


In [5]:
# Shuffle with a buffer as large as the dataset (a full shuffle), then group into batches of 2
dataset_shuffled = dataset.shuffle(buffer_size=5).batch(2)
for batch in dataset_shuffled:
    print(batch)


(<tf.Tensor: shape=(2,), dtype=int64, numpy=array([2, 4])>, <tf.Tensor: shape=(2,), dtype=int64, numpy=array([4, 8])>)
(<tf.Tensor: shape=(2,), dtype=int64, numpy=array([5, 3])>, <tf.Tensor: shape=(2,), dtype=int64, numpy=array([10,  6])>)
(<tf.Tensor: shape=(1,), dtype=int64, numpy=array([1])>, <tf.Tensor: shape=(1,), dtype=int64, numpy=array([2])>)
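
The last batch above holds a single element because 5 is not divisible by 2. The sketch below shows two optional arguments that are often useful here: `seed` for a reproducible shuffle and `drop_remainder=True` to discard the incomplete batch. The seed value 42 is an arbitrary choice.

```python
import numpy as np
import tensorflow as tf

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])
dataset = tf.data.Dataset.from_tensor_slices((x, y))

# Shuffle individual elements first, then batch; drop the leftover element.
batched = dataset.shuffle(buffer_size=5, seed=42).batch(2, drop_remainder=True)

batch_shapes = []
seen_pairs = []
for x_batch, y_batch in batched:
    batch_shapes.append(tuple(x_batch.shape.as_list()))
    seen_pairs.extend(zip(x_batch.numpy().tolist(), y_batch.numpy().tolist()))

print(batch_shapes)  # [(2,), (2,)]
print(seen_pairs)    # each (x, y) pair stays aligned: y == 2 * x
```

Because `x` and `y` live in the same dataset element, shuffling never separates a sample from its label.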


### 4.3 - Prefetching <a name="4-3"></a>


In [6]:
# Prefetch prepares upcoming batches in the background while the current one is consumed
dataset_prefetch = dataset.batch(2).prefetch(tf.data.AUTOTUNE)
for batch in dataset_prefetch:
    print(batch)


(<tf.Tensor: shape=(2,), dtype=int64, numpy=array([1, 2])>, <tf.Tensor: shape=(2,), dtype=int64, numpy=array([2, 4])>)
(<tf.Tensor: shape=(2,), dtype=int64, numpy=array([3, 4])>, <tf.Tensor: shape=(2,), dtype=int64, numpy=array([6, 8])>)
(<tf.Tensor: shape=(1,), dtype=int64, numpy=array([5])>, <tf.Tensor: shape=(1,), dtype=int64, numpy=array([10])>)
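
Prefetching changes when batches are produced, not what they contain. As a small check, the sketch below compares a plain batched dataset with a prefetched one. The names `plain` and `prefetched` are illustrative only.

```python
import numpy as np
import tensorflow as tf

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])
dataset = tf.data.Dataset.from_tensor_slices((x, y))

plain = dataset.batch(2)
prefetched = dataset.batch(2).prefetch(tf.data.AUTOTUNE)

plain_batches = [(xb.tolist(), yb.tolist()) for xb, yb in plain.as_numpy_iterator()]
prefetched_batches = [(xb.tolist(), yb.tolist()) for xb, yb in prefetched.as_numpy_iterator()]

# Same batches in the same order; only the scheduling differs.
same_content = plain_batches == prefetched_batches
print(same_content)  # True
```

On a dataset this small, any speed difference is negligible. The benefit tends to show up when loading or preprocessing is slow compared with a training step.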


## 5 - Iterating Over a Dataset <a name="5"></a>


In [7]:
for x_batch, y_batch in dataset_prefetch:
    print("x:", x_batch.numpy(), "y:", y_batch.numpy())


x: [1 2] y: [2 4]
x: [3 4] y: [6 8]
x: [5] y: [10]
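
Two other iteration helpers are worth knowing: `take(n)` limits a pass to the first `n` elements, and `as_numpy_iterator()` yields NumPy values directly. This is a minimal sketch on the same toy data; the name `batches` is an assumption.

```python
import numpy as np
import tensorflow as tf

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])
batches = tf.data.Dataset.from_tensor_slices((x, y)).batch(2)

# take(1) limits iteration to the first batch, handy for a quick look at the data.
first = [(xb.tolist(), yb.tolist()) for xb, yb in batches.take(1).as_numpy_iterator()]
print(first)  # [([1, 2], [2, 4])]

# A dataset can be iterated repeatedly; each loop starts a fresh pass.
num_batches_per_pass = [sum(1 for _ in batches) for _ in range(2)]
print(num_batches_per_pass)  # [3, 3]
```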


## 6 - Using Dataset with Model <a name="6"></a>


In [8]:
# Build a simple linear model: one Dense unit on a single input feature
model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(1)
])

model.compile(optimizer='sgd', loss='mse')

# Train on the dataset in batches of 2
model.fit(dataset.batch(2), epochs=10, verbose=1)


Epoch 1/10
[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 13ms/step - loss: 7.0645 
Epoch 2/10
[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 6ms/step - loss: 0.7399
Epoch 3/10
[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 11ms/step - loss: 0.0784
Epoch 4/10
[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 4ms/step - loss: 0.0145  
Epoch 5/10
[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 8ms/step - loss: 0.0099 
Epoch 6/10
[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 5ms/step - loss: 0.0101 
Epoch 7/10
[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 9ms/step - loss: 0.0102 
Epoch 8/10
[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 7ms/step - loss: 0.0101 
Epoch 9/10
[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 8ms/step - loss: 0.0099 
Epoch 10/10
[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 5ms/step - loss: 0.0097 


<keras.src.callbacks.history.History at 0x277ff07f770>
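
The introduction also mentions custom training loops. A minimal sketch of one, using `tf.GradientTape` over a batched dataset, could look like the following. The float32 reshaped data, the learning rate of 0.01, the 50 epochs, and names such as `train_ds` are assumptions chosen for illustration, not settings taken from the notebook.

```python
import numpy as np
import tensorflow as tf

tf.random.set_seed(0)  # arbitrary seed, only to make runs repeatable

# Same y = 2x relationship as above, stored as float32 columns of shape (5, 1).
x = np.array([1, 2, 3, 4, 5], dtype=np.float32).reshape(-1, 1)
y = 2.0 * x
train_ds = tf.data.Dataset.from_tensor_slices((x, y)).batch(2)

model = tf.keras.Sequential([tf.keras.Input(shape=(1,)), tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
loss_fn = tf.keras.losses.MeanSquaredError()

epoch_losses = []
for epoch in range(50):
    batch_losses = []
    for x_batch, y_batch in train_ds:
        # Record the forward pass so gradients can be computed afterwards.
        with tf.GradientTape() as tape:
            predictions = model(x_batch, training=True)
            loss = loss_fn(y_batch, predictions)
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        batch_losses.append(float(loss))
    epoch_losses.append(sum(batch_losses) / len(batch_losses))

print(f"first epoch loss: {epoch_losses[0]:.4f}, last epoch loss: {epoch_losses[-1]:.4f}")
```

The inner `for` loop plays the role that `model.fit` handles internally: it pulls one batch at a time from the dataset and applies one optimizer step per batch.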

## 7 - Exercises <a name="7"></a>

1. Create a dataset from `x = [1, 2, 3, 4, 5, 6]` and `y = [10, 20, 30, 40, 50, 60]`.  
2. Shuffle the dataset and batch it with batch size 3.  
3. Map a function to multiply both x and y by 2.  
4. Use the dataset to train a model for 20 epochs to learn `y = 10x` (the relationship in this data, which doubling both x and y preserves).  
5. Experiment with `prefetch` and observe training performance.
