**Welcome to Deep Learning with Keras and TensorFlow in Python**

**Presented by: Reza Saadatyar (2024-2025)**<br/>
**E-mail: Reza.Saadatyar@outlook.com**<br/>
**[GitHub](https://github.com/RezaSaadatyar/Deep-Learning-in-python)**

**Outline:**<br/>

**Extract, Transform, Load (ETL) pipeline:**<br/>
▪ `Extract:` Data is gathered from various sources: cloud storage (e.g., Google Cloud Storage, AWS S3, or Azure Blob Storage), databases (e.g., MySQL, PostgreSQL), and the local file system (e.g., CSV files, JSON files, or other raw data stored locally).<br/>
▪ `Transform:` Data is processed, cleaned, or reformatted to make it suitable for analysis or model training. Common transformations include: normalizing numerical data (e.g., scaling values between 0 and 1), encoding categorical data (e.g., one-hot encoding), handling missing values, and resizing images or tokenizing text (if working with image or NLP datasets).<br/>
▪ `Load:` The transformed data is delivered to its target: a storage system for later use, or a device (CPU, GPU, or TPU) where the model is trained.<br/>

`tf.data`, a TensorFlow API, streamlines loading, preprocessing, and feeding data into models. It is well suited to large datasets because it supports streaming and parallel processing.
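
The three ETL stages can be sketched as a single `tf.data` pipeline. This is a minimal illustration, not a reference implementation: the small `features` and `labels` arrays are made-up sample data standing in for a real source, and min-max scaling is just one of the transformations listed above.

```python
import numpy as np
import tensorflow as tf

# Extract: in-memory arrays stand in for data read from a file or database
# (the values are hypothetical sample data).
features = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]], dtype=np.float32)
labels = np.array([0, 1, 1], dtype=np.int32)
dataset = tf.data.Dataset.from_tensor_slices((features, labels))

# Transform: min-max scale each feature column to the range [0, 1].
col_min = features.min(axis=0)
col_max = features.max(axis=0)

def scale(x, y):
    return (x - col_min) / (col_max - col_min), y

dataset = dataset.map(scale, num_parallel_calls=tf.data.AUTOTUNE)

# Load: batch the elements and prefetch so the next batch is prepared
# while the current one is being consumed.
dataset = dataset.batch(2).prefetch(tf.data.AUTOTUNE)

etl_batches = [(x.numpy(), y.numpy()) for x, y in dataset]
for x_batch, y_batch in etl_batches:
    print(x_batch, y_batch)
```

In a training script, the final `dataset` could be passed directly to `model.fit`, which consumes it batch by batch.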

**Key tf.data methods for extraction:**<br/>
▪ `tf.data.Dataset.from_tensor_slices():` Create a dataset from in-memory tensors (e.g., NumPy arrays).<br/>
▪ `tf.data.TextLineDataset:` Load text files line by line (e.g., for CSVs or raw text).<br/>
▪ `tf.data.TFRecordDataset:` Load data stored in TFRecord format, which is optimized for TensorFlow.<br/>
▪ `tf.keras.utils.image_dataset_from_directory():` Load image datasets directly from a directory structure (useful for image classification tasks). This Keras utility is not part of `tf.data` itself, but it returns a `tf.data.Dataset`.<br/>
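
As a small illustration of file-based extraction, the sketch below reads a CSV file line by line with `tf.data.TextLineDataset`. The file name `sample.csv`, its two columns, and the helper `parse_line` are assumptions made for this example; the file is written to a temporary directory so the snippet is self-contained.

```python
import os
import tempfile

import tensorflow as tf

# Write a tiny CSV file to a temporary directory (hypothetical sample data).
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "sample.csv")
with open(csv_path, "w") as f:
    f.write("height,weight\n1.70,65\n1.82,80\n")

# Each dataset element is one line of the file as a scalar string tensor.
# skip(1) drops the header line.
lines = tf.data.TextLineDataset(csv_path).skip(1)

def parse_line(line):
    # decode_csv returns one tensor per column; the defaults set the dtypes.
    height, weight = tf.io.decode_csv(line, record_defaults=[[0.0], [0.0]])
    return tf.stack([height, weight])

rows = [row.numpy() for row in lines.map(parse_line)]
for row in rows:
    print(row)
```

For larger CSV files, `tf.data.experimental.make_csv_dataset` is another option that handles headers and batching in one call.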

**Key tf.data methods for transformation:**<br/>
▪ `dataset.map():` Apply a transformation function to each element.<br/>
▪ `dataset.filter():` Keep only the elements that satisfy a condition.<br/>
▪ `dataset.shuffle():` Randomize the order of elements using a buffer of a given size.<br/>
▪ `dataset.batch():` Group elements into batches.<br/>
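
The four transformations above can be chained into one pipeline. The following is a minimal sketch; the input values, the buffer size, and the seed are arbitrary choices for illustration.

```python
import tensorflow as tf

dataset = tf.data.Dataset.from_tensor_slices([8, 3, 20, -1, 0, 1])

pipeline = (
    dataset
    .filter(lambda v: v > 0)         # keep positive values: 8, 3, 20, 1
    .map(lambda v: v * 2)            # double each value: 16, 6, 40, 2
    .shuffle(buffer_size=4, seed=0)  # randomize the order of the elements
    .batch(2)                        # group into batches of two
)

batches = [batch.numpy().tolist() for batch in pipeline]
print(batches)
```

Shuffling before batching, as done here, mixes individual elements; shuffling after `batch()` would only reorder whole batches.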





<font color='#FF000e' size="4.5" face="Arial"><b>Import modules</b></font>

In [None]:
import pprint
import numpy as np
import tensorflow as tf

In [4]:
# Create a NumPy array with the given values
x = np.array([8, 3, 20, -1, 0, 1])

# Create a TensorFlow Dataset from the NumPy array using tf.data.Dataset.from_tensor_slices
# This creates a dataset where each element is a slice of the input array
dataset = tf.data.Dataset.from_tensor_slices(x)

# The dataset is now ready for iteration or further processing
dataset, x

(<_TensorSliceDataset element_spec=TensorSpec(shape=(), dtype=tf.int64, name=None)>,
 array([ 8,  3, 20, -1,  0,  1]))

In [13]:
# Iterate over the dataset and print each element along with its index
for ind, tensor in enumerate(dataset):
    print(f"{ind} → {tensor = }")

0 → tensor = <tf.Tensor: shape=(), dtype=int64, numpy=8>
1 → tensor = <tf.Tensor: shape=(), dtype=int64, numpy=3>
2 → tensor = <tf.Tensor: shape=(), dtype=int64, numpy=20>
3 → tensor = <tf.Tensor: shape=(), dtype=int64, numpy=-1>
4 → tensor = <tf.Tensor: shape=(), dtype=int64, numpy=0>
5 → tensor = <tf.Tensor: shape=(), dtype=int64, numpy=1>


In [None]:
# Inspect the element specification of the dataset
dataset.element_spec

TensorSpec(shape=(), dtype=tf.int64, name=None)

In [19]:
# Create a 2D tensor with random uniform values (shape [100, 5])
x = tf.random.uniform([100, 5])

# Create a TensorFlow Dataset from the 2D tensor using tf.data.Dataset.from_tensor_slices
tf_dataset = tf.data.Dataset.from_tensor_slices(x)

# Create a 1D tensor of random integer labels (shape [100]); each value is 0 or 1 because maxval=2 is exclusive
y = tf.random.uniform([100], maxval=2, dtype=tf.int32)

# Create a TensorFlow Dataset from a tuple of tensors (x, y) using tf.data.Dataset.from_tensor_slices
dataset = tf.data.Dataset.from_tensor_slices((x, y))

# Inspect the element specification of the dataset
dataset.element_spec

(TensorSpec(shape=(5,), dtype=tf.float32, name=None),
 TensorSpec(shape=(), dtype=tf.int32, name=None))

In [20]:
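# Dataset.zip expects (nested) structures of tf.data.Dataset objects.
# Here y is still a plain tensor, not a Dataset, so this call raises a TypeError.
ds = tf.data.Dataset.zip((tf_dataset, y))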

TypeError: Invalid input to `zip`. Inputs are expected to be (nested) structures of `tf.data.Dataset` objects but encountered object of type <class 'tensorflow.python.framework.ops.EagerTensor'>.
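
The error message indicates that every input to `zip` must itself be a `tf.data.Dataset`. One way to resolve it, sketched below, is to wrap the label tensor with `from_tensor_slices` before zipping. The names `x_ds`, `y_ds`, and `zipped` are chosen for this example only.

```python
import tensorflow as tf

x = tf.random.uniform([100, 5])
y = tf.random.uniform([100], maxval=2, dtype=tf.int32)

# Wrap both tensors in Dataset objects so that zip receives only datasets.
x_ds = tf.data.Dataset.from_tensor_slices(x)
y_ds = tf.data.Dataset.from_tensor_slices(y)

# Pair the i-th feature row with the i-th label.
zipped = tf.data.Dataset.zip((x_ds, y_ds))

spec = zipped.element_spec
print(spec)
```

The resulting element structure matches what `tf.data.Dataset.from_tensor_slices((x, y))` produces, so zipping is mainly useful when the two datasets were built or transformed separately.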