# Working with datasets in TensorFlow
## tf.keras.datasets vs. TensorFlow Datasets (tfds)
This is intended as an introductory ML class on datasets (to be merged with datagenerators).
Daniel Trad, using ChatGPT.

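The tf.keras.datasets module provides access to a small number of public datasets such as MNIST, Fashion-MNIST, and CIFAR-10. Each load_data() call returns NumPy arrays (not tf.data.Dataset objects), usually as (x_train, y_train), (x_test, y_test) tuples that can be passed directly to tf.keras models. These datasets are small and well understood, which makes them useful for testing and debugging.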

TensorFlow Datasets (tfds) is a separate package, imported as tensorflow_datasets, that provides a collection of datasets ready to use with TensorFlow. It covers a wide range of tasks such as object detection, language translation, and recommendation systems, and its datasets are typically larger and more complex than those in tf.keras.datasets. Datasets are returned as tf.data.Dataset objects and come with documented metadata, such as the number of classes, the format of the data, and how the data was collected and preprocessed. Because the result is a tf.data.Dataset, the usual tools for loading, preprocessing, and manipulating data (map, batch, prefetch) apply, which makes large and complex datasets easier to work with.

Here is a simple example of how you can create a deep neural network (DNN) in tf.keras for the MNIST dataset using tf.keras.datasets:

In [2]:
import tensorflow as tf
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras import Model

# Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Normalize the pixel values
x_train, x_test = x_train / 255.0, x_test / 255.0

# Build the model
inputs = tf.keras.layers.Input(shape=(28, 28))
x = Flatten()(inputs)
x = Dense(128, activation='relu')(x)
x = Dense(128, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)
model = Model(inputs=inputs, outputs=predictions)

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
model.fit(x_train, y_train, epochs=5)

# Evaluate the model
model.evaluate(x_test, y_test)



Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


[0.07693281024694443, 0.9782999753952026]

This code creates a simple DNN with two hidden layers (128 units each) and a 10-unit softmax output layer, one unit per MNIST digit class. The model is compiled with the adam optimizer and the sparse_categorical_crossentropy loss, which accepts the integer labels directly, and is trained on the training data for 5 epochs. Finally, model.evaluate returns the loss and accuracy on the test data, here about 0.077 and 0.978.
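To make the loss choice concrete, the following is a minimal NumPy sketch of what a sparse categorical cross-entropy computes: the mean negative log-probability assigned to the correct class, with labels given as integers rather than one-hot vectors. The function name, the `eps` clipping value, and the toy probabilities are assumptions for illustration; this is not Keras's own implementation.

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, probs, eps=1e-7):
    """Mean negative log-probability of the true class.

    y_true: integer class labels, shape (n,)
    probs:  predicted class probabilities (softmax output), shape (n, k)
    """
    # Clip to avoid log(0) when a probability is exactly 0 or 1
    probs = np.clip(probs, eps, 1.0 - eps)
    # Pick, for each example, the probability assigned to its true class
    picked = probs[np.arange(len(y_true)), y_true]
    return float(-np.mean(np.log(picked)))

# Toy example with 3 classes and 2 examples (made-up numbers)
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
y_true = np.array([0, 1])

loss = sparse_categorical_crossentropy(y_true, probs)
print(f"loss = {loss:.4f}")
```

Because the labels are used as indices, there is no need to one-hot encode y_train before calling model.fit.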

Here is a simple example of how you can create a deep neural network (DNN) in tf.keras for the MNIST dataset using tfds:


In [3]:
import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras import Model

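# Load the MNIST train and test splits together with the dataset metadata.
# with_info=True is required for tfds.load to return the info object.
(ds, ds_test), info = tfds.load('mnist', split=['train', 'test'],
                                as_supervised=True, with_info=True)

# Preprocess the data: cast uint8 pixels to float32 and scale to [0, 1]
def preprocess(image, label):
  image = tf.cast(image, tf.float32) / 255.0
  return image, label

# Training pipeline
ds = ds.map(preprocess)
ds = ds.batch(32)
ds = ds.prefetch(tf.data.AUTOTUNE)

# Test pipeline, built the same way
ds_test = ds_test.map(preprocess).batch(32).prefetch(tf.data.AUTOTUNE)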

# Build the model
# tfds yields MNIST images with a trailing channel axis: (28, 28, 1)
inputs = tf.keras.layers.Input(shape=(28, 28, 1))
x = Flatten()(inputs)
x = Dense(128, activation='relu')(x)
x = Dense(128, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)
model = Model(inputs=inputs, outputs=predictions)

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
model.fit(ds, epochs=5)


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f40ba074700>

This code creates the same kind of DNN: two hidden layers (128 units each) and an output layer with 10 units, corresponding to the 10 classes in the MNIST dataset. The data is loaded with tfds.load and preprocessed by scaling the pixel values to [0, 1] and batching the examples in groups of 32. The model is then compiled with the adam optimizer and the sparse_categorical_crossentropy loss, and is trained on the training split for 5 epochs. Note that the test data is not used for evaluation in this example.
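The effect of batching in groups of 32 on the number of steps per epoch can be sketched in plain Python. The helper `batch_indices` is hypothetical and only mimics how consecutive examples are grouped into batches; the sketch assumes the standard MNIST split sizes of 60,000 training and 10,000 test images.

```python
import math

def batch_indices(num_examples, batch_size):
    """Yield (start, stop) index pairs, one per batch.

    The last batch is kept even if it is smaller than batch_size.
    """
    for start in range(0, num_examples, batch_size):
        yield start, min(start + batch_size, num_examples)

batch_size = 32
num_train = 60_000   # assumed size of the MNIST training split
num_test = 10_000    # assumed size of the MNIST test split

# Number of batches (steps) needed to see every example once
train_steps = math.ceil(num_train / batch_size)
test_steps = math.ceil(num_test / batch_size)

# Size of the final, partial test batch
last_start, last_stop = list(batch_indices(num_test, batch_size))[-1]

print(train_steps, test_steps, last_stop - last_start)
```

The training split divides evenly into batches of 32, while the test split ends with a smaller final batch; Keras handles such partial batches without extra code.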

To evaluate the model on the test data, build a test dataset ds_test from the 'test' split with the same map, batch, and prefetch steps used for ds, then call model.evaluate(ds_test).
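A minimal sketch of that evaluation step follows. To stay runnable offline, it uses a small random dataset as a stand-in for the tfds test split and an untrained one-layer model; `make_pipeline` is a hypothetical helper, not part of TensorFlow, and `tf.data.AUTOTUNE` assumes TensorFlow 2.4 or later. In the notebook, the 'test' split returned by tfds.load and the trained model would take their place.

```python
import numpy as np
import tensorflow as tf

def preprocess(image, label):
    # Cast uint8 pixels to float32 and scale to [0, 1]
    image = tf.cast(image, tf.float32) / 255.0
    return image, label

def make_pipeline(dataset, batch_size=32):
    # Hypothetical helper: the same map/batch/prefetch steps for any split
    return (dataset
            .map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
            .batch(batch_size)
            .prefetch(tf.data.AUTOTUNE))

# Synthetic stand-in for the MNIST test split (random pixels and labels)
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(64, 28, 28, 1), dtype=np.uint8)
labels = rng.integers(0, 10, size=(64,), dtype=np.int64)
raw_test = tf.data.Dataset.from_tensor_slices((images, labels))

ds_test = make_pipeline(raw_test)

# Tiny untrained model, only to show the evaluate call
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# evaluate returns the loss followed by the compiled metrics
test_loss, test_acc = model.evaluate(ds_test, verbose=0)
print(f"test loss: {test_loss:.4f}, test accuracy: {test_acc:.4f}")
```

Wrapping the preprocessing in one function keeps the training and test pipelines identical, which avoids evaluating on data that was scaled differently from the training data.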