# Data generator examples

A generator function is a function that contains a yield statement. Calling it does not run the body right away; it returns an iterator (a generator object) that produces a sequence of values, one at a time, as it is iterated over.

Here is an example of a generator function that produces a sequence of numbers from 0 to n-1, where n is a parameter passed to the function:

In [12]:
def my_range(n):
    i = 0
    while i < n:
        yield i
        i += 1

for i in my_range(5):
    print(i)


0
1
2
3
4


The generator returned by my_range can be used like any other iterable, such as a list or a tuple. The difference is that a generator produces its values on demand, one at a time, rather than building a whole list or tuple in memory, and it can be iterated over only once. This is useful when working with datasets that do not fit in memory, because you can process the data one batch at a time.
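To make the on-demand behaviour concrete, the sketch below re-declares my_range from the earlier cell, pulls values by hand with `next`, and compares object sizes with `sys.getsizeof`. The size comparison assumes CPython, where `getsizeof` reports only the container itself, so treat it as a rough illustration rather than a precise memory measurement.

```python
import sys

def my_range(n):
    """Yield 0, 1, ..., n-1 one value at a time."""
    i = 0
    while i < n:
        yield i
        i += 1

gen = my_range(3)

# Values are computed only when they are requested.
first = next(gen)
second = next(gen)
rest = list(gen)          # consumes whatever is left
leftover = list(gen)      # the generator is now exhausted, so this is empty

print(first, second, rest, leftover)

# A generator object stays small regardless of how many values it will
# produce, while a list must hold a reference to every element.
big_list = list(range(1_000_000))
big_gen = my_range(1_000_000)
print(sys.getsizeof(big_list) > sys.getsizeof(big_gen))
```

Once a generator is exhausted it yields nothing further; to iterate again, call the generator function again to get a fresh generator.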

To create a data generator for a machine learning model, you can use the generator function to yield batches of data for training or evaluation. For example, you can define a generator function that reads data from a file or a database, processes it, and yields it in the appropriate format for your model.
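As a minimal sketch of the read, process, and yield pattern described above, the following generator reads rows from a CSV source and yields them in batches. The function name `csv_batches`, the column layout (features first, integer label last), and the in-memory `io.StringIO` stand-in for a real file are all assumptions made for illustration.

```python
import csv
import io

def csv_batches(file_obj, batch_size):
    """Yield (features, labels) batches from CSV rows.

    Assumed layout (hypothetical): every column but the last is a float
    feature, and the last column is an integer class label.
    """
    features, labels = [], []
    for row in csv.reader(file_obj):
        features.append([float(v) for v in row[:-1]])
        labels.append(int(row[-1]))
        if len(features) == batch_size:
            yield features, labels
            features, labels = [], []
    if features:
        # Emit the final, possibly smaller, batch.
        yield features, labels

# Stand-in for a file on disk; open("data.csv") would work the same way.
fake_file = io.StringIO("0.1,0.2,0\n0.3,0.4,1\n0.5,0.6,0\n")
batches = list(csv_batches(fake_file, batch_size=2))
for x, y in batches:
    print(len(x), y)
```

Only one batch of rows is held in memory at a time, which is the property that makes this pattern suitable for files larger than RAM.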



As an example, consider MNIST digit classification. We first train a model on data held in memory, then train it using a data generator.

In [3]:
#!pip install tensorflow

import tensorflow as tf
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras import Model


In [4]:
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0


In [5]:
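class MyModel(Model):
  """Small fully connected classifier for 28x28 MNIST images."""

  def __init__(self):
    super().__init__()
    self.flatten = Flatten()                    # 28x28 image -> 784-vector
    self.d1 = Dense(128, activation='relu')     # hidden layer
    self.d2 = Dense(10, activation='softmax')   # one probability per digit class

  def call(self, x):
    x = self.flatten(x)
    x = self.d1(x)
    return self.d2(x)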

model = MyModel()




In [6]:
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()
model.compile(optimizer=optimizer, loss=loss_fn, metrics=['accuracy'])


In [7]:
model.fit(x_train, y_train, epochs=5)




Epoch 1/5
  73/1875 [>.............................] - ETA: 2s - loss: 1.0271 - accuracy: 0.7256 



Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f422c39af10>

In [8]:
model.evaluate(x_test, y_test)




[0.07671114802360535, 0.9767000079154968]

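The example above trains on data held entirely in memory.
To switch to a generator, we define a data_generator function that outputs the data in batches.
The data generator below uses the Python yield keyword to produce batches of data. For simplicity it still loads MNIST into memory first; with a dataset too large for that, the function would read each batch from disk instead. Note that the following cells reuse the model trained above, so training continues from the already learned weights rather than starting from scratch.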

In [9]:
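def data_generator(batch_size):
  """Yield (images, labels) batches from the MNIST training set, forever."""
  # Only the training split is needed here; the test split is discarded.
  (x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
  x_train = x_train / 255.0  # scale pixel values to [0, 1]
  # Loop forever: Keras decides where an epoch ends via steps_per_epoch.
  while True:
    for i in range(0, len(x_train), batch_size):
      x = x_train[i:i + batch_size]
      y = y_train[i:i + batch_size]
      yield x, y

batch_size = 32
train_generator = data_generator(batch_size)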
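Before handing a generator like this to Keras, it can help to pull a few batches by hand and check them. The sketch below uses the same slicing loop on a small stand-in dataset of plain Python lists (the name `batch_generator` and the toy data are assumptions, chosen so the example runs without downloading MNIST), and uses `itertools.islice` to take a finite number of batches from the infinite generator.

```python
from itertools import islice

def batch_generator(xs, ys, batch_size):
    """Cycle over (xs, ys) forever, yielding consecutive slices."""
    while True:
        for i in range(0, len(xs), batch_size):
            yield xs[i:i + batch_size], ys[i:i + batch_size]

# Toy stand-in data: 10 samples with a 0/1 label each.
xs = list(range(10))
ys = [v % 2 for v in xs]

gen = batch_generator(xs, ys, batch_size=4)

# islice stops after 4 batches; a plain list(gen) would never return.
first_four = list(islice(gen, 4))
sizes = [len(x) for x, _ in first_four]
print(sizes)             # [4, 4, 2, 4]
print(first_four[3][0])  # [0, 1, 2, 3]
```

The third batch is shorter because 10 is not a multiple of 4, and the fourth batch wraps around to the start of the data. Because the generator never stops by itself, the training call has to be told how many batches make up one epoch.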


In [10]:
# fit accepts Python generators directly; fit_generator is deprecated in favor of it.
model.fit(train_generator, epochs=5, steps_per_epoch=len(x_train) // batch_size)




Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f422c15d4c0>
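The value passed as steps_per_epoch is worth a second look. With 60000 training images and a batch size of 32, integer division gives exactly 1875 steps, matching the progress output above. When the dataset size is not a multiple of the batch size, integer division leaves out the final partial batch, so each epoch ends slightly early. A small helper (the name `count_steps` is hypothetical) shows both options:

```python
import math

def count_steps(num_samples, batch_size, keep_partial=True):
    """Number of batches needed to cover num_samples once.

    keep_partial=True counts the final, smaller batch as its own step;
    keep_partial=False ignores it, as integer division does.
    """
    if keep_partial:
        return math.ceil(num_samples / batch_size)
    return num_samples // batch_size

print(count_steps(60000, 32))                      # 1875
print(count_steps(60000, 64))                      # 938
print(count_steps(60000, 64, keep_partial=False))  # 937
```

With an infinite generator that loops over the data in order, using the floor value means each epoch starts a little further along than the previous one, which is usually harmless but is worth knowing when comparing per-epoch metrics.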

In [11]:
model.evaluate(x_test, y_test)



[0.09495186805725098, 0.9768000245094299]