<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Data-Generators" data-toc-modified-id="Data-Generators-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Data Generators</a></span></li><li><span><a href="#Generator's-function-explanation" data-toc-modified-id="Generator's-function-explanation-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Generator's function explanation</a></span><ul class="toc-item"><li><span><a href="#__init__" data-toc-modified-id="__init__-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span><code>__init__</code></a></span></li><li><span><a href="#on_epoch_end" data-toc-modified-id="on_epoch_end-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span><code>on_epoch_end</code></a></span></li><li><span><a href="#__data_generation" data-toc-modified-id="__data_generation-2.3"><span class="toc-item-num">2.3&nbsp;&nbsp;</span><code>__data_generation</code></a></span></li><li><span><a href="#__len__" data-toc-modified-id="__len__-2.4"><span class="toc-item-num">2.4&nbsp;&nbsp;</span><code>__len__</code></a></span></li><li><span><a href="#__getitem__" data-toc-modified-id="__getitem__-2.5"><span class="toc-item-num">2.5&nbsp;&nbsp;</span><code>__getitem__</code></a></span></li></ul></li><li><span><a href="#Using-it-in-Keras" data-toc-modified-id="Using-it-in-Keras-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Using it in Keras</a></span></li></ul></div>

# Data Generators

Sometimes a dataset is too big to load into memory all at once. In these scenarios we can use data generators, which load the data one batch at a time.

For this we will use `keras.utils.Sequence` as our base class and override the methods we need.

In [3]:
from tensorflow import keras  # needed for keras.utils.to_categorical in the generator
from tensorflow.keras.utils import Sequence
import numpy as np

In [2]:
class MyDataGenerator(Sequence):
    def __init__(self, id_list, labels, batch_size=32, dim=(32, 32, 32), n_channels=1, n_classes=10, shuffle=True):
        super().__init__()  # newer Keras versions expect the base class to be initialized
        self.dim = dim
        self.batch_size = batch_size
        self.labels = labels
        self.list_IDs = id_list
        self.n_channels = n_channels
        self.n_classes = n_classes
        self.shuffle = shuffle
        self.on_epoch_end()

    def on_epoch_end(self):
        """Updates indexes after each epoch"""
        self.indexes = np.arange(len(self.list_IDs))
        if self.shuffle:
            np.random.shuffle(self.indexes)

    def __data_generation(self, list_IDs_temp):
        """Generates data containing batch_size samples"""  # X: (n_samples, *dim, n_channels)
        X = np.empty((self.batch_size, *self.dim, self.n_channels))
        y = np.empty((self.batch_size), dtype=int)

        for i, ID in enumerate(list_IDs_temp):
            X[i,] = np.load('path_to_data')  # placeholder path: load the sample for this ID
            y[i] = self.labels[ID]

        return X, keras.utils.to_categorical(y, num_classes=self.n_classes)

    def __len__(self):
        """Number of batches per epoch"""
        return int(np.floor(len(self.list_IDs) / self.batch_size))

    def __getitem__(self, index):
        """Generates one batch of data"""
        # Pick the shuffled positions that belong to this batch
        indexes = self.indexes[index * self.batch_size:(index + 1) * self.batch_size]
        list_IDs_temp = [self.list_IDs[k] for k in indexes]
        X, y = self.__data_generation(list_IDs_temp)
        return X, y

Here, the method `on_epoch_end` is triggered once at the very beginning (from `__init__`) as well as at the end of each epoch.

# Generator's function explanation

## `__init__`

This is the constructor where we will accept all the variables that are needed and will store them inside the object for future use.

Params accepted:
- `labels`: the y or target data
- `batch_size`: the number of samples in each batch
- `dim`: the shape/dimension of the input features, i.e. of X
- `n_channels`: the number of channels in each sample (the last axis of X)
- `n_classes`: the number of classes we have for classification
- `shuffle`: whether to shuffle the data before creating the batches of each epoch
- `id_list`: the IDs of the samples whose data has to be extracted for the X and y variables

## `on_epoch_end`

This function is called once when the generator object is initialized and again after each epoch. Here we handle the shuffle flag: if it is true, we reshuffle the sample indexes so that the batches created in the next epoch are shuffled.

**Params**: none<br>
**returns**: none
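
As a rough illustration of the reshuffling described above, the sketch below rebuilds the index array the way successive epochs would. The helper `make_epoch_indexes` and the seeded `rng` are assumptions made for this example; they are not part of the generator class.

```python
import numpy as np

def make_epoch_indexes(n_samples, shuffle, rng):
    """Return the sample order for one epoch (hypothetical helper)."""
    indexes = np.arange(n_samples)
    if shuffle:
        rng.shuffle(indexes)  # shuffles the array in place
    return indexes

rng = np.random.default_rng(0)  # seeded only to make the example repeatable
epoch_1 = make_epoch_indexes(6, shuffle=True, rng=rng)
epoch_2 = make_epoch_indexes(6, shuffle=True, rng=rng)
unshuffled = make_epoch_indexes(6, shuffle=False, rng=rng)
```

Each call returns a permutation of the same indexes, so every sample is still visited exactly once per epoch; only the order changes.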

## `__data_generation`

This function produces the next batch that will be used for the training.

Params accepted: `list_ids`, the list of IDs whose X and y values have to be returned.
If a one-hot encoded target is needed, the conversion is done in this function.
If the data is stored in files and has not yet been read into memory, reading it from the files is also handled here.

<b>Tip: We can make this function use multiple cores to read data faster.</b>

**Params**: list of ids<br>
**returns**: X and y (y is one-hot encoded when that is required)
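
To make the one-hot step concrete, here is a minimal NumPy sketch of the encoding. It is meant to show the kind of output that `to_categorical` produces, not how Keras implements it; `one_hot` is a hypothetical helper introduced only for this example.

```python
import numpy as np

def one_hot(y, n_classes):
    """One-hot encode integer class labels (hypothetical helper)."""
    y = np.asarray(y, dtype=int)
    # Row k of the identity matrix is the one-hot vector for class k
    return np.eye(n_classes, dtype="float32")[y]

encoded = one_hot([0, 2, 1], n_classes=3)
# encoded has shape (3, 3); row i holds a single 1.0 in column y[i]
```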

## `__len__`

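This denotes the number of batches per epoch. A common practice is to set this value to the number of samples divided by the batch size, rounded down so that every batch is full:
$$
\large
\left\lfloor \frac{\#samples}{batch\_size} \right\rfloor
$$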

**Params**: none<br>
**returns**: number of batches
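
A small worked example of this count, assuming a hypothetical dataset of 1000 samples and a batch size of 32 (the function name `batches_per_epoch` is made up for illustration):

```python
import math

def batches_per_epoch(n_samples, batch_size):
    """Number of full batches that fit into one pass over the data."""
    return math.floor(n_samples / batch_size)

n_batches = batches_per_epoch(1000, 32)  # 1000 / 32 = 31.25, rounded down to 31
leftover = 1000 - n_batches * 32         # 8 samples do not fit into a full batch
```

With rounding down, the leftover samples are simply not used in that epoch. When shuffling is enabled, a different set of samples is left out each time.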

## `__getitem__`

Once the estimator knows the number of batches, it calls the `__getitem__` method with a batch index to generate that batch.

**Params**: index: denotes the batch to fetch<br>
**returns**: X and y for the specific batch
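
The index arithmetic can be sketched on its own with plain lists. The function `batch_ids` and the sample values below are illustrative assumptions, not part of the class:

```python
def batch_ids(list_ids, indexes, index, batch_size):
    """Return the IDs that make up batch number `index` (hypothetical helper)."""
    # Slice the (possibly shuffled) positions that belong to this batch
    batch_indexes = indexes[index * batch_size:(index + 1) * batch_size]
    return [list_ids[k] for k in batch_indexes]

list_ids = ["a", "b", "c", "d", "e", "f"]
indexes = [3, 0, 5, 1, 4, 2]  # a shuffled order, as on_epoch_end might produce

first = batch_ids(list_ids, indexes, index=0, batch_size=2)   # ['d', 'a']
second = batch_ids(list_ids, indexes, index=1, batch_size=2)  # ['f', 'b']
```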

# Using it in Keras

In [None]:
# id_list, labels, dim, n and c are placeholders; replace them with your own values
my_generator = MyDataGenerator(id_list, labels, dim=dim, batch_size=n, n_classes=c, n_channels=1, shuffle=True)
model.fit(my_generator)
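
Conceptually, training consumes the generator through `__len__`, `__getitem__` and `on_epoch_end`. The loop below is a simplified sketch of that protocol, not Keras's actual implementation. It uses a hypothetical in-memory stand-in class, `TinyBatches`, so that it runs without a model or data files.

```python
import numpy as np

class TinyBatches:
    """In-memory stand-in exposing the same three methods (illustrative only)."""

    def __init__(self, X, y, batch_size):
        self.X, self.y, self.batch_size = X, y, batch_size
        self.on_epoch_end()

    def on_epoch_end(self):
        # No shuffling here, to keep the example deterministic
        self.indexes = np.arange(len(self.X))

    def __len__(self):
        return len(self.X) // self.batch_size

    def __getitem__(self, index):
        idx = self.indexes[index * self.batch_size:(index + 1) * self.batch_size]
        return self.X[idx], self.y[idx]

batches = TinyBatches(np.zeros((10, 4)), np.arange(10), batch_size=3)

shapes = []
for epoch in range(2):
    for i in range(len(batches)):      # ask for each batch in turn
        X_batch, y_batch = batches[i]  # calls __getitem__(i)
        shapes.append((X_batch.shape, y_batch.shape))
    batches.on_epoch_end()             # reset or reshuffle before the next epoch
```

Here 10 samples with a batch size of 3 give 3 full batches per epoch, so the loop collects 6 batches over the two epochs.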