# Introduction to Keras Sequence

Previously our Keras program would look something like this:

```python
from tensorflow.keras.models import Sequential
# other import statements

# load_data stands for your own loading code; it reads everything into memory
X, y = load_data('some_path_to_data')

model = Sequential()
# model architecture code

model.compile()

model.fit(x=X, y=y)

model.predict(X)
```


But sometimes our data can be so huge that loading it in one go is slow, or it might not fit in memory at all.

## Introducing DataGenerator

This problem is solved by using data generators, which load and prepare the data one batch at a time.

To write one, we inherit from the Keras `Sequence` class.

## The Constructor

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras.utils import Sequence

class MyDataGenerator(Sequence):

    def __init__(self, list_IDs, labels, batch_size=32, dim=(32,32,32), n_channels=1,
                 n_classes=10, shuffle=True):
        'Initialization'
        self.dim = dim
        self.batch_size = batch_size
        self.labels = labels
        self.list_IDs = list_IDs
        self.n_channels = n_channels
        self.n_classes = n_classes
        self.shuffle = shuffle
        self.on_epoch_end()
```

## Epoch end method

Here, the method `on_epoch_end` is called once at the very beginning (from the constructor) as well as at the end of each epoch. If the `shuffle` parameter is set to `True`, we get a new order of exploration at each pass; otherwise we keep a linear exploration scheme.

```python
def on_epoch_end(self):
  'Updates indexes after each epoch'
  self.indexes = np.arange(len(self.list_IDs))
  if self.shuffle:
      np.random.shuffle(self.indexes)
```
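To see what the reshuffling does in isolation, here is a minimal sketch outside the class. The helper name `epoch_indexes` and its seeded `rng` argument are assumptions made for this illustration; the generator above shuffles with `np.random.shuffle` directly.

```python
import numpy as np

def epoch_indexes(n_samples, shuffle=True, rng=None):
    """Return the order in which samples are visited during one epoch."""
    indexes = np.arange(n_samples)
    if shuffle:
        # A seeded Generator makes the order reproducible (a choice made for
        # this demo; the class above relies on the global NumPy random state).
        rng = np.random.default_rng() if rng is None else rng
        rng.shuffle(indexes)
    return indexes

rng = np.random.default_rng(0)
first = epoch_indexes(10, rng=rng)          # order for epoch 1
second = epoch_indexes(10, rng=rng)         # order for epoch 2
linear = epoch_indexes(10, shuffle=False)   # 0, 1, ..., 9
```

Whatever the order, every sample index appears exactly once per epoch; only the sequence in which they are visited changes.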

## Data loading method

The core of the generation process is producing batches of data. The private method in charge of this task is called `__data_generation`, and it takes as its argument the __list of IDs of the target batch__.

```python
def __data_generation(self, list_IDs_temp):
  'Generates data containing batch_size samples' # X : (n_samples, *dim, n_channels)
  # Initialization
  X = np.empty((self.batch_size, *self.dim, self.n_channels))
  y = np.empty((self.batch_size), dtype=int)

  # Generate data
  for i, ID in enumerate(list_IDs_temp):
      # Store sample
      X[i,] = np.load('data/' + ID + '.npy')

      # Store class
      y[i] = self.labels[ID]

  return X, keras.utils.to_categorical(y, num_classes=self.n_classes)
```

During data generation, this code reads the NumPy array of each example from its corresponding file `ID.npy`. Because batches can be prepared in parallel by several workers, you can also do more complex operations here (e.g. computations from source files) with much less risk of data generation becoming a bottleneck in the training process.
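The last line of `__data_generation` turns the integer class labels into one-hot vectors. As a rough illustration of what that conversion produces, here is a NumPy-only sketch; `one_hot` is a hypothetical helper written for this example, not the Keras implementation.

```python
import numpy as np

def one_hot(y, num_classes):
    """Map integer labels of shape (n,) to one-hot rows of shape (n, num_classes)."""
    y = np.asarray(y, dtype=int)
    encoded = np.zeros((y.shape[0], num_classes), dtype="float32")
    # Put a 1 in column y[i] of row i
    encoded[np.arange(y.shape[0]), y] = 1.0
    return encoded

batch_labels = [0, 2, 1]
encoded = one_hot(batch_labels, num_classes=4)
print(encoded.shape)  # (3, 4)
```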

## The length

Each call requests a batch index between 0 (inclusive) and the total number of batches (exclusive), where the latter is specified in the `__len__` method.

```python
def __len__(self):
  'Denotes the number of batches per epoch'
  return int(np.floor(len(self.list_IDs) / self.batch_size))
```

A common practice is to set this value to $\left\lfloor \frac{\#samples}{batch\_size} \right\rfloor$
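As a worked example with made-up numbers: rounding down skips the samples that do not fill a final batch, while rounding up would require `__getitem__` to return a shorter last batch (which the `__data_generation` method above, with its fixed `batch_size` allocation, does not handle).

```python
import math

n_samples = 1000   # hypothetical dataset size
batch_size = 64

full_batches = n_samples // batch_size             # same value as int(np.floor(n / b))
leftover = n_samples - full_batches * batch_size   # samples skipped in each epoch
with_partial = math.ceil(n_samples / batch_size)   # batch count if the remainder were kept

print(full_batches, leftover, with_partial)  # 15 40 16
```

When `shuffle` is on, the skipped samples differ from one epoch to the next, so over several epochs the model still sees the whole dataset.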

## Get item method

Now, when the batch corresponding to a given index is requested, the generator executes the `__getitem__` method to generate it.

```python
def __getitem__(self, index):
  'Generate one batch of data'
  # Generate indexes of the batch
  indexes = self.indexes[index*self.batch_size:(index+1)*self.batch_size]

  # Find list of IDs
  list_IDs_temp = [self.list_IDs[k] for k in indexes]

  # Generate data
  X, y = self.__data_generation(list_IDs_temp)

  return X, y
```
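The pieces above can be tried end to end without TensorFlow or data files. The sketch below is a stand-in built on assumptions: `ToyBatchGenerator` is a plain Python class (it does not inherit from `Sequence`) that serves in-memory arrays, but it follows the same `__len__` / `__getitem__` / `on_epoch_end` protocol.

```python
import numpy as np

class ToyBatchGenerator:
    """Plain-Python stand-in for a Sequence subclass, backed by in-memory arrays."""

    def __init__(self, data, labels, batch_size=4, shuffle=True, seed=0):
        self.data = data
        self.labels = labels
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.rng = np.random.default_rng(seed)
        self.on_epoch_end()

    def __len__(self):
        # Number of full batches per epoch
        return len(self.data) // self.batch_size

    def __getitem__(self, index):
        # Slice the (possibly shuffled) sample order for this batch
        batch = self.indexes[index * self.batch_size:(index + 1) * self.batch_size]
        return self.data[batch], self.labels[batch]

    def on_epoch_end(self):
        self.indexes = np.arange(len(self.data))
        if self.shuffle:
            self.rng.shuffle(self.indexes)

data = np.arange(10 * 3).reshape(10, 3)   # 10 samples with 3 features each
labels = np.arange(10)                    # label i belongs to sample i
gen = ToyBatchGenerator(data, labels, batch_size=4)

seen = []
for i in range(len(gen)):                 # Keras would request these indices
    X, y = gen[i]
    seen.extend(y.tolist())
gen.on_epoch_end()                        # Keras calls this after every epoch

print(len(gen), X.shape, len(seen))       # 2 (4, 3) 8
```

With 10 samples and a batch size of 4, each epoch yields two full batches and leaves two samples out, which matches the floor rule used in `__len__`.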

# The new Keras Script

```python
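import numpy as np
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.utils import Sequence
# other import statements

params = {
    'dim': (32, 32, 3),
    'batch_size': 64,
    'n_classes': 8,
    'shuffle': True
}

class MyDataGenerator(Sequence):
    ...  # the generator defined in the sections above

train_data = 'path_to_train_data'
eval_data = 'path_to_eval_data'

# load_ids_and_labels is a placeholder for your own code: it should return
# the list of sample IDs and the {ID: label} dict for one split
train_IDs, train_labels = load_ids_and_labels(train_data)
eval_IDs, eval_labels = load_ids_and_labels(eval_data)

training_generator = MyDataGenerator(train_IDs, train_labels, **params)
eval_generator = MyDataGenerator(eval_IDs, eval_labels, **params)

model = Sequential()
# model architecture
model.compile()

model.fit(training_generator, use_multiprocessing=True, workers=10, max_queue_size=10)

model.evaluate(eval_generator, use_multiprocessing=True, workers=10, max_queue_size=10)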
```

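The `use_multiprocessing` and `workers` arguments are the important flags when using a data generator. They let batches be prepared in parallel on several CPU cores while the model trains. Note that in newer Keras releases (Keras 3) these arguments are set on the generator itself, through the `Sequence` constructor, rather than passed to `fit`.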

__max_queue_size__: The maximum number of prepared batches that are kept waiting in the queue. It doesn’t mean you’ll have multiple generator instances.

__use_multiprocessing__: Multiple generator instances (one per worker process) are created only when `use_multiprocessing=True`; otherwise the workers are threads that share a single generator.
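To build some intuition for `max_queue_size`, the sketch below uses a bounded `queue.Queue` from the standard library: a background thread prepares batches ahead of the consumer but blocks once the queue is full. This is only an analogy under simplifying assumptions (a single worker thread and a hypothetical `prepare_batch` function), not how Keras implements its internal queue.

```python
import queue
import threading

def prepare_batch(index):
    """Hypothetical stand-in for generator[index]; returns a fake batch."""
    return [index * 10 + k for k in range(3)]

def producer(n_batches, q):
    for index in range(n_batches):
        q.put(prepare_batch(index))   # blocks while the queue already holds max_queue_size items
    q.put(None)                       # sentinel: no more batches

max_queue_size = 2
q = queue.Queue(maxsize=max_queue_size)
worker = threading.Thread(target=producer, args=(5, q), daemon=True)
worker.start()

consumed = []
while True:
    batch = q.get()                   # the training loop takes the next ready batch
    if batch is None:
        break
    consumed.append(batch)
worker.join()

print(len(consumed))  # 5
```

A larger queue lets the producer run further ahead of training at the cost of holding more batches in memory.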