# **Generators in Python**

**There are two terms involved when we discuss generators.**



**Generator-Function** 

A generator-function is defined like a normal function, but whenever it needs to generate a value, it does so with the **yield** keyword rather than return. If the body of a **def contains yield**, **the function automatically becomes a generator function.**

In [1]:
# A generator function that yields 1 the first time,
# 2 the second time and 3 the third time
def simpleGeneratorFun():
	yield 1
	yield 2
	yield 3

# Driver code to check above generator function
for value in simpleGeneratorFun():
	print(value)


1
2
3
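
The rule that `yield` in the body is what makes a generator function can be checked with the standard-library `inspect` module. This is a minimal sketch; the names `with_return` and `with_yield` are made up for illustration.

```python
import inspect

def with_return():
    return 1

def with_yield():
    yield 1

# isgeneratorfunction() reports whether calling the function produces a generator
plain = inspect.isgeneratorfunction(with_return)
generator = inspect.isgeneratorfunction(with_yield)
print(plain, generator)   # False True
```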


**Generator-Object**

Generator functions **return a generator object**. A generator object is consumed either by passing it to the built-in **next()** function or by using it in a **“for in”** loop (as shown in the above program).

In [2]:
# A Python program to demonstrate use of
# generator object with next()

# A generator function
def simpleGeneratorFun():
	yield 1
	yield 2
	yield 3

# x is a generator object
x = simpleGeneratorFun()

# Iterating over the generator object using next
print(next(x))
print(next(x))
print(next(x)) 

1
2
3
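
Once all three values have been consumed, a further `next()` call raises `StopIteration`, which is also how a for loop knows when to stop. The sketch below shows this and the optional default argument of `next()`; the variable names `values`, `exhausted` and `fallback` are only for illustration.

```python
def simpleGeneratorFun():
    yield 1
    yield 2
    yield 3

x = simpleGeneratorFun()
values = [next(x), next(x), next(x)]   # consumes all three values

exhausted = False
try:
    next(x)                            # nothing left to yield
except StopIteration:
    exhausted = True

# next() accepts a default that is returned instead of raising StopIteration
fallback = next(x, None)
print(values, exhausted, fallback)     # [1, 2, 3] True None
```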


In [3]:
# l = [x for x in range(1000000000000000000000000)]
# This would raise a MemoryError (it requires a huge amount of memory):
# a list comprehension builds and stores every value before it returns the list.

**In the case of a generator:**

*   **The values are not all stored at the beginning.**
*   **Each time you call next(g), only one value is generated and returned.**



In [4]:
g=(x*x for x in range(100000000000000000000))   # parentheses make a generator expression (not a tuple); the result is a generator object


In [5]:
print(next(g))
print(next(g))
print(next(g))
print(next(g))


0
1
4
9
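
The memory difference can be measured roughly with `sys.getsizeof`. This is a sketch on a much smaller range (one million items, chosen here so that the list actually fits in memory). Note that `getsizeof` reports only the container itself, not the integers it refers to, so the real gap is even larger.

```python
import sys

n = 1_000_000
squares_list = [x * x for x in range(n)]   # all n values exist at once
squares_gen = (x * x for x in range(n))    # no values computed yet

list_bytes = sys.getsizeof(squares_list)   # grows with n
gen_bytes = sys.getsizeof(squares_gen)     # small and independent of n
print(list_bytes, gen_bytes)
```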


All of the datasets that we can load using the Keras API fit comfortably into memory. That's great because it makes them really easy to work with, but in practice datasets are often a lot bigger and won't fit into memory. One way to handle this is to use dataset generators. This is a way to feed data into a model without loading it all into memory at once.

A generator function in Python returns an object that you can iterate over, and it yields a series of values, but it doesn't store all those values in memory. Instead, it saves its own internal state, and each time we iterate the generator, it yields the next value in the series. So as you might have guessed, in this way, we can use generators to feed data into our model when the data doesn't fit into memory.
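
To see the saved internal state in action, here is a small sketch with a made-up generator, `running_total`. Its local variable `total` survives between calls to `next()`, even though only one value is produced at a time.

```python
def running_total(numbers):
    total = 0                 # local state, kept alive between next() calls
    for n in numbers:
        total += n
        yield total           # pause here; resume from this point next time

totals = running_total([5, 10, 20])
first = next(totals)          # 5
second = next(totals)         # 15, because total was remembered
third = next(totals)          # 35
print(first, second, third)
```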

# **Contents saved as data_file.txt**

In [6]:
# *  At what time can I expect my next flight to [Indore](destination)(other)
# *  I want to know my next flight departing time?
# *  Before what time do I need to reach the airport for my domestic flight?
# *  Before what time do I need to reach the airport for my international flight?
# *  Can I know my next flight timing?
# *  Till what time can I board my flight?
# *  Before how much time do I need to reach the airport?
# *  Can I move out at [5 pm](time)(other)
# *  Tell me the next flight till [5 pm](time)(other)

In [7]:
def text_file_txt(filepath):
  with open(filepath,'r') as f:
    for row in f:
      yield row

In [8]:
text_datagen=text_file_txt('/content/data_file.txt')

I can now iterate over the text_datagen object, for example with a for loop, just as I might iterate over a list. The difference is that the generator doesn't hold every line of the file in memory at once; it reads just one line and yields it each time it's iterated over. In other words, the generator saves an internal state (how far it has got in the file) without holding the entire file in memory.

In [None]:
next(text_datagen)
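
Models usually consume several lines at a time rather than one, so a line generator like this is often wrapped in a second generator that groups lines into batches. The sketch below is one way to do that with `itertools.islice`. `batched_lines` is a hypothetical helper, and a temporary five-line file stands in for data_file.txt.

```python
import os
import tempfile
from itertools import islice

def text_file_txt(filepath):
    with open(filepath, 'r') as f:
        for row in f:
            yield row

def batched_lines(line_gen, batch_size):
    # Hypothetical helper: group a line generator into lists of batch_size lines
    while True:
        batch = list(islice(line_gen, batch_size))
        if not batch:          # underlying generator is exhausted
            return
        yield batch

# Temporary stand-in for data_file.txt (five short lines)
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("line 1\nline 2\nline 3\nline 4\nline 5\n")
    path = tmp.name

batch_sizes = [len(batch) for batch in batched_lines(text_file_txt(path), 2)]
os.remove(path)
print(batch_sizes)             # [2, 2, 1]
```

Only `batch_size` lines are held in memory at any moment, and the last batch may be shorter than the others.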

# **Synthetic Dataset**

In [None]:
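import numpy as np

def get_data(batch_size):
  # Infinite generator: every iteration yields one freshly sampled batch
  while True:
    # Random binary labels with shape (batch_size, 1)
    y_train=np.random.choice([0,1],(batch_size,1))
    # Inputs are N(-1, 1) for label 0 and N(+1, 1) for label 1
    x_train=np.random.randn(batch_size,1)+(2*y_train-1)
    yield x_train,y_train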

In [None]:
datagen=get_data(32)

X,Y=next(datagen)

In [None]:
X.shape,Y.shape

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.losses import BinaryCrossentropy
model=Sequential([Dense(1,activation='sigmoid')])
model.compile(loss=BinaryCrossentropy(),optimizer='sgd')

# Model.fit accepts generators directly; fit_generator has been deprecated since TensorFlow 2.1
model.fit(datagen,steps_per_epoch=1000,epochs=3)

To understand the **steps_per_epoch argument**, just bear in mind that because the model is getting its data from this generator object, it has no way of knowing how many iterations there are in an epoch. Remember, this generator yields an infinite series of data batches. That's why we explicitly tell the model that 1,000 iterations count as one epoch. Given that, we then tell the model to train for 3 epochs.
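
With an infinite generator the value of steps_per_epoch is a free choice. When a generator instead loops over a fixed dataset, a common convention is to set it to the number of batches in one full pass. The sketch below assumes a hypothetical `finite_batches` generator and a toy dataset of 100 samples.

```python
import math
import numpy as np

def finite_batches(x, y, batch_size):
    # Hypothetical generator: loops over a fixed dataset forever, one batch at a time
    n = len(x)
    while True:
        for start in range(0, n, batch_size):
            yield x[start:start + batch_size], y[start:start + batch_size]

x = np.arange(100).reshape(100, 1)
y = np.zeros((100, 1))
batch_size = 32

# One epoch = one full pass, so round up to include the last partial batch
steps_per_epoch = math.ceil(len(x) / batch_size)

gen = finite_batches(x, y, batch_size)
sizes = [len(next(gen)[0]) for _ in range(steps_per_epoch)]
print(steps_per_epoch, sizes)   # 4 [32, 32, 32, 4]
```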

The **train_on_batch method** performs a single optimizer update on one batch of training data. In the loop below, we specify the number of iterations for the entire training run, and at each step we get a batch of training data by calling next on our data generator. Then we feed that batch of training data into the model.train_on_batch method.

In [None]:
# for _ in range(10000):
#   x_train,y_train=next(datagen)
#   model.train_on_batch(x_train,y_train)
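
To make "one optimizer update per batch" concrete, here is a rough NumPy sketch of the same loop for a single sigmoid unit. It illustrates the idea only and is not what Keras does internally. The helper `sgd_step`, the learning rate of 0.1, the 2,000 iterations and the seeded random generator are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)        # seeded so the run is repeatable

def get_data(batch_size):
    # Same synthetic data as above, but drawn from the seeded rng
    while True:
        y = rng.choice([0, 1], (batch_size, 1))
        x = rng.standard_normal((batch_size, 1)) + (2 * y - 1)
        yield x, y

def sgd_step(w, b, x, y, lr=0.1):
    # One gradient step on binary cross-entropy for p = sigmoid(w*x + b)
    p = 1.0 / (1.0 + np.exp(-(w * x + b)))
    grad_w = np.mean((p - y) * x)
    grad_b = np.mean(p - y)
    return w - lr * grad_w, b - lr * grad_b

datagen = get_data(32)
w, b = 0.0, 0.0
for _ in range(2000):                 # total number of updates, not epochs
    x_batch, y_batch = next(datagen)
    w, b = sgd_step(w, b, x_batch, y_batch)

# Check the learned weights on one large fresh batch
x_test, y_test = next(get_data(10000))
accuracy = np.mean(((w * x_test + b) > 0) == (y_test == 1))
print(w, b, accuracy)
```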

In [None]:
datagen_eval=get_data(32)
datagen_test=get_data(32)

Because these generators never stop on their own, we'll need to specify how many steps the evaluation should run for.

In [None]:
# Model.evaluate accepts generators directly; evaluate_generator has been deprecated since TensorFlow 2.1
model.evaluate(datagen_eval,steps=100)

In [None]:
# Model.predict accepts generators directly; predict_generator has been deprecated since TensorFlow 2.1
predictions=model.predict(datagen_test,steps=100)

In [None]:
predictions.shape     # (3200, 1): 100 steps * 32 samples per batch