
# Iterators

Iterators are everywhere in Python. They work behind the scenes in for loops, comprehensions, generators, and more.

An iterator in Python is an object that returns its data one element at a time.

An iterator object must implement two special methods, __iter__() and __next__(), collectively called the iterator protocol.
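
The protocol can be inspected directly. The following is a minimal sketch, assuming Python 3; the variable names are only illustrative. It shows that a list is an iterable (it has __iter__() but not __next__()), while the object returned by iter() has both methods.

```python
numbers = [10, 20, 30]          # a list is an iterable, not an iterator

iterator = iter(numbers)        # iter() calls numbers.__iter__()

# An iterator implements both halves of the protocol.
has_iter = hasattr(iterator, "__iter__")
has_next = hasattr(iterator, "__next__")

# Calling iter() on an iterator returns the iterator itself.
same_object = iter(iterator) is iterator

# The list itself has no __next__(), so next(numbers) would raise TypeError.
list_has_next = hasattr(numbers, "__next__")

print(has_iter, has_next, same_object, list_has_next)  # True True True False
```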

# Iterating Through an Iterator in Python

We use the next() function to manually iterate through all the items of an iterator. When we reach the end and there is no more data to be returned, it will raise StopIteration.

In [None]:
lst = [1, 2, 3, 4]

# get an iterator using iter()
myIter = iter(lst)

print(next(myIter))

print(next(myIter))

print(next(myIter))

print(next(myIter))

# the iterator is exhausted, so this call raises StopIteration
next(myIter)

1
2
3
4


StopIteration: 

A more elegant way of automatically iterating is by using the for loop. 

In [None]:
for element in lst:
    print(element)

1
2
3
4


# How the for Loop Actually Works

In [None]:
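# create an iterator object from the iterable
iter_obj = iter(lst)

# infinite loop
while True:
    try:
        # get the next element
        element = next(iter_obj)
        # the body of the for loop would run here, using element
    except StopIteration:
        # no items left, so leave the loop
        break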

Internally, the for loop creates an iterator object, iter_obj, by calling iter() on the iterable.

The for loop is therefore equivalent to an infinite while loop.

Inside the loop, it calls next() to get the next element and executes the body of the for loop with this value. Once the items are exhausted, next() raises StopIteration, which is caught internally, and the loop ends.
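
The steps above can be wrapped in a small helper to make the mechanics concrete. This is only a sketch, assuming Python 3: manual_for is a hypothetical name, and the real for statement is implemented by the interpreter rather than by a function like this.

```python
def manual_for(iterable, body):
    """Call body(element) for each element, the way a for loop would."""
    iter_obj = iter(iterable)          # step 1: get an iterator
    while True:                        # step 2: loop until exhaustion
        try:
            element = next(iter_obj)   # step 3: fetch the next item
        except StopIteration:
            break                      # step 4: no items left
        body(element)                  # run the "loop body"


collected = []
manual_for([1, 2, 3, 4], collected.append)
print(collected)  # [1, 2, 3, 4]
```

The body is called outside the try block so that a StopIteration raised by the body itself is not mistaken for the end of the iteration.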

# Building Your Own Iterator in Python

Building an iterator from scratch is easy in Python. We just have to implement the methods __iter__() and __next__().

The __iter__() method returns the iterator object itself. If required, some initialization can be performed.

The __next__() method must return the next item in the sequence. On reaching the end, and in subsequent calls, it must raise StopIteration.



In [None]:
class PowOfTwo:
    """
    class to implement iterator of powers of two
    """
    
    def __init__(self, max=0):
        self.max = max
        
    def __iter__(self):
        self.n = 0
        return self

    def __next__(self):
        if self.n <= self.max:
            result = 2 ** self.n
            self.n += 1
            return result
        else:
            # signal the end of the sequence
            raise StopIteration

In [None]:
for i in PowOfTwo(5):
    print(i)

1
2
4
8
16
32
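
The rule that __next__() must keep raising StopIteration after the end can be checked with a second small iterator. The sketch below assumes Python 3; Countdown is a hypothetical class introduced only for this illustration.

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration      # raised on every call after the end
        value = self.current
        self.current -= 1
        return value


countdown = Countdown(3)
values = list(countdown)             # list() calls __next__() until StopIteration

try:
    next(countdown)
    still_exhausted = False
except StopIteration:
    still_exhausted = True

print(values, still_exhausted)       # [3, 2, 1] True
```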


# Generators

Python generators are a simple way of creating iterators. All the overhead (implementing a class with __iter__() and __next__() methods, keeping track of internal state, raising StopIteration when there are no more values to return) is handled automatically by generators in Python.

# How to create a generator in Python?

It is as easy as defining a normal function with a yield statement instead of a return statement.

If a function contains at least one yield statement (it may contain other yield or return statements), it becomes a generator function. Both yield and return will return some value from a function.

The difference is that, while a return statement terminates a function entirely, a yield statement pauses the function, saving all its state, and later continues from there on successive calls.
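
A short sketch can show yield and return side by side in one generator function. It assumes Python 3, and until_negative is a hypothetical name chosen for the example: each yield hands back one value and pauses, while the return ends the generator for good.

```python
def until_negative(values):
    """Yield values one by one, stopping at the first negative number."""
    for value in values:
        if value < 0:
            return          # ends the generator; the consumer sees StopIteration
        yield value         # pauses here and resumes on the next request


result = list(until_negative([3, 1, 4, -1, 5, 9]))
print(result)  # [3, 1, 4]
```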

# Differences between Generator function and a Normal function

- A generator function contains one or more yield statements.
- When called, it returns an object (an iterator) but does not start execution immediately.
- Methods like __iter__() and __next__() are implemented automatically, so we can iterate through the items using next().
- Once the function yields, it is paused and control is transferred back to the caller.
- Local variables and their states are remembered between successive calls.
- Finally, when the function terminates, StopIteration is raised automatically on further calls.
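
Two of these points, delayed execution and remembered state, can be observed by recording when the body runs. This is a minimal sketch assuming Python 3; traced and events are hypothetical names used only here.

```python
events = []


def traced():
    """Hypothetical generator that records when its body runs."""
    events.append("started")
    yield 1
    events.append("resumed")
    yield 2


gen = traced()
events_after_call = list(events)     # []: calling the function runs nothing yet

first = next(gen)
events_after_first = list(events)    # ['started']: ran up to the first yield

second = next(gen)
events_after_second = list(events)   # ['started', 'resumed']: continued from the pause

print(events_after_call, events_after_first, events_after_second)
```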

In [None]:
def my_generator():
    """
    Simple generator function
    """
    a = 1
    print("First Time")
    yield a
    
    a += 1
    print("Second Time")
    yield a
    
    a += 1
    print("Third Time")
    yield a

In [None]:
n = my_generator()

#we can iterate through the items using next()
next(n)

First Time


1

In [None]:
next(n)

Second Time


2

In [None]:
next(n)

Third Time


3

In [None]:
next(n)

StopIteration: 

Unlike normal functions, the local variables are not destroyed when the function yields. Furthermore, the generator object can be iterated only once.

To restart the process we need to create another generator object using something like n = my_generator().
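
The single-use behaviour is easy to confirm. The sketch below assumes Python 3 and uses a hypothetical count_to_three generator: a second pass over the same generator object yields nothing, while a fresh call starts again.

```python
def count_to_three():
    yield 1
    yield 2
    yield 3


gen = count_to_three()
first_pass = list(gen)           # consumes every item
second_pass = list(gen)          # same object, already exhausted

fresh = list(count_to_three())   # a new generator object starts over

print(first_pass, second_pass, fresh)  # [1, 2, 3] [] [1, 2, 3]
```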

In [None]:
# a for loop calls next() for us and stops at StopIteration
for ele in my_generator():
    print(ele)

First Time
1
Second Time
2
Third Time
3


# Python Generators with a Loop

In [None]:
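def reverse_string(myStr):
    """
    generator that yields the characters of a string in reverse order
    """
    length = len(myStr)
    # walk the indices from the last character down to the first
    for i in range(length-1, -1, -1):
        yield myStr[i]

for c in reverse_string("satish"):
    print(c)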

h
s
i
t
a
s


# Python Generator Expression

The syntax for a generator expression is similar to that of a list comprehension in Python, but the square brackets are replaced with round parentheses.

The major difference between a list comprehension and a generator expression is that a list comprehension produces the entire list at once, while a generator expression produces one item at a time.

Generator expressions are lazy: they produce items only when asked for. For this reason, a generator expression is usually much more memory efficient than an equivalent list comprehension.
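
The memory difference can be estimated with sys.getsizeof. This is a rough sketch, assuming CPython 3: getsizeof reports only the size of the container object itself (for a list, its array of references, not the integers it points to), so the numbers are indicative rather than exact.

```python
import sys

n = 100_000
squares_list = [x ** 2 for x in range(n)]   # all items built up front
squares_gen = (x ** 2 for x in range(n))    # nothing computed yet

list_size = sys.getsizeof(squares_list)     # grows with n
gen_size = sys.getsizeof(squares_gen)       # small and independent of n

print(gen_size < list_size)  # True

# Both produce the same values when consumed.
same_total = sum(squares_gen) == sum(squares_list)
```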

In [None]:
lst = [1, 2, 3, 4]

# square each item using a list comprehension
print([x**2 for x in lst])

# the same expression in parentheses builds a generator instead
a = (x**2 for x in lst)
print(next(a))

[1, 4, 9, 16]
1


In [None]:
next(a)

4

In [None]:
print(next(a))

print(next(a))

# the generator is exhausted after 16, so this call raises StopIteration
print(next(a))

print(next(a))

9
16


StopIteration: 

A generator expression can be passed directly to a function. When it is the only argument of the call, its own round parentheses can be dropped.

In [None]:
print(sum(x**2 for x in lst))

30
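
The parentheses can only be dropped when the generator expression is the sole argument. The sketch below, assuming Python 3, contrasts that case with a call that takes a second argument, where the parentheses must stay.

```python
lst = [1, 2, 3, 4]

# Sole argument: the call's own parentheses are enough.
total = sum(x ** 2 for x in lst)

# With another argument, the generator expression needs its own parentheses;
# writing sorted(x ** 2 for x in lst, reverse=True) is a SyntaxError.
largest_first = sorted((x ** 2 for x in lst), reverse=True)

print(total, largest_first)  # 30 [16, 9, 4, 1]
```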


# Why Are Generators Used in Python?

# 1. Easy to Implement

Generators can be implemented in a clear and concise way as compared to their iterator class counterpart.

In [None]:
class PowOfTwo:
    """
    class to implement iterator of powers of two
    """
    
    def __init__(self, max=0):
        self.max = max
        
    def __iter__(self):
        self.n = 0
        return self

    def __next__(self):
        if self.n <= self.max:
            result = 2 ** self.n
            self.n += 1
            return result
        else:
            raise StopIteration

In [None]:
def pow_two_generator(max=0):
    n = 0
    while n <= max:
        yield 2 ** n
        n += 1

# 2. Memory Efficient

A normal function that returns a sequence creates the entire sequence in memory before returning the result. This is overkill if the number of items in the sequence is very large.

A generator implementation of such a sequence is memory friendly and is preferred, since it produces only one item at a time.
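
Producing one item at a time also means no work is done for items the consumer never asks for. The sketch below assumes Python 3; squares_list, squares_gen, first_over, and the produced counter are hypothetical names used to compare how many items each approach computes before a search stops early.

```python
produced = []   # records every item that actually gets computed


def squares_list(n):
    """Build and return the whole list of squares."""
    result = []
    for i in range(n):
        produced.append(i)
        result.append(i * i)
    return result


def squares_gen(n):
    """Yield squares one at a time."""
    for i in range(n):
        produced.append(i)
        yield i * i


def first_over(values, limit):
    """Return the first value greater than limit, or None."""
    for value in values:
        if value > limit:
            return value
    return None


produced.clear()
from_list = first_over(squares_list(1000), 50)
list_work = len(produced)    # every square was computed first

produced.clear()
from_gen = first_over(squares_gen(1000), 50)
gen_work = len(produced)     # stopped as soon as 64 was found

print(from_list, list_work, from_gen, gen_work)  # 64 1000 64 9
```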

# 3. Represent Infinite Stream

Generators can be used to represent an infinite stream of data.

An infinite stream cannot be stored in memory, but since a generator produces only one item at a time, it can represent such a stream.

In [None]:
def gen_even():
    """
    generate all even numbers 
    """
    n = 0
    while True:
        yield n 
        n += 2
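
An infinite generator has to be consumed with care, since a plain for loop over it never ends. One common option, sketched here assuming Python 3, is itertools.islice from the standard library, which takes only the first few items. The generator is repeated so the block runs on its own.

```python
from itertools import islice


def gen_even():
    """Generate all even numbers, starting from 0."""
    n = 0
    while True:
        yield n
        n += 2


# Take only the first five items from the endless stream.
first_five = list(islice(gen_even(), 5))
print(first_five)  # [0, 2, 4, 6, 8]
```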

# 4. Pipelining Generators

Suppose we have a log file from a famous fast food chain.

The log file has a column that keeps track of the number of dosas sold every hour, and we want to sum it to find the total number of dosas sold in 2 years.

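with open('sells.log') as log_file:
    # assumes whitespace-separated columns, with the dosa count in the third one
    dosa_col = (line.split()[2] for line in log_file)
    # skip hours with no recorded value
    per_hour = (int(x) for x in dosa_col if x != 'N/A')
    print("Total dosas sold = " + str(sum(per_hour)))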
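
The same pipeline can be tried without a real file. The sketch below assumes Python 3 and uses io.StringIO as a stand-in for the log; the three sample lines and their layout (date, time, count, separated by spaces) are invented purely for illustration.

```python
import io

# Hypothetical log contents; a real file would be opened with open().
sample_log = io.StringIO(
    "2024-01-01 09:00 12\n"
    "2024-01-01 10:00 N/A\n"
    "2024-01-01 11:00 7\n"
)

# Stage 1: pull the third whitespace-separated column from each line.
dosa_col = (line.split()[2] for line in sample_log)

# Stage 2: drop missing values and convert the rest to integers.
per_hour = (int(x) for x in dosa_col if x != "N/A")

# Nothing has been read yet; sum() pulls lines through both stages one by one.
total = sum(per_hour)
print("Total dosas sold =", total)  # Total dosas sold = 19
```

Because each stage is lazy, only one line is held in memory at a time, however large the log is.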