# I. Introduction to Python > 21. Iterators and Generators


#### [<< Previous lesson](./20_Errors-and-Exceptions.ipynb)   |   [Next lesson >>](./22-Decorators.ipynb)

<hr>
&nbsp;

## Table of contents

- [1. Iterables and Iterators](#1)
- [2. Generators](#2)
    - [2.1. Generator functions and generator objects](#2.1)
    - [2.2. Generator expressions](#2.2)
- [3. The itertools Module](#3)
- [Credits](#credits)

<hr>
&nbsp;

## <a id="1"></a>1. Iterables and Iterators

We have seen that we can iterate over objects by using a `for` loop.

In [1]:
# here we iterate over a list
for i in [1, 2, 3, 4]:
    print(i)

1
2
3
4


In [2]:
# we iterate over a string
for char in "python":
    print(char)

p
y
t
h
o
n


In [3]:
# we iterate over a dictionary
for key in {"x": 1, "y": 2}:
    print(key)

x
y


Objects which can be used with a `for` loop are called **iterables**. To be usable in a `for` loop, an object must support iteration, typically by implementing the `__iter__` method (objects that implement `__getitem__` with integer indexes starting at 0 also work).


The `__iter__` method returns an **iterator**. This means that an **iterable** is any object that can return an iterator.

An **iterator** is an object that returns its values one element at a time, through its `__next__` method. Together, `__iter__` and `__next__` form the **iterator protocol**.

![iterable vs iterator](./attachments/iterable-vs-iterator.png)



So to summarize, to receive the elements one by one and in order, the iterable is first converted to an iterator with `iter()`, and then the `next()` function is used to get the elements from the iterator:
- Iter-**ables** are able to be iterated over
- Iter-**ators** are the agents that perform the iteration
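
To make the two roles concrete, here is a minimal sketch of a hand-written iterable and its iterator. The class names `Countdown` and `CountdownIterator` are hypothetical and exist only for this illustration.

```python
class Countdown:
    """Iterable that counts down from start to 1 (hypothetical example class)."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # return a fresh iterator each time, so the iterable can be reused
        return CountdownIterator(self.start)


class CountdownIterator:
    """Iterator that returns start, start - 1, ..., 1, one item per call."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # an iterator returns itself from __iter__
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # signals that there are no more items
        value = self.current
        self.current -= 1
        return value


countdown = Countdown(3)
print(list(countdown))  # [3, 2, 1]
print(list(countdown))  # [3, 2, 1] again: the iterable hands out a new iterator
```

Splitting the two classes is a design choice: the iterable holds the data, while each iterator holds its own position.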

In [4]:
# let's create the following list
mylist = [1, 2, 3]

We get an iterator from any iterable by using the `iter()` function.

In [5]:
# mylist is an iterable, so we can call the iter() function on it
my_iterator = iter(mylist)

In [6]:
# but it is not an iterator, so we can NOT call next() on it
next(mylist)

TypeError: 'list' object is not an iterator

In [7]:
# my_iterator is an iterator
my_iterator

<list_iterator at 0x7f5fccad0f10>

In [8]:
# check
type(my_iterator)

list_iterator

Once we have an iterator, the only thing we can do with it is get its next item.

In [9]:
# 1st element
next(my_iterator)

1

In [10]:
# 2nd element
next(my_iterator)

2

In [11]:
# 3rd element
next(my_iterator)

3

In [12]:
# no more elements
next(my_iterator)

StopIteration: 

We get a `StopIteration` exception if we ask for the next item when there are no more items.

**NOTE:** once an iterator is consumed, we can no longer iterate over it:

In [13]:
# iterating over my_iterator
my_iterator = iter(mylist)
[x for x in my_iterator]

[1, 2, 3]

In [14]:
# reiterating over it will return an empty list
[x for x in my_iterator]

[]

**NOTE:** to be very precise, an **iterator is always an iterable** but **not every iterable is an iterator**. An iterator always has the `__iter__` method, which returns the iterator itself.
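
One way to check this relationship, assuming we are happy to rely on the abstract base classes in the standard library's `collections.abc` module (not used elsewhere in this lesson), is with `isinstance()`. The variable names below are chosen for this sketch only.

```python
from collections.abc import Iterable, Iterator

numbers = [1, 2, 3]
numbers_iterator = iter(numbers)

# a list is an iterable, but not an iterator
print(isinstance(numbers, Iterable))  # True
print(isinstance(numbers, Iterator))  # False

# a list iterator is both an iterator and an iterable
print(isinstance(numbers_iterator, Iterable))  # True
print(isinstance(numbers_iterator, Iterator))  # True

# calling iter() on an iterator returns the very same object
print(iter(numbers_iterator) is numbers_iterator)  # True
```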

In [15]:
# let's reuse our previous iterator
my_iterator = iter(mylist)

In [16]:
# check
type(my_iterator)

list_iterator

In [17]:
# we can apply the iter() function to an iterator (so it's an iterable too)
iter(my_iterator)

<list_iterator at 0x7f5fccad0dc0>

&nbsp;  

In practice, we prefer to use a `for` loop rather than calling `next()` ourselves. But the two are equivalent.

In [18]:
# the above successive next() are equivalent to this 'for' loop
for number in mylist:
    print(number)

1
2
3


Internally, the `for` loop uses `iter()` to create an iterator object and then repeatedly calls `next()` on it.

The reason we don't get an error at the end is that the `for` loop automatically catches the `StopIteration` exception and stops calling `next()`.

In [19]:
# this is roughly what a 'for' loop does internally:
my_iterator = iter(mylist)

while True:
    try:
        number = next(my_iterator)  # get the next item
    except StopIteration:           # raised when the iterator is exhausted
        break                       # leave the loop
    # do something with number (this is the body of the 'for' loop)

In other words, a `for` loop behaves like an infinite `while` loop that ends when `StopIteration` is raised.

&nbsp;


**NOTE:** As we said in previous lessons, `zip()`, `map()` and `filter()` are examples of functions that return iterators.

In [20]:
# zip() returns an iterator
zip_A = zip([1, 2, 3], ['a', 'b', 'c'])

In [21]:
# which means we can apply next() on them
next(zip_A)

(1, 'a')

In [22]:
next(zip_A)

(2, 'b')

In [23]:
next(zip_A)

(3, 'c')

In [24]:
# until they are exhausted
next(zip_A)

StopIteration: 

In [25]:
# and we can only use them once
list(zip_A)

[]

In [26]:
# we can, for example, convert it to a list
zip_A = zip([1, 2, 3], ['a', 'b', 'c'])
list(zip_A)

[(1, 'a'), (2, 'b'), (3, 'c')]

In [27]:
# and of course, after that it is gone
next(zip_A)

StopIteration: 

In [28]:
# map() also returns an iterator
map_A = map(len, ['abc', 'de', 'fghi'])

In [29]:
next(map_A)

3
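
`filter()` is mentioned above but not shown. The short sketch below illustrates that it behaves the same way; the helper `is_even` and the variable `filter_A` are names made up for this example.

```python
def is_even(number):
    """Return True for even numbers (hypothetical helper for this example)."""
    return number % 2 == 0


# filter() also returns an iterator
filter_A = filter(is_even, [1, 2, 3, 4, 5, 6])

print(next(filter_A))  # 2
print(list(filter_A))  # [4, 6] (the remaining items)
print(list(filter_A))  # [] (the iterator is now exhausted)
```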

&nbsp;


Check the [python documentation](https://docs.python.org/3/library/stdtypes.html#iterator-types) for more information on iterators.

<hr>
&nbsp;

## <a id="2"></a>2. Generators


### <a id="2.1"></a>2.1. Generator functions and generator objects

**Generators** are a *simple way* of **creating iterators**. With them, there's no need to implement the `__iter__` and `__next__` methods, and we don't have to keep track of an internal state or worry about raising exceptions.

Creating a generator is as easy as creating a function, but with the **`yield`** keyword *instead of* the **`return`** keyword.

In [30]:
# Generator function for the cube of numbers (power of 3)
def cubes_generator(n):
    for num in range(n):
        yield num**3  # use "yield" instead of "return"

In [31]:
# and we can use it like this
for x in cubes_generator(10):
    print(x)

0
1
8
27
64
125
216
343
512
729


**NOTE: Generator functions** return a **generator object**. Generator objects (which are iterators) are used either by calling the `next()` function on them or in a `for` loop.

In [32]:
# this is a generator function
type(cubes_generator)

function

In [33]:
# a generator function returns a generator object
type(cubes_generator(10))

generator

In [34]:
# check
my_generator = cubes_generator(3)
my_generator

<generator object cubes_generator at 0x7f5fcc264890>

A generator object produces items only on demand.

In [35]:
# we can call next() on a generator object (since it's an iterator)
next(my_generator)

0

In [36]:
next(my_generator)

1

In [37]:
next(my_generator)

8

In [38]:
# until it is empty
next(my_generator)

StopIteration: 

In [39]:
# we can also define infinite sequences
def inf_cubes_generator():  # we are no longer limited by a parameter n
    num = 0
    while True:
        yield num**3
        num += 1

In [40]:
# we define a generator object
my_generator = inf_cubes_generator()

In [41]:
# and we can generate as many cubes as we want
next(my_generator)

0

In [42]:
next(my_generator)

1

In [43]:
next(my_generator)

8

In [44]:
next(my_generator)

27

In [45]:
next(my_generator)

64

In [46]:
# this would never stop on its own, so we limit the number of calls ourselves
for i in range(20):
    print(next(my_generator))

125
216
343
512
729
1000
1331
1728
2197
2744
3375
4096
4913
5832
6859
8000
9261
10648
12167
13824
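
Another way to take a bounded number of items from an infinite generator, without calling `next()` by hand, is `itertools.islice()` from the standard library. It is not covered in the rest of this lesson, so treat the following as an optional sketch; the generator is redefined here so that the example runs on its own.

```python
from itertools import islice


def inf_cubes_generator():
    """Infinite generator of cubes, same as the one defined above."""
    num = 0
    while True:
        yield num**3
        num += 1


# islice() lazily takes the first 5 items and then stops
first_five = list(islice(inf_cubes_generator(), 5))
print(first_five)  # [0, 1, 8, 27, 64]
```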


&nbsp;  

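Unlike normal functions, a generator function's local variables are not destroyed when it yields: execution pauses at the `yield` and resumes from the same point on the next call to `next()`. And since generators only produce the next item on demand, they **don't need to store everything in memory**.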
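
The following small sketch illustrates how local state survives between yields. The function `running_total_generator` is a hypothetical example, not part of the original lesson.

```python
def running_total_generator(values):
    """Yield the running total of the values seen so far."""
    total = 0  # local variable that is kept alive between yields
    for value in values:
        total += value
        yield total  # execution pauses here until the next call to next()


totals = running_total_generator([10, 20, 30])
print(next(totals))  # 10
print(next(totals))  # 30 (total was remembered from the previous call)
print(next(totals))  # 60
```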

Let's now create another generator which calculates [Fibonacci](https://en.wikipedia.org/wiki/Fibonacci_number) numbers.

![fibonacci sequence](./attachments/fibonacci.png)

In [47]:
# This is what it would look like with a normal function
def fibonacci(n):
    a = 1
    b = 1
    output = []
    
    for i in range(n):
        output.append(a)
        a, b = b, a+b
        
    return output

In [48]:
# And this is the equivalent using a generator function
def fibonacci_generator(n):
    a = 1
    b = 1
    for i in range(n):
        yield a
        a, b = b, a+b

In [49]:
# this is a list
fibonacci(10)

[1, 1, 2, 3, 5, 8, 13, 21, 34, 55]

In [50]:
# and this is a generator object
fibonacci_generator(10)

<generator object fibonacci_generator at 0x7f5fcc224430>

In [51]:
# which we use with a for loop
for num in fibonacci_generator(10):
    print(num)

1
1
2
3
5
8
13
21
34
55


**NOTE:** if we call it with a huge value of n (like 100,000), the **normal function** has to **keep** *every single result* **in memory**.

A **generator**, however, **does not** store all the content in memory. Instead, it remembers where it was and only computes the next value when asked to (we speak of *lazy evaluation*). Generators allow us to generate values as we go along, instead of holding everything in memory.
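
As an illustration of lazy evaluation, the sketch below searches for the first Fibonacci number above a limit. The helper `first_fibonacci_above` and its `max_terms` parameter are assumptions made for this example. Even though the generator is allowed up to 100,000 terms, only the terms needed to reach the answer are ever computed, and no list is built.

```python
def fibonacci_generator(n):
    """Generate the first n Fibonacci numbers, same as the function above."""
    a, b = 1, 1
    for _ in range(n):
        yield a
        a, b = b, a + b


def first_fibonacci_above(limit, max_terms=100_000):
    """Return the first Fibonacci number greater than limit (hypothetical helper)."""
    for value in fibonacci_generator(max_terms):
        if value > limit:
            return value  # stop early: later terms are never computed
    return None  # no such number within max_terms terms


print(first_fibonacci_above(1000))  # 1597
```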

<hr>
&nbsp;

### <a id="2.2"></a>2.2. Generator expressions

A **generator expression** (also called a **generator comprehension**) has a very similar syntax to **list comprehension**. It is another very easy way to create (simple) generators.

In [52]:
# this is a generator expression
(x**3 for x in range(3))

<generator object <genexpr> at 0x7f5fcc224970>

In [53]:
# we can assign it to a variable
my_generator = (x**3 for x in range(3))

In [54]:
my_generator

<generator object <genexpr> at 0x7f5fcc2249e0>

In [55]:
type(my_generator)

generator

In [56]:
next(my_generator)

0

In [57]:
next(my_generator)

1

In [58]:
next(my_generator)

8

In [59]:
next(my_generator)

StopIteration: 

In [60]:
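# let's compare a list comprehension with a generator expression
# (note: for the generator expressions below, %timeit only measures the creation
# of the generator object, not the computation of its items)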
%timeit [i for i in range(100) if i % 2 == 0]

4.2 µs ± 187 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [61]:
%timeit (i for i in range(100) if i % 2 == 0)

655 ns ± 155 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [62]:
# and reusing our previous example
%timeit [x**3 for x in range(100)]

21.7 µs ± 6.21 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [63]:
%timeit (x**3 for x in range(100))

334 ns ± 12.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
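
Keep in mind that timing a bare generator expression only measures how long it takes to create the generator object, since no item is computed until it is consumed. A fairer comparison consumes both, for example with `sum()`. The sketch below uses the standard `timeit` module instead of the `%timeit` magic so that it also runs outside a notebook; the function names are made up for this example, and the timings will vary with the machine and the Python version, so no expected output is shown.

```python
from timeit import timeit


def sum_with_list():
    """Build the full list first, then sum it."""
    return sum([x**3 for x in range(100)])


def sum_with_generator():
    """Feed the items to sum() one at a time, without building a list."""
    return sum(x**3 for x in range(100))


# both approaches compute the same result
assert sum_with_list() == sum_with_generator()

list_time = timeit(sum_with_list, number=10_000)
generator_time = timeit(sum_with_generator, number=10_000)

print(f"list comprehension:   {list_time:.4f} s")
print(f"generator expression: {generator_time:.4f} s")
```

For small inputs like this one, the two are often close in speed. The main benefit of the generator is the lower memory use rather than a large speed-up.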


We can also compare how much memory each object takes with the `sys.getsizeof()` [function](https://docs.python.org/3/library/sys.html#sys.getsizeof). Note that it only reports the size of the object itself, not of the objects it refers to.

In [64]:
from sys import getsizeof

In [65]:
lst_comp = [x**3 for x in range(100)]
getsizeof(lst_comp)

904

In [66]:
gen_exp = (x**3 for x in range(100))
getsizeof(gen_exp)

112
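
To see how these sizes evolve, the sketch below repeats the measurement for several lengths. The helper functions `make_list` and `make_generator` are hypothetical names for this example. The exact byte counts depend on the Python version and platform, so they are not shown here; the point is that the list grows with `n` while the generator object does not.

```python
from sys import getsizeof


def make_list(n):
    """Return a list of the first n cubes."""
    return [x**3 for x in range(n)]


def make_generator(n):
    """Return a generator of the first n cubes."""
    return (x**3 for x in range(n))


for n in (10, 1_000, 100_000):
    # the list size grows with n, the generator size stays the same
    print(n, getsizeof(make_list(n)), getsizeof(make_generator(n)))
```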

&nbsp;  

So to summarize:

- There are 2 ways to create generators: generator functions and generator expressions
- Generators reduce the need to create intermediate lists
- Any generator is an iterator (but not vice versa!)
- Generators are *lazy*: they compute each item on demand instead of storing all their contents in memory
- This makes them memory efficient
- They allow us to work with very large data or even with infinite sequences



![iterator vs generator](./attachments/iterator-generator.png)


&nbsp;

Check the [python documentation](https://docs.python.org/3/library/stdtypes.html#generator-types) for more information on generators.

<hr>
&nbsp;

## <a id="3"></a>3. The `itertools` Module

The `itertools` module provides a lot of interesting tools to work with iterators.

In [67]:
# here are a few iterables
letters = ['a', 'b', 'c', 'd', 'e', 'f']
booleans = [True, False, True, False, False, True]
numbers = [23, 20, 44, 32, 7, 12]
decimals = [0.1, 0.7, 0.4, 0.4, 0.5]

In [68]:
# we can chain iterables with chain()
from itertools import chain

In [70]:
my_chain = chain(letters, booleans, decimals)
list(my_chain)

['a',
 'b',
 'c',
 'd',
 'e',
 'f',
 True,
 False,
 True,
 False,
 False,
 True,
 0.1,
 0.7,
 0.4,
 0.4,
 0.5]

In [71]:
# Let's create a counter with count()
from itertools import count

In [72]:
# let's count from 10 onward with 0.25 increment
my_counter = count(10, 0.25)

In [73]:
# show
for _ in range(5):
    print(next(my_counter))

10
10.25
10.5
10.75
11.0


In [74]:
# another example of using count()
word = "Python"
n = len(word)-1
list(zip(count(n, -1), reversed(word)))

[(5, 'n'), (4, 'o'), (3, 'h'), (2, 't'), (1, 'y'), (0, 'P')]

In [75]:
# we can filter values with compress()
from itertools import compress

In [76]:
filter_items = compress(letters, booleans)
list(filter_items)

['a', 'c', 'f']

In [77]:
# we can cycle infinitely over an iterable
from itertools import cycle

In [78]:
cycled_items = cycle('AB')

In [79]:
# show
for _ in range(6):
    print(next(cycled_items))

A
B
A
B
A
B


In [80]:
# a more advanced example: using cycle() and compress() together
every_third = cycle([False, False, True])
data = range(1, 22)
[i for i in compress(data, every_third)]

[3, 6, 9, 12, 15, 18, 21]

In [81]:
# we can repeat a value, either infinitely or a fixed number of times
from itertools import repeat

In [82]:
# repeat the number 25 (4x)
list(repeat(25, 4))

[25, 25, 25, 25]

In [83]:
# a more advanced use of repeat()
for (x, y, z) in map(lambda x, y: (x, y, x * y), repeat(3), range(1, 11)):
    print(x, '*', y, '=', z)

3 * 1 = 3
3 * 2 = 6
3 * 3 = 9
3 * 4 = 12
3 * 5 = 15
3 * 6 = 18
3 * 7 = 21
3 * 8 = 24
3 * 9 = 27
3 * 10 = 30


In [84]:
# we can create permutations
from itertools import permutations

In [85]:
# all permutations of the iterable [0, 1, 2]
list(permutations([0, 1, 2]))

[(0, 1, 2), (0, 2, 1), (1, 0, 2), (1, 2, 0), (2, 0, 1), (2, 1, 0)]

In [86]:
# we can create combinations
from itertools import combinations

In [87]:
# all combinations of 2 elements taken in range(5)
list(combinations(range(5), 2))

[(0, 1),
 (0, 2),
 (0, 3),
 (0, 4),
 (1, 2),
 (1, 3),
 (1, 4),
 (2, 3),
 (2, 4),
 (3, 4)]

In [88]:
# we can accumulate values (by default, by repeatedly applying the operator +)
from itertools import accumulate

In [89]:
list(accumulate('abcde'))

['a', 'ab', 'abc', 'abcd', 'abcde']

In [90]:
list(accumulate(range(1, 6)))

[1, 3, 6, 10, 15]

In [91]:
# accumulate is similar to reduce (but with the intermediate steps)
from functools import reduce
from operator import add

reduce(add, list('abcde'))

'abcde'

In [92]:
reduce(add, list(range(1, 6)))

15

In [93]:
# with reduce we can use the operator *
from operator import mul
reduce(mul, list(range(1, 6)))

120

In [94]:
# and we can do the same with accumulate
list(accumulate(range(1, 6), mul))

[1, 2, 6, 24, 120]

It is possible to combine `accumulate()` with any other function that takes 2 arguments.

In [95]:
# we can use any function that can take 2 arguments; here we use max()
lst = [2,4,6,3,1]
list(accumulate(lst, max))

[2, 4, 6, 6, 6]
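
The function passed to `accumulate()` can also be one we write ourselves. In the sketch below, `append_digit` is a hypothetical helper that receives the accumulated value so far and the next item, in that order.

```python
from itertools import accumulate


def append_digit(number, digit):
    """Append a digit to the right of number (hypothetical helper)."""
    return number * 10 + digit


# each step receives the previous result and the next item
print(list(accumulate([1, 2, 3, 4], append_digit)))  # [1, 12, 123, 1234]
```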

In [96]:
# reduce() can take an initial value
reduce(add, list(range(1, 6)), 9000)

9015

In [97]:
# accumulate() can also take an initial value (keyword argument 'initial', Python 3.8+)
list(accumulate(range(1, 6), initial=9000))

[9000, 9001, 9003, 9006, 9010, 9015]

&nbsp;

Check the [python documentation](https://docs.python.org/3/library/itertools.html) for more information on the itertools module.



<hr>
&nbsp;

## <a id="credits"></a>Credits
- [Pierian Data](https://github.com/Pierian-Data/Complete-Python-3-Bootcamp)
- [Nvie](https://nvie.com/posts/iterators-vs-generators/)
- [Anandology](https://anandology.com/python-practice-book/iterators.html)
- [Python Tips](https://book.pythontips.com/en/latest/generators.html)
- [Trey Hunner](https://treyhunner.com/2018/02/python-range-is-not-an-iterator/)
- [Data Camp](https://www.datacamp.com/community/tutorials/python-iterator-tutorial)
- [Programiz](https://www.programiz.com/python-programming/iterator) and [here](https://www.programiz.com/python-programming/generator)
- [Geeks for geeks](https://www.geeksforgeeks.org/python-itertools)
- [Pymotw](https://pymotw.com/3/itertools/index.html)