# CHAPTER 14 Iterables, iterators and generators
Every generator is an iterator: generators fully implement the iterator
interface. But an iterator — as defined in the GoF book — retrieves
items from a collection, while a generator can produce items
“out of thin air”. That’s why the Fibonacci sequence generator is a
common example: an infinite series of numbers cannot be stored
in a collection. However, be aware that the Python community
treats iterator and generator as synonyms most of the time.
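
As a minimal sketch of the Fibonacci example mentioned above (the names fibonacci and first_ten are illustrative, not from the original), a generator can produce an endless series without storing it anywhere; itertools.islice is used here only to take a finite prefix:

```python
from itertools import islice

def fibonacci():
    """Yield the Fibonacci numbers forever: 0, 1, 1, 2, 3, 5, ..."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# islice takes a finite prefix of the infinite series.
first_ten = list(islice(fibonacci(), 10))
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```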
## Sentence take #1: a sequence of words


In [15]:
import re
import reprlib
RE_WORD = re.compile(r'\w+')
class Sentence:
    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)
    def __getitem__(self, index):
        return self.words[index]
    def __len__(self):
        return len(self.words)
    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)

In [2]:
s = Sentence('"The time has come," the Walrus said,')
s

Sentence('"The time ha... Walrus said,')

In [3]:
for word in s:
    print(word)

The
time
has
come
the
Walrus
said


In [4]:
list(s)

['The', 'time', 'has', 'come', 'the', 'Walrus', 'said']

In [5]:
s[4]

'the'

### Why sequences are iterable: the iter function
Whenever the interpreter needs to iterate over an object x, it automatically calls iter(x).
The iter built-in function:
1. Checks whether the object implements __iter__, and calls that to obtain an iterator;
2. If __iter__ is not implemented, but __getitem__ is implemented, Python creates
an iterator that attempts to fetch items in order, starting from index 0 (zero);
3. If that fails, Python raises TypeError, usually saying "'C' object is not iterable",
where C is the class of the target object.
That is why any Python sequence is iterable: they all implement __getitem__. In fact,
the standard sequences also implement __iter__, and yours should too.
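
The lookup steps above can be observed with two small classes. This is only a sketch: Squares and Opaque are hypothetical names invented for the illustration, and the exact wording of the TypeError message may vary between Python versions.

```python
class Squares:
    """Hypothetical class with __getitem__ only (no __iter__)."""
    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)  # tells the fallback iterator to stop
        return index * index

class Opaque:
    """Hypothetical class with neither __iter__ nor __getitem__."""

# iter() falls back to calling __getitem__ with 0, 1, 2, ... until IndexError.
squares = list(Squares())

try:
    iter(Opaque())
except TypeError as exc:
    message = str(exc)  # typically "'Opaque' object is not iterable"

print(squares)
print(message)
```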



In [6]:
from collections import abc
class Foo:
    def __iter__(self):
        pass
issubclass(Foo, abc.Iterable)

True

In [7]:
f = Foo()
isinstance(f, abc.Iterable)

True

## Iterables versus iterators
iterable
Any object from which the iter built-in function can obtain an iterator. Objects
implementing an __iter__ method returning an iterator are iterable. Sequences
are always iterable, as are objects implementing a __getitem__ method which
takes 0-based indexes.
It’s important to be clear about the relationship between iterables and iterators: Python
obtains iterators from iterables.

In [1]:
s = 'abc'
for char in s:
    print(char)

a
b
c


In [4]:
s = 'abc'
it = iter(s)
while True:
    try:
        print(next(it))
    except StopIteration:
        break

a
b
c


StopIteration signals that the iterator is exhausted. This exception is handled internally
in for loops and other iteration contexts such as list comprehensions and tuple unpacking.
The standard interface for an iterator has two methods:
__next__
Returns the next available item, raising StopIteration when there are no more
items.
__iter__
Returns self; this allows iterators to be used where an iterable is expected, for
example, in a for loop.
The best way to check if an object x is an iterator is to call
isinstance(x, abc.Iterator).
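
A short sketch of that check, using a str and the iterator built from it (the variable names are illustrative): the string is iterable but not an iterator, while the object returned by iter() is both.

```python
from collections import abc

text = 'abc'
it = iter(text)

str_is_iterable = isinstance(text, abc.Iterable)  # str has __iter__
str_is_iterator = isinstance(text, abc.Iterator)  # str has no __next__
it_is_iterable = isinstance(it, abc.Iterable)     # iterators have __iter__
it_is_iterator = isinstance(it, abc.Iterator)     # ... and __next__

print(str_is_iterable, str_is_iterator, it_is_iterable, it_is_iterator)
```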



In [9]:
s3 = Sentence('Pig and Pepper')
it = iter(s3)
it, next(it), next(it), next(it)

(<iterator at 0x54d82b0>, 'Pig', 'and', 'Pepper')

In [10]:
next(it)

StopIteration: 

In [11]:
list(it)

[]

In [14]:
list(s3), list(iter(s3))

(['Pig', 'and', 'Pepper'], ['Pig', 'and', 'Pepper'])

Since the only methods required of an iterator are __next__ and __iter__, there is no
way to check whether there are remaining items, other than to call next() and catch
StopIteration. Also, it’s not possible to “reset” an iterator. If you need to start over,
you need to call iter(…) on the iterable that built the iterator in the first place. Calling
iter(…) on the iterator itself won’t help, because, as mentioned, Iterator.__iter__
is implemented by returning self, so this will not reset a depleted iterator.
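
The point about resetting can be checked with a plain list standing in for the iterable (a sketch; the variable names are illustrative):

```python
words = ['Pig', 'and', 'Pepper']

it = iter(words)
first_pass = list(it)      # consumes the iterator completely

same_object = iter(it) is it   # iter() on an iterator returns the iterator itself
leftover = list(iter(it))      # still depleted, so nothing comes out

fresh = list(iter(words))      # a new iterator from the iterable starts over

print(first_pass, same_object, leftover, fresh)
```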

## Sentence take #2: a classic iterator


In [16]:
import re
import reprlib
# Compile the pattern into a regular expression object, which can be used
# for matching with its match(), search(), findall() and other methods.
RE_WORD = re.compile(r'\w+')
class Sentence:
    def __init__(self, text):
        self.text = text
        # findall returns all non-overlapping matches of the pattern, as a list of strings.
        self.words = RE_WORD.findall(text)
    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)
    def __iter__(self):
        return SentenceIterator(self.words)

class SentenceIterator:
    def __init__(self, words):
        self.words = words
        self.index = 0
    def __next__(self):
        try:
            word = self.words[self.index]
        except IndexError:
            raise StopIteration()
        self.index += 1
        return word
    def __iter__(self):
        return self

### Making Sentence an iterator: bad idea
A common cause of errors in building iterables and iterators is to confuse the two. To
be clear: iterables have an __iter__ method that instantiates a new iterator every time.
Iterators implement a __next__ method that returns individual items, and an __iter__
method that returns self.
Therefore, iterators are also iterable, but iterables are not iterators.
The Applicability section of the Iterator design pattern in the GoF book says:
Use the Iterator pattern
• to access an aggregate object’s contents without exposing its internal representation.
• to support multiple traversals of aggregate objects.
• to provide a uniform interface for traversing different aggregate structures (that is,
to support polymorphic iteration).

To “support multiple traversals” it must be possible to obtain multiple independent
iterators from the same iterable instance, and each iterator must keep its own internal
state, so a proper implementation of the pattern requires each call to iter(my_iterable)
to create a new, independent, iterator. That is why we need the SentenceIterator
class in this example.
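
To see why independent iterators matter, here is a sketch with two hypothetical classes (BadSentence and GoodSentence are invented for this illustration and are not part of the examples above). Nested loops over the same instance are a simple case of multiple traversals:

```python
class BadSentence:
    """Hypothetical anti-example: the iterable is its own iterator."""
    def __init__(self, words):
        self.words = words
        self.index = 0
    def __iter__(self):
        return self  # every caller shares one cursor
    def __next__(self):
        try:
            word = self.words[self.index]
        except IndexError:
            raise StopIteration from None
        self.index += 1
        return word

class GoodSentence:
    """Hypothetical iterable that hands out a new iterator on each call."""
    def __init__(self, words):
        self.words = words
    def __iter__(self):
        return iter(self.words)  # fresh, independent iterator

words = ['Pig', 'and', 'Pepper']

bad = BadSentence(words)
bad_pairs = [(a, b) for a in bad for b in bad]     # inner loop steals the outer loop's items

good = GoodSentence(words)
good_pairs = [(a, b) for a in good for b in good]  # full 3 x 3 product

print(len(bad_pairs), len(good_pairs))
```

With the shared cursor, the inner loop exhausts the only iterator, so the outer loop ends after its first item.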


An iterable should never act as an iterator over itself. In other words,
iterables must implement __iter__, but not __next__.
On the other hand, for convenience, iterators should be iterable. An
iterator’s __iter__ should just return self.
## Sentence take #3: a generator function

In [17]:
import re
import reprlib
RE_WORD = re.compile(r'\w+')
class Sentence:
    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)
    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)
    def __iter__(self):
        for word in self.words:
            yield word
        return  # not needed: the generator ends when the function body finishes

### How a generator function works
Any Python function that has the yield keyword in its body is a generator function: a
function which, when called, returns a generator object. In other words, a generator
function is a generator factory.


In [18]:
def gen_123():
    yield 1
    yield 2
    yield 3

In [19]:
gen_123()

<generator object gen_123 at 0x00000000054ED830>

In [20]:
for i in gen_123(): 
    print(i)

1
2
3


In [21]:
g = gen_123()
next(g), next(g), next(g)

(1, 2, 3)

In [22]:
next(g)

StopIteration: 

A generator function builds a generator object which wraps the body of the function.
When we invoke next(…) on the generator object, execution advances to the next yield
in the function body, and the next(…) call evaluates to the value yielded when the function
body is suspended. Finally, when the function body returns, the enclosing generator
object raises StopIteration, in accordance with the Iterator protocol.
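
One way to watch the suspend-and-resume behavior described above is the standard library function inspect.getgeneratorstate. This is a sketch only; gen_12 is a hypothetical generator function made up for the illustration.

```python
import inspect

def gen_12():
    yield 1
    yield 2

g = gen_12()
states = [inspect.getgeneratorstate(g)]      # body has not started running yet

next(g)
states.append(inspect.getgeneratorstate(g))  # suspended at the first yield

next(g)
try:
    next(g)                                  # body returns, StopIteration is raised
except StopIteration:
    pass
states.append(inspect.getgeneratorstate(g))  # the generator is finished

print(states)  # ['GEN_CREATED', 'GEN_SUSPENDED', 'GEN_CLOSED']
```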

In [4]:
def gen_AB():
    print('start')
    yield 'A'
    print('continue')
    yield 'B'
    print('end.')

In [24]:
for c in gen_AB():
    print('-->', c)

start
--> A
continue
--> B
end.


In [25]:
a = gen_AB()
next(a)

start


'A'

In [26]:
next(a)

continue


'B'

In [27]:
next(a)

end.


StopIteration: 

## Sentence take #4: a lazy implementation
The re.finditer function is a lazy version of re.findall which, instead of a list, returns
an iterator producing match objects (re.Match instances) on demand. If there are many
matches, re.finditer saves a lot of memory. Using it, this version of Sentence is
now lazy: it only produces the next word when it is needed.
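
A small sketch of the difference, using a short sample text chosen for the illustration: findall builds the whole list at once, while finditer hands back an iterator whose matches are produced one at a time.

```python
import re
from collections import abc

RE_WORD = re.compile(r'\w+')
text = 'The time has come'

eager = RE_WORD.findall(text)    # a complete list of strings, built immediately
lazy = RE_WORD.finditer(text)    # an iterator; no match has been fetched yet

is_iterator = isinstance(lazy, abc.Iterator)
first = next(lazy).group()             # fetch just one match
rest = [m.group() for m in lazy]       # the remaining matches, on demand

print(eager, is_iterator, first, rest)
```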

In [2]:
import re
import reprlib
RE_WORD = re.compile(r'\w+')
class Sentence:
    def __init__(self, text):
        self.text = text
    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)
    def __iter__(self):
        for match in RE_WORD.finditer(self.text):
            # finditer returns an iterator yielding match objects for all
            # non-overlapping matches of the pattern in the string.
            yield match.group()  # with no arguments, group() returns the whole matched text

## Sentence take #5: a generator expression


In [5]:
res1 = [x*3 for x in gen_AB()]

start
continue
end.


The list comprehension eagerly iterates over the items yielded by the generator
object produced by calling gen_AB(): 'A' and 'B'.

In [6]:
for i in res1:
    print('-->', i)

--> AAA
--> BBB


In [7]:
res2 = (x*3 for x in gen_AB())

The generator expression returns a generator object, which is bound to res2. The call to
gen_AB() is made, but that call returns a generator which is not consumed here.

In [8]:
res2

<generator object <genexpr> at 0x00000000059F49E8>

In [9]:
for i in res2:
    print('-->', i)

start
--> AAA
continue
--> BBB
end.


In [11]:
import re
import reprlib
RE_WORD = re.compile(r'\w+')
class Sentence:
    def __init__(self, text):
        self.text = text
    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)
    def __iter__(self):
        return (match.group() for match in RE_WORD.finditer(self.text))

## Generator expressions: when to use them
My rule of thumb in choosing the syntax to use is simple: if the generator expression
spans more than a couple of lines, I prefer to code a generator function for the sake of
readability. Also, because generator functions have a name, they can be reused. You can
always name a generator expression and use it later by assigning it to a variable, of course,
but that is stretching its intended usage as a one-off generator.
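
As a rough illustration of that trade-off (triple_all is a hypothetical name used only here), the same logic can be written either way; the function form gains a name and a docstring, while the expression form stays compact for one-off use:

```python
def triple_all(items):
    """Generator function: named, documented, and reusable."""
    for item in items:
        yield item * 3

from_function = list(triple_all('AB'))

# Generator expression: fine for a short, one-off generator.
from_genexp = list(x * 3 for x in 'AB')

# A genexp passed as the only argument needs no extra parentheses.
total = sum(n * n for n in range(4))

print(from_function, from_genexp, total)
```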
## Another example: arithmetic progression generator

In [12]:
class ArithmeticProgression:
    def __init__(self, begin, step, end=None):
        self.begin = begin
        self.step = step
        self.end = end # None -> "infinite" series
    def __iter__(self):
        result = type(self.begin + self.step)(self.begin)
        forever = self.end is None
        index = 0
        while forever or result < self.end:
            yield result