# Ch04-Iterators and Generators

## 4.1. Manually Consuming an Iterator

### Problem

You need to process items in an iterable, but for whatever reason, you can’t or don’t want to use a for loop.

### Solution

To manually consume an iterable, use the *next()* function and write your code to catch the *StopIteration* exception.

### Discussion

In most cases, the for statement is used to consume an iterable. However, every now and then, a problem calls for more precise control over the underlying iteration mechanism. Thus, it is useful to know what actually happens.

The following interactive example illustrates the basic mechanics of what happens during iteration:

In [2]:
items = [1,2,3]

In [3]:
# Get the iterator
it = iter(items) # Invokes items.__iter__()

In [4]:
# Run the iterator
next(it)

1

In [5]:
next(it)

2

In [6]:
next(it)

3
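At this point the iterator is exhausted, and one more call to *next(it)* would raise *StopIteration*. The following is a minimal sketch of the manual loop described in the Solution; the names `collected` and `sentinel` are illustrative and not part of the original example.

```python
items = [1, 2, 3]
it = iter(items)

collected = []
while True:
    try:
        value = next(it)
    except StopIteration:
        # The iterator has no more items; leave the loop
        break
    collected.append(value)

print(collected)

# Alternatively, give next() a default value to return
# instead of raising StopIteration on an exhausted iterator
sentinel = next(it, None)
print(sentinel)
```

The second form is convenient when a terminating value such as None cannot be confused with real data.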

## 4.2. Delegating Iteration

### Problem

You have built a custom container object that internally holds a list, tuple, or some other iterable. You would like to make iteration work with your new container.

### Solution

Typically, all you need to do is define an \__iter\__() method that delegates iteration to the internally held container.

In [7]:
class Node:
    def __init__(self, value):
        self._value = value
        self._children = []
    def __repr__(self):
        return 'Node({!r})'.format(self._value)
    def add_child(self, node):
        self._children.append(node)
    def __iter__(self):
        return iter(self._children)
# Example
if __name__ == '__main__':
    root = Node(0)
    child1 = Node(1)
    child2 = Node(2)
    root.add_child(child1)
    root.add_child(child2)
    for ch in root:
        print(ch)

Node(1)
Node(2)


## 4.3. Creating New Iteration Patterns with Generators

### Problem

You want to implement a custom iteration pattern that’s different than the usual built-in functions (e.g., *range()* , *reversed()* , etc.).

### Solution

Here’s a generator that produces a range of floating-point numbers:

In [8]:
def frange(start, stop, increment):
    x = start
    while x < stop:
        yield x
        x += increment

To use such a function, you iterate over it using a *for* loop or use it with some other function that consumes an iterable (e.g., *sum()* , *list()* , etc.). 

In [9]:
for n in frange(0, 4, 1):
    print(n)

0
1
2
3


In [10]:
list(frange(0,1,0.125))

[0, 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875]

### Discussion

The mere presence of the yield statement in a function turns it into a generator. Unlike a normal function, a generator only runs in response to iteration. Here’s an experiment you can try to see the underlying mechanics of how such a function works:

In [11]:
def countdown(n):
    print('Starting to count from', n)
    while n>0:
        yield n
        n -= 1
    print('Done!')    

In [12]:
# Create the generator; notice no output appears
c = countdown(3)
c

<generator object countdown at 0x033ABCF0>

In [13]:
# Run to first yield and emit a value
next(c)

Starting to count from 3


3

In [14]:
next(c)

2

In [15]:
next(c)

1

The key feature is that a generator function only runs in response to “next” operations carried out in iteration. Once a generator function returns, iteration stops. However, the for statement that’s usually used to iterate takes care of these details, so you don’t normally need to worry about them.
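To see what "once a generator function returns, iteration stops" means in practice, the sketch below repeats the *countdown()* experiment and makes one extra *next()* call after the last value. The variable names `values` and `finished` are only there for illustration.

```python
def countdown(n):
    print('Starting to count from', n)
    while n > 0:
        yield n
        n -= 1
    print('Done!')

c = countdown(3)
values = [next(c), next(c), next(c)]  # Consumes 3, 2, 1

finished = False
try:
    # The loop condition is now false, so the function runs to its end
    # (printing 'Done!') and returns, which raises StopIteration here
    next(c)
except StopIteration:
    finished = True

print(values, finished)
```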

## 4.4. Implementing the Iterator Protocol

### Problem

You are building custom objects on which you would like to support iteration, but would like an easy way to implement the iterator protocol.

### Solution

By far, the easiest way to implement iteration on an object is to use a generator function. In Recipe 4.2, a Node class was presented for representing tree structures. Perhaps you want to implement an iterator that traverses nodes in a depth-first pattern. Here is how you could do it:

In [16]:
class Node:
    def __init__(self, value):
        self._value = value
        self._children = []
    def __repr__(self):
        return 'Node({!r})'.format(self._value)
    def add_child(self,node):
        self._children.append(node)
    def __iter__(self):
        return iter(self._children)
    def depth_first(self):
        yield self
        for c in self:
            yield from c.depth_first()

In [17]:
if __name__ == '__main__':
    root = Node(0)
    child1 = Node(1)
    child2 = Node(2)
    root.add_child(child1)
    root.add_child(child2)
    child1.add_child(Node(3))
    child1.add_child(Node(4))
    child2.add_child(Node(5))
        
    for ch in root.depth_first():
        print(ch)           

Node(0)
Node(1)
Node(3)
Node(4)
Node(2)
Node(5)


In this code, the *depth_first()* method is simple to read and describe. It first yields itself and then iterates over each child, yielding the items produced by the child’s *depth_first()* method (using *yield from*).

## 4.5. Iterating in Reverse

### Problem 

You want to iterate in reverse over a sequence.

### Solution

Use the built-in *reversed()* function. 

In [18]:
a = [1, 2, 3, 4]

In [19]:
for x in reversed(a):
    print(x)

4
3
2
1


Reversed iteration only works if the object in question has a size that can be determined or if the object implements a \__reversed\__() special method. If neither of these can be satisfied, you’ll have to convert the object into a list first. 
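As a small illustration of the fallback case, the sketch below uses a hypothetical generator function `numbers()`, which has neither a length nor a \__reversed\__() method, and converts it to a list before reversing. Keep in mind that building the list may use a lot of memory if the iterable is large.

```python
def numbers():
    # A generator has no known size and no __reversed__() method
    yield 1
    yield 2
    yield 3

reversible = True
try:
    reversed(numbers())
except TypeError:
    # reversed() rejects objects it cannot reverse
    reversible = False

# Convert to a list first, then reverse it
result = list(reversed(list(numbers())))
print(reversible, result)
```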

### Discussion

Many programmers don’t realize that reversed iteration can be customized on user-defined classes if they implement the \__reversed\__() method. 

In [20]:
class Countdown:
    def __init__(self, start):
        self.start = start
    # Forward iterator
    def __iter__(self):
        n = self.start
        while n > 0:
            yield n
            n -= 1
    # Reverse iterator
    def __reversed__(self):
        n = 1
        while n <= self.start:
            yield n
            n += 1
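The class above can be exercised as follows. This is just a usage sketch; the class is repeated so the block runs on its own, and `forward` and `backward` are illustrative names.

```python
class Countdown:
    def __init__(self, start):
        self.start = start
    # Forward iterator
    def __iter__(self):
        n = self.start
        while n > 0:
            yield n
            n -= 1
    # Reverse iterator
    def __reversed__(self):
        n = 1
        while n <= self.start:
            yield n
            n += 1

forward = list(Countdown(5))             # Uses __iter__()
backward = list(reversed(Countdown(5)))  # Uses __reversed__()
print(forward)
print(backward)
```

Because *reversed()* calls \__reversed\__() directly, no intermediate list of the data has to be built first.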

## 4.6. Defining Generator Functions with Extra State
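No solution is recorded for this recipe in these notes. One possible approach, sketched here purely as an assumption, is to write the generator as the \__iter\__() method of a class, so that any extra state lives in instance attributes. The class name `LineHistory`, its `histlen` parameter, and the sample data are all hypothetical.

```python
from collections import deque

class LineHistory:
    def __init__(self, lines, histlen=3):
        self.lines = lines
        # Extra state: the most recent (line number, line) pairs
        self.history = deque(maxlen=histlen)

    def __iter__(self):
        for lineno, line in enumerate(self.lines, 1):
            self.history.append((lineno, line))
            yield line

    def clear(self):
        self.history.clear()

lines = LineHistory(['alpha', 'beta', 'gamma', 'delta'], histlen=2)
seen = []
for line in lines:
    seen.append(line)
    if line == 'gamma':
        break

# The state is reachable from outside the generator
recent = list(lines.history)
print(seen, recent)
```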

## 4.7. Taking a Slice of an Iterator

### Problem

You want to take a slice of data produced by an iterator, but the normal slicing operator doesn’t work.

### Solution

The *itertools.islice()* function is perfectly suited for taking slices of iterators and generators.

In [21]:
def count(n):
    while True:
        yield n
        n += 1

In [22]:
c = count(0)

In [23]:
import itertools

In [24]:
for x in itertools.islice(c, 10, 20):
    print(x)

10
11
12
13
14
15
16
17
18
19
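Two details are worth checking for yourself: ordinary slicing fails on a generator, and *islice()* consumes the items it skips over, so the underlying iterator cannot be rewound. The sketch below demonstrates both; the names `sliceable`, `first`, and `after` are illustrative.

```python
import itertools

def count(n):
    while True:
        yield n
        n += 1

c = count(0)

sliceable = True
try:
    c[10:20]
except TypeError:
    # Generators do not support indexing or slicing
    sliceable = False

# islice() discards items 0-9, then yields items 10-19
first = list(itertools.islice(c, 10, 20))

# The generator has advanced past everything islice() touched
after = next(c)
print(sliceable, first, after)
```

If you need to go back over the same data, convert it to a list first.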


## 4.8. Skipping the First Part of an Iterable

### Problem 

You want to iterate over items in an iterable, but the first few items aren’t of interest and you just want to discard them.

### Solution

The itertools module has a few functions that can be used to address this task. The first is the *itertools.dropwhile()* function. To use it, you supply a function and an iterable. The returned iterator discards the first items in the sequence as long as the supplied function returns True. Afterward, the rest of the sequence is produced without further filtering.
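As a minimal sketch of *dropwhile()*, the example below skips leading comment lines in a list of strings. The sample data is made up for illustration and stands in for lines read from a file.

```python
from itertools import dropwhile

lines = [
    '# comment one',
    '# comment two',
    'root:x:0:0',
    '# a later comment',
    'nobody:x:99:99',
]

# Drop items only while the predicate is true at the start
body = list(dropwhile(lambda line: line.startswith('#'), lines))
print(body)
```

Note that the later comment line is kept: once the predicate returns False for the first time, every remaining item passes through.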

If you happen to know the exact number of items you want to skip, then you can use *itertools.islice()* instead. 

In [25]:
from itertools import islice

In [26]:
items = ['a', 'b', 'c', 1, 4, 10, 15]

In [27]:
for x in islice(items, 3, None):
    print(x)

1
4
10
15


In this example, the last None argument to *islice()* is required to indicate that you want everything beyond the first three items as opposed to only the first three items.

## 4.9. Iterating Over All Possible Combinations or Permutations

### Problem

You want to iterate over all of the possible combinations or permutations of a collection of items.

### Solution

The itertools module provides three functions for this task. The first of these— *itertools.permutations()* —takes a collection of items and produces a sequence of tuples that rearranges all of the items into all possible permutations (i.e., it shuffles them into all possible configurations). 

In [28]:
items = ['a', 'b', 'c']

In [29]:
from itertools import permutations

In [30]:
for p in permutations(items):
    print(p)

('a', 'b', 'c')
('a', 'c', 'b')
('b', 'a', 'c')
('b', 'c', 'a')
('c', 'a', 'b')
('c', 'b', 'a')


If you want all permutations of a smaller length, you can give an optional length argument.

In [31]:
for p in permutations(items, 2):
    print(p)

('a', 'b')
('a', 'c')
('b', 'a')
('b', 'c')
('c', 'a')
('c', 'b')


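Use *itertools.combinations()* to produce a sequence of combinations of items taken from the input. For *combinations()*, the order of the elements is not considered: ('a', 'b') is treated as the same as ('b', 'a'), so only one of them is produced.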

In [32]:
from itertools import combinations

In [35]:
for c in combinations(items,3):
    print(c)

('a', 'b', 'c')


In [37]:
for c in combinations(items,2):
    print(c)

('a', 'b')
('a', 'c')
('b', 'c')


In [38]:
for c in combinations(items,1):
    print(c)

('a',)
('b',)
('c',)


When producing combinations, chosen items are removed from the collection of possible candidates (i.e., if 'a' has already been chosen, then it is removed from consideration). The *itertools.combinations_with_replacement()* function relaxes this, and allows the same item to be chosen more than once. 

In [42]:
for c in itertools.combinations_with_replacement(items, 3):
    print(c)

('a', 'a', 'a')
('a', 'a', 'b')
('a', 'a', 'c')
('a', 'b', 'b')
('a', 'b', 'c')
('a', 'c', 'c')
('b', 'b', 'b')
('b', 'b', 'c')
('b', 'c', 'c')
('c', 'c', 'c')


## 4.10. Iterating Over the Index-Value Pairs of a Sequence

### Problem

You want to iterate over a sequence, but would like to keep track of which element of the sequence is currently being processed.

### Solution

The built-in *enumerate()* function handles this quite nicely:

In [43]:
my_list = ['a', 'b', 'c']

In [44]:
for idx, val in enumerate(my_list):
    print(idx, val)

0 a
1 b
2 c


For printing output with canonical line numbers (where you typically start the numbering at 1 instead of 0), you can pass in a start argument:

In [45]:
for idx,val in enumerate(my_list, 1):
    print(idx,val)

1 a
2 b
3 c


*enumerate()* can be handy for keeping track of the offset into a list for occurrences of certain values. For example, if you want to map words in a file to the lines in which they occur, you can use *enumerate()* to map each word to the line offset in the file where it was found.
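A minimal sketch of that word-to-line mapping might look like the following. It assumes an in-memory list of strings in place of a real file, and the names `lines` and `word_summary` are illustrative.

```python
from collections import defaultdict

# Stand-in for the lines of a file
lines = [
    'the quick brown fox',
    'jumps over the lazy dog',
]

word_summary = defaultdict(list)
for idx, line in enumerate(lines):
    # Normalize each word, then record the line offset it appeared on
    words = [w.strip().lower() for w in line.split()]
    for word in words:
        word_summary[word].append(idx)

print(dict(word_summary))
```

A word that occurs twice on the same line is listed twice with the same offset, which gives a simple per-line occurrence count.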

## 4.11. Iterating Over Multiple Sequences Simultaneously

### Problem

You want to iterate over the items contained in more than one sequence at a time.

### Solution

To iterate over more than one sequence simultaneously, use the *zip()* function.

In [46]:
xpts = [1, 5, 4, 2, 10, 7]

In [47]:
ypts = [101, 78, 37, 15, 62, 99]

In [48]:
for x, y in zip(xpts, ypts):
    print(x, y)

1 101
5 78
4 37
2 15
10 62
7 99


zip(a, b) works by creating an iterator that produces tuples (x, y) where x is taken from a and y is taken from b. Iteration stops whenever one of the input sequences is exhausted. Thus, the length of the iteration is the same as the length of the shortest input.

In [49]:
a = [1, 2, 3]

In [50]:
b = ['w', 'x', 'y', 'z']

In [52]:
for i in zip(a,b):
    print(i)

(1, 'w')
(2, 'x')
(3, 'y')


If this behavior is not desired, use *itertools.zip_longest()* instead. 

In [55]:
from itertools import zip_longest
for i in zip_longest(a,b):
    print(i)

(1, 'w')
(2, 'x')
(3, 'y')
(None, 'z')
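By default the missing positions are filled with None. If another placeholder suits the data better, *zip_longest()* also accepts a *fillvalue* keyword argument, as in this short sketch (the choice of 0 as the filler is arbitrary):

```python
from itertools import zip_longest

a = [1, 2, 3]
b = ['w', 'x', 'y', 'z']

# Pad the shorter input with 0 instead of None
padded = list(zip_longest(a, b, fillvalue=0))
print(padded)
```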


## 4.12. Iterating on Items in Separate Containers

### Problem

You need to perform the same operation on many objects, but the objects are contained in different containers, and you’d like to avoid nested loops without losing the readability of your code.

### Solution

The *itertools.chain()* function can be used to simplify this task. It takes any number of iterables as arguments and returns an iterator that effectively masks the fact that you’re really acting on multiple containers.

In [56]:
from itertools import chain

In [57]:
a = [1, 2, 3, 4]

In [58]:
b = ['x', 'y', 'z']

In [59]:
for x in chain(a, b):
    print(x)

1
2
3
4
x
y
z


A common use of *chain()* is in programs where you would like to perform certain operations on all of the items at once but the items are pooled into different working sets. 
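As a rough sketch of that situation, suppose the items are kept in two sets. The names `active_items` and `inactive_items` and the uppercase "processing" step are hypothetical, chosen only to keep the example short.

```python
from itertools import chain

# Two separate working sets
active_items = {'a', 'b'}
inactive_items = {'c'}

processed = []
# One loop covers both containers, without nesting or copying
for item in chain(active_items, inactive_items):
    processed.append(item.upper())

print(sorted(processed))
```

Compared with concatenating the containers first, *chain()* does not build a new combined sequence, and it works even when the inputs are of different types (sets, for instance, cannot be joined with +).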

## 4.15. Iterating in Sorted Order Over Merged Sorted Iterables

### Problem

You have a collection of sorted sequences and you want to iterate over a sorted sequence of them all merged together.

### Solution

The *heapq.merge()* function does exactly what you want.

In [61]:
import heapq

In [62]:
a = [1, 4, 7, 10]

In [63]:
b = [2, 5, 6, 11]

In [64]:
for c in heapq.merge(a, b):
    print(c)

1
2
4
5
6
7
10
11


### Discussion

The iterative nature of *heapq.merge* means that it never reads any of the supplied sequences all at once. This means that you can use it on very long sequences with very little overhead. 
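One way to convince yourself of this laziness is to merge two infinite generators and take only the first few results. The generator functions `evens()` and `odds()` below are illustrative helpers, not part of the recipe.

```python
import heapq
from itertools import islice

def evens():
    n = 0
    while True:
        yield n
        n += 2

def odds():
    n = 1
    while True:
        yield n
        n += 2

# Neither input could ever be read in full, yet merging still works
merged = heapq.merge(evens(), odds())
first_ten = list(islice(merged, 10))
print(first_ten)
```

Keep in mind that *heapq.merge()* requires every input to be sorted already. It does not check this; it simply compares the items at the front of each input and emits the smallest one.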