### 4.1 Manually Consuming an Iterator

Read lines from a file manually, without a for loop:
```python
with open('/etc/passwd') as f:
    try:
        while True:
            line = next(f)
            print(line, end='')
    except StopIteration:
        pass
```

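In general, the StopIteration exception is how the end of iteration is signaled.  
When calling `next()` manually, you can instead have it return a terminating value such as None.  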
```python
with open('/etc/passwd') as f:
    while True:
        line = next(f, None)
        if line is None:
            break
        print(line, end='')
```

In [1]:
items = [1, 2, 3]
# Get the iterator
it = iter(items)       # Invokes items.__iter__()
# Run the iterator
next(it)                # Invokes it.__next__()

1
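
The manual steps above are roughly what a for loop does behind the scenes. The following is a minimal sketch of that idea; `manual_for` is a hypothetical helper name introduced only for illustration.

```python
def manual_for(iterable):
    """Collect items the way a for loop would: iter() once, then next() until StopIteration."""
    collected = []
    it = iter(iterable)            # invokes iterable.__iter__()
    while True:
        try:
            item = next(it)        # invokes it.__next__()
        except StopIteration:      # raised once the iterator is exhausted
            break
        collected.append(item)
    return collected

print(manual_for([1, 2, 3]))
```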

### 4.2 Delegating Iteration

In [2]:
class Node:
    def __init__(self, value):
        self._value = value
        self._children = []
        
    def __repr__(self):
        return 'Node({!r})'.format(self._value)
    
    def add_child(self, node):
        self._children.append(node)
        
    def __iter__(self):
        return iter(self._children)
    
root = Node(0)
child1 = Node(1)
child2 = Node(2)
root.add_child(child1)
root.add_child(child2)
for ch in root:
    print(ch)

Node(1)
Node(2)


### 4.3 Creating New Iteration Patterns with Generators

In [3]:
def frange(start, stop, increment):
    x = start
    while x < stop:
        yield x
        x += increment
for n in frange(0, 4, 0.5):
    print(n, end='\t')

0	0.5	1.0	1.5	2.0	2.5	3.0	3.5	
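
A generator function only runs in response to `next()` calls, which is easy to miss when it is always driven by a for loop. The sketch below makes this visible; `countdown` and its `log` parameter are hypothetical names used only for this illustration.

```python
def countdown(n, log):
    log.append('start')        # runs on the first next() call, not at creation
    while n > 0:
        yield n                # execution pauses here until the next next() call
        n -= 1
    log.append('done')         # runs just before StopIteration is raised

log = []
gen = countdown(2, log)
created_log = list(log)        # still empty: no generator code has run yet

first = next(gen)              # runs up to the first yield
second = next(gen)

try:
    next(gen)
    finished = False
except StopIteration:
    finished = True

print(created_log, first, second, finished, log)
```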

### 4.4 Implementing the Iterator Protocol

Implement an iterator that traverses the nodes of a tree in depth-first order:

In [5]:
class Node:
    def __init__(self, value):
        self._value = value
        self._children = []
        
    def __repr__(self):
        return 'Node({!r})'.format(self._value)
    
    def add_child(self, node):
        self._children.append(node)
        
    def __iter__(self):
        return iter(self._children)
    
    def depth_first(self):
        yield self
        for c in self:
            yield from c.depth_first()
            
root = Node(0)
child1 = Node(1)
child2 = Node(2)
root.add_child(child1)
root.add_child(child2)
child1.add_child(Node(3))
child1.add_child(Node(4))
child2.add_child(Node(5))

for ch in root.depth_first():
    print(ch, end='\t')

Node(0)	Node(1)	Node(3)	Node(4)	Node(2)	Node(5)	

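Python's iterator protocol requires `__iter__()` to return a special iterator object  
that implements a `__next__()` method and raises StopIteration to signal that iteration is complete.  
Implementing such an object by hand is often tedious. The following alternative implementation of `depth_first()`  
uses an associated iterator class.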

In [15]:
class Node:
    def __init__(self, value):
        self._value = value
        self._children = []
        
    def __repr__(self):
        return 'Node({!r})'.format(self._value)
    
    def add_child(self, node):
        self._children.append(node)
        
    def __iter__(self):
        return iter(self._children)
    
    def depth_first(self):
        return DepthFirstIterator(self)
    
class DepthFirstIterator:

    def __init__(self, start_node):
        self._node = start_node
        self._children_iter = None
        self._child_iter = None

    def __iter__(self):
        return self

    def __next__(self):
        # Return the start node if just started; create an iterator for its children
        if self._children_iter is None:
            self._children_iter = iter(self._node)
            return self._node

        # If processing a child, return its next item
        elif self._child_iter:
            try:
                nextchild = next(self._child_iter)
                return nextchild
            except StopIteration:
                self._child_iter = None
                return next(self)
            
        # Advance to the next child and start its iteration
        else:
            self._child_iter = next(self._children_iter).depth_first()
            return next(self)
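
To see the protocol in isolation, here is a smaller sketch that implements the same sequence twice: once as a hand-written iterator class and once as a generator. `CountdownIterator` and `countdown` are hypothetical names chosen for this illustration, not part of the tree example.

```python
class CountdownIterator:
    """Hand-written iterator: keeps its own state and raises StopIteration itself."""

    def __init__(self, start):
        self._current = start

    def __iter__(self):
        return self                # an iterator returns itself from __iter__()

    def __next__(self):
        if self._current <= 0:
            raise StopIteration    # signals that iteration is complete
        value = self._current
        self._current -= 1
        return value


def countdown(start):
    """Generator version: Python builds the iterator object for us."""
    while start > 0:
        yield start
        start -= 1


print(list(CountdownIterator(3)))
print(list(countdown(3)))
```

Both versions produce the same values; the generator simply lets the interpreter handle the state tracking and the StopIteration signal.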
        


### 4.5 Iterating in Reverse

Define a `__reversed__()` method on a custom class to implement a reverse iterator.

In [16]:
class CountDown:
    def __init__(self, start):
        self.start = start
        
    # Forward iterator
    def __iter__(self):
        n = self.start
        while n > 0:
            yield n
            n -= 1
    
    # Reverse iterator:
    def __reversed__(self):
        n = 1
        while n <= self.start:
            yield n
            n += 1
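
A short usage sketch follows. It repeats the class so the block runs on its own, and contrasts it with a plain generator (`countdown_gen`, a hypothetical name), which `reversed()` rejects unless the values are first copied into a list.

```python
class CountDown:
    def __init__(self, start):
        self.start = start

    def __iter__(self):                # forward: start, start-1, ..., 1
        n = self.start
        while n > 0:
            yield n
            n -= 1

    def __reversed__(self):            # reverse: 1, 2, ..., start
        n = 1
        while n <= self.start:
            yield n
            n += 1


def countdown_gen(start):
    while start > 0:
        yield start
        start -= 1


forward = list(CountDown(3))
backward = list(reversed(CountDown(3)))    # uses __reversed__(), no intermediate list

try:
    reversed(countdown_gen(3))             # generators have no __reversed__()
    generator_reversible = True
except TypeError:
    generator_reversible = False

via_list = list(reversed(list(countdown_gen(3))))   # works, but copies all items first

print(forward, backward, generator_reversible, via_list)
```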

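### 4.6 Defining Generator Functions with Extra State

To define a generator that exposes extra state, implement it as a class and put the generator function's code in the `__iter__()` method.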

In [24]:
from collections import deque
class linehistory:
    def __init__(self, lines, histlen=3):
        self.lines = lines
        self.history = deque(maxlen=histlen)
        
    def __iter__(self):
        for lineno, line in enumerate(self.lines, 1):
            self.history.append((lineno, line))
            yield line
            
    def clear(self):
        self.history.clear()

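To use this class, treat it like an ordinary generator function.  
However, because it creates a class instance, its internal attributes remain accessible.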

In [25]:
with open('somefile.txt') as f:
    lines = linehistory(f)
    for line in lines:
        if 'explicitly' in line:
            for lineno, hline in lines.history:
                print('{}:{}'.format(lineno, hline), end='')

11:Although practicality beats purity.
12:Errors should never pass silently.
13:Unless explicitly silenced.


If such a generator class is driven by something other than a for loop, an extra call to `iter()` is needed first.

In [27]:
f = open('somefile.txt')
lines = linehistory(f)
it = iter(lines)
next(it)

'The Zen of Python, by Tim Peters\n'
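
The reason for the extra call can be sketched with an in-memory list standing in for the file (the sample lines are made up for this illustration). The class defines `__iter__()` but not `__next__()`, so calling `next()` on the instance itself fails.

```python
from collections import deque

class linehistory:
    def __init__(self, lines, histlen=3):
        self.lines = lines
        self.history = deque(maxlen=histlen)

    def __iter__(self):
        for lineno, line in enumerate(self.lines, 1):
            self.history.append((lineno, line))
            yield line

    def clear(self):
        self.history.clear()


lines = linehistory(['first\n', 'second\n', 'third\n'])

try:
    next(lines)                  # the instance is iterable, but not an iterator
    needs_iter = False
except TypeError:
    needs_iter = True

it = iter(lines)                 # calls __iter__(), which returns a generator
first = next(it)
history_after_first = list(lines.history)

print(needs_iter, repr(first), history_after_first)
```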

### 4.7 Taking a Slice of an Iterator

In [28]:
def count(n):
    while True:
        yield n
        n += 1
        
import itertools
c = count(0)
for x in itertools.islice(c, 10, 20):
    print(x, end='\t')

10	11	12	13	14	15	16	17	18	19	
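
Two details are worth a quick sketch: generators do not support normal slicing, and `islice()` consumes the underlying iterator, so the skipped items cannot be revisited.

```python
import itertools

def count(n):
    while True:
        yield n
        n += 1

c = count(0)

try:
    c[10:20]                       # generators have no __getitem__()
    sliceable = True
except TypeError:
    sliceable = False

window = list(itertools.islice(c, 10, 20))   # discards items 0-9, yields 10-19
after = next(c)                               # iteration resumes after the slice

print(sliceable, window, after)
```

If the skipped data is needed later, it has to be stored first, for example by converting it to a list.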

### 4.8 Skipping the First Part of an Iterable

In [3]:
def reverse_count(n):
    while n > 0:
        yield n
        n -= 1
        
from itertools import dropwhile
c = reverse_count(20)
for x in dropwhile(lambda x : x > 10, c):
    print(x, end='\t')

10	9	8	7	6	5	4	3	2	1	
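
As a further sketch, `dropwhile()` only discards items at the start: once the predicate fails, everything after is passed through, including later items that would match. When the number of items to skip is known in advance, `itertools.islice()` is an alternative. The sample data below is invented for this illustration.

```python
from itertools import dropwhile, islice

lines = ['# header', '# more header', 'data 1', '# inline comment', 'data 2']

# Only the leading comment lines are skipped; the later comment is kept.
body = list(dropwhile(lambda line: line.startswith('#'), lines))

items = ['a', 'b', 'c', 1, 4, 10, 15]

# Skip exactly the first three items; None means "through to the end".
tail = list(islice(items, 3, None))

print(body)
print(tail)
```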

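### 4.9 Iterating over All Possible Combinations or Permutations

`itertools.permutations()` takes a collection of items and produces every possible rearrangement of them as a sequence of tuples.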

In [4]:
from itertools import permutations
items = ['a', 'b', 'c']
for p in permutations(items):
    print(p, end='\t')

('a', 'b', 'c')	('a', 'c', 'b')	('b', 'a', 'c')	('b', 'c', 'a')	('c', 'a', 'b')	('c', 'b', 'a')	

In [5]:
from itertools import permutations
items = ['a', 'b', 'c']
for p in permutations(items, 2):
    print(p, end='\t')

('a', 'b')	('a', 'c')	('b', 'a')	('b', 'c')	('c', 'a')	('c', 'b')	

`itertools.combinations()` produces all combinations of items taken from the input sequence.

In [6]:
from itertools import combinations
for c in combinations(items, 2):
    print(c)

('a', 'b')
('a', 'c')
('b', 'c')
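
With `combinations()`, order does not matter and each item is used at most once per tuple. A related function, `itertools.combinations_with_replacement()`, allows the same item to be chosen more than once. A brief comparison:

```python
from itertools import combinations, combinations_with_replacement, permutations

items = ['a', 'b', 'c']

perms = list(permutations(items, 2))                     # order matters
combs = list(combinations(items, 2))                     # order ignored, no repeats
combs_rep = list(combinations_with_replacement(items, 2))  # order ignored, repeats allowed

print(len(perms), len(combs), len(combs_rep))
print(combs_rep)
```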


### 4.10 Iterating over the Index-Value Pairs of a Sequence

In [8]:
my_list = ['a', 'b', 'c']
for idx, val in enumerate(my_list):
    print(idx, val)

0 a
1 b
2 c


In [9]:
my_list = ['a', 'b', 'c']
for idx, val in enumerate(my_list, 1):
    print(idx, val)

1 a
2 b
3 c
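
One detail that is easy to get wrong is applying `enumerate()` to a sequence of tuples that are themselves unpacked. The inner tuple needs its own parentheses, as in this sketch with made-up data:

```python
data = [(1, 2), (3, 4), (5, 6)]

rows = []
for n, (x, y) in enumerate(data, 1):   # parentheses unpack the inner tuple
    rows.append((n, x, y))

print(rows)

# Writing "for n, x, y in enumerate(data, 1)" fails, because enumerate()
# yields two-item tuples of the form (index, (x, y)).
try:
    for n, x, y in enumerate(data, 1):
        pass
    flat_unpack_ok = True
except ValueError:
    flat_unpack_ok = False
```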


### 4.11 Iterating over Multiple Sequences Simultaneously

In [10]:
x = [1, 2, 3, 4]
y = [100, 200, 300, 400]
for a, b in zip(x, y):
    print(a, b)

1 100
2 200
3 300
4 400
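
`zip()` stops as soon as the shortest input is exhausted. When that is not the desired behavior, `itertools.zip_longest()` pads the shorter inputs instead. A short sketch with made-up data:

```python
from itertools import zip_longest

a = [1, 2, 3]
b = ['w', 'x', 'y', 'z']

short = list(zip(a, b))                         # 'z' is silently dropped
full = list(zip_longest(a, b))                  # missing values become None
padded = list(zip_longest(a, b, fillvalue=0))   # or a fill value of your choice

# zip() is also a convenient way to pair up keys and values.
headers = ['name', 'shares', 'price']
values = ['ACME', 100, 490.1]
record = dict(zip(headers, values))

print(short, full, padded, record, sep='\n')
```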


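### 4.12 Iterating over Items in Separate Containers

**Problem:**  
We need to perform the same operation on many objects, but the objects live in different containers.  
We want to avoid nested loops and keep the code readable.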

In [1]:
from itertools import chain
a = [1, 2, 3, 4]
b = ['x', 'y', 'z']
for x in chain(a, b):
    print(x, end= ' ')

1 2 3 4 x y z 
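
Compared with concatenating the containers first, `chain()` does not build a new combined sequence, and it works even when the containers have different types. A minimal sketch:

```python
from itertools import chain

a = [1, 2, 3, 4]          # a list
b = ('x', 'y', 'z')       # a tuple

try:
    a + b                  # + needs both operands to be the same sequence type
    concat_ok = True
except TypeError:
    concat_ok = False

combined = list(chain(a, b))   # chain() only requires that each input is iterable

print(concat_ok, combined)
```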

### 4.13 Creating Data-Processing Pipelines

Suppose we have a huge directory of log files that we want to process:  
* foo/  
    + access-log-012007.gz
    + access-log-022007.gz  
    + ...
    + access-log-032007.gz
* bar/
    + access-log-092007.bz2
    + ...
    + access-log-022008


Suppose each file contains lines of data in the following form:  
* 124.115.6.12 -- [10/Jul/2019:00:18:50 -0500] "GET /robots.txt ..." 200 71
* 210.212.209.67 -- [10/Jul/2019:00:18:51 -0500] "GET /ply/ ..." 200 11875
* ...

To process these files, define a collection of small generator functions, each performing one specific, self-contained task.

```python
import os
import fnmatch
import gzip
import bz2
import re

def gen_find(filepat, top):
    """
    Find all filenames in a directory tree that match a shell wildcard pattern
    """
    for path, dirlist, filelist in os.walk(top):
        for name in fnmatch.filter(filelist, filepat):
            yield os.path.join(path, name)
            
def gen_opener(filenames):
    """
    Open a sequence of filenames one at a time producing a file object.
    The file is closed immediately when proceeding to the next iteration.
    """
    for filename in filenames:
        if filename.endswith('.gz'):
            f = gzip.open(filename, 'rt')
        elif filename.endswith('.bz2'):
            f = bz2.open(filename, 'rt')
        else:
            f = open(filename, 'rt')
        yield f
        f.close()

def gen_concatenate(iterators):
    """
    Chain a sequence of iterators together into a single sequence
    """
    for it in iterators:
        yield from it
        
def gen_grep(pattern, lines):
    """
    Look for a regex pattern in a sequence of lines
    """
    pat = re.compile(pattern)
    for line in lines:
        if pat.search(line):
            yield line
```

These functions can now be stacked together to form a data-processing pipeline.  
For example, to find all log lines that contain the keyword python:
```python
lognames = gen_find('access-log*', 'www')
files = gen_opener(lognames)
lines = gen_concatenate(files)
pylines = gen_grep('(?i)python', lines)
for line in pylines:
    pass
```

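If you later want to extend the pipeline, you can even feed the data into generator expressions.  
The following finds the number of bytes transferred and sums the total: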
```python
lognames = gen_find('access-log*', 'www')
files = gen_opener(lognames)
lines = gen_concatenate(files)
pylines = gen_grep('(?i)python', lines)
bytecolumn = (line.rsplit(None, 1)[1] for line in pylines)
byte_counts = (int(x) for x in bytecolumn if x != '-')   # avoid shadowing the built-in bytes
print('Total', sum(byte_counts))
```
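
The last stages of the pipeline can be tried without any log files on disk. The sketch below feeds a few invented log lines (not real data) through `gen_grep` and the two generator expressions; the list `consumed` is a hypothetical helper used only to show that nothing is read until `sum()` pulls data through the pipeline.

```python
import re

def gen_grep(pattern, lines):
    """Yield only the lines that match the regex pattern."""
    pat = re.compile(pattern)
    for line in lines:
        if pat.search(line):
            yield line

# Invented sample data in the same general shape as the log lines above.
sample_lines = [
    '124.115.6.12 -- [10/Jul/2019:00:18:50 -0500] "GET /robots.txt HTTP/1.1" 200 71\n',
    '210.212.209.67 -- [10/Jul/2019:00:18:51 -0500] "GET /python/ HTTP/1.1" 200 11875\n',
    '10.0.0.1 -- [10/Jul/2019:00:18:52 -0500] "GET /Python/faq HTTP/1.1" 304 -\n',
]

consumed = []

def tracked(lines):
    """Pass lines through unchanged while recording each one that is read."""
    for line in lines:
        consumed.append(line)
        yield line

pylines = gen_grep('(?i)python', tracked(sample_lines))
bytecolumn = (line.rsplit(None, 1)[1] for line in pylines)   # last whitespace-separated field
byte_counts = (int(x) for x in bytecolumn if x != '-')       # '-' means no byte count

consumed_before = len(consumed)   # the pipeline is set up, but no line has been read
total = sum(byte_counts)          # sum() drives the whole pipeline

print('Read before sum():', consumed_before)
print('Total', total)
```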

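### 4.14 Flattening a Nested Sequence

Suppose you have a nested sequence and want to flatten it into a single list of values.  
A recursive generator function with a `yield from` statement solves this easily.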

In [11]:
from collections.abc import Iterable

def flatten(items, ignore_types=(str, bytes)):
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, ignore_types):
            yield from flatten(x, ignore_types)
        else:
            yield x
            
items = [1, 2, [3, 4, [5, 6], 7], 8]
for x in flatten(items):
    print(x, end=' ')

1 2 3 4 5 6 7 8 

`yield from` is a handy shortcut when writing generators that call other generators as subroutines.  
Without it, you need to write an extra for loop, like this:
```python
from collections.abc import Iterable

def flatten(items, ignore_types=(str, bytes)):
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, ignore_types):
            for i in flatten(x, ignore_types):
                yield i
        else:
            yield x
```            
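
The `ignore_types` argument exists because strings and bytes are themselves iterable. Without that check, a string would be expanded into single characters, and each one-character string would be expanded again without end. A small sketch with made-up names shows the intended behavior:

```python
from collections.abc import Iterable

def flatten(items, ignore_types=(str, bytes)):
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, ignore_types):
            yield from flatten(x, ignore_types)
        else:
            yield x

names = ['Dave', 'Paula', ['Thomas', 'Lewis']]
flat_names = list(flatten(names))       # strings are kept whole

mixed = [1, (2, 3), [4, [5, b'raw']]]
flat_mixed = list(flatten(mixed))       # tuples and lists are expanded, bytes are not

print(flat_names)
print(flat_mixed)
```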

### 4.15 Iterating in Sorted Order over Merged Sorted Iterables

In [12]:
import heapq
a = [1, 4, 7, 10]
b = [2, 5, 6, 11]
for c in heapq.merge(a, b):
    print(c, end= ' ')

1 2 4 5 6 7 10 11 

Merging two sorted files:
```python
import heapq

with open('sorted_file_1', 'rt') as file1, \
     open('sorted_file_2', 'rt') as file2, \
     open('merged_file_3', 'wt') as outf:   # output file must be opened for writing
    for line in heapq.merge(file1, file2):
        outf.write(line)
```            
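
The file-merging idea can be tried without touching the disk by using `io.StringIO` objects as stand-ins for the files (the contents are invented). The sketch also shows that `heapq.merge()` requires inputs that are already sorted: it only compares the items at the front of each input and does not check or fix their order.

```python
import heapq
import io

# In-memory stand-ins for two sorted text files and one output file.
file1 = io.StringIO('apple\ncherry\n')
file2 = io.StringIO('banana\ndate\n')
outf = io.StringIO()

for line in heapq.merge(file1, file2):   # reads lazily, one line at a time
    outf.write(line)

merged = outf.getvalue()

# With an unsorted input, the result is not sorted either.
unsorted_result = list(heapq.merge([3, 1], [2]))

print(repr(merged))
print(unsorted_result)
```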

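### 4.16 Replacing Infinite while Loops with an Iterator

Our code uses a while loop to process data iteratively because it involves calling a function or some unusual test condition  
that does not fit a common iteration pattern.  
Code like this is common in programs that do I/O: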
```python
CHUNKSIZE = 8192

def reader(s):
    while True:
        data = s.recv(CHUNKSIZE)
        if data == b'':
            break
        process_data(data)
```        

Such code can often be replaced with `iter()`, as follows:
```python
def reader(s):
    for data in iter(lambda: s.recv(CHUNKSIZE), b''):
        process_data(data)
```        
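
This relies on the two-argument form of `iter()`: given a zero-argument callable and a sentinel, it calls the callable repeatedly and stops when the return value equals the sentinel. The sketch below uses an `io.BytesIO` object in place of a socket, with a small chunk size chosen only to keep the output short.

```python
import io
from functools import partial

CHUNKSIZE = 4                              # deliberately tiny for the demonstration

stream = io.BytesIO(b'abcdefghij')         # stand-in for a socket or file

# partial(stream.read, CHUNKSIZE) is a zero-argument callable, like the lambda above;
# read() returns b'' at end of data, which matches the sentinel and ends the loop.
chunks = []
for data in iter(partial(stream.read, CHUNKSIZE), b''):
    chunks.append(data)

print(chunks)
```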