## Comprehensions and Generators

### Avoid More Than Two Control Subexpressions in Comprehensions

List comprehensions are a feature we use all the time. They support multiple levels of nested loops as well as multiple conditions per loop level, as the example below shows.

Once a comprehension has more than two loop levels, or more than two conditions, replace it with ordinary `for` loops so the code stays readable.

In [1]:
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat = [x for row in matrix for x in row if x > 2 and x < 5]  # two loop levels
print(flat)

[3, 4]
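
For contrast, here is a minimal sketch of a comprehension that goes past that limit. The three-level list `cube` and the filter values are invented for illustration; the only point is that the loop version puts each level on its own line.

```python
# Hypothetical data: a 2 x 2 x 3 nested list, made up for this illustration.
cube = [[[1, 2, 3], [4, 5, 6]],
        [[7, 8, 9], [10, 11, 12]]]

# Three loops and two conditions in one comprehension: legal, but hard to scan.
dense = [x for plane in cube for row in plane for x in row
         if x % 2 == 0 if x > 4]

# The same logic written as ordinary loops.
clear = []
for plane in cube:
    for row in plane:
        for x in row:
            if x % 2 == 0 and x > 4:
                clear.append(x)

print(dense)           # [6, 8, 10, 12]
print(dense == clear)  # True
```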


### Avoid Repeated Work in Comprehensions by Using Assignment Expressions

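The condition in a comprehension often repeats code from the value expression. As mentioned earlier, the walrus operator `:=`, introduced in Python 3.8, removes that duplication.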

In [4]:
def get_batch(count, size):
    return count // size

somelist = [125, 35, 8, 24]

# repeated work: get_batch is called twice for every element that passes
result = [get_batch(i, 8) for i in somelist if get_batch(i, 8)]
print(result)

# solution: bind the result once in the condition, reuse it as the value
result = [batch for i in somelist if (batch := get_batch(i, 8))]
print(result)

[15, 4, 1, 3]
[15, 4, 1, 3]
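
One detail worth checking is where the `:=` goes. The sketch below reuses `get_batch` and `somelist` from the cell above. The names `walrus_in_value_only`, `batch_late`, and `pairs` are made up for this illustration. Because the condition is evaluated before the value expression, a name bound only in the value part is not yet defined when the condition reads it.

```python
def get_batch(count, size):
    return count // size

somelist = [125, 35, 8, 24]

def walrus_in_value_only():
    # The `if batch_late` condition runs before the value expression
    # has assigned batch_late, so the very first lookup fails.
    return [(batch_late := get_batch(i, 8)) for i in somelist if batch_late]

try:
    walrus_in_value_only()
except NameError as exc:
    # Depending on the Python version this may be a NameError subclass.
    print("failed:", type(exc).__name__)

# Binding in the condition works, and the name can be reused in the value.
pairs = [(i, batch) for i in somelist if (batch := get_batch(i, 8))]
print(pairs)  # [(125, 15), (35, 4), (8, 1), (24, 3)]
```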


### Consider Generators Instead of Returning Lists

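Returning a list directly is the approach we all know. Its main problem is that the entire result has to sit in memory at once, so a large enough input can exhaust memory.

In that situation a generator can take the place of the list and avoid the memory problem.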

A generator function, written with `yield`, "does not actually run but instead immediately returns an iterator" when it is called.

On each call to `next(iterator)`, "the iterator advances the generator to its **next `yield` expression**": execution runs up to the next `yield`, hands back one value, and pauses there.
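
A small sketch makes the pause-and-resume behavior visible. The generator `countdown` and the `events` log are hypothetical names introduced only for this illustration.

```python
events = []

def countdown(n):
    # Hypothetical generator; it logs when its body starts and finishes.
    events.append("body started")
    while n > 0:
        yield n
        n -= 1
    events.append("body finished")

it = countdown(2)        # calling the function runs none of the body
print(events)            # []

print(next(it))          # 2  (runs up to the first yield)
print(events)            # ['body started']

print(next(it))          # 1
print(next(it, "done"))  # done  (body ends; the default replaces StopIteration)
print(events)            # ['body started', 'body finished']
```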

In [11]:
def index_words_iter(text):
    if text:
        yield 0
    for index, letter in enumerate(text):
        if letter == " ":
            yield index + 1

address = 'Four score and seven years ago...'

# next() fetches the next value
it = index_words_iter(address)
print(next(it))
print(next(it))

# an iterator can be converted to a list
it = index_words_iter(address)
print(list(it))

# itertools.islice takes a slice of an iterator
import itertools
it = index_words_iter(address)
print(list(itertools.islice(it, 0, 5)))

0
5
[0, 5, 11, 15, 21, 27]
[0, 5, 11, 15, 21]


### Be Defensive When Iterating Over Arguments

This item picks up where the previous one left off.

The concern here is mainly the case where the argument being iterated over is an iterator rather than a container.

By default an iterator can be traversed only once. After it is exhausted it produces nothing, and iterating over it again raises no error.
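
The silent failure is easy to reproduce. In this sketch `read_numbers` is a hypothetical stand-in for a generator that streams values from some source.

```python
def read_numbers():
    # Hypothetical stand-in for a generator that streams values from a file.
    yield from [15, 35, 80]

it = read_numbers()
total = sum(it)      # first pass consumes every value
leftover = list(it)  # second pass: already exhausted, silently empty

print(total)     # 130
print(leftover)  # []
```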

If a function iterates over its argument more than once, it should check up front whether the argument is an iterator and, if it is, raise an exception.

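Alternatively, when multiple passes over streamed data really are needed, define a class whose `__iter__` returns a brand-new generator every time it is called. Such an object supports any number of passes; the `ReadVisits` class below is an example.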

- iterator: implements `__iter__` and `__next__`
- iterable: implements `__iter__` (or, as a legacy fallback, `__getitem__`), e.g. `list`
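
The distinction can be checked at runtime. This is a minimal sketch using the abstract base classes in `collections.abc`; the variable names are arbitrary.

```python
from collections.abc import Iterable, Iterator

values = [1, 2, 3]
it = iter(values)

# A list is iterable but is not itself an iterator.
print(isinstance(values, Iterable), isinstance(values, Iterator))  # True False

# The object returned by iter() is both.
print(isinstance(it, Iterable), isinstance(it, Iterator))          # True True

# An iterator returns itself from iter(); a container returns a new object.
print(iter(it) is it)          # True
print(iter(values) is values)  # False
```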

In [None]:
class MyNumbers:
    '''
    An ordinary iterator implementation:
    __iter__ returns self,
    __next__ raises StopIteration once the values run out.
    '''
    def __iter__(self):
        self.a = 1
        return self
    def __next__(self):
        if self.a <= 20:
            x = self.a
            self.a += 1
            return x
        else:
            raise StopIteration

class ReadVisits:
    def __init__(self, data_path):
        self.data_path = data_path
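    # __iter__ is a generator function here, so the generator it returns
    # already provides __next__. Only __iter__ has to be written by hand,
    # and every call to it opens the file again and starts a fresh pass.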
    def __iter__(self):
        print('call __iter__')
        with open(self.data_path) as f:
            for line in f:
                yield int(line)

def normalize(numbers):
    total = sum(numbers)  # first pass: calls __iter__
    result = []
    for value in numbers:  # second pass: calls __iter__ again
        percent = 100 * value / total
        result.append(percent)
    return result
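
The up-front check described earlier is not part of `normalize` above. One way it might look is sketched below. The function name `normalize_defensive`, the error message, and the temporary file with three sample numbers are all assumptions made for this illustration. `ReadVisits` is repeated without the `print` so the block runs on its own.

```python
import os
import tempfile
from collections.abc import Iterator

class ReadVisits:
    def __init__(self, data_path):
        self.data_path = data_path

    def __iter__(self):
        # A new generator (and a new file handle) for every pass.
        with open(self.data_path) as f:
            for line in f:
                yield int(line)

def normalize_defensive(numbers):
    # Reject one-shot iterators before doing any work.
    if isinstance(numbers, Iterator):
        raise TypeError("Must supply a container, not an iterator")
    total = sum(numbers)                                # first pass
    return [100 * value / total for value in numbers]   # second pass

# Hypothetical sample data written to a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("15\n35\n80\n")
    path = tmp.name

try:
    visits = ReadVisits(path)
    percentages = normalize_defensive(visits)  # accepted: a re-iterable container
    print(percentages)
    try:
        normalize_defensive(iter(visits))      # a generator is rejected
    except TypeError as exc:
        print("rejected:", exc)
finally:
    os.remove(path)
```

A plain `list` passes the same check, since a list is iterable but not an iterator.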