## Comprehensions and Generators

### Avoid More Than Two Control Subexpressions in Comprehensions

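List comprehensions are a feature we use all the time. They support multiple levels of nested loops and multiple conditions per loop level, as the example below shows.

Once a comprehension has more than two loop levels or more than two conditions, replace it with ordinary `for` loops to keep it readable.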

In [1]:
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat = [x for row in matrix for x in row if x > 2 and x < 5]  # two loop levels
print(flat)

[3, 4]
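
For contrast, here is a rough sketch of a comprehension that goes past that limit, next to the same logic written as plain loops. The nested `cube` data and the two filter conditions are invented for illustration only.

```python
# Hypothetical data: a list of matrices, i.e. three levels of nesting.
cube = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]

# Hard to read: three loops plus two conditions in a single expression.
dense = [x for matrix in cube for row in matrix for x in row
         if x % 2 == 0 if x > 2]

# The same logic as ordinary loops.
evens = []
for matrix in cube:
    for row in matrix:
        for x in row:
            if x % 2 == 0 and x > 2:
                evens.append(x)

print(dense)
print(evens)
```

Both versions produce the same list; the loop form is longer but each step can be read on its own line.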


### Avoid Repeated Work in Comprehensions by Using Assignment Expressions

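The condition in a list comprehension often duplicates work already done in the output expression. As mentioned earlier, the walrus operator `:=` introduced in Python 3.8 solves this.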

In [2]:
def get_batch(count, size):
    return count // size

somelist = [125, 35, 8, 24]

# repeat
result = [get_batch(i, 8) for i in somelist if get_batch(i, 8)]
print(result)

# solution
result = [batch for i in somelist if (batch := get_batch(i, 8))]
print(result)

[15, 4, 1, 3]
[15, 4, 1, 3]


### Consider Generators Instead of Returning Lists

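Returning a list directly is the familiar approach. Its main problem is that it can exhaust memory when the amount of data is large.

In that situation a generator can be used instead, which avoids the memory problem.

A generator function implemented with `yield`, when called, "does not actually run but instead immediately returns an iterator".

On each call to `next(iterator)`, "the iterator advances the generator to its **next `yield` expression**"; that is, execution runs up to the next `yield`, pauses there, and returns the yielded value.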

In [3]:
def index_words_iter(text):
    if text:
        yield 0
    for index, letter in enumerate(text):
        if letter == " ":
            yield index + 1

address = 'Four score and seven years ago...'

# next() fetches the next value
it = index_words_iter(address)
print(next(it))
print(next(it))

# an iterator can be converted to a list
it = index_words_iter(address)
print(list(it))

# islice can take a slice of an iterator
import itertools
it = index_words_iter(address)
print(list(itertools.islice(it, 0, 5)))

0
5
[0, 5, 11, 15, 21, 27]
[0, 5, 11, 15, 21]
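
To make the "does not actually run" point concrete, here is a small sketch that records when the generator body executes. The `countdown` function and the `log` list are hypothetical names used only for this trace.

```python
log = []  # records when the generator body actually executes

def countdown(n):
    log.append("body started")
    while n > 0:
        yield n
        n -= 1
    log.append("body finished")

it = countdown(2)
print(log)         # still empty: calling the function ran none of the body

first = next(it)   # runs the body up to the first yield
print(first, log)

rest = list(it)    # drains the generator, so the body runs to its end
print(rest, log)
```

Calling `countdown(2)` only creates the iterator; the first `next` is what starts the body.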


### Be Defensive When Iterating Over Arguments

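This continues from the previous item.

The concern here is a function whose argument is an iterator rather than a container.

An iterator can by default be traversed only once. After it is exhausted it yields nothing, and it does so without raising an error.

If a function iterates over its argument more than once, it should check at the start whether the argument is an iterator and, if so, raise an exception (for example `TypeError`).

On the other hand, when multiple passes over generated data are needed, define a class whose `__iter__` returns a brand-new generator each time it is called. That class then supports repeated iteration, as in the example below.

- iterator: implements `__iter__` and `__next__`
- iterable: implements `__iter__` (or the sequence protocol via `__getitem__`), e.g. `list`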

In [4]:
class MyNumbers:
    '''
    An ordinary iterator implementation:
    __iter__ returns self,
    __next__ raises StopIteration once iteration is finished.
    '''
    def __iter__(self):
        self.a = 1
        return self
    def __next__(self):
        if self.a <= 20:
            x = self.a
            self.a += 1
            return x
        else:
            raise StopIteration

class ReadVisits:
    def __init__(self, data_path):
        self.data_path = data_path
    # The generator produced by yield already provides __next__,
    # so only __iter__ has to be defined by hand here.
    def __iter__(self):
        print('call __iter__')
        with open(self.data_path) as f:
            for line in f:
                yield int(line)

def normalize(numbers):
    total = sum(numbers)  # first call to __iter__
    result = []
    for value in numbers:  # second call to __iter__
        percent = 100 * value / total
        result.append(percent)
    return result
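
The defensive check described above could be sketched roughly as follows. `normalize_defensive` and the in-memory `Visits` class are hypothetical names for this illustration (`Visits` stands in for `ReadVisits` so that no file is needed), and testing against `collections.abc.Iterator` is one common way to detect an iterator, not the only one.

```python
from collections.abc import Iterator

def normalize_defensive(numbers):
    # Iterators can only be consumed once, so reject them up front.
    if isinstance(numbers, Iterator):
        raise TypeError("Must supply a container, not an iterator")
    total = sum(numbers)  # first pass
    return [100 * value / total for value in numbers]  # second pass

class Visits:
    # Hypothetical stand-in for ReadVisits that reads from memory.
    def __init__(self, values):
        self.values = values

    def __iter__(self):
        # A fresh generator on every call, so repeated passes work.
        for value in self.values:
            yield value

visits = [15, 35, 80]
percentages = normalize_defensive(visits)
container_result = normalize_defensive(Visits(visits))

rejected = False
try:
    normalize_defensive(iter(visits))  # a bare iterator is refused
except TypeError:
    rejected = True
print(percentages, rejected)
```

A list and the `Visits` container both pass the check because neither defines `__next__`; a bare iterator or generator object is rejected instead of silently producing an empty result.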

### Consider Generator Expressions for Large List Comprehensions

On top of basic generators and list comprehensions, Python also supports generator expressions.

A generator expression is created by "putting list-comprehension-like syntax between `()` characters".

In [5]:
somelist = list(range(1, 5))
it = (x for x in somelist) # generator expression
print(it)

roots = ((x, x**0.5) for x in it)  # generator expressions can be composed further
print(next(roots))

<generator object <genexpr> at 0x7ff9f47c7120>
(1, 1.0)
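
As a rough illustration of why this matters for large inputs, the sketch below compares the container size of a list comprehension with that of the equivalent generator expression. The size `n` is arbitrary, and `sys.getsizeof` reports only the size of the list or generator object itself, not of the integers it refers to, so treat the numbers as indicative.

```python
import sys

n = 100_000
squares_list = [x * x for x in range(n)]  # builds every value up front
squares_gen = (x * x for x in range(n))   # produces values on demand

list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)
print(list_size, gen_size)

total = sum(squares_gen)      # consumes the generator
leftover = list(squares_gen)  # nothing is left after one pass
print(total, leftover)
```

The generator object stays small regardless of `n`, but it can be consumed only once.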


### Compose Multiple Generators with `yield from`


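`yield from` handles the situation shown in the example below, where one generator simply re-yields every value from another.

Besides being more concise, `yield from` can also be faster; in the run below it reduces the running time by about 4%.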

In [6]:
import timeit

'''
timeit is a library for measuring the execution time of code snippets.
Commonly used functions:
timeit.timeit(stmt, setup, timer, number)
timeit.repeat(stmt, setup, timer, repeat, number)

stmt: the code to measure
setup: code executed once before stmt; the default is "pass"
timer: the timer function; timeit already supplies a default, so it can usually be ignored
number: how many times to execute stmt; the default is 1,000,000
repeat: how many times to repeat the whole timeit measurement
'''

def func():
    for i in range(50):
        yield i

def slow():
    for i in func():
        yield i
        
def fast():
    yield from func()

baseline = timeit.timeit(
    stmt='for _ in slow(): pass',
    globals=globals(),
    number=int(1e6))

comparison = timeit.timeit(
    stmt='for _ in fast(): pass',
    globals=globals(),
    number=int(1e6))

print('baseline = ', baseline)
print('composed = ', comparison)
print(f'reduction = {((baseline - comparison) / baseline):.1%}')

baseline =  4.457575645297766
composed =  4.280537655577064
reduction = 4.0%


### Avoid Injecting Data into Generators with `send`

### Avoid Causing State Transitions in Generators with `throw`

(I don't expect to need a generator's `send` or `throw` for now, so these two items are skipped.)

### Consider `itertools` for Working with Iterators and Generators

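`itertools` is genuinely useful!

Its functions fall roughly into three categories:

- linking iterators together
- filtering items from an iterator
- producing combinations of items from iterators

Run `help(itertools)` to see the documentation.

In practice, look up individual functions as the need arises.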

In [7]:
import itertools

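'''
chain:
Numeric data can be flattened directly with a numpy reshape,
but that does not help for other types (strings being the most common case).
chain covers those; chain.from_iterable flattens one level of nesting.
'''
somelist = [['a','b'], ['c', 'd']]
it = itertools.chain.from_iterable(somelist)
print(list(it))

'''
repeat: yield the same value a given number of times
'''
it = itertools.repeat('hello', 3)
print(list(it))

'''
cycle: repeat an **iterator's items** endlessly,
i.e. the items come out one at a time and the original nesting is not preserved
'''
it = itertools.cycle([1, 2])
result = [next(it) for _ in range(10)]
print(result)

['a', 'b', 'c', 'd']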
['hello', 'hello', 'hello']
[1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
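
The cell above only covers the "linking" category. As a brief sketch of the other two, here are a few filtering and combination functions applied to made-up sample values; the variable names are illustrative only.

```python
import itertools

values = [1, 2, 3, 4, 5, 6]

# Filtering items from an iterator
head = list(itertools.islice(values, 2))                    # first two items
small = list(itertools.takewhile(lambda x: x < 4, values))  # stop at first failure
large = list(itertools.dropwhile(lambda x: x < 4, values))  # skip until first failure
odds = list(itertools.filterfalse(lambda x: x % 2 == 0, values))

# Producing combinations of items from iterators
pairs = list(itertools.combinations([1, 2, 3], 2))  # unordered pairs
orders = list(itertools.permutations([1, 2], 2))    # ordered arrangements
grid = list(itertools.product([1, 2], ['a', 'b']))  # Cartesian product

print(head, small, large, odds)
print(pairs, orders, grid)
```

All of these return lazy iterators, which is why each call is wrapped in `list` before printing.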
