# Laziness

*Speeding up work for efficiency and scale*


In [4]:
# We build a somewhat large list of integers

integers = range(10000000)

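That's 10,000,000 numbers. Not huge, but not small either. We need to be careful: working through all of these integers takes a fair amount of resources.

Primality testing is expensive. Let's do it!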


In [2]:
import sympy # don't worry if you don't have this
from toolz import filter
primes = filter(sympy.isprime, integers)

Q: *I thought primality testing was fairly expensive. Why did this return so quickly?*

A: `toolz.filter` is lazy. No actual work has been done yet.
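To see that no work happens until an element is requested, one can count how often the predicate runs. The following is a minimal sketch that uses the built-in `filter` (lazy in Python 3) and a hypothetical `is_even` predicate in place of `sympy.isprime`, so that it runs without extra dependencies.

```python
calls = []  # records every value the predicate is actually asked about

def is_even(n):
    """Hypothetical stand-in for an expensive test such as sympy.isprime."""
    calls.append(n)
    return n % 2 == 0

evens = filter(is_even, range(1, 10_000_000))  # returns immediately
calls_before = list(calls)   # still empty: the predicate has not run yet

first_even = next(evens)     # pulls values until the predicate accepts one
calls_after = list(calls)    # only the values 1 and 2 were tested
```

Only the values needed to produce the first result are ever passed to the predicate; the remaining integers are left untouched.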

In [3]:
primes  # not actually a list of primes

<filter at 0x1fc3be0>

However, we can still think of the variable `primes` as a list of primes. Let's print the first few.

In [5]:
next(primes)

2

In [6]:
next(primes)

3

In [7]:
next(primes)

5

In [8]:
# Note that primes is not a list like integers.  We can't index into it

primes[1000] # 1000th prime please

TypeError: 'filter' object is not subscriptable

In [13]:
# But we *can* still iterate over it

for p in primes:
    print(p)
    if p > 100:  # stop us from going through the whole list
        break

109


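Note that the first few primes, 2, 3, and 5, do not appear in this output. Items in a lazy iterator are consumed as they are used and are never seen again. If you really want to store the contents of an iterator, pass it to the `list` constructor.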

In [14]:
primes = list(primes)  # fully evaluate the lazy iterator, store it in a list
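The one-pass behaviour of lazy iterators can be checked on a small example. This is a minimal sketch that uses the built-in `map` over a short range rather than the `primes` iterator, so it finishes instantly.

```python
squares = map(lambda n: n * n, range(5))  # lazy: 0, 1, 4, 9, 16

first = next(squares)        # consumes the first item
rest = list(squares)         # collects the items that are still left
leftover = list(squares)     # the iterator is now exhausted, so this is empty
```

Once `list` has drained the iterator, the only remaining copy of the data is the list itself.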

## Infinite iterators

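Lazy execution can be understood as talking about work without actually doing it. This separation from reality opens up new possibilities. For example, here are all of the Fibonacci numbers.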

In [15]:
def fib():
    """ A generator for all Fibonacci numbers """
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


In [18]:
fibs = fib()  # make a lazy iterator, fibs
print(next(fibs))
print(next(fibs))
print(next(fibs))
print(next(fibs))


0
1
1
2


In [21]:
fibs[100]

TypeError: 'generator' object is not subscriptable
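Because an infinite iterator never ends, any loop over it needs its own stopping rule. One option from the standard library is `itertools.takewhile`. The sketch below repeats the `fib` generator so that it runs on its own; the cutoff of 100 is an arbitrary choice for illustration.

```python
from itertools import takewhile

def fib():
    """A generator for all Fibonacci numbers."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Stop as soon as a value reaches the (arbitrary) cutoff of 100
small_fibs = list(takewhile(lambda x: x < 100, fib()))
```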

## Toolz functions

Toolz provides functions for interacting with lazy iterators.

In [19]:
from toolz import first, second, nth, drop, take

In [22]:
first(fib())

0

In [23]:
second(fib())

1

In [25]:
nth(100, fib())

354224848179261915075

In [26]:
take(20, fib())  # Yet another iterator

<itertools.islice at 0x7e9b9a8>

In [27]:
list(take(20, fib()))  # But this one is finite, so we can expand it with list

[0,
 1,
 1,
 2,
 3,
 5,
 8,
 13,
 21,
 34,
 55,
 89,
 144,
 233,
 377,
 610,
 987,
 1597,
 2584,
 4181]

In [28]:
later_fibs = drop(100, fib())

In [29]:
next(later_fibs)

354224848179261915075

In [30]:
# These all depend on itertools.islice
from itertools import islice
list(islice(fib(), 10, 20, 2))  # like fib()[10:20:2], if generators supported slicing

[55, 144, 377, 987, 2584]
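To make the connection to `islice` concrete, here is a rough sketch of how helpers like `take`, `drop`, and `nth` could be written on top of it. The names `take_sketch`, `drop_sketch`, and `nth_sketch` are hypothetical, and this is not Toolz's actual source code.

```python
from itertools import islice

def fib():
    """A generator for all Fibonacci numbers."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

def take_sketch(n, seq):
    """Lazily yield the first n items of seq."""
    return islice(seq, n)

def drop_sketch(n, seq):
    """Lazily skip the first n items of seq and yield the rest."""
    return islice(seq, n, None)

def nth_sketch(n, seq):
    """Return the item at zero-based position n of seq."""
    return next(islice(seq, n, None))

first_five = list(take_sketch(5, fib()))
after_ten = next(drop_sketch(10, fib()))
tenth = nth_sketch(10, fib())
```

Each helper returns immediately; items are only produced when the result is iterated or passed to `next`.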

## Example: Applying functions to text

A Python file object is a lazy iterator. When I open a large text file, Python doesn't actually read in all of the text at once.

In [31]:
book = open('data/tale-of-two-cities.txt')

Each element of the book is a line of text.  

In [33]:
print(1, next(book))
print(2, next(book))
print(3, next(book))
print(4, next(book))

1 The Project Gutenberg EBook of A Tale of Two Cities, by Charles Dickens

2 

3 This eBook is for the use of anyone anywhere at no cost and with

4 almost no restrictions whatsoever.  You may copy it, give it away or



This is the Project Gutenberg header. It runs for about 112 lines. Let's reopen the book and drop the header.

In [34]:
book = open('data/tale-of-two-cities.txt')
book = drop(112, book)
next(book)

'It was the best of times,\n'

That looks familiar. Let's strip the newline characters (`\r\n`) from every line.

In [23]:
from toolz import map  # toolz.map is lazy too

In [35]:
book = map(str.strip, book)
next(book)

'it was the worst of times,'

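Semantically, it is as if we had applied `str.strip` to every line in the book.

In reality, we haven't done any work.

Instead, `map` returned a new lazy iterator that pulls one element from the original `book`, applies `str.strip`, and yields the result. It does this only when we ask.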
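One way to picture this is as a small generator. The sketch below defines a hypothetical `lazy_map` that behaves like a lazy `map` for a single iterable, and logs each line it touches to show that work only happens on request. The two sample lines stand in for the open file.

```python
def lazy_map(func, iterable):
    """Hypothetical generator-based equivalent of a lazy map."""
    for item in iterable:
        yield func(item)  # runs only when the consumer asks for the next item

seen = []  # every raw line that has actually been processed

def strip_and_log(line):
    seen.append(line)
    return line.strip()

# Stand-in for the file object: an iterator over two raw lines
lines = iter(["It was the best of times,\r\n", "it was the worst of times,\r\n"])

stripped = lazy_map(strip_and_log, lines)
seen_before = list(seen)      # nothing has been processed yet

first_line = next(stripped)   # processes exactly one line
seen_after = list(seen)
```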



We can chain lazy operations together. After stripping each line, we'll focus on the lines that contain "Mr.", "Miss", or "Mrs."


In [38]:
def good_line(line):
    return "Mr." in line or "Miss" in line or "Mrs." in line

book = filter(good_line, book)
next(book)

'"Yes, Mr. Lorry."'

In [40]:
# Let's take 10 lines from this iterator and display them

for line in take(10, book):
    print(1, line)


1 "I know this messenger, guard," said Mr. Lorry, getting down into the
1 like a larger dog-kennel. Mr. Lorry, the passenger, shaking himself out
1 Mr. Lorry dropped off to sleep. The arrival of his breakfast roused him,
1 time to-day. She may ask for Mr. Jarvis Lorry, or she may only ask for a
1 When Mr. Lorry had finished his breakfast, he went out for a stroll on
1 again charged with mist and vapour, Mr. Lorry's thoughts seemed to cloud
1 Mr. Lorry had been idle a long time, and had just poured out his last
1 In a very few minutes the waiter came in to announce that Miss Manette
1 Miss Manette had taken some refreshment on the road, and required none
1 wig at the ears, and follow the waiter to Miss Manette's apartment.
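The whole drop, strip, filter, take chain can be tried without the data file. This is a minimal sketch under a few assumptions: an in-memory `io.StringIO` with made-up sample lines stands in for the book, and `itertools.islice` plus the built-in `map` and `filter` stand in for `toolz.drop`, `toolz.take`, `toolz.map`, and `toolz.filter`.

```python
import io
from itertools import islice

# Hypothetical stand-in for the book: two header lines, then made-up text
sample = io.StringIO(
    "Header line 1\n"
    "Header line 2\n"
    "  Mr. Lorry sat down.  \n"
    "The fog was thick.\n"
    "Miss Manette curtseyed.\n"
    "Mrs. Cruncher prayed.\n"
)

def good_line(line):
    return "Mr." in line or "Miss" in line or "Mrs." in line

lines = islice(sample, 2, None)      # skip the two header lines
lines = map(str.strip, lines)        # strip whitespace, lazily
lines = filter(good_line, lines)     # keep only the matching lines, lazily

first_two = list(islice(lines, 2))   # only now is any line actually read
remaining = next(lines)              # the pipeline continues where it stopped
```

Each stage wraps the previous one, so a line flows through all three steps before the next line is read from the source.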


## Motivation

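There are two main reasons to be lazy:

1. Avoiding unnecessary work: we often find what we need near the beginning of a dataset. Computing over the entire dataset would be wasteful.

2. Memory: only a small part of the dataset needs to be in memory at any one time. This lets us stream through very large datasets.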

In each case we stayed close to conventional Python syntax. We often write lazy code without knowing it.
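Both motivations can be observed in a few lines. The following is a rough sketch using only the standard library. The `is_big` predicate and the sizes are arbitrary choices, and `sys.getsizeof` reports only the size of the container object itself, not of the items it refers to.

```python
import sys

checked = []  # values the predicate was actually called on

def is_big(n):
    checked.append(n)
    return n > 3

# 1. Avoiding unnecessary work: stop at the first match
first_big = next(filter(is_big, range(1_000_000)))
num_checked = len(checked)           # far fewer than 1,000,000

# 2. Memory: a list holds every item, a generator holds only its current state
eager = [n * n for n in range(100_000)]
lazy = (n * n for n in range(100_000))
eager_bytes = sys.getsizeof(eager)   # grows with the number of items
lazy_bytes = sys.getsizeof(lazy)     # small and independent of the range length
```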