### Chapter 1: Data Structures and Algorithms
#### 1.1 Unpacking a Sequence into Separate Variables

In [2]:
x, y = (4, 5)
print(x, y)
x, y, z = (4, 5)

4 5


ValueError: not enough values to unpack (expected 3, got 2)

#### 1.2 Unpacking Elements from Iterables of Arbitrary Length

In [6]:
*x, y = (1, 2, 3, 4, 5)
print(x, y)
record = ('sdf', 50, 123.45, (12, 18, 2019))
name, year = record[0], record[-1][-1]
print(name, year)
name, *_, (*_, year) = record
print(name, year)

[1, 2, 3, 4] 5
sdf 2019
sdf 2019


#### 1.3 Keeping the Last N Items

In [27]:
from collections import deque
def search(lines, pattern, history=5):
    previous_lines = deque(maxlen=history)
    for line in lines:
        if pattern in line:
            yield line, previous_lines
        previous_lines.append(line)

In [34]:
lines = ['aaaaaaaaaa', 'python', 'bbbbbbbb', 'python', 'cccccc', 'dddddd']
gen = search(lines, 'python')

In [35]:
for lin, pre in gen:
    print(lin, pre)
print(pre)

python deque(['aaaaaaaaaa'], maxlen=5)
python deque(['aaaaaaaaaa', 'python', 'bbbbbbbb'], maxlen=5)
deque(['python', 'bbbbbbbb', 'python', 'cccccc', 'dddddd'], maxlen=5)


#### 1.4 Finding the Largest or Smallest N Items

In [36]:
import heapq

nums = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]
print(heapq.nlargest(3, nums))
print(heapq.nsmallest(3, nums))

[42, 37, 23]
[-4, 1, 2]


In [37]:
portfolio = [
{'name': 'IBM', 'shares': 100, 'price': 91.1},
{'name': 'AAPL', 'shares': 50, 'price': 543.22},
{'name': 'FB', 'shares': 200, 'price': 21.09},
{'name': 'HPQ', 'shares': 35, 'price': 31.75},
{'name': 'YHOO', 'shares': 45, 'price': 16.35},
{'name': 'ACME', 'shares': 75, 'price': 115.65}
]
cheap = heapq.nsmallest(3, portfolio, key=lambda s: s['price'])
expensive = heapq.nlargest(3, portfolio, key=lambda s: s['price'])
print(cheap)
print(expensive)

[{'name': 'YHOO', 'shares': 45, 'price': 16.35}, {'name': 'FB', 'shares': 200, 'price': 21.09}, {'name': 'HPQ', 'shares': 35, 'price': 31.75}]
[{'name': 'AAPL', 'shares': 50, 'price': 543.22}, {'name': 'ACME', 'shares': 75, 'price': 115.65}, {'name': 'IBM', 'shares': 100, 'price': 91.1}]


In [51]:
nums = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]
heap = list(nums)
heapq.heapify(heap)
print(heap)

[-4, 2, 1, 23, 7, 2, 18, 23, 42, 37, 8]


In [52]:
heapq.heappop(heap)

-4

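#### 1.5 Implementing a Priority Queue
How do you implement a queue that sorts items by a given priority and always returns the item with the highest priority on each pop operation?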

In [57]:
import heapq

class PriorityQueue:
    
    def __init__(self):
        self._queue = []
        self._index = 0
    
    def push(self, item, priority):
        heapq.heappush(self._queue, (-priority, self._index, item))
        self._index += 1
    
    def pop(self):
        return heapq.heappop(self._queue)[-1]

In [64]:
q = PriorityQueue()
q.push('a', 1)
q.push('b', 5)
q.push('c', 4)
q.push('d', 1)

In [65]:
q.pop()

'b'

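#### 1.6 Mapping Keys to Multiple Values in a Dictionary
How do you make a dictionary that maps each key to more than one value (a so-called multidict)?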

In [2]:
from collections import defaultdict

d = defaultdict(set)
d['a'].add(1)
d['a'].add(2)
d['b'].add(3)
print(d)

defaultdict(<class 'set'>, {'a': {1, 2}, 'b': {3}})


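#### 1.7 Keeping Dictionaries in Order
You want to create a dictionary and control the order of its items when you iterate over it or serialize it. (Since Python 3.7, a plain dict also preserves insertion order. OrderedDict is still useful for order-aware extras such as move_to_end().)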

In [4]:
from collections import OrderedDict

d = OrderedDict()
d['a'] = 1
d['b'] = 2
d['c'] = 3
for k in d:
    print(k, d[k])

a 1
b 2
c 3


#### 1.8 Calculating with Dictionaries
How do you perform calculations on a dictionary of data, such as finding the minimum or maximum, or sorting?

In [5]:
prices = {
'ACME': 45.23,
'AAPL': 612.78,
'IBM': 205.55,
'HPQ': 37.20,
'FB': 10.75
}

In [14]:
min_price = min(zip(prices.values(), prices.keys()))
print(min_price)
max_price = max(zip(prices.values(), prices.keys()))
print(max_price)
prices_sorted = sorted(zip(prices.values(), prices.keys()))
print(prices_sorted)

(10.75, 'FB')
(612.78, 'AAPL')
[(10.75, 'FB'), (37.2, 'HPQ'), (45.23, 'ACME'), (205.55, 'IBM'), (612.78, 'AAPL')]


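#### 1.9 Finding Commonalities in Two Dictionaries
How do you find what two dictionaries have in common, for example the same keys or the same values?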

In [15]:
a = {
'x' : 1,
'y' : 2,
'z' : 3
}
b = {
'w' : 10,
'x' : 11,
'y' : 2
}

In [31]:
print(a.keys() & b.keys())
print(a.keys() - b.keys())
print(a.items() & b.items())
c = {key: a[key] for key in a.keys() - {'z', 'w'}}  # a new dict without the keys 'z' and 'w'
print(c)

{'x', 'y'}
{'z'}
{('y', 2)}
{'x': 1, 'y': 2}


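#### 1.10 Removing Duplicates from a Sequence While Maintaining Order
How do you remove the duplicate values from a sequence while keeping the remaining items in their original order?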

In [40]:
def dedupe1(items):
    seen = set()
    for item in items:
        if item not in seen:
            yield item
            seen.add(item)
def dedupe(items, key=None):
    seen = set()
    for item in items:
        val = item if key is None else key(item)
        if val not in seen:
            yield item
            seen.add(val)

In [41]:
a = [1, 5, 2, 1, 9, 1, 5, 10]
print(list(dedupe(a)))

[1, 5, 2, 9, 10]
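The key parameter of dedupe() is what lets it handle unhashable items such as dictionaries. Below is a minimal sketch of that usage. The function is repeated so the block runs on its own, and the sample rows are made up for illustration:

```python
def dedupe(items, key=None):
    """Yield items in their original order, skipping any whose key value was already seen."""
    seen = set()
    for item in items:
        # Compare on key(item) when a key function is given; that value must be hashable
        val = item if key is None else key(item)
        if val not in seen:
            yield item
            seen.add(val)

# Dictionaries are unhashable, so each one is mapped to a hashable tuple first
rows = [{'x': 1, 'y': 2}, {'x': 1, 'y': 3}, {'x': 1, 'y': 2}, {'x': 2, 'y': 4}]
by_pair = list(dedupe(rows, key=lambda d: (d['x'], d['y'])))
by_x = list(dedupe(rows, key=lambda d: d['x']))
print(by_pair)  # [{'x': 1, 'y': 2}, {'x': 1, 'y': 3}, {'x': 2, 'y': 4}]
print(by_x)     # [{'x': 1, 'y': 2}, {'x': 2, 'y': 4}]
```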


#### 1.11 Naming a Slice
Your program has become an unreadable mess of hardcoded slice indices, and you want to clean it up.

In [43]:
######    0123456789012345678901234567890123456789012345678901234567890'
record = '....................100 .......513.25 ..........'
cost = int(record[20:23]) * float(record[31:37])
print(cost)

51325.0


In [45]:
SHARES = slice(20, 23)
PRICE = slice(31, 37)
cost = int(record[SHARES]) * float(record[PRICE])
print(cost)

51325.0


In [51]:
a = slice(5, 50, 2)
s = 'HelloWorld'
print(a.indices(len(s)))
for i in range(*a.indices(len(s))):
    print(s[i])

(5, 10, 2)
W
r
d


#### 1.12 Determining the Most Frequently Occurring Items in a Sequence
How do you find the items that occur most often in a sequence?

In [55]:
from collections import Counter

words = [
'look', 'into', 'my', 'eyes', 'look', 'into', 'my', 'eyes',
'the', 'eyes', 'the', 'eyes', 'the', 'eyes', 'not', 'around', 'the',
'eyes', "don't", 'look', 'around', 'the', 'eyes', 'look', 'into',
'my', 'eyes', "you're", 'under'
]
word_counts = Counter(words)
top_three = word_counts.most_common(3)
print(top_three)

[('eyes', 8), ('the', 5), ('look', 4)]


#### 1.13 Sorting a List of Dictionaries by a Common Key
You have a list of dictionaries and want to sort it by the values of one or more dictionary fields.

In [62]:
rows = [
{'fname': 'Brian', 'lname': 'Jones', 'uid': 1003},
{'fname': 'David', 'lname': 'Beazley', 'uid': 1002},
{'fname': 'John', 'lname': 'Cleese', 'uid': 1001},
{'fname': 'Big', 'lname': 'Jones', 'uid': 1004}
]
rows.sort(key=lambda n: n['uid'])
print(rows)

[{'fname': 'John', 'lname': 'Cleese', 'uid': 1001}, {'fname': 'David', 'lname': 'Beazley', 'uid': 1002}, {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003}, {'fname': 'Big', 'lname': 'Jones', 'uid': 1004}]


In [64]:
rows = [
{'fname': 'Brian', 'lname': 'Jones', 'uid': 1003},
{'fname': 'David', 'lname': 'Beazley', 'uid': 1002},
{'fname': 'John', 'lname': 'Cleese', 'uid': 1001},
{'fname': 'Big', 'lname': 'Jones', 'uid': 1004}
]
from operator import itemgetter

rows_by_fname = sorted(rows, key=itemgetter('fname'))
rows_by_uid = sorted(rows, key=itemgetter('uid'))
print(rows_by_fname)
print(rows_by_uid)

[{'fname': 'Big', 'lname': 'Jones', 'uid': 1004}, {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003}, {'fname': 'David', 'lname': 'Beazley', 'uid': 1002}, {'fname': 'John', 'lname': 'Cleese', 'uid': 1001}]
[{'fname': 'John', 'lname': 'Cleese', 'uid': 1001}, {'fname': 'David', 'lname': 'Beazley', 'uid': 1002}, {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003}, {'fname': 'Big', 'lname': 'Jones', 'uid': 1004}]
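itemgetter() also accepts several keys. It then returns a tuple, so the rows are ordered by the first key, then by the second, and so on. The same key function also works with min() and max(). Here is a short sketch that uses the same rows; sorting by lname and then fname is just one possible choice:

```python
from operator import itemgetter

rows = [
    {'fname': 'Brian', 'lname': 'Jones', 'uid': 1003},
    {'fname': 'David', 'lname': 'Beazley', 'uid': 1002},
    {'fname': 'John', 'lname': 'Cleese', 'uid': 1001},
    {'fname': 'Big', 'lname': 'Jones', 'uid': 1004},
]
# With two keys, itemgetter returns (lname, fname) tuples, so ties on lname fall back to fname
rows_by_lfname = sorted(rows, key=itemgetter('lname', 'fname'))
print([r['uid'] for r in rows_by_lfname])   # [1002, 1001, 1004, 1003]
print(min(rows, key=itemgetter('uid')))     # {'fname': 'John', 'lname': 'Cleese', 'uid': 1001}
```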


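#### 1.14 Sorting Objects Without Native Comparison Support
You want to sort objects of the same class, but they don't natively support comparison operations.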


In [68]:
class User:
    def __init__(self, user_id):
        self.user_id = user_id
    def __repr__(self):
        return 'User({})'.format(self.user_id)

In [69]:
from operator import attrgetter

users = [User(23), User(3), User(99)]
print(sorted(users, key=attrgetter('user_id')))

[User(3), User(23), User(99)]


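#### 1.15 Grouping Records Together Based on a Field
You have a sequence of dictionaries or instances and want to iterate over the data in groups, based on the value of a particular field such as date.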

In [70]:
rows = [
{'address': '5412 N CLARK', 'date': '07/01/2012'},
{'address': '5148 N CLARK', 'date': '07/04/2012'},
{'address': '5800 E 58TH', 'date': '07/02/2012'},
{'address': '2122 N CLARK', 'date': '07/03/2012'},
{'address': '5645 N RAVENSWOOD', 'date': '07/02/2012'},
{'address': '1060 W ADDISON', 'date': '07/02/2012'},
{'address': '4801 N BROADWAY', 'date': '07/01/2012'},
{'address': '1039 W GRANVILLE', 'date': '07/04/2012'},
]

In [81]:
from operator import itemgetter
from itertools import groupby

rows.sort(key=itemgetter('date'))
for date, items in groupby(rows, key=itemgetter('date')):
    print(date)
    for i in items:
        print(' ', i)

07/01/2012
  {'address': '5412 N CLARK', 'date': '07/01/2012'}
  {'address': '4801 N BROADWAY', 'date': '07/01/2012'}
07/02/2012
  {'address': '5800 E 58TH', 'date': '07/02/2012'}
  {'address': '5645 N RAVENSWOOD', 'date': '07/02/2012'}
  {'address': '1060 W ADDISON', 'date': '07/02/2012'}
07/03/2012
  {'address': '2122 N CLARK', 'date': '07/03/2012'}
07/04/2012
  {'address': '5148 N CLARK', 'date': '07/04/2012'}
  {'address': '1039 W GRANVILLE', 'date': '07/04/2012'}
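groupby() only merges consecutive items, which is why the rows are sorted by date first. If all you need afterwards is to look up records by date, a defaultdict of lists is an alternative. It needs no sorting, at the cost of keeping every group in memory. A sketch using a few of the rows above:

```python
from collections import defaultdict

rows = [
    {'address': '5412 N CLARK', 'date': '07/01/2012'},
    {'address': '5800 E 58TH', 'date': '07/02/2012'},
    {'address': '4801 N BROADWAY', 'date': '07/01/2012'},
    {'address': '1060 W ADDISON', 'date': '07/02/2012'},
]
# Each new date key starts with an empty list; rows keep their original relative order
rows_by_date = defaultdict(list)
for row in rows:
    rows_by_date[row['date']].append(row)

print([r['address'] for r in rows_by_date['07/01/2012']])  # ['5412 N CLARK', '4801 N BROADWAY']
```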


#### 1.16 Filtering Sequence Elements
You have a sequence of data and want to extract values or shrink the sequence according to some criteria.

In [84]:
mylist = [1, 4, -5, 10, -7, 2, 3, -1]
print([n for n in mylist if n > 0])

[1, 4, 10, 2, 3]


In [85]:
values = ['1', '2', '-3', '-', '4', 'N/A', '5']
def is_int(val):
    try:
        x = int(val)
        return True
    except ValueError:
        return False
ivals = list(filter(is_int, values))
print(ivals)

['1', '2', '-3', '4', '5']


In [86]:
addresses = [
'5412 N CLARK',
'5148 N CLARK',
'5800 E 58TH',
'2122 N CLARK',
'5645 N RAVENSWOOD',
'1060 W ADDISON',
'4801 N BROADWAY',
'1039 W GRANVILLE',
]
counts = [ 0, 3, 10, 4, 1, 7, 6, 1]

In [88]:
from itertools import compress

more5 = [n > 5 for n in counts]
print(more5)
print(list(compress(addresses, more5)))

[False, False, True, False, False, True, True, False]
['5800 E 58TH', '1060 W ADDISON', '4801 N BROADWAY']


#### 1.17 Extracting a Subset of a Dictionary
You want to make a dictionary that is a subset of another dictionary.

In [2]:
prices = {
    'ACME': 45.23,
    'AAPL': 612.78,
    'IBM': 205.55,
    'HPQ': 37.20,
    'FB': 10.75
}
p1 = {k: v for k, v in prices.items() if v > 200}
print(p1)

{'AAPL': 612.78, 'IBM': 205.55}
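The same comprehension style also works for selecting by key instead of by value. In this sketch, the set tech_names is a made-up example:

```python
prices = {'ACME': 45.23, 'AAPL': 612.78, 'IBM': 205.55, 'HPQ': 37.20, 'FB': 10.75}
tech_names = {'AAPL', 'IBM', 'HPQ'}

# Keep only the entries whose key is in the set
p2 = {k: v for k, v in prices.items() if k in tech_names}
print(p2)  # {'AAPL': 612.78, 'IBM': 205.55, 'HPQ': 37.2}

# Equivalent result built from a set intersection of the keys (iteration order may differ)
p3 = {k: prices[k] for k in prices.keys() & tech_names}
```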


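#### 1.18 Mapping Names to Sequence Elements
You have code that accesses list or tuple elements by position, which can make it hard to read, so you want to access the elements by name instead.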

In [3]:
from collections import namedtuple

Subscriber = namedtuple('Subscriber', 'addr joined')
sub = Subscriber('sdfsefsefwse', '2019-12-11')
print(sub.addr, sub.joined)

sdfsefsefwse 2019-12-11
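Named tuples are immutable, so you can't assign to a field. Instead, _replace() returns a new instance with the given fields changed. The Stock type below is a hypothetical example:

```python
from collections import namedtuple

Stock = namedtuple('Stock', ['name', 'shares', 'price'])
s = Stock('ACME', 100, 123.45)
# s.shares = 75 would raise AttributeError; _replace builds a modified copy instead
s2 = s._replace(shares=75)
print(s2)  # Stock(name='ACME', shares=75, price=123.45)
print(s)   # Stock(name='ACME', shares=100, price=123.45), the original is unchanged
```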


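#### 1.19 Transforming and Reducing Data at the Same Time
You need to apply a reduction function such as sum(), min(), or max(), but you first need to transform or filter the data.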

In [2]:
nums = [1, 2, 3, 4, 5]
s = sum(x ** 2 for x in nums)
print(s)
s = ('ACME', 50, 123.45)
print(','.join(str(x) for x in s))

55
ACME,50,123.45


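#### 1.20 Combining Multiple Mappings into a Single Mapping
You have multiple dictionaries or mappings and want to combine them logically into a single mapping, so you can perform operations such as looking up values or checking whether keys exist.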

In [9]:
a = {'x': 1, 'z': 3 }
b = {'y': 2, 'z': 4 }

from collections import ChainMap
c = ChainMap(a, b)
print(c['x'])
print(c['y'])
print(c['z'])

1
2
3
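A ChainMap searches its mappings in order for lookups, but writes and deletions only affect the first mapping. For example, deleting a key that exists only in b, as in del c['y'], raises KeyError. Here is a sketch that reuses a and b from above; they are redefined so the block is self-contained:

```python
from collections import ChainMap

a = {'x': 1, 'z': 3}
b = {'y': 2, 'z': 4}
c = ChainMap(a, b)

# Assignments through the ChainMap go into the first mapping (a), never into b
c['z'] = 10
c['w'] = 40
print(a)          # {'x': 1, 'z': 10, 'w': 40}
print(b)          # {'y': 2, 'z': 4}
print(len(c))     # 4 distinct keys across both mappings
print(sorted(c))  # ['w', 'x', 'y', 'z']
```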


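### Chapter 2: Strings and Text
#### 2.1 Splitting Strings on Any of Multiple Delimiters
You need to split a string into fields, but the delimiters (and the spacing around them) aren't consistent throughout the string.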

In [17]:
line = 'asdf fjdk; afed, fjek,asdf, foo'
import re
print(re.split(r'[;,\s]+', line))
print(re.split(r'(;|,|\s)\s*', line))

['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']
['asdf', ' ', 'fjdk', ';', 'afed', ',', 'fjek', ',', 'asdf', ',', 'foo']


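#### 2.2 Matching Text at the Start or End of a String
You need to check the start or end of a string for specific text patterns, such as filename extensions or URL schemes.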

In [21]:
filename = 'spam.txt'
print(filename.endswith('.txt'))
print(filename.startswith('spam'))
filename = [ 'Makefile', 'foo.c', 'bar.py', 'spam.c', 'spam.h' ]
print([name for name in filename if name.endswith(('.c', '.h'))])
print([name for name in filename if name.endswith(['.c', '.h'])])

True
True
['foo.c', 'spam.c', 'spam.h']


TypeError: endswith first arg must be str or a tuple of str, not list

#### 2.3 Matching Strings Using Shell Wildcard Patterns
You want to match text using the wildcard patterns commonly used in Unix shells, such as *.py or Dat[0-9]*.csv.

In [28]:
from fnmatch import fnmatch, fnmatchcase

print(fnmatch('foo.txt', '*.txt'))
print(fnmatch('foo.txt', '?oo.txT'))  # case follows the OS: True on Windows, False on Unix/macOS
print(fnmatchcase('foo.txt', '?oo.txT'))  # always case-sensitive
print(fnmatch('Dat45.csv', 'Dat[0-9]*'))

True
True
False
True


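#### 2.4 Matching and Searching for Text Patterns
You want to match or search text for a specific pattern.
- If the text you want to match is a simple literal, the basic string methods such as str.find(), str.endswith(), str.startswith(), or similar are usually enough: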

In [33]:
text = 'yeah, but no, but yeah, but no, but yeah'
print(text == 'yeah')
print(text.startswith('yeah'))
print(text.endswith('no'))
print(text.find('no'))

False
True
False
10


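- For more complicated matching, use regular expressions and the re module. To illustrate the basics, suppose you want to match dates written as digits, such as 11/27/2012. You could do this: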

In [35]:
text1 = '11/27/2012'
text2 = 'Nov 27, 2012'
import re
if re.match(r'\d+/\d+/\d+', text1):
    print('yes')
else:
    print('no')

yes


#### 2.5 Searching and Replacing Text
You want to search for a text pattern in a string and replace it.

In [40]:
text = 'yeah, but no, but yeah, but no, but yeah'
print(text.replace('yeah', 'yep'))
text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'
import re
print(re.sub(r'(\d+)/(\d+)/(\d+)', r'\3-\1-\2', text))

yep, but no, but yep, but no, but yep
Today is 2012-11-27. PyCon starts 2013-3-13.


In [43]:
from calendar import month_abbr

def change_date(m):
    mon_name = month_abbr[int(m.group(1))]
    return '{} {} {}'.format(m.group(2), mon_name, m.group(3))
print(re.sub(r'(\d+)/(\d+)/(\d+)', change_date, text))

Today is 27 Nov 2012. PyCon starts 13 Mar 2013.


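#### 2.6 Searching and Replacing Case-Insensitive Text
You need to search for and possibly replace text without regard to letter case.
- Pass the re.IGNORECASE flag (short form re.I) to the re functions.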
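Here is a short sketch of the flag with re.findall() and re.sub(). The sample text is made up:

```python
import re

text = 'UPPER PYTHON, lower python, Mixed Python'
# flags= is passed by keyword because the fourth positional argument of re.sub is count
matches = re.findall('python', text, flags=re.IGNORECASE)
replaced = re.sub('python', 'snake', text, flags=re.IGNORECASE)
print(matches)   # ['PYTHON', 'python', 'Python']
print(replaced)  # UPPER snake, lower snake, Mixed snake
```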

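#### 2.7 Specifying a Regular Expression for the Shortest Match
You are matching a text pattern with a regular expression, but it finds the longest possible match, and you want the shortest possible match instead.
- Add ? after a * or + quantifier (for example, .*?) to make the match non-greedy.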
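The difference shows up when a pattern has two possible end points. In this sketch, the greedy .* runs to the last quote in the line, while .*? stops at the first one:

```python
import re

text = 'Computer says "no." Phone says "yes."'
greedy = re.compile(r'"(.*)"')    # .* grabs as much as possible
lazy = re.compile(r'"(.*?)"')     # .*? grabs as little as possible
print(greedy.findall(text))  # ['no." Phone says "yes.']
print(lazy.findall(text))    # ['no.', 'yes.']
```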

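#### 2.8 Writing a Regular Expression for Multiline Patterns
You are matching a block of text with a regular expression, and the match needs to span multiple lines.
- Use the re.DOTALL flag (short form re.S) so that . also matches newline characters.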
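Without the flag, . stops at a newline. As a result, a pattern for C-style comments finds nothing when the comment spans two lines. A sketch:

```python
import re

text = '/* this is a\nmultiline comment */'
comment = re.compile(r'/\*(.*?)\*/')
print(comment.findall(text))    # [] because . does not match the newline

comment_all = re.compile(r'/\*(.*?)\*/', re.DOTALL)
print(comment_all.findall(text))  # [' this is a\nmultiline comment ']
```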

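#### 2.9 Normalizing Unicode Text to a Standard Representation
You are working with Unicode strings and need to make sure that all of them have the same underlying representation.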

In [47]:
s1 = 'Spicy Jalape\u00f1o'
s2 = 'Spicy Jalapen\u0303o'
print(s1)
print(s2)
print(s1 == s2, len(s1), len(s2))

Spicy Jalapeño
Spicy Jalapeño
False 14 15


In [52]:
import unicodedata
t1 = unicodedata.normalize('NFC', s1)
t2 = unicodedata.normalize('NFC', s2)
print(t1 == t2, ascii(t1))
t3 = unicodedata.normalize('NFD', s1)
t4 = unicodedata.normalize('NFD', s2)
print(t3 == t4, ascii(t3))

True 'Spicy Jalape\xf1o'
True 'Spicy Jalapen\u0303o'


In [56]:
s = '\ufb01'
print(unicodedata.normalize('NFD', s))
print(unicodedata.normalize('NFKC', s))
print(unicodedata.normalize('NFKD', s))

ﬁ
fi
fi


In [59]:
t1 = unicodedata.normalize('NFD', s1)
print(''.join(c for c in t1 if not unicodedata.combining(c)))

Spicy Jalapeno


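#### 2.10 Working with Unicode Characters in Regular Expressions
You are using regular expressions to process text and need them to handle Unicode characters correctly.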

In [65]:
import re

num = re.compile(r'\d+')
print(num.match('123').group())
print(num.match('\u0661\u0662\u0663').group())


123
١٢٣


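#### 2.11 Stripping Unwanted Characters from Strings
You want to remove unwanted characters, such as whitespace, from the beginning, end, or middle of a text string.
- Use str.strip(), str.lstrip(), and str.rstrip() for the ends, and str.replace() or re.sub() for characters in the middle.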
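Here is a quick sketch of these methods on made-up strings. Note that strip() and its variants never touch characters in the middle of a string:

```python
import re

s = '  hello world \n'
print(repr(s.strip()))    # 'hello world'
print(repr(s.lstrip()))   # 'hello world \n'
print(repr(s.rstrip()))   # '  hello world'

t = '-----hello====='
print(t.strip('-='))      # hello  (strips any mix of the given characters)

# Inner runs of whitespace need replace() or re.sub()
u = ' hello     world \n'
print(re.sub(r'\s+', ' ', u.strip()))  # hello world
```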

#### 2.12 Sanitizing and Cleaning Up Text
Some bored script kiddie has entered the text "pýtĥöñ" into a form on your web page, and you want to clean it up.

In [66]:
s = 'pýtĥöñ\fis\tawesome\r\n'
print(s)

pýtĥöñis	awesome



In [70]:
remap = {
    ord('\t'): ' ',
    ord('\f'): ' ',
    ord('\r'): None
}
a = s.translate(remap)
print(a)

pýtĥöñ is awesome



In [76]:
import unicodedata
import sys
cmb_chrs = dict.fromkeys(c for c in range(sys.maxunicode) if unicodedata.combining(chr(c)))
b = unicodedata.normalize('NFD', a)
print(b)
print(b.translate(cmb_chrs))

pýtĥöñ is awesome

python is awesome



#### 2.13 Aligning Text Strings
You need to format text with a particular alignment.

In [5]:
text = 'Hello World'
print(text.ljust(20, '='))
print(text.rjust(20, '-'))
print(text.center(20, '*'))
print('{:^20}'.format('abc'))

Hello World=========
---------Hello World
****Hello World*****
        abc         


#### 2.14 Combining and Concatenating Strings
You want to combine many small strings into one larger string.

In [9]:
parts = ['Is', 'Chicago', 'Not', 'Chicago?']
print(''.join(parts))
print(' '.join(parts))
print(','.join(parts))

IsChicagoNotChicago?
Is Chicago Not Chicago?
Is,Chicago,Not,Chicago?


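#### 2.15 Interpolating Variables in Strings
You want to create a string with embedded variable names, each replaced by the string representation of that variable's value.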

In [13]:
s = '{name} has {n} messages.'
print(s.format(name='Guido', n=39))

Guido has 39 messages.


In [16]:
name = 'ling'
n = 27
print(s.format_map(vars()))

ling has 27 messages.
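On Python 3.6 and later, f-strings do the same substitution by evaluating the names in the current scope directly, without format() or vars(). A minimal sketch:

```python
name = 'Guido'
n = 37
# The expressions inside the braces are evaluated when the f-string is evaluated
s = f'{name} has {n} messages.'
print(s)  # Guido has 37 messages.
```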


#### 2.16 Reformatting Text to a Fixed Number of Columns
You have long strings that you want to reformat so they fit a given column width.

In [26]:
import textwrap

s = "Look into my eyes, look into my eyes, the eyes, the eyes, \
the eyes, not around the eyes, don't look around the eyes, \
look into my eyes, you're under."

print(textwrap.fill(s, 70))
print(textwrap.fill(s, 40, initial_indent='    '))
print(textwrap.fill(s, 40, subsequent_indent='    '))

Look into my eyes, look into my eyes, the eyes, the eyes, the eyes,
not around the eyes, don't look around the eyes, look into my eyes,
you're under.
    Look into my eyes, look into my
eyes, the eyes, the eyes, the eyes, not
around the eyes, don't look around the
eyes, look into my eyes, you're under.
Look into my eyes, look into my eyes,
    the eyes, the eyes, the eyes, not
    around the eyes, don't look around
    the eyes, look into my eyes, you're
    under.


In [29]:
import os
print(os.get_terminal_size().columns)

100


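#### 2.17 Handling HTML and XML Entities in Text
You want to replace HTML or XML entities such as &entity; or &#code; with the text they stand for. Alternatively, you need to escape certain characters in text, such as <, >, or &.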

In [40]:
s = 'Elements are written as "<tag>text</tag>".'
import html
print(s)
print(html.escape(s))
print(html.escape(s, quote=False))

Elements are written as "<tag>text</tag>".
Elements are written as &quot;&lt;tag&gt;text&lt;/tag&gt;&quot;.
Elements are written as "&lt;tag&gt;text&lt;/tag&gt;".


In [47]:
s = 'Spicy &quot;Jalape&#241;o&quot.'
print(html.unescape(s))
t = 'The prompt is &gt;&gt;&gt;'
from xml.sax.saxutils import unescape
print(unescape(t))

Spicy "Jalapeño".
The prompt is >>>


#### 2.18 Tokenizing Text
You have a string that you want to parse from left to right into a stream of tokens.

In [53]:
import re
from collections import namedtuple
text = 'foo = 23 + 42 * 10'
NAME = r'(?P<NAME>[a-zA-Z_][a-zA-Z_0-9]*)'
NUM = r'(?P<NUM>\d+)'
PLUS = r'(?P<PLUS>\+)'
TIMES = r'(?P<TIMES>\*)'
EQ = r'(?P<EQ>=)'
WS = r'(?P<WS>\s+)'
master_pat = re.compile('|'.join([NAME, NUM, PLUS, TIMES, EQ, WS]))

def generate_tokens(pat, text):
    Token = namedtuple('Token', ['type', 'value'])
    scanner = pat.scanner(text)
    for m in iter(scanner.match, None):
        yield Token(m.lastgroup, m.group())

In [54]:
tokens = (tok for tok in generate_tokens(master_pat, text)
          if tok.type != 'WS')
for tok in tokens:
    print(tok)

Token(type='NAME', value='foo')
Token(type='EQ', value='=')
Token(type='NUM', value='23')
Token(type='PLUS', value='+')
Token(type='NUM', value='42')
Token(type='TIMES', value='*')
Token(type='NUM', value='10')


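#### 2.19 Writing a Simple Recursive Descent Parser
You need to parse text according to a set of grammar rules and either perform actions or build an abstract syntax tree that represents the input. If the grammar is small, you can write the parser yourself instead of using a framework.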
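The following is a minimal sketch of such a parser for integer arithmetic with +, -, *, / and parentheses. The grammar, the token names, and the ExpressionEvaluator class are assumptions for illustration. The sketch evaluates the expression directly instead of building a tree. Each grammar rule becomes one method, and loops (rather than recursion) keep the binary operators left-associative:

```python
import re
from collections import namedtuple

# Token types for this sketch: integers, the four operators, and parentheses
TOKEN_SPEC = [
    ('NUM', r'\d+'),
    ('PLUS', r'\+'),
    ('MINUS', r'-'),
    ('TIMES', r'\*'),
    ('DIVIDE', r'/'),
    ('LPAREN', r'\('),
    ('RPAREN', r'\)'),
    ('WS', r'\s+'),
]
MASTER_PAT = re.compile('|'.join(f'(?P<{name}>{pat})' for name, pat in TOKEN_SPEC))
Token = namedtuple('Token', ['type', 'value'])


def tokenize(text):
    """Yield tokens from left to right, skipping whitespace."""
    pos = 0
    while pos < len(text):
        m = MASTER_PAT.match(text, pos)
        if m is None:
            raise SyntaxError(f'Unexpected character {text[pos]!r} at position {pos}')
        pos = m.end()
        if m.lastgroup != 'WS':
            yield Token(m.lastgroup, m.group())


class ExpressionEvaluator:
    """
    Grammar (one method per rule):
        expr   ::= term { ('+' | '-') term }*
        term   ::= factor { ('*' | '/') factor }*
        factor ::= NUM | '(' expr ')'
    """

    def parse(self, text):
        self.tokens = tokenize(text)
        self.tok = None       # last token consumed
        self.nexttok = None   # one-token lookahead
        self._advance()
        result = self.expr()
        if self.nexttok is not None:
            raise SyntaxError(f'Unexpected token {self.nexttok.value!r}')
        return result

    def _advance(self):
        self.tok, self.nexttok = self.nexttok, next(self.tokens, None)

    def _accept(self, toktype):
        # Consume the lookahead token only if it has the expected type
        if self.nexttok is not None and self.nexttok.type == toktype:
            self._advance()
            return True
        return False

    def _expect(self, toktype):
        if not self._accept(toktype):
            raise SyntaxError('Expected ' + toktype)

    def expr(self):
        value = self.term()
        while self._accept('PLUS') or self._accept('MINUS'):
            op = self.tok.type
            right = self.term()
            value = value + right if op == 'PLUS' else value - right
        return value

    def term(self):
        value = self.factor()
        while self._accept('TIMES') or self._accept('DIVIDE'):
            op = self.tok.type
            right = self.factor()
            value = value * right if op == 'TIMES' else value / right
        return value

    def factor(self):
        if self._accept('NUM'):
            return int(self.tok.value)
        if self._accept('LPAREN'):
            value = self.expr()
            self._expect('RPAREN')
            return value
        raise SyntaxError('Expected NUM or LPAREN')


e = ExpressionEvaluator()
print(e.parse('2 + 3 * 4'))        # 14
print(e.parse('2 + (3 + 4) * 5'))  # 37
print(e.parse('2 - 3 - 4'))        # -5
```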

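#### 2.20 Performing Text Operations on Byte Strings
You want to perform common text operations, such as stripping, searching, and replacing, on byte strings.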
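Byte strings support most of the same methods as text strings, with a few differences: indexing returns integers, and regular expression patterns must also be bytes. A sketch:

```python
import re

data = b'Hello World'
print(data[0:5])                               # b'Hello'
print(data.startswith(b'Hello'))               # True
print(data.split())                            # [b'Hello', b'World']
print(data.replace(b'Hello', b'Hello Cruel'))  # b'Hello Cruel World'
# Indexing a bytes object gives an int (the byte value), not a one-character string
print(data[0])                                 # 72

# With bytes input, the pattern must be bytes too (note the rb prefix)
raw = b'FOO:BAR,SPAM'
print(re.split(rb'[:,]', raw))                 # [b'FOO', b'BAR', b'SPAM']
```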

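### Chapter 3: Numbers, Dates, and Times
#### 3.1 Rounding Numerical Values
You want to round a floating-point number to a given number of decimal places.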

In [8]:
print(round(1.23, 1))
print(round(1.27, 1))
print(round(-1.27, 1))
print(round(1.25361, 3))
print(round(1627731, -1))
print(round(1627731, -2))
print(round(1627731, -3))

1.2
1.3
-1.3
1.254
1627730
1627700
1628000
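The outputs above don't show two details. First, round() resolves exact ties to the nearest even digit ("banker's rounding"). Second, a literal such as 2.675 is stored as a binary float slightly below 2.675, so it rounds down. A quick check:

```python
from decimal import Decimal

# Exact ties go to the even neighbor
print(round(0.5), round(1.5), round(2.5))  # 0 2 2
# Decimal(2.675) shows the exact stored binary value, which is just under 2.675
print(Decimal(2.675) < Decimal('2.675'))   # True
print(round(2.675, 2))                     # 2.67
```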


In [13]:
x = 1.23456
print(format(x, '0.2f'))
print(format(x, '0.3f'))
print(format(x, '0.4f'))

1.23
1.235
1.2346


In [18]:
print(2.1 + 4.1)

6.199999999999999


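#### 3.2 Performing Accurate Decimal Calculations
You need exact results from calculations on decimal numbers and don't want the small errors that occur with floats.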

In [23]:
from decimal import Decimal
a = Decimal('2.1')
b = Decimal('4.1')
print(a + b)
print((a + b == Decimal('6.2')))

6.2
True


In [29]:
from decimal import localcontext
print(a / b)
with localcontext() as ctx:
    ctx.prec = 4
    print(a / b)
with localcontext() as ctx:
    ctx.prec = 50
    print(a / b)

0.5121951219512195121951219512
0.5122
0.51219512195121951219512195121951219512195121951220


In [31]:
nums = [1.23e+18, 1, -1.23e+18]
print(sum(nums))
import math
print(math.fsum(nums))

0.0
1.0


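#### 3.3 Formatting Numbers for Output
You need to format a number for output and control the number of digits, alignment, the thousands separator, and other details.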

In [40]:
x = 1234.5678
print(format(x, '0.2f'))
print(format(x, '>10.1f'))
print(format(x, '<10.1f'))
print(format(x, '^10.1f'))
print(format(x, ','))
print(format(x, '0,.1f'))
print(format(x, 'e'))
print(format(x, '0.2E'))

1234.57
    1234.6
1234.6    
  1234.6  
1,234.5678
1,234.6
1.234568e+03
1.23E+03


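#### 3.4 Working with Binary, Octal, and Hexadecimal Integers
You need to convert or output integers written in binary, octal, or hexadecimal digits.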

In [44]:
x = 1234
print(bin(x))
print(oct(x))
print(hex(x))
print(format(x, 'b'))
print(format(x, 'o'))
print(format(x, 'x'))

0b10011010010
0o2322
0x4d2
10011010010
2322
4d2


In [47]:
print(int('0b10011010010', 2))
print(int('0o2322', 8))
print(int('0x4d2', 16))

1234
1234
1234
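format() drops the 0b/0o/0x prefix unless you add the # option. In the other direction, int() can infer the base from the prefix if you pass base 0. A sketch:

```python
x = 1234
# '#' keeps the base prefix in the formatted output
print(format(x, '#b'), format(x, '#o'), format(x, '#x'))  # 0b10011010010 0o2322 0x4d2
# Base 0 tells int() to read the base from the prefix
print(int('0x4d2', 0), int('0o2322', 0), int('0b10011010010', 0))  # 1234 1234 1234
# Without a prefix, the base must be given explicitly
print(int('4d2', 16))  # 1234
```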


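#### 3.5 Packing and Unpacking Large Integers from Bytes
You have a byte string and need to unpack it into an integer. Alternatively, you need to convert a large integer back into a byte string.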

In [55]:
data = b'\x00\x124V\x00x\x90\xab\x00\xcd\xef\x01\x00#\x004'
print(len(data))
print(int.from_bytes(data, 'little'))
print(int.from_bytes(data, 'big'))
n = 94522842520747284487117727783387188
print(n.to_bytes(16, 'little'))
print(n.to_bytes(16, 'big'))

16
69120565665751139577663547927094891008
94522842520747284487117727783387188
b'4\x00#\x00\x01\xef\xcd\x00\xab\x90x\x00V4\x12\x00'
b'\x00\x124V\x00x\x90\xab\x00\xcd\xef\x01\x00#\x004'


In [58]:
import struct
hi, lo = struct.unpack('>QQ', data)
print(hi, lo)
print((hi << 64) + lo)

5124093560524971 57965157801984052
94522842520747284487117727783387188
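to_bytes() needs an explicit byte count and raises OverflowError if the number doesn't fit. For example, 523 ** 23 needs more than 16 bytes. One way to size the result is with int.bit_length(). In this sketch, int_to_bytes() is a hypothetical helper:

```python
def int_to_bytes(n, byteorder='big'):
    """Convert a non-negative int to the shortest byte string that holds it."""
    # Round the bit count up to whole bytes; 0 still needs one byte
    nbytes = max(1, (n.bit_length() + 7) // 8)
    return n.to_bytes(nbytes, byteorder)

x = 523 ** 23
b = int_to_bytes(x)
print(x.bit_length(), len(b))          # 208 26
print(int.from_bytes(b, 'big') == x)   # True, the round trip restores the value
```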


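#### 3.6 Performing Complex-Valued Math
Your code for the latest web authentication scheme has run into a problem, and the only way around it is through the complex plane. Or maybe you just need to perform some calculations with complex numbers.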

In [64]:
a = complex(2, 4)
b = 3 - 5j
print(a, b)
print(a.real, a.imag, a.conjugate())
print(a + b)
import cmath
print(cmath.sin(a))

(2+4j) (3-5j)
2.0 4.0 (2-4j)
(5-1j)
(24.83130584894638-11.356612711218174j)
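To get complex results from real inputs, use cmath: math.sqrt(-1) raises ValueError, while cmath.sqrt(-1) returns 1j. A sketch:

```python
import cmath
import math

print(cmath.sqrt(-1))  # 1j
print(abs(3 + 4j))     # 5.0, the magnitude of the complex number
try:
    math.sqrt(-1)
except ValueError:
    print('math.sqrt(-1) raised ValueError')
```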


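#### 3.7 Working with Infinity and NaNs
You need to create or test for the floating-point values positive infinity, negative infinity, and NaN (not a number).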

In [68]:
a = float('inf')
b = float('-inf')
c = float('nan')
print(a, b, c)
import math
print(math.isinf(b), math.isnan(c))

inf -inf nan
True True


In [74]:
print(a + 45, a * 2)
print(c + 45, c / 2)
d = float('nan')
print(c == d, c is d)

inf inf
nan nan
False False


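#### 3.8 Calculating with Fractions
You have stepped into a time machine and suddenly find yourself doing grade-school homework involving fractions. Or perhaps you need to write code that calculates measurements for your woodworking shop.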

In [78]:
from fractions import Fraction
a = Fraction(5, 4)
b = Fraction(7, 16)
print(a, b, a + b)
print(a * b)
print(a.numerator, a.denominator)
print(3.75.as_integer_ratio())

5/4 7/16 27/16
35/64
5 4
(15, 4)
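Fraction can also take the numerator and denominator from a float's as_integer_ratio(). limit_denominator() finds the closest fraction whose denominator does not exceed a bound. A sketch:

```python
from fractions import Fraction

c = Fraction(35, 64)
print(float(c))                # 0.546875
print(c.limit_denominator(8))  # 4/7, the closest fraction with denominator <= 8

x = 3.75
print(Fraction(*x.as_integer_ratio()))  # 15/4
```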


#### 3.9 Calculating with Large Numerical Arrays
You need to perform calculations on large numerical datasets, such as arrays or grids.

In [81]:
x = [1, 2, 3, 4]
y = [5, 6, 7, 8]
print(x * 2)
print(x + y)

[1, 2, 3, 4, 1, 2, 3, 4]
[1, 2, 3, 4, 5, 6, 7, 8]


In [86]:
import numpy as np

ax = np.array([1, 2, 3, 4])
ay = np.array([5, 6, 7, 8])
print(ax * 2)
print(ay + 10)
print(ax + ay)
print(ax * ay)

[2 4 6 8]
[15 16 17 18]
[ 6  8 10 12]
[ 5 12 21 32]


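#### 3.10 Performing Matrix and Linear Algebra Calculations
You need to perform matrix and linear algebra operations, such as matrix multiplication, computing determinants, and solving systems of linear equations.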

In [88]:
import numpy as np
m = np.matrix([[1,-2,3],[0,4,5],[7,8,-9]])
print(m)
print(m.T)
print(m.I)

[[ 1 -2  3]
 [ 0  4  5]
 [ 7  8 -9]]
[[ 1  0  7]
 [-2  4  8]
 [ 3  5 -9]]
[[ 0.33043478 -0.02608696  0.09565217]
 [-0.15217391  0.13043478  0.02173913]
 [ 0.12173913  0.09565217 -0.0173913 ]]
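The NumPy documentation no longer recommends np.matrix, even for linear algebra. Regular arrays with the @ operator and the np.linalg functions cover the same ground. A sketch with the same matrix; the vector v is made up:

```python
import numpy as np

# A plain ndarray instead of np.matrix
m = np.array([[1, -2, 3], [0, 4, 5], [7, 8, -9]])
print(m.T)                       # transpose
print(np.linalg.det(m))          # approximately -230.0

m_inv = np.linalg.inv(m)         # same values as m.I above
print(np.allclose(m @ m_inv, np.eye(3)))  # True

v = np.array([2, 3, 4])
x = np.linalg.solve(m, v)        # solves m @ x == v without forming the inverse
print(np.allclose(m @ x, v))     # True
```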


#### 3.11 Picking Things at Random
You want to pick random items from a sequence or generate random numbers.

In [89]:
import random
values = [1, 2, 3, 4, 5, 6]

In [107]:
# Pick one element at random
print(random.choice(values))

3


In [141]:
# Pick N elements at random (without replacement)
print(random.sample(values, 2))

[5, 2]


In [144]:
# Shuffle the sequence in place
random.shuffle(values)
print(values)

[6, 5, 2, 3, 1, 4]


In [195]:
# Generate a random integer N with 0 <= N <= 11 (both ends inclusive)
print(random.randint(0, 11))

9


In [218]:
# Generate a float uniformly distributed in [0.0, 1.0)
print(random.random())

0.5931006797746412


In [252]:
# Generate an integer made of N random bits
print(random.getrandbits(200))

403171516493486046159825657670799652904161931727364243720979


In [279]:
random.seed(1234)
print(random.randint(0, 11))

7
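The random module is a deterministic pseudo-random generator. That is what makes seed() reproducible, and it is also why the module should not be used for security purposes. The sketch below shows two seeded private generators and, for tokens, the secrets module (Python 3.6+):

```python
import random
import secrets

# Two private generators with the same seed produce the same sequence
rng1 = random.Random(1234)
rng2 = random.Random(1234)
seq1 = [rng1.randint(0, 11) for _ in range(5)]
seq2 = [rng2.randint(0, 11) for _ in range(5)]
print(seq1 == seq2)  # True

# For passwords or tokens, secrets draws from the operating system's secure source
token = secrets.token_hex(16)   # 16 random bytes rendered as 32 hex characters
print(len(token))               # 32
print(secrets.choice(['a', 'b', 'c']))
```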


#### 3.12 Basic Date and Time Conversions
You need to perform simple time conversions, such as days to seconds or hours to minutes.

In [282]:
from datetime import timedelta
a = timedelta(days=2, hours=6)
b = timedelta(hours=4.5)
c = a + b
print(c, c.days, c.seconds, c.total_seconds())

2 days, 10:30:00 2 37800 210600.0


In [9]:
from datetime import datetime, timedelta
a = datetime(2012, 9, 23)
print(a + timedelta(days=10))
b = datetime(2012, 12, 21)
d = b - a
print(d.days)
now = datetime.today()
print(now)
print(now + timedelta(months=10))  # timedelta has no months argument, so this raises TypeError

2012-10-03 00:00:00
89
2019-12-26 13:47:10.839754


TypeError: 'months' is an invalid keyword argument for __new__()

In [10]:
from dateutil.relativedelta import relativedelta
print(b + relativedelta(months=1))  # unlike timedelta, relativedelta supports months

2013-01-21 00:00:00


#### 3.13 Determining Last Friday's Date
You need to find the date of the most recent occurrence of a given day of the week, such as Friday.

In [1]:
from datetime import datetime, timedelta

weekdays = ['Monday', 'Tuesday', 'Wednesday',
            'Thursday', 'Friday', 'Saturday', 'Sunday']


def get_previous_byday(dayname, start_date=None):
    if start_date is None:
        start_date = datetime.today()
    day_num = start_date.weekday()
    day_num_target = weekdays.index(dayname)
    days_ago = (7 + day_num - day_num_target) % 7
    if days_ago == 0:
        days_ago = 7
    target_date = start_date - timedelta(days=days_ago)
    return target_date

In [2]:
print(datetime.today())
print(get_previous_byday('Monday'))
print(get_previous_byday('Tuesday'))
print(get_previous_byday('Friday'))

2019-12-27 13:28:08.297171
2019-12-23 13:28:08.297171
2019-12-24 13:28:08.297171
2019-12-20 13:28:08.297171


In [28]:
from datetime import datetime
from dateutil.relativedelta import relativedelta
from dateutil.rrule import FR, MO  # weekday constants usable with relativedelta

d = datetime.now()
print(d, d.weekday())
print(d + relativedelta(weekday=FR))
print(d + relativedelta(weekday=MO(-1)))

2019-12-26 14:19:08.634798 3
2019-12-27 14:19:08.634798
2019-12-23 14:19:08.634798


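#### 3.14 Finding the Date Range for the Current Month
Your code needs to loop over each day of the current month, and you want an efficient way to calculate that date range.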

In [32]:
from datetime import datetime, date, timedelta
import calendar


def get_month_range(start_date=None):
    if start_date is None:
        start_date = date.today()
    start_date = start_date.replace(day=1)  # replace() returns a new date; it does not modify in place
    _, days_in_month = calendar.monthrange(start_date.year, start_date.month)
    end_date = start_date + timedelta(days=days_in_month)
    return (start_date, end_date)

In [35]:
a_day = timedelta(days=1)
first_day, last_day = get_month_range()
print(first_day, last_day)
while first_day < last_day:
    print(first_day)
    first_day += a_day

2019-12-01 2020-01-01
2019-12-01
2019-12-02
2019-12-03
2019-12-04
2019-12-05
2019-12-06
2019-12-07
2019-12-08
2019-12-09
2019-12-10
2019-12-11
2019-12-12
2019-12-13
2019-12-14
2019-12-15
2019-12-16
2019-12-17
2019-12-18
2019-12-19
2019-12-20
2019-12-21
2019-12-22
2019-12-23
2019-12-24
2019-12-25
2019-12-26
2019-12-27
2019-12-28
2019-12-29
2019-12-30
2019-12-31


In [36]:
def date_range(start, stop, step):
    while start < stop:
        yield start
        start += step

In [37]:
for d in date_range(*get_month_range(), a_day):
    print(d)

2019-12-01
2019-12-02
2019-12-03
2019-12-04
2019-12-05
2019-12-06
2019-12-07
2019-12-08
2019-12-09
2019-12-10
2019-12-11
2019-12-12
2019-12-13
2019-12-14
2019-12-15
2019-12-16
2019-12-17
2019-12-18
2019-12-19
2019-12-20
2019-12-21
2019-12-22
2019-12-23
2019-12-24
2019-12-25
2019-12-26
2019-12-27
2019-12-28
2019-12-29
2019-12-30
2019-12-31


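#### 3.15 Converting Strings into Datetimes
Your application receives dates as strings, and you want to convert them into datetime objects so you can perform non-string operations on them.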

In [41]:
from datetime import datetime

text = '2019-09-20'
y = datetime.strptime(text, '%Y-%m-%d')
z = datetime.now()
diff = z - y
print(diff, y , z)
nice_z = z.strftime('%A %B %d %Y')
print(nice_z)

97 days, 14:44:50.827785 2019-09-20 00:00:00 2019-12-26 14:44:50.827785
Thursday December 26 2019


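#### 3.16 Manipulating Dates Involving Time Zones
You have a conference call scheduled for 9:30 a.m. on December 21, 2012, in Chicago. Your friend is in Bangalore, India. What local time should your friend join the call?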

In [46]:
from datetime import datetime
from pytz import timezone
import pytz

d = datetime(2019, 12, 21, 9, 30, 0)
print(d)
central = timezone('US/Central')
loc_d = central.localize(d)
print(loc_d)
bang_d = loc_d.astimezone(timezone('Asia/Kolkata'))
print(bang_d)

2019-12-21 09:30:00
2019-12-21 09:30:00-06:00
2019-12-21 21:00:00+05:30


In [47]:
utc_d = loc_d.astimezone(pytz.utc)
print(utc_d)

2019-12-21 15:30:00+00:00
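On Python 3.9 and later, the standard-library zoneinfo module does the same conversion without pytz's localize() step. This sketch assumes that the system time zone database, or the tzdata package, is available:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9

# With zoneinfo the zone can be attached directly through tzinfo=
d = datetime(2019, 12, 21, 9, 30, tzinfo=ZoneInfo('America/Chicago'))
print(d)                                       # 2019-12-21 09:30:00-06:00
print(d.astimezone(ZoneInfo('Asia/Kolkata')))  # 2019-12-21 21:00:00+05:30
print(d.astimezone(timezone.utc))              # 2019-12-21 15:30:00+00:00
```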


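### Chapter 4: Iterators and Generators
#### 4.1 Manually Consuming an Iterator
You want to iterate over all the items in an iterable without using a for loop.
- Call next() repeatedly and catch the StopIteration exception, which signals the end of iteration.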

In [3]:
items = [1, 2, 3]
it = iter(items)

In [4]:
next(it)

1

In [5]:
next(it)

2

In [6]:
next(it)

3

In [7]:
next(it)

StopIteration: 
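In practice, StopIteration is usually caught inside a loop, or avoided altogether by passing a default value to next(). A sketch; manual_iter() is a hypothetical helper name:

```python
def manual_iter(iterable):
    """Collect every item from an iterable without using a for loop."""
    it = iter(iterable)
    items = []
    while True:
        try:
            item = next(it)
        except StopIteration:
            break  # the iterator is exhausted
        items.append(item)
    return items

print(manual_iter([1, 2, 3]))  # [1, 2, 3]

# With a default, next() returns it instead of raising StopIteration
it = iter([1])
print(next(it, None), next(it, None))  # 1 None
```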

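#### 4.2 Delegating Iteration
You have built a custom container object that internally holds a list, tuple, or some other iterable, and you want iteration to work directly on the new container.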

In [8]:
class Node:
    def __init__(self, value):
        self._value = value
        self._children = []
    
    def __repr__(self):
        return 'Node({!r})'.format(self._value)
    
    def add_child(self, node):
        self._children.append(node)
    
    def __iter__(self):
        return iter(self._children)

In [11]:
root = Node(0)
child1 = Node(1)
child2 = Node(2)
root.add_child(child1)
root.add_child(child2)
for ch in root:
    print(ch)

Node(1)
Node(2)


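#### 4.3 Creating New Iteration Patterns with Generators
You want to implement a custom iteration pattern that differs from built-in functions such as range() and reversed().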

In [12]:
def frange(start, stop, increment):
    x = start
    while x < stop:
        yield x
        x += increment

In [13]:
for n in frange(0, 4, 0.5):
    print(n)

0
0.5
1.0
1.5
2.0
2.5
3.0
3.5


In [14]:
print(list(frange(0, 4, 0.5)))

[0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
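Because frange() adds the increment over and over, floating-point error accumulates. With a step of 0.1, the running total reaches 0.9999999999999999 instead of 1.0, so the loop yields one extra value. The sketch below compares it with a hypothetical variant, frange_indexed(), that computes each value from its index. The variant assumes a positive increment:

```python
import math

def frange(start, stop, increment):
    # Accumulating version from above: each addition can add rounding error
    x = start
    while x < stop:
        yield x
        x += increment

def frange_indexed(start, stop, increment):
    # Hypothetical variant: the count is fixed up front, and each value is computed
    # as start + i * increment, so errors don't build up (assumes increment > 0)
    n = math.ceil((stop - start) / increment)
    for i in range(n):
        yield start + i * increment

print(len(list(frange(0, 1, 0.1))))          # 11, the last value is 0.9999999999999999
print(len(list(frange_indexed(0, 1, 0.1))))  # 10
```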


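#### 4.4 Implementing the Iterator Protocol
You are building custom objects that should support iteration, and you want an easy way to implement the iterator protocol.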

In [15]:
class Node1:
    def __init__(self, value):
        self._value = value
        self._children = []
    
    def __repr__(self):
        return 'Node({!r})'.format(self._value)
    
    def add_child(self, node):
        self._children.append(node)
    
    def __iter__(self):
        return iter(self._children)
    
    def depth_first(self):
        yield self
        for c in self:
            yield from c.depth_first()

In [16]:
root = Node1(0)
child1 = Node1(1)
child2 = Node1(2)
root.add_child(child1)
root.add_child(child2)
child1.add_child(Node1(3))
child1.add_child(Node1(4))
child2.add_child(Node1(5))
for c in root.depth_first():
    print(c)

Node(0)
Node(1)
Node(3)
Node(4)
Node(2)
Node(5)
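depth_first() relies on a generator, which implements the iterator protocol for you. For comparison, here is the protocol written out by hand: __iter__() returns the iterator itself, and __next__() raises StopIteration when there are no more items. CountdownIterator is a hypothetical example class:

```python
class CountdownIterator:
    """Counts down from start to 1 using an explicit __next__ method."""

    def __init__(self, start):
        self._n = start

    def __iter__(self):
        # An iterator returns itself, so it can be used directly in a for loop
        return self

    def __next__(self):
        if self._n <= 0:
            raise StopIteration  # signals that iteration is finished
        value = self._n
        self._n -= 1
        return value

print(list(CountdownIterator(3)))  # [3, 2, 1]
```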


#### 4.5 Iterating in Reverse
You want to iterate over a sequence in reverse.

In [17]:
a = [1, 2, 4, 5]
for x in reversed(a):
    print(x)

5
4
2
1


In [19]:
class Countdown:
    def __init__(self, start):
        self.start = start
    
    def __iter__(self):
        n = self.start
        while n > 0:
            yield n
            n -= 1
    
    def __reversed__(self):
        n = 1
        while n <= self.start:
            yield n
            n += 1

In [20]:
for r in reversed(Countdown(5)):
    print(r)

1
2
3
4
5


In [21]:
for r in Countdown(5):
    print(r)

5
4
3
2
1


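#### 4.6 Defining Generator Functions with Extra State
You want to define a generator function, but it involves extra state that you would like to expose to the user.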
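One way to do this is to write the generator as the __iter__() method of a class, so the extra state lives in instance attributes. In this sketch, LineHistory and its history attribute are illustrative names. The class remembers the most recent lines it has produced:

```python
from collections import deque

class LineHistory:
    """Iterable over lines that remembers the last few (lineno, line) pairs it produced."""

    def __init__(self, lines, histlen=3):
        self.lines = lines
        self.history = deque(maxlen=histlen)  # extra state exposed to the caller

    def __iter__(self):
        # A generator method can update instance attributes as it yields
        for lineno, line in enumerate(self.lines, 1):
            self.history.append((lineno, line))
            yield line

    def clear(self):
        self.history.clear()

lines = ['spam', 'eggs', 'python rocks', 'ham']
lh = LineHistory(lines)
snapshots = []
for line in lh:
    if 'python' in line:
        snapshots.append(list(lh.history))
print(snapshots)  # [[(1, 'spam'), (2, 'eggs'), (3, 'python rocks')]]
```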
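#### 4.7 Taking a Slice of an Iterator
You want to take a slice of the data produced by an iterator, but the normal slicing operator doesn't work on iterators.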

In [22]:
def count1(n):
    while True:
        yield n
        n += 1

In [23]:
c = count1(0)
print(c[10:20])

TypeError: 'generator' object is not subscriptable

In [34]:
from itertools import islice

for x in islice(c, 10, 15):
    print(x)

10
11
12
13
14


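#### 4.8 Skipping the First Part of an Iterable
You want to iterate over the items in an iterable, but the first few items aren't of interest and you want to skip them.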

In [71]:
c = count1(0)

In [73]:
from itertools import dropwhile

for n in islice(dropwhile(lambda x: x<10, c), 0, 5):
    print(n)

10
11
12
13
14


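#### 4.9 Iterating Over All Possible Combinations or Permutations
You want to iterate over all possible combinations or permutations of a collection of items.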

In [74]:
from itertools import permutations

items = ['a', 'b', 'c']
for p in permutations(items):
    print(p)

('a', 'b', 'c')
('a', 'c', 'b')
('b', 'a', 'c')
('b', 'c', 'a')
('c', 'a', 'b')
('c', 'b', 'a')


In [75]:
for p in permutations(items, 2):
    print(p)

('a', 'b')
('a', 'c')
('b', 'a')
('b', 'c')
('c', 'a')
('c', 'b')


In [78]:
from itertools import combinations

for c in combinations(items, 3):
    print(c)

('a', 'b', 'c')


In [80]:
for c in combinations(items, 2):
    print(c)

('a', 'b')
('a', 'c')
('b', 'c')
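If the same item may appear more than once in a combination, itertools.combinations_with_replacement() handles that case. A sketch:

```python
from itertools import combinations_with_replacement

items = ['a', 'b', 'c']
# Like combinations(), but each element may be chosen again
combos = list(combinations_with_replacement(items, 2))
print(combos)
# [('a', 'a'), ('a', 'b'), ('a', 'c'), ('b', 'b'), ('b', 'c'), ('c', 'c')]
```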


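#### 4.10 Iterating Over the Index-Value Pairs of a Sequence
You want to iterate over a sequence while keeping track of the index of the element being processed.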

In [91]:
my_list = ['a', 'b', 'c']
for i, v in enumerate(my_list):
    print(i, v)

0 a
1 b
2 c


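#### 4.11 Iterating Over Multiple Sequences Simultaneously
You want to iterate over several sequences at once, taking one item from each on every step.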

In [93]:
xpts = [1, 5, 4, 2, 10]
ypts = [101, 78, 37, 15, 62, 99]
for x, y in zip(xpts, ypts):
    print(x, y)

1 101
5 78
4 37
2 15
10 62


In [94]:
from itertools import zip_longest
for x, y in zip_longest(xpts, ypts, fillvalue=0):
    print(x, y)

1 101
5 78
4 37
2 15
10 62
0 99


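#### 4.12 Iterating on Items in Separate Containers
You need to perform the same operation on many objects that live in different containers, and you want to avoid writing repetitive loops without sacrificing readability.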

In [95]:
from itertools import chain

a = [1, 2, 3, 4]
b = ['x', 'y', 'z']
for x in chain(a, b):
    print(x)

1
2
3
4
x
y
z


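#### 4.13 Creating Data Processing Pipelines
You want to process data iteratively as a pipeline, similar to a Unix pipe. For example, you have a huge amount of data to process and it can't all fit in memory at once.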
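Chained generators suit this case, because each stage pulls one item at a time from the stage before it and no stage ever builds the full dataset. Below is a sketch over a small in-memory log. The stage functions and the log format are hypothetical, and for a real file gen_lines() would read lines from the open file instead:

```python
import re

def gen_lines(text):
    # Stage 1: yield lines one at a time (a stand-in for reading a large file)
    for line in text.splitlines():
        yield line

def gen_grep(pattern, lines):
    # Stage 2: pass through only the lines that match a regex
    pat = re.compile(pattern)
    for line in lines:
        if pat.search(line):
            yield line

def gen_fields(lines, index):
    # Stage 3: pick one whitespace-separated field from each line
    for line in lines:
        yield line.split()[index]

log = """GET /index.html 200
GET /missing 404
POST /login 200
GET /old 404"""

# Nothing runs until list() starts pulling items through the chain
pipeline = gen_fields(gen_grep(r' 404$', gen_lines(log)), 1)
result = list(pipeline)
print(result)  # ['/missing', '/old']
```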
#### 4.14 Flattening a Nested Sequence
You have a nested sequence that you want to flatten into a single list of values.

In [2]:
from collections.abc import Iterable

def flatten(items, ignore_types=(str, bytes)):
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, ignore_types):
            yield from flatten(x, ignore_types)  # pass ignore_types down to the recursive call
        else:
            yield x

In [3]:
items = [1, 2, [3, 4, [5, 6], 7], 8]
for x in flatten(items):
    print(x)

1
2
3
4
5
6
7
8


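#### 4.15 Iterating in Sorted Order Over Merged Sorted Iterables
You have a collection of sorted sequences and want to iterate over all of them as one merged, sorted sequence.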

In [4]:
import heapq
a = [1, 4, 5, 10]
b = [2, 6, 7, 34]
for c in heapq.merge(a, b):
    print(c)

1
2
4
5
6
7
10
34


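#### 4.16 Replacing Infinite while Loops with an Iterator
Your code uses a while loop to process data because it needs to call a function, or because its test condition doesn't fit the usual iteration pattern. Can the loop be rewritten with an iterator?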

In [7]:
with open('passwd') as f:
    # iter(callable, sentinel) calls f.read(10) repeatedly until it returns '' at end of file
    for chunk in iter(lambda: f.read(10), ''):
        print(chunk)

sdfasdf
sd

fas
d
f
w

e
g
x
g
s

dg
s
df
a

sdfas
dg



