# Data Structures and Algorithms

## 1.1 Unpacking a Sequence into Separate Variables

### 1.1 Unpacking a Tuple or Sequence of N Elements into N Separate Variables

In [1]:
from myfunc import *

In [2]:
values = (1, 3, 5)  # avoid naming this `vars`, which shadows the built-in vars()
x, y, z = values
print(x, y, z)

1 3 5


In [3]:
alice = ['Alice', 'male', '20']
name, gender, age = alice
print(name, gender, age)

Alice male 20


In [4]:
data = ['ACME', 50, 91.1, (2012, 12, 21)]
_, shares, price, _ = data  # use a throwaway name such as _ for values you don't need
print(shares, price)

50 91.1
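Unpacking is not limited to tuples and lists: it works on any iterable. The sketch below (with made-up values) also shows what happens when the number of targets does not match the number of elements: Python raises `ValueError`. The exact wording of the message varies between Python versions, so it is not shown here.

```python
# Unpacking works on any iterable, including strings
a, b, c = 'xyz'
print(a, b, c)  # x y z

# A mismatch between the number of targets and elements raises ValueError
try:
    p, q = (1, 2, 3)
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```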


### 1.2 Unpacking Elements from Iterables of Arbitrary Length

In [5]:
import random


def avg(li):
    return sum(li) / len(li)

def drop_first_last():
    grades = gen_random(10)
    grades.sort()
    print(grades)
    first, *middle, last = grades  # note the use of the starred expression
    print('after drop first and last:', middle)
    return avg(middle)

average = drop_first_last()  # don't rebind `avg`, which would shadow the helper above
print('the average score is', average)

[21, 37, 38, 41, 60, 68, 73, 80, 89, 91]
after drop first and last: [37, 38, 41, 60, 68, 73, 80, 89]
the average score is 60.75
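A few properties of the starred target are worth checking directly. In this small sketch (values are arbitrary), the starred variable is always a list, even when it captures nothing, and it can appear at the front, middle, or end of the target list.

```python
# The starred target always receives a list, even when it captures nothing
first, *middle, last = [1, 2]
print(first, middle, last)  # 1 [] 2

# Unpacking also works on any iterable, such as a string
head, *rest = 'hello'
print(head, rest)           # h ['e', 'l', 'l', 'o']

# A starred target can also collect everything except the last item
*init, final = range(4)
print(init, final)          # [0, 1, 2] 3
```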


In [6]:
records = [
    ('move', 1, 1),
    ('attack', 'ha!'),
    ('level up',)  # the trailing comma makes this a one-element tuple, not a plain string
]

def move(x, y):
    print('you are now at ({}, {})'.format(x, y))
        
def attack(info):
    print('attack!', info)

def level_up():
    print('Level up! congratulations!')
    
for order, *args in records:
    if order == 'move':
        move(*args)
    elif order == 'attack':
        attack(*args)
    else:
        level_up()

you are now at (1, 1)
attack! ha!
Level up! congratulations!
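The `if`/`elif` chain above can also be written as a lookup table that maps each tag to its handler. This is a sketch of an alternative, not the original code: the handlers here are hypothetical and return strings instead of printing, so the results are easy to inspect. Remember that a one-element tuple needs a trailing comma, as in `('level up',)`; without it the record is just a string and would be unpacked character by character.

```python
def move(x, y):
    return 'you are now at ({}, {})'.format(x, y)

def attack(info):
    return 'attack! ' + info

def level_up():
    return 'Level up! congratulations!'

# Hypothetical dispatch table: tag -> handler function
handlers = {'move': move, 'attack': attack, 'level up': level_up}

records = [('move', 1, 1), ('attack', 'ha!'), ('level up',)]

messages = []
for order, *args in records:
    # Look up the handler by its tag and pass the remaining fields as arguments
    messages.append(handlers[order](*args))

for message in messages:
    print(message)
```

A dictionary keeps the dispatch logic in one place, and adding a new command only needs a new entry rather than another branch.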


In [7]:
record = ['ACME', 50, 123.45, (12, 18, 2012)]
name, *_, (*_, year) = record
print(name, year)

ACME 2012


In [9]:
items = gen_unique_random(8)
print(items)
head, *tail = items
print(head, tail)

[47, 57, 78, 9, 12, 40, 97, 31]
47 [57, 78, 9, 12, 40, 97, 31]


In [15]:
def _sum(items):
    '''A neat recursive sum built on unpacking (clever, but not practical).
    '''
    head, *tail = items
    return head + _sum(tail) if tail else head

items = gen_unique_random(10)
assert sum(items) == _sum(items)
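Why is `_sum` not practical? Each element costs one level of recursion (and `*tail` copies the rest of the list every time), so a list longer than the interpreter's recursion limit fails, and an empty list cannot be split into a head and tail at all. The sketch below demonstrates both failure modes, using `sys.getrecursionlimit()` to pick a list that is just too long.

```python
import sys

def _sum(items):
    head, *tail = items
    return head + _sum(tail) if tail else head

# One recursion level per element: a list longer than the limit fails
big = list(range(sys.getrecursionlimit() + 100))
try:
    _sum(big)
    deep_failed = False
except RecursionError:
    deep_failed = True

# An empty list cannot be unpacked into head and *tail
try:
    _sum([])
    empty_failed = False
except ValueError:
    empty_failed = True

print(deep_failed, empty_failed)  # True True
print(sum(big))                   # the built-in has no such limit
```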

### 1.3 Keeping the Last N Items

In [34]:
from collections import deque

def search(lines, pattern, history=5):
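    previous_lines = deque(maxlen=history)  # holds at most `history` lines seen before the current one
    for line in lines:
        if pattern in line:
            yield line, previous_lines  # on a match, yield the line together with its preceding lines
        previous_lines.append(line)  # then record the current line, whether or not it matched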

In [35]:
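with open('myfunc.py') as f:
    for line, previous_lines in search(f, 'def', 1):
        # compute the slice bounds first: the loop below uses them, so they must exist before it runs
        start = line.find(' ')  # first space, right after 'def'
        end = line.find('(')    # opening parenthesis of the parameter list
        for pline in previous_lines:
            print(pline[start: end], end='')
        print(line[start: end])
        print('-' * 20)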

 cal_time
--------------------
 gen_random
--------------------
 gen_unique_random
--------------------
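Since `myfunc.py` is not shown here, the sketch below runs the same `search()` generator over a hypothetical in-memory list of lines. It also highlights a subtlety: the generator yields the *same* deque object every time and keeps mutating it, so if you want to keep the context lines around, take a snapshot (for example with `list(prev)`) at the moment each match is produced.

```python
from collections import deque

def search(lines, pattern, history=5):
    previous_lines = deque(maxlen=history)
    for line in lines:
        if pattern in line:
            yield line, previous_lines
        previous_lines.append(line)

# Hypothetical sample text standing in for a file
text = [
    'alpha\n',
    'beta\n',
    'python is here\n',
    'gamma\n',
    'delta\n',
    'more python\n',
]

# list(prev) snapshots the deque at each match, before the generator resumes
results = [(line, list(prev)) for line, prev in search(text, 'python', history=2)]

for line, prev in results:
    for pline in prev:
        print(pline, end='')
    print(line, end='')
    print('-' * 20)

# Without a snapshot, every yielded entry refers to one shared deque
shared = list(search(text, 'python', history=2))
print(shared[0][1] is shared[1][1])  # True
```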


**More on deque**

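Create a fixed-length queue. Once the queue is full, appending a new item automatically discards the oldest one: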

In [38]:
q = deque(maxlen=3)
for i in range(3):
    q.append(i)

print(q)

q.append(3)
print(q)

deque([0, 1, 2], maxlen=3)
deque([1, 2, 3], maxlen=3)
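A bounded deque discards from the end *opposite* to the one you add to. In this short sketch, `append()` on a full deque drops the leftmost item, while `appendleft()` drops the rightmost one.

```python
from collections import deque

q = deque(maxlen=3)
q.extend([1, 2, 3])
q.append(4)      # full: the leftmost item (1) is discarded
print(q)         # deque([2, 3, 4], maxlen=3)
q.appendleft(0)  # full: the rightmost item (4) is discarded
print(q)         # deque([0, 2, 3], maxlen=3)
```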


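Without a `maxlen` argument, `deque()` creates an unbounded queue that supports adding and removing items at both ends: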

In [39]:
q = deque()
for i in range(1, 4):
    q.append(i)
    
print(q)
q.appendleft(4)
print(q)
assert q.pop() == 3
assert q.popleft() == 4

deque([1, 2, 3])
deque([4, 1, 2, 3])
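One behavior of the two-ended operations that often surprises people: `extendleft()` adds items one at a time on the left, so they end up in reverse order. The sketch below uses arbitrary numbers to show this along with `pop()` and `popleft()`.

```python
from collections import deque

q = deque([1, 2, 3])
q.extendleft([4, 5])    # 4 goes on the left first, then 5 to its left
print(q)                # deque([5, 4, 1, 2, 3])

right = q.pop()         # removed from the right end
left = q.popleft()      # removed from the left end
print(right, left, q)   # 3 5 deque([4, 1, 2])
```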


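Adding or popping items at either end of a deque is an $O(1)$ operation. Lists are different: inserting or removing an item at the front of a list is $O(N)$, because every remaining element has to be shifted.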

In [43]:
def test_insert_head(obj):
    for i in range(100000):
        obj.insert(0, i)
        
li = gen_random(10000)
q = deque(li)

cal_time(test_insert_head, li)
cal_time(test_insert_head, q)

total time 3.4121339321136475
total time 0.019765377044677734
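`cal_time()` and `gen_random()` come from the `myfunc` module, which is not shown. As a self-contained alternative, the sketch below uses the standard library's `timeit` to compare removing items from the front of a list versus a deque. The sizes (`N`, `number=3`) are arbitrary choices, and the absolute timings will depend on your machine, so none are shown.

```python
from collections import deque
from timeit import timeit

N = 10000  # arbitrary size chosen for illustration

def drain_list():
    items = list(range(N))
    while items:
        items.pop(0)     # O(N): all remaining items shift one slot left

def drain_deque():
    items = deque(range(N))
    while items:
        items.popleft()  # O(1): only the left end changes

list_time = timeit(drain_list, number=3)
deque_time = timeit(drain_deque, number=3)
print('list.pop(0):     {:.4f}s'.format(list_time))
print('deque.popleft(): {:.4f}s'.format(deque_time))
```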


### 1.4 Finding the Largest or Smallest N Items

The `heapq` module has two functions for this: `nlargest()` and `nsmallest()`.

In [47]:
import heapq

nums = gen_unique_random(10, -10, 10)
print(nums)
print(heapq.nlargest(3, nums))
print(heapq.nsmallest(3, nums))

[1, -8, 7, 9, -4, -9, 0, 2, 4, 10]
[10, 9, 7]
[-9, -8, -4]
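Because `gen_unique_random()` is not shown, here is the same idea with a fixed, made-up list, so the results are reproducible. It also shows that `nlargest()`/`nsmallest()` give the same answers as a full sort followed by a slice.

```python
import heapq

nums = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]  # made-up sample data

print(heapq.nlargest(3, nums))         # [42, 37, 23]
print(heapq.nsmallest(3, nums))        # [-4, 1, 2]

# Equivalent results via a full sort
print(sorted(nums, reverse=True)[:3])  # [42, 37, 23]
print(sorted(nums)[:3])                # [-4, 1, 2]
```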


Both functions accept a `key` parameter, which lets them work with more complex data structures.

In [58]:
from collections import namedtuple
from pprint import pprint

People = namedtuple('People', 'name age salary birth')
Tom = People('Tom', 23, 5000, '1995-07-02')
Alice = People('Alice', 21, 5500, '1997-11-20')
John = People('John', 27, 9000, '1991-01-31')
Bob = People('Bob', 38, 12000, '1980-09-10')
workers = [Tom, Alice, John, Bob]

pprint(heapq.nsmallest(2, workers, key=lambda x: x.salary))
pprint(heapq.nlargest(2, workers, key=lambda x: x.birth))  # ISO date strings sort chronologically
pprint(heapq.nsmallest(3, workers, key=lambda x: x.name))

[People(name='Tom', age=23, salary=5000, birth='1995-07-02'),
 People(name='Alice', age=21, salary=5500, birth='1997-11-20')]
[People(name='Alice', age=21, salary=5500, birth='1997-11-20'),
 People(name='Tom', age=23, salary=5000, birth='1995-07-02')]
[People(name='Alice', age=21, salary=5500, birth='1997-11-20'),
 People(name='Bob', age=38, salary=12000, birth='1980-09-10'),
 People(name='John', age=27, salary=9000, birth='1991-01-31')]
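The `key` parameter works just as well with dictionaries. In this sketch (the portfolio data is made up for illustration), `operator.itemgetter('price')` plays the role of `lambda s: s['price']`.

```python
import heapq
from operator import itemgetter

# Made-up sample records
portfolio = [
    {'name': 'IBM', 'shares': 100, 'price': 91.1},
    {'name': 'AAPL', 'shares': 50, 'price': 543.22},
    {'name': 'FB', 'shares': 200, 'price': 21.09},
    {'name': 'HPQ', 'shares': 35, 'price': 31.75},
]

# itemgetter('price') returns a callable equivalent to lambda s: s['price']
cheap = heapq.nsmallest(2, portfolio, key=itemgetter('price'))
expensive = heapq.nlargest(2, portfolio, key=itemgetter('price'))

print([s['name'] for s in cheap])      # ['FB', 'HPQ']
print([s['name'] for s in expensive])  # ['AAPL', 'IBM']
```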


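If you are looking for the N smallest or largest items, and N is small compared with the total size of the collection, the following approach can give better performance: first use `heapq.heapify()` to rearrange the data, in place, into a list ordered as a heap.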

In [59]:
nums = gen_random(10, -10, 10)
print(nums)

heap = list(nums)
heapq.heapify(heap)
print(heap)

[4, -6, 3, -9, -8, -6, -2, -1, -7, -6]
[-9, -8, -6, -7, -6, 3, -2, -1, -6, 4]
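The heapified list above is not sorted; it only satisfies the min-heap invariant: every item at index `k` is less than or equal to its children at indexes `2k+1` and `2k+2`. The sketch below reuses the sample output from above as fixed input and checks that invariant with a small helper, `is_heap()`, written here for illustration (it is not part of `heapq`). Note that `heapify()` modifies the list in place and returns `None`.

```python
import heapq

nums = [4, -6, 3, -9, -8, -6, -2, -1, -7, -6]
heap = list(nums)
result = heapq.heapify(heap)  # rearranges `heap` in place; returns None

def is_heap(h):
    # Min-heap invariant: each parent h[k] is <= its children h[2k+1] and h[2k+2]
    return all(
        h[k] <= h[child]
        for k in range(len(h))
        for child in (2 * k + 1, 2 * k + 2)
        if child < len(h)
    )

print(result, is_heap(heap), heap[0] == min(nums))  # None True True
```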


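The most important property of a heap is that `heap[0]` is always the smallest item. The following items can be found easily with `heapq.heappop()`, which pops off the smallest item and replaces it with the next smallest (an $O(\log N)$ operation, where $N$ is the size of the heap).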

In [60]:
for i in range(3):
    print(heapq.heappop(heap))

-9
-8
-7
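A heap also lets you track the N largest items of a stream without storing everything. The sketch below is one way to do it, assuming the items are mutually comparable; `top_n()` is a hypothetical helper, not a `heapq` function. It keeps a min-heap of at most `n` items, so `heap[0]` is the smallest of the current top N and is the one to evict when a larger item arrives.

```python
import heapq

def top_n(iterable, n):
    # Hypothetical helper: the n largest items, in descending order
    if n <= 0:
        return []
    heap = []  # min-heap holding the n largest items seen so far
    for item in iterable:
        if len(heap) < n:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            heapq.heapreplace(heap, item)  # pop the smallest and push item in one O(log n) step
    return sorted(heap, reverse=True)

data = [4, -6, 3, -9, -8, -6, -2, -1, -7, -6]
print(top_n(data, 3))           # [4, 3, -1]
print(heapq.nlargest(3, data))  # [4, 3, -1]
```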


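The `nlargest()` and `nsmallest()` functions are most appropriate when the number of items you are looking for is relatively small. If you only need the single smallest or largest item, `min()` and `max()` are faster. If N is close to the size of the collection itself, it is usually faster to sort the collection first and then take a slice (for example, `sorted(items)[:N]` or `sorted(items)[-N:]`).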
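Side by side, the three options above look like this on a small, made-up list. Note that the sort-and-slice form for the largest items returns them in ascending order, whereas `nlargest()` returns them in descending order.

```python
import heapq

items = [5, 1, 9, 3, 7, 2, 8]  # made-up sample data
N = 3

# Single smallest/largest item: min() and max()
print(min(items), max(items))      # 1 9

# N small relative to len(items): heapq
print(heapq.nsmallest(N, items))   # [1, 2, 3]
print(heapq.nlargest(N, items))    # [9, 8, 7]

# N close to len(items): sort once, then slice
print(sorted(items)[:N])           # [1, 2, 3]
print(sorted(items)[-N:])          # [7, 8, 9]
```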