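Chapter 2: An Array of Sequences

2.1 Overview of Built-In Sequence Types

Container sequences: list, tuple, and collections.deque (a double-ended queue). These can hold items of different types.

Flat sequences: str, bytes, bytearray, memoryview, and array.array. These hold items of a single type.

Mutable sequences: list, bytearray, array.array, collections.deque, and memoryview.

Immutable sequences: tuple, str, and bytes.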
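
The mutable/immutable grouping can be checked at run time. The following is a minimal sketch, assuming that registration with collections.abc.MutableSequence is an acceptable proxy for "mutable sequence"; the helper name is_mutable_sequence is hypothetical, and only a few of the listed types are checked.

```python
from collections import abc, deque


def is_mutable_sequence(cls):
    """Hypothetical helper: True if cls counts as a MutableSequence."""
    return issubclass(cls, abc.MutableSequence)


# Types expected to be mutable sequences
mutable = [cls.__name__ for cls in (list, bytearray, deque)
           if is_mutable_sequence(cls)]
# Types expected to be immutable sequences
immutable = [cls.__name__ for cls in (tuple, str)
             if not is_mutable_sequence(cls)]

print(mutable)
print(immutable)
```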

![chapter2_1](chapter2_1.jpg)

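2.2 List Comprehensions and Generator Expressions

A list comprehension is a concise way to build a list, while a generator expression can be used to build any other type of sequence. Both are usually more readable, and often more efficient, than the equivalent explicit loop.

2.2.1 List Comprehensions and Readability

Turn a string into a list of Unicode code points.

Which is easier to read, Example 2-1 or Example 2-2?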

In [1]:
symbols = '%^$*)*'
codes=[]
for symbol in symbols:
    codes.append(ord(symbol))
codes

[37, 94, 36, 42, 41, 42]

In [2]:
symbols='%^$*)*'
codes=[ord(symbol) for symbol in symbols]
codes

[37, 94, 36, 42, 41, 42]

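Variable leakage from list comprehensions in Python 2.x

In Python 3, list comprehensions, generator expressions, and the similar set and dict comprehensions each have their own local scope.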
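
The scoping rule can be seen with a short experiment. This is a sketch for Python 3 only; the name x is reused on purpose for illustration and does not come from the notes above.

```python
x = 'ABC'
# The x bound by the "for" clause lives only inside the comprehension.
codes = [ord(x) for x in x]

print(x)      # the outer x still refers to the original string
print(codes)
```

In Python 2.x the same list comprehension would leave x bound to the last character instead.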

2.2.2 List Comprehensions Versus filter and map

In [5]:
symbols='%^$*)*'
codes1=[ord(s) for s in symbols if ord(s) >0]
print(codes1)
codes=list(filter(lambda c:c>0,map(ord,symbols)))  # in Python 3, map and filter return lazy iterators, hence the list() call
codes

[37, 94, 36, 42, 41, 42]


[37, 94, 36, 42, 41, 42]

2.2.3 Cartesian Products

In [6]:
colors=['black','white']
sizes=['S','M','L']
tshirt=[(color,size) for color in colors for size in sizes]
tshirt

[('black', 'S'),
 ('black', 'M'),
 ('black', 'L'),
 ('white', 'S'),
 ('white', 'M'),
 ('white', 'L')]

A list comprehension does one thing only: it builds a list.

2.2.4 Generator Expressions (items are yielded one at a time, saving memory)

In [9]:
symbols='%^$*)*'
tuple(ord(symbol) for symbol in symbols)

(37, 94, 36, 42, 41, 42)

In [10]:
import  array
array.array('I',(ord(symbol) for symbol in symbols))

array('I', [37, 94, 36, 42, 41, 42])

In [13]:
for tshirt in ('%s %s' %(c,s) for c in colors for s in sizes):
    print(tshirt)

black S
black M
black L
white S
white M
white L
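
The memory saving can be made visible with sys.getsizeof. This is only a rough sketch: getsizeof reports the size of the container object itself, not of the items it refers to, and the exact numbers depend on the interpreter. The variable names are illustrative.

```python
import sys

squares_list = [n * n for n in range(10_000)]  # every item is built up front
squares_gen = (n * n for n in range(10_000))   # items are produced on demand

list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)
print(list_size > gen_size)

# The generator can still feed any consumer that accepts an iterable.
total = sum(squares_gen)
print(total == sum(squares_list))
```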


2.3 Tuples Are Not Just Immutable Lists

2.3.1 Tuples as Records

In [14]:
lax_coordinates=(33.9425,-118.408056)
city,year,pop,chg,area=('Tokyo',2003,32450,0.66,8014)
traveler_ids=[('USA','31195855'),('BPA','CE342567'),('ESP','XDA205856')]
for passport in sorted(traveler_ids):
    print('%s/%s'%passport)

BPA/CE342567
ESP/XDA205856
USA/31195855


In [15]:
for country,_ in traveler_ids:
    print(country)

USA
BPA
ESP


2.3.2 Tuple Unpacking

In [16]:
latitude,longitude=(33.9425,-118.408056)  # tuple unpacking

In [17]:
divmod(20,8)  # quotient and remainder

(2, 4)

In [19]:
t=(20,8)
divmod(*t)  # the * operator unpacks an iterable into separate function arguments

(2, 4)

In [20]:
import  os
_,filename=os.path.split('/home/luciano/.ssh/idea.pub')
filename

'idea.pub'

In [21]:
a,b,*rest = range(5)
a,b,rest

(0, 1, [2, 3, 4])

In [24]:
a,*rest,b = range(2)
a,rest,b

(0, [], 1)

2.3.3 Nested Tuple Unpacking

In [30]:
metro_areas=[
    ('Tokyo','JP',36.933,(35.689722,139.691667)),
    ('Delhi NCR','IN',21.935,(28.613889,77.208889)),
    ('Mexico City', 'MX', 20.142, (19.433333, -99.133333)),
    ('New York-Newark', 'US', 20.104, (40.808611, -74.020386)),
    ('Sao Paulo', 'BR', 19.649, (-23.547778, -46.635833)),
]
print('{:15}|{:^9}|{:^9}'.format('','lat.','long.'))
fmt = '{:15}|{:9.4f}|{:9.4f}'
for name,cc,pop,(latitude,longitude) in metro_areas:
    if longitude<=0:  # limit output to the Western Hemisphere
        print(fmt.format(name,latitude,longitude))


               |  lat.   |  long.  
Mexico City    |  19.4333| -99.1333
New York-Newark|  40.8086| -74.0204
Sao Paulo      | -23.5478| -46.6358


2.3.4 Named Tuples

In [31]:
from  collections import namedtuple
City = namedtuple('City','name country population coordinates')
tokyo = City('Tokyo','JP',36.933,(35.689722,139.691667))
tokyo

City(name='Tokyo', country='JP', population=36.933, coordinates=(35.689722, 139.691667))

In [32]:
tokyo.population

36.933

In [33]:
tokyo[1]

'JP'

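Creating a named tuple takes two arguments: a class name and the field names. The field names can be given either as an iterable of strings or as a single string of space-delimited names.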
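
Both ways of passing the field names are sketched below. The class names CityA and CityB are illustrative only; the two definitions are meant to be interchangeable.

```python
from collections import namedtuple

# Field names as a single space-delimited string
CityA = namedtuple('City', 'name country population coordinates')
# Field names as an iterable of strings
CityB = namedtuple('City', ['name', 'country', 'population', 'coordinates'])

tokyo_a = CityA('Tokyo', 'JP', 36.933, (35.689722, 139.691667))
tokyo_b = CityB('Tokyo', 'JP', 36.933, (35.689722, 139.691667))

print(tokyo_a.country, tokyo_b.country)
print(tokyo_a == tokyo_b)  # named tuples compare like plain tuples
```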

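Besides the attributes inherited from tuple, a named tuple has a few of its own, such as the _fields class attribute, the _make(iterable) class method, and the _asdict() instance method.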

In [34]:
City._fields

('name', 'country', 'population', 'coordinates')

In [37]:
Latlong=namedtuple('Latlong','lat long')
delhi_data = ('Delhi NCR','IN',21.935,Latlong(28.613889,77.208889))
delhi = City._make(delhi_data)  # builds an instance from an iterable; equivalent to City(*delhi_data)
delhi

City(name='Delhi NCR', country='IN', population=21.935, coordinates=Latlong(lat=28.613889, long=77.208889))

In [38]:
delhi._asdict()  # returns a collections.OrderedDict (a plain dict since Python 3.8)

OrderedDict([('name', 'Delhi NCR'),
             ('country', 'IN'),
             ('population', 21.935),
             ('coordinates', Latlong(lat=28.613889, long=77.208889))])

In [41]:
delhi._asdict()['name']

'Delhi NCR'

In [42]:
for key,value in delhi._asdict().items():
    print(key+":",value)

name: Delhi NCR
country: IN
population: 21.935
coordinates: Latlong(lat=28.613889, long=77.208889)


2.3.5 Tuples as Immutable Lists

![chapter2_3_1](image/chapter2_3_1.jpg)

![chapter2_3_2](image/chapter2-3-2.jpg)

2.4 Slicing

2.4.1 Why Slices and Ranges Exclude the Last Item

In [50]:
l=[10,20,30,40,50,60]
l[:2]

[10, 20]

In [51]:
l[2:]

[30, 40, 50, 60]
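
The usual arguments for excluding the last item can be checked directly. A minimal sketch, reusing the list l from the cells above; the names x, head, and tail are illustrative.

```python
l = [10, 20, 30, 40, 50, 60]

# With only a stop position, the length of the slice is the stop value.
assert len(l[:2]) == 2
# With both start and stop, the length is stop - start.
assert len(l[1:4]) == 4 - 1

# Splitting at any index x gives two parts that do not overlap.
x = 2
head, tail = l[:x], l[x:]
print(head, tail)
print(head + tail == l)
```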

2.4.2 Slice Objects

s[a:b:c] takes items from s between a and b with a stride of c. A negative c returns the items in reverse order.

In [52]:
s='bicycle'
s[::3]

'bye'

In [53]:
s[::-1]

'elcycib'

In [54]:
s[::-2]

'eccb'

Naming slices

In [56]:
invoice='12345678910wjy'
name=slice(11,None)
invoice[name]

'wjy'

In [59]:
import numpy as np
a=np.array([[1,2,3],[4,5,6],[7,8,9]])
a.shape

(3, 3)

In [60]:
a[1]

array([4, 5, 6])

In [61]:
a[1,...]  # equivalent to a[1,:]

array([4, 5, 6])

In [70]:
l= list(range(10))
l

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [71]:
l[2:5]=[20,30]
l

[0, 1, 20, 30, 5, 6, 7, 8, 9]

In [72]:
del l[5:7]
l

[0, 1, 20, 30, 5, 8, 9]

In [73]:
l[3::2]=[11,22]
l

[0, 1, 20, 11, 5, 22, 9]

In [74]:
l[2:5]=100

TypeError: can only assign an iterable

In [75]:
l[2:5]=[100]

In [76]:
l

[0, 1, 100, 22, 9]

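Caveat: in an expression such as a * n, if the sequence a contains references to mutable objects, the result may be surprising. For example, initializing a list of lists with my_list = [[]] * 3 produces a list whose three items are three references to the same inner list, so a change made through one of them shows up in all of them.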
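
A minimal sketch of this caveat, using the my_list name from the text; the names shared and independent are illustrative.

```python
my_list = [[]] * 3          # three references to one inner list
my_list[0].append('x')
print(my_list)

# Every item is the very same object.
shared = all(item is my_list[0] for item in my_list)
print(shared)

# A list comprehension builds a new inner list on each iteration.
independent = [[] for _ in range(3)]
independent[0].append('x')
print(independent)
```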

Building lists of lists

In [77]:
# Example 2-12
board=[['_']*3 for i in range(3)]
board

[['_', '_', '_'], ['_', '_', '_'], ['_', '_', '_']]

In [78]:
board[1][2]='X'
board

[['_', '_', '_'], ['_', '_', 'X'], ['_', '_', '_']]

A list holding three references to the same object is useless

In [79]:
weird_board=[['_']*3]*3
weird_board

[['_', '_', '_'], ['_', '_', '_'], ['_', '_', '_']]

In [80]:
weird_board[1][2]='0'
weird_board

[['_', '_', '0'], ['_', '_', '0'], ['_', '_', '0']]

In [81]:
row=['_']*3
board=[]
for i in range(3):
    board.append(row)
board

[['_', '_', '_'], ['_', '_', '_'], ['_', '_', '_']]

In [83]:
board[1][2]='0'
board

[['_', '_', '0'], ['_', '_', '0'], ['_', '_', '0']]

In [84]:
id(row[1])

2510252680840

In [85]:
id(row[0])

2510252680840

In [86]:
row[0]='0'
row

['0', '_', '0']

In [87]:
# By contrast, the approach in Example 2-12 is equivalent to this:
board=[]
for i in range(3):
    row=['_']*3  # a new list is built on each iteration
    board.append(row)
board

[['_', '_', '_'], ['_', '_', '_'], ['_', '_', '_']]

In [88]:
board[1][2]='X'
board

[['_', '_', '_'], ['_', '_', 'X'], ['_', '_', '_']]

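a += b: for a mutable sequence such as a list, a is changed in place, as if a.extend(b) had been called. If the class does not implement __iadd__, Python falls back to __add__, and the statement behaves like a = a + b.

a = a + b: a + b is evaluated first, producing a new object, which is then bound to a.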
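
The difference becomes observable when a second name refers to the same list. A small sketch; the names alias, in_place, and rebound are illustrative.

```python
a = [1, 2]
alias = a               # second reference to the same list

a += [3]                # list implements __iadd__: changed in place
in_place = alias is a   # both names still refer to one object
print(alias)

a = a + [4]             # builds a new list and rebinds a
rebound = alias is not a
print(a, alias)
```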

In [89]:
l=[1,2,3]
id(l)

2510342038600

In [90]:
l*=2
l

[1, 2, 3, 1, 2, 3]

In [91]:
id(l)

2510342038600

In [92]:
t=(1,2,3)
id(t)

2510341920592

In [93]:
t*=2
id(t)

2510341563112

A += assignment puzzler

In [94]:
t=(1,2,[30,40])
t[2]+=[50,60]

TypeError: 'tuple' object does not support item assignment

In [95]:
t

(1, 2, [30, 40, 50, 60])

![chapter2-6-1](image/chapter2-6-1.png)

![chapter2-6-2](image/chapter2-6-2.png)

In [97]:
t=(1,2,[30,40])
t[2].extend([50,60])
t

(1, 2, [30, 40, 50, 60])

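Avoid putting mutable objects in tuples.

Augmented assignment is not an atomic operation: here it raised an exception after it had already done part of its work.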
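
The puzzler can be reproduced in a script by catching the exception and then inspecting the tuple. A minimal sketch; the variable error is introduced only for illustration.

```python
t = (1, 2, [30, 40])
error = None
try:
    # Step 1 extends the inner list in place; step 2 tries to assign
    # the result back to t[2], which a tuple does not allow.
    t[2] += [50, 60]
except TypeError as exc:
    error = str(exc)

print(error)
print(t)  # the inner list was extended before the exception was raised
```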


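list.sort() sorts a list in place and returns None, just as random.shuffle does.

The built-in function sorted builds a new list and returns it. It accepts any iterable as its argument, including immutable sequences and generators, and it always returns a list regardless of the type of its input.

Both list.sort and sorted take two optional keyword arguments:

reverse: True or False. If True, the items are returned in descending order.

key: a one-argument function that is applied to each item to produce its sorting key, for example key=str.lower or key=len.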
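
A brief sketch of these points, using made-up sample data (the words tuple and the result names are illustrative).

```python
words = ('banana', 'apple', 'Cherry')   # an immutable sequence works too

by_default = sorted(words)                       # uppercase sorts before lowercase
case_insensitive = sorted(words, key=str.lower)  # key normalizes each item first
lengths_desc = sorted((len(w) for w in words), reverse=True)  # a generator as input

numbers = [3, 1, 2]
result = numbers.sort()   # in place; the return value is None

print(by_default)
print(case_insensitive)
print(lengths_desc)
print(result, numbers)
```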

In [98]:
fruit = ['grape','raspberry','apple','banana']
sorted(fruit)

['apple', 'banana', 'grape', 'raspberry']

In [99]:
fruit

['grape', 'raspberry', 'apple', 'banana']

In [100]:
sorted(fruit,reverse=True)

['raspberry', 'grape', 'banana', 'apple']

In [101]:
sorted(fruit,key=len)  # the result shows the sort is stable: of two equal-length items, the one that came first stays first

['grape', 'apple', 'banana', 'raspberry']

In [102]:
sorted(fruit,key=len,reverse=True)

['raspberry', 'banana', 'grape', 'apple']

In [103]:
fruit.sort()
fruit

['apple', 'banana', 'grape', 'raspberry']

The bisect module offers two main functions, bisect and insort. Both use binary search to find or insert items in a sorted sequence.

In [110]:
import bisect
import sys

HAYSTACK = [1, 4, 5, 6, 8, 12, 15, 20, 21, 23, 23, 26, 29, 30]
NEEDLES = [0, 1, 2, 5, 8, 10, 22, 23, 29, 30, 31]
ROW_FMT = '{0:2d} @ {1:2d}      {2}{0:<2d}'
def demo(bisect_fn):
    for needle in reversed(NEEDLES):
        position = bisect_fn(HAYSTACK, needle)  # insertion point for needle
        offset = position * '   |'  # one bar per position, to draw the offset
        print(ROW_FMT.format(needle, position, offset))  # needle and its insertion point

if __name__ == '__main__':

    if sys.argv[-1] == 'left':    # pick the bisect function from the last command-line argument
        bisect_fn = bisect.bisect_left
    else:
        bisect_fn = bisect.bisect

    print('DEMO:', bisect_fn.__name__)  # name of the selected function
    print('haystack ->', ' '.join('%2d' % n for n in HAYSTACK))
    demo(bisect_fn)

DEMO: bisect
haystack ->  1  4  5  6  8 12 15 20 21 23 23 26 29 30
31 @ 14         |   |   |   |   |   |   |   |   |   |   |   |   |   |31
30 @ 14         |   |   |   |   |   |   |   |   |   |   |   |   |   |30
29 @ 13         |   |   |   |   |   |   |   |   |   |   |   |   |29
23 @ 11         |   |   |   |   |   |   |   |   |   |   |23
22 @  9         |   |   |   |   |   |   |   |   |22
10 @  5         |   |   |   |   |10
 8 @  5         |   |   |   |   |8 
 5 @  3         |   |   |5 
 2 @  1         |2 
 1 @  1         |1 
 0 @  0      0 


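bisect is an alias for bisect_right. It differs from bisect_left only when the needle is already in the sequence: bisect_right returns the position just after the existing equal items, while bisect_left returns the position of the first equal item.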
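
The difference between the two functions shows up only for values already in the sequence. A short sketch using the same haystack values as the demo above; the result names are illustrative.

```python
import bisect

haystack = [1, 4, 5, 6, 8, 12, 15, 20, 21, 23, 23, 26, 29, 30]

# 23 occurs twice, at indexes 9 and 10.
left_23 = bisect.bisect_left(haystack, 23)    # before the existing 23s
right_23 = bisect.bisect_right(haystack, 23)  # after the existing 23s

# 22 is absent, so both functions agree.
left_22 = bisect.bisect_left(haystack, 22)
right_22 = bisect.bisect_right(haystack, 22)

print(left_23, right_23)
print(left_22, right_22)
```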

In [111]:
def grade(score,breakpoints=[60,70,80,90],grades='FDCBA'):  # four breakpoints split scores into five grades
    i=bisect.bisect(breakpoints,score)
    return grades[i]
[grade(score) for score in [33,99,77,70,89,90]]

['F', 'A', 'C', 'C', 'B', 'A']

In [112]:
import bisect
import random

SIZE = 7

random.seed(1729)

my_list = []
for i in range(SIZE):
    new_item = random.randrange(SIZE*2)
    bisect.insort(my_list, new_item)
    print('%2d ->' % new_item, my_list)

10 -> [10]
 0 -> [0, 10]
 6 -> [0, 6, 10]
 8 -> [0, 6, 8, 10]
 7 -> [0, 6, 7, 8, 10]
 2 -> [0, 2, 6, 7, 8, 10]
10 -> [0, 2, 6, 7, 8, 10, 10]
