# Chapter 2: An Array of Sequences

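Some operations work the same way on strings, lists, and tables (for example, `for element in list_a`). These three kinds of data structures are also called "trains", a term that comes from the ABC language.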

## 1. Overview of Built-In Sequences

The standard library offers a rich selection of sequence types implemented in C:
- Container sequences: hold references to the objects they contain (think of each item as a pointer or reference)
    - list, tuple, and collections.deque can hold items of different types.
- Flat sequences: physically store the value of each item within their own memory space (the values are stored directly)
    - str, bytes, bytearray, memoryview, and array.array hold items of one type, and only primitive values such as characters, bytes, and numbers.
    
Sequences can also be grouped by whether they are mutable:
- Mutable sequences
    - list, bytearray, array.array, collections.deque, and memoryview
- Immutable sequences
    - tuple, str, and bytes
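
One way to check this split at runtime is the `collections.abc.MutableSequence` abstract base class. The sketch below is only illustrative: the helper name `is_mutable_sequence` is made up for this example, and it covers just a few of the types listed above.

```python
from collections.abc import MutableSequence

def is_mutable_sequence(obj):
    """Return True if obj is a sequence that supports in-place changes."""
    return isinstance(obj, MutableSequence)

samples = [[1, 2], bytearray(b'ab'), (1, 2), 'ab', b'ab']
for sample in samples:
    kind = 'mutable' if is_mutable_sequence(sample) else 'immutable'
    print(type(sample).__name__, '->', kind)
```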
    
The most commonly used sequence is the list, so we start there.

### 1.1 List Comprehension

In [1]:
list_a = [1,2,3,4,5]
list_b = []

#### Method 1: for loop

In [2]:
for element in list_a:
    list_b.append(str(element))
list_b

['1', '2', '3', '4', '5']

#### Method 2: List Comprehension

A list comprehension does exactly one thing: it builds a new list.

In [3]:
list_c = [str(element) for element in list_a]
list_c

['1', '2', '3', '4', '5']

#### Listcomps vs. map and filter

Map and filter: more details in Chapter 5.

In [4]:
list_d = [str(element) for element in list_a if element > 2]
list_d

['3', '4', '5']

In [9]:
list_5 = list(filter(lambda c: int(c) > 2, map(str, list_a)))
list_5

['3', '4', '5']

#### Cartesian products with list comprehensions

A list comprehension can generate a list from the Cartesian product of two or more iterables.

In [11]:
colors = ['black', 'white']
sizes = ['S', 'M', 'L']
tshirts = [(color, size) for color in colors for size in sizes] # list of tuples
tshirts

[('black', 'S'),
 ('black', 'M'),
 ('black', 'L'),
 ('white', 'S'),
 ('white', 'M'),
 ('white', 'L')]

Without a list comprehension, we would need two nested for loops.
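
As a minimal sketch of that equivalence, the nested-loop version below builds the same list as the listcomp. The names `tshirts_loop` and `tshirts_comp` are introduced here only for the comparison.

```python
colors = ['black', 'white']
sizes = ['S', 'M', 'L']

# The outer loop corresponds to the first `for` clause of the listcomp,
# the inner loop to the second one.
tshirts_loop = []
for color in colors:
    for size in sizes:
        tshirts_loop.append((color, size))

tshirts_comp = [(color, size) for color in colors for size in sizes]
print(tshirts_loop == tshirts_comp)
```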

### 1.2 Generator Expression

A generator expression (genexp) can be used to build other kinds of sequences, such as tuples and arrays.

Genexps use the same syntax as listcomps, but are enclosed in parentheses rather than brackets.

Let's start with a tuple.

In [14]:
tuple_a = tuple(str(element) for element in list_a)
tuple_a

('1', '2', '3', '4', '5')

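A generator expression yields items one by one. In the T-shirt example that follows (In [17]), a list with all six T-shirt variations is never built.

If the items are consumed only once, a genexp avoids the cost of building the full list in memory, which matters most when there are many items.

For more on generators, see Chapter 14.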

In [17]:
for tshirt in ('%s %s' % (c, s) for c in colors for s in sizes):
    print(tshirt)

black S
black M
black L
white S
white M
white L
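
To make the memory argument concrete, here is a rough sketch that compares the shallow size of a list with that of a generator expression. The variable names and the value of `n` are arbitrary choices for this illustration, and the exact sizes printed depend on the Python build.

```python
import sys

n = 100_000
squares_list = [i * i for i in range(n)]   # builds all n items up front
squares_gen = (i * i for i in range(n))    # builds nothing until iterated

# sys.getsizeof reports the shallow size of each object.
print(sys.getsizeof(squares_list))  # grows with n
print(sys.getsizeof(squares_gen))   # small, does not depend on n

# A consumer such as sum() can read the generator directly.
total = sum(squares_gen)

# The generator is now exhausted: it can only be consumed once.
leftover = list(squares_gen)
print(total, leftover)
```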


## 2. Tuples Are Not Just Immutable Lists

#### mutable vs. immutable
A mutable object can be changed after it's created; an immutable object can't.

Tuples serve two purposes:
1. A tuple can be used as an immutable list.
2. Because the order of the items in a tuple is fixed, a tuple can serve as a record with no field names: the position of each item gives it its meaning.

### 2.1 Tuple as immutable list

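tuple supports every list method that does not modify the sequence in place (nothing that adds, removes, or reorders items); see the tables below.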

<img src="figures/tuple_as_immutable_list_1.png" width="500"/>
<img src="figures/tuple_as_immutable_list_2.png" width="500"/>

#### concatenation

In [3]:
list_1 = [1,2,3]
list_2 = [4,5,6]
list_1.__add__(list_2)  # same as list_1 + list_2; returns a new list

[1, 2, 3, 4, 5, 6]

In [4]:
tuple_1 = (1, 2, 3)
tuple_2 = (4, 5, 6)
tuple_1.__add__(tuple_2)

(1, 2, 3, 4, 5, 6)

#### slicing

In [7]:
list_1[:2]

[1, 2]

In [8]:
tuple_1[:2]

(1, 2)
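
The other side of the comparison is what a tuple refuses to do. The sketch below is an illustration only (the list of method names is hand-picked, not exhaustive): item assignment fails, and the methods that add or remove items are absent.

```python
tuple_1 = (1, 2, 3)

# Item assignment is not supported by tuples.
try:
    tuple_1[0] = 99
except TypeError as exc:
    print('TypeError:', exc)

# Methods that add or remove items do not exist on tuple.
missing = [name for name in ('append', 'extend', 'insert', 'pop', 'remove')
           if not hasattr(tuple_1, name)]
print(missing)

# Read-only methods shared with list are available.
print(tuple_1.count(2), tuple_1.index(3))
```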

### 2.2 Tuple as record

**tuple unpacking**

#### 1. parallel assignment
- swapping
- returning multiple values from a function

In [10]:
la_coordinates = (33.9425, -118.408056)

In [12]:
latitude = la_coordinates[0]
latitude

33.9425

In [14]:
longitude = la_coordinates[1]
longitude

-118.408056

The assignments above are done one at a time. Now let's look at parallel assignment.

In [15]:
latitude, longitude = la_coordinates
print(latitude)
print(longitude)

33.9425
-118.408056


**First use**:

A common use of parallel assignment is swapping two values without a temporary variable.

In [17]:
a = 1
b = 5
print('a: ', a)
print('b: ', b)

# swap using a temporary variable c
c = a
a = b
b = c
print('a: ', a)
print('b: ', b)

a:  1
b:  5
a:  5
b:  1


In [18]:
a = 1
b = 5
print('a: ', a)
print('b: ', b)

# swap using parallel assignment
b, a = a, b # done in a single line
print('a: ', a)
print('b: ', b)

a:  1
b:  5
a:  5
b:  1


**Second use**:

Parallel assignment lets functions return multiple values in a way that is convenient for the caller.

In [21]:
def get_first_two_elements(list_a):
    return list_a[0], list_a[1]

In [20]:
list_a = [1, 2]

In [22]:
a, b = get_first_two_elements(list_a)
print(a)
print(b)

1
2


A dummy variable `_` can serve as a placeholder for a value you don't care about.

In [23]:
_, b = get_first_two_elements(list_a)
print(b)

2


In [24]:
import os
_, filename = os.path.split('/home/myname/paper.tex')

In [25]:
filename

'paper.tex'

Use a starred target (`*name`, in the style of `*args`) to capture any number of remaining items.

In [38]:
a, b, *c = range(5)
print(a)
print(b)
print(c)

0
1
[2, 3, 4]


In [40]:
a, b, *c = range(3)
print(a)
print(b)
print(c)

0
1
[2]


In [41]:
a, b, *c = range(2)
print(a)
print(b)
print(c)

0
1
[]


The starred target can appear in any position.

In [42]:
a, *b, c = range(5)
print(a)
print(b)
print(c)

0
[1, 2, 3]
4


The starred target can appear only once; otherwise Python raises an error.

In [43]:
a, *b, *c = range(2)
print(a)
print(b)
print(c)

SyntaxError: two starred expressions in assignment (<ipython-input-43-3678e28f7358>, line 1)

#### 2. Nested Tuple Unpacking

In [71]:
cities = [('Tokyo', 'JP', 36.933, (35.689722, 139.691667)),
          ('Delhi', 'IND', 21.935, (28.613889, 77.208889)),
          ('Mexico City', 'MX', 20.142, (19.43333, -99.133333))]

In [76]:
print('{} | {}'.format('', 'Longitude'))
for city, country, population, (latitude, longitude) in cities:
    # show only cities in the eastern hemisphere
    if longitude > 0:
        print('{} | {}'.format(city, longitude))

 | Longitude
Tokyo | 139.691667
Delhi | 77.208889


#### Table alignment

In [79]:
print('{:15} | {:^9}'.format('', 'Longitude'))
for city, country, population, (latitude, longitude) in cities:
    # show only cities in the eastern hemisphere
    if longitude > 0:
        print('{:15} | {:9.4f}'.format(city, longitude))

                | Longitude
Tokyo           |  139.6917
Delhi           |   77.2089


Using a tuple as a record has one drawback: the meaning of each field is given only by its position, and there are no field names.
To attach field names, use a namedtuple.

#### 3. Named Tuples

**Note**: a named tuple instance takes the same amount of memory as a plain tuple, because the field names are stored in the class definition rather than as attributes of each object.

In [44]:
from collections import namedtuple

In [54]:
# Define a named tuple class
# Argument 1: the class name ('City')
# Argument 2: the field names (an iterable of strings, or a single space-delimited string)
City = namedtuple('City', 'name country population coordinates') # using a single space-delimited string
tokyo = City('Tokyo', 'JP', 36.933, (35.689722, 139.691667))

In [55]:
tokyo

City(name='Tokyo', country='JP', population=36.933, coordinates=(35.689722, 139.691667))

In [56]:
tokyo.coordinates # access each field with .field_name

(35.689722, 139.691667)

In [58]:
City2 = namedtuple('City', ['name', 'Country', 'Population', 'Coordinates']) # using iterable of strings
tokyo2 = City2('Tokyo', 'JP', 36.933, (35.689722, 139.691667))

In [59]:
tokyo2

City(name='Tokyo', Country='JP', Population=36.933, Coordinates=(35.689722, 139.691667))

namedtuple has three particularly useful attributes and methods:
- `_fields`: a tuple with the field names
- `_make(iterable)`: builds a named tuple from an iterable
- `_asdict()`: converts the named tuple to a `collections.OrderedDict` (a plain `dict` since Python 3.8)

In [60]:
tokyo._fields

('name', 'country', 'population', 'coordinates')

In [61]:
City._fields

('name', 'country', 'population', 'coordinates')

In [62]:
delhi_data = ('Delhi', 'IND', 21.935, (28.613889, 77.208889))

In [63]:
delhi = City._make(delhi_data)

In [64]:
delhi

City(name='Delhi', country='IND', population=21.935, coordinates=(28.613889, 77.208889))

In [68]:
delhi._asdict()

OrderedDict([('name', 'Delhi'),
             ('country', 'IND'),
             ('population', 21.935),
             ('coordinates', (28.613889, 77.208889))])

`_asdict()` is handy for displaying the fields in a readable way.

In [69]:
for key, value in delhi._asdict().items():
    print(key + ':', value)

name: Delhi
country: IND
population: 21.935
coordinates: (28.613889, 77.208889)
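
The memory note at the start of this subsection can be probed with a small sketch. It assumes a recent CPython, and the printed sizes are shallow sizes that vary between builds, so they are shown without expected values.

```python
import sys
from collections import namedtuple

City = namedtuple('City', 'name country population coordinates')
data = ('Tokyo', 'JP', 36.933, (35.689722, 139.691667))
tokyo = City(*data)

# Field names live on the class, not on each instance.
print(City._fields)
print(hasattr(tokyo, '__dict__'))   # instances carry no per-object dict

# Compare the shallow sizes of the named tuple instance and a plain tuple.
print(sys.getsizeof(tokyo), sys.getsizeof(data))
```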


## 3. Slicing and Concatenation

- slicing: take part of a sequence (a subtraction of sorts)
- concatenation: join several sequences together (an addition of sorts)

### 3.1 Slicing
Sequence types (e.g., list, tuple, str) all support slicing. This section covers some of its more advanced uses.

#### 1. Basic operations

`[start:stop:step]`

In [80]:
l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [81]:
l[:2]

[1, 2]

In [82]:
l[2:]

[3, 4, 5, 6, 7, 8, 9, 10]

#### step

In [139]:
s = 'hello world!'

In [140]:
s[::3] # whole string, step = 3

'hlwl'

In [141]:
s[::-3] # whole string, right to left, step = 3

'!r l'

In [145]:
s[2::2] # from index 2 to the end, step = 2

'lowrd'

In [146]:
s[2::-2] # from index 2 back to the start, right to left, step = 2

'lh'

#### 2. slice objects

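Under the hood, the slice notation creates a slice object, `slice(start, stop, step)`.
Creating slice objects explicitly is handy in many situations, for example to give names to the columns of a fixed-width text file.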
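
As a quick sanity check, the sketch below shows that the bracket notation and an explicit slice object select the same items, and that one slice object can be reused across sequence types. The names `every_third` and `first_two` are invented for this example.

```python
s = 'hello world!'

# The bracket notation and an explicit slice object select the same items.
every_third = slice(None, None, 3)
print(s[::3] == s[every_third])

# A slice object exposes its parts and can be reused on any sequence.
first_two = slice(0, 2)
print(first_two.start, first_two.stop, first_two.step)
print(s[first_two], [10, 20, 30][first_two], (1.5, 2.5, 3.5)[first_two])
```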


In [129]:
invoice = """
0.....6.................................40........52...55........
1909  Pimoroni PiBrella                 $17.50      3  $52.50
1489  6mm Tactile Switc x20             $4.95       2  $9.90
1510  Panavise Jr. - PV-201             $28.00      1  $28.00
1601  PiTFT Mini Kit 320x240            $34.95      1  $34.95
"""

In [130]:
invoice

'\n0.....6.................................40........52...55........\n1909  Pimoroni PiBrella                 $17.50      3  $52.50\n1489  6mm Tactile Switc x20             $4.95       2  $9.90\n1510  Panavise Jr. - PV-201             $28.00      1  $28.00\n1601  PiTFT Mini Kit 320x240            $34.95      1  $34.95\n'

In [131]:
SKU = slice(0, 6)
DESCRIPTION = slice(6, 40)
UNIT_PRICE = slice(40, 52)
QUANTITY = slice(52, 55)
ITEM_TOTAL = slice(55, None)

In [132]:
line_items = invoice.split('\n')[2:] # start at the third line, skipping the leading blank line and the ruler header

In [133]:
line_items

['1909  Pimoroni PiBrella                 $17.50      3  $52.50',
 '1489  6mm Tactile Switc x20             $4.95       2  $9.90',
 '1510  Panavise Jr. - PV-201             $28.00      1  $28.00',
 '1601  PiTFT Mini Kit 320x240            $34.95      1  $34.95',
 '']

In [134]:
for item in line_items:
    print(item[UNIT_PRICE], item[DESCRIPTION])

$17.50       Pimoroni PiBrella                 
$4.95        6mm Tactile Switc x20             
$28.00       Panavise Jr. - PV-201             
$34.95       PiTFT Mini Kit 320x240            
 


#### 3. multi-dimensional slicing



In [136]:
import numpy as np
data = np.array([[1,2,3,4,5],
                 [6,7,8,9,10],
                 [11,12,13,14,15]])

In [138]:
data[:2, :2]

array([[1, 2],
       [6, 7]])

#### ellipsis `...`

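In higher-dimensional arrays (for example, tensors), an ellipsis stands for all the remaining dimensions.

If `x` is a four-dimensional array, `x[i, ...]` is a shortcut for `x[i, :, :, :,]`.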

In [153]:
x = np.array([[[111,112,113,114,115], [121,122,123,124,125]], [[211,212,213,214,215], [221,222,223,224,225]]])


In [154]:
x.ndim

3

In [155]:
x.shape

(2, 2, 5)

In [156]:
x[0, :, :,]

array([[111, 112, 113, 114, 115],
       [121, 122, 123, 124, 125]])

In [157]:
x[0, ...]

array([[111, 112, 113, 114, 115],
       [121, 122, 123, 124, 125]])
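
It may help to see what Python itself passes to an object for these index expressions, independently of NumPy. The class `IndexEcho` below is a hypothetical helper written for this illustration: its `__getitem__` simply returns the index it receives.

```python
class IndexEcho:
    """Hypothetical helper: returns whatever index object it receives."""

    def __getitem__(self, index):
        return index

echo = IndexEcho()
print(echo[1])          # a plain int
print(echo[:2, :2])     # a tuple of two slice objects
print(echo[0, ...])     # a tuple containing the Ellipsis singleton
```

Comma-separated indices arrive as a single tuple, and `...` arrives as the built-in `Ellipsis` object. Libraries such as NumPy decide how to interpret them; the built-in sequence types accept only integers and slices.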

#### 4. Assigning to slice

Note that the right-hand side of the assignment must be an iterable object, such as a list.

In [158]:
l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [159]:
l[0:5] = [1] # replace the first 5 items with a single item

In [160]:
l

[1, 6, 7, 8, 9, 10]

In [161]:
l[0:5] = 1 # error

TypeError: can only assign an iterable

In [164]:
del l[1:] # delete every item except the first

In [165]:
l

[1]
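
Slice assignment can shrink or grow a list, and it behaves slightly differently when a step is given. The following sketch uses arbitrary values to walk through the cases.

```python
l = list(range(10))

l[2:5] = [20, 30]          # 3 items replaced by 2: the list shrinks
print(l)

l[2:4] = [2, 3, 4]         # 2 items replaced by 3: the list grows back
print(l)

l[::2] = [0] * 5           # slice with a step: lengths must match exactly
print(l)

del l[5:7]                 # delete a slice
print(l)
```

With a plain `start:stop` slice the replacement may have any length; with a step, Python raises `ValueError` if the number of items on the right does not match the number of positions selected.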

### 3.2 Concatenation

#### 1. 使用`+`

Both operands must be sequences of the same type. For example, a str cannot be added to a list.

In [166]:
l1 = [1,2,3]
l2 = [4,5,6]

In [167]:
l1 + l2

[1, 2, 3, 4, 5, 6]

In [168]:
l1 + 'hello' # different types cannot be added together

TypeError: can only concatenate list (not "str") to list
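
If the goal really is to combine a list with a string, one option is to convert explicitly first. This is a small sketch of two possible intents, not the only way to do it.

```python
l1 = [1, 2, 3]

# Convert the string to a list first; each character becomes one item.
print(l1 + list('hello'))

# Wrap the string in a list to add it as a single item instead.
print(l1 + ['hello'])

# l1 itself is unchanged: + builds a new list.
print(l1)
```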

#### 2. 使用`*`

Note that both `+` and `*` always create a new object and never change their operands.

In [177]:
l3 = l1 * 3
l3

[1, 2, 3, 1, 2, 3, 1, 2, 3]

In [170]:
'hello ' * 3

'hello hello hello '

In [178]:
l3[0] = 0

In [179]:
l3 # l3 has been modified

[0, 2, 3, 1, 2, 3, 1, 2, 3]

In [180]:
l1 # l1 is unchanged

[1, 2, 3]

In [181]:
my_list = [[]]
my_list

[[]]

In [182]:
my_list_of_list = my_list * 3
my_list_of_list

[[], [], []]

In [183]:
my_list_of_list[0] = [1,2,3]

In [184]:
my_list_of_list

[[1, 2, 3], [], []]

In [185]:
my_list

[[]]

#### 3. Building Lists of Lists

In [186]:
# build an empty board with a list comprehension
board = [['_'] * 3 for i in range(3)]

In [187]:
board

[['_', '_', '_'], ['_', '_', '_'], ['_', '_', '_']]

In [190]:
board[0][1] = 'x'

In [191]:
board

[['_', 'x', '_'], ['_', '_', '_'], ['_', '_', '_']]

The approach above is equivalent to the following:

In [205]:
board4=[]
for i in range(3):
    row = ['_'] * 3
    board4.append(row)

In [206]:
board4

[['_', '_', '_'], ['_', '_', '_'], ['_', '_', '_']]

In [207]:
board4[0][1] = 'x'

In [208]:
board4

[['_', 'x', '_'], ['_', '_', '_'], ['_', '_', '_']]

#### pitfall: a tempting but wrong approach

In [192]:
board2 = [['_'] * 3] * 3 

In [194]:
board2

[['_', '_', '_'], ['_', '_', '_'], ['_', '_', '_']]

In [195]:
board2[0][1] = 'x'

In [196]:
board2

[['_', 'x', '_'], ['_', 'x', '_'], ['_', 'x', '_']]

The outer list is made of three references to the same inner list, so all rows are aliases of the same object.

The code above is equivalent to the following:

In [201]:
row = ['_'] * 3
board3 = []
for i in range(3):
    board3.append(row) # the same row object is appended three times

In [202]:
board3

[['_', '_', '_'], ['_', '_', '_'], ['_', '_', '_']]

In [203]:
board3[0][1] = 'x'

In [204]:
board3

[['_', 'x', '_'], ['_', 'x', '_'], ['_', 'x', '_']]
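
The aliasing can be confirmed directly with the `is` operator and `id()`. In this sketch, `wrong` and `right` are illustrative names for the two ways of building the board.

```python
wrong = [['_'] * 3] * 3                    # one inner list, referenced 3 times
right = [['_'] * 3 for _ in range(3)]      # three separate inner lists

print(wrong[0] is wrong[1])   # rows are the same object
print(right[0] is right[1])   # rows are distinct objects

# Count the distinct row objects in each board.
print(len({id(row) for row in wrong}), len({id(row) for row in right}))
```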

References and mutable objects are covered in depth in Chapter 8.

Having covered `+` and `*`, we now turn to the `+=` and `*=` operators.

## 4. Augmented Assignment with Sequences