# Chapter 2 An Array of Sequences

In [9]:
import pprint
pp = pprint.PrettyPrinter()

## Overview of Built-in Sequences

Classified by the type of items they hold:

* _Container sequences_: `list`, `tuple` and `collections.deque` can hold items of different types.
* _Flat sequences_: `str`, `bytes`, `bytearray`, `memoryview` and `array.array` hold items of one type.

Grouped by mutability:

* _Mutable sequences_: `list`, `bytearray`, `array.array`, `collections.deque` and `memoryview`
* _Immutable sequences_: `tuple`, `str` and `bytes`
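
The two groupings above can be checked at runtime. The following is a minimal sketch, not part of the original notes: it assumes `collections.abc.MutableSequence` as the test for mutability and uses `array.array` to show how a flat sequence rejects an item of the wrong type.

```python
import array
from collections import abc, deque

# Mutable sequence types are registered as MutableSequence; immutable ones are not.
candidates = (list, bytearray, deque, tuple, str, bytes)
mutable = [t.__name__ for t in candidates if issubclass(t, abc.MutableSequence)]
print(mutable)

# A container sequence holds references to objects of any type.
mixed = [1, 'two', 3.0]
print(mixed)

# A flat sequence stores values of a single type and rejects anything else.
floats = array.array('d', [1.0, 2.0])
try:
    floats.append('three')
except TypeError as err:
    print('TypeError:', err)
```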

## List Comprehensions and Generator Expressions
Listcomps and genexps are usually more readable than the equivalent `for` loops or `map`/`filter` calls.

### List Comprehensions (ListComp)

In [3]:
# Example 2.1 Build a list of Unicode codepoints from a string

symbols = '$¢£¥€¤'
codes = []
for symbol in symbols:
    codes.append(ord(symbol))

print(codes)

[36, 162, 163, 165, 8364, 164]


In [4]:
# Example 2.2 build a list of Unicode codepoints using ListComp
codes = [ord(symbol) for symbol in symbols]
print(codes)

[36, 162, 163, 165, 8364, 164]


In [6]:
# Example 2.3 listcomp vs map/filter
beyond_ascii = [ord(s) for s in symbols if ord(s) > 127]
print(beyond_ascii)

beyond_ascii = list(filter(lambda c: c > 127, map(ord, symbols)))
print(beyond_ascii)

[162, 163, 165, 8364, 164]
[162, 163, 165, 8364, 164]


### Cartesian Product

In [12]:
colors = ['black', 'white']
sizes = ['S', 'M', 'L']
tshirts = [(color, size) for color in colors 
                         for size in sizes]
pp.pprint(tshirts)

[('black', 'S'),
 ('black', 'M'),
 ('black', 'L'),
 ('white', 'S'),
 ('white', 'M'),
 ('white', 'L')]


In [14]:
tshirts = [(color, size) for size in sizes
                         for color in colors]
pp.pprint(tshirts)

[('black', 'S'),
 ('white', 'S'),
 ('black', 'M'),
 ('white', 'M'),
 ('black', 'L'),
 ('white', 'L')]


### Generator Expressions

Genexps use the same syntax as listcomps but are enclosed in parentheses instead of brackets. Their benefit is memory efficiency: a genexp yields items one at a time, so the whole list is never stored in memory.

In [15]:
tuple(ord(symbol) for symbol in symbols)

(36, 162, 163, 165, 8364, 164)

In [16]:
import array
array.array('I', (ord(symbol) for symbol in symbols))

array('I', [36, 162, 163, 165, 8364, 164])

In [17]:
# Cartesian product in a genexp
for tshirt in (f'{c}, {s}' for c in colors for s in sizes):
    print(tshirt)


black, S
black, M
black, L
white, S
white, M
white, L
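
The memory benefit of a genexp can be observed directly. This is a rough sketch under the assumption that `sys.getsizeof` is an acceptable proxy for the footprint of the container itself (it does not count the items); the names `squares_list` and `squares_gen` are illustrative only.

```python
import sys

n = 100_000
squares_list = [i * i for i in range(n)]   # builds all n items up front
squares_gen = (i * i for i in range(n))    # builds nothing until iterated

# The list object grows with n; the generator object stays small.
print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))

# Both feed a consumer such as sum() the same values.
print(sum(squares_list) == sum(squares_gen))

# A generator can be consumed only once: a second pass yields nothing.
print(sum(squares_gen))
```

The trade-off is that a genexp supports a single pass, so a listcomp remains the better choice when the result is needed more than once.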


## Tuples Are Not Just Immutable Lists

### Tuples as Records
Each item in the tuple holds the data for one field and the position of the item gives its meaning.

In [20]:
lax_coordinates = (33.9425, -118.408056)
city, year, pop, chg, area = ('Tokyo', 2003, 32450, 0.66, 8014)
traveler_ids = [('USA', '31195855'), ('BRA', 'CE342567'), ('ESP', 'XDA205856')]
for passport in sorted(traveler_ids):
    print('{}/{}'.format(passport[0], passport[1]))
for country, _ in traveler_ids:
    print(country)

BRA/CE342567
ESP/XDA205856
USA/31195855
USA
BRA
ESP


### Tuple Unpacking

* Parallel assignment: assign the items of an iterable to a tuple of variables.
* Prefix an argument with a star (`*`) to unpack a tuple into the arguments of a function call.

In [23]:
latitude, longitude = lax_coordinates # parallel assignment
print(f'{latitude}, {longitude}')
# prefix with a star
t = (20, 8)
divmod(*t)

33.9425, -118.408056


(2, 4)
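
Parallel assignment also makes it possible to swap two variables without a temporary one. A small sketch of this idiom, added here as an illustration:

```python
a, b = 1, 2
a, b = b, a          # the right-hand tuple is built before any name is rebound
print(a, b)

# The star unpacks any iterable, not only a tuple, into positional arguments.
quotient, remainder = divmod(*[20, 8])
print(quotient, remainder)
```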

In [24]:
# functions can return multiple values as a tuple; use _ for the parts not needed
import os
_, filename = os.path.split('/tmp/tmp')
print(filename)

tmp


Use `*` to grab excess items.

In [25]:
a, b, *rest = range(5)
print(rest)

a, *body, c, d = range(5)
print(body)

[2, 3, 4]
[1, 2]


### Nested Tuple Unpacking

Make the target expression match the nesting structure of the tuple being unpacked.

In [27]:
metro_areas = [
    ('Tokyo', 'JP', 36.933, (35.689722, 139.691667)),
    ('Delhi NCR', 'IN', 21.935, (28.613889, 77.208889)),
    ('Mexico City', 'MX', 20.142, (19.433333, -99.133333)),
    ('New York-Newark', 'US', 20.104, (40.808611, -74.020386)),
    ('Sao Paulo', 'BR', 19.649, (-23.547778, -46.635833)),
]

print('{:15} | {:^9} | {:^9}'.format('', 'lat.', 'long.'))
fmt = '{:15} | {:9.4f} | {:9.4f}'
for name, _, _, (lat, lon) in metro_areas:
    if lon <= 0:
        print(fmt.format(name, lat, lon))

                |   lat.    |   long.  
Mexico City     |   19.4333 |  -99.1333
New York-Newark |   40.8086 |  -74.0204
Sao Paulo       |  -23.5478 |  -46.6358


## Named Tuples

`collections.namedtuple` produces subclasses of `tuple` with field names and a class name.

In [30]:
from collections import namedtuple
City = namedtuple('City', 'name country population coordinates')
tokyo = City('Tokyo', 'JP', 36.933, (35.689722, 139.691667))
print(tokyo)
print(tokyo.population)
print(tokyo.coordinates)

City(name='Tokyo', country='JP', population=36.933, coordinates=(35.689722, 139.691667))
36.933
(35.689722, 139.691667)
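
Because the generated class is a subclass of `tuple`, its instances keep all tuple behavior. The sketch below repeats the `City` definition so that it runs on its own and checks a few of these properties:

```python
from collections import namedtuple

City = namedtuple('City', 'name country population coordinates')
tokyo = City('Tokyo', 'JP', 36.933, (35.689722, 139.691667))

print(issubclass(City, tuple))        # a named tuple class is a tuple subclass
print(tokyo[1] == tokyo.country)      # fields are reachable by position or by name
name, country, *_ = tokyo             # ordinary tuple unpacking still works
print(name, country)

try:
    tokyo.population = 40.0           # instances are immutable, like any tuple
except AttributeError as err:
    print('AttributeError:', err)
```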


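Useful attributes and methods of a named tuple:
* `_fields`: a class attribute holding a tuple of the field names.
* `_make(iterable)`: a class method that creates a named tuple from an iterable.
* `_asdict()`: returns the fields as a `collections.OrderedDict` (a plain `dict` since Python 3.8).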

In [34]:
print(City._fields)
LatLong = namedtuple('LatLong', 'lat long')
delhi_data = ('Delhi NCR', 'IN', 21.935, LatLong(28.613889, 77.208889))
delhi = City._make(delhi_data)
print(delhi)
print(delhi._asdict())
for key, value in delhi._asdict().items():
    print(key + ':', value)


('name', 'country', 'population', 'coordinates')
City(name='Delhi NCR', country='IN', population=21.935, coordinates=LatLong(lat=28.613889, long=77.208889))
OrderedDict([('name', 'Delhi NCR'), ('country', 'IN'), ('population', 21.935), ('coordinates', LatLong(lat=28.613889, long=77.208889))])
name: Delhi NCR
country: IN
population: 21.935
coordinates: LatLong(lat=28.613889, long=77.208889)


### Tuples as Immutable Lists
`tuple` supports all `list` methods that do not involve adding or removing items, with one exception: it lacks `__reversed__`.

In [54]:
s = (1, 2, 3)
s2 = (3,)
print(s + s2, s.__add__(s2))

print(2 in s, s.__contains__(2))

print(s.count(1))
print(s[2])
print(s.index(2))
print(len(s))
print(s * 3)
print(3 * s)

(1, 2, 3, 3) (1, 2, 3, 3)
True True
1
3
1
3
(1, 2, 3, 1, 2, 3, 1, 2, 3)
(1, 2, 3, 1, 2, 3, 1, 2, 3)
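
The missing `__reversed__` has little practical effect. As a brief sketch of why: the built-in `reversed()` falls back to `__len__` and `__getitem__`, so it works on a tuple anyway. The list of method names checked at the end is just a sample chosen for illustration.

```python
t = (1, 2, 3)

print(hasattr(list, '__reversed__'))   # list defines the special method
print(hasattr(tuple, '__reversed__'))  # tuple does not
# reversed() uses the sequence protocol, so it still works on a tuple.
print(tuple(reversed(t)))

# Methods that add or remove items are absent from tuple.
mutators = ('append', 'extend', 'pop', 'remove', 'sort')
print([m for m in mutators if hasattr(tuple, m)])
```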


## Slicing

### Assigning to Slices

In [58]:
l = list(range(15))
print(l)

Assigning a list whose length differs from that of the slice works only when the slice is contiguous (step of 1). For an extended slice (with a step), the lengths must match exactly.

In [59]:
l[2:5] = [20, 30]
print('after l[2:5] = [20, 30].', l)

after l[2:5] = [20, 30]. [0, 1, 20, 30, 6, 7, 8, 9, 10, 11, 12, 13, 14]


In [62]:
# does not work: the extended slice covers 5 items, but only 2 are given
l[3::2] = [11, 22]

ValueError: attempt to assign sequence of size 2 to extended slice of size 5

In [63]:
# does not work: the extended slice covers 1 item, but 4 are given
l[3::10] = [11, 22, 33, 44]

ValueError: attempt to assign sequence of size 4 to extended slice of size 1

In [65]:
l[3::10] = [11]
print('after l[3::10] = [11].', l)

after l[3::10] = [11]. [0, 1, 20, 11, 6, 7, 8, 9, 10, 11, 12, 13, 14]


In [66]:
del l[5:7]
print('del l[5:7].', l)

del l[5:7]. [0, 1, 20, 11, 6, 9, 10, 11, 12, 13, 14]
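
The slice assignment rules shown in the cells above can be summarized in one sketch on a fresh list. It also illustrates one more rule not shown above: the right-hand side must be an iterable, even when it supplies a single item.

```python
l = list(range(10))

l[2:5] = [20, 30, 40, 50, 60]   # contiguous slice: replacement may be longer
print(l)

l[2:7] = []                     # or shorter, even empty (same effect as del l[2:7])
print(l)

l[::2] = ['a'] * 4              # extended slice: lengths must match exactly
print(l)

try:
    l[2:5] = 100                # a bare number is not iterable
except TypeError as err:
    print('TypeError:', err)
```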


## Using + and * with Sequences

* `*`: concatenate multiple copies of the same sequence
* `+`: concatenate two sequences

`+` and `*` create a new object instead of changing the operands.

In [70]:
l = list(range(1, 4))
print(l * 5)
print(l + l)
print(5 * 'abcd')

[1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]
[1, 2, 3, 1, 2, 3]
abcdabcdabcdabcdabcd
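
That the operands are left untouched can be verified with `is`. A minimal sketch:

```python
a = [1, 2, 3]
b = a * 2
c = a + [4]

print(a)                     # the operand is unchanged
print(b is a, c is a)        # both results are new list objects
```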


### Building Lists of Lists

The key point: do not use `*` to replicate a list that contains lists, because `*` copies only the references to the inner list, not the list itself.

In [71]:
# correct
board = [['_'] * 3 for i in range(3)]
print(board)
board[1][2] = 'X'
print(board)

[['_', '_', '_'], ['_', '_', '_'], ['_', '_', '_']]
[['_', '_', '_'], ['_', '_', 'X'], ['_', '_', '_']]


In [72]:
# wrong: the outer list holds three references to the same row
weird_board = [['_'] * 3] * 3
print(weird_board)
weird_board[1][2] = '0'
weird_board

[['_', '_', '_'], ['_', '_', '_'], ['_', '_', '_']]


[['_', '_', '0'], ['_', '_', '0'], ['_', '_', '0']]

Use `is` to confirm that all rows of `weird_board` refer to the same list.

In [74]:
print(weird_board[0] is weird_board[1])
print(board[0] is board[1])

True
False
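
The difference between the two constructions becomes clearer when each is written as an explicit loop. This is a sketch of the equivalent behavior, not of how the interpreter implements it; the names `shared` and `independent` are illustrative.

```python
# [['_'] * 3] * 3 behaves like appending one row three times.
row = ['_'] * 3
shared = []
for _ in range(3):
    shared.append(row)            # same object every time

# [['_'] * 3 for _ in range(3)] behaves like building a new row per iteration.
independent = []
for _ in range(3):
    new_row = ['_'] * 3
    independent.append(new_row)   # fresh object every time

shared[1][2] = 'O'
independent[1][2] = 'X'
print(shared)
print(independent)
```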


### Augmented Assignment with Sequences

`+=` and `*=` behave differently depending on the first operand. Take `a += b` as an example:

1. If `a` implements `__iadd__`, it is called and `a` is updated in place.
1. Otherwise, the statement is evaluated as `a = a + b`.

The resulting value is the same in both cases, but in case 2 `a` is bound to a new object, so its `id` changes.

In [78]:
a = [1, 2, 3]
b = [4, 5]
print(f'id(a) = {id(a)}, a = {a}')
a += b
print(f'id(a) = {id(a)}, a = {a}')

a2 = (1, 2, 3)
b2 = (4, 5)
print(f'id(a2) = {id(a2)}, a2 = {a2}')
a2 += b2
print(f'id(a2) = {id(a2)}, a2 = {a2}')


id(a) = 139986533111688, a = [1, 2, 3]
id(a) = 139986533111688, a = [1, 2, 3, 4, 5]
id(a2) = 139986537230056, a2 = (1, 2, 3)
id(a2) = 139987029102000, a2 = (1, 2, 3, 4, 5)
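
The same dispatch can be reproduced with user-defined classes. The two classes below are hypothetical, written only to illustrate the rule: one implements `__iadd__`, the other only `__add__`.

```python
class WithIAdd:
    """Hypothetical class that implements in-place addition."""
    def __init__(self, items):
        self.items = list(items)

    def __add__(self, other):
        return WithIAdd(self.items + other.items)

    def __iadd__(self, other):
        self.items.extend(other.items)   # mutate in place
        return self                      # += rebinds the name to this result


class WithoutIAdd:
    """Hypothetical class that implements only __add__."""
    def __init__(self, items):
        self.items = list(items)

    def __add__(self, other):
        return WithoutIAdd(self.items + other.items)


for cls in (WithIAdd, WithoutIAdd):
    a = cls([1, 2, 3])
    alias = a                            # second reference to the original object
    a += cls([4, 5])
    print(cls.__name__, a.items, 'same object:', a is alias)
```

Keeping `alias` around makes the effect visible: with `__iadd__` both names see the extended items, while without it `alias` still refers to the old, unchanged object.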


### A += Assignment Puzzler

In [81]:
t = (1, 2, [30, 40])
try:
    t[2] += [50, 60]
except TypeError as err:
    print('get error: ', err)
print(f't = {t}')

get error:  'tuple' object does not support item assignment
t = (1, 2, [30, 40, 50, 60])


Step through the code in [Python Visualizer](http://www.pythontutor.com/visualize.html#code=t%20%3D%20%281,%202,%20%5B30,%2040%5D%29%0At%5B2%5D%20%2B%3D%20%5B50,%2060%5D&cumulative=false&curInstr=2&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) to see what happens.

1. First, the in-place add is performed on the list `t[2]`, which succeeds.
1. Then the result (the same list object) is assigned back to `t[2]`. As `t` is immutable, this assignment raises `TypeError`.
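
The two steps can be spelled out by hand. The sketch below approximates what `t[2] += [50, 60]` does using explicit calls (the temporary name `item` is an assumption for illustration), then shows that mutating the list directly avoids the failing assignment.

```python
t = (1, 2, [30, 40])

# What t[2] += [50, 60] does, spelled out:
item = t[2]                      # fetch the list stored in the tuple
item = item.__iadd__([50, 60])   # step 1: extend the list in place (succeeds)
try:
    t[2] = item                  # step 2: store the result back (fails for a tuple)
except TypeError as err:
    print('TypeError:', err)
print(t)

# Mutating the list directly skips the assignment step, so nothing is raised.
t[2].extend([70, 80])
print(t)
```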

In [82]:
import dis
dis.dis('t[2] += [50, 60]')

  1           0 LOAD_NAME                0 (t)
              2 LOAD_CONST               0 (2)
              4 DUP_TOP_TWO
              6 BINARY_SUBSCR
              8 LOAD_CONST               1 (50)
             10 LOAD_CONST               2 (60)
             12 BUILD_LIST               2
             14 INPLACE_ADD
             16 ROT_THREE
             18 STORE_SUBSCR
             20 LOAD_CONST               3 (None)
             22 RETURN_VALUE


Three lessons:
1. Putting mutable items in tuples is not a good idea.
1. Augmented assignment is not an atomic operation: here it raised an exception after doing part of its job.