# An Array of Sequences

## Overview of Built-In Sequences
Ways to classify sequence types

* **Container sequences:** can hold items of different types, such as list and tuple
* **Flat sequences:** hold items of a single type, such as str, bytes, bytearray, memoryview, and array.array


* **Mutable sequences:** can be changed in place, such as list, bytearray, and array.array
* **Immutable sequences:** cannot be changed after creation, such as tuple, str, and bytes

[collections.abc - Abstract Base Classes for Containers](https://docs.python.org/3/library/collections.abc.html)  
[collections.abc UML class diagram](https://bugs.python.org/file47357/base.png)
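
As a rough illustration of the mutable/immutable split above, the abstract base classes in `collections.abc` can be queried with `isinstance`. This is only a sketch: the `samples` list and the `is_mutable` dict are names made up for this example, not something from the notes.

```python
from collections import abc

# Illustrative samples (an assumption for this sketch): one instance per built-in type.
samples = [[1, 2], (1, 2), 'ab', b'ab', bytearray(b'ab')]

# Map each type name to whether it is recognized as a MutableSequence.
is_mutable = {type(s).__name__: isinstance(s, abc.MutableSequence) for s in samples}

for name, mutable in is_mutable.items():
    print(f'{name:10} mutable={mutable}')
```

Every sample is also an instance of `abc.Sequence`; only list and bytearray additionally pass the `MutableSequence` check.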

## List Comprehensions and Generator Expressions
#### Example 2-1. Build a list of Unicode codepoints from a string

In [1]:
symbols = '$¢£¥€¤'
codes = []
for symbol in symbols:
    codes.append(ord(symbol))
codes

#### Example 2-2. Build a list of Unicode codepoints from a string, take two

In [2]:
symbols = '$¢£¥€¤'
codes = [ord(symbol) for symbol in symbols]
codes

#### Example 2-3. The same list built by a listcomp and a map/filter composition

In [3]:
symbols = '$¢£¥€¤'
beyond_ascii = [ord(s) for s in symbols if ord(s) > 127]
beyond_ascii

In [4]:
beyond_ascii = list(filter(lambda c: c > 127, map(ord, symbols)))
beyond_ascii

#### map and filter were not always faster than the equivalent listcomps

In [5]:
import timeit

TIMES = 10000

SETUP = """
symbols = '$¢£¥€¤'
def non_ascii(c):
    return c > 127
"""

def clock(label, cmd):
    res = timeit.repeat(cmd, setup=SETUP, number=TIMES)
    print(label, *('{:.3f}'.format(x) for x in res))

clock('listcomp        :', '[ord(s) for s in symbols if ord(s) > 127]')
clock('listcomp + func :', '[ord(s) for s in symbols if non_ascii(ord(s))]')
clock('filter + lambda :', 'list(filter(lambda c: c > 127, map(ord, symbols)))')
clock('filter + func   :', 'list(filter(non_ascii, map(ord, symbols)))')

listcomp        : 0.013 0.014 0.014 0.013 0.014
listcomp + func : 0.019 0.018 0.019 0.019 0.019
filter + lambda : 0.015 0.015 0.015 0.015 0.016
filter + func   : 0.015 0.014 0.016 0.014 0.016


### Cartesian Products
#### Example 2-4. Cartesian product using a list comprehension

In [6]:
colors = ['black', 'white']
sizes = ['S', 'M', 'L']
tshirts = [(color, size) for color in colors for size in sizes]

tshirts

[('black', 'S'),
 ('black', 'M'),
 ('black', 'L'),
 ('white', 'S'),
 ('white', 'M'),
 ('white', 'L')]

In [7]:
for color in colors:
    for size in sizes:
        print((color, size))

('black', 'S')
('black', 'M')
('black', 'L')
('white', 'S')
('white', 'M')
('white', 'L')


In [8]:
tshirts = [(color, size) for size in sizes for color in colors]

tshirts

[('black', 'S'),
 ('white', 'S'),
 ('black', 'M'),
 ('white', 'M'),
 ('black', 'L'),
 ('white', 'L')]

### Generator Expressions

#### Example 2-5. Initializing a tuple and an array from a generator expression

In [9]:
symbols = '$¢£¥€¤'
tuple(ord(symbol) for symbol in symbols)

(36, 162, 163, 165, 8364, 164)

In [10]:
import array

array.array('I', (ord(symbol) for symbol in symbols))

array('I', [36, 162, 163, 165, 8364, 164])

#### Example 2-6. Cartesian product in a generator expression

In [11]:
colors = ['black', 'white']
sizes = ['S', 'M', 'L']
for tshirt in (f'{c} {s}' for c in colors for s in sizes):
    print(tshirt)

black S
black M
black L
white S
white M
white L


The generator expression yields one item at a time, so the full list of combinations is never stored in a variable -> saves memory
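
A small sketch of the memory difference, assuming larger inputs than the example above (the `* 100` repetition is only there to make the gap visible). Note that `sys.getsizeof` reports the size of the container object itself, not of the strings it refers to.

```python
import sys

colors = ['black', 'white'] * 100   # repeated only to enlarge the product
sizes = ['S', 'M', 'L'] * 100

# The listcomp materializes all 60,000 strings at once.
as_list = [f'{c} {s}' for c in colors for s in sizes]
# The genexp builds nothing until it is iterated.
as_gen = (f'{c} {s}' for c in colors for s in sizes)

list_size = sys.getsizeof(as_list)
gen_size = sys.getsizeof(as_gen)
print(len(as_list), list_size, gen_size)

first = next(as_gen)   # items are produced on demand
print(first)
```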

## Tuples Are Not Just Immutable Lists

Tuples are not merely "immutable lists"; they can be used in several ways, for example as records.

### Tuples as records
#### Example 2-7. Tuples used as records

In [12]:
city, year, pop, chg, area = ('Tokyo', 2003, 32450, 0.66, 8014) # Tuple Unpacking
traveler_ids = [('USA', '31195855'), ('BRA', 'CE342567'), ('ESP', 'XDA205856')]

In [13]:
for passport in sorted(traveler_ids):
    print('%s/%s' % passport)

BRA/CE342567
ESP/XDA205856
USA/31195855


In [14]:
for country, _ in traveler_ids:
    print(country)

USA
BRA
ESP


In [15]:
lax_coordinates = (33.9425, -118.408056)
lat, long = lax_coordinates

In [16]:
lat

33.9425

In [17]:
long

-118.408056

### Nested Tuple Unpacking

#### Example 2-8. Unpacking nested tuples to access the longitude

In [18]:
metro_areas = [
    ('Tokyo', 'JP', 36.933, (35.689722, 139.691667)),
    ('Delhi NCR', 'IN', 21.935, (28.613889, 77.208889)),
    ('Mexico City', 'MX', 20.142, (19.433333, -99.133333)),
    ('New York-Newark', 'US', 20.104, (40.808611, -74.020386)),
    ('Sao Paulo', 'BR', 19.649, (-23.547778, -46.635833)),
]

print('{:15} | {:^9} | {:^9}'.format('', 'lat.', 'long.'))

fmt = '{:15} | {:9.4f} | {:9.4f}'
for name, cc, pop, (latitude, longitude) in metro_areas:
    if longitude <= 0:
        print(fmt.format(name, latitude, longitude))

                |   lat.    |   long.  
Mexico City     |   19.4333 |  -99.1333
New York-Newark |   40.8086 |  -74.0204
Sao Paulo       |  -23.5478 |  -46.6358


### Named Tuples
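**collections.namedtuple:** a factory that builds tuple subclasses enhanced with field names and a class name  
Field names are stored in the class -> instances use the same amount of memory as plain tuples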
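
A minimal sketch of the point about memory, using a hypothetical `Point` type that is not part of the examples in these notes. The field names are an attribute of the class, and instances carry no per-instance `__dict__`. The two sizes printed at the end are what CPython reports for a named tuple and a plain tuple of the same length; they are printed rather than asserted here.

```python
import sys
from collections import namedtuple

# Hypothetical record type, used only for this illustration.
Point = namedtuple('Point', 'x y')
p = Point(1.5, 2.5)
plain = (1.5, 2.5)

# Field names live on the class, not on each instance.
print(Point._fields)
# Instances have no __dict__, so there is no per-instance storage for names.
print(hasattr(p, '__dict__'))
# Sizes of the named tuple and the plain tuple as reported by the interpreter.
print(sys.getsizeof(p), sys.getsizeof(plain))
```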

#### Example 2-9. Defining and using a named tuple type

In [19]:
from collections import namedtuple

City = namedtuple('City', 'name country population coordinates')
tokyo = City('Tokyo', 'JP', 36.933, (35.689722, 139.691667))
tokyo

City(name='Tokyo', country='JP', population=36.933, coordinates=(35.689722, 139.691667))

In [20]:
tokyo.population

36.933

In [21]:
tokyo.coordinates

(35.689722, 139.691667)

In [22]:
tokyo[1]

'JP'

#### Example 2-10. Named tuple attributes and methods

In [23]:
City._fields

('name', 'country', 'population', 'coordinates')

In [24]:
LatLong = namedtuple('LatLong', 'lat long')
delhi_data = ('Delhi NCR', 'IN', 21.935, LatLong(28.613889, 77.208889))
delhi = City._make(delhi_data)
delhi._asdict()

OrderedDict([('name', 'Delhi NCR'),
             ('country', 'IN'),
             ('population', 21.935),
             ('coordinates', LatLong(lat=28.613889, long=77.208889))])

In [25]:
for key, value in delhi._asdict().items():
    print(key + ':',value)

name: Delhi NCR
country: IN
population: 21.935
coordinates: LatLong(lat=28.613889, long=77.208889)


## Slicing
### Slice Objects

#### Example 2-11. Line items from a flat-file invoice

In [26]:
invoice = """
0.....6.................................40........52...55........
1909  Pimoroni Pimoroni PiBrella        $17.50      3     $52.5
1489  6mm Tactile Switch x20            $4.95       2     $9.9
1510  Panavise Jr. - PV-201             $28.00      1     $28
1601  PiTFT Mini Kit 320x240            $34.95      1     $34.95
"""

SKU = slice(0, 6)
DESCRIPTION = slice(6, 40)
UNIT_PRICE = slice(40, 52)
QUANTITY = slice(52, 55)
ITEM_TOTAL = slice(55, None)
line_items = invoice.split('\n')[2:]
for item in line_items:
    print(item[UNIT_PRICE], item[DESCRIPTION])

$17.50       Pimoroni Pimoroni PiBrella        
$4.95        6mm Tactile Switch x20            
$28.00       Panavise Jr. - PV-201             
$34.95       PiTFT Mini Kit 320x240            
 


### Building Lists of Lists
#### Example 2-12. A list with three lists of length 3 can represent a tic-tac-toe board

In [27]:
board = [['_'] * 3 for i in range(3)]
board

[['_', '_', '_'], ['_', '_', '_'], ['_', '_', '_']]

In [28]:
board[1][2] = 'X'
board

[['_', '_', '_'], ['_', '_', 'X'], ['_', '_', '_']]

In [29]:
# same as example 2-12
board = []

for i in range(3):
    row = ['_'] * 3  # a new row is created on every iteration
    board.append(row)

board[2][0] = 'X'

board

[['_', '_', '_'], ['_', '_', '_'], ['X', '_', '_']]

#### Example 2-13. A list with three references to the same list is useless

In [30]:
weird_board = [['_'] * 3] * 3
weird_board

[['_', '_', '_'], ['_', '_', '_'], ['_', '_', '_']]

In [31]:
weird_board[1][2] = '0'
weird_board

[['_', '_', '0'], ['_', '_', '0'], ['_', '_', '0']]

In [32]:
# same as 2-13
row = ['_'] * 3
board = []
for i in range(3):
    board.append(row)
board[1][2] = '0'
board

[['_', '_', '0'], ['_', '_', '0'], ['_', '_', '0']]

## Augmented Assignment with Sequences
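Augmented assignment operators such as +=.  
+= is driven by the \__iadd\__ special method.  
If an object does not implement \__iadd\__ (immutable sequences, for example), Python falls back to \__add\__, evaluating a = a + b.  
In that case a + b creates a new object, which is then bound to a.  
So whether += changes the identity of the object depends on whether \__iadd\__ is implemented.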
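
The fallback can be sketched with two toy classes. `AddOnly` and `InPlace` are hypothetical names invented for this illustration: the first implements only `__add__`, the second only `__iadd__`.

```python
class AddOnly:
    """Implements only __add__, as an immutable sequence would."""
    def __init__(self, items):
        self.items = tuple(items)

    def __add__(self, other):
        # Always returns a brand-new object.
        return AddOnly(self.items + tuple(other))


class InPlace:
    """Implements __iadd__, as a mutable sequence would."""
    def __init__(self, items):
        self.items = list(items)

    def __iadd__(self, other):
        # Mutates self and returns it, so the identity is preserved.
        self.items.extend(other)
        return self


a = AddOnly([1, 2])
a_before = a
a += [3]                 # falls back to a = a + [3]
print(a is a_before)     # False: a new object was bound to a

b = InPlace([1, 2])
b_before = b
b += [3]                 # calls b.__iadd__([3])
print(b is b_before)     # True: same object, changed in place
```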

In [33]:
l = [1,2,3]
print(id(l))
l *= 2
print(id(l))

93464376
93464376


In [34]:
t = (1,2,3)
print(id(t))
t *= 2
print(id(t))

90942200
70069808


#### Example 2-14. A riddle
``` Python
# Which of a through d is correct?
t = (1, 2, [30, 40])
t[2] += [50, 60]
```

a. t becomes (1, 2, [30, 40, 50, 60]).  
b. TypeError is raised: 'tuple' object does not support item assignment  
c. Neither a nor b.  
d. Both a and b.  

In [35]:
t = (1, 2, [30, 40])
t[2] += [50, 60]

TypeError: 'tuple' object does not support item assignment

In [36]:
t

(1, 2, [30, 40, 50, 60])

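An error was raised, yet the value still changed.  
The list t[2] was extended in place first; the error came afterwards, when Python tried to assign the result back into the tuple, which is immutable.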

#### Example 2-16. Bytecode for the expression s[a] += b

In [37]:
import dis
dis.dis('s[a] += b')

  1           0 LOAD_NAME                0 (s)
              2 LOAD_NAME                1 (a)
              4 DUP_TOP_TWO
              6 BINARY_SUBSCR
              8 LOAD_NAME                2 (b)
             10 INPLACE_ADD
             12 ROT_THREE
             14 STORE_SUBSCR
             16 LOAD_CONST               0 (None)
             18 RETURN_VALUE


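1. At offset 6 (BINARY_SUBSCR), the value of s[a] is put on top of the stack.
2. At offset 10 (INPLACE_ADD), the in-place addition is performed on that value. It succeeds when the value is a mutable object such as the list in the riddle.
3. At offset 14 (STORE_SUBSCR), the result is assigned back to s[a]. This fails when s is immutable, as the tuple t is -> the error is raised after the list has already been changed.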
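
The three steps above can be reproduced in a short sketch. It catches the exception to show that the list has already been modified, then shows that calling `extend` on the list avoids the problem because no assignment into the tuple takes place. The names `error` and `u` are introduced only for this illustration.

```python
t = (1, 2, [30, 40])
error = None
try:
    t[2] += [50, 60]        # the list is extended, then the store into t fails
except TypeError as exc:
    error = exc

print(type(error).__name__)  # TypeError
print(t)                     # the inner list was changed anyway

u = (1, 2, [30, 40])
u[2].extend([50, 60])        # mutates the list only; nothing is assigned to u[2]
print(u)
```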

### list.sort and the sorted Built-In Function

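list.sort changes the list in place without creating a new object, which is why it returns None.  
random.shuffle behaves the same way.  
sorted(), in contrast, returns a new list -> it accepts any iterable as its argument, including immutable sequences and generators.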
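
A brief sketch of the difference, using made-up sample data (`fruits` and `deck` are illustrative names):

```python
import random

fruits = ['grape', 'raspberry', 'apple', 'banana']

new_list = sorted(fruits)        # returns a new list, fruits is untouched
print(new_list, fruits)

result = fruits.sort()           # sorts in place and returns None
print(result, fruits)

deck = [1, 2, 3, 4]
shuffled = random.shuffle(deck)  # also in place, also returns None
print(shuffled)

# sorted() accepts any iterable, including tuples and generator expressions.
from_tuple = sorted((3, 1, 2))
from_gen = sorted(n * n for n in (3, 1, 2))
print(from_tuple, from_gen)
```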

### Managing Ordered Sequences with bisect
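bisect.bisect: uses binary search to find the insertion point for an item in a sorted sequence  
bisect.insort: inserts an item into a sorted sequence, keeping it sorted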
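
As a quick sketch before the longer demo, here is how the two functions behave on a small sorted list with a duplicate (the `scores` list is invented for this illustration). `bisect.bisect` is an alias of `bisect.bisect_right`, so it returns the position after any equal items, while `bisect.bisect_left` returns the position before them.

```python
import bisect

scores = [10, 20, 20, 30]

left = bisect.bisect_left(scores, 20)   # position before the existing 20s
right = bisect.bisect(scores, 20)       # position after the existing 20s
print(left, right)

bisect.insort(scores, 25)               # inserts while keeping the list sorted
print(scores)
```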

#### Example 2-17. bisect finds insertion points for items in a sorted sequence

In [38]:
import bisect
import sys

HAYSTACK = [1, 4, 5, 6, 8, 12, 15, 20, 21, 23, 23, 26, 29, 30]
NEEDLES = [0, 1, 2, 5, 8, 10, 22, 23, 29, 30, 31]

ROW_FMT = '{0:2d} @ {1:2d}     {2}{0:<2d}'

def demo(bisect_fn):
    for needle in reversed(NEEDLES):
        position = bisect_fn(HAYSTACK, needle)
        offset = position * '  |'
        print(ROW_FMT.format(needle, position, offset))
        
if __name__ == '__main__':
    # Use bisect_left when the last command-line argument is 'left'.
    if sys.argv[-1] == 'left':
        bisect_fn = bisect.bisect_left
    else:
        bisect_fn = bisect.bisect

    print('DEMO:', bisect_fn.__name__)
    print('haystack ->', ' '.join('%2d' % n for n in HAYSTACK))
    demo(bisect_fn)

DEMO: bisect_right
haystack ->  1  4  5  6  8 12 15 20 21 23 23 26 29 30
31 @ 14       |  |  |  |  |  |  |  |  |  |  |  |  |  |31
30 @ 14       |  |  |  |  |  |  |  |  |  |  |  |  |  |30
29 @ 13       |  |  |  |  |  |  |  |  |  |  |  |  |29
23 @ 11       |  |  |  |  |  |  |  |  |  |  |23
22 @  9       |  |  |  |  |  |  |  |  |22
10 @  5       |  |  |  |  |10
 8 @  5       |  |  |  |  |8 
 5 @  3       |  |  |5 
 2 @  1       |2 
 1 @  1       |1 
 0 @  0     0 


#### Example 2-18. Given a test score, grade returns the corresponding letter grade


In [39]:
def grade(score, breakpoints=[60, 70, 80, 90], grades='FDCBA'):
    i = bisect.bisect(breakpoints, score)
    return grades[i]
[grade(score) for score in [33, 99, 77, 70, 89, 90, 100]]

['F', 'A', 'C', 'C', 'B', 'A', 'A']

### Inserting with bisect.insort
#### Example 2-19. Insort keeps a sorted sequence always sorted

In [40]:
import bisect
import random

SIZE = 7
random.seed(1234)
my_list = []

for i in range(SIZE):
    new_item = random.randrange(SIZE*2)
    bisect.insort(my_list, new_item)
    print('%2d ->' % new_item, my_list)

12 -> [12]
 7 -> [7, 12]
 1 -> [1, 7, 12]
 0 -> [0, 1, 7, 12]
 1 -> [0, 1, 1, 7, 12]
12 -> [0, 1, 1, 7, 12, 12]
 9 -> [0, 1, 1, 7, 9, 12, 12]


## When a List Is Not the Answer

Lists are very convenient, but sometimes they are not the answer.  
e.g. a very large amount of numeric data -> array, FIFO or LIFO queues -> deque
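
A minimal sketch of both alternatives, with sample values chosen only for illustration. The array stores raw C doubles, so its payload is `itemsize * len` bytes, and the deque supports appending and popping at both ends.

```python
from array import array
from collections import deque

# An array of C doubles: each item takes numbers.itemsize bytes.
numbers = array('d', range(1000))
payload_bytes = numbers.itemsize * len(numbers)
print(numbers.itemsize, payload_bytes)

# A deque used as a queue: append on the right, pop from either end.
queue = deque()
for item in ('a', 'b', 'c'):
    queue.append(item)

first_in = queue.popleft()   # FIFO: the oldest item comes out
last_in = queue.pop()        # LIFO: the newest item comes out
print(first_in, last_in, queue)
```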

### Arrays

#### Example 2-20. Creating, saving, and loading a large array of floats

In [41]:
from array import array
from random import random

floats = array('d', (random() for i in range(10**7)))
floats[-1]

0.11746189079407765

In [42]:
fp = open('floats.bin', 'wb')
floats.tofile(fp)
fp.close()

floats2 = array('d')
fp = open('floats.bin', 'rb')
floats2.fromfile(fp, 10**7)
fp.close()
floats2[-1]

0.11746189079407765

In [43]:
floats2 == floats

True

### Memory Views
A shared-memory sequence type. It lets you handle slices of arrays without copying bytes.  
[When should a memoryview be used?](https://stackoverflow.com/questions/4845418/when-should-a-memoryview-be-used/)
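
The "no copy" behavior can be sketched with a bytearray (the `data` buffer is made up for this illustration). Slicing the memoryview gives another view onto the same memory, whereas slicing the bytearray itself produces an independent copy.

```python
data = bytearray(b'abcdef')
view = memoryview(data)

chunk = view[2:4]          # a view on the same bytes, nothing is copied
chunk[0] = ord('X')        # writing through the view changes data
print(data)

copied = data[2:4]         # slicing the bytearray makes a copy
copied[0] = ord('Y')       # changing the copy leaves data untouched
print(data, copied)
```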

#### Example 2-21. Changing the value of an array item by poking one of its bytes

In [44]:
numbers = array('h', [-2, -1, 0, 1, 2])
memv = memoryview(numbers)

len(memv)

5

In [45]:
memv[0]

-2

In [46]:
memv_oct = memv.cast('B')

In [47]:
memv_oct.tolist()

[254, 255, 255, 255, 0, 0, 1, 0, 2, 0]

In [48]:
memv_oct[5] = 4
numbers

array('h', [-2, -1, 1024, 1, 2])

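An array of 5 signed shorts ('h') is created, then its memoryview is cast to unsigned char ('B'). Byte 5 is the most significant byte of the third 2-byte integer (little-endian byte order, as the output above shows), so setting it to 4 gives 4 * 256 + 0 = 1024.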
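
The arithmetic can be checked with `int.from_bytes`, here as a sketch that spells out little-endian byte order explicitly instead of relying on the machine running the code. The names `low_byte` and `high_byte` are introduced only for this example.

```python
import sys
from array import array

# The third item after the change: low byte 0, high byte 4.
low_byte, high_byte = 0, 4
value = int.from_bytes(bytes([low_byte, high_byte]), byteorder='little', signed=True)
print(value)   # 0 + 4 * 256

# The raw bytes of the original array depend on the machine's byte order.
numbers = array('h', [-2, -1, 0, 1, 2])
raw = numbers.tobytes()
print(sys.byteorder, list(raw))
```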

### NumPy and SciPy
#### Example 2-22. Basic operations with rows and columns in a numpy.ndarray

In [49]:
import numpy
a = numpy.arange(12)
a

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [50]:
type(a)

numpy.ndarray

In [51]:
a.shape

(12,)

In [52]:
a.shape = 3, 4

In [53]:
a

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [54]:
a[2]

array([ 8,  9, 10, 11])

In [55]:
a[2, 1]

9

In [56]:
a[:, 1]

array([1, 5, 9])

In [57]:
a.T

array([[ 0,  4,  8],
       [ 1,  5,  9],
       [ 2,  6, 10],
       [ 3,  7, 11]])

In [58]:
a.transpose()

array([[ 0,  4,  8],
       [ 1,  5,  9],
       [ 2,  6, 10],
       [ 3,  7, 11]])

### Deques and Other Queues
#### Example 2-23. Working with a deque

In [59]:
from collections import deque

dq = deque(range(10), maxlen=10)
dq

deque([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [60]:
dq.rotate(3)
dq

deque([7, 8, 9, 0, 1, 2, 3, 4, 5, 6])

In [61]:
dq.rotate(-4)
dq

deque([1, 2, 3, 4, 5, 6, 7, 8, 9, 0])

In [62]:
dq.appendleft(-1)
dq

deque([-1, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [63]:
dq.extend([11, 22, 33])
dq

deque([3, 4, 5, 6, 7, 8, 9, 11, 22, 33])

In [64]:
dq.extendleft([10,20,30,40])
dq

deque([40, 30, 20, 10, 3, 4, 5, 6, 7, 8])