### Mutable sequences

list(), bytearray(), array.array(), collections.deque()

### Immutable sequences

tuple(), str(), bytes()

In [2]:
from collections import abc
print(issubclass(tuple, abc.Sequence))
print(issubclass(list, abc.MutableSequence))

True
True


The `MutableSequence` ABC inherits from `Sequence` and adds a few methods for in-place changes, such as `append`, `insert` and `pop`
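
One way to see what the mutable ABC adds is to compare the two ABCs directly. This is only a small sketch; the variable name `extra` is chosen for illustration.

```python
from collections import abc

# MutableSequence is itself a subclass of Sequence
print(issubclass(abc.MutableSequence, abc.Sequence))

# Names that MutableSequence defines on top of what Sequence already has
extra = sorted(set(dir(abc.MutableSequence)) - set(dir(abc.Sequence)))
print(extra)
```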

### List comprehensions

A listcomp has one job: building a new list. A plain `for` loop can do many other things.

In [3]:
x = "abcd"
codes = [ord(c) for c in x]
print(codes)

[97, 98, 99, 100]


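**"Walrus operator"** `:=`: the scope of its target is the enclosing function (or the module, at top level).
Thanks to that, variables assigned with `:=` remain accessible after the comprehension or generator expression that assigned them has finished, unlike the loop variables of that comprehension.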

In [5]:
x = [1, 2, 3, 4]
codes = [last := c for c in x]
print(codes)
print(last)
# print(c) would raise NameError: c is local to the listcomp

[1, 2, 3, 4]
4


#### Cartesian product using a listcomp
The Cartesian product consists of tuples built from every combination of items from the input iterables.

In [7]:
colors = ['black', 'white']
sizes = ['s', 'm', 'l']
tshirts = [(color, size) for color in colors
                          for size in sizes]
print(tshirts)

for color in colors:
    for size in sizes:
        print((color, size))

[('black', 's'), ('black', 'm'), ('black', 'l'), ('white', 's'), ('white', 'm'), ('white', 'l')]
('black', 's')
('black', 'm')
('black', 'l')
('white', 's')
('white', 'm')
('white', 'l')


### Generator expressions

Genexps are used to build sequence types other than lists; for lists, listcomps are the tool of choice.
Genexps use the same syntax as listcomps, but are enclosed in parentheses rather than brackets.

In [9]:
symbols = '!@#$%^&*'
t = tuple(ord(symbol) for symbol in symbols)
print(t)

import array
a = array.array('I', (ord(symbol) for symbol in symbols))
print(a)

(33, 64, 35, 36, 37, 94, 38, 42)
array('I', [33, 64, 35, 36, 37, 94, 38, 42])


A genexp yields items one by one, so there is no need to hold the entire sequence in memory when we don't need it later.
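
The memory difference can be observed with `sys.getsizeof`. This is a rough sketch: `sys.getsizeof` reports the size of the container object only, and the exact numbers depend on the interpreter, so only the comparison is meaningful.

```python
import sys

n = 100_000
as_list = [i * i for i in range(n)]   # all n items exist at once
as_gen = (i * i for i in range(n))    # items are produced on demand

list_size = sys.getsizeof(as_list)    # grows with n
gen_size = sys.getsizeof(as_gen)      # small, does not depend on n
print(list_size, gen_size)

total = sum(as_gen)                   # consumes the genexp item by item
print(total == sum(as_list))
```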

### Tuples

Tuples can be used as:

* immutable lists
* records with no field names

*single item tuples must be written with a trailing comma*
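
A short illustration of the trailing-comma rule (the variable names are only for this example):

```python
not_a_tuple = (1)    # just the int 1 in parentheses
single = (1,)        # the comma is what makes the tuple
also_single = 1,     # the parentheses are optional here

print(type(not_a_tuple).__name__)   # int
print(type(single).__name__)        # tuple
print(single == also_single)        # True
```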

##### Tuples as records

Each item in the tuple holds data for one field, and the position of the item gives its meaning

In [11]:
city, year, pop, chg, area = ('Tokyo', 2003, 32_450, 0.66, 8014)

traveler_ids = [('USA', '31195855'), ('BRA', 'CE342567'), ('ESP', 'XDA205856')]
for passport in traveler_ids:
    print('%s/%s' % passport)  # the % formatting operator understands tuples and treats each item as a separate field

USA/31195855
BRA/CE342567
ESP/XDA205856


##### Tuples as immutable lists

This solution brings 2 key benefits:

* clarity - tuple in code means that its length will never change
* performance - tuple uses less memory than list of the same length
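
The memory claim can be checked with `sys.getsizeof`. This sketch assumes CPython; the function measures the container only, and the exact byte counts vary between versions and platforms.

```python
import sys

items = range(10)
as_tuple = tuple(items)
as_list = list(items)

# Size of each container in bytes (the items themselves are not counted)
tuple_size = sys.getsizeof(as_tuple)
list_size = sys.getsizeof(as_list)
print(tuple_size, list_size)
```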

***Attention:** the immutability of a tuple applies only to the references contained in it, not to the objects those references point to*

In [12]:
a = (10, 'alpha', [1, 2])
b = (10, 'alpha', [1, 2])
print(a == b)
b[-1].append(99)
print(a == b)

True
False


A tuple that holds a mutable item (such as a `list`) is unhashable.
An unhashable tuple cannot be used as a `dict` key or a `set` element.

In [13]:
def fixed(o):
    try:
        hash(o)
    except TypeError:
        return False
    return True

tf = (10, 'alpha', (1, 2))
tm = (10, 'alpha', [1, 2])
print(fixed(tf))
print(fixed(tm))

True
False


### Unpacking sequences and iterables

In [15]:
lax_coordinates = (33.9425, -118.408056)
latitude, longitude = lax_coordinates  # unpacking
print(latitude)
print(longitude)

33.9425
-118.408056


In [16]:
a = 1
b = 2
b, a = a, b
print(a)
print(b)

2
1


We can prefix an iterable with `*` to unpack it into separate arguments of a function call

In [18]:
res = divmod(20, 8)
print(res)

t = (20, 8)
res = divmod(*t)
print(res)

(2, 4)
(2, 4)


#### Using * to grab excess items

In [19]:
a, b, *rest = range(8)
print(a)
print(b)
print(rest)

0
1
[2, 3, 4, 5, 6, 7]


In [21]:
a, *middle, c, d = range(8)
print(a)
print(middle)
print(c)
print(d)

0
[1, 2, 3, 4, 5]
6
7


We can use `*` multiple times in function calls

In [22]:
def fun(a, b, c, d, *rest):
    return a, b, c, d, rest
print(fun(*[1, 2], 3, *range(4, 7)))

(1, 2, 3, 4, (5, 6))


We can also use `*` when defining `list`, `tuple` or `set`

In [23]:
l = [*range(4), 4]
print(l)
s = {*range(4), 4, *(5, 6, 7)}
print(s)

[0, 1, 2, 3, 4]
{0, 1, 2, 3, 4, 5, 6, 7}


#### Nested unpacking

The target of an unpacking can use nesting, e.g., `(a, b, (c, d))`. Python will do the right thing if the value has the same nesting structure.

In [24]:
metro_areas = [
    ('Tokyo', 'JP', 36.933, (35.689722, 139.691667)),
    ('Delhi NCR', 'IN', 21.935, (28.613889, 77.208889)),
    ('Mexico City', 'MX', 20.142, (19.433333, -99.133333)),
    ('New York-Newark', 'US', 20.104, (40.808611, -74.020386)),
    ('São Paulo', 'BR', 19.649, (-23.547778, -46.635833)),
]
for name, _, _, (lat, lon) in metro_areas:
    if lon <= 0:
        print(f'{name:15} | {lat:9.4f} | {lon:9.4f}')

Mexico City     |   19.4333 |  -99.1333
New York-Newark |   40.8086 |  -74.0204
São Paulo       |  -23.5478 |  -46.6358


### Pattern matching with sequences

In [26]:
# imaginary method to handle robot commands
'''
def handle_command(self, message):
    match message:
        case ['BEEPER', freq, times]:
            self.beep(times, freq)
        case ['NECK', angle]:
            self.rotate_neck(angle)
        case ['LED', ident, intensity]:
            self.leds[ident].set_brightness(ident, intensity)
        case ['LED', ident, red, green, blue]:
            self.leds[ident].set_color(ident, red, green, blue)
        case _:
            raise InvalidCommand(message)
'''

#### match/case vs switch/case

At first `match`/`case` may look like the `switch`/`case` statement from C, but there is one key difference: `match`/`case` supports **destructuring**, a more advanced form of unpacking

*`match`/`case` requires Python >= 3.10*

In [27]:
metro_areas = [
    ('Tokyo', 'JP', 36.933, (35.689722, 139.691667)),
    ('Delhi NCR', 'IN', 21.935, (28.613889, 77.208889)),
    ('Mexico City', 'MX', 20.142, (19.433333, -99.133333)),
    ('New York-Newark', 'US', 20.104, (40.808611, -74.020386)),
    ('São Paulo', 'BR', 19.649, (-23.547778, -46.635833)),
]

print(f'{"":15} | {"latitude":>9} | {"longitude":>9}')
for record in metro_areas:
    match record: # The subject of this match is record
        case [name, _, _, (lat, lon)] if lon <= 0: # case has 2 parts: pattern & optional guard with if keyword
            print(f'{name:15} | {lat:9.4f} | {lon:9.4f}')

                |  latitude | longitude
Mexico City     |   19.4333 |  -99.1333
New York-Newark |   40.8086 |  -74.0204
São Paulo       |  -23.5478 |  -46.6358


Pattern `[name, _, _, (lat, lon)]` matches a sequence with 4 items, where the last item must be a 2-item sequence

**Important**:
Sequence patterns may be written as tuples, lists, or any combination of nested tuples and lists, and it makes **no difference** which syntax we use. **In a sequence pattern, square brackets and parentheses mean the same thing.**

A sequence pattern can match instances of most actual and virtual subclasses of `collections.abc.Sequence`, except `str`, `bytes` and `bytearray`. A match subject of one of those types is treated as an "atomic" value. To treat such a subject as a sequence, convert it in the `match` clause.

In [1]:
phone = "231439090"

match tuple(phone):
    case ['1', *rest]:
        print("North America and Caribbean")
    case ['2', *rest]:
        print("Africa")
    case ['3' | '4', *rest]:
        print("Europe")
    case _:
        raise ValueError("wrong value")

Africa


Unlike unpacking, patterns don't destructure iterables that are not sequences (e.g. iterators)

The `_` symbol is special in patterns: it matches any single item in that position, but it is never bound to the value of the matched item. It is also the only variable that can appear more than once in a pattern

Any part of a pattern can be bound to a variable using the `as` keyword

`case [name, _, _, (lat, lon) as coord]:`

Now, given the subject `['Warsaw', 'PL', 59.2, (21.9, 37.2)]`, the preceding pattern will match and set the following variables: `name = 'Warsaw'`, `lat = 21.9`, `lon = 37.2` and `coord = (21.9, 37.2)`

Patterns can be made more specific by adding type information. For example:

`case [str(name), _, _, (float(lat), float(lon))]:`

will match the same nested sequence as the previous example, but the first item must be an instance of `str`, and both items in the tuple must be instances of `float`

*Those expressions look like constructor calls, but in the context of a pattern that syntax performs a runtime type check, not a type conversion*

To match any subject sequence that starts with a `str` and ends with a nested sequence of two `float`s:

`case [str(name), *_, (float(lat), float(lon))]:`

This pattern matches a sequence of arbitrary length without binding the middle items to any variable, thanks to `*_`. Using `*extra` instead would bind those items to a variable named `extra` as a `list` with 0 or more items

Pattern matching is an example of declarative programming: the code describes **what** you want to match, not **how** to match it. The shape of the code follows the shape of the data

### Slicing

##### Why Slices and Ranges Exclude the last item

* easy to see the length of a slice or range when only the stop position is given, e.g. `range(3)` and `my_list[:3]` both produce 3 items
* easy to compute len of a slice or range when start and stop are given. `stop - start`
* easy to split a seq into 2 parts at any index `x`, without overlapping. `my_list[:x]` and `my_list[x:]`

In [3]:
l = [10, 20, 30, 40, 50, 60]
print(l[:2]) # split at 2
print(l[2:])
print(l[:3]) # split at 3
print(l[3:])


[10, 20]
[30, 40, 50, 60]
[10, 20, 30]
[40, 50, 60]


#### Slice Objects

syntax of slice: `s[start:stop:step]`

The notation `a:b:c` is only valid within `[]` and it produces a slice object: `slice(a, b, c)`

In [5]:
s = '123456789'
print(s[::3])
print(s[::-1])
print(s[::-2])

147
987654321
97531
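
That `a:b:c` inside `[]` produces a slice object can be checked with a tiny class whose `__getitem__` returns its argument. `ShowIndex` is a hypothetical helper written only for this sketch.

```python
class ShowIndex:
    """Return whatever was passed between the square brackets."""
    def __getitem__(self, index):
        return index

probe = ShowIndex()
print(probe[1:10:2])   # slice(1, 10, 2)
print(probe[::3])      # slice(None, None, 3)

s = '123456789'
print(s[::3] == s[slice(None, None, 3)])   # True
```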


###### Naming slices


In [6]:
invoice = """
0.....6.................................40........52...55........
1909  Pimoroni PiBrella                     $17.50    3    $52.50
1489  6mm Tactile Switch x20                 $4.95    2     $9.90
1510  Panavise Jr. - PV-201                 $28.00    1    $28.00
1601  PiTFT Mini Kit 320x240                $34.95    1    $34.95
"""
# The text must start at column 0: indenting it would shift every field
# and the named slices below would cut the wrong columns.
SKU = slice(0, 6)
DESCRIPTION = slice(6, 40)
UNIT_PRICE = slice(40, 52)
QUANTITY = slice(52, 55)
ITEM_TOTAL = slice(55, None)
line_items = invoice.split('\n')[2:]
for item in line_items:
    print(item[UNIT_PRICE], item[DESCRIPTION])

    $17.50   Pimoroni PiBrella                 
     $4.95   6mm Tactile Switch x20            
    $28.00   Panavise Jr. - PV-201             
    $34.95   PiTFT Mini Kit 320x240            
 


Slices are useful not only for extracting information from sequences; they can also change mutable sequences in place, without rebuilding them from scratch

Mutable sequences can be modified in place using slice notation on the left-hand side of an assignment statement or as the target of a `del` statement

In [7]:
l = list(range(10))
print(l)
l[2:5] = [20, 30]
print(l)
del l[5:7]
print(l)
l[3::2] = [11, 22]
print(l)
l[2:5] = [100]
print(l)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 20, 30, 5, 6, 7, 8, 9]
[0, 1, 20, 30, 5, 8, 9]
[0, 1, 20, 11, 5, 22, 9]
[0, 1, 100, 22, 9]
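
One detail worth noting about slice assignment: the right-hand side must be an iterable, even when it holds a single item. A small sketch (the name `error` is only for illustration):

```python
l = list(range(10))
error = None
try:
    l[2:5] = 100        # a bare int is not iterable
except TypeError as exc:
    error = exc
print(type(error).__name__)

l[2:5] = [100]          # a one-item list works
print(l)                # [0, 1, 100, 5, 6, 7, 8, 9]
```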


##### Concatenation using `+` & `*` with sequences

* both operands of `+` must be of the same sequence type
* neither of them is modified
* new seq. of the same type is created as result of concatenation

*To concatenate multiple copies of the same sequence, multiply it by an integer*

***Both `+` and `*` create a new object and never change their operands***

In [8]:
l = [1, 2, 3]
print(l * 5)

[1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]


In [9]:
print(5 * 'abcd')

abcdabcdabcdabcdabcd


**Warning**

Using `*` on a sequence that contains mutable items produces a sequence holding multiple references to the same item. E.g. `my_list = [[]] * 3` results in a list containing 3 references to the same inner list

In [10]:
wrong_board = [['_'] * 3] * 3
print(wrong_board)
wrong_board[1][2] = 'X'
print(wrong_board)

[['_', '_', '_'], ['_', '_', '_'], ['_', '_', '_']]
[['_', '_', 'X'], ['_', '_', 'X'], ['_', '_', 'X']]


When we need to initialize a list with a certain number of nested lists, the best way to do so is a list comprehension

In [11]:
board = [['_'] * 3 for i in range(3)]
print(board)
board[1][2] = 'X'
print(board)

[['_', '_', '_'], ['_', '_', '_'], ['_', '_', '_']]
[['_', '_', '_'], ['_', '_', 'X'], ['_', '_', '_']]


##### Augmented Assignment with sequences

The augmented assignment operators `+=` & `*=` behave differently, depending on the first operand.

The special method that makes `+=` work is `__iadd__` (in-place addition). However, if `__iadd__` is not implemented, Python falls back to calling `__add__`.

Take the simple expression `a += b`. If `a` is a mutable sequence (e.g. `list`, `bytearray`, etc.) that implements `__iadd__`, `a` is modified in place. If `a` is an immutable sequence (e.g. `tuple`) that does not implement `__iadd__`, `__add__` is called instead, with the same effect as `a = a + b`: a new object is created and then bound to `a`. The same applies to `*=`, which is implemented with `__imul__`
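
A small sketch of this difference, using a second name bound to the same object to show whether `+=` changed the object or replaced it (the `alias` names are only for illustration):

```python
print(hasattr(list, '__iadd__'))    # True: += extends the list in place
print(hasattr(tuple, '__iadd__'))   # False: += falls back to __add__

a = [1, 2]
alias = a
a += [3]               # in place, so alias sees the change
print(alias)           # [1, 2, 3]

t = (1, 2)
t_alias = t
t += (3,)              # builds a new tuple and rebinds t
print(t_alias)         # (1, 2)
print(t is t_alias)    # False
```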

In [17]:
l = [1, 2, 3] # mutable type
print(id(l))

1971626248768


In [18]:
l *= 2
print(id(l))

1971626248768


In [19]:
t = (1, 2, 3) # immutable type
print(id(t))

1971626121984


In [20]:
t *= 2
print(id(t))

1971625752512


Repeated concatenation of **immutable** sequences is **inefficient**, because instead of appending new items, the interpreter has to copy the whole target sequence to create a new one with the new items

In [29]:
t = (1, 2, [30, 40])
t[2] += [50, 60]


TypeError: 'tuple' object does not support item assignment

In [30]:
print(t)

(1, 2, [30, 40, 50, 60])


In [32]:
import dis
dis.dis('s[a] += b')

  0           0 RESUME                   0

  1           2 LOAD_NAME                0 (s)
              4 LOAD_NAME                1 (a)
              6 COPY                     2
              8 COPY                     2
             10 BINARY_SUBSCR
             20 LOAD_NAME                2 (b)
             22 BINARY_OP               13 (+=)
             26 SWAP                     3
             28 SWAP                     2
             30 STORE_SUBSCR
             34 LOAD_CONST               0 (None)
             36 RETURN_VALUE
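
Reading the bytecode above: the in-place addition (`BINARY_OP`) runs before the result is stored back (`STORE_SUBSCR`). When the container is a tuple and the item is a list, the list is extended first and only the store step fails. The sketch below reproduces both effects; the name `error` is introduced just for this example.

```python
t = (1, 2, [30, 40])
error = None
try:
    t[2] += [50, 60]    # extends the list, then fails to store it in the tuple
except TypeError as exc:
    error = exc

print(error)            # the TypeError was raised...
print(t)                # ...but the list inside the tuple was still changed

t[2].extend([70])       # changing the list directly avoids the failing store
print(t)
```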


#### list.sort vs sorted built-in

The `list.sort` method sorts a list in place - without making a copy.

In contrast, the built-in func `sorted` creates a new list and returns it.

*By convention, functions and methods that change an object in place should return `None` to make it clear that no new object was created.*

Python's sorting algorithm is **stable**: it preserves the relative order of items that compare equal
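
A short sketch of both points, stability and the `None` return value. The `records` data is made up for illustration.

```python
records = [('pear', 3), ('apple', 1), ('fig', 3), ('kiwi', 1)]

# Items with the same count keep their original relative order
by_count = sorted(records, key=lambda record: record[1])
print(by_count)
# [('apple', 1), ('kiwi', 1), ('pear', 3), ('fig', 3)]

# list.sort works in place and returns None
result = records.sort(key=lambda record: record[1])
print(result)                # None
print(records == by_count)   # True
```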

In [36]:
fruits = ['grape', 'raspberry', 'apple', 'banana']
print(sorted(fruits))

['apple', 'banana', 'grape', 'raspberry']


In [39]:
print(sorted(fruits, reverse=True))

['raspberry', 'grape', 'banana', 'apple']


In [40]:
print(sorted(fruits, key=len))

['grape', 'apple', 'banana', 'raspberry']


In [41]:
print(fruits)

['grape', 'raspberry', 'apple', 'banana']


In [43]:
fruits.sort()
print(fruits)

['apple', 'banana', 'grape', 'raspberry']


*By default, Python sorts strings by character code, which means ASCII uppercase letters come before lowercase ones*
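
A minimal example of that ordering, and of a case-insensitive sort using `key=str.lower` (the word list is made up for illustration):

```python
words = ['banana', 'Cherry', 'apple']
print(sorted(words))                  # ['Cherry', 'apple', 'banana']
print(sorted(words, key=str.lower))   # ['apple', 'banana', 'Cherry']
```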

#### Managing ordered sequences with bisect

The `bisect` module offers 2 main functions that use the **binary search** algorithm to quickly find and insert items in any sorted sequence.

##### Searching with bisect

`bisect(haystack, needle)` performs a binary search for `needle` in `haystack`, which must already be sorted, and returns the index at which `needle` could be inserted while maintaining ascending order.

In [45]:
import bisect
haystack = [1, 2, 3, 4, 7, 9, 20, 23]
needle = 3
print(bisect.bisect(haystack, needle))

3


In [47]:
needle = 10
print(bisect.bisect(haystack, needle))

6
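
When the needle is already present, as in the first search above, the returned index is just past the existing item, because `bisect.bisect` is an alias of `bisect.bisect_right`. The sibling function `bisect.bisect_left` returns the position of the existing item instead. A short comparison:

```python
import bisect

haystack = [1, 2, 3, 4, 7, 9, 20, 23]

right = bisect.bisect_right(haystack, 3)   # after the existing 3
left = bisect.bisect_left(haystack, 3)     # at the existing 3
print(right, left)                         # 3 2

# For a needle that is not in the haystack both functions agree
print(bisect.bisect_right(haystack, 10), bisect.bisect_left(haystack, 10))   # 6 6
```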


##### Inserting with bisect

`insort(seq, item)` inserts `item` into `seq` so that `seq` stays sorted in ascending order

In [49]:
import bisect
haystack = [1, 2, 3, 4, 7, 9, 20, 23]
bisect.insort(haystack, 3)
print(haystack)
bisect.insort(haystack, 22)
print(haystack)
bisect.insort(haystack, 5)
print(haystack)

[1, 2, 3, 3, 4, 7, 9, 20, 23]
[1, 2, 3, 3, 4, 7, 9, 20, 22, 23]
[1, 2, 3, 3, 4, 5, 7, 9, 20, 22, 23]


### Alternatives to `list`

* `array` saves a lot of memory when we need to store a large number of values of a single primitive type, e.g. floats
* `deque` is a good choice when we add and remove many items at the start and end
* `set` is a good option when we frequently check whether an item is present

#### Arrays

Arrays support **all** mutable seq. operations (including: `.pop`, `.insert` and `.extend`) and additional methods for fast loading and saving, such as `.frombytes` & `.tofile`.

When creating an `array` we specify a typecode: a letter that determines the underlying C type used to store each item. For example, `array('b')` creates an array in which every item is a `signed char`

The `array` type does not have an in-place `sort` method. To sort an array, use the built-in `sorted` function and build a new array from the result
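
A minimal sketch of sorting an array by rebuilding it (the sample values are made up):

```python
from array import array

a = array('b', [5, -3, 9, 0])
print(a.typecode, a.itemsize)      # typecode 'b', 1 byte per item

# sorted() returns a plain list, so wrap it in a new array of the same typecode
a = array(a.typecode, sorted(a))
print(a)                           # array('b', [-3, 0, 5, 9])
```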

In [51]:
from array import array
from random import random

floats = array('d', (random() for i in range(10**7)))
print(floats[-1])

0.11043195693652919


In [52]:
'''
    with open('floats.bin', 'wb') as fp:
        floats.tofile(fp)  # save array to file
    floats2 = array('d')  # create an empty array of doubles
    with open('floats.bin', 'rb') as fp:
        floats2.fromfile(fp, 10**7)  # read 10 million numbers from the binary file
'''

#### Deques and other queues

*A deque can be bounded, i.e. created with a fixed maximum length. When a bounded deque is full, adding a new item discards an item from the opposite end*

In [55]:
from collections import deque
dq = deque(range(10), maxlen=10)
print(dq)

deque([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], maxlen=10)


`deque.rotate(n)` takes `n` items from the right end (when `n > 0`) and prepends them to the left; when `n < 0`, it takes items from the left and appends them to the right

In [56]:
dq.rotate(3)
print(dq)

deque([7, 8, 9, 0, 1, 2, 3, 4, 5, 6], maxlen=10)


In [58]:
dq.rotate(-4)
print(dq)

deque([1, 2, 3, 4, 5, 6, 7, 8, 9, 0], maxlen=10)


In [59]:
dq.appendleft(-1)
print(dq)

deque([-1, 1, 2, 3, 4, 5, 6, 7, 8, 9], maxlen=10)


In [60]:
dq.extend([11, 22, 33])
print(dq)

deque([3, 4, 5, 6, 7, 8, 9, 11, 22, 33], maxlen=10)


In [61]:
dq.extendleft([10, 20, 30, 40])
print(dq)

deque([40, 30, 20, 10, 3, 4, 5, 6, 7, 8], maxlen=10)


`deque` implements most of the `list` methods, and adds a few specific to its design, like `popleft` and `rotate`.

*Removing items from the middle of a deque is not fast. A deque is optimized for appending and popping from the ends*

#### Other implementations of queues

* `queue` provides the synchronized classes `SimpleQueue`, `Queue`, `LifoQueue`, and `PriorityQueue`. These can be used for safe communication between threads. All except `SimpleQueue` can be bounded with the `maxsize` argument, but they don't discard items to make room as `deque` does. Instead, when the queue is full, inserting a new item blocks until some other thread makes room by taking an item out.
* `multiprocessing` implements its own unbounded `SimpleQueue` and bounded `Queue`, both very similar to those in the `queue` module, but designed for interprocess communication. Additionally, `multiprocessing.JoinableQueue` is provided for task management.
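* `asyncio` provides `Queue`, `LifoQueue` and `PriorityQueue`, inspired by the `queue` and `multiprocessing` modules but adapted for managing tasks in asynchronous programming. In current Python versions `asyncio.Queue` itself provides `join()` and `task_done()`, so there is no separate `JoinableQueue` class.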
* `heapq` does not implement a queue class, but instead provides the `heappush` and `heappop` functions, which let you use a mutable sequence as a **heap queue** or **priority queue**
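
Two of the points above in a short sketch: a bounded `queue.Queue` refuses new items rather than discarding old ones, and `heapq` turns a plain list into a priority queue. The task names are made up, and `put_nowait` is used so the example does not block.

```python
import heapq
import queue

# A bounded Queue never drops items. A blocking put() would wait for room;
# the non-blocking put_nowait() raises queue.Full instead.
q = queue.Queue(maxsize=2)
q.put('a')
q.put('b')
try:
    q.put_nowait('c')
    accepted = True
except queue.Full:
    accepted = False
print(accepted, q.qsize())   # False 2

# heapq keeps the smallest item at index 0 of an ordinary list
tasks = []
heapq.heappush(tasks, (2, 'write tests'))
heapq.heappush(tasks, (1, 'fix bug'))
heapq.heappush(tasks, (3, 'update docs'))
order = [heapq.heappop(tasks)[1] for _ in range(3)]
print(order)   # ['fix bug', 'write tests', 'update docs']
```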