# 02 An Array of Sequences
Some notes, observations and questions on chapter 02.

### List Comprehensions and Generator Expressions
Local scope within list comprehensions

In [2]:
word = 'ABC'
codes = [ord(letter) for letter in word]
word

'ABC'

In [3]:
codes

[65, 66, 67]
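
To make the "local scope" point visible, here is a minimal sketch (the `leaked` flag is my own helper name, not from the book). It checks whether the loop variable `letter` still exists after the comprehension has run:

```python
word = 'ABC'
codes = [ord(letter) for letter in word]

try:
    letter  # the loop variable is not defined out here
except NameError:
    leaked = False
else:
    leaked = True

print(codes)   # [65, 66, 67]
print(leaked)  # False
```

The name `word` in the enclosing scope is untouched, and `letter` never becomes visible outside the comprehension.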

Assignment Expressions

In [4]:
# The walrus operator (:=) allows assignments within an expression, making the variable 'last' accessible after the list comprehension:
word = 'ABC'
codes = [last := ord(letter) for letter in word]
last  # 'last' binds to the last value in the list comprehension

67

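The walrus operator (= "Assignment Expression") also works inside functions, but an unparenthesized assignment expression is not allowed as a statement of its own:

In [5]:
def func(a):
    b := 1
    return a+b

func(5) # never runs: the SyntaxError is raised at compile time; `(b := 1)` or a plain `b = 1` would work

SyntaxError: invalid syntax (4261172643.py, line 2)

PEP 572 deliberately rejects unparenthesized assignment expressions at the top level of an expression statement, so that `:=` does not become a second way of spelling `b = 1`. The main idea behind assignment expressions is to bind a (possibly expensively computed) subexpression to a name and then reuse that name within the surrounding expression. ([ref](https://peps.python.org/pep-0572/))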
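
A minimal sketch of assignment expressions inside function bodies. The function `first_long_word` and its parameters are hypothetical names chosen for illustration:

```python
def func(a):
    (b := 1)  # parenthesized, so this is valid (though plain `b = 1` is clearer)
    return a + b


def first_long_word(words, min_len):
    # typical use: bind a computed value and test it in the same expression
    if (found := next((w for w in words if len(w) >= min_len), None)) is not None:
        return found.upper()
    return None


print(func(5))                                   # 6
print(first_long_word(['a', 'bcd', 'efgh'], 3))  # BCD
```

In both functions the name bound by `:=` is an ordinary local variable; no `global` declaration is involved.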

In a comprehension, the named variable takes on every value during the iteration, but it ultimately holds the value from the last iteration.

If we want to store certain values, we need to store them during the iteration:

In [6]:
word = "hello"
lasts = []
# since `append` method always returns `None`, we can use logical `or` to keep the list comprehension going:
codes = [lasts.append(last := ord(letter)) or last for letter in word]
lasts

[104, 101, 108, 108, 111]

This reminds me of this [example on late binding closures](https://github.com/StefanieSenger/Playground/blob/main/late_binding_clausures.py) from the Hitchhiker's Guide to Python book, though I can't fully untangle what they have in common and where they differ:

In [7]:
def create_multipliers():
    return [lambda x : i * x for i in range(5)] # create a list of lambda functions

for multiplier in create_multipliers():
    print(multiplier(2), end=", ") # all the same results because of late binding in closures

8, 8, 8, 8, 8, 

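A little research suggests the following: the lambdas in the list comprehension are closures that look up `i` only when they are called (late binding), and by then the loop has finished and `i` is 4. The walrus assignment, in contrast, binds `last` immediately on every iteration; each new binding overwrites the previous one, so only the last value remains once the comprehension is done. But doesn't it depend on where you look from? From within the list comprehension in the walrus example and from within the lambda function, these look like immediate bindings, but from outside both look like late bindings? What the two cases have in common is that a name is not a snapshot: `i` and `last` are each a single variable that gets rebound, and whoever reads the variable later sees its current value.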
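
To compare the two behaviours side by side, here is a small sketch. The function names `create_multipliers_late` and `create_multipliers_early` are my own, and the `i=i` default-argument trick is the fix commonly suggested for late binding, not something taken from the chapter:

```python
def create_multipliers_late():
    # each lambda looks up `i` when it is called, after the loop has finished
    return [lambda x: i * x for i in range(5)]


def create_multipliers_early():
    # the default argument `i=i` is evaluated when each lambda is created
    return [lambda x, i=i: i * x for i in range(5)]


late = [multiplier(2) for multiplier in create_multipliers_late()]
early = [multiplier(2) for multiplier in create_multipliers_early()]

# the walrus target is an ordinary variable that is rebound on every iteration
seen = [(last := n) for n in range(5)]

print(late)        # [8, 8, 8, 8, 8]
print(early)       # [0, 2, 4, 6, 8]
print(seen, last)  # [0, 1, 2, 3, 4] 4
```

Inside the comprehension the walrus value is used right away (`seen` holds every value), whereas the late-binding lambdas only read `i` once the loop is over.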

Comparing list comprehensions with map/filter-composition

In [8]:
symbols = '$¢£¥€¤'
beyond_ascii = [ord(s) for s in symbols if ord(s) > 127]
beyond_ascii

[162, 163, 165, 8364, 164]

In [9]:
beyond_ascii = list(filter(lambda c: c > 127, map(ord, symbols)))
beyond_ascii

[162, 163, 165, 8364, 164]

Cartesian products in list comprehensions

In [10]:
# example
colors = ['black', 'white']
sizes = ['S', 'M', 'L']
tshirts = [(color, size) for color in colors for size in sizes]
tshirts

[('black', 'S'),
 ('black', 'M'),
 ('black', 'L'),
 ('white', 'S'),
 ('white', 'M'),
 ('white', 'L')]

Generator expressions

In [11]:
# simple example
symbols = '$¢£¥€¤'
# no need to duplicate the parentheses, since the generator expression is the only argument in the function call
tuple(ord(symbol) for symbol in symbols) # tuple() function extracts all the results from the generator object

(36, 162, 163, 165, 8364, 164)

In [12]:
# example for cartesian product
colors = ['black', 'white']
sizes = ['S', 'M', 'L']

for tshirt in ('%s %s' % (c, s) for c in colors for s in sizes):
    print(tshirt)

# this time iterating over a generator object instead of using tuple() to extract and pack into a tuple

black S
black M
black L
white S
white M
white L


### Tuples are not just immutable lists
Tuples used as records (order matters since the meaning of each field is given by its position in the tuple)

In [13]:
# example to demonstrate the use of tuples as records:
lax_coordinates = (33.9425, -118.408056)
city, year, pop, chg, area = ('Tokyo', 2003, 32_450, 0.66, 8014)
traveler_ids = [('USA', '31195855'), ('BRA', 'CE342567'), ('ESP', 'XDA205856')]

for passport in sorted(traveler_ids):
    print('%s/%s' % passport)

BRA/CE342567
ESP/XDA205856
USA/31195855


Tuples used as immutable lists
- code readability (expect that content of tuple is never changed)
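- better performance than a list (a tuple uses less memory than a list of the same length and allows Python to do some optimizations)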

- immutability only refers to the references contained in a tuple
- if tuple contains mutable type, this one can still change (cause of bugs)

In [14]:
a = (10, 'alpha', [1, 2])
b = (10, 'alpha', [1, 2])
a == b

True

In [15]:
b[-1].append(99)
a == b

False

In [16]:
# only hashable tuples can be used as dict keys
# to find out if a tuple (or any type) is hashable:

def fixed(o):
    try:
        hash(o)
    except TypeError:
        return False
    return hash(o)  # returns the hash itself rather than True; note that a hash of 0 would be falsy


tf = (10, 'alpha', (1, 2))  # Contains no mutable items
tm = (10, 'alpha', [1, 2])  # Contains a mutable item (list)
fixed(tf)

-3126647474178173494

Or, more explicitly:

In [17]:
tf.__hash__()

-3126647474178173494

In [18]:
tm.__hash__() # note this tuple too has a hash method implemented

# how would I be able to access the code of tm.__hash__()?

TypeError: unhashable type: 'list'

### Unpacking Sequences and Iterables

In [19]:
# examples of two different unpackings

t = (20, 8)
divmod(*t) # divmod expects two arguments

quotient, remainder = divmod(*t) # and returns a tuple of two values, unpacked here into two names
quotient, remainder

(2, 4)

Using * to grab excess items

In [20]:
# collecting all remaining args into a list:

a, b, *rest = range(5)
a, b, rest

(0, 1, [2, 3, 4])

In [21]:
a, b, *rest = range(3)
a, b, rest

(0, 1, [2])

In [22]:
a, b, *rest = range(2)
a, b, rest

(0, 1, [])

In [23]:
a, *body, c, d = range(5)
a, body, c, d

(0, [1, 2], 3, 4)

Unpacking with * in Function Calls

In [24]:
# function call with several packings and unpackings

def fun(a, b, c, d, *rest):
    return a, b, c, d, rest


fun(*[1, 2], 3, *range(4, 7)) # nobody should write code like that

(1, 2, 3, 4, (5, 6))

I would love to do some exercises with *args and **kwargs, to have it more intuitively accessible. Can you recommend some?

### Pattern Matching with Sequences
Pattern matching works with other data types as well; here each `case` pattern is a sequence pattern.

This syntax checks whether the subject (`message` in the example below) matches one of the `case` patterns:

In [25]:
# this would be part of a class
def handle_command(self, message):
    match message:
        case ['BEEPER', frequency, times]:
            self.beep(times, frequency)
        case ['NECK', angle]:
            self.rotate_neck(angle)
        case ['LED', ident, intensity]:
            self.leds[ident].set_brightness(ident, intensity)
        case ['LED', ident, red, green, blue]:
            self.leds[ident].set_color(ident, red, green, blue)
        case _: # good style: a catch-all case for messages that match no pattern
            raise InvalidCommand(message)


# obj.handle_command(message=['LED', 2, 100])

In [26]:
metro_areas = [
    ('Tokyo', 'JP', 36.933, (35.689722, 139.691667)),
    ('Delhi NCR', 'IN', 21.935, (28.613889, 77.208889)),
    ('Mexico City', 'MX', 20.142, (19.433333, -99.133333)),
    ('New York-Newark', 'US', 20.104, (40.808611, -74.020386)),
    ('São Paulo', 'BR', 19.649, (-23.547778, -46.635833)),
]

def main():
    print(f'{"":15} | {"latitude":>9} | {"longitude":>9}')
    for record in metro_areas:
        match record:
            case [name, _, _, (lat, lon)] if lon <= 0: # the `_` here has the meaning of a wildcard; `*_` would match a variable number of other values
                print(f'{name:15} | {lat:9.4f} | {lon:9.4f}')
main()

# Note that we can match a record of type tuple with a case pattern written like a list. It seems a bit odd, but the Fluent Python book
# explains that a sequence pattern matches any sequence, so it doesn't make much of a difference. Instances of `str`, `bytes` and
# `bytearray`, however, are not treated as sequences by `match`; they need to be converted (e.g. with `tuple()`) before they can match.

                |  latitude | longitude
Mexico City     |   19.4333 |  -99.1333
New York-Newark |   40.8086 |  -74.0204
São Paulo       |  -23.5478 |  -46.6358


Wow, I think this could be super useful. I am still trying to think of a use case within scikit-learn though. It seems to be made for communication between different computers or processes (signals, API requests), but I imagine it could also be used for filtering data in a pipeline or so.

"One key improvement of match over switch is destructuring—a more advanced form of unpacking." Here is a [blog article](https://blog.ashutoshkrris.in/mastering-list-destructuring-and-packing-in-python-a-comprehensive-guide) dedicated to destructuring of lists, though I'm not sure how this differs from packing and unpacking. Maybe it's just the terminology.

### Slicing

In [27]:
s = 'bicycle'
s[::3] # s.__getitem__(slice(start,stop, step))

'bye'

In [28]:
s[::-2]

'eccb'

In [29]:
s[1::-2] # I would have expected to get 'ecc'

'i'

In [30]:
# a negative start counts from the end: -6 is the same position as index 1, so s[-6::-2] returns the same result
s[-6::-2] 

'i'

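Noting: With a negative step, the slice walks backwards from `start`, and the defaults flip: an omitted `start` means the last item and an omitted `stop` means "go past the first item". That is why `s[1::-2]` starts at index 1 and yields only `'i'`, while `s[:1:-2]` gives `'ecc'`.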
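
One way to check how Python resolves a slice is the `slice.indices()` method, which returns the concrete `(start, stop, step)` for a given length. A small sketch using the same string:

```python
s = 'bicycle'

# slice.indices(length) shows the concrete start, stop and step that will be used
print(slice(1, None, -2).indices(len(s)))     # (1, -1, -2)
print(slice(-6, None, -2).indices(len(s)))    # (1, -1, -2)
print(slice(None, 1, -2).indices(len(s)))     # (6, 1, -2)
print(slice(None, None, -2).indices(len(s)))  # (6, -1, -2)

print(s[1::-2])  # i
print(s[:1:-2])  # ecc
```

The stop value `-1` in these tuples means "one step before index 0"; it is a range-style bound and not the negative index `-1` that you would write in a subscript.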

In [31]:
type(s[1::-2]) # the return type is a str

str

We can name slices:

In [32]:
invoice = """
0.....6.................................40........52...55........
1909 Pimoroni PiBrella                      $17.50    3    $52.50
1489 6mm Tactile Switch x20                  $4.95    2    $9.90
1510 Panavise Jr. - PV-201                  $28.00    1    $28.00
1601 PiTFT Mini Kit 320x240                 $34.95    1    $34.95
"""

SKU = slice(0, 6)
DESCRIPTION = slice(6, 40)
UNIT_PRICE = slice(40, 52)
QUANTITY = slice(52, 55)
ITEM_TOTAL = slice(55, None)

line_items = invoice.split('\n')[2:]

for item in line_items:
    print(item[UNIT_PRICE], item[DESCRIPTION])  # descriptions lose their first letter: in this copy of the data they start at index 5, not 6

    $17.50   imoroni PiBrella                  
     $4.95   mm Tactile Switch x20             
    $28.00   anavise Jr. - PV-201              
    $34.95   iTFT Mini Kit 320x240             
 


In [33]:
type(slice(6, 40)) # slice object

slice

Multidimensional Slicing and Ellipsis

In [34]:
# we can do multi-dimensional slicing on numpy arrays:

import numpy as np

array = np.array(range(9)).reshape(3,3)
array

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [35]:
array[1:, 1:] 

array([[4, 5],
       [7, 8]])

In [36]:
array[1:, ...] # using the ellipsis syntax as a shortcut to array[1:, :], can be used on an arbitrary number of dimensions

array([[3, 4, 5],
       [6, 7, 8]])

In [37]:
# Python's built-in sequences, however, can only be sliced in one dimension:

import array as arr

int_array = arr.array('i', [1, 2, 3, 4, 5, 6, 7, 8, 9])  # separate name, so the `arr` module alias is not overwritten
int_array[2:5]

array('i', [3, 4, 5])

Assigning to slices

In [38]:
l = list(range(10))
l

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [39]:
l[2:5] = [20, 30] # items at index 2, 3 and 4 substituted by two new items with values 20 and 30
l

[0, 1, 20, 30, 5, 6, 7, 8, 9]

In [40]:
len(l) # one item has been deleted in the previous operation

9

In [41]:
del l[5:7] # more items deleted
l

[0, 1, 20, 30, 5, 8, 9]

In [42]:
l[3::2] = [11, 22]
l

[0, 1, 20, 11, 5, 22, 9]

In [43]:
l[2:5] = [100] # items at index 2, 3 and 4 substituted by one new item with value 100
l

[0, 1, 100, 22, 9]

### Using + and * with Sequences
`+` works when both operands are sequences of the same type, and `*` takes a sequence and an integer. In both cases a new sequence of the same type is returned and none of the operands is changed.

In [64]:
word1 = "dog"
word2 = "cat"

word1 + word2

'dogcat'

In [65]:
num1 = [2]
num2 = [3]

num1 + num2

[2, 3]

In [66]:
5 * "abcd"

'abcdabcdabcdabcdabcd'

Augmented Assignments with sequences

`+=` and `*=`, however, mostly modify mutable sequences in place; for immutable sequences they create a new object and rebind the name:

In [67]:
word1 += "fish"
word1 # word1 (str are immutable) seems to be replaced by a new str

'dogfish'

Quote from book: "However, if `__iadd__` is not implemented, Python falls back to calling `__add__`. [...] However, when a does not implement `__iadd__`, the expression a += b has the same effect as a = a + b [...]. In general, for mutable sequences, it is a good bet that `__iadd__` is implemented and that += happens in place."

It seems to me this is what must have happened in the `word1 += "fish"` example above, since `str` has no `__iadd__()` method: a new string was created and `word1` was rebound to it.

Do I rely on a textbook or the documentation to tell me this? Again, how can I inspect what internal Python code is run, when I have not written it myself nor imported it?
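
One partial answer, as a sketch: `inspect.getsource()` shows the source of anything written in Python, but built-in types such as `str` are implemented in C in CPython, so there is no Python source to show and the answer lies in the documentation or the C sources. The helper name `source_or_none` is hypothetical:

```python
import inspect
import textwrap


def source_or_none(obj):
    """Return the Python source of obj, or None if there is none to show."""
    try:
        return inspect.getsource(obj)
    except (TypeError, OSError):
        # TypeError: built-in implemented in C; OSError: source file not available
        return None


print(source_or_none(str.__add__))  # None in CPython: str is implemented in C

python_source = source_or_none(textwrap.dedent)  # a standard-library function written in Python
print(python_source is not None)
```

For objects where source is available, the returned string is the actual `def` block as it appears in the module file.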

In [68]:
hasattr(word1, "__iadd__")

False

In [55]:
dir(word1) # similar results for `str.__dict__`

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'removeprefix',
 'removesuffix',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'stri

In [92]:
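# this is a disassembler for Python code:
import dis

dis.dis("word1 += 'fish'")

# not sure how to read the output here and if it even helps; what it does show is that
# BINARY_OP (+=) computes the result and STORE_NAME then rebinds `word1` to it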

  0           0 RESUME                   0

  1           2 LOAD_NAME                0 (word1)
              4 LOAD_CONST               0 ('fish')
              6 BINARY_OP               13 (+=)
             10 STORE_NAME               0 (word1)
             12 RETURN_CONST             1 (None)


In [73]:
l = ["bla", "tra", "la"]
hasattr(l, "__iadd__")

True

In [74]:
l += ["puff"]
l

['bla', 'tra', 'la', 'puff']

Showing how mutability influences whether `+=` and `*=` return a new object or modify the old one:

In [78]:
l = [1, 2, 3]
id(l)

125581445348480

In [77]:
l *= 2
l

[1, 2, 3, 1, 2, 3]

In [79]:
id(l) # same id, because l was modified in place

125581445348480

In [81]:
# that would be different if we deal with an immutable tuple:
t = (1, 2, 3)
id(t)

125581445621376

In [82]:
t *= 2
t

(1, 2, 3, 1, 2, 3)

In [83]:
id(t) # different id, because the object was newly created

125581445605856
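
The same difference can be reproduced with user-defined classes. This is only a sketch; `Bag` and `InPlaceBag` are hypothetical classes written for this note. One class has only `__add__`, the other also implements `__iadd__`:

```python
class Bag:
    """No __iadd__: `+=` falls back to __add__ and rebinds the name."""

    def __init__(self, items):
        self.items = list(items)

    def __add__(self, other):
        return Bag(self.items + list(other))


class InPlaceBag(Bag):
    """With __iadd__: `+=` mutates the existing object."""

    def __iadd__(self, other):
        self.items.extend(other)
        return self


a = Bag([1, 2])
a_before = a
a += [3]  # new object, like str or tuple

b = InPlaceBag([1, 2])
b_before = b
b += [3]  # same object, like list

print(a is a_before, a_before.items, a.items)  # False [1, 2] [1, 2, 3]
print(b is b_before, b.items)                  # True [1, 2, 3]
```

In both cases the name on the left is assigned the result of the operation; with `__iadd__` that result simply happens to be the same object.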

"Repeated concatenation of immutable sequences is inefficient"

A += Assignment Puzzler

In [84]:
t = (1, 2, [30, 40])
t[2] += [50, 60] # raises error
t

TypeError: 'tuple' object does not support item assignment

In [85]:
t # but still modifies mutable tuple element!

(1, 2, [30, 40, 50, 60])

Maybe that doesn't matter so much, since the error ends the execution of the code, unless you use a try except block, but then you are aware of what is happening.
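
A short sketch of the puzzler with the error caught, next to the usual way around it, which is calling `extend` on the list so that no assignment to the tuple item takes place:

```python
t = (1, 2, [30, 40])
try:
    t[2] += [50, 60]
except TypeError as exc:
    message = str(exc)

print(message)  # 'tuple' object does not support item assignment
print(t)        # (1, 2, [30, 40, 50, 60])

u = (1, 2, [30, 40])
u[2].extend([50, 60])  # mutates the list without assigning to u[2]
print(u)        # (1, 2, [30, 40, 50, 60])
```

The list is extended first and the failing step is the store back into the tuple, which is why `t` has changed even though the statement raised.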

In [87]:
import dis

dis.dis('s[a] += b')

  0           0 RESUME                   0

  1           2 LOAD_NAME                0 (s)
              4 LOAD_NAME                1 (a)
              6 COPY                     2
              8 COPY                     2
             10 BINARY_SUBSCR
             14 LOAD_NAME                2 (b)
             16 BINARY_OP               13 (+=)
             20 SWAP                     3
             22 SWAP                     2
             24 STORE_SUBSCR
             28 RETURN_CONST             0 (None)


### list.sort versus the sorted built-In

- `list.sort` is an in-place operation (and returns None as a convention to remind the user that no new object was created)
- built-in function `sorted` returns a new list
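
A small sketch of the practical consequence: assigning the result of `list.sort()` to a name gives `None`, while `sorted` accepts any iterable and always hands back a new list.

```python
fruits = ['grape', 'raspberry', 'apple', 'banana']

result = fruits.sort()  # sorts in place and returns None
print(result)           # None
print(fruits)           # ['apple', 'banana', 'grape', 'raspberry']

by_length = sorted(fruits, key=len)  # new list; `fruits` stays as it is
print(by_length)        # ['apple', 'grape', 'banana', 'raspberry']

print(sorted('cab'))    # ['a', 'b', 'c'], works on any iterable
```

Because the sort is stable, `'apple'` stays ahead of `'grape'` when both have length 5.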

In [94]:
fruits = ['grape', 'raspberry', 'apple', 'banana']
sorted(fruits, key=len, reverse=True) # creates a new list

['raspberry', 'banana', 'grape', 'apple']

In [95]:
fruits # unchanged

['grape', 'raspberry', 'apple', 'banana']

In [96]:
fruits.sort()
fruits # changed

['apple', 'banana', 'grape', 'raspberry']

### When a List Is Not the Answer

- the `array` type from Python's standard library saves memory when all items have the same (numeric) data type
- use `deque` when we need to add or remove data from both ends often
- use `set` when we need to check often if an item is present in a collection (sets are not sequences, because the ordering of their items is not specified)

Arrays
- arrays share the most commonly used methods with lists, but they store the values compactly (no per-item object overhead) and offer very fast methods for saving to and loading from files
- here's a [table for the data types](https://docs.python.org/3/library/array.html)

Question: what are the "b" and "B" types in the data types table?
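
As far as the linked table goes, `'b'` is a signed char and `'B'` an unsigned char, each stored in one byte. A minimal sketch to check the ranges:

```python
from array import array

signed = array('b', [-128, 127])  # 'b': signed char, -128 to 127
unsigned = array('B', [0, 255])   # 'B': unsigned char, 0 to 255
print(signed.itemsize, unsigned.itemsize)  # 1 1

try:
    array('B', [256])  # does not fit into one unsigned byte
except OverflowError as exc:
    print('OverflowError:', exc)
```

Both type codes hold small integers; the name "char" comes from the C type and has nothing to do with text characters here.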

In [5]:
from array import array
from random import random, seed
seed(10)  # Use seed to make the output consistent

floats = array('d', (random() for i in range(10 ** 7))) # data type "d" is like double in C (float)
floats[-1]

0.8190492979077034

In [6]:
# fast saving to and loading from files with .tofile() and .fromfile()
with open('floats.bin', 'wb') as fp:
    floats.tofile(fp)

In [7]:
floats2 = array('d')

with open('floats.bin', 'rb') as fp:
    floats2.fromfile(fp, 10 ** 7)

floats2[-1]

0.8190492979077034

In [8]:
floats == floats2

True

Memory Views
- a `memoryview` lets us handle slices of arrays without copying bytes
- it exposes the same underlying memory through different views (formats and shapes), so data can be shared with other structures such as SQLite databases, NumPy arrays and others without copying it first
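
A minimal sketch of the "no copying" point, comparing a slice of a `memoryview` with a slice of the array itself (variable names are my own):

```python
from array import array

data = array('B', range(6))
view = memoryview(data)[2:5]  # a view onto items 2..4, no bytes copied
copied = data[2:5]            # slicing the array itself makes a copy

view[0] = 99                  # writes through to `data`
copied[1] = 77                # only changes the copy

print(data)    # array('B', [0, 1, 99, 3, 4, 5])
print(copied)  # array('B', [2, 77, 4])
```

The write through `view` shows up in `data`, while the write to `copied` does not.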

In [9]:
# create alternate views on the same array of 6 bytes, to operate on it as a 2×3 matrix or a 3×2 matrix:
octets = array('B', range(6)) # type code "B" is an unsigned char: one byte holding an integer from 0 to 255
m1 = memoryview(octets)
m1.tolist()

[0, 1, 2, 3, 4, 5]

In [10]:
m2 = m1.cast('B', [2, 3])
m2.tolist()

[[0, 1, 2], [3, 4, 5]]

In [11]:
m3 = m1.cast('B', [3, 2])
m3.tolist()

[[0, 1], [2, 3], [4, 5]]

In [13]:
# we can change `octets` from two different views, but it's the same object
m2[1,1] = 22
m3[1,1] = 33
octets

array('B', [0, 1, 2, 33, 22, 5])

We can also use this to corrupt data:

In [14]:
numbers = array('h', [-2, -1, 0, 1, 2]) # data type "h" is for short ints
memv = memoryview(numbers)
len(memv)

5

In [15]:
memv[0]

-2

In [22]:
memv_oct = memv.cast('B')
memv_oct.tolist()

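# I can see why the negative values cannot be negative with type "B" anymore: the raw bytes are reinterpreted as unsigned numbers from 0 to 255.
# But why are there more values in `memv_oct` than in `memv`? Because each "h" item occupies 2 bytes, so 5 short ints become 10 bytes.
# The 4 at index 5 is an artifact of execution order: this cell (In [22]) was run after the assignment in cell In [19] below; on a fresh run that byte is 0.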

[254, 255, 255, 255, 0, 4, 1, 0, 2, 0]

In [19]:
memv_oct[5] = 4
numbers # one of the values in the numbers array has changed

array('h', [-2, -1, 1024, 1, 2])
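
To see where the ten bytes come from, here is a small sketch that compares the item size with the length of the byte view and rebuilds the bytes of the first item by hand. The exact byte order depends on the machine, which is why `sys.byteorder` is used instead of a hard-coded value:

```python
import sys
from array import array

numbers = array('h', [-2, -1, 0, 1, 2])
memv = memoryview(numbers)
memv_oct = memv.cast('B')

print(numbers.itemsize)          # bytes per item, 2 on common platforms
print(len(memv), len(memv_oct))  # 5 items versus 5 * itemsize bytes

# the first item, -2, as raw bytes in this machine's byte order
first_item_bytes = (-2).to_bytes(numbers.itemsize, sys.byteorder, signed=True)
print(list(first_item_bytes), memv_oct.tolist()[:numbers.itemsize])
```

On a little-endian machine with 2-byte shorts, writing 4 into byte 5 sets the high byte of the third item, which turns 0 into 4 * 256 = 1024.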

Deques and other queues
- `list.append()` and `list.pop()` let us use a list as a stack (LIFO); with `list.append()` and `list.pop(0)` we get a queue (FIFO), but removing an item from the front of a list is costly, because all remaining items need to be shifted
- `collections.deque` (a thread-safe double-ended queue) is optimized for appending and removing at both ends (but performs worse than a list for operations in the middle)
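
A minimal sketch of the two access patterns, a list used as a stack and a deque used as a queue:

```python
from collections import deque

stack = [1, 2, 3]
stack.append(4)
top = stack.pop()        # LIFO: the last item added comes out first

queue = deque([1, 2, 3])
queue.append(4)
first = queue.popleft()  # FIFO: the oldest item comes out first

print(top, stack)    # 4 [1, 2, 3]
print(first, queue)  # 1 deque([2, 3, 4])
```

`list.pop(0)` would give the same FIFO result as `popleft()`, but it has to shift every remaining item, while `deque.popleft()` does not.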

In [23]:
import collections

dq = collections.deque(range(10), maxlen=10) # deques can be bounded with maxlen; when a full deque gets a new item, an item from the opposite end is discarded
dq

deque([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], maxlen=10)

In [24]:
dq.rotate(3)
dq

deque([7, 8, 9, 0, 1, 2, 3, 4, 5, 6], maxlen=10)

In [25]:
dq.rotate(-4)
dq

deque([1, 2, 3, 4, 5, 6, 7, 8, 9, 0], maxlen=10)

In [26]:
dq.appendleft(-1)
dq

deque([-1, 1, 2, 3, 4, 5, 6, 7, 8, 9], maxlen=10)

In [27]:
dq.extend([11, 22, 33])
dq

deque([3, 4, 5, 6, 7, 8, 9, 11, 22, 33], maxlen=10)

In [28]:
dq.extendleft([10, 20, 30, 40])
dq

deque([40, 30, 20, 10, 3, 4, 5, 6, 7, 8], maxlen=10)