# Python Deep Dive Part 2 - Sequences, Iterables, Iterators, Generators, Context Managers

## Introduction

This course is about

* the Python language, canonical CPython 3.6+ implementation
* the standard library
* becoming an expert Python developer
* idiomatic Python

## Sequences

### Sequence types

What are sequences?

* Sequences in Python are positionally ordered collections of items. Built-in examples include strings, lists, tuples, bytes, bytearrays, and range objects.
* More formally, a sequence is an iterable which supports efficient element access using integer indices via the `__getitem__()` special method and defines a `__len__()` method that returns the length of the sequence. Some built-in sequence types are `list`, `str`, `tuple`, and `bytes`. Note that `dict` also supports `__getitem__()` and `__len__()`, but is considered a mapping rather than a sequence because the lookups use arbitrary hashable keys rather than integers.
* The `collections.abc.Sequence` abstract base class defines a much richer interface that goes beyond just `__getitem__()` and `__len__()`, adding `count()`, `index()`, `__contains__()`, and `__reversed__()`.
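
These definitions can be checked with `isinstance` against the abstract base classes. The following is a minimal sketch; the variable names `seq_checks` and `d` are only illustrative.

```python
from collections.abc import Mapping, Sequence

# Built-in sequence types are registered as virtual subclasses of Sequence.
seq_checks = {type(obj).__name__: isinstance(obj, Sequence)
              for obj in ([1, 2], (1, 2), 'ab', b'ab', range(3))}
print(seq_checks)

# dict has __getitem__ and __len__, but it is a Mapping, not a Sequence.
d = {'a': 1}
print(isinstance(d, Sequence), isinstance(d, Mapping))

# count() and index() are part of the richer Sequence interface.
print((1, 2, 1).count(1), 'abc'.index('c'))
```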

Built-in sequence types

* Mutable
  * `list`
  * `bytearray`
* Immutable
  * `str`
  * `tuple`
  * `range` (more restrictive)
  * `bytes`

Additional standard types

* from `collections`
  * `namedtuple()`
  * `deque`
* from `array`
  * `array`

Iterable type vs sequence type

* An iterable is any Python object capable of returning its elements one at a time. It can be used in a `for` loop, allowing you to iterate over its elements.
* Sequences are a specific type of iterable where the elements have a specific order.
* Sequences support indexing and slicing.
* All sequences are iterables, but not all iterables are sequences.
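
A small sketch of the distinction, using a set and a generator as examples of iterables that are not sequences (the names `s`, `gen` and `error` are illustrative):

```python
from collections.abc import Iterable, Sequence

s = {10, 20, 30}                  # a set is iterable but has no positions
gen = (n * n for n in range(3))   # a generator is iterable but not indexable

print(isinstance(s, Iterable), isinstance(s, Sequence))
print(isinstance(gen, Iterable), isinstance(gen, Sequence))

try:
    s[0]
except TypeError as exc:
    error = str(exc)
    print(f'indexing a set fails: {error}')

# A sequence supports indexing; any iterable can be consumed by a loop or list().
print([1, 2, 3][0], list(gen))
```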

Standard sequence methods

Operation|Result
---|---
`x in s`|`True` if an item of `s` is equal to `x`, else `False`
`x not in s`|`False` if an item of `s` is equal to `x`, else `True`
`s + t`|the concatenation of `s` and `t`
`s * n` or `n * s`|equivalent to adding `s` to itself `n` times
`s[i]`|`i`th item of `s`, origin `0`
`s[i:j]`|slice of `s` from `i` to `j`, `j` exclusive, returns a new sequence object of the same type
`s[i:j:k]`|slice of `s` from `i` to `j` with step `k`, `j` exclusive
`len(s)`|length of `s`
`min(s)`|smallest item of `s`, if an ordering between elements of `s` is defined
`max(s)`|largest item of `s`, if ordering defined
`s.index(x[, i[, j]])`|index of the first occurrence of `x` in `s` (at or after index `i` and before index `j`)
`s.count(x)`|total number of occurrences of `x` in `s`

Ranges

* Ranges implement all of the common sequence operations except concatenation and repetition (due to the fact that range objects can only represent sequences that follow a strict pattern and repetition and concatenation will usually violate that pattern).
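* `min` and `max` still iterate over the whole range, so they run in linear time. In CPython, membership tests (`in`, `not in`) for `int` values are computed arithmetically in constant time.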
* The advantage of the `range` type over a regular `list` or `tuple` is that a `range` object will always take the same (small) amount of memory, no matter the size of the range it represents (as it only stores the `start`, `stop` and `step` values, calculating individual items and subranges as needed).
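
The points above can be observed directly. This is a sketch for CPython; the exact byte size reported by `sys.getsizeof` is implementation specific, so only equality is compared.

```python
import sys

small, large = range(10), range(10 ** 12)

# Both ranges store only start, stop and step, so their sizes are identical.
print(sys.getsizeof(small) == sys.getsizeof(large))

# Items and subranges are computed on demand; a slice of a range is a range.
print(large[-1], large[10:20:3])

# Membership of an int is computed arithmetically rather than by scanning.
print(999_999_999_999 in large)

# Concatenation and repetition are not supported.
try:
    small + small
except TypeError as exc:
    concat_error = exc
    print(f'TypeError: {exc}')
```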

Hashing

* Immutable sequence types may support hashing.
* An immutable sequence is hashable only if every element it contains is hashable; a tuple that contains a list, for example, is not.
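
A short sketch of both cases; the tuples and the `distances` dictionary are made-up examples.

```python
t_hashable = (1, 'a', (2, 3))     # every element is itself hashable
t_unhashable = (1, 'a', [2, 3])   # the inner list is not hashable

# Equal hashable tuples produce equal hashes.
print(hash(t_hashable) == hash((1, 'a', (2, 3))))

try:
    hash(t_unhashable)
except TypeError as exc:
    hash_error = str(exc)
    print(f'TypeError: {hash_error}')

# Hashable sequences can serve as dict keys or set members.
distances = {('a', 'b'): 5}
print(distances[('a', 'b')])
```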

Beware of concatenations and repetitions

* When concatenating sequences, make sure that the data types are compatible.
* Concatenations and repetitions create new objects.
* Repeating an empty sequence, or repeating any sequence zero times, yields an empty sequence. Concatenating with an empty sequence yields a sequence equal to the other operand.
* Repetition can lead to unexpected results when working with sequences of (mutable) sequences. It repeats the references to the same inner sequence.
* For more complex operations or when creating new sequences based on existing ones, consider using list comprehensions or other explicit methods for clarity and control.

In [71]:
a = [[0, 0]] * 2 # elements inside a will be the same object
print(a)
print(f'{id(a[0])=}, {id(a[1])=}')
a[0][0] = 1
print(a) # a[1][0] changed too

complexes = [1, 2, 1 + 1j, 3 - 5j, 6 - 9j, 0, 10]
print(min(complexes, key=abs))
print(max(complexes, key=abs))

[[0, 0], [0, 0]]
id(a[0])=2249443173120, id(a[1])=2249443173120
[[1, 0], [1, 0]]
0
(6-9j)
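
One way to avoid the shared inner list shown above is to build each inner list separately with a list comprehension. A minimal sketch (the name `rows` is illustrative):

```python
rows = [[0, 0] for _ in range(2)]  # each iteration builds a new inner list
rows[0][0] = 1
print(rows)                        # only the first row changes
print(rows[0] is rows[1])          # the rows are distinct objects
```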


### Mutable sequence types

* Mutating an object means changing the object's state without creating a new object.
* Mutating with `[]`
  * `s[i] = x`: element at index `i` is replaced with `x`
  * `s[i:j] = s2`: slice is replaced by the contents of the *iterable* `s2`
  * `del s[i]`: removes element at index `i`
  * `del s[i:j]`: removes entire slice
  * `s[i:j:k] = s2`: assign `s2` to extended slice (slices with the step / stride argument), sizes of `s2` and the slice should match
* Some methods supported by mutable sequence types such as lists
  * `s.clear()`: removes all items from `s`
  * `s.append(x)`: append `x` to the end of `s`
  * `s.insert(i, x)`: insert `x` at index `i`
  * `s.extend(iterable)`: append contents of `iterable` to the end of `s`
  * `s.pop(i)`: removes and returns element at index `i`
  * `s.remove(x)`: removes the first occurrence of `x` in `s`
  * `s.reverse()`: does an in-place reversal of elements of `s`
  * `s.copy()`: returns a shallow copy

In [72]:
l = [1, 2, 3, 4, 5, 6]
print(f'{l[0]=}')
print(f'{l[True]=}') # subclasses of int work too
l[0] = 100
print(f'{l[0]=}')
l.clear()
print(f'{l=}')
l.extend(range(10))
print(f'{l=}')
print(f'{l[1:7:2]=}') # create subsequence
print(f'{l=}')
l[::2] = ('a', 'b', 'c', 'd', 'e') # assign contents of right-side iterable to extended slice
print(f'{l=}')
l[:4] = ('x', 'y', 'z')
print(f'{l=}')
print(f'{l[::-1]}, {l=}') # reverse copy
del l[:2] # delete elements
print(f'{l=}')
l[0:0] = (0, 0) # inserting elements at specific position
print(f'{l=}')
lc = l[:] # shallow copy
print(f'{(lc is l)=}')
l.insert(0, 'first') # insert at index
print(f'{l=}')
print(f'{l.pop(0)=}, {l=}') # pop at index
l.append('last')
print(f'{l=}')
l.remove('last')
print(f'{l=}')
l.reverse() # reverse in-place
print(f'{l=}')
print(f'{id(l)=}, {id(l.copy())=}') # shallow copy


l[0]=1
l[True]=2
l[0]=100
l=[]
l=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
l[1:7:2]=[1, 3, 5]
l=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
l=['a', 1, 'b', 3, 'c', 5, 'd', 7, 'e', 9]
l=['x', 'y', 'z', 'c', 5, 'd', 7, 'e', 9]
[9, 'e', 7, 'd', 5, 'c', 'z', 'y', 'x'], l=['x', 'y', 'z', 'c', 5, 'd', 7, 'e', 9]
l=['z', 'c', 5, 'd', 7, 'e', 9]
l=[0, 0, 'z', 'c', 5, 'd', 7, 'e', 9]
(lc is l)=False
l=['first', 0, 0, 'z', 'c', 5, 'd', 7, 'e', 9]
l.pop(0)='first', l=[0, 0, 'z', 'c', 5, 'd', 7, 'e', 9]
l=[0, 0, 'z', 'c', 5, 'd', 7, 'e', 9, 'last']
l=[0, 0, 'z', 'c', 5, 'd', 7, 'e', 9]
l=[9, 'e', 7, 'd', 5, 'c', 'z', 0, 0]
id(l)=2249439761280, id(l.copy())=2249443147392


### Lists vs tuples

Constant folding

* Constant folding is a compiler optimization technique used to evaluate constant expressions at compile-time rather than runtime.
* The goal is to replace expressions involving constants with their computed values, eliminating redundant computations and potentially improving the performance of the compiled code.
* In the context of programming languages like Python, constant folding often involves simplifying expressions that involve literals or constants.
* Constant folding can be applied to various types of expressions, including arithmetic operations, bitwise operations, and other operations involving constants or literals.
* CPython performs constant folding when it compiles source code to bytecode, before any of the code runs. More extensive optimizations are typically performed by Just-In-Time (JIT) compilers or ahead-of-time (AOT) compilers.

Constant folding vs interning

* Constant folding and interning are both optimization techniques used in programming languages to improve the efficiency of code execution. However, they address different aspects of optimization.
* Constant folding is focused on evaluating constant expressions at compile-time rather than runtime. It involves simplifying expressions involving constants or literals and replacing them with their computed values.
* Interning is the process of reusing existing objects with the same value, reducing memory consumption and improving performance. It is often applied to string literals and small integers.
* Constant folding and interning can be complementary optimizations. For instance, constant folding may involve the creation of new constants, and interning can help avoid unnecessary duplication of identical constant objects.
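
The two techniques can be observed separately. The sketch below assumes CPython: the folded constant is looked up in the code object's `co_consts`, and `sys.intern` is used to intern strings explicitly. Other implementations may behave differently.

```python
import sys

# Constant folding: the arithmetic is done at compile time, so the
# compiled code object carries the result as a constant.
code = compile('24 * 60 * 60', '<example>', 'eval')
print(code.co_consts)

# Interning: two equal strings built at runtime are distinct objects...
a = ''.join(['sequence', '-', 'types'])
b = ''.join(['sequence', '-', 'types'])
print(a == b, a is b)

# ...until they are interned, after which both names share one object.
a, b = sys.intern(a), sys.intern(b)
print(a is b)
```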

In [73]:
from dis import dis

dis(compile('(1, 2, 3, "a")', 'string', 'eval'))
dis(compile('(1, 2, 3, ["a"])', 'string', 'eval'))
dis(compile('[1, 2, 3]', 'string', 'eval'))

  0           0 RESUME                   0

  1           2 RETURN_CONST             0 ((1, 2, 3, 'a'))
  0           0 RESUME                   0

  1           2 LOAD_CONST               0 (1)
              4 LOAD_CONST               1 (2)
              6 LOAD_CONST               2 (3)
              8 LOAD_CONST               3 ('a')
             10 BUILD_LIST               1
             12 BUILD_TUPLE              4
             14 RETURN_VALUE
  0           0 RESUME                   0

  1           2 BUILD_LIST               0
              4 LOAD_CONST               0 ((1, 2, 3))
              6 LIST_EXTEND              1
              8 RETURN_VALUE


In [74]:
from timeit import timeit

tuple_timer = timeit('(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)', number=10000000) # tuple with all immutable elements
list_timer = timeit('[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]', number=10000000)
print(f'{tuple_timer=}')
print(f'{list_timer=}')
print(f'list creation is about {list_timer / tuple_timer:.2f} times slower than tuple creation')

tuple_timer=0.1483876000274904
list_timer=0.9959629000513814
list creation is about 6.71 times slower than tuple creation


In [75]:
t1 = (1, 2, 3, 4, 5)
t2 = tuple(t1) # t2 is actually just an alias
print(f'{(t1 is t2)=}')

l1 = [1, 2, 3, 4, 5]
l2 = list(l1) # l2 is a shallow copy of l1
print(f'{(l1 is l2)=}')

(t1 is t2)=True
(l1 is l2)=False


Storage efficiency

In [76]:
import sys

t = tuple()
prev = sys.getsizeof(t)
print('size variation of tuple creation:')
for i in range(10):
    t = tuple(range(i + 1))
    size = sys.getsizeof(t)
    delta, prev = size - prev, size
    print(f'{i+1} items: {size=}, {delta=}')

l = list()
prev = sys.getsizeof(l)
print('\nsize variation of list creation:')
for i in range(10):
    l = list(range(i+1))
    size = sys.getsizeof(l)
    delta, prev = size - prev, size
    print(f'{i+1} items: {size=}, {delta=}')

size variation of tuple creation:
1 items: size=48, delta=8
2 items: size=56, delta=8
3 items: size=64, delta=8
4 items: size=72, delta=8
5 items: size=80, delta=8
6 items: size=88, delta=8
7 items: size=96, delta=8
8 items: size=104, delta=8
9 items: size=112, delta=8
10 items: size=120, delta=8

size variation of list creation:
1 items: size=72, delta=16
2 items: size=72, delta=0
3 items: size=88, delta=16
4 items: size=88, delta=0
5 items: size=104, delta=16
6 items: size=104, delta=0
7 items: size=120, delta=16
8 items: size=120, delta=0
9 items: size=136, delta=16
10 items: size=136, delta=0


In [77]:
import sys

l = list()
prev = sys.getsizeof(l)
print('size variation of list appending:')
print(f'0 items: size={prev}')
for i in range(255):
    l.append(i)
    size = sys.getsizeof(l)
    delta, prev = size - prev, size
    print(f'add {i+1} item: {size=}, {delta=}')

size variation of list appending:
0 items: size=56
add 1 item: size=88, delta=32
add 2 item: size=88, delta=0
add 3 item: size=88, delta=0
add 4 item: size=88, delta=0
add 5 item: size=120, delta=32
add 6 item: size=120, delta=0
add 7 item: size=120, delta=0
add 8 item: size=120, delta=0
add 9 item: size=184, delta=64
add 10 item: size=184, delta=0
add 11 item: size=184, delta=0
add 12 item: size=184, delta=0
add 13 item: size=184, delta=0
add 14 item: size=184, delta=0
add 15 item: size=184, delta=0
add 16 item: size=184, delta=0
add 17 item: size=248, delta=64
add 18 item: size=248, delta=0
add 19 item: size=248, delta=0
add 20 item: size=248, delta=0
add 21 item: size=248, delta=0
add 22 item: size=248, delta=0
add 23 item: size=248, delta=0
add 24 item: size=248, delta=0
add 25 item: size=312, delta=64
add 26 item: size=312, delta=0
add 27 item: size=312, delta=0
add 28 item: size=312, delta=0
add 29 item: size=312, delta=0
add 30 item: size=312, delta=0
add 31 item: size=312, delta=0
...

Retrieving efficiency

In [78]:
from timeit import timeit

t = tuple(range(1000000))
l = list(t)
tuple_retrieving = timeit('t[999999]', globals=globals(), number=10000000)
list_retrieving = timeit('l[999999]', globals=globals(), number=10000000)
print(f'{tuple_retrieving=}')
print(f'{list_retrieving=}') # list retrieving is slightly faster on this version of Python

tuple_retrieving=0.31453450000844896
list_retrieving=0.30613600002834573


### Index base and slice bounds

0-based index

* Historical considerations
  * The use of 0-based indexing has historical roots in computer science and programming languages. Some early programming languages, such as Fortran, used 1-based indexing. However, languages like C and subsequently Python adopted 0-based indexing.
* Memory addressing
  * In many programming languages, including C, arrays are implemented as contiguous blocks of memory. A 0-based index aligns well with memory addressing, where the first element of an array is located at offset 0 from the array's base address.
* Consistency
  * By using 0-based indexing, there is a consistency in addressing elements. The index represents an *offset* from the start of the array, and the index of the first element is 0.
* Mathematical simplicity
  * Mathematically, it simplifies calculations. For example, if you have an array with length `n`, the last element is at index `n-1`, and there are `n-1` elements before it.

Exclusive upper bound

* Consistency with lengths
  * The length of a range is given by the formula `len(range(start, stop)) = stop - start` (when `stop >= start`). This is consistent with how lengths are calculated in other contexts (e.g., slicing a list).
* Simplifies loop logic
  * When using `range()` in loops, the exclusive upper bound simplifies loop logic. The loop runs as long as the index is less than the specified upper bound, aligning with the desired number of iterations.
* Avoiding off-by-one errors
  * The exclusive upper bound ensures that the range covers elements up to, but not including, the specified upper bound, preventing common mistakes in loop constructs and length calculation.
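
A brief sketch of these properties, using an arbitrary string and arbitrary bounds:

```python
s = 'abcdefgh'
start, stop = 2, 6

# The length of a half-open range or slice is simply stop - start.
print(len(range(start, stop)), len(s[start:stop]))

# Adjacent slices share a boundary without overlapping or leaving a gap.
k = 3
print(s[:k], s[k:], s[:k] + s[k:] == s)

# range(len(s)) yields exactly the valid indices 0 .. len(s) - 1.
indices = list(range(len(s)))
print(indices[0], indices[-1])
```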

### Copying sequences

Why copy sequences

* Mutable sequences can be modified.
* Sometimes you want to make sure that whatever sequences you are working with cannot be modified.
* Generally we write functions that do not modify the contents of their arguments.
  * If a function does modify an argument in place, it should not return that object (returning `None`, as `list.sort()` does, signals the in-place change to the caller).
  * If a function does not modify in place, it returns the new, modified object (as `sorted()` does).
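
The convention can be sketched with two hypothetical helper functions, `normalize_in_place` and `normalized`, which are invented here purely for illustration:

```python
def normalize_in_place(names):
    """Mutate the list in place; return None to signal the mutation."""
    for i, name in enumerate(names):
        names[i] = name.strip().lower()


def normalized(names):
    """Leave the argument untouched and return a new list."""
    return [name.strip().lower() for name in names]


raw = [' Ada', 'LINUS ', 'Guido']
clean = normalized(raw)
print(raw, clean)            # raw is unchanged, clean is a new list

result = normalize_in_place(raw)
print(raw, result)           # raw is modified, result is None
```

The built-ins follow the same pattern: `sorted(l)` returns a new list, while `l.sort()` sorts in place and returns `None`.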

How to copy a sequence

* Simple loop (produces a shallow copy)
* List comprehension
* The `copy` method (not implemented by immutable types such as `tuple` and `str`)
* Slicing (a full slice `[:]` of a `tuple` or `str` returns the original object rather than a copy)
* Constructor (`tuple(t)` and `str(s)` return the original object rather than a copy)
* The `copy` module (generic `copy` and `deepcopy`)

In [79]:
t1 = (1, 2, 3)
t2 = tuple(t1)
print(f'{(t1 is t2)=}') # True

t3 = t1[:]
print(f'{(t3 is t1)=}') # True

(t1 is t2)=True
(t3 is t1)=True


Shallow copies

* Shallow copy creates a new object, but it only copies the references to the objects within the original data structure, not the objects themselves.
* If the inner objects of the original data structure are mutable (e.g., lists), modifications to these objects will be visible in both the original and the copied structures.
* A shallow copy does not recurse into nested objects, so circular references cannot cause infinite loops. However, any cycle reachable from the copy still runs through the original nested objects rather than through the copy, which can be surprising.

Deep copies

* Deep copy recursively copies all the objects referenced by the original object. In other words, it creates a completely independent copy of the original object, including all nested objects, rather than just copying references to them.
* Deep copy handles circular references, where objects refer to each other in a cycle. The `copy` module in Python uses a memo dictionary to keep track of objects already copied, preventing infinite recursion.
* You might consider using a deep copy (`copy.deepcopy()`) if you need complete independence between the original and copied structures, especially when dealing with nested mutable objects or custom classes.

Custom copies

* Custom classes can implement the `__copy__` and `__deepcopy__` methods to override how shallow and deep copies are made for their instances.
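
As a rough sketch of custom copying, the hypothetical `Node` class below implements both hooks. Registering the new object in `memo` before recursing is one way to let a cycle resolve to the copy instead of recursing forever; the class and its attributes are assumptions made for this example.

```python
from copy import copy, deepcopy


class Node:
    def __init__(self, value, children=None):
        self.value = value
        self.children = children if children is not None else []

    def __copy__(self):
        # Shallow copy: a new Node with a new list holding the same children.
        return Node(self.value, list(self.children))

    def __deepcopy__(self, memo):
        # Register the new object in memo before recursing so cycles resolve.
        new = Node(self.value)
        memo[id(self)] = new
        new.children = deepcopy(self.children, memo)
        return new


root = Node('root')
child = Node('child', [root])    # child refers back to root: a cycle
root.children.append(child)

shallow = copy(root)
deep = deepcopy(root)
print(shallow.children[0] is child)            # the child is shared
print(deep.children[0] is child)               # the child is independent
print(deep.children[0].children[0] is deep)    # the cycle points at the copy
```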

In [80]:
# shallow copy
from copy import copy

l1 = [1, 2, 3, 4]
l2 = []
for e in l1:
    l2.append(e)
print(f'{(l1 == l2)=}, {(l1 is l2)=}')

l2 = [e for e in l1]
print(f'{(l1 == l2)=}, {(l1 is l2)=}')

l2 = l1.copy()
print(f'{(l1 == l2)=}, {(l1 is l2)=}')

l2 = list(l1)
print(f'{(l1 == l2)=}, {(l1 is l2)=}')

l2 = l1[:]
print(f'{(l1 == l2)=}, {(l1 is l2)=}')

l2 = copy(l1)
print(f'{(l1 == l2)=}, {(l1 is l2)=}')

# immutable sequences like tuples and strings
# constructor, slicing, copy.copy function won't make a copy

(l1 == l2)=True, (l1 is l2)=False
(l1 == l2)=True, (l1 is l2)=False
(l1 == l2)=True, (l1 is l2)=False
(l1 == l2)=True, (l1 is l2)=False
(l1 == l2)=True, (l1 is l2)=False
(l1 == l2)=True, (l1 is l2)=False


In [81]:
from copy import copy, deepcopy

class MyClass:
    def __init__(self, a):
        self.a = a

x = MyClass(500)
y = MyClass(x)
lst = [x, y]
lst_cp = deepcopy(lst)

print(f'{(y.a is x)=}')
print(f'{(lst_cp[0] is x)=}')
print(f'{(lst_cp[1] is y)=}')

# in the deep copy of lst, the relationship between its two elements
# remains the same as the original lst which is
# e[1].a is e[0]
print(f'{(lst_cp[1].a is lst_cp[0])=}')

(y.a is x)=True
(lst_cp[0] is x)=False
(lst_cp[1] is y)=False
(lst_cp[1].a is lst_cp[0])=True


In [82]:
from copy import deepcopy

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f'Point({self.x}, {self.y})'
    

class Line:
    def __init__(self, p1, p2):
        self.p1 = p1
        self.p2 = p2

    def __repr__(self):
        return f'Line({self.p1}, {self.p2})'
    

p1 = Point(0, 0)
p2 = Point(10, 10)
line1 = Line(p1, p2)
print(f'{line1=}')
line2 = deepcopy(line1)
print(f'{line2=}')
print(f'{(line1.p1 is line2.p1)=}')


line1=Line(Point(0, 0), Point(10, 10))
line2=Line(Point(0, 0), Point(10, 10))
(line1.p1 is line2.p1)=False


### Slicing

* Slicing is the process of extracting (as well as modifying for mutable sequences) a portion of a sequence, such as a list, string, or tuple.
* The slice notation uses the syntax `sequence[start:stop:step]`.
* Slicing relies on indexing, so it only works with sequence types.
* Additional use cases of slicing with mutable sequences
  * Modify elements in a specific range
  * Delete elements in a specific range
  * Insert elements at a specific position
  * Shallow copy using slicing
  * Reverse the list using slicing, original sequence unchanged

The `slice` type

* `slice(stop)`, `slice(start, stop, step=None)`
  * Return a `slice` object representing the set of indices specified by `range(start, stop, step)`. The `start` and `step` arguments default to `None`.
* The `slice()` function is used to create a `slice` object, which can be passed to the indexing operator `[]` to perform slicing on sequences like lists, strings, or tuples.
* A `slice` object is useful because it can be given a name, so later code can refer to a meaningful symbol instead of repeating literal indices.
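
For instance, named slices can describe the columns of a fixed-width record. The record layout and the names `NAME`, `YEAR` and `AMOUNT` below are hypothetical.

```python
# Hypothetical fixed-width layout: columns 0-9 name, 10-13 year, 14-19 amount.
NAME = slice(0, 10)
YEAR = slice(10, 14)
AMOUNT = slice(14, 20)

# Build a 20-character record that follows the layout.
record = f"{'Ada':<10}{1843:>4}{12.5:>6.2f}"

name = record[NAME].strip()
year = int(record[YEAR])
amount = float(record[AMOUNT])
print(name, year, amount)

# The same slice object works on any sequence type.
print(list(range(30))[YEAR])
```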

Range equivalence

* Any indices defined by a `slice` can also be defined using a `range`.
* Slices are defined independently of the sequence being sliced.
* The equivalent range is only calculated once the length of the sequence being sliced is known.
* The effective indices of a `slice` are actually dependent on the length of the sequence being sliced.
  * `seq[i:j:k]` (`k` > `0`)
    * if `i`, `j` > `len(seq)` -> `len(seq)`
    * if `i` < `0` -> `max(0, len(seq) + i)`
    * if `j` < `0` -> `max(0, len(seq) + j)`
    * `i` omitted or `None` -> `0`
    * `j` omitted or `None` -> `len(seq)`
  * `seq[i:j:k]` (`k` < `0`)
    * if `i`, `j` > `len(seq)` -> `len(seq) - 1`
    * if `i` < `0` -> `max(-1, len(seq) + i)`
    * if `j` < `0` -> `max(-1, len(seq) + j)`
    * `i` omitted or `None` -> `len(seq) - 1`
    * `j` omitted or `None` -> `-1`
* The `slice` object has an `indices` method that returns a tuple of the equivalent range's start, stop, and step for a given length of the sequence being sliced.

In [83]:
l = [0, 1, 2, 3, 4, 5, 6]
print(f'{l[0:1]=}')
print(f'{l[0:0]=}')
print(f'{l[::2]=}')
print(f'{l[::-1]=}')
print(f'{l[slice(1, 4)]=}')
print(f'{l[slice(4)]=}')
print(f'{l[None:4]=}')
print(f'{slice(0, 100, 2).indices(10)=}')
print(f'{l[1:-3:-1]=}') # returns empty list, as the effective slice is slice(1, 4, -1)
print(f'{slice(1, -3, -1).indices(len(l))=}')

l[0:1]=[0]
l[0:0]=[]
l[::2]=[0, 2, 4, 6]
l[::-1]=[6, 5, 4, 3, 2, 1, 0]
l[slice(1, 4)]=[1, 2, 3]
l[slice(4)]=[0, 1, 2, 3]
l[None:4]=[0, 1, 2, 3]
slice(0, 100, 2).indices(10)=(0, 10, 2)
l[1:-3:-1]=[]
slice(1, -3, -1).indices(len(l))=(1, 4, -1)
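
The range equivalence can be checked for several slices at once. In this sketch, each slice is converted with `indices()` and the resulting range is compared with ordinary slicing; the sample slices are arbitrary.

```python
seq = 'abcdefg'
slices = [slice(1, 5), slice(None, None, -1), slice(-3, 100, 2), slice(1, -3, -1)]

matches = []
for s in slices:
    # indices() clamps start, stop and step to this particular length.
    start, stop, step = s.indices(len(seq))
    via_range = [seq[i] for i in range(start, stop, step)]
    matches.append(via_range == list(seq[s]))
    print(s, '->', (start, stop, step), via_range)

print(all(matches))
```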


### Custom sequence types part 1

Creating our own sequence types

* At its most basic, an immutable sequence type should support two things:
  * returning the length of the sequence (technically not needed)
  * given an index, returning the respective element
* If an object provides this functionality, then we should be able to:
  * retrieve elements by index using `[]` operator
  * iterate through the elements using Python's native looping mechanism, e.g. `for` loops, comprehensions
* Sequence types, at a minimum, implement the following methods:
  * `__len__`
  * `__getitem__`
    * at its most basic, takes in a single integer argument, the index
    * may also choose to handle a slice type argument
    * should return an element of the sequence based on the specified valid index (within [0, length - 1])
    * raise an `IndexError` exception if the index is out of bounds
    * may choose to support
      * negative indices: i < 0 -> i = length + i
      * slicing: handle slice objects as argument to `__getitem__`

In [84]:
l = [1, 2, 3, 4]
print(f'{l.__getitem__(0)=}') # same as l[0]
print(f'{l.__getitem__(-1)=}')
print(f'{l.__getitem__(slice(1,3,1))=}')
print(f'{l.__len__()=}') # same as len(l)
print(f'{l.__getitem__(slice(1, 3, 1))}') # same as l[1:3:1]

l.__getitem__(0)=1
l.__getitem__(-1)=4
l.__getitem__(slice(1,3,1))=[2, 3]
l.__len__()=4
[2, 3]


In [85]:
l = list(range(10))

# simulate for loop
# for e in l:
#     print(e ** 2)

index = 0
while True:
    try:
        e = l.__getitem__(index)
    except IndexError:
        break
    print(e ** 2, end=' ')
    index += 1
print()

0 1 4 9 16 25 36 49 64 81 
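
The same protocol applies to user-defined classes. The hypothetical `Squares` class below is a minimal sketch that implements only `__len__` and `__getitem__` for integer indices, yet it can still be iterated and tested for membership.

```python
class Squares:
    """Hypothetical immutable sequence of the first n square numbers."""

    def __init__(self, n):
        self._n = n

    def __len__(self):
        return self._n

    def __getitem__(self, index):
        if not isinstance(index, int):
            raise TypeError('Squares indices must be integers')
        if index < 0:
            index += self._n          # map negative index i to length + i
        if not 0 <= index < self._n:
            raise IndexError('Squares index out of range')
        return index ** 2


sq = Squares(5)
print(len(sq), sq[0], sq[-1])
print(list(sq))             # iteration calls __getitem__ until IndexError
print(9 in sq, 10 in sq)    # membership falls back to iteration
```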


In [46]:
from functools import lru_cache
# from types import NoneType
import math

# The Fibonacci numbers may be defined by the recurrence relation
#   F₀ = 0, F₁ = 1
# and
#   Fₙ = Fₙ₋₁ + Fₙ₋₂
# for n > 1

# @lru_cache(2 ** 10)
# def fib(n):
#     return 0 if n ==0 else 1 if n == 1 else fib(n - 1) + fib(n - 2)

# for n in range(10):
#     print(fib(n), end=' ')
# print()

class Fib:
    """
    Fibonacci sequence class.

    Attributes:
        _start (int): The starting index of the Fibonacci sequence.
        _stop (int): The stopping index of the Fibonacci sequence.
        _step (int): The step size between consecutive elements in the sequence.
    """

    @classmethod
    @lru_cache(2 ** 10)
    def fib(cls, n: int) -> int:
        """
        Calculate the Fibonacci number at index n.

        Args:
            n (int): The index of the Fibonacci number to calculate.

        Returns:
            int: The Fibonacci number at index n.
        """
        return 0 if n ==0 else 1 if n == 1 else cls.fib(n - 1) + cls.fib(n - 2)

    def __init__(self, start: int | None = None, stop: int | None = None, step: int = 1, /) -> None:
        """
        Initialize a Fibonacci sequence object.

        Args:
            start (int | None): The starting index of the sequence.
            stop (int | None): The stopping index of the sequence.
            step (int): The step size between consecutive elements in the sequence.
            All arguments are positional only.

        Raises:
            ValueError: If step is zero, if start is negative, or if stop is less than -1.
            TypeError: If neither start nor stop is provided, or if start, stop, or step is not an integer.
        """
        # self._type_check(start, (NoneType, int))
        self._type_check(start, (type(None), int))
        self._type_check(stop, (type(None), int))
        self._type_check(step, int)

        if step == 0:
            raise ValueError('Fib() arg 3 must not be zero')
        
        if start is None and stop is None:
            raise TypeError('Fib expected at least 1 argument, got 0')
        
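        if stop is None:
            # a single argument is the stop value, as with range(stop)
            start, stop = 0, start
        elif start is None:
            # Fib(None, stop) starts from index 0 instead of failing on None < 0
            start = 0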
        
        if start < 0:
            raise ValueError('Fib() arg 1 must be no less than zero')
        
        if stop < -1:
            raise ValueError('Fib() arg 2 must be no less than -1')
        
        self._start = start
        self._stop = stop
        self._step = step

    @property
    def start(self) -> int:
        """Get the starting index of the Fibonacci sequence."""
        return self._start
    
    @property
    def stop(self) -> int:
        """Get the stopping index of the Fibonacci sequence."""
        return self._stop
    
    @property
    def step(self) -> int:
        """Get the step size between consecutive elements in the Fibonacci sequence."""
        return self._step

    def __repr__(self) -> str:
        """Return a string representation of the Fibonacci sequence object."""
        return f'Fib({self.start}, {self.stop}, {self.step})'
    
    def __len__(self) -> int:
        """
        Return the number of elements in the Fibonacci sequence.

        Returns:
            int: The number of elements in the sequence.
        """
        # return l if (l := math.ceil((self.stop - self.start) / self.step)) > 0 else 0 # condition first gets evaluated
        return max(0, math.ceil((self.stop - self.start) / self.step))
    
    def __getitem__(self, index: int | slice) -> int | list:
        """
        Get the Fibonacci number at the specified index in the sequence.

        Args:
            index (int | slice): The index of the Fibonacci number to retrieve, or a slice.

        Returns:
            int: The Fibonacci number at the specified index.
            list: The list of Fibonacci numbers of the slice.

        Raises:
            IndexError: If the index is out of range.
            TypeError: If the index is not an integer or slice.
        """
        if isinstance(index, int):
            if index < 0:
                index += len(self)
            actual_index = self.start + index * self.step
            if (self.step > 0 and (actual_index < self.start or actual_index >= self.stop) or
                self.step < 0 and (actual_index > self.start or actual_index <= self.stop)):
                raise IndexError('Fib object index out of range')
            return Fib.fib(actual_index)
        elif isinstance(index, slice):
            # start, stop, and step of the slice can also be obtained by calling slice.indices(len)
            # start, stop, step = index.indices(len(self))
            # return list(Fib(start, stop, step)) # not right, the returned list only reflects the original Fib's length

            # method 1: list the original Fib and then slice the generated list, easy to implement but inefficient
            # start, stop, step = index.indices(len(self))
            # return list(self)[start:stop:step]
            # return list(self).__getitem__(index)

            # method 2: calculate the new range based on slice and the original Fib by using range()
            return [Fib.fib(i) for i in range(self.start, self.stop, self.step).__getitem__(index)]
        
            # method 3: calculate new start, stop, step directly from slice and the original Fib
            # if index.step is None or index.step > 0:
            #     start = max(self.start, index.start or self.start)
            #     stop = min(self.stop, index.stop or self.stop)
            # else:
            #     start = min(self.stop - 1, index.stop or self.stop - 1)
            #     stop = max(self.start - 1, index.start or self.start - 1)
            # return list(Fib(start, stop, index.step or 1)) # this is buggy

        else:
            raise TypeError(f'Fib indices must be integers or slices, not {type(index).__name__}')
        
    def _type_check(self, arg, valid_types: type | tuple[type, ...]) -> None:
        """
        Check if the argument has the specified type.

        Args:
            arg: The argument to check.
            valid_types: The allowed types for the argument.

        Raises:
            TypeError: If the argument has an invalid type.
        """
        if not isinstance(arg, valid_types):
            raise TypeError(f'{type(arg).__name__!r} object cannot be interpreted as an integer')

# erroneous Fib class instances
# Fib()
# Fib(2.2)
# Fib(stop=3)
# Fib(-3)
# Fib(1, 2, 0)
# Fib(1, 2, None)
# print(len(Fib(10, -3, -2)))
        
print(f'{Fib(1)=}')
print(f'{Fib(1,2)=}')
print(f'{Fib(3).start=}, {Fib(3).stop=}, {Fib(3).step=}')
print(f'{len(Fib(3))=}')
print(f'{len(Fib(10, 3, -2))=}')
print(f'{len(Fib(10, 13, -2))=}')

fib = Fib(10)
print(f'\n{fib=}')
# print(fib[10]) # index out of range
print(f'{fib[0]=}')
print(f"{fib[1]=}")
print(f"{fib[2]=}")
# print(fib[-11]) # index out of range
print(f"{fib[-1]=}")

print(f'\n{Fib.fib(0)=}')
print(f'{Fib.fib(1)=}')
print(f'{Fib.fib(2)=}')
print(f'{Fib.fib(3)=}')
print(f'{Fib.fib(4)=}')
print(f'{Fib.fib(10)=}')

print('\nFib(10) in for loop:', end=' ')
for f in Fib(10):
    print(f, end=' ')
print()

print('Fib(10) in list():', end=' ')
print(list(Fib(10)))
print(f'{list(Fib(10)[:])=}') # Fib(10)[:] is the same as Fib(10).__getitem__(slice(None, None, None))
print(f'{list(Fib(10)[::-1])=}')
print(f'{list(Fib(10)[::2])=}')
print(f'{list(Fib(10)[::-2])=}')
print(f'{list(Fib(10)[1:8])=}')
print(f'{list(Fib(10)[1:8:2])=}')
print(f'{list(Fib(10)[1:-1])=}')
print(f'{list(Fib(0, 10, 2))=}')
print(f'{list(Fib(0, 10, 2)[1:5:2])=}')
print(f'{list(Fib(0, 10, 2)[-1:5:2])=}')
print(f'{list(Fib(0, 10, 2)[1:-5:2])=}')
print(f'{list(Fib(0, 10, 2)[1:20:2])=}')
print()

print('Fib(0, 10, 2) in for loop:', end=' ')
for f in Fib(0, 10, 2):
    print(f, end=' ')
print()

print('Fib(0, 10, 2) in list():', end=' ')
print(list(Fib(0, 10, 2)))
print()

print(f'{len(Fib(10, 0, -2))=}')
print(f'{list(Fib(10, 0, -2))=}')

Fib(1)=Fib(0, 1, 1)
Fib(1,2)=Fib(1, 2, 1)
Fib(3).start=0, Fib(3).stop=3, Fib(3).step=1
len(Fib(3))=3
len(Fib(10, 3, -2))=4
len(Fib(10, 13, -2))=0

fib=Fib(0, 10, 1)
fib[0]=0
fib[1]=1
fib[2]=1
fib[-1]=34

Fib.fib(0)=0
Fib.fib(1)=1
Fib.fib(2)=1
Fib.fib(3)=2
Fib.fib(4)=3
Fib.fib(10)=55

Fib(10) in for loop: 0 1 1 2 3 5 8 13 21 34 
Fib(10) in list(): [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
list(Fib(10)[:])=[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
list(Fib(10)[::-1])=[34, 21, 13, 8, 5, 3, 2, 1, 1, 0]
list(Fib(10)[::2])=[0, 1, 3, 8, 21]
list(Fib(10)[::-2])=[34, 13, 5, 2, 1]
list(Fib(10)[1:8])=[1, 1, 2, 3, 5, 8, 13]
list(Fib(10)[1:8:2])=[1, 2, 5, 13]
list(Fib(10)[1:-1])=[1, 1, 2, 3, 5, 8, 13, 21]
list(Fib(0, 10, 2))=[0, 1, 3, 8, 21]
list(Fib(0, 10, 2)[1:5:2])=[1, 8]
list(Fib(0, 10, 2)[-1:5:2])=[21]
list(Fib(0, 10, 2)[1:-5:2])=[]
list(Fib(0, 10, 2)[1:20:2])=[1, 8]

Fib(0, 10, 2) in for loop: 0 1 3 8 21 
Fib(0, 10, 2) in list(): [0, 1, 3, 8, 21]

len(Fib(10, 0, -2))=5
list(Fib(10, 0, -2))=[55, 21, 8, 3, 1]


### In-place concatenation and repetition

* Concatenation (`+`)
  * Combines two sequences to create a new one.
    * both sequences must be of the same type
  * Always creates a new sequence, whether or not the operands are mutable.
* In-place concatenation (`+=`)
  * If the left operand is immutable, `+=` does not perform in-place concatenation.
    * instead it performs ordinary concatenation, creating a new sequence and rebinding the name
    * both sequences must be of the same type
  * If the left operand is mutable, it is extended in place with all the elements of the right operand.
    * the right operand can be any iterable

* Repetition (`*`)
  * Creates a new sequence by repeating the elements of an existing one.
  * The repetition count is normally non-negative; a count of zero or less produces an empty sequence.
* In-place repetition (`*=`)
  * For immutable sequences, performs ordinary repetition instead, creating a new sequence.
  * For mutable sequences, modifies the existing sequence by repeating its elements in place.

In [3]:
l1 = [1, 2, 3]
l2 = [4, 5, 6]
l3 = l1 + l2
print(f'{id(l1)=}')
print(f'{id(l2)=}')
print(f'{id(l3)=}')

l1 += l2
print('l1 += l2', f'{l1=}')
print(f'{id(l1)=}')

t = (7, 8)
l1 += t
print('l1 += t', f'{l1=}')
print(f'{id(l1)=}')
print(f'{id(l1*2)=}')

id(l1)=1750337509504
id(l2)=1750337511488
id(l3)=1750337509632
l1 += l2 l1=[1, 2, 3, 4, 5, 6]
id(l1)=1750337509504
l1 += t l1=[1, 2, 3, 4, 5, 6, 7, 8]
id(l1)=1750337509504
id(l1*2)=1750337510912
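
The cell above only exercises lists. As a complementary sketch (the variable names are illustrative, not from the course), the same operators applied to a tuple rebind the name to a new object, while a list keeps its identity:

```python
# In-place operators on an immutable tuple fall back to creating a new object.
t = (1, 2)
t_before = t
t += (3, 4)               # builds a new tuple and rebinds the name t
tuple_rebound = t is not t_before

t_before = t
t *= 2                    # ordinary repetition, again producing a new tuple
tuple_repeated = t is not t_before

# The same operators on a list mutate the existing object.
lst = [1, 2]
lst_before = lst
lst += (3, 4)             # extends in place; any iterable is accepted
lst *= 2                  # repeats in place
list_same = lst is lst_before

try:
    lst + (5, 6)          # plain + still requires a list on the right
except TypeError as exc:
    plus_error = str(exc)

print(t, tuple_rebound, tuple_repeated)
print(lst, list_same)
print(plus_error)
```

Comparing objects with `is` avoids relying on `id()` values, which differ from run to run.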


### Assignments in mutable sequences

Assigning values via indexes, slices and extended slices

* Mutable sequences support assignment via a specific index as well as slices.
* The value being assigned via slicing and extended slicing (with step / stride not equal to 1) must be an iterable.
* For regular slices (non-extended), the slice and the iterable need not be of the same length.
* With extended slicing, the slice and the iterable must have the same length.
* If the step / stride of the slice is negative, the assignment will be applied in reverse order.

Deleting a slice

* Deletion is just a special case of replacement.
* We simply assign an empty iterable to the slice. This only works with non-extended slices; for extended slices, use `del`.

Insertion using slices

* We can also insert elements using slice assignment.
* The trick is that the slice must be empty, otherwise it would just replace the elements in the slice.
* It only works with non-extended slicing.
* The value being inserted must be an iterable; its elements are inserted individually.

In [8]:
l = [1, 2, 3, 4, 5, 6]
l[2:5] = ('a', 'b', 'c') # slice assignment
print(l)
l[5:6] = ('x', 'y') # slice assignment of different length
print(l)
l[::2] = ('1', '2', '3', '4') # extended slice assignment
print(l)
l[::-2] = '4567' # reverse extended slice assignment
print(l)

l[:3] = () # slice deletion
print(l)
l[0:0] = 'first' # slice insertion, iterable
print(l)
l[0:0] = 'F'
print(l)

[1, 2, 'a', 'b', 'c', 6]
[1, 2, 'a', 'b', 'c', 'x', 'y']
['1', 2, '2', 'b', '3', 'x', '4']
['7', 2, '6', 'b', '5', 'x', '4']
['b', '5', 'x', '4']
['f', 'i', 'r', 's', 't', 'b', '5', 'x', '4']
['F', 'f', 'i', 'r', 's', 't', 'b', '5', 'x', '4']
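
The cell above shows the successful cases. The following sketch (with illustrative variable names) probes the restrictions listed earlier: the length rule for extended slices, deletion from an extended slice with `del`, and the requirement that the right-hand side be iterable. The exact error messages depend on the Python version, so they are printed rather than assumed.

```python
l = [0, 1, 2, 3, 4, 5]

# Extended slice assignment: the lengths must match exactly.
try:
    l[::2] = ['a', 'b']   # the slice selects 3 positions, only 2 values given
except ValueError as exc:
    length_error = str(exc)

# Assigning an empty iterable to an extended slice fails for the same reason,
# so deleting every other element needs del.
del l[::2]
after_del = l.copy()

# A non-iterable right-hand side is rejected even for a regular slice.
try:
    l[0:1] = 10
except TypeError as exc:
    type_error = str(exc)

print(length_error)
print(after_del)
print(type_error)
```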


### Custom sequence types part 2

Concatenation and in-place concatenation

* When dealing with `+` and `+=` operators in the context of sequences, it is essentially an overloaded definition of these operators.
* We can overload them in our custom classes by using the methods `__add__` and `__iadd__`.
* In general, we expect
  * `obj1 + obj2`
    * `obj1` and `obj2` are of the same type
    * result is a new object, also of the same type
  * `obj1 += obj2`
    * `obj2` is any iterable
    * result is the mutated `obj1`

Repetition and in-place repetition

* We can overload the definition of `*` and `*=` in our custom classes by using the methods `__mul__` and `__imul__`.
* In general, we expect
  * `obj * n`
    * `n` is a non-negative integer
    * result is a new object of the same type as `obj`
  * `obj *= n`
    * `n` is a non-negative integer
    * result is the mutated `obj`

Assignment

* Just like accessing elements in a custom sequence type by implementing `__getitem__`, we can handle assignment by implementing `__setitem__`.
* There are a few restrictions with assigning to slices:
  * for any slice, we could only assign an iterable
  * for extended slices, both the slice and the iterable must have the same length

Additional sequence functions and operators

* `__contains__`: `in`
* `__delitem__`: `del`
* `__rmul__`: `n * seq`
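  * The way Python works when it encounters an expression like `a * b`
    * it first tries `a.__mul__(b)`
    * if `a` does not define `__mul__`, or the call returns `NotImplemented`, and `b` is of a different type, it then tries `b.__rmul__(a)`
    * if that fails as well, Python raises `TypeError`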
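
A minimal sketch of this fallback, using a hypothetical `Repeater` class that is not part of the course code. Returning `NotImplemented` from `__mul__` (rather than raising) is what lets Python move on to the reflected method:

```python
class Repeater:
    """Hypothetical sequence-like wrapper around a list."""

    def __init__(self, items):
        self.items = list(items)

    def __repr__(self):
        return f'Repeater({self.items})'

    def __mul__(self, n):
        if not isinstance(n, int):
            return NotImplemented   # signal that this operand cannot handle n
        return Repeater(self.items * n)

    def __rmul__(self, n):
        # Reached for `n * obj` after int.__mul__ returned NotImplemented.
        return self.__mul__(n)


r = Repeater([1, 2])
left = r * 2      # handled by Repeater.__mul__
right = 2 * r     # int cannot handle a Repeater, so Repeater.__rmul__ runs
print(left, right)

try:
    r * 'x'       # neither operand can handle this, so Python raises TypeError
except TypeError as exc:
    print(exc)
```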

Implementing `append`, `extend`, `pop`

* Actually there's nothing special going on here.
* If we want to, we can just implement methods of the same name.

In [16]:
class MyClass:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f'MyClass({self.name})'
    
    def __add__(self, other):
        return MyClass(self.name + other.name)
    
    def __iadd__(self, other):
        self.name += other.name
        return self # must return self, otherwise lose reference

    def __mul__(self, n):
        return MyClass(self.name * n)
    
    def __imul__(self, n):
        self.name *= n
        return self
    
    def __rmul__(self, n):
        return self.__mul__(n)
    
    def __contains__(self, value):
        return value in self.name


mc1 = MyClass('Tony')
mc2 = MyClass('Bruce')

print(f'{(mc1 + mc2)=}')
mc1 += mc2
print(f'{mc1=}')

print(f'{(mc1 * 2)=}')
mc1 *= 2
print(f'{mc1=}')

print(f'{(2 * mc1)=}')

print(f'{("Ton" in mc1)=}')

(mc1 + mc2)=MyClass(TonyBruce)
mc1=MyClass(TonyBruce)
(mc1 * 2)=MyClass(TonyBruceTonyBruce)
mc1=MyClass(TonyBruceTonyBruce)
(2 * mc1)=MyClass(TonyBruceTonyBruceTonyBruceTonyBruce)
("Ton" in mc1)=True


In [66]:
import numbers

class Point:
    def __init__(self, x, y):
        if isinstance(x, numbers.Real) and isinstance(y, numbers.Real):
            self._pt = (x, y)
        else:
            raise TypeError('Point coordinates must be real numbers')
        
    def __repr__(self):
        return f'Point(x={self._pt[0]}, y={self._pt[1]})'
    
    def __len__(self):
        return len(self._pt)
    
    def __getitem__(self, index): # this also makes iteration and unpacking work
        return self._pt[index]
    

class Polygon:
    def __init__(self, *pts):
        self._pts = [Point(*pt) for pt in pts]

    def __repr__(self):
        pts_string = ', '.join(map(str, self._pts))
        return f'Polygon({pts_string})'
    
    def __len__(self):
        return len(self._pts)

    def __getitem__(self, key):
        return self._pts[key]
    
    def __setitem__(self, key, value):
        try:
            rhs = [Point(*pt) for pt in value]
            is_single_point = False
        except TypeError:
            try:
                rhs = Point(*value)
                is_single_point = True
            except TypeError:
                raise TypeError('Invalid point or iterable of points')
        if isinstance(key, int) and is_single_point or \
            isinstance(key, slice) and not is_single_point:
            self._pts[key] = rhs
        else:
            raise TypeError('Incompatible index / slice assignment')
            
    
    def __add__(self, other):
        if isinstance(other, Polygon):
            return Polygon(*self._pts, *other._pts)
        else:
            raise TypeError('Can only concatenate with another polygon')
        
    def __iadd__(self, other):
        self.extend(other)
        return self
    
    def append(self, pt):
        self._pts.append(Point(*pt))

    def insert(self, i, pt):
        self._pts.insert(i, Point(*pt))

    def extend(self, pts):
        if isinstance(pts, Polygon):
            self._pts += pts._pts
        else:
            try:
                pts_pts = [Point(*pt) for pt in pts]
                self._pts += pts_pts
            except TypeError:
                raise TypeError('The right-hand operand must be an iterable of real-number points')

    def __delitem__(self, key):
        del self._pts[key]

    def pop(self, key=-1):
        return self._pts.pop(key)
    
    def clear(self):
        self._pts.clear()

p = Point(1, 2)
print(p)
x, y = p # unpacking works
print(x, y)

poly1 = Polygon((1, 2), (3, 4))
print(poly1)

for p in poly1:
    print(p)

poly2 = Polygon((0, 0), (2, 3))
print(f'{(poly1 + poly2)=}')
poly1 += poly2
print(id(poly1), poly1)
poly1 += (1, 2), (3, 4)
print(id(poly1), poly1)
print(poly1.extend([(5, 5), (6, 6)]))
poly2.append((3, 7))
print(poly2)
poly1[4:] = []
poly1[1:2] = [(1, 2), (3, 4)]
poly1[0:0] = [(10, 10), (11, 11)]
print(poly1)
print(poly1.pop())
print(poly1)
print(poly1.pop(2))
print(poly1)
del poly1[0]
print(poly1)
poly1.clear()
print(poly1)

Point(x=1, y=2)
1 2
Polygon(Point(x=1, y=2), Point(x=3, y=4))
Point(x=1, y=2)
Point(x=3, y=4)
(poly1 + poly2)=Polygon(Point(x=1, y=2), Point(x=3, y=4), Point(x=0, y=0), Point(x=2, y=3))
1750339543520 Polygon(Point(x=1, y=2), Point(x=3, y=4), Point(x=0, y=0), Point(x=2, y=3))
1750339543520 Polygon(Point(x=1, y=2), Point(x=3, y=4), Point(x=0, y=0), Point(x=2, y=3), Point(x=1, y=2), Point(x=3, y=4))
None
Polygon(Point(x=0, y=0), Point(x=2, y=3), Point(x=3, y=7))
Polygon(Point(x=10, y=10), Point(x=11, y=11), Point(x=1, y=2), Point(x=1, y=2), Point(x=3, y=4), Point(x=0, y=0), Point(x=2, y=3))
Point(x=2, y=3)
Polygon(Point(x=10, y=10), Point(x=11, y=11), Point(x=1, y=2), Point(x=1, y=2), Point(x=3, y=4), Point(x=0, y=0))
Point(x=1, y=2)
Polygon(Point(x=10, y=10), Point(x=11, y=11), Point(x=1, y=2), Point(x=3, y=4), Point(x=0, y=0))
Polygon(Point(x=11, y=11), Point(x=1, y=2), Point(x=3, y=4), Point(x=0, y=0))
Polygon()


### Sorting sequences

Sorting and sort keys

* Python provides a `sorted` function that will sort a given iterable.
  * `sorted(iterable, /, *, key=None, reverse=False)`
  * Return a new list containing all items from the iterable in ascending order.
  * A custom `key` function can be supplied to customize the sort order.
  * The `reverse` flag can be set to request the result in descending order.
  * If provided, `key` must be a function that, for any given element in the sequence being sorted, returns the sort key.
    * The sort key does not have to be numerical.
    * It just needs to be values that are themselves pairwise comparable.
  * If `key` is not provided, Python will sort based on the natural ordering of the elements, i.e. the elements must be pairwise comparable, otherwise you will get an exception.
    * For the natural sort of elements, we can always think of the keys as the elements.
  * The function uses a sort algorithm called Timsort.
  * It is a stable sort.

Stable sort

* A stable sort is one that maintains the relative order of items that have equal keys (or values if using natural ordering).
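
A small sketch of what stability buys in practice, using made-up `(name, score)` records. Because ties keep their original relative order, sorting by a secondary key first and then by the primary key yields a correct two-level ordering:

```python
# (name, score) records; several share the same score
records = [('dana', 90), ('alex', 85), ('casey', 90), ('blake', 85)]

# Ties keep their input order: alex before blake, dana before casey.
by_score = sorted(records, key=lambda r: r[1])

# Multi-pass sorting: secondary key (name) first, then primary key (score).
by_name = sorted(records, key=lambda r: r[0])
by_score_then_name = sorted(by_name, key=lambda r: r[1], reverse=True)

print(by_score)
print(by_score_then_name)
```

Note that `reverse=True` also preserves the original order of equal elements, so names stay alphabetical within each score.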

In-place sorting

* If the sequence is mutable, in-place sorting is possible.
* Python's list objects support in-place sorting.
  * `sort(*, key=None, reverse=False)`
  * Sort the list in ascending order and return `None`.
  * The sort is in-place (i.e. the list itself is modified) and stable (i.e. the order of two equal elements is maintained).
  * If a `key` function is given, apply it once to each list item and sort them, ascending or descending, according to their function values.
  * The `reverse` flag can be set to sort in descending order.

In [70]:
t = 10, 3, 5, 1, 2, 11, 0, 7, 8
print(sorted(t))
s = {10, 3, 5, 1, 2, 11, 0, 7, 8}
print(sorted(s))
d = {'b': 100, 'c': 50, 'a': 10}
print(sorted(d))
print(sorted(d, key=lambda k: d[k]))

[0, 1, 2, 3, 5, 7, 8, 10, 11]
[0, 1, 2, 3, 5, 7, 8, 10, 11]
['a', 'b', 'c']
['a', 'c', 'b']


In [3]:
from timeit import timeit
from random import randint

n = 10_000_000
l = [randint(1, 1000) for _ in range(n)]
print(l[:10])

# double quotes outside so the nested single quotes also parse before Python 3.12
print(f"{timeit(stmt='sorted(l)', globals=globals(), number=1)=}")
print(f"{timeit(stmt='l.sort()', globals=globals(), number=1)=}")
print(l[:10])

[896, 379, 891, 557, 774, 996, 210, 13, 814, 629]
timeit(stmt='sorted(l)', globals=globals(), number=1)=2.54854750004597
timeit(stmt='l.sort()', globals=globals(), number=1)=2.131396299926564
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]


In [5]:
class MyClass:
    def __init__(self, name, value):
        self.name = name
        self.value = value

    def __repr__(self):
        return f'MyClass({self.name}, {self.value})'
    
    def __lt__(self, other):
        return self.value < other.value
    
mc1 = MyClass('c1', 10)
mc2 = MyClass('c2', 60)
mc3 = MyClass('c4', 30)
mc4 = MyClass('c3', 20)

mcs = [mc1, mc2, mc3, mc4]

print(sorted(mcs))
print(sorted(mcs, key=lambda mc: mc.name))

[MyClass(c1, 10), MyClass(c3, 20), MyClass(c4, 30), MyClass(c2, 60)]
[MyClass(c1, 10), MyClass(c2, 60), MyClass(c3, 20), MyClass(c4, 30)]


### List comprehensions

* List comprehensions provide a concise way to create lists.
* Common applications are to make new lists where each element is the result of some operations applied to each member of another sequence or iterable, or to create a subsequence of those elements that satisfy a certain condition.
* A list comprehension consists of brackets containing an expression followed by a `for` clause, then zero or more `for` or `if` clauses.
* The result will be a new list resulting from evaluating the expression in the context of the `for` and `if` clauses which follow it.
* If the expression is a tuple, it must be parenthesized.
* List comprehensions can contain complex expressions and nested functions.
* The initial expression in a list comprehension can be any arbitrary expression, including another list comprehension.

Comprehension internals

* Comprehensions isolate their loop variables, much like a function's local scope.
* A useful mental model is a list comprehension wrapped in a function that Python creates and that returns a new list when executed.
  * Up to Python 3.11, compiling a list comprehension creates a temporary function, and executing the comprehension calls it.
  * From Python 3.12 (PEP 709), the comprehension is compiled inline in the enclosing code, but its loop variable still does not leak into the surrounding scope.

Comprehension scopes

* As comprehensions are basically functions, they have their own local scope.
* They can also access global variables.
* They can access nonlocal variables, so they can be closures too.

Nested comprehensions

* Comprehensions can be nested within each other.
* Since they are functions, a nested comprehension can access nonlocal variables from the enclosing comprehension.

Nested loops in comprehensions

* We can have nested loops in comprehensions.
* This is not the same as nested comprehensions.
* The order in which the `for` loops are specified in the comprehension corresponds to the order of the nested loops.
* Nested loops in comprehensions can also contain `if` clauses.
* The order of the `for` and `if` clauses matters, just like in a normal set of nested `for` loops and `if` statements.
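
To make the clause ordering concrete, here is a short sketch (values chosen only for illustration) that compares a comprehension with nested loops to its explicit equivalent, and contrasts it with a nested comprehension:

```python
# Nested loops inside one comprehension: clauses nest left to right.
pairs = [(i, j) for i in range(4) if i != 1 for j in range(4) if j > i]

# Equivalent explicit loops, in the same order as the clauses above.
expected = []
for i in range(4):
    if i != 1:
        for j in range(4):
            if j > i:
                expected.append((i, j))

# Nested loops produce one flat list ...
flat = [i * j for i in range(1, 3) for j in range(1, 3)]
# ... whereas a nested comprehension produces a list of lists.
nested = [[i * j for j in range(1, 3)] for i in range(1, 3)]

print(pairs)
print(flat)
print(nested)
```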

In [15]:
print(f'{[(x, x ** 2) for x in range(5)]=}') # (x, x ** 2) must be parenthesized

matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]
print(f'{[[row[i] for row in matrix] for i in range(len(matrix[0]))]=}')
print(f'{list(map(list, zip(*matrix)))=}') # this way is more readable

# nested list comprehension
print(f'{[[i * j for j in range(1, 11)] for i in range(1, 11)]=}')


[(x, x ** 2) for x in range(5)]=[(0, 0), (1, 1), (2, 4), (3, 9), (4, 16)]
[[row[i] for row in matrix] for i in range(len(matrix[0]))]=[[1, 4, 7], [2, 5, 8], [3, 6, 9]]
list(map(list, zip(*matrix)))=[[1, 4, 7], [2, 5, 8], [3, 6, 9]]
[[i * j for j in range(1, 11)] for i in range(1, 11)]=[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [2, 4, 6, 8, 10, 12, 14, 16, 18, 20], [3, 6, 9, 12, 15, 18, 21, 24, 27, 30], [4, 8, 12, 16, 20, 24, 28, 32, 36, 40], [5, 10, 15, 20, 25, 30, 35, 40, 45, 50], [6, 12, 18, 24, 30, 36, 42, 48, 54, 60], [7, 14, 21, 28, 35, 42, 49, 56, 63, 70], [8, 16, 24, 32, 40, 48, 56, 64, 72, 80], [9, 18, 27, 36, 45, 54, 63, 72, 81, 90], [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]]


In [14]:
from dis import dis

code_string = 'print([i ** 2 for i in (1, 2, 3)])'
compiled_code = compile(source=code_string, filename='string', mode='eval')
exec(compiled_code)
print(f'{compiled_code=}\n')
dis(compiled_code)

[1, 4, 9]
compiled_code=<code object <module> at 0x0000021B53801730, file "string", line 1>

  0           0 RESUME                   0

  1           2 PUSH_NULL
              4 LOAD_NAME                0 (print)
              6 LOAD_CONST               0 ((1, 2, 3))
              8 GET_ITER
             10 LOAD_FAST_AND_CLEAR      0 (i)
             12 SWAP                     2
             14 BUILD_LIST               0
             16 SWAP                     2
        >>   18 FOR_ITER                 7 (to 36)
             22 STORE_FAST               0 (i)
             24 LOAD_FAST                0 (i)
             26 LOAD_CONST               1 (2)
             28 BINARY_OP                8 (**)
             32 LIST_APPEND              2
             34 JUMP_BACKWARD            9 (to 18)
        >>   36 END_FOR
             38 SWAP                     2
             40 STORE_FAST               0 (i)
             42 CALL                     1
             50 RETURN_VALUE
        >>

In [24]:
from math import comb, factorial

def combo(n, k): # the same as math.comb
    return factorial(n) // (factorial(k) * factorial(n - k))

# pascal's triangle
size = 10
pascals_triangle = [ [comb(n, k) for k in range(n + 1)] for n in range(size + 1)]
print(pascals_triangle)

l1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
l2 = ['a', 'b', 'c', 'd']
print(list(zip(l1, l2)))

print([(item1, item2)
       for index1, item1 in enumerate(l1)
       for index2, item2 in enumerate(l2)
       if index1 == index2])

print([(l1[i], l2[i]) for i in range(min(len(l1), len(l2)))])

number = 0 # number in global scope
l = [number for number in range(10)] # here number is in effect in a function local scope
print(number) # number in global scope not changed

[[1], [1, 1], [1, 2, 1], [1, 3, 3, 1], [1, 4, 6, 4, 1], [1, 5, 10, 10, 5, 1], [1, 6, 15, 20, 15, 6, 1], [1, 7, 21, 35, 35, 21, 7, 1], [1, 8, 28, 56, 70, 56, 28, 8, 1], [1, 9, 36, 84, 126, 126, 84, 36, 9, 1], [1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1]]
[(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
[(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
[(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
0


In [32]:
funcs = []
for i in range(4):
    funcs.append(lambda x: x ** i) # i is bound to the global variable

# all get the same value, because they all reference the same global variable i
print(funcs[0](10)) # 1000
print(funcs[1](10)) # 1000
print(funcs[2](10)) # 1000
print(funcs[3](10)) # 1000
print(f'{i=}')
i = 4
print(funcs[3](10)) # 10000

funcs = []
for i in range(4):
    j = i
    funcs.append(lambda x: x ** j)

# same result
print(funcs[0](10)) # 1000
print(funcs[1](10)) # 1000
print(funcs[2](10)) # 1000
print(funcs[3](10)) # 1000

# same behaviour: every lambda closes over the comprehension's single loop variable i, which ends at 3
funcs = [lambda x: x ** i for i in range(4)]
print(funcs[3](10)) # 1000

funcs = []
for i in range(4):
    # j = i
    funcs.append(lambda x, j=i: x ** j) # here j is not a free variable; it is a parameter whose default value is evaluated when the lambda is created
print(funcs[0](10)) # 1
print(funcs[1](10)) # 10
print(funcs[2](10)) # 100
print(funcs[3](10)) # 1000

funcs = [lambda x, j=i: x ** j for i in range(4)]
print(funcs[0](10)) # 1
print(funcs[1](10)) # 10
print(funcs[2](10)) # 100
print(funcs[3](10)) # 1000

1000
1000
1000
1000
i=3
10000
1000
1000
1000
1000
1000
1
10
100
1000
1
10
100
1000


## Project 1

Background information

* A regular strictly convex polygon is a polygon that has the following characteristics:
  * all interior angles are equal and less than 180°
  * all sides have equal length
* For a regular strictly convex polygon with
  * n edges (= n vertices)
  * circumradius R (the radius of the circumscribed circle, which passes through every vertex)
  * interior angle = (n - 2) * (180 / n) degrees
  * edge length s = 2 * R * sin(π / n)
  * apothem (the distance from the center of a regular polygon to the midpoint of any side) a = R * cos(π / n)
  * area = 1 / 2 * n * s * a
  * perimeter = n * s
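
As a quick numeric check of these formulas, the sketch below evaluates them for a hexagon with circumradius 1. The helper name `regular_polygon_metrics` is hypothetical, and the perimeter is taken as n * s (the sum of the edge lengths).

```python
import math

def regular_polygon_metrics(n, R):
    """Hypothetical helper: evaluate the formulas above for n edges and circumradius R."""
    interior_angle = (n - 2) * (180 / n)      # degrees
    s = 2 * R * math.sin(math.pi / n)         # edge length
    a = R * math.cos(math.pi / n)             # apothem
    area = 0.5 * n * s * a
    perimeter = n * s                         # assumption: sum of the n equal edges
    return interior_angle, s, a, area, perimeter

angle, s, a, area, perimeter = regular_polygon_metrics(6, 1)
print(angle, s, a, area, perimeter)

# area / perimeter simplifies to a / 2, so the ratio grows with n for a fixed R
print(area / perimeter, a / 2)
```

For a hexagon the edge length equals the circumradius and the area is 3√3 / 2, which gives an easy sanity check for any implementation.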

Goal 1

* Create a Polygon class
  * Initializer
    * number of edges / vertices
    * circumradius
  * Properties
    * edges
    * vertices
    * interior angle
    * apothem
    * area
    * perimeter
  * Functionality
    * a proper representation (`__repr__`)
    * implements equality (`==`) based on edges and circumradius (`__eq__`)
    * implements `>` based on number of vertices only (`__gt__`)

Goal 2

* Implement a Polygon sequence type
  * Initializer
    * number of vertices for largest polygon in the sequence
    * common circumradius for all polygons
  * Properties
    * max efficient polygon: returns the Polygon with the highest area/perimeter ratio
  * Functionality
    * functions as sequence type (`__getitem__`)
    * supports the `len()` function (`__len__`)

In [4]:
import math
from numbers import Real
from typing import Self
import sys

class Polygon:
    def __init__(self, edges: int, circumradius: Real) -> None:
        if isinstance(edges, int) and edges > 2:
            self._edges = edges
        else:
            raise ValueError('Edges must be an integer number greater than 2')
        if isinstance(circumradius, Real) and circumradius > 0:
            self._circumradius = circumradius
        else:
            raise ValueError('Circumradius must be a number greater than 0')
        
        self._vertices = self._edges
        self._interior_angle = (self._edges - 2) * (180 / self._edges)
        self._apothem = self._circumradius * math.cos(math.pi / self._edges)
        self._edge_length = 2 * self._circumradius * math.sin(math.pi / self._edges)
        self._area = 0.5 * self._edges * self._edge_length * self._apothem
        self._perimeter = self._edges * self._edge_length

    @property
    def edges(self):
        return self._edges
    
    @property
    def vertices(self):
        return self._vertices
    
    @property
    def interior_angle(self):
        return self._interior_angle
    
    @property
    def apothem(self):
        return self._apothem
    
    @property
    def edge_length(self):
        return self._edge_length
    
    @property
    def area(self):
        return self._area
    
    @property
    def perimeter(self):
        return self._perimeter
    
    def __repr__(self):
        return f'Polygon(edges={self._edges}, circumradius={self._circumradius})'
    
    def __eq__(self, other: Self):
        # if isinstance(other, Polygon):
        if isinstance(other, self.__class__):
            return self._edges == other._edges and self._circumradius == other._circumradius
        else:
            # raise TypeError(f"'==' not supported between instances of 'Polygon' and {type(other).__name__!r}")
            return NotImplemented
        
    def __gt__(self, other: Self):
        # if isinstance(other, Polygon):
        if isinstance(other, self.__class__):
            return self._vertices > other._vertices
        else:
            # raise TypeError(f"'>' not supported between instances of 'Polygon' and {type(other).__name__!r}")
            return NotImplemented
        

class PolygonSequence:
    def __init__(self, max_vertices: int, common_circumradius: Real) -> None:
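        # area / perimeter = apothem / 2 = R * cos(pi / n) / 2 grows with n,
        # so the polygon with the most vertices is always the most efficient one
        self._max_efficient_polygon = Polygon(max_vertices, common_circumradius)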
        self._max_vertices = max_vertices
        self._common_circumradius = common_circumradius
    
    @property
    def max_efficient_polygon(self):
        return self._max_efficient_polygon
    
    def __repr__(self) -> str:
        return f'PolygonSequence(max_vertices={self._max_vertices}, common_circumradius={self._common_circumradius})'
    
    def __len__(self):
        return self._max_vertices - 2
    
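    def __getitem__(self, index: int) -> Polygon:
        if not isinstance(index, int):
            raise TypeError('PolygonSequence indices must be integers')
        if index < 0:
            index += len(self) # support negative indices
        if not 0 <= index < len(self):
            raise IndexError('PolygonSequence object index out of range')
        return Polygon(index + 3, self._common_circumradius) # index 0 is the triangle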
            

if __name__ == '__main__':
    def test_polygon():

        try:
            Polygon(2, 10)
            assert False, 'Creating a polygon with 2 sides'
        except ValueError:
            pass

        try:
            Polygon(3, -10)
            assert False, 'Creating a polygon with negative circumradius'
        except ValueError:
            pass

        edges = 3
        circumradius = 10
        poly1 = Polygon(edges, circumradius)
        assert str(poly1) == 'Polygon(edges=3, circumradius=10)', f'actual: {str(poly1)}'
        assert poly1.edges == 3, (f'actual: {poly1.edges}, ' # separate f-string on multiple lines
                                  f'expected: {edges}')
        assert poly1.vertices == 3, f'actual: {poly1.vertices}, expected: {edges}'
        assert poly1.interior_angle == 60, f'actual: {poly1.interior_angle}, expected: 60'
        assert math.isclose(poly1.apothem, 5.0, rel_tol=sys.float_info.epsilon,
                            abs_tol=sys.float_info.epsilon), f'actual: {poly1.apothem}, expected: 5.0'
        assert poly1.edge_length == 2 * circumradius * math.sin(math.pi / edges), f'actual: {poly1.edge_length}'
        assert poly1.perimeter == 2 * circumradius * math.sin(math.pi / edges) * edges, f'actual: {poly1.perimeter}'
        assert poly1.area == circumradius ** 2 * math.sin(math.pi / edges) * \
            math.cos(math.pi / edges) * edges, f'actual: {poly1.area}'
        poly2 = Polygon(3, 10)
        assert poly1 == poly2
        poly3 = Polygon(4, 5)
        assert poly1 < poly3
        assert poly1 != poly3
        assert not (poly1 == '3') # the result is False
        # poly1 > 3

    test_polygon()

    polies = PolygonSequence(10, 10)
    print(polies)
    print(len(polies))
    print(polies.max_efficient_polygon)
    for poly1 in polies:
        print(poly1)

    try:
        PolygonSequence(2, 10)
        assert False, 'Creating polygon sequence with max edges less than 3'
    except ValueError:
        pass

print(PolygonSequence(1000, 1).max_efficient_polygon.area)

# for p in polies:
#     print(p.area) # with circumradius 1, the area approaches pi as the number of edges grows

PolygonSequence(max_vertices=10, common_circumradius=10)
8
Polygon(edges=10, circumradius=10)
Polygon(edges=3, circumradius=10)
Polygon(edges=4, circumradius=10)
Polygon(edges=5, circumradius=10)
Polygon(edges=6, circumradius=10)
Polygon(edges=7, circumradius=10)
Polygon(edges=8, circumradius=10)
Polygon(edges=9, circumradius=10)
Polygon(edges=10, circumradius=10)
3.141571982779476


## Iterables and iterators

### Iterating collections

* Iterating sequences
  * `__getitem__`
  * assumes indexing starts at `0`
* But iteration can be more general than based on sequential indexing, all we need is
  * a bucket of items, i.e. collection or container
  * get next item
    * no concept of ordering needed
    * just a way to get items out of the container one by one
    * a specific order in which this happens is not required, but can be
  * Sets are unordered collections of items
    * sets are not indexable
    * sets are iterable

The concept of next

* For general iteration, all we need is the concept of "get the next item" in the collection.
* If a collection object implements a get next item method, we can get elements out of the collection one by one.
* Use `StopIteration` exception to indicate the end of the iteration.
* On its own, this approach has some drawbacks:
  * cannot use a for loop
  * once we start using next, there's no going back
  * once `StopIteration` reached, we're done with the object

In [10]:
# the problem with this Squares class is that its instances cannot be
# iterated using for loops, comprehensions, etc.
# once the iteration starts, there is no way to start over,
# and once all the items have been consumed (using next) the object
# becomes useless for iteration
class Squares:
    def __init__(self, length) -> None:
        self._length = length
        self.i = 0

    def __len__(self):
        return self._length

    def __next__(self):
        if self.i >= self._length:
            raise StopIteration
        else:
            result = self.i ** 2
            self.i += 1
            return result


squares = Squares(5)
print(len(squares))

while True:
    try:
        print(next(squares))
    except StopIteration:
        break

5
0
1
4
9
16


The iterator protocol

* A protocol is simply a fancy way of saying that our class is going to implement certain functionality that Python can count on.
* A Python object is considered an iterator when it implements two special methods collectively known as the iterator protocol. These two methods make Python iterators work.
  * `__iter__()`
    * called to initialize the iterator
    * it must return an iterator object, typically returns `self`
  * `__next__()`
    * called to iterate over the iterator
    * it must return the next value in the data stream
    * raise `StopIteration` exception when all elements have been handed out
* If an object is an iterator, we can use it with `for` loops, comprehension, etc.

In [15]:
from random import randint

class RandomInteger:
    def __init__(self, length, *, range_min=0, range_max=10):
        self._length = length
        self._range_min = range_min
        self._range_max = range_max
        self._request_times = 0

    @property
    def length(self):
        return self._length
    
    @property
    def range_min(self):
        return self._range_min
    
    @property
    def range_max(self):
        return self._range_max
    
    def __len__(self):
        return self._length
    
    def __iter__(self):
        return self

    def __next__(self):
        if self._request_times >= self._length:
            raise StopIteration
        else:
            self._request_times += 1
            return randint(self._range_min, self._range_max)
        
random_ints = RandomInteger(10)
print(random_ints.range_min)
print(random_ints.range_max)
print(random_ints.length)
print(len(random_ints))

while True:
    try:
        print(next(random_ints))
    except StopIteration:
        break
print('while loop finishes here')

# the iterator random_ints has been exhausted, so the for loop won't print anything
for i in random_ints:
    print(i)

0
10
10
10
2
5
8
2
4
9
1
7
8
2
while loop finishes here


### Iterables vs iterators

* Iterable
  * An object capable of *returning its members* one at a time.
  * Examples of iterables include all sequence types (such as `list`, `str`, and `tuple`) and some non-sequence types like `dict`, file objects, and objects of any classes you define with an `__iter__()` method or with a `__getitem__()` method that implements sequence semantics.
  * Iterables can be used in a `for` loop and in many other places where a sequence is needed (`zip()`, `map()`, ...). When an iterable object is passed as an argument to the built-in function `iter()`, it returns an iterator for the object. This iterator is good for one pass over the set of values. When using iterables, it is usually not necessary to call `iter()` or deal with iterator objects yourself. The `for` statement does that automatically for you, creating a temporary unnamed variable to hold the iterator for the duration of the loop.
  * An iterable is a Python object that implements the iterable protocol.
    * The iterable protocol requires that the object implements a single method `__iter__`.
    * `__iter__` returns a new instance of the iterator object used to iterate over the iterable.
  * Iterables that are not themselves iterators never become exhausted, because they return a new iterator each time iteration starts.
* Iterator
  * An object representing *a stream of data*.
  * Repeated calls to the iterator’s `__next__()` method (or passing it to the built-in function `next()`) return successive items in the stream. When no more data are available a `StopIteration` exception is raised instead. At this point, the iterator object is exhausted and any further calls to its `__next__()` method just raise `StopIteration` again.
  * Iterators are required to have an `__iter__()` method that returns the iterator object itself so every iterator is also iterable and may be used in most places where other iterables are accepted. One notable exception is code which attempts multiple iteration passes.
  * A container object (such as a list) produces a fresh new iterator each time you pass it to the `iter()` function or use it in a `for` loop. Attempting this with an iterator will just return the same *exhausted* iterator object used in the previous iteration pass, making it appear like an empty container.
  * Iterators are iterables, but they are iterables that become exhausted.
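
The difference between a re-usable iterable and a one-shot iterator can be checked directly. The snippet below is a minimal sketch using a plain `list`; the variable names are only illustrative.

```python
numbers = [1, 2, 3]                     # a list is an iterable, not an iterator

first_pass = list(numbers)              # each pass asks the list for a fresh iterator
second_pass = list(numbers)

numbers_iter = iter(numbers)            # an iterator over the same list
iter_first_pass = list(numbers_iter)    # consumes the iterator
iter_second_pass = list(numbers_iter)   # already exhausted, nothing left

print(first_pass, second_pass)             # [1, 2, 3] [1, 2, 3]
print(iter_first_pass, iter_second_pass)   # [1, 2, 3] []
print(iter(numbers_iter) is numbers_iter)  # True: an iterator returns itself
print(iter(numbers) is numbers)            # False: the list returns a new iterator
```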

Separating the collection from the iterator

* Maintaining the data of the collection should be one object.
* Iterating over the data should be a separate object, i.e. the throw-away iterator object.
* The collection is iterable, but the iterator is responsible for iterating over the collection.
* The iterable is created once.
* The iterator is created every time we need to start a fresh iteration.

Iterating over an iterable

* Python has a built-in function `iter()` that calls the `__iter__` method.
* The first thing Python does when we try to iterate over an iterable
  * it calls `iter()` to obtain an iterator
  * then it starts iterating (using `next`, `StopIteration`, etc)
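
The two steps above can be sketched as a hand-written equivalent of a `for` loop. `manual_for` is a hypothetical helper written for illustration; it is not how CPython implements the loop internally, only a model of the observable behaviour.

```python
def manual_for(iterable, body):
    """Hypothetical helper that mimics `for item in iterable: body(item)`."""
    iterator = iter(iterable)       # step 1: obtain an iterator
    while True:
        try:
            item = next(iterator)   # step 2: request the next item
        except StopIteration:       # step 3: stop once the iterator is exhausted
            break
        body(item)


collected = []
manual_for('abc', collected.append)
print(collected)  # ['a', 'b', 'c']
```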

In [18]:
class Cities:
    def __init__(self):
        self._cities = ['Sydney', 'Shanghai', 'New York', 'Tokyo', 'Gold Coast', 'Vancouver']
    
    def __len__(self):
        return len(self._cities)
    
    def __iter__(self): # with __iter__, Cities itself becomes an iterable
        return CityIterator(self)
    

class CityIterator: # this class can also be nested in the class above
    def __init__(self, cities):
        self._cities = cities
        self._index = 0

    def __iter__(self):
        return self
    
    def __next__(self):
        if self._index >= len(self._cities):
            raise StopIteration
        else:
            item = self._cities._cities[self._index]
            self._index += 1
            return item

cities = Cities()
city_iterator = CityIterator(cities)

for city in city_iterator:
    print(city)

for city in cities:
    print(city)

Sydney
Shanghai
New York
Tokyo
Gold Coast
Vancouver
Sydney
Shanghai
New York
Tokyo
Gold Coast
Vancouver


In [21]:
class Cities:
    def __init__(self):
        self._cities = ['Sydney', 'Shanghai', 'New York', 'Tokyo', 'Gold Coast', 'Vancouver']
    
    def __len__(self):
        return len(self._cities)
    
    def __iter__(self): # with __iter__, Cities itself becomes an iterable
        print('Cities.__iter__ called')
        return self.CityIterator(self)
    
    def __getitem__(self, key): # adds indexing and slicing; without __iter__ it would also be the fallback for iteration
        print('Cities.__getitem__ called')
        return self._cities[key]
    
    class CityIterator: # nested class
        def __init__(self, cities):
            print('Creating iterator...')
            self._cities = cities
            self._index = 0

        def __iter__(self):
            return self
        
        def __next__(self):
            print('CityIterator.__next__ called')
            if self._index >= len(self._cities):
                raise StopIteration
            else:
                item = self._cities._cities[self._index]
                self._index += 1
                return item

cities = Cities()
city_iterator = cities.__iter__()
print(cities[:5:2])

for city in city_iterator:
    print(city)

for city in cities: # for prefers __iter__ over __getitem__
    print(city)

Cities.__iter__ called
Creating iterator...
Cities.__getitem__ called
['Sydney', 'New York', 'Gold Coast']
CityIterator.__next__ called
Sydney
CityIterator.__next__ called
Shanghai
CityIterator.__next__ called
New York
CityIterator.__next__ called
Tokyo
CityIterator.__next__ called
Gold Coast
CityIterator.__next__ called
Vancouver
CityIterator.__next__ called
Cities.__iter__ called
Creating iterator...
CityIterator.__next__ called
Sydney
CityIterator.__next__ called
Shanghai
CityIterator.__next__ called
New York
CityIterator.__next__ called
Tokyo
CityIterator.__next__ called
Gold Coast
CityIterator.__next__ called
Vancouver
CityIterator.__next__ called


Example 1: consuming iterators manually

In [4]:
from collections import namedtuple

def cast_data(type_and_data):
    type_, data = type_and_data
    match type_:
        case 'STRING' | 'CAT':
            return str(data)
        case 'INT':
            return int(data)
        case 'DOUBLE':
            return float(data)
        case _:
            raise ValueError('Unknown type')

with open('./p2_example1/cars.csv') as file:
    headers = next(file).strip().split(';') # the file object itself is an iterator, so there's no need to call iter() on it
    data_types = next(file).strip().split(';')
    Car = namedtuple('Car', headers)
    # cars = []
    # for row in file:
    #     car = Car(*map(cast_data, zip(data_types, row.strip().split(';'))))
    #     cars.append(car)

    cars = [Car(*map(cast_data, zip(data_types, row.strip().split(';'))))
            for row in file]

cars[:10]

[Car(Car='Chevrolet Chevelle Malibu', MPG=18.0, Cylinders=8, Displacement=307.0, Horsepower=130.0, Weight=3504.0, Acceleration=12.0, Model=70, Origin='US'),
 Car(Car='Buick Skylark 320', MPG=15.0, Cylinders=8, Displacement=350.0, Horsepower=165.0, Weight=3693.0, Acceleration=11.5, Model=70, Origin='US'),
 Car(Car='Plymouth Satellite', MPG=18.0, Cylinders=8, Displacement=318.0, Horsepower=150.0, Weight=3436.0, Acceleration=11.0, Model=70, Origin='US'),
 Car(Car='AMC Rebel SST', MPG=16.0, Cylinders=8, Displacement=304.0, Horsepower=150.0, Weight=3433.0, Acceleration=12.0, Model=70, Origin='US'),
 Car(Car='Ford Torino', MPG=17.0, Cylinders=8, Displacement=302.0, Horsepower=140.0, Weight=3449.0, Acceleration=10.5, Model=70, Origin='US'),
 Car(Car='Ford Galaxie 500', MPG=15.0, Cylinders=8, Displacement=429.0, Horsepower=198.0, Weight=4341.0, Acceleration=10.0, Model=70, Origin='US'),
 Car(Car='Chevrolet Impala', MPG=14.0, Cylinders=8, Displacement=454.0, Horsepower=220.0, Weight=4354.0, Acc

Example 2: cyclic iterators

In [5]:
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
directions = ['E', 'S', 'W', 'N']

# combined = []
# for i, n in enumerate(numbers):
#     combined.append(str(n) + directions[i % 4])

combined = [str(n) + directions[i % 4] for i, n in enumerate(numbers)]
print(combined)

['1E', '2S', '3W', '4N', '5E', '6S', '7W', '8N', '9E', '10S', '11W', '12N', '13E']


In [6]:
class CyclicIterator:
    def __init__(self, sequence):
        self._sequence = sequence
        self._index = 0

    def __len__(self):
        return len(self._sequence)

    def __iter__(self):
        return self
    
    def __next__(self):
        # if self._index >= len(self._sequence):
        #     self._index = 0
        # result = self._sequence[self._index]
        result = self._sequence[self._index % len(self)]
        self._index += 1
        return result


numbers = list(range(1, 14))
directions = 'ESWN'
directions_cyclic = CyclicIterator(directions)

print('[', end=' ')
for d in range(10):
    print(next(directions_cyclic), end=' ')
print(']')

# although the CyclicIterator has a finite length, it still causes an infinite for loop
# unless there is an extra length limit in the __next__ method
# for d in directions_cyclic:
#     print(d)

# use zip() and CyclicIterator
# combined = [f'{n}{d}' for n, d in zip(numbers, directions_cyclic)]

# use zip() and repeated iterable
# combined = [f'{n}{d}' for n, d in zip(numbers, 'ESWN' * (len(numbers) // len('ESWN') + 1))]

# use next() with CyclicIterator
combined = [f'{n}{next(directions_cyclic)}' for n in range(1, 14)]

print(combined)

[ E S W N E S W N E S ]
['1W', '2N', '3E', '4S', '5W', '6N', '7E', '8S', '9W', '10N', '11E', '12S', '13W']


In [7]:
import itertools

# itertools.cycle(iterable)
# Make an iterator returning elements from the iterable and saving a copy of each.
# When the iterable is exhausted, return elements from the saved copy.
# Repeats indefinitely. Roughly equivalent to:
# def cycle(iterable):
#     # cycle('ABCD') --> A B C D A B C D A B C D ...
#     saved = []
#     for element in iterable:
#         yield element
#         saved.append(element)
#     while saved:
#         for element in saved:
#               yield element
# this implementation makes it not limited to sequence types unlike my CyclicIterator class

# itertools.zip_longest() is used to pad shorter iterables with a constant value

directions = 'ESWN'
combined = [f'{n}{d}' for n, d in zip(range(1, 14), itertools.cycle(directions))]
print(combined)

['1E', '2S', '3W', '4N', '5E', '6S', '7W', '8N', '9E', '10S', '11W', '12N', '13E']


### Lazy iterables

Lazy evaluation

* This is often used in class properties.
  * Properties of classes may not always be populated when the object is created.
  * Value of a property only becomes known when the property is requested, i.e. it is deferred.
* We can apply the same concept to certain iterables
  * We do not calculate the next item in an iterable until it is actually requested.

Application to iterables

* Retrieving a list of forum posts
  * Posts might be an iterable:
    * each call to `next` returns a list of 5 posts (or a page size)
    * uses lazy loading, i.e. every time `next` is called, go back to database and get next 5 posts
* Infinite iterables
  * Items are not computed until they are requested.
  * Don't use a `for` loop over such an iterable unless the loop has an explicit exit condition (e.g. `break`); otherwise it never terminates.
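
As a rough illustration of an infinite lazy iterable, the sketch below computes each square only when it is requested. `InfiniteSquares` and its nested iterator are hypothetical names introduced here. A plain `for` loop over it would never end, so the example either bounds the stream with `itertools.islice` or breaks out of the loop explicitly.

```python
from itertools import islice


class InfiniteSquares:
    """Hypothetical infinite iterable: 0, 1, 4, 9, ..."""

    def __iter__(self):
        return self.SquaresIterator()

    class SquaresIterator:
        def __init__(self):
            self._n = 0

        def __iter__(self):
            return self

        def __next__(self):
            # computed only when requested; never raises StopIteration
            result = self._n ** 2
            self._n += 1
            return result


squares = InfiniteSquares()

# Option 1: take a bounded slice of the infinite stream
first_five = list(islice(squares, 5))
print(first_five)  # [0, 1, 4, 9, 16]

# Option 2: a for loop with an explicit exit condition
below_50 = []
for value in squares:
    if value >= 50:
        break
    below_50.append(value)
print(below_50)  # [0, 1, 4, 9, 16, 25, 36, 49]
```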

In [11]:
import math

class Circle:
    def __init__(self, radius):
        self.radius = radius

    @property
    def radius(self):
        return self._radius
    
    @radius.setter
    def radius(self, radius):
        if radius > 0:
            self._radius = radius
            self._area = None # set area to None, so that when area property is requested, it recalculates and returns the right result
        else:
            raise ValueError('Radius should be greater than 0')
        
    @property
    def area(self):
        if not getattr(self, '_area', None): # area won't be 0, since radius > 0
            self._area = math.pi * self._radius ** 2
        return self._area
    
# c1 = Circle(0) # raise ValueError
c2 = Circle(1)
print(c2.radius, c2.area)
print(c2.__dict__)
c2.radius = 2
print(c2.area)

1 3.141592653589793
{'_radius': 1, '_area': 3.141592653589793}
12.566370614359172


In [29]:
class Factorial:
    def __init__(self, length):
        self.length = length

    @property
    def length(self):
        return self._length
    
    @length.setter
    def length(self, length):
        if isinstance(length, int):
            self._length = max(0, length)
        else:
            raise ValueError('Length must be an integer')
        
    def __len__(self):
        return self._length
    
    def __iter__(self):
        return self.FactorialIterator(self._length)
    
    class FactorialIterator:
        def __init__(self, length):
            self.length = length
            self.index = 0

        def __iter__(self):
            return self
        
        def __next__(self):
            if self.index >= self.length:
                raise StopIteration
            else:
                result = 1 if self.index == 0 else self.prev * self.index
                self.index += 1
                self.prev = result
                return result
        

f = Factorial(1)
print(f'{f.length=}')
f.length = 5
print(f'{f.length=}')

for e in f:
    print(e, end=' ')
print()

iter_f = iter(f)
print(next(iter_f), end=' ')
print(next(iter_f), end=' ')
print(next(iter_f))



f.length=1
f.length=5
1 1 2 6 24 
1 1 2


### Python's built-in iterables and iterators

* Python provides many functions that return iterables or iterators.
* You should always be aware of whether you are dealing with an iterable or an iterator.
  * If an object is an iterable (but not an iterator), you can iterate over it many times.
  * If an object is an iterator, you can iterate over it only once.
* Some built-in functions:
  * `range(10)` -> iterable
  * `zip(l1, l2)` -> iterator
  * `enumerate(l1)` -> iterator
  * `open('cars.csv')` -> iterator
  * dictionary `keys()` -> iterable
  * dictionary `values()` -> iterable
  * dictionary `items()` -> iterable
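
A quick way to tell the two kinds apart is to check whether `iter(obj) is obj`. The sketch below applies that check to `enumerate()` and to a dictionary's `keys()` view; the sample data is made up for illustration.

```python
letters = ['a', 'b', 'c']
ages = {'ann': 31, 'bob': 27}

indexed = enumerate(letters)      # an iterator
keys = ages.keys()                # an iterable view, not an iterator

print(iter(indexed) is indexed)   # True
print(iter(keys) is keys)         # False

enum_pass_1 = list(indexed)
enum_pass_2 = list(indexed)       # the iterator is exhausted after one pass
keys_pass_1 = list(keys)
keys_pass_2 = list(keys)          # the view hands out a fresh iterator each time

print(enum_pass_1, enum_pass_2)   # [(0, 'a'), (1, 'b'), (2, 'c')] []
print(keys_pass_1, keys_pass_2)   # ['ann', 'bob'] ['ann', 'bob']
```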

In [31]:
r = range(10)
print(f'{type(r)=}')
print(f"{('__iter__' in dir(r))=}")
print(f"{('__next__' in dir(r))=}")

z = zip('123', 'abc')
print(f'{(z is iter(z))=}')

type(r)=<class 'range'>
('__iter__' in dir(r))=True
('__next__' in dir(r))=False
(z is iter(z))=True


In [33]:
with open('./p2_example1/cars.csv') as f:
    origins = set()
    next(f), next(f)
    
    for row in f:
        origins.add(row.strip().split(';')[-1])
        
print(origins)

{'Japan', 'Europe', 'US'}


### Sorting iterables

In [34]:
import random

class RandomInts:
    def __init__(self, start, stop, length, *, seed=None):
        self.start = start
        self.stop = stop
        self.length = length
        self.seed = seed

    def __len__(self):
        return self.length

    def __iter__(self):
        return self.RandomIntsIterator(self.start, self.stop, self.length, self.seed)
    
    class RandomIntsIterator:
        def __init__(self, start, stop, length, seed):
            self.start = start
            self.stop = stop
            self.length = length
            self.next_times = 0
            random.seed(seed)

        def __iter__(self):
            return self
        
        def __next__(self):
            if self.next_times >= self.length:
                raise StopIteration
            else:
                self.next_times += 1
                return random.randint(self.start, self.stop)

rand_ints = RandomInts(0, 10, 10, seed=0)
print(list(rand_ints))
print(sorted(rand_ints))
print(sorted(rand_ints, reverse=True))

[6, 6, 0, 4, 8, 7, 6, 4, 7, 5]
[0, 4, 4, 5, 6, 6, 6, 7, 7, 8]
[8, 7, 7, 6, 6, 6, 5, 4, 4, 0]


### The `iter()` function

What happens when Python performs an iteration over an iterable

* The very first thing Python does is call the `iter()` function on the object to be iterated.
* If the object implements the `__iter__` method, that method is called and Python uses the returned iterator.
* If the object is a sequence type that only implements `__getitem__` method, calling `iter()` on it still gets an iterator back.
* If neither `__iter__` nor `__getitem__` exists, a `TypeError` exception is raised, i.e. the object is not iterable.

Testing if an object is iterable

* Check if it implements
  * `__getitem__` or
  * `__iter__` and `__iter__` returns an iterator
* Or call `iter()` on it, if no `TypeError` then it is iterable
* Or just try to iterate it, e.g. using a `for` loop
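
The "call `iter()` and watch for `TypeError`" test can be wrapped in a small helper. `is_iterable` and `OnlyGetItem` below are hypothetical names used for this sketch. Note that `isinstance(obj, collections.abc.Iterable)` only detects `__iter__`, so it misses objects that rely on `__getitem__` alone.

```python
from collections.abc import Iterable


def is_iterable(obj):
    """Hypothetical helper: True if iter(obj) succeeds."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


class OnlyGetItem:
    """Hypothetical class with sequence-style __getitem__ but no __iter__."""

    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)
        return index * 10


print(is_iterable([1, 2, 3]))                 # True
print(is_iterable(OnlyGetItem()))             # True
print(is_iterable(42))                        # False
print(isinstance(OnlyGetItem(), Iterable))    # False: the ABC only looks for __iter__
print(list(OnlyGetItem()))                    # [0, 10, 20]
```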

In [39]:
l = [1, 2, 3, 4]
print(f'{type(iter(l))=}')

class Squares:
    def __init__(self, length):
        self._length = length

    def __len__(self):
        return self._length
    
    def __getitem__(self, index):
        if index >= self._length:
            raise IndexError
        else:
            return index ** 2
        
s = Squares(5)
print(list(s))
s_iterator = iter(s) # an iterator is created by iter() on the sequence
print(f'{type(s_iterator)=}')
print(f'{(next(s_iterator))=}')
print(f'{(next(s_iterator))=}')

type(iter(l))=<class 'list_iterator'>
[0, 1, 4, 9, 16]
type(s_iterator)=<class 'iterator'>
(next(s_iterator))=0
(next(s_iterator))=1


### Iterating callables

An iterator approach to iterating over the return value of a callable

* Make an iterator that knows two things
  * the callable to be called
  * a value (the sentinel) to indicate `StopIteration`
* When `next()` is called
  * call the callable and get the result
  * if the result is equal to the sentinel
    * raise `StopIteration`
    * exhaust the iterator
  * otherwise, return the result

Two forms of `iter()` function

* `iter(iterable)` -> iterator for iterable
  * the `iter()` function is able to generate an iterator from an object implementing the sequence protocol (has `__getitem__` method)
* `iter(callable, sentinel)` -> iterator
  * The returned iterator will
    * call the callable when `next()` is called
    * either raise `StopIteration` if the result is equal to the sentinel value or return the result otherwise
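
A common use of the two-argument form is reading a stream in fixed-size chunks. The sketch below uses an in-memory `io.BytesIO` as a stand-in for a real binary file (an assumption made so the example is self-contained) and relies on `read()` returning `b''` at the end of the data.

```python
import io
from functools import partial

stream = io.BytesIO(b'abcdefghij')   # stands in for a real binary file

# partial(stream.read, 4) is a zero-argument callable that reads 4 bytes;
# read() returns b'' at end of data, which serves as the sentinel
chunks = list(iter(partial(stream.read, 4), b''))
print(chunks)  # [b'abcd', b'efgh', b'ij']
```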

In [51]:
def counter():
    i = 0

    def inner():
        nonlocal i
        current, i = i, i + 1
        return current
    
    return inner

cnt = counter()
print(cnt(), end=' ')
print(cnt(), end=' ')
print(cnt())

class CallableIterator:
    def __init__(self, callable_, sentinel):
        self.callable_ = callable_
        self.sentinel = sentinel
        self.exhausted = False

    def __iter__(self):
        return self
    
    def __next__(self):
        if self.exhausted:
            raise StopIteration
        result = self.callable_()
        if result == self.sentinel:
            self.exhausted = True
            raise StopIteration
        else:
            return result

cnt_iterator = CallableIterator(counter(), 10)
print(list(cnt_iterator))
# print(next(cnt_iterator)) # raise StopIteration

cnt_iter = iter(counter(), 5)
print(f'{(type(cnt_iter))=}')
print(list(cnt_iter))

0 1 2
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
(type(cnt_iter))=<class 'callable_iterator'>
[0, 1, 2, 3, 4]


In [53]:
import random

random.seed(0)
print([random.randint(0, 10) for _ in range(10)])

random.seed(0)
print(list(iter(lambda : random.randint(0, 10), 8)))

[6, 6, 0, 4, 8, 7, 6, 4, 7, 5]
[6, 6, 0, 4]


### Delegating iterators

* In Python, delegation of iterators is a technique that allows one iterator to use another iterator to do some of the work.
* This is useful when you want to iterate over a sequence of items, but you don’t want to write the iteration code yourself. Instead, you can delegate the iteration to another iterator and focus on the logic that you want to apply to each item.
* The `yield from` statement introduced in Python 3.3 makes it easy to delegate iteration to another iterator.
    ```Python
    def flatten(nested):
        for sublist in nested:
            if isinstance(sublist, list):
                yield from flatten(sublist)
            else:
                yield sublist
    ```
  * When you use `yield from`, the sub-iterator is automatically exhausted before control is returned to the calling iterator.
  * This means that you don’t have to worry about manually iterating over the sub-iterator or handling `StopIteration` exceptions.
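
To make the delegation concrete, the sketch below writes the same chaining logic twice: once with an explicit inner loop and once with `yield from`. `chain_loop` and `chain_delegating` are hypothetical names used only for this comparison; for plain iteration the two behave the same.

```python
def chain_loop(*iterables):
    # manual version: iterate over each sub-iterable ourselves
    for iterable in iterables:
        for item in iterable:
            yield item


def chain_delegating(*iterables):
    # delegating version: let `yield from` drive each sub-iterable
    for iterable in iterables:
        yield from iterable


looped = list(chain_loop('ab', [1, 2], range(2)))
delegated = list(chain_delegating('ab', [1, 2], range(2)))
print(looped)     # ['a', 'b', 1, 2, 0, 1]
print(delegated)  # ['a', 'b', 1, 2, 0, 1]
```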

In [57]:
from collections import namedtuple

Person = namedtuple('Person', 'first_name last_name')

class PersonNames:
    def __init__(self, persons):
        try:
            self._names = [f'{person.first_name} {person.last_name}'.title()
                           for person in persons]
        except (TypeError, AttributeError):
            self._names = []
        
    def __iter__(self):
        return iter(self._names) # delegate the iteration to the iterable list
    
persons = [Person('tony', 'chi'), Person('Bruce', 'lEe'), Person('Eric', 'IDLE')]
person_names = PersonNames(persons)

for name in person_names:
    print(name, end=' ')
print()

print(list(person_names))

Tony Chi Bruce Lee Eric Idle 
['Tony Chi', 'Bruce Lee', 'Eric Idle']


### Reversed iteration

Iterating a sequence in reverse order

* If we have a sequence type, then iterating over the sequence in reverse order can be done using slice.
  * This works, but is wasteful because it makes a copy of the sequence.
* Use `range()` to generate corresponding indexes.
* Use the built-in `reversed()` function.
  * It is cleaner than `range()` and just as efficient.
  * It creates an iterator that will iterate backwards over the sequence.
  * Both `__getitem__` and `__len__` must be implemented. `__len__` is essential for `reversed` function.
  * We can override how `reversed` works by implementing `__reversed__` special method.

Iterating an iterable in reverse

* When we call `reversed()` on a custom iterable
  * Python will look for and call `__reversed__` method.
    * `__reversed__` should return an iterator that will be used to perform the reversed iteration.
  * If `__reversed__` is not there, uses `__getitem__` and `__len__` to create an iterator.
  * If none of these requirements is met, a `TypeError` exception is raised.
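
The sketch below compares the three approaches on a list and then shows the fallback path of `reversed()`. `Tens` is a hypothetical class that defines only `__len__` and `__getitem__`, so `reversed()` has to build its iterator from those two methods.

```python
letters = ['a', 'b', 'c']

via_slice = [x for x in letters[::-1]]                              # copies the list first
via_range = [letters[i] for i in range(len(letters) - 1, -1, -1)]   # index arithmetic
via_reversed = list(reversed(letters))                              # reverse iterator, no copy
print(via_slice, via_range, via_reversed)


class Tens:
    """Hypothetical sequence with __len__ and __getitem__ but no __reversed__."""

    def __init__(self, length):
        self._length = length

    def __len__(self):
        return self._length

    def __getitem__(self, index):
        if not 0 <= index < self._length:
            raise IndexError(index)
        return index * 10


reversed_tens = list(reversed(Tens(4)))
print(reversed_tens)  # [30, 20, 10, 0]

# A plain iterator has neither __reversed__ nor __getitem__/__len__
try:
    reversed(iter(letters))
    reversible = True
except TypeError:
    reversible = False
print(reversible)  # False
```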

In [4]:
from collections import namedtuple

SUITS = tuple('♠♥♦♣')
RANKS = tuple(map(str, range(2, 11))) + tuple('JQKA')

Card = namedtuple('Card', 'rank suit')

class CardDeck:
    def __init__(self):
        self._length = len(SUITS) * len(RANKS)

    @property
    def length(self):
        return self._length

    def __len__(self):
        return self._length
    
    def __iter__(self):
        return self.CardDeckIterator(self._length)
    
    def __reversed__(self):
        return self.CardDeckIterator(self._length, True)
    
    class CardDeckIterator:
        def __init__(self, length, reverse=False):
            self.length = length
            if reverse:
                self.index = length - 1
                self.step = -1
            else:
                self.index = 0
                self.step = 1

        def __iter__(self):
            return self
        
        def __next__(self):
            if self.index >= self.length or self.index < 0:
                raise StopIteration
            else:
                result, self.index = self.index, self.index + self.step
                rank = RANKS[result % len(RANKS)]
                suit = SUITS[result // len(RANKS)]
                return Card(rank, suit)


card_deck = CardDeck()
cards = list(card_deck)[:10]
print(cards)
cards_reversed = list(reversed(card_deck))[:10]
print(cards_reversed)

print(sorted(cards, reverse=True)[:10])

cards_reversed = [card for card, _ in zip(reversed(CardDeck()), range(10))]
print(cards_reversed)

[Card(rank='2', suit='♠'), Card(rank='3', suit='♠'), Card(rank='4', suit='♠'), Card(rank='5', suit='♠'), Card(rank='6', suit='♠'), Card(rank='7', suit='♠'), Card(rank='8', suit='♠'), Card(rank='9', suit='♠'), Card(rank='10', suit='♠'), Card(rank='J', suit='♠')]
[Card(rank='A', suit='♣'), Card(rank='K', suit='♣'), Card(rank='Q', suit='♣'), Card(rank='J', suit='♣'), Card(rank='10', suit='♣'), Card(rank='9', suit='♣'), Card(rank='8', suit='♣'), Card(rank='7', suit='♣'), Card(rank='6', suit='♣'), Card(rank='5', suit='♣')]
[Card(rank='J', suit='♠'), Card(rank='9', suit='♠'), Card(rank='8', suit='♠'), Card(rank='7', suit='♠'), Card(rank='6', suit='♠'), Card(rank='5', suit='♠'), Card(rank='4', suit='♠'), Card(rank='3', suit='♠'), Card(rank='2', suit='♠'), Card(rank='10', suit='♠')]
[Card(rank='A', suit='♣'), Card(rank='K', suit='♣'), Card(rank='Q', suit='♣'), Card(rank='J', suit='♣'), Card(rank='10', suit='♣'), Card(rank='9', suit='♣'), Card(rank='8', suit='♣'), Card(rank='7', suit='♣'), Card

### Caveat of using iterators as function arguments

* Iterators are consumed after one pass.
  * If you pass an iterator to a function, the function will consume the iterator and you will not be able to use it again.
* Iterators can be infinite.
  * Some iterators, such as `itertools.count()`, can generate an infinite number of values. If you pass an infinite iterator to a function, the function may never return.
* Iterators can be slow.
  * Iterating over an iterator can be slower than iterating over a list or other container, especially when each value has to be computed on the fly.
* Iterators can be more memory-efficient.
  * If the iterator generates its values on the fly, the iterator only generates values as they are needed, rather than generating all the values at once and storing them in memory.
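
The first caveat is easy to reproduce with a function that makes two passes over its argument. `value_range` and `safe_value_range` are hypothetical functions written for this sketch; the guard `iter(data) is data` detects an iterator so it can be materialized before the first pass.

```python
def value_range(data):
    """Hypothetical function that makes two passes over data."""
    smallest = min(data)                 # first pass
    largest = max(data, default=None)    # second pass: nothing left if data is an iterator
    return smallest, largest


def safe_value_range(data):
    """Same idea, but an iterator argument is copied into a list first."""
    if iter(data) is data:               # only iterators return themselves from iter()
        data = list(data)
    return min(data), max(data)


values = [4, 1, 9]
print(value_range(values))               # (1, 9)
print(value_range(iter(values)))         # (1, None)
print(safe_value_range(iter(values)))    # (1, 9)
```

Copying into a list trades memory for safety, so it is a poor fit for very large or infinite iterators; raising an exception instead is the stricter alternative.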

In [8]:
def analyze_mpg(data, print_count=0):
    if iter(data) is data:
        # method 1: raise exception
        # raise ValueError('data cannot be an iterator')

        # method 2: convert to list
        data = list(data)

    max_mpg = 0
    for row in data:
        mpg = float(row.split(';')[1])
        if mpg > max_mpg:
            max_mpg = mpg
    
    if print_count != 0:
        data = data[:print_count]
    for row in data:
        car, mpg = row.split(';')[:2]
        print(f'{car}: {float(mpg) / max_mpg * 100:.2f}%')

with open('./p2_example1/cars.csv') as f:
    next(f), next(f)
    analyze_mpg(f, 10)

Chevrolet Chevelle Malibu: 38.63%
Buick Skylark 320: 32.19%
Plymouth Satellite: 38.63%
AMC Rebel SST: 34.33%
Ford Torino: 36.48%
Ford Galaxie 500: 32.19%
Chevrolet Impala: 30.04%
Plymouth Fury iii: 30.04%
Pontiac Catalina: 30.04%
AMC Ambassador DPL: 32.19%


## Project 2

The starting point for this project is the `Polygon` class and the `PolygonSequence` class created in the previous project.

Goal 1

* Refactor the `Polygon` class so that all the calculated properties are lazy properties, and should not be calculated more than once (the `Polygon` class is immutable)

Goal 2

* Refactor the `PolygonSequence` class into an iterable. Make sure also that the elements in the iterator are computed lazily, i.e. you can no longer use a list as an underlying storage mechanism for your polygons.
* You'll need to implement both an iterable and an iterator.

In [3]:
from numbers import Real
import math
from typing import Self

class Polygon:
    def __init__(self, edges: int, circumradius: Real):
        if isinstance(edges, int) and edges > 2:
            self._edges = edges
        else:
            raise ValueError('edges must be an integer greater than 2')
        if isinstance(circumradius, Real) and circumradius > 0:
            self._circumradius = circumradius
        else:
            raise ValueError('circumradius must be a number greater than 0')
        # self._vertices = self._edges
        self._interior_angle = None
        self._apothem = None
        self._edge_length = None
        self._area = None
        self._perimeter = None
        
    @property
    def edges(self):
        return self._edges
    
    @property
    def circumradius(self):
        return self._circumradius
    
    @property
    def vertices(self):
        return self._edges
    
    @property
    def interior_angle(self):
        if self._interior_angle is None:
            self._interior_angle = (self._edges - 2) * (180 / self._edges)
        return self._interior_angle
    
    @property
    def apothem(self):
        if self._apothem is None:
            self._apothem = math.cos(math.pi / self._edges) * self._circumradius
        return self._apothem
    
    @property
    def edge_length(self):
        if self._edge_length is None:
            self._edge_length = 2 * math.sin(math.pi / self._edges) * self._circumradius
        return self._edge_length
    
    @property
    def area(self):
        if self._area is None:
            self._area = self.apothem * self.edge_length * self._edges / 2
        return self._area
    
    @property
    def perimeter(self):
        if self._perimeter is None:
            self._perimeter = self._edges * self.edge_length
        return self._perimeter
    
    def __repr__(self):
        return f'Polygon(edges={self._edges}, circumradius={self._circumradius})'
    
    def __eq__(self, other: Self):
        if isinstance(other, self.__class__):
            return self._edges == other._edges and self._circumradius == other._circumradius
        else:
            # return False
            return NotImplemented
        
    def __lt__(self, other: Self):
        if isinstance(other, self.__class__):
            if self._circumradius == other._circumradius:
                return self._edges < other._edges
            else:
                return self.area < other.area
        else:
            raise TypeError(f"'<' not supported between instances of 'Polygon' and {type(other).__name__!r}")


class Polygons:
    def __init__(self, max_edges: int, common_circumradius: Real):
        # validate max_edges and common_circumradius by trying to create a polygon with these values
        # try:
        #     Polygon(max_edges, common_circumradius)
        # except:
        #     raise
        # self._max_edges = max_edges
        # self._common_circumradius = common_circumradius

        # validate arguments the same way as in Polygon
        if isinstance(max_edges, int) and max_edges > 2:
            self._max_edges = max_edges
        else:
            raise ValueError('max_edges must be an integer greater than 2')
        if isinstance(common_circumradius, Real) and common_circumradius > 0:
            self._common_circumradius = common_circumradius
        else:
            raise ValueError('common_circumradius must be a number greater than 0')
        self._max_efficient_polygon = None # polygon with the highest area/perimeter ratio

    @property
    def max_edges(self):
        return self._max_edges
    
    @property
    def common_circumradius(self):
        return self._common_circumradius
    
    @property
    def max_efficient_polygon(self):
        if self._max_efficient_polygon is None:
            sorted_polygons = sorted(self, key=lambda x: x.area / x.perimeter)
            self._max_efficient_polygon = sorted_polygons[-1] if sorted_polygons else None
        return self._max_efficient_polygon
    
    def __len__(self):
        return self._max_edges - 2
    
    def __repr__(self):
        return f'Polygons(max_edges={self._max_edges}, common_circumradius={self._common_circumradius})'
    
    def __iter__(self):
        return self.PolygonsIterator(self._max_edges, self._common_circumradius)
    
    def __reversed__(self):
        return self.PolygonsIterator(self._max_edges, self._common_circumradius, True)
    
    class PolygonsIterator:
        def __init__(self, max_edges, common_circumradius, reverse=False):
            self.max_edges = max_edges
            self.common_circumradius = common_circumradius
            if reverse:
                self.current_edges = max_edges
                self.step = -1
            else:
                self.current_edges = 3
                self.step = 1

        def __iter__(self):
            return self
        
        def __next__(self):
            if self.current_edges > self.max_edges or self.current_edges < 3:
                raise StopIteration
            else:
                polygon = Polygon(self.current_edges, self.common_circumradius)
                self.current_edges += self.step
                return polygon


poly1 = Polygon(3, 10)
print(f'{poly1=}')
print(f'{poly1.edges=}')
print(f'{poly1.vertices=}')
print(f'{poly1.circumradius=}')
print(f'{poly1.interior_angle=}')
print(f'{poly1.apothem=}')
print(f'{poly1.edge_length=}')
print(f'{poly1.area=}')
print(f'{poly1.perimeter=}')
poly2 = Polygon(3, 10)
print(f'{(poly1 == poly2)=}')
poly3 = Polygon(4, 10)
print(f'{(poly1 < poly3)=}')
print(f'{(poly1 > poly3)=}')
poly4 = Polygon(4, 11)
print(f'{(poly4 > poly3)=}')

polys = Polygons(10, 10)
print(f'{polys=}')
print(f'{len(polys)=}')
print(f'{polys.max_edges=}')
print(f'{polys.common_circumradius=}')
print(f'{polys.max_efficient_polygon=}')
print(list(polys))
print(list(reversed(polys)))

poly1=Polygon(edges=3, circumradius=10)
poly1.edges=3
poly1.vertices=3
poly1.circumradius=10
poly1.interior_angle=60.0
poly1.apothem=5.000000000000001
poly1.edge_length=17.32050807568877
poly1.area=129.9038105676658
poly1.perimeter=51.96152422706631
(poly1 == poly2)=True
(poly1 < poly3)=True
(poly1 > poly3)=False
(poly4 > poly3)=True
polys=Polygons(max_edges=10, common_circumradius=10)
len(polys)=8
polys.max_edges=10
polys.common_circumradius=10
polys.max_efficient_polygon=Polygon(edges=10, circumradius=10)
[Polygon(edges=3, circumradius=10), Polygon(edges=4, circumradius=10), Polygon(edges=5, circumradius=10), Polygon(edges=6, circumradius=10), Polygon(edges=7, circumradius=10), Polygon(edges=8, circumradius=10), Polygon(edges=9, circumradius=10), Polygon(edges=10, circumradius=10)]
[Polygon(edges=10, circumradius=10), Polygon(edges=9, circumradius=10), Polygon(edges=8, circumradius=10), Polygon(edges=7, circumradius=10), Polygon(edges=6, circumradius=10), Polygon(edges=5, circumradiu

## Generators

### Introduction

* Generators are a type of iterator.
* Generator functions
  * generator factories
  * they return a generator when called
  * they are not a generator themselves
* Generator expressions
  * use comprehension syntax
  * a more concise way of creating generators
  * like list comprehensions, useful for simple situations

### Yielding and generator functions

* Generator functions are a simple and powerful tool for creating iterators.
  * They are written like regular functions but use the `yield` statement whenever they want to return data.
  * We can think of generator functions as generator factories.
  * Each time `next()` is called on the generator, it resumes where it left off (it remembers all the data values and which statement was last executed).
  * Anything that can be done with generator functions can also be done with class-based iterators.
  * Generators implement the iterator protocol. The `__iter__()` and `__next__()` methods are created automatically.
  * Generators are lazy iterators, and can be infinite.
  * Another key feature is that the local variables and execution state are automatically saved between calls. This makes the function easier to write and much clearer than an approach using instance variables like `self.index` and `self.data`.
  * In addition to automatic method creation and saving program state, when generators terminate, they automatically raise `StopIteration`. In combination, these features make it easy to create iterators with no more effort than writing a regular function.
* The `yield` keyword:
  * It emits a value.
  * The function is effectively suspended at `yield` (but it retains its state).
  * Calling `next()` on the generator resumes running the function right after the `yield` statement.
  * If the function returns instead of yielding (i.e. it finishes running), `StopIteration` is raised, and the return value is stored in the exception's `value` attribute.
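
As a small illustration of the points above, the sketch below uses a hypothetical `countdown` generator function (not part of the course code) to show the suspension at `yield` and where a `return` value ends up.

```python
def countdown(n):
    # hypothetical example generator function
    while n > 0:
        yield n        # execution is suspended here until next() is called again
        n -= 1
    return 'liftoff'   # stored in the `value` attribute of StopIteration

gen = countdown(2)
first = next(gen)      # runs the body until the first yield
second = next(gen)     # resumes right after the yield
try:
    next(gen)          # the function returns, so StopIteration is raised
except StopIteration as ex:
    returned = ex.value

print(first, second, returned)
```

A `for` loop swallows `StopIteration` silently, so the return value is only visible when `next()` is called by hand (or through `yield from`).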

In [9]:
import math

# the second form of iter()
def factorial_callable():
    i = 0
    result = 1
    def inner():
        nonlocal i
        nonlocal result
        if i > 0:
            result *= i
        i += 1
        return result
    return inner

fac_iter = iter(factorial_callable(), 120)
print(list(fac_iter))

# generator function
def factorials(n: int):
    if isinstance(n, int) and n >= 0:
        current_n = 0
        while current_n <= n:
            if current_n == 0:
                current = 1
            else:
                current = prev * current_n
            yield current
            prev = current
            current_n += 1

def factorials_math(n: int):
    for i in range(n + 1):
        yield math.factorial(i)

factorial_gen = factorials(10)
print(f'{type(factorial_gen)=}')
print(f'{(iter(factorial_gen) is factorial_gen)=}')
print(f'{('__iter__' in dir(factorial_gen))=}')
print(f'{('__next__' in dir(factorial_gen))=}')

for f in factorial_gen:
    print(f, end=' ')
print()

print(list(factorials_math(10)))

[1, 1, 2, 6, 24]
type(factorial_gen)=<class 'generator'>
(iter(factorial_gen) is factorial_gen)=True
('__iter__' in dir(factorial_gen))=True
('__next__' in dir(factorial_gen))=True
1 1 2 6 24 120 720 5040 40320 362880 3628800 
[1, 1, 2, 6, 24, 120, 720, 5040, 40320, 362880, 3628800]


In [26]:
from timeit import timeit

# recursive fibonacci generator function
def fibonaccis_recursive(n: int):
    for n in range(n + 1):
        if n == 0:
            yield 0
        elif n == 1:
            yield 1
        else:
            yield max(fibonaccis_recursive(n - 1)) + max(fibonaccis_recursive(n - 2))

fibo_gen = fibonaccis_recursive(5)
for fibo in fibo_gen:
    print(fibo, end=' ')
print()

# non-recursive fibonacci generator function
def fibonaccis(n: int):
    # yield the Fibonacci numbers F(0) through F(n), including for n == 0
    fib0, fib1 = 0, 1
    for _ in range(n + 1):
        yield fib0
        fib0, fib1 = fib1, fib0 + fib1

print(list(fibonaccis(10)))

print(timeit('list(fibonaccis_recursive(10))', globals=globals(), number=10))
print(timeit('list(fibonaccis(10))', globals=globals(), number=10))


0 1 1 2 3 5 
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
0.029105499968864024
1.6600009985268116e-05


### Making an iterable from an iterator

Generators become exhausted

* Generator functions are functions that use `yield`.
  * A generator function is a generator factory, i.e. it returns a new generator each time it is called.
* Generators are iterators.
  * They can become exhausted (consumed).
  * They cannot be restarted.
  * This can lead to unexpected bugs if you try to iterate twice over a generator.

Making an iterable

* The solution is to create an iterable that returns a new iterator every time.
  * Define a class whose `__iter__` method calls the generator function and returns the new generator.
* This is no different than with any other iterator.

In [5]:
# generator function
def squares_gf(n):
    for i in range(n):
        yield i ** 2

class SquaresIterable:
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return SquaresIterable.squares_gf(self.n)
    
    @staticmethod
    def squares_gf(n):
        for i in range(n):
            yield i ** 2
    
sq_gen = squares_gf(5)
print(list(sq_gen))
print(list(sq_gen)) # [], sq_gen exhausted

sq_iter = SquaresIterable(5)
print(list(sq_iter))
print(list(sq_iter)) # not exhausted: every call to __iter__ creates a new generator

sq_gen = squares_gf(5)
sq_enum = enumerate(sq_gen)
print(list(sq_enum))
print(list(sq_enum)) # [], enumerate returns an iterator (not a generator), and it is exhausted too

[0, 1, 4, 9, 16]
[]
[0, 1, 4, 9, 16]
[0, 1, 4, 9, 16]
[(0, 0), (1, 1), (2, 4), (3, 9), (4, 16)]
[]


In [10]:
from collections import namedtuple

Card = namedtuple('Card', 'rank suit')
RANKS = tuple(map(str, range(2, 11))) + tuple('JQKA')
SUITS = tuple('♠♥♦♣')

def card_deck_gf1():
    for i in range(len(SUITS) * len(RANKS)):
        rank = RANKS[i % len(RANKS)]
        suit = SUITS[i // len(RANKS)]
        yield Card(rank, suit)

def card_deck_gf2():
    for suit in SUITS:
        for rank in RANKS:
            yield Card(rank, suit)

print(list(card_deck_gf2()))

# CardDeck iterable
class CardDeck:
    Card = namedtuple('Card', 'rank suit')
    RANKS = tuple(map(str, range(2, 11))) + tuple('JQKA')
    SUITS = tuple('♠♥♦♣')

    def __iter__(self):
        return CardDeck.card_generator_function()
    
    def __reversed__(self):
        return CardDeck.card_generator_function(reverse=True)
    
    @staticmethod
    def card_generator_function(*, reverse=False):
        if reverse:
            for suit in CardDeck.SUITS[::-1]:
                for rank in CardDeck.RANKS[::-1]:
                    yield CardDeck.Card(rank, suit)
        else:
            for suit in CardDeck.SUITS:
                for rank in CardDeck.RANKS:
                    yield CardDeck.Card(rank, suit)


card_deck = CardDeck()
print(list(card_deck))
print(list(reversed(card_deck)))

[Card(rank='2', suit='♠'), Card(rank='3', suit='♠'), Card(rank='4', suit='♠'), Card(rank='5', suit='♠'), Card(rank='6', suit='♠'), Card(rank='7', suit='♠'), Card(rank='8', suit='♠'), Card(rank='9', suit='♠'), Card(rank='10', suit='♠'), Card(rank='J', suit='♠'), Card(rank='Q', suit='♠'), Card(rank='K', suit='♠'), Card(rank='A', suit='♠'), Card(rank='2', suit='♥'), Card(rank='3', suit='♥'), Card(rank='4', suit='♥'), Card(rank='5', suit='♥'), Card(rank='6', suit='♥'), Card(rank='7', suit='♥'), Card(rank='8', suit='♥'), Card(rank='9', suit='♥'), Card(rank='10', suit='♥'), Card(rank='J', suit='♥'), Card(rank='Q', suit='♥'), Card(rank='K', suit='♥'), Card(rank='A', suit='♥'), Card(rank='2', suit='♦'), Card(rank='3', suit='♦'), Card(rank='4', suit='♦'), Card(rank='5', suit='♦'), Card(rank='6', suit='♦'), Card(rank='7', suit='♦'), Card(rank='8', suit='♦'), Card(rank='9', suit='♦'), Card(rank='10', suit='♦'), Card(rank='J', suit='♦'), Card(rank='Q', suit='♦'), Card(rank='K', suit='♦'), Card(ran

### Generator expressions and performance

* Comprehension
  * complicated syntax
    * `if` statements
    * multiple nested loops
    * nested comprehensions
  * evaluation is eager
    * all objects are created right away
    * takes longer to return the result
    * iteration is faster (objects already created)
  * has local scope
  * can access nonlocal and global scope
  * the result is an iterable, never gets exhausted
  * entire collection is loaded into memory
* Generator expressions
  * use the same comprehension syntax
    * including `if` statements, nested loops, and nested comprehension
    * but instead of `[]` as in list comprehension, generator expressions use `()`
  * evaluation is *lazy*
    * object creation is delayed until requested by `next()`
    * generator returned immediately
    * iteration is slower (objects need to be created)
  * has local scope
  * can access nonlocal and global scope
  * the result is a generator, gets exhausted
  * only a single item is loaded at a time, i.e. generators tend to have less memory overhead
  * if you do not iterate through all the elements, generators are more efficient
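
The memory and laziness claims above can be checked with a rough sketch like the following. The variable names are illustrative, and `sys.getsizeof` only measures the container object itself, not the items it references.

```python
import sys

n = 100_000
squares_list = [i ** 2 for i in range(n)]   # eager: all n objects exist now
squares_gen = (i ** 2 for i in range(n))    # lazy: nothing has been computed yet

list_size = sys.getsizeof(squares_list)     # grows with n
gen_size = sys.getsizeof(squares_gen)       # small and independent of n
print(list_size > gen_size)

# stopping early: only the first few squares are ever computed
first_big = next(sq for sq in squares_gen if sq > 50)
print(first_big)
```

The exact sizes depend on the Python version and platform, which is why the sketch only compares them.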

In [15]:
l = [i ** 2 for i in range(5)]
print(f'{type(l)=}')

g = (i ** 2 for i in range(5))
print(f'{type(g)=}')
print(list(g))
print(list(g)) # [], generator exhausted

type(l)=<class 'list'>
type(g)=<class 'generator'>
[0, 1, 4, 9, 16]
[]


In [13]:
import dis

exp = compile('[i ** 2 for i in range(5)]', filename='<string>', mode='eval')
dis.dis(exp)

  0           0 RESUME                   0

  1           2 PUSH_NULL
              4 LOAD_NAME                0 (range)
              6 LOAD_CONST               0 (5)
              8 CALL                     1
             16 GET_ITER
             18 LOAD_FAST_AND_CLEAR      0 (i)
             20 SWAP                     2
             22 BUILD_LIST               0
             24 SWAP                     2
        >>   26 FOR_ITER                 7 (to 44)
             30 STORE_FAST               0 (i)
             32 LOAD_FAST                0 (i)
             34 LOAD_CONST               1 (2)
             36 BINARY_OP                8 (**)
             40 LIST_APPEND              2
             42 JUMP_BACKWARD            9 (to 26)
        >>   44 END_FOR
             46 SWAP                     2
             48 STORE_FAST               0 (i)
             50 RETURN_VALUE
        >>   52 SWAP                     2
             54 POP_TOP
             56 SWAP                     2
 

In [14]:
exp = compile('(i ** 2 for i in range(5))', filename='<string>', mode='eval')
dis.dis(exp)

  0           0 RESUME                   0

  1           2 LOAD_CONST               0 (<code object <genexpr> at 0x00000296C7168030, file "<string>", line 1>)
              4 MAKE_FUNCTION            0
              6 PUSH_NULL
              8 LOAD_NAME                0 (range)
             10 LOAD_CONST               1 (5)
             12 CALL                     1
             20 GET_ITER
             22 CALL                     0
             30 RETURN_VALUE

Disassembly of <code object <genexpr> at 0x00000296C7168030, file "<string>", line 1>:
  1           0 RETURN_GENERATOR
              2 POP_TOP
              4 RESUME                   0
              6 LOAD_FAST                0 (.0)
        >>    8 FOR_ITER                 9 (to 30)
             12 STORE_FAST               1 (i)
             14 LOAD_FAST                1 (i)
             16 LOAD_CONST               0 (2)
             18 BINARY_OP                8 (**)
             22 YIELD_VALUE              1
             2

In [17]:
start = 1
stop = 11

l = [[i * j for j in range(start, stop)] # nested list comprehension
     for i in range(start, stop)]
g1 = ((i * j for j in range(start, stop)) # nested generator, which is tricky to get fully evaluated
     for i in range(start, stop))
g2 = ([i * j for j in range(start, stop)] # the inner list is not evaluated until requested
      for i in range(start, stop))

print('l:')
print(l)
print('\nlist(g1):')
print(list(g1))
print('\nlist(g2):')
print(list(g2))

l:
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [2, 4, 6, 8, 10, 12, 14, 16, 18, 20], [3, 6, 9, 12, 15, 18, 21, 24, 27, 30], [4, 8, 12, 16, 20, 24, 28, 32, 36, 40], [5, 10, 15, 20, 25, 30, 35, 40, 45, 50], [6, 12, 18, 24, 30, 36, 42, 48, 54, 60], [7, 14, 21, 28, 35, 42, 49, 56, 63, 70], [8, 16, 24, 32, 40, 48, 56, 64, 72, 80], [9, 18, 27, 36, 45, 54, 63, 72, 81, 90], [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]]

list(g1):
[<generator object <genexpr>.<genexpr> at 0x00000296C710E8E0>, <generator object <genexpr>.<genexpr> at 0x00000296C710F1D0>, <generator object <genexpr>.<genexpr> at 0x00000296C710D490>, <generator object <genexpr>.<genexpr> at 0x00000296C710E330>, <generator object <genexpr>.<genexpr> at 0x00000296C710DF20>, <generator object <genexpr>.<genexpr> at 0x00000296C710DCB0>, <generator object <genexpr>.<genexpr> at 0x00000296C710D3C0>, <generator object <genexpr>.<genexpr> at 0x00000296C710D700>, <generator object <genexpr>.<genexpr> at 0x00000296C710F370>, <generator object <genexpr>.

In [24]:
import math
import pprint
import timeit

def combination(n, k):
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

size = 10

pascals_triangle_list = [[combination(n, k) for k in range(n + 1)] for n in range(size + 1)]
pprint.pprint(pascals_triangle_list)
pascals_triangle_generator = ([combination(n, k) for k in range(n + 1)] for n in range(size + 1))
pprint.pprint(list(pascals_triangle_generator))

print(timeit.timeit(stmt='[[combination(n, k) for k in range(n + 1)] for n in range(size + 1)]', number = 10000, globals=globals()))
print(timeit.timeit(stmt='((combination(n, k) for k in range(n + 1)) for n in range(size + 1))', number = 10000, globals=globals()))
print(timeit.timeit(stmt='([combination(n, k) for k in range(n + 1)] for n in range(size + 1))', number = 10000, globals=globals())) # a bit slower than the one above; neither generator is consumed here, so only creation is timed

[[1],
 [1, 1],
 [1, 2, 1],
 [1, 3, 3, 1],
 [1, 4, 6, 4, 1],
 [1, 5, 10, 10, 5, 1],
 [1, 6, 15, 20, 15, 6, 1],
 [1, 7, 21, 35, 35, 21, 7, 1],
 [1, 8, 28, 56, 70, 56, 28, 8, 1],
 [1, 9, 36, 84, 126, 126, 84, 36, 9, 1],
 [1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1]]
[[1],
 [1, 1],
 [1, 2, 1],
 [1, 3, 3, 1],
 [1, 4, 6, 4, 1],
 [1, 5, 10, 10, 5, 1],
 [1, 6, 15, 20, 15, 6, 1],
 [1, 7, 21, 35, 35, 21, 7, 1],
 [1, 8, 28, 56, 70, 56, 28, 8, 1],
 [1, 9, 36, 84, 126, 126, 84, 36, 9, 1],
 [1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1]]
0.29836249991785735
0.003969900077208877
0.004705699975602329


### Yield from

Delegating to another iterator

* PEP 380 adds the `yield from` expression, allowing a generator to delegate part of its operations to another generator.
* This allows a section of code containing `yield` to be factored out and placed in another generator.
* Additionally, the subgenerator is allowed to return with a value, and the value is made available to the delegating generator.
* While designed primarily for use in delegating to a subgenerator, the `yield from` expression actually allows delegation to arbitrary *subiterators*.
* For simple iterators, `yield from iterable` is essentially just a shortened form of `for item in iterable: yield item`.
* However, unlike an ordinary loop, `yield from` allows subgenerators to receive sent and thrown values directly from the calling scope, and return a final value to the outer generator.
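
The last two points are easier to see in code. The sketch below uses two hypothetical generators, `averager` and `delegator`, to show values sent by the caller reaching the subgenerator and the subgenerator's return value coming back through `yield from`.

```python
def averager():
    # hypothetical subgenerator: receives numbers via send(), returns their mean
    total = 0
    count = 0
    while True:
        value = yield          # receives whatever the caller sends
        if value is None:      # None is used here as the "stop" signal
            break
        total += value
        count += 1
    return total / count if count else None

def delegator(results):
    # send() calls on the delegator are forwarded to averager();
    # the value averager() returns becomes the value of the yield from expression
    result = yield from averager()
    results.append(result)

results = []
gen = delegator(results)
next(gen)                      # prime: advance to the first yield in averager()
gen.send(10)
gen.send(20)
try:
    gen.send(None)             # averager() returns, then delegator() finishes
except StopIteration:
    pass

print(results)
```

A plain `for item in averager(): yield item` loop in `delegator` would not forward the sent values and would discard the return value.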

In [25]:
def matrix(n):
    return ((i * j for j in range(1, n + 1)) for i in range(1, n + 1))

def matrix_iterator(n):
    for row in matrix(n):
        yield from row
        print()

for item in matrix_iterator(5):
    print(item, end=' ')


1 2 3 4 5 
2 4 6 8 10 
3 6 9 12 15 
4 8 12 16 20 
5 10 15 20 25 


In [26]:
files = ('./p2_example3/car-brands-1.txt', './p2_example3/car-brands-2.txt', './p2_example3/car-brands-3.txt')

def car_brands_generator(*files):
    for file in files:
        yield from clean_file(file)

def clean_file(file):
    with open(file, encoding='utf-8') as f: # read as UTF-8 so names such as Citroën and Škoda decode correctly
        for row in f:
            yield row.strip()


for car_brand in car_brands_generator(*files):
    print(car_brand)

Alfa Romeo
Aston Martin
Audi
Bentley
Benz
BMW
Bugatti
Cadillac
Chevrolet
Chrysler
Citroën
Corvette
DAF
Dacia
Daewoo
Daihatsu
Datsun
De Lorean
Dino
Dodge
Farboud
Ferrari
Fiat
Ford
Honda
Hummer
Hyundai
Jaguar
Jeep
KIA
Koenigsegg
Lada
Lamborghini
Lancia
Land Rover
Lexus
Ligier
Lincoln
Lotus
Martini
Maserati
Maybach
Mazda
McLaren
Mercedes-Benz
Mini
Mitsubishi
Nissan
Noble
Opel
Peugeot
Pontiac
Porsche
Renault
Rolls-Royce
Saab
Seat
Škoda
Smart
Spyker
Subaru
Suzuki
Toyota
Vauxhall
Volkswagen
Volvo


## Project 3

Background info

* The data file `'nyc_parking_tickets_extract.csv'`
  * fields separated by commas
  * data rows are a mix of data types: string, date, int

Goal 1

* Create a lazy iterator that will produce a named tuple for each row of data.
* The contents of each tuple should be an appropriate data type.
* You should not be reading the entire file in memory and then processing it.
* The goal is to keep the required memory overhead to a minimum.

Goal 2

* Calculate the number of violations by car make.
* Use the lazy iterator you created in Goal 1.
* Use lazy evaluation whenever possible.
* You can store the make and violation counts as a dictionary.
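
One possible way to keep Goal 2 lazy is to feed the row iterator straight into `collections.Counter`. The sketch below is only an illustration: the in-memory `sample` data, the two-column layout, and the `read_rows` helper are assumptions that stand in for the real file and parser.

```python
from collections import Counter, namedtuple
import io

# hypothetical stand-in for the real CSV file; columns and values are illustrative only
sample = io.StringIO(
    'Summons Number,Vehicle Make\n'
    '1,BMW\n'
    '2,FORD\n'
    '3,BMW\n'
)

def read_rows(file_obj):
    # lazily yield one named tuple per data row
    headers = next(file_obj).strip().lower().replace(' ', '_').split(',')
    Row = namedtuple('Row', headers)
    for line in file_obj:
        number, make = line.strip().split(',')
        yield Row(int(number), make)

# Counter consumes the generator one row at a time; no intermediate list is built
violations_by_make = Counter(row.vehicle_make for row in read_rows(sample))
print(violations_by_make)
```

Only the counts are kept in memory, so the memory footprint depends on the number of distinct makes rather than on the number of rows.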

In [3]:
from collections import namedtuple, defaultdict
from datetime import datetime
from functools import partial
from pprint import pprint


class Tickets:
    @staticmethod
    def parse_str(string, *, default=None):
        return string.strip() or default
    
    @staticmethod
    def parse_int(string, *, default=None):
        try:
            return int(string)
        except ValueError:
            return default
        
    @staticmethod
    def parse_date(string, *, default=None):
        try:
            return datetime.strptime(string.strip(), '%m/%d/%Y').date()
        except ValueError:
            return default

    @staticmethod
    def tickets_iterator(file):
        with open(file) as file:
            yield from file
        
    value_parsers = (
        parse_int, # a static method must be defined before it is referenced in the class body (calling it directly like this needs Python 3.10+)
        lambda string: Tickets.parse_str(string, default=''), # inside a lambda, the static method must be referenced through the class name
        partial(parse_str, default=''), # same effect as above
        partial(parse_str, default=''),
        partial(parse_date, default=''),
        parse_int,
        partial(parse_str, default=''),
        partial(parse_str, default=''),
        partial(parse_str, default='')
    )

    @staticmethod
    def tickets_parser(file, *, default=None):
        ticket_rows = Tickets.tickets_iterator(file)
        headers = next(ticket_rows).strip().lower().replace(' ', '_').split(',')
        Ticket = namedtuple('Ticket', headers)
        for row in ticket_rows:
            # parse each item on the row
            data = list(parser(value) for parser, value in zip(Tickets.value_parsers, row.strip().split(',')))
            if all(item is not None for item in data): # yield the ticket only if every item was parsed (no None values)
                yield Ticket(*data)
            else:
                yield default


tickets = Tickets().tickets_parser('./p2_project3/nyc_parking_tickets_extract.csv')
tickets_list = list(tickets)
pprint(tickets_list[:5])
print(f'All tickets can be parsed (i.e. no None object in the list): {all(tickets_list)}')

# violations_by_make = {}
violations_by_make = defaultdict(int)
for ticket in tickets_list:
    # method 1, use setdefault
    # violations_by_make.setdefault(ticket.vehicle_make, 0)
    # violations_by_make[ticket.vehicle_make] += 1

    # method 2, use setdefault, combine two statements together
    # violations_by_make[ticket.vehicle_make] = violations_by_make.setdefault(ticket.vehicle_make, 0) + 1

    # method 3, use collections.defaultdict
    violations_by_make[ticket.vehicle_make] += 1 # if key doesn't exist, its value will be defaulted to 0

print(violations_by_make)
print(dict(sorted(violations_by_make.items(), key=lambda item: item[1], reverse=True)))

[Ticket(summons_number=4006478550, plate_id='VAD7274', registration_state='VA', plate_type='PAS', issue_date=datetime.date(2016, 10, 5), violation_code=5, vehicle_body_type='4D', vehicle_make='BMW', violation_description='BUS LANE VIOLATION'),
 Ticket(summons_number=4006462396, plate_id='22834JK', registration_state='NY', plate_type='COM', issue_date=datetime.date(2016, 9, 30), violation_code=5, vehicle_body_type='VAN', vehicle_make='CHEVR', violation_description='BUS LANE VIOLATION'),
 Ticket(summons_number=4007117810, plate_id='21791MG', registration_state='NY', plate_type='COM', issue_date=datetime.date(2017, 4, 10), violation_code=5, vehicle_body_type='VAN', vehicle_make='DODGE', violation_description='BUS LANE VIOLATION'),
 Ticket(summons_number=4006265037, plate_id='FZX9232', registration_state='NY', plate_type='PAS', issue_date=datetime.date(2016, 8, 23), violation_code=5, vehicle_body_type='SUBN', vehicle_make='FORD', violation_description='BUS LANE VIOLATION'),
 Ticket(summons

## Iteration tools

### Aggregators

* Aggregator functions are functions that take a collection of values and return a single value that summarizes the collection.
  * `min(iterable)` -> minimum value in the iterable
  * `max(iterable)` -> maximum value in the iterable
  * `sum(iterable)` -> sum of all the values in the iterable
  * `functools.reduce(function, iterable[, initializer])`
    * Apply `function` of two arguments cumulatively to the items of `iterable`, from left to right, so as to reduce the iterable to a single value.
    * The left argument is the accumulated value and the right argument is the update value from the iterable.
    * If the optional `initializer` is present, it is placed before the items of the iterable in the calculation, and serves as a default when the iterable is empty.
    * If initializer is not given and iterable contains only one item, the first item is returned.
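
The `initializer` rules for `functools.reduce` are easy to mix up, so here is a short sketch of each case using `operator.mul`. The variable names are illustrative.

```python
from functools import reduce
import operator

product = reduce(operator.mul, [1, 2, 3, 4])        # ((1 * 2) * 3) * 4
with_init = reduce(operator.mul, [1, 2, 3, 4], 10)  # (((10 * 1) * 2) * 3) * 4
empty = reduce(operator.mul, [], 1)                 # empty iterable: the initializer is returned
single = reduce(operator.mul, [7])                  # one item, no initializer: the item is returned
print(product, with_init, empty, single)

try:
    reduce(operator.mul, [])                        # empty iterable and no initializer
except TypeError as ex:
    error_name = type(ex).__name__
    print(error_name)
```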

Associated truth value

* Every object in Python has an associated truth value.
  * Every object has a `True` truth value, except:
    * `None`
    * `False`
    * `0` in any numeric type (e.g. `0`, `0.0`, `0 + 0j`, ...)
    * empty sequences (e.g. list, tuple, string, ...)
    * empty mappings and sets (e.g. `dict`, `set`, ...)
    * instances of custom classes that implement a `__bool__` method returning `False`, or a `__len__` method returning `0`
  * `any(iterable)` -> return `True` if any element in `iterable` is truthy, `False` otherwise (`False` for an empty iterable)
  * `all(iterable)` -> return `True` if all elements in `iterable` are truthy, `False` otherwise (`True` for an empty iterable)
* A function that takes a single argument and returns `True` or `False` is called a predicate (e.g. `bool`).
  * We can make `any` or `all` more useful by first applying a predicate to each element of the iterable.
  * Ways to apply the predicate:
    * the `map` function
    * a comprehension

In [3]:
from numbers import Number
from decimal import Decimal
from fractions import Fraction

def is_all_numbers(iterable):
    # return all(map(lambda element: isinstance(element, Number), iterable)) # use map; map is lazy, so all() still short-circuits
    # return all(isinstance(element, Number) for element in iterable) # use a generator expression; also short-circuits
    for element in iterable:
        if not isinstance(element, Number):
            return False # short-circuited
    return True

l1 = [0, 1, Decimal('12'), Fraction(1, 2), 0 + 10j, 3.14]
l2 = [0, '1', Decimal('12'), Fraction(1, 2), 0 + 10j, 3.14]
print(f'{is_all_numbers(l1)=}')
print(f'{is_all_numbers(l2)=}')

is_all_numbers(l1)=True
is_all_numbers(l2)=False


### Slicing iterables

* `itertools.islice(iterable, stop)`, `itertools.islice(iterable, start, stop[, step])`
  * Make an *iterator* that returns selected elements from the iterable.
  * Unlike regular slicing, `islice()` does not support negative values for `start`, `stop`, or `step`.
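
A brief sketch of two behaviours worth keeping in mind: `islice` advances the underlying iterator, and negative arguments are rejected. The variable names are illustrative.

```python
from itertools import islice

it = iter(range(10))
head = list(islice(it, 3))   # takes the items at positions 0, 1, 2
rest = list(it)              # the underlying iterator has been advanced past them
print(head, rest)

try:
    islice(range(10), -1)    # negative values are not supported
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```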

In [7]:
from itertools import islice
import math
import sys

def factorials(n): # return a generator of factorials from 0 to n
    for i in range(n + 1):
        yield math.factorial(i)

facts = factorials(100)

def slice_iterable(iterable, *args):
    s = slice(*args)
    # sys.maxsize is the largest value a Py_ssize_t can hold, i.e. the largest possible container index (Python ints themselves are unbounded)
    start, stop, step = s.start or 0, s.stop or sys.maxsize, s.step or 1
    it = iter(range(start, stop, step))
    try:
        nexti = next(it)
    except StopIteration:
        # consume iterable to the start position
        for i, element in zip(range(start), iterable):
            pass
        return
    try:
        for i, element in enumerate(iterable):
            if i == nexti:
                yield element
                nexti = next(it)
    except StopIteration:
        # consume iterable to the stop position
        for i, element in zip(range(i + 1, stop), iterable):
            pass

sliced = slice_iterable(facts, 1, 10, 2) # if facts has already been partially consumed, the indexes are relative to the remaining elements
print(list(sliced))
print(list(islice(factorials(20), 1, 10, 2)))

[1, 6, 120, 5040, 362880]
[1, 6, 120, 5040, 362880]


### Selecting and filtering

* `filter(function, iterable)`
  * Construct an *iterator* from those elements of iterable for which `function` is true. `iterable` may be either a sequence, a container which supports iteration, or an iterator.
  * If `function` is `None`, the identity function is assumed, that is, all elements of iterable that are false are removed.
  * It is equivalent to the generator expression `(item for item in iterable if function(item))` if `function` is not `None` and `(item for item in iterable if item)` if `function` is `None`.
* `itertools.filterfalse(predicate, iterable)`
  * Make an *iterator* that filters elements from `iterable` returning only those for which the `predicate` is false.
  * If `predicate` is `None`, return the items that are false.
  * Roughly equivalent to: `(item for item in iterable if not predicate(item))` if `predicate` is not `None` and `(item for item in iterable if not item)` if `predicate` is `None`.
* `itertools.compress(data, selectors)`
  * Make an *iterator* that filters elements from `data` returning only those that have a corresponding element in `selectors` that evaluates to `True`.
  * Stops when either the `data` or `selectors` iterables has been exhausted.
  * Roughly equivalent to: `(d for d, s in zip(data, selectors) if s)`
* `itertools.takewhile(predicate, iterable)`
  * Make an *iterator* that returns elements from the `iterable` as long as the `predicate` is true.
* `itertools.dropwhile(predicate, iterable)`
  * Make an *iterator* that drops elements from the iterable as long as the `predicate` is true; afterwards, returns *every* element. 
  * The iterator does not produce any output until the `predicate` first becomes false, so it may have a lengthy start-up time.
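
The difference between `takewhile` and `dropwhile` shows up when a value that passes the predicate appears after the first failure. The sketch below uses made-up data and also shows that, when given an iterator, `takewhile` consumes the first element that fails the predicate.

```python
from itertools import takewhile, dropwhile

data = [1, 2, 5, 1, 7]

taken = list(takewhile(lambda x: x < 3, data))     # stops for good at 5
dropped = list(dropwhile(lambda x: x < 3, data))   # starts at 5 and keeps the later 1

it = iter(data)
list(takewhile(lambda x: x < 3, it))               # reads 1, 2 and then 5 to find the end
leftover = list(it)                                # 5 has already been consumed

print(taken, dropped, leftover)
```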

In [14]:
from itertools import filterfalse, takewhile, dropwhile, compress

def is_odd(n):
    return n % 2 == 1

def generate_cubes(n):
    for i in range(n):
        yield i ** 3

filtered = filter(is_odd, generate_cubes(10))
print(list(filtered))
filtered_false = filterfalse(is_odd, generate_cubes(10))
print(list(filtered_false))

takewhiled = takewhile(lambda x: x < 100, generate_cubes(10))
print(list(takewhiled))
dropwhiled = dropwhile(lambda x: x < 100, generate_cubes(10))
print(list(dropwhiled))

compressed = compress(generate_cubes(10), (True, False, None, True, True, True, 0, 'True'))
print(list(compressed))

[1, 27, 125, 343, 729]
[0, 8, 64, 216, 512]
[0, 1, 8, 27, 64]
[125, 216, 343, 512, 729]
[0, 27, 64, 125, 343]


### Infinite iterators

* `itertools.count(start=0, step=1)`
  * Make an *iterator* that returns evenly spaced values starting with number `start`.
  * `start` and `step` can be any numeric type.
  * Often used as an argument to `map()` to generate consecutive data points. Also, used with `zip()` to add sequence numbers.
  * When counting with floating point numbers, better accuracy can sometimes be achieved by substituting multiplicative code such as: `(start + step * i for i in count())`.
* `itertools.cycle(iterable)`
  * Make an *iterator* returning elements from the `iterable` and saving a copy of each. When the `iterable` is exhausted, return elements from the saved copy.
  * The `cycle` function allows us to loop over a finite iterable indefinitely.
  * If `iterable` itself is an iterator, it will become exhausted, but `cycle` will still produce an infinite sequence from its saved copy of the elements.
* `itertools.repeat(object[, times])`
  * Make an *iterator* that returns `object` over and over again. Runs indefinitely unless the `times` argument is specified.
  * A common use for `repeat` is to supply a stream of constant values to `map` or `zip`: `list(map(pow, range(10), repeat(2)))`
  * The items yielded by `repeat` are the same object, i.e. they each reference the same object in memory.
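
The floating point remark for `count` and the `repeat` idiom can be sketched as follows. The step of `0.1` and the variable names are chosen only for illustration, and the exact digits of the additive result may vary.

```python
from itertools import count, islice, repeat
import math

# repeated addition of the step can accumulate rounding error
additive = list(islice(count(0, 0.1), 11))
# multiplying the step by an integer index rounds only once per value
multiplicative = list(islice((0 + 0.1 * i for i in count()), 11))
print(additive[-1], multiplicative[-1])

# repeat(2) supplies the constant exponent; map stops when range(5) is exhausted
squares = list(map(pow, range(5), repeat(2)))
print(squares)
```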

In [24]:
from itertools import count, cycle, repeat, islice
from collections import namedtuple

generate_count = count(1, 0.5)
print(list(islice(generate_count, 5)))
# print(list(range(1, 4, 0.5))) # the arguments to range must be integers

# example of cycle function
Card = namedtuple('Card', 'rank suit')

def card_deck():
    ranks = tuple(str(n) for n in range(2, 11)) + tuple('JQKA')
    suits = tuple('♠♥♦♣')
    for suit in suits:
        for rank in ranks:
            yield Card(rank, suit)

hands = [list() for _ in range(4)] # [list()] * 4 and [[]] * 4 won't work: both create four references to the same list

# method 1, use hand_index
# hand_index = 0
# for card in card_deck():
#     hand_index = hand_index % 4
#     hands[hand_index].append(card)
#     hand_index += 1

# method 2, use cycle
hand_cycle = cycle(hands)
for card in card_deck():
    next(hand_cycle).append(card)

print(hands)

# example of repeat function

repeats = repeat([], 4)
repeats_list = list(repeats)
repeats_list[0].append(1) # changes every sublist, because they all reference the same list object
print(repeats_list)

[1, 1.5, 2.0, 2.5, 3.0]
[[Card(rank='2', suit='♠'), Card(rank='6', suit='♠'), Card(rank='10', suit='♠'), Card(rank='A', suit='♠'), Card(rank='5', suit='♥'), Card(rank='9', suit='♥'), Card(rank='K', suit='♥'), Card(rank='4', suit='♦'), Card(rank='8', suit='♦'), Card(rank='Q', suit='♦'), Card(rank='3', suit='♣'), Card(rank='7', suit='♣'), Card(rank='J', suit='♣')], [Card(rank='3', suit='♠'), Card(rank='7', suit='♠'), Card(rank='J', suit='♠'), Card(rank='2', suit='♥'), Card(rank='6', suit='♥'), Card(rank='10', suit='♥'), Card(rank='A', suit='♥'), Card(rank='5', suit='♦'), Card(rank='9', suit='♦'), Card(rank='K', suit='♦'), Card(rank='4', suit='♣'), Card(rank='8', suit='♣'), Card(rank='Q', suit='♣')], [Card(rank='4', suit='♠'), Card(rank='8', suit='♠'), Card(rank='Q', suit='♠'), Card(rank='3', suit='♥'), Card(rank='7', suit='♥'), Card(rank='J', suit='♥'), Card(rank='2', suit='♦'), Card(rank='6', suit='♦'), Card(rank='10', suit='♦'), Card(rank='A', suit='♦'), Card(rank='5', suit='♣'), Card(

### Chaining and teeing

* `itertools.chain(*iterables)`
  * Make an *iterator* that returns elements from the first iterable until it is exhausted, then proceeds to the next iterable, until all of the iterables are exhausted.
  * Used for treating consecutive sequences as a single sequence.
  * What happens if we want to chain from iterables contained inside another iterable:
    * We can unpack those iterables from the outside iterable, but unpacking is eager not lazy.
    * We can use `itertools.chain.from_iterable(iterable)`.
      * Alternate constructor for `chain()`.
      * Gets chained inputs from a single iterable argument that is evaluated lazily.
* `itertools.tee(iterable, n=2)`
  * Return a tuple of `n` *independent* iterators from a single iterable.
  * The elements of the returned tuple are lazy iterators.
  * Once a `tee()` has been created, the original iterable should not be used anywhere else; otherwise, the iterable could get advanced without the tee objects being informed.
  * `tee` iterators are not threadsafe. A `RuntimeError` may be raised when simultaneously using iterators returned by the same `tee()` call, even if the original iterable is threadsafe.
  * In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use `list()` instead of `tee()`.
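
The difference between eager unpacking and `chain.from_iterable()` shows up when the outer iterable is itself lazy or unbounded. The sketch below assumes a hypothetical generator `batches()` (not part of these notes) that yields ranges forever: `chain(*batches())` would never return, while `chain.from_iterable()` only pulls the next inner iterable on demand.

```python
from itertools import chain, count, islice

def batches(size=3):
    """Hypothetical infinite source: yields range(0, 3), range(3, 6), ..."""
    for start in count(0, size):
        yield range(start, start + size)

# chain(*batches()) would try to unpack the infinite generator and hang.
flat = chain.from_iterable(batches())  # lazy: nothing is consumed yet
first_seven = list(islice(flat, 7))
print(first_seven)
```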

In [30]:
from itertools import chain, tee

iter1 = [1, 2, 3]
iter2 = (4, 5, 6)
iter3 = (i for i in range(7, 10))
iters = [iter1, iter2, iter3]
chained = chain(iter1, iter2, iter3)
print(f'{type(chained)=}')
print(list(chained))
chained_iter = chain.from_iterable(iters)
print(list(chained_iter)) # iter3 is a generator already exhausted by the first chain, so 7, 8, 9 are missing

iter4 = (i ** 2 for i in range(5))
teed = tee(iter4, 3)
print(f'{type(teed)=}')
print(f'{teed=}')
print(f'{type(teed[0])=}')
print(f'{(teed[0] is iter(teed[0]))=}')

type(chained)=<class 'itertools.chain'>
[1, 2, 3, 4, 5, 6, 7, 8, 9]
[1, 2, 3, 4, 5, 6]
type(teed)=<class 'tuple'>
teed=(<itertools._tee object at 0x00000213479444C0>, <itertools._tee object at 0x0000021347946340>, <itertools._tee object at 0x0000021347947F40>)
type(teed[0])=<class 'itertools._tee'>
(teed[0] is iter(teed[0]))=True
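
The warning about not touching the original iterable after calling `tee()` can be illustrated with a small sketch. The variable names are illustrative only; the point is that an item taken directly from the source never reaches the tee objects.

```python
from itertools import tee

source = iter(range(5))
a, b = tee(source)

first = next(a)        # pulls 0 from source and buffers it for b
skipped = next(source) # advances source behind tee's back; 1 is lost to both a and b
rest_a = list(a)
all_b = list(b)

print(first, skipped, rest_a, all_b)
```

Neither `a` nor `b` ever sees the skipped value, which is why the source should be treated as owned by `tee()` once the tee objects exist.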


### Mapping and accumulating

* Mapping is applying a callable to each element of an iterable.
  * `map(function, iterable, *iterables)`
    * Return an *iterator* that applies `function` to every item of `iterable`, yielding the results.
    * If additional `iterables` arguments are passed, function must take that many arguments and is applied to the items from all iterables in parallel.
    * With multiple iterables, the iterator stops when the *shortest* iterable is exhausted.
    * For cases where the function inputs are already arranged into argument tuples, see `itertools.starmap()`.
  * `itertools.starmap(function, iterable)`
    * Make an *iterator* that computes the `function` using arguments obtained from the `iterable`.
    * Used instead of `map()` when argument parameters are already grouped in tuples from a single iterable.
* Accumulating functions are functions that take a collection of values and return a sequence of values that represent a running total or running calculation.
  * `itertools.accumulate(iterable[, func, *, initial=None])`
    * Make an *iterator* that returns accumulated sums, or accumulated results of other binary functions (specified via the optional `func` argument).
    * If `func` is supplied, it should be a function of two arguments. Elements of the input `iterable` may be any type that can be accepted as arguments to `func`.
    * Usually, the number of elements output matches the input iterable. However, if the keyword argument `initial` is provided, the accumulation leads off with the initial value so that the output has one more element than the input iterable.
    * The `func` argument can be set to `min()` for a running minimum, `max()` for a running maximum, or `operator.mul()` for a running product.

In [6]:
from itertools import starmap, accumulate
import operator

def add(x, y):
    return x + y

pairs = [(x, y) for x in range(5) for y in range(2)]
print(list(starmap(add, pairs)))
print(list(map(add, *zip(*pairs)))) # first transpose pairs with zip, then unpack the result

print(list(accumulate(range(1, 11))))
print(list(accumulate(range(1, 11), lambda x, y: x * y)))
print(list(accumulate(range(1, 11), operator.mul)))

[0, 1, 1, 2, 2, 3, 3, 4, 4, 5]
[0, 1, 1, 2, 2, 3, 3, 4, 4, 5]
[1, 3, 6, 10, 15, 21, 28, 36, 45, 55]
[1, 2, 6, 24, 120, 720, 5040, 40320, 362880, 3628800]
[1, 2, 6, 24, 120, 720, 5040, 40320, 362880, 3628800]
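
A few of the points above are easy to check directly: `map()` stopping at the shortest input, `accumulate()` with `min`/`max` as the binary function, and the extra leading element produced by `initial` (available from Python 3.8). The data below is made up for illustration.

```python
from itertools import accumulate

readings = [3, 1, 4, 1, 5]  # illustrative data

running_max = list(accumulate(readings, max))
running_min = list(accumulate(readings, min))
with_initial = list(accumulate([1, 2, 3], initial=100))  # one more element than the input
shortest = list(map(pow, [2, 3, 4], [5, 2]))             # stops after two pairs

print(running_max, running_min, with_initial, shortest)
```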


### Zipping

* `zip(*iterables, strict=False)`
  * Iterate over several iterables in parallel, producing an *iterator* of *tuples* with an item from each one.
  * Another way to think of `zip()` is that it turns rows into columns, and columns into rows. This is similar to transposing a matrix.
  * By default, `zip()` stops when the *shortest* iterable is exhausted. It will ignore the remaining items in the longer iterables, cutting off the result to the length of the shortest iterable.
  * `zip()` is often used in cases where the iterables are assumed to be of equal length. In such cases, it’s recommended to use the `strict=True` option. Unlike the default behavior, it raises a `ValueError` if one iterable is exhausted before the others.
  * Shorter iterables can be padded with a *constant* value to make all the iterables have the same length. This is done by `itertools.zip_longest()`.
  * Tips and tricks
    * The left-to-right evaluation order of the iterables is guaranteed. This makes possible an idiom for clustering a data series into n-length groups using `zip(*[iter(s)]*n, strict=True)`. This repeats the same iterator `n` times so that each output tuple has the result of `n` calls to the iterator. This has the effect of dividing the input into n-length chunks. Similar to `itertools.batched(iterable, n)`.
    * `zip()` in conjunction with the `*` operator can be used to unzip a list, i.e. to restore the zipped lists contained in the list.
* `itertools.zip_longest(*iterables, fillvalue=None)`
  * Make an *iterator* that aggregates elements from each of the iterables.
  * If the iterables are of uneven length, missing values are filled-in with `fillvalue`. If not specified, `fillvalue` defaults to `None`.
  * Iteration continues until the longest iterable is exhausted.
  * If one of the iterables is potentially infinite, then the `zip_longest()` function should be wrapped with something that limits the number of calls (for example `islice()` or `takewhile()`).

In [16]:
from itertools import zip_longest

l1 = [1, 2, 3]
l2 = [4, 5, 6]
zipped = zip(l1, l2)
print(f'{list(zipped)=}')
# use zip to unzip the zipped, i.e. to restore lists zipped together
t1, t2 = zip(*zip(l1, l2))
print(f'{(l1 == list(t1))=}')
print(f'{(l2 == list(t2))=}')

l3 = [7, 8, 9, 10]
print(f'{list(zip_longest(l1, l2, l3, fillvalue='filling'))=}')

chunks = zip(*[iter(range(21))] * 3, strict=True) # raise ValueError when the length of iterable is not divisible by chunk size
# chunks = zip(*[iter(range(20))] * 3) # without strict=True, the leftover items (18, 19) are silently dropped
print(list(chunks))

list(zipped)=[(1, 4), (2, 5), (3, 6)]
(l1 == list(t1))=True
(l2 == list(t2))=True
list(zip_longest(l1, l2, l3, fillvalue='filling'))=[(1, 4, 7), (2, 5, 8), (3, 6, 9), ('filling', 'filling', 10)]
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 14), (15, 16, 17), (18, 19, 20)]


### Grouping

* `itertools.groupby(iterable, key=None)`
  * Make an *iterator* that returns tuples of consecutive keys and groups from the iterable. The `key` is a function computing a key value for each element. If not specified or is `None`, `key` defaults to an *identity function* and returns the element unchanged. Generally, the iterable needs to already be *sorted* on the same key function.
  * The operation of `groupby()` is similar to the `uniq` filter in Unix. It generates a break or new group every time the value of the key function changes (which is why it is usually necessary to have sorted the data using the same key function).
  * The returned group is itself an iterator that *shares* the underlying iterable with `groupby()`. Because the source is shared, when the `groupby()` object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list.
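
The need for sorted input is easiest to see on a small list. In this sketch (the word list and `first_letter` helper are illustrative), grouping unsorted data produces the same key more than once, while sorting with the same key function first gives one group per key. Each group is stored as a list because the group iterators share the underlying data with `groupby()`.

```python
from itertools import groupby

def first_letter(word):
    return word[0]

words = ['apple', 'avocado', 'banana', 'apricot', 'blueberry']

# Unsorted input: a new group starts every time the key changes.
unsorted_keys = [key for key, _ in groupby(words, first_letter)]

# Sorted input: one group per key; materialize each group before advancing.
grouped = {key: list(group)
           for key, group in groupby(sorted(words, key=first_letter), first_letter)}

print(unsorted_keys)
print(grouped)
```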

In [17]:
from itertools import groupby

with open('./p2_example4/cars_2014.csv') as f:
    next(f) # skip header row
    make_groups = groupby(f, lambda row: row.split(',')[0])
    model_count_by_make = {make: sum(1 for _ in group) for make, group in make_groups} # len() doesn't work on itertools._grouper

print(model_count_by_make)

{'ACURA': 6, 'ALFA ROMEO': 2, 'APRILIA': 4, 'ARCTIC CAT': 96, 'ARGO': 4, 'ASTON MARTIN': 5, 'AUDI': 27, 'BENTLEY': 2, 'BLUE BIRD': 1, 'BMW': 86, 'BUGATTI': 1, 'BUICK': 5, 'CADILLAC': 7, 'CAN-AM': 61, 'CHEVROLET': 33, 'CHRYSLER': 2, 'DODGE': 7, 'DUCATI': 4, 'FERRARI': 6, 'FIAT': 2, 'FORD': 34, 'FREIGHTLINER': 7, 'GMC': 12, 'HARLEY DAVIDSON': 29, 'HINO': 7, 'HONDA': 91, 'HUSABERG': 4, 'HUSQVARNA': 9, 'HYUNDAI': 13, 'INDIAN': 3, 'INFINITI': 8, 'JAGUAR': 9, 'JEEP': 5, 'JOHN DEERE': 19, 'KAWASAKI': 59, 'KENWORTH': 11, 'KIA': 10, 'KTM': 13, 'KUBOTA': 4, 'KYMCO': 28, 'LAMBORGHINI': 2, 'LAND ROVER': 6, 'LEXUS': 14, 'LINCOLN': 6, 'LOTUS': 1, 'MACK': 9, 'MASERATI': 3, 'MAZDA': 5, 'MCLAREN': 2, 'MERCEDES-BENZ': 60, 'MINI': 3, 'MITSUBISHI': 8, 'NISSAN': 24, 'PEUGEOT': 3, 'POLARIS': 101, 'PORSCHE': 4, 'RAM': 6, 'RENAULT': 4, 'ROLLS ROYCE': 3, 'SCION': 5, 'SEAT': 3, 'SKI-DOO': 67, 'SMART': 1, 'SRT': 1, 'SUBARU': 10, 'SUZUKI': 48, 'TESLA': 2, 'TOYOTA': 19, 'TRIUMPH': 10, 'VESPA': 4, 'VICTORY': 14, 'V

### Combinatorics

* `itertools.product(*iterables, repeat=1)`
  * Cartesian product of input iterables.
  * The Cartesian product of two or more sets is the set of all ordered tuples (pairs for two sets, n-tuples for n sets) that take one element from each set.
  * Roughly equivalent to nested for-loops in a generator expression. For example, `product(A, B)` returns the same as `((x,y) for x in A for y in B)`.
  * To compute the product of an iterable with itself, specify the number of repetitions with the optional `repeat` keyword argument. For example, `product(A, repeat=4)` means the same as `product(A, A, A, A)`, but if `A` is an iterator, the latter won't work: the first argument exhausts the iterator, so the remaining arguments are empty.
* `itertools.permutations(iterable, r=None)`
  * Return successive `r` length permutations of elements in the `iterable`.
  * If `r` is not specified or is `None`, then `r` defaults to the length of the `iterable` and all possible full-length permutations are generated.
  * The permutation tuples are emitted in lexicographic order according to the order of the input iterable. So, if the input iterable is sorted, the output tuples will be produced in sorted order.
  * Elements are treated as unique based on their *position*, not on their value. So if the input elements are unique, there will be no repeated values within a permutation.
  * The code for `permutations()` can be also expressed as a subsequence of `product()`, filtered to exclude entries with repeated elements (those from the same position in the input pool).
  * The number of items returned is n! / (n-r)! when 0 <= r <= n or 0 when r > n, the same result as `math.perm(n, r)`.
* `itertools.combinations(iterable, r)`
  * Return `r` length subsequences of elements from the input `iterable`.
  * The combination tuples are emitted in lexicographic ordering according to the order of the input iterable. So, if the input iterable is sorted, the output tuples will be produced in sorted order.
  * Elements are treated as unique based on their *position*, not on their value. So if the input elements are unique, there will be no repeated values in each combination.
  * The code for `combinations()` can be also expressed as a subsequence of `permutations()` after filtering entries where the elements are not in sorted order (according to their position in the input pool).
  * The number of items returned is n! / r! / (n-r)! when 0 <= r <= n or 0 when r > n, the same result as `math.comb(n, r)`.
* `itertools.combinations_with_replacement(iterable, r)`
  * Return `r` length subsequences of elements from the input `iterable` allowing individual elements to be repeated more than once.
  * The combination tuples are emitted in lexicographic ordering according to the order of the input iterable. So, if the input iterable is sorted, the output tuples will be produced in sorted order.
  * Elements are treated as unique based on their *position*, not on their value. So if the input elements are unique, the generated combinations will also be unique.
  * The code for `combinations_with_replacement()` can be also expressed as a subsequence of `product()` after filtering entries where the elements are not in sorted order (according to their position in the input pool).
  * The number of items returned is (n+r-1)! / r! / (n-1)! when n > 0.
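
The counting formulas above can be checked against `math.perm()` and `math.comb()` (both available from Python 3.8). This is a small sanity check on an illustrative pool of five elements, not a proof.

```python
from itertools import combinations, combinations_with_replacement, permutations, product
from math import comb, perm

pool, r = 'abcde', 3
n = len(pool)

def count_items(iterator):
    """Count the items an iterator yields without building a list."""
    return sum(1 for _ in iterator)

n_product = count_items(product(pool, repeat=r))               # n ** r
n_permutations = count_items(permutations(pool, r))            # n! / (n-r)!
n_combinations = count_items(combinations(pool, r))            # n! / r! / (n-r)!
n_with_repl = count_items(combinations_with_replacement(pool, r))  # (n+r-1)! / r! / (n-1)!

print(n_product, n_permutations, n_combinations, n_with_repl)
```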

In [29]:
from itertools import product, count, takewhile, combinations, combinations_with_replacement, permutations, accumulate, starmap
from operator import add

print(list(product(range(4), repeat=2)))

def grid(min_val, max_val, step, *, dimension=2):
    axis = takewhile(lambda val: val <= max_val, count(min_val, step))
    yield from product(axis, repeat=dimension)

print(list(grid(0, 2, 1, dimension=3)))

print(list(permutations('abc')))
print(list(combinations('abc', 2)))
print(list(combinations_with_replacement('abc', 2)))

SUITS = tuple('♠♥♦♣')
RANKS = tuple(map(str, range(2, 11))) + tuple('JQKA')

deck = starmap(add, product(RANKS, SUITS))
print(list(deck))

[(0, 0), (0, 1), (0, 2), (0, 3), (1, 0), (1, 1), (1, 2), (1, 3), (2, 0), (2, 1), (2, 2), (2, 3), (3, 0), (3, 1), (3, 2), (3, 3)]
[(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 1, 0), (0, 1, 1), (0, 1, 2), (0, 2, 0), (0, 2, 1), (0, 2, 2), (1, 0, 0), (1, 0, 1), (1, 0, 2), (1, 1, 0), (1, 1, 1), (1, 1, 2), (1, 2, 0), (1, 2, 1), (1, 2, 2), (2, 0, 0), (2, 0, 1), (2, 0, 2), (2, 1, 0), (2, 1, 1), (2, 1, 2), (2, 2, 0), (2, 2, 1), (2, 2, 2)]
[('a', 'b', 'c'), ('a', 'c', 'b'), ('b', 'a', 'c'), ('b', 'c', 'a'), ('c', 'a', 'b'), ('c', 'b', 'a')]
[('a', 'b'), ('a', 'c'), ('b', 'c')]
[('a', 'a'), ('a', 'b'), ('a', 'c'), ('b', 'b'), ('b', 'c'), ('c', 'c')]
['2♠', '2♥', '2♦', '2♣', '3♠', '3♥', '3♦', '3♣', '4♠', '4♥', '4♦', '4♣', '5♠', '5♥', '5♦', '5♣', '6♠', '6♥', '6♦', '6♣', '7♠', '7♥', '7♦', '7♣', '8♠', '8♥', '8♦', '8♣', '9♠', '9♥', '9♦', '9♣', '10♠', '10♥', '10♦', '10♣', 'J♠', 'J♥', 'J♦', 'J♣', 'Q♠', 'Q♥', 'Q♦', 'Q♣', 'K♠', 'K♥', 'K♦', 'K♣', 'A♠', 'A♥', 'A♦', 'A♣']


In [36]:
from itertools import product, starmap, combinations
from operator import add
from collections import namedtuple
from fractions import Fraction

SUITS = tuple('♠♥♦♣')
RANKS = tuple(map(str, range(2, 11))) + tuple('JQKA')
Card = namedtuple('Card', 'rank suit')

# deck = starmap(add, product(RANKS, SUITS))
deck = starmap(Card, product(RANKS, SUITS))
# print(list(deck))

sample_space = combinations(deck, 4)
combination_count = 0
all_aces_count = 0

for combination in sample_space:
    combination_count += 1
    # for card in combination:
    #     if card.rank != 'A':
    #         break
    # else: # no break, i.e. all aces
    #     all_aces_count += 1
    if all(map(lambda card: card.rank == 'A', combination)):
        all_aces_count += 1

print(f'{combination_count = }, {all_aces_count = }')
print(f'odds = {Fraction(all_aces_count, combination_count)} = {all_aces_count / combination_count:.8f}')

combination_count = 270725, all_aces_count = 1
odds = 1/270725 = 0.00000369
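
The brute-force count above can be cross-checked analytically: there are `math.comb(52, 4)` four-card hands, and only one of them consists of the four aces. A minimal sketch:

```python
from fractions import Fraction
from math import comb

total_hands = comb(52, 4)    # number of 4-card combinations from a 52-card deck
all_aces_hands = comb(4, 4)  # exactly one way to pick all four aces
odds = Fraction(all_aces_hands, total_hands)

print(f'{total_hands = }, {odds = }')
```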


## Project 4

Data files

* All four data files contain a common key `SSN` that uniquely identifies each row.
* You are guaranteed that every SSN
  * appears only once in every file
  * is present in all files
  * appears in the same order in each file

Goal 1

* Create lazy iterators for each of the four files:
  * returns named tuples
  * data types are appropriate (string, date, int, etc)
  * the four iterators are independent of each other
* You will want to make use of the standard library module `csv` for this.
  * CSV files are files that contain multiple lines of data.
  * The individual data fields in a row are:
    * delimited by some separating character (e.g. comma, tab)
    * optionally wrapped in other delimiters (e.g. quotes)
      * this allows the field value to contain what may otherwise be interpreted as a delimiter
  * `csv.reader(csvfile, dialect='excel', **fmtparams)`
    * Return a `reader` object that will process lines from the given csvfile. A csvfile must be an *iterable* of strings, each in the reader’s defined csv format. A csvfile is most commonly a file-like object or list. If csvfile is a file object, it should be opened with `newline=''`. An optional `dialect` parameter can be given which is used to define a set of parameters specific to a particular CSV dialect.
    * Each row read from the csv file is returned as a list of strings. No automatic data type conversion is performed unless the `QUOTE_NONNUMERIC` format option is specified (in which case unquoted fields are transformed into floats).
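
Because `csv.reader()` accepts any iterable of strings, its handling of quoted fields can be tried without a file. The rows below are made-up sample data, not taken from the project files.

```python
import csv

# Hypothetical CSV lines: the second field contains a comma, the third an escaped quote.
lines = [
    'ssn,name,note',
    '000-00-0000,"Doe, Jane","said ""hi"""',
]

rows = list(csv.reader(lines, delimiter=',', quotechar='"'))
print(rows)
```

Every field comes back as a string, so conversion to `int`, `date`, and so on has to be done separately.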

Goal 2

* Create a single iterable that combines all the data from all four files.
  * try to reuse the iterators created in Goal 1
  * one named tuple should contain data from all four files per SSN

Goal 3

* Modify your iterator from Goal 2 to filter out stale records.
  * A record is considered stale if its last update date is earlier than March 1, 2017 (3/1/2017).

Goal 4

* For non-stale records, count the number of vehicles of each make, grouped by gender.
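
One possible way to do the counting step of Goal 4 without sorting is a `defaultdict` of `Counter` objects. The sketch below uses a hypothetical `Record` named tuple and made-up records in place of the combined project data.

```python
from collections import Counter, defaultdict, namedtuple

# Hypothetical stand-in for the combined records; only the two relevant fields are modeled.
Record = namedtuple('Record', 'gender vehicle_make')
records = [
    Record('Female', 'Ford'),
    Record('Male', 'Ford'),
    Record('Female', 'Ford'),
    Record('Female', 'Audi'),
]

make_counts_by_gender = defaultdict(Counter)
for record in records:  # a single pass; works on any iterable, including lazy ones
    make_counts_by_gender[record.gender][record.vehicle_make] += 1

print(dict(make_counts_by_gender))
```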

In [3]:
from csv import reader
from collections import namedtuple
from datetime import datetime, UTC
from functools import partial, reduce
from itertools import cycle, chain, groupby
# from itertools import islice, pairwise, starmap
from operator import add, eq
from pprint import pprint

def parse_str(string, *, default=None):
    if isinstance(string, str):
        if string:
            return string
        else:
            return default
    else:
        raise TypeError(f'string object expected, not {type(string).__name__}')
    
def parse_int(string, *, default=None):
    try:
        if isinstance(string, str):
            if string:
                return int(string)
            else:
                return default
        else:
            raise TypeError(f'string object expected, not {type(string).__name__}')
    except ValueError:
        return 'unparsable data'
    
def parse_datetime(string, format, *, default=None):
    try:
        if isinstance(string, str):
            if string:
                return datetime.strptime(string, format)
            else:
                return default
        else:
            raise TypeError(f'string object expected, not {type(string).__name__}')
    except ValueError:
        return 'unparsable data'

def csv_generator(filename, tuplename, *, delimiter=',', quotechar='"', parsers=(partial(parse_str, default=''),)):
    with open(filename, newline='') as f:
        rows = reader(f, delimiter=delimiter, quotechar=quotechar)
        headers = next(rows)
        named = namedtuple(tuplename, headers)
        for row in rows:
            # parsers are cycled, so if every field uses the same parser, parsers only needs to contain that one parser
            yield named(*(parser(field) for parser, field in zip(cycle(parsers), row)))

def is_compatible(namedtuples, keyname):
    # return all(starmap(eq, pairwise(map(lambda x: getattr(x, keyname), namedtuples))))
    # return all(getattr(nt, keyname) == getattr(namedtuples[0], keyname) for nt in namedtuples)
    return len(set(map(lambda x: getattr(x, keyname), namedtuples))) == 1

def combine_iterators(*iterators, keyname):
    initial_iteration = True
    zipped = zip(*iterators, strict=True)
    for namedts in zipped:
        if is_compatible(namedts, keyname): # check whether the value of SSN matches
            # combined_kvs = set(chain(*(namedt._asdict().items() for namedt in namedts))) # beware of set, the sequence may vary
            # combined_dict = dict(chain(*(namedt._asdict().items() for namedt in namedts))) # the order of dict is guaranteed, and duplicate elements are removed automatically same as set
            combined_dict = dict(chain.from_iterable((namedt._asdict().items() for namedt in namedts)))
            if initial_iteration:
                # Combined = namedtuple('Combined', (combined_kv[0] for combined_kv in combined_kvs))
                Combined = namedtuple('Combined', combined_dict)
                initial_iteration = False
                # yield Combined(**dict(combined_kvs))
            yield Combined(**combined_dict)
        else:
            raise ValueError(f'{keyname!r} between each iterable does not match, try sorting/cleaning data first')


employment_gen = csv_generator('./p2_project4/employment.csv', 'Employment', parsers=[parse_str])
personal_info_gen = csv_generator('./p2_project4/personal_info.csv', 'PersonalInfo', parsers=[parse_str])
update_status_gen = csv_generator('./p2_project4/update_status.csv',
                                  'UpdateStatus',
                                  parsers=[parse_str,
                                           partial(parse_datetime, format='%Y-%m-%dT%H:%M:%S%z'),
                                           partial(parse_datetime, format='%Y-%m-%dT%H:%M:%S%z')])
vehicles_gen = csv_generator('./p2_project4/vehicles.csv', 'Vehicle', parsers=[parse_str,
                                                                   parse_str,
                                                                   parse_str,
                                                                   parse_int])

# pprint(list(islice(employment_gen, 2)))
# pprint(list(islice(personal_info_gen, 2)))
# pprint(list(islice(update_status_gen, 2)))
# pprint(list(islice(vehicles_gen, 2)))

combined = combine_iterators(personal_info_gen, employment_gen, vehicles_gen, update_status_gen, keyname='ssn')
# pprint(list(islice(combined, 2)))

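# note: filter() is lazy and both filters below share the same `combined` generator, so consuming filtered_stales would use up records before filtered_non_stales sees them
filtered_stales = filter(lambda record: record.last_updated < datetime(2017, 3, 1, tzinfo=UTC), combined)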
# pprint(list(islice(filtered_stales, 2)))
# print(len(list(filtered_stales))) # 129

filtered_non_stales = filter(lambda record: record.last_updated >= datetime(2017, 3, 1, tzinfo=UTC), combined)
# print(len(list(filtered_non_stales))) # 871
non_stale_sorted = sorted(filtered_non_stales, key=lambda record: (record.gender, record.vehicle_make)) # sort by gender and vehicle_make at the same time
# pprint(non_stale_sorted)
vehicle_make_count_by_gender = {}
# for gender, gender_group in groupby(non_stale_sorted, lambda record: record.gender):
#     vehicle_make_count_by_gender[gender] = {vehicle_make: sum(1 for _ in record) for vehicle_make, record in groupby(gender_group, lambda record: record.vehicle_make)}

for k, g in groupby(non_stale_sorted, lambda record: (record.gender, record.vehicle_make)): # iterable can be grouped by complex key like sorted function
    vehicle_make_count_by_gender.setdefault(k[0], {})[k[1]] = sum(1 for _ in g)

# using loop to count is much simpler, easy to implement and no need to sort
# for record in filtered_non_stales:
#     vehicle_make_count_by_gender.setdefault(record.gender, {})
#     vehicle_make_count_by_gender[record.gender][record.vehicle_make] = vehicle_make_count_by_gender[record.gender].setdefault(record.vehicle_make, 0) + 1

pprint(vehicle_make_count_by_gender)

{'Female': {'Acura': 9,
            'Aston Martin': 2,
            'Audi': 13,
            'Austin': 1,
            'BMW': 12,
            'Bentley': 4,
            'Bugatti': 1,
            'Buick': 11,
            'Cadillac': 6,
            'Chevrolet': 42,
            'Chrysler': 6,
            'Dodge': 17,
            'Eagle': 1,
            'Ford': 42,
            'GMC': 22,
            'Geo': 1,
            'Honda': 8,
            'Hyundai': 4,
            'Infiniti': 9,
            'Isuzu': 3,
            'Jaguar': 3,
            'Jeep': 5,
            'Kia': 9,
            'Lamborghini': 2,
            'Land Rover': 8,
            'Lexus': 15,
            'Lincoln': 4,
            'Lotus': 5,
            'Mazda': 13,
            'Mercedes-Benz': 17,
            'Mercury': 5,
            'Mitsubishi': 22,
            'Morgan': 1,
            'Nissan': 12,
            'Oldsmobile': 8,
            'Panoz': 1,
            'Plymouth': 3,
            'Pontiac': 14,
            'Porsc

## Context managers

### Introduction

What is a context

* In Python, a context is the state surrounding a section of code.
* When a function runs, for example, it has a context in which it runs (e.g. global scope, local scope, etc).

Managing the context of a block of code

* Managing the context of a block of code helps us to handle resources efficiently and avoid errors or leaks. 
* A context manager is an object that defines the runtime context to be established when executing a `with` statement.
* The context manager handles the entry into, and the exit from, the desired *runtime context* for the execution of the block of code.
* Context managers are normally invoked using the `with` statement, but can also be used by directly invoking their methods.
* A context manager also ensures that the resources are released even if an exception occurs in the block of code.
* One of the benefits of using context managers is that they make the code more readable and maintainable.
  * By using the `with` statement, we can clearly indicate which block of code is associated with a certain resource, and avoid nested `try-except-finally` clauses.
* Another benefit of using context managers is that they allow us to create our own custom logic for managing resources.
  * We can define our own context managers either as classes that implement the `__enter__()` and `__exit__()` methods (or `__aenter__()` and `__aexit__()` for asynchronous context managers), or as generator functions decorated with `@contextlib.contextmanager`. This way, we can create reusable and flexible context managers for different scenarios.
* Use cases
  * Typical uses of context managers include saving and restoring various kinds of global state, locking and unlocking resources, closing opened files, etc.
    * open / close (e.g. file context managers)
    * lock / release
    * change / reset (e.g. Decimal contexts)
    * start / stop
    * enter / exit
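
As a minimal sketch of the function-based approach, `contextlib.contextmanager` turns a generator function into a context manager: code before `yield` acts as the entry step, the yielded value is bound by `as`, and the `finally` clause acts as the exit step. The `managed_resource` function and the `log` list are hypothetical names used only for illustration.

```python
from contextlib import contextmanager

@contextmanager
def managed_resource(name, log):
    log.append(f'acquire {name}')      # runs on entry
    try:
        yield name.upper()             # value bound to the `as` target
    finally:
        log.append(f'release {name}')  # runs on exit, even after an exception

log = []
try:
    with managed_resource('db', log) as handle:
        log.append(f'using {handle}')
        raise RuntimeError('boom')
except RuntimeError:
    log.append('error propagated')

print(log)
```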

### Context managers

`try...except...finally`

* The `finally` section always executes, even if an exception occurs in the `except` block.
* Works even if inside a function and a `return` is in the `try` or `except` blocks.
* Very useful for writing code that should execute no matter what happens.
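
The interaction between `return` and `finally` can be checked with a short sketch. The `divide` function and `log` list are illustrative names; the cleanup entry is recorded on both the normal and the exceptional path, before the function actually returns.

```python
def divide(dividend, divisor, log):
    try:
        return dividend / divisor
    except ZeroDivisionError:
        return None
    finally:
        log.append('cleanup')  # runs before either return takes effect

log = []
ok_result = divide(1, 2, log)
error_result = divide(1, 0, log)

print(ok_result, error_result, log)
```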

The context management protocol

* Classes implement the context management protocol by implementing two methods:
  * `object.__enter__(self)`
    * Enter the runtime context and can optionally return either this object or another object related to the runtime context. The `with` statement will bind this method’s return value to the target(s) specified in the `as` clause of the statement, if any.
    * An example of a context manager that returns itself is a file object. File objects return themselves from `__enter__()` to allow `open()` to be used as the context expression in a `with` statement.
    * An example of a context manager that returns a related object is the one returned by `decimal.localcontext()`. These managers set the active decimal context to a copy of the original decimal context and then return the copy. This allows changes to be made to the current decimal context in the body of the `with` statement without affecting code outside the `with` statement.
  * `object.__exit__(self, exc_type, exc_value, traceback)`
    * Exit the runtime context related to this object. The parameters describe the exception that caused the context to be exited. If the context was exited without an exception, all three arguments will be `None`.
    * If an exception is supplied, and the method wishes to suppress the exception (i.e., prevent it from being propagated), it should return a true value. Otherwise, the exception will be processed normally upon exit from this method.
    * Note that `__exit__()` methods should not reraise the passed-in exception; this is the caller’s responsibility.

How context protocol works

* `with MyClass() as obj:`
  * creates an instance of `MyClass`; the instance itself is not bound to any name
  * calls `__enter__()` on the instance
  * return value from `__enter__` is associated to `obj`
* After the `with` block, or if an exception occurs inside the `with` block
  * calls `__exit__()` on the instance

Scope of `with` block

* The `with` block does not have its own scope, unlike a function or a comprehension.
* The scope of anything in the `with` block (including the object returned from `__enter__`) is in the same scope as the `with` statement.
  * `with open(filename) as f:`: `f` is a symbol in the same scope as the `with` statement (e.g. the global scope)
  * In the `with` block `row = next(f)`: `row` is also in the same scope as the `with` statement.
  * After the `with` block, `f` and `row` are still there, but `f` is closed, and `row` has a value.

In [2]:
def my_func(dividend, divisor):
    try:
        print(f'{dividend} / {divisor}:')
        dividend / divisor
    except ZeroDivisionError:
        print('Zero division error occurred')
    finally:
        print('finally ran')

my_func(10, 0)
my_func(10, 1)

10 / 0:
Zero division error occurred
finally ran
10 / 1:
finally ran


In [5]:
with open('test.py') as file:
    print('inside with: file closed?', file.closed)
print('outside with: file closed?', file.closed) # file is still available outside of with

def test():
    with open('test.py') as file:
        print('inside with: file closed?', file.closed)
        return file

file = test()
print('outside with: file closed?', file.closed)

inside with: file closed? False
outside with: file closed? True
inside with: file closed? False
outside with: file closed? True


In [12]:
class MyContext:
    def __init__(self):
        print('creating context manager...')
        self.obj = None

    def __enter__(self):
        print('entering context...')
        self.obj = 'the return object'
        return self.obj
    
    def __exit__(self, exc_type, exc_value, traceback):
        print('exiting context...')
        if exc_type:
            print(f'error occurred: {exc_type.__name__}: {exc_value}')
            print(traceback)
        # return False # to propagate the exception
        return True # to suppress the exception

with MyContext() as obj:
    print('    inside with block')
    1 / 0

print(obj) # obj is still available after with statement

creating context manager...
entering context...
    inside with block
exiting context...
error occurred: ZeroDivisionError: division by zero
<traceback object at 0x0000019133F2CB40>
the return object


In [15]:
class Resource:
    def __init__(self, name):
        self.name = name
        self.state = None

    def __repr__(self):
        return f'Resource({self.name})'

class ResourceManager:
    def __init__(self, name):
        self.name = name
        self.resource = None

    def __enter__(self):
        print('entering context...')
        self.resource = Resource(self.name)
        self.resource.state = 'created'
        return self.resource
    
    def __exit__(self, exc_type, exc_value, traceback):
        print('exiting context...')
        self.resource.state = 'destroyed'
        if exc_type:
            print(f'error occurred: {exc_type.__name__}: {exc_value}')
        return True
    
with ResourceManager('foo') as resource:
    print('    inside with block...')
    print(f'    {resource}: {resource.state}')
    1 / 0

print('\noutside with block...')
print(f'{resource}: {resource.state}')

entering context...
    inside with block...
    Resource(foo): created
exiting context...
error occurred: ZeroDivisionError: division by zero

outside with block...
Resource(foo): destroyed


### Caveat with lazy iterators

In [18]:
import csv

def read_data():
    with open('./p2_project3/nyc_parking_tickets_extract.csv', newline='') as f:
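        return csv.reader(f, delimiter=',', quotechar='"') # the file is closed as soon as the function returns, so iterating the reader later raises ValueError: I/O operation on closed file.
        # yield from csv.reader(f, delimiter=',', quotechar='"') # fix: use yield from so the file stays open while rows are consumed, or return a list of the rows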
    
reader = read_data()
for row in reader:
    print(row) # I/O error

ValueError: I/O operation on closed file.
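
A working variant keeps the file open for as long as rows are being consumed by turning the function into a generator with `yield from`. The sketch below writes a small temporary CSV file so that it is self-contained; the file name and contents are made up.

```python
import csv
import os
import tempfile

def read_rows(path):
    # Generator function: the `with` block stays active until iteration finishes.
    with open(path, newline='') as f:
        yield from csv.reader(f, delimiter=',', quotechar='"')

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sample.csv')  # hypothetical sample file
    with open(path, 'w', newline='') as f:
        csv.writer(f).writerows([['a', 'b'], ['1', '2']])

    rows = list(read_rows(path))  # file is opened lazily and closed once exhausted

print(rows)
```

The trade-off is that the file is closed only when the generator is exhausted or closed, so a partially consumed generator keeps the file open.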

### Not just a context manager

In [31]:
class DataIterator: # the instance is an iterator and a context manager
    def __init__(self, filename):
        self._file = open(filename)

    def __iter__(self):
        return self
    
    def __next__(self):
        return next(self._file)
    
    def close(self):
        self._file.close()
    
    def __enter__(self):
        return self
    
    def __exit__(self, exc_type, exc_value, traceback):
        if not self._file.closed:
            self.close()
        return False
    
with DataIterator('./p2_project3/nyc_parking_tickets_extract.csv') as f:
    print(next(f), end='')
print(f'{f._file.closed=}')

data = DataIterator('./p2_project3/nyc_parking_tickets_extract.csv')
print(next(data), end='')
data.close()
print(f'{data._file.closed=}')  


Summons Number,Plate ID,Registration State,Plate Type,Issue Date,Violation Code,Vehicle Body Type,Vehicle Make,Violation Description
f._file.closed=True
Summons Number,Plate ID,Registration State,Plate Type,Issue Date,Violation Code,Vehicle Body Type,Vehicle Make,Violation Description
data._file.closed=True


### Additional uses

Pattern: open / close

* Open file --> operate on file --> close file
* Open socket --> operate on socket --> close socket

Pattern: start / stop

* Start database transaction --> perform operations --> commit or rollback transaction
* Start timer --> perform operations --> stop timer

Pattern: lock / release

* Acquire thread lock --> perform operations --> release thread lock

Pattern: change / reset

* Change Decimal context precision --> perform operations --> reset Decimal context precision back to original value
* Redirect stdout to a file --> perform operations --> reset stdout to original value

Pattern: wacky stuff

* Wrap text in some tags (e.g. html tags)
* Format text automatically in some way
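
As a rough sketch of the lock / release pattern, `threading.Lock` objects already implement the context manager protocol. The counter, the `increment` function, and the thread count below are assumptions chosen for illustration.

```Python
import threading

state = {'count': 0}
lock = threading.Lock()  # Lock objects support the with statement

def increment(times):
    for _ in range(times):
        with lock:  # acquired on entry, released on exit (even if an exception occurs)
            state['count'] += 1

threads = [threading.Thread(target=increment, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(state['count'])
```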

In [21]:
# pattern: change / reset
import decimal

print(decimal.getcontext())
with decimal.localcontext() as ctx: # decimal's built-in context manager
    ctx.prec = 4
    print(decimal.Decimal(1) / decimal.Decimal(3))
print(decimal.Decimal(1) / decimal.Decimal(3))

Context(prec=28, rounding=ROUND_HALF_EVEN, Emin=-999999, Emax=999999, capitals=1, clamp=0, flags=[], traps=[InvalidOperation, DivisionByZero, Overflow])
0.3333
0.3333333333333333333333333333


In [25]:
# pattern: change / reset
import sys

class OutToFile:
    def __init__(self, filename):
        self._filename = filename
        self._original_stdout = sys.stdout

    def __enter__(self):
        self._file = open(self._filename, 'w')
        sys.stdout = self._file
    
    def __exit__(self, exc_type, exc_value, traceback):
        sys.stdout = self._original_stdout
        if not self._file.closed:
            self._file.close()
        return False

with OutToFile('./p2_example5/test.txt'):
    print('Print something to the file')
    print('Print something more to the file')

print('Print something here')

Print something here


In [24]:
# pattern: start / stop
from time import perf_counter, sleep

class Timer:
    def __init__(self):
        self.elapsed = 0

    def __enter__(self):
        self.start = perf_counter()
        return self
    
    def __exit__(self, exc_type, exc_value, traceback):
        self.elapsed = perf_counter() - self.start
        return False
    
with Timer() as timer:
    sleep(1)

print(f'{timer.elapsed=}')

timer.elapsed=1.0005524999578483


In [29]:
# pattern: wrap text with tags
class Tag:
    def __init__(self, tag):
        self._tag = tag

    def __enter__(self):
        print(f'<{self._tag}>')
    
    def __exit__(self, exc_type, exc_value, traceback):
        print(f'</{self._tag}>')
        return False
    
with Tag('p'):
    print('Hello, I\'m inside the tag')
    with Tag('code'):
        print('Some code here')

print('I\'m out of tag now')

<p>
Hello, I'm inside the tag
<code>
Some code here
</code>
</p>
I'm out of tag now


In [30]:
# pattern: format text in some way
class MakeList:
    def __init__(self, title, tabsize=2, bullet='-'):
        self._title = title
        self._tabsize = tabsize
        self._bullet = bullet
        self._tab_multiple = 0

    def __enter__(self):
        if not self._tab_multiple:
            print(self._title)
        self._tab_multiple += 1
        return self
    
    def __exit__(self, exc_type, exc_value, traceback):
        self._tab_multiple -= 1
        return False

    def print(self, *values):
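        # Use double quotes outside so the f-string also parses before Python 3.12,
        # and honor the bullet passed to __init__
        print(f"{' ' * self._tabsize * self._tab_multiple}{self._bullet} ", end='')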
        print(*values)

with MakeList('Shopping List') as sl:
    sl.print('Fruit & Veg')
    with sl as fv_sl:
        fv_sl.print('Apples')
        fv_sl.print('Peaches')
        fv_sl.print('Tomatoes')
        fv_sl.print('Onions')
    sl.print('Dairy & Eggs')
    with sl as de_sl:
        de_sl.print('Cheese')
        de_sl.print('Yogurt')
        de_sl.print('Eggs')
    sl.print('Personal Care')
    with sl as pc_sl:
        pc_sl.print('Toothpaste')
        pc_sl.print('Toothbrushes')

Shopping List
  - Fruit & Veg
    - Apples
    - Peaches
    - Tomatoes
    - Onions
  - Dairy & Eggs
    - Cheese
    - Yogurt
    - Eggs
  - Personal Care
    - Toothpaste
    - Toothbrushes


### Generators and context managers

* Mimic context manager pattern using a generator
    ```Python
    def gen(*args):
        # do set up work here

        try:
            yield object
        finally:
            # clean up object here

    ctx = gen(*args)
    obj = next(ctx)
    try:
        # do work with obj
    finally:
        try:
            next(ctx)
        except StopIteration:
            pass
    ```
* Creating a context manager from a generator function
    ```Python
    class GenContext:
        def __init__(self, gen):
            self._gen = gen
        
        def __enter__(self):
            obj = next(self._gen) # get the object yielded from gen
            return obj

        def __exit__(self, exc_type, exc_value, traceback):
            try:
                next(self._gen) # resume the generator so its finally clause runs
            except StopIteration: # expected, because the generator yields only once
                pass
            return False

    # This generator function yields only once, so the second next() call
    # runs the finally clause and then raises StopIteration.
    def open_file(filename, mode):
        f = open(filename, mode)
        try:
            yield f # yield the file object itself (not `yield from f`)
        finally:
            f.close()

    gen = open_file('test.txt', 'w')
    with GenContext(gen) as f:
        # do work
    ```

In [33]:
def my_gen():
    try:
        yield (1, 2, 3, 4, 5)
    finally:
        print('cleaning up here')

class GenCtxMng:
    def __init__(self, gen):
        self._gen = gen

    def __enter__(self):
        return next(self._gen)
    
    def __exit__(self, exc_type, exc_value, traceback):
        try:
            next(self._gen)
        except StopIteration:
            pass
        return False
    
with GenCtxMng(my_gen()) as obj:
    print(obj)

(1, 2, 3, 4, 5)
cleaning up here


### The `contextmanager` decorator

Using a decorator to encapsulate generator function and context manager

```Python
def contextmanager_decorator(gen_fn):
    def inner(*args, **kwargs):
        gen = gen_fn(*args, **kwargs)
        return GenCtxMng(gen)
    return inner

@contextmanager_decorator # same as open_file = contextmanager_decorator(open_file)
def open_file(filename):
    f = open(filename)
    try:
        yield f
    finally:
        f.close()

with open_file(filename) as f:
    # do work
```

The `contextlib` module

* One of the goals when context managers were introduced to Python was to ensure that generator functions could be used to create them easily.
* `@contextlib.contextmanager`
  * This function is a decorator that can be used to define a factory function for `with` statement context managers, without needing to create a class or separate `__enter__()` and `__exit__()` methods.
  * The function being decorated must return a *generator-iterator* when called. This iterator must yield exactly *one* value, which will be bound to the targets in the `with` statement's `as` clause, if any.
  * At the point where the generator yields, the block nested in the `with` statement is executed. The generator is then resumed after the block is exited. If an unhandled exception occurs in the block, it is reraised inside the generator at the point where the yield occurred. Thus, you can use a `try…except…finally` statement to trap the error (if any), or ensure that some cleanup takes place. If an exception is trapped merely in order to log it or to perform some action (rather than to suppress it entirely), the generator must reraise that exception. Otherwise the generator context manager will indicate to the `with` statement that the exception has been handled, and execution will resume with the statement immediately following the `with` statement.
  * `contextmanager()` uses `ContextDecorator` so the context managers it creates can be used as decorators as well as in `with` statements. When used as a decorator, a new generator instance is implicitly created on each function call (this allows the otherwise "one-shot" context managers created by `contextmanager()` to meet the requirement that context managers support multiple invocations in order to be used as decorators).
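
A short sketch of two of the points above: an exception raised in the `with` block is re-raised inside the generator at the `yield`, and the resulting context manager can also be used as a decorator. The names `logged`, `events`, and `add` are hypothetical and exist only for this illustration.

```Python
from contextlib import contextmanager

events = []

@contextmanager
def logged(name):
    events.append(f'enter {name}')
    try:
        yield name
    except ZeroDivisionError as ex:
        # The exception from the with block is re-raised here, at the yield.
        events.append(f'error {name}: {ex}')
        raise  # re-raise, otherwise the exception would be suppressed
    finally:
        events.append(f'exit {name}')

try:
    with logged('block'):
        1 / 0
except ZeroDivisionError:
    events.append('propagated')

@logged('func')  # used as a decorator: a fresh generator is created on each call
def add(a, b):
    return a + b

result = add(1, 2)
print(events, result)
```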

In [35]:
from contextlib import contextmanager
from time import perf_counter, sleep

@contextmanager
def timer():
    stats = {}
    stats['start'] = perf_counter()
    try:
        yield stats
    finally:
        stats['end'] = perf_counter()
        stats['elapsed'] = stats['end'] - stats['start']

with timer() as stats:
    sleep(1)
print(stats)

{'start': 361928.5243596, 'end': 361929.5246101, 'elapsed': 1.0002505000447854}


In [36]:
from contextlib import contextmanager
import sys

@contextmanager
def out_to_file(filename):
    original_stdout = sys.stdout
    file = open(filename, 'w')
    sys.stdout = file
    try:
        yield None
    finally:
        sys.stdout = original_stdout
        file.close() # close the file so the output is flushed to disk

with out_to_file('./p2_example5/test.txt'):
    print('Print something to file from with block')
print('Print from the outside of with block')

with open('./p2_example5/test.txt') as f:
    print(f.readlines())

Print from the outside of with block
['Print something to file from with block\n']


In [37]:
from contextlib import redirect_stdout

with open('./p2_example5/test.txt', 'w') as f:
    with redirect_stdout(f):
        print('Print from with block, using redirect_stdout')
print('Print from outside')

with open('./p2_example5/test.txt') as f:
    print(f.readlines())

Print from outside
['Print from with block, using redirect_stdout\n']


### Nested context managers

* `contextlib.ExitStack`
  * A context manager that is designed to make it easy to programmatically combine other context managers and cleanup functions, especially those that are optional or otherwise driven by input data.
  * The `__enter__()` method returns the `ExitStack` instance, and performs no additional operations.
  * Each instance maintains a stack of registered callbacks that are called in reverse order when the instance is closed (either explicitly or implicitly at the end of a with statement).
  * Since registered callbacks are invoked in the reverse order of registration, this ends up behaving as if multiple nested with statements had been used with the registered set of callbacks. This even extends to exception handling - if an inner callback suppresses or replaces an exception, then outer callbacks will be passed arguments based on that updated state.
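
The reverse-order behavior can be seen in a small sketch that registers plain callbacks with `ExitStack.callback()`. The list `calls` and the callback names are assumptions for illustration.

```Python
from contextlib import ExitStack

calls = []

with ExitStack() as stack:
    for name in ('first', 'second', 'third'):
        # callback() registers an arbitrary function to run when the stack closes
        stack.callback(calls.append, name)
    calls.append('body')

print(calls)
```

The callbacks run after the body, last registered first, just as nested `with` statements would unwind.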

In [38]:
from contextlib import contextmanager

@contextmanager
def open_file(filename):
    file = open(filename)
    try:
        yield file
    finally:
        file.close()

class NestedContextManager:
    def __init__(self):
        self._exits = []

    def __enter__(self):
        return self
    
    def __exit__(self, exc_type, exc_value, traceback):
        for exit in self._exits[::-1]:
            exit(exc_type, exc_value, traceback)
        return False
    
    def enter_context(self, context):
        # Enter first, so that a failing __enter__ does not leave a dangling __exit__
        result = context.__enter__()
        self._exits.append(context.__exit__)
        return result
    
filenames = './p2_example6/file1.txt', './p2_example6/file2.txt', './p2_example6/file3.txt'

with NestedContextManager() as stack:
    files = (stack.enter_context(open_file(filename)) for filename in filenames)
    for row in zip(*files):
        print(row)

('file1_line1\n', 'file2_line1\n', 'file3_line1\n')
('file1_line2\n', 'file2_line2\n', 'file3_line2\n')
('file1_line3', 'file2_line3', 'file3_line3')


In [39]:
from contextlib import ExitStack

filenames = './p2_example6/file1.txt', './p2_example6/file2.txt', './p2_example6/file3.txt'

with ExitStack() as stack:
    files = [stack.enter_context(open(filename)) for filename in filenames]
    for row in zip(*files):
        print(row)

('file1_line1\n', 'file2_line1\n', 'file3_line1\n')
('file1_line2\n', 'file2_line2\n', 'file3_line2\n')
('file1_line3', 'file2_line3', 'file3_line3')


## Project 5

Project setup

* In this project you are provided two CSV files: `cars.csv` and `personal_info.csv` (first row contains the field names).
* The basic goal will be to create a context manager that only requires the file name and returns an iterator for the data in those files.
* The iterator should yield named tuples with field names based on the header row in the CSV file.

Goal 1

* Implement the context manager using a context manager class, i.e. a class that implements the context manager protocol: `__enter__` and `__exit__`.
* Make sure your iterator uses lazy evaluation.
* Try to create a single class that implements both the context manager protocol and the iterator protocol.

Goal 2

* Reimplement what you did in Goal 1, but using a generator function instead.
* You'll have to use `@contextlib.contextmanager`.

Information you may find useful

* CSV files can be read using `csv.reader`, but CSV files can be written in different styles / dialects, i.e. the delimiters and quoting characters may vary.
* The `csv` module has a `Sniffer` class we can use to auto-determine the specific dialect by providing it a sample of the csv file.
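
A minimal sketch of dialect detection with `csv.Sniffer`, assuming a made-up semicolon-delimited sample. The sample text is hypothetical, and the `delimiters` argument merely narrows the candidates the sniffer considers.

```Python
import csv
import io

# Hypothetical sample data that uses ';' as the delimiter.
sample = 'name;age;city\nAlice;30;Paris\nBob;25;Rome\n'

# Restrict the candidate delimiters to ';' and ',' to make detection more robust.
dialect = csv.Sniffer().sniff(sample, delimiters=';,')
print(dialect.delimiter)

# The detected dialect can be passed straight to csv.reader.
rows = list(csv.reader(io.StringIO(sample), dialect))
print(rows)
```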

In [2]:
import csv
from collections import namedtuple
from itertools import islice

class CsvParser:
    def __init__(self, filename, tuplename='Data'):
        self._filename = filename
        self._tuplename = tuplename

    def __enter__(self):
        self._file = open(self._filename, newline='')
        dialect = csv.Sniffer().sniff(self._file.readline())
        self._file.seek(0)
        self._csv_reader = csv.reader(self._file, dialect)
        headers = map(lambda x: x.replace(' ', '_').lower(), next(self._csv_reader))
        self._named_tuple = namedtuple(self._tuplename, headers)
        return self
    
    def __exit__(self, exc_type, exc_value, traceback):
        self._file.close()
        return False
    
    def __iter__(self):
        return self
    
    def __next__(self):
        if self._file.closed: # if iterator accessed from outside of with block, raise StopIteration instead of ValueError: I/O operation on closed file
            raise StopIteration
        else:
            return self._named_tuple(*next(self._csv_reader))

with CsvParser('./p2_project5/personal_info.csv', 'PersonalInfo') as personals:
    for data in islice(personals, 5):
        print(data)

PersonalInfo(ssn='100-53-9824', first_name='Sebastiano', last_name='Tester', gender='Male', language='Icelandic')
PersonalInfo(ssn='101-71-4702', first_name='Cayla', last_name='MacDonagh', gender='Female', language='Lao')
PersonalInfo(ssn='101-84-0356', first_name='Nomi', last_name='Lipprose', gender='Female', language='Yiddish')
PersonalInfo(ssn='104-22-0928', first_name='Justinian', last_name='Kunzelmann', gender='Male', language='Dhivehi')
PersonalInfo(ssn='104-84-7144', first_name='Claudianus', last_name='Brixey', gender='Male', language='Afrikaans')


In [3]:
import csv
from collections import namedtuple
from contextlib import contextmanager
from itertools import islice

@contextmanager
def csv_generator(filename, tuplename='Data'):
    file = open(filename, newline='')
    dialect = csv.Sniffer().sniff(file.readline())
    file.seek(0)
    csv_reader = csv.reader(file, dialect)
    headers = map(lambda x: x.replace(' ', '_').lower(), next(csv_reader))
    named_tuple = namedtuple(tuplename, headers)
    try:
        yield (named_tuple(*data) for data in csv_reader)
    finally:
        file.close()

with csv_generator('./p2_project5/cars.csv') as cars:
    for data in islice(cars, 5):
        print(data)

Data(car='Chevrolet Chevelle Malibu', mpg='18.0', cylinders='8', displacement='307.0', horsepower='130.0', weight='3504.', acceleration='12.0', model='70', origin='US')
Data(car='Buick Skylark 320', mpg='15.0', cylinders='8', displacement='350.0', horsepower='165.0', weight='3693.', acceleration='11.5', model='70', origin='US')
Data(car='Plymouth Satellite', mpg='18.0', cylinders='8', displacement='318.0', horsepower='150.0', weight='3436.', acceleration='11.0', model='70', origin='US')
Data(car='AMC Rebel SST', mpg='16.0', cylinders='8', displacement='304.0', horsepower='150.0', weight='3433.', acceleration='12.0', model='70', origin='US')
Data(car='Ford Torino', mpg='17.0', cylinders='8', displacement='302.0', horsepower='140.0', weight='3449.', acceleration='10.5', model='70', origin='US')


## Python updates

### Python 3.10

[PEP 634: Structural Pattern Matching](https://docs.python.org/3/whatsnew/3.10.html#pep-634-structural-pattern-matching)

* Structural pattern matching has been added in the form of a `match` statement and `case` statements of patterns with associated actions.
* Patterns consist of sequences, mappings, primitive data types as well as class instances.
* Pattern matching enables programs to extract information from complex data types, branch on the structure of data, and apply specific actions based on different forms of data.
* Syntax and operations
  * The generic syntax of pattern matching is:
    ```Python
    match subject:
        case <pattern_1>:
            <action_1>
        case <pattern_2>:
            <action_2>
        case <pattern_3>:
            <action_3>
        case _:
            <action_wildcard>
    ```
  * A `match` statement takes an expression and compares its value to successive patterns given as one or more case blocks. Specifically, pattern matching operates by:
    1. using data with type and shape (the `subject`)
    2. evaluating the `subject` in the `match` statement
    3. comparing the `subject` with each pattern in a `case` statement from top to bottom until a match is confirmed.
    4. executing the action associated with the pattern of the confirmed match
    5. If an exact match is not confirmed, the last case, a wildcard `_`, if provided, will be used as the matching case. If an exact match is not confirmed and a wildcard case does not exist, the entire match block is a no-op.
* While structural pattern matching can be used in its simplest form comparing a variable to a literal in a `case` statement, its true value for Python lies in its handling of the subject’s type and shape.
  * Simple pattern: match to a literal
    * A value, the subject, being matched to several literals, the patterns.
  * Patterns with a literal and variable
    * Patterns can look like unpacking assignments, and a pattern may be used to bind variables.
  * Patterns and classes
    * If you are using classes to structure your data, you can use as a pattern the class name followed by an argument list resembling a constructor. This pattern has the ability to capture class attributes into variables.
  * Patterns with positional parameters
    * You can use positional parameters with some builtin classes that provide an ordering for their attributes (e.g. dataclasses).
    * You can also define a specific position for attributes in patterns by setting the `__match_args__` special attribute in your classes.
  * Nested patterns
    * Patterns can be arbitrarily nested.
  * Complex patterns and the wildcard
    * To this point, the examples have used `_` alone in the last case statement. A wildcard can be used in more complex patterns, such as `('error', code, _)`.
  * Guard
    * We can add an `if` clause to a pattern, known as a "guard".
    * If the guard is false, match goes on to try the next case block.
    * Note that value capture happens before the guard is evaluated.
* Other Key Features
  * Like unpacking assignments, tuple and list patterns have exactly the *same* meaning and actually match arbitrary sequences. Technically, the subject must be a sequence. Therefore, an important exception is that patterns don't match iterators. Also, to prevent a common mistake, sequence patterns don't match strings.
  * Sequence patterns support wildcards: `[x, y, *rest]` and `(x, y, *rest)` work similar to wildcards in unpacking assignments. The name after `*` may also be `_`, so `(x, y, *_)` matches a sequence of at least two items without binding the remaining items.
  * Mapping patterns: `{"bandwidth": b, "latency": l}` captures the `"bandwidth"` and `"latency"` values from a dict. Unlike sequence patterns, extra keys are *ignored*. A wildcard `**rest` is also supported. (But `**_` would be redundant, so is not allowed.)
  * Subpatterns may be captured using the `as` keyword: `case (Point(x1, y1), Point(x2, y2) as p2): ...`. This binds `x1`, `y1`, `x2`, `y2` like you would expect without the `as` clause, and `p2` to the entire second item of the subject.
  * Most literals are compared by *equality*. However, the singletons `True`, `False` and `None` are compared by *identity*.
  * Named constants may be used in patterns. These named constants must be dotted names to prevent the constant from being interpreted as a capture variable.

PEP 618

* The `zip()` function now has an optional `strict` flag, used to require that all the iterables have an equal length.

In [1]:
def http_error(status):
    match status:
        case 400: # match to a literal
            return 'Bad request'
        case 404:
            return 'Not found'
        case 418:
            return 'I\'m a teapot'
        case 401 | 403: # combine several literals in a single pattern using |
            return 'Not allowed'
        case _: # optional wildcard
            return 'Something\'s wrong with the internet'
        
print(http_error(404))
print(http_error(500))

Not found
Something's wrong with the internet


In [6]:
from enum import Enum

class Symbol(Enum):
    LEFT = '←'
    RIGHT = '→'
    UP = '↑'
    DOWN = '↓'
    PICK = '↥'
    DROP = '↧'

symbols = {'left': '←', 'right': '→', 'up': '↑', 'down': '↓',
           'pick': '↥', 'drop': '↧'}

def operation(command):
    match command:
        case ('move', 'left'):
            # return symbols['left']
            return Symbol.LEFT.value
        case ('move', 'right'):
            # return symbols['right']
            return Symbol.RIGHT.value
        case ('move', 'up'):
            # return symbols['up']
            return Symbol.UP.value
        case ('move', 'down'):
            # return symbols['down']
            return Symbol.DOWN.value
        case ('act', 'pick'):
            # return symbols['pick']
            return Symbol.PICK.value
        case ('act', 'drop'):
            # return symbols['drop']
            return Symbol.DROP.value
        case _:
            return 'Something is wrong here'
        
print(operation(['move', 'left']))
print(operation(['move', 'drop']))
print(vars(Symbol.LEFT))

←
Something is wrong here
{'_value_': '←', '_name_': 'LEFT', '__objclass__': <enum 'Symbol'>, '_sort_order_': 0}


In [3]:
symbols = {'left': '←', 'right': '→', 'up': '↑', 'down': '↓',
           'pick': '↥', 'drop': '↧'}

def operation(command):
    match command:
        case ('move', ('left' | 'right' | 'up' | 'down') as direction):
            return symbols[direction]
        case ('act', ('pick' | 'drop') as action):
            return symbols[action]
        case _:
            return 'Something is wrong here'
        
print(operation(['move', 'left']))
print(operation(['act', 'drop']))
print(operation(['move', 'drop']))

←
↧
Something is wrong here


In [10]:
symbols = {'left': '←', 'right': '→', 'up': '↑', 'down': '↓',
           'pick': '↥', 'drop': '↧'}

def operation(command):
    match command:
        case ('move', *directions) if set(directions) <= {'left', 'right', 'up', 'down'}:
            return tuple(symbols[direction] for direction in directions)
        case ('act', *actions) if set(actions) <= {'pick', 'drop'}:
            return tuple(symbols[action] for action in actions)
        case _:
            raise ValueError(f'{command} is not valid command')
        
[operation(('move', 'left', 'left', 'right', 'up', 'up', 'right', 'down')),
 operation(('act', 'pick', 'drop'))]

[('←', '←', '→', '↑', '↑', '→', '↓'), ('↥', '↧')]

### Python 3.9

PEP 615

* The IANA Time Zone Database is now present in the standard library in the `zoneinfo` module.

`math`

* Expanded the `math.gcd()` function to handle multiple arguments. Formerly, it only supported two arguments.
* Added `math.lcm()`: return the least common multiple of specified arguments.
* Added `math.nextafter()`: return the next floating-point value after x towards y.
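
A quick sketch of these `math` additions (Python 3.9+); the input values are arbitrary.

```Python
import math

g = math.gcd(12, 18, 30)         # gcd now accepts more than two arguments
l = math.lcm(4, 6, 10)           # least common multiple of all arguments
nxt = math.nextafter(1.0, 2.0)   # the next representable float after 1.0 towards 2.0

print(g, l, nxt)
```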

PEP 584: dictionary merge & update operators

* Merge (`|`) and update (`|=`) operators have been added to the built-in `dict` class. Those complement the existing `dict.update` and `{**d1, **d2}` methods of merging dictionaries.

PEP 616: new string methods to remove prefixes and suffixes

* `str.removeprefix(prefix)` and `str.removesuffix(suffix)` have been added to easily remove an unneeded prefix or a suffix from a string. Corresponding `bytes`, `bytearray`, and `collections.UserString` methods have also been added.

In [29]:
import datetime
import zoneinfo

timezones = zoneinfo.available_timezones()
print(len(timezones)) # if 0, need to install tzdata module
print(timezones) # if set(), need to install tzdata module

current_time_naive = datetime.datetime.now()
current_time_utc = datetime.datetime.now(tz=datetime.UTC)
current_time_local = datetime.datetime.now().astimezone()
print(current_time_naive)
print(current_time_utc, current_time_utc.tzinfo)
print(current_time_local, current_time_local.tzinfo)

tz_shanghai = zoneinfo.ZoneInfo('Asia/Shanghai')
time_shanghai = current_time_utc.astimezone(tz_shanghai)
print(time_shanghai, time_shanghai.tzinfo)


597
{'Pacific/Bougainville', 'America/Swift_Current', 'America/Rankin_Inlet', 'Antarctica/Macquarie', 'Europe/Athens', 'Indian/Mayotte', 'Europe/Sarajevo', 'America/Panama', 'Europe/Volgograd', 'America/Blanc-Sablon', 'Pacific/Midway', 'America/Glace_Bay', 'Etc/Zulu', 'Canada/Newfoundland', 'America/Thunder_Bay', 'America/Recife', 'Australia/NSW', 'America/Godthab', 'Europe/Jersey', 'America/Rainy_River', 'Europe/Luxembourg', 'Etc/GMT+4', 'Canada/Central', 'Asia/Baku', 'WET', 'Europe/Istanbul', 'America/Sitka', 'Asia/Oral', 'America/Chicago', 'Europe/Nicosia', 'Indian/Chagos', 'Africa/Nairobi', 'Australia/West', 'GMT', 'Etc/GMT+5', 'Asia/Kuching', 'Cuba', 'America/Argentina/ComodRivadavia', 'Pacific/Norfolk', 'Japan', 'Africa/Lagos', 'Asia/Chita', 'America/Boa_Vista', 'Atlantic/Canary', 'Mexico/General', 'Pacific/Tarawa', 'Pacific/Guam', 'Egypt', 'Africa/Accra', 'Asia/Shanghai', 'Europe/Monaco', 'Europe/Vaduz', 'America/Noronha', 'Asia/Sakhalin', 'Africa/Asmera', 'Europe/Dublin', 'Paci

In [37]:
from collections import ChainMap

dict1 = {'a': 1, 'b': 2, 'c': 3}
dict2 = {'c': 7, 'd': 8, 'e': 9}

updated = dict1.copy()
updated.update(dict2)
print(f'{updated=}')
print(f'{{**dict1, **dict2}}={ {**dict1, **dict2} }') # evaluate the unpacking form that the label describes

# A ChainMap class is provided for quickly linking a number of mappings so they
# can be treated as a single unit.
# It is often much faster than creating a new dictionary and running multiple
# update() calls.
# A ChainMap groups multiple dicts or other mappings together to create a
# single, updateable view.
# The underlying mappings are stored in a list.
# Lookups search the underlying mappings successively until a key is found.
# In contrast, writes, updates, and deletions only operate on the first mapping.
chained = ChainMap(dict1, dict2)
print(f'{chained=}')
print(f"{chained['c']=}") # double quotes outside, so this also parses before Python 3.12

print(f'{(dict1 | dict2)=}')

updated={'a': 1, 'b': 2, 'c': 7, 'd': 8, 'e': 9}
{**dict1, **dict2}={'a': 1, 'b': 2, 'c': 7, 'd': 8, 'e': 9}
chained=ChainMap({'a': 1, 'b': 2, 'c': 3}, {'c': 7, 'd': 8, 'e': 9})
chained['c']=3
(dict1 | dict2)={'a': 1, 'b': 2, 'c': 7, 'd': 8, 'e': 9}


In [39]:
s = '(log) log: [2024-02-01 09:49:59.764701+00:00 UTC]'
print(s.replace('(log) ', ''))
print(s.lstrip('(log) '))
print(s.removeprefix('(log) '))

log: [2024-02-01 09:49:59.764701+00:00 UTC]
: [2024-02-01 09:49:59.764701+00:00 UTC]
log: [2024-02-01 09:49:59.764701+00:00 UTC]


### Python 3.8

Positional-only parameters

* There is a new function parameter syntax `/` to indicate that some function parameters must be specified positionally and cannot be used as keyword arguments.
* In the following example, parameters `a` and `b` are positional-only, while `c` or `d` can be positional or keyword, and `e` or `f` are required to be keywords:
  ```Python
  def f(a, b, /, c, d, *, e, f):
      print(a, b, c, d, e, f)
  ```
* One use case for this notation is to preclude keyword arguments when the parameter name is not helpful.
* A further benefit of marking a parameter as positional-only is that it allows the parameter name to be changed in the future without risk of breaking client code.
* Since the parameters to the left of `/` are not exposed as possible keywords, the parameter names remain available for use in `**kwargs`. This greatly simplifies the implementation of functions and methods that need to accept arbitrary keyword arguments.
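
The calling rules can be sketched as follows (Python 3.8+). The function `tag` and all argument values are hypothetical, chosen only to show that a positional-only name stays free for use in `**kwargs`.

```Python
def f(a, b, /, c, d, *, e, f):
    return a, b, c, d, e, f

# a and b positional, c positional, d by keyword, e and f keyword-only
ok = f(10, 20, 30, d=40, e=50, f=60)

try:
    f(10, b=20, c=30, d=40, e=50, f=60)  # b cannot be passed by keyword
except TypeError as ex:
    error = str(ex)

def tag(name, /, **kwargs):
    # 'name' is positional-only, so 'name' may also appear as a key in kwargs
    return name, kwargs

tagged = tag('div', name='main', id='top')
print(ok, error, tagged)
```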

f-strings support `=` for self-documenting expressions and debugging

* Added an `=` specifier to f-strings.
* An f-string such as `f'{expr=}'` will expand to the text of the expression, an equal sign, then the representation of the evaluated expression.
* The `=` specifier will display the whole expression so that calculations can be shown.

`as_integer_ratio()`

* The `bool`, `int`, and `fractions.Fraction` types now have an `as_integer_ratio()` method like that found in `float` and `decimal.Decimal`.
* This minor API extension makes it possible to write `numerator, denominator = x.as_integer_ratio()` and have it work across multiple numeric types.
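
A small sketch showing the same call working across several numeric types (Python 3.8+); the sample values are arbitrary.

```Python
from decimal import Decimal
from fractions import Fraction

values = [True, 5, 0.25, Fraction(3, 4), Decimal('1.5')]
# Each type returns a (numerator, denominator) pair in lowest terms.
ratios = [x.as_integer_ratio() for x in values]
print(ratios)
```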

`functools`

* `functools.lru_cache()` can now be used as a straight decorator rather than as a function returning a decorator.
* Added a new `functools.singledispatchmethod()` decorator that converts methods into generic functions using single dispatch.
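
Both `functools` changes in one sketch (Python 3.8+). The `fib` function and the `Formatter` class are hypothetical examples, not taken from the course material.

```Python
from functools import lru_cache, singledispatchmethod

@lru_cache  # no parentheses needed since Python 3.8
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

class Formatter:
    @singledispatchmethod
    def format(self, value):  # fallback for unregistered types
        return f'object: {value!r}'

    @format.register
    def _(self, value: int):  # dispatch on the annotation of the first non-self argument
        return f'int: {value}'

    @format.register
    def _(self, value: str):
        return f'str: {value}'

fmt = Formatter()
print(fib(30), fmt.format(3), fmt.format('a'), fmt.format(1.5))
```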

`collections`

* The `_asdict()` method for `collections.namedtuple()` now returns a `dict` instead of a `collections.OrderedDict`. This works because regular dicts have guaranteed ordering since Python 3.7. If the extra features of `OrderedDict` are required, the suggested remediation is to cast the result to the desired type: `OrderedDict(nt._asdict())`.

`itertools`

* The `itertools.accumulate()` function added an optional `initial` keyword argument to specify an initial value.
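
A one-line sketch of `initial` (Python 3.8+), with arbitrary numbers. Note that the initial value itself becomes the first item of the output, so the result is one item longer than the input.

```Python
from itertools import accumulate

running = list(accumulate([1, 2, 3, 4], initial=100))
print(running)
```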

`math`

* Added new function `math.dist()` for computing Euclidean distance between two points.
* Added new function, `math.prod()`, as analogous function to `sum()` that returns the product of a start value (default: 1) times an iterable of numbers.
* Added two new combinatorics functions `math.perm()` and `math.comb()`.
* Added a new function `math.isqrt()` for computing accurate integer square roots without conversion to floating point. The new function supports arbitrarily large integers. It is faster than `floor(sqrt(n))` but slower than `math.sqrt()`.
* The function `math.factorial()` no longer accepts arguments that are not int-like.
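
A compact sketch of the new `math` functions (Python 3.8+); all inputs are arbitrary illustration values.

```Python
import math

distance = math.dist((0, 0), (3, 4))        # Euclidean distance between two points
product = math.prod([1, 2, 3, 4])           # product of an iterable, start defaults to 1
scaled = math.prod([2, 3], start=10)        # start value is multiplied in
perms = math.perm(5, 2)                     # ordered selections of 2 out of 5
combs = math.comb(5, 2)                     # unordered selections of 2 out of 5
root = math.isqrt(10**30 + 1)               # exact integer square root, rounded down

print(distance, product, scaled, perms, combs, root)
```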

In [46]:
import datetime
import math

pi = math.pi
current_time = datetime.datetime.now(datetime.UTC)
print(f'{pi=:.7f}')
print(f'{current_time=!s}')
print(f'{current_time=:%Y/%m/%d %H:%M:%S.%f%z %Z}')

pi=3.1415927
current_time=2024-02-01 11:12:22.468653+00:00
current_time=2024/02/01 11:12:22.468653+0000 UTC


### Python 3.7

`collections`

* `collections.namedtuple()` now supports default values.
* `collections.namedtuple()` no longer supports the verbose parameter or `_source` attribute which showed the generated source code for the named tuple class. This was part of an optimization designed to speed-up class creation.
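
A minimal sketch of the `defaults` parameter (Python 3.7+). The `Point` type is a hypothetical example. Defaults are applied to the rightmost fields.

```Python
from collections import namedtuple

# Two defaults for three fields: they apply to y and z, so x stays required.
Point = namedtuple('Point', ['x', 'y', 'z'], defaults=[0, 0])

p = Point(1)
print(p)
print(Point._field_defaults)
```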

Python data model improvements

* The insertion-order preservation nature of `dict` objects has been declared to be an official part of the Python language spec.