# Agenda

1. Iterators and the iterator protocol
    - The protocol itself
    - Making our classes iterable
    - Generators and generator functions
    - Generator expressions / generator comprehensions
    - `itertools`
2. Decorators
3. Concurrency
    - Threads
    - Multiprocessing
    - `asyncio`


# Iterators

We know that we can put many different data types into a `for` loop:

- Strings, and we get one character per iteration
- Lists or tuples, one element per iteration
- Dicts, one key per iteration
- Files, one line per iteration



In [1]:
for one_character in 'abcd':
    print(one_character)

a
b
c
d


# Iterator protocol

1. `for` turns to the object at the end of the line, and asks: Are you iterable? (`iter`)
    - If not, then we exit with a `TypeError` exception
2. If so, then we get an iterator object back, to which `for` says: Give me your next value (`next`)
    - If there are no more values, then the loop exits (because `next` raised the `StopIteration` exception)
3. The next value is assigned to the loop variable (`one_character`)
4. The loop body executes with that assignment
5. We return to step 2
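
The steps above can be sketched by hand with a `while` loop. This is only an approximation of what `for` does for us; the `seen` list is made up for illustration, so that the result is easy to inspect.

```python
s = 'abcd'
seen = []                                # collected only to make the result inspectable

iterator = iter(s)                       # step 1: raises TypeError if s is not iterable
while True:
    try:
        one_character = next(iterator)   # steps 2-3: ask for the next value and assign it
    except StopIteration:                # no more values, so the loop exits
        break
    seen.append(one_character)           # step 4: the loop body
    print(one_character)
```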

In [2]:
s = 'abcd'

i = iter(s)
type(i)

str_ascii_iterator

In [3]:
next(i)

'a'

In [4]:
next(i)

'b'

In [5]:
next(i)

'c'

In [6]:
next(i)

'd'

In [7]:
next(i)

StopIteration: 

In [8]:
f = open('/etc/passwd')
i = iter(f)

In [9]:
i

<_io.TextIOWrapper name='/etc/passwd' mode='r' encoding='UTF-8'>

In [10]:
f is i   # is f its own iterator?

True
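
The same identity check separates a plain iterable from an iterator. A small sketch, using `io.StringIO` as a stand-in for a real file so that it runs on any machine; the variable names are made up for illustration.

```python
import io

s = 'abcd'
string_iterator = iter(s)

# A string is iterable, but it is not its own iterator
string_is_own_iterator = string_iterator is s

# Calling iter() on an iterator gives back that same iterator
iterator_returns_itself = iter(string_iterator) is string_iterator

# File-like objects are their own iterators
fake_file = io.StringIO('line one\nline two\n')
file_is_own_iterator = iter(fake_file) is fake_file

print(string_is_own_iterator, iterator_returns_itself, file_is_own_iterator)
```

This is why a second `for` loop over the same open file continues from where the first one stopped, while a second loop over a string starts again from the beginning.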

In [11]:
iter(10)

TypeError: 'int' object is not iterable

In [12]:
for i in 10:
    print(i)

TypeError: 'int' object is not iterable

In [16]:
class MyIter:
    def __init__(self, data):
        self.data = data
        self.index = 0
        print(f'\tIn MyIter.__init__, {self.data=}, {self.index=}')

    def __iter__(self):  # the job of __iter__ is to return the object's iterator, where __next__ is implemented
        print(f'\tIn MyIter.__iter__, {self.data=}, {self.index=}')
        return self      # I am my own iterator!

    def __next__(self):
        print(f'\tIn MyIter.__next__, {self.data=}, {self.index=}')
        if self.index >= len(self.data):
            print(f'\tIn MyIter.__next__, raising StopIteration')
            raise StopIteration     # stop the loop if we've gone past the end

        value = self.data[self.index]
        self.index += 1
        print(f'\tIn MyIter.__next__, returning {value}')
        return value

m = MyIter('abcd')

print('*** First run')
for one_item in m:
    print(one_item)

print('*** Second run')
for one_item in m:
    print(one_item)

	In MyIter.__init__, self.data='abcd', self.index=0
*** First run
	In MyIter.__iter__, self.data='abcd', self.index=0
	In MyIter.__next__, self.data='abcd', self.index=0
	In MyIter.__next__, returning a
a
	In MyIter.__next__, self.data='abcd', self.index=1
	In MyIter.__next__, returning b
b
	In MyIter.__next__, self.data='abcd', self.index=2
	In MyIter.__next__, returning c
c
	In MyIter.__next__, self.data='abcd', self.index=3
	In MyIter.__next__, returning d
d
	In MyIter.__next__, self.data='abcd', self.index=4
	In MyIter.__next__, raising StopIteration
*** Second run
	In MyIter.__iter__, self.data='abcd', self.index=4
	In MyIter.__next__, self.data='abcd', self.index=4
	In MyIter.__next__, raising StopIteration


# Exercise: Circle

1. Write a `Circle` class that takes two arguments, an iterable (`data`) and an integer (`maxtimes`).
2. When someone iterates over an instance of `Circle`, they should get `maxtimes` results.
3. If `maxtimes` is bigger than the length of `data`, then you should go back to the beginning for more values.

Example:

```python
c = Circle('abcd', 7)

for one_item in c:
    print(one_item)  # a b c d a b c 
```    

In [22]:
class Circle:
    def __init__(self, data, maxtimes):
        self.data = data
        self.maxtimes = maxtimes
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.index >= self.maxtimes:
            raise StopIteration

        value = self.data[self.index % len(self.data)]
        self.index += 1
        return value

c = Circle('abcd', 7)

print('**** first run')
for one_item in c:
    print(one_item)  

print('**** second run')
for one_item in c:
    print(one_item)  

**** first run
a
b
c
d
a
b
c
**** second run


In [23]:
s = 'abcd'

i1 = iter(s)
i2 = iter(s)

In [24]:
id(i1)

4412964096

In [25]:
id(i2)

4412956512

In [26]:
next(i1)

'a'

In [27]:
next(i1)

'b'

In [28]:
next(i1)

'c'

In [29]:
next(i2)

'a'

In [30]:
class CircleIterator:
    def __init__(self, data, maxtimes):
        self.data = data
        self.maxtimes = maxtimes
        self.index = 0

    def __next__(self):
        if self.index >= self.maxtimes:
            raise StopIteration

        value = self.data[self.index % len(self.data)]
        self.index += 1
        return value

class Circle:
    def __init__(self, data, maxtimes):
        self.data = data
        self.maxtimes = maxtimes

    def __iter__(self):
        return CircleIterator(self.data, self.maxtimes)

c = Circle('abcd', 7)

print('**** first run')
for one_item in c:
    print(one_item)  

print('**** second run')
for one_item in c:
    print(one_item)  

**** first run
a
b
c
d
a
b
c
**** second run
a
b
c
d
a
b
c


In [31]:
# another way -- keeping the reference to Circle

class CircleIterator:
    def __init__(self, the_circle):
        self.the_circle = the_circle
        self.index = 0

    def __next__(self):
        if self.index >= self.the_circle.maxtimes:
            raise StopIteration

        value = self.the_circle.data[self.index % len(self.the_circle.data)]
        self.index += 1
        return value

class Circle:
    def __init__(self, data, maxtimes):
        self.data = data
        self.maxtimes = maxtimes

    def __iter__(self):
        return CircleIterator(self)

c = Circle('abcd', 7)

print('**** first run')
for one_item in c:
    print(one_item)  

print('**** second run')
for one_item in c:
    print(one_item)  

**** first run
a
b
c
d
a
b
c
**** second run
a
b
c
d
a
b
c
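
For comparison, the standard library can produce the same seven values without a custom class. This sketch combines `itertools.cycle` and `itertools.islice`; the helper name `circle` is made up for illustration. Unlike the classes above, it does not index into `data`, so it should work with any iterable, not just sequences.

```python
import itertools

def circle(data, maxtimes):
    """Return an iterator over the first `maxtimes` items of `data`, repeating as needed."""
    return itertools.islice(itertools.cycle(data), maxtimes)

result = list(circle('abcd', 7))
print(result)
```

Note that `circle` returns an iterator, so each call can be looped over only once; to iterate again, call `circle` again.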


# Exercise: Selective iteration

1. Create an iterator, `Select`, that takes two arguments:
    - `data`, which can be any sequence
    - `func`, a function that returns `True` or `False`
2. When I iterate over an instance of `Select`, I'll only get those elements of `data` for which `func` returns `True`
3. Use a two-class iterator construct for this to work

Example:

```python
s = Select([10, 15, 20, 25],
           lambda x: x % 2 == 0)

for one_item in s:
    print(one_item)   # 10 20
```

In [44]:
class SelectIterator:
    def __init__(self, selector):
        self.selector = selector
        self.index = 0
    
    def __next__(self):
        while self.index < len(self.selector.data):
    
            value = self.selector.data[self.index]
            self.index += 1

            if self.selector.func(value):
                return value

        raise StopIteration
        
class Select:
    def __init__(self, data, func):
        self.data = data
        self.func = func

    def __iter__(self):
        return SelectIterator(self)



In [45]:
s = Select([10, 15, 20, 25],
           lambda x: x % 2 == 1)

for one_item in s:
    print(one_item)   # 15 25

15
25


In [47]:
import itertools
import string

In [48]:
string.ascii_lowercase

'abcdefghijklmnopqrstuvwxyz'

In [52]:
c = itertools.combinations(string.ascii_lowercase, 4)

for index, one_combination in enumerate(c):
    print(''.join(one_combination))

    if index > 20:
        break

abcd
abce
abcf
abcg
abch
abci
abcj
abck
abcl
abcm
abcn
abco
abcp
abcq
abcr
abcs
abct
abcu
abcv
abcw
abcx
abcy


In [53]:
# generators

# the world's dumbest function

def myfunc():
    return 1
    return 2
    return 3

In [54]:
myfunc()

1

In [55]:
import dis

In [56]:
dis.dis(myfunc)

  5           0 RESUME                   0

  6           2 RETURN_CONST             1 (1)


In [57]:
# let's do another function that seems very similar

def myfunc():
    yield 1
    yield 2
    yield 3

In [58]:
myfunc()   # what will we get back? answer... a generator object

<generator object myfunc at 0x1076203b0>

# Generators

A generator is an object that supports the iterator protocol. In other words, we can put it in a `for` loop, and/or we can use `iter` and `next` on it to get the next values.

We typically create a generator using a "generator function," a function that instead of using `return` uses the `yield` keyword. `yield` basically means: (a) return the current value to the caller and (b) go to sleep at this point in the program.

When `next` is next invoked on the generator, it "wakes up" at the point where it went to sleep and continues as if nothing had happened. All of the local variables are still intact, and it can keep running.

Generators allow us to express ideas as functions, rather than as classes. There are many instances where that turns out to be easier and clearer.
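
A small sketch of local variables surviving between calls to `next`. The `running_total` function is made up for illustration; its local variable `total` keeps its value each time the generator goes to sleep.

```python
def running_total(numbers):
    total = 0                  # local variable, set up on the first next()
    for one_number in numbers:
        total += one_number
        yield total            # pause here; total is kept for the next call

g = running_total([10, 20, 30])

first = next(g)
second = next(g)     # total was still 10 when the function woke up
third = next(g)

print(first, second, third)
```

Writing the same thing as a class would mean storing `total` and the current position as attributes and updating them by hand in `__next__`.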

In [59]:
g = myfunc()

In [60]:
next(g)

1

In [61]:
next(g)

2

In [62]:
next(g)

3

In [63]:
next(g)

StopIteration: 

In [65]:
# Fibonacci sequence

def fib():
    first = 0
    second = 1

    while True:
        yield first
        first, second = second, first+second

for one_number in fib():
    print(one_number, end=' ')

    if one_number > 100_000_000_000:
        break

0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765 10946 17711 28657 46368 75025 121393 196418 317811 514229 832040 1346269 2178309 3524578 5702887 9227465 14930352 24157817 39088169 63245986 102334155 165580141 267914296 433494437 701408733 1134903170 1836311903 2971215073 4807526976 7778742049 12586269025 20365011074 32951280099 53316291173 86267571272 139583862445 

In [66]:
# this would be a very bad idea:

# list(fib())
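
To take a fixed number of values from an infinite generator, `itertools.islice` is one option. A sketch, repeating `fib` so that the block stands alone; the name `first_ten` is made up for illustration.

```python
import itertools

def fib():
    first = 0
    second = 1

    while True:
        yield first
        first, second = second, first + second

# islice stops asking for values after 10 items, so list() terminates
first_ten = list(itertools.islice(fib(), 10))
print(first_ten)
```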

In [67]:
# one of the functions in itertools is called "chain"

for one_item in itertools.chain('abcd', [10, 20, 30], 'efgh'):
    print(one_item)

a
b
c
d
10
20
30
e
f
g
h


In [68]:
# how could we implement chain ourselves?

def mychain(*args):
    for one_arg in args:            # go through each argument
        for one_item in one_arg:    # go through each element in this argument
            yield one_item          # return the current value, and go to sleep until next is called 

for one_item in mychain('abcd', [10, 20, 30], 'efgh'):
    print(one_item)

a
b
c
d
10
20
30
e
f
g
h
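
The inner loop can also be written with `yield from`, which delegates to another iterable. A sketch; the name `mychain2` is used only to keep it distinct from the version above.

```python
def mychain2(*args):
    for one_arg in args:        # go through each argument
        yield from one_arg      # yield each element of this argument in turn

result = list(mychain2('abcd', [10, 20, 30], 'efgh'))
print(result)
```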


In [69]:
def file_words(filename):
    for one_line in open(filename):
        for one_word in one_line.split():
            yield one_word

In [71]:
g = file_words('myfile.txt')

In [72]:
next(g)

'this'

In [81]:
next(g)

StopIteration: 

In [82]:
dis.show_code(file_words)

Name:              file_words
Filename:          /var/folders/rr/0mnyyv811fs5vyp22gf4fxk00000gn/T/ipykernel_73545/1796451847.py
Argument count:    1
Positional-only arguments: 0
Kw-only arguments: 0
Number of locals:  3
Stack size:        3
Flags:             OPTIMIZED, NEWLOCALS, GENERATOR
Constants:
   0: None
Names:
   0: open
   1: split
Variable names:
   0: filename
   1: one_line
   2: one_word


In [83]:
def file_words(filename):
    for one_line in open(filename):
        for one_word in one_line.split():
            yield one_word
        return 'hello out there'

In [85]:
g = file_words('myfile.txt')

In [86]:
next(g)

'this'

In [87]:
next(g)

'is'

In [88]:
next(g)

'a'

In [89]:
next(g)

'test'

In [90]:
next(g)

StopIteration: hello out there
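
The returned value travels on the exception, in its `value` attribute. A `for` loop discards it, but we can retrieve it by catching `StopIteration` ourselves. A minimal sketch with a made-up generator function, `two_then_done`:

```python
def two_then_done():
    yield 1
    yield 2
    return 'hello out there'    # ends up in StopIteration.value

g = two_then_done()
yielded = [next(g), next(g)]

try:
    next(g)
except StopIteration as e:
    returned = e.value

print(yielded, returned)
```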

In [91]:
def myfunc():
    print('hello')
    return 5  # return means: finish the function *now*, get rid of its frame/local variables, we're done
    print('goodbye')

In [None]:
def myfunc():
    print('hello')
    yield 5  # yield means: (a) running the function produces a generator and (b) each "next" runs up to the next yield,
             # which returns its value and goes to sleep there, (c) the next "next" wakes up there and continues
    print('goodbye')
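
Stepping through the `yield` version by hand shows the difference. In this sketch the function is renamed `myfunc_gen` and appends to a `log` list rather than printing (both are made up for illustration), so that the order of events can be inspected.

```python
log = []

def myfunc_gen():
    log.append('hello')
    yield 5
    log.append('goodbye')

g = myfunc_gen()              # nothing in the function body has run yet
log_before = list(log)

value = next(g)               # runs up to the yield: logs 'hello', produces 5
log_after_first = list(log)

try:
    next(g)                   # wakes up after the yield: logs 'goodbye', then the function ends
    finished = False
except StopIteration:
    finished = True

print(log_before, value, log_after_first, log, finished)
```

With `return 5` instead, `'goodbye'` would never be logged, because nothing after the `return` can run.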

# Exercise: `read_n`

When we read from a file in a `for` loop, we get one line at a time -- as a string, with the ending `'\n'`. This works really well, because many files are structured with one record per line.

But what if we have a file that uses two or three lines per record? Then reading in a `for` loop isn't so effective.

I want you to write `read_n`, a generator that takes two arguments:

- `filename`, a string
- `n`, the number of lines you want to get in each iteration

When you iterate over `read_n` on a file, you'll get one string per iteration containing `n` lines. The final iteration on the file might contain fewer than that.

```python
for one_chunk in read_n('/etc/passwd', 5):
    print(one_chunk)
```    

In [95]:
def read_n(filename, n):
    f = open(filename)

    while True:
        output = ''.join([f.readline()
                          for i in range(n)])
        
        if output:
            yield output
        else:
            break

for one_chunk in read_n('/etc/passwd', 5):
    print(one_chunk)

##
# User Database
# 
# Note that this file is consulted directly only when the system is running
# in single-user mode.  At other times this information is provided by

# Open Directory.
#
# See the opendirectoryd(8) man page for additional information about
# Open Directory.
##

nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false
root:*:0:0:System Administrator:/var/root:/bin/sh
daemon:*:1:1:System Services:/var/root:/usr/bin/false
_uucp:*:4:4:Unix to Unix Copy Protocol:/var/spool/uucp:/usr/sbin/uucico
_taskgated:*:13:13:Task Gate Daemon:/var/empty:/usr/bin/false

_networkd:*:24:24:Network Services:/var/networkd:/usr/bin/false
_installassistant:*:25:25:Install Assistant:/var/empty:/usr/bin/false
_lp:*:26:26:Printing Services:/var/spool/cups:/usr/bin/false
_postfix:*:27:27:Postfix Mail Server:/var/spool/postfix:/usr/bin/false
_scsd:*:31:31:Service Configuration Service:/var/empty:/usr/bin/false

_ces:*:32:32:Certificate Enrollment Service:/var/empty:/usr/bin/false
_appstore:*:3
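
A variation on `read_n`, offered as a sketch rather than a replacement: it opens the file in a `with` block, so the file is closed when the generator finishes, and it uses `itertools.islice` to pull up to `n` lines at a time. The name `read_n_islice` and the temporary demo file are made up for illustration.

```python
import itertools
import os
import tempfile

def read_n_islice(filename, n):
    with open(filename) as f:
        while True:
            # take up to n lines, starting from the file's current position
            output = ''.join(itertools.islice(f, n))

            if not output:      # an empty string means we reached the end of the file
                break
            yield output

# Demo on a throwaway seven-line file
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write(''.join(f'line {i}\n' for i in range(7)))

chunks = list(read_n_islice(tmp.name, 3))
os.remove(tmp.name)

print(chunks)
```

As in the original, the final chunk may contain fewer than `n` lines.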

In [98]:
f = open('linux-etc-passwd.txt')

In [99]:
f.readline()

'# This is a comment\n'

In [100]:
f.readline()

'# You should ignore me\n'

In [101]:
f.readline()

'root:x:0:0:root:/root:/bin/bash\n'

In [102]:
f.readline()

'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n'

In [103]:
f.readline()

'bin:x:2:2:bin:/bin:/usr/sbin/nologin\n'

In [104]:
g = read_n('/etc/passwd', 5)

In [105]:
dir(g)

['__class__',
 '__del__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__lt__',
 '__name__',
 '__ne__',
 '__new__',
 '__next__',
 '__qualname__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'close',
 'gi_code',
 'gi_frame',
 'gi_running',
 'gi_suspended',
 'gi_yieldfrom',
 'send',
 'throw']

In [107]:
g.gi_code.co_varnames

('filename', 'n', 'f', 'i', 'output')

In [111]:
g.gi_frame.f_lineno

9

In [110]:
next(g)

'##\n# User Database\n# \n# Note that this file is consulted directly only when the system is running\n# in single-user mode.  At other times this information is provided by\n'

In [None]:
# we can rewrite our Select class such that its __iter__ is a gener