# Agenda

- Generators (generator functions)
- Concurrency
    - Threads
    - Processes
    - `asyncio`
- Profiling    
- NumPy + Pandas

- Single-quoted string: `''`
- Double-quoted string: `""`
- `"""triple quoted string"""`
- `r'c:\a\b\c\d\e'`  # auto-doubles backslashes
- `b'abc'` # bytes, not characters
- `f'abc{x}'`  # replaces anything in `{}` with its value -- starting from Python 3.6
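
A short check of what each prefix does may help here. This is only a sketch; the variable names (`raw`, `data`) are illustrative and not part of the original notes.

```python
# Raw string: backslashes are kept literally, which is equivalent
# to doubling each backslash in a regular string.
raw = r'c:\a\b'
print(raw == 'c:\\a\\b')   # True

# Bytes literal: indexing gives integers, not one-character strings.
data = b'abc'
print(data[0])             # 97

# f-string: the expression inside {} is evaluated and formatted.
x = 5
print(f'abc{x}')           # abc5
```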

In [2]:
# 3.8 added trailing = to f-strings

x = 100
y = [10, 20, 30]

f'{x=}, {y=}'

'x=100, y=[10, 20, 30]'

In [5]:
def myfunc():
    return 1
    return 2
    return 3

In [6]:
myfunc()

1

In [7]:
import dis

In [8]:
dis.dis(myfunc)

  2           0 LOAD_CONST               1 (1)
              2 RETURN_VALUE


In [9]:
def myfunc():
    yield 1
    yield 2
    yield 3

In [10]:
myfunc() 

<generator object myfunc at 0x108f5ad50>

# Iterator protocol

1. We run `iter` on the object and get the object's iterator back (or a `TypeError` exception, if the object isn't iterable).
2. We run `next` on the returned iterator.  When we do this, one of two things happens:
    - We get back an object, whatever the iterator wants to return
    - We get a `StopIteration` exception, meaning that the iterator has no more values

In [11]:
s = 'abcd'

In [12]:
iter(s)

<str_iterator at 0x108ce05e0>

In [13]:
i = iter(s)

In [14]:
next(i)

'a'

In [15]:
next(i)

'b'

In [16]:
next(i)

'c'

In [17]:
next(i)

'd'

In [18]:
next(i)

StopIteration: 

In [19]:
for one_letter in s:
    print(one_letter)

a
b
c
d
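
The `for` loop above can be written out by hand using the two steps of the iterator protocol. This is a rough sketch of what the loop does, not the interpreter's actual implementation; the names `iterator` and `collected` are made up for the example.

```python
s = 'abcd'

iterator = iter(s)                    # step 1: ask the object for its iterator
collected = []

while True:
    try:
        one_letter = next(iterator)   # step 2: ask the iterator for the next value
    except StopIteration:             # the iterator has nothing left
        break
    collected.append(one_letter)
    print(one_letter)
```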


# Generators

Generators implement the iterator protocol.  Each time we run `next` on a generator, the function body runs up to and including the next `yield`, and we get the yielded value back.

The next time we run `next`, the function continues from where it left off, just after the `yield`.
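
To see what a generator function saves us, here is a hand-written iterator class next to an equivalent generator function. The class name `OneTwoThree` and the function name `one_two_three` are hypothetical, chosen only for this comparison.

```python
class OneTwoThree:
    """Hand-written iterator that produces 1, 2, 3."""

    def __init__(self):
        self.current = 0

    def __iter__(self):
        return self              # an iterator returns itself from __iter__

    def __next__(self):
        if self.current >= 3:
            raise StopIteration  # signal that there are no more values
        self.current += 1
        return self.current


def one_two_three():
    """Generator function with the same behavior; the state is kept for us."""
    yield 1
    yield 2
    yield 3


print(list(OneTwoThree()))       # [1, 2, 3]
print(list(one_two_three()))     # [1, 2, 3]
```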

In [21]:
def myfunc():
    yield 1
    yield 2
    yield 3
    
g = myfunc()

In [22]:
i = iter(g)

In [24]:
i is g  # a generator is its own iterator; i and g are exactly the same object

True

In [25]:
next(g)

1

In [26]:
next(g)

2

In [27]:
next(g)

3

In [28]:
next(g)

StopIteration: 
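
Because a generator is its own iterator, it can be consumed only once. The sketch below shows that a second pass over the same generator gets nothing, and that calling the function again gives a fresh generator. The names `first_pass`, `second_pass`, and `fresh` are illustrative.

```python
def myfunc():
    yield 1
    yield 2
    yield 3

g = myfunc()
first_pass = list(g)     # consumes every value
second_pass = list(g)    # the same generator is already exhausted

print(first_pass)        # [1, 2, 3]
print(second_pass)       # []

fresh = list(myfunc())   # calling the function again returns a new generator
print(fresh)             # [1, 2, 3]
```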

In [29]:
def myfunc():
    print("A")
    yield 1
    print("B")
    yield 2
    print("C")
    yield 3
    print("D")
    
g = myfunc()     # call the function and get a generator object; the body has not started running yet

In [30]:
next(g)   # run through the next yield 

A


1

In [31]:
next(g)  # run through next yield, starting at the end of line 3

B


2

In [32]:
next(g)  # run through next yield, starting at the end of line 5

C


3

In [33]:
next(g)  # run through the end of the function, then raise StopIteration

D


StopIteration: 

In [34]:
def fib():
    first = 0
    second = 1
    
    while True:
        yield first
        first, second = second, first+second
    

In [35]:
g = fib()

In [36]:
g

<generator object fib at 0x1090ef610>

In [37]:
next(g)

0

In [38]:
next(g)

1

In [39]:
next(g)

1

In [40]:
next(g)

2

In [42]:
for one_item in fib():
    print(one_item, end=' ')
    
    if one_item > 100_000_000_000:
        break   

0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765 10946 17711 28657 46368 75025 121393 196418 317811 514229 832040 1346269 2178309 3524578 5702887 9227465 14930352 24157817 39088169 63245986 102334155 165580141 267914296 433494437 701408733 1134903170 1836311903 2971215073 4807526976 7778742049 12586269025 20365011074 32951280099 53316291173 86267571272 139583862445 
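
An infinite generator such as `fib` can also be bounded without an explicit `break`. The following sketch uses `itertools.islice` and `itertools.takewhile` from the standard library; the limits (10 items, values below 100) are arbitrary choices for the example.

```python
from itertools import islice, takewhile

def fib():
    first = 0
    second = 1

    while True:
        yield first
        first, second = second, first + second

# Take the first 10 values, no matter how large they are.
first_ten = list(islice(fib(), 10))
print(first_ten)     # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

# Take values for as long as the condition holds, then stop.
below_100 = list(takewhile(lambda n: n < 100, fib()))
print(below_100)     # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```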

# Exercise: `read_n`

We normally get the lines of a file, one at a time, when we iterate (with a `for` loop) over it.  Write a generator function, `read_n`, that takes two arguments, `filename` and `n`.  

With each iteration, we should get a new string with `n` lines -- except for the final iteration, which might have fewer.

```python
for one_chunk in read_n('/etc/passwd', 5):
    print(one_chunk)   # string with up to 5 lines, from /etc/passwd

```

In [54]:
def read_n(filename, n):
    with open(filename) as f:
        while True:
            output_lines = []   # will contain n strings, from reading n lines
            
            for i in range(n):
                output_lines.append(f.readline())
                
            output = ''.join(output_lines)  # create one output string, from the output_lines list

            if output:
                yield output
            else:
                break

In [55]:
for one_chunk in read_n('/etc/passwd', 5):
    print(one_chunk)

##
# User Database
# 
# Note that this file is consulted directly only when the system is running
# in single-user mode.  At other times this information is provided by

# Open Directory.
#
# See the opendirectoryd(8) man page for additional information about
# Open Directory.
##

nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false
root:*:0:0:System Administrator:/var/root:/bin/sh
daemon:*:1:1:System Services:/var/root:/usr/bin/false
_uucp:*:4:4:Unix to Unix Copy Protocol:/var/spool/uucp:/usr/sbin/uucico
_taskgated:*:13:13:Task Gate Daemon:/var/empty:/usr/bin/false

_networkd:*:24:24:Network Services:/var/networkd:/usr/bin/false
_installassistant:*:25:25:Install Assistant:/var/empty:/usr/bin/false
_lp:*:26:26:Printing Services:/var/spool/cups:/usr/bin/false
_postfix:*:27:27:Postfix Mail Server:/var/spool/postfix:/usr/bin/false
_scsd:*:31:31:Service Configuration Service:/var/empty:/usr/bin/false

_ces:*:32:32:Certificate Enrollment Service:/var/empty:/usr/bin/false
_appstore:*:3

In [56]:
# list comprehension!

def read_n(filename, n):
    with open(filename) as f:
        while True:
            output = ''.join([f.readline()
                             for i in range(n)])


            if output:
                yield output
            else:
                break
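
Another possible variation, not shown in class, relies on the fact that a file object is its own iterator and lets `itertools.islice` pull up to `n` lines at a time. The function name `read_n_islice` and the temporary demo file are assumptions made for this sketch.

```python
import os
import tempfile
from itertools import islice

def read_n_islice(filename, n):
    with open(filename) as f:
        while True:
            # islice takes at most n lines from the file's own iterator
            output = ''.join(islice(f, n))

            if not output:   # empty string means we reached the end of the file
                break
            yield output

# Demo on a small temporary file, so the example doesn't depend on /etc/passwd
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('a\nb\nc\nd\ne\n')

chunks = list(read_n_islice(tmp.name, 2))
os.remove(tmp.name)

print(chunks)   # ['a\nb\n', 'c\nd\n', 'e\n']
```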

In [59]:
for one_chunk in read_n('/etc/passwd', 3):
    print(one_chunk)

##
# User Database
# 

# Note that this file is consulted directly only when the system is running
# in single-user mode.  At other times this information is provided by
# Open Directory.

#
# See the opendirectoryd(8) man page for additional information about
# Open Directory.

##
nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false
root:*:0:0:System Administrator:/var/root:/bin/sh

daemon:*:1:1:System Services:/var/root:/usr/bin/false
_uucp:*:4:4:Unix to Unix Copy Protocol:/var/spool/uucp:/usr/sbin/uucico
_taskgated:*:13:13:Task Gate Daemon:/var/empty:/usr/bin/false

_networkd:*:24:24:Network Services:/var/networkd:/usr/bin/false
_installassistant:*:25:25:Install Assistant:/var/empty:/usr/bin/false
_lp:*:26:26:Printing Services:/var/spool/cups:/usr/bin/false

_postfix:*:27:27:Postfix Mail Server:/var/spool/postfix:/usr/bin/false
_scsd:*:31:31:Service Configuration Service:/var/empty:/usr/bin/false
_ces:*:32:32:Certificate Enrollment Service:/var/empty:/usr/bin/false

_appstore:

In [60]:
g = read_n('/etc/passwd', 3)

In [61]:
next(g)

'##\n# User Database\n# \n'

In [62]:
next(g)

'# Note that this file is consulted directly only when the system is running\n# in single-user mode.  At other times this information is provided by\n# Open Directory.\n'

In [63]:
dir(g)

['__class__',
 '__del__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__lt__',
 '__name__',
 '__ne__',
 '__new__',
 '__next__',
 '__qualname__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'close',
 'gi_code',
 'gi_frame',
 'gi_running',
 'gi_yieldfrom',
 'send',
 'throw']

In [64]:
dis.dis(g.gi_code)

              0 GEN_START                0

  4           2 LOAD_GLOBAL              0 (open)
              4 LOAD_FAST                0 (filename)
              6 CALL_FUNCTION            1
              8 SETUP_WITH              32 (to 74)
             10 STORE_DEREF              0 (f)

  5          12 NOP

  6     >>   14 LOAD_CONST               2 ('')
             16 LOAD_METHOD              1 (join)
             18 LOAD_CLOSURE             0 (f)
             20 BUILD_TUPLE              1
             22 LOAD_CONST               3 (<code object <listcomp> at 0x1090e7e10, file "/var/folders/rr/0mnyyv811fs5vyp22gf4fxk00000gn/T/ipykernel_83026/3131240066.py", line 6>)
             24 LOAD_CONST               4 ('read_n.<locals>.<listcomp>')
             26 MAKE_FUNCTION            8 (closure)

  7          28 LOAD_GLOBAL              2 (range)
             30 LOAD_FAST                1 (n)
             32 CALL_FUNCTION            1

  6          34 GET_ITER
             36 CALL_FUNCT

In [67]:
g.gi_frame.f_locals

{'filename': '/etc/passwd',
 'n': 3,
 'output': '# Note that this file is consulted directly only when the system is running\n# in single-user mode.  At other times this information is provided by\n# Open Directory.\n',
 'f': <_io.TextIOWrapper name='/etc/passwd' mode='r' encoding='UTF-8'>}

In [68]:
next(g)

'#\n# See the opendirectoryd(8) man page for additional information about\n# Open Directory.\n'

In [69]:
g.gi_frame.f_locals

{'filename': '/etc/passwd',
 'n': 3,
 'output': '#\n# See the opendirectoryd(8) man page for additional information about\n# Open Directory.\n',
 'f': <_io.TextIOWrapper name='/etc/passwd' mode='r' encoding='UTF-8'>}
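
Besides looking at `gi_frame` directly, the standard library's `inspect.getgeneratorstate` reports where a generator is in its life cycle. The sketch below uses a small stand-in generator rather than `read_n`; the `states` list is only there to collect the results.

```python
import inspect

def myfunc():
    yield 1
    yield 2

g = myfunc()
states = [inspect.getgeneratorstate(g)]       # before the first next()

next(g)
states.append(inspect.getgeneratorstate(g))   # paused at a yield

list(g)                                       # run the generator to the end
states.append(inspect.getgeneratorstate(g))

print(states)       # ['GEN_CREATED', 'GEN_SUSPENDED', 'GEN_CLOSED']
print(g.gi_frame)   # None, because a finished generator has no frame
```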

# Concurrency

- Threads
    - Threads as objects
    - Futures and `ThreadPoolExecutor`
- Processes
    - `multiprocessing`
    - `ProcessPoolExecutor`
- `asyncio`

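In CPython, the GIL (global interpreter lock) ensures that only one thread executes Python bytecode at a time.  A thread releases the lock while it waits for I/O, and the interpreter asks the running thread to give up the lock after each switch interval.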
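
The lock applies to running Python code, not to waiting. A blocking call such as `time.sleep` releases the GIL, so several waiting threads can overlap. The sketch below assumes four threads that each sleep for 0.2 seconds; the helper name `wait_a_bit` is made up, and the exact timing will vary by machine.

```python
import threading
import time

def wait_a_bit(seconds):
    time.sleep(seconds)    # sleeping releases the GIL, so other threads can run

start = time.perf_counter()

threads = [threading.Thread(target=wait_a_bit, args=(0.2,))
           for _ in range(4)]

for t in threads:
    t.start()
for t in threads:
    t.join()               # wait for every thread to finish

elapsed = time.perf_counter() - start

# Four sequential sleeps would take about 0.8 seconds;
# with threads, the total is much closer to a single sleep.
print(f'{elapsed:.2f} seconds')
```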

In [70]:
import sys

In [71]:
sys.getswitchinterval()

0.005

# Exercise: `all_lines`

Given a number of filenames (text files), I want to see all of the lines from those files.  (The order in which the lines are displayed isn't important.)

1. First, write a function `all_lines` that takes any number of filename arguments and prints all of the lines from those files.
2. Second, do the same thing with threads: start a new thread for each filename, have each thread put every line from its file in a queue, and then, when all threads are done, print all lines from all files.

Use `time.time()` at the start and end to see how long each of these implementations takes.
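
One possible solution is sketched below; it is not necessarily the one presented in class. The helper `lines_to_queue`, the name `all_lines_threaded`, the returned list, and the temporary demo files are all assumptions added so the example is self-contained and easy to check.

```python
import os
import queue
import tempfile
import threading
import time


def all_lines(*filenames):
    """Sequential version: print every line from every file."""
    for filename in filenames:
        with open(filename) as f:
            for one_line in f:
                print(one_line, end='')


def lines_to_queue(filename, q):
    """Thread target: put each line of one file on the shared queue."""
    with open(filename) as f:
        for one_line in f:
            q.put(one_line)


def all_lines_threaded(*filenames):
    """Threaded version: one thread per file, print after all threads finish."""
    q = queue.Queue()
    threads = [threading.Thread(target=lines_to_queue, args=(filename, q))
               for filename in filenames]

    for t in threads:
        t.start()
    for t in threads:
        t.join()

    collected = []
    while not q.empty():       # safe here, because no thread is still writing
        collected.append(q.get())

    for one_line in collected:
        print(one_line, end='')
    return collected           # returned only to make the result easy to check


# Demo on two small temporary files (stand-ins for real text files)
paths = []
for text in ('a1\na2\n', 'b1\nb2\n'):
    with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
        tmp.write(text)
    paths.append(tmp.name)

start = time.time()
all_lines(*paths)
print(f'sequential: {time.time() - start:.4f} seconds')

start = time.time()
threaded_lines = all_lines_threaded(*paths)
print(f'threaded:   {time.time() - start:.4f} seconds')

for path in paths:
    os.remove(path)
```

With files this small, the threaded version may well be slower, since starting threads has its own cost; the difference is more interesting with many large files or slow storage.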

# Next up

- `ThreadPoolExecutor` and futures
- `ProcessPoolExecutor` (and futures)
- `asyncio`


In [72]:
x = 10
y = 20

f = x + y

In [73]:
f

30

# Use `concurrent.futures`

- `ThreadPoolExecutor` -- 
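
As a minimal sketch of how `ThreadPoolExecutor` and futures fit together: `submit` schedules a call and immediately returns a future, and `result` waits for that call's return value. The function `slow_square`, the pool size of 4, and the 0.1-second delay are assumptions for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def slow_square(n):
    time.sleep(0.1)        # stand-in for a slow, I/O-bound task
    return n * n

with ThreadPoolExecutor(max_workers=4) as executor:
    # submit returns a Future right away; the work runs in a pool thread
    futures = [executor.submit(slow_square, n) for n in range(5)]

    # as_completed yields each future as soon as it finishes,
    # so we sort the results to get a predictable order
    results = sorted(future.result() for future in as_completed(futures))

print(results)   # [0, 1, 4, 9, 16]
```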