# So You Want To Be A Python Expert? | PyData Seattle 2017

I discuss four features of Python and how experts use them. Let us start with the data model.

I have some concept, here a polynomial, that I want to represent as a Python object.

In [None]:
class Polynomial:
    pass

p1 = Polynomial()
p2 = Polynomial()

p1.coeffs = 1, 2, 3 # x^2 + 2x + 3
p2.coeffs = 3, 4, 3 # 3x^2 + 4x + 3

As a side note, I could have written the above code more compactly by adding an `__init__` method.

In [None]:
class Polynomial:
    def __init__(self, *coeffs):
        self.coeffs = coeffs
        
p1 = Polynomial(1, 2, 3)
p2 = Polynomial(3, 4, 3)

In [None]:
p1

<__main__.Polynomial at 0x7f65c08dfe20>

Now when I look at the output, I see that it is ugly. I am missing a `__repr__` method, which defines the printable representation of my Python object.

In [None]:
class Polynomial:
    def __init__(self, *coeffs):
        self.coeffs = coeffs
        
    def __repr__(self):
        return f'Polynomial(*{self.coeffs!r})'
    
p1 = Polynomial(1, 2, 3)
p2 = Polynomial(3, 4, 3)

In [None]:
p1

Polynomial(*(1, 2, 3))

In [None]:
p2

Polynomial(*(3, 4, 3))

In [None]:
p1 + p2

TypeError: unsupported operand type(s) for +: 'Polynomial' and 'Polynomial'

This doesn't make sense. I know that I can add polynomials. There is another method, `__add__`, that lets me do that.

In [None]:
class Polynomial:
    def __init__(self, *coeffs):
        self.coeffs = coeffs
        
    def __repr__(self):
        return f'Polynomial(*{self.coeffs!r})'
    
    def __add__(self, other):
        return Polynomial(*(x+y for x, y in zip(self.coeffs, other.coeffs)))
    
p1 = Polynomial(1, 2, 3)
p2 = Polynomial(3, 4, 3)

In [None]:
p1 + p2

Polynomial(*(4, 6, 6))

Now a pattern begins to emerge. There is some behaviour that I want to implement, and I implement it by writing a `__function__` method; these are called dunder methods or data model methods. In each case there is a top-level function or top-level syntax, and there is a corresponding underscore method.
* x + y    ->   \_\_add\_\_
* init x   ->   \_\_init\_\_

We know that Python has a built-in function called `len`, and we can support it in our class by implementing `__len__`.

In [None]:
class Polynomial:
    def __init__(self, *coeffs):
        self.coeffs = coeffs
        
    def __repr__(self):
        return f'Polynomial(*{self.coeffs!r})'
    
    def __add__(self, other):
        return Polynomial(*(x+y for x, y in zip(self.coeffs, other.coeffs)))
    
    def __len__(self):
        return len(self.coeffs)
    
p1 = Polynomial(1, 2, 3)
p2 = Polynomial(3, 4, 3)

In [None]:
len(p1), len(p2)

(3, 3)

The Python data model is a means by which you can implement protocols. Those protocols have some abstract meaning that depends on the object itself. Notice how we implement each one: as with many protocols, `__len__` delegates back to the protocol by calling `len` on the coefficients. Likewise, `__repr__` is implemented by calling repr on the coefficients (via `!r`), and `__add__` by using `+` on them.
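As a further illustration of the same pattern, here is a minimal sketch that adds two more data model methods. The choice of `__eq__` and `__call__`, and the assumption that the coefficients are ordered from highest degree to lowest, are mine for illustration and are not part of the class above.

```python
class Polynomial:
    def __init__(self, *coeffs):
        self.coeffs = coeffs

    def __repr__(self):
        return f'Polynomial(*{self.coeffs!r})'

    def __eq__(self, other):
        # p == q delegates to tuple equality on the coefficients
        if not isinstance(other, Polynomial):
            return NotImplemented
        return self.coeffs == other.coeffs

    def __call__(self, x):
        # p(x) evaluates the polynomial with Horner's rule,
        # assuming coeffs run from highest degree to lowest
        result = 0
        for c in self.coeffs:
            result = result * x + c
        return result

p1 = Polynomial(1, 2, 3)          # x^2 + 2x + 3
same = p1 == Polynomial(1, 2, 3)  # top-level syntax ==  ->  __eq__
value_at_2 = p1(2)                # top-level syntax p(x) -> __call__
```

Here `same` is `True` and `value_at_2` is 11, since 1*4 + 2*2 + 3 = 11.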

## metaclasses
Imagine there are two groups working on one piece of software. One group is a core infrastructure group and they write library code (library.py). The other group is a developer group and they write user code (user.py). The developer group takes the code written by core group and uses that library code to accomplish actual business objectives. Let's say this library provides classes and these classes are expected to be subclassed in the user code. 

In [None]:
# library.py
class Base:
    pass

# user.py
class Derived(Base):
    pass

Now add one method to each file, as follows.

In [None]:
# library.py
class Base:
    def foo(self):
        return 'foo'
    
# user.py
class Derived(Base):
    def bar(self):
        return self.foo()

Let's say you write the user code and you cannot alter the core code in any way. You can use it but cannot change it. Where can the user code break, i.e. where does `return self.foo()` break? The answer is: when there is no `foo` method. Now how can you ensure that, if the core library has no `foo` method, your code fails early rather than only when `foo` is called? One simple way is to write a test. Is there anything simpler you can do?

You can use hasattr.

In [None]:
# user.py
assert hasattr(Base, 'foo'), "you broke it you fool!"

We can see that we are adding a form of constraint, and we are enforcing it on the library level. In other words, the Derived class is enforcing a constraint on the base class: the base class has to have these characteristics in order for me to run and be happy. If it doesn't have these characteristics, then I am going to fail and not run.

Let's flip the script a little bit. Let's say we have a case that looks like this.

In [None]:
# library.py
class Base:
    def foo(self):
        return self.bar()
    
# user.py
class Derived(Base):
    def bar(self):
        return 'bar'

Let's say you are the core infrastructure writer and you have to deal with those meatheads in the business unit using your code and abusing your code. They have no idea what they are doing. You write your base class under the assumption that some responsible developer in the business unit will go and implement this `bar` method, because if they don't then everything falls apart. You cannot change the code on the user level. How do you make sure that the user does not screw up, i.e. that the `bar` method is implemented by the user?

There are three common answers to this. One of them is metaclasses.

In Python, a class definition is executable code that runs at runtime, which means I can do something like this.

In [None]:
for _ in range(10):
    class Base: pass

In [None]:
class Base:
    for _ in range(10):
        def bar(self): pass

Let me define a class inside a function and analyze it using the `dis` module.

In [None]:
def _():
    class Base:
        pass
    
from dis import dis
dis(_)

  2           0 LOAD_BUILD_CLASS
              2 LOAD_CONST               1 (<code object Base at 0x7f65c0b83660, file "<ipython-input-23-c0baff5fae0e>", line 2>)
              4 LOAD_CONST               2 ('Base')
              6 MAKE_FUNCTION            0
              8 LOAD_CONST               2 ('Base')
             10 CALL_FUNCTION            2
             12 STORE_FAST               0 (Base)
             14 LOAD_CONST               0 (None)
             16 RETURN_VALUE

Disassembly of <code object Base at 0x7f65c0b83660, file "<ipython-input-23-c0baff5fae0e>", line 2>:
  2           0 LOAD_NAME                0 (__name__)
              2 STORE_NAME               1 (__module__)
              4 LOAD_CONST               0 ('_.<locals>.Base')
              6 STORE_NAME               2 (__qualname__)

  3           8 LOAD_CONST               1 (None)
             10 RETURN_VALUE


We see `LOAD_BUILD_CLASS`, which is an actual executable runtime instruction in the Python interpreter for creating a class. As we discussed, Python has protocols that we can use to hook into various aspects of the language. It turns out that there is a hook that lets you intervene in the process of building a class: the function `__build_class__` in the `builtins` module.

In [None]:
old_bc = __build_class__
import builtins

def my_bc(*a, **k):
    # Here I can patch into what python does when building classes
    print(f'my buildclass -> {a} {k}')
    return old_bc(*a, **k) 

builtins.__build_class__ = my_bc

In [None]:
# user.py
class Derived(Base):
    def bar(self):
        return 'bar'

my buildclass -> (<function Derived at 0x7f65c05ae820>, 'Derived', <class '__main__.Base'>) {}


Now I can add my assert in the building of the class.

In [None]:
# library.py
class Base:
    def foo(self):
        return self.bar()
    
import builtins
old_bc = __build_class__

def my_bc(fun, name, base=None, **k):
    if base is Base:
        print('check if bar method is defined')
    if base is not None:
        return old_bc(fun, name, base, **k)
    return old_bc(fun, name, **k)

builtins.__build_class__ = my_bc

# user.py
class Derived(Base):
    def bar(self):
        return 'bar'

check if bar method is defined


This is a clear example of Python being a protocol-oriented language: you can literally hook into anything in Python. We now explore how metaclasses work.

In [None]:
# library.py
class BaseMeta(type):
    def __new__(cls, name, bases, body):
        print(f'BaseMeta.__new__({cls}, {name}, {bases}, {body})')
        return super().__new__(cls, name, bases, body)

class Base(metaclass=BaseMeta):
    def foo(self):
        return self.bar()
    
# user.py
class Derived(Base):
    def bar(self):
        return 'bar'

BaseMeta.__new__(<class '__main__.BaseMeta'>, Base, (), {'__module__': '__main__', '__qualname__': 'Base', 'foo': <function Base.foo at 0x7ff6c868d9d0>})
BaseMeta.__new__(<class '__main__.BaseMeta'>, Derived, (<class '__main__.Base'>,), {'__module__': '__main__', '__qualname__': 'Derived', 'bar': <function Derived.bar at 0x7ff6c868dc10>})


Metaclasses are classes that derive from `type` and have some special methods on them. You have to read the documentation in order to understand these, but fundamentally they allow you to intercept the construction of derived classes. We see that the class body is passed in as a dictionary, so to check whether a method exists I can simply test for its name in that dictionary.

In [None]:
# In this case there is no `bar` method in `Base`, so creating `Base` itself fails

# library.py
class BaseMeta(type):
    def __new__(cls, name, bases, body):
        if 'bar' not in body:
            raise TypeError("bad user class")
        return super().__new__(cls, name, bases, body)

class Base(metaclass=BaseMeta):
    def foo(self):
        return self.bar()
    
# user.py
class Derived(Base):
    def bar(self):
        return 'bar'

TypeError: bad user class

In [None]:
# In this case `Base` defines `bar`, but `Derived` misspells it as `ba`, so creating `Derived` fails

# library.py
class BaseMeta(type):
    def __new__(cls, name, bases, body):
        if 'bar' not in body:
            raise TypeError("bad user class")
        return super().__new__(cls, name, bases, body)

class Base(metaclass=BaseMeta):
    def foo(self):
        return self.bar()
    
    def bar(self): pass
    
# user.py
class Derived(Base):
    def ba(self):
        return 'bar'

TypeError: bad user class
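The metaclass above also checks `Base` itself, which is why the first example fails before any user code runs. A minimal sketch of one way to check only the subclasses is shown below. Skipping the check when `bases` is empty is my assumption about how to recognise the root class, and the names `Good` and `Bad` are hypothetical.

```python
# library.py
class BaseMeta(type):
    def __new__(cls, name, bases, body):
        # bases is empty only for the root class (Base itself);
        # every class that inherits from it must define bar directly
        if bases and 'bar' not in body:
            raise TypeError(f'{name} must define bar')
        return super().__new__(cls, name, bases, body)

class Base(metaclass=BaseMeta):
    def foo(self):
        return self.bar()

# user.py
class Good(Base):
    def bar(self):
        return 'bar'

try:
    class Bad(Base):
        def ba(self):          # misspelled on purpose
            return 'bar'
except TypeError as exc:
    error = str(exc)           # the class statement itself failed
else:
    error = None
```

Note that this sketch looks only at the class body, so a subclass of `Good` would also have to define `bar` again. Whether that is desirable depends on the library.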

The next approach is even easier. In Python, metaclasses have a bad reputation, because even people who understand where they are useful see that they carry a lot of complexity. The above implementation is a bit clumsy, so Python 3.6 added a new feature called `__init_subclass__`, which allows you to hook into the process when a subclass is created.

In [None]:
# library.py
class Base:
    def foo(self):
        return self.bar()
    
    def __init_subclass__(cls, *a, **kw):
        print(f'__init_subclass__({a}, {kw})')
        super().__init_subclass__(*a, **kw)
        
# user.py
class Derived(Base):
    def bar(self):
        return 'bar'

__init_subclass__((), {})
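The hook above only prints its arguments. A minimal sketch of using it to enforce the `bar` constraint might look like the following. The use of `hasattr` (which also accepts an inherited `bar`) and the class names `Good` and `Bad` are assumptions for illustration.

```python
# library.py
class Base:
    def foo(self):
        return self.bar()

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # cls is the freshly created subclass, not Base
        if not hasattr(cls, 'bar'):
            raise TypeError(f'{cls.__name__} must define bar')

# user.py
class Good(Base):
    def bar(self):
        return 'bar'

try:
    class Bad(Base):           # forgets to define bar
        pass
except TypeError as exc:
    error = str(exc)
else:
    error = None
```

Because `__init_subclass__` is only called for subclasses, `Base` itself does not need a special case here.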


## Decorators
A decorator is the `@dec` syntax placed before a function definition. To understand it, first look at what a function is.

In [None]:
def add(x, y=10):
    return x + y

In [None]:
add(10, 20)

30

In [None]:
add.__code__.co_code

b'|\x00|\x01\x17\x00S\x00'

In [None]:
add.__code__.co_nlocals

2

In [None]:
add.__code__.co_varnames

('x', 'y')

Let's say I want to time this function. What are my options?

In [None]:
from time import time

def add(x, y=10):
    return x + y

before = time()
print(f'{add(10) = }, {add(10) = }')
after = time()
print(f'time taken: {after - before}')

add(10) = 20, add(10) = 20
time taken: 0.00034499168395996094


But the above method is a bit tedious, as I have to repeat it everywhere in my code that calls the function. I could instead write the timing code once, inside the function.

In [None]:
def add(x, y=10):
    before = time()
    rv = x + y
    after = time()
    print(f'elapsed: {after - before}')
    return rv

print(f'{add(10) = }, {add(10) = }')

elapsed: 2.384185791015625e-07
elapsed: 2.384185791015625e-07
add(10) = 20, add(10) = 20


But now what if I have many functions, like add, sub and multiply? Then I would have to duplicate the above code in all of those functions. So the next solution would be to write a timer function.

In [None]:
def timer(func, x, y):
    before = time()
    rv = func(x, y)
    after = time()
    print(f'elapsed: {after - before}')
    return rv

def add(x, y=10):
    return x + y

def sub(x, y=10):
    return x - y

# Now I have to add some code to the user code
print(f'{timer(add, 10, 10)}')

elapsed: 4.76837158203125e-07
20


Now, I know that everything in Python happens live at runtime, including function definitions. So I can also define a function inside the timer function.

In [None]:
def timer(func):
    def f(x, y=10): # this wrapper adds the timing calls around the original function
        before = time()
        rv = func(x, y)
        after = time()
        print(f'elapsed: {after - before}')
        return rv
    return f

add = timer(add)
sub = timer(sub)

# Now we can see that this added the time functionality to the above functions
print(f'{add(10)}')

elapsed: 7.152557373046875e-07
20


Python makes this a little bit easier using decorators.

In [None]:
@timer
def add(x, y=10):
    return x + y

add(10)

elapsed: 7.152557373046875e-07


20

So a decorator is just syntax for `add = timer(add)`. In practice you would define the wrapper as `def f(*args, **kwargs)` so that it can be used to wrap any function. Higher-order decorators, which take arguments of their own, are also possible.

In [None]:
# Decorator to run a function multiple number of times.
def decor(n):
    def inner(func):
        def wrapper(*args, **kwargs):
            for i in range(n):
                print(f'Running for {i+1} time')
                rv = func(*args, **kwargs)
            return rv
        return wrapper
    return inner

@decor(2)
def add(x, y):
    return x + y

add(2, 3)

Running for 1 time
Running for 2 time


5
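The earlier remark about `*args, **kwargs` can be sketched as follows. This is one possible version of `timer`, not the one from the cells above: it swaps in `time.perf_counter` for `time.time` and adds `functools.wraps`, both of which are my additions, and `greet` is a hypothetical second function used to show that any signature works.

```python
from functools import wraps
from time import perf_counter

def timer(func):
    @wraps(func)                       # keep func's __name__ and docstring
    def wrapper(*args, **kwargs):      # accepts whatever func accepts
        before = perf_counter()
        rv = func(*args, **kwargs)
        after = perf_counter()
        print(f'{func.__name__} elapsed: {after - before}')
        return rv
    return wrapper

@timer
def add(x, y=10):
    return x + y

@timer
def greet(name, *, punctuation='!'):
    return f'hello {name}{punctuation}'

total = add(10)
greeting = greet('world', punctuation='?')
```

Without `wraps`, `add.__name__` would report `'wrapper'`, which makes debugging output harder to read.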

## Generators

In [None]:
# What is the difference between add1 and add2
def add1(x, y):
    return x + y

class Adder:
    def __call__(self, x, y):
        return x + y
add2 = Adder()

In [None]:
add1(10, 20)

30

In [None]:
add2(10, 20)

30

In [None]:
type(add1)

function

In [None]:
type(add2)

__main__.Adder

From the caller's side, add1 behaves just like add2, but it is much easier to write. The difference is that with add2 we can add stateful behaviour. Consider a function that takes a long time to run, say a function that loads data from a database. In our case, it is a function that sleeps for 5 seconds in total and then returns the result.

In [None]:
from time import sleep
def load_data():
    rv = []
    for i in range(10):
        sleep(.5)
        rv.append(i)
    return rv

load_data()

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Now it takes 5 seconds to run, but what if I only care about the first value? It still takes 5 seconds. The same goes for memory: the function needs memory linear in the number of results, even if I only care about the first value. Let me rewrite the above as a class.

In [None]:
class Compute:
    def __call__(self):
        rv = []
        for i in range(10):
            sleep(.5)
            rv.append(i)
        return rv
compute = Compute()

I have seen this pattern before: it is the basic looping mechanism. And we know that in Python, top-level syntax and top-level functions have underscore methods that implement them. So we can add `__iter__` and `__next__`.

In [None]:
class Compute:
    def __iter__(self):
        self.last = 0
        return self
    
    def __next__(self):
        rv = self.last
        self.last += 1
        if self.last > 10:
            raise StopIteration()
        sleep(.5)
        return rv

for val in Compute():
    print(val)

0
1
2
3
4
5
6
7
8
9


It gives you one value at a time and also there is no storage involved. But it looks very ugly. There is a much simpler way to write it using generator syntax.

In [None]:
def compute():
    for i in range(10):
        sleep(.5)
        yield i
        
for val in compute():
    print(val)

0
1
2
3
4
5
6
7
8
9
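To connect this back to the question of caring only about the first value, here is a minimal sketch of pulling values from the generator on demand. The `calls` list replaces `sleep(.5)` so that the laziness can be observed directly; it is an illustrative stand-in, not part of the code above.

```python
from itertools import islice

calls = []                      # records which values were actually computed

def compute():
    for i in range(10):
        calls.append(i)         # stands in for the slow sleep(.5) step
        yield i

first = next(compute())                     # computes only one value
first_three = list(islice(compute(), 3))    # computes only three values
```

After running this, `calls` is `[0, 0, 1, 2]`: one value for the first generator and three for the second, rather than ten each.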


As a final example, consider an API design where you tell users to run the following methods in a specific order.

In [None]:
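# first, second and third stand for the real steps (not defined here)
class Api:
    def run_this_first(self):  first()
    def run_this_second(self): second()
    def run_this_third(self):  third()

# Now nothing stops you from doing this
Api().run_this_second() # second runs before first, and things break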

Now if we look at the generator formulation, we see that it performs a computation and not only returns the result but also hands control back to the user to do something with that result. This is generator code where we let the library do some work, then let the user do some work, and interleave them. This is the core concept that generators are built upon: the idea of coroutines. A subroutine can be thought of as any piece of code that runs from some starting point to some end point; it has one entry point and one exit point, and that's it. With generators, i.e. coroutines, we see interleaving of the user code and the library code.

Now for the above case, we cannot have a single function that runs the three functions, because the API wants to allow the user to interleave some work between them. To solve this problem, we can write a generator solution as follows.

In [None]:
def api():
    first()
    yield
    second()
    yield
    third()
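A minimal runnable sketch of how a user might drive this generator is shown below. The bodies of `first`, `second` and `third` are placeholders that only record that they ran, and the `log` list is an assumption added to make the ordering visible.

```python
log = []

# placeholder steps; a real library would do actual work here
def first():  log.append('first')
def second(): log.append('second')
def third():  log.append('third')

def api():
    first()
    yield
    second()
    yield
    third()

steps = api()
next(steps)                 # runs first(), then pauses at the first yield
log.append('user work 1')   # user code interleaved between the steps
next(steps)                 # runs second()
log.append('user work 2')
next(steps, None)           # runs third(); the default avoids StopIteration
```

The user can interleave work between the steps but has no way to run `second` before `first`, because the order is fixed inside the generator body.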

## Context Manager
A very simple metaphor. In C++ you might have heard of resource acquisition is initialization (RAII). It's the idea that there is often some setup and some teardown that you want to tie together. Opening a file is a perfect example of this, because if you open a file you need to close it.

In [None]:
with open('ctx.py') as f:
    pass

So how do we write our own context manager? There is some top-level syntax and there are underlying underscore methods that implement it.

In [None]:
class ctx:
    def __enter__(self):
        pass
    
    def __exit__(self, *args):
        pass
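To make the protocol visible, here is a minimal sketch that records when each method runs. The `events` list and the explicit `exc_type, exc, tb` parameter names are illustrative choices, not part of the skeleton above.

```python
events = []

class ctx:
    def __enter__(self):
        events.append('enter')
        return self                     # this is what "as" would bind

    def __exit__(self, exc_type, exc, tb):
        # exc_type is None when the body finished without an exception
        events.append(f'exit ({exc_type.__name__ if exc_type else None})')
        return False                    # do not swallow the exception

with ctx():
    events.append('body')

try:
    with ctx():
        raise ValueError('boom')
except ValueError:
    events.append('caught')
```

After this runs, `events` is `['enter', 'body', 'exit (None)', 'enter', 'exit (ValueError)', 'caught']`, so `__exit__` runs even when the body raises.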

Now as an example, suppose you are writing a context manager that creates a SQL table and then drops it after finishing.

In [None]:
class temptable:
    def __init__(self, cur):
        self.cur = cur
        
    def __enter__(self):
        self.cur.execute('create table points(x int, y int)')
    
    def __exit__(self, *args):
        self.cur.execute('drop table points')
        

from sqlite3 import connect

with connect('test.db') as conn:
    cur = conn.cursor()
    with temptable(cur):
        cur.execute('insert into points (x,y) values(1,1)')
        for row in cur.execute('select x, y from points'):
            print(row)

(1, 1)


We can see that `__exit__` must always be called after `__enter__`. So there is some sequencing, and we saw how to enforce sequencing using generators. So I can improve the above example using a generator.

In [None]:
def template(cur):
    cur.execute('create table points(x int, y int)')
    yield
    cur.execute('drop table points')
    
class contextmanager:
    def __init__(self, cur): self.cur = cur
    def __enter__(self): 
        self.gen = template(self.cur)
        next(self.gen)
    def __exit__(self, *args): 
        next(self.gen, None)
        
with connect('test.db') as conn:
    cur = conn.cursor()
    with contextmanager(cur):
        cur.execute('insert into points (x,y) values(1,1)')
        for row in cur.execute('select x, y from points'):
            print(row)

(1, 1)


Now we can extend the contextmanager class to work for any generator.

In [None]:
def temptable(cur):
    cur.execute('create table points(x int, y int)')
    yield
    cur.execute('drop table points')

class contextmanager:
    def __init__(self, gen):
        self.gen = gen
        
    def __call__(self, *args, **kwargs):
        self.args, self.kwargs = args, kwargs
        return self
    
    def __enter__(self):
        self.gen_inst = self.gen(*self.args, **self.kwargs)
        next(self.gen_inst)
        
    def __exit__(self, *args):
        next(self.gen_inst, None)

from sqlite3 import connect
with connect('test.db') as conn:
    cur = conn.cursor()
    with contextmanager(temptable)(cur):
        cur.execute('insert into points (x,y) values(1,1)')
        for row in cur.execute('select x, y from points'):
            print(row)

(1, 1)


Now `contextmanager(temptable)(cur)` looks ugly, so we can rebind the name with `temptable = contextmanager(temptable)`. That is the same as writing a decorator.

In [None]:
class contextmanager:
    def __init__(self, gen):
        self.gen = gen
        
    def __call__(self, *args, **kwargs):
        self.args, self.kwargs = args, kwargs
        return self
    
    def __enter__(self):
        self.gen_inst = self.gen(*self.args, **self.kwargs)
        next(self.gen_inst)
        
    def __exit__(self, *args):
        next(self.gen_inst, None)

def temptable(cur):
    cur.execute('create table points(x int, y int)')
    yield
    cur.execute('drop table points')
temptable = contextmanager(temptable)

from sqlite3 import connect
with connect('test.db') as conn:
    cur = conn.cursor()
    with temptable(cur):
        cur.execute('insert into points (x,y) values(1,1)')
        for row in cur.execute('select x, y from points'):
            print(row)

(1, 1)


We can write this with decorator syntax. Also, we don't have to write the contextmanager class ourselves, since all of that is already provided by `contextlib`.

In [None]:
from contextlib import contextmanager

@contextmanager
def temptable(cur):
    cur.execute('create table points(x int, y int)')
    yield
    cur.execute('drop table points')

from sqlite3 import connect
with connect('test.db') as conn:
    cur = conn.cursor()
    with temptable(cur):
        cur.execute('insert into points (x,y) values(1,1)')
        for row in cur.execute('select x, y from points'):
            print(row)

(1, 1)


So `contextmanager` is just a decorator that turns a generator into a context manager. For completeness, we can change the generator as follows so that the table is dropped even if the body of the `with` block raises an exception.

In [None]:
from contextlib import contextmanager

@contextmanager
def temptable(cur):
    cur.execute('create table points(x int, y int)')
    try:
        yield
    finally:
        cur.execute('drop table points')

from sqlite3 import connect
with connect('test.db') as conn:
    cur = conn.cursor()
    with temptable(cur):
        cur.execute('insert into points (x,y) values(1,1)')
        for row in cur.execute('select x, y from points'):
            print(row)

(1, 1)
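A minimal sketch of why the `try`/`finally` matters is shown below. It uses an in-memory database (`':memory:'`) instead of `test.db` and raises a `RuntimeError` on purpose; both are assumptions made so the example is self-contained.

```python
from contextlib import contextmanager
from sqlite3 import connect

@contextmanager
def temptable(cur):
    cur.execute('create table points(x int, y int)')
    try:
        yield
    finally:
        # runs whether the with-body finished normally or raised
        cur.execute('drop table points')

conn = connect(':memory:')
cur = conn.cursor()
try:
    with temptable(cur):
        cur.execute('insert into points (x, y) values (1, 1)')
        raise RuntimeError('something went wrong')
except RuntimeError:
    pass

# look the table up in sqlite's catalog to confirm it was dropped
remaining = cur.execute(
    "select name from sqlite_master where type = 'table' and name = 'points'"
).fetchall()
conn.close()
```

Here `remaining` is an empty list. Without the `try`/`finally`, the exception would be thrown into the generator at the `yield` and the `drop table` line would never run.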
