# Objectionable Content | PyData Austin 2019
> Building systems in Python, the Python object model; no machine learning.

While working with Python you may have written some functions, played around with keyword arguments, or created some classes, but you may never have had to design a large system. That kind of design becomes essential the moment your analytical work is over and you want to do something real with it (for example, in a business context).

The problem at hand is to model a circuit in which we will calculate currents. We might start by modeling a component of that circuit as a tuple: a tuple type with some fields. Interacting with those fields is unpleasant, because we have to access them by position and may not know what each position index means. At some point we use tuple destructuring to give names to the positions.

In [None]:
component = 'register', '10-232-1412', 'honhai', 10
type(component)

tuple

In [None]:
# To access the fields
component[0]

'register'

In [None]:
# tuple destructuring
type_, number, manufacturer, resistance = component

type_, number, manufacturer, resistance

('register', '10-232-1412', 'honhai', 10)

The problem with the above is that we still need to know the structure of the tuple in order to destructure it, and if the original definition of the tuple and the destructuring are far apart, then it may be difficult to keep them in sync.

Next we may try something a little bit fancier. We might say that we want to access the fields of this object directly and get some self-documenting behavior. So instead we will use a dictionary.

In [None]:
component = {
    'type':         'register',
    'number':       '10-232-1412',
    'manufacturer': 'honhai',
    'resistance':   10,
}

component['type']

'register'

That works, and there is nothing fancy about it. To access the fields I use square brackets, which is a little clumsy because of the quotes. But it is better than the tuple: if I have to add fields later, I don't have to change all the code I have already written, and visually it is close to self-documenting.

Next I may try something very silly: accessing the fields with dot syntax rather than square brackets. To do that I can create a new dict subclass and override some methods.

In [None]:
class attrdict(dict):
    __getattr__ = dict.__getitem__
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__

component = attrdict({
    'type':         'register',
    'number':       '10-232-1412',
    'manufacturer': 'honhai',
    'resistance':   10,
})

component.type

'register'

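**Note**:- The above is a really bad idea; don't ever do it. For one thing, a missing key raises `KeyError` rather than `AttributeError`, and keys that collide with `dict` method names are silently shadowed.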
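
To make the warning concrete, here is a small sketch of two ways the `attrdict` trick misbehaves. The `'items'` key is made up for illustration; the class itself is the one defined above.

```python
class attrdict(dict):
    __getattr__ = dict.__getitem__
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__

# 'items' is a hypothetical field name chosen to collide with dict.items.
component = attrdict({'type': 'register', 'items': 3})

# 1. Normal attribute lookup finds the dict.items method first, so
#    __getattr__ is never consulted and the stored value is shadowed.
shadowed = component.items
print(callable(shadowed))

# 2. A missing attribute raises KeyError instead of AttributeError, and
#    hasattr() only swallows AttributeError, so the error leaks out.
try:
    hasattr(component, 'resistance')
    leaked = None
except KeyError as exc:
    leaked = exc
print(type(leaked).__name__)
```

The stored value is still reachable as `component['items']`, which is exactly the square-bracket access we were trying to avoid.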

But maybe it is not so silly. Consider what happens if we write a normal Python object (a regular class with an `__init__` method).

In [None]:
class Resistor:
    def __init__(self, number, manufacturer, resistance):
        self.number = number
        self.manufacturer = manufacturer
        self.resistance = resistance
r = Resistor('10-232-1412', 'honhai', 10)

If we look inside this instance we see a `__dict__` attribute, and it is pretty much the same dictionary we saw above. This dict object stores the instance state.

In [None]:
r.__dict__

{'number': '10-232-1412', 'manufacturer': 'honhai', 'resistance': 10}

In [None]:
r.__dict__.number

AttributeError: 'dict' object has no attribute 'number'

In [None]:
r.__dict__['number']

'10-232-1412'

Now don't try to convert this dict to an attrdict.

The next argument we might make is that this is not that different from what Python actually does under the covers. Now suppose we have a large system with a large number of these Resistor objects, because we are modeling humongous circuits, and we see that it uses a lot of memory. We check the size of one object and see that it is 64 bytes, which is pretty big for an individual piece of data that doesn't store much information.

In [None]:
from sys import getsizeof
getsizeof(Resistor(None, None, None))

64

If we really wanted to do memory analysis of the program we could go pretty far. We can use the `tracemalloc` module, which takes snapshots and shows what was allocated between two of them.

In [None]:
from tracemalloc import start, take_snapshot

start()
before = take_snapshot()
r = Resistor('10-232-1412', 'honhai', 10)
after = take_snapshot()

# works in .py script
for stat in (stat for stat in after.compare_to(before, 'lineno') if stat.traceback[0].filename == __file__):
    print(stat)

The output for above is 
```python
check_tracemalloc.py:11: size=64 B (+64 B), count=1 (+1), average=64 B
check_tracemalloc.py:5: size=40 B (+40 B), count=1 (+1), average=40 B
```

So we can see that an additional 64 bytes are allocated. Also, the moment anybody hands you a Python project and asks you to reduce its memory consumption, just run away from that job, as most of those projects are dead ends. The reason is that there are structures within the Python interpreter, like the free lists in the attribute lookup mechanism, that never get deallocated. So a Python program may not release all the memory it no longer needs; it just hits a high-water mark and stays there.

So we will not continue this example as this is a dead end.

The typical answer to our problem is **slots**: a Python object with `__slots__` no longer has an instance dictionary, so you save the memory of that extra dictionary, and instead there are explicit locations where the instance state of that type is stored. The catch is that such an object cannot store any attribute other than the declared ones, i.e. number, manufacturer and resistance.

In [None]:
class Resistor:
    __slots__ = 'number', 'manufacturer', 'resistance'    
    def __init__(self, number, manufacturer, resistance):
        self.number = number
        self.manufacturer = manufacturer
        self.resistance = resistance
r = Resistor('10-232-1412', 'honhai', 10)

hasattr(r, '__dict__')

False

In [None]:
getsizeof(Resistor(None, None, None))

72

In [None]:
r.abs = 10

AttributeError: 'Resistor' object has no attribute 'abs'

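So we lose that dynamic nature in exchange for space. In our example, though, `getsizeof` reports a larger number for the slotted object (72 bytes versus 64). Keep in mind that `getsizeof` is shallow: the 64-byte figure does not include the separate instance `__dict__`.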
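
A rough way to compare the two layouts is to add the size of the instance dictionary to the size of the unslotted object. The sketch below does that; the class names `PlainResistor` and `SlottedResistor` are made up for illustration, and the exact byte counts depend on the Python version, so none are shown.

```python
from sys import getsizeof

class PlainResistor:
    def __init__(self, number, manufacturer, resistance):
        self.number = number
        self.manufacturer = manufacturer
        self.resistance = resistance

class SlottedResistor:
    __slots__ = 'number', 'manufacturer', 'resistance'
    def __init__(self, number, manufacturer, resistance):
        self.number = number
        self.manufacturer = manufacturer
        self.resistance = resistance

plain = PlainResistor('10-232-1412', 'honhai', 10)
slotted = SlottedResistor('10-232-1412', 'honhai', 10)

# getsizeof is shallow, so count the instance dict separately.
plain_total = getsizeof(plain) + getsizeof(plain.__dict__)
# A slotted instance has no __dict__; its slots live inside the object.
slotted_total = getsizeof(slotted)

print(plain_total, slotted_total)
```

Even this is only an approximation, since the attribute values themselves are shared objects that are not counted on either side.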

**Note**:- So don't waste your time with memory analysis in Python.

So what you should do instead is treat Python as an orchestration language (orchestration is the automated configuration, coordination and management of computer systems): a language that takes components written in other languages such as C and C++ and orchestrates the mechanisms inside those components. We reserve the manipulation of the business entities, or of the experiment, for the Python level, and hand the actual core computation to a library written in C or C++. Those entities then live in restricted computation domains (like `numpy.ndarray`). If you think about this for a second, every object in Python is boxed: there is no distinction in Python between boxed and unboxed objects, and every object is heap-allocated and takes a large amount of space. The elements of a `numpy.ndarray`, by contrast, are contiguous machine types, so they are unboxed.

**So you often design your system so that your business entities are boxed Python objects that you put into a computation domain (like NumPy or pandas), and inside that computation domain the entity is unboxed.**

In [None]:
from numpy import array

# Create your boxed objects in python
x, y, z = 1, 2, 3
print(f'{x, y, z}')

# Transfer those objects to the underlying computation domain
values = array([x, y, z])

# Do the core computation using the computation domain
values *= 2
print(f'{values}')

# Get the result in python
x, y, z = values
print(f'{x, y, z}')

(1, 2, 3)
[2 4 6]
(2, 4, 6)


So whenever you work with plain Python objects in a production environment, it will be slow and require a large amount of memory, because Python objects are boxed and heap-allocated.
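
The cost of boxing can be sketched without any third-party dependency. The example below swaps in the standard library `array` module as a stand-in for a NumPy array (an assumption made only to keep the sketch self-contained) and compares a list of boxed floats with one contiguous buffer of C doubles.

```python
from array import array
from sys import getsizeof

values = [float(i) for i in range(1000)]

# Boxed: the list stores pointers, and every float is its own heap object.
boxed_bytes = getsizeof(values) + sum(getsizeof(v) for v in values)

# Unboxed: a single contiguous buffer of machine doubles.
unboxed = array('d', values)
unboxed_bytes = unboxed.itemsize * len(unboxed)

print(boxed_bytes, unboxed_bytes)
```

The boxed total counts the list's pointer table plus one object header per float, which is why it comes out several times larger than the raw buffer.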

So when you do these things in python you end up with a manager class that manages everything.

In [None]:
from pandas import DataFrame

# Individual Python object that represents this boxed entity.
# This is not computation dependent, as you just need to look
# up the number and manufacturer and find the resistance.
class Resistor:
    def __init__(self, number, manufacturer, resistance):
        self.number = number
        self.manufacturer = manufacturer
        self.resistance = resistance

# Computation entity. This Product represents that entity. I
# can add a __getitem__ method that converts an unboxed row
# back into a boxed Python object.
class Product:
    def __init__(self, *components):
        self.components = DataFrame(
            [[x.manufacturer, x.resistance] for x in components],
            columns=['manufacturer', 'resistance'], 
            index=[x.number for x in components])
    
    def __getitem__(self, number):
        x = self.components.loc[number]
        return Resistor(number, x.manufacturer, x.resistance)
    
p = Product(
    Resistor('10-423-1234', 'honhai', 1),
    Resistor('10-423-1249', 'samsung', 5),
    Resistor('10-423-1230', 'honhai', 10),
)

p.components.resistance.mean(), p['10-423-1234']

(5.333333333333333, <__main__.Resistor at 0x7fd858f14c10>)

Now going back to our tuple example. **You should never use a bare tuple to represent data in your program; use a namedtuple instead.**

In [None]:
from collections import namedtuple
Resistor = namedtuple('Resistor', 'number manufacturer resistance')

r = Resistor('10-232-1412', 'honhai', 10)
r

Resistor(number='10-232-1412', manufacturer='honhai', resistance=10)
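
As a short sketch of what the namedtuple buys us over the bare tuple: fields are reachable by name, the object is still a tuple, and the helper methods `_asdict` and `_replace` come for free.

```python
from collections import namedtuple

Resistor = namedtuple('Resistor', 'number manufacturer resistance')
r = Resistor('10-232-1412', 'honhai', 10)

by_name = r.resistance       # self-documenting access
by_index = r[2]              # positional access still works
number, manufacturer, resistance = r   # so does destructuring

as_dict = r._asdict()        # field name -> value mapping
r2 = r._replace(resistance=20)   # returns a new tuple; r is unchanged

print(by_name, by_index, as_dict, r2)
```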

So now let's start digging into the *basic Python object model*. It is used to implement protocols. There is a protocol for initialization, for example, with some syntax associated with it, and there is a close correspondence between the syntax that invokes a protocol and the way we implement that protocol.

In [None]:
r = Resistor('10-232-1412', 'honhai', 10) # does the initialization

In [None]:
# Let's say we want a human-readable representation of the object
print(r)

Resistor(number='10-232-1412', manufacturer='honhai', resistance=10)


In [None]:
print(repr(r))

Resistor(number='10-232-1412', manufacturer='honhai', resistance=10)


In [None]:
# Now if I want to change this representation I have to modify the __repr__ method
class Resistor:
    def __init__(self, number, manufacturer, resistance):
        self.number = number
        self.manufacturer = manufacturer
        self.resistance = resistance
    
    def __repr__(self):
        return f'Resistor({self.number}, {self.manufacturer}, {self.resistance})'

r = Resistor('10-232-1412', 'honhai', 10)
r

Resistor(10-232-1412, honhai, 10)

Now if we follow the Python data model docs we see there are some guidelines for these protocols. They are called protocols because usually you are taking some fixed mechanism by which a question is answered or an operation is performed, hooking into it to add some small amendment or modification, and then dispatching back to that same protocol on either a constituent object or the base class. One of the guidelines for the `__repr__` method is that, where possible, you should be able to cut and paste the result into a terminal and get back an equivalent object. So I have to modify the repr method as follows.

In [None]:
class Resistor:
    def __init__(self, number, manufacturer, resistance):
        self.number = number
        self.manufacturer = manufacturer
        self.resistance = resistance
    
    def __repr__(self):
        return f'Resistor("{self.number}", "{self.manufacturer}", {self.resistance})'

r = Resistor('10-232-1412', 'honhai', 10)
r

Resistor("10-232-1412", "honhai", 10)

The problem you may now face is: what if there are quotes in the attributes? You could do some fancy escaping, but the solution is to call repr inside repr, which is the general guideline when implementing `__repr__`. This is shown below.

In [None]:
class Resistor:
    def __init__(self, number, manufacturer, resistance):
        self.number = number
        self.manufacturer = manufacturer
        self.resistance = resistance
    
    def __repr__(self):
        return (f'Resistor({self.number!r}, {self.manufacturer!r}, {self.resistance!r})')
        # !r is equivalent to calling repr() on the value, as shown below
        # return f'Resistor({self.number!r}, {self.manufacturer!r}, {repr(self.resistance)})'
    
r = Resistor('10-232-1412', 'honhai', 10)
r

Resistor('10-232-1412', 'honhai', 10)
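
A quick way to check the cut-and-paste guideline is to feed the repr back through `eval`. The sketch below does that with a made-up manufacturer name containing both kinds of quotes, which is the case the hand-written quoting would have broken on.

```python
class Resistor:
    def __init__(self, number, manufacturer, resistance):
        self.number = number
        self.manufacturer = manufacturer
        self.resistance = resistance

    def __repr__(self):
        return f'Resistor({self.number!r}, {self.manufacturer!r}, {self.resistance!r})'

# Hypothetical manufacturer name with a double and a single quote in it.
r = Resistor('10-232-1412', 'hon"hai\'s', 10)

text = repr(r)
clone = eval(text)   # only sensible for text we produced ourselves

print(text)
print(clone.manufacturer == r.manufacturer)
```

Because each field is rendered with its own repr, the quoting and escaping are delegated to `str.__repr__` and the round trip holds.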

You may have seen angle brackets in some representation like when you open a file.

In [None]:
with open('check_tracemalloc.py') as f:
    print(f)

<_io.TextIOWrapper name='check_tracemalloc.py' mode='r' encoding='UTF-8'>


The angle brackets signal that you cannot copy this representation into a terminal and replicate the object. This makes sense, as an open file carries state such as buffers and a file pointer that cannot be reproduced from a line of text. So whenever you see angle brackets, read them as "this representation cannot be pasted back to recreate the object".

Now what if I add a potentiometer that subclasses my resistor.

In [None]:
class Potentiometer(Resistor):
    pass

p = Potentiometer('10-232-1412', 'honhai', 10)
p

Resistor('10-232-1412', 'honhai', 10)

When I print the potentiometer it says Resistor, which is wrong. The reason is that I hardcoded the class name in my repr method, so I have to change that.

In [None]:
class Resistor:
    def __init__(self, number, manufacturer, resistance):
        self.number = number
        self.manufacturer = manufacturer
        self.resistance = resistance    
    def __repr__(self):
        return (f'{type(self).__name__}({self.number!r}, {self.manufacturer!r}, {self.resistance!r})')

class Potentiometer(Resistor):
    pass

p = Potentiometer('10-232-1412', 'honhai', 10)
p

Potentiometer('10-232-1412', 'honhai', 10)

This is still not enough. We still have to add some boilerplate. This can be seen in the next example, where we add new attributes to the potentiometer class.

In [None]:
class Potentiometer(Resistor):
    def __init__(self, number, manufacturer, resistance, min_resistance, max_resistance):
        if not min_resistance <= resistance <= max_resistance:
            raise ValueError('resistance out of bounds')
        self.min_resistance = min_resistance
        self.max_resistance = max_resistance
        super().__init__(number, manufacturer, resistance)
    
    def __repr__(self):
        return (f'{type(self).__name__}({self.number!r},'
                                      f'{self.manufacturer!r},'
                                      f'{self.resistance!r},'
                                      f'{self.min_resistance!r},'
                                      f'{self.max_resistance!r})')

p = Potentiometer('10-232-1412', 'honhai', 15, 10, 20)
p 

Potentiometer('10-232-1412','honhai',15,10,20)

This added boilerplate is unavoidable; the only way around it is to create your own protocol or object system, as shown below (i.e. by adding `__fields__`; just for fun I will add `__slots__` too).

In [None]:
class Resistor:
    __slots__ = __fields__ = 'number', 'manufacturer', 'resistance'
    def __init__(self, number, manufacturer, resistance):
        self.number = number
        self.manufacturer = manufacturer
        self.resistance = resistance
    
    def __repr__(self):
        fields = ', '.join(repr(getattr(self, f)) for f in self.__fields__)
        return f'{type(self).__name__}({fields})'
    
class Potentiometer(Resistor):
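    # Only the new names go in __slots__; re-declaring inherited slots wastes space.
    __fields__ = *Resistor.__fields__, 'min_resistance', 'max_resistance'
    __slots__ = 'min_resistance', 'max_resistance'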
    def __init__(self, number, manufacturer, resistance, min_resistance, max_resistance):
        if not min_resistance <= resistance <= max_resistance:
            raise ValueError('resistance out of bounds')
        self.min_resistance = min_resistance
        self.max_resistance = max_resistance
        super().__init__(number, manufacturer, resistance)
        
p = Potentiometer('10-232-1412', 'honhai', 15, 10, 20)
p 

Potentiometer('10-232-1412', 'honhai', 15, 10, 20)

You have now created your own protocol system, i.e. a set of rules that every object inheriting from this base class must follow. One of the rules is to have a `__fields__` attribute, and you will have to add some checks to enforce that.
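
One possible way to add such a check is the `__init_subclass__` hook, which runs every time a subclass is defined. This is only a sketch: the `Component` base class and the `Capacitor` example are assumptions introduced for illustration, not part of the code above.

```python
class Component:
    __fields__ = ()

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Require each subclass to declare its own __fields__ tuple.
        fields = cls.__dict__.get('__fields__')
        if not isinstance(fields, tuple) or not all(isinstance(f, str) for f in fields):
            raise TypeError(f'{cls.__name__} must define __fields__ as a tuple of names')

    def __repr__(self):
        values = ', '.join(repr(getattr(self, f)) for f in self.__fields__)
        return f'{type(self).__name__}({values})'

class Resistor(Component):
    __fields__ = 'number', 'manufacturer', 'resistance'
    def __init__(self, number, manufacturer, resistance):
        self.number = number
        self.manufacturer = manufacturer
        self.resistance = resistance

try:
    class Capacitor(Component):   # forgot to declare __fields__
        pass
    error = None
except TypeError as exc:
    error = exc

print(repr(Resistor('10-232-1412', 'honhai', 10)))
print(error)
```

The rule is enforced at class definition time, so a mistake shows up when the module is imported rather than when the first repr is printed.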

Was this all worth it? Probably not in this case. You could have given the work of writing the boilerplate to an intern. But if you are creating a library, then you have to create these object systems.

Now if you really want to be clever, you could use Python's introspective nature to build your own object system, as shown below.

In [None]:
from inspect import signature

class Resistor:
    def __init__(self, number, manufacturer, resistance):
        self.number = number
        self.manufacturer = manufacturer
        self.resistance = resistance
    
    def __repr__(self):
        fields = signature(self.__init__).parameters
        values = ', '.join(repr(getattr(self, f)) for f in fields)
        return f'{type(self).__name__}({values})'
    
class Potentiometer(Resistor):
    def __init__(self, number, manufacturer, resistance, min_resistance, max_resistance):
        if not min_resistance <= resistance <= max_resistance:
            raise ValueError('resistance out of bounds')
        self.min_resistance = min_resistance
        self.max_resistance = max_resistance
        super().__init__(number, manufacturer, resistance)
        
p = Potentiometer('10-232-1412', 'honhai', 15, 10, 20)
p 

Potentiometer('10-232-1412', 'honhai', 15, 10, 20)

The problem with the above approach is read-only fields. A read-only field is one that cannot be assigned a new value after initialization.

In [None]:
class A:
    def __init__(self, name):
        self.name = name
a = A('kushaj')

a.name = 'something else' # We would like this to be disallowed, but it succeeds

In [None]:
# To achieve the above we use @property
class A:
    def __init__(self, name):
        self.name_ = name
    
    @property
    def name(self):
        return self.name_

a = A('kushaj')
a.name = 'something else'

AttributeError: can't set attribute

The protocols create a vocabulary. Say we define a network and ask the question: what is the length of the network? You implement that logic using `__len__`.

In [None]:
class Network:
    def __init__(self, *connections):
        self.connections = connections
        self.elements = {x.number: x for uv in connections
                                     for x in uv if x is not None}
        
    def __len__(self):
        return len(self.elements)
    
    def __getitem__(self, number):
        return self.elements[number]
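
The class above is not exercised in the notes, so here is a minimal usage sketch. It assumes each connection is a pair of components, with `None` standing for an open end, which is how the comprehension in `__init__` reads; the part numbers are made up.

```python
from collections import namedtuple

Resistor = namedtuple('Resistor', 'number manufacturer resistance')

class Network:
    def __init__(self, *connections):
        self.connections = connections
        self.elements = {x.number: x for uv in connections
                                     for x in uv if x is not None}

    def __len__(self):
        return len(self.elements)

    def __getitem__(self, number):
        return self.elements[number]

r1 = Resistor('10-423-1234', 'honhai', 1)
r2 = Resistor('10-423-1249', 'samsung', 5)

# Assumed layout: (u, v) pairs, None marks an unconnected end.
net = Network((None, r1), (r1, r2), (r2, None))

print(len(net))                       # len() dispatches to __len__
print(net['10-423-1249'].resistance)  # [] dispatches to __getitem__
```

Each resistor appears in two connections but is counted once, because the elements are keyed by part number.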

The `__hash__` method in Python is intended to be used by the Python interpreter, so don't use it to implement your own hash function. Instead write a custom `hash()` method.

Let's take a simple report: we have done the circuit analysis before and after a change and figured out what the currents are.

In [None]:
before = {'i1': 10, 'i2': 15, 'i3': 10,          'i5': 1, }
after  = {'i1': 14, 'i2': 14,           'i4': 5, 'i5': 50,}

Now we might want to compare before and after, so we generate a report as follows.

In [None]:
delt = '\N{greek capital letter delta}'
print(f'{"":<5} {"before":>8} {"after":>8} {f"|{delt}|":>8} {f"% {delt}":>8}')
for k in sorted(set(before) & set(after)):
    bef, aft = before[k], after[k]
    abs_diff, pct_diff = abs(aft - bef), abs(aft - bef) / bef
    print(f'{k:<5} {bef:>8.0f} {aft:>8.0f} {abs_diff:>8.2f} {pct_diff*100:>8.2f}%')

        before    after      |Δ|      % Δ
i1          10       14     4.00    40.00%
i2          15       14     1.00     6.67%
i5           1       50    49.00  4900.00%


We can add `*` or `**` to the end of each row to flag large differences.

In [None]:
delt = '\N{greek capital letter delta}'
print(f'{"":<5} {"before":>8} {"after":>8} {f"|{delt}|":>8} {f"% {delt}":>8}')
for k in sorted(set(before) & set(after)):
    bef, aft = before[k], after[k]
    abs_diff, pct_diff = abs(aft - bef), abs(aft - bef) / bef
    flag = ''
    if pct_diff > .5:
        flag = '**'
    elif pct_diff > .1:
        flag = '*'
    print(f'{k:<5} {bef:>8.0f} {aft:>8.0f} {abs_diff:>8.2f} {pct_diff*100:>8.2f}% {flag}')

        before    after      |Δ|      % Δ
i1          10       14     4.00    40.00% *
i2          15       14     1.00     6.67% 
i5           1       50    49.00  4900.00% **


The flagging mechanism may be important in various other places, so we want to make an abstraction of it. As a data scientist, your first response would be to create a function.

In [None]:
def get_flag(pct_diff):
    if pct_diff > .5:
        return '**'
    elif pct_diff > .1:
        return '*'
    return ''
delt = '\N{greek capital letter delta}'
print(f'{"":<5} {"before":>8} {"after":>8} {f"|{delt}|":>8} {f"% {delt}":>8}')
for k in sorted(set(before) & set(after)):
    bef, aft = before[k], after[k]
    abs_diff, pct_diff = abs(aft - bef), abs(aft - bef) / bef
    flag = get_flag(pct_diff)
    print(f'{k:<5} {bef:>8.0f} {aft:>8.0f} {abs_diff:>8.2f} {pct_diff*100:>8.2f}% {flag}')

        before    after      |Δ|      % Δ
i1          10       14     4.00    40.00% *
i2          15       14     1.00     6.67% 
i5           1       50    49.00  4900.00% **


The above is a good way to solve the problem. But just for now let us consider another approach, where we create a rangedict object.

In [None]:
class rangedict(dict):
    def __missing__(self, key):
        for (lower, upper), value in ((k, v) for k, v in self.items() if isinstance(k, tuple)):
            if lower <= key < upper:
                return value
        raise KeyError(f'cannot find {key} in ranges')
    
flags = rangedict({
    (0,            0.1,): '',
    (0.1,          0.5,): '*',
    (0.5, float('inf'),): '**',
})

delt = '\N{greek capital letter delta}'
print(f'{"":<5} {"before":>8} {"after":>8} {f"|{delt}|":>8} {f"% {delt}":>8}')
for k in sorted(set(before) & set(after)):
    bef, aft = before[k], after[k]
    abs_diff, pct_diff = abs(aft - bef), abs(aft - bef) / bef
    print(f'{k:<5} {bef:>8.0f} {aft:>8.0f} {abs_diff:>8.2f} {pct_diff*100:>8.2f}% {flags[pct_diff]}')

        before    after      |Δ|      % Δ
i1          10       14     4.00    40.00% *
i2          15       14     1.00     6.67% 
i5           1       50    49.00  4900.00% **
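
To see how `__missing__` behaves on its own, here is a small sketch that probes the rangedict directly. The `'n/a'` key is made up for illustration; it shows that exact keys are served by the normal dict lookup and never reach `__missing__`.

```python
class rangedict(dict):
    def __missing__(self, key):
        # Only called when the normal dict lookup fails.
        for k, value in self.items():
            if isinstance(k, tuple):
                lower, upper = k
                if lower <= key < upper:
                    return value
        raise KeyError(f'cannot find {key} in ranges')

flags = rangedict({
    (0,            0.1): '',
    (0.1,          0.5): '*',
    (0.5, float('inf')): '**',
})
flags['n/a'] = '?'   # hypothetical exact key, bypasses __missing__

results = [flags[0.05], flags[0.4], flags[49.0], flags['n/a']]
boundary = flags[0.1]   # ranges are half-open: lower <= key < upper

try:
    flags[-1]
    error = None
except KeyError as exc:
    error = exc

print(results, repr(boundary), error)
```

Note that the half-open ranges put exactly 0.1 in the `*` bucket, whereas `get_flag` with its strict `>` comparison would return an empty flag for the same value.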


Now we try to answer the question: what is the difference between `__call__` and `__getitem__`?

In [None]:
class T:
    def __call__(self, x):    return x*2
    def __getitem__(self, x): return x*2
    def f(self, x):           return x*2
    
obj = T()

obj(10), obj[10], obj.f(10)

(20, 20, 20)

Now we might not want to use the `f()` method, as that adds a new function. So we can look at the assumptions of call and getitem to see which best suits our problem.

Difference number 1: `__getitem__` receives exactly one argument (a comma-separated subscript arrives as a single tuple), whereas `__call__` can take many. Also, you cannot use keyword arguments with `__getitem__`.

In [None]:
class T:
    def __call__(self, x, y, *, mode='...'): 
        return x*2 + y
    def __getitem__(self, xy): 
        x, y = xy
        return x*2 + y
    def f(self, x, y):
        return x*2 + y
    
obj = T()
obj(10, 10), obj[10, 20], obj.f(10, 20)

(30, 40, 40)
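
What the two protocols actually receive can be sketched with a class that simply echoes its arguments. The `Probe` class is a made-up helper for this illustration.

```python
class Probe:
    def __getitem__(self, key):
        # Always exactly one argument, whatever was written in the brackets.
        return key

    def __call__(self, *args, **kwargs):
        return args, kwargs

probe = Probe()

single = probe[10]        # a plain value
pair = probe[10, 20]      # packed into one tuple
window = probe[1:3]       # slice syntax only exists inside brackets
called = probe(10, 20, mode='fast')   # positional and keyword arguments

print(single, pair, window, called)
```

So square brackets trade keyword arguments for slice syntax, which is one way to decide between the two for a given interface.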

Why @classmethod? Consider a simple network.

In [None]:
class Network:
    def __init__(self, *resistors):
        self.resistors = resistors
net = Network()

net

<__main__.Network at 0x7fd857364990>

What if I provide a filename? That is very realistic, as I may load the resistor info from a file.

In [None]:
class Network:
    def __init__(self, *resistors, filename=None):
        self.resistors = resistors
        if filename is not None:
            with open(filename) as f:
                ...
net = Network()

net

<__main__.Network at 0x7fd8573fb350>

**Note**:- Don't write code like the above; it is not worth creating your own object system. My recommendation is that `__init__` is just boilerplate, so leave it as boilerplate. Your `__init__` should almost never do anything fancier than setting attributes, checking that the attributes are valid, and constructing derived information. Reading a file there is a terrible idea: if I have different file formats, I will need a separate if condition for each one.

So instead you should use a classmethod. A classmethod receives the class itself, from which you can construct whatever you want.

In [None]:
class Network:
    def __init__(self, *resistors):
        self.resistors = resistors
    
    @classmethod
    def from_file(cls, filename):
        with open(filename) as f:
            ...
    
    @classmethod
    def from_database(cls, filename):
        return cls(...)

net = Network.from_file('network.json')
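
The bodies above are elided, so here is one hedged, runnable sketch of an alternate constructor. The method name `from_json_file`, the JSON layout (a list of objects with `number`, `manufacturer` and `resistance` keys) and the `LoggedNetwork` subclass are all assumptions made for the example.

```python
import json
import os
import tempfile
from collections import namedtuple

Resistor = namedtuple('Resistor', 'number manufacturer resistance')

class Network:
    def __init__(self, *resistors):
        self.resistors = resistors

    @classmethod
    def from_json_file(cls, filename):
        # Assumed file layout: a JSON list of resistor records.
        with open(filename) as f:
            records = json.load(f)
        # Build through cls, so subclasses get instances of themselves.
        return cls(*(Resistor(**record) for record in records))

class LoggedNetwork(Network):
    pass

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'network.json')
    with open(path, 'w') as f:
        json.dump([
            {'number': '10-423-1234', 'manufacturer': 'honhai', 'resistance': 1},
            {'number': '10-423-1249', 'manufacturer': 'samsung', 'resistance': 5},
        ], f)
    net = Network.from_json_file(path)
    sub = LoggedNetwork.from_json_file(path)

print(len(net.resistors), type(sub).__name__)
```

`__init__` stays pure boilerplate, and each new input format becomes one more named constructor instead of one more branch.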

Diving deeper into the object model, let us look at the namedtuple example again.

In [None]:
from collections import namedtuple
Resistor = namedtuple('Resistor', 'number manufacturer resistance')
r = Resistor('10-123-1242', 'taiyo', -10)
r

Resistor(number='10-123-1242', manufacturer='taiyo', resistance=-10)

The problem is that we entered a negative resistance and nothing checks for it. So we can subclass the namedtuple and add the check there.

In [None]:
class Resistor(namedtuple('ResistorBase', 'number manufacturer resistance')):
    def __new__(cls, number, manufacturer, resistance=10):
        if resistance < 0:
            raise ValueError('resistance must be positive')
        return super().__new__(cls, number, manufacturer, resistance)
    
r = Resistor('10-123-1242', 'taiyo', -10)
r

ValueError: resistance must be positive

Now what if we don't want these to be immutable objects? For example, we may have a potentiometer whose resistance can change.

In [None]:
class Resistor:
    def __init__(self, number, manufacturer, resistance):
        if resistance < 0:
            raise ValueError('resistance must be positive')
        self.number, self.manufacturer, self.resistance = number, manufacturer, resistance
        
    def get_resistance(self):
        return self.resistance
    
    def set_resistance(self, value):
        if value < 0:
            raise ValueError('resistance must be positive')
        self.resistance = value
        
r = Resistor('10-123-1242', 'taiyo', 10)
r.set_resistance(2)
r.get_resistance()

2

Now we would have to write a getter and setter like this for every attribute. This is where the @property decorator comes into play.

In [None]:
class Resistor:
    def __init__(self, number, manufacturer, resistance):
        self.number, self.manufacturer, self.resistance = number, manufacturer, resistance
    
    @property
    def resistance(self):
        return self._resistance
    
    @resistance.setter
    def resistance(self, value):
        if value < 0:
            raise ValueError('resistance must be positive')
        self._resistance = value
        
r = Resistor('10-123-1242', 'taiyo', 10)
r.resistance = 2

r.resistance

2
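
A short sketch of what the property version buys us: the same check now runs for assignments and for construction, because `__init__` assigns through the property as well.

```python
class Resistor:
    def __init__(self, number, manufacturer, resistance):
        # This assignment goes through the setter below.
        self.number, self.manufacturer, self.resistance = number, manufacturer, resistance

    @property
    def resistance(self):
        return self._resistance

    @resistance.setter
    def resistance(self, value):
        if value < 0:
            raise ValueError('resistance must be positive')
        self._resistance = value

r = Resistor('10-123-1242', 'taiyo', 10)
errors = []

try:
    r.resistance = -5          # rejected on assignment
except ValueError as exc:
    errors.append(str(exc))

try:
    Resistor('10-123-1242', 'taiyo', -1)   # rejected on construction
except ValueError as exc:
    errors.append(str(exc))

print(r.resistance, errors, sorted(vars(r)))
```

The failed assignment leaves the old value in place, and the instance only ever stores `_resistance`; `resistance` lives on the class as the property.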

Let us get deeper into object construction. A class in its most basic form is constructed as follows:

In [None]:
class T:
    pass

We know we can create an object from class T, but the class statement itself is dynamic code. This isn't C++, where a class has no real runtime representation and exists only as a compile-time description that lets the compiler translate member accesses into direct offset loads. In Python, building the class is code that runs at runtime. We can check this with the `dis` module, which disassembles code.

In [None]:
from dis import dis
def f():
    return x + y
dis(f)

  3           0 LOAD_GLOBAL              0 (x)
              2 LOAD_GLOBAL              1 (y)
              4 BINARY_ADD
              6 RETURN_VALUE


Now if we try to create a class inside a function we see a `LOAD_BUILD_CLASS`.

In [None]:
def f():
    class T:
        pass
dis(f)

  2           0 LOAD_BUILD_CLASS
              2 LOAD_CONST               1 (<code object T at 0x7fd29858fc00, file "<ipython-input-3-398fb3a15e53>", line 2>)
              4 LOAD_CONST               2 ('T')
              6 MAKE_FUNCTION            0
              8 LOAD_CONST               2 ('T')
             10 CALL_FUNCTION            2
             12 STORE_FAST               0 (T)
             14 LOAD_CONST               0 (None)
             16 RETURN_VALUE

Disassembly of <code object T at 0x7fd29858fc00, file "<ipython-input-3-398fb3a15e53>", line 2>:
  2           0 LOAD_NAME                0 (__name__)
              2 STORE_NAME               1 (__module__)
              4 LOAD_CONST               0 ('f.<locals>.T')
              6 STORE_NAME               2 (__qualname__)

  3           8 LOAD_CONST               1 (None)
             10 RETURN_VALUE


`LOAD_BUILD_CLASS` loads the base mechanism that is responsible for creating classes in Python. So if we write our own version of it, we can hook every class construction in the entire runtime of our program.

In [None]:
from builtins import __build_class__
def __build_class__(*args, bc=__build_class__, **kwargs):
    print(f'__build_class__({args!r}, {kwargs!r})')
    return bc(*args, **kwargs)
import builtins
builtins.__build_class__ = __build_class__

class T:
    pass

__build_class__((<function T at 0x7fbce82db7a0>, 'T'), {})


The above example changes the default class-building step so that a line is printed whenever a class is built. This is not very useful by itself, because if I now import some library, it will print that message for every class the library constructs. The hook exists for cases where you want to change memory allocation or do some debugging.
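
If you do use the hook for debugging, one way to limit the noise is to install it only around the code you care about and restore the original afterwards. The sketch below records class names instead of printing them; the function name `tracing_build_class` and the `seen` list are made up for the example.

```python
import builtins

original = builtins.__build_class__
seen = []

def tracing_build_class(func, name, *bases, **kwargs):
    # Record the class name, then delegate to the real builder.
    seen.append(name)
    return original(func, name, *bases, **kwargs)

builtins.__build_class__ = tracing_build_class
try:
    class T:
        pass
finally:
    # Always restore, so later class definitions are not traced.
    builtins.__build_class__ = original

class U:
    pass

print(seen)
```

Only `T` is recorded; `U` is built after the original function has been put back.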

More useful is the metaclass mechanism.

In [None]:
class M(type):
    def __new__(cls, name, bases, body):
        print(f'M.__new__({cls!r}, {name!r}, {bases!r}, {body!r})')
        return super().__new__(cls, name, bases, body)
    
    def __init__(self, name, bases, body):
        print(f'M.__init__({self!r}, {name!r}, {bases!r}, {body!r})')
        super().__init__(name, bases, body)
        
class T(metaclass=M):
    pass

M.__new__(<class '__main__.M'>, 'T', (), {'__module__': '__main__', '__qualname__': 'T'})
M.__init__(<class '__main__.T'>, 'T', (), {'__module__': '__main__', '__qualname__': 'T'})


A metaclass is basically a hook into the class construction process. In other words, when you write `class T(metaclass=M)` there are two hook points for how that type is constructed: one called `__new__` and the other `__init__`. (There is a third, `__call__`.) As you can see, because of the hooks in `__new__` and `__init__`, constructing the class T printed those lines.

We can do some fancy stuff here. One of the things we can do is provide the dictionary that is used for processing the body of the class, which is done by the `__prepare__` method.

In [None]:
class M(type):
    def __new__(cls, name, bases, body):
        print(f'M.__new__({cls!r}, {name!r}, {bases!r}, {body!r})')
        return super().__new__(cls, name, bases, body)
    
    def __init__(self, name, bases, body):
        print(f'M.__init__({self!r}, {name!r}, {bases!r}, {body!r})')
        super().__init__(name, bases, body)
        
    @staticmethod
    def __prepare__(name, bases):
        print(f'M.__prepare__({name!r}, {bases!r})')
        return {}
    
class T(metaclass=M):
    pass

M.__prepare__('T', ())
M.__new__(<class '__main__.M'>, 'T', (), {'__module__': '__main__', '__qualname__': 'T'})
M.__init__(<class '__main__.T'>, 'T', (), {'__module__': '__main__', '__qualname__': 'T'})
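
As one illustration of why you might supply your own dictionary, the sketch below has `__prepare__` return a dict subclass that refuses to bind the same name twice in a class body. The `NoDuplicates` class and the `Broken` example are assumptions made for this sketch, and `__prepare__` is written as a classmethod here rather than a staticmethod.

```python
class NoDuplicates(dict):
    def __setitem__(self, key, value):
        # The class body is executed with this mapping as its namespace.
        if key in self:
            raise TypeError(f'{key!r} is defined twice in the class body')
        super().__setitem__(key, value)

class M(type):
    @classmethod
    def __prepare__(mcls, name, bases):
        return NoDuplicates()

    def __new__(mcls, name, bases, body):
        # Hand type.__new__ a plain dict copy of the namespace.
        return super().__new__(mcls, name, bases, dict(body))

class Resistor(metaclass=M):
    def resistance(self):
        return 10

try:
    class Broken(metaclass=M):
        def resistance(self):
            return 10
        def resistance(self):   # silently overwrites in a normal class
            return 20
    error = None
except TypeError as exc:
    error = exc

print(Resistor().resistance(), error)
```

In an ordinary class the second definition would quietly replace the first; with the custom namespace the mistake is reported when the class statement runs.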
