# The Journey Matters More than the Destination
## Python Weirdness with Metaclasses and Descriptors

**RSE Webinar - 30/01/2019**  
**Dr. Chris Cave-Ayland - Senior Research Computing Engineer - University of Southampton**  
**c.i.cave-ayland@soton.ac.uk**  
**@ccaveayland**  

# Introduction

<img src="attachment:image.png" alt="Drawing" style="width: 600px;"/>

In [1]:
from simtk import openmm

nbf = openmm.NonbondedForce()
for i in range(nbf.getNumParticles()):
    charge, sigma, epsilon = nbf.getParticleParameters(i)

In [None]:
for p in nbf.particles:
    print(p.charge)

# The Difference Between Python and C++

It's worth exploring why this stylistic difference exists between these two languages. An example:

In [3]:
class SimpleClass(object):
    def __init__(self):
        self.value = 0

sc = SimpleClass()
print(sc.value)
sc.value = 5
print(sc.value)

0
5


Let's say I go away and use this class. I write a bunch of code that depends on it. Later I decide I want to embellish this class and add some range checking to the value. Suddenly a simple attribute isn't good enough anymore. Instead I need a function that is called to set the value and that can also perform the range checking.

In [4]:
class SimpleClass(object):
    def __init__(self):
        self.value = 0
    
    def setValue(self, new_value):
        assert new_value < 100, "Value must be less than 100"
        self.value = new_value
        
sc = SimpleClass()
sc.setValue(5)
sc.setValue(100)

AssertionError: Value must be less than 100

The problem is that swapping to function calls would break all of my existing code. In Python there is a way around this but first let's look at how C++ handles this problem.

## C++ Style

In C++ you would never write your class like the above to begin with. Instead you would design your class with encapsulation.

In [5]:
class CppSimpleClass(object):
    """A simple class written in an 'encapsulated' C++ style.
    """
    def __init__(self):
        self._value = 0
        
    def getValue(self):
        return self._value
    
    def setValue(self, val):
        assert val < 100, "value must be less than 100"
        self._value = val

csc = CppSimpleClass()
print(csc.getValue())
csc.setValue(5)
print(csc.getValue())
csc.setValue(100)

0
5


AssertionError: value must be less than 100

The use of these get and set routines to access underlying private attributes is the hallmark of encapsulation. It's a universal design pattern in C++ for the simple reason that it makes your classes robust to later changes. By using a function from the start you have more wiggle room to implement required pieces of functionality.

## Python Style

In Python you don't (always) have to use encapsulation because you have properties. Going back to our starting example, we can implement the desired range checking without having to alter how the class is actually used.

In [6]:
class SimpleClass(object):
    def __init__(self):
        self._value = 0
        
    @property
    def value(self):
        return self._value
    
    @value.setter
    def value(self, val):
        assert val < 100, "value must be less than 100"
        self._value = val

# you get range checking but the class still looks the same externally
sc = SimpleClass()
print(sc.value)
sc.value = 5
print(sc.value)
sc.value = 100

0
5


AssertionError: value must be less than 100

# The property object in Python

You might have encountered properties before and thought of them as a handy feature. But this feature of allowing function calls to be presented as simple attributes fundamentally changes the design constraints of the language. Instead of get and set routines being necessary, they should be avoided wherever possible.

Before moving on I'd like to draw out more clearly what is going on in the previous example. It's important to note that @property is not some piece of magical syntax that makes functions look like data. There are two things going on. The first is the use of @, which denotes a decorator; for simplicity let's take it out of the equation entirely:

In [7]:
class SimpleClass(object):
    def __init__(self):
        self._value = 0
        
    #@property
    def value(self):
        return self._value
    value = property(value)
    
#setter routine omitted for brevity
sc = SimpleClass()
print(sc.value)

0
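
For completeness, the setter omitted above can be attached without decorator syntax in the same way, by passing both functions to property directly. A minimal sketch; the helper names `_get_value` and `_set_value` are chosen here purely for illustration:

```python
class SimpleClass:
    def __init__(self):
        self._value = 0

    def _get_value(self):
        return self._value

    def _set_value(self, val):
        assert val < 100, "value must be less than 100"
        self._value = val

    # Same effect as stacking @property and @value.setter on two functions
    value = property(_get_value, _set_value)


sc = SimpleClass()
sc.value = 5      # routed through _set_value, so the range check runs
print(sc.value)   # routed through _get_value
```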


Hopefully this makes something important a bit clearer. A property is just a Python object (like everything else). If we want to understand how it works then the best thing to do is to just look at its code. Below is a pure Python implementation of property.

In [8]:
class Property(object):
    "Emulate PyProperty_Type() in Objects/descrobject.c"

    def __init__(self, fget=None, fset=None, fdel=None, doc=None):
        self.fget = fget
        self.fset = fset
        self.fdel = fdel
        if doc is None and fget is not None:
            doc = fget.__doc__
        self.__doc__ = doc

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if self.fget is None:
            raise AttributeError("unreadable attribute")
        return self.fget(obj)

    def __set__(self, obj, value):
        if self.fset is None:
            raise AttributeError("can't set attribute")
        self.fset(obj, value)

    def __delete__(self, obj):
        if self.fdel is None:
            raise AttributeError("can't delete attribute")
        self.fdel(obj)

    def getter(self, fget):
        return type(self)(fget, self.fset, self.fdel, self.__doc__)

    def setter(self, fset):
        return type(self)(self.fget, fset, self.fdel, self.__doc__)

    def deleter(self, fdel):
        return type(self)(self.fget, self.fset, fdel, self.__doc__)

The crux of how a property works is in the methods, \_\_get\_\_, \_\_set\_\_ and \_\_delete\_\_. Classes that customise one or more of these methods are called descriptors and they work by exploiting the **attribute lookup** behaviour of Python classes. This is just a fancy way of describing what happens when using the "." syntax. How attribute lookup works is actually quite complicated but we'll look at a small part of it (see [4] for the full story). The attributes of a Python object are stored in its \_\_dict\_\_.

In [9]:
sc = SimpleClass()
print('Instance attributes:', sc.__dict__)

Instance attributes: {'_value': 0}


That's quite sparse! The \_\_dict\_\_ contains only those values unique to an individual instance, usually those things set in \_\_init\_\_. Everything else is stored in the \_\_dict\_\_ of an instance's class.

In [10]:
print('Class attributes:', SimpleClass.__dict__)

Class attributes: {'__module__': '__main__', '__init__': <function SimpleClass.__init__ at 0x7fbb2b6458c8>, 'value': <property object at 0x7fbb2b64c548>, '__dict__': <attribute '__dict__' of 'SimpleClass' objects>, '__weakref__': <attribute '__weakref__' of 'SimpleClass' objects>, '__doc__': None}


In a simpler world you might expect that attribute lookup would be straightforward: first look for an attribute in the instance \_\_dict\_\_ and, if it's not there, try the class \_\_dict\_\_.

Unfortunately descriptors make things more complicated. The class \_\_dict\_\_ (and those of its base classes) is actually checked before the instance \_\_dict\_\_. If the attribute is found there and it defines \_\_get\_\_ together with \_\_set\_\_ (which makes it a data descriptor), then the result of the lookup will be:

In [11]:
# sc.value returns
SimpleClass.__dict__['value'].__get__(sc, SimpleClass)

0

This is just the default lookup behaviour. It too can be customised, through the \_\_getattribute\_\_ method of a class.
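
The ordering matters in practice. A descriptor that defines \_\_set\_\_ (or \_\_delete\_\_) is called a data descriptor and takes priority over the instance \_\_dict\_\_, whereas one that defines only \_\_get\_\_ can be shadowed by it. The sketch below illustrates this with two throwaway classes, `NonData` and `Data`, which are invented for this example:

```python
class NonData:
    """Defines only __get__, so an instance __dict__ entry can shadow it."""
    def __get__(self, obj, objtype=None):
        return "from descriptor"


class Data(NonData):
    """Also defines __set__, so it takes priority over the instance __dict__."""
    def __set__(self, obj, value):
        raise AttributeError("read-only")


class Demo:
    non_data = NonData()
    data = Data()


d = Demo()
# Write straight into the instance __dict__, bypassing normal attribute setting
d.__dict__["non_data"] = "from instance"
d.__dict__["data"] = "from instance"

print(d.non_data)  # from instance   (instance __dict__ beats a non-data descriptor)
print(d.data)      # from descriptor (data descriptor beats the instance __dict__)
```

A property always defines \_\_set\_\_, so it falls into the second group, which is why `sc.value` cannot be hijacked by an entry in `sc.__dict__`.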

Returning to the CppSimpleClass example we now have the tools we need to effectively write a Pythonic equivalent without too much work.

In [12]:
class SimpleWrapper(object):
    # this would need to be a subclass of CppSimpleClass in Python 2
    def __init__(self):
        self._value = 0
    
    value = property(CppSimpleClass.getValue, CppSimpleClass.setValue)

sw = SimpleWrapper()
print(sw.value)
sw.value = 5
print(sw.value)
sw.value = 100

0
5


AssertionError: value must be less than 100

Progress! Still a little simplistic however. There is a second slightly different context in which get and set routines are used in C++ classes. Consider the below:

In [13]:
class CppClass(CppSimpleClass):
    def __init__(self):
        self._value = 0
        self._array = [0, 1, 2]
        
    def getNumArrayElements(self):
        return len(self._array)
        
    def getArrayElement(self, index):
        return self._array[index]
    
    def setArrayElement(self, index, value):
        assert value < 100, "Element must be less than 100"
        self._array[index] = value
        
cc = CppClass()
for i in range(cc.getNumArrayElements()):
    print(cc.getArrayElement(i))

0
1
2


It's a similar case but a bit different. Can we get away with using a property here?

In [14]:
class Wrapper(SimpleWrapper):
    def __init__(self):
        self._value = 0
        self._array = []
        
    array = property(CppClass.getArrayElement, CppClass.setArrayElement)
    
w = Wrapper()
print(w.array)

TypeError: getArrayElement() missing 1 required positional argument: 'index'

# Creating a New Descriptor

So we need a custom descriptor that can handle this form of getter and setter. Something like this can do the job:

In [15]:
class ArrayProperty(property):
    def __init__(self, flen, fget, fset, doc=None):
        super().__init__(fget, fset, doc=doc)
        self.flen = flen
    
    def __get__(self, obj, objtype=None):
        self.parent = obj
        return self
    
    def __len__(self):
        return self.flen(self.parent)
    
    def __getitem__(self, index):
        return self.fget(self.parent, index)
        
    def __setitem__(self, index, value):
        self.fset(self.parent, index, value)
    
    def __set__(self, obj, value):
        raise AttributeError('Cannot set attribute')

We can borrow from property for the \_\_init\_\_ method at least. Otherwise things are rearranged relative to property: fget and fset are now associated with the \_\_getitem\_\_ and \_\_setitem\_\_ methods. The \_\_set\_\_ method is used to prevent this attribute from being overwritten; at this point it is only a wrapper for a C++ array and we don't want to allow it to be disconnected from the underlying array. The \_\_get\_\_ method is used to set the parent attribute, which captures the particular instance to pass to the unbound fget and fset methods.
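
One consequence of storing the instance on the descriptor is worth noting: the descriptor object lives on the class and is shared by every instance, so a reference obtained from one instance is silently redirected by a later access on another. A possible alternative, sketched below, has \_\_get\_\_ return a small per-instance view instead. The names `ArrayView`, `ViewArrayProperty` and `Holder` are hypothetical and not part of the talk's code:

```python
class ArrayView:
    """Sequence-like view bound to one particular parent instance."""
    def __init__(self, parent, flen, fget, fset):
        self._parent = parent
        self._flen, self._fget, self._fset = flen, fget, fset

    def __len__(self):
        return self._flen(self._parent)

    def __getitem__(self, index):
        return self._fget(self._parent, index)

    def __setitem__(self, index, value):
        self._fset(self._parent, index, value)


class ViewArrayProperty:
    """Descriptor that hands out a fresh ArrayView on every access."""
    def __init__(self, flen, fget, fset):
        self.flen, self.fget, self.fset = flen, fget, fset

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class itself
        return ArrayView(obj, self.flen, self.fget, self.fset)

    def __set__(self, obj, value):
        raise AttributeError("Cannot set attribute")


class Holder:
    """Stand-in for a wrapped class with C++ style accessors."""
    def __init__(self, items):
        self._array = list(items)

    def getNumArrayElements(self):
        return len(self._array)

    def getArrayElement(self, index):
        return self._array[index]

    def setArrayElement(self, index, value):
        self._array[index] = value

    array = ViewArrayProperty(getNumArrayElements, getArrayElement, setArrayElement)


a, b = Holder([1, 2]), Holder([10, 20, 30])
view_a = a.array
b.array                         # touching another instance leaves view_a alone
print(len(view_a), view_a[0])   # 2 1
```

The cost is one small object allocated per attribute access; the simpler shared-state version is fine as long as the result of an access is used immediately.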

In [16]:
class Wrapper(SimpleWrapper):
    def __init__(self):
        super().__init__()
        self._array = [0, 1, 2] # init with some values so we have something to work with
        
    array = ArrayProperty(flen=CppClass.getNumArrayElements, 
                          fget=CppClass.getArrayElement,
                          fset=CppClass.setArrayElement)
    
w = Wrapper()
print("Accessed array attribute -", w.array)
print("Number of array elements -", len(w.array))
print("Array Element -           ", w.array[0])
w.array[0] = 20
print("Updated array element -   ", w.array[0])
print("Underlying list -         ", w._array)
w.array[0] = 100

Accessed array attribute - <__main__.ArrayProperty object at 0x7fbb2b6a6648>
Number of array elements - 3
Array Element -            0
Updated array element -    20
Underlying list -          [20, 1, 2]


AssertionError: Element must be less than 100

Even with an implementation this basic we can create a compelling example. 

In [17]:
w = Wrapper()
for element in w.array:
    print(element)

# instead of
cc = CppClass()
for i in range(cc.getNumArrayElements()):
    print(cc.getArrayElement(i))

0
1
2
0
1
2
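
The `for` loop works even though ArrayProperty defines no \_\_iter\_\_. When that method is missing, Python falls back to calling \_\_getitem\_\_ with 0, 1, 2, ... until an IndexError is raised; in the wrapper that exception comes from the underlying list. A minimal sketch of the fallback using a made-up `Squares` class:

```python
class Squares:
    """Defines __getitem__ only; there is no __iter__ and no __len__."""
    def __init__(self, n):
        self._n = n

    def __getitem__(self, index):
        if not 0 <= index < self._n:
            raise IndexError(index)  # signals the end of iteration
        return index * index


print(list(Squares(4)))  # [0, 1, 4, 9]
```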


# Metaclasses

In principle I now had all the tools to create the desired Pythonic version of the target library. The bottleneck was that I'd have to write one of the above for every single class in OpenMM, which was just too boring to contemplate. So the next question that occurred to me was: how could I automate the above process as much as possible?

The essence of the question was can I programmatically create classes?

Turns out you can and the mechanism to do so is via metaclasses. There is a default metaclass, which is type. You can set the metaclass for a new class, like so:

In [18]:
class Wrapper(SimpleWrapper, metaclass=type):
    def __init__(self):
        super().__init__()
        self._array = [0, 1, 2] # init with some values so we have something to work with
        
    array = ArrayProperty(flen=CppClass.getNumArrayElements,
                          fget=CppClass.getArrayElement,
                          fset=CppClass.setArrayElement)

In the above case we've trivially set the metaclass to the default. As with the decorator example, the above is just nicer syntax for its equivalent, which is below:

In [19]:
Wrapper = type('Wrapper', # class name
               (SimpleWrapper,), # tuple of base classes
               {'__init__': CppClass.__init__, # dict of additional attributes, 
                'array': ArrayProperty(CppClass.getNumArrayElements, CppClass.getArrayElement, CppClass.setArrayElement)})

w = Wrapper()
print(w.value)
for element in w.array:
    print(element)

0
0
1
2


Here type is being used as a metaclass. In fact, type is the default metaclass of all Python classes. 

To start with, let's define metaclasses with a useful lie: **A metaclass is a callable that returns a class object**. To put it another way, a metaclass is to a class as a class is to an instance:

In [20]:
w = Wrapper()
print(isinstance(w, Wrapper))
print(isinstance(Wrapper, type))

True
True


Just as you can call a class to create an instance, you can call a metaclass to create a class. If a metaclass is just a callable then we could use a standard function:

In [21]:
def Pythonize(name, bases, attrs):
    base = bases[0]
    attrs.update({'__init__': base.__init__,
                 'value': property(base.getValue, base.setValue),
                 'array': ArrayProperty(base.getNumArrayElements, base.getArrayElement, base.setArrayElement)})
    return type(name, (), attrs)

class Wrapper(CppClass, metaclass=Pythonize):
    pass

# equivalent to the class statement above: call the metaclass directly
Wrapper = Pythonize('Wrapper', (CppClass,), {})

w = Wrapper()
print(w.value)
for element in w.array:
    print(element)

0
0
1
2


This works... but has some shortcomings. In particular if you use a bog standard function as a metaclass it is not compatible with isinstance:

In [22]:
isinstance(Wrapper, Pythonize)

TypeError: isinstance() arg 2 must be a type or tuple of types
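
A related shortcoming is that the function leaves no trace on the class it creates: the result is an ordinary instance of type, so subclasses are built by type as well and never pass through the function. A small sketch, with `make_class`, `Base` and `Child` invented for illustration:

```python
def make_class(name, bases, attrs):
    """A plain function used as a 'metaclass'; it just delegates to type."""
    attrs["made_by_function"] = True
    return type(name, bases, attrs)


class Base(metaclass=make_class):
    pass


class Child(Base):  # make_class is NOT called for this subclass
    pass


print(type(Base))                          # <class 'type'>
print("made_by_function" in vars(Base))    # True
print("made_by_function" in vars(Child))   # False: the function never ran for Child
```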

For this reason it is better to write a metaclass using the below form:

In [23]:
class Pythonize(type):
    def __new__(metaclass, name, bases, attrs):
        base = bases[0] # assuming only one base class here
        return super().__new__(
            metaclass,
            name,
            (),
            {'__init__': base.__init__,
             'value': property(base.getValue, base.setValue),
             'array': ArrayProperty(base.getNumArrayElements, base.getArrayElement, base.setArrayElement)})

class Wrapper(CppClass, metaclass=Pythonize):
    pass

w = Wrapper()
print(w.value)
for element in w.array:
    print(element)
print(isinstance(Wrapper, Pythonize))

0
0
1
2
True


The Pythonize "class" here can be used as a metaclass because it inherits from type. This method relies on hooking more explicitly into the creation behaviour of Python classes. The \_\_new\_\_ method is the class constructor in Python. You can think of it as a bit like \_\_init\_\_, but it's invoked first.
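
To make the ordering concrete, the sketch below records when each hook of a metaclass runs. The `Tracing` metaclass and the `calls` list exist only for this illustration:

```python
calls = []


class Tracing(type):
    def __new__(mcs, name, bases, attrs):
        calls.append("__new__")    # the class object does not exist yet
        return super().__new__(mcs, name, bases, attrs)

    def __init__(cls, name, bases, attrs):
        calls.append("__init__")   # the finished class is passed in as cls
        super().__init__(name, bases, attrs)


class Example(metaclass=Tracing):
    pass


print(calls)                     # ['__new__', '__init__']
print(type(Example) is Tracing)  # True
```

Because \_\_new\_\_ receives the name, bases and attribute dict before the class exists, it is the natural place to swap out base classes or inject attributes, which is what Pythonize does.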

All that remains is to write some code for the metaclass that is able to look through the attributes of the base class and determine how to group them up as arguments for the appropriate descriptor. This logic is a bit convoluted so I've hidden it in a blackbox:

In [24]:
import attribute_grouping

class Pythonize(type):
    def __new__(mcs, name, bases, attrs):
        base = bases[0]
    
        # look through base class attributes and group getters and setters
        # for arrays and individual values
        array_attributes, non_array_attributes = attribute_grouping.blackbox(base)

        print(array_attributes)
        
        attrs['__init__'] = base.__init__
        
        for attr, args in non_array_attributes.items():
            attrs[attr] = property(*args)
            
        for attr, args in array_attributes.items():
            attrs[attr] = ArrayProperty(*args)
            
        return super().__new__(mcs, name, (), attrs)

Wrapper = Pythonize('Wrapper', (CppClass,), {})

w = Wrapper()
print(w.value)
for element in w.array:
    print(element)
print(isinstance(Wrapper, Pythonize))

{'array': [<function CppClass.getNumArrayElements at 0x7fbb2b645c80>, <function CppClass.getArrayElement at 0x7fbb2b645bf8>, <function CppClass.setArrayElement at 0x7fbb2b645d90>]}
0
0
1
2
True
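
The contents of the blackbox are not shown in this talk. Purely as an illustration of the idea, the sketch below groups accessors by the naming convention used in CppClass (getX/setX for single values, getNumXElements/getXElement/setXElement for arrays). The function `group_accessors` is hypothetical, is not the attribute_grouping module, and a real library such as OpenMM would need different patterns and exceptions:

```python
import inspect
import re


class CppClass:
    """Stand-in with the same accessor names as the CppClass used above."""
    def __init__(self):
        self._value = 0
        self._array = [0, 1, 2]

    def getValue(self):
        return self._value

    def setValue(self, val):
        self._value = val

    def getNumArrayElements(self):
        return len(self._array)

    def getArrayElement(self, index):
        return self._array[index]

    def setArrayElement(self, index, value):
        self._array[index] = value


def _attr_name(stem):
    """'Array' -> 'array', 'GlobalVariable' -> 'globalVariable'."""
    return stem[0].lower() + stem[1:]


def group_accessors(cls):
    """Group accessor methods by naming convention (hypothetical helper)."""
    methods = dict(inspect.getmembers(cls, inspect.isfunction))
    arrays, scalars, claimed = {}, {}, set()

    # Arrays: getNum<Stem>Elements + get<Stem>Element (+ optional set<Stem>Element)
    for name, flen in methods.items():
        match = re.fullmatch(r"getNum(\w+)Elements", name)
        if match is None:
            continue
        stem = match.group(1)
        get_name, set_name = f"get{stem}Element", f"set{stem}Element"
        if get_name in methods:
            arrays[_attr_name(stem)] = [flen, methods[get_name], methods.get(set_name)]
            claimed.update({name, get_name, set_name})

    # Single values: any remaining get<Stem> (+ optional set<Stem>)
    for name, fget in methods.items():
        match = re.fullmatch(r"get(\w+)", name)
        if match is None or name in claimed:
            continue
        stem = match.group(1)
        scalars[_attr_name(stem)] = [fget, methods.get(f"set{stem}")]

    return arrays, scalars


array_attributes, non_array_attributes = group_accessors(CppClass)
print(sorted(array_attributes), sorted(non_array_attributes))  # ['array'] ['value']
```

The returned lists are ordered to match the argument order of ArrayProperty (flen, fget, fset) and property (fget, fset), so they can be unpacked with `*args` as in the metaclass above.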


Finally I had a simple programmatic way to convert however many classes are in OpenMM to their Pythonic equivalents. All that was needed was a bit of logic represented by the below pseudocode.

In [None]:
# for class in simtk.openmm:
#     new_class = Pythonize(class)
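
The pseudocode might be fleshed out along the following lines. This is a sketch under stated assumptions: `fake_lib` is a stand-in module built on the spot rather than OpenMM, and `MiniPythonize` is a cut-down metaclass invented here that only handles getX/setX pairs.

```python
import inspect
import types


class MiniPythonize(type):
    """Cut-down stand-in metaclass: turns get<X>/set<X> pairs into properties."""
    def __new__(mcs, name, bases, attrs):
        base = bases[0]  # assuming only one base class here
        attrs = dict(attrs, __init__=base.__init__)
        for attr_name, func in vars(base).items():
            if attr_name.startswith("get") and len(attr_name) > 3 and callable(func):
                stem = attr_name[3:]
                prop_name = stem[0].lower() + stem[1:]
                # A missing setter yields a read-only property
                attrs[prop_name] = property(func, vars(base).get("set" + stem))
        return super().__new__(mcs, name, (), attrs)


class Force:
    def __init__(self):
        self._cutoff = 1.0

    def getCutoff(self):
        return self._cutoff

    def setCutoff(self, value):
        self._cutoff = value


class Integrator:
    def __init__(self):
        self._stepSize = 0.002

    def getStepSize(self):
        return self._stepSize


# Stand-in for the real library: a module object holding two C++ style classes
fake_lib = types.ModuleType("fake_lib")
fake_lib.Force, fake_lib.Integrator = Force, Integrator

# The loop from the pseudocode: wrap every class found in the module
wrapped = {
    name: MiniPythonize(name, (cls,), {})
    for name, cls in inspect.getmembers(fake_lib, inspect.isclass)
}

force = wrapped["Force"]()
force.cutoff = 2.5
print(force.cutoff, wrapped["Integrator"]().stepSize)  # 2.5 0.002
```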

# The End Result - KAPOW

KAPOW is a Pythonic OpenMM Wrapper: the result of applying something similar to the above to the OpenMM Python API.

In [25]:
from simtk import openmm
import kapow

# Using OpenMM Natively
nbf = openmm.NonbondedForce()
nbf.addParticle(0, 0, 0)
for i in range(nbf.getNumParticles()):
    print(nbf.getParticleParameters(i))
print()

# Using KAPOW
knbf = kapow.NonbondedForce.Wrap(nbf)
for particle in knbf.particles:
    print(particle)

[Quantity(value=0.0, unit=elementary charge), Quantity(value=0.0, unit=nanometer), Quantity(value=0.0, unit=kilojoule/mole)]

Particle(charge=Quantity(value=0.0, unit=elementary charge), sigma=Quantity(value=0.0, unit=nanometer), epsilon=Quantity(value=0.0, unit=kilojoule/mole))


In [26]:
# Using OpenMM Natively
amd = openmm.AMDIntegrator(0, 0, 0)
for i in range(amd.getNumGlobalVariables()):
    print(amd.getGlobalVariableName(i), amd.getGlobalVariable(i))

print()

# Using KAPOW
amd = kapow.AMDIntegrator(0, 0, 0)
print(amd.globalVariables.items())

alpha 0.0
E 0.0

(('alpha', 0.0), ('E', 0.0))


So what's the problem? It turns out OpenMM is a moving target that interferes with the blackbox aspect above. Even different minor versions of OpenMM have a different set of exceptions that require manual attention and tweaking.

# Conclusions

* You can make Python do weird things by interfering with how the attributes of classes are accessed (descriptors) and how classes are created (metaclasses).
* These may seem niche but fundamentally allow a tremendous amount of design flexibility in Python as a language.
* Descriptors are widely invoked through the property object; metaclasses are used widely by projects such as Django.
* Whilst the application to OpenMM is a bit impractical the actual inner workings of the Pythonization project may be more generally useful.

# Resources
[1] https://openmm.org - The OpenMM simulation package  
[2] https://github.com/cc-a/kapow - The Pythonic OpenMM Wrapper  
[3] https://docs.python.org/3.6/howto/descriptor.html - The Python documentation on Descriptors  
[4] https://blog.ionelmc.ro/2015/02/09/understanding-python-metaclasses/ - An outstanding and detailed guide to Descriptors and Metaclasses