What are descriptors?

Descriptors are the mechanism that underpins properties, slots, and even functions.

__The Problem__

In [1]:
# Having property functions and setters for each attribute gets tedious and repetitive
class Point:
    @property
    def x(self):
        return self._x
    
    @x.setter
    def x(self, value):
        self._x = int(value)
        
    @property
    def y(self):
        return self._y
    
    @y.setter
    def y(self, value):
        self._y = int(value)
        
    def __init__(self, x, y):
        self.x = x
        self.y = y

__The Descriptor Protocol__

The descriptor protocol consists of 4 methods (a descriptor does not have to implement all of them):
- `__get__`
- `__set__`
- `__delete__`
- `__set_name__`

There are also 2 categories of descriptors:
- non-data descriptors: only implement `__get__`
- data descriptors: implement `__set__` and/or `__delete__`
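As a quick check of these two categories, the sketch below inspects a plain function and a `property` taken from a class dictionary. The `Greeter` class and its members are hypothetical names used only for illustration.

```python
class Greeter:  # hypothetical class, for illustration only
    def hello(self):
        return "hello"

    @property
    def name(self):
        return "greeter"


# Read the raw objects from the class dictionary so no descriptor call happens
func = Greeter.__dict__["hello"]   # plain function object
prop = Greeter.__dict__["name"]    # property object

# Functions implement only __get__, so they are non-data descriptors
function_is_non_data = hasattr(func, "__get__") and not hasattr(func, "__set__")

# property objects implement __set__ and __delete__, so they are data descriptors
property_is_data = hasattr(prop, "__set__") and hasattr(prop, "__delete__")

# Calling __get__ by hand is what turns a function into a bound method
g = Greeter()
bound = func.__get__(g, Greeter)
print(function_is_non_data, property_is_data, bound())
```

This is why methods work at all: attribute lookup finds the function on the class and calls its `__get__` to bind it to the instance.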

__Using a Descriptor Class__

In [2]:
from datetime import datetime

In [3]:
class TimeUTC:
    def __get__(self, instance, owner_class):
        return datetime.utcnow().isoformat()

In [4]:
class Logger:
    current_time = TimeUTC()

In [5]:
l = Logger()
l.current_time

'2022-02-05T16:54:48.035985'

**The `__get__` Method**

Logger defines a single instance of TimeUTC as a *class attribute*. But because TimeUTC implements `__get__`, Python calls that method whenever `current_time` is looked up, instead of returning the TimeUTC object itself.

We can access current_time from the class or from an instance:
- if accessed from an instance, the 'instance' parameter is that instance; if accessed from the class, it is `None`
- the 'owner_class' parameter is the class that owns the TimeUTC instance

We can return different values from `__get__` depending on whether it was called from the class or from an instance.

If called from the class, return the descriptor instance.

If called from an instance, return the attribute value.

In [6]:
class TimeUTC:
    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        else:
            return datetime.utcnow().isoformat()
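A minimal sketch of both access paths follows. It assumes a timezone-aware `datetime.now(timezone.utc)` in place of `utcnow()`, since `utcnow()` is deprecated in recent Python versions; as a result the string ends in `+00:00`.

```python
from datetime import datetime, timezone


class TimeUTC:
    def __get__(self, instance, owner_class):
        if instance is None:
            # Accessed through the class: hand back the descriptor itself
            return self
        # Accessed through an instance: compute the attribute value
        # (assumption: timezone-aware variant of the call used above)
        return datetime.now(timezone.utc).isoformat()


class Logger:
    current_time = TimeUTC()


from_class = Logger.current_time        # the TimeUTC descriptor object
from_instance = Logger().current_time   # an ISO 8601 string

print(type(from_class).__name__)
print(type(from_instance).__name__)
```

Returning the descriptor on class access mirrors what `property` does, and makes it possible to reach the descriptor object for introspection.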

**The `__set__` Method**

The `__set__` method signature differs slightly from `__get__`:

```
def __set__(self, instance, value): ...
```

Since setters are always called from instances, there is no need for the 'owner_class' parameter.

__Caveat with Set/Delete/Get__

Descriptors are defined as class attributes, so every instance of the class shares the same descriptor object. Any value stored on the descriptor itself is therefore shared by all instances. This is mostly a problem for Set and Delete.

This is one of the reasons why we pass the 'instance' parameter to these dunder methods, so we can be aware of which instance we are storing values for.
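The sketch below shows the problem with a deliberately naive descriptor that stores the value on itself. `SharedValue` and `Point` are hypothetical names used only for this illustration.

```python
class SharedValue:
    """Naive descriptor: keeps one value on the descriptor, ignoring 'instance'."""

    def __set__(self, instance, value):
        # Stored on the descriptor object, which all Point instances share
        self._value = int(value)

    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        return getattr(self, "_value", None)


class Point:
    x = SharedValue()


p1, p2 = Point(), Point()
p1.x = 10
p2.x = 20   # overwrites the value that p1 sees as well

print(p1.x, p2.x)  # 20 20
```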

__Where to Store the Attribute Value__

Assuming the instances that own the attribute are hashable, we can create a dictionary in the data descriptor instance and use each instance as the key.

In [1]:
class IntegerValue:
    def __init__(self):
        self.data = {}
        
    def __set__(self, instance, value):
        self.data[instance] = int(value)
        
    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        else:
            return self.data.get(instance)

This however is not the best approach in its current form, since the dictionary holds an additional strong reference to each instance. As long as the descriptor (and therefore the class) exists, those instances will never be garbage collected.
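The leak can be observed with a short sketch. It assumes CPython's reference-counting behavior; `Point` and the `probe` weak reference are hypothetical helpers used only to check whether the object is still alive.

```python
import gc
import weakref


class IntegerValue:
    def __init__(self):
        self.data = {}

    def __set__(self, instance, value):
        self.data[instance] = int(value)   # strong reference to the instance

    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        return self.data.get(instance)


class Point:
    x = IntegerValue()


p = Point()
p.x = 10
probe = weakref.ref(p)   # does not keep p alive; used only to observe it

del p
gc.collect()

still_alive = probe() is not None     # the descriptor's dict keeps it alive
leaked_entries = len(Point.x.data)    # the entry is still there
print(still_alive, leaked_entries)
```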

__Weak References__

We can solve this issue by creating a weak reference to the object, which will not affect the reference count.

In [2]:
import weakref

In [3]:
class IntegerValue:
    def __init__(self):
        self.data = {}
        
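    def __set__(self, instance, value):
        # Use the weak reference itself as the key. Calling it (inst_ref())
        # would return the instance and store a strong reference again.
        inst_ref = weakref.ref(instance)
        self.data[inst_ref] = int(value)
        
    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        else:
            # A live weak reference hashes and compares like its referent
            return self.data.get(weakref.ref(instance))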

In [6]:
# The weakref module has a WeakKeyDictionary to accomplish what we did above.
# Once an object that has a weak reference is GC'd, its key will automatically be deleted
# from the dictionary
from weakref import WeakKeyDictionary

In [5]:
class IntegerValue:
    def __init__(self):
        self.data = WeakKeyDictionary()
        
    def __set__(self, instance, value):
        self.data[instance] = int(value)
        
    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        else:
            return self.data.get(instance)
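With `WeakKeyDictionary`, the entry disappears once the instance is gone. The sketch below assumes CPython, where the object is finalized as soon as its last strong reference is dropped; `Point` is a hypothetical owner class.

```python
import gc
from weakref import WeakKeyDictionary


class IntegerValue:
    def __init__(self):
        self.data = WeakKeyDictionary()

    def __set__(self, instance, value):
        self.data[instance] = int(value)   # key is held weakly

    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        return self.data.get(instance)


class Point:
    x = IntegerValue()


p = Point()
p.x = 10
entries_before = len(Point.x.data)   # one entry while p is alive

del p
gc.collect()   # not strictly needed on CPython, but harmless

entries_after = len(Point.x.data)    # entry removed automatically
print(entries_before, entries_after)
```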

__Improving the Weak Reference Approach__

Using a `WeakKeyDictionary` still requires that our instances be hashable. We can instead key the dictionary on the `id()` of the instance. Using `id` alone has some drawbacks: the dictionary can no longer remove keys for finalized objects automatically, and an `id` may be reused by a new object once the old one is gone.

Instead we can use a tuple of `(weak_ref, value)` for our dictionary values, and register a callback function that will remove the dead entry from the dictionary.

This gives all these benefits:
- instance specific storage
- doesn't use the instance itself for storage, so it works with classes that define `__slots__` (provided `__weakref__` is one of the slots)
- handles non-hashable objects
- keeps data storage mechanism clean

In [1]:
import weakref

class IntegerValue:
    def __init__(self):
        self.values = dict()
    
    def __set__(self, instance, value):
        self.values[id(instance)] = (weakref.ref(instance, self._remove_object), int(value))
        
    def __get__(self, instance, owner_class):
        if instance is None: 
            return self
        else:
            return self.values[id(instance)][1]
        
    def _remove_object(self, weak_ref):
        reverse_lookup = [k for k, v in self.values.items()
                         if v[0] is weak_ref]
        if reverse_lookup:
            key = reverse_lookup[0]
            del self.values[key]
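A usage sketch for this approach follows, using a hypothetical `Point` class that is both unhashable (it defines `__eq__` without `__hash__`) and slotted. It assumes CPython's immediate finalization, and the descriptor is restated in compact form so the block runs on its own.

```python
import gc
import weakref


class IntegerValue:
    def __init__(self):
        self.values = {}

    def __set__(self, instance, value):
        self.values[id(instance)] = (
            weakref.ref(instance, self._remove_object),
            int(value),
        )

    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        return self.values[id(instance)][1]

    def _remove_object(self, weak_ref):
        # Drop the entry whose stored weak reference just died
        for key, (ref, _) in list(self.values.items()):
            if ref is weak_ref:
                del self.values[key]
                break


class Point:
    # Slotted instances can only be weakly referenced if __weakref__ is a slot
    __slots__ = ("__weakref__",)
    x = IntegerValue()

    def __eq__(self, other):
        # Defining __eq__ without __hash__ makes instances unhashable
        return isinstance(other, Point) and self.x == other.x


p = Point()
p.x = "42"
stored = p.x                          # converted to int by __set__
entries_alive = len(Point.x.values)   # one entry while p exists

del p
gc.collect()
entries_after = len(Point.x.values)   # callback removed the dead entry
print(stored, entries_alive, entries_after)
```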

**The `__set_name__` Method**

A handy method introduced in Python 3.6. It is called once for each descriptor when the owning class is created, and it receives the owner class and the attribute name the descriptor was assigned to. It can be useful for better error messages, and for descriptors used for validation.

In [1]:
class ValidString:
    def __init__(self, min_len):
        self.min_len = min_len
        
    def __set_name__(self, owner_class, property_name):
        self.property_name = property_name
        
    def __set__(self, instance, value):
        if not isinstance(value, str):
            raise ValueError(f"{self.property_name} must be 'str'")
        if len(value) < self.min_len:
            raise ValueError(f"{self.property_name} must be at least {self.min_len} characters")
        
        instance.__dict__[self.property_name] = value
        
    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        else:
            return instance.__dict__.get(self.property_name, None)
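The sketch below shows how such a validating descriptor might be used. `Person` and its attribute names are hypothetical, and the descriptor is restated compactly (with `__get__` returning the stored value) so the block runs on its own.

```python
class ValidString:
    def __init__(self, min_len):
        self.min_len = min_len

    def __set_name__(self, owner_class, property_name):
        # Called by Python while the owning class is being created
        self.property_name = property_name

    def __set__(self, instance, value):
        if not isinstance(value, str):
            raise ValueError(f"{self.property_name} must be 'str'")
        if len(value) < self.min_len:
            raise ValueError(
                f"{self.property_name} must be at least {self.min_len} characters"
            )
        instance.__dict__[self.property_name] = value

    def __get__(self, instance, owner_class):
        if instance is None:
            return self
        return instance.__dict__.get(self.property_name, None)


class Person:  # hypothetical owner class
    first_name = ValidString(2)
    last_name = ValidString(3)


# The names were captured when the class body finished executing
captured_name = Person.first_name.property_name

p = Person()
p.first_name = "Al"          # passes validation

try:
    p.last_name = "Ng"       # too short for min_len=3
except ValueError as ex:
    error_message = str(ex)

print(captured_name)
print(error_message)
```

Because the descriptor knows its own attribute name, one class can serve any number of attributes while still producing an error message that names the offending field.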

__Property Value Lookup Resolution__

When we have a property (descriptor) called `x`, and we have an instance dictionary (`__dict__`) that also contains `x`, will Python use the instance dictionary value or the descriptor?

It depends on whether the descriptor is a data or non-data descriptor.

For data descriptors (`__set__` and/or `__delete__` defined), the descriptor always takes precedence over the instance dictionary.

For non-data descriptors (only `__get__` defined), Python looks in the instance dictionary first and falls back to the descriptor only if the name is not found there.
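Both rules can be checked by writing directly into an instance dictionary. `DataDesc`, `NonDataDesc`, and `Demo` are hypothetical names used only for this sketch.

```python
class DataDesc:
    def __get__(self, instance, owner_class):
        return "from data descriptor"

    def __set__(self, instance, value):
        # Ignore writes; defining __set__ is what makes this a data descriptor
        pass


class NonDataDesc:
    def __get__(self, instance, owner_class):
        return "from non-data descriptor"


class Demo:
    a = DataDesc()
    b = NonDataDesc()


d = Demo()
b_before = d.b   # nothing in the instance dict yet, so the descriptor is used

# Bypass normal attribute assignment and write to the instance dict directly
d.__dict__["a"] = "from instance dict"
d.__dict__["b"] = "from instance dict"

print(d.a)       # from data descriptor
print(b_before)  # from non-data descriptor
print(d.b)       # from instance dict
```

This precedence is what lets a data descriptor store its value in `instance.__dict__` under its own name without being shadowed by that entry.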