Descriptors are a way of reusing the same access logic in multiple attributes. For example, field types in ORMs such as the Django ORM and SQLAlchemy are descriptors, managing the flow of data from the fields in a database record to Python object attributes and vice versa.

A descriptor is a class that implements a protocol consisting of the `__get__`, `__set__`, and `__delete__` methods.
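The notebook never exercises `__delete__`, so here is a minimal sketch of a descriptor implementing all three protocol methods. The names `Traced`, `Point`, and the `log` list are hypothetical and exist only to record which method Python calls for each kind of access.

```python
class Traced:
    """Hypothetical descriptor implementing __get__, __set__ and __delete__."""

    def __init__(self, storage_name):
        self.storage_name = storage_name
        self.log = []  # records which protocol method ran, for illustration only

    def __get__(self, instance, owner):
        self.log.append('get')
        if instance is None:  # accessed via the class
            return self
        return instance.__dict__[self.storage_name]

    def __set__(self, instance, value):
        self.log.append('set')
        instance.__dict__[self.storage_name] = value

    def __delete__(self, instance):
        self.log.append('delete')
        del instance.__dict__[self.storage_name]


class Point:
    x = Traced('x')  # the descriptor instance is a class attribute


p = Point()
p.x = 3        # Traced.__set__(descriptor, p, 3)
print(p.x)     # Traced.__get__(descriptor, p, Point) returns 3
del p.x        # Traced.__delete__(descriptor, p)
print(Point.__dict__['x'].log)  # ['set', 'get', 'delete']
```

Reading `Point.__dict__['x']` fetches the descriptor without triggering `__get__`, which is why the log is not polluted by the final inspection.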

A property factory is a way to avoid repetitive coding of getters and setters by applying functional programming patterns. A property factory is a higher-order function that creates a parameterized set of accessor functions and builds a custom property instance from them, with closures to hold settings like the storage_name. The object-oriented way of solving the same problem is a descriptor class.

## A Simple Descriptor

A class implementing a `__get__`, a `__set__`, or a `__delete__` method is a descriptor.

In [1]:
class Quantity:
    
    def __init__(self, storage_name):
        self.storage_name = storage_name
    
    def __set__(self, instance, value):
        if value > 0:
            # Here, we must handle the managed instance __dict__ directly; trying to use the
            # setattr built-in would trigger the __set__ method again, leading to infinite
            # recursion.
            instance.__dict__[self.storage_name] = value
        else:
            raise ValueError("Value must be > 0")

In [2]:
class LineItem:
    weight = Quantity('weight') 
    price = Quantity('price') 
    
    def __init__(self, description, weight, price): 
        self.description = description
        self.weight = weight
        self.price = price
    
    def subtotal(self):
        return self.weight * self.price

In Example 20-1, each managed attribute has the same name as its storage attribute, and there is no special getter logic, so Quantity doesn’t need a `__get__` method.
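To see why the missing `__get__` is harmless, the sketch below repeats the set-only `Quantity` and `LineItem` from In [1] and In [2] so it runs on its own (the item name is just sample data). Since `Quantity` has no `__get__`, reading `raisins.weight` falls through to the value stored in the instance `__dict__` under the same name, while writes are still validated by `__set__`.

```python
class Quantity:
    """Set-only descriptor, as in In [1]."""

    def __init__(self, storage_name):
        self.storage_name = storage_name

    def __set__(self, instance, value):
        if value > 0:
            instance.__dict__[self.storage_name] = value
        else:
            raise ValueError("Value must be > 0")


class LineItem:
    weight = Quantity('weight')
    price = Quantity('price')

    def __init__(self, description, weight, price):
        self.description = description
        self.weight = weight
        self.price = price

    def subtotal(self):
        return self.weight * self.price


raisins = LineItem('Golden raisins', 10, 6.95)
# No __get__: the read is answered by the instance __dict__ entry
print(raisins.weight, vars(raisins))
try:
    raisins.weight = 0  # __set__ rejects non-positive values
except ValueError as err:
    print(err)  # Value must be > 0
```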

When coding a `__set__` method, you must keep in mind what the `self` and `instance` arguments mean: `self` is the descriptor instance, and `instance` is the managed instance. Descriptors managing instance attributes should store values in the managed instances. That’s why Python provides the `instance` argument to the descriptor methods.
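A minimal sketch of the mistake this warning is about, using a hypothetical, deliberately buggy `SharedQuantity` that stores the value on `self`. Because one descriptor instance serves every `Item`, all managed instances end up sharing a single value.

```python
class SharedQuantity:
    """Hypothetical BUGGY descriptor: stores the value on the descriptor (self)."""

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return self.value  # one value for every managed instance!

    def __set__(self, instance, value):
        self.value = value  # BUG: should store in the managed instance


class Item:
    weight = SharedQuantity()


a, b = Item(), Item()
a.weight = 10
b.weight = 99
print(a.weight)  # 99: b's assignment clobbered a's value
```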

A drawback of Example 20-1 is the need to repeat the names of the attributes when the descriptors are instantiated in the managed class body. It would be nice if the LineItem class could be declared like this:

```
class LineItem:
    weight = Quantity()
    price = Quantity()
    #remaining methods as before
```

The problem is that, as we saw in Chapter 8, the right-hand side of an assignment is executed before the variable exists. The expression `Quantity()` is evaluated to create a descriptor instance, and at that time there is no way for the code in the `Quantity` class to guess the name of the variable to which the descriptor will be bound (e.g., `weight` or `price`).

### Automatic Storage Attribute Names

In [3]:
class Quantity:
    
    __counter = 0
    
    def __init__(self):
        cls = self.__class__
        prefix = cls.__name__
        index = cls.__counter
        self.storage_name = '_{}#{}'.format(prefix, index)
        cls.__counter += 1
    
    def __get__(self, instance, owner):
        return getattr(instance, self.storage_name)
    
    def __set__(self, instance, value):
        if value > 0:
            setattr(instance, self.storage_name, value)
        else:
            raise ValueError("Value must be > 0")

In [4]:
class LineItem:
    weight = Quantity() 
    price = Quantity() 
    
    def __init__(self, description, weight, price): 
        self.description = description
        self.weight = weight
        self.price = price
    
    def subtotal(self):
        return self.weight * self.price

Here we can use the higher-level getattr and setattr built-ins to store the value instead of resorting to `instance.__dict__` because the managed attribute and the storage attribute have different names, so calling getattr on the storage attribute will not trigger the descriptor, avoiding the infinite recursion.

In [5]:
coconuts = LineItem("Brazilian coconut", 20, 17.95)
coconuts.weight, coconuts.price

(20, 17.95)

In [6]:
getattr(coconuts, '_Quantity#0'), getattr(coconuts, '_Quantity#1')

(20, 17.95)

If we wanted to follow the convention Python uses to do name mangling (e.g., `_LineItem__quantity0`) we’d have to know the name of the managed class (i.e., `LineItem`), but the body of a class definition runs before the class itself is built by the interpreter, so we don’t have that information when each descriptor instance is created.
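For reference, this is the name-mangling convention the note mentions: inside a class body, an identifier with two leading underscores (and at most one trailing underscore) is rewritten by the compiler to `_<ClassName><name>`. The attribute name `__quantity0` is hypothetical, chosen to match the note's example.

```python
class LineItem:
    def __init__(self):
        # Compiled as self._LineItem__quantity0 because we are inside LineItem
        self.__quantity0 = 10

    def read(self):
        return self.__quantity0  # also mangled, so it finds the same attribute


item = LineItem()
print(vars(item))   # {'_LineItem__quantity0': 10}
print(item.read())  # 10
```

The compiler can do this because it sees the `class LineItem` statement; a descriptor instance created by `Quantity()` in the class body gets no such information.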

Note that `__get__` receives three arguments: `self`, `instance`, and `owner`. The `owner` argument is a reference to the managed class (e.g., `LineItem`), and it’s handy when the descriptor is used to get attributes from the class. If a managed attribute, such as `weight`, is retrieved via the class as in `LineItem.weight`, the descriptor `__get__` method receives `None` as the value for the `instance` argument, so the call `getattr(None, '_Quantity#0')` fails. This explains the `AttributeError` in the next console session:

In [7]:
try:
    LineItem.weight
except AttributeError as err:
    print(str(err))

'NoneType' object has no attribute '_Quantity#0'


In [8]:
class Quantity:
    
    __counter = 0
    
    def __init__(self):
        cls = self.__class__
        prefix = cls.__name__
        index = cls.__counter
        self.storage_name = '_{}#{}'.format(prefix, index)
        cls.__counter += 1
    
    def __get__(self, instance, owner):
        if instance is None:
            return self
        return getattr(instance, self.storage_name)
    
    def __set__(self, instance, value):
        if value > 0:
            setattr(instance, self.storage_name, value)
        else:
            raise ValueError("Value must be > 0")
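The `LineItem` defined in In [4] still uses the earlier `Quantity`, so the fix is not exercised above. This sketch repeats the In [8] version together with a fresh `LineItem` so it runs on its own; the instance data is sample data.

```python
class Quantity:
    __counter = 0  # mangled to _Quantity__counter

    def __init__(self):
        cls = self.__class__
        self.storage_name = '_{}#{}'.format(cls.__name__, cls.__counter)
        cls.__counter += 1

    def __get__(self, instance, owner):
        if instance is None:
            return self  # accessed via the class: return the descriptor itself
        return getattr(instance, self.storage_name)

    def __set__(self, instance, value):
        if value > 0:
            setattr(instance, self.storage_name, value)
        else:
            raise ValueError("Value must be > 0")


class LineItem:
    weight = Quantity()
    price = Quantity()

    def __init__(self, description, weight, price):
        self.description = description
        self.weight = weight
        self.price = price


print(LineItem.weight)               # the Quantity descriptor, no AttributeError
print(LineItem.weight.storage_name)  # _Quantity#0
item = LineItem('sample', 1, 2)
print(vars(item))  # {'description': 'sample', '_Quantity#0': 1, '_Quantity#1': 2}
```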

Usually we do not define a descriptor in the same module where it’s used, but in a separate utility module designed to be used across the application—even in many applications, if you are developing a framework.

***Django model fields are descriptors.***

As implemented so far, the `Quantity` descriptor works pretty well. Its only real drawback is the use of generated storage names like `_Quantity#0`, which make debugging hard for users. Automatically assigning storage names that resemble the managed attribute names requires extra machinery: a class decorator or a metaclass in Python versions before 3.6, or the `__set_name__` special method available since Python 3.6.
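On Python 3.6 and later, the `__set_name__(self, owner, name)` hook tells each descriptor the name it was bound to, which removes the need to repeat names or generate counters. This is a minimal sketch; the `'_' + name` storage scheme is an assumption for illustration, chosen so the storage name differs from the managed attribute name and `getattr`/`setattr` cannot recurse into the descriptor.

```python
class Quantity:
    """Sketch for Python 3.6+: __set_name__ supplies the attribute name."""

    def __set_name__(self, owner, name):
        # Called by type.__new__ when the owner class is created, once for
        # each class attribute whose type defines __set_name__.
        self.storage_name = '_' + name  # hypothetical naming scheme

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return getattr(instance, self.storage_name)

    def __set__(self, instance, value):
        if value > 0:
            setattr(instance, self.storage_name, value)
        else:
            raise ValueError("Value must be > 0")


class LineItem:
    weight = Quantity()  # no name repeated here
    price = Quantity()

    def __init__(self, description, weight, price):
        self.description = description
        self.weight = weight
        self.price = price


coconuts = LineItem('Brazilian coconut', 20, 17.95)
print(vars(coconuts))  # {'description': 'Brazilian coconut', '_weight': 20, '_price': 17.95}
```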

### Property Factory Versus Descriptor Class

In [9]:
def quantity():
    try:
        quantity.counter += 1
    except AttributeError:
        quantity.counter = 0  # first call: create the counter as a function attribute
    storage_name = '_{}:{}'.format('quantity', quantity.counter)
    
    def qty_getter(instance): 
        return getattr(instance, storage_name)
    
    def qty_setter(instance, value):
        if value > 0:
            setattr(instance, storage_name, value)
        else:
            raise ValueError('value must be > 0')
    
    return property(qty_getter, qty_setter)
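The factory is used in a class body just like a descriptor instance. This sketch repeats `quantity` (with the counter attribute spelled consistently) so it runs on its own; the `LineItem` data is sample data. Each call creates a new `property` whose closures capture a different `storage_name`.

```python
def quantity():
    try:
        quantity.counter += 1  # state kept as an attribute of the function
    except AttributeError:
        quantity.counter = 0
    storage_name = '_{}:{}'.format('quantity', quantity.counter)

    def qty_getter(instance):
        return getattr(instance, storage_name)

    def qty_setter(instance, value):
        if value > 0:
            setattr(instance, storage_name, value)
        else:
            raise ValueError('value must be > 0')

    return property(qty_getter, qty_setter)


class LineItem:
    weight = quantity()  # storage name '_quantity:0'
    price = quantity()   # storage name '_quantity:1'

    def __init__(self, description, weight, price):
        self.description = description
        self.weight = weight
        self.price = price


nutmeg = LineItem('Moluccan nutmeg', 8, 13.95)
print(nutmeg.weight, nutmeg.price)  # 8 13.95
print(sorted(vars(nutmeg)))         # ['_quantity:0', '_quantity:1', 'description']
```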

I prefer the descriptor class approach mainly for two reasons:

* A descriptor class can be extended by subclassing; reusing code from a factory function without copying and pasting is much harder.
* It’s more straightforward to hold state in class and instance attributes than in function attributes and closures as we had to do in Example 20-5.

To summarize, the property factory pattern is simpler in some regards, but the descriptor class approach is more extensible. It’s also more widely used.

### A New Descriptor Type

In [10]:
import abc

In [11]:
class AutoStorage: 
    __counter = 0
    
    def __init__(self):
        cls = self.__class__
        prefix = cls.__name__
        index = cls.__counter
        self.storage_name = '_{}#{}'.format(prefix, index)
        cls.__counter += 1
    
    def __get__(self, instance, owner):
        if instance is None:
            return self
        return getattr(instance, self.storage_name)
    
    def __set__(self, instance, value):
        setattr(instance, self.storage_name, value)

In [12]:
class Validated(abc.ABC, AutoStorage):
    
    def __set__(self, instance, value):
        value = self.validate(instance, value) 
        super().__set__(instance, value) 
    
    @abc.abstractmethod
    def validate(self, instance, value): 
        """return validated value or raise ValueError"""

In [13]:
class Quantity(Validated): 
    """a number greater than zero"""
    def validate(self, instance, value):
        if value <= 0:
            raise ValueError('value must be > 0')
        return value

In [14]:
class NonBlank(Validated):
    """a string with at least one non-space character"""
    
    def validate(self, instance, value):
        value = value.strip()
        if len(value) == 0:
            raise ValueError('value cannot be empty or blank')
        return value

In [15]:
class LineItem:
    weight = Quantity() 
    price = Quantity() 
    description = NonBlank()
    
    def __init__(self, description, weight, price): 
        self.description = description
        self.weight = weight
        self.price = price
    
    def subtotal(self):
        return self.weight * self.price
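A short sketch of the validating descriptors at work, repeating the In [11] to In [15] classes (docstrings trimmed) so it runs on its own; the product names are sample data. `NonBlank` strips surrounding whitespace before storing, and both validators raise `ValueError` for bad input.

```python
import abc


class AutoStorage:
    __counter = 0

    def __init__(self):
        cls = self.__class__
        self.storage_name = '_{}#{}'.format(cls.__name__, cls.__counter)
        cls.__counter += 1

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return getattr(instance, self.storage_name)

    def __set__(self, instance, value):
        setattr(instance, self.storage_name, value)


class Validated(abc.ABC, AutoStorage):
    def __set__(self, instance, value):
        value = self.validate(instance, value)  # validate first...
        super().__set__(instance, value)        # ...then delegate storage

    @abc.abstractmethod
    def validate(self, instance, value):
        """return validated value or raise ValueError"""


class Quantity(Validated):
    def validate(self, instance, value):
        if value <= 0:
            raise ValueError('value must be > 0')
        return value


class NonBlank(Validated):
    def validate(self, instance, value):
        value = value.strip()
        if len(value) == 0:
            raise ValueError('value cannot be empty or blank')
        return value


class LineItem:
    weight = Quantity()
    price = Quantity()
    description = NonBlank()

    def __init__(self, description, weight, price):
        self.description = description
        self.weight = weight
        self.price = price


item = LineItem('  Ceylon cinnamon  ', 3, 9.5)
print(repr(item.description))  # 'Ceylon cinnamon'

errors = []
for bad_args in [('   ', 1, 1), ('tea', 0, 1)]:
    try:
        LineItem(*bad_args)
    except ValueError as err:
        errors.append(str(err))
print(errors)  # ['value cannot be empty or blank', 'value must be > 0']
```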

The LineItem examples we’ve seen in this chapter demonstrate a typical use of descriptors to manage data attributes. Such a descriptor is also called an **overriding descriptor** because its `__set__` method overrides (i.e., interrupts and overrules) the setting of an attribute by the same name in the managed instance. However, there are also non-overriding descriptors.

## Overriding Versus Nonoverriding Descriptors

Recall that there is an important asymmetry in the way Python handles attributes. Reading an attribute through an instance normally returns the attribute defined in the instance, but if there is no such attribute in the instance, a class attribute will be retrieved. On the other hand, assigning to an attribute in an instance normally creates the attribute in the instance, without affecting the class at all.
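A minimal sketch of that asymmetry with a plain (non-descriptor) class attribute; the `Config` class is hypothetical.

```python
class Config:
    color = 'blue'  # ordinary class attribute, not a descriptor


c = Config()
print(c.color)  # blue: c has no 'color' of its own, so the class attribute is read
c.color = 'red'  # creates an instance attribute; Config is untouched
print(c.color, Config.color, vars(c))  # red blue {'color': 'red'}
```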

In [16]:
def cls_name(obj_or_cls):
     cls = type(obj_or_cls)
     if cls is type:
         cls = obj_or_cls
     return cls.__name__.split('.')[-1]

In [17]:
def display(obj):
    cls = type(obj)
    if cls is type:
        return '<class {}>'.format(obj.__name__)
    elif cls in [type(None), int]:
        return repr(obj)
    else:
        return '<{} object>'.format(cls_name(obj))

In [18]:
def print_args(name, *args):
    pseudo_args = ', '.join(display(x) for x in args)
    print('-> {}.__{}__({})'.format(cls_name(args[0]), name, pseudo_args))

In [19]:
class Overriding: 
    """a.k.a. data descriptor or enforced descriptor"""
    def __get__(self, instance, owner):
        print_args('get', self, instance, owner)

    def __set__(self, instance, value):
        print_args('set', self, instance, value)

In [20]:
class OverridingNoGet: 
    """an overriding descriptor without ``__get__``"""

    def __set__(self, instance, value):
        print_args('set', self, instance, value)

In [21]:
class NonOverriding: 
    """a.k.a. non-data or shadowable descriptor"""
    
    def __get__(self, instance, owner):
        print_args('get', self, instance, owner)

In [22]:
class Managed: 
    over = Overriding()
    over_no_get = OverridingNoGet()
    non_over = NonOverriding()
    
    def spam(self): 
        print('-> Managed.spam({})'.format(display(self)))

### Overriding Descriptor
A descriptor that implements the `__set__` method is called an overriding descriptor, because although it is a class attribute, a descriptor implementing `__set__` will override attempts to assign to instance attributes. This is how Example 20-2 was implemented. Properties are also overriding descriptors: if you don’t provide a setter function, the default `__set__` from the property class will raise `AttributeError` to signal that the attribute is read-only.
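A small sketch of a getter-only property behaving as an overriding descriptor; the `Account` class is hypothetical. Assignment is refused, and even a namesake entry planted directly in the instance `__dict__` does not shadow the property on reads.

```python
class Account:
    def __init__(self, balance):
        self._balance = balance

    @property
    def balance(self):  # getter only: no setter supplied
        return self._balance


acct = Account(100)
try:
    acct.balance = 0  # property.__set__ refuses because there is no setter
except AttributeError as err:
    print('AttributeError:', err)  # exact message varies across Python versions

acct.__dict__['balance'] = 0  # plant a namesake instance attribute
print(acct.balance)  # 100: the property still wins on read
```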

In [23]:
obj = Managed()
obj.over  # obj.over triggers the descriptor __get__ method, passing the managed instance
# obj as the second argument.

-> Overriding.__get__(<Overriding object>, <Managed object>, <class Managed>)


In [24]:
Managed.over  # Managed.over triggers the descriptor __get__ method, passing None as the
# second argument (instance).

-> Overriding.__get__(<Overriding object>, None, <class Managed>)


In [25]:
obj.over = 7  # Assigning to obj.over triggers the descriptor __set__ method, passing the value
# 7 as the last argument.

-> Overriding.__set__(<Overriding object>, <Managed object>, 7)


In [26]:
obj.over  # Reading obj.over still invokes the descriptor __get__ method.

-> Overriding.__get__(<Overriding object>, <Managed object>, <class Managed>)


In [27]:
obj.__dict__['over'] = 8  # Bypassing the descriptor, setting a value directly to the obj.__dict__.

In [28]:
vars(obj)

{'over': 8}

In [29]:
obj.over  # However, even with an instance attribute named over, the Managed.over
# descriptor still overrides attempts to read obj.over.

-> Overriding.__get__(<Overriding object>, <Managed object>, <class Managed>)


### Overriding Descriptor Without `__get__`

Usually, overriding descriptors implement both `__set__` and `__get__`, but it’s also possible to implement only `__set__`, as we saw in Example 20-1. In this case, only writing is handled by the descriptor. Reading the descriptor through an instance will return the descriptor object itself because there is no `__get__` to handle that access. If a namesake instance attribute is created with a new value via direct access to the instance `__dict__`, the `__set__` method will still override further attempts to set that attribute, but reading that attribute will simply return the new value from the instance, instead of returning the descriptor object. In other words, the instance attribute will shadow the descriptor, but only when reading.

In [30]:
obj.over_no_get  # This overriding descriptor doesn’t have a __get__ method, so reading
# obj.over_no_get retrieves the descriptor instance from the class.

<__main__.OverridingNoGet at 0x2033dac4880>

In [31]:
Managed.over_no_get  # The same thing happens if we retrieve the descriptor instance directly from the
# managed class.

<__main__.OverridingNoGet at 0x2033dac4880>

In [32]:
obj.over_no_get = 7  # Trying to set a value to obj.over_no_get invokes the __set__ descriptor
# method.

-> OverridingNoGet.__set__(<OverridingNoGet object>, <Managed object>, 7)


In [33]:
obj.over_no_get  # Because our __set__ doesn’t make changes, reading obj.over_no_get again
# retrieves the descriptor instance from the managed class.

<__main__.OverridingNoGet at 0x2033dac4880>

In [34]:
obj.__dict__['over_no_get'] = 9  # Going through the instance __dict__ to set an instance attribute named
# over_no_get.

In [35]:
obj.over_no_get  # Now that over_no_get instance attribute shadows the descriptor, but only for
# reading.

9

In [36]:
obj.over_no_get = 7  # Trying to assign a value to obj.over_no_get still goes through the descriptor
# __set__ method.

-> OverridingNoGet.__set__(<OverridingNoGet object>, <Managed object>, 7)


In [37]:
obj.over_no_get  # But for reading, that descriptor is shadowed as long as there is a namesake
# instance attribute.

9

### Nonoverriding Descriptor
If a descriptor does not implement `__set__`, then it’s a nonoverriding descriptor. Setting an instance attribute with the same name will shadow the descriptor, rendering it ineffective for handling that attribute in that specific instance. Methods are implemented as nonoverriding descriptors.
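Shadowing is what makes nonoverriding descriptors useful for caching. In this sketch, the hypothetical `LazyAttr` computes a value on first read and stores it in the instance `__dict__` under its own name; because there is no `__set__`, later reads find the instance attribute and never reach the descriptor again. The standard library's `functools.cached_property` (Python 3.8+) relies on the same mechanism.

```python
class LazyAttr:
    """Hypothetical nonoverriding descriptor: defines __get__ only."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, instance, owner):
        if instance is None:
            return self
        value = self.func(instance)
        # No __set__, so this namesake instance attribute shadows the
        # descriptor on every later read through this instance.
        instance.__dict__[self.name] = value
        return value


class Report:
    calls = 0  # counts how often the computation actually runs

    @LazyAttr
    def total(self):
        Report.calls += 1
        return sum(range(1_000))


r = Report()
print(r.total, r.total, Report.calls)  # 499500 499500 1
```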

In [38]:
obj = Managed()
obj.non_over  # obj.non_over triggers the descriptor __get__ method, passing obj as the second
# argument.

-> NonOverriding.__get__(<NonOverriding object>, <Managed object>, <class Managed>)


In [39]:
obj.non_over = 7  # Managed.non_over is a nonoverriding descriptor, so there is no __set__ to
# interfere with this assignment.

In [40]:
obj.non_over  # The obj now has an instance attribute named non_over, which shadows the
# namesake descriptor attribute in the Managed class.

7

In [41]:
Managed.non_over  # The Managed.non_over descriptor is still there, and catches this access via the
# class.

-> NonOverriding.__get__(<NonOverriding object>, None, <class Managed>)


In [42]:
del obj.non_over  # If the non_over instance attribute is deleted…

In [43]:
obj.non_over  # Then reading obj.non_over hits the __get__ method of the descriptor in the
# class, but note that the second argument is the managed instance.

-> NonOverriding.__get__(<NonOverriding object>, <Managed object>, <class Managed>)


### Overwriting a Descriptor in the Class
Regardless of whether a descriptor is overriding or not, it can be overwritten by assignment to the class. This is a monkey-patching technique, but in Example 20-12 the descriptors are replaced by integers, which would effectively break any class that depended on the descriptors for proper operation.

In [44]:
obj = Managed() 
Managed.over = 1 
Managed.over_no_get = 2
Managed.non_over = 3
obj.over, obj.over_no_get, obj.non_over

(1, 2, 3)

Although the reading of a class attribute can be controlled by a descriptor with `__get__` attached to the managed class, the writing of a class attribute cannot be handled by a descriptor with `__set__` attached to the same class.

In order to control the setting of attributes in a class, you have to attach descriptors to the class of the class: in other words, the metaclass. By default, the metaclass of user-defined classes is `type`, and you cannot add attributes to `type`.
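Although `type` itself is closed, a subclass of `type` can hold descriptors. This is a minimal sketch with hypothetical names (`PositiveClassAttr`, `Meta`, `Widget`, storage name `'_rate'`): because `rate` is an overriding descriptor on the metaclass, assigning `Widget.rate` goes through its `__set__`, and reading `Widget.rate` goes through its `__get__`.

```python
class PositiveClassAttr:
    """Hypothetical descriptor meant to be placed on a metaclass."""

    def __init__(self, storage_name):
        self.storage_name = storage_name

    def __get__(self, cls, metacls):
        if cls is None:  # accessed on the metaclass itself
            return self
        return cls.__dict__[self.storage_name]

    def __set__(self, cls, value):
        if value <= 0:
            raise ValueError('value must be > 0')
        # The storage name is not a descriptor on the metaclass, so this
        # setattr stores into the class __dict__ without recursion.
        setattr(cls, self.storage_name, value)


class Meta(type):
    rate = PositiveClassAttr('_rate')  # descriptor lives on the metaclass


class Widget(metaclass=Meta):
    pass


Widget.rate = 5  # handled by PositiveClassAttr.__set__
print(Widget.rate)  # 5
try:
    Widget.rate = -1  # rejected by the metaclass descriptor
except ValueError as err:
    print(err)  # value must be > 0
```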

## Methods Are Descriptors
A function defined in a class body becomes a bound method when accessed through an instance because all user-defined functions have a `__get__` method; therefore they operate as descriptors when attached to a class.

In [45]:
obj = Managed()
obj.spam  # Reading from obj.spam retrieves a bound method object.

<bound method Managed.spam of <__main__.Managed object at 0x000002033DAE25E0>>

In [46]:
Managed.spam  # But reading from Managed.spam retrieves a function.

<function __main__.Managed.spam(self)>

In [47]:
obj.spam = 7
obj.spam  # Assigning a value to obj.spam shadows the class attribute, rendering the spam
# method inaccessible from the obj instance.

7

Because functions do not implement `__set__`, they are nonoverriding descriptors.

The other key takeaway from Example 20-13 is that obj.spam and Managed.spam retrieve different objects. As usual with descriptors, the `__get__` of a function returns a reference to itself when the access happens through the managed class. But when the access goes through an instance, the `__get__` of the function returns a bound method object: a callable that wraps the function and binds the managed instance (e.g., obj) to the first argument of the function (i.e., self), like the functools.partial function does.
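A pure-Python approximation of that binding behavior, assuming `types.MethodType` as the bound-method factory (it is the type of bound methods). The wrapper class `PyFunction` and the `Greeter` example are hypothetical; real functions implement `__get__` in C, but the observable behavior for class and instance access matches.

```python
import functools
import types


class PyFunction:
    """Hypothetical stand-in that mimics how functions bind via __get__."""

    def __init__(self, func):
        self.func = func

    def __get__(self, instance, owner=None):
        if instance is None:
            return self.func  # class access: the plain function
        return types.MethodType(self.func, instance)  # instance access: bound method


class Greeter:
    def __init__(self, name):
        self.name = name

    @PyFunction
    def greet(self, greeting):
        return '{}, {}!'.format(greeting, self.name)


g = Greeter('Ada')
print(g.greet('Hello'))         # Hello, Ada!
print(Greeter.greet(g, 'Hi'))   # Hi, Ada!
print(g.greet.__self__ is g)    # True
# functools.partial gives a similar result by fixing the first argument
print(functools.partial(Greeter.greet, g)('Hey'))  # Hey, Ada!
```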

In [48]:
import collections

In [49]:
class Text(collections.UserString):
    
    def __repr__(self):
        return 'Text({!r})'.format(self.data)
    
    def reverse(self):
        return self[::-1]

In [50]:
word = Text('forward')
word

Text('forward')

In [51]:
word.reverse()

Text('drawrof')

In [52]:
Text.reverse(Text('backward'))

Text('drawkcab')

In [53]:
type(Text.reverse), type(word.reverse)  # Note the different types: a function and a method.

(function, method)

In [54]:
list(map(Text.reverse, ['repaid', (10, 20, 30), Text('stressed')]))
# Text.reverse operates as a function, even working with objects that are not
# instances of Text.

['diaper', (30, 20, 10), Text('desserts')]

In [55]:
Text.reverse.__get__(word)  # Any function is a nonoverriding descriptor. Calling its __get__ with an instance
# retrieves a method bound to that instance.

<bound method Text.reverse of Text('forward')>

In [56]:
Text.reverse.__get__(None, Text)  # Calling the function’s __get__ with None as the instance argument retrieves the
# function itself.

<function __main__.Text.reverse(self)>

In [57]:
word.reverse  # The expression word.reverse actually invokes Text.reverse.__get__(word),
# returning the bound method.

<bound method Text.reverse of Text('forward')>

In [58]:
word.reverse.__self__  # The bound method object has a __self__ attribute holding a reference to the
# instance on which the method was called.

Text('forward')

In [59]:
word.reverse.__func__ is Text.reverse  # The __func__ attribute of the bound method is a reference to the original
# function attached to the managed class.

True