Skip to content
kvorak edited this page Dec 4, 2015 · 1 revision

Abstract

This notebook describes a simple set of classes to provide comfortable ways to work with enumerated data. A common practice in software development is using constructs such as consts or write-only variables to define integers. In the end, what we are trying to accomplish is to use an integer to represent one value from among a set of values. That is what an index into a python immutable sequence does. So, I have created the two following classes to define an interface to the immutable sequence in the form of an Enumeration class and EnumeratedValue as a descriptor which enforces the sequence on a variable.

The Constructs

Enumeration

class Enumeration(tuple):
    """ Defines a tuple with attribute access

    This class defines a tuple that permits indexing with attribute access as a convenience function

    >>> colors = Enumeration(['CYAN', 'YELLOW', 'MAGENTA'])
    >>> colors.CYAN
    0
    >>> colors.RED
    None
    """

    def __getattr__(self, attr):
        if attr in self:
            return self.index(attr)

It is fairly common practice to see statements like myObject = myClass(status_code=status.OK) where status refers to a namespace and OK is a hardcoded integer value. In fact, a lot of python packages will do that be defining these constants in a module. So status.py would read

OK = 0
ERROR = 1
WARNING = 2
CRITICAL = 3

and so on. It is just easier for developers to remember that the status value needs to be one of OK, ERROR, WARNING, or CRITICAL than to remember which integer value (which would be used internally most of the time) corresponds to the status code. The purpose of the Enumeration class is to provide an interface in python to existing data structures that are already well suited to serving this need.

The Immutable Sequence

Python provides an immutable sequence in the form of the tuple. By using an immutable sequence, we can guarantee two things: that the list of values cannot be changed or altered and that it will remain in order. This means that, given a list of strings the correlation between a value and an integer is guaranteed to be preserved. Given x = tuple('zero','one','two'):

>>> x[0]
'zero'
>>> x.index('one')
1

The only problem is that, while it works for the same purpose as the constants above, it isn't as 'comfortable' to write status_code=status.index('OK').

Result

The Enumeration class simple extends the tuple and provides an implementation of __getattr__ that permits access using python's normal attribute accessors.

This means that we don't need status.py anymore and can simply declare:

status = Enumeration(['OK', 'ERROR', 'WARNING', 'CRITICAL'])

and the same effect is accomplished.

EnumeratedValue

class EnumeratedValue(object):
    """ A Descriptor describing an int which must be a valid index into a specified tuple

    This class is a data-descriptor which, when instantiated by an immutable sequence, raises
    a ValueError if the integer value represented is not a valid index into the sequence

    >>> foreground = EnumeratedValue(colors)
    >>> foreground
    0
    >>> foreground = 1
    >>> colors[foreground]
    "YELLOW"
    >>> foreground = EnumeratedValue(colors, colors.MAGENTA)
    >>> foreground
    2
    """

    def __init__(self, enumeration_options, initial_value=0):
        self.val = initial_value
        self.options = tuple(enumeration_options)

    def __get__(self, obj, objtype):
        return self.val

    def __set__(self, obj, val):
        if isinstance(val, int) and val in range(0, len(self.options)):
            self.val = val
        elif val in self.options:
            self.val = self.options.index(val)
        else:
            raise ValueError("{} is not in {}".format(val, self.options))

This is a fairly simple python data-descriptor that, when given an immutable sequence at instantiation, raises a ValueError if the variable it describes is ever set to a value that is not a valid index or value in the sequence and can be accessed via __get__ to always return its integer representation. This allows the variable to be used in the same what it always has, internally: as an integer representation of a value from among a static set of acceptable values.

The Doctest

colors = Enumeration(['CYAN', 'YELLOW', 'MAGENTA'])
print "With {}".format(colors)

class ColorValue(object):

    value = EnumeratedValue(colors)

# The EnumeratedValue defaults to the integer value 0
foreground = ColorValue()
print "foreground.value is: {}".format(foreground.value)

# It can be set with an integer, which can, in turn, be used to index the string representation
foreground.value = 1
print "The foreground color is now: {}".format(colors[foreground.value])

# It can also be set with the 'constant', but remains an integer
foreground.value = colors.MAGENTA
print "Now, it has changed to color #{}".format(foreground.value)

# And since that constant is just a mapping to a string...
foreground.value = 'CYAN'
print "It can even be changed to: {}".format(foreground.value)

# And it does not permit assignments outside of its range
try:
    foreground.value = 5
except ValueError as e:
    print "We can't do this because:\n{}".format(e.message)

# It even works if you pass an invalid constant
try:
    foreground.value = colors.RED
except ValueError as e:
    print "And we can't do this because:\n{}".format(e.message)
With ('CYAN', 'YELLOW', 'MAGENTA')
foreground.value is: 0
The foreground color is now: YELLOW
Now, it has changed to color #2
It can even be changed to: 0
We can't do this because:
5 is not in ('CYAN', 'YELLOW', 'MAGENTA')
And we can't do this because:
None is not in ('CYAN', 'YELLOW', 'MAGENTA')

The Real Application

Okay, so shortcuts are cool and the fact that it supports a familiar notation is convenient and comfortable. But that's a lot of work for little gain, right? And I have to import another package? One that isn't python-standard? That's not the reason this was written.

The fact that the classes are abstract and generic and that the values for the enumeration can be defined at runtime means that these structures can be used in template programming to define a value that will be constrained by a list of values that could be provided at the last possible moment. You might not hard-code colors in the doctest above, but rather import the list from a configuration or settings file. Maybe it was accessed from a server at a remote point or loaded from a database. It won't matter. As long as the list of strings can be provided before ColorValue is built by the python interpreter, the value can be treated as an integer, and the string value is always recoverable. As a bonus, should the value ever go out of range, a ValueError will be raised.

And since the integer/string correlation is immutably bound, you can even accept string values from users to store in the variable without doing any kind of lookup. This allows for the options to be passed to user as a list of strings, they can select one and send it back and the value (still just an integer with a descriptor around it) can be assigned a string literal and the mapping will be interpreted backwards with perfect accuracy.

Possible Further Extensions and Uses

  • An Enumeration that accepts tuple([tuple(str('key'), object_instance), ...]) and returns the instance instead of an integer when enumeration_instance.key is accessed.
  • Better type checking on Enumeration. As written, the entire __getattr__ function, the reason for the class to exist, is useless if the tuple isn't a sequence of string values. We should raise a TypeError exception if the class recieves anything else on instantiation.
  • Adapt the EnumeratedValue so that it can accessed to return an integer instead of an instace of EnumeratedValue even when used as a normal variable and not as a class attribute.