Skip to content
This repository

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Assumptions

This is a draft for a new language feature in Cython.

It loosely fills the same role as "templates" does in other languages, however it does not result in seperate types in a type system, but rather constitute assumptions made on a per-variable basis.

Example 1:

#!python
cdef Array(len=10) x = get_my_array()
print x.len
Here, x is declared as holding "an Array object with has the len attribute set to 10". The effects are:
  • The assignment to x will fail if the object returned from get_my_array() does not have a len attribute set to 10.
  • Any reference to x.len in code below will be replaced compile-time with 10.

Example 2:

#!python
cdef Array(sorted=True) x = get_my_array()
x.binary_search(3)

Here, x.sorted is not a field that is available run-time, but rather is a name for an assumption contract. I.e., x.sorted can only be resolved compile-time. Such "virtual attributes" must always have a default value (in this case False) which is used by the Cython compiler if no assumptions are placed on it.

On the assignment, it is checked (by normal iteration) that x is already sorted, and if not an exception is raised and the assignment doesn't happen. Then, if binary_search is an inlineable function (in pxd file):

#!python
cdef Array:
    cdef inline object binary_search(self, needle):
        if not self.sorted: self.sort() # line (a)
        return self._search_sorted(needle)

then Cython might be able to simply remove line (a) when inlining. On the other hand, if the assumption wasn't made, "self.sorted" would default to False and the call to sort would happen, as one cannot know whether it is sorted.

Note that this is not a matter of optimizing away the if-test on line (a) -- it is rather a way of moving the expensive check to the place where the assignment to x happen, allowing syntax candy for "reusing" a check and keeping track of a condition. (Of course, this example is a little contrived and doesn't quite work, as it would be better to sort manually first and calling search_sorted manually too).

(Note: Cython-inlineable function in pxd file is not yet a feature in Cython, so this is hypothetically.)

Declaring possible assumptions for your class

Declaring assumption attributes

One option for this is to have new keywords:

#!python
cdef class Array:
    cdef can_make_assumptions int len
    cdef can_make_assumptions compile_time_only sorted = False

(exact strings/names are not final of course). The latter statement acts more or less like a constant declaration and doesn't result in run-time code.

Another option:

#!python
cdef class Array:
    cdef int len
    __cythoncanassume__ = [
        attribute_assumption("len"),
        virtual_assumption("sorted")
    ]

Please, first read [:DagSverreSeljebotn/WhyILikeSimpleGrammars: Why I Like Simple Grammars], then recoil in disgust. Also consider that this is something that will be written in for a select few, special classes -- verbosity (as in number of keystrokes) is not an issue at all (clarity is though).

The advantages to this seem to be that it could potentially also be used for non-cdef fields in normal Python classes. Also it makes it easy to have a default order, so that assumptions does not have to be named. More attributes might be needed for each assumption (see "Problems" section below). And finally, assumptions is something that definitely can carry over to Cython code one tries to run in the Python interpreter (where it would simply do a noop, and no harm).

The assume/unassume special functions

The functions __assume__ and __unassume__ are normal methods on the object, which are called when the object is assigned to/removed from variables on which assumptions has been placed.

Basically, the following:

#!python
cdef MyObj(len=10) x
o = MyArray(10)
x = o
print x.len
x = None

is translated into

#!python
cdef MyArray x
o = MyArray(10)
o.__assume__(len=10)
x = o
print 10
x.__unassume__(len=10)
x = None

__assume__ then has the possibility to raise an AssumptionError if len is not 10, which will always mean that the assignment to x will not happen. Note that x.len also got replaced.

While in lot of cases __assume__ could be generated automatically (mostly all where an assumption has the same name as a runtime attribute), for "virtual attributes" (assumptions which do not have a corresponding run-time attribute) it can not and needs a function. It seems more simple and consistent (as well as a reminder to the class author) to simply demand that all assumptions are checked manually in __assume__, though another option is to have a default implementation that simply compares the parameters using == with object attributes.

The object must guarantee that the runtime value of an attribute that is passed to __assume__ does not change before a corresponding __unassume__ is called. Normally the fields will be constant for the life-time of the object, but the object also has the option of reference-counting the assumptions and freeze certain mutable operations (or queue them and commit on unassume, if that is a natural mode of operation).

Types

Robert thought the issue of type assumptions are orthogonal and I agree, so let's solve this first. But if still you want to discuss it, [:enhancements/runtimectypes: here's my current thoughts].

Problems

What to do with "None"? I.e., x.len can probably not be replaced by 10 but have to be replaced by 10 if x is not None else <raise error>.

A problem in NumPy is that, for instance, x.shape can be assigned to, at which point assumptions should ideally be rechecked (either that, or NumPy must be changed to raise exceptions if shape is assigned to while under assumption, but not chaning ndarray is a goal...). One could do something like:

#!python
cdef class ndarray:
    __cythoncanassume__ = [
        attribute_assumption("ndim", invalidated_by=("shape",))
    ]

If "reshape" did inplace modification (it returns a new array instead, so that's no problem, but if it did) then you'd add "reshape" to invalidated_by. Any calls to methods or assignments to attributes whose names are listed in invalidated_by would then be treated like a reassignment (triggering reevaluation of assumptions).

Unfortunately, this principle means that one would have to also reevaluate assumptions after passing the object as an argument to other functions, which might kill performance in some cases. So perhaps changing NumPy (or allowing undefined/inconsistent behaviour if manually changing shape) is the only way to go...

This doesn't fully solve the needs of NumPy. For instance, it is very much desired that ndarray(shape=(2,3,4)) automatically implies ndarray(shape=(2,3,4),ndim=3). One way to tackle this would be to allow compile-time expressions in the syntax -- as it is a relatively limited usecase it shouldn't matter if it is a temporary solution and somewhat unstable... something along the lines of:

#!python
cdef class ndarray:
    __cythoncanassume__ = [
        attribute_assumption("ndim"),
        attribute_assumption("shape",
            provides={"ndim": (lambda shape: return len(shape))}
        )
    ]
Something went wrong with that request. Please try again.