enhancements builtins pythontypestoo

DagSverreSeljebotn edited this page Apr 15, 2008 · 1 revision

One proposal: All types available both in runtime and type context

(These are some suggestions from DagSverreSeljebotn, up for discussion.)

This is probably a matter of taste, but what I personally prefer is a language that at least feels like it treats builtins and user-defined types the same way. Currently there is a distinction between type context, which determines compile-time typing, and runtime context, which refers to a runtime type object.

I do not suggest that one gets rid of this distinction, but I propose that all types (whether C or builtin Python or custom Python or C extension) can be used in both contexts.

These are type contexts:
  • Declaring the types of parameters and cdef local variables.
  • Casting with the <T> operator.
  • Use as type arguments.
These are runtime contexts:
  • Calling as a function (i.e. calling the constructor, which in Python often acts as a converter)
  • Inspecting type objects (checking identity, getting repr() of the type, etc.)
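Both runtime-context uses can be seen directly in plain Python (not Cython), where every type object is already callable; ``list`` here just stands in for any type:

```python
# Calling a type as a function: the constructor often acts as a converter.
x = list("abc")
assert x == ["a", "b", "c"]

# Inspecting the type object: identity check and repr().
assert type(x) is list
print(repr(list))
```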

Make Python types available in type context

This follows the pattern Robert proposes above. For any Python type T, allow the following:

   cdef T a = x

* With the exception of ``object``, *subtype instances cannot be assigned*: ``x`` has to be exactly of type ``T``. Some extension syntax (``descendants(T)`` or similar) can be added later, but it doesn't seem like a normal use case.

robertwb: This is inconsistent with how extension types work, and (IMHO) unpythonic. In fact I think it goes against the grain of OO programming in general. The builtins are an exceptional case because there are enormous savings to be made and one rarely subclasses them.

  • Also, objects with instance-overridden methods cannot be assigned (i.e. the instance dict is checked at assignment time if it is mutable).

robertwb: All dicts are mutable. One would have to check that the dict of the object didn't contain anything that overrode anything in any superclass's dict.
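The check robertwb describes can be sketched in plain Python (the helper name is hypothetical): walk the MRO collecting class attributes, then see whether the instance dict shadows any of them.

```python
def has_instance_overrides(obj):
    # Collect every attribute name defined on the class or any superclass.
    class_attrs = set()
    for cls in type(obj).__mro__:
        class_attrs.update(vars(cls))
    # An instance attribute with the same name shadows the class attribute.
    return any(name in class_attrs for name in vars(obj))

class A:
    def method(self):
        return "class"

a = A()
assert not has_instance_overrides(a)

a.method = lambda: "instance"   # shadow the class method on the instance
assert has_instance_overrides(a)
```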

  • For all types, a runtime exception is thrown if the assigned object is of the wrong type (the check is for ``__class__`` identity). (In the future, one might consider extending this by first trying to invoke any ``__coerce__`` operators.)
  • For builtins like list, the approach above will be taken for optimizations.
  • For extension types, one might be able to provide similar optimizations.
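A minimal plain-Python sketch of the proposed assignment-time check, assuming exact ``__class__`` identity as described above (the helper name is hypothetical):

```python
def check_exact_type(value, expected_type):
    # Exact identity check: subclass instances are rejected.
    if value.__class__ is not expected_type:
        raise TypeError(
            "expected exactly %s, got %s"
            % (expected_type.__name__, value.__class__.__name__)
        )
    return value

class MyList(list):
    pass

check_exact_type([1, 2], list)        # passes: exact type match
try:
    check_exact_type(MyList(), list)  # subclass instance: rejected
except TypeError:
    pass
```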

robertwb: Great pains are taken to make cdef methods overridable (see especially cpdef methods, optional arguments, etc.). If S is a subtype of T, and a (of type S) is declared to be of type T, things should just work.

  • For Python-defined types, one might initially just leave it alone. However, there is still potential for an optional optimization: look up the method in the class definition at module load time and store the unbound method, rather than looking the method up on the object.

The last one in particular might not bring huge gains, but even without any optimizations I feel that making all types available in type contexts gives a cleaner language.

robertwb: Cython already handles cdef T a = x, and I believe it does so in the correct way.

DagSverreSeljebotn: I just tested this in the most recent Cython. You can do this for any extension type that is declared in a pxd file; however, you cannot do something like:

class A: pass

def foo():
    cdef A x = A()

It just won't compile. Similarly, you can't do this either:

import sets
def foo():
    cdef sets.Set x = sets.Set()

It complains that sets is not cimported.

robertwb: This is the difference between import and cimport. I can see why you might want this to work for consistency, but I think it is in general unpythonic, and I don't see what one would want to declare types for except type checking (which is easy enough to do anyway) and function overloading (which is not implemented, and I'm not convinced we want to add it, as it's not the Python way of doing things -- though it might be useful to add function overloading for C types; wrapping C libraries would be easier, for one thing).

Why this will work

This can work even if Python classes are only available run-time. Here's some example code with function overloading:

cdef object sqrt(object x): ... # 1
cdef double sqrt(double x): ... # 2
cdef Real sqrt(Rational x): ... # 3

x = time.time() # x is Python object
print sqrt(x) # calls 1
cdef double d = x
print sqrt(d) # calls 2
cdef Rational r = Rational(x)
print sqrt(r) # calls 3
a = r # a is same Rational object
print sqrt(a) # calls 1, unless type-inference is added

What happens here is that when the Cython compiler hits the "Rational" in a type context, it will predeclare it as a Python object (much like other Python symbols in runtime contexts). That is in fact enough to resolve function overloading etc. Meanwhile, inside sqrt(Rational) the necessary methods of Rational can be resolved at module load time so that a working run-time optimization is in place.
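The compile-time overload resolution described above can be mimicked at runtime with an explicit table keyed by the declared type. This is only a plain-Python sketch of the dispatch idea, with all names hypothetical; in Cython the choice would be made by the compiler from the static declaration, not at call time.

```python
import math

def sqrt_object(x):   # 1: generic fallback for Python objects
    return math.sqrt(float(x))

def sqrt_double(x):   # 2: fast path for a C double
    return math.sqrt(x)

# The "declared type" picks the overload, as the compiler would.
overloads = {object: sqrt_object, float: sqrt_double}

def dispatch(declared_type, x):
    return overloads[declared_type](x)

assert dispatch(float, 2.25) == 1.5    # calls 2
assert dispatch(object, "2.25") == 1.5 # calls 1
```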

Finer points

In the discussion above, object ends up being an exception (variables of this type can be assigned instances of descendants as well). This will never create problems, as one won't instantiate object itself; however, for pedagogical reasons one could introduce an additional keyword, valid in type context only: any. Rather than cdef object x one would say cdef any x, where any basically means descendants(object).

Make C types available in runtime context

Basically, provide "constructors" for all C types. This amounts to adding a conversion syntax, so that a runtime call to a Cython C type means conversion. For instance:

a = double(x)
b = (unsigned int)(x) # Where () only serves to enclose the "variable name" "unsigned int", might not want to support this.
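Python's builtins already behave this way: type objects act as converting constructors, which is the behaviour the proposal would extend to C types like double and unsigned int. A plain-Python analogue:

```python
x = "3.5"
a = float(x)        # analogous to the proposed a = double(x)
assert a == 3.5

# Python has no unsigned int; int() is shown only to illustrate the idea
# of a type object performing a conversion when called.
b = int(float(x))
assert b == 3
```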

robertwb: Currently the language works so that it is unambiguous whether a given symbol refers to a type or object. This kind of unification may confuse things more (though worth considering, and for the "Pure Python" mode something like cython.types.unsigned_int(x) could represent a cast.)

DagSverreSeljebotn: This might be a matter of taste. Having all types, both C types and Python classes, always available both as a type specifier and as a callable constructor (which in Python often happens to do conversion) seems more consistent to me.

To be entirely consistent, one could have registered type objects with some info so that one could do something like:

>>> print double
<native C type 'double'>

But this is not particularly important.

Ensuing email discussion


In general, type declarations of Python objects should accept subclasses of that type. Great pains are taken to make subclassing work well for extension types (vtables for cdef methods, all the magic that makes cpdef methods and optional arguments work). This is in fact one of the main tenets of object-oriented programming. This is why statements like

> cdef T a = x
> * With the exception of object, subtype instances can not be assigned. x has to be exactly of the type T. Some extension syntax (descendants(T) or similar) can be added later, but it doesn't seem like a normal usecase.

make me quite hesitant.

In the question of being allowed to do

> cdef T a = x

for T a Python class (not cimported, and not even necessarily a type), I am not sure this is a good thing. The *only* reason we declare types for Python objects is to be able to do static binding. If T is not statically declared, then there is no advantage (other than perhaps type checking, which can be done anyway). With no advantages, and given that it goes against the "duck typing" philosophy of Python (though one can always manually check the type if one needs to), I'm not convinced that we want to go this route.

I would like more feedback on this from the general community before rejecting it outright, however.

Dag Sverre

Robert makes a very good case (and has fully convinced me) for needing to support assignment of descendants (disallowing that wasn't one of my brightest ideas).