xOneca edited this page Nov 18, 2014 · 1 revision

CEP 603: Cython Library

Motivation

A lot of utility code can be shared in a common library, reducing the memory footprint of a program written (partly) in Cython. Additionally, types can be shared in this library, potentially reducing module startup time. Furthermore, certain optimizations may be applied to shared types, such as inter-module type checking, which allows the code to take a faster, specialized path. As one example, obtaining a memoryview slice from a memoryview object can in many cases bypass the buffer interface and share the memoryview object directly.

Design and Implementation

The feature will be optional and must be explicitly enabled by the user, either through distutils or through a cython command line switch.

A module, referred to as libcython, will be created and compiled when Cython itself is built. The module will have a mangled name that depends on the Cython version, which prevents modules compiled with different Cython versions from sharing possibly ABI-incompatible functionality. User modules import this module at runtime, in a manner analogous to how functionality is shared through pxd files.
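
The version-dependent name could be derived along the following lines. This is a minimal sketch, not the proposal's actual scheme: the function name `libcython_module_name` and the `_libcython_` prefix are assumptions made for illustration only.

```python
import re


def libcython_module_name(cython_version: str) -> str:
    """Return a hypothetical version-mangled module name for libcython."""
    # Replace every character that cannot appear in a Python identifier,
    # so that the result is importable as a module name.
    suffix = re.sub(r"[^0-9A-Za-z]", "_", cython_version)
    return "_libcython_" + suffix


print(libcython_module_name("0.21.1"))  # _libcython_0_21_1
```

Because the version is part of the name, a module compiled with one Cython version would never import the libcython built by another version.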

Utility codes will get an additional attribute called libcython_entries. If present, it specifies a list of Entry objects relating to symbols that should be exported by libcython and imported by modules wishing to share functionality.

C Utility Codes

When exporting symbols via libcython, the prototype (MyUtility.proto) of the utility will be included in both libcython and any user-compiled module. The compiler will generate the prototype for each exported symbol from its Entry object, so the utility itself must not declare a prototype for an exported symbol:

static int (*myfunc)(float myarg); /* libcython prototype */
static int myfunc(float myarg); /* module prototype */
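
How the compiler might emit these two forms from a single entry can be sketched as follows. `FuncEntry` is a hypothetical stand-in for the compiler's Entry object and carries only the fields needed here; the real Entry and its associated type objects are considerably richer.

```python
from collections import namedtuple

# Hypothetical, simplified stand-in for a compiler Entry describing a C function.
FuncEntry = namedtuple("FuncEntry", "cname return_type arg_decls")


def prototype(entry, as_pointer):
    """Render a C prototype for the entry, either as a function pointer
    or as a plain function declaration."""
    args = ", ".join(entry.arg_decls)
    if as_pointer:
        return "static %s (*%s)(%s);" % (entry.return_type, entry.cname, args)
    return "static %s %s(%s);" % (entry.return_type, entry.cname, args)


entry = FuncEntry("myfunc", "int", ["float myarg"])
print(prototype(entry, as_pointer=True))   # static int (*myfunc)(float myarg);
print(prototype(entry, as_pointer=False))  # static int myfunc(float myarg);
```

Generating both forms from one description is what makes a hand-written prototype in the utility unnecessary.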

Creating Entry objects and associated types for C symbols is rather tedious, especially for complex types such as extension types. Hence entry creation would be best handled through a Cython utility. Consider CythonFunction, which can be defined in Cython space as follows:

cdef extern from *:
    ctypedef class __builtin__.pycfunction [object PyCFunctionObject ]:
        pass

@cname('__pyx_CyFunction')
cdef class CythonFunction(pycfunction):
    cdef int flags
    cdef object func_dict
    cdef object func_weakreflist
    ...

The Entry for the CythonFunction class can readily be retrieved, although a convenience function to automate the process would be preferred. This entry would be listed in the libcython_entries list and would generate a prototype (in this case for the object struct layout, not for the PyTypeObject itself) in both the user's extension module and libcython, so no C-level prototype would be needed. The entry in the example would have the default struct object name __pyx_CyFunction_obj, but this may be replaced with any other name.

Cython Utility Codes

Utility codes written in Cython will be handled in a similar way to the C utility codes. Instead of filtering out all symbols that have to be exported, a new decorator, export, may be introduced to facilitate the process. This also makes it easier to whitelist which entries of Cython utility codes participate in user-exposed scopes, such as the cython or cython.view scopes. Global variables that need to be exposed (which would be rare anyway) would, however, have to be handled manually, as no decorator can be applied in that context.
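
The whitelisting idea behind such a decorator can be illustrated in plain Python. This is only a sketch of the registration pattern: the `EXPORTED` registry and the function `memoryview_copy` are hypothetical, and a real implementation would be handled by the Cython compiler rather than run as ordinary Python.

```python
EXPORTED = {}  # hypothetical registry: exported name -> object


def export(obj):
    """Mark a function or class for export and return it unchanged."""
    EXPORTED[obj.__name__] = obj
    return obj


@export
def memoryview_copy(src):
    # Decorated, so this symbol is whitelisted for export.
    return list(src)


def _internal_helper():
    # Not decorated, so this symbol stays private to the utility.
    return None


print(sorted(EXPORTED))  # ['memoryview_copy']
```

A global variable has no definition site that a decorator can wrap, which is why it would need a manual registration step instead.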

Entry Caching

An EntryPool shall be associated with each CythonUtility from which entries may be read. There shall be two forms of EntryPool: the first shall invalidate cached entries based on timestamps, whereas the second shall check for matching hashes (e.g. SHA). For caching entries of pxd files the first option will be more efficient, as only the file's timestamp has to be checked. The hashing approach would also allow entry caching for CythonUtility codes that are written inline in the compiler (and have no associated timestamp). It might, however, cause the cache on disk to grow nearly indefinitely, as there is no way to invalidate previous versions of a given code, so the compiler will have to sweep old entries from the cache.
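
The two invalidation strategies can be contrasted in a small in-memory sketch. The class names, the `load` callback and the `sweep` method are assumptions for illustration; the proposal does not specify an interface, and a real pool would persist its entries to disk.

```python
import hashlib
import os


class TimestampEntryPool:
    """Cache keyed by file path; an entry is stale once the file's mtime changes."""

    def __init__(self):
        self._cache = {}  # path -> (mtime, entries)

    def get(self, path, load):
        mtime = os.path.getmtime(path)
        cached = self._cache.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]
        entries = load(path)
        self._cache[path] = (mtime, entries)
        return entries


class HashEntryPool:
    """Cache keyed by a SHA-256 digest of the source text; needs no file."""

    def __init__(self):
        self._cache = {}  # digest -> entries

    @staticmethod
    def _digest(source):
        return hashlib.sha256(source.encode("utf-8")).hexdigest()

    def get(self, source, load):
        digest = self._digest(source)
        if digest not in self._cache:
            self._cache[digest] = load(source)
        return self._cache[digest]

    def sweep(self, live_sources):
        """Drop cached entries whose source text is no longer in use."""
        live = {self._digest(s) for s in live_sources}
        for digest in list(self._cache):
            if digest not in live:
                del self._cache[digest]
```

The timestamp pool overwrites the single slot for a path when the file changes, so it cannot accumulate old versions. The hash pool stores every distinct source it has seen, which is why it needs an explicit sweep.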
