subtyping PyVarObject (e.g. bytes/tuple) #711

Open
robertwb opened this issue Dec 9, 2008 · 4 comments

robertwb commented Dec 9, 2008

The problem with a PyVarObject (such as bytes or tuple) is that its struct is of variable length that is determined at instance creation time. When Cython generates subclass code, it expects to be able to add fields directly behind the compile time struct, which thus end up in the variably allocated memory area.
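The variable-length layout described above can be observed from Python. The following is a minimal sketch, assuming CPython (where `sys.getsizeof` reflects the allocated object size); the helper name `per_item_growth` is made up for this illustration.

```python
import struct
import sys

# Size of a C pointer on this interpreter build (8 on typical 64-bit builds).
POINTER_SIZE = struct.calcsize("P")


def per_item_growth(empty, filled):
    """Extra storage per element between an empty and a filled instance."""
    return (sys.getsizeof(filled) - sys.getsizeof(empty)) // len(filled)


# Fixed-size types report an item size of 0; PyVarObject types such as
# bytes and tuple report the size of one trailing element instead.
print("float itemsize:", float.__itemsize__)
print("bytes itemsize:", bytes.__itemsize__)
print("tuple itemsize:", tuple.__itemsize__)

# The trailing storage grows with the length chosen at creation time, so
# anything placed directly behind the compile-time struct would overlap it.
print("bytes growth per item:", per_item_growth(b"", b"abcd"))
print("tuple growth per item:", per_item_growth((), (1, 2, 3)))
```

Because the trailing area starts where the compile-time struct ends, a field appended at a fixed offset behind the base struct would land inside the item storage of any non-empty instance.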

Migrated from http://trac.cython.org/ticket/152


robertwb commented Dec 9, 2008

scoder changed description from

The problem with a PyVarObject (such as str) is that its struct is of variable length that is determined at instance creation time. When Cython generates subclass code, it expects to be able to add fields directly behind the compile time struct, which thus end up in the variably allocated memory area.

to

The problem with a PyVarObject (such as str) is that its struct is of variable length that is determined at instance creation time. When Cython generates subclass code, it expects to be able to add fields directly behind the compile time struct, which thus end up in the variably allocated memory area.


scoder commented

This also impacts the tp_dictoffset slot, so it has an impact on PyVarObject subtypes that use an instance __dict__.
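As a rough illustration of the `tp_dictoffset` point, the slot is exposed in Python as `type.__dictoffset__`. This sketch assumes CPython and uses plain Python classes rather than Cython-generated types; the class names are hypothetical. The exact negative value differs between CPython versions, so only its sign is relied on here.

```python
class WithDict(bytes):
    """Plain Python subclass: each instance gets a __dict__."""


class WithoutDict(bytes):
    """Subclass that opts out of the per-instance __dict__."""
    __slots__ = ()


# bytes itself has no instance dict, so its offset is 0. A subclass with a
# __dict__ gets a non-zero offset, which cannot be a fixed positive offset
# behind the base struct because the object's size varies per instance.
for cls in (bytes, WithDict, WithoutDict):
    print(cls.__name__, cls.__dictoffset__)

obj = WithDict(b"123")
obj.tag = "x"  # stored in the instance __dict__
print(obj, obj.tag)
```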


scoder changed description from

The problem with a PyVarObject (such as str) is that its struct is of variable length that is determined at instance creation time. When Cython generates subclass code, it expects to be able to add fields directly behind the compile time struct, which thus end up in the variably allocated memory area.

to

The problem with a PyVarObject (such as bytes or tuple) is that its struct is of variable length that is determined at instance creation time. When Cython generates subclass code, it expects to be able to add fields directly behind the compile time struct, which thus end up in the variably allocated memory area.
summary from

subtyping PyVarObject (e.g. str)

to

subtyping PyVarObject (e.g. bytes/tuple)

@robertwb robertwb added this to the wishlist milestone Aug 16, 2016
navytux added a commit to navytux/cython that referenced this issue Jan 17, 2023
…struct layout

in other words allow subtyping PyVarObject (e.g. bytes/tuple) without
adding C-level fields or otherwise changing C-level struct layout at all.

At cython#711 @robertwb says:

    The problem with a PyVarObject (such as str) is that its struct is of
    variable length that is determined at instance creation time. When
    Cython generates subclass code, it expects to be able to add fields
    directly behind the compile time struct, which thus end up in the
    variably allocated memory area

So the problem with inheriting from a PyVarObject is that access to
C-level attributes must account for the real runtime size of the
object. However, for simple cases such as

    cdef class MyBytes(bytes):
        # no cdef attributes, only methods
        def meth1(self):
            ...
        def meth2(self):
            ...

we know it can already work as is: if there are no C-level attributes,
there is nothing to access behind the variable-length area. In general,
if we do not change the C struct layout of the object compared to its
base type, it already works with the existing infrastructure without
any change.
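The "unchanged layout" condition can be checked at the type level from Python. This is only a sketch under assumptions: it uses a pure-Python subclass with empty `__slots__` as a stand-in for a `cdef class` without C-level fields, assumes CPython, and the helper `same_layout` is hypothetical. It compares type-level layout fields only and says nothing about allocator or GC overhead that `sys.getsizeof` may report for instances.

```python
# Stand-in for a subclass that adds no fields: empty __slots__ means
# no __dict__ and no extra per-instance slots.
MyBytes = type("MyBytes", (bytes,), {"__slots__": ()})


def same_layout(sub, base):
    """True if the subtype adds nothing to the base type's struct layout."""
    return (
        sub.__basicsize__ == base.__basicsize__
        and sub.__itemsize__ == base.__itemsize__
        and sub.__dictoffset__ == base.__dictoffset__
    )


print("same layout as bytes:", same_layout(MyBytes, bytes))
print(MyBytes(b"123") == b"123")
```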

Such inheritance can be useful even in this limited form. For example, in
https://lab.nexedi.com/nexedi/pygolang/merge_requests/21 I use it to
inherit from `bytes` while making sure that the inherited object does not
grow in size. This already works out of the box for `unicode`, but
previously even

    cdef class MyBytes(bytes):
        pass

was rejected.

Updates: cython#711

navytux commented Jan 17, 2023

Patch that allows limited inheriting from PyVarObject: #5212.

navytux added a commit to navytux/pygolang that referenced this issue Oct 9, 2023
For gpython to switch builtin str/unicode to bstr/ustr, we need
bstr/ustr to have exactly the same C layout as the builtin string types.
This can be achieved only via `cdef class`. Switching to `cdef class` is
also good for RAM savings. From cython/cython#5212 (comment):

    # what Cython does at runtime for `class MyBytes(bytes)`
    In [3]: MyBytes = type('MyBytes', (bytes,), {'__slots__': ()})

    In [4]: MyBytes
    Out[4]: __main__.MyBytes

    In [5]: a = bytes(b'123')

    In [6]: b = MyBytes(b'123')

    In [7]: a
    Out[7]: b'123'

    In [8]: b
    Out[8]: b'123'

    In [9]: a == b
    Out[9]: True

    In [10]: import sys

    In [11]: sys.getsizeof(a)
    Out[11]: 36

    In [12]: sys.getsizeof(b)
    Out[12]: 52

So with `cdef class` we gain more control and can optimize memory usage.

This was not done before because Cython forbids `cdef class X(bytes)` due to
cython/cython#711. We work around that in setup.py, with a draft of the
proper patch pre-posted upstream in cython/cython#5212.