List of Python internals #4635

da-woods opened this issue Feb 12, 2022 · 8 comments

@da-woods
Contributor

da-woods commented Feb 12, 2022

Not really a bug report or an enhancement: I'm just trying to document all the places where Cython uses Python internals (non-public APIs) so that we have a reasonable idea of what might break with C API changes. I will also document where Cython feature flags provide a "public API only" code path.

The list is likely incomplete (it is currently based on a very crude regex search for re.compile(r"""(?<=[^\w])_Py[\w]*""", flags=re.IGNORECASE), plus a few other bits that I know about).
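A search like the one described above can be scripted roughly as follows. This is a sketch built around the quoted regex, not the script actually used for this list; the function names find_private_names and scan_tree and the default file suffixes are illustrative assumptions.

```python
import re
from collections import Counter
from pathlib import Path

# The regex quoted above: an identifier starting with "_Py" (case-insensitive)
# that is preceded by a non-word character, so "__Pyx_..." names do not match.
PRIVATE_NAME = re.compile(r"""(?<=[^\w])_Py[\w]*""", flags=re.IGNORECASE)


def find_private_names(text):
    """Return the set of _Py* identifiers mentioned in a chunk of source text."""
    return set(PRIVATE_NAME.findall(text))


def scan_tree(root, suffixes=(".c", ".h", ".pxd", ".pyx", ".py")):
    """Count, across a source tree, how many files mention each _Py* name."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            # Updating with a set counts each name at most once per file.
            counts.update(find_private_names(text))
    return counts


# Hypothetical usage from a Cython checkout:
# print(scan_tree("Cython/Utility").most_common(10))
```

Because of re.IGNORECASE, lowercase names such as _pydecimal also match, and an identifier at the very start of a file is missed by the lookbehind; both are part of why the search is crude.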

Cython .pxd includes

  • array.pxd https://github.com/cython/cython/blob/master/Cython/Includes/cpython/array.pxd - Cython provides a pxd file giving users access to the array.array internals. It is documented as exposing CPython-specific internals and is made available to users on that basis.
  • Cython/Includes/cpython/pylifecycle.pxd provides access to _Py_InitializeEx_Private, _Py_PyAtExit, _Py_RestoreSignals,
    _Py_CheckPython3, _Py_gitidentifier, _Py_gitversion on a similar "own risk" basis.

Internal functions

  • _Py_NewReference is used in Cython/Utility/AsyncGen.c and Coroutine.c. Can probably be replaced with __Pyx_NewRef
  • _Py_TPFLAGS_HAVE_VECTORCALL in Cython/Utility/CythonFunction.c. Does not look to be guarded.
  • _Py_AS_GC used in Cython/Utility/Coroutine.c to access field ->gc.gc_refs. Guarded only by CYTHON_COMPILING_IN_CPYTHON
  • _Py_DEC_REFTOTAL in Coroutine.c. Guarded by CYTHON_COMPILING_IN_CPYTHON
  • Cython/Debugger/libpython.py looks up (but doesn't call) _PyEval_EvalFrameDefault, _PySet_Dummy
  • _PyStack_AsDict used in Cython/Utility/FunctionArguments.c. Guarded by CYTHON_METH_FASTCALL
  • _PyObject_GetDictPtr in Cython/Utility/Exceptions.c, ObjectHandling.c
  • _PyTraceback_Add in Cython/Utility/Exceptions.c. Used in "limited API" code path(!), presumably because the public API code path goes into more internals
  • Cython/Utility/Optimize.c uses _PySet_NextEntry, _PyList_Extend, and _PyDict_Pop - they are all guarded by CYTHON_COMPILING_IN_CPYTHON with alternative code paths in place
  • _PyTrash_thread_deposit_object and _PyTrash_thread_destroy_chain are used in ExtensionTypes.c for old versions of CPython. Not needed for newer versions.
  • _PyErr_FormatFromCause is used in Coroutine.c with a version check
  • _PyGen_Send is used in Coroutine.c with version checks and an alternate codepath available
  • _PyGen_SetStopIterationValue is used in the alternate codepath for _PyGen_Send.
  • _PyBytes_Join (and _PyString_Join) are used in StringTools.c but with an alternative implementation available for non-CPython
  • _PyUnicode_FastCopyCharacters is used in StringTools.c and ObjectHandling.c but with version checks and an alternative implementation.
  • _PyObject_NextNotImplemented - ObjectHandling.c. Guarded by CYTHON_USE_TYPE_SLOTS so alternative code path exists
  • _PyDict_SetItem_KnownHash and _PyDict_GetItem_KnownHash - ObjectHandling.c. Guarded by a version check so an alternative code path exists
  • _PyObject_GenericGetAttrWithDict - ObjectHandling.c. Guarded by a version check (and CYTHON_USE_TYPE_SLOTS) so alternative code path exists
  • _PyObject_GetDictPtr in ObjectHandling.c. Used in a few places, but looks to be guarded. The guards are inconsistent between uses (CYTHON_UNPACK_METHODS && CYTHON_COMPILING_IN_CPYTHON && CYTHON_USE_PYTYPE_LOOKUP, CYTHON_USE_DICT_VERSIONS && CYTHON_USE_TYPE_SLOTS) so a little fiddly to replace if changed, but not impossible.
  • _PyCFunction_FastCallDict and _PyCFunction_FastCallKeywords - used in ObjectHandling.c for older Python versions
  • _PyMethodDescr_FastCallKeywords used in ObjectHandling.c for current Python versions. Looks like a shortcut that would be easily disabled if needed.
  • _PyLong_FromByteArray is used in TypeConversion.c unguarded
  • _PyLong_AsByteArray is used in TypeConversion.c with a version guard. It looks like conversion of large numbers between Python longs and C integers fails with a runtime exception without it
  • _PyAsyncGen_MAXFREELIST is used in AsyncGen.c. There is a check that it's defined (and a redefinition). The assumption is that it's a macro (which probably has to be true in C?). Potentially risky because I think there are plans to unify freelist implementations in CPython (but it could probably be removed from Cython easily if needed)
  • _PyCFunctionFast and _PyCFunctionFastWithKeywords are used in current Python versions (ModuleSetupCode.c and CythonFunction.c). Although underscore-prefixed they are in the Python documentation.
  • _PyThreadState_UncheckedGet is used in current Python versions (ModuleSetupCode.c) but alternative code paths exist if it ever goes missing
  • _PyThreadState_Current used in ModuleSetupCode.c in very old Python versions
  • _PyDict_NewPresized used in ModuleSetupCode.c - it's easily replaced with the less efficient PyDict_New if needed, though
  • _PyDict_GetItem_KnownHash is used in ModuleSetupCode.c with version checks. Alternative code paths are available.
  • _PyUnicode_Ready is used in ModuleSetupCode.c. Alternative code is in place for the expected removal of the concept of "unicode readiness" in Python 3.12.
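Most entries above are classified by hand according to whether a use sits behind a guard (CYTHON_COMPILING_IN_CPYTHON, a feature flag, or a version check). A rough Python sketch of how that classification might be partly automated is below; private_uses_with_guards is a hypothetical helper, not something in the Cython repository, and its preprocessor handling is deliberately approximate.

```python
import re

PRIVATE_NAME = re.compile(r"(?<=[^\w])_Py\w*", flags=re.IGNORECASE)
DIRECTIVE = re.compile(r"#\s*(\w+)\s*(.*)")


def private_uses_with_guards(c_source):
    """List (line number, private name, enclosing #if conditions) per use.

    Simplistic on purpose: ignores comments and line continuations, skips
    names that appear inside preprocessor directives themselves (including
    #define), and treats #elif/#else approximately (see below).
    """
    stack = []  # conditions of the currently open #if blocks, outermost first
    uses = []
    for lineno, line in enumerate(c_source.splitlines(), start=1):
        match = DIRECTIVE.match(line.strip())
        if match:
            keyword, rest = match.group(1), match.group(2).strip()
            if keyword == "if":
                stack.append(rest)
            elif keyword == "ifdef":
                stack.append(f"defined({rest})")
            elif keyword == "ifndef":
                stack.append(f"!defined({rest})")
            elif keyword == "elif" and stack:
                # Approximation: ignores that earlier branches were false.
                stack[-1] = rest
            elif keyword == "else" and stack:
                # Approximation: after an #elif this negates only the last branch.
                stack[-1] = f"!({stack[-1]})"
            elif keyword == "endif" and stack:
                stack.pop()
            continue
        for name in PRIVATE_NAME.findall(line):
            uses.append((lineno, name, tuple(stack)))
    return uses
```

An empty condition tuple points at candidates such as the unguarded _PyLong_FromByteArray use, but the output still needs a manual read: indirection through #define macros is invisible to it, and a guard does not by itself mean a fallback path exists.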

Cython feature flags

  • _PyType_Lookup is used in a few places but guarded by CYTHON_USE_PYTYPE_LOOKUP
  • _PyGC_FINALIZED in ModuleNode.py - Guarded by CYTHON_USE_TP_FINALIZE. However, turning this off does disable some features of cdef classes
  • _PyErr_StackItem - guarded by CYTHON_USE_EXC_INFO_STACK
  • _PyString_Eq is used in FunctionArguments.c but only for very old Python versions
  • _PyStack_AsDict is used in the macro __Pyx_KwargsAsDict_FASTCALL in FunctionArguments.c on recent versions of Python. It's guarded by CYTHON_METH_FASTCALL but realistically this is a flag we won't want to disable.
  • CYTHON_USE_UNICODE_WRITER guards use of _PyUnicodeWriter_Init and related functions. It's currently turned off on the Python 3.11 alphas since _PyFloat_FormatAdvancedWriter and _PyLong_FormatAdvancedWriter disappeared.
  • CYTHON_VECTORCALL guards _PyVectorcall_Function. It looks like it has now been made public as PyVectorcall_Function, though, so this is a non-issue.
  • CYTHON_PEP393_ENABLED (true for recent versions I think) guards _PyUnicode_AsDefaultEncodedString
  • CYTHON_USE_PYLONG_INTERNALS enables the use of ob_digit on long objects (with all the assumptions about how those internals are stored). Also enables _PyLong_Copy in Builtins.c
  • CYTHON_USE_PYLIST_INTERNALS uses internal fields on list objects (e.g. ->allocated). Fallback code-paths exist for everything
  • CYTHON_USE_UNICODE_INTERNALS guards access to internal fields on unicode (and also bytes). Including ob_shash, but also direct access into the memory buffer. Fallback code-paths exist
  • CYTHON_USE_EXC_INFO_STACK accesses _PyErr_StackItem including fields like previous_item mainly in Coroutine.c and Exceptions.c. Replacement code-paths exist, but I'm not sure if they cover all functionality. It accesses from the PyThreadState object.
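Since several of these flags can be switched off with a single C define, a build can opt into the fallback code paths from setup.py. A hedged sketch using setuptools' define_macros follows; it assumes the generated C code only supplies defaults for these macros when they are not already defined (check ModuleSetupCode.c for the Cython version in use), and the module and file names are hypothetical.

```python
# Feature flags from the list above that have fallback code paths.
# Setting them to 0 asks Cython to avoid the corresponding CPython internals.
# Assumption: Cython only defines these when they are not already defined,
# so values passed on the compiler command line take effect.
PUBLIC_API_MACROS = [
    ("CYTHON_USE_PYLONG_INTERNALS", "0"),
    ("CYTHON_USE_PYLIST_INTERNALS", "0"),
    ("CYTHON_USE_UNICODE_INTERNALS", "0"),
    ("CYTHON_USE_UNICODE_WRITER", "0"),
    ("CYTHON_USE_PYTYPE_LOOKUP", "0"),
]
# Deliberately left alone: CYTHON_USE_TP_FINALIZE (turning it off disables
# some cdef class features), CYTHON_METH_FASTCALL (realistically kept on),
# and CYTHON_USE_EXC_INFO_STACK (fallbacks may not cover everything).


def make_extension(name, sources):
    """Build a setuptools Extension that passes the defines above."""
    from setuptools import Extension  # setuptools must be installed to build

    return Extension(name, sources, define_macros=list(PUBLIC_API_MACROS))


# Hypothetical usage in setup.py:
# from setuptools import setup
# from Cython.Build import cythonize
# setup(ext_modules=cythonize([make_extension("mymod", ["mymod.pyx"])]))
```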

Internal field access

This section is fairly incomplete since I haven't yet worked out a good way of searching for these.

  • self->ob_refcnt in Coroutine.c
  • --Py_TYPE(self)->tp_frees; --Py_TYPE(self)->tp_allocs; in Coroutine.c (Guarded by CYTHON_COMPILING_IN_CPYTHON)
  • ObjectHandling.c accesses ob_item of tuple and list. Guarded only by CYTHON_COMPILING_IN_CPYTHON

Frames/Tracebacks

  • Coroutine.c accesses internal fields of PyTracebackObject (tb_frame) and PyFrameObject (mainly f_back). The alternative code paths don't really work in PyPy, so this is probably the Cython feature most dependent on internal details.
  • Exceptions.c creates a PyFrameObject using the public PyFrame_New but doesn't access the internal fields of it. This is to create exception tracebacks so is used everywhere in Cython.
  • ObjectHandling.c uses f_localsplus of frame objects only on old versions of Python I think (on new versions it's covered by vectorcall)
  • Profile.c uses frames and thread states quite heavily. Fields accessed include:
    • frame: f_trace, f_lineno (but via a macro that can easily become a no-op)
    • thread state: c_tracefunc, c_traceobj, c_profilefunc, c_profileobj, use_tracing, tracing
    • code object: co_flags
    These are only used if line tracing/profiling is enabled, so they are not required for the "normal" functioning of Cython.

Other

@matusvalo
Contributor

I have an idea. Can we create internals documentation and add it there? The Cython codebase is pretty complex, so adding details about Cython internals to the documentation could be useful and could help people contribute to this project. There is already a HackerGuide, so maybe we can extend the documentation to cover this too.

@da-woods
Contributor Author

So the background to this is that:

  • Cython uses a lot of undocumented private functions from CPython.
  • Changing those tends to break Cython and so not all Python maintainers are completely happy about it.
  • Making a list of private CPython internals is useful because:
    • it lets us know what's in danger of being broken
    • it may help find features that are useful, but aren't covered well by public APIs
  • Not all of the private internals are a problem - quite a few of them are for speed, and can be turned off with a single C define.
  • An issue possibly isn't the right place to do it, but it will do for now.

Internals documentation might be useful (but it becomes unhelpful very fast if the code is changed but the documentation isn't). I'm not sure this list would be a useful part of it, though.

@scoder
Contributor

scoder commented Feb 13, 2022

Thanks @da-woods for digging this up. I have a couple of comments.

  • _Py_NewReference – this can happen when copying code directly from CPython. It shouldn't happen, though, because it's an unnecessary source of friction.
  • preprocessor guards – I've put some past work into making those somewhat correct and reasonable. There's always room for improvement. With the recent changes in CPython, we'll see what remains as "CPython specific" and what needs a more concrete guard. But that seems a very fine grained decision. Sometimes it's good to just put in one or more "PY_VERSION_HEX" guards, and sometimes a change is really worth a new feature switch. And sometimes it turns out later that that decision wasn't the best one. We'll see.
  • "used for old versions of CPython. Not needed for newer versions ." – That's generally ok. Old CPython versions do not change drastically any more, so we can usually integrate with them very tightly, especially when backporting newer features (for which there is a better/official/stable API in later Python versions).
  • "Assumption that it is a macro though" – not necessarily. You are probably referring to the defined(...) check. That's usually done because PyPy defines all C-API functions as macro aliases to its own PyPy_... functions. The check (the kind that I'm thinking of) does not depend on a function being a macro, it usually just checks whether we're in the right Python version or whether the function is defined as a macro, e.g. because PyPy has started to provide it.
  • --Py_TYPE(self)->tp_frees; --Py_TYPE(self)->tp_allocs – this can probably just go. It's debugging/statistics code copied from CPython. Nice to have but doesn't hurt when it's gone.

@da-woods
Contributor Author

"Assumption that it is a macro though" – not necessarily. You are probably referring to the defined(...) check. That's usually done because PyPy defines all C-API functions as macro aliases to its own PyPy_... functions. The check (the kind that I'm thinking of) does not depend on a function being a macro, it usually just checks whether we're in the right Python version or whether the function is defined as a macro, e.g. because PyPy has started to provide it.

Yeah that is what I meant. I'd missed that detail.

"used for old versions of CPython. Not needed for newer versions ." – That's generally ok.

Yes, agreed (especially when the older Python version is Py2). I'm trying to be thorough at this stage, but once I've gone through everything I'll start trying to work out what might genuinely be a problem.

@da-woods
Contributor Author

da-woods commented Mar 6, 2022

I think I'm calling this list as complete as it's going to be at this stage.

@oscarbenjamin

  • _PyCFunction_FastCallKeywords - used in ObjectHandling.c for older Python versions

This one has been made public in python/cpython#114626 but that now causes my Cython build to fail:

      src/flint/flint_base/flint_base.c: At top level:
      src/flint/flint_base/flint_base.c:740:45: error: ‘_PyCFunctionFastWithKeywords’ undeclared here (not in a function); did you mean ‘PyCFunctionFastWithKeywords’?
        740 |   #define __Pyx_PyCFunctionFastWithKeywords _PyCFunctionFastWithKeywords
            |                                             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
      src/flint/flint_base/flint_base.c:745:50: note: in expansion of macro ‘__Pyx_PyCFunctionFastWithKeywords’
        745 |   #define __Pyx_PyCFunction_FastCallWithKeywords __Pyx_PyCFunctionFastWithKeywords
            |                                                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      src/flint/flint_base/flint_base.c:9475:144: note: in expansion of macro ‘__Pyx_PyCFunction_FastCallWithKeywords’

@oscarbenjamin

Actually looks like that was fixed already in python/cpython#115561. Sorry for the noise.

@ngoldbaum
Contributor

Just now noticing ob_refcnt in the context of the nogil Python build, which removes the ob_refcnt field from PyObject. There are also a number of tests that assume they can access ob_refcnt:

± rg ob_refcnt tests
tests/memoryview/memoryview.pyx
675:    return (<PyObject*>x).ob_refcnt
701:        print repr(buf[i]), (<PyObject*>buf[i]).ob_refcnt

tests/memoryview/memslice.pyx
1069:    return (<PyObject*>x).ob_refcnt
1095:        print repr(buf[i]), (<PyObject*>buf[i]).ob_refcnt

tests/buffers/bufaccess.pyx
962:    return (<PyObject*>x).ob_refcnt
988:        print repr(buf[i]), (<PyObject*>buf[i]).ob_refcnt

tests/run/test_coroutines_pep492.pyx
70:        return (<PyObject*>obj).ob_refcnt

tests/run/exceptionrefcount.pyx
38:    return (<PyObject*>obj).ob_refcnt

tests/compile/pylong.pyx
8:        Py_ssize_t ob_refcnt
13:        int ob_refcnt

tests/run/refcount_in_meth.pyx
20:    return (<PyObject*>obj).ob_refcnt

tests/compile/pylong.pyx also has a forward-declaration for PyObject that includes ob_refcnt.
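One possible direction for these tests is to go through a public accessor instead of the ob_refcnt field: the Py_REFCNT() macro at the C level, or sys.getrefcount() at the Python level. A hedged Python sketch of the second option is below; refcount_delta is a hypothetical helper. It compares deltas rather than absolute counts, because the absolute value includes temporary references whose number can differ between builds and versions.

```python
import sys


def refcount_delta(obj, action):
    """Return how much calling action() changed the reference count of obj.

    Uses the public sys.getrefcount() rather than reading ob_refcnt, and
    compares before/after so temporary references cancel out.
    """
    before = sys.getrefcount(obj)
    action()  # any return value is discarded (and released) here
    after = sys.getrefcount(obj)
    return after - before
```

For example, appending an object to a list should give a delta of 1, and popping it back out (discarding the result) a delta of -1.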
