Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make critical section API public as part of non-limited C API #26

Closed
colesbury opened this issue May 21, 2024 · 20 comments
Closed

Make critical section API public as part of non-limited C API #26

colesbury opened this issue May 21, 2024 · 20 comments

Comments

@colesbury
Copy link

The critical section API provides macros for locking one or two Python objects at a time. I propose making the following macros public as part of Include/cpython (i.e., the non-limited C API):

  • Py_BEGIN_CRITICAL_SECTION(PyObject *op) / Py_END_CRITICAL_SECTION()
  • Py_BEGIN_CRITICAL_SECTION2(PyObject *a, PyObject *b) / Py_END_CRITICAL_SECTION2()

These macros are defined, but no-ops, in the default build.

PR draft: python/cpython#119353
Public header: Include/cpython/critical_section.h
PEP 703 section: https://peps.python.org/pep-0703/#python-critical-sections

Note that the PR draft doesn't currently have user facing documentation. I'll add that, but I'm hoping to get a decision on the proposed API first.

Why is this useful?

C API extensions may need to add synchronization to be thread-safe without the GIL. In some cases, straightforward per-object locking can introduce deadlocks that were not present when running with the GIL. This API avoids that by implicitly suspending critical sections in places where the GIL would be released.

This makes it easier to add locking to extensions without introducing deadlocks.In the "nogil" fork, we used this to make https://github.com/gaogaotiantian/viztracer thread-safe, and we've also used this extensively within CPython.

Note that in some cases plain mutexes are still more appropriate: the Py_BEGIN/END_CRITICAL_SECTION API makes it easier to avoid deadlocks, but plain mutexes make it easier to reason about the invariant that the mutex ensures.

Why is this part of CPython?

The underlying implementation hooks in to the PyThreadState management, so it would not be possible to implement this outside of CPython.

@encukou
Copy link
Collaborator

encukou commented May 21, 2024

Macros are only usable in C/C++. To lift that limitation, the underlying type & functions would need to be made public as well, and since we expect that non-C language wrappers re-implement the macros, the documentation should show what they expand to -- similar to Py_BEGIN_ALLOW_THREADS & Py_END_ALLOW_THREADS.

That complicates things, of course -- the current implementation requires the size of the PyCriticalSection* structs, which runs into the same issues as exposing PyMutex :(

@colesbury
Copy link
Author

colesbury commented May 21, 2024

The functions are callable from Rust and other languages. The structs are two pointers (_PyCriticalSection) or three pointers (_PyCriticalSection2). I'm happy to designate the underlying functions as public.

Note that C API types are not really usable outside of C/C++ either. PyO3, for example, reimplements all the C structs in Rust.

The complaint with PyMutex was that it was one byte. These structs are larger than a single byte.

@vstinner
Copy link

I'm fine with these 4 macros.

It's ok that macros cannot be used in Rust and other programming languages, if _PyCriticalSection structure, _PyCriticalSection_Begin() and _PyCriticalSection_End() can be used (same for the _PyCriticalSection2 variant). Should we make these structures and functions public for these languages?

@vstinner
Copy link

These macros look similar to Py_TRASHCAN_BEGIN and Py_TRASHCAN_END. I like the fact that proposed "critical section" macros don't leak implementation details, but call opaque functions :-)

@encukou
Copy link
Collaborator

encukou commented May 31, 2024

So, we want to expose:

  • The structs PyCriticalSection and PyCriticalSection2, whose contents are opaque private but whose size/alignment is part of the ABI and may change in a new feature release
  • PyCriticalSection_Begin(PyCriticalSection *s, PyObject* obj)
  • PyCriticalSection_End(PyCriticalSection *s)
  • PyCriticalSection2_Begin(PyCriticalSection2 *s, PyObject* a, PyObject* b)
  • PyCriticalSection2_End(PyCriticalSection2 *s)
  • the macro Py_BEGIN_CRITICAL_SECTION(OBJ), documented to expand to:
    {
        PyCriticalSection _Py_cs;
        PyCriticalSection_Begin(&_Py_cs, (PyObject*)(OBJ));
    
    (edited to add cast)
  • the macro Py_END_CRITICAL_SECTION(), documented to expand to:
        PyCriticalSection_End(&_Py_cs);
    }
    
  • the macro Py_BEGIN_CRITICAL_SECTION2(OBJ), documented to expand to:
    {
        PyCriticalSection2 _Py_cs;
        PyCriticalSection2_Begin(&_Py_cs, (PyObject*)(A), (PyObject*)(B))
    
    (edited to add cast)
  • the macro Py_END_CRITICAL_SECTION2(), documented to expand to:
        PyCriticalSection2_End(&_Py_cs);
    }
    

(The actual expansion might be different, e.g. no-op in the default build, but these should be tested.)

Is that right?


Note that C API types are not really usable outside of C/C++ either. PyO3, for example, reimplements all the C structs in Rust.

Yup, that's why we want to avoid them where possible.

@vstinner
Copy link

The structs PyCriticalSection and PyCriticalSection2, whose contents are opaque but whose size/alignment is part of the ABI and may change in a new feature release

IMO members should be public. Trying to provide a stable ABI is too early, and trying to hide members would only make the implementation more complicated and fragile.

The issue title clearly excludes the limited C API:

Make critical section API public as part of non-limited C API

@encukou
Copy link
Collaborator

encukou commented May 31, 2024

Right, I meant private rather than opaque (edited). The compiler will see them, users should not touch them directly.

@colesbury
Copy link
Author

A few differences between what @encukou wrote and the current PR:

  • The structs and function names currently begin with an _ (like struct _PyCriticalSection) because they are not intended to be used directly from C/C++ -- callers should use the macro. If you want to make them explicitly public and just the contents private, I think that's okay too.
  • The struct _PyCriticalSection is currently opaque in the default build. The contents are only defined in the free-threaded build. Not super important, but my idea was to prevent accidental misuse. If you'd prefer it to be private, but not opaque in both configurations, that's okay too.
  • The Py_BEGIN_CRITICAL_SECTION() / Py_BEGIN_CRITICAL_SECTION2() macros cast their arguments to a PyObject* using _PyObject_CAST(). I think this is consistent with our other macros.

@encukou
Copy link
Collaborator

encukou commented May 31, 2024

Yup, I do want to make them public, so that non-C languages can use them. I'd also like the functions to be exposed as actual DLL symbols (and shadowed with static inline functions for performance in C; I'm happy to implement that little dance).
For non-C, I don't think we should make them opaque.

I edited the post to add the casts.

@colesbury
Copy link
Author

I'd also like the functions to be exposed as actual DLL symbols...

The functions are exposed as actual DLL symbols in the linked PR. The inline functions aren't public (i.e., they are in pycore_critical_section.h).

There's a dance with redefining the Py_BEGIN_CRITICAL_SECTION, etc. macros for internal CPython usage where we want more things to be inlined.

https://github.com/python/cpython/pull/119353/files#diff-ecf78b1af8cff87b3ae3899029f7ba1223286415cc64ef2f0d5bcf58bb540cecR77-R83

@vstinner
Copy link

There's a dance with redefining the Py_BEGIN_CRITICAL_SECTION, etc. macros for internal CPython usage where we want more things to be inlined.

I suggest you to use a different name for the "fast inline" flavor. I did something similar with PyThreadState_GET():

  • PyThreadState_GET(): "slow" public regular function
  • _PyThreadState_GET(): "fast" internal inline function

@vstinner
Copy link

The #26 (comment) API LGTM.

@encukou
Copy link
Collaborator

encukou commented Jun 11, 2024

It's fine for me for non-limited C API as well.

But unlike PyMutex, I think we want to add this one to the limited API rather soon, and we should plan for that. So let me look at the aspects that tend to prevent evolution.

Exposed struct sizes

The usual solution for avoiding struct sizes is adding API to allocate the struct. But since there's already space for the pointer, we can (if it's ever needed) make PyCriticalSection_Begin/PyCriticalSection_End allocate/free without breaking users. (And then we can introduce a PyCriticalSection_Begin_v2 that takes a bigger struct and doesn't allocate, and switch the macros to use it.)

Or perhaps we can do what was proposed for PyMutex: inflate the struct size to give some room for expansion.
Here, the struct is only used on the stack, rather than as part of a bigger structure that might need ABI compatibility with other extensions. If we prescribe that the structs must only be used as in the macros, the struct can be “just big enough” in version-specific ABI, but have some extra space when compiling for the stable ABI.
(Possibly lots more space -- stack space is only expensive in recursion, but if you're nesting critical sections you're doing it wrong. But an extra pointer is probably enough.)

Error reporting

It seems that failure to enter/exit a critical section is not really recoverable, but generally it's good if functions are able to raise runtime deprecation warnings.
In this case, I can live with the API being difficult to deprecate.

@vstinner
Copy link

The usual solution for avoiding struct sizes is adding API to allocate the struct. But since there's already space for the pointer, we can (if it's ever needed) make PyCriticalSection_Begin/PyCriticalSection_End allocate/free without breaking users. (And then we can introduce a PyCriticalSection_Begin_v2 that takes a bigger struct and doesn't allocate, and switch the macros to use it.)

Allocating memory on the heap means introducing the risk of a memory allocation failure. I don't think that we want that for a critical section.

@vstinner
Copy link

(Possibly lots more space -- stack space is only expensive in recursion, but if you're nesting critical sections you're doing it wrong. But an extra pointer is probably enough.)

We are trying to reduce the stack memory usage in Python. I don't think that adding an unused member "just in case" (for future usage) to the structure is worth it.

@gvanrossum
Copy link

To me, the possibility of PyMutex needing to expand in size seems too uncertain to worry about it. The basic lock functionality will not require extra space, ever. (If they ever do, they will have to invent a new type.)

The only reason to reserve extra space seems to be the idea of possibly having the lock contain a pointer to the "real" location of the lock, in case an object containing a lock is moved by a hypothetical future GC. But this feels like the wrong solution for that problem: It would make more sense for the GC to declare locked objects unmovable (there will always be reasons why a particular object can't be moved, so this would just be another reason). Also, the Stable ABI would have many other problems when GC starts moving objects, so it's also a straw man.

@colesbury
Copy link
Author

It would be helpful to get a decision on this API.

It seems to me that people are supportive of the API @encukou described in #26 (comment). There is still some uncertainty about future stable ABI considerations, but the current proposal does not affect the stable ABI/limited C API.

@encukou
Copy link
Collaborator

encukou commented Jun 19, 2024

Yeah, let's do a formal vote for the API in #26 (comment)

@zooba
Copy link

zooba commented Jun 19, 2024

This looks like an absolute nightmare to debug (it's the first time I've seen how these work), and unfortunately it's not going to solve the existing race conditions/vulnerabilities we have that I'd hoped per-object locks would help with, but the API (the macro names) are fine.

I don't see how this gets into the stable ABI at all in the future, though, nor can this be used reliably from other languages (for an arbitrary value of "other language"). If we want to go there, I think we'll need a design that doesn't rely on pointers to user-defined variables on the stack, and probably need to store it all in our thread state and return a key to the caller to store (this will be hidden in the macro for C devs, so their sources don't change).

@vstinner
Copy link

@colesbury: Congrats, your API got approved, I close the issue.

@zooba: Maybe some tooling can be added later to help debugging dead locks and race conditions. And yeah, for now, this API doesn't belong to the stable ABI, and I agree that a redesign may be needed if we want to make it happen later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants