
Use of features specific to the C language #35

Open
encukou opened this issue May 18, 2023 · 12 comments
Labels: evolution-proposed; stakeholder: language bindings (e.g., Jpype, PyO3); theme: the C language (issues related to the way we use the C language)

Comments

@encukou
Contributor

encukou commented May 18, 2023

Non-C languages struggle with:

  • Bit fields
  • Enums (which have compiler-dependent sizes)
  • Macros and inline functions (which need a C compiler)
  • (edit) varargs

Some related problems have their own issues here:

@encukou
Contributor Author

encukou commented May 18, 2023

@davidhewitt wrote:

How do Rust bindings handle these issues?

As for the "issues" in the OP, this is how we handle them:

parse C headers (which is very tricky in general)

We get humans to do this bit. To keep things easier, we try to make the pyo3-ffi crate match the structure of the include/ directory (as it is on main). So e.g. listobject.h becomes the listobject.rs file linked to above. Naturally there is some lag between what we've synced and what's in CPython main.

match C type sizes (e.g. sizeof(int)). Some can differ depending on the compiler, even on the same platform.

The Rust standard library has core::ffi with types like c_int, which has so far worked for us.

mimic C memory layout (which is also compiler-defined). Padding, bit fields, enums are particularly tricky.

Rust has #[repr(C)] to lay out structs in a C-compatible way and can do C-compatible enums. Bitfields are extremely awkward; we have to use integer fields of the correct total size and implement offset operations to do the reads. bindgen has helped us get this correct in the past.
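To make the bit-field workaround concrete, here is a minimal C sketch of reading bit fields through an integer of the same total size, which is what a non-C binding has to do. The struct `flags_c` and the helpers are hypothetical, and the bit offsets assume GCC or Clang on a little-endian x86-64 SysV target, where bit fields are allocated starting from the least significant bit of a 32-bit storage unit; other ABIs may lay them out differently.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header struct using bit fields (not a real CPython type). */
struct flags_c {
    unsigned int ready : 1;  /* bit 0     */
    unsigned int kind  : 3;  /* bits 1-3  */
    unsigned int count : 12; /* bits 4-15 */
};

/* The offsets above are only valid if the whole struct fits in one
   32-bit unit; this holds for GCC/Clang on x86-64 SysV. */
_Static_assert(sizeof(struct flags_c) == sizeof(uint32_t),
               "layout assumption does not hold on this target");

/* Copy the struct's bytes into a plain integer, which is all a
   non-C binding can portably declare for this field. */
uint32_t flags_storage(const struct flags_c *f)
{
    uint32_t raw;
    memcpy(&raw, f, sizeof raw);
    return raw;
}

/* Extract `width` bits starting at bit `offset` (LSB-first allocation). */
unsigned int get_bits(uint32_t raw, unsigned int offset, unsigned int width)
{
    assert(width > 0 && width < 32 && offset + width <= 32);
    return (unsigned int)((raw >> offset) & ((UINT32_C(1) << width) - 1u));
}
```

A Rust or Go binding performs the same shift-and-mask on its own integer field; the `_Static_assert` is a cheap way to catch a target where the layout assumption breaks.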

do ... something? ... with C macros and inline functions.

We've reproduced most of these as #[inline] unsafe fn, e.g. here's the implementation for Py_TYPE. This obviously isn't perfect; macros which aren't functions such as PyDateTime_TimeZone_UTC will still get mirrored as functions, i.e. PyDateTime_TimeZone_UTC().
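The underlying difference can be sketched in C. In this hypothetical example (`obj`, `type_obj`, `OBJ_TYPE` and `obj_type` are made-up names, loosely modeled on `PyObject` and `Py_TYPE`), the macro exists only for the C preprocessor, so a binding has to re-implement it by hand and bake in the struct layout; an exported function with the same behavior gives other languages a real symbol to call instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object layout, loosely modeled on PyObject. */
typedef struct type_obj { const char *name; } type_obj;
typedef struct obj { ptrdiff_t refcnt; type_obj *type; } obj;

/* C-only convenience: expanded by the preprocessor, so it leaves no
   symbol behind and hard-codes the field offset into every caller. */
#define OBJ_TYPE(o) (((obj *)(o))->type)

/* Function form of the same accessor. It is an ordinary external
   symbol, so any language with a C FFI can call it without knowing
   where `type` lives inside `obj`. */
type_obj *obj_type(obj *o)
{
    return OBJ_TYPE(o);
}
```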

@encukou changed the title from "Use of features specific to C language" to "Use of features specific to the C language" on May 18, 2023
@steve-s
Contributor

steve-s commented May 18, 2023

I think that providing some convenience helpers specific to C is fine, but anything C-specific should be left out of the ABI, or have alternative non-C-specific variants in the ABI. For example, PyErr_Format is extremely useful if you are in C. If you are developing a Rust language binding, then you probably want to provide a Rust-style formatting helper that fits the Rust way of doing such things. The ABI should have you covered without requiring cumbersome C interop.

Edit: now I see you don't list varargs, but I'd add them to the list too?

@vstinner
Contributor

vstinner commented Sep 4, 2023

@steve-s:

Edit: now I see you don't list varargs, but I'd add them to the list too?

What is the problem with variadic arguments? Are there programming languages which cannot use them and are used to write Python extensions?

Are you saying that we should not add functions using variadic arguments in their definition, like PyObject* Py_BuildValue(const char *format, ...)? Or are you saying that we should not add functions taking a va_list argument (from <stdarg.h>) in their definition, like PyObject* PyUnicode_FromFormatV(const char *format, va_list vargs)?

How do you implement a function with a variable number of arguments in these languages? For example, how do you bind Py_BuildValue(), PyTuple_Pack(), or PyUnicode_FromFormat() in these languages?

I looked at PyO3 (Rust); it seems to happily bind these functions:

PyO3 also has PyUnicode_FromFormatV(), but its implementation is commented out.
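The two shapes distinguished in the question can be sketched in C with hypothetical names, following the same convention as PyUnicode_FromFormat/PyUnicode_FromFormatV: the `...` function packs its arguments into a `va_list` and forwards them to the `va_list` flavor. From a non-C language, the first requires emitting a call with the platform's variadic calling convention, and the second requires constructing a `va_list`, whose representation differs between platforms (an array of one struct on x86-64 SysV, a plain `char *` on Windows x64).

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* va_list flavor (shaped like PyUnicode_FromFormatV): the variable
   arguments arrive as one opaque va_list parameter. */
int format_into_v(char *buf, size_t size, const char *fmt, va_list vargs)
{
    return vsnprintf(buf, size, fmt, vargs);
}

/* Variadic flavor (shaped like PyUnicode_FromFormat): gathers "..."
   into a va_list and forwards it to the function above. */
int format_into(char *buf, size_t size, const char *fmt, ...)
{
    va_list vargs;
    va_start(vargs, fmt);
    int n = format_into_v(buf, size, fmt, vargs);
    va_end(vargs);
    return n;
}
```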

@vstinner
Contributor

vstinner commented Sep 4, 2023

The question of variadic arguments arose in my issue proposing to add int PySys_Audit(const char *event, const char *argFormat, ...) to the limited C API: python/cpython#108571

@encukou
Contributor Author

encukou commented Sep 4, 2023

I don't think we should avoid functions like Py_BuildValue. But they should not be the only way to do something.

For Py_BuildValue you can use PyTuple_New+PyTuple_SetItem, or build from a list, or (ideally in the future) build from an array of PyObject*.

For PyUnicode_FromFormatV you can prepare the char* using any other formatting API, and call PyUnicode_FromStringAndSize.

For C, the varargs variants might still be the right way to do things. But alternatives should (and do) exist.
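The PyUnicode_FromStringAndSize route can be sketched as follows. To keep it self-contained, `str_obj` and `str_from_bytes` are hypothetical stand-ins for a Python string object and a (pointer, length) constructor such as PyUnicode_FromStringAndSize (which decodes UTF-8). The formatting step uses C's snprintf only for illustration; a Rust or Go binding would use its own formatter (`format!`, `fmt.Sprintf`) and pass the resulting bytes across.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a string object. */
typedef struct { char *data; size_t len; } str_obj;

/* Non-variadic constructor: takes a pointer and an explicit length,
   nothing else, so any FFI can call it. */
str_obj *str_from_bytes(const char *u, size_t size)
{
    str_obj *s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    s->data = malloc(size + 1);
    if (s->data == NULL) { free(s); return NULL; }
    memcpy(s->data, u, size);
    s->data[size] = '\0';
    s->len = size;
    return s;
}

void str_free(str_obj *s)
{
    if (s != NULL) { free(s->data); free(s); }
}

/* The caller formats with whatever its language provides, then crosses
   the API boundary with a plain (pointer, length) pair. */
str_obj *describe_point(int x, int y)
{
    char buf[64];
    int n = snprintf(buf, sizeof buf, "Point(%d, %d)", x, y);
    if (n < 0 || (size_t)n >= sizeof buf) return NULL; /* error or truncated */
    return str_from_bytes(buf, (size_t)n);
}
```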

@vstinner
Contributor

vstinner commented Sep 4, 2023

For Py_BuildValue you can use PyTuple_New+PyTuple_SetItem, or build from a list, or (ideally in the future) build from an array of PyObject*.

Note: IMO this API is bad since it creates an uninitialized Python tuple object, see: #56

For PyUnicode_FromFormatV you can prepare the char* using any other formatting API, and call PyUnicode_FromStringAndSize.

Yep, PyUnicode_FromStringAndSize() is a nice API in terms of API stability and not leaking implementation details.

@cavokz
Contributor

cavokz commented Sep 5, 2023

Go's cgo does not (and probably never will) support calls to variadic C functions. When developing Pygolo (to embed and extend Python in Go), if I need some functionality they provide, I have to find some other way.

For example, Tuple_Pack is implemented exactly by means of PyTuple_New and PyTuple_SetItem, which I think is fine because my users will either receive the fully initialized tuple or get an error.

Another case is Arg_ParseTuple: I don't need to call the variadic C API for that. Go has variadic functions and I can learn the types of their parameters, so I don't need a format string to know what to do, but I do need PyTuple_GetItem to access the elements.

I'm not bound to expose the whole C API to Go; some parts of it do not make sense at all (for Go) and others are just used to build higher-level or more idiomatic constructs.

I think it's not only acceptable but sometimes also necessary to have a low-level, error-prone, difficult-to-handle API; it's just that it should not be the first choice, and perhaps not the only option.

I love the Python C API: it's a great tool with good documentation, it's powerful, and it's a lot of fun to use and to build upon. Of course, like everything, it can be improved, but please don't try to make it too ideal, ok?

edit: I see I'm mostly echoing @encukou, just more verbosely :)

@vstinner
Contributor

vstinner commented Sep 5, 2023

What @encukou said about PySys_Audit() now makes sense to me: if we add an API with variadic arguments, we should offer a similar API without them. PySys_Audit() builds a tuple, so the second flavor would be an API that takes an already-built tuple.

So people who can use variadic arguments can just use it (it's a convenient API), and there is a backup plan for everyone else.

Thanks @cavokz, I didn't want to add a new function and have to maintain it to solve a hypothetical problem. I didn't know that Go's cgo can't call variadic C functions, nor that Python can be used with Go: two things I learned today 👍

@steve-s
Contributor

steve-s commented Sep 7, 2023

To further clarify what I meant:

  • ABI should provide all the necessary functionality without the need for varargs (we seem to be in agreement on this)
  • I am not sure how stable and well defined varargs are w.r.t. the ABI (they probably are in practice, but is their ABI standardized? Is it safe to rely on that?)
  • Note that in HPy, the ABI is varargs-free, but HPy provides varargs helper functions that are not part of the ABI and are implemented using the ABI. The helper functions are compiled into the extension. The idea is that language bindings can piggyback on the C API (if they have good C interop), but they can also build directly on top of the ABI and provide a facade for whatever interface is natural/idiomatic for the given language.
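A minimal C sketch of that layering, with hypothetical names: `tuple_from_array` plays the role of a varargs-free ABI function (a count plus an array), and `tuple_pack`, shaped like PyTuple_Pack, is a C-only convenience built purely on top of it. Items are plain `int`s here so the example runs without Python headers; in a real header the helper would likely be `static inline` so that it is compiled into the extension.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>

/* Hypothetical stand-in for a tuple object holding ints. */
typedef struct { size_t n; int *items; } tuple_obj;

/* ABI-level entry point: no varargs, just a count and an array.
   Any language with a basic C FFI can call this. */
tuple_obj *tuple_from_array(const int *items, size_t n)
{
    tuple_obj *t = malloc(sizeof *t);
    if (t == NULL) return NULL;
    t->items = malloc((n ? n : 1) * sizeof *t->items);
    if (t->items == NULL) { free(t); return NULL; }
    for (size_t i = 0; i < n; i++)
        t->items[i] = items[i];
    t->n = n;
    return t;
}

void tuple_free(tuple_obj *t)
{
    if (t != NULL) { free(t->items); free(t); }
}

/* C-only convenience helper: collects "..." into a temporary array and
   calls the ABI function. Nothing variadic crosses the ABI boundary. */
tuple_obj *tuple_pack(size_t n, ...)
{
    int *items = malloc((n ? n : 1) * sizeof *items);
    if (items == NULL) return NULL;
    va_list vargs;
    va_start(vargs, n);
    for (size_t i = 0; i < n; i++)
        items[i] = va_arg(vargs, int);
    va_end(vargs);
    tuple_obj *t = tuple_from_array(items, n);
    free(items);
    return t;
}
```

A Go or Rust binding would call `tuple_from_array` directly with a slice's pointer and length and never need `tuple_pack`.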

@encukou
Contributor Author

encukou commented Sep 7, 2023

I am not sure how stable and well defined varargs are w.r.t. the ABI (they probably are in practice, but is their ABI standardized? Is it safe to rely on that?)

AFAIK, C standards say nothing about ABI. This kind of stuff is left to platforms/distributors, see the disclaimer in the docs

Note that in HPy, the ABI is varargs-free, but HPy provides varargs helper functions that are not part of the ABI and are implemented using the ABI. The helper functions are compiled into the extension. The idea is that language bindings can piggyback on the C API (if they have good C interop), but they can also build directly on top of the ABI and provide a facade for whatever interface is natural/idiomatic for the given language.

That's the general idea of the direction I want to go in: a very portable base, and optional C-specific niceties.
(The PySys_Audit case is uniquely tricky though. If no hooks are registered, PySys_Audit avoids wrapping arguments in PyObject*s. And AFAIK, for security reasons we can't tell the caller whether hooks are registered, so users of the tuple API can't avoid allocating the tuple.)

@pitrou

pitrou commented Oct 26, 2023

  • Enums (which have compiler-dependent sizes)

I see this mentioned in many places, but the enum size is actually mandated by the platform ABI, which the compiler is supposed to comply with.

For example, the SysV ABI for x86-64 (used on GNU/Linux) mandates that enums have size 4 and alignment 4, except when some enum values wouldn't fit (which is unlikely):
https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf

Microsoft's x86-64 ABI has a similar provision and doesn't even seem to make room for enums larger than 32 bits ("enums are constant integers and are treated as 32-bit integers"):
https://learn.microsoft.com/en-us/cpp/build/x64-software-conventions?view=msvc-170#scalar-types

That said, I agree that having to think about enum size when using a non-C/C++ language is annoying in itself.
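A binding that relies on the platform ABI rule can at least make its assumption explicit. The sketch below uses a hypothetical enum and checks at compile time that it is 4 bytes with 4-byte alignment, which is what the x86-64 SysV and Microsoft x64 ABIs cited above specify for enums whose values fit in an int.

```c
#include <assert.h>

/* Hypothetical enum of the kind found in C API headers. */
enum send_result { SEND_ERROR = -1, SEND_RETURN = 0, SEND_NEXT = 1 };

/* If these fail, the binding's choice of a 32-bit integer for this
   type is wrong on the current target, and the build stops here
   instead of silently misreading memory. */
_Static_assert(sizeof(enum send_result) == 4,
               "binding assumes 4-byte enums");
_Static_assert(_Alignof(enum send_result) == 4,
               "binding assumes 4-byte enum alignment");

/* The value a non-C binding would pass across the boundary. */
int send_result_to_int(enum send_result r)
{
    return (int)r;
}
```

One caveat: GCC's `-fshort-enums` option shrinks each enum to the smallest integer type that fits, and GCC documents that code built with it is not binary compatible with code built without it; a check like this one would catch such a build.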

6 participants