Machine-readable description of the C API #7

Open
encukou opened this issue May 10, 2023 · 10 comments
Labels: fixable; stakeholder: language bindings (e.g., Jpype, PyO3); theme: abstraction; theme: the C language (issues related to the way we use the C language)

Comments

@encukou
Contributor

encukou commented May 10, 2023

(This issue was originally “The C Language”, but discussion focused on one sub-issue, so it was re-purposed. See #35 for C.)


Information about the C API is only available in the headers. Those can generally only be parsed with a C compiler, which presents a problem for bindings to other languages (and even ctypes users), who generally resort to either

  • copying the info by hand, or
  • using a C binding generator (those can only parse a subset of C, and usually it's some undefined subset. It's not feasible to limit CPython to all the relevant subsets.)
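A minimal sketch of what "copying the info by hand" means for a ctypes user, assuming a standard CPython build where `ctypes.pythonapi` is available. The two signatures are restated manually from the header declarations; ctypes cannot read the headers, and without the `restype` lines it would assume an `int` return value.

```python
import ctypes
import sys

# Declared in the headers as: const char *Py_GetVersion(void);
# The signature below is copied by hand; nothing checks it against the header.
py_get_version = ctypes.pythonapi.Py_GetVersion
py_get_version.argtypes = []
py_get_version.restype = ctypes.c_char_p

# Declared in the headers as: long PyLong_AsLong(PyObject *);
py_long_as_long = ctypes.pythonapi.PyLong_AsLong
py_long_as_long.argtypes = [ctypes.py_object]
py_long_as_long.restype = ctypes.c_long

version = py_get_version().decode("utf-8")
answer = py_long_as_long(42)
```

If a header declaration changes, nothing flags the stale hand-written copy; that is the maintenance problem described above.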
@gvanrossum

Maybe the solution is to generate the C headers (possibly even generating separate C++ headers) from some dedicated file format that's flexible enough to specify all this and from which it is simple to extract the binding info for other languages.

@encukou
Contributor Author

encukou commented May 17, 2023

I'm told this repo isn't about discussing solutions, so I'll refrain. But please let me know when it's time to discuss them -- I've thought about this a lot, and I have a way forward and experience from working on a subset of the API. You don't need to retrace my steps :)

@iritkatriel
Member

I think the idea of not discussing solutions was to avoid discussing API redesign proposals. There is probably no harm in discussing solutions for problems that can be fixed incrementally. I created a ‘fixable’ label to mark those, and we could decide that issues with this label can move on to discussing solutions.

@encukou
Contributor Author

encukou commented May 17, 2023

OK! The problem is that I believe that nearly all of the issues can be solved incrementally :)


generate the C headers from some dedicated file format that's flexible enough to specify all this and from which it is simple to extract the binding info for other languages.

I'll call this “dedicated file format” a manifest for short.

The devil is in the details: in this case “flexible enough to specify all this”. Designing the format is quite a big task.
For starters, we don't really know what the current public C API is. (Maybe that could be its own issue.) There are efforts to organize and categorize it, and to document the stability expectations, but the upshot is that we don't know the scope of the manifest file format upfront.

So, the main issue with your idea is bootstrapping. If the manifest is the single source of truth, it needs to encode all the details that would be in the C header. If there's one thing missing, we can't generate -- or we need escape hatches like raw C inclusion, which degrades the value of the manifest.

So, let's bootstrap another way: the manifest is a separate file, and it's checked against the headers.
The downside is that there's now two places to update when adding a function: .h, .c, and the manifest. OK, there are three places: .h, .c, the manifest, and the docs. ... Um, amongst the updatees are .h, .c, manifest, docs, and tests... -- Anyway! Updating the entry is negligible compared to the overall work of updating public API.
The upside is that there are no issues with info the manifest doesn't capture yet.
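A rough sketch of the "separate manifest, checked against the headers" idea. Everything here is an assumption for illustration: the manifest format, the names `MANIFEST`, `HEADER` and `check_manifest`, and the made-up entry `PyList_Frobnicate`. The regex only recognizes simple one-line `PyAPI_FUNC` declarations, so it is itself an example of the "undefined subset of C" problem. It does not reproduce CPython's actual tooling.

```python
import re

# Hypothetical manifest: function name -> whatever metadata is captured so far.
MANIFEST = {
    "PyList_New": {"added": "3.2"},
    "PyList_Size": {"added": "3.2"},
    "PyList_Frobnicate": {"added": "3.13"},  # made-up entry, not in the header
}

# Stand-in for the contents of a header file.
HEADER = """
PyAPI_FUNC(PyObject *) PyList_New(Py_ssize_t size);
PyAPI_FUNC(Py_ssize_t) PyList_Size(PyObject *);
PyAPI_FUNC(int) PyList_Append(PyObject *, PyObject *);
"""

# Matches "PyAPI_FUNC(<return type>) <name>(" and captures the name.
DECL_RE = re.compile(r"PyAPI_FUNC\([^)]*\)\s*(\w+)\s*\(")


def declared_functions(header_text):
    """Return the set of function names declared with PyAPI_FUNC."""
    return set(DECL_RE.findall(header_text))


def check_manifest(manifest, header_text):
    """Report names that appear in only one of the two places."""
    declared = declared_functions(header_text)
    listed = set(manifest)
    return {
        "missing_from_manifest": sorted(declared - listed),
        "missing_from_header": sorted(listed - declared),
    }


report = check_manifest(MANIFEST, HEADER)
```

Because the check compares only names, the manifest can lack argument or return types without blocking anything, which is the upside noted above.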

That's the point where the limited API manifest is currently. (Limited API is a well-defined subset of the API. This lets me side-step the “we don't know what the C API is” issue above. Somewhat beside the point, IMO the limited API is generally a good starting target for incremental improvements that would be otherwise blocked by the overwhelming size/vagueness of the full API. Which is why I went with it.)

When the manifest is complete enough for some definitions, we can move to generating those individual definitions, in the style of Argument Clinic. (If we agree it's good to generate code. Just because something can be generated doesn't mean it should.)
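As an illustration of generating an individual definition once an entry carries argument and return types, here is a small sketch. `FunctionEntry` and `render_c_declaration` are hypothetical names, and the entry layout is an assumption rather than the real manifest schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionEntry:
    """Hypothetical manifest entry that is complete enough to generate from."""
    name: str
    return_type: str
    params: tuple  # tuple of (c_type, param_name) pairs


def render_c_declaration(entry):
    """Render one header declaration from a manifest entry."""
    rendered = []
    for c_type, param_name in entry.params:
        # Pointer types attach directly to the name: "PyObject *list".
        sep = "" if c_type.endswith("*") else " "
        rendered.append(f"{c_type}{sep}{param_name}")
    params = ", ".join(rendered) if rendered else "void"
    return f"PyAPI_FUNC({entry.return_type}) {entry.name}({params});"


entry = FunctionEntry(
    name="PyList_Size",
    return_type="Py_ssize_t",
    params=(("PyObject *", "list"),),
)
declaration = render_c_declaration(entry)
```

Entries that are not yet complete would simply stay handwritten in the header, so the switch to generated code can happen one definition at a time.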

(For the limited API manifest itself, my next step is adding info on argument/return types. Hope to get to it in 3.13 -- but the process is incremental, so if I don't, we still have all the info captured so far!)

@vstinner
Copy link
Contributor

How do Rust bindings handle these issues?

@gvanrossum changed the title from “C” to “The C Language” on May 17, 2023
@gvanrossum

@encukou I see your point about bootstrapping. (Each code generator that replaces handwritten code needs to deal with this, and each case seems to be a special snowflake.)

I'm not particularly keen on the Argument Clinic style that you linked to; I'd rather generate concrete C code like deepfreeze or cases_generator.

If the Rust bindings library has a way to extract info from existing C files, maybe we can borrow their approach to help with the bootstrapping? (I wouldn't want Rust as a dependency, but if they can do it programmatically, that means so can we.)

@encukou
Contributor Author

encukou commented May 17, 2023

@davidhewitt

Yes, we do it by hand. It's possible to use bindgen to generate Rust definitions from C headers using libclang.

The downside of this is that we'd need to ship the Python headers OR require the user to point us at them at build time, and then also require users to install libclang. There's also the compile-time cost of this. Given the C API is relatively stable and manageable in size, it's been doable for us to maintain by hand to keep usage simpler.

@davidhewitt

davidhewitt commented May 17, 2023

How do Rust bindings handle these issues?

We get humans to do this bit. To keep things easier, we try to match the structure of the include/ directory (as it is on main) in the pyo3-ffi crate. So e.g. listobject.h becomes the listobject.rs file linked to above. Naturally there is some lag between what we've synced and what's in CPython main.

[edit by @encukou]: I moved the rest of the comment to #35 (comment)

@encukou
Copy link
Contributor Author

encukou commented May 18, 2023

I've repurposed the issue to better reflect the solution we're discussing.


5 participants