
RFC: Adding a backend for HPy. #4490

Draft
fangerer wants to merge 177 commits into cython:master from fangerer:hpy

Conversation

@fangerer

@fangerer fangerer commented Dec 6, 2021

This PR is a first effort towards the long-term goal of adding an HPy backend to Cython. The main objective here is to get some feedback on how to structure the code and what else needs to be considered. So, this is an explicit request for comments.

The main idea is to move C code generation into the CCodeWriter. For that, I've introduced a new class HPyCodeWriter that inherits from CCodeWriter; the new writer emits HPy API calls/code instead of CPython API calls/code. We can then use two different writers to run the same generation step.

As we already discussed with @scoder in a meeting on September 3rd, 2021, Cython now emits both C API and HPy API code, and the different sections are protected by ifdefs like this:

#ifndef HPY
  /* C API code */
#endif

#ifdef HPY
  /* HPy API code */
#endif

This is realized by introducing code sections for HPy (see Code.py: code_layout).
Some of the sections are independent of the API and can be shared (e.g. string_decls which defines global const char * variables).

A lot of work still needs to be done (of course) but the first simple example hpy_basic.py is already working (C only).

@fangerer
Author

fangerer commented Feb 7, 2022

Sorry that it took so long, but I just pushed a major update to this PR that changes a lot.

As far as I understood the comments, the main problem was that I had moved code generation into the code writers. I have now reverted this and also removed the HPyCCodeWriter. Code and its structure are now generated in the same places as before.

Additionally, there are two major additions compared to the previous changes:

  1. I'm now properly using HPyField to store objects in global variables.
  2. I migrated the majority of the module initialization code that sets up Cython's caching of builtins and other objects.

Also, where needed, I've changed the structure of the generated code such that it suits both the C API and HPy. For example:
For building tuples, in the C API you would just do:

PyObject *tuple = PyTuple_New(n);
PyTuple_SET_ITEM(tuple, 0, x);
PyTuple_SET_ITEM(tuple, 1, y);
...
/* use variable 'tuple' */

whereas in HPy you need to use a tuple builder:

HPyTupleBuilder builder = HPyTupleBuilder_New(ctx, n);
HPyTupleBuilder_Set(ctx, builder, 0, x);
HPyTupleBuilder_Set(ctx, builder, 1, y);
...
HPy tuple = HPyTupleBuilder_Build(ctx, builder);
/* use variable 'tuple' */

So, I've changed the structure in Cython such that it works with the builder pattern, where the build step in the C API is just a no-op.
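The no-op-build shape described above can be sketched in a self-contained way. Everything below is illustrative, not PR code: the Obj/Tuple stand-in types replace PyObject*/PyTuple_New/PyTuple_SET_ITEM, and the TupleBuilder names only mirror the HPy naming so the two code paths line up.

```c
#include <stdlib.h>

/* Stand-in object and tuple types; real generated code would use
 * PyObject* with PyTuple_New/PyTuple_SET_ITEM (C API) or HPy with
 * HPyTupleBuilder_* (HPy). */
typedef struct { int value; } Obj;
typedef struct { Obj **items; int n; } Tuple;

/* C-API-flavoured builder: the "builder" is just the tuple itself,
 * so the final build step can be a no-op. */
typedef Tuple *TupleBuilder;

static TupleBuilder TupleBuilder_New(int n) {
    Tuple *t = malloc(sizeof(Tuple));
    if (!t) return NULL;
    t->items = calloc((size_t)n, sizeof(Obj *));
    t->n = n;
    return t;
}

#define TupleBuilder_Set(b, i, v)  ((b)->items[(i)] = (v))
#define TupleBuilder_Build(b)      (b)  /* no-op: the tuple is already built */
```

Cython can then always emit the same New/Set/Build step sequence; under HPy the three steps would map onto HPyTupleBuilder_New/_Set/_Build instead.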

In order to cover all possible use cases, I've introduced a Python class named Backend which is used to generate the API-specific code at a very fine-grained level. There are three API backends:

  1. CApiBackend: generates bare C API code (as close to the state before this PR as possible)
  2. HPyBackend: generates bare HPy API code
  3. CombinedBackend: uses meta-programming to generate code that suits both the C API and HPy. One can decide which API to compile for just by defining (or not defining) the macro "HPY". So, if the cythonized code is included in an HPy extension, this happens automatically. For example, the generated code looks like this:

     __PYX_OBJECT some_function(__PYX_CONTEXT_DECL __PYX_OBJECT x) {
         int result;
         result = __PYX_LONG_AS_INT(__PYX_CONTEXT x);
     }

where __PYX_OBJECT is a macro that defines the Python object C type, which is PyObject * for the C API and HPy for HPy.
The CombinedBackend is probably the one that will eventually be the default, but when we first merge these changes, I would set the CApiBackend as the default, which would generate code almost as before.
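A self-contained sketch of such a macro layer: the __PYX_OBJECT/__PYX_CONTEXT_DECL/__PYX_CONTEXT names follow the PR, while the typedefs and the __PYX_IS_NULL helper are stand-ins of ours, only there so the sketch compiles without Python/HPy headers.

```c
#include <stddef.h>

#ifdef HPY
typedef struct HPyContext_ HPyContext;      /* stand-in for the real context */
typedef struct { void *p; } __Pyx_Handle;   /* stand-in for HPy */
#define __PYX_OBJECT        __Pyx_Handle
#define __PYX_CONTEXT_DECL  HPyContext *ctx,
#define __PYX_CONTEXT       ctx,
#define __PYX_IS_NULL(x)    ((x).p == NULL)
#else
typedef struct { int ob_refcnt; } PyObj;    /* stand-in for PyObject */
#define __PYX_OBJECT        PyObj *
#define __PYX_CONTEXT_DECL  /* the C API needs no context argument */
#define __PYX_CONTEXT
#define __PYX_IS_NULL(x)    ((x) == NULL)
#endif

/* The same source text now compiles for either API, selected only by
 * whether the macro HPY is defined: */
static int __pyx_obj_is_null(__PYX_CONTEXT_DECL __PYX_OBJECT x) {
    return __PYX_IS_NULL(x);
}
```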

There are, of course, still some open points and tasks to be done before this can be merged. Here are some points I want to discuss:

  • How should the decision be made which backend to use? I suggest introducing a compiler option like the language option.
  • Functions that return borrowed references, followed by increfs, are in general a problem (in particular, the function make_owned_reference doesn't work for HPy). Is it suitable to make all of them return a new reference?

@scoder
Contributor

scoder commented Feb 8, 2022

Functions that return borrowed references and following increfs are in general a problem (in particular, function make_owned_reference doesn't work for HPy).

Why not? Doesn't it just dup() a handle if necessary?
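The dup-a-handle idea can be sketched standalone. All names here, including __Pyx_NewRef and the Obj stand-in, are hypothetical; real code would use Py_INCREF on the borrowed pointer (C API) or HPy_Dup(ctx, h), which returns a fresh owned handle (HPy).

```c
/* Stand-in object with a reference count. */
typedef struct { int refcnt; } Obj;

#ifdef HPY
/* HPy flavour: duplicating the handle yields a new, owned handle to
 * the same object -- there is no incref on a borrowed pointer. */
#define __Pyx_NewRef(o)  hpy_dup_stub(o)
#else
/* C API flavour: incref the borrowed pointer and keep using it. */
static Obj *__Pyx_NewRef(Obj *o) { o->refcnt++; return o; }
#endif
```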

Is it suitable to make all of them return a new reference?

I'd like to avoid that if at all possible.

@scoder
Contributor

scoder commented Feb 8, 2022

in HPy you need to use a tuple builder

I wonder if we could just generate an inline helper function for each tuple length, in a new code writer section. There can't be that many different literal tuple or list sizes in a module, after all. The reference counting would still happen outside, but that function would then just steal a reference to each of its item arguments.
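A sketch of one such generated helper for tuple length 2. The helper name and the stand-in types are hypothetical; real generated code would use PyObject* with PyTuple_New/PyTuple_SET_ITEM, or wrap an HPyTupleBuilder the same way.

```c
#include <stdlib.h>

/* Stand-ins for the object and 2-tuple types. */
typedef struct { int value; } Obj;
typedef struct { Obj *items[2]; } Tuple2;

/* One inline helper per literal tuple length.  It "steals" a reference
 * to each item argument, so all reference counting stays at the call
 * site, as suggested above. */
static Tuple2 *__Pyx_Tuple_Pack2(Obj *a, Obj *b) {
    Tuple2 *t = malloc(sizeof(Tuple2));
    if (!t) return NULL;
    t->items[0] = a;  /* steals the reference to a */
    t->items[1] = b;  /* steals the reference to b */
    return t;
}
```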

Contributor

@scoder scoder left a comment


Thanks again. This looks much, much better.

There is still a lot of code duplication in the utility functions, but I guess some of this can't be avoided without losing readability. Still, my general feeling is that duplication is visibly more costly than macros.

Regarding the Backends, I'm sure there's still more that can be done by the C preprocessor, rather than cluttering Cython's code base with backend.do_this() calls. Why not have a __Pyx_CLEAR_GLOBAL() and a __Pyx_VISIT_GLOBAL()? C compiler macros are very local and cheap. They can even reduce the complexity if we give them good names. In contrast, new concepts in the compiler design are expensive.
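A sketch of the __Pyx_CLEAR_GLOBAL idea: the expansions below are guesses, not code from the PR, and the Obj stand-in replaces PyObject* / HPyField so the sketch compiles without Python/HPy headers.

```c
#include <stddef.h>

/* Stand-in object with a reference count. */
typedef struct { int refcnt; } Obj;

static void obj_decref(Obj *o) { if (o) o->refcnt--; }

#ifdef HPY
/* The HPy flavour would store HPy_NULL into the HPyField, roughly:
 *   HPyField_Store(ctx, owner, &(g), HPy_NULL)                       */
#define __Pyx_CLEAR_GLOBAL(g)  field_clear_stub(&(g))
#else
/* C API flavour: the classic Py_CLEAR pattern. */
#define __Pyx_CLEAR_GLOBAL(g)  do { obj_decref(g); (g) = NULL; } while (0)
#endif
```

The point of the macro is exactly what the comment argues: the #ifdef lives in one C header line instead of surfacing as backend.do_this() calls throughout the compiler.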

I don't mind adding the HPy context argument macro to all calls. It really doesn't hurt. And a lot of it can still be hidden by appropriate C-API function macros. We have __Pyx_* variants for almost all of the C-API (that we use) by now. Why not just make it all of it? It may even help in supporting subinterpreters in CPython at some point. If you really feel like the context should be hidden, add a code.get_capi_call(function_name, args). Can't say if that makes it better.

Regarding the utility C code, I understand that you generally want to move the HPy case second, probably because you expect the non-HPy case to be more important and better known. However, I find negated conditions harder to read than positive conditions. It's clearer to write #if HPY followed by the HPy block, than #if ! HPY and then lots of other stuff followed by the HPy block. Maybe adding a /* HPy */ marker might also help already, can't say without seeing it. But I wouldn't mind putting the HPy case first if that helps readability.

Comment on lines +1097 to +1101
#if !CYTHON_COMPILING_IN_HPY
if (likely(PyLong_Check(x))) {
#else /* CYTHON_COMPILING_IN_HPY */
if (likely(HPy_TypeCheck($hpy_context_cname, x, $hpy_context_cname->h_LongType))) {
#endif /* CYTHON_COMPILING_IN_HPY */
Contributor

This looks like it could also be hidden by a macro, __Pyx_PyLong_Check(). The context is "just there", right? There is precedent for this in the exception handling code that passes the thread state instead of looking it up, when the CYTHON_FAST_THREAD_STATE feature flag is defined.

OTOH, you are using HPyLong_Check() elsewhere. Is this different?

Comment on lines 119 to +123
#if CYTHON_ASSUME_SAFE_MACROS
#define __pyx_PyFloat_AsDouble(x) (PyFloat_CheckExact(x) ? PyFloat_AS_DOUBLE(x) : PyFloat_AsDouble(x))
#else
#else /* CYTHON_ASSUME_SAFE_MACROS */
#define __pyx_PyFloat_AsDouble(x) PyFloat_AsDouble(x)
#endif
#endif /* CYTHON_ASSUME_SAFE_MACROS */
Contributor

If you want to introduce these comments (which will get copied into the generated C files as it stands), then it would be good to at least repeat the actual condition. Here, you say #if FLAG and #else /* FLAG */, whereas in other places, you say #if ! FLAG and #else /* FLAG */. I think that undermines the intention of making it clearer which case we're looking at. I still have to look at the initial condition to see what the #else means.

I'm generally not a fan of these comments as they tend to suffer from bit rot quickly.
To me, they also make the #else and #endif more difficult to spot since they add a lot of verbosity to the line.

For the CYTHON_COMPILING_IN_HPY condition, maybe you can just add an /* HPY */ marker to the HPy case.

class CombinedBackend(APIBackend):
name = "combined"
pyobject_ctype = "__PYX_OBJECT_CTYPE"
pyobject_ctype_base_part = "__PYX_OBJECT_CTYPE"
Contributor

@scoder scoder Feb 7, 2022


Seeing how ubiquitous this type is, let's make it less verbose and less screaming. Something like __Pyx_Object or __Pyx_PObject seems fine. It's usually clear from the context that it refers to a type.

}
#endif
#else /* CYTHON_COMPILING_IN_HPY */
static int __Pyx_InitString(HPyContext *ctx, __Pyx_StringTabEntry t, HPyField *str) {
Contributor

Most of the differences can be handled with macros, and the rest are temporary differences that will go away as HPy matures. I'd rather not have three copies of (mostly) this function when two feel already too many.

Comment thread Cython/Utility/Optimize.c
Comment on lines -1113 to +1130
#if !CYTHON_COMPILING_IN_PYPY
#if !CYTHON_COMPILING_IN_PYPY && !CYTHON_COMPILING_IN_HPY
static {{c_ret_type}} __Pyx_PyInt_{{'' if ret_type.is_pyobject else 'Bool'}}{{op}}{{order}}(PyObject *op1, PyObject *op2, long intval, int inplace, int zerodivision_check); /*proto*/
#else
#elif CYTHON_COMPILING_IN_PYPY && !CYTHON_COMPILING_IN_HPY
Contributor

If HPy is always different here, regardless of the runtime environment (PyPy/CPython), why not move it first? That avoids duplicating the condition.

Comment on lines 3523 to +3527
code.putln("#elif CYTHON_COMPILING_IN_LIMITED_API")

backend.put_hpy(code, '%s = _h2py(init_%s_impl(_HPyGetContext()));' % (
env.module_cname,
env.module_name))
Contributor

In the limited API case?

Author

That's, of course, not the primary condition.
The generated structure is:

#if CYTHON_PEP489_MULTI_PHASE_INIT
  // HPy does not yet support multi-phase init
#else
  #if PY_MAJOR_VERSION < 3
     // HPy does not support Python 2
  #elif CYTHON_COMPILING_IN_LIMITED_API
    // this will always be true for HPy
  #else
   ...

So, it's currently the only valid case for HPy but I agree, it should be future-proof. I'll restructure that.

Comment thread Cython/Compiler/Nodes.py
code.globalstate.directives = old


class CPPNode(Node):
Contributor

CPP generally suggests C++, not C PreProcessor.

Author

Right, I'll rename that but the node should eventually go away.

Comment thread Cython/Compiler/Nodes.py
code.putln("PyObject* values[%d] = {%s};" % (
max_args, ','.join('0'*max_args)))
code.putln("%s values[%d] = {%s};" % (
backend.pyobject_ctype, max_args, ','.join(('__PYX_NULL', ) * max_args)))
Contributor

Places like this should use the constant in Naming.py.

Comment on lines +2104 to +2105
if def_node.is_cyfunction:
return Nodes.CPPNode(def_node.pos, body=assmt, cond="!CYTHON_COMPILING_IN_HPY")
Contributor

Is this a temporary work-around or intended as a general exclusion?

Author

That's a temporary workaround since this is not yet supported in HPy. In general, I need to think about __pyx_CyFunctionObject and how it plays together with HPy.
As far as I understand, __pyx_CyFunctionObject is an extension of CPython's PyCMethodObject. We cannot do exactly the same (i.e. embed the same struct) because HPy does, of course, not expose the C-level struct of a method.
We can still define the `__pyx_CyFunctionType` (without embedding `PyCMethodObject`), but everything that's declared in `PyCMethodObject` would then be an attribute read or something.

Comment on lines 171 to +172
# is_pyobject boolean Is a Python object type
# is_hpy boolean Is an HPy handle type
Contributor

@scoder scoder Feb 8, 2022


Do we need this distinction? (handle vs. object)

Author

No, that's a left-over. I just overlooked it.

@HiperMaximus

Has there been more progress on this? Would love to see it! Great work!

@fangerer
Author

Hi @HiperMaximus, no, I'm sorry. I couldn't make progress on this since I've been focusing on NumPy/HPy over the last months. @mattip is trying to make some progress on this (see fangerer#1 ). We were also in the process of figuring out how we can split this big change set into smaller pieces such that we can merge small chunks.

@igorcoding

Hi! I was wondering, is this feature still on the list and is it possible to see it in some way in the future?

@fangerer
Author

Hi @igorcoding! Yes, this is definitely something we (the HPy core devs) want to do. This particular PR served as a proof of concept to explore ways we can/should do it. Currently, I'm not personally working on this topic since I spend most of my HPy time on HPy itself or on NumPy/HPy. However, a colleague has started to work on it here.

@igorcoding

igorcoding commented Aug 13, 2023

Hey @fangerer! This is awesome! Great to hear!



Development

Successfully merging this pull request may close these issues.

[ENH] Preliminary support for HPy

6 participants