New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use vectorcall by default #5804
Conversation
This isn't ready, but I'm putting it up in the hope that vectorcall doesn't get refactored from under it (too much). |
} | ||
|
||
/////////////// PyObjectVectorCallKwBuilder.proto //////////////// | ||
//@requires: PyObjectFastCall |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intention is that this can also be used by utility code. So something like the limited API int.from_bytes
or code.update
can be done efficiently by vectorcall when supported
f(a, b) goes to vectorcall (as before) f(a, b, **kwds) now goes to "vectorcall_dict f(a, b, c=c) now goes to vectorcall with kwnames I'm fairly sure this is still a bit liable to crash.
I'm working on the basis that vectorcall is usually worthwhile but unpacking methods is a choosable optimization
a723c9f
to
7464ed7
Compare
Edited with updated results 12-Nov One thing I was worried about is if it increases the code size... It looks to give a small increase in code size but not dramatically different. Building Cython then running strip gives:
|
This mainly needs me to add a few tests I think, and maybe find a suitable benchmark to work out if it's really worthwhile. I think it probably is worth doing though |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall.
Cython/Compiler/ExprNodes.py
Outdated
# pointer to a function if necessary. If the function has fused | ||
# arguments, return the specific type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know, this is just copying an existing comment, but I don't see the "fused types" part of it being done here. Seems outdated and worth removing.
Cython/Compiler/ExprNodes.py
Outdated
# Specialised call to a (potential) PyMethodObject with non-constant argument tuple. | ||
# Allows the self argument to be injected directly instead of repacking a tuple for it. | ||
# | ||
# function ExprNode the function/method object to call | ||
# arg_tuple TupleNode the arguments for the args tuple | ||
# kwdict ExprNode or Node keyword dictionary (if present) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably meant to write None
here, right?
# kwdict ExprNode or Node keyword dictionary (if present) | |
# kwdict ExprNode or None keyword dictionary (if present) |
Cython/Utility/ObjectHandling.c
Outdated
#if CYTHON_ASSUME_SAFE_MACROS | ||
PyTuple_SET_ITEM(builder, n, key); | ||
#else | ||
if (unlikely(PyTuple_SetItem(builder, n, key))) return -1; | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use the helper macro that we already have for this.
#if CYTHON_ASSUME_SAFE_MACROS | |
PyTuple_SET_ITEM(builder, n, key); | |
#else | |
if (unlikely(PyTuple_SetItem(builder, n, key))) return -1; | |
#endif | |
if (unlikely(__Pyx_PyTuple_SET_ITEM(builder, n, key))) return -1; |
Cython/Compiler/ExprNodes.py
Outdated
# the following is always true in Py3 (kept only for safety), | ||
# but is false for unbound methods in Py2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems worth cleaning up along the way.
Cython/Compiler/Optimize.py
Outdated
def _check_positional_args_for_method_call(self, positional_args): | ||
# Do the positional args imply we can substitute a PyMethodCallNode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The comment you provide makes a good method name:
def _check_positional_args_for_method_call(self, positional_args): | |
# Do the positional args imply we can substitute a PyMethodCallNode | |
def _should_use_PyMethodCallNode(self, positional_args): |
- This seems more like a helper function than a method.
- Consider moving it to
PyMethodCallNode
as a static method, e.g.PyMethodCallNode.can_be_used_for_posargs(posargs)
.
Cython/Compiler/Optimize.py
Outdated
return isinstance(positional_args, ExprNodes.TupleNode) and not ( | ||
positional_args.mult_factor or (positional_args.is_literal and len(positional_args.args) > 1)) | ||
|
||
def _check_function_may_be_method_call(self, function): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could also just be a static method in PyMethodCallNode
.
Obviously microbenchmarks can be tuned to prove whatever you want them to prove. But here's a microbenchmark to justify the PR:
Roughly:
So I think it is worthwhile, at least for this artificial benchmark. The benchmark doesn't test the unpacking side of it of course. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few more style comments, but would like to see this in 3.1 in the end.
Cython/Utility/TypeConversion.c
Outdated
if (!kwds) goto limited_bad; | ||
if (PyDict_SetItemString(kwds, "signed", __Pyx_NewRef(Py_True))) goto limited_bad; | ||
{ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cython/Compiler/ExprNodes.py
Outdated
keyword_variable = "" | ||
if use_kwnames: | ||
keyword_variable = kwnames_temp | ||
elif self.kwdict: | ||
keyword_variable = self.kwdict.result() | ||
if keyword_variable: | ||
keyword_variable = ", %s" % keyword_variable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems simple enough to just use f-strings.
keyword_variable = "" | |
if use_kwnames: | |
keyword_variable = kwnames_temp | |
elif self.kwdict: | |
keyword_variable = self.kwdict.result() | |
if keyword_variable: | |
keyword_variable = ", %s" % keyword_variable | |
if kwnames_temp: | |
keyword_variable = f", {kwnames_temp}" | |
elif self.kwdict: | |
keyword_variable = f", {self.kwdict.result()}" | |
else: | |
keyword_variable = "" |
Cython/Compiler/ExprNodes.py
Outdated
use_kwnames = False | ||
for arg in args: | ||
arg.generate_evaluation_code(code) | ||
if isinstance(self.kwdict, DictNode): | ||
use_kwnames = True | ||
for keyvalue in self.kwdict.key_value_pairs: | ||
keyvalue.generate_evaluation_code(code) | ||
elif self.kwdict: | ||
self.kwdict.generate_evaluation_code(code) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure, but these two cases might become a little simpler below if you store the key_value_pairs
in a variable (maybe just kwargs
?) instead of using a separate flag for the distinction.
Cython/Compiler/ExprNodes.py
Outdated
extra_keyword_args = "" | ||
if use_kwnames: | ||
extra_keyword_args = ("+ ((CYTHON_VECTORCALL) ? %d : 0)" % | ||
len(self.kwdict.key_value_pairs)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a candidate for an f-string to me.
if not ExprNodes.PyMethodCallNode.can_be_used_for_posargs( | ||
node.positional_args, has_kwargs=has_kwargs, kwds_is_dict_node=kwds_is_dict_node): | ||
return node | ||
function = node.function | ||
if not ExprNodes.PyMethodCallNode.can_be_used_for_function(function): | ||
return node | ||
|
||
node = self.replace(node, ExprNodes.PyMethodCallNode.from_node( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that the two can_be_…
functions are always called before calling from_node()
. That seems redundant. Why not let a PyMethodCallNode
classmethod try to optimise the call (thus, maybe try_to_optimise_call()
), and if it can't, return the original node unchanged?
Feel free to add a if node is (not) replacement
short-cut to the replace()
method to simplify this further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm disagreeing with this for now.
There's a good chunk of duplication between SimpleCallNode
and GeneralCallNode
, but there there's also significant differences (a CloneNode
check, the name of the attribute used for the positional arguments, handling of keyword arguments).
So I think the logic need to be duplicated either way, either here or with an isinstance
check in PyMethodCallNode
. I think it's slightly better here (where the visitor has found the classes for us).
I don't have a hugely strong opinion - it just doesn't look like a significant improvement to me.
Other changes are made... I think the original work on this was before fstrings became allowed in Cython (hence they weren't used in places they could have been) |
Codestyle is failing with
which I think isn't this PR's fault |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine to merge now.
There's something odd going on with CyCache on Python 3.12 and also I'm not sure either should be the fault of this PR, but I don't see them on any other recent builds. I'll leave merging this until I've had time to investigate properly |
There's something odd going on with CyCache on Python 3.12 and also `line_trace` on Windows+Python 3.12.
I've seen both fail in Py3.12 before, so it seems likely that it's the same issue here.
|
I at least can't reproduce the CyCache stuff locally so I'm going to merge this. I don't think the 3.1 branch is immediately going to be released so there should be time to fix it if needed I think. |
f(a, b) goes to vectorcall (as before)
f(a, b, **kwds) now goes to "vectorcall_dict
f(a, b, c=c) now goes to vectorcall with kwnames
Will eventually fix #5784