Use vectorcall by default #5804

Merged
merged 17 commits into from Jan 14, 2024

da-woods
Contributor

@da-woods da-woods commented Nov 7, 2023

f(a, b) goes to vectorcall (as before)
f(a, b, **kwds) now goes to "vectorcall_dict"
f(a, b, c=c) now goes to vectorcall with kwnames
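
As a rough pure-Python model of the kwnames case (not the generated C; `pack_kwnames` and `call_packed` are hypothetical names used only for illustration): under the vectorcall convention, keyword values are appended to the positional arguments in one flat array, and their names travel separately in a `kwnames` tuple.

```python
def pack_kwnames(args, kwargs):
    """Model the vectorcall layout: one flat argument array plus a names tuple."""
    flat = list(args) + list(kwargs.values())
    kwnames = tuple(kwargs)  # names, in the same order as the trailing values
    return flat, kwnames


def call_packed(func, flat, kwnames):
    """Model the callee side: split the flat array back into args and kwargs."""
    npos = len(flat) - len(kwnames)
    return func(*flat[:npos], **dict(zip(kwnames, flat[npos:])))


def f(a, b, c=None):
    return (a, b, c)


# f(a, b): no keyword names at all.
flat, kwnames = pack_kwnames((1, 2), {})
plain = call_packed(f, flat, kwnames)

# f(a, b, c=c): the value of c is appended, its name goes into kwnames.
flat_kw, kwnames_kw = pack_kwnames((1, 2), {"c": 3})
with_kw = call_packed(f, flat_kw, kwnames_kw)
```

The `f(a, b, **kwds)` case differs in that the keywords stay in a dict instead of being flattened into the argument array.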

Will eventually fix #5784

@da-woods
Contributor Author

da-woods commented Nov 7, 2023

This isn't ready, but I'm putting it up in the hope that vectorcall doesn't get refactored from under it (too much).

}

/////////////// PyObjectVectorCallKwBuilder.proto ////////////////
//@requires: PyObjectFastCall
Contributor Author

@da-woods da-woods Nov 7, 2023

The intention is that this can also be used by utility code, so that something like the limited API int.from_bytes or code.update can be done efficiently by vectorcall when supported.
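
A sketch of the idea in Python terms, assuming a hypothetical `KwBuilder` helper (the class name and its methods are illustrative and are not the utility code's actual interface): keyword names and values are collected once and then handed to the callee in a single call, as utility code might do for `int.from_bytes(..., signed=True)`.

```python
class KwBuilder:
    """Hypothetical stand-in for a keyword-argument builder."""

    def __init__(self):
        self.names = []
        self.values = []

    def add(self, name, value):
        self.names.append(name)
        self.values.append(value)

    def call(self, func, *args):
        # A real vectorcall would pass the values in a flat array together
        # with a names tuple; plain Python can only express this as kwargs.
        return func(*args, **dict(zip(self.names, self.values)))


builder = KwBuilder()
builder.add("signed", True)
result = builder.call(int.from_bytes, b"\xff\xff", "little")
```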

f(a, b) goes to vectorcall (as before)
f(a, b, **kwds) now goes to "vectorcall_dict"
f(a, b, c=c) now goes to vectorcall with kwnames

I'm fairly sure this is still a bit liable to crash.
I'm working on the basis that vectorcall is usually worthwhile
but unpacking methods is a choosable optimization
@da-woods
Contributor Author

da-woods commented Nov 8, 2023

Edited with updated results 12-Nov

One thing I was worried about was whether it increases the code size. It looks to give a small increase in some modules, but nothing dramatic.

Building Cython then running strip gives:

                                            |  Size without  |  Size with vectorcall
Actions.cpython-311-x86_64-linux-gnu.so     | 65200          | 65200
Code.cpython-311-x86_64-linux-gnu.so        | 1046800        | 1055024
DFA.cpython-311-x86_64-linux-gnu.so         | 99568          | 99568
FlowControl.cpython-311-x86_64-linux-gnu.so | 541456         | 545584
FusedNode.cpython-311-x86_64-linux-gnu.so   | 408752         | 416944
Machines.cpython-311-x86_64-linux-gnu.so    | 135120         | 135120
Parsing.cpython-311-x86_64-linux-gnu.so     | 934960         | 955440
refnanny.cpython-311-x86_64-linux-gnu.so    | 69344          | 69344
Scanners.cpython-311-x86_64-linux-gnu.so    | 92112          | 92112
Scanning.cpython-311-x86_64-linux-gnu.so    | 264432         | 264432
_tempita.cpython-311-x86_64-linux-gnu.so    | 469840         | 473936
Transitions.cpython-311-x86_64-linux-gnu.so | 107984         | 107984
Visitor.cpython-311-x86_64-linux-gnu.so     | 290256         | 290256

@da-woods
Contributor Author

da-woods commented Dec 4, 2023

This mainly needs me to add a few tests I think, and maybe find a suitable benchmark to work out if it's really worthwhile.

I think it probably is worth doing though

Contributor

@scoder scoder left a comment

Looks good overall.

Comment on lines 5989 to 5990
# pointer to a function if necessary. If the function has fused
# arguments, return the specific type.
Contributor

I know, this is just copying an existing comment, but I don't see the "fused types" part of it being done here. Seems outdated and worth removing.

# Specialised call to a (potential) PyMethodObject with non-constant argument tuple.
# Allows the self argument to be injected directly instead of repacking a tuple for it.
#
# function ExprNode the function/method object to call
# arg_tuple TupleNode the arguments for the args tuple
# kwdict ExprNode or Node keyword dictionary (if present)
Contributor

You probably meant to write None here, right?

Suggested change
- # kwdict ExprNode or Node keyword dictionary (if present)
+ # kwdict ExprNode or None keyword dictionary (if present)

Comment on lines 2083 to 2087
#if CYTHON_ASSUME_SAFE_MACROS
PyTuple_SET_ITEM(builder, n, key);
#else
if (unlikely(PyTuple_SetItem(builder, n, key))) return -1;
#endif
Contributor

Let's use the helper macro that we already have for this.

Suggested change
- #if CYTHON_ASSUME_SAFE_MACROS
-     PyTuple_SET_ITEM(builder, n, key);
- #else
-     if (unlikely(PyTuple_SetItem(builder, n, key))) return -1;
- #endif
+ if (unlikely(__Pyx_PyTuple_SET_ITEM(builder, n, key))) return -1;

Comment on lines 6601 to 6602
# the following is always true in Py3 (kept only for safety),
# but is false for unbound methods in Py2
Contributor

This seems worth cleaning up along the way.

Cython/Compiler/ExprNodes.py (outdated)
Comment on lines 5058 to 5059
def _check_positional_args_for_method_call(self, positional_args):
    # Do the positional args imply we can substitute a PyMethodCallNode
Contributor

  1. The comment you provide makes a good method name:
Suggested change
- def _check_positional_args_for_method_call(self, positional_args):
-     # Do the positional args imply we can substitute a PyMethodCallNode
+ def _should_use_PyMethodCallNode(self, positional_args):
  2. This seems more like a helper function than a method.
  3. Consider moving it to PyMethodCallNode as a static method, e.g. PyMethodCallNode.can_be_used_for_posargs(posargs).

    return isinstance(positional_args, ExprNodes.TupleNode) and not (
        positional_args.mult_factor or (positional_args.is_literal and len(positional_args.args) > 1))

def _check_function_may_be_method_call(self, function):
Contributor

This could also just be a static method in PyMethodCallNode.
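
The refactoring suggested in this review could look roughly like the following. This is a minimal sketch: `TupleNode` here is a stub written for illustration, not the real `ExprNodes.TupleNode`, and only the condition visible in the snippet above is carried over.

```python
class TupleNode:
    """Minimal stub standing in for ExprNodes.TupleNode (illustration only)."""

    def __init__(self, args, mult_factor=None, is_literal=False):
        self.args = args
        self.mult_factor = mult_factor
        self.is_literal = is_literal


class PyMethodCallNode:
    @staticmethod
    def can_be_used_for_posargs(positional_args):
        # Same condition as the helper under review, moved onto the class.
        return isinstance(positional_args, TupleNode) and not (
            positional_args.mult_factor
            or (positional_args.is_literal and len(positional_args.args) > 1)
        )


ok = PyMethodCallNode.can_be_used_for_posargs(TupleNode(["a", "b"]))
literal = PyMethodCallNode.can_be_used_for_posargs(
    TupleNode(["a", "b"], is_literal=True))
not_tuple = PyMethodCallNode.can_be_used_for_posargs(["a", "b"])
```

A static method needs no instance, so the optimisation transform can ask the question before any PyMethodCallNode exists.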

and tweak choices about when to optimize a little
@da-woods da-woods changed the title [WIP] use vectorcall by default Use vectorcall by default Dec 9, 2023
@da-woods da-woods marked this pull request as ready for review December 9, 2023 11:54
@da-woods
Contributor Author

da-woods commented Dec 9, 2023

Obviously microbenchmarks can be tuned to prove whatever you want them to prove. But here's a microbenchmark to justify the PR:

import cython
from timeit import timeit

py_func = eval('lambda a1, a2: (a1, a2)')
# Note - not actually a cy_func when just running in Python
def cy_func(a1, a2):
    return a1, a2

def call_with_args(N: cython.int, f, a1, a2):
    for _ in range(N):
        f(a1, a2)

def call_with_arg_kwd(N: cython.int, f, a1, a2):
    for _ in range(N):
        f(a1, a2=a2)

def call_with_two_kwds(N: cython.int, f, a1, a2):
    for _ in range(N):
        f(a1=a1, a2=a2)

def call_with_one_dict_kwds(N: cython.int, f, a1, a2):
    kwds = dict(a2=a2)
    for _ in range(N):
        f(a1, **kwds)


for callee in [py_func, cy_func]:
    for caller in [call_with_args, call_with_arg_kwd, call_with_two_kwds, call_with_one_dict_kwds]:
        time = timeit("caller(10000, callee, 's', 5)", globals=dict(caller=caller, callee=callee), number=1000)
        print(f"{callee} {caller}: {time}")
Python 3.11
<function <lambda> at 0x7f399b760e00> <function call_with_args at 0x7f399b763d80>: 0.9782767149154097
<function <lambda> at 0x7f399b760e00> <function call_with_arg_kwd at 0x7f399b763e20>: 1.1857463290216401
<function <lambda> at 0x7f399b760e00> <function call_with_two_kwds at 0x7f399b763ec0>: 1.2286154520697892
<function <lambda> at 0x7f399b760e00> <function call_with_one_dict_kwds at 0x7f399b763f60>: 2.835505162947811
<function cy_func at 0x7f399baedf80> <function call_with_args at 0x7f399b763d80>: 1.004445847007446
<function cy_func at 0x7f399baedf80> <function call_with_arg_kwd at 0x7f399b763e20>: 1.220398498000577
<function cy_func at 0x7f399baedf80> <function call_with_two_kwds at 0x7f399b763ec0>: 1.2556711449287832
<function cy_func at 0x7f399baedf80> <function call_with_one_dict_kwds at 0x7f399b763f60>: 2.864341426989995

Cython master+Python 3.11
<function <lambda> at 0x7f485719cea0> <cyfunction call_with_args at 0x7f485716a400>: 0.6562793910270557
<function <lambda> at 0x7f485719cea0> <cyfunction call_with_arg_kwd at 0x7f48574c75e0>: 1.9850503330817446
<function <lambda> at 0x7f485719cea0> <cyfunction call_with_two_kwds at 0x7f485716a4d0>: 2.152405772008933
<function <lambda> at 0x7f485719cea0> <cyfunction call_with_one_dict_kwds at 0x7f485716a5a0>: 2.0803401669254526
<cyfunction cy_func at 0x7f485716a260> <cyfunction call_with_args at 0x7f485716a400>: 0.32045664894394577
<cyfunction cy_func at 0x7f485716a260> <cyfunction call_with_arg_kwd at 0x7f48574c75e0>: 1.5240388029487804
<cyfunction cy_func at 0x7f485716a260> <cyfunction call_with_two_kwds at 0x7f485716a4d0>: 1.7001622419338673
<cyfunction cy_func at 0x7f485716a260> <cyfunction call_with_one_dict_kwds at 0x7f485716a5a0>: 1.649793562013656

This PR+Python 3.11
<function <lambda> at 0x7ff475d60ea0> <cyfunction call_with_args at 0x7ff475d2e400>: 0.6487717010313645
<function <lambda> at 0x7ff475d60ea0> <cyfunction call_with_arg_kwd at 0x7ff47603b5e0>: 0.9347794350469485
<function <lambda> at 0x7ff475d60ea0> <cyfunction call_with_two_kwds at 0x7ff475d2e4d0>: 1.0077196899801493
<function <lambda> at 0x7ff475d60ea0> <cyfunction call_with_one_dict_kwds at 0x7ff475d2e5a0>: 1.822032073047012
<cyfunction cy_func at 0x7ff475d2e260> <cyfunction call_with_args at 0x7ff475d2e400>: 0.31454418902285397
<cyfunction cy_func at 0x7ff475d2e260> <cyfunction call_with_arg_kwd at 0x7ff47603b5e0>: 0.5494329601060599
<cyfunction cy_func at 0x7ff475d2e260> <cyfunction call_with_two_kwds at 0x7ff475d2e4d0>: 0.5668489240342751
<cyfunction cy_func at 0x7ff475d2e260> <cyfunction call_with_one_dict_kwds at 0x7ff475d2e5a0>: 1.337843697052449

Roughly:

  • call_with_args doesn't change for this PR (which is expected, since the generated code should be the same).
  • On Python 3.11, compiling with Cython on the current master is actually a pessimization for calls with keyword arguments such as f(a1, a2=a2) and f(a1=a1, a2=a2) (call_with_arg_kwd and call_with_two_kwds). This PR makes Cython faster again.

So I think it is worthwhile, at least for this artificial benchmark. The benchmark doesn't test the unpacking side of it of course.
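
To make the comparison easier to read, the ratios between the two Cython runs can be worked out directly. The sketch below only re-uses the timings reported above (rounded to four decimals); `py_func` stands for the lambda callee, and the dictionary layout is just one convenient way to arrange the numbers.

```python
# Timings in seconds, copied from the results above and rounded to 4 decimals.
CALLERS = ["call_with_args", "call_with_arg_kwd",
           "call_with_two_kwds", "call_with_one_dict_kwds"]

master = {
    "py_func": [0.6563, 1.9851, 2.1524, 2.0803],
    "cy_func": [0.3205, 1.5240, 1.7002, 1.6498],
}
this_pr = {
    "py_func": [0.6488, 0.9348, 1.0077, 1.8220],
    "cy_func": [0.3145, 0.5494, 0.5668, 1.3378],
}

# Speedup > 1 means this PR is faster than master for that caller/callee pair.
speedup = {
    (callee, caller): master[callee][i] / this_pr[callee][i]
    for callee in master
    for i, caller in enumerate(CALLERS)
}

for (callee, caller), ratio in speedup.items():
    print(f"{callee:8s} {caller:24s} {ratio:.2f}x")
```

On these numbers the keyword-name cases come out roughly 2x to 3x faster than master, the `**kwds` case roughly 1.1x to 1.25x faster, and the positional-only case is essentially unchanged.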

Contributor

@scoder scoder left a comment

I have a few more style comments, but would like to see this in 3.1 in the end.

if (!kwds) goto limited_bad;
if (PyDict_SetItemString(kwds, "signed", __Pyx_NewRef(Py_True))) goto limited_bad;
{

Contributor

Suggested change

Comment on lines 6665 to 6671
keyword_variable = ""
if use_kwnames:
    keyword_variable = kwnames_temp
elif self.kwdict:
    keyword_variable = self.kwdict.result()
if keyword_variable:
    keyword_variable = ", %s" % keyword_variable
Contributor

@scoder scoder Jan 11, 2024

This seems simple enough to just use f-strings.

Suggested change
- keyword_variable = ""
- if use_kwnames:
-     keyword_variable = kwnames_temp
- elif self.kwdict:
-     keyword_variable = self.kwdict.result()
- if keyword_variable:
-     keyword_variable = ", %s" % keyword_variable
+ if kwnames_temp:
+     keyword_variable = f", {kwnames_temp}"
+ elif self.kwdict:
+     keyword_variable = f", {self.kwdict.result()}"
+ else:
+     keyword_variable = ""
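
As a standalone illustration of the suggested form, the branch can be pulled out into a small function. This is only a sketch: `keyword_suffix` is a hypothetical name, the result string of the keyword dict is passed in directly instead of calling `self.kwdict.result()`, and the temp names in the usage lines are made up.

```python
def keyword_suffix(kwnames_temp=None, kwdict_result=None):
    """Return the trailing ', <name>' fragment for the call, or '' if none."""
    if kwnames_temp:
        return f", {kwnames_temp}"
    elif kwdict_result:
        return f", {kwdict_result}"
    else:
        return ""


# Made-up temp names, purely to show the three outcomes.
with_kwnames = keyword_suffix(kwnames_temp="__pyx_t_3")
with_dict = keyword_suffix(kwdict_result="__pyx_t_4")
without = keyword_suffix()
```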

Comment on lines 6547 to 6555
use_kwnames = False
for arg in args:
    arg.generate_evaluation_code(code)
if isinstance(self.kwdict, DictNode):
    use_kwnames = True
    for keyvalue in self.kwdict.key_value_pairs:
        keyvalue.generate_evaluation_code(code)
elif self.kwdict:
    self.kwdict.generate_evaluation_code(code)
Contributor

Not sure, but these two cases might become a little simpler below if you store the key_value_pairs in a variable (maybe just kwargs?) instead of using a separate flag for the distinction.
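
One possible reading of this suggestion, as a sketch only: `DictNode` below is a stub, and `literal_kwargs` and `describe` are hypothetical names. The literal key/value pairs are kept in a `kwargs` variable, and later code tests that variable instead of a separate `use_kwnames` flag.

```python
class DictNode:
    """Minimal stub standing in for ExprNodes.DictNode (illustration only)."""

    def __init__(self, key_value_pairs):
        self.key_value_pairs = key_value_pairs


def literal_kwargs(kwdict):
    """Return the literal key/value pairs, or None if kwdict is not a DictNode."""
    return kwdict.key_value_pairs if isinstance(kwdict, DictNode) else None


def describe(kwdict):
    kwargs = literal_kwargs(kwdict)
    if kwargs is not None:      # takes the place of "if use_kwnames:"
        return f"kwnames with {len(kwargs)} entries"
    elif kwdict:
        return "keyword dict"
    return "positional only"
```

Testing `kwargs is not None` instead of plain truthiness keeps an empty literal dict on the same path that the `use_kwnames` flag would have chosen.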

Comment on lines 6635 to 6638
extra_keyword_args = ""
if use_kwnames:
    extra_keyword_args = ("+ ((CYTHON_VECTORCALL) ? %d : 0)" %
                          len(self.kwdict.key_value_pairs))
Contributor

Looks like a candidate for an f-string to me.
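
For reference, the two spellings produce the same text. In this small check `n_kwargs` is a placeholder standing in for `len(self.kwdict.key_value_pairs)`.

```python
n_kwargs = 2  # placeholder for len(self.kwdict.key_value_pairs)

# %-formatting, as in the snippet under review.
old_style = "+ ((CYTHON_VECTORCALL) ? %d : 0)" % n_kwargs

# f-string form, as suggested.
new_style = f"+ ((CYTHON_VECTORCALL) ? {n_kwargs} : 0)"
```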

Comment on lines +5095 to +5102
if not ExprNodes.PyMethodCallNode.can_be_used_for_posargs(
        node.positional_args, has_kwargs=has_kwargs, kwds_is_dict_node=kwds_is_dict_node):
    return node
function = node.function
if not ExprNodes.PyMethodCallNode.can_be_used_for_function(function):
    return node

node = self.replace(node, ExprNodes.PyMethodCallNode.from_node(
Contributor

It seems that the two can_be_… functions are always called before calling from_node(). That seems redundant. Why not let a PyMethodCallNode classmethod try to optimise the call (thus, maybe try_to_optimise_call()), and if it can't, return the original node unchanged?

Feel free to add an if node is (not) replacement short-cut to the replace() method to simplify this further.
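
The pattern proposed here might look like the sketch below. It is an illustration under assumptions, not Cython's code: `CallNode` is a stub, and the eligibility test is a deliberately simplified placeholder for the real `can_be_…` conditions.

```python
class CallNode:
    """Stub for a generic call node (illustration only)."""

    def __init__(self, function, positional_args):
        self.function = function
        self.positional_args = positional_args


class PyMethodCallNode(CallNode):
    @classmethod
    def try_to_optimise_call(cls, node):
        # Placeholder eligibility check; the real conditions are the
        # can_be_used_for_... helpers discussed in this review.
        if not isinstance(node.positional_args, tuple):
            return node  # not optimisable: hand back the original node
        return cls(node.function, node.positional_args)


eligible = CallNode("f", ("a", "b"))
optimised = PyMethodCallNode.try_to_optimise_call(eligible)

ineligible = CallNode("f", None)
unchanged = PyMethodCallNode.try_to_optimise_call(ineligible)

# The caller can detect "nothing happened" with an identity test.
was_replaced = optimised is not eligible
```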

Contributor Author

I'm disagreeing with this for now.

There's a good chunk of duplication between SimpleCallNode and GeneralCallNode, but there are also significant differences (a CloneNode check, the name of the attribute used for the positional arguments, handling of keyword arguments).

So I think the logic needs to be duplicated either way, either here or with an isinstance check in PyMethodCallNode. I think it's slightly better here (where the visitor has found the classes for us).

I don't have a hugely strong opinion - it just doesn't look like a significant improvement to me.

@da-woods
Contributor Author

Other changes are made... I think the original work on this was before f-strings became allowed in Cython (hence they weren't used in places they could have been).

@da-woods
Contributor Author

Codestyle is failing with

The sphinxcontrib.applehelp extension used by this project needs at least Sphinx v5.0; it therefore cannot be built with this version.

which I think isn't this PR's fault

Contributor

@scoder scoder left a comment

Looks fine to merge now.

@da-woods
Contributor Author

There's something odd going on with CyCache on Python 3.12 and also line_trace on Windows+Python 3.12.

I'm not sure either should be the fault of this PR, but I don't see them on any other recent builds. I'll leave merging this until I've had time to investigate properly

@scoder
Contributor

scoder commented Jan 14, 2024 via email

@da-woods
Contributor Author

I at least can't reproduce the CyCache stuff locally so I'm going to merge this. I don't think the 3.1 branch is immediately going to be released so there should be time to fix it if needed I think.

@da-woods da-woods merged commit 7b8ad1c into cython:master Jan 14, 2024
58 of 63 checks passed
@da-woods da-woods deleted the vectorcall-everywhere branch January 14, 2024 09:34
Successfully merging this pull request may close these issues.

Add (and use) vectorcall utility-code functions