Use vectorcall by default #5804

da-woods · 2023-11-07T08:28:43Z

f(a, b) goes to vectorcall (as before)
f(a, b, **kwds) now goes to "vectorcall_dict"
f(a, b, c=c) now goes to vectorcall with kwnames
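
For readers unfamiliar with the convention: in CPython's vectorcall protocol (PEP 590), positional values and keyword values share one flat argument array, and the keyword names travel separately in a tuple ("kwnames"). A rough pure-Python model of that layout follows. `pack_vectorcall` and `unpack_vectorcall` are hypothetical names used only for illustration; they are not Cython or CPython functions.

```python
def pack_vectorcall(args, kwargs):
    """Flatten a call's arguments the way the vectorcall convention lays them out."""
    # Positional values come first; keyword values are appended after them.
    flat = list(args) + list(kwargs.values())
    # kwnames lists the keyword names in the same order as their values.
    kwnames = tuple(kwargs) if kwargs else None
    return flat, len(args), kwnames


def unpack_vectorcall(flat, nargs, kwnames):
    """Recover (args, kwargs) from the flattened form."""
    args = tuple(flat[:nargs])
    kwargs = dict(zip(kwnames or (), flat[nargs:]))
    return args, kwargs


# f(a, b): positional only, so there is no kwnames tuple.
print(pack_vectorcall((1, 2), {}))
# f(a, b, c=c): the keyword value is appended and its name goes into kwnames.
print(pack_vectorcall((1, 2), {"c": 3}))
```

The `f(a, b, **kwds)` case differs in that the keywords already live in a dict at call time, so they are handed over as a dict rather than flattened into the array.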

Will eventually fix #5784

da-woods · 2023-11-07T08:29:22Z

This isn't ready, but I'm putting it up in the hope that vectorcall doesn't get refactored from under it (too much).

da-woods · 2023-11-07T08:30:53Z

Cython/Utility/ObjectHandling.c

+}
+
+/////////////// PyObjectVectorCallKwBuilder.proto ////////////////
+//@requires: PyObjectFastCall


The intention is that this can also be used by utility code. So something like the limited API int.from_bytes or code.update can be done efficiently by vectorcall when supported
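
For context, this is the kind of Python-level call such utility code has to perform: positional arguments plus a keyword argument, which is the shape that benefits from kwnames. A small illustration with arbitrary values (plain Python, not the generated C code):

```python
# Round-trip a negative number through a two-byte little-endian encoding.
data = (-2).to_bytes(2, "little", signed=True)

# The `signed` argument must be passed by keyword.
value = int.from_bytes(data, "little", signed=True)
unsigned_value = int.from_bytes(data, "little")

print(data, value, unsigned_value)
```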

f(a, b) goes to vectorcall (as before)
f(a, b, **kwds) now goes to "vectorcall_dict"
f(a, b, c=c) now goes to vectorcall with kwnames

I'm fairly sure this is still a bit liable to crash.

I'm working on the basis that vectorcall is usually worthwhile but unpacking methods is a choosable optimization

da-woods · 2023-11-08T08:26:35Z

Edited with updated results 12-Nov

One thing I was worried about is whether it increases the code size. It looks to give a small increase in code size, but nothing dramatic.

Building Cython then running strip gives:

                                            |  Size without  |  Size with vectorcall
Actions.cpython-311-x86_64-linux-gnu.so     | 65200          | 65200
Code.cpython-311-x86_64-linux-gnu.so        | 1046800        | 1055024
DFA.cpython-311-x86_64-linux-gnu.so         | 99568          | 99568
FlowControl.cpython-311-x86_64-linux-gnu.so | 541456         | 545584
FusedNode.cpython-311-x86_64-linux-gnu.so   | 408752         | 416944
Machines.cpython-311-x86_64-linux-gnu.so    | 135120         | 135120
Parsing.cpython-311-x86_64-linux-gnu.so     | 934960         | 955440
refnanny.cpython-311-x86_64-linux-gnu.so    | 69344          | 69344
Scanners.cpython-311-x86_64-linux-gnu.so    | 92112          | 92112
Scanning.cpython-311-x86_64-linux-gnu.so    | 264432         | 264432
_tempita.cpython-311-x86_64-linux-gnu.so    | 469840         | 473936
Transitions.cpython-311-x86_64-linux-gnu.so | 107984         | 107984
Visitor.cpython-311-x86_64-linux-gnu.so     | 290256         | 290256

da-woods · 2023-12-04T20:24:01Z

This mainly needs me to add a few tests I think, and maybe find a suitable benchmark to work out if it's really worthwhile.

I think it probably is worth doing though

scoder

Looks good overall.

scoder · 2023-12-06T09:25:53Z

Cython/Compiler/ExprNodes.py

+        # pointer to a function if necessary. If the function has fused
+        # arguments, return the specific type.


I know, this is just copying an existing comment, but I don't see the "fused types" part of it being done here. Seems outdated and worth removing.

scoder · 2023-12-06T09:26:59Z

Cython/Compiler/ExprNodes.py

    # Specialised call to a (potential) PyMethodObject with non-constant argument tuple.
    # Allows the self argument to be injected directly instead of repacking a tuple for it.
    #
    # function    ExprNode      the function/method object to call
    # arg_tuple   TupleNode     the arguments for the args tuple
+    # kwdict      ExprNode or Node  keyword dictionary (if present)


You probably meant to write None here, right?

Suggested change

-    # kwdict      ExprNode or Node  keyword dictionary (if present)
+    # kwdict      ExprNode or None  keyword dictionary (if present)

scoder · 2023-12-06T09:33:53Z

Cython/Utility/ObjectHandling.c

+#if CYTHON_ASSUME_SAFE_MACROS
+    PyTuple_SET_ITEM(builder, n, key);
+#else
+    if (unlikely(PyTuple_SetItem(builder, n, key))) return -1;
+#endif


Let's use the helper macro that we already have for this.

Suggested change

-#if CYTHON_ASSUME_SAFE_MACROS
-    PyTuple_SET_ITEM(builder, n, key);
-#else
-    if (unlikely(PyTuple_SetItem(builder, n, key))) return -1;
-#endif
+    if (unlikely(__Pyx_PyTuple_SET_ITEM(builder, n, key))) return -1;

scoder · 2023-12-06T09:53:16Z

Cython/Compiler/ExprNodes.py

+            # the following is always true in Py3 (kept only for safety),
+            # but is false for unbound methods in Py2


This seems worth cleaning up along the way.

scoder · 2023-12-06T10:05:53Z

Cython/Compiler/Optimize.py

+    def _check_positional_args_for_method_call(self, positional_args):
+        # Do the positional args imply we can substitute a PyMethodCallNode


The comment you provide makes a good method name:

Suggested change

-    def _check_positional_args_for_method_call(self, positional_args):
-        # Do the positional args imply we can substitute a PyMethodCallNode
+    def _should_use_PyMethodCallNode(self, positional_args):

This seems more like a helper function than a method.

Consider moving it to PyMethodCallNode as a static method, e.g. PyMethodCallNode.can_be_used_for_posargs(posargs).
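
A minimal sketch of what that could look like, assuming the check keeps the condition used by `_check_positional_args_for_method_call` in this PR's diff. The `TupleNode` class here is a simplified stand-in for `ExprNodes.TupleNode` with only the attributes the check reads; it is not the real compiler class.

```python
class TupleNode:
    """Simplified stand-in for ExprNodes.TupleNode (illustration only)."""

    def __init__(self, args, mult_factor=None, is_literal=False):
        self.args = args
        self.mult_factor = mult_factor
        self.is_literal = is_literal


class PyMethodCallNode:
    @staticmethod
    def can_be_used_for_posargs(positional_args):
        # Accept a tuple node that is not multiplied and is not a literal
        # tuple with more than one item.
        return isinstance(positional_args, TupleNode) and not (
            positional_args.mult_factor
            or (positional_args.is_literal and len(positional_args.args) > 1))


# No instance is needed, so the visitor can call it directly on the class.
print(PyMethodCallNode.can_be_used_for_posargs(TupleNode(["a", "b"])))
```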

scoder · 2023-12-06T10:06:43Z

Cython/Compiler/Optimize.py

+        return isinstance(positional_args, ExprNodes.TupleNode) and not (
+            positional_args.mult_factor or (positional_args.is_literal and len(positional_args.args) > 1))
+
+    def _check_function_may_be_method_call(self, function):


This could also just be a static method in PyMethodCallNode.

da-woods · 2023-12-09T12:21:14Z

Obviously microbenchmarks can be tuned to prove whatever you want them to prove. But here's a microbenchmark to justify the PR:

import cython
from timeit import timeit

py_func = eval('lambda a1, a2: (a1, a2)')
# Note - not actually a cy_func when just running in Python
def cy_func(a1, a2):
    return a1, a2

def call_with_args(N: cython.int, f, a1, a2):
    for _ in range(N):
        f(a1, a2)

def call_with_arg_kwd(N: cython.int, f, a1, a2):
    for _ in range(N):
        f(a1, a2=a2)

def call_with_two_kwds(N: cython.int, f, a1, a2):
    for _ in range(N):
        f(a1=a1, a2=a2)

def call_with_one_dict_kwds(N: cython.int, f, a1, a2):
    kwds = dict(a2=a2)
    for _ in range(N):
        f(a1, **kwds)


for callee in [py_func, cy_func]:
    for caller in [call_with_args, call_with_arg_kwd, call_with_two_kwds, call_with_one_dict_kwds]:
        time = timeit("caller(10000, callee, 's', 5)", globals=dict(caller=caller, callee=callee), number=1000)
        print(f"{callee} {caller}: {time}")

Python 3.11
<function <lambda> at 0x7f399b760e00> <function call_with_args at 0x7f399b763d80>: 0.9782767149154097
<function <lambda> at 0x7f399b760e00> <function call_with_arg_kwd at 0x7f399b763e20>: 1.1857463290216401
<function <lambda> at 0x7f399b760e00> <function call_with_two_kwds at 0x7f399b763ec0>: 1.2286154520697892
<function <lambda> at 0x7f399b760e00> <function call_with_one_dict_kwds at 0x7f399b763f60>: 2.835505162947811
<function cy_func at 0x7f399baedf80> <function call_with_args at 0x7f399b763d80>: 1.004445847007446
<function cy_func at 0x7f399baedf80> <function call_with_arg_kwd at 0x7f399b763e20>: 1.220398498000577
<function cy_func at 0x7f399baedf80> <function call_with_two_kwds at 0x7f399b763ec0>: 1.2556711449287832
<function cy_func at 0x7f399baedf80> <function call_with_one_dict_kwds at 0x7f399b763f60>: 2.864341426989995

Cython master+Python 3.11
<function <lambda> at 0x7f485719cea0> <cyfunction call_with_args at 0x7f485716a400>: 0.6562793910270557
<function <lambda> at 0x7f485719cea0> <cyfunction call_with_arg_kwd at 0x7f48574c75e0>: 1.9850503330817446
<function <lambda> at 0x7f485719cea0> <cyfunction call_with_two_kwds at 0x7f485716a4d0>: 2.152405772008933
<function <lambda> at 0x7f485719cea0> <cyfunction call_with_one_dict_kwds at 0x7f485716a5a0>: 2.0803401669254526
<cyfunction cy_func at 0x7f485716a260> <cyfunction call_with_args at 0x7f485716a400>: 0.32045664894394577
<cyfunction cy_func at 0x7f485716a260> <cyfunction call_with_arg_kwd at 0x7f48574c75e0>: 1.5240388029487804
<cyfunction cy_func at 0x7f485716a260> <cyfunction call_with_two_kwds at 0x7f485716a4d0>: 1.7001622419338673
<cyfunction cy_func at 0x7f485716a260> <cyfunction call_with_one_dict_kwds at 0x7f485716a5a0>: 1.649793562013656

This PR+Python3.11
<function <lambda> at 0x7ff475d60ea0> <cyfunction call_with_args at 0x7ff475d2e400>: 0.6487717010313645
<function <lambda> at 0x7ff475d60ea0> <cyfunction call_with_arg_kwd at 0x7ff47603b5e0>: 0.9347794350469485
<function <lambda> at 0x7ff475d60ea0> <cyfunction call_with_two_kwds at 0x7ff475d2e4d0>: 1.0077196899801493
<function <lambda> at 0x7ff475d60ea0> <cyfunction call_with_one_dict_kwds at 0x7ff475d2e5a0>: 1.822032073047012
<cyfunction cy_func at 0x7ff475d2e260> <cyfunction call_with_args at 0x7ff475d2e400>: 0.31454418902285397
<cyfunction cy_func at 0x7ff475d2e260> <cyfunction call_with_arg_kwd at 0x7ff47603b5e0>: 0.5494329601060599
<cyfunction cy_func at 0x7ff475d2e260> <cyfunction call_with_two_kwds at 0x7ff475d2e4d0>: 0.5668489240342751
<cyfunction cy_func at 0x7ff475d2e260> <cyfunction call_with_one_dict_kwds at 0x7ff475d2e5a0>: 1.337843697052449

Roughly:

call_with_args doesn't change for this PR (which is expected, since the generated code should be the same).
In Python 3.11 it's actually a pessimization to compile with Cython on the current master when calling with variations of f(a1=a1, a2=a2) (call_with_arg_kwd and call_with_two_kwds). This PR makes Cython faster again.

So I think it is worthwhile, at least for this artificial benchmark. The benchmark doesn't test the unpacking side of it of course.
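
To make the comparison easier to read, the ratios between "Cython master" and "This PR" can be computed from the timings above. The numbers below are copied from those results and rounded to four decimals; the short labels `py_func` and `cy_func` stand for the lambda and the cyfunction callee respectively.

```python
# (callee, caller) -> (seconds on Cython master, seconds with this PR)
timings = {
    ("py_func", "call_with_args"): (0.6563, 0.6488),
    ("py_func", "call_with_arg_kwd"): (1.9851, 0.9348),
    ("py_func", "call_with_two_kwds"): (2.1524, 1.0077),
    ("py_func", "call_with_one_dict_kwds"): (2.0803, 1.8220),
    ("cy_func", "call_with_args"): (0.3205, 0.3145),
    ("cy_func", "call_with_arg_kwd"): (1.5240, 0.5494),
    ("cy_func", "call_with_two_kwds"): (1.7002, 0.5668),
    ("cy_func", "call_with_one_dict_kwds"): (1.6498, 1.3378),
}


def speedup(callee, caller):
    """Return master time divided by PR time (values above 1 mean the PR is faster)."""
    master, pr = timings[(callee, caller)]
    return master / pr


for callee, caller in timings:
    print(f"{callee:8} {caller:24} {speedup(callee, caller):.2f}x")
```

On these numbers the keyword-name calls gain roughly a factor of two or more, the `**kwds` calls gain less, and positional-only calls are essentially unchanged.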

scoder

I have a few more style comments, but would like to see this in 3.1 in the end.

scoder · 2024-01-11T14:35:43Z

Cython/Utility/TypeConversion.c

-            if (!kwds) goto limited_bad;
-            if (PyDict_SetItemString(kwds, "signed", __Pyx_NewRef(Py_True))) goto limited_bad;
+        {
+


Suggested change

scoder · 2024-01-11T14:40:30Z

Cython/Compiler/ExprNodes.py

+        keyword_variable = ""
+        if use_kwnames:
+            keyword_variable = kwnames_temp
+        elif self.kwdict:
+            keyword_variable = self.kwdict.result()
+        if keyword_variable:
+            keyword_variable = ", %s" % keyword_variable


This seems simple enough to just use f-strings.

Suggested change

-        keyword_variable = ""
-        if use_kwnames:
-            keyword_variable = kwnames_temp
-        elif self.kwdict:
-            keyword_variable = self.kwdict.result()
-        if keyword_variable:
-            keyword_variable = ", %s" % keyword_variable
+        if kwnames_temp:
+            keyword_variable = f", {kwnames_temp}"
+        elif self.kwdict:
+            keyword_variable = f", {self.kwdict.result()}"
+        else:
+            keyword_variable = ""
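
The two versions can be checked against each other in isolation. The sketch below rests on two assumptions: `kwnames_temp` is `None` whenever `use_kwnames` is false, and the keyword dict node is reduced to the plain string its `result()` would return (`kwdict_result`, or `None` if there is no dict). The function names and the `__pyx_t_*` strings are placeholders for illustration.

```python
def suffix_percent(use_kwnames, kwnames_temp, kwdict_result):
    # Shape of the original code: pick the name, then prefix it with ", ".
    keyword_variable = ""
    if use_kwnames:
        keyword_variable = kwnames_temp
    elif kwdict_result:
        keyword_variable = kwdict_result
    if keyword_variable:
        keyword_variable = ", %s" % keyword_variable
    return keyword_variable


def suffix_fstring(kwnames_temp, kwdict_result):
    # Shape of the suggested change: one f-string per branch, no flag.
    if kwnames_temp:
        return f", {kwnames_temp}"
    elif kwdict_result:
        return f", {kwdict_result}"
    else:
        return ""


cases = [
    (True, "__pyx_t_1", None),   # keywords known at compile time
    (False, None, "__pyx_t_2"),  # keywords passed as a dict
    (False, None, None),         # no keywords at all
]
for use_kwnames, temp, result in cases:
    assert suffix_percent(use_kwnames, temp, result) == suffix_fstring(temp, result)
```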

scoder · 2024-01-11T14:43:19Z

Cython/Compiler/ExprNodes.py

+        use_kwnames = False
        for arg in args:
            arg.generate_evaluation_code(code)
+        if isinstance(self.kwdict, DictNode):
+            use_kwnames = True
+            for keyvalue in self.kwdict.key_value_pairs:
+                keyvalue.generate_evaluation_code(code)
+        elif self.kwdict:
+            self.kwdict.generate_evaluation_code(code)


Not sure, but these two cases might become a little simpler below if you store the key_value_pairs in a variable (maybe just kwargs?) instead of using a separate flag for the distinction.

scoder · 2024-01-11T14:47:50Z

Cython/Compiler/ExprNodes.py

+        extra_keyword_args = ""
+        if use_kwnames:
+            extra_keyword_args = ("+ ((CYTHON_VECTORCALL) ? %d : 0)" %
+                len(self.kwdict.key_value_pairs))


Looks like a candidate for an f-string to me.
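
For illustration, an f-string version of that expression could look like the sketch below. `extra_keyword_args_for` is a hypothetical helper name; in the PR the string is built inline from `len(self.kwdict.key_value_pairs)`.

```python
def extra_keyword_args_for(n_keywords):
    # Emit a C expression that only reserves the extra slots when
    # CYTHON_VECTORCALL is enabled at C compile time.
    return f"+ ((CYTHON_VECTORCALL) ? {n_keywords} : 0)"


print(extra_keyword_args_for(2))
```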

scoder · 2024-01-11T15:00:28Z

Cython/Compiler/Optimize.py

+        if not ExprNodes.PyMethodCallNode.can_be_used_for_posargs(
+                node.positional_args, has_kwargs=has_kwargs, kwds_is_dict_node=kwds_is_dict_node):
+            return node
+        function = node.function
+        if not ExprNodes.PyMethodCallNode.can_be_used_for_function(function):
+            return node
+
+        node = self.replace(node, ExprNodes.PyMethodCallNode.from_node(


It seems that the two can_be_… functions are always called before calling from_node(). That seems redundant. Why not let a PyMethodCallNode classmethod try to optimise the call (thus, maybe try_to_optimise_call()), and if it can't, return the original node unchanged?

Feel free to add an `if node is (not) replacement` short-cut to the replace() method to simplify this further.
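
A rough sketch of the shape being proposed, using simplified stand-in classes rather than the real compiler nodes. `CallNode`, `MethodCallNode`, the `can_optimise` callback and the module-level `replace` function are all hypothetical names for illustration.

```python
class CallNode:
    """Stand-in for a call node found by the visitor."""

    def __init__(self, function, positional_args):
        self.function = function
        self.positional_args = positional_args


class MethodCallNode(CallNode):
    """Stand-in for PyMethodCallNode."""

    @classmethod
    def try_to_optimise_call(cls, node, can_optimise):
        # `can_optimise` stands in for the two can_be_... checks.
        if not can_optimise(node):
            return node  # unchanged, so callers can test `replacement is node`
        return cls(node.function, node.positional_args)


def replace(node, replacement):
    # The suggested short-cut: nothing to do if we got the same object back.
    if replacement is node:
        return node
    # A real implementation would carry over position and type information.
    return replacement


node = CallNode("f", ("a", "b"))
kept = replace(node, MethodCallNode.try_to_optimise_call(node, lambda n: False))
swapped = replace(node, MethodCallNode.try_to_optimise_call(node, lambda n: True))
print(kept is node, type(swapped).__name__)
```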

I'm disagreeing with this for now.

There's a good chunk of duplication between SimpleCallNode and GeneralCallNode, but there are also significant differences (a CloneNode check, the name of the attribute used for the positional arguments, handling of keyword arguments).

So I think the logic needs to be duplicated either way, either here or with an isinstance check in PyMethodCallNode. I think it's slightly better here (where the visitor has found the classes for us).

I don't have a hugely strong opinion; it just doesn't look like a significant improvement to me.

da-woods · 2024-01-13T09:02:49Z

Other changes are made... I think the original work on this was done before f-strings became allowed in Cython (hence they weren't used in places where they could have been).

da-woods · 2024-01-13T09:10:52Z

Codestyle is failing with

The sphinxcontrib.applehelp extension used by this project needs at least Sphinx v5.0; it therefore cannot be built with this version.

which I think isn't this PR's fault

scoder

Looks fine to merge now.

da-woods · 2024-01-13T20:02:27Z

There's something odd going on with CyCache on Python 3.12 and also line_trace on Windows+Python 3.12.

I'm not sure either should be the fault of this PR, but I don't see them on any other recent builds. I'll leave merging this until I've had time to investigate properly

scoder · 2024-01-14T07:22:42Z

> There's something odd going on with CyCache on Python 3.12 and also `line_trace` on Windows+Python 3.12.

I've seen both fail in Py3.12 before, so it seems likely that it's the same issue here.

da-woods · 2024-01-14T09:34:22Z

I at least can't reproduce the CyCache stuff locally so I'm going to merge this. I don't think the 3.1 branch is immediately going to be released so there should be time to fix it if needed I think.

da-woods commented Nov 7, 2023

da-woods added 3 commits November 7, 2023 20:01

(WIP) use vectorcall by default

513bf17

f(a, b) goes to vectorcall (as before)
f(a, b, **kwds) now goes to "vectorcall_dict"
f(a, b, c=c) now goes to vectorcall with kwnames

I'm fairly sure this is still a bit liable to crash.

Apply to "int.from_bytes"

e410722

Detach decision to use vectorcall from unpack methods

7464ed7

I'm working on the basis that vectorcall is usually worthwhile but unpacking methods is a choosable optimization

da-woods force-pushed the vectorcall-everywhere branch from a723c9f to 7464ed7 Compare November 7, 2023 20:36

Small fixes

2295117

da-woods and others added 9 commits November 9, 2023 19:55

Add gotref and string check

b5bf080

Merge remote-tracking branch 'real_origin/master' into vectorcall-everywhere

ad07727

Avoid some unnecessary checks

7d67334

Add a no-copy shortcut to merged dict node

0d83a70

Fix simplecallnode optimization

eec7bfe

Use old name appropriately

38e701c

Fix self-build

1f37586

Move function_type

f80394e

Merge branch 'master' into vectorcall-everywhere

050036f

scoder reviewed Dec 6, 2023

da-woods added 2 commits December 9, 2023 10:56

Comments from review

4c5ea02

Add tests

955886c

and tweak choices about when to optimize a little

da-woods changed the title ~~[WIP] use vectorcall by default~~ Use vectorcall by default Dec 9, 2023

da-woods marked this pull request as ready for review December 9, 2023 11:54

scoder reviewed Jan 11, 2024

scoder added this to the 3.1 milestone Jan 12, 2024

scoder added enhancement Code Generation labels Jan 12, 2024

Comments from review

0dde3c9

Replace two more occurrences of self.kwargs.key_value_pairs

5df97d0

scoder approved these changes Jan 13, 2024

da-woods merged commit 7b8ad1c into cython:master Jan 14, 2024
58 of 63 checks passed

da-woods deleted the vectorcall-everywhere branch January 14, 2024 09:34
