Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cython makes little use of PySet APIs #2042

Closed
pitrou opened this issue Dec 12, 2017 · 8 comments
Closed

Cython makes little use of PySet APIs #2042

pitrou opened this issue Dec 12, 2017 · 8 comments
Milestone

Comments

@pitrou
Copy link
Contributor

pitrou commented Dec 12, 2017

If I cythonize the following code:

# cython: language_level=3

cpdef use_dict(dict d, u, v, w):
    if u in d:
        del d[u]
    d.pop(v, None)
    d[w] = 1
    return [x for x in d if x]

cpdef use_set(set s, u, v, w):
    if u in s:
        s.remove(u)
    s.discard(v)
    s.add(w)
    return [x for x in s if x]

Then most of use_dict exploits the PyDictconcrete APIs (except for the .pop call which goes through a Python method call), but use_set exploits none of the PySet concrete API except for PySet_Add.

@scoder
Copy link
Contributor

scoder commented Dec 15, 2017

The C-API functions are actually not equivalent to the Python methods. Using them would require some additional special handling in dedicated utility code functions for a set of sets. See the comments here:

("set", "PySet_Type", [BuiltinMethod("__contains__", "TO", "b", "PySequence_Contains"),
BuiltinMethod("clear", "T", "r", "PySet_Clear"),
# discard() and remove() have a special treatment for unhashable values
# BuiltinMethod("discard", "TO", "r", "PySet_Discard"),
# update is actually variadic (see Github issue #1645)
# BuiltinMethod("update", "TO", "r", "__Pyx_PySet_Update",
# utility_code=UtilityCode.load_cached("PySet_Update", "Builtins.c")),
BuiltinMethod("add", "TO", "r", "PySet_Add"),
BuiltinMethod("pop", "T", "O", "PySet_Pop")]),

And the implementation in CPython here:
https://github.com/python/cpython/blob/13ad3b7a82bf56d803fbe48ee5df6c4b08986c78/Objects/setobject.c#L1950-L1970

But I agree that the normal fast-path (key is hashable) is worth handling in C rather than Python method calls, so we should copy over the code from CPython and adapt it appropriately.

@pitrou
Copy link
Contributor Author

pitrou commented Dec 15, 2017

I could perhaps give it a try, but you'd have to tell me where to add the relevant tests and how to run the test suite to avoid having it take an overly long time :-)

@scoder
Copy link
Contributor

scoder commented Dec 15, 2017

Sure. The helper functions should go into Cython/Utility/Builtins.c. There are already a couple of set related functions at the end. The file is split into named sections, the ".proto" section is written early into the .c file, the unsuffixed section is the main implementation and goes in late.

Cython/Compiler/Builtin.py provides the mapping from builtins and their methods to C names/functions. It has various examples that load utility functions In the signature string, just take a look at the set.update() comment I mentioned. In the signature string, T refers to the self type, O is an arbitrary object.

You can add the tests to tests/run/set.pyx. The doctests are executed in Python, the rest of the file is compiled. To run the tests: python runtests.py --debug -vv run.set.

@pitrou
Copy link
Contributor Author

pitrou commented Dec 16, 2017

Ok, one oddity: Cython/Compiler/Builtin.py says set.__contains__ should generate a call to PySequence_Contains, but actually a call to __Pyx_PySequence_ContainsTF is generated. Is the set.__contains__ specification simply ignored?

pitrou added a commit to pitrou/cython that referenced this issue Dec 16, 2017
@pitrou
Copy link
Contributor Author

pitrou commented Dec 16, 2017

(I opened a PR at #2044; it tackles the remove and discard methods, but not __contains__ for the reason stated above)

@scoder
Copy link
Contributor

scoder commented Dec 17, 2017

__Pyx_PySequence_ContainsTF

Yes, Builtin.py only applies basic adaptations. Some further optimisations are done programmatically later on, often depending on input and output types, for example.

@pitrou
Copy link
Contributor Author

pitrou commented Dec 17, 2017

Is calling __Pyx_PySequence_ContainsTF instead of PySequence_Contains actually an optimization?

If I replace the __contains__ slot for sets with PySet_Contains, it's still __Pyx_PySequence_ContainsTF that is called...

@scoder
Copy link
Contributor

scoder commented Dec 17, 2017

I had to look that up myself, it's actually different again. It's a helper function that is used to implement both in and not in in one function, based on an additional argument. The tricky bit is that it needs to pass the exception return value through, which is why it's not just a simple macro. Performance-wise, it most likely makes zero difference.

Also note that the __contains__ method in Bultin.py really only handles cases where x.__contains__ is used explicitly. I added that years ago mostly to allow undoing the assignment of unbound methods that some people do in Python code to avoid a method lookup in loops. If Cython can see the underlying method call and understand that it can replace it with a C function call, that's obviously much faster than any unbound Python method call.

scoder added a commit that referenced this issue Dec 18, 2017
Issue #2042: add optimized versions of `x in set`, set.remove() and set.discard()
@scoder scoder closed this as completed in e801d21 Dec 18, 2017
@scoder scoder added this to the 0.28 milestone Dec 21, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants