Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Implement multimethod call in the Python C API #170

Merged
merged 29 commits into from Jul 6, 2019

Conversation

@peterbell10
Copy link
Collaborator

commented Jun 24, 2019

I've been working on a C++ implementation of the main uarray multimethod call. I'm not familliar with the C API but this works well enough for my scipy tests to pass.

In my benchmarks the overhead is down from ~5 us to ~2.5 us. Fixes #156.

Show resolved Hide resolved setup.py Outdated
Show resolved Hide resolved setup.py Outdated

@peterbell10 peterbell10 force-pushed the peterbell10:c-api branch 2 times, most recently from 17e6e1f to 8b001b7 Jun 25, 2019

@peterbell10 peterbell10 force-pushed the peterbell10:c-api branch from 088df34 to e5aeeeb Jun 26, 2019

@hameerabbasi

This comment has been minimized.

Copy link
Contributor

commented Jun 27, 2019

Ah, I see you discovered the descriptor protocol. 😅 That's a tough one.

@peterbell10

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 27, 2019

I spent way too much time trying to figure out why adding a method called __get__ did nothing...

@peterbell10

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 27, 2019

The weird thing is that I can reproduce the crash when doing a full pytest run, but none of the individual tests crash... Possibly a use after free that's GC dependant somewhere?

@hameerabbasi

This comment has been minimized.

Copy link
Contributor

commented Jun 27, 2019

I usually use valgrind or a debugger like lldb to find the line where it crashes and work backwards from there.

@hameerabbasi

This comment has been minimized.

Copy link
Contributor

commented Jun 27, 2019

(Not the most fun thing in the world)

@peterbell10

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 27, 2019

I already know it's a segfault in numpy's get_ufunc_arguments.

@hameerabbasi

This comment has been minimized.

Copy link
Contributor

commented Jun 27, 2019

So we know one of two things: Either it's in NumPy, or it's the ufunc or the arguments. Since you're handling the args fine, you probably need something for the ufunc, or self in this case.

@hameerabbasi

This comment has been minimized.

Copy link
Contributor

commented Jun 27, 2019

The weird thing is that I can reproduce the crash when doing a full pytest run, but none of the individual tests crash... Possibly a use after free that's GC dependant somewhere?

That is weird indeed... Either one is doing too many increfs, too many decrefs. Python has a refcounting GC, I don't know how cycles are handled in such a GC, maybe a delayed pass?

@hameerabbasi

This comment has been minimized.

Copy link
Contributor

commented Jun 27, 2019

I guess we shouldn't be getting circular references at all, but here's a reference:https://stackoverflow.com/questions/10962393/how-does-pythons-garbage-collector-detect-circular-references

Not that it helps in a use-after-free.

@peterbell10 peterbell10 force-pushed the peterbell10:c-api branch from b470c99 to 98aa991 Jun 27, 2019

@peterbell10

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 27, 2019

I've actually tried removing all of the Py_DECREFs from my code and it still segfaults. Maybe this isn't use after free at all.

@peterbell10

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 27, 2019

Unless I'm not INCREFing something.

if (!res)
return {};

if (!PyTuple_Check(res) || PyTuple_Size(res) != 2)

This comment has been minimized.

Copy link
@hameerabbasi

hameerabbasi Jun 27, 2019

Contributor

You might want to use Py_BuildValue and PyArg_ParseTuple more. Although it might bring up the overhead, not sure... https://docs.python.org/3/c-api/arg.html

@hameerabbasi

This comment has been minimized.

Copy link
Contributor

commented Jun 27, 2019

Maybe we're somehow... Using steal where there should be a borrow?

@peterbell10

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 27, 2019

I've tried changing all steals to refs and it still happens.

@hameerabbasi

This comment has been minimized.

Copy link
Contributor

commented Jun 27, 2019

Then let's try changing gears a bit... Is it in upstream NumPy? Are we supposed to INCREF before passing things into a ufunc somehow? Seems unlikely, since it's done from Python.

@hameerabbasi

This comment has been minimized.

Copy link
Contributor

commented Jun 27, 2019

Two more things: A dict returns NULL instead of setting an exception if it's accessed with an invalid key, so it might be a null pointer. And a tuple-access, out-of-bounds might do it.

Speaking of, the first could be the reason, given that it only happens on this repo where kwargs are actually being dropped.

@peterbell10

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 28, 2019

For me, the tests always seem to fail at

unumpy/tests/test_numpy.py::test_functions_coerce[backend1-method10-args10-kwargs10]

Could there be something special about unumpy.var or DaskBackend

@hameerabbasi
Copy link
Contributor

left a comment

I looked for out-of-bounds accesses and this is one I found.

Show resolved Hide resolved uarray/_backend.py
Show resolved Hide resolved uarray/_uarray_dispatch.cxx Outdated
@hameerabbasi

This comment has been minimized.

Copy link
Contributor

commented Jun 28, 2019

I ran this locally, and I couldn't reproduce the segfault. But I see a lot of SystemError: <uarray multimethod 'something'> returned NULL without setting an error. bugs.

@peterbell10

This comment has been minimized.

Copy link
Collaborator Author

commented Jul 4, 2019

@hameerabbasi is their any chance you could look into the XND backend failures? I can't install it on my machine to debug.

@hameerabbasi

This comment has been minimized.

Copy link
Contributor

commented Jul 4, 2019

@peterbell10 Sure.

@hameerabbasi

This comment has been minimized.

Copy link
Contributor

commented Jul 4, 2019

It doesn't happen on macOS, for some reason. Let me reboot into Linux to check.

@hameerabbasi

This comment has been minimized.

Copy link
Contributor

commented Jul 4, 2019

So, it's a version issue, it's an issue with Xnd. It's happened a few times. I'd just mark the backend as xfail for now. Reported upstream at xnd-project/xnd#47.

@hameerabbasi

This comment has been minimized.

Copy link
Contributor

commented Jul 4, 2019

My bad... It seems two set_backends don't work inside each other and cause an infinite recursion.

@hameerabbasi

This comment has been minimized.

Copy link
Contributor

commented Jul 4, 2019

The second set_backend inside xnd_backend causes it to come back into Xnd. The innermost set_backend should dominate, not the outermost.

peterbell10 added some commits Jul 4, 2019

Revert "Mark XND as xfail"
This reverts commit bdc71ce.

@peterbell10 peterbell10 force-pushed the peterbell10:c-api branch from 91048d6 to da4fa1c Jul 4, 2019

Decref globals before reaching shutdown
DECREF cannot be run during shutdown as the python interpreter may have already
been finalized, in which case the call may segfault.

@peterbell10 peterbell10 force-pushed the peterbell10:c-api branch from 46dac2b to 2389d3b Jul 4, 2019

peterbell10 added some commits Jul 4, 2019

Fix iterator invalidation bug
We update the active backends whilst iterating over that same array. This can
invalidate our iterators and so we must use indices instead.
@hameerabbasi

This comment has been minimized.

Copy link
Contributor

commented Jul 5, 2019

You can mark this PR as ready-for-review and I'll give it a more thorough review. 😄

Remove contextvar wrapper
The code didn't actually work and the thread_local version was less efficient
because it was forced to imitate the contextvar interface.

@peterbell10 peterbell10 marked this pull request as ready for review Jul 5, 2019

Intern protocol identifiers at module init time, avoid GetAttrString
PyObject_GetAttrString internally allocates a new string for every lookup. This
brings the overhead down from ~1 us to ~0.7 us.
@peterbell10

This comment has been minimized.

Copy link
Collaborator Author

commented Jul 5, 2019

In my benchmarks, the overhead is now down to ~700ns.

};


struct SkipBackendContext

This comment has been minimized.

Copy link
@hameerabbasi

hameerabbasi Jul 6, 2019

Contributor

Would it make the code simpler to have an abstract class and reduce the duplication?

This comment has been minimized.

Copy link
@peterbell10

peterbell10 Jul 6, 2019

Author Collaborator

context_helper already covers most of the duplication. SetBackendContext and SkipBackendContext have slightly different arguments and data types so do need to be distinct.

I could maybe add something like a py_object base class that covers new, dealloc and holds PyObject_HEAD. Does that sound useful?

This comment has been minimized.

Copy link
@peterbell10

peterbell10 Jul 6, 2019

Author Collaborator

I tried creating a base class and while it did seem to work, I think it's probably a bad idea to try and mix C++ inheritance with the Python's c-style "inheritance". It makes very low level assumptions about the memory layout of types that don't necessarly hold for C++ types. I also get this warning that because of it:

uarray/_uarray_dispatch.cxx:846:12: warning: offsetof within non-standard-layout type ‘{anonymous}::Function’ is undefined [-Winvalid-offsetof]
   offsetof(Function, dict_),      /* tp_dictoffset */
{
PyObject_HEAD

context_helper<py_ref> ctx_;

This comment has been minimized.

Copy link
@hameerabbasi

hameerabbasi Jul 6, 2019

Contributor

Maybe this should be the base class?


struct SkipBackendContext
{
PyObject_HEAD

This comment has been minimized.

Copy link
@hameerabbasi

hameerabbasi Jul 6, 2019

Contributor

This would be moved into the base class.

This comment has been minimized.

Copy link
@peterbell10

peterbell10 Jul 6, 2019

Author Collaborator

I use context_helper on line 735 without creating any python objects.

This comment has been minimized.

Copy link
@hameerabbasi

hameerabbasi Jul 6, 2019

Contributor

Fair enough, still would be nice to have a base class.

@hameerabbasi

This comment has been minimized.

Copy link
Contributor

commented Jul 6, 2019

Fantastic job! Couple of minor comments. Besides that, it might be nice to move ua.Dispatchable to C++, the attribute getters and the constructor might be a lot faster.

@peterbell10

This comment has been minimized.

Copy link
Collaborator Author

commented Jul 6, 2019

it might be nice to move ua.Dispatchable to C++

Is it alright if I leave that for a follow-up PR?

@hameerabbasi

This comment has been minimized.

Copy link
Contributor

commented Jul 6, 2019

Is it alright if I leave that for a follow-up PR?

Of course!

@hameerabbasi hameerabbasi merged commit 6946570 into Quansight-Labs:master Jul 6, 2019

1 check passed

Quansight-Labs.uarray #20190705.2 succeeded
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.