Handle duplicate indices in Numba implementation of AdvancedIncSubtensor1
#1081
Conversation
Codecov Report

```
@@            Coverage Diff             @@
##             main    #1081      +/-   ##
==========================================
- Coverage   79.28%   79.24%   -0.05%
==========================================
  Files         159      152       -7
  Lines       48111    48002     -109
  Branches    10937    10922      -15
==========================================
- Hits        38145    38038     -107
- Misses       7454     7458       +4
+ Partials     2512     2506       -6
```
aesara/link/numba/dispatch/basic.py (outdated)
```python
@numba_njit
def check(z, vals):
    if max_idx >= len(z):
        raise IndexError(msg_pos)
    if max(-min_idx, 0) > len(z):
        raise IndexError(msg_neg)
    if len(idxs) != len(vals):
        raise ValueError("Incompatible dimensions during indexing")
```
We generally try to avoid manual checks at the Numba level, partly because the same checks can be introduced downstream by Numba itself. For instance, if the values are constants, then some such checks occur at compile time (e.g. during constant-folding-like optimization passes), and at run time in the code generated by `zip`, `for`, and `z[idx]`.
Also, we want to honor the Numba-level options for disabling things like bounds checks. If we add similar checks of our own, then we can end up effectively overriding such options.
There are definitely cases in which manual checks are warranted, but it's not clear to me whether or not this is one of those cases.
The point is that if we do the check manually, we can always set `boundscheck=False` for that function without any safety issues (if the indices are known at compile time, as indicated in the code).
I checked that this cleans up the generated assembly quite a bit, and also speeds up the execution of the function.
I've seen a lot of models where a big share of the execution time is in this op, so even smaller performance improvements matter.
I think even in general manual checks might be a good idea in many cases, often the optimizer will then optimize away the later checks and we end up with cleaner and faster asm code. Generally, the fewer checks happen within the loop the better.
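To illustrate the hoisting idea in plain Python/NumPy (this is a made-up sketch, not the PR's Numba code; the function name and messages are hypothetical): the bounds are validated once before the loop, so the loop body itself needs no per-element checks for an optimizer to retain.

```python
import numpy as np

def inc_subtensor_checked(x, vals, idxs):
    # Hypothetical sketch: validate bounds once, up front, so the hot loop
    # contains no per-element checks.
    idxs = np.asarray(idxs)
    if len(idxs) != len(vals):
        raise ValueError("Incompatible dimensions during indexing")
    if idxs.size:
        if idxs.max() >= len(x):
            raise IndexError("index out of bounds (too large)")
        if idxs.min() < -len(x):
            raise IndexError("index out of bounds (too negative)")
    for idx, val in zip(idxs, vals):
        x[idx] += val  # every access was already validated above
    return x
```

The per-element branch that bounds checking would otherwise put inside the loop is replaced by two scalar comparisons before it, which is the shape of the win being claimed for the generated assembly.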
Is it possible that codecov doesn't pick up changes when I force-push? Somehow I still see the old version of the patch in the coverage report.
Can you demonstrate how these manually added checks and constant case specializations are advantageous and/or not already covered by Numba's optimizations? I'm very hesitant to incorporate this additional complexity without some kind of demonstrable benefit.
AdvancedIncSubtensor1
Comparison of the constant-index case with the non-constant-index case:

```python
%env NUMBA_BOUNDSCHECK=0

import aesara
import aesara.tensor as at
import numpy as np

n, k = 100_000, 100
idxs_vals = np.random.randint(k, size=n)
# idxs_vals.sort()
x_vals = np.random.randn(k)
a_vals = np.random.randn(n)

x = at.dvector("x")
a = at.dvector("d")
idxs = at.vector("idx", dtype=np.int64)

out = at.inc_subtensor(x[idxs], a)
func = aesara.function([idxs, x, a], out, mode="NUMBA")
func_inner = func.vm.jit_fn
_ = func_inner(idxs_vals, x_vals, a_vals)

print("time with non-const index:")
%timeit func_inner(idxs_vals, x_vals, a_vals)

x = at.dvector("x")
a = at.dvector("d")
out = at.inc_subtensor(x[idxs_vals], a)
func = aesara.function([x, a], out, mode="NUMBA")
func_inner = func.vm.jit_fn
func_inner(x_vals, a_vals)

print("time with const index:")
%timeit func_inner(x_vals, a_vals)
```
So the non-const-index case is about 1.4x slower. The asm of the non-const version without bounds checks looks OK, but it has to deal with the possibility of negative indices:
If we enable boundschecks, we get branching in the loop:
In comparison the constant-index case looks pretty nice:
And we get safe indexing even without enabling bounds checks.
This also raises the question of how we want to deal with out-of-bounds access by default.
You've demonstrated that there may be a clear difference between constant and non-constant inputs, but we really need to know whether or not all the extra code is providing value, and only a comparison with and without it would help determine that. Also, it's better if you provide the generated LLVM IR instead of the ASM generated for your machine.
force-pushed from 1e6be72 to 81d42d0
I turned part of it into a rewrite, which makes it a bit cleaner. Apart from that, I'm not really sure what extra code you are referring to.
force-pushed from 81d42d0 to 186ef9f
The Numba transpilations look better, but we don't need the manual bounds-check additions, especially not at the `Op` and/or rewrite levels.

Bounds checking in Numba is entirely Numba's responsibility. If there's a bug, we can work around it with such additions, but I'm not aware of one; otherwise, we should support Numba options like bounds checking in the ways we currently support other, similar options (e.g. `aesara.config`).
Numba's default for bounds checks is `False`, so unless we change that, this means that out-of-bounds indices will access invalid memory by default.
Aesara's responsibility is to faithfully preserve the results of explicitly defined computations for valid inputs when transpiling to Numba and other targets. Since most errors aren't specified in an Aesara graph—aside from …

Regardless, the scope of this PR—and its related issue—does not cover manual bounds checking. We can discuss it in a new issue or Discussion, though.
I'm actually a bit shocked you would accept something in Aesara where we access invalid memory for wrong user input by default. I am not going to remove bounds checks from the PR; I'd feel responsible for all the headaches that would lead to.

Not sure where that is coming from. I'd say Aesara's responsibility is to produce decent code, however that happens, and invalid memory access is certainly not that.
You seem to be aware that bounds checking already exists in Numba, but you're also assigning the same responsibility to Aesara. If you have this much disgust for a lack of bounds checking, then you need to take that up with Numba—and a few other programming languages, as well.

As I said above, if you found a bug in Numba's bounds checking that's solved by your implementation, please report it to them. We will always consider adding code that works around a current Numba bug or missing feature, but that doesn't seem to be the case here. If it is, tell us.

As you mentioned, we can override Numba's defaults and compile these graphs with bounds checking turned on by default. That's a viable approach. It's also a completely independent change; one that does not factor into the issue addressed by this PR.
That's fine; we can take care of it.
That's in reference to the kinds of computations that should be expressed in our Numba implementations of Aesara nodes. If you read the rest of what I wrote, you'll see how it relates to explicit error handling like the kind you've added.
Unless the code was explicitly designed/intended to prevent invalid memory accesses caused by bad user input, such an error says nothing about the quality of the code. It only says something about the purpose and/or expectations of the code. Sometimes the purpose/expectations for code involve performance, and bounds checking can hinder performance quite a bit. In that case, bounds checking would not make for decent code.

Regardless, unnecessary redundancy does not make code more decent, so, unless your additions are addressing something currently missing from Numba (as mentioned above), these changes are not more decent than the same code without the redundant bounds checking.
I kind of hope we are just talking past each other here, so I'll just summarize a bit, and hopefully that helps:
This is why I set …

There is, however, a very common case where we can eliminate the bounds checks during graph execution and still have safe array access: if the array of indices is known at compile time, we can simply pre-compute the maximum and minimum entries and check at graph-execution time that they are valid for the other input arrays. I proposed two different implementations of that: one where this happens entirely in the Numba dispatch, and one where I moved it to the graph itself using a rewrite.
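A plain-Python sketch of that constant-index idea (names are illustrative, not the PR's actual dispatch code): the extrema of the constant index array are computed once when the function is built, so the runtime check is O(1) and the loop needs no per-element bounds checks.

```python
import numpy as np

def make_inc_subtensor(idxs_const):
    # "Compile time": the index array is constant, so its extrema are known now.
    idxs_const = np.asarray(idxs_const)
    max_idx = int(idxs_const.max())
    min_idx = int(idxs_const.min())

    def inc_subtensor(x, vals):
        # "Graph execution time": a single O(1) check replaces per-element ones.
        if max_idx >= len(x) or -min_idx > len(x):
            raise IndexError("constant indices out of bounds for input")
        if len(idxs_const) != len(vals):
            raise ValueError("Incompatible dimensions during indexing")
        for idx, val in zip(idxs_const, vals):
            x[idx] += val
        return x

    return inc_subtensor
```

The same structure works whether the specialization lives in the Numba dispatch or is introduced by a graph rewrite; only where `make_inc_subtensor` is invoked differs.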
We remove the default implementation for `AdvancedIncSubtensor`, since it produces incorrect results for duplicate indices. This means we fall back to object mode for now, until we have a proper implementation of that as well.
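For context (NumPy, not the removed Numba code), this is the classic duplicate-index pitfall being guarded against: buffered fancy-index increment applies only the last value for each repeated index, while `np.add.at` accumulates all of them, which is what `inc_subtensor` semantics require.

```python
import numpy as np

x = np.zeros(3)
idxs = np.array([0, 0, 1])
vals = np.array([1.0, 2.0, 3.0])

# Buffered fancy indexing: the duplicate index 0 keeps only the last value,
# so y[0] ends up 2.0 instead of the accumulated 3.0.
y = x.copy()
y[idxs] += vals

# Unbuffered accumulation: both contributions to index 0 are summed,
# so z[0] ends up 3.0.
z = x.copy()
np.add.at(z, idxs, vals)
```

Any correct Numba implementation has to reproduce the `np.add.at` behavior, which is why a naive vectorized increment is not an acceptable fallback.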
force-pushed from 186ef9f to 3a30756
force-pushed from cd905cf to f3f65ad
Sounds like Numba gives us the option to enable bounds checks but doesn't do it by default. What's wrong with opting to use that in Aesara? We can have a Numba-specific Aesara flag for disabling bounds checks introduced in Numba `Op`s, if we don't want to add a Numba-specific variable at the `Op` level. Also, what's with that …
Yes, if we want bounds checking in Numba, we need to use Numba's bounds checking. That's it.
I think that variable is used by …
I give up.
Isn't this what the current PR proposed? Or are you referring to the small …
This PR brings bounds checking to our graphs by adding a new property to … Our Python and C backends already perform bounds checking, and extra work would be needed in order to provide versions that don't (and use the newly introduced property). Likewise, if we're going to do anything with bounds checking at the graph level, we would need to do it for all …

Also, has anyone considered how adding a new property like that to …

Anyway, the approach in this PR consists of a much bigger set of changes than we need. We only need a simple implementation in the spirit of:

```python
@numba_njit(boundscheck=boundscheck)
def advancedincsubtensor1(x, vals, idxs):
    for idx, val in zip(idxs, vals):
        x[idx] += val
    return x
```

where …

Also, if we're adding bounds checking for …
I've created #1143 to cover the issue associated with this PR. We can address the default bounds checking after that. In the meantime, if anyone wants (or ever wanted) bounds checking in Numba, they should be able to set the `NUMBA_BOUNDSCHECK` environment variable.
Here are a few important guidelines and requirements to check before your PR can be merged:

- `pre-commit` is installed and set up.

First, this PR removes the incorrect Numba implementation of `AdvancedIncSubtensor`, so this will now just fall back to object mode and be slow but correct. (We should also provide a Numba implementation for this, but that will be a separate PR.)

But we do add a new implementation for `AdvancedIncSubtensor1` (the much more common case). Here, we also take advantage of the fact that sometimes we know the indices beforehand, so we can simplify bounds checks and generate cleaner and faster assembly code.

Also, a one-line fix for `cumop` accidentally made its way into the PR, but this is simple enough that maybe we can just keep it here? (It has its own commit.)
The problem was that the `numba.prange` loop had data races.

Fixes #603