-
-
Notifications
You must be signed in to change notification settings - Fork 5.7k
Throw an error when @threads captures a boxed variable
#57185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
is it really true that this is a "clear sign of a race condition" ? wouldn't it be more accurate to say it's a "clear sign that the compiler was unable to prove the absence of a race condition" |
|
No. The mere presence of a julia> let A = 1 # Some outer-local-variable that gets boxed
function wrong()
out = zeros(Int, 10)
Threads.@threads for i in 1:10
A = i # Redefine A, this causes A.contents to be mutated
sleep(rand()/10)
out[i] = A # Reference A, reading from the boxed A.conents that has since been mutated on another thread
end
out
end
function not_wrong()
out = zeros(Int, 10)
Threads.@threads for i in 1:10
local A = i
sleep(rand()/10)
out[i] = A
end
out
end
@info "Surprise!" wrong()' not_wrong()'
end
┌ Info: Surprise!
│ (wrong())' =
│ 1×10 adjoint(::Vector{Int64}) with eltype Int64:
│ 4 8 9 6 8 6 2 6 9 8
│ (not_wrong())' =
│ 1×10 adjoint(::Vector{Int64}) with eltype Int64:
└ 1 2 3 4 5 6 7 8 9 10
|
|
The idea of turning lexically mutated variables into a mutable |
|
Isn't section 1.3 in https://discourse.julialang.org/t/unclear-allocation-behaviour-with-built-in-sum/125304 an example of a I mean, it's pretty terrible, but it's not incorrect (I don't think). Edit: I suppose that example is one layer deeper than this introspects, but I think #15276 could rear its head similarly at the |
|
Ugh, you're right, that case probably isn't incorrect per-se (so far as I can see). I think if we were designing Could we at least stick a deprecation warning in here instead of an error maybe? (with the intention of making it an error in a hypothetical v2) |
|
A canonical example of non-racy boxes would be This is a very typical example that users are running into. Note that this is very nice julia code, apart from the Many other languages force a style of It is a breath of fresh air that julia doesn't care and doesn't force the user to do the SSA transform by hand. The only issue is that lowering doesn't understand the control-flow, and if you use closures, then users must do the SSA transform by hand, which violates the spirit of all other parts of the language :( PS. Maybe even more poignantly, If captures behaved like the rest of the language, this would lower identical to even before any compiler optimizations. |
|
This PR would be a great improvement for usability. I've been bitten by this bug before, and been among the worst I've had to debug, because its both nondeterministic, requires esoteric knowledge of Julia to understand (arguably even an esoteric bug, namely the slow closure bug), and goes against the general behaviour of the language where we expect that reassigning the same variable does not cause it to become a mutable reference behind your back. However, a straight up error is probably too breaking. Perhaps it could instead throw a warning? |
|
Now I know where my mysterious bug comes from! It is so important that this issue can be addressed ASAP. |
|
If this is considered too breaking to do by default, perhaps it's a good candidate for "strict mode". |
I am not sure this is true. Take the original example: let A = 1 # Some outer-local-variable that gets boxed
function wrong()
out = zeros(Int, 10)
Threads.@threads for i in 1:10
A = i # Redefine A, this causes A.contents to be mutated
sleep(rand()/10)
out[i] = A # Reference A, reading from the boxed A.conents that has since been mutated on another thread
end
out
end
end
|
|
The barrier for what is considered esoteric can vary a lot from user to user. In this case, I strongly suspect that knowing this will cause a race condition requires an above average understanding of julia's scoping rules. |
My point is that you don't need to know about closures or |
|
I think the problem here is that a lot of users are not thinking of it that way. They think that when they set At least, I've encountered multiple people who got tripped up by exactly this, and I think it's not such a crazy mistake to make if you don't understand the scoping rules well. |
|
I would also say that one of the first things we learn when programming is the distinction between mutation (writing to a variable) and assignment. That's something I teach early when I teach programming, and it's also explicitly mentioned early in the Julia manual. What is so confusing here is that something that really looks like assignment really is mutation. in particular, we learn early that assigning a variable inside a function will not cause the variable to change outside the function. That is, there is usually a difference between: A = [1]
foo(x) = (x[1] = 2)
foo(A) # mutationand A = 1
foo(x) = (x = 2)
foo(A) # assignment, no mutation |
|
also extra-confusing is that |
|
Thanks @adienes, yes I should have constructed my example with that ordering to make it extra clear just how confusing this can actually be sometimes. |
The example in particular wraps the |
|
This would be a good idea for a linter though. |
I mean, this really is still assignment — or more specifically, deciding you have a better use for an existing name. The trouble is that it's reusing a name from a shared scope. But perhaps that's precisely the level at which this should be tackled. Instead of an error or warning, what if Most updates to outer scope names are already racy and broken. This would just make them no-ops instead of races. And it's a very easy rule to document. I imagine there could still be some "safe" patterns of this in use in the wild, however, where folks are carefully guarding such updates with a mutex or some such... but that seems quite unlikely. |
this will lead to some very fun and frequent discussions about the differences between scopes
In seriousness though, for my tastes I find that proposal to be a bit too clever. I don't dispute that it would fix the majority of cases where this pattern is problematic, but it's also introducing a pattern I haven't seen precedent for in any other macro or language construct. it's also breaking, as it removes legitimate functionality (albeit probably rarely used) despite the fact that I do agree that this behavior is super counterintuitive and easily leads to bugs, I do think the right solution is a combination of
|
|
Even though this PR this will not be going forward (at least in its current form), I am implementing a similar change in OhMyThreads.jl in this PR: JuliaFolds2/OhMyThreads.jl#141 I would appreciate any thoughts or feedback there from people who had thoughts on this PR. The philosophy in that PR is that I am assuming that boxed captures are erroneous by default, but I am also including a ( julia> @allow_boxed_captures let
v = tmap(1:10) do i
A = i
sleep(rand())
A
end
A = 1 # oops, now everything is a race condition!
v
end
10-element Vector{Int64}:
4
2
8
4
3
6
6
2
6
6 |
This is my proposal to close #14948
Any time
Threads.@threadsconstructs a closure that contains aCore.Box, that should be a clear sign of a race condition, since the order at which thatBoxis read from or written to is undefined. This is a really easy, surprising, and hidden footgun for users to run into.I suspect that a large chunk of existing cases out in the wild have not been caught because in many cases, the bugs caused by this have a low probability of being seen (though exist nonetheless).
Basically, all I'm doing here is walking through the fieldtypes of the closure produced by
@threadsand if that closure has aCore.Boxin it, we throw an error that suggests what went wrong and how it can potentially be fixed. This is a pretty aggressive approach to the problem in #14948, but I think it's a serious enough problem that it's warranted.