Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sum!, prod!, any!, and all! may silently return incorrect results #39385

Open
yurivish opened this issue Jan 24, 2021 · 26 comments
Open

sum!, prod!, any!, and all! may silently return incorrect results #39385

yurivish opened this issue Jan 24, 2021 · 26 comments
Labels
arrays [a, r, r, a, y, s] bug Indicates an unexpected problem or unintended behavior correctness bug ⚠ Bugs that are likely to lead to incorrect results in user code without throwing

Comments

@yurivish
Copy link
Contributor

yurivish commented Jan 24, 2021

I noticed that sum! and prod! do not check for aliasing, leading to incorrect results when the same array is passed in both arguments. The incorrect behavior under aliasing is not documented and the result is a silent unexpected wrong result.

julia> a = [1 2 3];

julia> sum!(copy(a), a)
1×3 Matrix{Int64}:
 1  2  3

julia> sum!(a, a)
1×3 Matrix{Int64}:
 0  0  0

julia> a = [1 2 3];

julia> prod!(copy(a), a)
1×3 Matrix{Int64}:
 1  2  3

julia> prod!(a, a)
1×3 Matrix{Int64}:
 1  1  1

Not all mutating functions have this problem and cumsum! can be used this way without error:

julia> a = [1, 2, 3];

julia> cumsum!(a, a)
3-element Vector{Int64}:
 1
 3
 6

Other functions correctly flag the aliasing in the case of === arguments and throw an error:

julia> using LinearAlgebra

julia> a = rand(2, 2);

julia> mul!(a, a, a)
ERROR: ArgumentError: output matrix must not be aliased with input matrix

The examples above were generated on Julia 1.6.0-beta1 and I see the same behavior in version 1.5.3.

Edit: I noticed this issue also applies to any! and all!:

julia> let a = [true, false]
           any!(a, a)
       end
2-element Array{Bool,1}:
 0
 0

julia> let a = [true, false]
           any!(copy(a), a)
       end
2-element Array{Bool,1}:
 1
 0

julia> let a = [true, false]
           all!(a, a)
       end
2-element Array{Bool,1}:
 1
 1

julia> let a = [true, false]
           all!(copy(a), a)
       end
2-element Array{Bool,1}:
 1
 0
@yurivish yurivish changed the title sum! and prod! may silently return incorrect results sum!, prod!, any!, and all! may silently return incorrect results Jan 24, 2021
@JeffBezanson JeffBezanson added arrays [a, r, r, a, y, s] bug Indicates an unexpected problem or unintended behavior labels Jan 26, 2021
@barucden
Copy link
Contributor

So what is the correct approach here? Add an alias check similar to that in LinearAlgebra.mul!? If so, it might be enough to add the check here https://github.com/JuliaLang/julia/blob/master/base/reducedim.jl#L908.

@mcabbott
Copy link
Contributor

Maybe worth noting that

julia> sum!(a, a; init=false)
1×3 Matrix{Int64}:
 2  4  6

This is Base.mapreducedim!(identity, +, Base.initarray!(a, Base.add_sum, false, a), a), and the difference is whether initarray! does anything. (prod! and all! etc. come to the same place.)

@StefanKarpinski
Copy link
Sponsor Member

Can @mbauman's Base.mightalias help here?

@mbauman
Copy link
Sponsor Member

mbauman commented Jan 27, 2021

I mean, in all these cases it needs to collapse to a no-op. Can we avoid worrying about aliasing at all with a simple size(out) == size(in) && copyto!(out, in)?

@mcabbott
Copy link
Contributor

mcabbott commented Jan 27, 2021

I'm not sure what should happen, but a tricker example is:

julia> mat = [100 2 3; 4 5 6];

julia> sum!(view(mat,:,1), mat);

julia> mat
2×3 Matrix{Int64}:
  5  2  3
 11  5  6

The perhaps-comparable @views mul!(mat[:,[1,3]], mat[:,2:3], mat[:,1:2]) runs without complaint.

@nalimilan
Copy link
Member

@mbauman You're right that calling size(out) == size(in) && copyto!(out, in) is probably a good idea anyway, but as @mcabbott showed this is not enough to catch all problematic cases. Do you think we should call Base.mightalias then? And should we error or make a copy of the input if it aliases the output?

At any rate, docstrings should mention this.

@adienes
Copy link
Contributor

adienes commented Jul 2, 2023

should this bug be release blocking at any point? I feel that it is

  1. surprising
  2. easy to encounter
  3. uses only "basic" functions and objects

@oscardssmith
Copy link
Member

at the very least, we could document that this is a case where weirdness can occur

@ufechner7
Copy link

Please, add this to the 1.10 milestone... This is a correctness issue, not so difficult to fix, and correctness should have priority for any function in Base or the standard library.

@oscardssmith oscardssmith added this to the 1.10 milestone Aug 5, 2023
@nlw0
Copy link
Contributor

nlw0 commented Aug 7, 2023

should this bug be release blocking at any point? I feel that it is

1. surprising

2. easy to encounter

3. uses only "basic" functions and objects

I beg to differ. Mutable functions are prone to weird issues like that, and that's one reason to avoid them or be careful with their use. Good thing we name them with a scary warning ! character. Behavior when calling sum!(a,a) should be undefined, in my opinion. Furthermore, anyone using such a function is probably more concerned with speed than avoiding a result like that. Adding checks go against ultimate speed.

Completely agree it should be documented. But in my opinion this is a bit like expecting a specific defined behavior from reduce(-, 1:10).

@KristofferC
Copy link
Sponsor Member

KristofferC commented Aug 7, 2023

Why is this on the milestone? The milestone is not something you just slap on issues and they get magically fixed. Is this a regression vs 1.9?

@ufechner7
Copy link

ufechner7 commented Aug 7, 2023

Well, there is even a pull request already that might fix this issue... Not good for the reputation of Julia to delay this any further...
See: https://discourse.julialang.org/t/did-julia-community-do-something-to-improve-its-correctness/102515/21

@adienes
Copy link
Contributor

adienes commented Aug 7, 2023

forgive the naivete, but what even causes this? why does aliasing matter in the output array? that is, what makes

sum!(a, a)

so different from

a .= sum.(a)

@nlw0
Copy link
Contributor

nlw0 commented Aug 7, 2023

forgive the naivete, but what even causes this? why does aliasing matter in the output array? that is, what makes

sum!(a, a)

so different from

a .= sum.(a)

Any reducing function such as sum, in a mutable version, should feel free to use the output parameter as "scratch space" in my opinion. If you provide the pre-allocated output/scratch space as one of the inputs, you should bear the consequences. Should be undefined behavior. It's a very different situation from a mapping functor where the values are going to be consumed and immediately replaced by the output.

In this example, the non-mutating call to sum there will be a (stack allocated?) accumulator, and the broadcasting will allocate new memory for the output, and then the data will be copied from this new anonymous array into a. When using immutable functions, we assume new memory will be allocated as necessary. Mutable functions could have different guarantees about what happens to the parameters, of course. I personally think in the case of sum! it should be fair game for the implementation to alter the values during processing, with no guarantees that it only writes there in the last step. That's my opinion for this case at least.

@ufechner7
Copy link

ufechner7 commented Aug 7, 2023

Any reducing function such as sum, in a mutable version, should feel free to use the output parameter as "scratch space" in my opinion. If you provide the pre-allocated output/scratch space as one of the inputs, you should bear the consequences. Should be undefined behavior. It's a very different situation from a mapping functor where the values are going to be consumed and immediately replaced by the output.

In this example, the non-mutating call to sum there will be a (stack allocated?) accumulator, and the broadcasting will allocate new memory for the output, and then the data will be copied from this new anonymous array into a. When using immutable functions, we assume new memory will be allocated as necessary. Mutable functions could have different guarantees about what happens to the parameters, of course. I personally think in the case of sum! it should be fair game for the implementation to alter the values during processing, with no guarantees that it only writes there in the last step. That's my opinion for this case at least.

I do not agree. Functions in base should be as safe and correct as possible. If there would be a large performance penalty to add a check I would accept your argument, but if the average performance penalty is less than 10% we should add a check. Not all users of Julia have a PhD. Better safe than sorry. Julia should try to be as fast and safe as Rust and not as fast and unsafe as C. Unsafe high performance versions of standard functions can live in packages, or an enabling them via a macro might also be an option.

@LilithHafner LilithHafner added the correctness bug ⚠ Bugs that are likely to lead to incorrect results in user code without throwing label Aug 7, 2023
@mikmoore
Copy link
Contributor

mikmoore commented Aug 7, 2023

I don't see this as remotely easy to "fix" in code. Note that our mul! also "stumbles" on this:

julia> using LinearAlgebra

julia> x = ones(1,1); mul!(x,x,x)
ERROR: ArgumentError: output matrix must not be aliased with input matrix
Stacktrace:
 [1] gemm_wrapper!(C::Matrix{Float64}, tA::Char, tB::Char, A::Matrix{Float64}, B::Matrix{Float64}, _add::LinearAlgebra.MulAddMul{true, true, Bool, Bool})
   @ LinearAlgebra ~/julia-1.9.1/share/julia/stdlib/v1.9/LinearAlgebra/src/matmul.jl:650

julia> mul!(x,x,x') # silent failure
1×1 Matrix{Float64}:
 0.0

BLAS is clearly not checking for aliasing internally. We have a courtesy check (that threw the initial error, based on checking for C === A || B === C), but as the last line shows we don't check every simple possibility, much less anything more elaborate.

I say mul! "stumbles" instead of "fails" because the docstring indicates (emphasis on the final sentence)

help?> mul!
search: mul! rmul! lmul! accumulate! muladd widemul accumulate module Module __module__ 'mutable struct' @__MODULE__

  mul!(Y, A, B) -> Y

  Calculates the matrix-matrix or matrix-vector product AB and stores the result in Y, overwriting the existing
  value of Y. Note that Y must not be aliased with either A or B.

Note that fixing these problems, in general, requires that we either throw an error or silently copy inputs (but some functions also need to observably mutate "inputs", so that solution could not work everywhere). Neither of those is what the user wanted, although I expect they would prefer the error to unspecified behavior. Even when it didn't cause deeper correctness problems, I can't promise the user would be happy we started copying stuff for them in a function they specifically called to avoid allocations.

We can do people the favor of courtesy checks, but it seems that these will inevitably throw on legal uses and/or miss illegal ones. We can try to catch "easy and obvious" violations (like the mul! example above did), but those should not be be relied upon and (IMO) should not come at any non-trivial run-time cost.

To my point on reliability, I would say Base.mightalias might be part of these courtesy checks except that it is documented to be conservative so could cause unnecessary and difficult-to-sidestep errors on totally acceptable uses. For example, it currently triggers on

julia> y = randn(4,4); y1 = view(y,[1,3],:); y2 = view(y,[2,4],:); Base.mightalias(y1,y2)
true

Documenting the no-alias assumption appears to be the only complete solution. It would be worth auditing all of our ! functions for undocumented no-alias requirements. This fix just requires adding similar notes to mul!'s Note that Y must not be aliased with either A or B.

@ufechner7
Copy link

Documenting the no-alias assumption appears to be the only complete solution.

Well, documenting is always good, but adding an incomplete solution is better than implementing no solution. If an incomplete solution makes it less likely that a newcomer hits this issue we should do it.

@KristofferC
Copy link
Sponsor Member

Not good for the reputation of Julia to delay this any further...

Again, the milestone is not for getting random things fixed faster. I'll remove it since it doesn't seem to be a release blocker.

@KristofferC KristofferC removed this from the 1.10 milestone Aug 7, 2023
@ufechner7
Copy link

This is bad and shows the attitude of leading team members to ignore correctness issues...

@gdalle
Copy link
Contributor

gdalle commented Aug 7, 2023

Started PR #50824 to amend the docs because it doesn't cost much, even if we later decide to check for some forms of aliasing.

Personally, I agree with @mikmoore that some edge cases will necessarily fall through the cracks. And I also don't think that comments like

This is bad and shows the attitude of leading team members to ignore correctness issues...

will lead anywhere. If anything, the fact that we're here discussing this shows that we all take it to heart, although opinions may diverge.

@gbaraldi
Copy link
Member

gbaraldi commented Aug 7, 2023

I agree, and potentially we could add those checks and even make them smarter (we potentially could and maybe should check for derived arrays like ranges or transposes because we can look for the parent) but the user is always able to get around those so the docs should reflect that.

@nlw0
Copy link
Contributor

nlw0 commented Aug 8, 2023

I do not agree. Functions in base should be as safe and correct as possible. If there would be a large performance penalty to add a check I would accept your argument, but if the average performance penalty is less than 10% we should add a check. Not all users of Julia have a PhD. Better safe than sorry. Julia should try to be as fast and safe as Rust and not as fast and unsafe as C. Unsafe high performance versions of standard functions can live in packages, or an enabling them via a macro might also be an option.

sum! is not really a function, it's a procedure, it's mutable. I love functional programming, the strictness and rigueur of Haskell and Scala. This strict FP view should be directed to functions such as sum. There's a well-known design principle behind procedures such as sum!: they are supposed to enable pre-allocation of arrays, and the priority is performing reductions of an array B and store the result in a. If the implementation happens to challenge some corner case, that's interesting and maybe we should think about the implications. But by no means does that immediately imply the procedure is "incorrect". Maybe this mutable function just happens to have a reduced domain compared to a pure function, and it's still useful and does exactly what it's intended to do. In my opinion, any calls to procedures that can mutate a parameter should be considered undefined if the same parameter (or view or whatever) is present in the other inputs. And that's just reasonable if we are talking about having a strict mathematical understanding of what operation is being implemented: strict mathematical understandings precludes mutation.

Maybe it's the case that these mutable reductions could be implemented in a way that is efficient and guarantee to behave like the identity function when the inputs have the same shape, even in the unreasonable case where the same array is the input and the output. In fact, the assignment to 0 in the summation sounds a little inefficient. I'll take that extra feature if I can have it. I'm not convinced this should be a general constraint when deciding what these procedures should do, though, and I believe it's likely to introduce complications in other instances.

I find it pretty upsetting that so much attention is being given to this, while issue #45000, that I still consider very serious and relevant, was dismissed out of hand.

@mkitti
Copy link
Contributor

mkitti commented Oct 23, 2023

I find it pretty upsetting that so much attention is being given to this, while issue #45000, that I still consider very serious and relevant, was dismissed out of hand.

There you were directed to mapfoldl which is also referred to in the documentation. Could you explain in #45000 why that is not a resolution to that issue?

@nlw0
Copy link
Contributor

nlw0 commented Oct 24, 2023

I find it pretty upsetting that so much attention is being given to this, while issue #45000, that I still consider very serious and relevant, was dismissed out of hand.

There you were directed to mapfoldl which is also referred to in the documentation. Could you explain in #45000 why that is not a resolution to that issue?

mapfoldl should be the default behavior for mapreduce and reduce. Just give me the same as I we would get with a simple for-loop, and get out of the way of important compiler optimizations tuned to this pattern. There should be no attempt to be "smart" about accuracy. My experiment demonstrates that accuracy can actually be better with the "naive" mapfoldl. The current implementation results in generally worse machine code generation, and it's not even the best for the intended specific purpose of better summation accuracy with floating-point. I can sympathize with the original intention, but it's just not a good idea. Especially if I'm potentially performing integer operations, for instance, and then the forced splitting of the data just doesn't make sense, while interfering in compiler optimizations.

@StefanKarpinski
Copy link
Sponsor Member

StefanKarpinski commented Nov 14, 2023

I don't really think that issue is particularly relevant to this one, but linking it here did call my attention to it.

@PallHaraldsson
Copy link
Contributor

PallHaraldsson commented Nov 15, 2023

At least sum! isn't used by Julia itself so ideally should not/never have been part of Julia Base, but at least can then be "fixed" without slowing Julia down.

Any reducing function such as sum, in a mutable version, should feel free to use the output parameter as "scratch space" in my opinion.

Maybe, but all of those functions could be duplicated in an external package as is (right now, they could even be deprecated here), and people could do using aliasing_unsafe_sum, if a check is actually too costly to use such function from Base after modification:

I do not agree. Functions in base should be as safe and correct as possible. [..] Not all users of Julia have a PhD. Better safe than sorry.

I'm sympathetic to that, so should a check be added to at least the most important functions ASAP?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrays [a, r, r, a, y, s] bug Indicates an unexpected problem or unintended behavior correctness bug ⚠ Bugs that are likely to lead to incorrect results in user code without throwing
Projects
None yet
Development

Successfully merging a pull request may close this issue.