
Ability to fix the sparsity pattern of a sparse matrix #190

Closed · ChrisRackauckas opened this issue Jul 17, 2022 · 23 comments

@ChrisRackauckas

We had a long discussion in Slack over the following example:

julia> using LinearAlgebra, SparseArrays

julia> J = sparse(Diagonal(ones(4)))
4×4 SparseMatrixCSC{Float64, Int64} with 4 stored entries:
 1.0   ⋅    ⋅    ⋅ 
  ⋅   1.0   ⋅    ⋅ 
  ⋅    ⋅   1.0   ⋅ 
  ⋅    ⋅    ⋅   1.0

julia> W = sparse(Diagonal(ones(4)));

julia> J[4,4] = 0
0

julia> gamma = 1.0
1.0

julia> W .= gamma .* J
4×4 SparseMatrixCSC{Float64, Int64} with 3 stored entries:
 1.0   ⋅    ⋅    ⋅ 
  ⋅   1.0   ⋅    ⋅ 
  ⋅    ⋅   1.0   ⋅ 
  ⋅    ⋅    ⋅    ⋅ 

Personally, I thought the answer would be

julia> W .= gamma .* J
4×4 SparseMatrixCSC{Float64, Int64} with 4 stored entries:
 1.0   ⋅    ⋅    ⋅ 
  ⋅   1.0   ⋅    ⋅ 
  ⋅    ⋅   1.0   ⋅ 
  ⋅    ⋅    ⋅   0.0

but I was wrong: .= can change the sparsity pattern of the matrix W. There is a clear argument for this behavior: it stems from the fact that setindex! can change the sparsity pattern, and .= is just doing setindex!-style updates, so it too can change the sparsity pattern (not only adding structural non-zeros, but also removing them). And this is kind of documented:

https://docs.julialang.org/en/v1/stdlib/SparseArrays/

Julia has support for sparse vectors and sparse matrices in the SparseArrays stdlib module. Sparse arrays are arrays that contain enough zeros that storing them in a special data structure leads to savings in space and execution time, compared to dense arrays.

You can read that and say: it only has to "contain enough zeros"; it didn't need them all, therefore it dropped them. QED, end of story.

But.

There seem to be a lot of cases that may not want this behavior. Here's a quick story. I was happy when symbolic factorization reuse was added in JuliaLang/julia#33738 and had been using it in SciML for the last few years, or so I thought. But as SciML does, it wanted more power, so it started doing the operations directly (https://github.com/SciML/LinearSolve.jl/blob/v1.22.1/src/factorization.jl#L82-L108), and curiously I started getting an error downstream: "LoadError: ArgumentError: pattern of the matrix changed". I thought "I only use in-place operations, so the sparsity pattern cannot change?" I was wrong. SciML/OrdinaryDiffEq.jl#1702 is the fix, with the MWE at the top. What's going on is that at some time points in the ODE solve, J can have a non-structural zero (the Jacobian just happens to be zero there), and so @. W = dtgamma*J will change the sparsity pattern of W. The answer, of course, is that if I want the sparsity pattern to not change, then @. W.nzval = dtgamma*J.nzval needs to be done for safety.
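
For concreteness, here is a minimal runnable sketch of that difference. It assumes W and J have exactly the same stored pattern, which the nzval form requires:

julia> using LinearAlgebra, SparseArrays

julia> J = sparse(Diagonal(ones(2))); J[2, 2] = 0.0;  # (2,2) is now a stored, non-structural zero

julia> W = copy(J); dtgamma = 0.5;

julia> W .= dtgamma .* J; nnz(W)  # broadcast recomputes the pattern and drops the stored zero
1

julia> W = copy(J);

julia> W.nzval .= dtgamma .* J.nzval; nnz(W)  # writes only the stored values; pattern is kept
2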

However, this example opens up a whole can of worms.

  1. Since W is modified in place, it's actually modifying the prob.f.jac_prototype that was given by the user. So if I wanted a safety check that the sparsity pattern has not changed, I would need to create a deepcopy of what the user gives me and check the sparsity pattern back against that source of truth (since otherwise the pointers would be the same, and .= would change the sparsity pattern on the source of truth too).
  2. It's easy to see other codes running into this in obscure ways. Other Newton codes like NLsolve or Optim definitely don't have checks for whether the sparsity pattern changes (I think because they assume the matrix in the TwiceDifferentiable object is "fixed"), but in reality the mutation can change the user-given pattern, and thus these codes can hit this. A more obscure example is that gradient accumulation in Flux has code like x .= x .+ learning_rate .* grad, so if you happen to hit a case where the update evaluates to exactly zero at a stored position, structural non-zeros get deleted during gradient descent, and they will never come back. So something like GeometricFlux.jl can have a silent issue with its weight matrices changing structure without anyone noticing. The Flux codes definitely don't have checks for sparsity pattern changes (because I don't think until today there was a clear reason to do so!). These cases aren't fatal, but it's a bit worrisome that it's not quite the correct algorithm, yet it can work anyway.
  3. This is a piece of code that @YingboMa, @chriselrod, @Wimmerer, and I have looked at more than a few times, and no one ever noticed the symbolic factorizations were just happening all of the time, or that this broadcast wasn't a good idea. In the discussions, another case was brought up (Support sparse arrays/matrices as output? FluxML/Zygote.jl#163) where some good Julia programmers accidentally ran into similar behavior and just didn't realize it could cause sparsity patterns to change. So I don't think it's a case of only one person not knowing, or that sufficiently good Julia programmers won't hit this issue; it seems like it can happen to anyone who isn't careful enough.

So what to do about it?

In the Slack discussion we settled on two possible solutions. Since there is a legitimately good reason to allow users to modify the sparsity patterns of matrices, it would be good to have an opt-in fixed-sparsity-pattern mode for packages which assume the pattern is constant. This can be done "on the matrix"; it doesn't need a global mode like #189. The two options are then:

  1. Put a Boolean into SparseMatrixCSC for fixed
  2. Make FixedSparseMatrixCSC be a separate type

For simplicity of implementation, I'm for (1), but I can see a case for (2). Either way, I don't think there is a reasonable option (3), a new type that does not live inside of SparseArrays.jl, because this is something that would be sensitive to all changes inside this package: every PR would need to be checked, and a separate overload would be necessary for FixedSparseMatrixCSC in any case that could change the sparsity pattern, so this would be unmaintainable if it's not updated simultaneously with the non-checking version. For the most part, the implementation could just extend what's already there: if it's a bool, there are a few spots that need if A.fixed; error(); end; otherwise, there are a few AbstractSparseMatrixCSC dispatches that would need a specialization to error for FixedSparseMatrixCSC. Handling broadcast is probably the tricky part. (A sketch of what option (2) could look like follows below.)
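
To make option (2) concrete, here is a hypothetical sketch of the kind of setindex! guard such a type would add. Every name here is invented for illustration; it is not the design settled on in this thread:

using SparseArrays

# Hypothetical sketch only: a wrapper that errors on writes that would
# change the parent's sparsity pattern.
struct FixedPatternCSC{Tv,Ti} <: AbstractMatrix{Tv}
    parent::SparseMatrixCSC{Tv,Ti}
end

Base.size(A::FixedPatternCSC) = size(A.parent)
Base.getindex(A::FixedPatternCSC, i::Int, j::Int) = A.parent[i, j]

function Base.setindex!(A::FixedPatternCSC, v, i::Int, j::Int)
    P = A.parent
    rng = nzrange(P, j)   # indices of the stored entries in column j
    k = searchsortedfirst(rowvals(P), i, first(rng), last(rng), Base.Order.Forward)
    if k <= last(rng) && rowvals(P)[k] == i
        nonzeros(P)[k] = v              # position is stored: writing is fine
    elseif !iszero(v)
        error("pattern of the matrix cannot change")
    end                                 # writing a zero off-pattern is a no-op
    return v
end

Option (1) would bury the same guard inside the existing SparseMatrixCSC methods behind an if A.fixed check instead of a wrapper type; either way, the broadcast machinery is the part neither version gets for free.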

@ChrisRackauckas
Author

@mcabbott noticed that it's not entirely consistently applied either. A few examples. mul! changes storage (x and y here are a sparse matrix and a sparse vector defined earlier):

julia> x * y
3-element SparseVector{Float64, Int64} with 2 stored entries:
  [1]  =  0.0118453
  [2]  =  0.392584

julia> z = spzeros(3);

julia> mul!(z, x, y); z
3-element SparseVector{Float64, Int64} with 2 stored entries:
  [1]  =  0.0118453
  [2]  =  0.392584

x .= 0 doesn't drop stored entries, but x .= x .* 0 does. This is because:

julia> x = sprand(2,5,0.3);

julia> copyto!(x, zeros(10)); x
2×5 SparseMatrixCSC{Float64, Int64} with 4 stored entries:
  ⋅   0.0   ⋅   0.0   ⋅ 
 0.0   ⋅    ⋅   0.0   ⋅ 

julia> copyto!(x, rand(10)); x
2×5 SparseMatrixCSC{Float64, Int64} with 10 stored entries:
 0.0664767  0.0910088  0.942327  0.251953  0.470191
 0.342159   0.741781   0.173199  0.768339  0.447902

fill!(A, false) changes the values but not the stored pattern:

julia> using SparseArrays

julia> sprand(4,4,0.01)
4×4 SparseMatrixCSC{Float64, Int64} with 0 stored entries:
  ⋅    ⋅    ⋅    ⋅ 
  ⋅    ⋅    ⋅    ⋅ 
  ⋅    ⋅    ⋅    ⋅ 
  ⋅    ⋅    ⋅    ⋅ 

julia> sprand(4,4,0.75)
4×4 SparseMatrixCSC{Float64, Int64} with 12 stored entries:
 0.952781  0.803769   ⋅         0.989751
 0.739467  0.514578  0.634868   0.165134
 0.770259   ⋅         ⋅         0.150503
 0.858668   ⋅        0.258699   0.930328

julia> A = sprand(4,4,0.75)
4×4 SparseMatrixCSC{Float64, Int64} with 12 stored entries:
  ⋅        0.499803   0.6468    0.481953
  ⋅        0.0925575  0.493383  0.84902
 0.184849  0.0156402  0.713497  0.2682
  ⋅        0.598524   0.755637   ⋅ 

julia> fill!(A,false)
4×4 SparseMatrixCSC{Float64, Int64} with 12 stored entries:
  ⋅   0.0  0.0  0.0
  ⋅   0.0  0.0  0.0
 0.0  0.0  0.0  0.0
  ⋅   0.0  0.0   ⋅ 

@rayegun
Member

rayegun commented Jul 17, 2022

I would prefer the Abstract path. It would also help find errors that would impact SuiteSparseGraphBLAS.jl

@SobhanMP
Member

Reminds me of #89 🙃. If my opinion is worth anything, I'd go with option (2).

@briochemc

My 2 cents, FWIW, since I'm not sure this fits perfectly here: .= changing the sparsity structure really bothered me 1.5 years ago when I was trying to build an in-place Jacobian, for which I sought help on Discourse. I think this can be illustrated with something like this (creating an in-place Jacobian matrix function f!(J, x) that depends on some parameter x):

julia> const D1 = sparse(I(2))
2×2 SparseMatrixCSC{Bool, Int64} with 2 stored entries:
 1  ⋅
 ⋅  1

julia> f!(J, x) = J .= D1 .+ x .* D1
f! (generic function with 1 method)

julia> J = 1D1
2×2 SparseMatrixCSC{Int64, Int64} with 2 stored entries:
 1  ⋅
 ⋅  1

julia> f!(J, 2.0) # as expected
2×2 SparseMatrixCSC{Int64, Int64} with 2 stored entries:
 3  ⋅
 ⋅  3

julia> f!(J, -1.0) # structural non-zeros be gone
2×2 SparseMatrixCSC{Int64, Int64} with 0 stored entries:
 ⋅  ⋅
 ⋅  ⋅

After finding a complicated solution I simply gave up and removed all the in-place stuff (which I did not need for performance, but which was needed to use the SIAMFANLEquations.jl solvers). A "safe" fixed-sparsity-pattern type/flag would have been useful back then for me!

@ChrisRackauckas
Author

fill!(A,false) doesn't change the sparsity pattern, but fill!(A,true) does.

julia> using SparseArrays

julia> A = sprand(4,4,0.75)
4×4 SparseMatrixCSC{Float64, Int64} with 11 stored entries:
 0.938189  0.0258504  0.541015  0.507658
  ⋅         ⋅          ⋅        0.624827
  ⋅         ⋅         0.506468  0.292268
 0.601504  0.302011   0.441514  0.421854

julia> fill!(A,false)
4×4 SparseMatrixCSC{Float64, Int64} with 11 stored entries:
 0.0  0.0  0.0  0.0
  ⋅    ⋅    ⋅   0.0
  ⋅    ⋅   0.0  0.0
 0.0  0.0  0.0  0.0

julia> fill!(A,true)
4×4 SparseMatrixCSC{Float64, Int64} with 16 stored entries:
 1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0

julia> A = sprand(4,4,0.75)
4×4 SparseMatrixCSC{Float64, Int64} with 15 stored entries:
 0.414613   0.758689    ⋅         0.484661
 0.0849077  0.995499  0.0714851  0.281265
 0.40888    0.2467    0.0441692  0.820482
 0.242452   0.621135  0.281112   0.308854

julia> A .= 0
4×4 SparseMatrixCSC{Float64, Int64} with 15 stored entries:
 0.0  0.0   ⋅   0.0
 0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0

julia> A .= 0 .* A
4×4 SparseMatrixCSC{Float64, Int64} with 0 stored entries:
  ⋅    ⋅    ⋅    ⋅ 
  ⋅    ⋅    ⋅    ⋅ 
  ⋅    ⋅    ⋅    ⋅ 
  ⋅    ⋅    ⋅    ⋅ 

At least one thing a fixed-sparsity-pattern mode would give us is consistency. I keep thinking I know how this works, but I keep finding cases that do something different 😅

@mcabbott

Instead of tying this "strict" behaviour to the type (either through some new FixedSparseMatrix type, or by adding some flag to the existing one), might it be possible to tie it to the operation? Something like this:

@fixnz x .= 0 .* x    # alters Broadcast.materialize
@fixnz 0 .* x         # not just in-place
@fixnz copyto!(x, y)
@fixnz f!(x)          # perhaps inserts a check or a fix around unknown `f!`?

How many other cases would need to be handled?

With either mechanism, I guess the model for what happens if the source has numbers where the destination does not is:

copyto!(Diagonal(ones(2)), rand(2,2))  # ArgumentError: cannot set off-diagonal entry
copyto!(Diagonal(ones(2)), 0*@show(x) for x in rand(2,2) if true)  # runs the iterator
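
To illustrate the kind of runtime check such a macro could expand to, here is a minimal sketch; with_fixed_pattern is an invented name for illustration, not a proposed API:

using SparseArrays

# Sketch: snapshot the structure, run the mutating operation, verify nothing moved.
function with_fixed_pattern(f!, A::SparseMatrixCSC)
    colptr, rowval = copy(A.colptr), copy(A.rowval)
    out = f!(A)
    (A.colptr == colptr && A.rowval == rowval) ||
        throw(ArgumentError("pattern of the matrix changed"))
    return out
end

x = sprand(3, 3, 0.9)
with_fixed_pattern(a -> (a .= 0), x)       # ok: values zeroed, pattern kept
with_fixed_pattern(a -> (a .= 0 .* a), x)  # throws (assuming x has stored entries)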

@SobhanMP
Member

SobhanMP commented Jul 18, 2022

How many other cases would need to be handled?

What about stuff in packages? FixedSparseMatrix has the benefit of always throwing errors.

@ViralBShah
Member

ViralBShah commented Aug 15, 2022

So how often does this sort of thing happen, where a non-structural zero gets introduced? Looking at #201, it is a significant increase in the complexity of our sparse matrix package, with lots of new types (not just one new fixed type). Also, I don't understand the argument for why this has to be in the SparseArrays stdlib. Adding more stuff and complexity into the SparseArrays stdlib makes it harder for us to remove it as a stdlib in the 1.9-1.10 timeframe, while increasing compile times for everyone.

@SobhanMP
Member

So how often does this sort of thing happen where a non-structural zero gets introduced?

Maybe the right question is how many hours get wasted debugging because the sparsity pattern changed.

@ViralBShah
Member

Yes, but can't that be avoided by having a separate package? FixedSparseArrays?

@ChrisRackauckas
Author

Not easily for this kind of thing. This is exactly the kind of implementation where, every time a new primitive operation could internally change the sparsity pattern, the reviewer has to say "this function needs a fixed-pattern overload". I guess you could add downstream tests and require authors to make a follow-up PR in a second repo, and not merge until that other PR exists?

So how often does this sort of thing happen where a non-structural zero gets introduced?

With the current broadcast definition? Pretty much all of the time. Pick an algorithm that gets used often with sparse matrices and an implementation in Julia. Newton's method in NLsolve? That's using the broadcast pieces that change the sparsity pattern (in fact, it's mutating the same pointer as the input type, so solving the same input twice doesn't give the same answer if there's a sparse matrix). They did notice that some codes were affected by some weird behavior, so they just put an FAQ in there:

https://docs.juliahub.com/NLsolve/KFCNP/4.4.0/#If-the-Jacobian-is-sparse

Note that the Jacobian matrix is not reset across function calls. As a result, you need to be careful and ensure that you don't forget to overwrite all nonzeros elements that could have been initialized by a previous function call. If in doubt, you can clear the sparse matrix at the beginning of the function. If J is the sparse Jacobian, this can be achieved with:

fill!(a, 0)
dropzeros!(a) # if you also want to remove the sparsity pattern

It assumes the user is the issue, but I see a test where the real cause is this broadcast behavior, which that advice accidentally works around. This is what sparse matrix support in packages looks like today because of this issue: require the user to reset the sparsity pattern every iteration due to mysterious sparsity pattern changes. It needs a diagnostic tool.

@SobhanMP
Member

Yes, but can't that be avoided by having a separate package? FixedSparseArrays?

Yes, but if we implement all of the groundwork here, the rest of the implementation is going to be ~30 lines anyway?

while increasing compile times for everyone.

I tested the compile time; it's almost the same on main vs so/static_matrix.

with lots of new types (not just one new fixed type)

It's 2 new concrete types and 2 abstract types. We don't even need the abstract types; I think it makes everything nicer to overload on abstract types rather than using traits.

harder for us to remove it as a stdlib in the 1.9-1.10 timeframe

None of the problems this is going to fix are related to Base; it's unlikely anything in Base uses it.

@ViralBShah
Member

ViralBShah commented Aug 20, 2022

Separate question: is it possible to do this through a flag in the existing SparseMatrixCSC, or by extending the parameters in its type?

More types means more combinations to worry about, more adjoints and transpose types, and interactions with dense etc.

@SobhanMP
Member

SobhanMP commented Aug 20, 2022

That would be a breaking change; the reason for the Fixed type was to not have breaking changes.

There are a few things to take into consideration:

  • SparseMatrixCSC.nzval should return something that is read-only.
  • getcolptr should return a ReadOnly; to make it type-stable, the flag has to be in the type.
  • Adding a type argument to SparseMatrixCSC would be a breaking change.
  • SparseMatrixCSC{<:AbstractVector{Tv},<:AbstractVector{TI}} with _isfixed(SparseMatrixCSC{A,B}) = _isfixed(B) would be an alternative, but I'm not sure it's a good idea; it would break too much. (See the sketch after this list.)
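
A sketch of what that last bullet could look like; ReadOnly and _isfixed here are illustrative stand-ins, not the actual proposal:

# Illustrative only: encode "fixedness" in the storage vector's type, so a
# trait can recover it type-stably without a runtime flag.
struct ReadOnly{T,V<:AbstractVector{T}} <: AbstractVector{T}
    data::V
end
Base.size(v::ReadOnly) = size(v.data)
Base.getindex(v::ReadOnly, i::Int) = v.data[i]
Base.setindex!(v::ReadOnly, x, i::Int) = error("storage is read-only; the pattern is fixed")

_isfixed(::Type{<:AbstractVector}) = false
_isfixed(::Type{<:ReadOnly}) = true
# and then, hypothetically: _isfixed(::Type{SparseMatrixCSC{A,B}}) = _isfixed(B)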

@SobhanMP
Member

More types means more combinations to worry about, more adjoints and transpose types, and interactions with dense etc.

Most of these should be generalized to AbstractSparseMatrixCSC 🤷

@SobhanMP
Member

Final argument: the functions

  • _absspvec_hcat(X::AbstractSparseVector{Tv,Ti}...)
  • vcat(X::SVorFSV...)
  • _broadcast_zeropres! (three methods)
  • _map_zeropres! (three methods, e.g. _map_zeropres!(f::Tf, C::SparseVecOrMat, A::SparseVecOrMat) where Tf)
  • _diffshape_broadcast(f::Tf, A::SparseVecOrMat, Bs::Vararg{SparseVecOrMat,N}) where {Tf,N}
  • _noshapecheck_map(f::Tf, A::SparseVecOrMat, Bs::Vararg{SparseVecOrMat,N}) where {Tf,N}
  • _noshapecheck_map!(f::Tf, C::SparseVecOrMat, A::SparseVecOrMat, Bs::Vararg{SparseVecOrMat,N}) where {Tf,N}

use the _is_fixed trait.

All of these functions would have to be overwritten in the hypothetical other package, because I can't figure out how to write a dispatch like

f(::Vararg{Real,N}) where N = true
f(::Vararg{Real,N}, ::Float64, ::Vararg{Real,M}) where {N, M} = false

Every time someone modifies one of these functions, the package would have to mirror the change; maintaining it is going to be a pain. If it lives here, the burden of not breaking them falls on the shoulders of the person editing those functions.
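
For what it's worth, a sketch of how the trait sidesteps that dispatch limitation; anyfixed is an invented helper name for illustration:

_is_fixed(::Any) = false                 # default: pattern is allowed to change
# a fixed type (or the trait on its storage) would override this to return true

anyfixed() = false                       # fold the trait across a Vararg argument list
anyfixed(A, Bs...) = _is_fixed(A) || anyfixed(Bs...)

# inside e.g. _map_zeropres!(f, C, A, Bs...), the guard is then a plain call
# instead of a method signature that dispatch cannot express:
#     anyfixed(C, A, Bs...) && error("pattern of the matrix cannot change")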

@rayegun
Member

rayegun commented Aug 21, 2022

Ah, I forgot about the Vararg methods (I guess I was only thinking about the N<=2 methods); that is indeed very unfortunate. It doesn't bode well for extensibility in the future.

This means we'll definitely keep the trait. I am still wary of adding the types; once again, not because of code size, but because of the issues mentioned before with wanting people to use it while not wanting to be breaking, etc. The trait is internal enough that if we needed to factor it out one day, we could.

Let me think about it for a bit. Sorry for the confusion, I thought the factoring out would be easy (or at least easier than impossible 😣)

@SobhanMP
Member

SobhanMP commented Aug 21, 2022

Arguably, the abstract types are just there for convenience; I could refactor them out with the trait if that somehow makes the problem better. After all, they are redundant: if A is an AbstractFixedCSC (or whatever the name is), _is_fixed(A) has to be true (and vice versa). This makes it possible to dispatch on the type for functions with no varargs and to use the trait on functions with varargs. I can rewrite the types using the trait (and remove the abstract types). I still think that marking it as experimental is a good way of not having to worry too much about breaking changes.

@cuihantao

I ran into this issue and spent some time debugging before seeing this thread. From a user perspective, remembering to call dropzeros when needed is a lot easier than remembering to check if operations have altered the sparsity pattern.

@rayegun
Member

rayegun commented Oct 24, 2022

The upcoming FixedSparseArray type in SparseArrays.jl will help, and SuiteSparseGraphBLAS.jl never drops explicit zeroes. All of my upcoming sparse types will never drop explicit zeroes.

@gdalle

gdalle commented Aug 9, 2023

Just encountered the problem in JuliaDiff/ForwardDiff.jl#658

The upcoming FixedSparseArray type in SparseArrays.jl will help

I didn't find it in the repo; could you point to where it lives?

@sjdaines
Contributor

Another consequence of this: #454

Should this issue be reopened, or should there be a new issue specifically about unexpected and silent modification of the sparsity pattern, along with documentation updates with warnings and examples?

@ViralBShah
Member

Let's keep it all in #454.
