Change handling of sparse backends #38

Closed
gdalle opened this issue Apr 10, 2024 · 17 comments · Fixed by #40

Comments

@gdalle (Collaborator) commented Apr 10, 2024

Problems:

  • Every dense backend currently needs its own AutoSparse* counterpart, which is tedious and not modular
  • The sparse-specific ingredients (sparsity pattern detection, coloring) have no place to live in the existing types

Suggested solutions (instead of #37):

  • Replace AutoSparseBackend with a wrapper struct AutoSparse(AutoBackend)
  • Add fields to this wrapper struct for sparse-specific ingredients
struct AutoSparse{B<:AbstractADType,S,C} <: AbstractADType
    backend::B               # the underlying dense AD backend
    sparsity_detector::S     # how the sparsity pattern gets detected
    coloring_algorithm::C    # how the pattern gets colored for compressed differentiation
end
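
For illustration, construction could look like this (a sketch: AutoForwardDiff is one of the existing dense backend types, while the detector and coloring objects are hypothetical placeholders, not an existing API):

# Hypothetical usage of the proposed wrapper
dense_backend = AutoForwardDiff()
sparse_backend = AutoSparse(
    dense_backend,
    MySparsityDetector(),    # placeholder: sparsity detection choice
    MyColoringAlgorithm(),   # placeholder: coloring choice
)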

To preserve backward compatibility, we can try to define shortcuts like the following, to be removed in v0.3.0:

# `AutoBackend` stands in for each concrete backend type (e.g. `AutoForwardDiff`)
const AutoSparseBackend = AutoSparse{<:AutoBackend}

function AutoSparseBackend(args...; sparsity_detector, coloring_algorithm, kwargs...)
    backend = AutoBackend(args...; kwargs...)
    return AutoSparse(backend, sparsity_detector, coloring_algorithm)
end
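
Instantiated for a concrete backend (say ForwardDiff), the shortcut would let both call styles coexist during the deprecation window. A sketch, reusing the hypothetical placeholder objects from above:

# Both calls would construct the same AutoSparse{AutoForwardDiff} object (sketch)
det, col = MySparsityDetector(), MyColoringAlgorithm()
legacy = AutoSparseForwardDiff(; sparsity_detector = det, coloring_algorithm = col)
modern = AutoSparse(AutoForwardDiff(), det, col)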

However, I'm not 100% sure we can keep this from being breaking.

@Vaibhavdixit02 (Member) commented Apr 10, 2024

Definitely agree that the current way of AutoSparse* is pretty tedious and not modular.

I'd pick a different name, since AutoSparseBackend makes it look like a choice of AD backend, so just Sparse(args...) seems nicer?

@gdalle (Collaborator, Author) commented Apr 10, 2024

I thought about that, but Sparse is a bit generic, hence the Auto prefix. Besides, we would actually have

AutoSparse <: AbstractADType

so it seems coherent.

@Vaibhavdixit02 (Member)

I see, that seems reasonable

@gdalle (Collaborator, Author) commented Apr 10, 2024

@avik-pal @ChrisRackauckas any thoughts? It's mostly SciML folks who are using this feature so far.

@Vaibhavdixit02 (Member)

One more option: AutoSparsifyBackend for the constructor? 😅

@gdalle (Collaborator, Author) commented Apr 10, 2024

> just Sparse(args...) seems nicer?

The killer argument is that we will export that name, and we cannot decently export something named Sparse.

> One more option: AutoSparsifyBackend for the constructor? 😅

What do you mean?

@ChrisRackauckas (Member)

AutoSparse <: AbstractADType is a very good middle ground IMO.

And there's an interconnected piece here. It would take quite a bit of work to make SparseDiffTools work with all AD backends in its current form. But DI could take over what is now the front end of SparseDiffTools and redesign the color Jacobian functions to not be ForwardDiff-exclusive, instead calling the DI high-level chunked modes; then every backend that supports those would be color-differentiation compatible. Ultimately that's the best solution here, and no one has had the time to pull it off, but @gdalle it seems like you finally have the time to put the polishing steps on this and see it through, and indeed that's the right thing to do here.
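
For intuition, here is a minimal sketch of what colored Jacobian evaluation does; this is illustrative only (not the SparseDiffTools or DI implementation), with a finite difference standing in for one backend pushforward:

# Column-compressed Jacobian evaluation (sketch). Columns sharing a color
# have no nonzero row in common, so one directional derivative per color
# recovers every column of that color at once.
function compressed_jacobian(f, x, pattern::AbstractMatrix{Bool}, colors::Vector{Int})
    y = f(x)
    J = zeros(length(y), length(x))
    for c in 1:maximum(colors)
        seed = Float64.(colors .== c)        # sum of the basis vectors of this color
        h = sqrt(eps())
        dy = (f(x .+ h .* seed) .- y) ./ h   # stand-in for one JVP
        for j in eachindex(x)
            colors[j] == c || continue
            for i in axes(pattern, 1)
                pattern[i, j] && (J[i, j] = dy[i])
            end
        end
    end
    return J
end

With DI's chunked pushforward, the loop over colors would collapse into a few batched JVP calls, which is exactly what makes the scheme backend-agnostic.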

@gdalle (Collaborator, Author) commented Apr 10, 2024

My vision is that there are 3 phases to sparse AD, and they can (should) be separated:

  • Sparsity pattern detection: currently done by Symbolics, possibly 50x faster with SparseConnectivityTracer (see benchmarks at adrhill/SparseConnectivityTracer.jl#4)
  • Coloring: currently done by SparseDiffTools
  • Jacobian evaluation: currently done by SparseDiffTools, should be done by DifferentiationInterface in chunked mode to be backend-agnostic

The first step is this ADTypes revamp.
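
To make the separation concrete, a hypothetical end-to-end pipeline could look like this (every function name below is a placeholder, not an existing API):

# Hypothetical three-phase pipeline; all function names are placeholders
pattern = detect_jacobian_sparsity(f, x)                  # phase 1: detection
colors  = color_columns(pattern)                          # phase 2: coloring
J       = evaluate_sparse_jacobian(f, x, pattern, colors) # phase 3: evaluation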

@ChrisRackauckas (Member)

Yes, that vision is correct. We've just been strapped for hands, so... welcome aboard!

> Sparsity pattern detection: currently done by Symbolics, possibly 50x faster with SparseConnectivityTracer (see benchmarks at adrhill/SparseConnectivityTracer.jl#4)

Nice! Yes, this has been a bit of a pain for us. This should probably be in JuliaDiff?

> Coloring: currently done by SparseDiffTools

Yup, and that's ultimately all it should be.

> Jacobian evaluation: currently done by SparseDiffTools, should be done by DifferentiationInterface in chunked mode to be backend-agnostic

Agreed.

@gdalle (Collaborator, Author) commented Apr 10, 2024

Ultimately everything should be in JuliaDiff, but @adrhill and I are iterating very fast, so it's easier to keep it local. I fully intend to transfer DI to JuliaDiff eventually.

@avik-pal (Member)

> Sparsity pattern detection: currently done by Symbolics, possibly 50x faster with SparseConnectivityTracer (see benchmarks at adrhill/SparseConnectivityTracer.jl#4)

This is based on input propagation and won't work (it will give an incorrect pattern) for control flow or branching that affects sparsity patterns, right? I am not familiar with how Symbolics works, but from Shashi's paper, I thought that was one of the challenges it addresses.

Regardless, this seems like a more principled way to at least replace the approximate sparsity methods in SparseDiffTools.

@ChrisRackauckas (Member)

> This is based on input propagation and won't work (it will give an incorrect pattern) for control flow or branching that affects sparsity patterns, right?

SparsityDetection.jl had extra machinery for union sparsity, but that ran into maintenance troubles with Cassette, so it got dropped. The simplified version was then to trace with Symbolics, which became what we have today. A simpler tracer is following that evolution in a fine way.

Yes, it is limited, but it's at least correct and errors when tracing is not possible. Not ideal, but it tends to work in most use cases on model code, which is good enough.

@gdalle (Collaborator, Author) commented Apr 10, 2024

> This is based on input propagation and won't work (it will give an incorrect pattern) for control flow or branching that affects sparsity patterns, right?

Yes indeed, this assumes the same control flow across subsequent function executions. In that way it is coherent with DifferentiationInterface semantics (otherwise we couldn't e.g. build a ReverseDiff tape during the preparation step).

> that ran into maintenance troubles with Cassette, so it got dropped.

That was kind of our original idea: redo the "Sparsity Programming" paper with a much more lightweight approach. Maybe not as powerful, but a hell of a lot faster.

@adrhill (Contributor) commented Apr 10, 2024

> This is based on input propagation and won't work (it will give an incorrect pattern) for control flow or branching that affects sparsity patterns, right?

We could in theory provide a second dual-number tracer that uses primal values for control flow / branches.

Currently, SparseConnectivityTracer returns input-output connectivities, which are a conservative estimate of the Jacobian sparsity. We could be a bit smarter in our operator overloading for functions like round that return zero derivatives.
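
For intuition, a toy connectivity tracer takes only a few lines of operator overloading (a minimal sketch covering just + and *, not SparseConnectivityTracer's actual implementation):

# Toy input-output connectivity tracer (illustrative only).
# Each tracer carries the set of input indices its value depends on;
# every overloaded operation takes the union of its arguments' sets.
struct Tracer <: Real
    inputs::Set{Int}
end

Base.:+(a::Tracer, b::Tracer) = Tracer(union(a.inputs, b.inputs))
Base.:*(a::Tracer, b::Tracer) = Tracer(union(a.inputs, b.inputs))
Base.:+(a::Tracer, b::Real) = a    # constants contribute no dependencies
Base.:+(a::Real, b::Tracer) = b
Base.:*(a::Tracer, b::Real) = a
Base.:*(a::Real, b::Tracer) = b

function connectivity_pattern(f, n)
    x = [Tracer(Set([i])) for i in 1:n]
    y = f(x)
    return [j in yi.inputs for yi in y, j in 1:n]   # Bool matrix: does output i touch input j?
end

# connectivity_pattern(x -> [x[1] + x[2], 2.0 * x[3]], 3) == Bool[1 1 0; 0 0 1]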

@gdalle (Collaborator, Author) commented Apr 10, 2024

I don't think we can handle control flow; there is no way to do this with operator overloading alone.
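
A tiny example of the failure mode, in terms of the toy tracer sketched above:

# The comparison must return a concrete Bool, so an overloading tracer can
# only error here (safe but limited) or, if it substitutes a primal value,
# trace whichever branch that value takes, silently losing the other
# branch's dependencies.
function g(x)
    if x[1] > 0
        return [x[2]]
    else
        return [x[3]]
    end
end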

@gdalle (Collaborator, Author) commented Apr 10, 2024

That's also why StochasticAD, for instance, has to rely on workarounds to support if/else:

https://gaurav-arya.github.io/StochasticAD.jl/dev/limitations.html

@ChrisRackauckas (Member)

Yes, it requires global rather than local information, so it's simply not possible to do with operator overloading.
