-
Notifications
You must be signed in to change notification settings - Fork 410
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: DistributionsBase.jl for packages defining custom distributions #1139
Comments
What's the benefit to the interface being in a separate package? Is the idea that a MyNicheDistribution.jl package would not want to depend on Distributions.jl because it's a large dependency? (Is this a large dependency? does it take a long time to compile?) Or is there another advantage to a separate minimal "base"/"interface" package? |
Yes, Distributions.jl is a somewhat large dependency by necessity (eg ultimately a lot of special functions come from https://github.com/JuliaStats/Rmath.jl, which is a binary dependency, too). A lot of distributions could be much more lightweight. |
An additional benefit for doing something similar in Tables.jl was clarifying all details of the API. |
A thing is its often hard to define more niche Distributions without depending on Distributions anyway. Another thing to bear in mind is it is desirable for Distributions.jl to be a one-stop-shop for all of your distributions. |
Just curious - what's the latest here? I recently heard a couple of people echoing the sentiment for this sort of refactoring. |
My understanding is that for this particular package, this does not have the support of maintainers so I didn't start any work on it. Generally I still think that factoring out a generic API for packages like this at some point in their lifecycle is the best approach. |
Wondering if @simonbyrne and @andreasnoack can chime in. |
Some design choices in Distributions can cause problems, like type constraints on constructors (making it unusable with Symbolics) and inconsistent argument checking in the form of the At this point it's difficult to change the library itself; I've seen perfectly reasonable discussions devolve into bikeshedding. That's not meant to accuse anyone of any bad behavior; it's a natural consequence of a large codebase with a wide range of use-cases. It's important for the Julia ecosystem to find a balance between standards and agility, and Distributions.jl is very heavily on the side of the former. As a result, there's a strong tendency to hack around its limitations. It has become so heavy-weight that in many cases people start a new package rather than try to update it. This seems to be the case with DistributionsAD.jl, and is certainly the case for MeasureTheory.jl. I can see a great value in factoring on a DistributionsBase. Ideally, there could even be a way to allow parameterized distributions like we have in MeasureTheory, and make it more natural to build libraries like this as extensions instead of independent projects. |
We could call it DistributionsInterface, to stress that's ist mostly about function definitions, not concrete implementations. But whatever the name, I think having definitions of pdf-, cdf- and similar functions in a lightweight package would be extremely helpful. |
Can we at least start by moving some of the abstract types to a package? Things like SciML generally don't need all of the distributions defined but just need to dispatch on |
This could also remove some function name redundancy between MeasureBase/MeasureTheory and Distributions, since MeasureBase could depend on DistributionsInterface. |
I'm in favor of extracting an interface (see #1641 (review)) but I think one should make sure it is clearly defined and consider that there are multiple things that it seems would be good to change in a next breaking release of Distributions.
I looked around a bit but it seems e.g. DifferenceEquations.jl, DiffEqFlux.jl, and DiffEqBayes.jl all use concrete distributions such as |
The point isn't DiffEqFlux and DiffEqBayes. The problem is that currentlly |
DiffEqBase and QuasiMonteCarlo are the two I'm thinking about. They want to dispatch on "if I see a Distribution", but don't need any distributions themselves. And as Oscar says, it's a pretty big hit to support that right now, and that 0.8 seconds can be approximately 0. |
Oscar's example doesn't use This gets a bit off-topic: I didn't know this functionality exists and honestly I am a bit surprised that basically all problem types support distributions as initial values - I wouldn't have thought, e.g., that's supported for e.g. ODEs (and it's not documented in https://docs.sciml.ai/DiffEqDocs/stable/types/ode_types/#SciMLBase.ODEProblem it seems, according to the docstring |
This use case may be (partially) solvable using the trait-based DensityInterface.jl API, which both Distributions.jl and MeasureBase.jl/MeasureTheory.jl support already. Independent of that, I still think and AbstractDistributions.jl would be very useful. (Note: In DensityInterface.jl, |
It came before those featurse, but indeed we could deprecate it. It is off topic though and we should discuss this in DiffEqBase. |
Distributions.jl is a high-quality implementation of many commonly used distributions, benefiting from continuous contributions and peer review from the members of the Julia community.
While it is natural to contribute commonly used distributions to this package, some distributions may not be generic enough to warrant this (eg distributions used in a very narrow subfield, invented for a specific application and not in widespread use yet, etc). We should nevertheless make it easy for distributions living in other packages to share the API without incurring the cost of dependencies that are used by Distributions.jl. Of course, distributions from such packages could be migrated to this one later on if necessary.
I am proposing that a minimal API is extracted to a small, lightweight package, which could be called DistributionsBase.jl, defining the
functions
cdf
,pdf
, ... that operate on Distributions (from the current export list,a
@reexport_DistributionsBase
macro that reexports these, for use by packages to make it easy to maintain a consistent common exported interface.I am not sure that I would export the type hierarchy in the first pass, as I don't think would be used commonly by packages defining custom distributions. In any case, I would start with the bare minimum, more can be added on demand later.
(related: #525)
The text was updated successfully, but these errors were encountered: