Migrate UnivariateFinite (for categorical distributions) out to new package #504
Comments
cc @OkonSamuel
@cscherrer might also be interested in this.
While this is correct for
Thanks @DilumAluthge. I guess it will get a little complicated since we're moving away from Distributions.jl, but I think we'll still be able to
Yes, Distributions is a hefty dependency. The new package could probably avoid
I guess it would be easier if Distributions.jl had a separate package containing basic functions which we can extend.
This was suggested and discussed in JuliaStats/Distributions.jl#1139 but it seems it was not supported by the maintainers.
Yes, but it's so widely used I don't see much getting around it. We actually have it as a dependency for MeasureTheory anyway, and use it for some of the sampling routines. I'd love if this can be made nicely compatible with MeasureTheory, if it's not already. Here are some guidelines for that:
These are just off the top of my head, and partly for my notes as well.
CategoricalDistributions.jl really sounds like a narrow scope for a package. And IIUC it would define a lot of the Distributions.jl API (like Why can't
I think this issue should be taken a bit broader than categorical distributions only, but my overall observation is that Distributions has become very hard to maintain because it is a central utility with few and rarely available maintainers. Most people just drop by for an issue or PR but don't want (understandably) to get involved in its maintenance, and many former maintainers are not involved anymore. I agree with @nalimilan that splitting off a package just for categorical distributions would be too fragmenting, but splitting off a DistributionsBase, and possibly using this opportunity to clean house, should be considered.
There were conversations but resistance. They said "you can just have Distributions as a dependency and overload..." I think the main objection was to include anything with a sample space that was not
(or Float64 / Int themselves). But overall I agree that this has been one of the biggest blocking points in Distributions, because some parts of the code base were non-trivial to adapt to something more generic. That being said, the type parameters of distributions can be extended (VariateForm and ValueSupport).
This is also my impression and, if true, a real issue for Julia, in my opinion. The concept of a distribution extends far beyond what is in Distributions.jl (objects sampled using Markov chain Monte Carlo, pdfs on manifolds, priors for tree-like parameter spaces, etc). It may not make sense to add to what is there, but without a base API package it's very messy to extend.
Yes, that is what we currently do in MLJ. But it means having Distributions.jl as a dependency in places we'd rather not. We had to perform some acrobatics to create our lightweight MLJModelInterface, partly for this reason.
Thanks to contributors to this discussion. Update: I think I should like to proceed with moving this functionality to a new package, dependent on Distributions for Some technical details for myself or whoever takes this on. Currently the super-lightweight MLJModelInterface defines a method only, Moving forward, no changes are made to MLJModelInterface. The new package will not depend on MLJ in any way, and in particular does not extend the constructor from MLJModelInterface. Rather:
In principle the renaming is not necessary, but it simplifies the explanation and makes sense given the recent generalization. However, I'm open to suggestions re the naming/renaming question.
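For readers unfamiliar with the stub arrangement mentioned above, here is a minimal Julia sketch of the general pattern, not the actual MLJ code: a lightweight module declares a function with no methods, and a heavier module attaches the implementation. The module names `LightInterface` and `HeavyImplementation`, the function `finite_distribution`, and the type `Finite` are all hypothetical.

```julia
# Hypothetical stand-in for a lightweight interface package:
# it only declares a name and carries no heavy dependencies.
module LightInterface
    # A generic function with zero methods (a "stub").
    function finite_distribution end
end

# Hypothetical stand-in for the heavier package holding the real code.
module HeavyImplementation
    import ..LightInterface

    struct Finite{L}
        labels::Vector{L}
        probs::Vector{Float64}
    end

    # Attach an actual method to the stub declared in LightInterface.
    function LightInterface.finite_distribution(labels, probs)
        return Finite(collect(labels), Float64.(probs))
    end
end

# Callers that depend only on the interface can use the stub name;
# it works once the implementing module has been loaded.
d = LightInterface.finite_distribution(["a", "b"], [0.25, 0.75])
```

The cost of this design is that the name lives in one package and the behaviour in another, which is the coupling the comment above proposes to avoid in the new package.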
Another reason to do this: https://github.com/JuliaAI/ScientificTypes.jl/issues/142
In line with #416, I propose we move `UnivariateFinite` out to a new package called `CategoricalDistributions.jl`.

If this were okay with the current host of MLJBase.jl (I need to check this @vollmersj) it might make sense for this package to live at JuliaData (host of CategoricalArrays.jl) or JuliaStats (host of Distributions.jl). I wonder what curators of those organisations think of that idea?

@nalimilan @bkamins @andreasnoack @devmotion @matbesancon
Recall that `UnivariateFinite` consists of the following:

- A composite type `UnivariateFinite{S,V,R,P<:Real}` for encoding the probability distribution associated with a finite labelled set of points, as opposed to the distribution `Categorical` from Distributions.jl, whose sample space is always a collection of integers. The sample space of a `UnivariateFinite` instance is a `CategoricalPool` object from CategoricalArrays.jl.
- Implementation of relevant parts of the Distributions.jl API, including `rand`, `pdf`, `logpdf`, `support`, `params`, `mode`, and `fit` (which fits to a `CategoricalVector`).
- A wrapper `UnivariateFiniteArray` for arrays of such objects (sharing a common sample space / pool). This type, implementing the `AbstractArray` API, is optimised for fast indexing, and for broadcasting of `pdf` and `logpdf` (which turned out to be essential in our applications to machine learning).
- A fairly elaborate constructor for `UnivariateFiniteArray` objects from matrices of probabilities. See this docstring.

Technical note. I'm hoping this migration should be fairly painless but there is one issue to be aware of: currently the `UnivariateFinite` constructor stub lives in MLJModelInterface but the type and all real functionality live in MLJBase (which depends on MLJModelInterface). The reason for this was to keep MLJModelInterface (the sole dependency of third-party packages implementing MLJ's model API) super lightweight. So this needs sorting out.
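To make the description above concrete, here is a toy Julia sketch of a distribution over a finite labelled set, as opposed to one over integers. `LabelledFinite` is hypothetical and is not the `UnivariateFinite` implementation: it stores labels in a plain vector rather than a `CategoricalPool`, does not depend on Distributions.jl, and defines its own `pdf`, `logpdf`, `support` and `mode` functions rather than extending that package's.

```julia
using Random

# Hypothetical toy type: probabilities attached to arbitrary labels.
struct LabelledFinite{L,P<:Real}
    labels::Vector{L}
    probs::Vector{P}
    function LabelledFinite(labels::Vector{L}, probs::Vector{P}) where {L,P<:Real}
        length(labels) == length(probs) ||
            throw(ArgumentError("labels and probs must have equal length"))
        all(>=(0), probs) ||
            throw(ArgumentError("probabilities must be non-negative"))
        isapprox(sum(probs), 1) ||
            throw(ArgumentError("probabilities must sum to one"))
        new{L,P}(labels, probs)
    end
end

# The sample space is the set of labels, not 1:n.
support(d::LabelledFinite) = d.labels

# Probability of a label; labels outside the support get probability zero.
function pdf(d::LabelledFinite, x)
    i = findfirst(isequal(x), d.labels)
    return i === nothing ? zero(eltype(d.probs)) : d.probs[i]
end

logpdf(d::LabelledFinite, x) = log(pdf(d, x))

# Most probable label (first one in case of ties).
mode(d::LabelledFinite) = d.labels[argmax(d.probs)]

# Inverse-CDF sampling: walk the cumulative sum until it exceeds u.
function Base.rand(rng::AbstractRNG, d::LabelledFinite)
    u = rand(rng)
    c = zero(float(eltype(d.probs)))
    for (label, p) in zip(d.labels, d.probs)
        c += p
        u < c && return label
    end
    return last(d.labels)  # guard against floating-point round-off
end
Base.rand(d::LabelledFinite) = rand(Random.default_rng(), d)

d = LabelledFinite(["no", "maybe", "yes"], [0.2, 0.3, 0.5])
pdf(d, "yes")   # 0.5
mode(d)         # "yes"

# A plain vector of distributions sharing the same labels; broadcasting
# pdf over it gives one probability per element.
ds = [d, LabelledFinite(["no", "maybe", "yes"], [0.6, 0.3, 0.1])]
pdf.(ds, "yes") # [0.5, 0.1]
```

A plain `Vector` of such objects repeats the label set in every element and looks up the label on every `pdf` call. That is the kind of overhead a dedicated array wrapper with a shared pool, as described above, is meant to avoid.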