
Add fit_mle methods #1670

Open · wants to merge 19 commits into base: master
Conversation

dmetivie
Hello,
I'm opening this PR after asking a question on Discourse and opening an issue (more of a feature request) here. I was not convinced by the answer on Discourse, so I decided to show you directly what I meant in this PR.

Here are the additions in this PR:

  1. Main one: add fit_mle(dist::Distribution, x) in addition to just fit_mle(::Type{<:Distribution}, x).
    Why?
    • Why not: from the tests I ran, it does not break anything. If fit_mle(::Type{<:Distribution}, x) is available, it will be used.
    • It allows making full use of multiple dispatch with fit_mle, since all "structure parameters" are now stored in dist rather than passed as extra arguments. E.g., fit_mle(Binomial(n, 0.1), x) is fit_mle(Binomial, ntrials(Binomial(n, 0.1)), x). Of course, the 0.1 is not used and is kind of useless here.
    • The dist instance can also be used, when needed, as a starting point for a maximum likelihood estimation.
    • This could benefit many downstream packages that use fit_mle for complex distributions, like Copulas.jl, ExpectationMaximization.jl, or HMMBase.jl.
  2. Point 1 makes it very easy to add a fit_mle method for Product distributions:
    fit_mle(product_distribution([Exponential(0.5), Normal(11.3, 3.2)]), x) is simply product_distribution([fit_mle(Exponential, x[1, :]), fit_mle(Normal, x[2, :])]).
    I am not sure how this would be done with types only. It seems to be done in Copulas.jl (though the notation becomes quite heavy), but again I am not sure how one would infer the components of a mixture or a product distribution from the type alone.
  3. Add fit_mle(Laplace, x, w) using the weighted median.
  4. Add fit_mle(Dirac, x[, w]) (why not).
  5. I added tests and documentation for all the new features.
  6. In the mixture-estimation section of the docs, I added a reference to my package ExpectationMaximization.jl, which fits mixture models (and, thanks to fit_mle(dist::Distribution, x), can run the EM algorithm on very generic distributions, e.g., mixtures of mixtures).
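To make point 1 concrete, here is a minimal sketch of the idea (simplified; `my_fit_mle` is an illustrative name, not the PR's actual method, which extends `fit_mle` itself):

```julia
using Distributions

# Sketch only: an instance-based method forwarding to the existing
# type-based one. ntrials(d) is the fixed "structure" parameter; the
# success probability stored in d plays no role in the fit.
my_fit_mle(d::Binomial, x::AbstractArray{<:Integer}) =
    fit_mle(Binomial, ntrials(d), x)

x = rand(Binomial(10, 0.3), 1_000)
d = my_fit_mle(Binomial(10, 0.1), x)  # the 0.1 is ignored
ntrials(d) == 10                      # the "structure" is preserved
```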

I tried to follow the contribution guidelines as best I understood them (my VSCode formatter changed quite a few things).

@devmotion (Member) left a comment

Hej! Thank you for taking the time and putting together a PR. Before diving into more detailed discussions, I think it would be good if you could revert the formatting changes. Additionally, I think it would be good to move a few parts to separate PRs as they do not involve any design questions (see my comments below).

src/multivariate/product.jl (resolved)
src/univariate/continuous/laplace.jl (outdated; resolved)
@@ -196,6 +196,7 @@ suffstats(::Type{T}, data::BinomData, w::AbstractArray{<:Real}) where {T<:Binomi

fit_mle(::Type{T}, ss::BinomialStats) where {T<:Binomial} = T(ss.n, ss.ns / (ss.ne * ss.n))

fit_mle(d::T, x::AbstractArray{<:Integer}) where {T<:Binomial} = fit_mle(T, suffstats(T, ntrials(d), x))
@devmotion (Member):
I'm generally not a big fan of supporting both instances and types if it is not really needed (https://docs.julialang.org/en/v1/manual/style-guide/#Avoid-confusion-about-whether-something-is-an-instance-or-a-type). In this line here, I'm not sure if there is a very strong argument for adding the method based on the instance since the same functionality already exists. More generally, I think it could be confusing to know which parameters of d are fixed and which ones are optimized with MLE.

@dmetivie (Author) replied on Feb 1, 2023:
The arguments are the ones I made above. Mostly, it allows using fit_mle(dist, x) in very generic code without having to care about extra arguments for some distributions like ProductDistribution, Binomial, and Categorical.

For me, the point is not really adding functionality to Binomial, but everything that comes with the fit_mle(dist, x) form. The only cost is some possible minor confusion. The docs should be clear about the difference between the instance and type versions; I tried writing something to that effect.

src/univariate/discrete/categorical.jl (outdated; resolved)
src/univariate/discrete/dirac.jl (outdated; resolved)
@devmotion (Member) left a comment
I added some more comments to explain my reservations. Currently, my opinion is that the suggestions in this PR would lead to confusing behaviour that is difficult to predict, without a clear benefit. I'm also not completely convinced by the definitions for product distributions, as they are not very flexible and make some seemingly strong assumptions, e.g. that the same arguments are provided to all fits.

Therefore currently I would suggest not merging this PR.

docs/src/fit.md (outdated; resolved)
Comment on lines +81 to +89
fit_mle(Binomial(n, 0.1), x)
# equivalent to fit_mle(Binomial, ntrials(Binomial(n, 0.1)), x); the parameter 0.1 is not used

fit_mle(Categorical(p), x)
# equivalent to fit_mle(Categorical, ncategories(Categorical(p)), x); only the length of p is used, not its values

d = product_distribution([Exponential(0.5), Normal(11.3, 3.2)])
fit_mle(d, x)
# equivalent to product_distribution([fit_mle(Exponential, x[1, :]), fit_mle(Normal, x[2, :])]); again, the parameters of d are not used
@devmotion (Member):
IMO this different behaviour of the distribution instances is very confusing and exactly shows the problem I had in mind in https://github.com/JuliaStats/Distributions.jl/pull/1670/files#r1092571161. There's no clear and obvious way to tell which properties of the distribution will be fixed when fitting. And even in the case of the product distribution the resulting code is not (significantly) shorter than the current code.

@dmetivie (Author):
For Categorical, the current fit_mle(Categorical, x) = fit_mle(Categorical, maximum(x), x) could also lead to confusion (and to errors in the really unlucky case where there are zero samples of the last category), and it is not so different from fit_mle(Categorical, ncategories(Categorical(p)), x).

It is not about code length. See my comment below.
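To illustrate that pitfall with the current Distributions.jl API (a small sketch; the data here is made up):

```julia
using Distributions

x = [1, 2, 2, 1, 1]              # suppose category 3 exists but was never sampled
d1 = fit_mle(Categorical, x)     # infers the support from maximum(x), i.e. 2 categories
d2 = fit_mle(Categorical, 3, x)  # explicit: 3 categories, the last gets probability 0
ncategories(d1)  # 2 -- the third category silently disappeared
ncategories(d2)  # 3
```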

Note that for standard distributions, the parameter values of `d` are not used in `fit_mle`; only the "structure" of `d` is passed into `fit_mle`.
However, for complex maximum likelihood estimation requiring optimization, e.g., an EM algorithm, one could use `d` as an initial guess.
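A hedged sketch of what "initial guess" could mean in practice (the helpers `e_step` and `m_step` are purely illustrative names, not Distributions.jl API):

```julia
# Illustrative sketch: the instance `mix` supplies the starting
# parameters, and each M-step could reuse the instance-based fit_mle.
function em_sketch(mix, x; iterations = 10)
    for _ in 1:iterations
        γ = e_step(mix, x)       # hypothetical: posterior component responsibilities
        mix = m_step(mix, x, γ)  # hypothetical: weighted fit_mle per component
    end
    return mix
end
```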
@devmotion (Member):
This is yet another behaviour and IMO adds to the confusion.

src/genericfit.jl (outdated; resolved)
src/multivariate/product.jl (outdated; resolved)
function fit_mle(g::Product, x::AbstractMatrix, args...)
d = size(x, 1)
length(g) == d || throw(DimensionMismatch("The dimensions of g and x are inconsistent."))
return product_distribution([fit_mle(g.v[s], y, args...) for (s, y) in enumerate(eachrow(x))])
@devmotion (Member):
In addition to what I mentioned above, I think there's another issue here: It is not possible to pass different arguments/options to the fits of the different components, which makes this approach much less flexible than just fitting each component manually.

@dmetivie (Author):

I wondered about that: is there a way to pass something like kwargs = [kwargs[i] for i in 1:d] and apply each element to the corresponding distribution?
As for the arguments, I believe in most situations you have one sample and fit the whole ProductDistribution at once.
Hence, IMO the code above corresponds to the most typical situation. If you think of a situation where some components have weighted samples and others do not, one could either define a method for that case or, as you said, do it manually.
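One way such a per-component-arguments variant could be sketched (the name `fit_mle_each` and the tuple-of-arguments convention are assumptions for illustration, not part of this PR):

```julia
using Distributions

# Sketch: one tuple of extra arguments (e.g. weights) per component;
# defaults to no extra arguments for any component.
function fit_mle_each(g::Distributions.Product, x::AbstractMatrix,
                      component_args = ntuple(_ -> (), size(x, 1)))
    length(g) == size(x, 1) ||
        throw(DimensionMismatch("The dimensions of g and x are inconsistent."))
    return product_distribution(
        [fit_mle(typeof(c), y, component_args[s]...)
         for (s, (c, y)) in enumerate(zip(g.v, eachrow(x)))])
end
```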

Comment on lines +39 to +40
params(g::Product) = params.(g.v)

@devmotion (Member):
This seems unrelated:

Suggested change
params(g::Product) = params.(g.v)

Comment on lines +77 to +78
params(d::ArrayOfUnivariateDistribution) = params.(d.dists)

@devmotion (Member):
Suggested change
params(d::ArrayOfUnivariateDistribution) = params.(d.dists)

dmetivie and others added 3 commits on February 2, 2023, each co-authored by David Widmann <devmotion@users.noreply.github.com>.
@dmetivie (Author) commented Feb 2, 2023:

Thanks for giving this thought and time.
The main benefit of this PR is not shorter code. I find that the instance version of fit_mle conveys more information than the type version. Within Distributions.jl the advantages are not super obvious (except for product distributions), since for simple univariate or multivariate normal distributions the MLE is known exactly.
The instance version allows writing the EM algorithm very simply for all kinds of distributions (see here); sure, I am very biased about that.
I feel that with types only we are missing something:

  • How would one code fit_mle for product distributions with types only? IMO fit_mle for product distributions is a very neat feature (even if it is not hard to code).
  • How would you extract the component types of a mixture with only the type version of fit_mle (or another similar function)?

@lrnv commented Mar 23, 2023:

Although I really have no opinion on the confusion argument, I think a better solution would be to parametrize mixture distributions:

MixtureModel{Gamma,Normal,Binomial} would be the type, not just MixtureModel. This would be consistent with the current behavior of ProductDistribution, which is parametrized by types. But I agree that this is probably a more difficult change to implement (it might even be considered breaking?).
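That type-level encoding could be sketched like this (the `TypedMixture` name is hypothetical; `MixtureModel` in Distributions.jl is currently parametrized differently):

```julia
using Distributions

# Hypothetical: encode the component types in the mixture's type so
# that fit_mle could recover them from the type alone.
struct TypedMixture{Ts<:Tuple}
    weights::Vector{Float64}
end

component_types(::Type{TypedMixture{Ts}}) where {Ts} = fieldtypes(Ts)

component_types(TypedMixture{Tuple{Gamma,Normal,Binomial}})
# (Gamma, Normal, Binomial)
```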

@ParadaCarleton (Contributor) commented:

  • How would you extract the component types of a mixture with only the type version of fit_mle (or another similar function)?

Hmm, maybe the type of a mixture should also include how many of each component are included? E.g.

Mixture{MixtureComponent{Normal, 2}, MixtureComponent{Gamma, 3}}

4 participants