
Feature request: provide L2 norm of each distribution's pdf #806

Open

ablaom opened this issue Jan 8, 2019 · 17 comments


ablaom commented Jan 8, 2019

Wishing for a method l2norm(d::Distribution) that returns the L2 norm of the probability density function of d. Our use case is computing the Brier loss and the integrated squared loss for machine learning models that learn probability distributions: MLJ issue #34
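
For a couple of simple cases this has a closed form. A minimal sketch of the requested method, assuming the name l2norm and Distributions.jl's field names and accessors:

```julia
using Distributions

# Hypothetical l2norm, sketched for two cases with known closed forms.
# Normal(μ, σ): ∫ pdf(x)^2 dx = 1/(2σ√π), so the norm is (2σ√π)^(-1/2).
l2norm(d::Normal) = inv(sqrt(2 * d.σ * sqrt(π)))

# Bernoulli(p): the L2 norm of the pmf is sqrt(p^2 + (1 - p)^2).
l2norm(d::Bernoulli) = sqrt(succprob(d)^2 + failprob(d)^2)
```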

@matbesancon (Member)

@ablaom sorry about the delay.
The best option would be to implement LinearAlgebra.norm for Distributions. A PR is welcome, but you can also do this in your own repo (technically type piracy, but as long as it isn't implemented in Distributions itself, that's fine).
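
What that extension might look like in a downstream package (a sketch only, reusing the Normal closed form from above; restricting to p == 2 is a design choice, not an established API):

```julia
using Distributions, LinearAlgebra

# Type piracy: we own neither LinearAlgebra.norm nor Normal.
function LinearAlgebra.norm(d::Normal, p::Real = 2)
    p == 2 || throw(ArgumentError("only the L2 norm is implemented"))
    return inv(sqrt(2 * d.σ * sqrt(π)))
end
```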


ablaom commented May 2, 2019

@fkiraly Do you want to suggest a shortlist of distributions for which we should implement the L2 norm, aka LinearAlgebra.norm(·, 2)? Here are all the leaf types of Distributions.Distribution (a sketch of how to generate such a list follows it):

MixtureModel
UnivariateGMM
MvNormal
MvNormalCanon
Arcsine
Bernoulli
Beta
BetaBinomial
BetaPrime
Binomial
Biweight
Categorical
Cauchy
Chi
Chisq
Cosine
Dirichlet
DirichletMultinomial
DiscreteUniform
MvLogNormal
Distributions.GenericMvTDist
EdgeworthMean
EdgeworthSum
EdgeworthZ
EmpiricalUnivariateDistribution
Epanechnikov
Erlang
Exponential
FDist
Frechet
Gamma
GeneralizedExtremeValue
GeneralizedPareto
Geometric
Gumbel
Hypergeometric
InverseGamma
InverseGaussian
InverseWishart
KSDist
KSOneSided
Kolmogorov
Laplace
Levy
LocationScale
LogNormal
Logistic
Multinomial
NegativeBinomial
NoncentralBeta
NoncentralChisq
NoncentralF
FisherNoncentralHypergeometric
WalleniusNoncentralHypergeometric
NoncentralT
Normal
NormalCanon
NormalInverseGaussian
Pareto
Poisson
PoissonBinomial
Rayleigh
Semicircle
Skellam
SymTriangularDist
TDist
TriangularDist
Triweight
Truncated
Uniform
VonMises
VonMisesFisher
Weibull
Wishart
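
A sketch of how such a list can be produced: walk the type tree under Distribution and keep the leaves (leaf_subtypes is a hypothetical helper, not part of Distributions.jl):

```julia
using Distributions, InteractiveUtils

# Recursively collect the leaf types (types with no subtypes of their own).
function leaf_subtypes(T::Type)
    st = subtypes(T)
    isempty(st) && return Type[T]
    return reduce(vcat, leaf_subtypes.(st); init = Type[])
end

leaf_subtypes(Distribution)
```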

@matbesancon (Member)

PR welcome.
Also, since this is a new feature, a page in the documentation would be great.


fkiraly commented May 2, 2019

Sure - I'll try to make the shortlist short... (closed forms for two of these atoms are sketched after the list)

Composites:
MixtureModel
Truncated

Continuous atoms:
Beta
Chi
Chisq
Dirichlet
Exponential
Gamma
GeneralizedExtremeValue
GeneralizedPareto
Gumbel
Laplace
Logistic
Normal
Pareto
Uniform
VonMises
Weibull

Discrete atoms:
Bernoulli
Categorical
DiscreteUniform
Geometric
Hypergeometric
Multinomial
NegativeBinomial
Poisson
Skellam
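
As a sketch of what implementations for these atoms might look like, closed forms for two of them (l2norm is still the hypothetical name from above, and d.a, d.b, d.θ are Distributions.jl's field names):

```julia
using Distributions

# Uniform(a, b): ∫ (1/(b-a))^2 dx over [a, b] = 1/(b - a).
l2norm(d::Uniform) = inv(sqrt(d.b - d.a))

# Exponential with scale θ (pdf (1/θ)·e^(-x/θ)): ∫ pdf^2 = 1/(2θ).
l2norm(d::Exponential) = inv(sqrt(2 * d.θ))
```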


fkiraly commented May 2, 2019

Though, having thought about the matter a little:
The L2 norm is useful - but what would be even more useful is the integral of the squared pdf between 0 and x, for any x. The same for the cdf.

In addition, I'm not sure whether LinearAlgebra.norm(d) is the best way to do this, because it is the L2 norm of the pdf rather than of the distribution; the latter would be an ill-defined concept.

After all, you could take the L2 norm of any distribution-defining function (where it exists): pdf, cdf, mgf, etc., and the result would differ in each case. Why single out the pdf, given that it is merely one of multiple ways to uniquely define a distribution, and given that it doesn't even exist for mixed distributions?

Also, would it not be more natural if something like l2norm(d) returned the distribution of the random variable |X|_2, where X is distributed according to d? That is, if it returned a distribution object rather than a number?

I'm not saying that such behaviour is something I'd particularly be interested in, just that it seems to me like the more natural behaviour when applying functions to distribution objects.

@matbesancon (Member)

OK, yes, I see your point. In that case a new function could be created, something like dist_norm.
It would have to appear in the documentation.

@simonbyrne (Member)

I might be misunderstanding, but wouldn't the Brier loss simply be:

function brier_loss(d::Distribution, x0)
    m = mean(d)
    return var(d) + m*(m-2*x0) + x0
end

I'm not opposed to having it here, but it does seem that scoring rules like this could also go in a separate package.


fkiraly commented May 9, 2019

@simonbyrne - no, that's not the Brier loss.

The Brier loss would be

function brier_loss(d::Distribution, x0)
    return -2 * pmf(d, x0) + l2norm(d)^2
end

with pmf for discrete distributions, and pdf for absolutely continuous ones.
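
Concretely, for a Bernoulli prediction this works out as follows (a sketch; succprob is the Distributions.jl accessor, pdf plays the role of pmf, and the l2norm term is expanded by hand):

```julia
using Distributions

# Brier loss for a Bernoulli prediction d at an observed outcome x0 ∈ {0, 1}:
# -2 * pmf(x0) plus the squared L2 norm p^2 + (1 - p)^2 of the pmf.
function brier_loss(d::Bernoulli, x0)
    p = succprob(d)
    return -2 * pdf(d, x0) + p^2 + (1 - p)^2
end
```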


fkiraly commented May 9, 2019

@simonbyrne or is your claim that the two are always, analytically, identical?
(Proof?)


fkiraly commented May 9, 2019

@simonbyrne it's actually not; here's a disproof: your score is linear in x0.
The Brier score (as defined by me) is linear in pmf(x0) and doesn't otherwise depend on x0.
Hence the Brier score is in general not linear in x0, so the two can't be identical.


simonbyrne commented May 9, 2019

Ignore my comment above, I misunderstood what was being proposed.

From what I can tell, the Brier score is really only defined for binary outcomes (i.e. Bernoulli predictions). It seems there are many ways one could generalize it to other distributions, e.g.

brier_loss(d, x0) = (x0 - mean(d))^2

would be an obvious candidate.

Additionally, from what I can tell, I think your definition is off by a linear scaling and offset: e.g. in the Bernoulli case, if x0 == 0, your expression equals -2*(1-p) + p^2 + (1-p)^2 == (-2 + 2p) + p^2 + (p^2 - 2p + 1) == 2p^2 - 1 (whereas the classic Brier score would be p^2).
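
A quick numeric check of that claim, under the same hypothetical definitions (fk below is fkiraly's expression at x0 = 0; the factor-2-and-offset relation holds for any p):

```julia
using Distributions

p = 0.3
d = Bernoulli(p)
fk = -2 * pdf(d, 0) + p^2 + (1 - p)^2  # fkiraly's Brier loss at x0 = 0
classic = p^2                          # classic Brier score at x0 = 0
fk ≈ 2 * classic - 1                   # true: equal up to scaling and offset
```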


simonbyrne commented May 9, 2019

Another generalization, which would have your desired property of depending on x0 only through the pdf, would be

brier_score(d, x0) = (1 - pdf(d, x0))^2

though this obviously wouldn't work for continuous distributions.


fkiraly commented May 10, 2019

@simonbyrne I think we should be systematic - there are three high-level questions here:
(i) whether the original Brier/squared loss is important/relevant;
(ii) whether Distributions functionality supporting it should be implemented;
(iii) whether your alternatives are reasonable/viable - this is primarily a theory question rather than an implementation question.

I think (i) yes, (ii) yes, and (iii) no. In addition, you are probably not too familiar with probabilistic losses, and maybe a little confused - I hope you do not take this too negatively. I'll try to explain the theory below.

Regarding (i), I believe this is a standard thing, at least in the classification setting. The Brier loss is implemented in relevant packages such as sklearn and mlr, and it is one of several standard ways to train or evaluate probabilistic classification models. The same goes for its generalizations to regression models, but these are rarer, since common packages do not have good probabilistic interfaces (MLJ does!).

The answer to (ii) is of course up to the Distributions.jl devs to decide, but I was of the impression that their feeling was "yes".

Regarding (iii), I think we need to clarify a few points.
One can't just arbitrarily define losses for probabilistic predictions - they need to measure sensibly how close the prediction is to the truth, and properness is one such "reasonableness" property.

The alternatives you propose are not proper for the multi-class classification case or the regression case. For binary classification, it turns out that -2 p(y) + |p|_2^2 and the other (commonly used) expression (1 - p(y))^2 happen to be the same up to scaling and offset - indeed, writing q = p(y), one has -2q + q^2 + (1-q)^2 = 2(1-q)^2 - 1 - so they measure the same thing.

But this is not true in the general case - one simple way to see this is that (1 - p(y))^2 doesn't make much sense for continuous pdfs, which are not upper bounded by 1.

For the same reason, I believe that even in the binary classification case the expression -2 p(y) + |p|_2^2 is more helpful, since it is more or less the only one that generalizes to all relevant cases (except mixed distributions, but that's another story).

@simonbyrne (Member)

> The answer to (ii) is of course up to the Distributions.jl devs to decide, but I was of the impression that their feeling was "yes".

I don't really mind, but my point is that there is nothing particularly special about having such rules in Distributions.jl: they could easily go in another package (say ScoringRules.jl, or be incorporated into LossFunctions.jl). Indeed, it is often desirable to work in smaller packages, since iterative development is much easier and quicker.

In terms of implementation, the main question is how you would handle cases where no closed-form solution exists.

That is a good point about propriety, and your L2 loss definition does match up with the Selten characterization.


fkiraly commented May 10, 2019

@simonbyrne, great that we are on the same page regarding theory!

Let's talk implementation then. If you want to support computation of the key proper scoring rules (or proper losses, with the sign convention more common in ML) for the continuous case, you have two options that I can see:
(i) on-demand symbolic integration, based on symbolic expressions that the distributions provide;
(ii) the distributions provide explicit expressions for the integrals etc. that appear in the scoring rules - for squared/Brier loss, this is the 2-norm of the pdf.

Option (i) is infeasible i.m.o., since you would need to write a symbolic computation engine à la Mathematica from scratch, which seems overkill.

Another alternative, numerical integration, isn't really one (I would argue), since it can be arbitrarily wrong: that's not only problematic if you have to do it for each data point, it's also a no-go if you want to use the losses for unbiased evaluation (where unbiased is meant in the empirical, not the statistical, sense).
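
For reference, the numerical-integration route being dismissed here would look something like this (a sketch using the QuadGK.jl package; pdfnorm_numeric is a hypothetical name):

```julia
using Distributions, QuadGK

# L2 norm of a pdf by adaptive quadrature: fine for well-behaved
# densities, but with no uniform accuracy guarantee across distributions.
function pdfnorm_numeric(d::ContinuousUnivariateDistribution)
    I, _ = quadgk(x -> pdf(d, x)^2, minimum(d), maximum(d))
    return sqrt(I)
end

pdfnorm_numeric(Normal())  # ≈ (2√π)^(-1/2) ≈ 0.5311
```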

@simonbyrne (Member)

I think it is fair to say that symbolic operations are far out of scope for this package.

We could certainly have a pdfnorm(d::Distribution) (or similarly named) function for distributions where it can be calculated easily; it would throw a MethodError in the other cases.
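
A minimal sketch of that design, reusing the closed forms derived earlier in the thread (pdfnorm is the name proposed above, not an existing Distributions.jl function):

```julia
using Distributions

pdfnorm(d::Normal) = inv(sqrt(2 * d.σ * sqrt(π)))
pdfnorm(d::Uniform) = inv(sqrt(d.b - d.a))

pdfnorm(Normal(0.0, 2.0))   # works: closed form implemented
# pdfnorm(Gamma(2.0, 1.0))  # would throw MethodError until a method is defined
```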


fkiraly commented May 10, 2019

Yes, exactly!
