Conversation

@aplavin
Contributor

@aplavin aplavin commented Apr 19, 2019

It is sometimes useful to have a degenerate distribution with a single value, for generality, so that one does not have to special-case constant values vs. distributions. Let me know if you agree with this addition; then I will implement the remaining functions and add a docstring.

@matbesancon
Member

Hi @aplavin have you checked: f4a3e5c
cc @simonbyrne
If this is taken into account, the limiting case sigma = 0 should switch back to Constant.
Also, one name that could be used for this is Dirac. Constant could be confused with uniform (as in constant probability over the support).

@aplavin
Contributor Author

aplavin commented Apr 22, 2019

I haven't seen that commit specifically, but I guessed that all distributions should handle edge cases correctly. E.g. sd = 0 for Normal, Cauchy, TDist and others, or lower = upper for Uniform, should all give a degenerate Dirac-delta distribution. I see two obvious advantages to having a special distribution like this: it should be more performant (basically no computations at all), and one shouldn't have to choose which distribution edge case they want, so that there is always e.g. Delta(c) instead of Normal(c, 0) in some places, Uniform(c, c) in others, and Cauchy(c, 0) in yet others.
As for possible confusion, I definitely understand the concern with Constant. I would probably prefer Delta to Dirac, though.
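For concreteness, a minimal standalone sketch of such a degenerate distribution (the name `Delta`, its field `value`, and the free-standing `pmf`/`cdf`/`mean`/`var` functions are placeholders for illustration, not the API proposed in this PR):

```julia
# Placeholder point-mass distribution; `Delta` and `value` are
# illustrative names, not the final API of this PR.
struct Delta{T<:Real}
    value::T
end

# Everything is trivial: no numerical computation is needed at all.
Base.rand(d::Delta) = d.value
pmf(d::Delta, x::Real) = x == d.value ? 1.0 : 0.0
cdf(d::Delta, x::Real) = x < d.value ? 0.0 : 1.0
mean(d::Delta) = d.value
var(d::Delta) = zero(float(d.value))
```

Compared with Normal(c, 0) or Uniform(c, c), none of these methods ever touches exp, log, or division.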

@matbesancon
Member

Dirac might be more appropriate; Delta is just a letter and carries many different meanings.
Agreed on the multiple-distributions point.

@aplavin
Contributor Author

aplavin commented Apr 22, 2019

So, I will rename Constant to Dirac and implement other functions that make sense, right?

@matbesancon
Member

Looks good to me. Add all necessary functions, some tests, and documentation. Then we will see how to use it for the degenerate cases of distributions, i.e. Uniform(a, b) = if a == b then Constant(a) else ...

@mschauer
Member

Falling back to Dirac can create type instabilities.
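The concern can be sketched with hypothetical stand-in types (not the actual Distributions.jl structs): a constructor that falls back to a point mass when a == b has a return type that depends on runtime values, so inference can only produce a Union.

```julia
# Hypothetical stand-ins to illustrate the type instability; these are
# not the real Uniform/Dirac types.
struct PointMass{T}
    value::T
end
struct Box{T}
    a::T
    b::T
end

# The return type depends on the *values* of the arguments, not just
# their types, so the compiler infers a Union of the two structs.
make_box(a, b) = a == b ? PointMass(a) : Box(a, b)
```

`Base.promote_op(make_box, Float64, Float64)` reports a small `Union` rather than a concrete type, which is the instability being warned about here.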

@matbesancon
Member

matbesancon commented Apr 23, 2019

Good point. @aplavin, you can finish this implementation without touching the special Normal and Uniform constructors:

  • rename to Dirac
  • finish implementing the Distribution behaviour for Dirac
  • add include("file.jl") in Distributions.jl
  • add tests
  • add documentation as docstring and include it in the docs

@matbesancon changed the title from "Implement Constant distribution" to "Implement Dirac distribution" on Apr 23, 2019
@matbesancon
Member

ping @aplavin how are you doing on this one? Do you need help on some parts?


eltype(::Dirac{T}) where {T} = T
rand(rng::AbstractRNG, d::Dirac) = d.value
pdf(d::Dirac, x::Real) = x == d.value ? 1 : 0
Member

these should be floating point

Suggested change
pdf(d::Dirac, x::Real) = x == d.value ? 1 : 0
pdf(d::Dirac, x::Real) = x == d.value ? 1.0 : 0.0

Contributor

Shouldn't pdf(d, NaN) return NaN? Or what if d.value is NaN?
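One hypothetical way to resolve the NaN question, shown standalone with the struct assumed from this PR (whether NaN should propagate is a design choice, not something the PR decides here):

```julia
struct Dirac{T<:Real}
    value::T
end

# NaN compares unequal to everything (including itself), so a plain
# `x == d.value` already yields 0.0 for NaN inputs; this variant
# propagates NaN instead, which is one possible answer to the question.
pdf(d::Dirac, x::Real) = isnan(x) ? NaN : (x == d.value ? 1.0 : 0.0)
```

Note that with the original `x == d.value` definition, a `Dirac(NaN)` would assign probability 0 even to its own value, since `NaN == NaN` is false.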

eltype(::Dirac{T}) where {T} = T
rand(rng::AbstractRNG, d::Dirac) = d.value
pdf(d::Dirac, x::Real) = x == d.value ? 1 : 0
cdf(d::Dirac, x::Real) = x < d.value ? 0 : 1
Member

Suggested change
cdf(d::Dirac, x::Real) = x < d.value ? 0 : 1
cdf(d::Dirac, x::Real) = x < d.value ? 0.0 : 1.0

Member

same everywhere

A *Dirac distribution* is parametrized by its only value, and takes its value with probability 1.
```julia
d = Dirac(2.) # Dirac distribution with value = 2.
```
Contributor

I'd use 2.0 rather than 2..
Also, punctuation in comments.

```julia
d = Dirac(2.) # Dirac distribution with value = 2.
rand(d) # Always returns the same value
```
Contributor

Again, punctuation, and the comment isn't aligned with the one above (which I assume is what you wanted).

@matbesancon
Member

ping @aplavin
Can you merge master into your branch? There are some conflicts. Also, you have a couple of comments to address.

@matbesancon
Member

@aplavin the build is still failing on travis

rand(d) # Always returns the same value
```
"""
struct Dirac{T} <: DiscreteUnivariateDistribution
Contributor

Dirac Delta is a Continuous distribution

Member

No, it doesn't have a density with respect to the Lebesgue measure, so it is not a "continuous probability distribution" (not to mix with distributions in Schwartz' sense.)

According to the documentation, ValueSupport can take the values Discrete (samples take discrete Int values) and Continuous (samples take continuous real Float64 values). So in this sense, it seems that the Dirac delta should be Continuous. I think that your interpretation would leave many useful measures completely out of the hierarchy (e.g. mixtures of continuous and discrete, such as spike-and-slab).

That is, according to the documentation the trait "Continuous" seems to refer only to the set on which the distribution is defined (the real line), not to the fact that it is absolutely continuous with respect to the Lebesgue measure.

@johnczito mentioned this pull request on Apr 7, 2020
@devmotion
Member

What is the status of this PR? It would be very useful to have a Dirac distribution in Turing (see, e.g., TuringLang/Turing.jl#1471 (comment)). Can this PR be updated or should I open a new one?

@mschauer
Member

I think technically you cannot update it. This PR is quite far along; I think we need to check and test some stupid things about NaNs, and review the properties below against the literature:

 skewness(d::Dirac) = 0.0
 kurtosis(d::Dirac) = 0.0
 entropy(d::Dirac) = 0.0
 mgf(d::Dirac, t) = exp(t * d.value)
 cf(d::Dirac, t) = exp(im * t * d.value)
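For the review "against the literature": for a point mass at v, E[exp(tX)] = exp(tv) and E[exp(itX)] = exp(itv), so the mgf and cf lines above do match the textbook definitions. A standalone sanity check (the `Dirac` struct is re-declared here for illustration; it is assumed from this PR):

```julia
struct Dirac{T<:Real}
    value::T
end

# The moment-generating and characteristic functions of a point mass at
# d.value, as quoted from the PR.
mgf(d::Dirac, t) = exp(t * d.value)
cf(d::Dirac, t) = exp(im * t * d.value)

d = Dirac(3.0)
mgf(d, 0.0)      # every mgf equals 1 at t = 0
abs(cf(d, 2.0))  # the cf of a point mass has modulus 1 for all t
```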

@mschauer
Member

mschauer commented Nov 26, 2020

But if you want to work on this by making a PR starting from this one, I think that is okay with @aplavin? (You can also make PRs against this branch.)

@aplavin
Contributor Author

aplavin commented Nov 26, 2020

Yes, I confirm this is totally OK with me. Sorry for abandoning the PR; at that time I was completely unaware of how git/branches/PRs/Pkg interact, and couldn't even reproduce the tests locally. The CI gave a confusing error with an unhelpful stacktrace and I just couldn't debug it. Now I kinda understand how everything works together, but still not quite...

@cossio
Contributor

cossio commented Nov 26, 2020

I think entropy(d::Dirac) = -Inf makes more sense, @mschauer?

@devmotion
Member

In my opinion, it depends on what distribution you have in mind when talking about a Dirac distribution. For the use case in Turing that popped up yesterday we are interested in the discrete measure mu with mu(A) = x in A ? 1 : 0. In this case naturally pdf(mu, z) = z == x etc. and also entropy(mu) = - pdf(mu, x) * log(pdf(mu, x)) = - 1 * log(1) = 0.
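The arithmetic in this comment, spelled out standalone (the helper names are illustrative, not API):

```julia
# With respect to the counting measure, the pmf of the Dirac measure at
# x is 1 at x and 0 elsewhere, so the Shannon entropy -Σ p log p has a
# single term on the support: -1 * log(1) = 0.
dirac_pmf(x, z) = z == x ? 1.0 : 0.0
dirac_entropy(x) = -dirac_pmf(x, x) * log(dirac_pmf(x, x))
```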

@abraunst

In my opinion, it depends on what distribution you have in mind when talking about a Dirac distribution. For the use case in Turing that popped up yesterday we are interested in the discrete measure mu with mu(A) = x in A ? 1 : 0. In this case naturally pdf(mu, z) = z == x etc. and also entropy(mu) = - pdf(mu, x) * log(pdf(mu, x)) = - 1 * log(1) = 0.

I don't think that the name Dirac should be used for the discrete case

@devmotion
Member

devmotion commented Nov 26, 2020

I don't think that the name Dirac should be used for the discrete case

I guess this was also the reason why initially in this PR it was called Delta and only changed later to Dirac upon request of the reviewers. However, the term Dirac measure is used for the discrete measure that is implemented in this PR and that I had in mind, see, e.g. https://en.wikipedia.org/wiki/Dirac_measure. Hence in my opinion the name Dirac is appropriate.

@mschauer
Member

For the Dirac measure there is no uncertainty, so the entropy must be zero. For a package defining probability distributions, it is pretty much a given that we follow https://en.wikipedia.org/wiki/Dirac_measure in definition and naming.

@abraunst

For the Dirac measure there is no uncertainty, so the entropy must be zero. For a package defining probability distributions, it is pretty much a given that we follow https://en.wikipedia.org/wiki/Dirac_measure in definition and naming.

Well... it's pretty much arbitrary that you chose the Dirac on the integers and not on the real line (in the latter case, the only reasonable definition of the entropy is -Inf, as e.g. the limit of concentrating normals). The other choice should also be implemented then, because it is missing and would be useful. I would suggest a different naming then (Dirac on the real line was the meaning given by Dirac himself). :-)

@cossio
Contributor

cossio commented Nov 26, 2020

Maybe Kronecker is a better name for the discrete case.

@devmotion
Member

devmotion commented Nov 26, 2020

Well... it's pretty much arbitrary that you chose the Dirac on the integers and not on the real line (in the latter case, the only reasonable definition of the entropy is -Inf, as e.g. the limit of concentrating normals). The other choice should also be implemented then, because it is missing and would be useful. I would suggest a different naming then (Dirac on the real line was the meaning given by Dirac himself). :-)

The discrete Dirac measure as defined and discussed in the comments above and on Wikipedia is not specific or limited to integers. The main point here is just which reference measure is used: if it is the counting measure (in case of the discrete definition) or the Lebesgue measure (in the limiting case of a Normal distribution). In contrast to Measures, in Distributions it is not possible to specify it explicitly. I think it is natural to use the counting measure as reference measure here. It is also more useful for practical applications, I assume, since otherwise all values of the logpdf are infinite.

@abraunst

Well... it's pretty much arbitrary that you chose the Dirac on the integers and not on the real line (in the latter case, the only reasonable definition of the entropy is -Inf, as e.g. the limit of concentrating normals). The other choice should also be implemented then, because it is missing and would be useful. I would suggest a different naming then (Dirac on the real line was the meaning given by Dirac himself). :-)

The discrete Dirac measure as defined and discussed in the comments above and on Wikipedia is not specific or limited to integers. The main point here is just which reference measure is used: if it is the counting measure (in case of the discrete definition) or the Lebesgue measure (in the limiting case of a Normal distribution). In contrast to Measures, in Distributions it is not possible to specify it explicitly. I think it is natural to use the counting measure as reference measure here. It is also more useful for practical applications, I assume, since otherwise all values of the logpdf are infinite.

I don't follow:

  1. You don't need any reference measure to define the Dirac; you only need a set X and a sigma-algebra on X. If you define this as <: DiscreteDistribution, you have made the choice of X being the integers (at least according to the current documentation).
  2. The fact that the pdf is not well defined is not a good argument for saying the distribution is not useful. Having the Dirac delta on the reals would be useful (as I argued before, e.g. to construct the spike-and-slab distribution, which is useful in machine learning to model sparse data).

I agree that the discrete case is also useful; I would just avoid calling it Dirac, because that would be incompatible with the original use.

@devmotion
Member

devmotion commented Nov 26, 2020

you made the choice of X being the integers

No, X can be anything. It is just a special case of DiscreteNonParametric which allows any finite subset of real numbers as support.

And therefore I think the discrete Dirac measure should be a DiscreteDistribution whose pdf etc are defined with respect to the counting measure. Dirac measure is a common term for this distribution and its support is not limited to integers.

What I wanted to point out above with the reference measures was mainly that the different entropy values are not surprising and arise from the different reference measures which induce the "fake" (i.e. non-existing) density of Inf at x and -Inf everywhere else when you talk about the Lebesgue measure and normal distributions (which is also what you end up with if you evaluate pdf(Normal(0, 0), 0)).
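A standalone sketch of the "special case of DiscreteNonParametric" point (the `FinitePMF` type here merely mimics, and is not, the real `DiscreteNonParametric`): a finite-support distribution with exactly one support point is a point mass, and that support point can be any real number, not just an integer.

```julia
# Illustrative stand-in for a finite discrete distribution: support
# points paired with probabilities (not the real DiscreteNonParametric).
struct FinitePMF{T<:Real}
    support::Vector{T}
    probs::Vector{Float64}
end

function pmf(d::FinitePMF, x::Real)
    i = findfirst(==(x), d.support)
    return i === nothing ? 0.0 : d.probs[i]
end

# The Dirac measure at 2.0 is the single-support-point special case.
d = FinitePMF([2.0], [1.0])
```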

@abraunst

you made the choice of X being the integers

No, X can be anything. It is just a special case of DiscreteNonParametric which allows any finite subset of real numbers as support.

What you wrote contradicts the documentation of Distributions (DiscreteNonParametric also has the same problem; see my issue #887). From https://juliastats.org/Distributions.jl/latest/types/:

ValueSupport

Distributions.ValueSupport

The ValueSupport sub-types defined in Distributions.jl are:

| Type | Element type | Descriptions |
| --- | --- | --- |
| Discrete | Int | Samples take discrete values |
| Continuous | Float64 | Samples take continuous real values |

@mschauer
Member

mschauer commented Nov 26, 2020

So, a bit ex cathedra, if you allow:

The definition of a distribution P we use is https://en.wikipedia.org/wiki/Probability_distribution. A random variable which is almost surely a constant x has such a distribution, and it is what @devmotion gives: P(A) = x in A ? 1 : 0.
This is in probability called a "Dirac measure", even though it is a probability measure or distribution: https://en.wikipedia.org/wiki/Dirac_measure.
It is also a discrete distribution (https://en.wikipedia.org/wiki/Discrete_measure), because there is a finite set D = {x} for which P(D) = 1. For discrete distributions we give the probability mass function (https://en.wikipedia.org/wiki/Probability_mass_function), or, equivalently, the density with respect to the counting measure.

Our definitions of DiscreteUnivariateDistribution and ContinuousUnivariateDistribution are sloppy, because not every measure is discrete or continuous (https://en.wikipedia.org/wiki/Lebesgue%27s_decomposition_theorem#Refinement), nor is every discrete measure integer-valued. But for a truly discrete measure we do use DiscreteUnivariateDistribution.

@mschauer
Member

mschauer commented Nov 26, 2020

We should leave kurtosis and skewness undefined. These are properties of normalised random variables and one cannot normalise a constant random variable in a meaningful way.

@abraunst

abraunst commented Nov 26, 2020 via email

@mschauer
Member

As said, things don't match, but in the end DiscreteUnivariate is supposed to represent discrete distributions and ContinuousUnivariate is supposed to represent absolutely continuous distributions (check https://arxiv.org/pdf/1907.08611.pdf: "whether it has discrete or continuous support, corresponding to a density with respect to a counting measure or a Lebesgue measure" expresses this unambiguously). Why would we put that in as a magic constant in the type parameters of an abstract type? We shouldn't, as I have argued many times (https://discourse.julialang.org/t/reworking-distributions-jl/32339/22), but alas, it is there currently.

@abraunst

As said, things don't match, but in the end DiscreteUnivariate is supposed to represent discrete distributions and ContinuousUnivariate is supposed to represent absolutely continuous distributions (check https://arxiv.org/pdf/1907.08611.pdf: "whether it has discrete or continuous support, corresponding to a density with respect to a counting measure or a Lebesgue measure" expresses this unambiguously). Why would we put that in as a magic constant in the type parameters of an abstract type? We shouldn't, as I have argued many times (https://discourse.julialang.org/t/reworking-distributions-jl/32339/22), but alas, it is there currently.

Let me be more direct: is there no plan to support distributions that are neither?

@matbesancon
Member

replaced by #1231
