Probabilities should be given in the log domain; Doubles don't provide enough fidelity #41

Closed
mikeizbicki opened this Issue · 6 comments

2 participants

@mikeizbicki

For example, the function,

density :: d -> Double -> Double

should really be of type

density :: d -> Double -> LogFloat

That way, if I ask for the density at 1E-1000000000 on a normal distribution with mean and stddev 1, I still get a meaningful result instead of just 0.

I can't think of any good reason not to be using probabilities in the log domain, since all of them will be in the range [0,1].

I'm implementing a bunch of machine learning algorithms that require probability distributions, and I'd like to be able to use your library, but I won't be able to unless this is changed. Also, I've implemented some code for estimating these distributions from data, and would be able to add it to your library if this were changed as well.
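
The underflow described above can be reproduced without any library. The following is a minimal sketch, assuming hand-written helpers (`normalDensity` and `normalLogDensity` are hypothetical names, not functions from the statistics package) for a normal distribution:

```haskell
-- Hypothetical helpers for illustration; not the statistics library's API.

-- | Density of a normal distribution with the given mean and
-- standard deviation, returned as a plain Double.
normalDensity :: Double -> Double -> Double -> Double
normalDensity mu sigma x =
  exp (negate (z * z) / 2) / (sigma * sqrt (2 * pi))
  where
    z = (x - mu) / sigma

-- | Natural logarithm of the same density, computed directly so that
-- no intermediate value has to pass through 'exp'.
normalLogDensity :: Double -> Double -> Double -> Double
normalLogDensity mu sigma x =
  negate (z * z) / 2 - log (sigma * sqrt (2 * pi))
  where
    z = (x - mu) / sigma
```

In GHCi, `normalDensity 1 1 100` evaluates to `0.0` because the exponential underflows, while `normalLogDensity 1 1 100` is roughly `-4901.42` and can still be compared against other log-densities.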

@Shimuuar
Collaborator
@mikeizbicki

I also think having logQuantile :: d -> LogFloat -> Double would be useful.

If I whip up a patch adding these functions to the class, would you incorporate it? I could add implementations for the normal distribution, but I doubt I know enough about the other distributions to do it quickly for them. I think creating a simple LogX version of each of the classes would work for that, but it might add too much clutter to the interface.

@Shimuuar Shimuuar referenced this issue from a commit in Shimuuar/statistics
@Shimuuar Shimuuar Add density and probability in the log domain
They are added as another method in DiscreteDistr and
ContDistr type classes.

Issue #41 requested this
936f7a4
@Shimuuar
Collaborator

logQuantile is more problematic. There are two reasons:

  1. It stretches the range for probabilities near zero but not for p near 1, so another function would be needed to regain symmetry.
  2. Quantiles rarely have a closed-form expression and are usually obtained by solving an equation numerically. That is not done in the log domain, and implementing it in a way that doesn't lose precision is far from trivial.
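
One way to see why probabilities near 1 need separate care: once p = 1 - q has been formed in a Double for a tiny q, the information in q is already gone. The sketch below is only an illustration under that reading of the first point; `logFromComplement` and `logNaive` are hypothetical names, and `log1p` is assumed to come from `Numeric` in base 4.9 or later.

```haskell
import Numeric (log1p)

-- | log p for a probability supplied as its complement q = 1 - p.
-- Hypothetical helper: log1p (-q) stays accurate when q is tiny.
logFromComplement :: Double -> Double
logFromComplement q = log1p (negate q)

-- | Naive version: forms p = 1 - q first. For tiny q the subtraction
-- rounds to exactly 1, so the logarithm comes out as 0.
logNaive :: Double -> Double
logNaive q = log (1 - q)
```

For q = 1e-20, `logNaive` cannot distinguish p from 1, whereas `logFromComplement` returns a value close to -1e-20.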
@Shimuuar
Collaborator

I've added log-domain versions of density and probability in the log-density branch of my fork. The new methods have default implementations, so everything should work. Many distributions allow more efficient implementations, but I don't have time to write them right now. I will gladly accept patches, though.

@Shimuuar Shimuuar referenced this issue from a commit in Shimuuar/statistics
@Shimuuar Shimuuar Add logarithm of probability density
Both density and logDensity can be expressed in terms of each other,
so default implementations are added.

Currently none of the distributions have good implementations of
logDensity since they use the default implementation.

Affects #41
63b0022
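
The mutual defaults mentioned in the commit message might look roughly like the sketch below. `ContDistrSketch` and `StdExponential` are hypothetical names used only for illustration; this is not the library's actual class definition.

```haskell
-- Hypothetical class for illustration; the method names mirror the thread,
-- but this is not the statistics library's actual definition.
class ContDistrSketch d where
  -- | Probability density at a point.
  density :: d -> Double -> Double
  density d = exp . logDensity d

  -- | Natural logarithm of the density at a point.
  logDensity :: d -> Double -> Double
  logDensity d = log . density d

  -- An instance has to define at least one of the two methods.
  {-# MINIMAL density | logDensity #-}

-- | Standard exponential distribution, defined only through 'density'.
data StdExponential = StdExponential

instance ContDistrSketch StdExponential where
  density _ x
    | x < 0     = 0
    | otherwise = exp (negate x)
```

With this instance, `logDensity StdExponential 2` comes out as approximately -2 via the default. The default inherits the underflow of `density`, though: `logDensity StdExponential 1000` evaluates to `-Infinity` rather than -1000, which is why a hand-written `logDensity` per distribution is preferable.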
@Shimuuar
Collaborator

Time to clean up pending issues.

I think it's quite reasonable to add logDensity to the ContDistr type class, and I did just that. It uses Double for log-domain numbers because I'm a bit wary of adding non-platform dependencies, since criterion could be proposed for the platform.

Currently all distributions use the default implementation of logDensity. I'll fix that, but I don't have time right now.

I left out quantile for now.

@Shimuuar
Collaborator

Fixed as of 866c4182684d02680a10f7348f8f6bb3b114194f1

@Shimuuar Shimuuar closed this