Loss of floating point precision in PhiAccrualFailureDetector #1821

Closed
robdavid opened this issue Nov 6, 2013 · 3 comments

@robdavid
robdavid commented Nov 6, 2013

I have been testing Akka on a resource-constrained cluster of multiple JVMs on a single physical machine. As a result, heartbeat arrival times vary widely, presumably because GC is running much of the time. I tried setting a high phi threshold to cope with this, but it was ineffective. So I grabbed a copy of the PhiAccrualFailureDetector and added some logging to figure out why. I found that the values of phi it calculated would suddenly jump from approx. 15 to infinity.

In the PhiAccrualFailureDetector, the calculation of Phi makes use of a function that calculates the cumulative distribution function for a given point on the bell curve:

/**
 * Cumulative distribution function for N(mean, stdDeviation) normal distribution.
 * This is an approximation defined in β Mathematics Handbook (Logistic approximation).
 * Error is 0.00014 at +- 3.16
 */
private[apakgroup] def cumulativeDistributionFunction(x: Double, mean: Double, stdDeviation: Double): Double = {
  val y = (x - mean) / stdDeviation
  // Cumulative distribution function for N(0, 1)
  1.0 / (1.0 + math.exp(-y * (1.5976 + 0.070566 * y * y)))
}

This is then used to calculate a value for phi as follows:

val phi = -math.log10(1.0 - cumulativeDistributionFunction(timeDiff, mean, stdDeviation))

For the sake of argument, let E be the value of math.exp(-y * (1.5976 + 0.070566 * y * y))

Then the calculation of phi boils down to

phi = -log10(1 - 1/(1+E))

For small values of E (below roughly 1E-16), the limited precision of a double means that E is effectively discarded: it is so much smaller than 1.0 that it “falls off” the mantissa when the two are added. This means that 1 - 1/(1+E) becomes 1 - 1/1 = 0, and the value of phi becomes infinite. Effectively, this limits the maximum meaningful value of phi that can be returned to approx. 15.
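The cancellation can be reproduced in isolation. The following is a minimal sketch, not Akka code: the names `PrecisionLoss` and `oldPhi` are hypothetical, and the function evaluates the original expression directly on E rather than on heartbeat data.

```scala
object PrecisionLoss {
  // Original formulation written in terms of E (hypothetical helper, not Akka API).
  def oldPhi(e: Double): Double = -math.log10(1.0 - 1.0 / (1.0 + e))

  def main(args: Array[String]): Unit = {
    // 1e-5 and 1e-10 survive the addition to 1.0; 1e-17 is rounded away entirely,
    // so the argument of log10 becomes exactly 0.0 and phi becomes Infinity.
    for (e <- Seq(1e-5, 1e-10, 1e-17))
      println(s"E = $e  (1.0 + E == 1.0) = ${1.0 + e == 1.0}  phi = ${oldPhi(e)}")
  }
}
```

Running `PrecisionLoss.main(Array.empty)` prints one line per value of E, which makes the point at which phi stops being finite easy to see.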

However, the expression within the log can be re-arranged algebraically:

1 - 1 / (1+E)
= (1+E) / (1+E) - 1 / (1+E)
= E / (1+E)

And the phi calculation becomes

phi = -log10(E/(1+E))

Now, for small values of E, the loss of precision in 1+E means that E/(1+E) = E/1 = E and phi = -log10(E). The maximum value of phi is then constrained only by the most negative value that log10() can produce for a positive double, i.e. the most negative decimal exponent a double can represent (approx. -323, since the smallest positive double is about 4.9E-324). The approximation E/(1+E) = E for small values of E seems like the more correct one for floating point calculations with limited precision.
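As a rough illustration of the rearranged expression, here is a small sketch that again works directly on E. The names `RearrangedPhi` and `newPhi` are hypothetical and only serve this example.

```scala
object RearrangedPhi {
  // Rearranged formulation written in terms of E (hypothetical helper, not Akka API).
  def newPhi(e: Double): Double = -math.log10(e / (1.0 + e))

  def main(args: Array[String]): Unit = {
    // For tiny E the denominator rounds to 1.0, so phi is effectively -log10(E)
    // and stays finite all the way down to the smallest positive double.
    for (e <- Seq(1e-10, 1e-17, 1e-100, 1e-300, Double.MinPositiveValue))
      println(s"E = $e  phi = ${newPhi(e)}")
  }
}
```

Note that E = 0.0 (which `math.exp` returns once its argument underflows) still yields an infinite phi, so the range is extended rather than made unbounded.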

In testing this alternate formulation, however, I have found that values of E = Infinity are possible, presumably due to large enough negative values of y in math.exp(-y * (1.5976 + 0.070566 * y * y)). The previous formulation handles this correctly and gives 1 for 1-1/(1+Infinity) whereas Infinity/(1+Infinity) yields NaN. Therefore my solution is to use the new calculation for values that are greater than the mean, and the old calculation otherwise. My final version of the phi function is:
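The Infinity case can be checked with a few lines. This is only a sketch: the object and value names are hypothetical, and y = -22 is an arbitrary value chosen because it is negative enough for `math.exp` to overflow.

```scala
object InfinityCase {
  // A heartbeat far earlier than the mean gives a large negative y (assumed value).
  val y: Double = -22.0
  // The exponent is about +786, beyond the range of a double, so exp overflows.
  val e: Double = math.exp(-y * (1.5976 + 0.070566 * y * y))

  // Original expression: 1 / (1 + Infinity) is 0.0, so the result is 1.0 (phi of 0).
  val original: Double = 1.0 - 1.0 / (1.0 + e)
  // Rearranged expression: Infinity / Infinity is NaN under IEEE 754.
  val rearranged: Double = e / (1.0 + e)

  def main(args: Array[String]): Unit =
    println(s"e = $e  original = $original  rearranged = $rearranged")
}
```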

private[apakgroup] def phi(timeDiff: Long, mean: Double, stdDeviation: Double): Double = {
  val y = (timeDiff - mean) / stdDeviation
  val e = math.exp(-y * (1.5976 + 0.070566 * y * y))
  // Above the mean, use the rearranged form to keep precision for small e;
  // otherwise keep the original form, which handles e = Infinity correctly.
  if (timeDiff > mean) -math.log10(e / (1.0 + e))
  else -math.log10(1.0 - 1.0 / (1.0 + e))
}

This seems robust in the testing I’ve done so far, yielding the same value of phi as previously for phi < 15 approx., but allowing me to usefully set a very high threshold (e.g. 250).
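To compare the two formulations side by side, something like the following sketch could be used. It is a standalone approximation of the code above, not the Akka class itself: `PhiComparison`, `originalPhi` and `revisedPhi` are hypothetical names, and the mean of 1000 ms and standard deviation of 100 ms are made-up sample values.

```scala
object PhiComparison {
  // Exponential term shared by both formulations (constants from the logistic approximation).
  private def expTerm(timeDiff: Long, mean: Double, stdDeviation: Double): Double = {
    val y = (timeDiff - mean) / stdDeviation
    math.exp(-y * (1.5976 + 0.070566 * y * y))
  }

  // Original calculation: -log10(1 - CDF).
  def originalPhi(timeDiff: Long, mean: Double, stdDeviation: Double): Double =
    -math.log10(1.0 - 1.0 / (1.0 + expTerm(timeDiff, mean, stdDeviation)))

  // Revised calculation: rearranged form above the mean, original form otherwise.
  def revisedPhi(timeDiff: Long, mean: Double, stdDeviation: Double): Double = {
    val e = expTerm(timeDiff, mean, stdDeviation)
    if (timeDiff > mean) -math.log10(e / (1.0 + e))
    else -math.log10(1.0 - 1.0 / (1.0 + e))
  }

  def main(args: Array[String]): Unit = {
    val mean = 1000.0        // assumed sample mean in ms
    val stdDeviation = 100.0 // assumed sample standard deviation in ms
    for (t <- Seq(500L, 1000L, 1500L, 2000L, 3000L))
      println(s"timeDiff = $t  original = ${originalPhi(t, mean, stdDeviation)}  " +
        s"revised = ${revisedPhi(t, mean, stdDeviation)}")
  }
}
```

With these sample values the two functions agree closely at 1500 ms, while at 2000 ms the original returns Infinity and the revised version returns a finite phi.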

@bantonsson
Member

Hi @robdavid ,

That is a very nice bug report and analysis/solution.

Akka doesn't use GitHub for issue tracking. Could you please open a ticket in Assembla here:
http://www.assembla.com/spaces/akka/tickets

Following the instructions here:
http://doc.akka.io/docs/akka/current/project/issue-tracking.html

Since you seem to have solved the problem, could you open a pull request with your contributions as well?
https://github.com/akka/akka/blob/master/CONTRIBUTING.md

@robdavid
Author

robdavid commented Nov 6, 2013

OK, ticket #3706 created.

@bantonsson
Member

Thank you very much.
