I have been testing Akka on a resource-constrained cluster of multiple JVMs on a single physical machine, with the result that heartbeat arrival times vary widely, presumably because GC is running much of the time. I tried setting a high phi threshold to cope with this, but found it was ineffective. So I grabbed a copy of the PhiAccrualFailureDetector and added some logging to figure out why. I found that the values of phi it calculated would suddenly jump from approx. 15 to infinity.
In the PhiAccrualFailureDetector, the calculation of phi makes use of a function that approximates the cumulative distribution function of a normal distribution at a given point on the bell curve:
/**
 * Cumulative distribution function for N(mean, stdDeviation) normal distribution.
 * This is an approximation defined in β Mathematics Handbook (Logistic approximation).
 * Error is 0.00014 at +- 3.16
 */
private[apakgroup] def cumulativeDistributionFunction(x: Double, mean: Double, stdDeviation: Double): Double = {
  val y = (x - mean) / stdDeviation
  // Cumulative distribution function for N(0, 1)
  1.0 / (1.0 + math.exp(-y * (1.5976 + 0.070566 * y * y)))
}
This is then used to calculate a value for phi as follows:
val phi = -math.log10(1.0 - cumulativeDistributionFunction(timeDiff, mean, stdDeviation))
For the sake of argument, let E be the value of math.exp(-y * (1.5976 + 0.070566 * y * y))
Then the calculation of phi boils down to
phi = -log10(1 - 1/(1+E))
For small values of E (below roughly 1E-16), the limited precision of a double means that E is effectively discarded: it is so much smaller than 1.0 that it “falls off” the mantissa when the two are added. This means that 1 - 1/(1+E) becomes 1 - 1/1 = 0, and the value of phi becomes infinite. Effectively this limits the maximum meaningful value of phi that can be returned to approx. 15 to 16.
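The absorption described above can be reproduced in isolation. This is only a minimal sketch: the object name `PrecisionLossDemo` and the helper `phiOriginal` are made up for illustration and are not part of Akka; the helper just evaluates the expression -log10(1 - 1/(1+E)) for a given E.

```scala
object PrecisionLossDemo {
  // Hypothetical helper: the original expression from the issue,
  // phi = -log10(1 - 1/(1+E)), evaluated directly on a given E.
  def phiOriginal(e: Double): Double =
    -math.log10(1.0 - 1.0 / (1.0 + e))

  def main(args: Array[String]): Unit = {
    val tiny = 1e-17
    // E is absorbed: adding it to 1.0 leaves 1.0 unchanged.
    println(1.0 + tiny == 1.0)
    // 1 - 1/1 is 0, log10(0) is -Infinity, so phi comes out as Infinity.
    println(phiOriginal(tiny))
    // For a larger E the result is still finite (close to 10 here).
    println(phiOriginal(1e-10))
  }
}
```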
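However, the expression within the log can be rearranged algebraically:
1 - 1 / (1+E)
= (1+E) / (1+E) - 1 / (1+E)
= E / (1+E)
And the phi calculation becomes
phi = -log10(E/(1+E))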
Now for small values of E, the loss of precision on 1+E means that E/(1+E) = E/1 = E and phi = -log10(E). The maximum value of phi is then constrained only by the smallest positive value a double can represent (approx. 4.9E-324), which puts the ceiling at a phi of approx. 323. The approximation E/(1+E) = E for small values of E seems like the more correct one for floating-point calculations with limited precision.
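To see the effect of the rearrangement on its own, here is a small sketch that evaluates -log10(E/(1+E)) for a few very small values of E. The names `RearrangedPhiDemo` and `phiRearranged` are assumptions for illustration only, not Akka identifiers.

```scala
object RearrangedPhiDemo {
  // Hypothetical helper: the rearranged expression, phi = -log10(E/(1+E)).
  def phiRearranged(e: Double): Double =
    -math.log10(e / (1.0 + e))

  def main(args: Array[String]): Unit = {
    // 1 + E still rounds to 1, but E survives in the numerator,
    // so phi is roughly -log10(E) instead of Infinity.
    println(phiRearranged(1e-17))
    println(phiRearranged(1e-250))
    // The smallest positive double sets the ceiling on finite phi.
    println(phiRearranged(Double.MinPositiveValue))
    // Once E underflows to exactly 0, phi is Infinity again.
    println(phiRearranged(0.0))
  }
}
```

Note that the last case still yields Infinity, so the rearrangement raises the ceiling rather than removing it.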
In testing this alternate formulation, however, I have found that values of E = Infinity are possible, presumably due to large enough negative values of y in math.exp(-y * (1.5976 + 0.070566 * y * y)). The previous formulation handles this correctly and gives 1 for 1-1/(1+Infinity) whereas Infinity/(1+Infinity) yields NaN. Therefore my solution is to use the new calculation for values that are greater than the mean, and the old calculation otherwise. My final version of the phi function is:
private[apakgroup] def phi(timeDiff: Long, mean: Double, stdDeviation: Double): Double = {
  val y = (timeDiff - mean) / stdDeviation
  val e = math.exp(-y * (1.5976 + 0.070566 * y * y))
  // Above the mean, use the rearranged form so that small values of e are not lost.
  // Otherwise use the original form, which copes with e = Infinity.
  if (timeDiff > mean) -math.log10(e / (1.0 + e))
  else -math.log10(1.0 - 1.0 / (1.0 + e))
}
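The edge cases that motivate the two branches can be checked with a short sketch. It copies the function above into a standalone object (dropping the `private[apakgroup]` qualifier so it compiles outside that package); the object name `PhiEdgeCases` and the sample values of mean, standard deviation and timeDiff are assumptions chosen only to hit each branch.

```scala
object PhiEdgeCases {
  // Copy of the combined phi function, without the package-private qualifier.
  def phi(timeDiff: Long, mean: Double, stdDeviation: Double): Double = {
    val y = (timeDiff - mean) / stdDeviation
    val e = math.exp(-y * (1.5976 + 0.070566 * y * y))
    if (timeDiff > mean) -math.log10(e / (1.0 + e))
    else -math.log10(1.0 - 1.0 / (1.0 + e))
  }

  def main(args: Array[String]): Unit = {
    val inf = Double.PositiveInfinity
    // Why the rearranged form cannot be used everywhere: Infinity / Infinity is NaN.
    println((inf / (1.0 + inf)).isNaN)

    // Illustrative values: mean 1000 ms, standard deviation 10 ms.
    // Far below the mean (y = -100), e overflows to Infinity and the
    // original form returns a phi of zero rather than NaN.
    println(phi(0L, 1000.0, 10.0))
    // Well above the mean (y = 15), e is far below 1E-16, yet phi stays finite.
    println(phi(1150L, 1000.0, 10.0))
  }
}
```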
This seems robust in the testing I’ve done so far, yielding the same value of phi as previously for phi < 15 approx., but allowing me to usefully set a very high threshold (e.g. 250).