fix: align stop direction calculation with OBA Java semantics#477
Conversation
aaronbrethorst
left a comment
There was a problem hiding this comment.
Hey Adel, thanks for taking the time to dig into the Java reference implementation and fix these direction calculation discrepancies. The flat-earth approximation and shape point window corrections are solid work, and I can see from the screenshot that these changes produce visibly better results. Before we can merge this, I need you to make a couple of changes:
Critical
1. Revert the variance-to-standard-deviation change -- the old Go code was correct
The PR description describes a "Square Root Trap" where the previous Go code "incorrectly applied math.Sqrt() before comparing it against the varianceThreshold." I cross-referenced the Java reference implementation in GenerateNarrativesTask.java and the old Go code was actually correct here.
The Java code does:
double yVariance = Descriptive.sampleVariance(ys, yMu); // sum((y-mean)^2) / (n-1)
double yStdDev = Descriptive.sampleStandardDeviation(ys.size(), yVariance); // sqrt(yVariance)
if (yStdDev > _stopDirectionStandardDeviationThreshold ...) // threshold = 0.7Descriptive.sampleVariance computes sample variance with an n-1 denominator, and sampleStandardDeviation takes that sample variance and returns sqrt(sampleVariance). The Java threshold variable is even named _stopDirectionStandardDeviationThreshold.
So the Java compares standard deviation (with sqrt) to 0.7, using sample variance (n-1 denominator). The old Go code did exactly this:
xStdDev := math.Sqrt(xVariance) // where variance used (n-1) denominator
if xStdDev > adc.varianceThreshold ...The new Go code removes sqrt and switches to population variance (n denominator), which diverges from Java:
// New code - does NOT match Java
if xVariance > adc.varianceThreshold ... // comparing variance (not stddev) to 0.7This changes the effective threshold. Comparing variance > 0.7 is equivalent to comparing stddev > 0.836, making the Go code more permissive than Java for edge cases near the boundary.
What to fix: Revert both changes:
- Keep
math.Sqrt()before the threshold comparison (lines 237-240) - Keep sample variance (
n-1denominator) in thevariance()function (lines 402-411)
The flat-earth and window bounds fixes are the real corrections here -- the sqrt/variance changes should be left as they were.
Important
2. The near-zero check diverges from Java (but is better)
The change from exact comparison (xMu == 0.0 && yMu == 0.0) to epsilon comparison (math.Abs(xMu) < 1e-6 && math.Abs(yMu) < 1e-6) is good defensive coding, but the Java code uses exact == 0.0. This is unlikely to cause any practical difference, but please add a brief comment noting this is an intentional improvement over Java's exact comparison:
// Intentional improvement over Java's exact == 0.0 comparison;
// floating-point mean of cos/sin values is unlikely to be exactly zero.
if math.Abs(xMu) < 1e-6 && math.Abs(yMu) < 1e-6 {This way future maintainers know the divergence is deliberate.
Fit and Finish
3. Consider renaming varianceThreshold to standardDeviationThreshold
Since the Java names this _stopDirectionStandardDeviationThreshold and the value is compared against a standard deviation (after sqrt), renaming the Go constant and field would improve clarity:
const defaultStandardDeviationThreshold = 0.7This is optional but would make the code self-documenting about what the threshold actually means.
What's good about this PR:
- The flat-earth approximation (
math.Atan2(dy, dx)with cosine-adjusted longitude) is exactly right and matches the Java algorithm - The shape point window bounds fix (
indexTo >= len→indexTo = len-1, usingshapePoints[indexTo]directly) eliminates the off-by-one - The
indexTo > indexFromcondition (instead ofindexTo > indexFrom+1) correctly allows orientation calculation when from and to are adjacent points - Clear, well-researched PR description -- the effort to trace each discrepancy back to the Java source is appreciated
Thanks again, and I look forward to merging this change once the variance/stddev issue is resolved.
3db3ecd to
13c08d2
Compare
aaronbrethorst
left a comment
There was a problem hiding this comment.
Hey Adel, thanks for the quick turnaround on addressing the review feedback -- every item has been handled correctly. Here's a summary of what I checked:
Previously Requested Changes -- All Addressed
1. Variance/standard deviation revert (Critical) -- Fixed
The variance() function correctly uses the n-1 denominator (sample variance), and math.Sqrt() is applied before the threshold comparison -- matching the Java Descriptive.sampleStandardDeviation behavior. The test assertion (2.5 for [1, 2, 3, 4, 5]) confirms sample variance is in use.
Additionally, the guard clause change from len(values) == 0 to len(values) <= 1 is a smart fix -- without it, a single-element input would cause a division-by-zero panic with the n-1 denominator.
2. Epsilon comparison comment (Important) -- Added
The comment at line 231-232 clearly documents that the epsilon-based near-zero check is an intentional improvement over Java's exact == 0.0 comparison.
3. Rename to standardDeviationThreshold (Fit and Finish) -- Done
All instances renamed consistently across the constant, struct field, setter method, panic message, DirectionPrecomputer wrapper, and test functions. No stale references remain.
Original Fixes -- Still Correct
- Flat-earth approximation: The
math.Atan2(dy, dx)with cosine-adjusted longitude matches the Java algorithm - Shape point window bounds: The off-by-one fix and adjacent-point condition are both correct
- Tests: Updated assertions align with sample variance behavior; all pass
Fit and Finish
1. PR description is now outdated
The PR description still references the "Square Root Trap" and population variance changes, which were reverted per feedback. Consider updating the description to reflect the final state of the PR so future readers aren't confused by the discrepancy.
Verdict
Merge. All requested changes have been properly addressed and no new issues were introduced.
Resolves #135
What changed?
This PR completely aligns the Golang
AdvancedDirectionCalculatorwith the original OBA Java implementation. Previously, the Go server produced different or missing stop directions due to several mathematical inconsistencies compared to the Java logic.Key mathematical & logical fixes introduced:
math.Sqrt()before comparing it against thevarianceThreshold(0.7). This caused valid directions to be discarded. This PR compares the variance directly, matching Java.variancefunction to calculate Population Variance (dividing byninstead ofn-1) to perfectly match the exact scaling of the Java threshold.utils.BearingBetweenPointswith the OBA Java's exact flat-earth coordinate approximation usingmath.Atan2(dy, dx * cos(lat))for short distances.indexTo-1changed toindexTo) to strictly match Java's 5-point window span.Population Varianceoutputs and ensure complete coverage for the new mathematical logic.@aaronbrethorst
fixes : #135