Skip to content

fix: clamp content differences to prevent FMA-induced numerical instability#1455

Open
barendgehrels wants to merge 1 commit intoboostorg:developfrom
barendgehrels:fix/issue-1452-index-content
Open

fix: clamp content differences to prevent FMA-induced numerical instability#1455
barendgehrels wants to merge 1 commit intoboostorg:developfrom
barendgehrels:fix/issue-1452-index-content

Conversation

@barendgehrels
Copy link
Copy Markdown
Collaborator

Might fix #1452

@tinko92 could you verify if this helps the problem?

The fix is by Claude Code (reviewed/directed by me), it argues:

 Analysis

  The core invariant being violated

  When box_exp is expanded from box_orig to include an indexable, we have box_exp ⊇ box_orig, so mathematically content(box_exp) >=
  content(box_orig) and content_diff >= 0. With FMA fusion, this invariant breaks: the multiply inside content() fuses with the
  subsequent subtraction differently for the two calls, yielding negative or spuriously non-zero results.

  The dangerous patterns

  Every call site follows one of two shapes:

  Shape 1 — "enlargement cost" (5 sites): content(expanded) - content(original)
  - insert.hpp:92
  - rstar/choose_next_node.hpp:120, 249
  - quadratic/redistribute_elements.hpp:294-295
  - linear/redistribute_elements.hpp:435-436

  Shape 2 — "free content" (1 site): content(union) - content(a) - content(b)
  - quadratic/redistribute_elements.hpp:77-79

  In all cases the result is mathematically guaranteed >= 0, but FMA can make it negative or spuriously non-zero.

  Why negative matters more than "slightly different positive"

  When content_diff becomes negative for a box that didn't actually expand, that box appears to shrink when receiving the new element —
  making it the "best" choice by a wide margin. This cascades: wrong node selections lead to a suboptimal tree structure (6685 nodes
  instead of 6763 in the reported case), which means far less effective pruning during queries.

  The fix: clamp to the known invariant

  Since we know these differences are >= 0, clamping them is mathematically sound and eliminates the FMA issue without suppressing any  compiler optimizations globally:

  content_type content_diff = (std::max)(content_type(0),
      content - index::detail::content(ch_i.first));

  This fixes both failure modes:
  - Negative values become exactly 0 (correct: the box didn't enlarge)
  - "Should be zero but is epsilon" — while (std::max) doesn't fix this, it doesn't need to. Two boxes that both don't enlarge will both   clamp to exactly 0, so equality comparisons work. Two boxes that both enlarge by a tiny positive amount are just heuristic  tie-breakers where sub-ULP differences are irrelevant.

  Interesting existing precedent

  The code at rstar/choose_next_node.hpp:135 already does epsilon-aware comparison:
  if ( min_content_diff < -std::numeric_limits<content_type>::epsilon()
    || std::numeric_limits<content_type>::epsilon() < min_content_diff )
  This shows the codebase already anticipated near-zero issues, just not systematically.

  What about the == comparisons?

  Lines like content_diff == smallest_content_diff and content1 == content2 are used as tie-breakers. After clamping, the most important
   case — two equal boxes both showing zero enlargement — will compare as exactly 0 == 0. For two genuinely different enlargements,
  sub-ULP differences in tie-breaking are harmless for the heuristic.

  Concrete changes needed:

  │ insert.hpp                          │ 92      │ Clamp content_diff                 │
  │ rstar/choose_next_node.hpp          │ 120     │ Clamp content_diff                 │
  │ rstar/choose_next_node.hpp          │ 249     │ Clamp content_diff                 │
  │ quadratic/redistribute_elements.hpp │ 77-79   │ Clamp free_content                 │
  │ quadratic/redistribute_elements.hpp │ 294-295 │ Clamp both content_increase values │
  │ linear/redistribute_elements.hpp    │ 435-436 │ Clamp both content_increase values │

I think this makes sense, looks good - and locally it did not break any unit tests (the RST always failed for me - on the fly that is fixed now as well, see other PR)

out_content_increase1 = content_incrase1;
out_content_increase2 = content_incrase2;
out_content_increase1 = content_increase1;
out_content_increase2 = content_increase2;
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also fixed a typo, that's why the diff is a bit larger here

@tinko92
Copy link
Copy Markdown
Collaborator

tinko92 commented Mar 31, 2026

I tried something similar to what the language model outputs here before I wrote #1453 , but instead of clamping, I tested the values of the content calls directly for equality. I went into a different direction because

  1. it added an extra branch and prevented contraction to FMA (because the compiler needed to compute both intermediate results for comparing anyway), so no advantage over turning of contraction for content directly. This clamping is better though because max is branchless and it still allows applying FMA, so 1. is resolved.

  2. It did not fully resolve the issue.

Consider the following test using the instrumented main.cpp and the data.csv from the issue:

#!/usr/bin/env bash

(cd ../../boost/libs/geometry/ && git reset --hard -q && git checkout $1 -q && sed -i '504i\cross_track_calls++;' include/boost/geometry/strategies/geographic/distance_cross_track.hpp && git branch --show-current)
g++ -g -I../../boost/ -DNDEBUG -O3 -march=$2 main.cpp -std=c++20 -o a.out && ./a.out | grep -E "^2:|query all took|cross_track calls:"

Then we get:

bartels@DESKTOP-BPNV3G3:~/dev/bg_1452/test$ sh test.sh develop x86-64-v2 # Baseline without issue (v2)
develop
2: 6763
query all took :   3460.036 ms
cross_track calls: 3303357

bartels@DESKTOP-BPNV3G3:~/dev/bg_1452/test$ sh test.sh develop x86-64-v3 # Issue (v3)
develop
2: 6685
query all took :   7346.178 ms
cross_track calls: 6117432

bartels@DESKTOP-BPNV3G3:~/dev/bg_1452/test$ sh test.sh fix/index-content-fp-contract-off x86-64-v3 #1453 
fix/index-content-fp-contract-off
2: 6763 # Probably the same tree structure
query all took :   3496.460 ms
cross_track calls: 3306476 # Minor difference in distance calls, might be due to difference in distance itself.

bartels@DESKTOP-BPNV3G3:~/dev/bg_1452/test$ sh test.sh fix/issue-1452-index-content x86-64-v3 # This PR
fix/issue-1452-index-content
2: 6697 # Different tree structure from baseline.
query all took :   4120.981 ms
cross_track calls: 4200675 # Still a severe regression in the number of distance calls.

So there almost certainly call sites of content that were not covered by the LLM-generated patch. I still like the change in itself, though, because it seems like a sensibly defensive thing to do that causes no branching and prevents potentially nonsensical values. It may be sensible to merge both PRs (the original PR also e.g. prevents issues with contraction of content at potential future call sites or existing sites unrelated to this bug if there are any).

Copy link
Copy Markdown
Collaborator

@tinko92 tinko92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approve merging these changes, the one formatting comment is no blocker for me.

content_type content_increase1 = enlarged_content1 - content1;
content_type content_increase2 = enlarged_content2 - content2;
content_type const content_increase1 = (std::max)(content_type(0), enlarged_content1 - content1);
content_type const content_increase2 = (std::max)(content_type(0), enlarged_content2 - content2);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: These violate our max line length of 100.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Geometry Index GCC arch x86-64-v3

2 participants