Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Throw an exception when potentially losing precision in FDBSCAN-DenseBox #560

Merged
merged 1 commit into from
Oct 12, 2022

Conversation

aprokop
Copy link
Contributor

@aprokop aprokop commented Oct 19, 2021

For some problems and eps, FDBSCAN-DenseBox may suffer from severe loss
of precision when computing cellBox(). This may result in wrong DBSCAN
results.

Initially, the problem was observed on a full NGSIM dataset:

$ ./ArborX_DBSCAN.exe --binary \
    --filename /data/data_and_logs/2021_02_20_dbscan_datasets/mustafa_2019/ngsim.arborx \
    --print-dbscan-timers --core-min-size 2 --eps 1.0 --max-num-points 90000 \
    --impl fdbscan-densebox --verify
<...>
Core point is marked as noise: 41888 [-1]
Core point is marked as noise: 17027 [-1]
Core point is marked as noise: 43104 [-1]

Tracking down the issue revealed that a point 2279 (part of a dense
cell) was finding point 10369 (not part of a dense cell), but the
reverse was not true (i.e., 10369 was not finding the dense cell that
2279 was in). Studying the dense boxes, I observed that the problematic
box had bounds

(6452399.000000,1872182.375000,0.000000)
(6452399.000000,1872183.000000,0.577350)

Note the same x coordinate for both min and max corner. The h in this
run was ~0.57. So it completely got lost. Even Y coordinate lost some
precision, though not as much.

I'm not sure how robust this is. I also don't know what the best way to indicate this to user. My issue with using ARBORX_ASSERT is that when a user compiles ArborX in release mode, this will never get noticed. On the other hand, hardcoding a printf statement contradicts xSDK policy on having no hardcoded printouts.

@aprokop
Copy link
Contributor Author

aprokop commented Oct 19, 2021

It just occurred to me that it may be possible to propely fix this. As long as the cell box is expanded to contain the points inside it, this should produce correct result. The question is: how does one guarantee that the (rounded) float_round(a + x) <= exact(a + x) and float_round(a + y) >= exact(a + y)?

But I wonder if this is the only place. We do some similar calculations in determining cell indices and such. Maybe it's just easier to warn.

@aprokop aprokop added the bug Something isn't working label Oct 31, 2021
@aprokop aprokop marked this pull request as ready for review January 1, 2022 18:25
@aprokop
Copy link
Contributor Author

aprokop commented Jan 1, 2022

Currently I don't have a better idea, so lets go with the current version.

@aprokop
Copy link
Contributor Author

aprokop commented Jan 21, 2022

Please review.

@dalg24
Copy link
Contributor

dalg24 commented Jan 21, 2022

My issue with using ARBORX_ASSERT is that when a user compiles ArborX in release mode, this will never get noticed.

As currently implemented you would always get the runtime error

// FIXME: Unconditionally assert for now
// Once moved out, possibly make it conditional
#define ARBORX_ASSERT(c) \
if (!(c)) \
throw ArborX::SearchException(#c ", failed at " __FILE__ \
":" ARBORX_STRINGIZE(__LINE__) ".")

@aprokop
Copy link
Contributor Author

aprokop commented Jul 21, 2022

@dalg24 Can we merge this PR in its current form? It's not robust, but improves status quo.

@aprokop
Copy link
Contributor Author

aprokop commented Sep 15, 2022

Please review.

Copy link
Contributor

@dalg24 dalg24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Must fix the style

src/details/ArborX_DetailsCartesianGrid.hpp Outdated Show resolved Hide resolved
src/details/ArborX_DetailsCartesianGrid.hpp Outdated Show resolved Hide resolved
For some problems and eps, FDBSCAN-DenseBox may suffer from severe loss
of precision when computing cellBox(). This may result in wrong DBSCAN
results.

Initially, the problem was observed on a full NGSIM dataset:
```
$ ./ArborX_DBSCAN.exe --binary \
    --filename /data/data_and_logs/2021_02_20_dbscan_datasets/mustafa_2019/ngsim.arborx \
    --print-dbscan-timers --core-min-size 2 --eps 1.0 --max-num-points 90000 \
    --impl fdbscan-densebox --verify
<...>
Core point is marked as noise: 41888 [-1]
Core point is marked as noise: 17027 [-1]
Core point is marked as noise: 43104 [-1]
```
Tracking down the issue revealed that a point 2279 (part of a dense
cell) was finding point 10369 (not part of a dense cell), but the
reverse was not true (i.e., 10369 was not finding the dense cell that
2279 was in). Studying the dense boxes, I observed that the problematic
box had bounds
  [[6452399.000000,1872182.375000,0.000000],
   [6452399.000000,1872183.000000,0.577350]]
Note the same x coordinate for both min and max corner. The h in this
run was ~0.57. So it completely got lost. Even Y coordinate lost some
precision, though not as much.
@aprokop aprokop merged commit cca601e into arborx:master Oct 12, 2022
@aprokop aprokop deleted the warn_loss_of_precision branch October 12, 2022 04:22
@aprokop aprokop changed the title Warn about loss of precision in FDBSCAN-DenseBox Throw an exception when potentially losing precision in FDBSCAN-DenseBox Oct 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants