Skip to content

Commit

Permalink
Warn about loss of precision in FDBSCAN-DenseBox
Browse files Browse the repository at this point in the history
For some problems and eps, FDBSCAN-DenseBox may suffer from severe loss
of precision when computing cellBox(). This may result in wrong DBSCAN
results.

Initially, the problem was observed on a full NGSIM dataset:
```
$ ./ArborX_DBSCAN.exe --binary \
    --filename /data/data_and_logs/2021_02_20_dbscan_datasets/mustafa_2019/ngsim.arborx \
    --print-dbscan-timers --core-min-size 2 --eps 1.0 --max-num-points 90000 \
    --impl fdbscan-densebox --verify
<...>
Core point is marked as noise: 41888 [-1]
Core point is marked as noise: 17027 [-1]
Core point is marked as noise: 43104 [-1]
```
Tracking down the issue revealed that a point 2279 (part of a dense
cell) was finding point 10369 (not part of a dense cell), but the
reverse was not true (i.e., 10369 was not finding the dense cell that
2279 was in). Studying the dense boxes, I observed that the problematic
box had bounds
  [[6452399.000000,1872182.375000,0.000000],
   [6452399.000000,1872183.000000,0.577350]]
Note the same x coordinate for both min and max corner. The h in this
run was ~0.57. So it completely got lost. Even Y coordinate lost some
precision, though not as much.
  • Loading branch information
aprokop committed Jul 21, 2022
1 parent 136143d commit 19d681a
Showing 1 changed file with 27 additions and 0 deletions.
27 changes: 27 additions & 0 deletions src/details/ArborX_DetailsFDBSCANDenseBox.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,28 @@ struct CartesianGrid
constexpr auto max_size_t = std::numeric_limits<size_t>::max();
ARBORX_ASSERT(_nx == 0 || _ny == 0 || _nz == 0 ||
(_ny < max_size_t / _nx && _nz < max_size_t / (_nx * _ny)));

// Catch a potential loss of precision that may happen in cellBox() and can
// lead to wrong results. Only FDBSCAN-DenseBox suffers from this issue.
// FDBSCAN uses only min/max to construct bounding volumes, and thus has no
// loss.
//
// The machine precision by itself is not sufficient. In some experiments
// run with a full NGSIM datasets, values below 3 could still produce wrong
// results. This may still not be conservative enough, but all runs passed
// verification when this warning was not triggered.
float constexpr eps = 5 * std::numeric_limits<float>::epsilon();
if (std::abs(_h / min_corner[0]) < eps ||
std::abs(_h / min_corner[1]) < eps ||
std::abs(_h / min_corner[2]) < eps ||
std::abs(_h / max_corner[0]) < eps ||
std::abs(_h / max_corner[1]) < eps ||
std::abs(_h / max_corner[2]) < eps)
std::cerr
<< "ArborX warning: FDBSCAN-DenseBox algorithm may experience loss "
"of precision, producing wrong results. Please consider switching "
"to FDBSCAN."
<< std::endl;
}

KOKKOS_FUNCTION
Expand All @@ -72,6 +94,11 @@ struct CartesianGrid
auto i = cell_index % _nx;
auto j = (cell_index / _nx) % _ny;
auto k = cell_index / (_nx * _ny);

// This code may suffer from loss of precision depending on the problem
// bounds and h. We try to detect this case in the constructor. It could
// have been avoided if one computes min and max over all points in a dense
// cell. However, that may be expensive, and thus we avoid that.
return {{min_corner[0] + i * _h, min_corner[1] + j * _h,
min_corner[2] + k * _h},
{min_corner[0] + (i + 1) * _h, min_corner[1] + (j + 1) * _h,
Expand Down

0 comments on commit 19d681a

Please sign in to comment.