Update DBSCAN runner to use multi-dimensional geometries and migrate it to benchmarks #736

aprokop · 2022-09-03T04:04:23Z

Move DBSCAN driver from examples to benchmark
Allow running the driver for dimensions 2-6 (with Kokkos 3.7+)
Update the test file input.txt to make sure it's read as 3D data
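
For readers unfamiliar with the pattern, a driver that supports several dimensions typically maps the runtime dimension onto a compile-time template parameter. The sketch below is not the code from this PR; `countPoints` and `dispatch` are hypothetical names, and the only thing taken from the description above is the supported range 2-6.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a templated benchmark body: it interprets a flat
// coordinate array as DIM-dimensional points and returns how many there are.
template <int DIM>
int countPoints(std::vector<float> const &coords)
{
  static_assert(DIM >= 2 && DIM <= 6, "only dimensions 2-6 are supported");
  assert(coords.size() % DIM == 0); // the flat array must hold whole points
  return static_cast<int>(coords.size() / DIM);
}

// Map the runtime dimension onto the matching compile-time instantiation.
int dispatch(int dim, std::vector<float> const &coords)
{
  switch (dim)
  {
  case 2:
    return countPoints<2>(coords);
  case 3:
    return countPoints<3>(coords);
  case 4:
    return countPoints<4>(coords);
  case 5:
    return countPoints<5>(coords);
  case 6:
    return countPoints<6>(coords);
  default:
    throw std::invalid_argument("unsupported dimension: " +
                                std::to_string(dim));
  }
}
```

Each `case` forces a separate instantiation of the templated body, which is why supporting more dimensions increases compile time.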

dalg24

Do you plan on re-introducing a clustering example?

aprokop · 2022-09-08T14:41:44Z

Do you plan on re-introducing a clustering example?

Yes. It will build a simple point cloud, call the DBSCAN interface, and print the number of clusters found along with the number of points in each.
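
The reporting half of such an example could look roughly like the sketch below. It deliberately leaves out the DBSCAN call itself and assumes the clustering step has already produced one integer label per point, with -1 marking noise; that labeling convention and the names `clusterSizes` and `printSummary` are assumptions made for illustration, not the planned example.

```cpp
#include <cassert>
#include <map>
#include <ostream>
#include <vector>

// Assumption: labels[i] is the cluster id of point i, and -1 marks noise.
// Returns a map from cluster id to the number of points in that cluster.
std::map<int, int> clusterSizes(std::vector<int> const &labels)
{
  std::map<int, int> sizes;
  for (int label : labels)
  {
    assert(label >= -1); // anything below -1 is not a valid label here
    if (label != -1)
      ++sizes[label];
  }
  return sizes;
}

// Print the number of clusters followed by the size of each one.
void printSummary(std::ostream &os, std::map<int, int> const &sizes)
{
  os << "Found " << sizes.size() << " cluster(s)\n";
  for (auto const &entry : sizes)
    os << "  cluster " << entry.first << ": " << entry.second
       << " point(s)\n";
}
```

Calling `printSummary(std::cout, clusterSizes(labels))` (with `<iostream>` included) would then produce the summary described above.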

dalg24

What benefit from the explicit instantiation (as implemented here) did you see?

dalg24 · 2022-09-10T10:37:26Z

benchmarks/dbscan/dbscan_timpl.hpp

+template <int DIM>
+std::vector<Point<DIM>> sampleData(std::vector<Point<DIM>> const &data,
+                                   int num_samples)
+{
+  std::vector<Point<DIM>> sampled_data(num_samples);
+
+  std::srand(1337);
+
+  // Knuth algorithm
+  auto const N = (int)data.size();
+  auto const M = num_samples;
+  for (int in = 0, im = 0; in < N && im < M; ++in)
+  {
+    int rn = N - in;
+    int rm = M - im;
+    if (std::rand() % rn < rm)
+      sampled_data[im++] = data[in];
+  }
+  return sampled_data;
+}


Not having this defined next to getDataDimensions hurts readability.

aprokop (Sep 10, 2022): Not sure what you are talking about. This function is completely independent from anything else.

dalg24 (Sep 10, 2022): Both function implementations look at the same input data, and you are moving them away from each other.

aprokop (Sep 10, 2022): But functionally they have nothing to do with each other. Sampling can be done on any input, independent of how that input is constructed. For example, with the Gan-Tao generator you would generate the data without reading it in, and would still be able to sample it. So I see zero reason to keep them in the same place.

dalg24 (Sep 12, 2022): I thought I was commenting about loadData. Where did that function go?

aprokop (Sep 12, 2022): It seemed to me that you were commenting about getDataDimensions. Nevertheless, I stand by my point that sampling has nothing to do with how the data is constructed, so there is no need to keep the two together. However, I'll move sampleData after loadData if it makes you happier.
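
To illustrate the point that sampling does not depend on where the data came from, here is a standalone sketch of the same selection-sampling idea (the "Knuth algorithm" in the diff above) written against any `std::vector<T>`. It is not the PR's `sampleData`: the name `selectionSample` is hypothetical, `std::rand` is swapped for `std::mt19937` with `std::uniform_int_distribution`, and the sample count is clamped to the input size, which is an assumption about the desired behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

// Selection sampling: walk the input once and keep each element with
// probability (picks still needed) / (elements still unseen). The result
// holds min(num_samples, data.size()) elements in their original order.
template <typename T>
std::vector<T> selectionSample(std::vector<T> const &data, int num_samples,
                               unsigned seed = 1337)
{
  assert(num_samples >= 0);
  int const n = static_cast<int>(data.size());
  int const m = std::min(num_samples, n); // assumption: clamp oversized requests

  std::vector<T> sampled;
  sampled.reserve(m);

  std::mt19937 gen(seed);
  for (int i = 0; i < n && static_cast<int>(sampled.size()) < m; ++i)
  {
    int const remaining_items = n - i;
    int const remaining_picks = m - static_cast<int>(sampled.size());
    // Uniform integer in [0, remaining_items - 1]
    std::uniform_int_distribution<int> dist(0, remaining_items - 1);
    if (dist(gen) < remaining_picks)
      sampled.push_back(data[i]);
  }
  return sampled;
}
```

Once the number of unseen elements equals the number of picks still needed, every remaining element is accepted, so the function always returns exactly the clamped number of samples.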

aprokop · 2022-09-10T11:15:04Z

What benefit from the explicit instantiation (as implemented here) did you see?

Build times for the full dbscan directory (including the converter):

Single dimension: 11.1s
Five dimensions, no ETI: 36s
Five dimensions, with ETI (make -j8): 12s
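
For anyone unfamiliar with explicit template instantiation (ETI), the mechanism can be sketched in a single file as below. The file split described in the comments and the names `Point` (a minimal stand-in here) and `centroid` are assumptions for illustration, not the layout used in this PR; the sketch only shows how `extern template` declarations and explicit instantiation definitions fit together.

```cpp
#include <cassert>
#include <vector>

// --- would live in a header, e.g. a hypothetical centroid.hpp ---
template <int DIM>
struct Point
{
  float coords[DIM];
};

template <int DIM>
Point<DIM> centroid(std::vector<Point<DIM>> const &points);

// Explicit instantiation declarations: translation units that include the
// header do not instantiate these specializations themselves.
extern template Point<2> centroid<2>(std::vector<Point<2>> const &);
extern template Point<3> centroid<3>(std::vector<Point<3>> const &);

// --- would live in an implementation header, e.g. centroid_timpl.hpp ---
template <int DIM>
Point<DIM> centroid(std::vector<Point<DIM>> const &points)
{
  assert(!points.empty()); // the mean of zero points is undefined
  Point<DIM> c{};
  for (auto const &p : points)
    for (int d = 0; d < DIM; ++d)
      c.coords[d] += p.coords[d];
  for (int d = 0; d < DIM; ++d)
    c.coords[d] /= static_cast<float>(points.size());
  return c;
}

// --- would live in one .cpp file per dimension ---
// Explicit instantiation definitions: the code is generated exactly once here.
template Point<2> centroid<2>(std::vector<Point<2>> const &);
template Point<3> centroid<3>(std::vector<Point<3>> const &);
```

If each instantiation definition sits in its own translation unit, a parallel build can compile the dimensions concurrently, which would be consistent with the `make -j8` timing above.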

aprokop · 2022-09-10T19:08:52Z

Something is wrong with the HIP tester:

terminate called after throwing an instance of 'std::runtime_error'
  what():  hipGetDeviceCount(&m_hipDevCount) error( hipErrorNoDevice): hipErrorNoDevice /scratch/kokkos/core/src/HIP/Kokkos_HIP_Instance.cpp:80

but everything else passes

dalg24 · 2022-09-12T17:56:11Z

benchmarks/dbscan/dbscan_timpl.hpp

+template <int DIM>
+std::vector<Point<DIM>> loadData(std::string const &filename,
+                                 bool binary = true, int max_num_points = -1,
+                                 int num_samples = -1)


aprokop added the enhancement New feature or request label Sep 3, 2022

aprokop force-pushed the dbscan_example branch 2 times, most recently from f7f798a to 8a646a7 Compare September 3, 2022 04:11

aprokop changed the title ~~Updated DBSCAN runner to use multi-dimensional geometries and migrate to benchmarks/~~ Updated DBSCAN runner to use multi-dimensional geometries and migrated it to benchmarks Sep 3, 2022

aprokop added the refactoring Code reorganization label Sep 3, 2022

aprokop changed the title ~~Updated DBSCAN runner to use multi-dimensional geometries and migrated it to benchmarks~~ Update DBSCAN runner to use multi-dimensional geometries and migrate it to benchmarks Sep 3, 2022

aprokop force-pushed the dbscan_example branch 2 times, most recently from 3ed17b1 to afd98ca Compare September 3, 2022 15:38

dalg24 reviewed Sep 8, 2022

aprokop added 5 commits September 9, 2022 15:39

Updated DBSCAN runner to use multi-dimensional geometry

852cf56

Move dbscan from examples/ to benchmarks/

a4b5d87

Update dbscan test file to retain behavior

e261477

Reorganize dbscan driver to make it easier for ETI

74a6575

Use explicit instantiation

338e2d9

aprokop force-pushed the dbscan_example branch from afd98ca to cf15c2e Compare September 10, 2022 02:48

dalg24 reviewed Sep 10, 2022

dalg24 reviewed Sep 12, 2022

aprokop force-pushed the dbscan_example branch from e785a98 to 7bac1ed Compare September 12, 2022 18:04

aprokop added 4 commits September 12, 2022 15:43

Changed helper function to get dimension and add FIXME

9111052

Make impl field in Parameters string to reduce inter-file pollination

0d00fae

Bugfix for data reading

48a51a8

Rearrange function order in the benchmark

0f2eaab

aprokop force-pushed the dbscan_example branch from 7bac1ed to 0f2eaab Compare September 12, 2022 19:44

dalg24 approved these changes Sep 13, 2022

aprokop merged commit f9728b5 into arborx:master Sep 13, 2022

aprokop deleted the dbscan_example branch September 13, 2022 02:45

aprokop mentioned this pull request Oct 13, 2022

Add DBSCAN example #763

Merged
