Update: GPU knn BVH search for irregular grid#56
Conversation
…egular spatial search
…zation and restart processes
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
There was a problem hiding this comment.
Pull request overview
This PR upgrades the KNN infrastructure for OpenACC/GPU builds by switching the GPU backend to a BVH-based implementation (via the new knn-bvh submodule), and refactors interpolation/marker routines to use batched KNN queries with improved runtime diagnostics.
Changes:
- Add and wire up the
knn-bvhsubmodule + Makefile targets for OpenACC builds. - Refactor
KNNAPI to support GPU BVH search, batching limits, and returning a results-buffer pointer. - Update interpolation and marker remapping to use batched GPU-direct KNN searches and improved logging/memory reporting.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
.gitmodules |
Adds knn-bvh submodule configuration. |
Makefile |
Integrates knn-bvh include/lib, adds prepare step, and adds clean/build rules for the submodule. |
README.md |
Adds CI badge for the new nvc build workflow. |
array2d.hpp |
Adds helper to pack coordinates into float3 for GPU KNN. |
brc-interpolation.cxx |
Switches node interpolation to batched KNN queries and new KNN API. |
dynearthsol.cxx |
Disables create_neighbor() calls during init/restart. |
knn.hpp |
Refactors KNN interface to return a neighbor*, adds backend/memory reporting helpers. |
knn.cxx |
Replaces hash-grid GPU KNN with BVH-backed search; adds batch sizing and result-buffer handling. |
markerset.cxx |
Switches marker remap/replenish KNN calls to new API and batching with diagnostics. |
nn-interpolation.cxx |
Switches NN/ACM interpolation KNN calls to new API and batching with diagnostics. |
parameters.hpp |
Adjusts neighbor type definition for ACC builds and includes BVH header under ACC. |
remeshing.cxx |
Adds GPU memory reporting and additional acc wait synchronization points; disables neighbor rebuild. |
utils.hpp |
Adds number formatting helper for log output. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| class KNN | ||
| { | ||
| public: | ||
| KNN(const Param& param, const array_t& points_vec_, NANOKDTree& nano_kdtree_, | ||
| double resoTimes = 3); | ||
| KNN(const Param& param, const array_t& points_vec_, NANOKDTree& nano_kdtree_, bool is_msg_ = true, | ||
| int capacity = -1); | ||
| ~KNN(); | ||
|
|
||
| void search(const array_t& queries, neighbor_vec& neighbors, const int nquery, | ||
| int k, double resoTimes = 3); | ||
| // Search for k nearest neighbors. | ||
| neighbor* search(const array_t& queries, const int nquery, const int k_neig, | ||
| bool is_sync_to_host = true, const float* d_guess_radii_sq = nullptr); | ||
|
|
There was a problem hiding this comment.
knn.hpp relies on types like Param, array_t, NANOKDTree, and neighbor without including the headers (or forward declarations) that define them. This makes the header non-self-contained and brittle to include-order changes; add the necessary includes (or forward declare where possible) so #include "knn.hpp" compiles on its own.
| #ifdef ACC | ||
| void pack_to_xyz_float(std::vector<float3>& buffer, std::size_t limit_size = 0) const { | ||
| std::size_t count = (limit_size > 0 && limit_size <= n_) ? limit_size : n_; | ||
|
|
||
| if (buffer.size() < count) | ||
| buffer.resize(count); | ||
|
|
||
| #pragma acc parallel loop gang vector | ||
| for (std::size_t i = 0; i < count; ++i) { | ||
| buffer[i].x = (float)(*this)[i][0]; | ||
| buffer[i].y = (float)(*this)[i][1]; | ||
| #ifdef THREED | ||
| buffer[i].z = (float)(*this)[i][2]; | ||
| #else | ||
| buffer[i].z = 0.0; | ||
| #endif | ||
| } | ||
| } |
There was a problem hiding this comment.
pack_to_xyz_float() uses the float3 type, but this header does not define or include a definition for float3. With current include order (array2d.hpp is included before knn_bvh.hpp), this will fail to compile in ACC builds. Consider defining a project-local POD type for 3 floats in a common header, or include the header that defines float3 before this method is compiled.
|
|
||
| $(KNN_BVH_LIB): | ||
| $(MAKE) -C $(KNN_BVH_DIR) NDIM=$(ndims) |
There was a problem hiding this comment.
$(KNN_BVH_LIB) is only set when openacc=1, but this unconditional rule still expands to a target even when the variable is empty (effectively creating a rule for :). Wrap this rule in the same ifeq ($(openacc), 1) guard (or give KNN_BVH_LIB a safe default) so non-OpenACC builds don't get an invalid/unintended make rule.
| $(KNN_BVH_LIB): | |
| $(MAKE) -C $(KNN_BVH_DIR) NDIM=$(ndims) | |
| ifeq ($(openacc), 1) | |
| $(KNN_BVH_LIB): | |
| $(MAKE) -C $(KNN_BVH_DIR) NDIM=$(ndims) | |
| endif |
…ile; add instructions for submodules in README and DEVELOPING
… -fsanitize=address warnings
1e7b1f9 to
8de695c
Compare
d9d918c to
78280c7
Compare
…move redundant environment variable in nvc-build workflow
…cker build to improve build performance during PR
78280c7 to
de19b62
Compare
a21485f to
468c6d0
Compare
a08a12d to
a6a8f78
Compare
This pull request introduces improvements to the KNN (k-nearest neighbors) search infrastructure, especially for GPU/OpenACC builds, and enhances the efficiency and flexibility of marker and node interpolation routines. The changes include adding the
knn-bvhsubmodule for GPU-accelerated KNN, refactoring theKNNclass API, updating the build system to support the new backend, and improving logging and batching for large-scale searches. These updates aim to improve performance, support larger datasets, and provide clearer diagnostics during execution.KNN backend integration and build system updates:
This change
Commit 5b41399
knn-bvhsubmodule and integrated its build process into theMakefilefor OpenACC-enabled builds. The build now ensures the submodule is initialized and built as needed, and cleans up appropriately. (.gitmodules,Makefile,knn-bvh) [1] [2] [3] [4] [5]KNN class and API refactoring:
KNNclass to support both CPU and GPU backends, added batching support, memory usage reporting, and a new search API that returns a pointer to the result buffer. The constructor now accepts a maximum capacity to minimize reallocations. (knn.hpp)Performance and batching improvements for interpolation and marker routines:
brc-interpolation.cxx,markerset.cxx,nn-interpolation.cxx) [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12]Auxiliary improvements and code cleanup:
Array2Dfor packing data into a GPU-friendly format (pack_to_xyz_float). (array2d.hpp)create_neighborcalls. [1] [2]Documentation and CI:
README.md. (README.md)These changes collectively modernize and optimize the KNN infrastructure, especially for GPU-accelerated workflows, and improve the maintainability and scalability of the codebase.
Close #9