Refactor NSS and DNSS implementations for improved readability and modularity #5

Closed

eclipse0922 wants to merge 2 commits into master from refactor_2

Conversation

@eclipse0922 eclipse0922 commented Feb 8, 2026

User description

  • Rewrote legacy code into clean C++17 style.
  • Added missing method implementations and input validation.
  • Introduced options struct for configurable parameters in NSS and DNSS.
  • Implemented CUDA support for DNSS rotational feature computation.
  • Added new CMake configuration for building with or without CUDA.
  • Updated README to reflect changes and provide build instructions.
  • Added .gitignore to exclude build directories.

PR Type

Enhancement, Bug fix, Documentation


Description

  • Refactored NSS and DNSS implementations into clean C++17 with modular design and options structs.

  • Added input validation, fixed rotational-return math, and replaced legacy bucket operations with efficient lazy removal.

  • Implemented optional CUDA acceleration for DNSS rotational feature computation.

  • Updated README with algorithm notes, build instructions, and references.
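Since the description highlights the new options structs for configurable parameters, here is a minimal sketch of what such a struct might look like. The field names are illustrative assumptions, not the actual members declared in NSS.h:

```cpp
#include <cassert>

// Hypothetical sketch of an NSS options struct in the style this PR
// describes; the real names and defaults in NSS.h may differ.
struct NssOptions
{
    float sample_rate = 0.25f;    // fraction of input points to keep
    int azimuth_buckets = 8;      // normal-space grid resolution (azimuth)
    int polar_buckets = 4;        // normal-space grid resolution (polar)
    unsigned int rng_seed = 42u;  // seed for the std::mt19937 shuffle
};
```

Grouping the knobs into one aggregate keeps the sampling entry points to a single parameter and lets callers rely on designated defaults.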


Changes walkthrough 📝

Relevant files

**Enhancement**

NSS.cpp: Refactor sampling logic and add CUDA support (+545/-310)

  • Replaced legacy bucket-based sampling with modular helper functions and options.
  • Implemented computeCenteredAndScaledVertices, computeRotationalReturnValue, and sphericalBucketIndex.
  • Added a CUDA integration stub and improved normalSpaceSampling and dualNormalSpaceSampling logic.
  • Removed concurrency dependencies in favor of standard library RNG/shuffle.

NSS.h: Modernize header and options (+162/-123)

  • Replaced legacy class definitions with modern options structs.
  • Added a custom glm::fvec3 fallback for portability.
  • Removed Eigen and concurrency dependencies; added method declarations for the new options and CUDA support.

dnss_cuda.cu: Add CUDA implementation for DNSS (+217/-0)

  • Implemented a CUDA kernel for rotational feature computation.
  • Added a host-side wrapper with memory management and error handling.
  • Falls back to the CPU path if CUDA fails or is disabled.

**Configuration changes**

CMakeLists.txt: Add CUDA build configuration (+21/-0)

  • Added a DNSS_ENABLE_CUDA option for optional CUDA acceleration.
  • Configured the C++17 standard and integrated CUDA sources when enabled.
  • Set up proper linking and compilation flags for CUDA.

**Documentation**

README.md: Update documentation and build guide (+50/-10)

  • Updated to describe the refactored implementation, fixes, and build instructions.
  • Added algorithm notes, bucket defaults, and CUDA usage guidance.
  • Included references to the original papers and repositories.
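The walkthrough mentions a new sphericalBucketIndex helper. As a rough illustration of what such a normal-space bucketing function typically does (the actual signature and grid layout in NSS.cpp are not shown in this summary, so treat every name here as an assumption):

```cpp
#include <cmath>

// Hypothetical sketch: map a unit normal (nx, ny, nz) onto a flat index
// over an azimuth x polar grid of buckets. Not the actual NSS.cpp code.
int sphericalBucketIndex(float nx, float ny, float nz,
                         int azimuth_buckets, int polar_buckets)
{
    const float pi = 3.14159265358979323846f;

    // atan2 returns [-pi, pi]; shift into [0, 2*pi) for indexing.
    float azimuth = std::atan2(ny, nx);
    if (azimuth < 0.0f) azimuth += 2.0f * pi;

    // Clamp z before acos to guard against values just outside [-1, 1].
    const float z = std::fmax(-1.0f, std::fmin(1.0f, nz));
    const float polar = std::acos(z); // polar angle in [0, pi]

    int a = static_cast<int>(azimuth / (2.0f * pi) * static_cast<float>(azimuth_buckets));
    int p = static_cast<int>(polar / pi * static_cast<float>(polar_buckets));
    if (a >= azimuth_buckets) a = azimuth_buckets - 1; // keep indices in range
    if (p >= polar_buckets) p = polar_buckets - 1;
    return p * azimuth_buckets + a;
}
```

Bucketing normals this way is what lets normal-space sampling draw points evenly across orientations rather than uniformly over the surface.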


    github-actions bot commented Feb 8, 2026

    PR Description updated to latest commit (0afba09)


    github-actions bot commented Feb 8, 2026

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
    🏅 Score: 85
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Memory leak in CUDA error paths

    In the CUDA implementation (dnss_cuda.cu), when memory allocation or data transfer fails, the code calls cudaFree() on potentially uninitialized pointers (d_vertices, d_normals, etc.) before returning false. Since these pointers are only initialized after successful cudaMalloc(), calling cudaFree() on them in error paths leads to undefined behavior. Additionally, the pointers are not set to nullptr after freeing, which could cause double-free issues if the function is called again or in other error scenarios.
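One conventional fix for this class of bug is to initialize every device pointer to nullptr and route all exits through a single cleanup path; cudaFree(nullptr), like free(nullptr), is a safe no-op, so the cleanup never touches an unallocated pointer. Here is a sketch of the pattern using malloc/free as a stand-in for cudaMalloc/cudaFree (the names are illustrative, not the actual dnss_cuda.cu code):

```cpp
#include <cstdlib>

// Null-init / single-cleanup pattern. free(nullptr) is a no-op, just as
// cudaFree(nullptr) is, so the cleanup path is safe on every exit route.
bool computeWithScratch(std::size_t n)
{
    float *d_vertices = nullptr; // every pointer starts as nullptr
    float *d_normals = nullptr;
    bool ok = false;

    d_vertices = static_cast<float *>(std::malloc(n * sizeof(float)));
    if (d_vertices == nullptr) goto cleanup; // allocation failed

    d_normals = static_cast<float *>(std::malloc(n * sizeof(float)));
    if (d_normals == nullptr) goto cleanup; // d_vertices still freed below

    // ... kernel launch / copies would happen here ...
    ok = true;

cleanup:
    std::free(d_normals);  // safe even if still nullptr
    std::free(d_vertices);
    return ok;
}
```

An RAII wrapper around the device allocation achieves the same guarantee without the goto, at the cost of a small helper type.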

    Race condition in bucket cleanup

    In the DNSS::dualNormalSpaceSampling method, the popTranslationalPoint and popRotationalPoint lambdas use a while loop to skip inactive points by checking is_active[point_index] == 0u. However, this approach is not thread-safe and could lead to inconsistent state if the sampling logic were to be parallelized in the future. More critically, the lazy removal pattern combined with the priority queue (r_buckets) may cause points to be skipped multiple times without proper cleanup, potentially leading to infinite loops or excessive memory usage when many points are deactivated.
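For context, the lazy-removal pattern being reviewed looks roughly like the following sketch (illustrative names, not the actual DNSS code). The key property that prevents unbounded work in the single-threaded case is that every heap entry is popped at most once, so the skip loop is amortized over the whole run:

```cpp
#include <queue>
#include <utility>
#include <vector>

// Lazy removal: stale (deactivated) entries are discarded at pop time
// instead of being erased from the heap eagerly. Each entry is popped at
// most once, so total work across all calls stays O(n log n).
int popActivePoint(std::priority_queue<std::pair<float, int>> &heap,
                   const std::vector<unsigned> &is_active)
{
    while (!heap.empty())
    {
        const int point_index = heap.top().second;
        heap.pop();
        if (is_active[static_cast<std::size_t>(point_index)] != 0u)
        {
            return point_index; // first live entry wins
        }
        // inactive: entry is stale, keep popping
    }
    return -1; // heap exhausted
}
```

Parallelizing this would indeed require synchronizing both the heap and the is_active flags, as the review notes.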

    Numerical instability in CUDA kernel

    The CUDA kernel computeRotationalFeaturesKernel uses sinf, cosf, atanf, and sqrtf functions which may have reduced precision compared to their double-precision counterparts. More importantly, the division operations (e.g., px / point_norm, nx / normal_norm) can lead to numerical instability when point_norm or normal_norm are very close to zero, even with the epsilon check. This could result in incorrect rotational return values, especially for nearly degenerate input data.
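The usual remedy is to treat any near-zero or non-finite norm as a degenerate input and return a neutral value before any division happens, so the divide only executes when the denominator is bounded away from zero. A host-side sketch of that guard (illustrative names; the real kernel operates on raw float arrays):

```cpp
#include <cmath>

// Guarded normalization: bail out to a neutral value on degenerate input
// instead of dividing by a norm that may be ~0 or NaN.
float safeRotationalTerm(float px, float py, float pz)
{
    const float kEpsilon = 1e-6f;
    const float norm = std::sqrt(px * px + py * py + pz * pz);
    if (!std::isfinite(norm) || norm <= kEpsilon)
    {
        return 0.0f; // degenerate point: neutral rotational contribution
    }
    // Division is now safe: norm is finite and bounded away from zero.
    return pz / norm; // e.g. the z direction cosine of the point
}
```

If single-precision error still dominates for nearly degenerate inputs, accumulating the squared norm in double before the final division is a cheap further hardening.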


    github-actions bot commented Feb 8, 2026

    PR Code Suggestions ✨

**Possible issue:** Fix CUDA kernel normal validation

    The CUDA kernel should use isfinite checks consistently and avoid modifying
    normal_norm directly when it's invalid. Instead, set both rotational normal and
    return to safe defaults when the normal is degenerate to prevent undefined behavior.

    dnss_cuda.cu [44-55]

     const float point_norm = sqrtf(px * px + py * py + pz * pz);
     if (!(isfinite(point_norm)) || point_norm <= kEpsilon)
     {
         rotational_returns[index] = 0.0f;
         return;
     }
     
     float normal_norm = sqrtf(nx * nx + ny * ny + nz * nz);
     if (!(isfinite(normal_norm)) || normal_norm <= kEpsilon)
     {
    -    normal_norm = 1.0f;
    +    rotational_returns[index] = 0.0f;
    +    return;
     }
    Suggestion importance[1-10]: 8


    Why: This is an important fix for correctness. The original code modified normal_norm to 1.0f when the normal was degenerate, but then continued using the original (potentially invalid) normal components in subsequent calculations. Setting rotational_returns[index] = 0.0f and returning early prevents undefined behavior and ensures consistent handling of degenerate normals, matching the CPU implementation's behavior.

Impact: Medium

**General:** Improve bucket sampling termination logic

    The current bucket sampling loop uses a selected_any flag to detect when no points
    were selected in a full pass, but it doesn't handle the case where buckets become
    empty during the loop. This can cause premature termination if buckets get exhausted
    unevenly. Replace the selected_any logic with a more robust check that ensures all
    buckets are empty before breaking.

    NSS.cpp [255-281]

     ...
     while (sampled_vertices.size() < static_cast<std::size_t>(target_count))
     {
    -    bool selected_any = false;
    +    bool progress_made = false;
         for (std::vector<int> &bucket : buckets)
         {
             if (sampled_vertices.size() >= static_cast<std::size_t>(target_count))
             {
                 break;
             }
             if (bucket.empty())
             {
                 continue;
             }
     
    -        selected_any = true;
    +        progress_made = true;
             const int point_index = bucket.back();
             bucket.pop_back();
     
             sampled_vertices.push_back(m_original_vertices[static_cast<std::size_t>(point_index)]);
             sampled_normals.push_back(m_original_normals[static_cast<std::size_t>(point_index)]);
         }
     
    -    if (!selected_any)
    +    if (!progress_made)
         {
             break;
         }
     }
    Suggestion importance[1-10]: 7


    Why: The suggestion correctly identifies that the selected_any flag logic could be clearer, but renaming it to progress_made is a minor stylistic improvement rather than a functional fix. The termination condition behavior remains the same, as both flags serve the same purpose. While the change improves readability slightly, it doesn't address a real bug or significant issue.

Impact: Medium

**General:** Remove redundant lower bound clamp

    The std::clamp call with lower bound 0 is redundant since count is already computed
    using std::ceil on a positive value. Remove the unnecessary lower bound clamp for
    cleaner code.

    NSS.cpp [621-631]

     int DNSS::targetSampleCount(float sample_rate) const
     {
         if (!std::isfinite(sample_rate) || sample_rate <= 0.0f || m_vertices_original.empty())
         {
             return 0;
         }
     
         const float clamped_rate = std::min(1.0f, sample_rate);
         const int count = static_cast<int>(std::ceil(clamped_rate * static_cast<float>(m_vertices_original.size())));
    -    return std::clamp(count, 0, static_cast<int>(m_vertices_original.size()));
    +    return std::min(count, static_cast<int>(m_vertices_original.size()));
     }
    Suggestion importance[1-10]: 5


    Why: The suggestion is technically correct that the lower bound of 0 in std::clamp is redundant since count is computed from std::ceil of a positive value. However, keeping the explicit clamp serves as defensive programming and makes the intent clearer. The change offers minimal benefit and removes a safety check that could catch edge cases.

Impact: Low
