[BUG+FIX] SIGSEGV using allclose with user-owned exec with profiling activated #1103

@osayamenja

Describe the Bug
Hey folks! Wonderful work with this library, I am a huge fan!

Currently, calling allclose with a user-owned executor that has profiling activated triggers a SIGSEGV, which crashes the running program.

Here's a summary:

  • The error occurs when a user passes an exec object with profiling activated to allclose.
  • allclose receives exec by value, so it works on a copy, and that copy's destructor is called at the end of the function:
template <typename OutType, typename InType1, typename InType2>
void __MATX_INLINE__ allclose(OutType dest, const InType1 &in1, const InType2 &in2, double rtol, double atol, cudaExecutor exec = 0)
  • The destructor destroys the CUDA events held by the executor.
  • The destructor then runs a second time, on the user's original exec object, when that object goes out of scope in the user's program.
  • This triggers a SIGSEGV, because the events were already destroyed by the first destructor call.
  • FIX: Changing allclose to receive exec by reference, like so: (...cudaExecutor& exec), fixes the issue because the destructor is then called only once: when the referred exec object goes out of scope.
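
To make the sequence above concrete without a GPU, here is a minimal sketch of the same ownership problem. Everything in it is hypothetical and only for illustration: `EventRegistry` stands in for the CUDA runtime's event bookkeeping, and `FakeExecutor` stands in for an executor that owns events but keeps the implicit (shallow) copy constructor. It is not MatX's actual implementation. The sketch uses a const reference so that a temporary could still bind to the parameter; that choice is an assumption of this example, not a statement about how the fix should be written in MatX.

```cpp
#include <set>

// Hypothetical stand-in for the runtime's event bookkeeping (not CUDA, not MatX).
struct EventRegistry {
    std::set<int> live;        // handles that are currently valid
    int next_id = 1;
    int invalid_destroys = 0;  // counts destroy calls on already-destroyed handles

    int create() {
        int id = next_id++;
        live.insert(id);
        return id;
    }
    void destroy(int id) {
        if (live.erase(id) == 0) {
            ++invalid_destroys;  // a real runtime may crash here instead of counting
        }
    }
};

EventRegistry& registry() {
    static EventRegistry r;
    return r;
}

// Hypothetical executor: owns two events, but copying it only copies the handles.
struct FakeExecutor {
    int start_event;
    int stop_event;
    FakeExecutor()
        : start_event(registry().create()), stop_event(registry().create()) {}
    ~FakeExecutor() {
        registry().destroy(start_event);
        registry().destroy(stop_event);
    }
};

// By value: the parameter is a copy, and the copy's destructor runs on return.
void by_value(FakeExecutor exec) { (void)exec; }

// By reference: no copy is made, so no destructor runs on return.
void by_reference(const FakeExecutor& exec) { (void)exec; }

// Returns how many invalid destroy calls happen during one create/call/destroy cycle.
int count_invalid_destroys(bool pass_by_value) {
    const int before = registry().invalid_destroys;
    {
        FakeExecutor exec;
        if (pass_by_value) {
            by_value(exec);
        } else {
            by_reference(exec);
        }
    }  // the original exec is destroyed here
    return registry().invalid_destroys - before;
}
```

With these definitions, `count_invalid_destroys(true)` reports two invalid destroy calls (one per event, issued by the original object after the copy has already released them), while `count_invalid_destroys(false)` reports none.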

To Reproduce

#include <cstdio>
#include <matx.h>

int main() {
    cudaSetDevice(0);
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    matx::cudaExecutor exec{stream, true}; // profiling must be on to trigger the crash
    auto tA = matx::make_tensor<float>({2, 2});
    (tA = matx::ones<float>(tA.Shape())).run(exec);
    auto tB = matx::make_tensor<float>({2, 2});
    (tB = matx::ones<float>(tB.Shape())).run(exec);
    auto result = matx::make_tensor<int>({});
    constexpr auto rtol = 1e-3;
    constexpr auto atol = 1e-4;
    matx::allclose(result, tA, tB, rtol, atol, exec); // exec is copied; the copy's destructor runs when allclose returns
    cudaStreamSynchronize(stream); // make sure the result is ready before reading it on the host
    printf("Is Close? %s\n", result() ? "Yes" : "No");
    cudaStreamDestroy(stream);
} // SIGSEGV: exec's destructor runs here, but its internal state (events) has already been destroyed

Expected Behavior
No program crash.

Code Snippets
See above.

System Details (please complete the following information):

  • OS: Ubuntu 24.04.3 LTS (GNU/Linux 6.14.0-1021-gcp x86_64)
  • CUDA version: CUDA 13.0
  • g++ version: 13.3

Additional Context
N/A
