Describe the Bug
Hey folks! Wonderful work with this library, I am a huge fan!
Currently, using allclose with an external executor causes a SIGSEGV error which crashes the running program.
Here's a summary:
- Error occurs when a user passes an exec object with profiling activated to allclose.
- allclose receives exec by value, so its destructor is called at the end of the function.
template <typename OutType, typename InType1, typename InType2>
void __MATX_INLINE__ allclose(OutType dest, const InType1 &in1, const InType2 &in2, double rtol, double atol, cudaExecutor exec = 0)
- The destructor destroys CUDA events.
- However, the destructor runs again, this time on the original exec object, when it goes out of scope in the user's program.
- This triggers a SIGSEGV as the events have already been destroyed in the previous call to the destructor.
- FIX: Changing allclose to receive exec by reference like so: (...cudaExecutor& exec) fixes the issue because it ensures the destructor is only called once: when the referred exec object goes out of scope.
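The mechanism described in the summary can be sketched without CUDA or MatX. The example below is only an illustration under stated assumptions: `FakeExecutor`, `g_destroy_count`, `by_value`, and `by_const_ref` are hypothetical names, not part of the library, and the type simply assumes an executor that owns a raw handle, releases it in its destructor, and keeps the implicit copy constructor.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-in for event bookkeeping: counts how many times each
// handle is "destroyed". Real code would call cudaEventDestroy instead.
static std::map<int, int> g_destroy_count;

// Hypothetical executor-like type (NOT matx::cudaExecutor). It owns a raw
// handle and releases it in the destructor. The implicit copy constructor
// copies the handle, so a copy and the original share the same resource.
struct FakeExecutor {
    int handle;
    explicit FakeExecutor(int h) : handle(h) {}
    ~FakeExecutor() { ++g_destroy_count[handle]; }
};

// Pass by value: the parameter is a copy, and its destructor releases the
// shared handle once the call completes.
void by_value(FakeExecutor exec) { (void)exec; }

// Pass by const reference: no copy is made, so nothing is released here.
void by_const_ref(const FakeExecutor& exec) { (void)exec; }

// Number of times the handle is released after one by-value call plus the
// caller's object leaving scope.
int destroys_with_by_value() {
    g_destroy_count.clear();
    {
        FakeExecutor exec{1};
        by_value(exec);
    }
    return g_destroy_count[1];
}

// Same scenario, but the callee takes the executor by const reference.
int destroys_with_by_ref() {
    g_destroy_count.clear();
    {
        FakeExecutor exec{2};
        by_const_ref(exec);
    }
    return g_destroy_count[2];
}
```

In this sketch `destroys_with_by_value()` returns 2 and `destroys_with_by_ref()` returns 1, which mirrors the double destruction described above. One C++ detail worth keeping in mind for the proposed fix: a default argument such as `= 0` creates a temporary, which cannot bind to a non-const lvalue reference, so the signature would likely need a const reference or no default. Which of those fits the library is a decision for the maintainers.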
To Reproduce
#include <matx.h>
int main() {
    cudaSetDevice(0);
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    matx::cudaExecutor exec{stream, true}; // necessary for profiling to be on.
    auto tA = matx::make_tensor<float>({2, 2});
    (tA = matx::ones<float>(tA.Shape())).run(exec);
    auto tB = matx::make_tensor<float>({2, 2});
    (tB = matx::ones<float>(tB.Shape())).run(exec);
    auto result = matx::make_tensor<int>({});
    constexpr auto rtol = 1e-3;
    constexpr auto atol = 1e-4;
    matx::allclose(result, tA, tB, rtol, atol, exec); // exec destructor called within allclose
    printf("Is Close? %s\n", result() ? "Yes" : "No");
    cudaStreamDestroy(stream);
} // SIGSEGV as internal state (events) has already been destroyed
Expected Behavior
No program crash.
Code Snippets
See above.
System Details (please complete the following information):
- OS: Ubuntu 24.04.3 LTS (GNU/Linux 6.14.0-1021-gcp x86_64)
- CUDA version: CUDA 13.0
- g++ version: 13.3
Additional Context
N/A