-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Throw error from device to host with a msg #2258 #2283
base: develop
Are you sure you want to change the base?
Throw error from device to host with a msg #2258 #2283
Conversation
fbffb91
to
09c787d
Compare
09c787d
to
970405d
Compare
6e2a9e5
to
d58d3b2
Compare
9f719d1
to
b03addd
Compare
b03addd
to
ee7d889
Compare
# define ALPAKA_DEVICE_THROW(MSG) \ | ||
{ \ | ||
printf("%s [ALPAKA_DEVICE_THROW: Calling std::abort(). Sycl backend is enabled.]\n", (MSG)); \ | ||
std::abort(); \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure we can call std::abort()
from inside a SYCL kernel running e.g. on a GPU ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only support one-api and it is tested on CI and HAL. . I wanted to do assert(false) but assert turned out to be calling std::abort from kernel in sycl (url)
42b8049
to
0789e32
Compare
Ok I have added
|
332300c
to
ad62470
Compare
Unrelated to the test, but what is the HAL computer ? |
//! Therefore std::runtime_error thrown by ALPAKA_DEVICE_THROW aborts the the program for OpenMP backends. If needed | ||
//! the SIGABRT signal can be catched by signal handler. | ||
#if defined(ALPAKA_ACC_GPU_CUDA_ENABLED) && defined(__CUDA_ARCH__) | ||
# define ALPAKA_DEVICE_THROW(MSG) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be a macro ?
Can we use a device function instead ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will not make much difference since only 2 lines are inlined. On the other hand Isn't it better to call the aborts or exceptions directly ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@psychocoderHPC what do you prefer here, functions or macros ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Functions are always prefered above macros. We should only use macros if we need __LINE__
, ...
#if defined(ALPAKA_ACC_GPU_CUDA_ENABLED) && defined(__CUDA_ARCH__) | ||
# define ALPAKA_DEVICE_THROW(MSG) \ | ||
{ \ | ||
printf("%s [ALPAKA_DEVICE_THROW: Calling __trap(). Cuda backend is enabled.]\n", (MSG)); \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a message like
%s [ALPAKA_DEVICE_THROW: Calling __trap(). Cuda backend is enabled.]\n
is fine for debugging the implementation - but once this goes in production can we print just the user-supplied message ?
Or, if we do want to add more information, something like
alpaka encountered a user-defined error condition while running on the ___ back-end:\n
%s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, changed. Thanks.
22d1838
to
26fbae4
Compare
A HPC system at HZDR :) Has different kind of gpus and easily configurable. |
//! Therefore std::runtime_error thrown by ALPAKA_DEVICE_THROW aborts the the program for OpenMP backends. If needed | ||
//! the SIGABRT signal can be catched by signal handler. | ||
#if defined(ALPAKA_ACC_GPU_CUDA_ENABLED) && defined(__CUDA_ARCH__) | ||
# define ALPAKA_DEVICE_THROW(MSG) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was on the way to say change it to a C++ function and than this should be deviceThrow() within the alpaka namespace.
BUT than it should be a trait which gets the acc as argument with the coresponding class equal to the trait name which is than specialized for each acc. Device functions always take the acc as first argument.
Never the less I would handle it equal to ALPAKA_ASSERT_ACC
and would add the line and error message to the printf.
To follow the naming schema this macro should be named ALPAKA_THROW_ACC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hence; according to you it should be a macro called ALPAKA_THROW_ACC without an Acc argument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes this will follow our macro ALPAKA_ASSERT_ACC
and avoid introducing a new trait. IMO this function/macro will be used in seldem cases therefore I would like to avoid increasing the code base because of this tiny throw.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, then now we have a macro which is taking a string, throwing or aborting depending on backend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
macro name changed to ALPAKA_THROW_ACC. Thanks.
26fbae4
to
a5fade5
Compare
a5fade5
to
f73328e
Compare
6635930
to
472b74d
Compare
@mehmetyusufoglu please rebase against develop branch to fix the CI issues |
9564419
to
697cea7
Compare
cfd024c
to
0a6bb4b
Compare
0a6bb4b
to
0f221e7
Compare
Macros to make signalling an error from the device side to the host code.
In case of Cuda, Hip, Sycl a user defined message is added to the abort. For other backends
std::runtime_error
exception with the user message is thrown.Testing:
-Tests could be done for Accs: CpuThreads and CpuBlocks by catching the
runtime_error
exceptions thrown during exec.-Aborts can not be catched from Cuda, Hip, Sycl as we call exec. (Only tested by running a temporary fail test at CI)
But for Cuda;
__trap
triggersruntime_error
can be catched during the wait(queue).-For sycl assert(false) is used. std::abort generated a runtime error at HAL no compile error but not "aborted"
-I turned out to be OpenMP specification mandates std::runtime errors should be handled by the same thread otherwise it is converted to abort. I checked with a signal handler and
SIGABRT
is fired. (Therefore openmp cases could not be tested other than a one time fail test at CI.)Issue
fix #2258