Skip to content

Conditionally abort in xrt::run dtor and xrt::run::set_arg#9649

Merged
stsoe merged 1 commit intoXilinx:masterfrom
stsoe:throw_run_dtor
Mar 6, 2026
Merged

Conditionally abort in xrt::run dtor and xrt::run::set_arg#9649
stsoe merged 1 commit intoXilinx:masterfrom
stsoe:throw_run_dtor

Conversation

@stsoe
Copy link
Copy Markdown
Collaborator

@stsoe stsoe commented Mar 6, 2026

Problem solved by the commit

Issue warning in xrt::run::~run() if an xrt::run object is active. Abort in dtor if and only if there are no active exceptions. Also, throw in set_arg in elf flow if a run object is active.

Bug / issue (if any) fixed, which PR introduced the bug, how it was discovered

The change helps prevent applications from failing to call xrt::run::wait() before destructing a run object.

How problem was solved, alternative solutions (if any) and why they were rejected

Considered dynamically checking if the run is complete before message and call to abort. But this could result in a situation where application sometimes succeeds and sometimes fails depending on timing.

While the current XRT execution model requires applications to call run.wait() for every run.start(), execution pipeline probably shouldn't enforce this if runs have been synchronized with fences. A final wait() is still required before destructing a run object. The final wait() ensures deterministic behavior.

@stsoe stsoe requested review from maxzhen and wendyliang25 March 6, 2026 20:21
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 6, 2026

clang-tidy review says "All clean, LGTM! 👍"

@larry9523
Copy link
Copy Markdown
Collaborator

should we also disable run.set_arg() ?

@stsoe
Copy link
Copy Markdown
Collaborator Author

stsoe commented Mar 6, 2026

should we also disable run.set_arg() ?

Can't easily do, since we have Alveo mailbox support where set_arg is used on active run objects.
It would have to be conditional some how. I would prefer not.

Issue warning in xrt::run::~run() if an xrt::run object is active.
Abort in dtor if and only if there are no active exceptions. Also,
throw in set_arg in elf flow.

The change helps prevent applications from failing to call
xrt::run::wait() before destructing a run object.

Considered dynamically checking if the run is complete before message
and call to abort.  But this could result in a situation where
application sometimes succeeds and sometimes fails depending on timing.

While the current XRT execution model requires applications to call
run.wait() for every run.start(), execution pipeline probably
shouldn't enforce this if runs have been synchronized with fences.
A final wait() is still required before destructing a run object.  The
final wait() ensures deterministic behavior.

xrt::run::set_arg throws exception in ELF flow (when module is
present). It cannot throw in general as there are flows (mailbox) that
allow calling set_arg on a active run object.

Signed-off-by: Soren Soe <2106410+stsoe@users.noreply.github.com>
@stsoe stsoe changed the title Conditionally abort in xrt::run dtor Conditionally abort in xrt::run dtor and xrt::run::set_arg Mar 6, 2026
@stsoe stsoe requested a review from larry9523 March 6, 2026 21:26
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 6, 2026

clang-tidy review says "All clean, LGTM! 👍"

@wendyliang25
Copy link
Copy Markdown
Collaborator

tested with the LLM flow which missing wait for some runs :

root@f8b62563520f:~/run_llm# ./model_benchmark -i models/Qwen2.5-3B-Instruct-onnx-ryzenai-npu/ -l 128 -f amd_genai_prompt.txt
DEPRECATED session option was used (config_entries): use 'session_options' directly instead.
Prompt Number of Tokens: 128
XRT build version: 2.23.0
Build hash: 9a057b09126b221e568db1ce448b09b96b9365ce
Build hash date: Fri, 6 Mar 2026 13:56:00 -0800
Git branch: HEAD
PID: 25
UID: 0
[Fri Mar  6 22:04:26 2026 GMT]
HOST: f8b62563520f
EXE: /root/run_llm/model_benchmark
[XRT] ERROR: xrt::run destructed while command is still in progress
Aborted (core dumped)

@stsoe stsoe merged commit 52cc0a9 into Xilinx:master Mar 6, 2026
21 checks passed
@stsoe stsoe deleted the throw_run_dtor branch March 6, 2026 23:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants