
Forward-merge release/1.4 into develop #1504

Merged
GPUtester merged 1 commit into develop from release/1.4
Jan 28, 2026


rapids-bot bot commented Jan 28, 2026

Forward-merge triggered by push to release/1.4 that creates a PR to keep develop up-to-date. If this PR is unable to be immediately merged due to conflicts, it will remain open for the team to manually merge. See forward-merger docs for more info.

* Update to use only a single Dask client per front-end worker. This avoids some unintended behavior that happens when a Dask client is closed; some of this is discussed in dask/distributed#5667. Because of this, the Dask client context manager is replaced with a simple lazy property method.
* Switch to using Dask in blocking mode; the async usage was creating deadlocks with the front-end worker.
* Move the `_setup_worker` method to the `async_job.py` module
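
The single-client, lazy-property pattern described above can be sketched roughly as follows. This is an illustration only, not the code from this PR: `FrontEndWorker`, `client_factory`, and `FakeClient` are hypothetical names, and the fake client stands in for a blocking `dask.distributed.Client` so the sketch runs without a cluster.

```python
from functools import cached_property
from typing import Any, Callable


class FrontEndWorker:
    """Hypothetical holder that creates its Dask client at most once."""

    def __init__(self, scheduler_address: str, client_factory: Callable[[str], Any]):
        self._scheduler_address = scheduler_address
        # Assumption: in real use this could be `dask.distributed.Client`,
        # which accepts the scheduler address as its first argument. It is
        # injected here so the sketch runs without a cluster.
        self._client_factory = client_factory

    @cached_property
    def client(self) -> Any:
        # Built on first access, then cached on the instance. The client is
        # never closed and re-created per job, unlike a context manager.
        return self._client_factory(self._scheduler_address)


class FakeClient:
    """Stand-in for a blocking Dask client, used only for this demo."""

    instances = 0

    def __init__(self, address: str):
        FakeClient.instances += 1
        self.address = address

    def submit(self, fn, *args):
        # A real client returns a Future; this fake runs the call inline.
        return fn(*args)


worker = FrontEndWorker("tcp://127.0.0.1:8786", FakeClient)
first = worker.client
second = worker.client
result = worker.client.submit(sum, [1, 2, 3])
```

Repeated access to `worker.client` returns the same object, so nothing in the request path closes a client that other code may still be using.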

## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md).
- We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
  - Any contribution which contains commits that are not Signed-Off will not be accepted.
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.



## Summary by CodeRabbit

* **Bug Fixes**
  * Improved worker process isolation to reduce signal propagation and improve cluster shutdown reliability.

* **Refactor**
  * Centralized and simplified Dask client handling to a single cached access pattern for more consistent resource lifecycle.

* **Chores**
  * Changed default Dask worker memory limit from "auto" to "0" (no limit).

* **Tests**
  * Updated test suite to use synchronous flows, new fixtures, and adapted helpers for the revised client model.


Authors:
  - David Gardner (https://github.com/dagardner-nv)

Approvers:
  - Will Killian (https://github.com/willkill07)

URL: #1498
@rapids-bot rapids-bot bot requested a review from a team as a code owner January 28, 2026 16:49
@GPUtester GPUtester merged commit a029bc5 into develop Jan 28, 2026
1 check passed
rapids-bot bot commented Jan 28, 2026

SUCCESS - forward-merge complete.
