
fix: Skip output directory cleanup when --skip_workflow is set #1627

Merged
rapids-bot[bot] merged 2 commits into NVIDIA:develop from bledden:fix/eval-skip-workflow-cleanup on Feb 22, 2026
Conversation


@bledden bledden commented Feb 21, 2026

Summary

Fixes #1587

I ran into this while testing eval workflows: running nat eval --skip_workflow without --dataset was silently deleting the workflow_output.json from a previous run. The cleanup step in run_and_evaluate() runs before the dataset is loaded, so by the time the code tries to load the previous output, it's already been wiped by shutil.rmtree().

Since --skip_workflow exists specifically to reuse existing workflow output, it doesn't make sense to clean up the output directory when that flag is set. This change skips cleanup when --skip_workflow is active and logs an info message explaining why.

Normal eval behavior (without --skip_workflow) is unchanged.
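
The guard described above can be sketched as follows. This is a minimal, simplified stand-in for the real `run_and_evaluate()` in `evaluate.py` (class shape and attribute layout are hypothetical here; the actual code reads `self.eval_config.general.output` and `self.config.skip_workflow`):

```python
import logging
import shutil
from pathlib import Path

logger = logging.getLogger(__name__)


class EvaluationRun:
    """Simplified stand-in for the evaluator touched by this PR."""

    def __init__(self, output_dir: Path, skip_workflow: bool):
        self.output_dir = output_dir
        self.skip_workflow = skip_workflow

    def cleanup_output_directory(self) -> None:
        # Deletes any previous run's artifacts, including workflow_output.json.
        if self.output_dir.exists():
            shutil.rmtree(self.output_dir)

    def run_and_evaluate(self) -> None:
        # Only clean up when the workflow output will be regenerated;
        # with --skip_workflow, the previous output is the input we need.
        if self.output_dir and not self.skip_workflow:
            self.cleanup_output_directory()
        elif self.skip_workflow:
            logger.info("Skipping output directory cleanup because --skip_workflow is set")
```

With `skip_workflow=True` the cleanup branch is never reached, so a `workflow_output.json` left by an earlier run survives for the dataset handler to read.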

Test plan

  • Run nat eval normally, confirm output directory cleanup still works
  • Run nat eval --skip_workflow after a previous run, confirm workflow_output.json is preserved
  • Existing eval tests pass

Summary by CodeRabbit

  • Bug Fixes
    • Corrected output directory cleanup so it is skipped when the workflow is intentionally bypassed, preserving generated files.
    • Added user-facing logs that indicate when cleanup operations are being skipped to improve transparency.

When running `nat eval --skip_workflow`, the output directory was being
cleaned up before the dataset was loaded, destroying the workflow_output.json
that the user intended to evaluate. The --skip_workflow flag exists to
reuse previous output, so cleaning it up is contradictory.

Skip cleanup when --skip_workflow is set and log an info message so the
user knows why cleanup was skipped.

Closes NVIDIA#1587

Signed-off-by: Blake Ledden <bledden@users.noreply.github.com>
@bledden bledden requested a review from a team as a code owner February 21, 2026 07:57

copy-pr-bot bot commented Feb 21, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


coderabbitai bot commented Feb 21, 2026

No actionable comments were generated in the recent review. 🎉


Walkthrough

Output directory cleanup in run_and_evaluate() is now conditional: cleanup_output_directory() is invoked only when self.eval_config.general.output is truthy and self.config.skip_workflow is False; if skip_workflow is True, cleanup is skipped and a log message is emitted.

Changes

Cohort / File(s) Summary
Output cleanup logic
packages/nvidia_nat_eval/src/nat/plugins/eval/runtime/evaluate.py
Changed run_and_evaluate() to call cleanup_output_directory() only when self.eval_config.general.output is truthy and self.config.skip_workflow is False. Added logging to indicate when cleanup is skipped due to skip_workflow.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks — ✅ 5 passed
  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title concisely describes the fix in the imperative mood and, at 62 characters, is well within the 72-character limit.
  • Linked Issues Check — ✅ Passed: the PR addresses issue #1587 by preventing output directory cleanup when --skip_workflow is set, preserving the existing workflow_output.json as required.
  • Out of Scope Changes Check — ✅ Passed: all changes are scoped to the --skip_workflow cleanup fix; no unrelated functionality was modified.
  • Docstring Coverage — ✅ Passed: docstring coverage is 100.00%, above the required 80.00% threshold.



@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
packages/nvidia_nat_eval/src/nat/plugins/eval/runtime/evaluate.py (1)

588-592: Misleading log when output config is absent.

The elif self.config.skip_workflow: branch fires whenever skip_workflow is True, regardless of whether self.eval_config.general.output is set. When output is falsy, cleanup would have been skipped anyway, so the log message "Skipping output directory cleanup because --skip_workflow is set" is misleading — it implies cleanup was about to happen.

Tighten the guard so the log only emits when cleanup would have actually been performed:

♻️ Proposed fix
-        # Cleanup the output directory (skip when reusing existing workflow output)
-        if self.eval_config.general.output and not self.config.skip_workflow:
-            self.cleanup_output_directory()
-        elif self.config.skip_workflow:
-            logger.info("Skipping output directory cleanup because --skip_workflow is set")
+        # Cleanup the output directory (skip when reusing existing workflow output)
+        if self.eval_config.general.output:
+            if self.config.skip_workflow:
+                logger.info("Skipping output directory cleanup because --skip_workflow is set")
+            else:
+                self.cleanup_output_directory()


bledden commented Feb 21, 2026

Validation

I wrote a targeted test to both reproduce the bug and validate the fix:

Bug reproduction (reverted fix):

  • With the original code, cleanup_output_directory() gets called even when skip_workflow=True
  • This deletes workflow_output.json before the dataset handler can read it

Fix validation (with the change):

  • skip_workflow=True: cleanup_output_directory is correctly NOT called, workflow_output.json survives
  • skip_workflow=False: cleanup_output_directory IS called as expected (normal behavior preserved)

Also ran the full eval test suite (test_evaluate.py); all 23 tests pass, including both parametrized cases test_run_and_evaluate[True] and test_run_and_evaluate[False].
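
A targeted test along these lines could be sketched as below. The `EvaluationRun` class here is a minimal, hypothetical stand-in mirroring the guard this PR adds (the real suite in test_evaluate.py exercises the actual evaluator); the mock just checks whether cleanup would have run:

```python
import logging
import shutil
from pathlib import Path
from unittest import mock

logger = logging.getLogger(__name__)


class EvaluationRun:
    # Hypothetical stand-in mirroring the conditional added in this PR.
    def __init__(self, output_dir: Path, skip_workflow: bool):
        self.output_dir = output_dir
        self.skip_workflow = skip_workflow

    def cleanup_output_directory(self) -> None:
        shutil.rmtree(self.output_dir, ignore_errors=True)

    def run_and_evaluate(self) -> None:
        if self.output_dir and not self.skip_workflow:
            self.cleanup_output_directory()
        elif self.skip_workflow:
            logger.info("Skipping output directory cleanup because --skip_workflow is set")


def test_cleanup_skipped_when_skip_workflow():
    # With --skip_workflow, cleanup must never fire, so prior output survives.
    run = EvaluationRun(Path("out"), skip_workflow=True)
    with mock.patch.object(run, "cleanup_output_directory") as cleanup:
        run.run_and_evaluate()
    cleanup.assert_not_called()


def test_cleanup_runs_normally():
    # Without the flag, cleanup behaves as before.
    run = EvaluationRun(Path("out"), skip_workflow=False)
    with mock.patch.object(run, "cleanup_output_directory") as cleanup:
        run.run_and_evaluate()
    cleanup.assert_called_once()
```

Reverting the guard (unconditional cleanup) makes the first test fail, which reproduces the original bug.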

@willkill07 willkill07 added bug Something isn't working non-breaking Non-breaking change labels Feb 21, 2026
@willkill07 willkill07 self-assigned this Feb 21, 2026
Use nested conditional for clearer logic flow.

Signed-off-by: Blake Ledden <bledden@users.noreply.github.com>
@willkill07 (Member):

/ok to test 7e86fde

@willkill07 (Member):

/merge

@rapids-bot rapids-bot bot merged commit 351a943 into NVIDIA:develop Feb 22, 2026
17 checks passed

Labels

bug (Something isn't working), non-breaking (Non-breaking change)


Development

Successfully merging this pull request may close these issues.

nat eval --skip_workflow can delete existing trajectories

2 participants