feat: add preview review reference and update interactive iterate step by johnnygreco · Pull Request #441 · NVIDIA-NeMo/DataDesigner

johnnygreco · 2026-03-20T00:23:14Z

Summary

- Add `references/preview-review.md` with structured guidance for reviewing dataset previews (diversity, data quality, design choices)
- Update interactive workflow iterate step to offer self-review and reference the new guide
- Remove stale "and serve again" wording from iterate step

greptile-apps · 2026-03-20T00:25:20Z

Greptile Summary

This PR adds a new references/preview-review.md guide that provides structured criteria for self-reviewing dataset previews (diversity, data quality, design choices), and wires it into the interactive workflow's iterate step. It also generalizes the record-count warning in both workflow files by removing the hard-coded 50-record threshold.

Key points:

- The review guide is well-structured and fills a real gap in the self-review loop.
- Several issues flagged in prior review threads remain unresolved:
  - the `references/preview-review.md` path in `interactive.md` may not resolve correctly relative to the file's directory;
  - `dataset.parquet` is referenced in the guide, but its existence as a `--save-results` artefact is undocumented;
  - the self-review capability is not surfaced in the autopilot iterate loop;
  - `autopilot.md` step 7 still has a contradictory instruction order (run the command first, then warn before running second).
- A minor new ambiguity: the fallback `artifacts/preview_results_*/` glob in `preview-review.md` can match multiple directories, and the guide does not explain how an agent should determine which one is "most recent".
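The path-resolution concern in the key points can be illustrated with a small sketch. It assumes a reader that resolves links relative to the directory of the referencing file; whether the agent actually does this, or resolves from the skill root instead, is not stated in this PR. The helper `resolve` is hypothetical and only uses the standard library.

```python
import posixpath

# File locations as listed under "Important Files Changed".
INTERACTIVE = "skills/data-designer/workflows/interactive.md"
GUIDE = "skills/data-designer/references/preview-review.md"


def resolve(from_file: str, link: str) -> str:
    """Resolve `link` relative to the directory that contains `from_file`."""
    base_dir = posixpath.dirname(from_file)
    return posixpath.normpath(posixpath.join(base_dir, link))


# The link as written in interactive.md, resolved file-relative,
# lands inside workflows/ rather than next to it.
as_written = resolve(INTERACTIVE, "references/preview-review.md")

# A parent-relative link would reach the guide under the same assumption.
with_parent = resolve(INTERACTIVE, "../references/preview-review.md")

print(as_written)
print(with_parent == GUIDE)
```

If the agent resolves paths from the skill root (`skills/data-designer/`) instead, the link as written would already be correct, which is why the review describes this as something that "may not" resolve.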

Confidence Score: 3/5

- Safe to merge for documentation value, but four previously flagged issues (path resolution, parquet existence, autopilot parity, instruction ordering) remain unaddressed and could cause agent failures in the field.
- Prior threads identified four concrete issues, and none were resolved in this revision. The new guide provides real value, but two of the prior concerns (missing parquet artefact, broken relative path) can cause outright agent errors rather than just degraded behavior, which prevents a higher confidence score.
- Files needing attention: `skills/data-designer/references/preview-review.md` (`dataset.parquet` availability, fallback directory resolution) and `skills/data-designer/workflows/autopilot.md` (contradictory create-step ordering).

Important Files Changed

| Filename | Overview |
| --- | --- |
| skills/data-designer/references/preview-review.md | New guide providing structured review criteria (diversity, data quality, design choices). Instructs agents to load `dataset.parquet`, but the file's existence from `--save-results` is undocumented; fallback glob `artifacts/preview_results_*/` also lacks guidance on how to resolve "most recent". |
| skills/data-designer/workflows/interactive.md | Iterate step expanded to offer self-review via `references/preview-review.md`; stale "serve again" wording removed; finalize caution generalized from hard-coded 50-record threshold. Path `references/preview-review.md` may not resolve correctly relative to this file's location. |
| skills/data-designer/workflows/autopilot.md | Create step reworded to remove the 50-record hard threshold in favor of context-sensitive guidance; however, the instruction still directs the agent to run the command first and warn/confirm "before running" second, a contradictory ordering that was flagged in a prior review thread and remains unresolved. |

Sequence Diagram

sequenceDiagram
    participant U as User
    participant A as Agent
    participant DD as data-designer CLI
    participant PR as preview-review.md

    Note over A,DD: Interactive Workflow — Iterate Step (updated)
    DD-->>A: Results path: artifacts/preview_results_*/
    A->>U: Here is your preview link
    A->>U: Would you like me to review the records?
    alt User accepts self-review
        A->>PR: Read references/preview-review.md
        PR-->>A: Diversity / Quality / Design criteria
        A->>DD: Load dataset.parquet (pandas)
        DD-->>A: Sample records
        A->>A: Evaluate diversity, quality, design
        A->>U: Suggested improvements
    else User provides own feedback
        U->>A: Feedback
    end
    A->>DD: data-designer validate + preview
    DD-->>A: Updated preview
    A->>U: Repeat until satisfied
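
The "Evaluate diversity, quality, design" step in the diagram is described only at a high level on this page. As a rough illustration of what a diversity check over sampled records might look like, here is a minimal sketch. The function name `diversity_report`, the metrics it computes, and the example records are all assumptions made for illustration; they are not taken from `preview-review.md`.

```python
from collections import Counter
from typing import Any


def diversity_report(records: list[dict[str, Any]]) -> dict[str, Any]:
    """Summarize exact-duplicate rows and per-column distinct-value ratios."""
    if not records:
        return {"rows": 0, "duplicate_rows": 0, "distinct_ratio": {}}

    # Use repr() so unhashable values (lists, dicts) can still be compared.
    row_keys = Counter(
        tuple(sorted((key, repr(value)) for key, value in record.items()))
        for record in records
    )
    # Every occurrence beyond the first counts as a duplicate.
    duplicate_rows = sum(count - 1 for count in row_keys.values())

    columns = sorted({key for record in records for key in record})
    distinct_ratio = {
        column: len({repr(record.get(column)) for record in records}) / len(records)
        for column in columns
    }
    return {
        "rows": len(records),
        "duplicate_rows": duplicate_rows,
        "distinct_ratio": distinct_ratio,
    }


# Hypothetical sample records standing in for rows read from a preview.
sample = [
    {"topic": "a", "text": "x"},
    {"topic": "a", "text": "y"},
    {"topic": "a", "text": "x"},
    {"topic": "b", "text": "z"},
]
report = diversity_report(sample)
print(report)
```

A low distinct ratio on a column that is meant to vary, or a non-zero duplicate count, would be a prompt to look closer rather than a verdict; the guide's actual criteria should take precedence.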

Prompt To Fix All With AI

This is a comment left during a code review.
Path: skills/data-designer/references/preview-review.md
Line: 9

Comment:
**"Most recent" directory is ambiguous for an agent**

The fallback path `artifacts/preview_results_*/` is a glob that can match multiple directories. The guide says to use "the most recent" one, but does not explain how to determine recency — by modification time, by directory name suffix, or by listing and sorting. An agent may arbitrarily pick any match.

Consider making this explicit, e.g.:

```suggestion
Load `dataset.parquet` from the preview results directory (printed as `Results path:` by the preview command, or the most recent `artifacts/preview_results_*/` directory — sort by name or modification time to find it).
```

How can I resolve this? If you propose a fix, please make it concise.
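
One way to make the "most recent" rule explicit is sketched below. It picks the newest `preview_results_*` directory by modification time, with the directory name as a tie-breaker, and returns `dataset.parquet` only if the file actually exists. The function names `latest_preview_dir` and `find_dataset` are hypothetical, and the choice of modification time over name ordering is an assumption, since the naming scheme of these directories is not documented here.

```python
from pathlib import Path
from typing import Optional


def latest_preview_dir(artifacts: Path) -> Optional[Path]:
    """Return the newest preview_results_* directory, or None if there is none."""
    candidates = [p for p in artifacts.glob("preview_results_*") if p.is_dir()]
    if not candidates:
        return None
    # Newest modification time wins; the name breaks ties deterministically.
    return max(candidates, key=lambda p: (p.stat().st_mtime, p.name))


def find_dataset(artifacts: Path, results_path: Optional[Path] = None) -> Optional[Path]:
    """Prefer the printed results path; otherwise fall back to the newest directory.

    Returns None instead of a guessed path when dataset.parquet is missing,
    so the caller can report the problem rather than fail on load.
    """
    directory = results_path or latest_preview_dir(artifacts)
    if directory is None:
        return None
    dataset = directory / "dataset.parquet"
    return dataset if dataset.is_file() else None
```

If `find_dataset(Path("artifacts"))` returns a path, it can be passed to `pandas.read_parquet`; if it returns `None`, the preview was probably not saved, which is the undocumented case the earlier review threads raised.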

Reviews (9): Last reviewed commit: "Merge branch 'main' into feat/skill-prev..."

johnnygreco added 3 commits March 19, 2026 17:17

feat: add preview review reference for Data Designer skill

6d7f39f

feat: add preview review offer to interactive workflow iterate step

95ff92b

fix: remove stale "and serve again" from iterate step

e5f7b17

johnnygreco requested a review from a team as a code owner March 20, 2026 00:23

greptile-apps Bot reviewed Mar 20, 2026

Comment thread skills/data-designer/workflows/interactive.md

johnnygreco added 9 commits March 19, 2026 17:25

fix: reframe design choices as general feature-fit guidance

a897cb7

fix: move judge calibration from design choices to data quality

b4bfdb4

fix: make review offer more prominent in iterate step

0c916c1

fix: clarify "the user" in iterate step

bcc94c1

fix: generalize generation time warning across workflows

205e647

fix: soften generation time warning wording

b5922dd

fix: rephrase generation time warning

834f822

feat: make preview review a dedicated workflow step

d4f5556

fix: revert preview review to an offer within the iterate step

71eaa3a

greptile-apps Bot reviewed Mar 20, 2026

Comment thread skills/data-designer/workflows/autopilot.md

Comment thread skills/data-designer/workflows/interactive.md

nabinchha approved these changes Mar 20, 2026

Merge branch 'main' into feat/skill-preview-review-reference

ee6fc7b

greptile-apps Bot reviewed Mar 20, 2026

Comment thread skills/data-designer/references/preview-review.md

Merge branch 'main' into feat/skill-preview-review-reference

5af7c02

johnnygreco merged commit 6de3032 into main Mar 23, 2026
47 checks passed

johnnygreco merged 14 commits into main from feat/skill-preview-review-reference

2 participants
