NVIDIA-NeMo · johnnygreco · Apr 7, 2026
@@ -8,9 +8,6 @@ argument-hint: [describe the dataset you want to generate]
 
 Do not explore the workspace first. The workflow's Learn step gives you everything you need.
 
-`data-designer` command: !`command -v data-designer 2>/dev/null || (test -x .venv/bin/data-designer && realpath .venv/bin/data-designer)`
-Use this path for all `data-designer` commands throughout this skill. If blank, see Troubleshooting.
-
 # Goal
 
 Build a synthetic dataset using the Data Designer library that matches this description:
@@ -42,7 +39,7 @@ Read **only** the workflow file that matches the selected mode, then follow it:
 
 # Troubleshooting
 
-- **`data-designer` command not found:** If no virtual environment exists, create one first (`python -m venv .venv && source .venv/bin/activate`), then install (`pip install data-designer`). If a virtual environment already exists, activate it and verify the package is installed.
+- **`data-designer` CLI not found:** Tell the user that `data-designer` is not installed in this environment (requires Python >= 3.10). Ask if they would like you to create a virtual environment and install it, or if they prefer to do it themselves. Do not install anything without the user's permission.
 - **Network errors during preview:** A sandbox environment may be blocking outbound requests. Ask the user for permission to retry the command with the sandbox disabled. Only as a last resort, if retrying outside the sandbox also fails, tell the user to run the command themselves.
 
 # Output Template

@@ -2,25 +2,28 @@
 
 In this mode, make reasonable design decisions autonomously based on the dataset description. Do not ask clarifying questions — infer sensible defaults and move straight through to a working preview.
 
-1. **Learn** — Run `data-designer agent context`.
+1. **Resolve CLI command** — Run `command -v data-designer 2>/dev/null || (test -x .venv/bin/data-designer && realpath .venv/bin/data-designer) || echo CLI_NOT_FOUND`.
+  - If the output is a path, use it as the `data-designer` executable for all commands in this workflow.
+  - If the output is `CLI_NOT_FOUND`, STOP and follow the Troubleshooting section in SKILL.md. Do not continue to the next step.
+2. **Learn** — Run `data-designer agent context`.
   - If no model aliases are configured, stop and tell the user to run `data-designer config` to set them up before proceeding.
   - Inspect schemas for every column, sampler type, validator, and processor you plan to use.
   - Never guess types or parameters — read the relevant config files first.
   - Always read `base.py` for inherited fields shared by all config objects.
-2. **Infer** — Based on the dataset description, make reasonable decisions for:
+3. **Infer** — Based on the dataset description, make reasonable decisions for:
   - Axes of diversity and what should be well represented.
   - Which variables to randomize.
   - The schema of the final dataset.
   - The structure of any structured output columns.
   - Briefly state the key decisions you made so the user can course-correct if needed.
-3. **Plan** — Determine columns, samplers, processors, validators, and other dataset features needed.
-4. **Build** — Write the Python script with `load_config_builder()` (see Output Template in SKILL.md).
-5. **Validate** — Run `data-designer validate <path>`. Address any warnings or errors and re-validate until it passes.
-6. **Preview** — Run `data-designer preview <path> --save-results` to generate sample records as HTML files.
+4. **Plan** — Determine columns, samplers, processors, validators, and other dataset features needed.
+5. **Build** — Write the Python script with `load_config_builder()` (see Output Template in SKILL.md).
+6. **Validate** — Run `data-designer validate <path>`. Address any warnings or errors and re-validate until it passes.
+7. **Preview** — Run `data-designer preview <path> --save-results` to generate sample records as HTML files.
   - Note the sample records directory printed by the `data-designer preview` command
   - Give the user a clickable link: `file://<sample-records-dir>/sample_records_browser.html`
-7. **Create** — If the user specified a record count:
+8. **Create** — If the user specified a record count:
   - Run `data-designer create <path> --num-records <N> --dataset-name <name>`.
   - Generation speed depends heavily on the dataset configuration and the user's inference setup. For larger datasets, warn the user and ask for confirmation before running.
   - If no record count was specified, skip this step.
-8. **Present** — Summarize what was built: columns, samplers used, key design choices. If the create command was run, share the results. Ask the user if they want any changes. If so, edit the script, re-validate, re-preview, and iterate.
+9. **Present** — Summarize what was built: columns, samplers used, key design choices. If the create command was run, share the results. Ask the user if they want any changes. If so, edit the script, re-validate, re-preview, and iterate.
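
The "Resolve CLI command" step added above can be sketched in Python for readers who want to see the fallback order spelled out. This is a minimal illustration, not part of the change: the helper name `resolve_cli` and its parameters are hypothetical, and `shutil.which` only searches `PATH`, whereas the shell's `command -v` can also report aliases and shell functions.

```python
import os
import shutil
from pathlib import Path

# Sentinel printed by the shell one-liner when nothing is found.
CLI_NOT_FOUND = "CLI_NOT_FOUND"


def resolve_cli(name="data-designer", venv_dir=".venv"):
    """Hypothetical helper mirroring the workflow's resolution order."""
    # 1. Prefer an executable already on PATH (approximates `command -v`).
    on_path = shutil.which(name)
    if on_path:
        return on_path

    # 2. Fall back to the project-local virtual environment.
    #    Unlike `test -x`, this also requires a regular file.
    candidate = Path(venv_dir) / "bin" / name
    if candidate.is_file() and os.access(candidate, os.X_OK):
        return str(candidate.resolve())

    # 3. Nothing found: return the sentinel so the caller can stop.
    return CLI_NOT_FOUND


if __name__ == "__main__":
    print(resolve_cli())
```

A caller would compare the result against `CLI_NOT_FOUND` and stop before the Learn step, as the workflow instructs.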
@@ -2,12 +2,15 @@
 
 This is an interactive, iterative design process. Do not disengage from the loop unless the user says they are satisfied.
 
-1. **Learn** — Run `data-designer agent context`.
+1. **Resolve CLI command** — Run `command -v data-designer 2>/dev/null || (test -x .venv/bin/data-designer && realpath .venv/bin/data-designer) || echo CLI_NOT_FOUND`.
+  - If the output is a path, use it as the `data-designer` executable for all commands in this workflow.
+  - If the output is `CLI_NOT_FOUND`, STOP and follow the Troubleshooting section in SKILL.md. Do not continue to the next step.
+2. **Learn** — Run `data-designer agent context`.
   - If no model aliases are configured, stop and tell the user to run `data-designer config` to set them up before proceeding.
   - Inspect schemas for every column, sampler type, validator, and processor you plan to use.
   - Never guess types or parameters — read the relevant config files first.
   - Always read `base.py` for inherited fields shared by all config objects.
-2. **Clarify** — Ask the user clarifying questions to narrow down precisely what they want.
+3. **Clarify** — Ask the user clarifying questions to narrow down precisely what they want.
   - Optimize for a great user experience: prefer a structured question tool over plain text if one is available, batch related questions together, keep the set short, provide concrete options/examples/defaults where possible, and use structured inputs (single-select, multi-select, free text, etc.) when they make answering easier.
   - If multiple model aliases are available, ask which one(s) to use (or default to an alias with the appropriate `generation_type` for each column).
   - Common things to make precise:
@@ -17,17 +20,17 @@ This is an interactive, iterative design process. Do not disengage from the loop
     - The schema of the final dataset.
     - The structure of any required structured output columns.
     - What facets of the output dataset are important to capture.
-3. **Plan** — Determine columns, samplers, processors, validators, and other dataset features needed. Present the plan to the user and ask if they want any changes before generating a preview.
-4. **Build** — Write the Python script with `load_config_builder()` (see Output Template in SKILL.md).
-5. **Validate** — Run `data-designer validate <path>`. Address any warnings or errors and re-validate until it passes.
-6. **Preview** — Run `data-designer preview <path> --save-results` to generate sample records as HTML files.
+4. **Plan** — Determine columns, samplers, processors, validators, and other dataset features needed. Present the plan to the user and ask if they want any changes before generating a preview.
+5. **Build** — Write the Python script with `load_config_builder()` (see Output Template in SKILL.md).
+6. **Validate** — Run `data-designer validate <path>`. Address any warnings or errors and re-validate until it passes.
+7. **Preview** — Run `data-designer preview <path> --save-results` to generate sample records as HTML files.
   - Note the sample records directory printed by the `data-designer preview` command
   - Give the user a clickable link: `file://<sample-records-dir>/sample_records_browser.html`
-7. **Iterate**
+8. **Iterate**
    - Ask the user for feedback.
    - Offer to review the records yourself and suggest improvements. If the user accepts, read `references/preview-review.md` for guidance.
    - Apply changes, re-validate, and re-preview. Repeat until the user is satisfied.
-8. **Finalize** — Once the user is happy, tell them they can run the following command to create the dataset:
+9. **Finalize** — Once the user is happy, tell them they can run the following command to create the dataset:
   - `data-designer create <path> --num-records <N> --dataset-name <name>`.
   - Caution the user that generation speed depends heavily on the dataset configuration and their inference setup.
   - Do not run this command yourself — the user should control when it runs.
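
Both workflows ask for a clickable `file://<sample-records-dir>/sample_records_browser.html` link in the Preview step. One way to build that link safely, sketched here under the assumption that the directory printed by `data-designer preview` is available as a string, is to let `pathlib` handle absolute-path resolution and percent-encoding. The function name `browser_link` is hypothetical.

```python
from pathlib import Path


def browser_link(sample_records_dir):
    """Build a file:// URI for the preview browser page (illustrative only)."""
    # resolve() makes the path absolute, which as_uri() requires.
    page = Path(sample_records_dir).resolve() / "sample_records_browser.html"
    # as_uri() percent-encodes characters such as spaces.
    return page.as_uri()


if __name__ == "__main__":
    # The directory below is a made-up example, not real preview output.
    print(browser_link("artifacts/sample records"))
```

Going through `as_uri()` rather than string concatenation keeps the link valid when the directory contains spaces or other characters that need escaping.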