From 074d05762fc779f96f1531d6715910a846301e13 Mon Sep 17 00:00:00 2001
From: Johnny Greco
Date: Tue, 7 Apr 2026 11:29:38 -0400
Subject: [PATCH 1/5] fix: prevent skill load failure when data-designer CLI is not installed

Append `|| true` to the shell command that resolves the data-designer path so it always exits 0. Without this, the skill fails to load entirely when the CLI is missing, and the "If blank, see Troubleshooting" fallback is never reached.
---
 skills/data-designer/SKILL.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/skills/data-designer/SKILL.md b/skills/data-designer/SKILL.md
index cbb05c1a..feda1d87 100644
--- a/skills/data-designer/SKILL.md
+++ b/skills/data-designer/SKILL.md
@@ -8,7 +8,7 @@ argument-hint: [describe the dataset you want to generate]
 
 Do not explore the workspace first. The workflow's Learn step gives you everything you need.
 
-`data-designer` command: !`command -v data-designer 2>/dev/null || (test -x .venv/bin/data-designer && realpath .venv/bin/data-designer)`
+`data-designer` command: !`command -v data-designer 2>/dev/null || (test -x .venv/bin/data-designer && realpath .venv/bin/data-designer) || true`
 Use this path for all `data-designer` commands throughout this skill. If blank, see Troubleshooting.
 
 # Goal

From e6a96f1bb4317baa30ce2abcf3bb813922cc8fab Mon Sep 17 00:00:00 2001
From: Johnny Greco
Date: Tue, 7 Apr 2026 11:57:43 -0400
Subject: [PATCH 2/5] fix: use explicit NOT_FOUND sentinel when data-designer CLI is missing

Replace `|| true` (blank output) with `|| echo NOT_FOUND` so the agent sees a clear signal. Update the instruction to bold/imperative so it actually gets followed.
---
 skills/data-designer/SKILL.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/skills/data-designer/SKILL.md b/skills/data-designer/SKILL.md
index feda1d87..b4da1c7f 100644
--- a/skills/data-designer/SKILL.md
+++ b/skills/data-designer/SKILL.md
@@ -8,8 +8,8 @@ argument-hint: [describe the dataset you want to generate]
 
 Do not explore the workspace first. The workflow's Learn step gives you everything you need.
 
-`data-designer` command: !`command -v data-designer 2>/dev/null || (test -x .venv/bin/data-designer && realpath .venv/bin/data-designer) || true`
-Use this path for all `data-designer` commands throughout this skill. If blank, see Troubleshooting.
+`data-designer` command: !`command -v data-designer 2>/dev/null || (test -x .venv/bin/data-designer && realpath .venv/bin/data-designer) || echo NOT_FOUND`
+Use this path for all `data-designer` commands throughout this skill. **If the value is `NOT_FOUND`, STOP and follow the Troubleshooting section before doing anything else.**
 
 # Goal
 

From aa34af2b6676bb3152ebbb95c16f18be02a3743f Mon Sep 17 00:00:00 2001
From: Johnny Greco
Date: Tue, 7 Apr 2026 14:14:09 -0400
Subject: [PATCH 3/5] fix: move CLI resolution into workflow steps instead of skill preamble

Remove the \!`command` substitution from SKILL.md and add a "Resolve CLI command" step to both workflows. The agent now runs the lookup itself and uses the result as the data-designer executable for all subsequent commands. If the command fails, the agent stops and follows Troubleshooting.
---
 skills/data-designer/SKILL.md                 |  3 ---
 skills/data-designer/workflows/autopilot.md   | 19 +++++++++++--------
 skills/data-designer/workflows/interactive.md | 19 +++++++++++--------
 3 files changed, 22 insertions(+), 19 deletions(-)

diff --git a/skills/data-designer/SKILL.md b/skills/data-designer/SKILL.md
index b4da1c7f..ddee328a 100644
--- a/skills/data-designer/SKILL.md
+++ b/skills/data-designer/SKILL.md
@@ -8,9 +8,6 @@ argument-hint: [describe the dataset you want to generate]
 
 Do not explore the workspace first. The workflow's Learn step gives you everything you need.
 
-`data-designer` command: !`command -v data-designer 2>/dev/null || (test -x .venv/bin/data-designer && realpath .venv/bin/data-designer) || echo NOT_FOUND`
-Use this path for all `data-designer` commands throughout this skill. **If the value is `NOT_FOUND`, STOP and follow the Troubleshooting section before doing anything else.**
-
 # Goal
 
 Build a synthetic dataset using the Data Designer library that matches this description:
diff --git a/skills/data-designer/workflows/autopilot.md b/skills/data-designer/workflows/autopilot.md
index 2f13b7e7..c56c070f 100644
--- a/skills/data-designer/workflows/autopilot.md
+++ b/skills/data-designer/workflows/autopilot.md
@@ -2,25 +2,28 @@
 
 In this mode, make reasonable design decisions autonomously based on the dataset description. Do not ask clarifying questions — infer sensible defaults and move straight through to a working preview.
 
-1. **Learn** — Run `data-designer agent context`.
+1. **Resolve CLI command** — Run `command -v data-designer 2>/dev/null || (test -x .venv/bin/data-designer && realpath .venv/bin/data-designer)`.
+   - If the command outputs a path, use it as the `data-designer` executable for all commands in this workflow.
+   - If it produces no output or fails, STOP and follow the Troubleshooting section in SKILL.md. Do not continue to the next step.
+2. **Learn** — Run `data-designer agent context`.
    - If no model aliases are configured, stop and tell the user to run `data-designer config` to set them up before proceeding.
    - Inspect schemas for every column, sampler type, validator, and processor you plan to use.
    - Never guess types or parameters — read the relevant config files first.
    - Always read `base.py` for inherited fields shared by all config objects.
-2. **Infer** — Based on the dataset description, make reasonable decisions for:
+3. **Infer** — Based on the dataset description, make reasonable decisions for:
    - Axes of diversity and what should be well represented.
    - Which variables to randomize.
    - The schema of the final dataset.
    - The structure of any structured output columns.
    - Briefly state the key decisions you made so the user can course-correct if needed.
-3. **Plan** — Determine columns, samplers, processors, validators, and other dataset features needed.
-4. **Build** — Write the Python script with `load_config_builder()` (see Output Template in SKILL.md).
-5. **Validate** — Run `data-designer validate `. Address any warnings or errors and re-validate until it passes.
-6. **Preview** — Run `data-designer preview --save-results` to generate sample records as HTML files.
+4. **Plan** — Determine columns, samplers, processors, validators, and other dataset features needed.
+5. **Build** — Write the Python script with `load_config_builder()` (see Output Template in SKILL.md).
+6. **Validate** — Run `data-designer validate `. Address any warnings or errors and re-validate until it passes.
+7. **Preview** — Run `data-designer preview --save-results` to generate sample records as HTML files.
    - Note the sample records directory printed by the `data-designer preview` command
    - Give the user a clickable link: `file:///sample_records_browser.html`
-7. **Create** — If the user specified a record count:
+8. **Create** — If the user specified a record count:
    - Run `data-designer create --num-records --dataset-name `.
    - Generation speed depends heavily on the dataset configuration and the user's inference setup. For larger datasets, warn the user and ask for confirmation before running.
    - If no record count was specified, skip this step.
-8. **Present** — Summarize what was built: columns, samplers used, key design choices. If the create command was run, share the results. Ask the user if they want any changes. If so, edit the script, re-validate, re-preview, and iterate.
+9. **Present** — Summarize what was built: columns, samplers used, key design choices. If the create command was run, share the results. Ask the user if they want any changes. If so, edit the script, re-validate, re-preview, and iterate.
diff --git a/skills/data-designer/workflows/interactive.md b/skills/data-designer/workflows/interactive.md
index d4a4ab33..5e3d87f7 100644
--- a/skills/data-designer/workflows/interactive.md
+++ b/skills/data-designer/workflows/interactive.md
@@ -2,12 +2,15 @@
 
 This is an interactive, iterative design process. Do not disengage from the loop unless the user says they are satisfied.
 
-1. **Learn** — Run `data-designer agent context`.
+1. **Resolve CLI command** — Run `command -v data-designer 2>/dev/null || (test -x .venv/bin/data-designer && realpath .venv/bin/data-designer)`.
+   - If the command outputs a path, use it as the `data-designer` executable for all commands in this workflow.
+   - If it produces no output or fails, STOP and follow the Troubleshooting section in SKILL.md. Do not continue to the next step.
+2. **Learn** — Run `data-designer agent context`.
    - If no model aliases are configured, stop and tell the user to run `data-designer config` to set them up before proceeding.
    - Inspect schemas for every column, sampler type, validator, and processor you plan to use.
    - Never guess types or parameters — read the relevant config files first.
    - Always read `base.py` for inherited fields shared by all config objects.
-2. **Clarify** — Ask the user clarifying questions to narrow down precisely what they want.
+3. **Clarify** — Ask the user clarifying questions to narrow down precisely what they want.
    - Optimize for a great user experience: prefer a structured question tool over plain text if one is available, batch related questions together, keep the set short, provide concrete options/examples/defaults where possible, and use structured inputs (single-select, multi-select, free text, etc.) when they make answering easier.
    - If multiple model aliases are available, ask which one(s) to use (or default to an alias with the appropriate `generation_type` for each column).
    - Common things to make precise:
@@ -17,17 +20,17 @@ This is an interactive, iterative design process. Do not disengage from the loop
      - The schema of the final dataset.
      - The structure of any required structured output columns.
      - What facets of the output dataset are important to capture.
-3. **Plan** — Determine columns, samplers, processors, validators, and other dataset features needed. Present the plan to the user and ask if they want any changes before generating a preview.
-4. **Build** — Write the Python script with `load_config_builder()` (see Output Template in SKILL.md).
-5. **Validate** — Run `data-designer validate `. Address any warnings or errors and re-validate until it passes.
-6. **Preview** — Run `data-designer preview --save-results` to generate sample records as HTML files.
+4. **Plan** — Determine columns, samplers, processors, validators, and other dataset features needed. Present the plan to the user and ask if they want any changes before generating a preview.
+5. **Build** — Write the Python script with `load_config_builder()` (see Output Template in SKILL.md).
+6. **Validate** — Run `data-designer validate `. Address any warnings or errors and re-validate until it passes.
+7. **Preview** — Run `data-designer preview --save-results` to generate sample records as HTML files.
    - Note the sample records directory printed by the `data-designer preview` command
    - Give the user a clickable link: `file:///sample_records_browser.html`
-7. **Iterate**
+8. **Iterate**
    - Ask the user for feedback.
    - Offer to review the records yourself and suggest improvements. If the user accepts, read `references/preview-review.md` for guidance.
    - Apply changes, re-validate, and re-preview. Repeat until the user is satisfied.
-8. **Finalize** — Once the user is happy, tell them they can run the following command to create the dataset:
+9. **Finalize** — Once the user is happy, tell them they can run the following command to create the dataset:
    - `data-designer create --num-records --dataset-name `.
    - Caution the user that generation speed depends heavily on the dataset configuration and their inference setup.
    - Do not run this command yourself — the user should control when it runs.

From ea1e2f5a42ba96fb86c1e6f403e48b6cab8c8e44 Mon Sep 17 00:00:00 2001
From: Johnny Greco
Date: Tue, 7 Apr 2026 14:20:24 -0400
Subject: [PATCH 4/5] fix: use CLI_NOT_FOUND sentinel to avoid triggering agent error-fixing

The resolve command now always exits 0 and outputs CLI_NOT_FOUND when the executable is missing, so the agent evaluates a value rather than reacting to a shell error.
---
 skills/data-designer/workflows/autopilot.md   | 6 +++---
 skills/data-designer/workflows/interactive.md | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/skills/data-designer/workflows/autopilot.md b/skills/data-designer/workflows/autopilot.md
index c56c070f..e6c2a396 100644
--- a/skills/data-designer/workflows/autopilot.md
+++ b/skills/data-designer/workflows/autopilot.md
@@ -2,9 +2,9 @@
 
 In this mode, make reasonable design decisions autonomously based on the dataset description. Do not ask clarifying questions — infer sensible defaults and move straight through to a working preview.
 
-1. **Resolve CLI command** — Run `command -v data-designer 2>/dev/null || (test -x .venv/bin/data-designer && realpath .venv/bin/data-designer)`.
-   - If the command outputs a path, use it as the `data-designer` executable for all commands in this workflow.
-   - If it produces no output or fails, STOP and follow the Troubleshooting section in SKILL.md. Do not continue to the next step.
+1. **Resolve CLI command** — Run `command -v data-designer 2>/dev/null || (test -x .venv/bin/data-designer && realpath .venv/bin/data-designer) || echo CLI_NOT_FOUND`.
+   - If the output is a path, use it as the `data-designer` executable for all commands in this workflow.
+   - If the output is `CLI_NOT_FOUND`, STOP and follow the Troubleshooting section in SKILL.md. Do not continue to the next step.
 2. **Learn** — Run `data-designer agent context`.
    - If no model aliases are configured, stop and tell the user to run `data-designer config` to set them up before proceeding.
    - Inspect schemas for every column, sampler type, validator, and processor you plan to use.
diff --git a/skills/data-designer/workflows/interactive.md b/skills/data-designer/workflows/interactive.md
index 5e3d87f7..590447b6 100644
--- a/skills/data-designer/workflows/interactive.md
+++ b/skills/data-designer/workflows/interactive.md
@@ -2,9 +2,9 @@
 
 This is an interactive, iterative design process. Do not disengage from the loop unless the user says they are satisfied.
 
-1. **Resolve CLI command** — Run `command -v data-designer 2>/dev/null || (test -x .venv/bin/data-designer && realpath .venv/bin/data-designer)`.
-   - If the command outputs a path, use it as the `data-designer` executable for all commands in this workflow.
-   - If it produces no output or fails, STOP and follow the Troubleshooting section in SKILL.md. Do not continue to the next step.
+1. **Resolve CLI command** — Run `command -v data-designer 2>/dev/null || (test -x .venv/bin/data-designer && realpath .venv/bin/data-designer) || echo CLI_NOT_FOUND`.
+   - If the output is a path, use it as the `data-designer` executable for all commands in this workflow.
+   - If the output is `CLI_NOT_FOUND`, STOP and follow the Troubleshooting section in SKILL.md. Do not continue to the next step.
 2. **Learn** — Run `data-designer agent context`.
    - If no model aliases are configured, stop and tell the user to run `data-designer config` to set them up before proceeding.
    - Inspect schemas for every column, sampler type, validator, and processor you plan to use.

From 8406bffd6785bda1efc361063af63c4a9a92e457 Mon Sep 17 00:00:00 2001
From: Johnny Greco
Date: Tue, 7 Apr 2026 16:01:49 -0400
Subject: [PATCH 5/5] fix: require user permission before installing data-designer

Update Troubleshooting to ask the user before creating a venv or installing packages, instead of attempting it automatically.
---
 skills/data-designer/SKILL.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/skills/data-designer/SKILL.md b/skills/data-designer/SKILL.md
index ddee328a..51cbdef3 100644
--- a/skills/data-designer/SKILL.md
+++ b/skills/data-designer/SKILL.md
@@ -39,7 +39,7 @@ Read **only** the workflow file that matches the selected mode, then follow it:
 
 # Troubleshooting
 
-- **`data-designer` command not found:** If no virtual environment exists, create one first (`python -m venv .venv && source .venv/bin/activate`), then install (`pip install data-designer`). If a virtual environment already exists, activate it and verify the package is installed.
+- **`data-designer` CLI not found:** Tell the user that `data-designer` is not installed in this environment (requires Python >= 3.10). Ask if they would like you to create a virtual environment and install it, or if they prefer to do it themselves. Do not install anything without the user's permission.
 - **Network errors during preview:** A sandbox environment may be blocking outbound requests. Ask the user for permission to retry the command with the sandbox disabled. Only as a last resort, if retrying outside the sandbox also fails, tell the user to run the command themselves.
 
 # Output Template
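
Reviewer note: the lookup that patches 1 through 4 converge on (try `PATH`, fall back to `.venv/bin`, otherwise print a sentinel and exit 0) can be sketched as a small shell function. This is only an illustrative sketch and is not part of the series; the function name `resolve_cli` is a hypothetical helper, and the sentinel string is the one introduced in PATCH 4/5.

```shell
#!/bin/sh
# Hypothetical helper (not in the patches): resolve a CLI by name.
# Prints a path when the tool is found, otherwise the CLI_NOT_FOUND sentinel.
# The final `echo` makes the function return 0 in the not-found case, so the
# caller compares a value instead of reacting to a non-zero exit status.
resolve_cli() {
    name="$1"
    command -v "$name" 2>/dev/null \
        || { test -x ".venv/bin/$name" && realpath ".venv/bin/$name"; } \
        || echo CLI_NOT_FOUND
}

cli="$(resolve_cli data-designer)"
if [ "$cli" = "CLI_NOT_FOUND" ]; then
    # Mirrors PATCH 5/5: report the problem, do not install automatically.
    echo "data-designer is not installed; ask the user before installing."
else
    echo "Using data-designer at: $cli"
fi
```

Branching on a printed sentinel rather than on the exit code matches the motivation stated in PATCH 4/5: the caller evaluates a value and is not prompted to "fix" a failing command.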