diff --git a/skills/data-designer/SKILL.md b/skills/data-designer/SKILL.md
index cbb05c1a..51cbdef3 100644
--- a/skills/data-designer/SKILL.md
+++ b/skills/data-designer/SKILL.md
@@ -8,9 +8,6 @@ argument-hint: [describe the dataset you want to generate]
 
 Do not explore the workspace first. The workflow's Learn step gives you everything you need.
 
-`data-designer` command: !`command -v data-designer 2>/dev/null || (test -x .venv/bin/data-designer && realpath .venv/bin/data-designer)`
-Use this path for all `data-designer` commands throughout this skill. If blank, see Troubleshooting.
-
 # Goal
 
 Build a synthetic dataset using the Data Designer library that matches this description:
@@ -42,7 +39,7 @@ Read **only** the workflow file that matches the selected mode, then follow it:
 
 # Troubleshooting
 
-- **`data-designer` command not found:** If no virtual environment exists, create one first (`python -m venv .venv && source .venv/bin/activate`), then install (`pip install data-designer`). If a virtual environment already exists, activate it and verify the package is installed.
+- **`data-designer` CLI not found:** Tell the user that `data-designer` is not installed in this environment (requires Python >= 3.10). Ask if they would like you to create a virtual environment and install it, or if they prefer to do it themselves. Do not install anything without the user's permission.
 - **Network errors during preview:** A sandbox environment may be blocking outbound requests. Ask the user for permission to retry the command with the sandbox disabled. Only as a last resort, if retrying outside the sandbox also fails, tell the user to run the command themselves.
 
 # Output Template
diff --git a/skills/data-designer/workflows/autopilot.md b/skills/data-designer/workflows/autopilot.md
index 2f13b7e7..e6c2a396 100644
--- a/skills/data-designer/workflows/autopilot.md
+++ b/skills/data-designer/workflows/autopilot.md
@@ -2,25 +2,28 @@
 
 In this mode, make reasonable design decisions autonomously based on the dataset description. Do not ask clarifying questions — infer sensible defaults and move straight through to a working preview.
 
-1. **Learn** — Run `data-designer agent context`.
+1. **Resolve CLI command** — Run `command -v data-designer 2>/dev/null || (test -x .venv/bin/data-designer && realpath .venv/bin/data-designer) || echo CLI_NOT_FOUND`.
+   - If the output is a path, use it as the `data-designer` executable for all commands in this workflow.
+   - If the output is `CLI_NOT_FOUND`, STOP and follow the Troubleshooting section in SKILL.md. Do not continue to the next step.
+2. **Learn** — Run `data-designer agent context`.
    - If no model aliases are configured, stop and tell the user to run `data-designer config` to set them up before proceeding.
    - Inspect schemas for every column, sampler type, validator, and processor you plan to use.
    - Never guess types or parameters — read the relevant config files first.
    - Always read `base.py` for inherited fields shared by all config objects.
-2. **Infer** — Based on the dataset description, make reasonable decisions for:
+3. **Infer** — Based on the dataset description, make reasonable decisions for:
    - Axes of diversity and what should be well represented.
    - Which variables to randomize.
    - The schema of the final dataset.
    - The structure of any structured output columns.
    - Briefly state the key decisions you made so the user can course-correct if needed.
-3. **Plan** — Determine columns, samplers, processors, validators, and other dataset features needed.
-4. **Build** — Write the Python script with `load_config_builder()` (see Output Template in SKILL.md).
-5. **Validate** — Run `data-designer validate `. Address any warnings or errors and re-validate until it passes.
-6. **Preview** — Run `data-designer preview --save-results` to generate sample records as HTML files.
+4. **Plan** — Determine columns, samplers, processors, validators, and other dataset features needed.
+5. **Build** — Write the Python script with `load_config_builder()` (see Output Template in SKILL.md).
+6. **Validate** — Run `data-designer validate `. Address any warnings or errors and re-validate until it passes.
+7. **Preview** — Run `data-designer preview --save-results` to generate sample records as HTML files.
    - Note the sample records directory printed by the `data-designer preview` command
    - Give the user a clickable link: `file:///sample_records_browser.html`
-7. **Create** — If the user specified a record count:
+8. **Create** — If the user specified a record count:
    - Run `data-designer create --num-records --dataset-name `.
    - Generation speed depends heavily on the dataset configuration and the user's inference setup. For larger datasets, warn the user and ask for confirmation before running.
    - If no record count was specified, skip this step.
-8. **Present** — Summarize what was built: columns, samplers used, key design choices. If the create command was run, share the results. Ask the user if they want any changes. If so, edit the script, re-validate, re-preview, and iterate.
+9. **Present** — Summarize what was built: columns, samplers used, key design choices. If the create command was run, share the results. Ask the user if they want any changes. If so, edit the script, re-validate, re-preview, and iterate.
diff --git a/skills/data-designer/workflows/interactive.md b/skills/data-designer/workflows/interactive.md
index d4a4ab33..590447b6 100644
--- a/skills/data-designer/workflows/interactive.md
+++ b/skills/data-designer/workflows/interactive.md
@@ -2,12 +2,15 @@
 
 This is an interactive, iterative design process. Do not disengage from the loop unless the user says they are satisfied.
 
-1. **Learn** — Run `data-designer agent context`.
+1. **Resolve CLI command** — Run `command -v data-designer 2>/dev/null || (test -x .venv/bin/data-designer && realpath .venv/bin/data-designer) || echo CLI_NOT_FOUND`.
+   - If the output is a path, use it as the `data-designer` executable for all commands in this workflow.
+   - If the output is `CLI_NOT_FOUND`, STOP and follow the Troubleshooting section in SKILL.md. Do not continue to the next step.
+2. **Learn** — Run `data-designer agent context`.
    - If no model aliases are configured, stop and tell the user to run `data-designer config` to set them up before proceeding.
    - Inspect schemas for every column, sampler type, validator, and processor you plan to use.
    - Never guess types or parameters — read the relevant config files first.
    - Always read `base.py` for inherited fields shared by all config objects.
-2. **Clarify** — Ask the user clarifying questions to narrow down precisely what they want.
+3. **Clarify** — Ask the user clarifying questions to narrow down precisely what they want.
    - Optimize for a great user experience: prefer a structured question tool over plain text if one is available, batch related questions together, keep the set short, provide concrete options/examples/defaults where possible, and use structured inputs (single-select, multi-select, free text, etc.) when they make answering easier.
    - If multiple model aliases are available, ask which one(s) to use (or default to an alias with the appropriate `generation_type` for each column).
    - Common things to make precise:
@@ -17,17 +20,17 @@ This is an interactive, iterative design process. Do not disengage from the loop
      - The schema of the final dataset.
      - The structure of any required structured output columns.
      - What facets of the output dataset are important to capture.
-3. **Plan** — Determine columns, samplers, processors, validators, and other dataset features needed. Present the plan to the user and ask if they want any changes before generating a preview.
-4. **Build** — Write the Python script with `load_config_builder()` (see Output Template in SKILL.md).
-5. **Validate** — Run `data-designer validate `. Address any warnings or errors and re-validate until it passes.
-6. **Preview** — Run `data-designer preview --save-results` to generate sample records as HTML files.
+4. **Plan** — Determine columns, samplers, processors, validators, and other dataset features needed. Present the plan to the user and ask if they want any changes before generating a preview.
+5. **Build** — Write the Python script with `load_config_builder()` (see Output Template in SKILL.md).
+6. **Validate** — Run `data-designer validate `. Address any warnings or errors and re-validate until it passes.
+7. **Preview** — Run `data-designer preview --save-results` to generate sample records as HTML files.
    - Note the sample records directory printed by the `data-designer preview` command
    - Give the user a clickable link: `file:///sample_records_browser.html`
-7. **Iterate**
+8. **Iterate**
    - Ask the user for feedback.
    - Offer to review the records yourself and suggest improvements. If the user accepts, read `references/preview-review.md` for guidance.
    - Apply changes, re-validate, and re-preview. Repeat until the user is satisfied.
-8. **Finalize** — Once the user is happy, tell them they can run the following command to create the dataset:
+9. **Finalize** — Once the user is happy, tell them they can run the following command to create the dataset:
    - `data-designer create --num-records --dataset-name `.
    - Caution the user that generation speed depends heavily on the dataset configuration and their inference setup.
    - Do not run this command yourself — the user should control when it runs.
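
Review note: the new "Resolve CLI command" step in both workflows checks `PATH` first, then falls back to `.venv/bin/data-designer`, and prints a `CLI_NOT_FOUND` sentinel otherwise. The same decision order can be sketched in Python for anyone who wants to reason about it outside the shell. This is a rough approximation and not part of the change: the function name `resolve_cli` and its parameters are hypothetical, `shutil.which` only searches `PATH` (unlike `command -v`, it does not see shell builtins, functions, or aliases), and the fallback here also requires a regular file, which is slightly stricter than `test -x`.

```python
import os
import shutil
from pathlib import Path

# Sentinel mirroring the string echoed by the shell one-liner in the workflows.
CLI_NOT_FOUND = "CLI_NOT_FOUND"


def resolve_cli(name: str = "data-designer", venv_dir: str = ".venv") -> str:
    """Return a path to the executable, or the CLI_NOT_FOUND sentinel.

    Hypothetical helper: approximates
    `command -v NAME || (test -x VENV/bin/NAME && realpath VENV/bin/NAME) || echo CLI_NOT_FOUND`.
    """
    # 1. Look for the executable on PATH (approximates `command -v`).
    on_path = shutil.which(name)
    if on_path:
        return on_path

    # 2. Fall back to the project-local virtual environment.
    candidate = Path(venv_dir) / "bin" / name
    if candidate.is_file() and os.access(candidate, os.X_OK):
        # Resolve symlinks and make the path absolute (approximates `realpath`).
        return str(candidate.resolve())

    # 3. Nothing found: return the sentinel so the caller can stop early.
    return CLI_NOT_FOUND


if __name__ == "__main__":
    print(resolve_cli())
```

A caller would compare the result against `CLI_NOT_FOUND` and stop before the Learn step, which matches the "STOP and follow the Troubleshooting section" instruction in the workflows.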