Merged
skills/data-designer/SKILL.md (5 changes: 1 addition & 4 deletions)
@@ -8,9 +8,6 @@ argument-hint: [describe the dataset you want to generate]
 
 Do not explore the workspace first. The workflow's Learn step gives you everything you need.
 
-`data-designer` command: !`command -v data-designer 2>/dev/null || (test -x .venv/bin/data-designer && realpath .venv/bin/data-designer)`
-Use this path for all `data-designer` commands throughout this skill. If blank, see Troubleshooting.
-
 # Goal
 
 Build a synthetic dataset using the Data Designer library that matches this description:
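The `data-designer` command line in SKILL.md resolves the CLI path with a shell one-liner. This change moves that logic into a new **Resolve CLI command** step in both workflows and adds an `echo CLI_NOT_FOUND` fallback. The Python approximation below illustrates what the one-liner checks and in what order.

It rests on a few assumptions. `resolve_data_designer` and `NOT_FOUND` are hypothetical names, not part of Data Designer. `shutil.which` only approximates `command -v`, because it ignores shell builtins, aliases, and functions.

```python
import os
import shutil
from pathlib import Path

# Same sentinel that the workflow's shell one-liner echoes.
NOT_FOUND = "CLI_NOT_FOUND"


def resolve_data_designer(cwd: str = ".") -> str:
    """Approximate `command -v ... || (test -x ... && realpath ...) || echo CLI_NOT_FOUND`."""
    # 1. Whatever `data-designer` is already on PATH (like `command -v`,
    #    minus shell builtins, aliases, and functions).
    on_path = shutil.which("data-designer")
    if on_path:
        return on_path
    # 2. Fall back to a project-local venv (like `test -x`, but regular files only).
    candidate = Path(cwd) / ".venv" / "bin" / "data-designer"
    if candidate.is_file() and os.access(candidate, os.X_OK):
        # Like `realpath`: an absolute path with symlinks resolved.
        return os.path.realpath(candidate)
    # 3. Nothing found: the caller should stop and follow Troubleshooting.
    return NOT_FOUND
```

Like the workflow step, the function returns a sentinel instead of raising. The caller, not the resolver, decides to stop and switch to Troubleshooting.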
@@ -42,7 +39,7 @@ Read **only** the workflow file that matches the selected mode, then follow it:
 
 # Troubleshooting
 
-- **`data-designer` command not found:** If no virtual environment exists, create one first (`python -m venv .venv && source .venv/bin/activate`), then install (`pip install data-designer`). If a virtual environment already exists, activate it and verify the package is installed.
+- **`data-designer` CLI not found:** Tell the user that `data-designer` is not installed in this environment (requires Python >= 3.10). Ask if they would like you to create a virtual environment and install it, or if they prefer to do it themselves. Do not install anything without the user's permission.
 - **Network errors during preview:** A sandbox environment may be blocking outbound requests. Ask the user for permission to retry the command with the sandbox disabled. Only as a last resort, if retrying outside the sandbox also fails, tell the user to run the command themselves.
 
 # Output Template
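The "`data-designer` CLI not found" Troubleshooting entry asks the agent to tell the user and request permission instead of installing anything. Below is a minimal sketch of that diagnose-then-ask pattern.

It assumes a POSIX layout and recognizes a virtual environment by the `pyvenv.cfg` file that `python -m venv` writes at its root. The 3.10 floor comes from the Troubleshooting text. `diagnose_missing_cli` and `message_for_user` are hypothetical helpers. The sketch only inspects the filesystem; it never creates a venv or runs `pip`.

```python
import sys
from pathlib import Path
from typing import Dict, Sequence

# Taken from the Troubleshooting note ("requires Python >= 3.10").
MIN_PYTHON = (3, 10)


def diagnose_missing_cli(cwd: str = ".",
                         version_info: Sequence[int] = sys.version_info) -> Dict[str, bool]:
    """Gather facts about the environment; read-only, installs nothing."""
    return {
        "python_ok": tuple(version_info[:2]) >= MIN_PYTHON,
        # A venv created by `python -m venv` always has pyvenv.cfg at its root.
        "has_venv": (Path(cwd) / ".venv" / "pyvenv.cfg").is_file(),
    }


def message_for_user(diag: Dict[str, bool]) -> str:
    """Turn the diagnosis into a question for the user instead of an action."""
    if not diag["python_ok"]:
        return "data-designer requires Python >= 3.10; please upgrade Python first."
    where = "your existing .venv" if diag["has_venv"] else "a new .venv"
    return ("data-designer is not installed in this environment. "
            f"Would you like me to install it into {where}, "
            "or would you prefer to do it yourself?")
```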
skills/data-designer/workflows/autopilot.md (19 changes: 11 additions & 8 deletions)
@@ -2,25 +2,28 @@
 
 In this mode, make reasonable design decisions autonomously based on the dataset description. Do not ask clarifying questions — infer sensible defaults and move straight through to a working preview.
 
-1. **Learn** — Run `data-designer agent context`.
+1. **Resolve CLI command** — Run `command -v data-designer 2>/dev/null || (test -x .venv/bin/data-designer && realpath .venv/bin/data-designer) || echo CLI_NOT_FOUND`.
+   - If the output is a path, use it as the `data-designer` executable for all commands in this workflow.
+   - If the output is `CLI_NOT_FOUND`, STOP and follow the Troubleshooting section in SKILL.md. Do not continue to the next step.
+2. **Learn** — Run `data-designer agent context`.
    - If no model aliases are configured, stop and tell the user to run `data-designer config` to set them up before proceeding.
    - Inspect schemas for every column, sampler type, validator, and processor you plan to use.
    - Never guess types or parameters — read the relevant config files first.
    - Always read `base.py` for inherited fields shared by all config objects.
-2. **Infer** — Based on the dataset description, make reasonable decisions for:
+3. **Infer** — Based on the dataset description, make reasonable decisions for:
    - Axes of diversity and what should be well represented.
    - Which variables to randomize.
    - The schema of the final dataset.
    - The structure of any structured output columns.
    - Briefly state the key decisions you made so the user can course-correct if needed.
-3. **Plan** — Determine columns, samplers, processors, validators, and other dataset features needed.
-4. **Build** — Write the Python script with `load_config_builder()` (see Output Template in SKILL.md).
-5. **Validate** — Run `data-designer validate <path>`. Address any warnings or errors and re-validate until it passes.
-6. **Preview** — Run `data-designer preview <path> --save-results` to generate sample records as HTML files.
+4. **Plan** — Determine columns, samplers, processors, validators, and other dataset features needed.
+5. **Build** — Write the Python script with `load_config_builder()` (see Output Template in SKILL.md).
+6. **Validate** — Run `data-designer validate <path>`. Address any warnings or errors and re-validate until it passes.
+7. **Preview** — Run `data-designer preview <path> --save-results` to generate sample records as HTML files.
    - Note the sample records directory printed by the `data-designer preview` command
    - Give the user a clickable link: `file://<sample-records-dir>/sample_records_browser.html`
-7. **Create** — If the user specified a record count:
+8. **Create** — If the user specified a record count:
    - Run `data-designer create <path> --num-records <N> --dataset-name <name>`.
    - Generation speed depends heavily on the dataset configuration and the user's inference setup. For larger datasets, warn the user and ask for confirmation before running.
    - If no record count was specified, skip this step.
-8. **Present** — Summarize what was built: columns, samplers used, key design choices. If the create command was run, share the results. Ask the user if they want any changes. If so, edit the script, re-validate, re-preview, and iterate.
+9. **Present** — Summarize what was built: columns, samplers used, key design choices. If the create command was run, share the results. Ask the user if they want any changes. If so, edit the script, re-validate, re-preview, and iterate.
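The autopilot **Create** step has three branches:

- Skip it when no record count was given.
- Warn and ask for confirmation when the dataset is "larger".
- Otherwise, run it.

The sketch below models only that decision and the argument list; it executes nothing. `plan_create_step` and `LARGE_DATASET_THRESHOLD` are hypothetical names. The threshold value is an arbitrary placeholder, because the workflow says only "larger datasets" and gives no number.

```python
from typing import List, Optional, Tuple

# Hypothetical placeholder: the workflow says "larger datasets", not a number.
LARGE_DATASET_THRESHOLD = 1_000


def plan_create_step(
    config_path: str,
    num_records: Optional[int],
    dataset_name: str,
    cli: str = "data-designer",
    threshold: int = LARGE_DATASET_THRESHOLD,
) -> Optional[Tuple[List[str], bool]]:
    """Return (argv, needs_confirmation), or None when Create is skipped."""
    if num_records is None:
        # No record count from the user: skip Create and go on to Present.
        return None
    if num_records <= 0:
        raise ValueError("num_records must be a positive integer")
    argv = [
        cli, "create", config_path,
        "--num-records", str(num_records),
        "--dataset-name", dataset_name,
    ]
    # Above the threshold, warn about generation time and wait for a "yes".
    return argv, num_records > threshold
```

Pass the path resolved in the **Resolve CLI command** step as `cli`. The function returns an argv list rather than a single string so a caller can hand it to `subprocess.run` without any shell quoting.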
skills/data-designer/workflows/interactive.md (19 changes: 11 additions & 8 deletions)
@@ -2,12 +2,15 @@
 
 This is an interactive, iterative design process. Do not disengage from the loop unless the user says they are satisfied.
 
-1. **Learn** — Run `data-designer agent context`.
+1. **Resolve CLI command** — Run `command -v data-designer 2>/dev/null || (test -x .venv/bin/data-designer && realpath .venv/bin/data-designer) || echo CLI_NOT_FOUND`.
+   - If the output is a path, use it as the `data-designer` executable for all commands in this workflow.
+   - If the output is `CLI_NOT_FOUND`, STOP and follow the Troubleshooting section in SKILL.md. Do not continue to the next step.
+2. **Learn** — Run `data-designer agent context`.
    - If no model aliases are configured, stop and tell the user to run `data-designer config` to set them up before proceeding.
    - Inspect schemas for every column, sampler type, validator, and processor you plan to use.
    - Never guess types or parameters — read the relevant config files first.
    - Always read `base.py` for inherited fields shared by all config objects.
-2. **Clarify** — Ask the user clarifying questions to narrow down precisely what they want.
+3. **Clarify** — Ask the user clarifying questions to narrow down precisely what they want.
    - Optimize for a great user experience: prefer a structured question tool over plain text if one is available, batch related questions together, keep the set short, provide concrete options/examples/defaults where possible, and use structured inputs (single-select, multi-select, free text, etc.) when they make answering easier.
    - If multiple model aliases are available, ask which one(s) to use (or default to an alias with the appropriate `generation_type` for each column).
    - Common things to make precise:
@@ -17,17 +20,17 @@ This is an interactive, iterative design process. Do not disengage from the loop
      - The schema of the final dataset.
      - The structure of any required structured output columns.
      - What facets of the output dataset are important to capture.
-3. **Plan** — Determine columns, samplers, processors, validators, and other dataset features needed. Present the plan to the user and ask if they want any changes before generating a preview.
-4. **Build** — Write the Python script with `load_config_builder()` (see Output Template in SKILL.md).
-5. **Validate** — Run `data-designer validate <path>`. Address any warnings or errors and re-validate until it passes.
-6. **Preview** — Run `data-designer preview <path> --save-results` to generate sample records as HTML files.
+4. **Plan** — Determine columns, samplers, processors, validators, and other dataset features needed. Present the plan to the user and ask if they want any changes before generating a preview.
+5. **Build** — Write the Python script with `load_config_builder()` (see Output Template in SKILL.md).
+6. **Validate** — Run `data-designer validate <path>`. Address any warnings or errors and re-validate until it passes.
+7. **Preview** — Run `data-designer preview <path> --save-results` to generate sample records as HTML files.
    - Note the sample records directory printed by the `data-designer preview` command
    - Give the user a clickable link: `file://<sample-records-dir>/sample_records_browser.html`
-7. **Iterate**
+8. **Iterate**
    - Ask the user for feedback.
    - Offer to review the records yourself and suggest improvements. If the user accepts, read `references/preview-review.md` for guidance.
    - Apply changes, re-validate, and re-preview. Repeat until the user is satisfied.
-8. **Finalize** — Once the user is happy, tell them they can run the following command to create the dataset:
+9. **Finalize** — Once the user is happy, tell them they can run the following command to create the dataset:
    - `data-designer create <path> --num-records <N> --dataset-name <name>`.
    - Caution the user that generation speed depends heavily on the dataset configuration and their inference setup.
    - Do not run this command yourself — the user should control when it runs.
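The interactive workflow does two things at the end: it turns the directory printed by `data-designer preview` into a clickable link, and it shows the `create` command to the user instead of running it. Below is a small sketch of both, assuming POSIX paths. `sample_records_link` and `finalize_command` are hypothetical helpers.

Two design choices matter here:

- `Path.as_uri()` percent-encodes characters such as spaces, which plain `file://` plus path concatenation would leave broken.
- `shlex.join` quotes each argument, so the printed command can be pasted into a shell as-is.

```python
import shlex
from pathlib import Path


def sample_records_link(sample_records_dir: str) -> str:
    """Clickable link to the preview browser page inside the printed directory."""
    page = Path(sample_records_dir).resolve() / "sample_records_browser.html"
    # as_uri() needs an absolute path and percent-encodes spaces and similar.
    return page.as_uri()


def finalize_command(config_path: str, num_records: int, dataset_name: str,
                     cli: str = "data-designer") -> str:
    """Shell-quoted `create` command to show the user; never executed here."""
    return shlex.join([
        cli, "create", config_path,
        "--num-records", str(num_records),
        "--dataset-name", dataset_name,
    ])
```

For example, a config saved as `my config.py` with 200 records and the name `reviews` becomes `data-designer create 'my config.py' --num-records 200 --dataset-name reviews`.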