Conversation
- add an ATIF rollout handler for standalone JSON trajectories - require explicit paths for ATIF while keeping Claude and Codex defaults - cover config, reader, interface, and recipe usage updates Closes #493
Greptile SummaryThis PR adds
|
| Filename | Overview |
|---|---|
| packages/data-designer-engine/src/data_designer/engine/resources/agent_rollout/atif.py | New ATIF format handler; parses standalone JSON trajectories into normalized rollout records with correct role mapping, sequential step_id validation, tool-call/observation linking, and subagent-ref collection |
| packages/data-designer-config/src/data_designer/config/seed_source.py | Adds ATIF to AgentRolloutFormat enum, returns (None, '*.json') defaults, and adds model_validator to enforce explicit path requirement at construction time |
| packages/data-designer-engine/src/data_designer/engine/resources/agent_rollout/registry.py | Registers AtifAgentRolloutFormatHandler alongside existing Claude Code and Codex handlers |
| packages/data-designer-engine/tests/engine/resources/agent_rollout/test_atif.py | Comprehensive handler tests covering happy path, non-object payload, non-sequential step IDs, assistant-only field rejection, and observation validation |
| packages/data-designer-config/tests/config/test_seed_source.py | Adds config-level tests for ATIF path requirement and default file pattern |
| packages/data-designer-engine/tests/engine/resources/test_seed_reader.py | Integration test verifying ATIF JSON files are hydrated correctly into seed reader records |
| packages/data-designer/tests/interface/test_data_designer.py | E2E test for dataset creation with ATIF seed source, verifying source_kind, cwd, and git_branch |
| docs/assets/recipes/trace_ingestion/agent_rollout_distillation.py | Adds early validation requiring --trace-dir when --format atif and updated help text |
| docs/concepts/seed-datasets.md | Documents ATIF usage, explicit-path requirement, and .json file pattern |
| docs/recipes/trace_ingestion/agent_rollout_distillation.md | Updates recipe description to include atif format and its --trace-dir requirement |
| docs/recipes/cards.md | Adds ATIF to the list of formats covered by the rollout distillation recipe |
Sequence Diagram
sequenceDiagram
actor User
participant Config as AgentRolloutSeedSource
participant Engine as Engine / Seed Reader
participant Handler as AtifAgentRolloutFormatHandler
participant Utils as parse helpers
User->>Config: AgentRolloutSeedSource(format=ATIF, path="/traces")
Config->>Config: model_validator: get_agent_rollout_format_defaults(ATIF)<br/>→ default_path=None
alt path is None AND default_path is None
Config-->>User: raises ValueError "path is required for format 'atif'"
else path provided
Config-->>User: AgentRolloutSeedSource instance
end
User->>Engine: build dataset / ingest seed source
loop for each *.json file under root_path
Engine->>Handler: parse_file(root_path, relative_path)
Handler->>Utils: load_atif_payload(file_path)
Utils-->>Handler: validated payload dict
Handler->>Handler: extract schema_version, session_id, agent, steps
loop for each step in steps
Handler->>Utils: normalize_atif_role(step.source)
Utils-->>Handler: message_role (assistant/user/system)
Handler->>Utils: validate_atif_step_fields(role, raw_step)
Note over Utils: rejects tool_calls/observation/reasoning_content<br/>on non-assistant steps
Handler->>Utils: normalize_atif_tool_calls(step.tool_calls)
Utils-->>Handler: normalized tool-call list
Handler->>Utils: build_message(role, content, tool_calls)
Utils-->>Handler: message dict appended to messages[]
alt step has observation
Handler->>Utils: normalize_atif_observation_messages(obs, valid_tool_call_ids)
loop for each observation result
Utils->>Utils: validate source_call_id in valid_tool_call_ids
Utils->>Utils: build tool message or collect subagent_ref
end
Utils-->>Handler: tool messages appended to messages[]
end
end
Handler->>Utils: build_atif_source_meta(payload, agent, steps, refs)
Utils-->>Handler: source_meta dict
Handler-->>Engine: [NormalizedAgentRolloutRecord]
end
Engine-->>User: dataset / seed rows
Reviews (4): Last reviewed commit: "docs: clarify ATIF path requirement" | Re-trigger Greptile
- reject non-sequential step ids before normalizing traces - reject assistant-only fields on user/system ATIF steps - require observation tool results to reference declared tool calls
|
Updated this PR to tighten ATIF ingestion semantics around malformed rollout data. Changes made:
Why: Verification:
|
|
Manual ATIF smoke test I just ran against this branch ( What I used:
Procedure:
Observed results:
One harness-specific note: because I imported the recipe module ad hoc via Net: this branch handled real external ATIF trajectories successfully both through raw ingestion and through the published recipe workflow. |
nabinchha
left a comment
There was a problem hiding this comment.
Thanks for putting this together, @eric-tramel — here is a focused review of PR #495.
Summary
The PR adds AgentRolloutFormat.ATIF with a dedicated handler, config validation (explicit path, default *.json), registry wiring, tests at config/reader/interface layers, and recipe/docs updates. The implementation matches the stated intent: one JSON file → one normalized rollout row, reuse of shared message/tool semantics, and stricter ATIF validation (sequential step_id, role-specific fields, observation ↔ tool-call linkage). Ruff passes on all changed Python files in this branch.
Findings
Critical — Let's fix these before merge
(none)
Warnings — Worth addressing
packages/data-designer-config/src/data_designer/config/seed_source.py — AgentRolloutSeedSource.path field description
- What: The
pathfield description explains defaults for Claude Code and Codex when omitted, but does not state that ATIF requires an explicitpath(even thoughvalidate_runtime_path_sourceenforces it). - Why: Anyone reading generated JSON Schema or docstrings from the model may assume omission is always OK if they only read the field text, not the validator error.
- Suggestion: Extend the description with a short clause such as: “
atifrequires an explicit directory path; home-directory defaults apply only toclaude_codeandcodex.”
Suggestions — Take it or leave it
packages/data-designer-engine/src/data_designer/engine/resources/agent_rollout/atif.py — started_at / ended_at from string timestamps
- What:
started_atandended_atusemin(timestamps)/max(timestamps)on raw strings. - Why: For ISO-8601 strings in a consistent format this behaves as expected; mixed formats or locale-specific timestamps could produce lexicographic min/max that do not match chronological order.
- Suggestion: If you expect heterogeneous timestamp strings in the wild, consider documenting that assumption in the handler docstring, or parsing to
datetimewhen normalization is required.
packages/data-designer-engine/src/data_designer/engine/resources/agent_rollout/atif.py — Sequential step_id
- What: Steps must satisfy
step_id ==1-based list index. - Why: This is clearly intentional and tested; if some ATIF producers emit non-contiguous or differently ordered IDs while still being valid per the ATIF spec, those files would be rejected.
- Suggestion: If the public ATIF spec guarantees 1-based contiguous IDs in file order, a one-line comment citing that would help future maintainers; if not, consider whether reordering/sorting is worth a follow-up.
What Looks Good
- Layering: Config stays declarative; parsing lives in the engine handler; the reader already normalizes
AgentRolloutSeedParseError/OSErrorconsistently with other rollout formats. - Validation depth: The follow-up commit tightening observation ↔
tool_call_idlinkage and rejecting assistant-only fields on non-agent steps materially reduces silent garbage in normalized rows. - Test coverage: Dedicated
test_atif.py, config tests for path/pattern, seed reader hydration with multiple JSON files, and an E2EDataDesignercase give a sensible safety net for a new format.
Verdict
Ship it (with nits) — Addressing the path field description for ATIF would remove the only “worth fixing before merge” documentation gap; everything else is optional polish.
This review was generated by an AI assistant.
📋 Summary
Add
AgentRolloutFormat.ATIFsupport for standalone 1-file-1-rollout JSON trajectories inAgentRolloutSeedSource.This keeps the existing Claude Code/Codex ingestion model intact: each imported rollout becomes one normalized seed row with
messages, rollout metadata, and derived summary fields. For ATIF specifically,pathis required and the default file pattern is*.json.🔄 Changes
✨ Added
packages/data-designer-engine/src/data_designer/engine/resources/agent_rollout/atif.py- new ATIF rollout handler for standalone trajectory JSON objectspackages/data-designer-engine/tests/engine/resources/agent_rollout/test_atif.py- handler coverage for ATIF step, tool-call, observation, and metadata normalization🔧 Changed
packages/data-designer-config/src/data_designer/config/seed_source.py- addATIFtoAgentRolloutFormat, default its file pattern to*.json, and require an explicitpathpackages/data-designer-engine/src/data_designer/engine/resources/agent_rollout/registry.py- register the ATIF handler alongside Claude Code and Codexdocs/concepts/seed-datasets.md- document ATIF usage and explicit-path semanticsdocs/assets/recipes/trace_ingestion/agent_rollout_distillation.py- require--trace-dirwhen--format atifdocs/recipes/trace_ingestion/agent_rollout_distillation.mdanddocs/recipes/cards.md- include ATIF in rollout ingestion docs🧪 Tests
packages/data-designer-config/tests/config/test_seed_source.py- config validation for ATIF file patterns and explicit-path requirementpackages/data-designer-engine/tests/engine/resources/test_seed_reader.py- seed reader hydration for ATIF JSON filespackages/data-designer/tests/interface/test_data_designer.py- end-to-end dataset creation with ATIF seed sourcesUsage
uv run docs/assets/recipes/trace_ingestion/agent_rollout_distillation.py \ --format atif \ --trace-dir /path/to/atif-traces \ --num-records 20🔍 Attention Areas
packages/data-designer-engine/src/data_designer/engine/resources/agent_rollout/atif.py- this PR intentionally scopes ATIF support to standalone 1-file-1-rollout JSON trajectories and reuses the existing normalized message/tool-output semanticspackages/data-designer-config/src/data_designer/config/seed_source.py- ATIF does not get an implicit home-directory default; callers must providepathcontinued_trajectory_refis preserved insource_metafor future follow-up if we choose to support multi-file logical traces laterTest Plan
uv run pytest tests/config/test_seed_source.pyinpackages/data-designer-configuv run pytest tests/engine/resources/agent_rollout/test_atif.py tests/engine/resources/test_seed_reader.pyinpackages/data-designer-engineuv run pytest tests/interface/test_data_designer.pyinpackages/data-designerCloses #493
🤖 Generated with AI