feat: agent CLI introspection (simplified)#415
Conversation
Greptile SummaryThis PR introduces a The implementation is clean and well-factored: a single flat command layer calls plain functions in Key observations:
|
| Filename | Overview |
|---|---|
| packages/data-designer-config/src/data_designer/config/base.py | Adds schema_text() classmethod to ConfigBase with correct default_factory handling, enum coercion, and a comprehensive Google-style section header set for docstring extraction. The _format_annotation and _get_docstring_summary helpers are clean and well-tested. |
| packages/data-designer/src/data_designer/cli/utils/agent_introspection.py | Core introspection module. The get_model_aliases_state early-return path (model_registry is None) reports default_provider using provider_registry.default directly, while the main path falls back to providers[0].name — causing an inconsistency that can mislead agents about provider availability. Everything else (family discovery, schema building, builder API introspection, registry loading) is correct and well-guarded. |
| packages/data-designer/src/data_designer/cli/utils/agent_text_formatter.py | All previously-flagged issues (builder section, None rendering) are addressed. The _cell() helper correctly coerces None to empty string in both width calculation and cell rendering. All six output formatters are present and structurally correct. |
| packages/data-designer/src/data_designer/cli/main.py | Adds agent and agent state Typer subgroups using the existing create_lazy_typer_group mechanism. _build_agent_lazy_group correctly partitions top-level vs. dotted (state.*) commands. The new import of AGENT_COMMANDS from agent_command_defs is cheap (stdlib-only transitive deps) and does not materially affect --help startup time. |
| packages/data-designer/src/data_designer/cli/commands/agent.py | Thin command layer; all six commands delegate to plain functions via the _run helper, which correctly differentiates AgentIntrospectionError (structured error code) from generic exceptions (internal_error). Clean and consistent. |
Sequence Diagram
sequenceDiagram
participant Agent
participant CLI as data-designer agent
participant Cmd as commands/agent.py
participant Intro as agent_introspection.py
participant Fmt as agent_text_formatter.py
Agent->>CLI: agent context
CLI->>Cmd: context_command()
Cmd->>Intro: get_context(DATA_DESIGNER_HOME)
Intro->>Intro: get_family_catalog() × 5 families
Intro->>Intro: get_model_aliases_state()
Intro->>Intro: get_persona_datasets_state()
Intro->>Intro: get_builder_api()
Intro-->>Cmd: dict payload
Cmd->>Fmt: format_context_text(data)
Fmt-->>Cmd: formatted text
Cmd-->>Agent: stdout (human-readable)
Agent->>CLI: agent schema columns llm-text
CLI->>Cmd: schema_command(family, type_name)
Cmd->>Intro: get_schema("columns", "llm-text")
Intro->>Intro: discover_family_types()
Intro->>Intro: _build_schema_dict() → cls.schema_text()
Intro-->>Cmd: schema dict
Cmd->>Fmt: format_schema_text(data)
Fmt-->>Agent: stdout (schema_text output)
Prompt To Fix All With AI
This is a comment left during a code review.
Path: packages/data-designer/src/data_designer/cli/utils/agent_introspection.py
Line: 185-191
Comment:
**`default_provider` logic inconsistent with main path**
The early-return path (when `model_registry is None`) reports `default_provider` as `provider_registry.default` directly. If a user has a `providers.yaml` with providers but no explicit `default` set, `provider_registry.default` is `None` — so the output shows `default_provider: null`.
In contrast, the main code path (lines 198–200) falls back to `provider_registry.providers[0].name` when `provider_registry.default` is falsy:
```python
default_provider = provider_registry.default or (
provider_registry.providers[0].name if provider_registry.providers else None
)
```
An agent running `agent state model-aliases` before any models are configured (e.g., after `data-designer config providers` but before `data-designer config models`) would see `default_provider: null` and could incorrectly diagnose the setup as having no provider, when one is actually available. The fix is to apply the same fallback in the early return:
```suggestion
if model_registry is None:
if provider_registry is not None:
computed_default = provider_registry.default or (
provider_registry.providers[0].name if provider_registry.providers else None
)
else:
computed_default = None
return {
"model_config_present": False,
"provider_config_present": provider_registry is not None,
"default_provider": computed_default,
"items": items,
}
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: packages/data-designer/src/data_designer/cli/utils/agent_introspection.py
Line: 282-285
Comment:
**`__init__` included but condition is unintuitive**
The filter `name.startswith("_") and name != "__init__"` explicitly retains `__init__` while excluding all other private/dunder methods. If `DataDesignerConfigBuilder.__init__` has parameters and a docstring, this is intentional. However, if `__init__` is inherited from a base class or is just the default `object.__init__`, the method signature `__init__() -> None` would appear in the builder output but provide no useful information to agents.
Consider only including `__init__` when it is actually defined on `DataDesignerConfigBuilder` itself (not inherited), to avoid a spurious method entry from `object.__init__`:
```suggestion
for name, attr in inspect.getmembers(DataDesignerConfigBuilder):
if name.startswith("_") and not (name == "__init__" and "__init__" in DataDesignerConfigBuilder.__dict__):
continue
```
How can I resolve this? If you propose a fix, please make it concise.Last reviewed commit: 0bb974b
a6214a9 to
5283f97
Compare
nabinchha
left a comment
There was a problem hiding this comment.
Summary: Adds a data-designer agent CLI subgroup that lets AI coding agents dynamically discover Data Designer's configuration types, builder API, and local state via human-readable text output. Also adds schema_text() to ConfigBase and normalizes validator/constraint discriminators to concrete Literal fields. The implementation matches the stated intent well — the simplified single-layer architecture (commands call plain functions directly) is a significant improvement over the prior 3-layer approach.
Findings
Critical — Must fix before merge
No critical issues found.
Warnings — Strongly recommend fixing
packages/data-designer/src/data_designer/cli/utils/agent_text_formatter.py:119,128 — or '' coercion suppresses falsy values like 0 and False
- What:
row.get(col) or ''coercesNoneto empty string (good), but also coerces0,False, and empty string""to empty string. Theusablecolumn in model aliases contains booleanTrue/Falsevalues. WhenusableisFalse,row.get("usable") or ''evaluates to'', so the table would show an empty cell instead ofFalse. - Why: An agent reading the model-aliases table would see an empty
usablecolumn for unusable models, losing a key diagnostic signal. - Suggestion: Use an explicit
Nonecheck:
def _cell_value(val: Any) -> str:
return "" if val is None else str(val)Then use _cell_value(row.get(col)) in both the width calculation and row rendering.
packages/data-designer/src/data_designer/cli/main.py:110 — Implicit and/or precedence in filter expression
- What: The condition
if prefix == "" and "." not in cmd.name or cmd.name.startswith(f"{prefix}.")relies onandbinding tighter thanor. While logically correct, it reads as ambiguous. - Why: A future maintainer adding a new command pattern could easily misread the intent and introduce a routing bug.
- Suggestion: Add explicit parentheses:
if (prefix == "" and "." not in cmd.name) or (prefix != "" and cmd.name.startswith(f"{prefix}."))packages/data-designer-config/src/data_designer/config/sampler_constraints.py:27-29 — Constraint base class is no longer abstract but not meant to be instantiated
- What: The base
Constraintclass was previouslyABCbut is now a plainConfigBasesubclass. It declaresconstraint_type: ConstraintTypeas required (no default), so direct instantiation would fail with a confusing validation error. - Why: If someone instantiates
Constraintdirectly, the error won't clearly indicate they should use a subclass. - Suggestion: Restore
ABConConstraint— it was abstract before and still shouldn't be instantiated directly.
packages/data-designer/tests/cli/commands/test_agent_command.py:57-65 — test_error_outputs_message_to_stderr may fail depending on Typer version
- What: The test accesses
result.stdoutandresult.stderrseparately, but Typer'sCliRunnerbehavior around stderr separation depends on the Typer/Click version pinned. - Why: A test that errors instead of testing the intended behavior provides false confidence.
- Suggestion: Confirm this test passes in CI. If it doesn't, use
CliRunner(mix_stderr=False)explicitly.
Suggestions — Consider improving
packages/data-designer-config/tests/config/test_schema_text.py:149 — Single-letter loop variable l
- What:
next(i for i, l in enumerate(lines) if "name: str" in l)useslwhich is visually similar to1. - Suggestion: Rename to
line.
packages/data-designer/tests/cli/utils/test_agent_text_formatter.py:219 — Import inside test function
- What:
test_format_schema_text_on_real_config_modelsimportsget_family_schemainside the function body rather than at module level. - Suggestion: Move the import to the top of the file per AGENTS.md guidelines.
What Looks Good
-
schema_text()onConfigBaseis an elegant 25-line solution replacing 299 lines ofpydantic_inspector. Putting schema knowledge on the types themselves is the right design — discoverable, testable, and naturally stays in sync as fields change. -
Clean data/presentation separation:
agent_introspection.pyreturns plain dicts,agent_text_formatter.pyrenders them. Both are independently testable and leave the door open for adding JSON output later with minimal effort. -
Thorough edge-case testing for
schema_text(): The 15 tests intest_schema_text.pycoverdefault_factory, enum defaults,Nonedefaults, scalar defaults, descriptions, docstrings, and the mixed-model integration case. ThePydanticUndefinedregression test is particularly valuable.
Verdict
Ship it (with nits) — The core design is sound, the implementation is clean, and the test coverage is good. The or '' falsy-value suppression is the most impactful issue to fix before merge since it silently drops False values from the model-aliases table. The other warnings are lower priority but worth addressing. Linter passes clean on all changed files.
|
Thanks for the review @nabinchha! All items addressed: Warnings — fixed:
Suggestions — fixed: |
ConfigBase.schema_text() returns a concise text representation including the class docstring summary, field names, types, defaults, and descriptions. Field descriptions added to column config types to surface through this method.
Delete AgentController class and agent_command_defs module. Move all logic into agent_introspection (data) and agent_text_formatter (display) as plain functions. Add --json flag so commands default to human-readable text using schema_text(), with JSON as opt-in. Unify _emit helper, remove include_docstrings parameter, deduplicate catalog calls, and fix N+1 discover_family_types in get_family_schemas.
Port test_agent_controller.py to use plain functions instead of deleted AgentController. Extract AGENT_COMMANDS constant as single source for operation descriptions, syncing with main.py help strings.
Extract AGENT_COMMANDS into agent_command_defs.py so main.py and agent_introspection.py share a single source for command names, help text, and metadata. The new module has no heavy dependencies, keeping --help latency unaffected.
…trospection - schema_text() now detects default_factory fields and renders e.g. "list()" instead of leaking PydanticUndefined - Guard against IndexError when provider registry has an empty providers list - Add 15 edge-case tests for schema_text covering default_factory, enum defaults, None defaults, scalar defaults, descriptions, and docstrings
Text-only output simplifies the interface. Structured output can be added back trivially since the functions already return dicts.
- format_context_text now renders a ## Builder section - format_types_text now includes import_path column in tables
All config objects are imported via dd.<ClassName>, so the full import path is redundant noise in agent output.
row.get(col, '') returns None when the key exists with value None, causing str(None) to render "None" in the output. Use `or ''` instead.
…ration tests There is no controller layer — these tests exercise functions in agent_introspection.py, so they belong in tests/cli/utils/.
The previous `or ''` pattern treated all falsy values (including False) as empty. Use an explicit None check so booleans render correctly.
- Add explicit parentheses to and/or precedence in _build_agent_lazy_group - Rename loop variable l to line in test_schema_text - Move get_family_schema import to module level in test_agent_text_formatter
99bab37 to
5926c7e
Compare
…nd docstring parsing - _format_annotation now renders Literal['value'] instead of bare Literal - _format_signature strips quotes from stringified annotations caused by `from __future__ import annotations` - _get_docstring_summary stops at any Google-style section header, not just Attributes:
| if model_registry is None: | ||
| return { | ||
| "model_config_present": False, | ||
| "provider_config_present": provider_registry is not None, | ||
| "default_provider": None if provider_registry is None else provider_registry.default, | ||
| "items": items, | ||
| } |
There was a problem hiding this comment.
default_provider logic inconsistent with main path
The early-return path (when model_registry is None) reports default_provider as provider_registry.default directly. If a user has a providers.yaml with providers but no explicit default set, provider_registry.default is None — so the output shows default_provider: null.
In contrast, the main code path (lines 198–200) falls back to provider_registry.providers[0].name when provider_registry.default is falsy:
default_provider = provider_registry.default or (
provider_registry.providers[0].name if provider_registry.providers else None
)An agent running agent state model-aliases before any models are configured (e.g., after data-designer config providers but before data-designer config models) would see default_provider: null and could incorrectly diagnose the setup as having no provider, when one is actually available. The fix is to apply the same fallback in the early return:
| if model_registry is None: | |
| return { | |
| "model_config_present": False, | |
| "provider_config_present": provider_registry is not None, | |
| "default_provider": None if provider_registry is None else provider_registry.default, | |
| "items": items, | |
| } | |
| if model_registry is None: | |
| if provider_registry is not None: | |
| computed_default = provider_registry.default or ( | |
| provider_registry.providers[0].name if provider_registry.providers else None | |
| ) | |
| else: | |
| computed_default = None | |
| return { | |
| "model_config_present": False, | |
| "provider_config_present": provider_registry is not None, | |
| "default_provider": computed_default, | |
| "items": items, | |
| } |
Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/data-designer/src/data_designer/cli/utils/agent_introspection.py
Line: 185-191
Comment:
**`default_provider` logic inconsistent with main path**
The early-return path (when `model_registry is None`) reports `default_provider` as `provider_registry.default` directly. If a user has a `providers.yaml` with providers but no explicit `default` set, `provider_registry.default` is `None` — so the output shows `default_provider: null`.
In contrast, the main code path (lines 198–200) falls back to `provider_registry.providers[0].name` when `provider_registry.default` is falsy:
```python
default_provider = provider_registry.default or (
provider_registry.providers[0].name if provider_registry.providers else None
)
```
An agent running `agent state model-aliases` before any models are configured (e.g., after `data-designer config providers` but before `data-designer config models`) would see `default_provider: null` and could incorrectly diagnose the setup as having no provider, when one is actually available. The fix is to apply the same fallback in the early return:
```suggestion
if model_registry is None:
if provider_registry is not None:
computed_default = provider_registry.default or (
provider_registry.providers[0].name if provider_registry.providers else None
)
else:
computed_default = None
return {
"model_config_present": False,
"provider_config_present": provider_registry is not None,
"default_provider": computed_default,
"items": items,
}
```
How can I resolve this? If you propose a fix, please make it concise.
Summary
Adds a
data-designer agentCLI subgroup that lets AI coding agents dynamically discover Data Designer's configuration types, builder API, and local state. All output is human-readable text.Also adds
schema_text()toConfigBaseso every config type can describe its own fields, and normalizes validator/constraint discriminators to kebab-case.Agent commands
agent contextagent types [FAMILY]columns,samplers,validators,processors,constraints)agent schema <FAMILY> <TYPE>agent schema <FAMILY> --allagent builderDataDesignerConfigBuildermethod signatures and docstringsagent state model-aliasesagent state persona-datasetsDesign
agent_introspection.pydirectly — no controller or service layeragent_text_formatter.pyschema_text()onConfigBase(25 lines) replaces the 299-linepydantic_inspectorfrom the previous approach — schema knowledge lives on the types themselves--helpstays fast (main.pyhas zero new imports)Comparison with
johnny/feat/agent-contextThis branch delivers the same agent CLI functionality with significantly less complexity:
agent-contextKey differences:
schema_text()on ConfigBase (25 lines) replacespydantic_inspector.py(299 lines)--json/--compactflagsmain.pyhas zero new imports;--helppath stays fast