feat: normalize validator and constraint discriminators#414
feat: normalize validator and constraint discriminators#414johnnygreco merged 6 commits intomainfrom
Conversation
Greptile SummaryThis PR normalizes Pydantic discriminated-union patterns across the config layer: it adds Key design decisions worth noting:
|
| Filename | Overview |
|---|---|
| packages/data-designer-config/src/data_designer/config/sampler_constraints.py | Converts constraint_type from abstract property to concrete Literal field, removes ABC enforcement, and adds ColumnConstraintInputT discriminated union with a callable resolver for legacy config support. The resolver logic and _can_coerce_to_float are correct; existing concerns about ABC and the missing-rhs fallback were covered in previous threads. |
| packages/data-designer-config/src/data_designer/config/validator_params.py | Adds validator_type Literal discriminator field with a default to each params class. Pattern mirrors the existing sampler_type approach and is consistent across all three validator types. |
| packages/data-designer-config/src/data_designer/config/column_configs.py | Switches validator_params to Annotated[ValidatorParamsT, Discriminator("validator_type")] and adds inject_validator_type_into_params model validator. Pattern mirrors inject_sampler_type_into_params exactly and maintains backward compatibility for dict-based construction. |
| packages/data-designer-config/src/data_designer/config/data_designer_config.py | Replaces ColumnConstraintT with ColumnConstraintInputT for the constraints field, enabling discriminated deserialization at the config boundary while keeping the engine layer using the simpler ColumnConstraintT type alias. |
| packages/data-designer-config/tests/config/test_data_designer_config.py | Adds round-trip tests for both tagged and legacy constraint shapes, plus a missing-rhs error test. The error test uses overly broad pytest.raises(Exception) rather than the specific ValidationError. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["Raw input dict / YAML"] --> B["DataDesignerConfig.model_validate()"]
subgraph Constraints ["Constraint deserialization (ColumnConstraintInputT)"]
B --> C["resolve_constraint_input_type()"]
C -->|"constraint_type key present"| D["Return value as-is"]
C -->|"rhs is None"| E["Default → SCALAR_INEQUALITY\n(Pydantic surfaces 'rhs required')"]
C -->|"rhs is str + _can_coerce_to_float"| F["→ SCALAR_INEQUALITY"]
C -->|"rhs is str, not float-like"| G["→ COLUMN_INEQUALITY"]
C -->|"rhs is numeric"| F
D --> H["Tag match → ScalarInequalityConstraint\nor ColumnInequalityConstraint"]
F --> H
G --> H
end
subgraph ValidatorParams ["Validator params deserialization"]
B --> I["inject_validator_type_into_params (mode=before)"]
I -->|"injects validator_type into params dict"| J["Discriminator('validator_type')"]
J -->|"'code'"| K["CodeValidatorParams"]
J -->|"'local_callable'"| L["LocalCallableValidatorParams"]
J -->|"'remote'"| M["RemoteValidatorParams"]
end
H --> N["Validated DataDesignerConfig\n(ColumnConstraintT instances for engine)"]
K & L & M --> N
Prompt To Fix All With AI
This is a comment left during a code review.
Path: packages/data-designer-config/tests/config/test_data_designer_config.py
Line: 113-130
Comment:
**Overly broad exception type in error test**
`pytest.raises(Exception)` will pass for any exception — including `AttributeError`, `KeyError`, or even `SystemExit`. This can mask situations where the wrong exception is raised (e.g., a bug in the resolver that crashes before Pydantic even runs validation).
Using `ValidationError` keeps the test intent precise and ensures failures surface at the expected layer:
```suggestion
def test_data_designer_config_constraint_missing_rhs_raises_validation_error() -> None:
from pydantic import ValidationError
with pytest.raises(ValidationError):
```
How can I resolve this? If you propose a fix, please make it concise.Last reviewed commit: 2170a2f
Address Greptile review feedback: - Add docstring to Constraint noting it should not be instantiated directly - Add comment explaining the rhs fallback behavior in the resolver
16d33e0 to
3305e46
Compare
nabinchha
left a comment
There was a problem hiding this comment.
Code Review
Summary: Solid normalization of the Pydantic discriminated union patterns for validator params and sampler constraints. The inject_validator_type_into_params model validator mirrors the established inject_sampler_type_into_params pattern cleanly, the backward-compatible discriminator resolver for constraints is well thought out, and the test coverage is thorough.
Suggestions (nits)
sampler_constraints.py:73-78 — _can_coerce_to_float passes on "inf" and "nan"
float("inf")andfloat("nan")succeed without raisingValueError, so these strings would be treated as scalar inequality constraints rather than column references. Unlikely in practice, but a column literally named"inf"or"nan"would be misrouted by the legacy fallback path. Could guard withmath.isinf/math.isnan.
test_columns.py:270,283 — New test functions missing -> None return type annotation
test_validation_column_config_injects_validator_type_into_params_dictandtest_validation_column_config_schema_uses_validator_discriminatorlack-> None. The neighboring new test intest_data_designer_config.pycorrectly has it.
sampler_constraints.py:53 — ColumnConstraintT vs ColumnConstraintInputT naming could use a comment
- Both type aliases exist for the same union of types. A brief comment above each clarifying their roles (plain union for engine-layer type hints vs discriminated union for deserialization) would help future readers.
test_data_designer_config.py — Legacy constraint test doesn't cover missing rhs
- The resolver has explicit handling for
rhs is None(added in the latest commit), but no test exercises that path. Consider adding a case confirming that a missingrhsproduces a clear Pydantic validation error.
What Looks Good
- Consistent pattern application — follows the established
inject_sampler_type_into_paramsconvention exactly - Backward-compatible discriminator resolver with thoughtful handling of numeric strings (
"65"vs"minimum_age") - Thorough test coverage across discriminator field assertions, schema introspection, dict-based construction, and legacy round-trip serialization
- Guard _can_coerce_to_float against inf/nan strings - Add -> None return type annotations to test functions - Add clarifying comments to ColumnConstraintT vs ColumnConstraintInputT - Add tests for tagged constraint round-trip and missing rhs validation
|
Thanks for the thorough review @nabinchha! All four suggestions addressed in 2170a2f:
|
📋 Summary
Normalizes the Pydantic discriminated union patterns for validator params and sampler constraints to match the established
sampler_typepattern. This makes the config layer more consistent and produces self-describing JSON schemas — foundational work for the agent CLI.🔄 Changes
🔧 Changed
validator_typeLiteral discriminator field toCodeValidatorParams,LocalCallableValidatorParams, andRemoteValidatorParams, matching thesampler_typepattern used by sampler paramsvalidator_paramsfrom an untagged union toAnnotated[ValidatorParamsT, Discriminator("validator_type")]with amodel_validator(mode="before")that injectsvalidator_typeinto the params dict (same pattern asinject_sampler_type_into_params)constraint_typefrom an abstract property to a concreteLiteralfield with defaults on each subclass; ABC retained onConstraintbase class to signal it should not be instantiated directlyColumnConstraintInputTdiscriminated union with a custom resolver that supports both the new tagged shape and legacy configs (infers type fromrhsbeingstrvs numeric)._can_coerce_to_floatguards againstinf/nanstrings to avoid misrouting column names🧪 Tests
validator_typediscriminator mappingconstraint_typein input)constraint_typealready present in input)rhsvalidation error testconstraint_typefield🔍 Attention Areas
sampler_constraints.py— Most significant change:constraint_typeconverted from abstract property to Literal field, custom discriminator resolver with_can_coerce_to_floatfor backward compatibility with legacy configscolumn_configs.py— NewDiscriminator("validator_type")annotation andinject_validator_type_into_paramsmodel validator🤖 Generated with AI