feat: normalize validator and constraint discriminators #414

Merged
johnnygreco merged 6 commits into main from johnny/feat/schema-discriminator-foundation
Mar 13, 2026


@johnnygreco commented Mar 13, 2026

📋 Summary

Normalizes the Pydantic discriminated union patterns for validator params and sampler constraints to match the established sampler_type pattern. This makes the config layer more consistent and produces self-describing JSON schemas — foundational work for the agent CLI.

🔄 Changes

🔧 Changed

  • Validator params — Added validator_type Literal discriminator field to CodeValidatorParams, LocalCallableValidatorParams, and RemoteValidatorParams, matching the sampler_type pattern used by sampler params
  • Validation column config — Switched validator_params from an untagged union to Annotated[ValidatorParamsT, Discriminator("validator_type")] with a model_validator(mode="before") that injects validator_type into the params dict (same pattern as inject_sampler_type_into_params)
  • Sampler constraints — Converted constraint_type from an abstract property to a concrete Literal field with defaults on each subclass; ABC retained on Constraint base class to signal it should not be instantiated directly
  • Constraint deserialization — Added ColumnConstraintInputT discriminated union with a custom resolver that supports both the new tagged shape and legacy configs (infers type from rhs being str vs numeric). _can_coerce_to_float guards against inf/nan strings to avoid misrouting column names
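The validator-params change can be sketched as follows. This is a minimal illustration, not the repository's code: the field names `code_lang` and `endpoint_url`, the string tag values, and the reduced two-member union are assumptions made for brevity; only `validator_type`, `validator_params`, `Discriminator("validator_type")`, and the validator name come from the description above.

```python
from typing import Annotated, Any, Literal, Union

from pydantic import BaseModel, Discriminator, model_validator


class CodeValidatorParams(BaseModel):
    # Literal discriminator with a default, mirroring the sampler_type pattern.
    validator_type: Literal["code"] = "code"
    code_lang: str  # hypothetical field, for illustration only


class RemoteValidatorParams(BaseModel):
    validator_type: Literal["remote"] = "remote"
    endpoint_url: str  # hypothetical field, for illustration only


ValidatorParamsT = Union[CodeValidatorParams, RemoteValidatorParams]


class ValidationColumnConfig(BaseModel):
    name: str
    validator_type: Literal["code", "remote"]
    # Tagged union: Pydantic picks the member whose validator_type matches.
    validator_params: Annotated[ValidatorParamsT, Discriminator("validator_type")]

    @model_validator(mode="before")
    @classmethod
    def inject_validator_type_into_params(cls, data: Any) -> Any:
        # Copy the column-level validator_type into the params dict when the
        # caller did not tag the params explicitly (dict-based construction).
        if isinstance(data, dict):
            params = data.get("validator_params")
            if (
                isinstance(params, dict)
                and "validator_type" not in params
                and "validator_type" in data
            ):
                tagged = {**params, "validator_type": data["validator_type"]}
                data = {**data, "validator_params": tagged}
        return data


config = ValidationColumnConfig.model_validate(
    {"name": "check", "validator_type": "code", "validator_params": {"code_lang": "python"}}
)
print(type(config.validator_params).__name__)
```

Because the discriminator is a plain field name, the generated JSON schema can carry an explicit discriminator mapping, which is what makes the schema self-describing.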

🧪 Tests

  • Validator type discriminator field assertions on all validator param types
  • Schema introspection test verifying validator_type discriminator mapping
  • Validator type injection test for dict-based construction
  • Legacy constraint parsing round-trip test (no constraint_type in input)
  • Tagged constraint round-trip test (constraint_type already present in input)
  • Missing rhs validation error test
  • Updated serialization assertions to include constraint_type field

🔍 Attention Areas

⚠️ Reviewers: Please pay special attention to the following:

  • sampler_constraints.py — Most significant change: constraint_type converted from abstract property to Literal field, custom discriminator resolver with _can_coerce_to_float for backward compatibility with legacy configs
  • column_configs.py — New Discriminator("validator_type") annotation and inject_validator_type_into_params model validator

🤖 Generated with AI

@johnnygreco requested a review from a team as a code owner on March 13, 2026, 18:26

greptile-apps Bot commented Mar 13, 2026

Greptile Summary

This PR normalizes Pydantic discriminated-union patterns across the config layer in three ways. It adds validator_type Literal discriminator fields to all three validator-params models, mirroring the existing sampler_type pattern. It switches ValidationColumnConfig.validator_params to use Discriminator("validator_type") with a pre-injection model validator for backward compatibility. Finally, it converts constraint_type from an abstract property to a concrete Literal field on each constraint subclass, replacing the ABC enforcement with a custom callable discriminator (resolve_constraint_input_type) that supports both the new tagged shape and legacy untagged configs via rhs type inference.

Key design decisions worth noting:

  • Two-type split: ColumnConstraintInputT (with the callable discriminator) is used only at the config-input boundary (DataDesignerConfig); the engine layer continues to use ColumnConstraintT (the plain union) for already-validated instances in DataSchema, SchemaBuilder, and SamplerMultiColumnConfig. This is a clean separation.
  • Legacy compatibility: _can_coerce_to_float correctly uses math.isfinite to exclude inf/nan strings from being treated as scalar values, and the is None guard for rhs avoids the classic Python falsy-value bug (rhs = 0 is not skipped).
  • Serialization is a breaking change: constraint_type and validator_type are now always included in model_dump() output; tests have been updated accordingly. Any downstream consumers that snapshot serialized output will need the same update.
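The two-type split and the serialization change can be illustrated with Pydantic's callable `Discriminator` and `Tag`. This is a sketch under assumptions: the `target_column` and `operator` fields, the tag strings, and the deliberately simplified `resolve_tag` helper are hypothetical and stand in for the real models and for `resolve_constraint_input_type`.

```python
from typing import Annotated, Any, List, Literal, Optional, Union

from pydantic import BaseModel, Discriminator, Field, Tag


class ScalarInequalityConstraint(BaseModel):
    constraint_type: Literal["scalar_inequality"] = "scalar_inequality"
    target_column: str  # hypothetical field
    operator: str  # hypothetical field
    rhs: float


class ColumnInequalityConstraint(BaseModel):
    constraint_type: Literal["column_inequality"] = "column_inequality"
    target_column: str  # hypothetical field
    operator: str  # hypothetical field
    rhs: str


# Plain union: enough for type hints on already-validated instances.
ColumnConstraintT = Union[ScalarInequalityConstraint, ColumnInequalityConstraint]


def resolve_tag(value: Any) -> Optional[str]:
    # Simplified stand-in for the real resolver: use the tag if present,
    # otherwise infer it from whether rhs is a string.
    if isinstance(value, dict):
        tag = value.get("constraint_type")
        if tag is not None:
            return str(tag)
        return "column_inequality" if isinstance(value.get("rhs"), str) else "scalar_inequality"
    return getattr(value, "constraint_type", None)


# Discriminated union: used only where raw input is deserialized.
ColumnConstraintInputT = Annotated[
    Union[
        Annotated[ScalarInequalityConstraint, Tag("scalar_inequality")],
        Annotated[ColumnInequalityConstraint, Tag("column_inequality")],
    ],
    Discriminator(resolve_tag),
]


class DataDesignerConfig(BaseModel):
    constraints: List[ColumnConstraintInputT] = Field(default_factory=list)


config = DataDesignerConfig.model_validate(
    {
        "constraints": [
            {"target_column": "age", "operator": "lt", "rhs": 65},  # legacy, untagged
            {"target_column": "start", "operator": "lt", "rhs": "end"},  # legacy, untagged
        ]
    }
)
# The discriminator field is now part of the serialized output.
print(config.model_dump()["constraints"][0])
```

A consumer that compares `model_dump()` output against a stored snapshot would see the extra `constraint_type` key, which is the breaking change noted above.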

Confidence Score: 4/5

  • PR is safe to merge — logic is correct and well-tested; one minor test specificity issue.
  • The implementation is faithful to established patterns in the codebase, the edge cases in the constraint type resolver (is None guard, math.isfinite for inf/nan, numeric coercion for legacy strings) are handled correctly, and the engine/config boundary split is clean. The only non-trivial concern is the overly broad pytest.raises(Exception) in one test, which is a low-risk style issue rather than a functional bug. Previously flagged concerns about the non-abstract Constraint base class and the missing-rhs fallback behavior were already addressed in prior review threads.
  • No files require special attention beyond the broad exception type in tests/config/test_data_designer_config.py.

Important Files Changed

| Filename | Overview |
| --- | --- |
| packages/data-designer-config/src/data_designer/config/sampler_constraints.py | Converts constraint_type from abstract property to concrete Literal field, removes ABC enforcement, and adds ColumnConstraintInputT discriminated union with a callable resolver for legacy config support. The resolver logic and _can_coerce_to_float are correct; existing concerns about ABC and the missing-rhs fallback were covered in previous threads. |
| packages/data-designer-config/src/data_designer/config/validator_params.py | Adds validator_type Literal discriminator field with a default to each params class. Pattern mirrors the existing sampler_type approach and is consistent across all three validator types. |
| packages/data-designer-config/src/data_designer/config/column_configs.py | Switches validator_params to Annotated[ValidatorParamsT, Discriminator("validator_type")] and adds inject_validator_type_into_params model validator. Pattern mirrors inject_sampler_type_into_params exactly and maintains backward compatibility for dict-based construction. |
| packages/data-designer-config/src/data_designer/config/data_designer_config.py | Replaces ColumnConstraintT with ColumnConstraintInputT for the constraints field, enabling discriminated deserialization at the config boundary while keeping the engine layer using the simpler ColumnConstraintT type alias. |
| packages/data-designer-config/tests/config/test_data_designer_config.py | Adds round-trip tests for both tagged and legacy constraint shapes, plus a missing-rhs error test. The error test uses overly broad pytest.raises(Exception) rather than the specific ValidationError. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["Raw input dict / YAML"] --> B["DataDesignerConfig.model_validate()"]

    subgraph Constraints ["Constraint deserialization (ColumnConstraintInputT)"]
        B --> C["resolve_constraint_input_type()"]
        C -->|"constraint_type key present"| D["Return value as-is"]
        C -->|"rhs is None"| E["Default → SCALAR_INEQUALITY\n(Pydantic surfaces 'rhs required')"]
        C -->|"rhs is str + _can_coerce_to_float"| F["→ SCALAR_INEQUALITY"]
        C -->|"rhs is str, not float-like"| G["→ COLUMN_INEQUALITY"]
        C -->|"rhs is numeric"| F
        D --> H["Tag match → ScalarInequalityConstraint\nor ColumnInequalityConstraint"]
        F --> H
        G --> H
    end

    subgraph ValidatorParams ["Validator params deserialization"]
        B --> I["inject_validator_type_into_params (mode=before)"]
        I -->|"injects validator_type into params dict"| J["Discriminator('validator_type')"]
        J -->|"'code'"| K["CodeValidatorParams"]
        J -->|"'local_callable'"| L["LocalCallableValidatorParams"]
        J -->|"'remote'"| M["RemoteValidatorParams"]
    end

    H --> N["Validated DataDesignerConfig\n(ColumnConstraintT instances for engine)"]
    K & L & M --> N
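
The constraint branch of the flowchart can be written out as a plain-Python decision function. This is a sketch that follows the branches shown above, not the code from sampler_constraints.py; the tag strings and the handling of non-dict input are assumptions.

```python
import math
from typing import Any

# Assumed tag values; the real ones live in the repository's constraint-type enum.
SCALAR_INEQUALITY = "scalar_inequality"
COLUMN_INEQUALITY = "column_inequality"


def _can_coerce_to_float(value: str) -> bool:
    # Numeric strings such as "65" count as scalars; "inf"/"nan" do not.
    try:
        return math.isfinite(float(value))
    except ValueError:
        return False


def resolve_constraint_input_type(value: Any) -> str:
    if not isinstance(value, dict):
        # Already-built instances carry their own tag (assumed behaviour).
        return getattr(value, "constraint_type", SCALAR_INEQUALITY)
    if "constraint_type" in value:
        # New tagged shape: return the tag as given.
        return str(value["constraint_type"])
    rhs = value.get("rhs")
    if rhs is None:
        # "is None" rather than "not rhs", so rhs == 0 is still inferred below.
        # Defaulting here lets Pydantic report the missing rhs field itself.
        return SCALAR_INEQUALITY
    if isinstance(rhs, str):
        return SCALAR_INEQUALITY if _can_coerce_to_float(rhs) else COLUMN_INEQUALITY
    return SCALAR_INEQUALITY


print(resolve_constraint_input_type({"rhs": "minimum_age"}))  # column_inequality
print(resolve_constraint_input_type({"rhs": "65"}))  # scalar_inequality
```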
This is a comment left during a code review.
Path: packages/data-designer-config/tests/config/test_data_designer_config.py
Line: 113-130

Comment:
**Overly broad exception type in error test**

`pytest.raises(Exception)` will pass for any `Exception` subclass, including `AttributeError`, `KeyError`, or `TypeError`. This can mask situations where the wrong exception is raised (e.g., a bug in the resolver that crashes before Pydantic even runs validation).

Using `ValidationError` keeps the test intent precise and ensures failures surface at the expected layer:

```suggestion
def test_data_designer_config_constraint_missing_rhs_raises_validation_error() -> None:
    from pydantic import ValidationError

    with pytest.raises(ValidationError):
```
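
The breadth of `pytest.raises(Exception)` follows from Python's exception hierarchy, since the check passes whenever the raised exception is an instance of the expected class. The helper below is a hypothetical stand-in for that subclass check, written with the standard library only.

```python
from typing import Type


def would_satisfy_raises(raised: Type[BaseException], expected: Type[BaseException]) -> bool:
    # A raises-style check matches when the raised type is the expected
    # type or one of its subclasses.
    return issubclass(raised, expected)


# Unrelated failures still satisfy a check against the broad Exception class.
print(would_satisfy_raises(KeyError, Exception))  # True
print(would_satisfy_raises(AttributeError, Exception))  # True
# A narrower expected type rejects them.
print(would_satisfy_raises(KeyError, ValueError))  # False
# SystemExit derives from BaseException, not Exception.
print(would_satisfy_raises(SystemExit, Exception))  # False
```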


Last reviewed commit: 2170a2f

Comment thread packages/data-designer-config/src/data_designer/config/sampler_constraints.py Outdated
Comment thread packages/data-designer-config/src/data_designer/config/sampler_constraints.py Outdated
@johnnygreco force-pushed the johnny/feat/schema-discriminator-foundation branch from 16d33e0 to 3305e46 on March 13, 2026, 19:41
nabinchha previously approved these changes Mar 13, 2026

@nabinchha nabinchha left a comment

Code Review

Summary: Solid normalization of the Pydantic discriminated union patterns for validator params and sampler constraints. The inject_validator_type_into_params model validator mirrors the established inject_sampler_type_into_params pattern cleanly, the backward-compatible discriminator resolver for constraints is well thought out, and the test coverage is thorough.

Suggestions (nits)

sampler_constraints.py:73-78: _can_coerce_to_float passes on "inf" and "nan"

  • float("inf") and float("nan") succeed without raising ValueError, so these strings would be treated as scalar inequality constraints rather than column references. Unlikely in practice, but a column literally named "inf" or "nan" would be misrouted by the legacy fallback path. Could guard with math.isinf / math.isnan.
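
The misrouting can be reproduced in isolation. Both functions below are hypothetical reconstructions for illustration: the first mimics the behaviour this comment describes, the second adds a finiteness check. `math.isfinite` is used here as one way to express the guard; it is equivalent to rejecting values for which `math.isinf` or `math.isnan` is true.

```python
import math


def naive_can_coerce_to_float(value: str) -> bool:
    # Any string that float() can parse passes, including "inf" and "nan".
    try:
        float(value)
        return True
    except ValueError:
        return False


def guarded_can_coerce_to_float(value: str) -> bool:
    # Additionally require a finite result, so "inf"/"nan" are treated
    # as column names rather than scalar values.
    try:
        return math.isfinite(float(value))
    except ValueError:
        return False


for candidate in ("65", "minimum_age", "inf", "-inf", "nan"):
    print(candidate, naive_can_coerce_to_float(candidate), guarded_can_coerce_to_float(candidate))
```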

test_columns.py:270,283 — New test functions missing -> None return type annotation

  • test_validation_column_config_injects_validator_type_into_params_dict and test_validation_column_config_schema_uses_validator_discriminator lack -> None. The neighboring new test in test_data_designer_config.py correctly has it.

sampler_constraints.py:53: ColumnConstraintT vs ColumnConstraintInputT naming could use a comment

  • Both type aliases exist for the same union of types. A brief comment above each clarifying their roles (plain union for engine-layer type hints vs discriminated union for deserialization) would help future readers.

test_data_designer_config.py — Legacy constraint test doesn't cover missing rhs

  • The resolver has explicit handling for rhs is None (added in the latest commit), but no test exercises that path. Consider adding a case confirming that a missing rhs produces a clear Pydantic validation error.

What Looks Good

  • Consistent pattern application — follows the established inject_sampler_type_into_params convention exactly
  • Backward-compatible discriminator resolver with thoughtful handling of numeric strings ("65" vs "minimum_age")
  • Thorough test coverage across discriminator field assertions, schema introspection, dict-based construction, and legacy round-trip serialization

- Guard _can_coerce_to_float against inf/nan strings
- Add -> None return type annotations to test functions
- Add clarifying comments to ColumnConstraintT vs ColumnConstraintInputT
- Add tests for tagged constraint round-trip and missing rhs validation
@johnnygreco

Thanks for the thorough review @nabinchha! All four suggestions addressed in 2170a2f:

  1. _can_coerce_to_float inf/nan guard — Now uses math.isfinite(result) so "inf", "nan", "-inf" are treated as column references, not scalars.
  2. Missing -> None annotations — Added to both test functions in test_columns.py.
  3. ColumnConstraintT vs ColumnConstraintInputT comments — Added a one-liner above each clarifying their roles (engine-layer type hints vs deserialization).
  4. Test coverage gaps — Added tests for the tagged round-trip path (constraint_type already present in input) and missing rhs validation error.

@johnnygreco merged commit 26a9cf2 into main on Mar 13, 2026
49 checks passed
2 participants