feat: Introduce pydantic conversion for schemas#324

Open
Claude wants to merge 10 commits into main from claude/convert-dy-schema-to-pydantic-model

Conversation

@Claude Claude AI commented Apr 9, 2026

Motivation

Closes #323.

Changes

  • Add to_pydantic_model on dy.Schema

Side-note: GitHub-orchestrated Claude was really unhelpful 🫠

codecov bot commented Apr 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (f5de4fc) to head (6a06e13).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##              main      #324    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files           56        56            
  Lines         3233      3399   +166     
==========================================
+ Hits          3233      3399   +166     

@borchero
Member

@claude[agent] implement all review comments

Agent-Logs-Url: https://github.com/Quantco/dataframely/sessions/5299dcd4-f2fc-4862-b4b6-79fe11566ad8

Co-authored-by: borchero <22455425+borchero@users.noreply.github.com>
@borchero
Member

@claude[agent] implement the review comment

@borchero Oliver Borchert (borchero) changed the title from "[WIP] Convert dy.Schema to pydantic model" to "feat: Introduce pydantic conversion for schemas" Apr 10, 2026
@github-actions github-actions bot added the enhancement New feature or request label Apr 10, 2026
@borchero Oliver Borchert (borchero) marked this pull request as ready for review April 10, 2026 15:45
Copilot AI review requested due to automatic review settings April 10, 2026 15:45

Copilot AI left a comment

Pull request overview

Adds first-class Pydantic v2 support to Dataframely schemas/columns to enable generating structured-output models (e.g., for LLM constrained decoding) directly from dy.Schema.

Changes:

  • Add Schema.to_pydantic_model() to generate a pydantic.BaseModel reflecting schema columns and structured constraints.
  • Introduce Column.pydantic_field() plus per-column _python_type / _pydantic_field_kwargs() implementations to translate constraints into Pydantic annotations.
  • Add an optionals-only test suite covering pydantic field generation and basic schema-to-model conversion.
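To make the mechanism in the overview concrete, here is a minimal sketch of building a model with pydantic.create_model from per-column types and constraints. It assumes pydantic v2 is installed. The COLUMNS mapping, build_model, and is_valid are hypothetical stand-ins for illustration and are not dataframely's actual implementation.

```python
from typing import Annotated, Any, Literal

from pydantic import Field, ValidationError, create_model

# Hypothetical stand-in for per-column translation results:
# column name -> (Python type, pydantic Field keyword arguments).
COLUMNS: dict[str, tuple[Any, dict[str, Any]]] = {
    "id": (int, {"ge": 0}),
    "name": (str, {"min_length": 1, "max_length": 10}),
    "kind": (Literal["a", "b"], {}),
}


def build_model(model_name: str):
    """Create a pydantic model with one required field per column."""
    fields = {
        # `...` as the default marks the field as required.
        name: (Annotated[tp, Field(**kwargs)], ...)
        for name, (tp, kwargs) in COLUMNS.items()
    }
    return create_model(model_name, **fields)


ExampleModel = build_model("ExampleModel")


def is_valid(**data: Any) -> bool:
    """Return True if the data passes validation against ExampleModel."""
    try:
        ExampleModel(**data)
    except ValidationError:
        return False
    return True
```

In this sketch, column-level constraints travel with the field annotation, so anything that cannot be expressed as a Field argument (custom checks, schema-level rules) is simply absent from the resulting model.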

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tests/schema/test_pydantic_model.py | Tests for Schema.to_pydantic_model() behavior (basic fields, validation, warning on schema rules). |
| tests/columns/test_pydantic.py | Tests for Column.pydantic_field() across many column types and constraints/warnings. |
| dataframely/schema.py | Implements Schema.to_pydantic_model() using pydantic.create_model. |
| dataframely/columns/_base.py | Adds Column.pydantic_field(), _python_type abstract property, and _pydantic_field_kwargs() hook. |
| dataframely/columns/_mixins.py | Adds _pydantic_field_kwargs() to OrdinalMixin to translate min/max constraints. |
| dataframely/columns/any.py | Defines _python_type for Any to support pydantic type generation. |
| dataframely/columns/array.py | Adds _python_type and pydantic length constraints for arrays (incl. multidim warning). |
| dataframely/columns/binary.py | Defines _python_type for binary columns. |
| dataframely/columns/bool.py | Defines _python_type for bool columns. |
| dataframely/columns/categorical.py | Defines _python_type for categorical columns. |
| dataframely/columns/datetime.py | Defines _python_type and warnings for non-translatable datetime/date/time/duration constraints. |
| dataframely/columns/decimal.py | Defines _python_type and pydantic decimal-place constraint. |
| dataframely/columns/enum.py | Defines _python_type using Literal[...] for category validation. |
| dataframely/columns/float.py | Defines _python_type and pydantic constraints for bounds/inf/nan handling (incl. warning). |
| dataframely/columns/integer.py | Defines _python_type and pydantic constraints for integer bounds and is_in. |
| dataframely/columns/list.py | Defines _python_type and pydantic min/max length constraints for lists. |
| dataframely/columns/object.py | Defines _python_type for object columns. |
| dataframely/columns/string.py | Defines _python_type and pydantic length/pattern constraints for strings. |
| dataframely/columns/struct.py | Defines _python_type as an inner pydantic model for struct validation. |

if cls._schema_validation_rules():
    warnings.warn("pydantic models do not include schema-level rules.")

model_name = f"{cls.__name__.removesuffix('Schema')}Model"
Collaborator

Should this be configurable? Our users may or may not use the ...Schema naming convention

Member

Do you want to allow the user to pass a name? If their schema name does not end in Schema, this doesn't have any effect.
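One way the option discussed here could look, as a rough sketch: an explicit name takes precedence, and otherwise the current removesuffix-based default applies. The helper derive_model_name and its name parameter are hypothetical and not part of this PR.

```python
from __future__ import annotations


def derive_model_name(schema_name: str, name: str | None = None) -> str:
    """Hypothetical helper: pick the name of the generated pydantic model.

    An explicitly passed `name` wins. Otherwise a trailing "Schema" is
    stripped (a no-op if the class name does not end in "Schema") and
    "Model" is appended, mirroring the default in the diff above.
    """
    if name is not None:
        return name
    return f"{schema_name.removesuffix('Schema')}Model"
```

With this default, "UserSchema" becomes "UserModel" and "User" also becomes "UserModel", so users who do not follow the ...Schema convention still get a sensible name and can override it when they need to.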

warnings.warn(
    f"Custom checks for column '{self.name or self.__class__.__name__}' "
    "are not translated to pydantic constraints."
)
Collaborator

This will create a ton of warnings if you have a few custom checks. Do we need to warn on execution as opposed to just being really clear about this in the documentation of to_pydantic_model?

Member

We could have all these warnings have the same prefix so that they're easy to ignore? 🤔 not sure if you can shoot yourself in the foot too easily without the warnings
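A sketch of the shared-prefix idea using only the standard library: every conversion warning starts with the same text, and callers silence all of them with a single message filter. The prefix string and the helper names are assumptions made for illustration, not what the PR implements.

```python
import re
import warnings

# Hypothetical shared prefix for all pydantic conversion warnings.
PREFIX = "pydantic conversion: "


def warn_untranslated(column: str) -> None:
    """Emit a conversion warning that starts with the shared prefix."""
    warnings.warn(
        f"{PREFIX}custom checks for column '{column}' "
        "are not translated to pydantic constraints.",
        stacklevel=2,
    )


def collect_messages() -> list[str]:
    """Emit several warnings and return the ones not silenced by the filter."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        # The message filter is a regex matched against the start of the text.
        warnings.filterwarnings("ignore", message=re.escape(PREFIX))
        warn_untranslated("a")
        warn_untranslated("b")
        warnings.warn("unrelated warning")
    return [str(w.message) for w in caught]
```

Here the two conversion warnings are dropped while the unrelated one still surfaces, so the warnings stay on by default but are cheap to opt out of.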

Labels

enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

Feature request: Convert dy.Schema to pydantic model

4 participants