Add support for pydantic 2.0, polars 0.20.10 and remove duckdb support #32
test_validators passes with exception of tests containing `json_schema_extra`
all `test_validators` now succeed, all but `test_dataframe_get_method` now pass in `test_polars`
fix: switch to *args/**kwargs
8e3b3f5 to e0bf587
I'll build this one locally and see whether it fixes the other bugs Patito 0.50 still has.
They were also working on v2 here: https://github.com/JakobGM/patito/tree/jakobgm%2Fpatito-v2
Anxious to hear how it goes.
chore: cleanup init
chore: cleanup init
chore: more misc cleanup
cleanup json_schema_extras
22e0fef to ffe343d
@brendancooley nice work! The FieldInfo class nicely fixes the issue with properties being lost after any class method; I solved it a bit differently for v1 in #20. Perhaps you can grab my test from there and include it in your PR. I already checked that the test also passes with your changes. One thing that will still be broken is example creation: you need to include the fix from #20 and then build on top of it, since now a type which is
I will run it tomorrow on our production repo to see if anything breaks :) (example creation possibly will).
Happy to include the test. Did you mean to reference a different issue/PR in your second comment? I'm not clear on how #20 helps with example string generation, but perhaps I am missing something.
#20 is a fix for creating proper examples with numerical values. I meant that porting that fix directly to pydantic v2 won't work: I saw that the `minimum` field is now nested if the field is optional.
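To illustrate the nesting described here: under pydantic v2, the bounds of an optional field may sit inside an `anyOf` branch of the field's JSON-schema properties rather than at the top level. The sketch below assumes that shape (for example `{"anyOf": [{"type": "integer", "minimum": 0}, {"type": "null"}]}`) and uses a hypothetical helper, `find_constraint`; it is not patito's actual code.

```python
from typing import Any, Optional


def find_constraint(properties: dict, key: str) -> Optional[Any]:
    """Return a JSON-schema constraint such as "minimum" for one field.

    Looks at the top level first, then inside any `anyOf` branches, which is
    where the bound can end up when the field is optional (e.g. `int | None`).
    """
    if key in properties:
        return properties[key]
    for branch in properties.get("anyOf", []):
        if key in branch:
            return branch[key]
    return None


optional_field = {"anyOf": [{"type": "integer", "minimum": 0}, {"type": "null"}]}
required_field = {"type": "integer", "minimum": 5}

print(find_constraint(optional_field, "minimum"))  # 0
print(find_constraint(required_field, "minimum"))  # 5
print(find_constraint(optional_field, "maximum"))  # None
```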
@brendancooley I tried it on our repo and tests, and I am now also getting this static type-hint error:

```
Expression of type "FieldInfo" cannot be assigned to declared type "int"
  "FieldInfo" is incompatible with "int"
```

on this code:

```python
import patito as pt
import polars as pl


class Test(pt.Model):
    test: int = pt.Field(constraints=pl.struct("a", "b").is_unique())
```
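The checker complains because `pt.Field(...)` is inferred to return `FieldInfo`, which is not an `int`. pydantic's own `Field` avoids this by declaring its return type as `Any`. A minimal sketch of that pattern, using a stand-in `FieldInfo` class and a hypothetical `make_field` function rather than patito's real code:

```python
from typing import Any, Optional


class FieldInfo:
    """Stand-in for pydantic.fields.FieldInfo, holding patito-style extras."""

    def __init__(self, default: Any = None, constraints: Optional[Any] = None) -> None:
        self.default = default
        self.constraints = constraints


def make_field(default: Any = None, constraints: Optional[Any] = None) -> Any:
    # Declaring the return type as Any (instead of FieldInfo) lets a static
    # checker accept `x: int = make_field(...)`; at runtime the object is
    # still a FieldInfo that a model class can inspect.
    return FieldInfo(default=default, constraints=constraints)


class Test:
    # The string is a placeholder for a polars expression
    test: int = make_field(constraints="unique(a, b)")
```

At runtime `Test.test` is still a `FieldInfo`; only the static view of the call changes.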
Also, nullable columns are not working. MRE (minimal reproducible example):

```python
import patito as pt
import polars as pl


class Test(pt.Model):
    foo: str | None = pt.Field(dtype=pl.Utf8)


print(Test.nullable_columns)
# set()
```

Result should be:
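One way to decide whether a column counts as nullable is to check whether `None` is among the arguments of its annotation. The helpers below (`is_nullable` and `nullable_columns`, hypothetical names, not patito's implementation) use `typing.get_args`, which handles both `Optional[str]` and, on Python 3.10+, `str | None`.

```python
from typing import Any, Dict, Optional, Set, get_args


def is_nullable(annotation: Any) -> bool:
    """True if the annotation admits None, e.g. Optional[str] or str | None."""
    return type(None) in get_args(annotation)


def nullable_columns(annotations: Dict[str, Any]) -> Set[str]:
    # `annotations` maps column name -> type hint, like a class's __annotations__
    return {name for name, hint in annotations.items() if is_nullable(hint)}


class Test:
    foo: Optional[str]
    bar: int


print(nullable_columns(Test.__annotations__))  # {'foo'}
```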
Thanks @ion-elgreco, I'll add these tests to the PR and start hacking at them tomorrow. I have not looked much into structs, but it would be nice to have more tests for those.
In this example the struct is just a simple trick to enforce multi-column uniqueness as a constraint. One thing Patito is still missing, though probably out of scope for this migration, is proper struct support as a dtype.
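For readers unfamiliar with the trick: `pl.struct("a", "b").is_unique()` marks a row as unique only when its combination of `a` and `b` occurs exactly once. A plain-Python analogue of that per-row mask (not polars code; `rows_unique` is a hypothetical name) looks like this:

```python
from collections import Counter
from typing import Hashable, List, Sequence


def rows_unique(a: Sequence[Hashable], b: Sequence[Hashable]) -> List[bool]:
    """Per-row mask: True where the (a, b) pair appears exactly once."""
    pairs = list(zip(a, b))
    counts = Counter(pairs)
    return [counts[pair] == 1 for pair in pairs]


mask = rows_unique([1, 1, 2, 1], ["x", "y", "x", "x"])
print(mask)  # [False, True, True, False]
```

Used as a constraint, the frame passes only if every entry of the mask is `True`.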
@ion-elgreco should be addressed by e2bf0d7
017c59b also adds some sanity checking that the annotated types match the allowable polars types specified in
Yes, I can replicate these in VS Code if I enable
in settings.json. Looking into it.
This should be handled by 161300b
Yes, this makes sense. Let me know if 9e132bc fixes the issue, @ion-elgreco.
@brendancooley 🚀🚀, I'm going to try it out now
707422b to 008d28d
- support validation for column subsets
- tests for nested models (as structs)
- fill_null adds missing columns with default values
- test recursive derivation
- test derive column subset
- allow conversion pt.DataFrame -> pl.DataFrame
- support pydantic validation_alias

chore: fixes for python 3.9 (all tests passing)
chore(patito): cleanup
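The commit list mentions that `fill_null` now adds missing columns with default values. The idea can be pictured with a plain-Python sketch over a column-oriented dict; the data layout and the `fill_null_with_defaults` name are assumptions for illustration, not patito's implementation, which works on polars data frames.

```python
from typing import Any, Dict, List, Optional


def fill_null_with_defaults(
    columns: Dict[str, List[Optional[Any]]],
    defaults: Dict[str, Any],
    height: int,
) -> Dict[str, List[Any]]:
    """Fill nulls with field defaults and add missing columns.

    `defaults` maps each model field that has a default to that value;
    `height` is the number of rows, used when a column is absent entirely.
    """
    result = {name: list(values) for name, values in columns.items()}
    for name, default in defaults.items():
        if name not in result:
            # Missing column: materialise it, filled with the default
            result[name] = [default] * height
        else:
            # Existing column: replace only the null entries
            result[name] = [default if v is None else v for v in result[name]]
    return result


df = {"a": [1, None, 3]}
print(fill_null_with_defaults(df, {"a": 0, "b": "n/a"}, height=3))
# {'a': [1, 0, 3], 'b': ['n/a', 'n/a', 'n/a']}
```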
008d28d to dbad4cf
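@brendancooley datetime doesn't work with `.examples()` if the type doesn't contain a TZ. This will fix it:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Inside the example-value generator for a polars Datetime dtype:
# only attach a tzinfo when the dtype actually carries a time zone
if dtype.time_zone is not None:
    tzinfo = ZoneInfo(dtype.time_zone)
else:
    tzinfo = None
return datetime(year=1970, month=1, day=1, tzinfo=tzinfo)
```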
Adding `dtype` in `pt.Field(dtype=)` breaks validation for datetimes:

```python
from datetime import datetime

import patito as pt
import polars as pl


class Test(pt.Model):
    date_value: datetime = pt.Field(dtype=pl.Datetime)


Test.validate(pl.DataFrame({"date_value": [datetime(2020, 1, 1)]}))
```

```
DataFrameValidationError: 1 validation error for Test
date_value
  Polars dtype Datetime(time_unit='us', time_zone=None) does not match model field type. (type=type_error.columndtype)
```

While this works:

```python
class Test(pt.Model):
    date_value: datetime


Test.validate(pl.DataFrame({"date_value": [datetime(2020, 1, 1)]}))
```
GE constraint not respected while creating examples:

```python
class Test(pt.Model):
    value: int = pt.Field(ge=0)


print(Test.examples())
```

```
shape: (1, 1)
┌───────┐
│ value │
│ ---   │
│ i64   │
╞═══════╡
│ -1    │
└───────┘
```

This should return 0 or larger :) Worth mentioning that this only happens when ge=0; otherwise it works fine. LOL, python evaluates

@brendancooley this will fix it:

```python
# Explicit `is not None` checks, so that a bound of 0 is not treated as missing
minimum = properties.get("minimum")
exclusive_minimum = properties.get("exclusiveMinimum")
maximum = properties.get("maximum")
exclusive_maximum = properties.get("exclusiveMaximum")
lower = minimum if minimum is not None else exclusive_minimum
upper = maximum if maximum is not None else exclusive_maximum
```
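The likely cause is Python truthiness: if the bound is looked up as `properties.get("minimum") or properties.get("exclusiveMinimum")`, a minimum of `0` is discarded because `0` is falsy. The sketch below contrasts the two lookups; `lower_bound_or` and `lower_bound` are hypothetical names, and bumping an exclusive bound by 1 assumes an integer field (a choice made here, not taken from the fix above).

```python
from typing import Optional


def lower_bound_or(properties: dict) -> Optional[int]:
    # Buggy pattern: `or` skips over a legitimate minimum of 0
    return properties.get("minimum") or properties.get("exclusiveMinimum")


def lower_bound(properties: dict) -> Optional[int]:
    # Explicit None checks keep 0 as a valid bound
    minimum = properties.get("minimum")
    if minimum is not None:
        return minimum
    exclusive_minimum = properties.get("exclusiveMinimum")
    if exclusive_minimum is not None:
        # For integers, the smallest value strictly above the bound
        return exclusive_minimum + 1
    return None


schema = {"type": "integer", "minimum": 0}
print(lower_bound_or(schema))  # None
print(lower_bound(schema))  # 0
print(lower_bound({"type": "integer", "exclusiveMinimum": 0}))  # 1
```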
So, besides these small bugs, everything works as expected!
Awesome @ion-elgreco, should be good to go on all of these now.
@brendancooley awesome! Then I think it's in a state to release now.
Left a couple small comments, really nice work @brendancooley!
Hope we can get this merged soon @thomasaarholt :)
Useful for knowing where to flesh out the docs -- thanks!
Given that the tests pass, I am keen to get this merged now, and then we can work on smaller parts separately.
Very well done to everyone involved 😊
Awesome! @thomasaarholt can't wait for the release :D
Doing it right now.
Released now on GitHub, should be on PyPI shortly!
I see that the PyPI release action failed, because everything has to pass. 😬 I'll see if I can tone down the requirements this evening.
We are live now with 0.6.1! https://pypi.org/project/patito/
Woohoo 🎉🎉
Awesome, thanks @thomasaarholt. Excited to do more with this; glad that we have a stable foundation to build on.
These pieces of code the docs refer to were removed as part of JakobGM#32 (commit 054d034)
Builds upon #4; all existing tests (excepting `tests/test_duckdb/*`) passing after upgrade to `pydantic==2.4.2`, `polars==0.19.11`.

Closes #11
Closes #28
Closes #26
...possibly closes others

Summary

- `ValidationError` from pydantic v1 to allow for multiple error reporting; collect data frame validation errors in this object, renamed as `DataFrameValidationError` to distinguish from errors raised by pydantic directly.
- `pydantic.fields.FieldInfo` to accommodate patito-specific field attributes (`constraints`, `derived_from`, `dtype`, `unique`)
- `patito.Model._schema_properties` to append patito-specific field attributes to the field property dictionaries to `pydantic.BaseModel.model_json_schema`
- `json_schema_extra`
- `required` and `nullable`. `nullable` now refers to columns that allow `pl.Null` as a valid entry. `required` (pydantic-side) refers to columns that do not have a default value, and therefore must be passed to the `BaseModel` constructor.
- `polars.collect` bug (see fix: update LDF.collect() for polars==0.19.8 #24)

TODO

- `patito.pydantic`
And I'm sure there are other issues introduced by these changes, and bugs that the existing test suite does not yet catch. But hopefully this serves as a template for discussion and new test collection and can help move the ball forward on getting this package onto pydantic2. Feedback very welcome.
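The first summary item describes collecting all data frame validation errors in a single exception instead of stopping at the first failure. A minimal stdlib sketch of that idea follows; `DataFrameValidationError` here is a simplified stand-in with an assumed `errors` list and message format, and `validate_columns` is a hypothetical function, neither being patito's actual code.

```python
from typing import Dict, List, Set


class DataFrameValidationError(ValueError):
    """Simplified stand-in: carries every error found, not just the first."""

    def __init__(self, errors: List[str]) -> None:
        self.errors = errors
        noun = "error" if len(errors) == 1 else "errors"
        super().__init__(f"{len(errors)} validation {noun}\n" + "\n".join(errors))


def validate_columns(
    columns: Dict[str, list], required: Set[str], nullable: Set[str]
) -> None:
    errors: List[str] = []
    # Check 1: required columns that are absent
    for name in sorted(required - set(columns)):
        errors.append(f"{name}: missing column")
    # Check 2: nulls in columns that do not allow them
    for name, values in columns.items():
        if name not in nullable and any(v is None for v in values):
            errors.append(f"{name}: contains nulls but is not nullable")
    if errors:
        # Raise once, after every check has run
        raise DataFrameValidationError(errors)


try:
    validate_columns({"a": [1, None]}, required={"a", "b"}, nullable=set())
except DataFrameValidationError as exc:
    print(len(exc.errors))  # 2
```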