-
Notifications
You must be signed in to change notification settings - Fork 272
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
allow sorting keys on to_json and to_python by passing in sort_keys #1637
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
📢 Thoughts on this report? Let us know! |
CodSpeed Performance ReportMerging #1637 will not alter performanceComparing Summary
|
07a31f5
to
7222c8d
Compare
please review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Main issue is perf regression
src/errors/validation_exception.rs
Outdated
false, | ||
false, | ||
false, | ||
true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@davidhewitt this is getting gross 😢
src/serializers/fields.rs
Outdated
// NOTE! we maintain the order of the input dict assuming that's right | ||
for result in main_iter { | ||
let (key, value) = result?; | ||
let mut items = main_iter.collect::<PyResult<Vec<_>>>()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My intuition is that this extra Vec being created is the cause for the slowdown in benchmarks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I refactored the function out, so we can re-use this. and avoid creating new variable when sort keys is false
tests/serializers/test_model.py
Outdated
'field_a': core_schema.model_field(core_schema.bytes_schema()), | ||
'field_b': core_schema.model_field(core_schema.int_schema()), | ||
'field_a': core_schema.model_field(core_schema.bytes_schema()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did these have to change? Is this to make the tests change? I would prefer a new self-contained test just for this bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seperated this.
I separated the test out, I also refactor the functions so we can reuse when Let me know what else I need to improve. Thanks |
please review, not sure how I can take it from here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I see now that this is not recursive (it only applies to the top level keys). Would it be hard to make it recursive? I fear that if we implement the non-recursive version someone is going to come along and want the recursive version... if so we can make it a Literal['recursive', 'top-level', 'unsorted']
or something like that.
let mut items = main_iter.collect::<PyResult<Vec<_>>>()?; | ||
items.sort_by_cached_key(|(key, _)| key_str(key).unwrap_or_default().to_string()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you're already collecting into a Vec here I think you should do something like this:
let mut items = main_iter.collect::<PyResult<Vec<_>>>()?; | |
items.sort_by_cached_key(|(key, _)| key_str(key).unwrap_or_default().to_string()); | |
let mut items = main_iter.map(|k,v| (key_str(key), v)).collect::<PyResult<Vec<_>>>()?; |
Might need to collapse two levels of Result or something like that.
Then you'll have to convert back to the original type. I think this will be the fastest approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I kept the original implementation. Didn't quite get your advice, still quite new to Rust 😓
But added recursive mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first thing process_field_entry_python
does is call key_str
:
pydantic-core/src/serializers/fields.rs
Line 234 in 95f9329
let key_str = key_str(key)?; |
So I think you can just convert to it from the get go
0cf4b6d
to
95f9329
Compare
Added different sort mode as above, updated the PR description. |
please review 👍 |
Hello Pydantic Team! This is my first time contributing to a Rust and Pyo3 related repo.
I am also new in Rust.
Do you think this PR will make sense? Since I have been trying to do model_dump_json with sort keys too.
If so...
Should I also raise the error early instead of doing
unwrap_or_default()
in this PR?I think we can remove to_python implementation, looks like the default sorting for dictionary is always sorted.
Codspeed benchmark actually slowed down for those two functions. Let me know how I can do better.Based on the feedback from adriangb, its better to add in the support for recursive sort.
However, I do not sort dictionary in an array, as it add more complexity and am not very confident to add everything in one PR.
Thanks!
Change Summary
allow sorting keys on to_json and to_python by passing in sort_keys
Related issue number
should fix pydantic/pydantic#7424
Might need to create another MR on Python repo though, need to check.
Checklist
pydantic-core
(except for expected changes)Selected Reviewer: @davidhewitt