Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow sorting keys on to_json and to_python by passing in sort_keys #1637

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

aezomz
Copy link

@aezomz aezomz commented Feb 15, 2025

Hello Pydantic Team! This is my first time contributing to a Rust and Pyo3 related repo.
I am also new in Rust.
Do you think this PR will make sense? Since I have been trying to do model_dump_json with sort keys too.
If so...
Should I also raise the error early instead of doing unwrap_or_default() in this PR?
I think we can remove to_python implementation, looks like the default sorting for dictionary is always sorted.

Codspeed benchmark actually slowed down for those two functions. Let me know how I can do better.

Based on the feedback from adriangb, its better to add in the support for recursive sort.
However, I do not sort dictionary in an array, as it add more complexity and am not very confident to add everything in one PR.

Will sort from:
        {
            'field_123': b'test_123',
            'field_b': 12,
            'field_a': b'test',
            'field_c': {'mango': 2, 'banana': 3, 'apple': 1},
            'field_d': [{'d': 3, 'b': 2, 'a': 1}, 2, 3],
        }
To:
s.to_python(m, exclude_none=True, sort_keys='recursive', mode='json')
        {
            'field_123': 'test_123',
            'field_a': 'test',
            'field_b': 12,
            'field_c': {'apple': 1, 'banana': 3, 'mango': 2},
            'field_d': [{'d': 3, 'b': 2, 'a': 1}, 2, 3],
        }

Thanks!

Change Summary

allow sorting keys on to_json and to_python by passing in sort_keys

Related issue number

should fix pydantic/pydantic#7424
Might need to create another MR on Python repo though, need to check.

Checklist

  • Unit tests for the changes exist
  • Documentation reflects the changes where applicable
  • Pydantic tests pass with this pydantic-core (except for expected changes)
  • My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

Selected Reviewer: @davidhewitt

Copy link

codecov bot commented Feb 15, 2025

Codecov Report

Attention: Patch coverage is 77.31481% with 49 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/serializers/fields.rs 83.03% 28 Missing ⚠️
src/serializers/extra.rs 37.50% 20 Missing ⚠️
python/pydantic_core/core_schema.py 50.00% 0 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

Copy link

codspeed-hq bot commented Feb 15, 2025

CodSpeed Performance Report

Merging #1637 will not alter performance

Comparing aezomz:allow_model_dump_sort_keys (95f9329) with main (f977516)

Summary

✅ 157 untouched benchmarks

@aezomz aezomz force-pushed the allow_model_dump_sort_keys branch from 07a31f5 to 7222c8d Compare March 4, 2025 10:22
@aezomz
Copy link
Author

aezomz commented Mar 4, 2025

please review

@aezomz aezomz marked this pull request as ready for review March 4, 2025 13:19
Copy link
Member

@adriangb adriangb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main issue is perf regression

Comment on lines 348 to 351
false,
false,
false,
true,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidhewitt this is getting gross 😢

// NOTE! we maintain the order of the input dict assuming that's right
for result in main_iter {
let (key, value) = result?;
let mut items = main_iter.collect::<PyResult<Vec<_>>>()?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intuition is that this extra Vec being created is the cause for the slowdown in benchmarks.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored the function out, so we can re-use this. and avoid creating new variable when sort keys is false

Comment on lines 1001 to 1002
'field_a': core_schema.model_field(core_schema.bytes_schema()),
'field_b': core_schema.model_field(core_schema.int_schema()),
'field_a': core_schema.model_field(core_schema.bytes_schema()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did these have to change? Is this to make the tests change? I would prefer a new self-contained test just for this bit.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seperated this.

@aezomz
Copy link
Author

aezomz commented Mar 5, 2025

I separated the test out, I also refactor the functions so we can reuse when sort_keys=true.
To keep the original perf benchmark, I have done a simple bool check on sort_keys before using expensive function like sorting.

Let me know what else I need to improve. Thanks

@zzstoatzz zzstoatzz mentioned this pull request Mar 10, 2025
@aezomz
Copy link
Author

aezomz commented Mar 18, 2025

please review, not sure how I can take it from here

Copy link
Member

@adriangb adriangb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I see now that this is not recursive (it only applies to the top level keys). Would it be hard to make it recursive? I fear that if we implement the non-recursive version someone is going to come along and want the recursive version... if so we can make it a Literal['recursive', 'top-level', 'unsorted'] or something like that.

Comment on lines +171 to +172
let mut items = main_iter.collect::<PyResult<Vec<_>>>()?;
items.sort_by_cached_key(|(key, _)| key_str(key).unwrap_or_default().to_string());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you're already collecting into a Vec here I think you should do something like this:

Suggested change
let mut items = main_iter.collect::<PyResult<Vec<_>>>()?;
items.sort_by_cached_key(|(key, _)| key_str(key).unwrap_or_default().to_string());
let mut items = main_iter.map(|k,v| (key_str(key), v)).collect::<PyResult<Vec<_>>>()?;

Might need to collapse two levels of Result or something like that.

Then you'll have to convert back to the original type. I think this will be the fastest approach.

Copy link
Author

@aezomz aezomz Mar 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I kept the original implementation. Didn't quite get your advice, still quite new to Rust 😓
But added recursive mode.

Copy link
Member

@adriangb adriangb Mar 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first thing process_field_entry_python does is call key_str:

let key_str = key_str(key)?;

So I think you can just convert to it from the get go

@aezomz aezomz force-pushed the allow_model_dump_sort_keys branch from 0cf4b6d to 95f9329 Compare March 23, 2025 14:44
@aezomz
Copy link
Author

aezomz commented Mar 23, 2025

Hmm I see now that this is not recursive (it only applies to the top level keys). Would it be hard to make it recursive? I fear that if we implement the non-recursive version someone is going to come along and want the recursive version... if so we can make it a Literal['recursive', 'top-level', 'unsorted'] or something like that.

Added different sort mode as above, updated the PR description.

@aezomz
Copy link
Author

aezomz commented Mar 25, 2025

please review 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add sorting keys to model_dump_json
3 participants