allow sorting keys on to_json and to_python by passing in sort_keys #1637

aezomz · 2025-02-15T19:16:37Z

Hello Pydantic Team! This is my first time contributing to a Rust and PyO3 related repo.
I am also new to Rust.
Does this PR make sense? I have been trying to do model_dump_json with sorted keys too.
If so...
Should I also raise the error early instead of calling unwrap_or_default() in this PR?
I think we can remove the to_python implementation; it looks like the dictionary is already sorted by default.

~~Codspeed benchmark actually slowed down for those two functions. Let me know how I can do better.~~

Based on the feedback from adriangb, it's better to add support for recursive sorting.
However, I do not sort dictionaries inside an array, as that adds more complexity and I am not confident about adding everything in one PR.

Will sort from:
        {
            'field_123': b'test_123',
            'field_b': 12,
            'field_a': b'test',
            'field_c': {'mango': 2, 'banana': 3, 'apple': 1},
            'field_d': [{'d': 3, 'b': 2, 'a': 1}, 2, 3],
        }
To:
s.to_python(m, exclude_none=True, sort_keys='recursive', mode='json')
        {
            'field_123': 'test_123',
            'field_a': 'test',
            'field_b': 12,
            'field_c': {'apple': 1, 'banana': 3, 'mango': 2},
            'field_d': [{'d': 3, 'b': 2, 'a': 1}, 2, 3],
        }
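For illustration only, the behaviour described above can be sketched in plain Python. This is not the Rust code in this PR; `sort_keys_recursive` is a hypothetical helper name, and the input uses already-serialized string values. It sorts dict keys at every nesting level but returns lists unchanged, so dicts inside a list keep their original key order.

```python
from typing import Any


def sort_keys_recursive(value: Any) -> Any:
    """Return a copy with dict keys sorted; nested dicts are sorted too.

    Lists are returned as-is, so a dict inside a list is NOT sorted,
    matching the limitation described in the PR description.
    """
    if isinstance(value, dict):
        return {key: sort_keys_recursive(value[key]) for key in sorted(value)}
    return value


data = {
    'field_123': 'test_123',
    'field_b': 12,
    'field_a': 'test',
    'field_c': {'mango': 2, 'banana': 3, 'apple': 1},
    'field_d': [{'d': 3, 'b': 2, 'a': 1}, 2, 3],
}
result = sort_keys_recursive(data)
```

Python dicts preserve insertion order, so rebuilding the dict from `sorted(value)` is enough to fix the key order in the output.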

Thanks!

Change Summary

allow sorting keys on to_json and to_python by passing in sort_keys

Related issue number

should fix pydantic/pydantic#7424
Might need to open another PR on the pydantic (Python) repo though; need to check.

Checklist

Unit tests for the changes exist
Documentation reflects the changes where applicable
Pydantic tests pass with this pydantic-core (except for expected changes)
My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

Selected Reviewer: @davidhewitt

codecov · 2025-02-15T19:19:04Z

Codecov Report

Attention: Patch coverage is 77.31481% with 49 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/serializers/fields.rs	83.03%	28 Missing ⚠️
src/serializers/extra.rs	37.50%	20 Missing ⚠️
python/pydantic_core/core_schema.py	50.00%	0 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

codspeed-hq · 2025-02-15T19:23:13Z

CodSpeed Performance Report

Merging #1637 will not alter performance

Comparing aezomz:allow_model_dump_sort_keys (95f9329) with main (f977516)

Summary

✅ 157 untouched benchmarks

aezomz · 2025-03-04T13:13:32Z

please review

adriangb

Main issue is perf regression

adriangb · 2025-03-04T14:00:07Z

src/errors/validation_exception.rs

            false,
            false,
+            false,
            true,


@davidhewitt this is getting gross 😢

adriangb · 2025-03-04T14:02:22Z

src/serializers/fields.rs

-        // NOTE! we maintain the order of the input dict assuming that's right
-        for result in main_iter {
-            let (key, value) = result?;
+        let mut items = main_iter.collect::<PyResult<Vec<_>>>()?;


My intuition is that this extra Vec being created is the cause for the slowdown in benchmarks.

aezomz · 2025-03-10

I refactored the function out so we can reuse it, and we avoid creating the new variable when sort_keys is false.

adriangb · 2025-03-04T14:35:50Z

tests/serializers/test_model.py

-                'field_a': core_schema.model_field(core_schema.bytes_schema()),
                'field_b': core_schema.model_field(core_schema.int_schema()),
+                'field_a': core_schema.model_field(core_schema.bytes_schema()),


Why did these have to change? Is this to make the tests change? I would prefer a new self-contained test just for this bit.

aezomz · 2025-03-10

Separated this.

aezomz · 2025-03-05T10:44:05Z

I separated the test out, and I also refactored the functions so we can reuse them when sort_keys=true.
To keep the original benchmark performance, I added a simple bool check on sort_keys before calling expensive functions such as sorting.
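As a rough Python analogy of that bool check (the actual change is in Rust; `iter_fields` is a hypothetical name used only here), the unsorted path streams items in input order and only the sorted path materializes an intermediate list:

```python
from typing import Any, Iterable, Iterator, Tuple


def iter_fields(
    items: Iterable[Tuple[str, Any]], sort_keys: bool
) -> Iterator[Tuple[str, Any]]:
    """Yield (key, value) pairs, sorting by key only when requested."""
    if not sort_keys:
        # Fast path: no intermediate collection, input order is preserved.
        return iter(items)
    # Slow path: sorted() has to collect every item before it can yield any.
    return iter(sorted(items, key=lambda item: item[0]))


unsorted_out = list(iter_fields([('b', 1), ('a', 2)], sort_keys=False))
sorted_out = list(iter_fields([('b', 1), ('a', 2)], sort_keys=True))
```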

Let me know what else I need to improve. Thanks

aezomz · 2025-03-18T15:23:13Z

please review, not sure how I can take it from here

adriangb

Hmm I see now that this is not recursive (it only applies to the top level keys). Would it be hard to make it recursive? I fear that if we implement the non-recursive version someone is going to come along and want the recursive version... if so we can make it a Literal['recursive', 'top-level', 'unsorted'] or something like that.
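A minimal Python sketch of how such a three-way option could behave, assuming the mode names from the comment above. `SortMode` and `apply_sort_mode` are hypothetical names for illustration, not part of pydantic-core:

```python
from typing import Any, Dict, Literal

SortMode = Literal['recursive', 'top-level', 'unsorted']


def apply_sort_mode(value: Dict[str, Any], mode: SortMode = 'unsorted') -> Dict[str, Any]:
    """Return a copy of `value` with keys ordered according to `mode`."""
    if mode == 'unsorted':
        # Keep the input order untouched.
        return dict(value)
    if mode == 'top-level':
        # Sort only the outermost keys; nested dicts are left as they are.
        return {key: value[key] for key in sorted(value)}
    if mode == 'recursive':
        # Sort the outermost keys and descend into nested dicts.
        return {
            key: apply_sort_mode(value[key], 'recursive')
            if isinstance(value[key], dict)
            else value[key]
            for key in sorted(value)
        }
    raise ValueError(f'unknown sort mode: {mode!r}')


nested = {'b': {'y': 1, 'x': 2}, 'a': 0}
top_level = apply_sort_mode(nested, 'top-level')
recursive = apply_sort_mode(nested, 'recursive')
```

Making the option a string literal rather than a bool leaves room to add further modes later without changing the parameter's type.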

adriangb · 2025-03-18T16:03:20Z

src/serializers/fields.rs

+            let mut items = main_iter.collect::<PyResult<Vec<_>>>()?;
+            items.sort_by_cached_key(|(key, _)| key_str(key).unwrap_or_default().to_string());


Since you're already collecting into a Vec here I think you should do something like this:

Suggested change

-            let mut items = main_iter.collect::<PyResult<Vec<_>>>()?;
-            items.sort_by_cached_key(|(key, _)| key_str(key).unwrap_or_default().to_string());
+            let mut items = main_iter.map(|k,v| (key_str(key), v)).collect::<PyResult<Vec<_>>>()?;

Might need to collapse two levels of Result or something like that.

Then you'll have to convert back to the original type. I think this will be the fastest approach.

aezomz · 2025-03-23

Sorry, I kept the original implementation. I didn't quite get your advice; I'm still quite new to Rust 😓
But I added a recursive mode.

adriangb · 2025-03-25

The first thing process_field_entry_python does is call key_str:

pydantic-core/src/serializers/fields.rs, line 234 in 95f9329:

    let key_str = key_str(key)?;

So I think you can just convert to it from the get-go.
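The idea in this thread, sketched in Python rather than Rust and under stated assumptions: convert each key to a string once while collecting, sort on that string, and hand the already-converted key to the per-field step. The `key_str` below is a hypothetical stand-in for the Rust helper of the same name, and `sorted_str_items` is an illustrative name. Converting up front also means a bad key raises immediately instead of silently sorting as an empty string.

```python
from typing import Any, Iterable, List, Tuple


def key_str(key: Any) -> str:
    """Hypothetical stand-in: accept only string keys, reject anything else."""
    if isinstance(key, str):
        return key
    raise TypeError(f'expected a str key, got {type(key).__name__}')


def sorted_str_items(items: Iterable[Tuple[Any, Any]]) -> List[Tuple[str, Any]]:
    """Collect (key, value) pairs with keys converted once, then sort by key."""
    # Conversion errors surface here, before any sorting happens.
    converted = [(key_str(key), value) for key, value in items]
    converted.sort(key=lambda pair: pair[0])
    return converted


pairs = sorted_str_items({'b': 1, 'a': 2}.items())
```

Downstream code can then use the string in each pair directly and does not need to call `key_str` a second time.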

aezomz · 2025-03-23T16:35:06Z

> Hmm I see now that this is not recursive (it only applies to the top level keys). Would it be hard to make it recursive? I fear that if we implement the non-recursive version someone is going to come along and want the recursive version... if so we can make it a Literal['recursive', 'top-level', 'unsorted'] or something like that.

Added the different sort modes as suggested above and updated the PR description.

aezomz · 2025-03-25T16:32:48Z

please review 👍

aezomz force-pushed the allow_model_dump_sort_keys branch from 07a31f5 to 7222c8d Compare March 4, 2025 10:22

pydantic-hooky bot added the ready for review label Mar 4, 2025

pydantic-hooky bot assigned davidhewitt Mar 4, 2025

aezomz marked this pull request as ready for review March 4, 2025 13:19

adriangb reviewed Mar 4, 2025

zzstoatzz mentioned this pull request Mar 10, 2025

[wip] sort keys #1666

Draft

adriangb reviewed Mar 18, 2025

aezomz added 3 commits March 23, 2025 22:43

allow sorting keys on to_json and to_python by passing in sort_keys

bebbd73

refactor according to comments

81d080a

recursive sort dictionary

95f9329

aezomz force-pushed the allow_model_dump_sort_keys branch from 0cf4b6d to 95f9329 Compare March 23, 2025 14:44
