Optimize JSON_SET and JSON_REPLACE on IndexedJsonDocument #8107

Merged
merged 2 commits into from
Jul 10, 2024

@nicktobey nicktobey commented Jul 8, 2024

This PR adds a new implementation of the JSON_SET and JSON_REPLACE functions that leverages the new indexed JSON storage format.

For JSON documents that span multiple chunks, only the affected chunks need to be loaded and modified, so the cost of an operation scales with the size of the modified value instead of the size of the entire document.
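
As a rough illustration of why touching only the affected chunk helps, here is a minimal sketch. It is not Dolt's implementation: `chunk`, `store`, `load`, and `set` are hypothetical names, and the document is simplified to flat string keys split across chunks by key range. The chunk boundaries act as the index, so an update reads one chunk no matter how many chunks the document has.

```go
package main

import (
	"fmt"
	"sort"
)

// chunk is a hypothetical unit of storage holding part of a document's
// key/value pairs. lastKey is the largest key stored in the chunk.
type chunk struct {
	lastKey string
	values  map[string]string
}

// store simulates chunked storage and counts how many chunks get loaded.
type store struct {
	chunks []chunk
	loads  int
}

// load fetches a single chunk and records the access.
func (s *store) load(i int) chunk {
	s.loads++
	return s.chunks[i]
}

// set writes value under key, touching only the chunk whose key range covers
// it. Only the boundary keys are consulted to find that chunk.
func (s *store) set(key, value string) {
	if len(s.chunks) == 0 {
		s.chunks = []chunk{{lastKey: key, values: map[string]string{key: value}}}
		return
	}
	i := sort.Search(len(s.chunks), func(i int) bool { return s.chunks[i].lastKey >= key })
	if i == len(s.chunks) {
		i = len(s.chunks) - 1 // keys past the end land in the final chunk
	}
	c := s.load(i)
	// Copy the chunk before modifying it so the old version stays intact.
	updated := make(map[string]string, len(c.values)+1)
	for k, v := range c.values {
		updated[k] = v
	}
	updated[key] = value
	if key > c.lastKey {
		c.lastKey = key
	}
	s.chunks[i] = chunk{lastKey: c.lastKey, values: updated}
}

func main() {
	s := &store{chunks: []chunk{
		{lastKey: "c", values: map[string]string{"a": "1", "c": "3"}},
		{lastKey: "m", values: map[string]string{"f": "6", "m": "13"}},
		{lastKey: "z", values: map[string]string{"q": "17", "z": "26"}},
	}}
	s.set("f", "six")
	fmt.Println(s.chunks[1].values["f"], s.loads)
}
```

In this toy version the update to `f` loads only the middle chunk; the first and last chunks are never read.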

@coffeegoddd

@nicktobey DOLT

| comparing_percentages |
|---|
| 100.000000 to 100.000000 |

| version | result | total |
|---|---|---|
| d7cb874 | ok | 5937457 |

| version | total_tests |
|---|---|
| d7cb874 | 5937457 |

| correctness_percentage |
|---|
| 100.0 |

@nicktobey nicktobey changed the title from "Optimize JSON_SET and JSON_REMOVE on IndexedJsonDocument" to "Optimize JSON_SET and JSON_REPLACE on IndexedJsonDocument" Jul 8, 2024
@nicktobey nicktobey requested a review from fulghum July 9, 2024 21:58
@fulghum fulghum left a comment

Looks good! Just a couple comments about two pieces of duplicated code that I noticed.

Comment on lines +415 to +416
// The supplied path may be 0-indexing into a scalar, which is the same as referencing the scalar. Remove
// the index and try again.
Contributor

This is interesting. I didn't know MySQL's JSON path logic would allow this. Would this logic be worth considering moving into JsonCursor? I'm wondering if this is more generic than just for JSON_SET and would make sense to support for other JSON operations.

Contributor Author

There are definitely cases that we don't cover here, such as including the [0] index in the middle of a path.

Example: the number in { "a": { "b": 1 } } could be accessed via the path $[0].a[0].b[0].

This isn't the right place for a comprehensive solution; this is just to handle some cases that we currently test for. A comprehensive solution would go somewhere else, but I don't think it is worth implementing unless a user actually needs this behavior, given how odd it is.
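
For readers unfamiliar with the behavior, the trailing-index case can be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: `stripTrailingZeroIndex` and `lookup` are hypothetical helpers, and the document is modeled as a flat map from path strings to values. It handles only a `[0]` at the end of the path, so the mid-path example above still fails to resolve, which mirrors the limitation being discussed.

```go
package main

import (
	"fmt"
	"strings"
)

// stripTrailingZeroIndex removes one trailing "[0]" from a JSON path and
// reports whether anything was removed.
func stripTrailingZeroIndex(path string) (string, bool) {
	if strings.HasSuffix(path, "[0]") {
		return strings.TrimSuffix(path, "[0]"), true
	}
	return path, false
}

// lookup is a hypothetical resolver over a flat map of known paths. If the
// path is missing, it retries once without a trailing "[0]", and accepts the
// result only when the target is not an array.
func lookup(doc map[string]any, path string) (any, bool) {
	if v, ok := doc[path]; ok {
		return v, true
	}
	if shorter, stripped := stripTrailingZeroIndex(path); stripped {
		if v, ok := doc[shorter]; ok {
			if _, isArray := v.([]any); !isArray {
				return v, true
			}
		}
	}
	return nil, false
}

func main() {
	// Flattened view of { "a": { "b": 1 } } (assumed representation).
	doc := map[string]any{"$.a.b": 1}

	fmt.Println(lookup(doc, "$.a.b"))           // direct hit
	fmt.Println(lookup(doc, "$.a.b[0]"))        // resolved by stripping the trailing [0]
	fmt.Println(lookup(doc, "$[0].a[0].b[0]")) // mid-path [0] is not handled here
}
```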

Comment on lines +265 to +267
convertToIndexedJsonDocument := func(t *testing.T, s interface{}) interface{} {
	return newIndexedJsonDocumentFromValue(t, ctx, ns, s)
}
Contributor

(minor) this function is defined in at least 6 places. Might be a small improvement to pull that out into a shared variable that all the tests could reuse. Not a huge deal, but could cut down on some duplicated test code.

Contributor Author

The function is a closure that captures the context and node store from the surrounding block. Pulling it out makes that more complicated and doesn't seem worth it.

@nicktobey nicktobey merged commit 00add71 into main Jul 10, 2024
34 of 35 checks passed
@nicktobey nicktobey deleted the nicktobey/json-set branch July 10, 2024 23:41

@coffeegoddd DOLT

| test_name | detail | row_cnt | sorted | mysql_time | sql_mult | cli_mult |
|---|---|---|---|---|---|---|
| batching | LOAD DATA | 10000 | 1 | 0.05 | 1.8 | |
| batching | batch sql | 10000 | 1 | 0.07 | 1.86 | |
| batching | by line sql | 10000 | 1 | 0.07 | 2 | |
| blob | 1 blob | 200000 | 1 | 0.89 | 3.96 | 3.88 |
| blob | 2 blobs | 200000 | 1 | 0.87 | 4.57 | 4.55 |
| blob | no blob | 200000 | 1 | 0.9 | 2.48 | 2.14 |
| col type | datetime | 200000 | 1 | 0.82 | 3.01 | 2.87 |
| col type | varchar | 200000 | 1 | 0.68 | 3.57 | 3.06 |
| config width | 2 cols | 200000 | 1 | 0.78 | 2.6 | 2.26 |
| config width | 32 cols | 200000 | 1 | 1.84 | 2.02 | 2.49 |
| config width | 8 cols | 200000 | 1 | 0.96 | 2.46 | 2.27 |
| pk type | float | 200000 | 1 | 0.84 | 2.7 | 2.1 |
| pk type | int | 200000 | 1 | 0.82 | 2.48 | 2.26 |
| pk type | varchar | 200000 | 1 | 1.64 | 1.77 | 1.38 |
| row count | 1.6mm | 1600000 | 1 | 5.69 | 2.96 | 2.55 |
| row count | 400k | 400000 | 1 | 1.41 | 2.94 | 2.5 |
| row count | 800k | 800000 | 1 | 2.8 | 2.98 | 2.56 |
| secondary index | four index | 200000 | 1 | 3.46 | 1.47 | 1.13 |
| secondary index | no secondary | 200000 | 1 | 0.88 | 2.52 | 2.18 |
| secondary index | one index | 200000 | 1 | 1.13 | 2.43 | 2.12 |
| secondary index | two index | 200000 | 1 | 1.96 | 1.82 | 1.49 |
| sorting | shuffled 1mm | 1000000 | 0 | 5.27 | 2.79 | 2.5 |
| sorting | sorted 1mm | 1000000 | 1 | 5.2 | 2.84 | 2.52 |

@coffeegoddd DOLT

| name | detail | mean_mult |
|---|---|---|
| dolt_blame_basic | system table | 1.24 |
| dolt_blame_commit_filter | system table | 3.42 |
| dolt_commit_ancestors_commit_filter | system table | 0.87 |
| dolt_commits_commit_filter | system table | 0.97 |
| dolt_diff_log_join_from_commit | system table | 2.05 |
| dolt_diff_log_join_to_commit | system table | 2.01 |
| dolt_diff_table_from_commit_filter | system table | 1.04 |
| dolt_diff_table_to_commit_filter | system table | 1.14 |
| dolt_diffs_commit_filter | system table | 0.95 |
| dolt_history_commit_filter | system table | 1.19 |
| dolt_log_commit_filter | system table | 0.92 |

@coffeegoddd DOLT

| name | add_cnt | delete_cnt | update_cnt | latency |
|---|---|---|---|---|
| adds_only | 60000 | 0 | 0 | 1.38 |
| adds_updates_deletes | 60000 | 60000 | 60000 | 4.47 |
| deletes_only | 0 | 60000 | 0 | 2.45 |
| updates_only | 0 | 0 | 60000 | 3.06 |
