Optimize JSON_SET and JSON_REPLACE on `IndexedJsonDocument` #8107

nicktobey · 2024-07-08T23:03:34Z

This PR adds new implementations of the JSON_SET and JSON_REPLACE functions that leverage the new indexed JSON storage format.

For JSON documents that span multiple chunks, only the affected chunks need to be loaded and modified, allowing operations to scale with the size of the modified value instead of the size of the entire document.
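
To illustrate the idea of touching only the affected chunk, here is a toy sketch. It is not the code in this PR and does not reflect the actual storage format: the `chunk` and `chunkedDoc` types, the `replace` method, and the flat path strings are all hypothetical, chosen only to show how an index of per-chunk boundaries lets an update load a single chunk.

```go
package main

import (
	"fmt"
	"sort"
)

// chunk is a hypothetical unit of storage: a sorted run of path/value pairs.
type chunk struct {
	paths  []string
	values []string
}

// chunkedDoc is a toy stand-in for a document stored across several chunks.
// firstPaths[i] is the smallest path stored in chunks[i].
type chunkedDoc struct {
	firstPaths []string
	chunks     []chunk
	loads      int // how many chunks were "loaded" by updates
}

// replace overwrites the value at an existing path and reports whether the
// path was found. Only the one chunk that can contain the path is touched.
func (d *chunkedDoc) replace(path, value string) bool {
	// Find the last chunk whose first path is <= the target path.
	i := sort.SearchStrings(d.firstPaths, path)
	if i == len(d.firstPaths) || d.firstPaths[i] != path {
		i--
	}
	if i < 0 {
		return false // path sorts before everything in the document
	}
	d.loads++ // assumption: loading a chunk is the expensive step
	c := &d.chunks[i]
	j := sort.SearchStrings(c.paths, path)
	if j == len(c.paths) || c.paths[j] != path {
		return false
	}
	c.values[j] = value
	return true
}

func main() {
	doc := &chunkedDoc{
		firstPaths: []string{"$.a", "$.c", "$.e"},
		chunks: []chunk{
			{paths: []string{"$.a", "$.b"}, values: []string{"1", "2"}},
			{paths: []string{"$.c", "$.d"}, values: []string{"3", "4"}},
			{paths: []string{"$.e", "$.f"}, values: []string{"5", "6"}},
		},
	}
	ok := doc.replace("$.d", "42")
	fmt.Println(ok, doc.loads, doc.chunks[1].values)
}
```

In this toy model the cost of an update depends on the chunk that holds the target, not on the number of chunks in the document.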

coffeegoddd · 2024-07-08T23:36:17Z

@nicktobey DOLT

comparing_percentages
100.000000 to 100.000000

version	result	total
`d7cb874`	ok	5937457

version	total_tests
`d7cb874`	5937457

correctness_percentage
100.0

fulghum

Looks good! Just a couple comments about two pieces of duplicated code that I noticed.

fulghum · 2024-07-10T21:05:31Z

go/store/prolly/tree/json_indexed_document.go

+	// The supplied path may be 0-indexing into a scalar, which is the same as referencing the scalar. Remove
+	// the index and try again.


This is interesting. I didn't know MySQL's JSON path logic would allow this. Would it be worth moving this logic into JsonCursor? I'm wondering if this is more generic than just JSON_SET and would make sense to support for other JSON operations.

There are definitely cases that we don't cover here, such as including the `[0]` index in the middle of a path.

Example: the number in `{ "a": { "b": 1 } }` could be accessed via the path `$[0].a[0].b[0]`.

This isn't the right place for a comprehensive solution; this is just to handle some cases that we currently test for. A comprehensive solution would go somewhere else, but I don't think it is worth implementing unless a user actually needs this behavior, given how odd it is.
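
The fallback discussed in this thread can be sketched roughly as follows. This is a simplified illustration over plain Go values, not the code in `json_indexed_document.go`: the `step` type and the `lookup` and `lookupWithScalarFallback` functions are hypothetical names. It handles only a trailing `[0]`, so a path with `[0]` in the middle still fails, matching the limitation described above.

```go
package main

import "fmt"

// step is one element of a parsed JSON path: an object key or an array index.
type step struct {
	key   string
	index int
	isKey bool
}

// lookup walks doc along path strictly: keys apply only to objects and
// indexes apply only to arrays.
func lookup(doc any, path []step) (any, bool) {
	cur := doc
	for _, s := range path {
		switch v := cur.(type) {
		case map[string]any:
			next, ok := v[s.key]
			if !s.isKey || !ok {
				return nil, false
			}
			cur = next
		case []any:
			if s.isKey || s.index < 0 || s.index >= len(v) {
				return nil, false
			}
			cur = v[s.index]
		default:
			return nil, false // cannot descend into a scalar
		}
	}
	return cur, true
}

// lookupWithScalarFallback retries without a trailing [0] when the strict
// lookup fails, treating value[0] on a non-array as the value itself.
func lookupWithScalarFallback(doc any, path []step) (any, bool) {
	if v, ok := lookup(doc, path); ok {
		return v, true
	}
	n := len(path)
	if n == 0 || path[n-1].isKey || path[n-1].index != 0 {
		return nil, false
	}
	v, ok := lookup(doc, path[:n-1])
	if _, isArray := v.([]any); !ok || isArray {
		return nil, false // a real array must be indexed normally
	}
	return v, true
}

func main() {
	doc := map[string]any{"a": map[string]any{"b": 1.0}}

	// $.a.b[0]: trailing [0] on a scalar, handled by the fallback.
	trailing := []step{{key: "a", isKey: true}, {key: "b", isKey: true}, {index: 0}}
	fmt.Println(lookupWithScalarFallback(doc, trailing))

	// $[0].a[0].b[0]: [0] in the middle of the path, not handled here.
	middle := []step{{index: 0}, {key: "a", isKey: true}, {index: 0}, {key: "b", isKey: true}, {index: 0}}
	fmt.Println(lookupWithScalarFallback(doc, middle))
}
```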

fulghum · 2024-07-10T21:16:08Z

go/store/prolly/tree/json_indexed_document_test.go

+	convertToIndexedJsonDocument := func(t *testing.T, s interface{}) interface{} {
+		return newIndexedJsonDocumentFromValue(t, ctx, ns, s)
+	}


(minor) this function is defined in at least 6 places. Might be a small improvement to pull that out into a shared variable that all the tests could reuse. Not a huge deal, but could cut down on some duplicated test code.

The function is a closure that captures the context and node store from the surrounding block. Pulling it out makes that more complicated and doesn't seem worth it.

github-actions · 2024-07-11T02:23:27Z

@coffeegoddd DOLT

test_name	detail	row_cnt	sorted	mysql_time	sql_mult	cli_mult
batching	LOAD DATA	10000	1	0.05	1.8
batching	batch sql	10000	1	0.07	1.86
batching	by line sql	10000	1	0.07	2
blob	1 blob	200000	1	0.89	3.96	3.88
blob	2 blobs	200000	1	0.87	4.57	4.55
blob	no blob	200000	1	0.9	2.48	2.14
col type	datetime	200000	1	0.82	3.01	2.87
col type	varchar	200000	1	0.68	3.57	3.06
config width	2 cols	200000	1	0.78	2.6	2.26
config width	32 cols	200000	1	1.84	2.02	2.49
config width	8 cols	200000	1	0.96	2.46	2.27
pk type	float	200000	1	0.84	2.7	2.1
pk type	int	200000	1	0.82	2.48	2.26
pk type	varchar	200000	1	1.64	1.77	1.38
row count	1.6mm	1600000	1	5.69	2.96	2.55
row count	400k	400000	1	1.41	2.94	2.5
row count	800k	800000	1	2.8	2.98	2.56
secondary index	four index	200000	1	3.46	1.47	1.13
secondary index	no secondary	200000	1	0.88	2.52	2.18
secondary index	one index	200000	1	1.13	2.43	2.12
secondary index	two index	200000	1	1.96	1.82	1.49
sorting	shuffled 1mm	1000000	0	5.27	2.79	2.5
sorting	sorted 1mm	1000000	1	5.2	2.84	2.52

github-actions · 2024-07-11T02:31:19Z

@coffeegoddd DOLT

name	detail	mean_mult
dolt_blame_basic	system table	1.24
dolt_blame_commit_filter	system table	3.42
dolt_commit_ancestors_commit_filter	system table	0.87
dolt_commits_commit_filter	system table	0.97
dolt_diff_log_join_from_commit	system table	2.05
dolt_diff_log_join_to_commit	system table	2.01
dolt_diff_table_from_commit_filter	system table	1.04
dolt_diff_table_to_commit_filter	system table	1.14
dolt_diffs_commit_filter	system table	0.95
dolt_history_commit_filter	system table	1.19
dolt_log_commit_filter	system table	0.92

github-actions · 2024-07-11T02:49:53Z

@coffeegoddd DOLT

name	add_cnt	delete_cnt	update_cnt	latency
adds_only	60000	0	0	1.38
adds_updates_deletes	60000	60000	60000	4.47
deletes_only	0	60000	0	2.45
updates_only	0	0	60000	3.06

nicktobey added 2 commits July 8, 2024 15:56

Add indexed json tests for JSON_REPLACE and JSON_SET

6a4134c

Add support for JSON_SET and JSON_REPLACE

d7cb874

coffeegoddd added the correctness_approved label Jul 8, 2024

nicktobey changed the title ~~Optimize JSON_SET and JSON_REMOVE on IndexedJsonDocument~~ Optimize JSON_SET and JSON_REPLACE on IndexedJsonDocument Jul 8, 2024

nicktobey requested a review from fulghum July 9, 2024 21:58

fulghum approved these changes Jul 10, 2024

nicktobey merged commit 00add71 into main Jul 10, 2024
34 of 35 checks passed

nicktobey deleted the nicktobey/json-set branch July 10, 2024 23:41

BrewTestBot mentioned this pull request Jul 12, 2024

dolt 1.41.4 Homebrew/homebrew-core#177098

Merged
