Optimize Starrocks #32
Merged
7 commits
1fbea63 optimize for starrocks (murphyatwork)
caa34de incldue ddl_flat in main.sh (murphyatwork)
5433550 add empty line in queries (murphyatwork)
46bfa2b Merge remote-tracking branch 'origin/main' into murphy_opt_flat (rschu1ze)
815d991 rename (murphyatwork)
6c2c0f3 Merge remote-tracking branch 'origin/main' into murphy_opt_flat (rschu1ze)
0da5e48 Add measurements (rschu1ze)
@@ -0,0 +1,4 @@
+CREATE TABLE bluesky (
+    `id` BIGINT AUTO_INCREMENT,
+    `data` JSON NOT NULL COMMENT "Primary JSON object, optimized for field access using FlatJSON"
+);
This file was deleted.
@@ -1,5 +1,5 @@
-SELECT cast(data->'commit.collection' AS VARCHAR) AS event,count() AS count FROM bluesky GROUP BY event ORDER BY count DESC;
-SELECT cast(data->'commit.collection' AS VARCHAR) AS event, count() AS count, count(DISTINCT cast(data->'did' AS VARCHAR)) AS users FROM bluesky WHERE (data->'kind' = 'commit') AND (data->'commit.operation' = 'create') GROUP BY event ORDER BY count DESC;
-SELECT cast(data->'commit.collection' AS VARCHAR) AS event, hour(from_unixtime(round(divide(cast(data->'time_us' AS BIGINT), 1000000)))) as hour_of_day, count() AS count FROM bluesky WHERE (data->'kind' = 'commit') AND (data->'commit.operation' = 'create') AND (array_contains(['app.bsky.feed.post', 'app.bsky.feed.repost', 'app.bsky.feed.like'], cast(data->'commit.collection' AS VARCHAR))) GROUP BY event, hour_of_day ORDER BY hour_of_day, event;
-SELECT cast(data->'$.did' as VARCHAR) as user_id, min(from_unixtime(round(divide(cast(data->'time_us' AS BIGINT), 1000000)))) AS first_post_date FROM bluesky WHERE (data->'kind' = 'commit') AND (data->'commit.operation' = 'create') AND (data->'commit.collection' = 'app.bsky.feed.post') GROUP BY user_id ORDER BY first_post_date ASC LIMIT 3;
-SELECT cast(data->'$.did' as VARCHAR) as user_id, date_diff('millisecond', min(from_unixtime(round(divide(cast(data->'time_us' AS BIGINT), 1000000)))),max(from_unixtime(round(divide(cast(data->'time_us' AS BIGINT), 1000000))))) AS activity_span FROM bluesky WHERE (data->'kind' = 'commit') AND (data->'commit.operation' = 'create') AND (data->'commit.collection' = 'app.bsky.feed.post') GROUP BY user_id ORDER BY activity_span DESC LIMIT 3;
+SELECT get_json_string(data, 'commit.collection') AS event, count() AS count FROM bluesky GROUP BY event ORDER BY count DESC;
+SELECT get_json_string(data, 'commit.collection') AS event, count() AS count, count(DISTINCT get_json_string(data, 'did')) AS users FROM bluesky WHERE (get_json_string(data, 'kind') = 'commit') AND (get_json_string(data, 'commit.operation') = 'create') GROUP BY event ORDER BY count DESC;
+SELECT get_json_string(data, 'commit.collection') AS event, hour(from_unixtime(round(divide(get_json_int(data, 'time_us'), 1000000)))) as hour_of_day, count() AS count FROM bluesky WHERE (get_json_string(data, 'kind') = 'commit') AND (get_json_string(data, 'commit.operation') = 'create') AND (array_contains(['app.bsky.feed.post', 'app.bsky.feed.repost', 'app.bsky.feed.like'], get_json_string(data, 'commit.collection'))) GROUP BY event, hour_of_day ORDER BY hour_of_day, event;
+SELECT get_json_string(data, '$.did') as user_id, min(from_unixtime(round(divide(get_json_int(data, 'time_us'), 1000000)))) AS first_post_date FROM bluesky WHERE (get_json_string(data, 'kind') = 'commit') AND (get_json_string(data, 'commit.operation') = 'create') AND (get_json_string(data, 'commit.collection') = 'app.bsky.feed.post') GROUP BY user_id ORDER BY first_post_date ASC LIMIT 3;
+SELECT get_json_string(data, '$.did') as user_id, date_diff('millisecond', min(from_unixtime(round(divide(get_json_int(data, 'time_us'), 1000000)))), max(from_unixtime(round(divide(get_json_int(data, 'time_us'), 1000000))))) AS activity_span FROM bluesky WHERE (get_json_string(data, 'kind') = 'commit') AND (get_json_string(data, 'commit.operation') = 'create') AND (get_json_string(data, 'commit.collection') = 'app.bsky.feed.post') GROUP BY user_id ORDER BY activity_span DESC LIMIT 3;
14 changes: 7 additions & 7 deletions
...esults/m6i.8xlarge_bluesky_1000m_lz4.json → ...ts/m6i.8xlarge_bluesky_1000m_default.json
@@ -1,23 +1,23 @@
 {
-    "system": "Starrocks (lz4)",
+    "system": "Starrocks (default)",
     "version": "3.4.0-e94580b",
     "os": "Ubuntu 24.04",
-    "date": "2025-01-13",
+    "date": "2025-03-24",
     "machine": "m6i.8xlarge, 10000gib gp3",
     "cluster_size": 1,
     "comment": "",
     "retains_structure": "yes",
     "tags": [
     ],
     "dataset_size": 1000000000,
-    "num_loaded_documents": null,
+    "num_loaded_documents": 804000000,
     "data_compression": "lz4",
     "total_size": null,
     "result": [
-        [null, null, null],
-        [null, null, null],
-        [null, null, null],
-        [null, null, null],
+        [2.27,1.24,1.21],
+        [17.81,10.67,10.20],
+        [7.38,6.78,7.62],
+        [7.24, null, null],
         [null, null, null]
     ]
 }
16 changes: 8 additions & 8 deletions
...sults/m6i.8xlarge_bluesky_1000m_zstd.json → ...i.8xlarge_bluesky_1000m_materialized.json
@@ -1,23 +1,23 @@
 {
-    "system": "Starrocks (zstd)",
+    "system": "Starrocks (materialized)",
     "version": "3.4.0-e94580b",
     "os": "Ubuntu 24.04",
-    "date": "2025-01-13",
+    "date": "2025-03-24",
     "machine": "m6i.8xlarge, 10000gib gp3",
     "cluster_size": 1,
     "comment": "",
     "retains_structure": "yes",
     "tags": [
     ],
     "dataset_size": 1000000000,
-    "num_loaded_documents": null,
+    "num_loaded_documents": 997000000,
     "data_compression": "zstd",
-    "total_size": null,
+    "total_size": 191541000000,
     "result": [
-        [null, null, null],
-        [null, null, null],
-        [null, null, null],
-        [null, null, null],
+        [1.75,1.56,1.54],
+        [49.75,41.61,31.38],
+        [12.90,12.58,5.76],
+        [5.64,6.21,6.03],
         [null, null, null]
     ]
 }
18 changes: 9 additions & 9 deletions
...results/m6i.8xlarge_bluesky_100m_lz4.json → ...lts/m6i.8xlarge_bluesky_100m_default.json
@@ -1,23 +1,23 @@
 {
-    "system": "Starrocks (lz4)",
+    "system": "Starrocks (default)",
     "version": "3.4.0-e94580b",
     "os": "Ubuntu 24.04",
-    "date": "2025-01-13",
+    "date": "2025-03-24",
     "machine": "m6i.8xlarge, 10000gib gp3",
     "cluster_size": 1,
     "comment": "",
     "retains_structure": "yes",
     "tags": [
     ],
     "dataset_size": 100000000,
-    "num_loaded_documents": 100000000,
+    "num_loaded_documents": 91000000,
     "data_compression": "lz4",
-    "total_size": 19182000000,
+    "total_size": 17109000000,
     "result": [
-        [0.25,0.17,0.17],
-        [8.13,4.33,3.82],
-        [3.18,3.08,3.05],
-        [4.06,4.07,4.12],
-        [4.04,4.20,3.97]
+        [0.61,0.16,0.16],
+        [19.26,7.12,7.18],
+        [1.12,1.08,1.08],
+        [0.55,0.55,0.54],
+        [0.60,0.60,0.60]
     ]
 }
18 changes: 9 additions & 9 deletions
.../results/m6i.8xlarge_bluesky_10m_lz4.json → ...ults/m6i.8xlarge_bluesky_10m_default.json
@@ -1,23 +1,23 @@
 {
-    "system": "Starrocks (lz4)",
+    "system": "Starrocks (default)",
     "version": "3.4.0-e94580b",
     "os": "Ubuntu 24.04",
-    "date": "2025-01-13",
+    "date": "2025-03-24",
     "machine": "m6i.8xlarge, 10000gib gp3",
     "cluster_size": 1,
     "comment": "",
     "retains_structure": "yes",
     "tags": [
     ],
     "dataset_size": 10000000,
-    "num_loaded_documents": 9999994,
+    "num_loaded_documents": 7000000,
     "data_compression": "lz4",
-    "total_size": 1967000000,
+    "total_size": 824028000,
     "result": [
-        [0.11,0.10,0.10],
-        [0.45,0.40,0.42],
-        [0.58,0.45,0.50],
-        [0.57,0.62,0.61],
-        [0.69,0.60,0.55]
+        [0.03,0.02,0.03],
+        [0.52,0.50,0.48],
+        [0.25,0.25,0.18],
+        [0.11,0.11,0.10],
+        [0.11,0.12,0.12]
     ]
 }
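The updated result files report fewer loaded documents than the nominal dataset size (for example, 804000000 of 1000000000). As a rough aid for reading these numbers, the sketch below computes the loaded fraction from the `dataset_size` and `num_loaded_documents` fields shown above. The function name `loaded_fraction` is hypothetical and is not part of the benchmark scripts.

```python
import json


def loaded_fraction(result):
    """Return num_loaded_documents / dataset_size, or None if either is missing.

    `result` is a dict parsed from one of the result JSON files.
    (Hypothetical helper, not part of the repository.)
    """
    loaded = result.get("num_loaded_documents")
    size = result.get("dataset_size")
    if loaded is None or not size:
        return None
    return loaded / size


if __name__ == "__main__":
    # Values copied from the 1000m "default" result file in this PR.
    sample = json.loads(
        '{"dataset_size": 1000000000, "num_loaded_documents": 804000000}'
    )
    print(f"{loaded_fraction(sample):.1%}")
```

A fraction well below 1 suggests that some input files were rejected during loading, so timings from such a run are not directly comparable with a run that loaded the full dataset.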
Ideally, we update the queries and the runtime measurements in the same PR; otherwise the measurements become stale. If you would like me to run the measurements on my local machine, please let me know.
(But let's first clarify the questions in my other comment.)

Yes, please help me run those measurements. I appreciate it.
I attempted to run this benchmark in a 32-core Docker container but obtained results that differed from yours. I'll work on reproducing your results; for now, however, I believe it's best to rely solely on your data.

BTW, I'm running the latest version, StarRocks 3.4.1; your results seem to be from StarRocks 3.4.0. If possible, please use the latest version when running the measurements. Thanks.
Running the benchmarks right now.

I did not choose the version intentionally; I am just testing whichever version is loaded in install.sh.

My local measurements for scale factors 1, 10, and 100 succeeded, but for scale factor 1000 only a single file was processed. I think the reason was that I was experimenting with #43 in parallel. I am redoing the measurements now.

By default, SR loads each file atomically, meaning the load fails if any record is unqualified. However, SR offers a max_filter_ratio parameter in stream load to control this behavior; consider whether you need to adjust it.

@murphyatwork I am not very familiar with Starrocks, but feel free to change the scripts so that they use the setting. I could then re-benchmark.
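
To illustrate the `max_filter_ratio` setting discussed above, here is a minimal sketch that assembles a `curl` command for a StarRocks Stream Load request with that header set. It is not taken from the benchmark scripts: the helper name `build_stream_load_cmd`, the host, database name, file name, credentials, and the ratio of 0.1 are all assumptions for illustration, and any other headers the actual load needs (such as column or JSON path mappings) are omitted.

```python
import shlex


def build_stream_load_cmd(fe_host, db, table, data_file,
                          user="root", password="",
                          max_filter_ratio=0.1, http_port=8030):
    """Return a curl argument list for a Stream Load request.

    Hypothetical helper: max_filter_ratio is the fraction of rows that
    may be filtered out as unqualified before the whole load fails.
    """
    if not 0.0 <= max_filter_ratio <= 1.0:
        raise ValueError("max_filter_ratio must be between 0 and 1")
    url = f"http://{fe_host}:{http_port}/api/{db}/{table}/_stream_load"
    return [
        "curl", "--location-trusted",
        "-u", f"{user}:{password}",
        "-H", "Expect: 100-continue",
        "-H", "format: json",
        # Tolerate a bounded share of bad records instead of failing the file.
        "-H", f"max_filter_ratio: {max_filter_ratio}",
        "-T", data_file,
        "-XPUT", url,
    ]


if __name__ == "__main__":
    # All argument values below are placeholders, not values from the PR.
    cmd = build_stream_load_cmd("127.0.0.1", "bluesky_db", "bluesky",
                                "file_0001.json")
    print(shlex.join(cmd))
```

With the default ratio of 0, a single unqualified record fails the whole file; a small non-zero ratio trades a few dropped records for a complete load, which would then be reflected in `num_loaded_documents`.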