(WIP) benchmark: [RFC-98] COW table read performance comparison#18351
Draft
geserdugarov wants to merge 8 commits into apache:master
Describe the issue this Pull Request addresses
This draft PR contains the benchmark used to support the design doc #18276 and the COW implementation #18277.
Only the latest commit contains the benchmark. All previous commits are copied from #18277, which makes this branch self-contained and ready for re-checking.
Data: 800 Parquet files with column stats, 30 million rows, 300 columns, 100 GB in total.
The results of reading data locally:
The results of reading data from remote HDFS:
Summary and Changelog
Scala code, with a description, for submitting to a Spark cluster. It runs a read comparison between DSv2 and DSv1.
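For readers who want a feel for the shape of such a comparison, here is a minimal timing sketch. It is not the benchmark from this PR: the object name `ReadBenchmarkSketch`, the helpers `timeRuns` and `median`, and the two placeholder workloads are all hypothetical, and the sketch deliberately has no Spark dependency so that it runs on its own. It only illustrates the common pattern of untimed warm-up runs followed by timed runs, summarized by the median.

```scala
object ReadBenchmarkSketch {

  /** Runs `action` `warmups` times untimed, then `runs` times timed.
    * Returns the elapsed wall-clock time of each timed run in milliseconds. */
  def timeRuns(warmups: Int, runs: Int)(action: () => Long): Seq[Double] = {
    (1 to warmups).foreach(_ => action())
    (1 to runs).map { _ =>
      val start = System.nanoTime()
      action() // result is discarded; only the elapsed time matters here
      (System.nanoTime() - start) / 1e6
    }
  }

  /** Median of a non-empty sequence; averages the two middle values for even sizes. */
  def median(xs: Seq[Double]): Double = {
    require(xs.nonEmpty, "median of an empty sequence is undefined")
    val sorted = xs.sorted
    val n = sorted.length
    if (n % 2 == 1) sorted(n / 2)
    else (sorted(n / 2 - 1) + sorted(n / 2)) / 2.0
  }

  def main(args: Array[String]): Unit = {
    // Placeholder workloads (assumption): in a real job each thunk would
    // trigger a full table read through one of the two read paths.
    val readV1: () => Long = () => (1L to 200000L).sum
    val readV2: () => Long = () => (1L to 100000L).sum

    val v1 = median(timeRuns(warmups = 2, runs = 5)(readV1))
    val v2 = median(timeRuns(warmups = 2, runs = 5)(readV2))
    println(f"DSv1 median: $v1%.2f ms, DSv2 median: $v2%.2f ms, ratio: ${v1 / v2}%.2f")
  }
}
```

In an actual Spark job, each thunk would wrap an action that forces a full scan of the table. How the DSv1 and DSv2 read paths are selected is not stated in this description, so that part is left as a placeholder here.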
Impact
None. This PR contains supporting materials only.
Risk Level
None
Documentation Update
None
Contributor's checklist