Update PartitionTable to use RowSetBuilder instead of RowSet.insert() #2852

Merged
lbooker42 merged 15 commits into deephaven:main from lbooker42:lab-partitionby-opt
Sep 19, 2022

@lbooker42 lbooker42 commented Sep 14, 2022

Overview:
Use a sequential builder during the creation stage of the PartitionTable, since we can rely on ordered keys there, then transition to a random builder for subsequent updates.
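
To illustrate why ordered keys make the sequential path cheap, here is a minimal, self-contained sketch. It does not call the Deephaven API: `PartitionSketch`, `SequentialBuilder`, `appendKey`, and `partition` are hypothetical stand-ins, and the range list is only an assumed simplification of a real row set. The point it shows is that when row keys arrive in increasing order, each bucket's builder only ever touches its tail, so no search or merge is needed per key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PartitionSketch {
    /** Hypothetical stand-in for a sequential builder: keys must arrive in strictly increasing order. */
    static final class SequentialBuilder {
        // Each entry is an inclusive {first, last} range of row keys.
        private final List<long[]> ranges = new ArrayList<>();

        void appendKey(long key) {
            if (!ranges.isEmpty()) {
                long[] tail = ranges.get(ranges.size() - 1);
                if (key <= tail[1]) {
                    throw new IllegalArgumentException("keys must be appended in increasing order");
                }
                if (key == tail[1] + 1) {
                    tail[1] = key; // contiguous key: extend the last range in place
                    return;
                }
            }
            ranges.add(new long[] {key, key}); // gap (or first key): start a new range
        }

        List<long[]> build() {
            return ranges;
        }
    }

    /**
     * Partitions row keys 0..n-1 by bucket id. Rows are visited in key order,
     * so every per-bucket builder sees increasing keys and can append only.
     */
    public static Map<Integer, List<long[]>> partition(int[] bucketOfRow) {
        Map<Integer, SequentialBuilder> builders = new HashMap<>();
        for (int row = 0; row < bucketOfRow.length; row++) {
            builders.computeIfAbsent(bucketOfRow[row], b -> new SequentialBuilder()).appendKey(row);
        }
        Map<Integer, List<long[]>> result = new HashMap<>();
        builders.forEach((bucket, builder) -> result.put(bucket, builder.build()));
        return result;
    }

    public static void main(String[] args) {
        int[] bucketOfRow = {0, 0, 1, 0, 1, 1};
        partition(bucketOfRow).forEach((bucket, ranges) -> {
            StringBuilder sb = new StringBuilder("bucket " + bucket + ":");
            for (long[] r : ranges) {
                sb.append(" [").append(r[0]).append(", ").append(r[1]).append("]");
            }
            System.out.println(sb);
        });
    }
}
```

On later updates, keys for a bucket are no longer guaranteed to arrive in order, which is why the description switches to a random builder at that stage; this sketch would throw in that situation rather than handle it.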

Initial tests show a 30-90% speedup in some cases (data to follow)

Zulu11.54+23-CA (build 11.0.14+9-LTS) on macOS, M1

Rows         Buckets    partitionBy (ms)  partitionBy_opt (ms)  Speedup
100,000,000  100        1,454             2,253                 -54.9%
100,000,000  25,000     95,338            9,083                 90.5%
100,000,000  1,000,000  30,935            19,952                35.5%

Zulu17.34+19-CA (build 17.0.3+7-LTS) on macOS, M1

Rows         Buckets    partitionBy (ms)  partitionBy_opt (ms)  Speedup
100,000,000  100        1,500             2,346                 -56.4%
100,000,000  25,000     87,767            8,952                 89.8%
100,000,000  1,000,000  30,993            19,811                36.1%

The only regression is the case with very large buckets (100 buckets, about 1M rows per bucket), where direct RowSet.insert() is measurably faster. The other cases benefit greatly from these changes, so the change is beneficial overall.
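
For reading the speedup figures: they appear to be the relative reduction in run time, (partitionBy - partitionBy_opt) / partitionBy, so a negative value means the optimized version was slower. That formula is inferred from the reported numbers, not stated in the PR, and `SpeedupSketch` and `speedupPercent` are hypothetical names. A small sketch under that assumption:

```java
class SpeedupSketch {
    /** Relative run-time reduction in percent; positive means the optimized run was faster. */
    public static double speedupPercent(double baselineMs, double optimizedMs) {
        return (baselineMs - optimizedMs) / baselineMs * 100.0;
    }

    public static void main(String[] args) {
        // Timings in milliseconds, taken from the first JDK 11 and JDK 17 rows in the data above.
        double[][] runs = {
            {1_454, 2_253}, {95_338, 9_083}, {30_935, 19_952},
            {1_500, 2_346}, {87_767, 8_952}, {30_993, 19_811},
        };
        for (double[] run : runs) {
            System.out.printf("%.0f ms -> %.0f ms: %.2f%%%n", run[0], run[1], speedupPercent(run[0], run[1]));
        }
    }
}
```

Under this assumption the computed values agree with the reported ones to within about 0.1 percentage point; small differences may come from the reported timings being rounded to whole milliseconds.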

NOTE: It's likely that groupBy() can also benefit from these changes.

@lbooker42 lbooker42 added this to the Sept 2022 milestone Sep 14, 2022
@lbooker42 lbooker42 changed the title Update PartitionTable to use RowSetBuilder instead of RowSet.insert() Update 'PartitionTable' to use RowSetBuilder instead of RowSet.insert() Sep 14, 2022
@lbooker42 lbooker42 changed the title Update 'PartitionTable' to use RowSetBuilder instead of RowSet.insert() Update PartitionTable to use RowSetBuilder instead of RowSet.insert() Sep 14, 2022

@rcaudy rcaudy left a comment

The changes are generally sound. We can be a little cleaner with allocation.

@github-actions github-actions Bot locked and limited conversation to collaborators Sep 19, 2022
@lbooker42 lbooker42 deleted the lab-partitionby-opt branch June 26, 2024 20:00