Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSQ: Change default clusterStatisticsMergeMode to SEQUENTIAL. #14310

Merged
merged 8 commits into from
Jun 26, 2023

Conversation

gianm
Copy link
Contributor

@gianm gianm commented May 18, 2023

This is an undocumented parameter that controls how cluster-by statistics are merged. In PARALLEL mode, statistics are gathered from workers all at once. In SEQUENTIAL mode, statistics are gathered time chunk by time chunk. This improves accuracy for jobs with many time chunks, and reduces memory usage.

The main downside of SEQUENTIAL is that it can take longer, but in most situations I've seen, PARALLEL is only really usable in cases where the sketches are small enough that SEQUENTIAL would also run relatively quickly. So it seems like SEQUENTIAL is a better default.

This is an undocumented parameter that controls how cluster-by statistics
are merged. In PARALLEL mode, statistics are gathered from workers all
at once. In SEQUENTIAL mode, statistics are gathered time chunk by time
chunk. This improves accuracy for jobs with many time chunks, and reduces
memory usage.

The main downside of SEQUENTIAL is that it can take longer, but in most
situations I've seen, PARALLEL is only really usable in cases where the
sketches are small enough that SEQUENTIAL would also run relatively
quickly. So it seems like SEQUENTIAL is a better default.
@gianm gianm added the Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 label May 18, 2023
@cryptoe
Copy link
Contributor

cryptoe commented May 22, 2023

Segments cuts for sequential would be atleast equal to segment cuts in parallel mode. I have seen cases where the job runs 30% faster when changing modes from parallel to sequential when number of workers was 1000.

@cryptoe
Copy link
Contributor

cryptoe commented Jun 24, 2023

We should also change

  • MSQTestBase#245 to PARALLEL_MERGE_CONTEXT since now the default has changed/

This can be done in another PR as well

@gianm
Copy link
Contributor Author

gianm commented Jun 26, 2023

We should also change

* MSQTestBase#245 to PARALLEL_MERGE_CONTEXT since now the default has changed/

This can be done in another PR as well

Ah, I'll do it in this patch, since it needs updates anyway due to one of the test cases failing. Currently, the test case MSQFaultsTest.testInsertCannotBeEmptyFault is timing out, I suppose because the sequential fetching logic doesn't handle the case where no workers have any time chunks. @cryptoe any idea why that might be happening? If not, I'll take a deeper look into it soon.

@cryptoe
Copy link
Contributor

cryptoe commented Jun 26, 2023

@gianm Quite possible we missed this case since we throw InsertCannotByEmptyFault only after getting the partition boundaries.

    if (isTimeBucketed && partitionBoundaries.equals(ClusterByPartitions.oneUniversalPartition())) {
            throw new MSQException(new InsertCannotBeEmptyFault(task.getDataSource()));
          } else {
            log.info("Query [%s] generating %d segments.", queryDef.getQueryId(), partitionBoundaries.size());
          }


The fix would be here WorkerSketcherFetcher#235 . Need to check if CompleteKeyStatisticsInformation is empty.

@gianm
Copy link
Contributor Author

gianm commented Jun 26, 2023

The fix would be here WorkerSketcherFetcher#235 . Need to check if CompleteKeyStatisticsInformation is empty.

TY, I added a block there that registers ClusterByStatisticsSnapshot.empty() for all workers if completeKeyStatisticsInformation.getTimeSegmentVsWorkerMap().isEmpty(). Please let me know if this looks good.

Copy link
Contributor

@cryptoe cryptoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes @gianm

@gianm gianm merged commit 8211379 into apache:master Jun 26, 2023
45 checks passed
@gianm gianm deleted the msq-cmm-default branch June 26, 2023 17:54
@abhishekagarwal87 abhishekagarwal87 added this to the 27.0 milestone Jul 19, 2023
sergioferragut pushed a commit to sergioferragut/druid that referenced this pull request Jul 21, 2023
…#14310)

* MSQ: Change default clusterStatisticsMergeMode to SEQUENTIAL.

This is an undocumented parameter that controls how cluster-by statistics
are merged. In PARALLEL mode, statistics are gathered from workers all
at once. In SEQUENTIAL mode, statistics are gathered time chunk by time
chunk. This improves accuracy for jobs with many time chunks, and reduces
memory usage.

The main downside of SEQUENTIAL is that it can take longer, but in most
situations I've seen, PARALLEL is only really usable in cases where the
sketches are small enough that SEQUENTIAL would also run relatively
quickly. So it seems like SEQUENTIAL is a better default.

* Switch off-test from SEQUENTIAL to PARALLEL.

* Fix sequential merge for situations where there are no time chunks at all.

* Add a couple more tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants