Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PHOENIX-6751 Force using range scan vs skip scan when using large IN clause #1495

Merged
merged 1 commit into from Sep 1, 2022

Conversation

jpisaac
Copy link
Contributor

@jpisaac jpisaac commented Aug 29, 2022

No description provided.

@jpisaac
Copy link
Contributor Author

jpisaac commented Aug 29, 2022

@tkhurana Please review

1 similar comment
@jpisaac
Copy link
Contributor Author

jpisaac commented Aug 29, 2022

@tkhurana Please review

// is below the configured max (maxInListSkipScanSize).
// We shall force a range scan if the configured max is exceeded.
// cnfStartPos => is the start slot of this IN list
if (checkMaxSkipScanCardinality) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this conversion of skip scan to range scan configurable because range scan on bigger data sets is slow?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's already a config param to control how many elements triggers the conversion, which could be used to turn it off by setting it very high. However, when we get in this state (high cardinality skip scans) we find that we get OOM exceptions even with large client-side heaps, which is worse than a slow query.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chrajeshbabu - do you still have concerns about this patch or is it ready to be merged?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chrajeshbabu For some queries especially with RVC expression and mixed sort orders, the cost of optimization results in huge memory allocations and sometimes even exceeds the number of KEY_RANGES allowed This JIRA provides a framework for opting out the optimization path when a certain threshold is reached. We will be working towards an algorithm that is more linear in nature than combinatorial as is the case today.

@gjacoby126
Copy link
Contributor

@jpisaac @tkhurana - could you please explain what's different between this version of 6751 and the one that was previously committed and then reverted?

@tkhurana
Copy link
Contributor

@jpisaac @tkhurana - could you please explain what's different between this version of 6751 and the one that was previously committed and then reverted?

@gjacoby126 The difference is the use of biginteger to avoid overflow issues when determining whether to use skip scan or range scan.
https://github.com/apache/phoenix/pull/1495/files#diff-9494157265dca11f05041f6a7238b64857cc22e08c3c23e92412e1a0da1d12feR335-R343

Copy link
Contributor

@gjacoby126 gjacoby126 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@gjacoby126 gjacoby126 merged commit c607518 into apache:master Sep 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants