PHOENIX-6751 Force using range scan vs skip scan when using large IN clause #1495

jpisaac · 2022-08-29T19:51:51Z

No description provided.

jpisaac · 2022-08-29T19:54:03Z

@tkhurana Please review

chrajeshbabu · 2022-08-29T23:22:49Z

phoenix-core/src/main/java/org/apache/phoenix/compile/WhereOptimizer.java

+            // is below the configured max (maxInListSkipScanSize).
+            // We shall force a range scan if the configured max is exceeded.
+            // cnfStartPos => is the start slot of this IN list
+            if (checkMaxSkipScanCardinality) {


Can this conversion of skip scan to range scan be made configurable, since range scans on bigger data sets are slow?

There's already a config param that controls how many elements trigger the conversion, and setting it very high effectively turns the conversion off. However, once we get into this state (high-cardinality skip scans), we see OOM exceptions even with large client-side heaps, which is worse than a slow query.
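A minimal sketch of the threshold check described above, assuming the estimated skip-scan cardinality is already available as a `BigInteger`. `SkipScanThreshold` and `shouldForceRangeScan` are hypothetical names for illustration, not the actual code in WhereOptimizer.java; only the idea of a `maxInListSkipScanSize` limit comes from the diff. It shows why a very high limit effectively disables the fallback.

```java
import java.math.BigInteger;

/**
 * Hypothetical helper illustrating the skip scan vs range scan threshold.
 * Not Phoenix's API; names are illustrative only.
 */
public final class SkipScanThreshold {

    private SkipScanThreshold() {}

    /**
     * Returns true when the estimated number of skip-scan key ranges exceeds
     * the configured maximum, i.e. when a range scan should be forced instead.
     *
     * @param estimatedKeyRanges    estimated skip-scan cardinality (may be huge)
     * @param maxInListSkipScanSize configured limit; a very large value
     *                              effectively disables the fallback
     */
    public static boolean shouldForceRangeScan(BigInteger estimatedKeyRanges,
                                               long maxInListSkipScanSize) {
        return estimatedKeyRanges.compareTo(BigInteger.valueOf(maxInListSkipScanSize)) > 0;
    }

    public static void main(String[] args) {
        BigInteger cardinality = BigInteger.valueOf(50_000);
        // With a modest limit, the large IN list falls back to a range scan.
        System.out.println(shouldForceRangeScan(cardinality, 10_000));         // true
        // Setting the limit very high keeps the skip scan (fallback effectively off).
        System.out.println(shouldForceRangeScan(cardinality, Long.MAX_VALUE)); // false
    }
}
```

Comparing with `compareTo`, rather than narrowing the estimate to a `long`, keeps the check correct even when the cardinality exceeds `Long.MAX_VALUE`.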

@chrajeshbabu - do you still have concerns about this patch or is it ready to be merged?

@chrajeshbabu For some queries, especially those with RVC expressions and mixed sort orders, the optimization results in huge memory allocations and sometimes even exceeds the number of KEY_RANGES allowed. This JIRA provides a framework for opting out of the optimization path when a certain threshold is reached. We will be working towards an algorithm that is closer to linear in nature, rather than combinatorial as it is today.
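To illustrate why a combinatorial approach blows up memory, here is a hypothetical sketch (not Phoenix's algorithm) that expands one IN list per primary-key column into explicit composite keys, roughly what has to happen when per-column values must be combined into full keys. The number of keys is the product of the list sizes, so every additional IN list multiplies the allocation. `InListExpansion`, `expand`, and `values` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative only: expands several IN lists (one per primary-key slot) into
 * the full set of composite point keys. The result size is the product of the
 * list sizes, so memory grows combinatorially.
 */
public final class InListExpansion {

    private InListExpansion() {}

    /** Cartesian product of the per-slot value lists. */
    public static List<List<String>> expand(List<List<String>> slots) {
        List<List<String>> keys = new ArrayList<>();
        keys.add(new ArrayList<>()); // start with a single empty key prefix
        for (List<String> slotValues : slots) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> prefix : keys) {
                for (String value : slotValues) {
                    // Each existing prefix is copied once per value in this slot.
                    List<String> key = new ArrayList<>(prefix);
                    key.add(value);
                    next.add(key);
                }
            }
            keys = next;
        }
        return keys;
    }

    /** Builds a slot with n distinct placeholder values. */
    public static List<String> values(String prefix, int n) {
        List<String> out = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            out.add(prefix + i);
        }
        return out;
    }

    public static void main(String[] args) {
        // Three IN lists of 10, 20 and 30 values on consecutive PK columns.
        List<List<String>> slots = List.of(values("a", 10), values("b", 20), values("c", 30));
        System.out.println(expand(slots).size()); // 6000 = 10 * 20 * 30
    }
}
```

A linear approach would keep the per-slot lists separate (10 + 20 + 30 values here) instead of materializing all 6,000 combinations.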

gjacoby126 · 2022-08-30T01:52:47Z

@jpisaac @tkhurana - could you please explain what's different between this version of 6751 and the one that was previously committed and then reverted?

tkhurana · 2022-08-30T16:06:03Z

@gjacoby126 The difference is the use of BigInteger to avoid overflow issues when determining whether to use a skip scan or a range scan.
https://github.com/apache/phoenix/pull/1495/files#diff-9494157265dca11f05041f6a7238b64857cc22e08c3c23e92412e1a0da1d12feR335-R343
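A small illustration of the kind of overflow this version guards against, assuming (for illustration only; see the linked diff for the actual computation) that the cardinality is estimated as the product of per-slot IN-list sizes. With `long` arithmetic, four slots of 65,536 values multiply to 2^64, which wraps to 0, so a `long`-based check would wrongly keep the skip scan; `BigInteger` keeps the exact value. `CardinalityOverflow` and the limit of 50,000 are hypothetical.

```java
import java.math.BigInteger;

/**
 * Illustrative only: multiplying per-slot cardinalities in a long can
 * silently overflow, while BigInteger keeps the exact value.
 */
public final class CardinalityOverflow {

    private CardinalityOverflow() {}

    /** Product of slot sizes using long arithmetic (wraps on overflow). */
    public static long productAsLong(int[] slotSizes) {
        long product = 1;
        for (int size : slotSizes) {
            product *= size;
        }
        return product;
    }

    /** Product of slot sizes using BigInteger (never overflows). */
    public static BigInteger productAsBigInteger(int[] slotSizes) {
        BigInteger product = BigInteger.ONE;
        for (int size : slotSizes) {
            product = product.multiply(BigInteger.valueOf(size));
        }
        return product;
    }

    public static void main(String[] args) {
        int[] slotSizes = {65_536, 65_536, 65_536, 65_536}; // 2^64 combinations
        long limit = 50_000; // hypothetical maxInListSkipScanSize
        long wrapped = productAsLong(slotSizes);
        BigInteger exact = productAsBigInteger(slotSizes);
        System.out.println(wrapped);                                        // 0
        System.out.println(exact);                                          // 18446744073709551616
        System.out.println(wrapped > limit);                                // false (wrong decision)
        System.out.println(exact.compareTo(BigInteger.valueOf(limit)) > 0); // true
    }
}
```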

gjacoby126

+1

PHOENIX-6751 Force using range scan vs skip scan when using large IN clause

f56cdc2

jpisaac requested a review from gjacoby126 August 29, 2022 19:53

tkhurana approved these changes Aug 29, 2022

chrajeshbabu reviewed Aug 29, 2022

gjacoby126 approved these changes Aug 31, 2022

gjacoby126 merged commit c607518 into apache:master Sep 1, 2022
