PHOENIX-6960 Scan range is incorrect when query desc columns #1663

virajjasani · 2023-08-28T16:06:16Z

virajjasani · 2023-08-29T02:35:12Z

build results: https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-PreCommit-GitHub-PR/job/PR-1663/4/testReport/

stoty · 2023-08-31T13:45:39Z

phoenix-core/src/main/java/org/apache/phoenix/compile/WhereOptimizer.java

+            boolean lowerInclusive = column.getSortOrder() == SortOrder.ASC;
+            boolean upperInclusive = column.getSortOrder() == SortOrder.DESC;
+            KeyRange range = type.getKeyRange(lowerRange, lowerInclusive, upperRange,
+                    upperInclusive, SortOrder.ASC);


I have debugged this: after inverting twice, we end up returning a non-inverted range, which eventually gets inverted in WhereOptimizer.pushKeyExpressionsToScan().

Do we even need to take SortOrder into account here, and if yes, couldn't we simplify this logic?

I'm talking about the whole method, not the specific line above.

stoty · 2023-08-31T13:57:28Z

phoenix-core/src/main/java/org/apache/phoenix/compile/WhereOptimizer.java

+            // TODO: is there a case where we'd need to go through the childPart to calculate the key range?
+            PColumn column = childSlot.getKeyPart().getColumn();
+            PDataType type = column.getDataType();
+            byte[] key = PVarchar.INSTANCE.toBytes(startsWith, column.getSortOrder());


This is where we get an inverted range
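
To illustrate what "inverted" means for the key bytes, here is a minimal sketch. It assumes a DESC value is stored with every byte flipped; `DescOrderSketch`, `invert` and `utf8` are hypothetical names for illustration, not Phoenix APIs.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class DescOrderSketch {
    // Hypothetical stand-in for a DESC encoding: flip every bit of every byte.
    static byte[] invert(byte[] key) {
        byte[] out = new byte[key.length];
        for (int i = 0; i < key.length; i++) {
            out[i] = (byte) (key[i] ^ 0xFF);
        }
        return out;
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] abc = utf8("abc");
        byte[] abd = utf8("abd");
        // Row keys compare as unsigned bytes, left to right.
        // Prints -1: "abc" sorts before "abd".
        System.out.println(Integer.signum(Arrays.compareUnsigned(abc, abd)));
        // Prints 1: after flipping, the order of the two keys is reversed.
        System.out.println(Integer.signum(
                Arrays.compareUnsigned(invert(abc), invert(abd))));
    }
}
```

Note that plain byte flipping only reverses the order of values that differ at some byte position. When one value is a prefix of another ("abc" and "abcd"), the shorter one still sorts first after flipping, which is one reason prefix ranges on DESC columns need care.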

stoty · 2023-08-31T13:58:18Z

phoenix-core/src/main/java/org/apache/phoenix/compile/WhereOptimizer.java

+            boolean lowerInclusive = column.getSortOrder() == SortOrder.ASC;
+            boolean upperInclusive = column.getSortOrder() == SortOrder.DESC;
+            KeyRange range = type.getKeyRange(lowerRange, lowerInclusive, upperRange,
+                    upperInclusive, SortOrder.ASC);
            if (column.getSortOrder() == SortOrder.DESC) {
                range = range.invert();


And this is where we re-invert, getting a normal range back.
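
The round trip described in these two comments can be sketched with a simplified model. The `Range` class below is hypothetical and is not Phoenix's KeyRange; it assumes that inverting a range flips the bytes of both bounds and swaps them, carrying the inclusive flags along.

```java
import java.util.Arrays;

final class RangeInvertSketch {
    // Hypothetical, simplified key range; not Phoenix's KeyRange.
    static final class Range {
        final byte[] lower;
        final boolean lowerInclusive;
        final byte[] upper;
        final boolean upperInclusive;

        Range(byte[] lower, boolean lowerInclusive, byte[] upper, boolean upperInclusive) {
            this.lower = lower;
            this.lowerInclusive = lowerInclusive;
            this.upper = upper;
            this.upperInclusive = upperInclusive;
        }

        // Flip the bytes of each bound and swap lower with upper.
        Range invert() {
            return new Range(flip(upper), upperInclusive, flip(lower), lowerInclusive);
        }

        boolean sameAs(Range other) {
            return Arrays.equals(lower, other.lower)
                    && Arrays.equals(upper, other.upper)
                    && lowerInclusive == other.lowerInclusive
                    && upperInclusive == other.upperInclusive;
        }
    }

    static byte[] flip(byte[] key) {
        byte[] out = new byte[key.length];
        for (int i = 0; i < key.length; i++) {
            out[i] = (byte) (key[i] ^ 0xFF);
        }
        return out;
    }

    public static void main(String[] args) {
        Range original = new Range(new byte[] {0x61}, true, new byte[] {0x62}, false);
        Range twice = original.invert().invert();
        // Prints true: two inversions cancel out in this model.
        System.out.println(twice.sameAs(original));
    }
}
```

In this model the second inversion simply undoes the first, which is consistent with the observation that the method ends up returning a non-inverted range.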

virajjasani · 2023-09-04T00:54:33Z

Thank you for taking a look @stoty, and yes, you are correct that we have scope for optimization here. I just addressed your review in the latest revision. Also, @jinggou found some interesting test case failures, and I tried to address them. Thank you for running additional tests, Jing.

Created 5.1 backport PR: #1668

@stoty @jinggou could you please take a look again? (Not urgent; I will likely only be able to come online after 5 days.)

stoty

+1 LGTM

tkhurana · 2023-09-05T17:26:18Z

@virajjasani can you also add some unit tests in WhereOptimizerTest? Those tests verify that the scan range is generated correctly. I want to make sure that we have some tests for descending columns with LIKE expressions.

jinggou · 2023-09-08T18:49:35Z

LGTM. The test case that failed before passes now (e.g., descending columns with LIKE x%).
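
For readers unfamiliar with how a LIKE 'x%' predicate becomes a scan range, here is a minimal sketch over raw key bytes. It assumes a DESC column stores each value with its bytes flipped and ignores any further encoding details (such as separators); `PrefixRangeSketch` and its helpers are hypothetical and are not Phoenix code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class PrefixRangeSketch {
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical DESC encoding: flip every byte.
    static byte[] flip(byte[] key) {
        byte[] out = new byte[key.length];
        for (int i = 0; i < key.length; i++) {
            out[i] = (byte) (key[i] ^ 0xFF);
        }
        return out;
    }

    // Smallest key greater than every key that starts with prefix,
    // or null when no such key exists (empty prefix or all bytes 0xFF).
    static byte[] nextKey(byte[] prefix) {
        int i = prefix.length - 1;
        while (i >= 0 && prefix[i] == (byte) 0xFF) {
            i--;
        }
        if (i < 0) {
            return null;
        }
        byte[] out = Arrays.copyOf(prefix, i + 1);
        out[i]++;
        return out;
    }

    // True when storedKey lies in [lower, nextKey(lower)), i.e. starts with lower.
    static boolean inPrefixRange(byte[] storedKey, byte[] lower) {
        byte[] upper = nextKey(lower);
        return Arrays.compareUnsigned(storedKey, lower) >= 0
                && (upper == null || Arrays.compareUnsigned(storedKey, upper) < 0);
    }

    public static void main(String[] args) {
        byte[] ascPrefix = utf8("abc");
        byte[] descPrefix = flip(ascPrefix);
        // ASC column: stored bytes are the value itself.
        System.out.println(inPrefixRange(utf8("abcd"), ascPrefix));
        // DESC column: both the stored value and the prefix are flipped.
        System.out.println(inPrefixRange(flip(utf8("abcd")), descPrefix));
        System.out.println(inPrefixRange(flip(utf8("abd")), descPrefix));
    }
}
```

The point of the sketch is that, on the stored bytes, a DESC prefix match is still a prefix range, only over the flipped prefix. How Phoenix represents that range internally before pushing it to the scan is not shown here.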

virajjasani · 2023-09-12T17:17:59Z

Thank you for the reviews @stoty @tkhurana @jinggou

Tanuj, sure, let me add some tests in WhereOptimizerTest. Sounds good.

virajjasani · 2023-09-12T18:42:21Z

done

dbwong · 2023-09-13T08:45:03Z

phoenix-core/src/main/java/org/apache/phoenix/compile/WhereOptimizer.java

+            // TODO: is there a case where we'd need to go through the childPart to calculate the key range?
+            PColumn column = childSlot.getKeyPart().getColumn();
+            PDataType type = column.getDataType();
+            byte[] key = PVarchar.INSTANCE.toBytes(startsWith, SortOrder.ASC);


This might work in most cases, but I'd be a little worried about the admittedly weird case where the LIKE pattern contains a DESC column reference. Maybe add a test here and see? It likely won't extract any keys, but it's worth a look. @virajjasani @stoty?
SELECT * FROM table WHERE col1 LIKE ('abc' || col2)

Interesting! Do we support the col1 LIKE ('xy%' || col2) case?
I tried this and saw that we return null KeySlots right here because childParts is empty:

    @Override
    public KeySlots visitLeave(LikeExpression node, List<KeySlots> childParts) {
        // TODO: optimize ILIKE by creating two ranges for the literal prefix: one with lower case, one with upper case
        if (childParts.isEmpty()) {
            return null;
        }
        ...
        ...

Even col1 LIKE col2 doesn't seem to work, as childParts comes back empty, resulting in null KeySlots.

Makes sense, we cannot use values from the table for constructing scan filters on the rowkey (unless they are coming from uncorrelated subqueries, but those are effectively constants)

Thanks for the test, @virajjasani. I do think that, in theory, 'xy%' || col2 would be extractable, since the prefix here is a constant, though I was guessing we don't try to optimize this currently.
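
The "constant prefix" idea can be sketched as follows. This is an illustration only, not something Phoenix is claimed to do: `LikePrefixSketch.literalPrefix` is a hypothetical helper, and treating backslash as the escape character is an assumption.

```java
final class LikePrefixSketch {
    // Returns the literal text before the first unescaped wildcard (% or _).
    // Assumption: a backslash escapes the character that follows it.
    static String literalPrefix(String pattern) {
        StringBuilder prefix = new StringBuilder();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '\\' && i + 1 < pattern.length()) {
                // Escaped character: take it literally and skip the backslash.
                prefix.append(pattern.charAt(++i));
            } else if (c == '%' || c == '_') {
                break;
            } else {
                prefix.append(c);
            }
        }
        return prefix.toString();
    }

    public static void main(String[] args) {
        // Only the constant part of the pattern is inspected;
        // whatever col2 contributes comes after it.
        System.out.println(literalPrefix("xy%"));
        System.out.println(literalPrefix("abc"));
    }
}
```

Because the constant part comes first in the concatenation, every matching value must start with its literal prefix regardless of what col2 holds, so that prefix could in principle bound a scan.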

Yeah, that seems to be the case. This also got me to look into what Postgres has, i.e. whether they allow a column reference; it seems they also support a constant or an expression (and a somewhat wider variety of regular expressions): https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-LIKE

Thanks for the review @dbwong

PHOENIX-6960 Scan range is incorrect when query desc columns

59c8b2f

virajjasani marked this pull request as draft August 28, 2023 20:22

addendum

fd4069d

virajjasani marked this pull request as ready for review August 29, 2023 02:34

virajjasani requested review from stoty, tkhurana and kadirozde August 29, 2023 02:34

virajjasani changed the title ~~PHOENIX-6960 Scan range is incorrect when query desc columns~~ PHOENIX-6960 Scan range is incorrect when query desc columns (WIP) Aug 31, 2023

virajjasani marked this pull request as draft August 31, 2023 07:23

stoty reviewed Aug 31, 2023

virajjasani added 2 commits September 3, 2023 11:52

Merge branch 'master' of github.com:apache/phoenix into PHOENIX-6960-…

ab62738

…master

addendum - addressing review and adding more tests

4cc66c4

virajjasani changed the title ~~PHOENIX-6960 Scan range is incorrect when query desc columns (WIP)~~ PHOENIX-6960 Scan range is incorrect when query desc columns Sep 3, 2023

virajjasani marked this pull request as ready for review September 4, 2023 00:42

virajjasani mentioned this pull request Sep 4, 2023

PHOENIX-6960 Scan range is incorrect when query desc columns #1668

Merged

stoty approved these changes Sep 5, 2023

addendum

9d3594f

tkhurana approved these changes Sep 12, 2023

dbwong reviewed Sep 13, 2023

dbwong approved these changes Sep 14, 2023

virajjasani merged commit 7f6cc3f into apache:master Sep 14, 2023
1 check failed
