PHOENIX-6960 Scan range is incorrect when query desc columns #1663

Merged
merged 5 commits into from
Sep 14, 2023

virajjasani
Contributor

@virajjasani virajjasani commented Aug 28, 2023

Jira: PHOENIX-6960

@virajjasani virajjasani marked this pull request as draft August 28, 2023 20:22
@virajjasani virajjasani marked this pull request as ready for review August 29, 2023 02:34
@virajjasani
Contributor Author

@virajjasani virajjasani changed the title from "PHOENIX-6960 Scan range is incorrect when query desc columns" to "PHOENIX-6960 Scan range is incorrect when query desc columns (WIP)" Aug 31, 2023
@virajjasani virajjasani marked this pull request as draft August 31, 2023 07:23
boolean lowerInclusive = column.getSortOrder() == SortOrder.ASC;
boolean upperInclusive = column.getSortOrder() == SortOrder.DESC;
KeyRange range = type.getKeyRange(lowerRange, lowerInclusive, upperRange,
upperInclusive, SortOrder.ASC);
Contributor

I have debugged into this, and after inverting twice, we end up returning a non-inverted range, which eventually gets inverted in WhereOptimizer.pushKeyExpressionsToScan().

Do we even need to take SortOrder into account here, and if yes, couldn't we simplify this logic ?

Contributor

I'm talking about the whole method, not the specific line above.

// TODO: is there a case where we'd need to go through the childPart to calculate the key range?
PColumn column = childSlot.getKeyPart().getColumn();
PDataType type = column.getDataType();
byte[] key = PVarchar.INSTANCE.toBytes(startsWith, column.getSortOrder());
Contributor

This is where we get an inverted range.

boolean lowerInclusive = column.getSortOrder() == SortOrder.ASC;
boolean upperInclusive = column.getSortOrder() == SortOrder.DESC;
KeyRange range = type.getKeyRange(lowerRange, lowerInclusive, upperRange,
upperInclusive, SortOrder.ASC);
if (column.getSortOrder() == SortOrder.DESC) {
range = range.invert();
Contributor

And this is where we re-invert, getting a normal range back.
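
The double inversion described in this thread can be illustrated with a small, self-contained sketch. It assumes a simplified model in which a DESC value is stored with every byte bit-flipped and inverting a range flips both bounds and swaps them. `RangeSketch`, `invertBytes`, and `sameAs` are hypothetical names for illustration only; this is not Phoenix's `KeyRange` class and it ignores separators, padding, and unbound ranges.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Simplified stand-in for a key range (an assumption, not Phoenix's KeyRange). */
final class RangeSketch {
    final byte[] lower;
    final boolean lowerInclusive;
    final byte[] upper;
    final boolean upperInclusive;

    RangeSketch(byte[] lower, boolean lowerInclusive, byte[] upper, boolean upperInclusive) {
        this.lower = lower;
        this.lowerInclusive = lowerInclusive;
        this.upper = upper;
        this.upperInclusive = upperInclusive;
    }

    /** Flip every bit of every byte; for equal-length keys this reverses unsigned byte order. */
    static byte[] invertBytes(byte[] in) {
        byte[] out = new byte[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = (byte) (in[i] ^ 0xFF);
        }
        return out;
    }

    /** Invert both bounds and swap them, carrying each bound's inclusiveness with it. */
    RangeSketch invert() {
        return new RangeSketch(invertBytes(upper), upperInclusive,
                               invertBytes(lower), lowerInclusive);
    }

    /** Structural equality on bounds and inclusiveness flags. */
    boolean sameAs(RangeSketch other) {
        return Arrays.equals(lower, other.lower)
                && Arrays.equals(upper, other.upper)
                && lowerInclusive == other.lowerInclusive
                && upperInclusive == other.upperInclusive;
    }

    public static void main(String[] args) {
        RangeSketch range = new RangeSketch(
                "ab".getBytes(StandardCharsets.US_ASCII), true,
                "ac".getBytes(StandardCharsets.US_ASCII), false);
        // Inverting twice yields the original, non-inverted range.
        System.out.println(range.invert().invert().sameAs(range)); // prints "true"
    }
}
```

In this model, two inversions cancel out exactly, which matches the observation that the method ends up returning a non-inverted range.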

@virajjasani virajjasani changed the title from "PHOENIX-6960 Scan range is incorrect when query desc columns (WIP)" to "PHOENIX-6960 Scan range is incorrect when query desc columns" Sep 3, 2023
@virajjasani virajjasani marked this pull request as ready for review September 4, 2023 00:42
@virajjasani
Contributor Author

virajjasani commented Sep 4, 2023

Thank you for taking a look @stoty, and yes, you are correct that we have scope for optimization here; I just addressed your review in the latest revision. Also, @jinggou found some interesting test case failures, which I tried to address. Thank you for running additional tests, Jing.

Created 5.1 backport PR: #1668

@stoty @jinggou could you please take a look again? (Not urgent; I will likely be back online in about 5 days.)

Contributor

@stoty stoty left a comment

+1 LGTM

@tkhurana
Contributor

tkhurana commented Sep 5, 2023

@virajjasani can you also add some unit tests in WhereOptimizerTest? Those tests verify that the scan range is generated correctly. I want to make sure that we have some tests for descending columns with LIKE expressions.

@jinggou
Contributor

jinggou commented Sep 8, 2023

LGTM. The test case that failed before passes now (e.g., descending columns with LIKE x%).
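
For readers unfamiliar with how a `LIKE 'x%'` prefix maps to a scan range, here is a minimal sketch under a simplified model. It assumes a DESC column stores each byte bit-flipped, so rows whose value starts with a prefix are stored with the inverted prefix. `LikePrefixRangeSketch`, `invert`, `nextKey`, and `inRange` are hypothetical helpers written for this illustration; they are not Phoenix APIs and do not necessarily reflect how WhereOptimizer computes the range.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class LikePrefixRangeSketch {
    /** Flip every bit of every byte (simplified model of DESC storage). */
    static byte[] invert(byte[] in) {
        byte[] out = new byte[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = (byte) (in[i] ^ 0xFF);
        }
        return out;
    }

    /**
     * Smallest key that sorts after every key starting with the given prefix,
     * or null if the prefix consists only of 0xFF bytes (no finite upper bound).
     */
    static byte[] nextKey(byte[] prefix) {
        byte[] out = Arrays.copyOf(prefix, prefix.length);
        for (int i = out.length - 1; i >= 0; i--) {
            if (out[i] != (byte) 0xFF) {
                out[i]++;
                return Arrays.copyOf(out, i + 1); // drop trailing 0xFF bytes
            }
        }
        return null;
    }

    /** True if lower <= key < upper under unsigned lexicographic comparison. */
    static boolean inRange(byte[] key, byte[] lower, byte[] upper) {
        return Arrays.compareUnsigned(lower, key) <= 0
                && Arrays.compareUnsigned(key, upper) < 0;
    }

    public static void main(String[] args) {
        byte[] prefix = "abc".getBytes(StandardCharsets.US_ASCII);
        byte[] row = "abcd".getBytes(StandardCharsets.US_ASCII);
        byte[] other = "abd".getBytes(StandardCharsets.US_ASCII);

        // ASC column: half-open range [prefix, nextKey(prefix)).
        System.out.println(inRange(row, prefix, nextKey(prefix))); // true

        // DESC column: the same construction applied to the inverted prefix,
        // checked against the inverted (stored) row bytes.
        byte[] descLower = invert(prefix);
        byte[] descUpper = nextKey(descLower);
        System.out.println(inRange(invert(row), descLower, descUpper));   // true
        System.out.println(inRange(invert(other), descLower, descUpper)); // false
    }
}
```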

@virajjasani
Contributor Author

Thank you for the reviews @stoty @tkhurana @jinggou

Tanuj, sure, let me add some tests in WhereOptimizerTest, sounds good.

@virajjasani
Contributor Author

done

// TODO: is there a case where we'd need to go through the childPart to calculate the key range?
PColumn column = childSlot.getKeyPart().getColumn();
PDataType type = column.getDataType();
byte[] key = PVarchar.INSTANCE.toBytes(startsWith, SortOrder.ASC);
Contributor

@dbwong dbwong Sep 13, 2023

This might work in most cases, but I'd be a little worried about the admittedly weird case where the LIKE pattern contains a DESC column reference. Maybe add a test here and see? Likely it won't extract any keys, but it's worth a look. @virajjasani @stoty?
SELECT * FROM table WHERE col1 LIKE ('abc' || col2)

Contributor Author

@virajjasani virajjasani Sep 13, 2023

Interesting! Do we support the col1 LIKE ('xy%' || col2) case?
I tried this and saw that we return null KeySlots right here because childParts is empty:

        @Override
        public KeySlots visitLeave(LikeExpression node, List<KeySlots> childParts) {
            // TODO: optimize ILIKE by creating two ranges for the literal prefix: one with lower case, one with upper case
            if (childParts.isEmpty()) {
                return null;
            }
...
...

Contributor Author

Even col1 LIKE col2 doesn't seem to work, since childParts comes back empty, resulting in null KeySlots.

Contributor

Makes sense, we cannot use values from the table for constructing scan filters on the rowkey (unless they come from uncorrelated subqueries, but those are effectively constants).

Contributor

Thanks for the test, @virajjasani. I do think that in theory 'xy%' || col2 would be extractable, as the prefix here is a constant, though I was guessing we don't try to optimize this currently.
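
To make the "constant prefix" idea concrete, here is a small sketch of pulling the literal prefix out of a LIKE pattern, stopping at the first wildcard. `LikePrefixSketch` and `literalPrefix` are hypothetical names for illustration, and treating backslash as the escape character is an assumption; this is not the code Phoenix uses, and it only handles a pattern that is already a single constant string.

```java
final class LikePrefixSketch {
    /**
     * Returns the literal text before the first unescaped '%' or '_'.
     * Assumes backslash escapes the following character (an assumption for this sketch).
     */
    static String literalPrefix(String pattern) {
        StringBuilder prefix = new StringBuilder();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '\\' && i + 1 < pattern.length()) {
                prefix.append(pattern.charAt(++i)); // escaped char is taken literally
            } else if (c == '%' || c == '_') {
                break; // first wildcard ends the usable prefix
            } else {
                prefix.append(c);
            }
        }
        return prefix.toString();
    }

    public static void main(String[] args) {
        System.out.println(literalPrefix("xy%"));     // xy
        System.out.println(literalPrefix("%abc"));    // (empty line: no usable prefix)
        System.out.println(literalPrefix("a\\%b%"));  // a%b
    }
}
```

An empty prefix means there is nothing to narrow the scan with, which is one reason a pattern that starts with a wildcard or a column reference yields no key range.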

Contributor Author

Yeah, that seems to be the case. This also got me to look into what Postgres does and whether it allows a column reference; it seems it also supports a constant or an expression (a somewhat wider variety of regular expressions): https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-LIKE

Contributor Author

Thanks for the review @dbwong

@virajjasani virajjasani merged commit 7f6cc3f into apache:master Sep 14, 2023
1 check failed
5 participants