-
Notifications
You must be signed in to change notification settings - Fork 1k
PHOENIX-4845 Support using Row Value Constructors in OFFSET clause fo… #671
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@ChinmaySKulkarni @kadirozde @yanxinyi Initial Review Please |
phoenix-core/src/main/java/org/apache/phoenix/optimize/QueryOptimizer.java
Outdated
Show resolved
Hide resolved
phoenix-core/src/it/java/org/apache/phoenix/end2end/RowValueConstructorOffsetIT.java
Outdated
Show resolved
Hide resolved
|
General comment: @dbwong Please open a documentation JIRA for this feature and mark as a dependency of 4845 so we don't commit code without the documentation. Will review the code in detail soon. |
phoenix-core/src/it/java/org/apache/phoenix/end2end/RowValueConstructorOffsetIT.java
Outdated
Show resolved
Hide resolved
|
Made https://issues.apache.org/jira/browse/PHOENIX-5660 as requested @ChinmaySKulkarni |
phoenix-core/src/it/java/org/apache/phoenix/end2end/RowValueConstructorOffsetIT.java
Outdated
Show resolved
Hide resolved
| OffsetParseNodeVisitor visitor = new OffsetParseNodeVisitor(context); | ||
| offsetNode.getOffsetParseNode().accept(visitor); | ||
| return visitor.getOffset(); | ||
| if (offsetNode == null) { return new CompiledOffset(Optional.<Integer>absent(), Optional.<byte[]>absent()); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: more than 100 chars per line.
break into two lines for better readability
ChinmaySKulkarni
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dbwong overall looks great so far. Really nice test coverage! I have posted some comments and questions.
phoenix-core/src/it/java/org/apache/phoenix/end2end/RowValueConstructorOffsetIT.java
Outdated
Show resolved
Hide resolved
phoenix-core/src/it/java/org/apache/phoenix/end2end/RowValueConstructorOffsetIT.java
Outdated
Show resolved
Hide resolved
phoenix-core/src/it/java/org/apache/phoenix/end2end/RowValueConstructorOffsetIT.java
Outdated
Show resolved
Hide resolved
phoenix-core/src/it/java/org/apache/phoenix/end2end/RowValueConstructorOffsetIT.java
Outdated
Show resolved
Hide resolved
phoenix-core/src/it/java/org/apache/phoenix/end2end/RowValueConstructorOffsetIT.java
Outdated
Show resolved
Hide resolved
phoenix-core/src/main/java/org/apache/phoenix/compile/WhereCompiler.java
Outdated
Show resolved
Hide resolved
phoenix-core/src/main/java/org/apache/phoenix/optimize/QueryOptimizer.java
Outdated
Show resolved
Hide resolved
.../src/main/java/org/apache/phoenix/schema/RowValueConstructorOffsetNotCoercibleException.java
Outdated
Show resolved
Hide resolved
phoenix-core/src/test/java/org/apache/phoenix/util/TestUtil.java
Outdated
Show resolved
Hide resolved
phoenix-core/src/it/java/org/apache/phoenix/end2end/RowValueConstructorOffsetIT.java
Outdated
Show resolved
Hide resolved
ChinmaySKulkarni
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dbwong Overall lgtm. I have few nits.
phoenix-core/src/it/java/org/apache/phoenix/end2end/RowValueConstructorOffsetIT.java
Outdated
Show resolved
Hide resolved
| rs = conn.createStatement().executeQuery(sql); | ||
|
|
||
|
|
||
| } catch (Exception e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dbwong is it possible to narrow down this catch to the expected type of exception?
|
|
||
| import com.google.common.base.Optional; | ||
|
|
||
| //pojo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can tell that it's a pojo without the comment being there. I meant adding a comment about what it is going to be used to represent, since that would be more useful for someone reading the code. Either way, a comment is not necessarily even needed here, but that's just my opinion.
phoenix-core/src/main/java/org/apache/phoenix/compile/OffsetCompiler.java
Outdated
Show resolved
Hide resolved
| //other use cases. | ||
| //Note If the table is salted we still mark as row ordered in this code path | ||
| if(offset.getByteOffset().isPresent() && orderByExpressions.isEmpty()){ | ||
| throw new SQLException("Do not allow non ORDER BY with RVC OFFSET"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dbwong ping on this change
phoenix-core/src/main/java/org/apache/phoenix/compile/RVCOffsetCompiler.java
Outdated
Show resolved
Hide resolved
ChinmaySKulkarni
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 lgtm. Thanks @dbwong
|
@dbwong please rebase and upload 4.x and master patches to the JIRA. If the Hadoop QA has no test failures, I will merge your patch shortly. Thanks. |
…r paging in tables where the sort order of PK columns varies
Phoenix-4845 Low Level Design
Feature Availability
Use of RVC Offset requires a fully specified Row Key.
Use of RVC Offset requires row order queries.
RVC Offset can not be used with Uncovered Indexes.
Why can we not support leading edge offsets?
This is possible but does not really support the use case of pagination. In addition, it increases the complexity as Phoenix has to garuntee the cases of index prefixes matching multiple tables and keeping the paginated table consistent.
Why do we only support rowkey orders?
This is possible but greatly increases the end users handling of the data as the result order cannot be used to power the next pagination. The end user would have to figure out which row was the “last” one in the result set. I think this is better handled in a followup Server Side Cursors.
Why do we not support Joins?
In theory a join is a view on two separate tables a single offset does not entirely make sense. It may be implementable with a more expressive syntax, however, that would increase the scope.
Why do we not support aggregations?
Properly handling paginated aggregated queries will require a different approach than simply scanning a subset of rows, the groupby would have to scan all the rows with each unique set of values and a global aggregate does not make sense in a paginated query. Not a major usecase in Salesforce.
Why do we not support subqueries?
For the “virtual“ table that consists of the subselect we may not have a row key. Offseting into this table is non-trivial.
Why do we not support uncovered indexes?
Uncovered indexes by default do a full table scan on the base table. If index hint is used then this is performed as a subquery select which we were not handling. This particular subquery select is in theory implementable and this may work but is not a current case for work in Salesforce.
Approach In Phoenix
CREATE TABLE TABLE1 (A INTEGER NOT NULL, B INTEGER NOT NULL DESC, C INTEGER NOT NULL, D INTEGER NOT NULL, Time INTEGER, CONSTRAINT PK(A,B,C,D) )
SELECT A,B,Value1 FROM TABLE1 WHERE Time > 10000000 Offset (A,B,C,D)=(1,2,3,4)
Two Main Implementation Paths were considered
Query Rewrite Approach
Convert the Above Query To
SELECT A,B,Value1 FROM TABLE1 WHERE Time > 10000000 AND
A > 1 OR (A = 1 AND B < 2 OR ( B = 2 AND C > 3 OR (D > 4))))
Note the specific > or < depending on the sort order of the partial rowkey.
Then run the query through the optimizer normally.
Pushing Offset Rowkey Directly To Scan via Resolution
Construct the following where clause and run it through using the same table references etc.
“WHERE (A,B,C,D)=(1,2,34) ”
This should generate a plan with a Point Lookup, since we pass what should be a Fully Qualified Key and we can extract the rowkey byte[] scan
Solution
With some discussion with @twdsilva I went with the mini resolution approach. Query rewrite may change the query plan as this now constrains the entire rowkey. In theory based on how Phoenix will always pick an index if the query there are some paths where initial query can hit the base table but the attempts to paginate will not. Consider if the Index Key is a only a reordering of the primary keys and does not add any additional columns. By adding all the columns to the WHERE clause we qualify the index and make the index path selected. This could lead to inconsistent pagination.
While this issues can be overcome by running the parser additional time one with and one without the rewrite or restructuring the flow of the optimizer adding additional phase post optimizer plan selection. These type of concerns and keeping most of the changes in the OffsetCompiler were the advantages.
Dataflow
This change touches the following major components in phoenix, Tokenizer/Lexer, Parser, and Optimizer.
Changes in this area is relatively straight forward mostly in PhoenixSQL.g. We add additional rules for allowing a RVC in the OFFSET clause. This rule generates a modified OffsetNode (This class is also poorly named as it is called a node though its being used more as a clause and is not a subclass of ParseNode.)
Note that the returned ExpressionTree is optimized where the query was rewriten into a conjunction of equality expressions. - > (A = 1 AND B = 2 AND C= 3 AND D = 4).