Conversation

@dbwong
Contributor

@dbwong dbwong commented Jan 7, 2020

…r paging in tables where the sort order of PK columns varies

PHOENIX-4845 Low Level Design

Feature Availability

Use of RVC Offset requires a fully specified row key.
Use of RVC Offset requires row-order queries.
RVC Offset cannot be used with uncovered indexes.

Why can we not support leading-edge offsets?
This is possible but does not really support the pagination use case. In addition, it increases complexity, since Phoenix would have to guarantee correct behavior when an index prefix matches multiple tables while keeping the paginated results consistent.

Why do we only support row-key orders?
Supporting other orders is possible but greatly increases the end user's handling of the data, since the result order could not be used to drive the next page. The end user would have to figure out which row was the "last" one in the result set. I think this is better handled in a follow-up, Server Side Cursors.

Why do we not support Joins?
In theory a join is a view over two separate tables, so a single offset does not entirely make sense. It may be implementable with a more expressive syntax; however, that would increase the scope.

Why do we not support aggregations?
Properly handling paginated aggregate queries requires a different approach than simply scanning a subset of rows: a GROUP BY would have to scan all the rows for each unique set of grouping values, and a global aggregate does not make sense in a paginated query. This is not a major use case at Salesforce.

Why do we not support subqueries?
For the "virtual" table produced by the subselect we may not have a row key. Offsetting into this table is non-trivial.

Why do we not support uncovered indexes?
Uncovered indexes by default do a full table scan on the base table. If an index hint is used, the query is executed as a subquery select, which we were not handling. This particular subquery select is in theory implementable, but it is not a current use case at Salesforce.

Approach In Phoenix

CREATE TABLE TABLE1 (A INTEGER NOT NULL, B INTEGER NOT NULL, C INTEGER NOT NULL, D INTEGER NOT NULL, TIME INTEGER, VALUE1 INTEGER, CONSTRAINT PK PRIMARY KEY (A, B DESC, C, D))
SELECT A, B, VALUE1 FROM TABLE1 WHERE TIME > 10000000 OFFSET (A, B, C, D) = (1, 2, 3, 4)

Two Main Implementation Paths were considered

Query Rewrite Approach

Convert the above query to:
SELECT A, B, VALUE1 FROM TABLE1 WHERE TIME > 10000000 AND
(A > 1 OR (A = 1 AND (B < 2 OR (B = 2 AND (C > 3 OR (C = 3 AND D > 4))))))
Note the specific > or < depending on the sort order of each row-key column.
Then run the query through the optimizer normally.
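The nested predicate above can be generated mechanically: for each prefix of the primary key, hold the earlier columns fixed with equality and advance the current column with > (ASC) or < (DESC). A minimal sketch of that rule, not Phoenix code (the class and method names here are hypothetical):

```java
// Sketch: generate the paging predicate for an RVC offset.
// Assumes simple integer literals; real code would bind parameters.
public class OffsetRewrite {
    static String buildPredicate(String[] cols, boolean[] desc, int[] vals) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < cols.length; i++) {
            sb.append('(');
            // Hold every earlier PK column fixed with equality.
            for (int j = 0; j < i; j++) {
                sb.append(cols[j]).append(" = ").append(vals[j]).append(" AND ");
            }
            // Advance the current column; DESC columns page with '<'.
            sb.append(cols[i]).append(desc[i] ? " < " : " > ").append(vals[i]);
            sb.append(')');
            if (i < cols.length - 1) sb.append(" OR ");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String p = buildPredicate(
                new String[] {"A", "B", "C", "D"},
                new boolean[] {false, true, false, false}, // B is DESC
                new int[] {1, 2, 3, 4});
        // (A > 1) OR (A = 1 AND B < 2) OR (A = 1 AND B = 2 AND C > 3)
        //          OR (A = 1 AND B = 2 AND C = 3 AND D > 4)
        System.out.println(p);
    }
}
```

This is a flattened (disjunctive) form of the nesting shown above; the two are logically equivalent.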

Pushing Offset Rowkey Directly To Scan via Resolution

Construct the following WHERE clause and run it through compilation using the same table references, etc.:
"WHERE (A,B,C,D)=(1,2,3,4)"
This should generate a plan with a point lookup, since we pass what should be a fully qualified key, from which we can extract the row key byte[] for the scan.
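The reason a fully qualified RVC collapses to a single byte[] key is that the row-key encoding is order-preserving: concatenated column encodings compare the same way the typed values do. The sketch below illustrates the idea (the sign-bit flip for integers and byte inversion for DESC columns mirror how Phoenix encodes keys, but this KeySketch class is an illustration, not the real PInteger/SortOrder code):

```java
import java.util.Arrays;

// Illustration of order-preserving key encoding: ASC ints get their sign
// bit flipped so unsigned byte order matches numeric order; DESC columns
// invert every byte so larger values sort first.
public class KeySketch {
    static byte[] encodeInt(int v, boolean desc) {
        byte[] b = new byte[] {
            (byte) ((v >> 24) ^ 0x80), // flip sign bit: negatives sort first
            (byte) (v >> 16), (byte) (v >> 8), (byte) v };
        if (desc) for (int i = 0; i < 4; i++) b[i] = (byte) ~b[i];
        return b;
    }

    // Concatenate the encoded PK columns into a fully qualified row key.
    static byte[] rowKey(int[] vals, boolean[] desc) {
        byte[] key = new byte[4 * vals.length];
        for (int i = 0; i < vals.length; i++)
            System.arraycopy(encodeInt(vals[i], desc[i]), 0, key, 4 * i, 4);
        return key;
    }

    public static void main(String[] args) {
        boolean[] desc = { false, true, false, false }; // B is DESC, as in TABLE1
        byte[] k1 = rowKey(new int[] {1, 3, 0, 0}, desc); // B=3 sorts before B=2
        byte[] k2 = rowKey(new int[] {1, 2, 3, 4}, desc);
        System.out.println(Arrays.compareUnsigned(k1, k2) < 0); // true
    }
}
```

Because the encoding preserves order, the offset key can be compared directly against scan boundaries with plain unsigned byte comparison, which is what makes the point-lookup extraction useful.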

Solution

After some discussion with @twdsilva I went with the mini-resolution approach. Query rewrite may change the query plan, since the rewrite constrains the entire row key. Because Phoenix will always pick an index if the query qualifies, there are paths where the initial query hits the base table but the attempts to paginate will not. Consider an index whose key is only a reordering of the primary key columns and adds no additional columns: by adding all the PK columns to the WHERE clause we qualify the index and cause the index plan to be selected. This could lead to inconsistent pagination.
These issues could be overcome by running the parser an additional time, once with and once without the rewrite, or by restructuring the optimizer flow to add a phase after plan selection. Avoiding those concerns and keeping most of the changes in the OffsetCompiler were the advantages of the chosen approach.

Dataflow

This change touches the following major components in Phoenix: Tokenizer/Lexer, Parser, and Optimizer.

  1. Tokenizer/Lexer
    Changes in this area are relatively straightforward, mostly in PhoenixSQL.g. We add rules allowing an RVC in the OFFSET clause. This rule generates a modified OffsetNode. (The class is poorly named: although called a node, it is used more as a clause and is not a subclass of ParseNode.)
  2. Parser
    1. The OffsetNode clause is handled in compileSingleFlatQuery in QueryCompiler.java. This method is the main driver for the compilation of a single QueryPlan and is intended for querying a table and a certain set of subqueries. It now returns a new class, CompiledOffset, which is a union of the original Integer offset and the RVC offset (a row key). If the offset is not valid we mark the plan as not applicable through the use of a new field, isApplicable, in BaseQueryPlan.
    2. OffsetCompiler compile - This method contains the bulk of the changes and consists of several logical steps.
      1. Input validation: is the user-provided OFFSET clause valid? If not, throw an exception. The main job is to get the primary key of the current table and compare it to the defined offset clause.
      2. Construction of a fake WHERE clause, then compilation/optimization:
        1. The code constructs a mini WHERE clause for the RVC offset, e.g. WHERE (A,B,C,D)=(1,2,3,4).
        2. Compiles the WHERE clause.
        3. Uses WhereOptimizer to generate the scan start key for this WHERE clause; this will be the start of our actual scan.
      3. Optimized/compiled expression validation: with aliases etc., the basic column validation prior to resolution cannot be entirely trusted. Now that we have compiled/optimized the mini WHERE clause, we can evaluate it.
        Note that the returned expression tree is optimized: the query is rewritten into a conjunction of equality expressions -> (A = 1 AND B = 2 AND C = 3 AND D = 4).
  3. Optimizer
    1. QueryOptimizer - getApplicablePlansForSingleFlatQuery - This code prunes the set of plans to those that are applicable.
      1. The addPlan method handles whether the plan for a given index works. For example, it controls not using uncovered indexes without a hint by catching the exception from the call to compiler.compile for the index after column references are rewritten: an uncovered index returns a ColumnNotFoundException, which causes that index plan to be dropped. We add an additional catch here for the row value constructor offset to do similar handling.
    2. WhereOptimizer - Currently the WHERE clause controls the scan key; with no WHERE clause the offset still has to be applied to the user's query. In addition, pushKeyExpressionsToScan is where we end up passing the CompiledOffset in order to start the scan. This is done by again passing the row key to ScanRanges.create.
    3. ScanRanges - Defines a physical HBase scan range. We change the minimum of the ScanRanges to the passed-in row key, assuming that row key is more restrictive.
    4. ScanPlan - rowOffset is added and injected into ScanPlan in order to support handling the explain plan.
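The ScanRanges change in step 3 amounts to taking the more restrictive of two lower bounds: the start key derived from the WHERE clause and the offset row key. A minimal standalone sketch of that comparison (ScanStart and adjustedStartKey are hypothetical names; Phoenix itself compares keys with HBase's Bytes.compareTo):

```java
import java.util.Arrays;

// Sketch of tightening a scan's lower bound with an RVC offset key:
// keep whichever start key is lexicographically larger, i.e. more
// restrictive, under unsigned byte comparison.
public class ScanStart {
    static byte[] adjustedStartKey(byte[] whereStart, byte[] offsetKey) {
        if (offsetKey == null) return whereStart;
        if (whereStart == null || whereStart.length == 0) return offsetKey;
        return Arrays.compareUnsigned(offsetKey, whereStart) > 0
                ? offsetKey : whereStart;
    }

    public static void main(String[] args) {
        byte[] whereStart = new byte[] { 0x01 };       // bound from WHERE clause
        byte[] offsetKey  = new byte[] { 0x01, 0x05 }; // key from RVC OFFSET
        byte[] start = adjustedStartKey(whereStart, offsetKey);
        System.out.println(Arrays.equals(start, offsetKey)); // true: offset wins
    }
}
```

Whether the scan then includes or skips the offset row itself is a separate concern handled by the actual implementation; the sketch only shows the bound selection.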

@dbwong
Contributor Author

dbwong commented Jan 7, 2020

@ChinmaySKulkarni @kadirozde @yanxinyi Initial Review Please
Update: I've added a small writeup of the changes as requested.

@ChinmaySKulkarni
Contributor

General comment: @dbwong Please open a documentation JIRA for this feature and mark as a dependency of 4845 so we don't commit code without the documentation. Will review the code in detail soon.

@dbwong
Contributor Author

dbwong commented Jan 7, 2020

Made https://issues.apache.org/jira/browse/PHOENIX-5660 as requested @ChinmaySKulkarni

if (offsetNode == null) { return new CompiledOffset(Optional.<Integer>absent(), Optional.<byte[]>absent()); }
OffsetParseNodeVisitor visitor = new OffsetParseNodeVisitor(context);
offsetNode.getOffsetParseNode().accept(visitor);
return visitor.getOffset();
Contributor


nit: more than 100 chars per line.
break into two lines for better readability

Contributor

@ChinmaySKulkarni ChinmaySKulkarni left a comment


@dbwong overall looks great so far. Really nice test coverage! I have posted some comments and questions.

Contributor

@ChinmaySKulkarni ChinmaySKulkarni left a comment


@dbwong Overall lgtm. I have a few nits.

rs = conn.createStatement().executeQuery(sql);


} catch (Exception e) {
Contributor


@dbwong is it possible to narrow down this catch to the expected type of exception?


import com.google.common.base.Optional;

//pojo
Contributor


You can tell that it's a pojo without the comment being there. I meant adding a comment about what it is going to be used to represent, since that would be more useful for someone reading the code. Either way, a comment is not necessarily even needed here, but that's just my opinion.

//other use cases.
//Note If the table is salted we still mark as row ordered in this code path
if(offset.getByteOffset().isPresent() && orderByExpressions.isEmpty()){
throw new SQLException("Do not allow non ORDER BY with RVC OFFSET");
Contributor


@dbwong ping on this change

Contributor

@ChinmaySKulkarni ChinmaySKulkarni left a comment


+1 lgtm. Thanks @dbwong

@yanxinyi
Contributor

@dbwong please rebase and upload 4.x and master patches to the JIRA. If the Hadoop QA has no test failures, I will merge your patch shortly. Thanks.
