Conversation

@dbwong
Contributor

@dbwong dbwong commented Jan 7, 2020

…r paging in tables where the sort order of PK columns varies

PHOENIX-4845 Low Level Design

Feature Availability

Use of RVC Offset requires a fully specified row key.
Use of RVC Offset requires row-order queries.
RVC Offset cannot be used with uncovered indexes.

Why can we not support leading-edge offsets?
This is possible but does not really support the pagination use case. In addition, it increases complexity, since Phoenix would have to guarantee correct behavior when an index prefix matches multiple tables while keeping the paginated results consistent.

Why do we only support row-key orders?
Supporting other orders is possible but greatly increases the end user's handling of the data, since the result order could not be used to drive the next page. The end user would have to figure out which row was the "last" one in the result set. I think this is better handled in a follow-up, Server Side Cursors.

Why do we not support Joins?
In theory a join is a view over two separate tables, so a single offset does not entirely make sense. It may be implementable with a more expressive syntax; however, that would increase the scope.

Why do we not support aggregations?
Properly handling paginated aggregate queries requires a different approach than simply scanning a subset of rows: a GROUP BY would have to scan all the rows for each unique set of grouping values, and a global aggregate does not make sense in a paginated query. This is not a major use case at Salesforce.

Why do we not support subqueries?
For the "virtual" table produced by the subselect we may not have a row key. Offsetting into this table is non-trivial.

Why do we not support uncovered indexes?
Uncovered indexes by default do a full table scan on the base table. If an index hint is used, the query is executed as a subquery select, which we were not handling. This particular subquery select is in theory implementable, but it is not a current use case at Salesforce.

Approach In Phoenix

CREATE TABLE TABLE1 (A INTEGER NOT NULL, B INTEGER NOT NULL, C INTEGER NOT NULL, D INTEGER NOT NULL, TIME INTEGER, VALUE1 INTEGER, CONSTRAINT PK PRIMARY KEY (A, B DESC, C, D))
SELECT A, B, VALUE1 FROM TABLE1 WHERE TIME > 10000000 OFFSET (A, B, C, D) = (1, 2, 3, 4)

Two Main Implementation Paths were considered

Query Rewrite Approach

Convert the above query to:
SELECT A, B, VALUE1 FROM TABLE1 WHERE TIME > 10000000 AND
(A > 1 OR (A = 1 AND (B < 2 OR (B = 2 AND (C > 3 OR (C = 3 AND D > 4))))))
Note the specific > or < depending on the sort order of each row-key column.
Then run the query through the optimizer normally.
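The nested predicate above can be generated mechanically: for each prefix of the primary key, hold the earlier columns fixed with equality and advance the current column with > (ASC) or < (DESC). A minimal sketch of that rule, not Phoenix code (the class and method names here are hypothetical):

```java
// Sketch: generate the paging predicate for an RVC offset.
// Assumes simple integer literals; real code would bind parameters.
public class OffsetRewrite {
    static String buildPredicate(String[] cols, boolean[] desc, int[] vals) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < cols.length; i++) {
            sb.append('(');
            // Hold every earlier PK column fixed with equality.
            for (int j = 0; j < i; j++) {
                sb.append(cols[j]).append(" = ").append(vals[j]).append(" AND ");
            }
            // Advance the current column; DESC columns page with '<'.
            sb.append(cols[i]).append(desc[i] ? " < " : " > ").append(vals[i]);
            sb.append(')');
            if (i < cols.length - 1) sb.append(" OR ");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String p = buildPredicate(
                new String[] {"A", "B", "C", "D"},
                new boolean[] {false, true, false, false}, // B is DESC
                new int[] {1, 2, 3, 4});
        // (A > 1) OR (A = 1 AND B < 2) OR (A = 1 AND B = 2 AND C > 3)
        //          OR (A = 1 AND B = 2 AND C = 3 AND D > 4)
        System.out.println(p);
    }
}
```

This is a flattened (disjunctive) form of the nesting shown above; the two are logically equivalent.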

Pushing Offset Rowkey Directly To Scan via Resolution

Construct the following WHERE clause and run it through compilation using the same table references, etc.:
"WHERE (A,B,C,D)=(1,2,3,4)"
This should generate a plan with a point lookup, since we pass what should be a fully qualified key, from which we can extract the row key byte[] for the scan.
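The reason a fully qualified RVC collapses to a single byte[] key is that the row-key encoding is order-preserving: concatenated column encodings compare the same way the typed values do. The sketch below illustrates the idea (the sign-bit flip for integers and byte inversion for DESC columns mirror how Phoenix encodes keys, but this KeySketch class is an illustration, not the real PInteger/SortOrder code):

```java
import java.util.Arrays;

// Illustration of order-preserving key encoding: ASC ints get their sign
// bit flipped so unsigned byte order matches numeric order; DESC columns
// invert every byte so larger values sort first.
public class KeySketch {
    static byte[] encodeInt(int v, boolean desc) {
        byte[] b = new byte[] {
            (byte) ((v >> 24) ^ 0x80), // flip sign bit: negatives sort first
            (byte) (v >> 16), (byte) (v >> 8), (byte) v };
        if (desc) for (int i = 0; i < 4; i++) b[i] = (byte) ~b[i];
        return b;
    }

    // Concatenate the encoded PK columns into a fully qualified row key.
    static byte[] rowKey(int[] vals, boolean[] desc) {
        byte[] key = new byte[4 * vals.length];
        for (int i = 0; i < vals.length; i++)
            System.arraycopy(encodeInt(vals[i], desc[i]), 0, key, 4 * i, 4);
        return key;
    }

    public static void main(String[] args) {
        boolean[] desc = { false, true, false, false }; // B is DESC, as in TABLE1
        byte[] k1 = rowKey(new int[] {1, 3, 0, 0}, desc); // B=3 sorts before B=2
        byte[] k2 = rowKey(new int[] {1, 2, 3, 4}, desc);
        System.out.println(Arrays.compareUnsigned(k1, k2) < 0); // true
    }
}
```

Because the encoding preserves order, the offset key can be compared directly against scan boundaries with plain unsigned byte comparison, which is what makes the point-lookup extraction useful.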

Solution

After some discussion with @twdsilva I went with the mini-resolution approach. Query rewrite may change the query plan, since the rewrite constrains the entire row key. Because Phoenix will always pick an index if the query qualifies, there are paths where the initial query hits the base table but the attempts to paginate will not. Consider an index whose key is only a reordering of the primary key columns and adds no additional columns: by adding all the PK columns to the WHERE clause we qualify the index and cause the index plan to be selected. This could lead to inconsistent pagination.
These issues could be overcome by running the parser an additional time, once with and once without the rewrite, or by restructuring the optimizer flow to add a phase after plan selection. Avoiding those concerns and keeping most of the changes in the OffsetCompiler were the advantages of the chosen approach.

Dataflow

This change touches the following major components in Phoenix: Tokenizer/Lexer, Parser, and Optimizer.

  1. Tokenizer/Lexer
    Changes in this area are relatively straightforward, mostly in PhoenixSQL.g. We add rules allowing an RVC in the OFFSET clause. This rule generates a modified OffsetNode. (The class is poorly named: although called a node, it is used more as a clause and is not a subclass of ParseNode.)
  2. Parser
    1. The OffsetNode clause is handled in compileSingleFlatQuery in QueryCompiler.java. This method is the main driver for the compilation of a single QueryPlan and is intended for querying a table and a certain set of subqueries. It now returns a new class, CompiledOffset, which is a union of the original Integer offset and the RVC offset (a row key). If the offset is not valid we mark the plan as not applicable through the use of a new field, isApplicable, in BaseQueryPlan.
    2. OffsetCompiler compile - This method contains the bulk of the changes and consists of several logical steps.
      1. Input validation: is the user-provided OFFSET clause valid? If not, throw an exception. The main job is to get the primary key of the current table and compare it to the defined offset clause.
      2. Construction of a fake WHERE clause, then compilation/optimization:
        1. The code constructs a mini WHERE clause for the RVC offset, e.g. WHERE (A,B,C,D)=(1,2,3,4).
        2. Compiles the WHERE clause.
        3. Uses WhereOptimizer to generate the scan start key for this WHERE clause; this will be the start of our actual scan.
      3. Optimized/compiled expression validation: with aliases etc., the basic column validation prior to resolution cannot be entirely trusted. Now that we have compiled/optimized the mini WHERE clause, we can evaluate it.
        Note that the returned expression tree is optimized: the query is rewritten into a conjunction of equality expressions -> (A = 1 AND B = 2 AND C = 3 AND D = 4).
  3. Optimizer
    1. QueryOptimizer - getApplicablePlansForSingleFlatQuery - This code prunes the set of plans to those that are applicable.
      1. The addPlan method handles whether the plan for a given index works. For example, it controls not using uncovered indexes without a hint by catching the exception from the call to compiler.compile for the index after column references are rewritten: an uncovered index returns a ColumnNotFoundException, which causes that index plan to be dropped. We add an additional catch here for the row value constructor offset to do similar handling.
    2. WhereOptimizer - Currently the WHERE clause controls the scan key; with no WHERE clause the offset still has to be applied to the user's query. In addition, pushKeyExpressionsToScan is where we end up passing the CompiledOffset in order to start the scan. This is done by again passing the row key to ScanRanges.create.
    3. ScanRanges - Defines a physical HBase scan range. We change the minimum of the ScanRanges to the passed-in row key, assuming that row key is more restrictive.
    4. ScanPlan - rowOffset is added and injected into ScanPlan in order to support handling the explain plan.
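The ScanRanges change in step 3 amounts to taking the more restrictive of two lower bounds: the start key derived from the WHERE clause and the offset row key. A minimal standalone sketch of that comparison (ScanStart and adjustedStartKey are hypothetical names; Phoenix itself compares keys with HBase's Bytes.compareTo):

```java
import java.util.Arrays;

// Sketch of tightening a scan's lower bound with an RVC offset key:
// keep whichever start key is lexicographically larger, i.e. more
// restrictive, under unsigned byte comparison.
public class ScanStart {
    static byte[] adjustedStartKey(byte[] whereStart, byte[] offsetKey) {
        if (offsetKey == null) return whereStart;
        if (whereStart == null || whereStart.length == 0) return offsetKey;
        return Arrays.compareUnsigned(offsetKey, whereStart) > 0
                ? offsetKey : whereStart;
    }

    public static void main(String[] args) {
        byte[] whereStart = new byte[] { 0x01 };       // bound from WHERE clause
        byte[] offsetKey  = new byte[] { 0x01, 0x05 }; // key from RVC OFFSET
        byte[] start = adjustedStartKey(whereStart, offsetKey);
        System.out.println(Arrays.equals(start, offsetKey)); // true: offset wins
    }
}
```

Whether the scan then includes or skips the offset row itself is a separate concern handled by the actual implementation; the sketch only shows the bound selection.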

@dbwong
Contributor Author

dbwong commented Jan 7, 2020

@ChinmaySKulkarni @kadirozde @yanxinyi Initial Review Please
Update: I've added a small writeup of the changes as requested.

@ChinmaySKulkarni
Contributor

General comment: @dbwong Please open a documentation JIRA for this feature and mark as a dependency of 4845 so we don't commit code without the documentation. Will review the code in detail soon.

@dbwong
Contributor Author

dbwong commented Jan 7, 2020

Made https://issues.apache.org/jira/browse/PHOENIX-5660 as requested @ChinmaySKulkarni

if (offsetNode == null) { return new CompiledOffset(Optional.<Integer>absent(), Optional.<byte[]>absent()); }
OffsetParseNodeVisitor visitor = new OffsetParseNodeVisitor(context);
offsetNode.getOffsetParseNode().accept(visitor);
return visitor.getOffset();
Contributor


nit: more than 100 chars per line.
break into two lines for better readability

Contributor

@ChinmaySKulkarni ChinmaySKulkarni left a comment


@dbwong overall looks great so far. Really nice test coverage! I have posted some comments and questions.

Contributor

@ChinmaySKulkarni ChinmaySKulkarni left a comment


@dbwong Overall lgtm. I have a few nits.

rs = conn.createStatement().executeQuery(sql);


} catch (Exception e) {
Contributor


@dbwong is it possible to narrow down this catch to the expected type of exception?


import com.google.common.base.Optional;

//pojo
Contributor


You can tell that it's a pojo without the comment being there. I meant adding a comment about what it is going to be used to represent, since that would be more useful for someone reading the code. Either way, a comment is not necessarily even needed here, but that's just my opinion.

//other use cases.
//Note If the table is salted we still mark as row ordered in this code path
if(offset.getByteOffset().isPresent() && orderByExpressions.isEmpty()){
throw new SQLException("Do not allow non ORDER BY with RVC OFFSET");
Contributor


@dbwong ping on this change

Contributor

@ChinmaySKulkarni ChinmaySKulkarni left a comment


+1 lgtm. Thanks @dbwong

@yanxinyi
Contributor

@dbwong please rebase and upload 4.x and master patches to the JIRA. If the Hadoop QA has no test failures, I will merge your patch shortly. Thanks.
