Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PHOENIX-2722 support mysql limit,offset clauses #154

Closed
wants to merge 7 commits into from

Conversation

ankitsinghal
Copy link
Contributor

No description provided.

@JamesRTaylor
Copy link
Contributor

Would you mind reviewing this, @maryannxue as you'd be the best person to catch any corner cases missed, in particular around joins?

}
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this would be correct. Suppose we have "select * from (select a from t offset 2 limit 8) offset 3", guess we should return the 5th row instead. If you consider the logic is too complicated to optimize at compiletime, you can simply quit flattening.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out. I thought a little and have updated the logic to
if (offsetRewrite != null || (limitRewrite != null & offset != null)) {
return select;
} else {
offsetRewrite = offset;
}
Let me know if any optimization is possible.

@maryannxue
Copy link
Contributor

@ankitsinghal Thank you very much for the pull request! I am impressed by how many details you have taken into account in your commit. It took me quite a while to go through all of the changes. That said, it would be great if you could add more test cases covering most (if not all) of the code changes, e.g. the join case (left outer join w/ offset and w/wo limit), the derived table case, the join with subquery case, etc.

Check JoinCompiler.isFlat(SelectStatement), it returns true only when limit == null, think it should be the same for offset. That function is called when a join contains a subquery, e.g. "select * from (select a, b from t1 offset 3) join (select c, d from t2 limit 10)".

@ankitsinghal
Copy link
Contributor Author

@maryannxue Thank you so much for the review! I have incorporated the changes as per your comments and added more test cases related to join, derived table ,group by case and join with subquery as you asked.

@@ -134,7 +134,7 @@ public void testDerivedTableWithWhere() throws Exception {
Connection conn = DriverManager.getConnection(getUrl(), props);
try {
// (where)
String query = "SELECT t.eid, t.x + 9 FROM (SELECT entity_id eid, b_string b, a_byte + 1 x FROM aTable WHERE a_byte + 1 < 9) AS t";
String query = "SELECT t.eid, t.x + 9 FROM (SELECT entity_id eid, b_string b, a_byte + 1 x FROM aTable WHERE a_byte + 1 < 9 ) AS t";
PreparedStatement statement = conn.prepareStatement(query);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary diff

@maryannxue
Copy link
Contributor

@ankitsinghal Thank you very much for making the suggested changes! Added a few more suggestions, most of them being minor.

@ankitsinghal
Copy link
Contributor Author

Thanks @maryannxue for suggestions , I have incorporated the review comments.

@@ -85,7 +85,8 @@ public void close() throws SQLException {

@Override
public Tuple next() throws SQLException {
while (!this.closed && (offset != null && count++ < offset) && tupleIterator.hasNext()) {
while (!this.closed && (offset != null && count < offset) && tupleIterator.hasNext()) {
count++;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant was this "count++" doesn't seem to work with LIMIT(below) together. LIMIT should be number of rows returned starting from the OFFSET row, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I was in a notion that I updated limit to limit+offset already.
Thanks for catching it, I have corrected it now.

@maryannxue
Copy link
Contributor

@ankitsinghal Some (maybe final) comments added.

updated LiteralResultIterationPlan
updated test in erivedTableIT.java
@ankitsinghal
Copy link
Contributor Author

I have taken care the last review comments in my latest commit.
Let me know if it is good to go now. Thanks you so much @maryannxue for review.

@ankitsinghal
Copy link
Contributor Author

@maryannxue , what do you think .. is it good to go now?

@maryannxue
Copy link
Contributor

Hi @ankitsinghal, sorry about the delay. I don't get github notification in mailbox... It looks good to me now. Thank you for all the work! @JamesRTaylor What do you think?

subSelect = NODE_FACTORY.select(subSelect, select.getOrderBy(), select.getLimit(), null);
} else {
subSelect = NODE_FACTORY.select(subSelect, select.getOrderBy(), null, null);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than this if statement, can we do the following to simplify it?

subSelect = NODE_FACTORY.select(subSelect, select.getOrderBy(), select.getLimit(), select.getOffset())

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually , In case of union , we need to apply offset at the final result so we can't apply limit or offset in subselect.
but, if offset is not present , limit can be applied

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually , In case of union , we need to apply offset at the final result so we can't apply limit or offset in subselect.
but, if offset is not present , limit can be applied

}
if (limit >= 0 && count++ >= limit) { return null; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be just {{count}} or {{count++}} ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it should count++ only, but there is a bug in offset count .. which I have updated now.

@ankitsinghal
Copy link
Contributor Author

Thanks @JamesRTaylor for review.. I have made the changes are per your review comments.

* Default implementation that creates a scan group if a plan is row key ordered (which requires a merge sort),
* or if a scan crosses a region boundary and the table is salted or a local index.
*/
public class OffsetScanGrouper implements ParallelScanGrouper {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this class needed as it doesn't appear to do anything.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is needed to avoid scans in parallel even in serialIterators. should I use a anonymous class while creating SerialIterators to do offset on server?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scans aren't done in parallel for SerialIterator, they're done serially. Have you tried it without this? I can't figure out what purpose it's serving, but perhaps I'm missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JamesRTaylor , It seems for serialIterators also , we use DefaultParallelScanGrouper for splitting scans in groups by condition (isSalted || table.getIndexType() == IndexType.LOCAL) && ScanUtil.shouldRowsBeInRowKeyOrder(orderBy, context) and run each group in parallel.

Anyways, I have modified the plan to run offset on server by including this condition and re-use the DefaultParallelScanGrouper

@JamesRTaylor
Copy link
Contributor

A few minor items, but otherwise it looks good. Nice work, @ankitsinghal! If you could file a JIRA for the corresponding website/doc changes and submit a patch for that when we're closer to the 4.8 release, that'd be much appreciated!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants