Conversation

@ming535
Contributor

@ming535 ming535 commented May 28, 2022

Which issue does this PR close?

Closes #2624

Rationale for this change

See below.

What changes are included in this PR?

  1. When SqlToRel builds the plan, Limit should be the parent of Offset, per https://www.postgresql.org/docs/current/queries-limit.html:
    "If both OFFSET and LIMIT appear, then OFFSET rows are skipped before starting to count the LIMIT rows that are returned."

  2. Fix limit_push_down to consider every combination of Offset and Limit when traversing down the plan tree.

Are there any user-facing changes?

The semantics of LogicalPlan::offset and LogicalPlan::limit have changed.
Previously, builder.limit(100).offset(10) meant "skip the first 10 rows, then count 100 rows". Now it
means "take 100 rows, then skip the first 10 of them".

The new semantics are consistent with how data flows through the plan tree.
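The difference can be illustrated with plain Rust iterators, where `skip` and `take` play the roles of Offset and Limit (an analogy only, not DataFusion code):

```rust
fn main() {
    let input: Vec<u32> = (0..1000).collect();

    // Previous semantics of builder.limit(100).offset(10):
    // skip the first 10 rows, then count up to 100 rows -> 100 rows.
    let old: Vec<u32> = input.iter().copied().skip(10).take(100).collect();
    assert_eq!(old.len(), 100);

    // New semantics: .limit(100) builds a Limit node, then .offset(10) wraps it
    // in an Offset node. Data flows bottom-up through the tree, so 100 rows are
    // taken first and then 10 of them are skipped -> 90 rows.
    let new: Vec<u32> = input.iter().copied().take(100).skip(10).collect();
    assert_eq!(new.len(), 90);
}
```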

Does this PR break compatibility with Ballista?

@github-actions github-actions bot added core Core DataFusion crate datafusion optimizer Optimizer rules sql SQL Planner labels May 28, 2022
@Ted-Jiang
Member

builder.limit(100).offset(10) means ignore the first 10 rows and then starting to count 100 rows. Now it
means returns 100 rows and then skip 10 rows.
@ming535 If that is the case, does it mean only 90 rows are returned?

@ming535
Contributor Author

ming535 commented May 30, 2022

builder.limit(100).offset(10) means ignore the first 10 rows and then starting to count 100 rows. Now it
means returns 100 rows and then skip 10 rows.
@ming535 If this happened, means only return 90 rows?

Yes. The current design of the plan builder tries to mimic the semantics of the SQL language; this PR makes it mimic the tree structure of the logical operators.

We could add another builder API that combines limit and offset to mimic the SQL semantics. For example:
fetch(offset: Option<usize>, limit: Option<usize>)
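A minimal sketch of what such a combined API could do, assuming SQL semantics (OFFSET applied before LIMIT). The `fetch` function here is hypothetical and operates on a plain vector standing in for a plan's output:

```rust
/// Hypothetical combined API: skip `offset` rows first, then return at most
/// `limit` rows, matching SQL semantics regardless of plan-node nesting.
fn fetch(rows: Vec<u64>, offset: Option<usize>, limit: Option<usize>) -> Vec<u64> {
    let skipped = rows.into_iter().skip(offset.unwrap_or(0));
    match limit {
        // With a limit, return at most `limit` rows after the skip.
        Some(n) => skipped.take(n).collect(),
        // Without a limit, return everything after the skip.
        None => skipped.collect(),
    }
}

fn main() {
    let rows: Vec<u64> = (0..100).collect();
    // Skip 10 rows, then count 20: yields rows 10..30.
    let out = fetch(rows, Some(10), Some(20));
    assert_eq!(out.len(), 20);
    assert_eq!(out[0], 10);
}
```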

@Ted-Jiang
Member

builder.limit(100).offset(10) means ignore the first 10 rows and then starting to count 100 rows. Now it
means returns 100 rows and then skip 10 rows.
@ming535 If this happened, means only return 90 rows?

Yes. The current design of plan builder tries to mimics the semantics of SQL language; this PR tries to mimics the tree structure of logical operator.

We can add another api for the builder to combines limit and offset to mimics SQL language. For example: fetch(offset: Option<usize>, limit: Option<usize>)

Before implementing the rule, I tested this in pg:

postgres=# select * from users offset 1 limit 12 ;
 id |          hash_firstname          |          hash_lastname           | gender
----+----------------------------------+----------------------------------+--------
  2 | f5e8f62332c256ebdceea3050a1da595 | 5d6c111b409bd97b5b8e5357668564d7 | male
  3 | ff82b6f31f4b4a7647ded507a599147a | cb9a959c8e42ee8f1de33c50c15b2267 | male
  4 | a643dd92e5a999e9372d73373cceb407 | 68ae1a597b5b04daa3f029573dde4fb9 | female
  5 | f7f6c7f939202806ebe4cdda0821b1f8 | 7a451c9e30238af75001423f17f300ac | male
  6 | 37f9869968840feaf441f963e2171254 | 6a06f56a6e66ee457ddc547b7df20de4 | female
  7 | e2c4c7cb13dd5f2511d3402d36cb4884 | 7cc3e36824f369b88a3445b5dc2b36d4 | female
  8 | aea96778de92539cc9b80864a969c82b | 92517c35f9f1baf0ed831eab4c23757e | male
  9 | 363547e325516d28f96b69a8e88deb44 | 4e6cadf5f26aacca0f8c03c115825fbe | female
 10 | 340d4e6b3968e80c2fb02704a30fc574 | e0b7b41ae5397ae231c783e4e5d9d848 | female
 11 | 3c8e1102dc3c9551be2fb40e4dbc0bc7 | d20d0155652a23c327734fb0e0127247 | female
 12 | cc70f3378a9364603023924670b8bb0d | ac16e18018a438445e12492bf018cf0f | male
 13 | 78cecc8caaf5c303097eb3816a411e79 | a0a19a4b5b595562fc8b978520401166 | female
(12 rows)

postgres=# select * from users limit 10 offset 1;
 id |          hash_firstname          |          hash_lastname           | gender
----+----------------------------------+----------------------------------+--------
  2 | f5e8f62332c256ebdceea3050a1da595 | 5d6c111b409bd97b5b8e5357668564d7 | male
  3 | ff82b6f31f4b4a7647ded507a599147a | cb9a959c8e42ee8f1de33c50c15b2267 | male
  4 | a643dd92e5a999e9372d73373cceb407 | 68ae1a597b5b04daa3f029573dde4fb9 | female
  5 | f7f6c7f939202806ebe4cdda0821b1f8 | 7a451c9e30238af75001423f17f300ac | male
  6 | 37f9869968840feaf441f963e2171254 | 6a06f56a6e66ee457ddc547b7df20de4 | female
  7 | e2c4c7cb13dd5f2511d3402d36cb4884 | 7cc3e36824f369b88a3445b5dc2b36d4 | female
  8 | aea96778de92539cc9b80864a969c82b | 92517c35f9f1baf0ed831eab4c23757e | male
  9 | 363547e325516d28f96b69a8e88deb44 | 4e6cadf5f26aacca0f8c03c115825fbe | female
 10 | 340d4e6b3968e80c2fb02704a30fc574 | e0b7b41ae5397ae231c783e4e5d9d848 | female
 11 | 3c8e1102dc3c9551be2fb40e4dbc0bc7 | d20d0155652a23c327734fb0e0127247 | female
(10 rows)

postgres=# select * from users offset 1 limit 10 ;
 id |          hash_firstname          |          hash_lastname           | gender
----+----------------------------------+----------------------------------+--------
  2 | f5e8f62332c256ebdceea3050a1da595 | 5d6c111b409bd97b5b8e5357668564d7 | male
  3 | ff82b6f31f4b4a7647ded507a599147a | cb9a959c8e42ee8f1de33c50c15b2267 | male
  4 | a643dd92e5a999e9372d73373cceb407 | 68ae1a597b5b04daa3f029573dde4fb9 | female
  5 | f7f6c7f939202806ebe4cdda0821b1f8 | 7a451c9e30238af75001423f17f300ac | male
  6 | 37f9869968840feaf441f963e2171254 | 6a06f56a6e66ee457ddc547b7df20de4 | female
  7 | e2c4c7cb13dd5f2511d3402d36cb4884 | 7cc3e36824f369b88a3445b5dc2b36d4 | female
  8 | aea96778de92539cc9b80864a969c82b | 92517c35f9f1baf0ed831eab4c23757e | male
  9 | 363547e325516d28f96b69a8e88deb44 | 4e6cadf5f26aacca0f8c03c115825fbe | female
 10 | 340d4e6b3968e80c2fb02704a30fc574 | e0b7b41ae5397ae231c783e4e5d9d848 | female
 11 | 3c8e1102dc3c9551be2fb40e4dbc0bc7 | d20d0155652a23c327734fb0e0127247 | female
(10 rows)

Using OFFSET before LIMIT should also return 10 rows.

@ming535
Contributor Author

ming535 commented May 30, 2022

In pg, the order of "offset" and "limit" doesn't matter. It always means "OFFSET rows are skipped before starting to count the LIMIT rows that are returned".

In the logical plan tree, the order of "offset" and "limit" does matter.

This PR makes several "assumptions":

  1. The builder API should be independent of the SQL language; it only describes the logical tree structure.
  2. When users construct a plan tree with the builder API, if "limit" is the parent of "offset", the semantics are to skip some rows first; otherwise, the semantics are to count the rows first.
  3. When translating SQL to a logical plan in the planner, we make "limit" the parent of "offset" to be compatible with pg.

@ming535
Contributor Author

ming535 commented May 30, 2022

For the plan builder API, I like the way Calcite does it (https://calcite.apache.org/javadocAggregate/org/apache/calcite/tools/RelBuilder.html#limit(int,int)),
which combines "limit" and "offset".

@alamb
Contributor

alamb commented May 31, 2022

I plan to review this later today or tomorrow if no one else has a chance to do so before then.

@Ted-Jiang
Member

I plan to review this later today or tomorrow if no one else has a chance to do so before

@alamb I am reviewing this one; I will finish this evening.

Member

@Ted-Jiang Ted-Jiang left a comment


LGTM, thanks @ming535 👍
I ran this in pg:

explain VERBOSE  select id from users limit 10 offset 1;
                               QUERY PLAN
------------------------------------------------------------------------
 Limit  (cost=0.02..0.26 rows=10 width=8)
   Output: id
   ->  Seq Scan on public.users  (cost=0.00..239.02 rows=10002 width=8)
         Output: id
(4 rows)

For api of plan builder, I like the way Calcite is doing https://calcite.apache.org/javadocAggregate/org/apache/calcite/tools/RelBuilder.html#limit(int,int) which combines "limit" and "offset".

Like @ming535 said, I think combining "limit" and "offset" is a good way to avoid creating a new plan struct. @alamb, we need your opinion on this.

/// it is updated to (None, min(n1, n2))).
/// 2. Ancestor_Limit(n1) -> .. -> Current_Offset(m1)
/// it is updated to (m1, n1 + m1).
/// 3. Ancestor_Offset(m1) -> .. -> Current_Offset(m2)
Member

I think two OFFSETs should not be allowed without a subquery.

Contributor Author

I think without subquery, it should not allowed two offset.

I think the SQL parser should forbid this (two OFFSETs without a subquery). As for building the plan tree with the builder API, I think it is fine that it is not aware of this, as long as the semantics look right.
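The doc comment quoted earlier in this review describes how an ancestor (offset, limit) pair is updated as the rule walks down the plan. A sketch of the first two rules (illustrative names, not the PR's actual code; the third rule is cut off in the quoted fragment and omitted here):

```rust
#[derive(Debug, Clone, Copy)]
enum Node {
    Limit(usize),
    Offset(usize),
}

/// Returns the updated (offset, limit) pair, or None for combinations
/// not covered by the two quoted rules.
fn combine(ancestor: Node, current: Node) -> Option<(Option<usize>, usize)> {
    match (ancestor, current) {
        // Rule 1: Ancestor_Limit(n1) -> Current_Limit(n2)
        // is updated to (None, min(n1, n2)).
        (Node::Limit(n1), Node::Limit(n2)) => Some((None, n1.min(n2))),
        // Rule 2: Ancestor_Limit(n1) -> Current_Offset(m1)
        // is updated to (m1, n1 + m1): the child must produce n1 + m1 rows
        // because m1 of them will be skipped.
        (Node::Limit(n1), Node::Offset(m1)) => Some((Some(m1), n1 + m1)),
        _ => None,
    }
}

fn main() {
    // Limit 1000 over Offset 10: the scan below must yield 1010 rows,
    // matching the `TableScan: test projection=None, limit=1010` test plan.
    assert_eq!(combine(Node::Limit(1000), Node::Offset(10)), Some((Some(10), 1010)));
    assert_eq!(combine(Node::Limit(5), Node::Limit(3)), Some((None, 3)));
}
```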

@ming535 ming535 force-pushed the fix-limit-pushdown branch from 14b7cf1 to 59d9b1f Compare June 1, 2022 14:13
@andygrove
Member

Yes. The current design of plan builder tries to mimics the semantics of SQL language; this PR tries to mimics the tree structure of logical operator.

We can add another api for the builder to combines limit and offset to mimics SQL language. For example: fetch(offset: Option<usize>, limit: Option<usize>)

I wonder if we should consider renaming the DataFrame / LogicalPlanBuilder methods. The terms limit and offset are very SQL-specific and not very intuitive here, which I think adds to the confusion about how they might be interpreted. Perhaps we should consider renaming offset to drop, for example. That would be closer to pandas, based on a quick look at the documentation.

@andygrove
Member

I also like the idea of combining limit and offset in one DataFrame / builder method.

Member

@andygrove andygrove left a comment


LGTM based on the current tests. I left some opinions on possible changes to the builder API.

@alamb alamb changed the title Fix limit pushdown Fix limit + offset pushdown Jun 1, 2022
Contributor

@alamb alamb left a comment


I went through the plan and code changes carefully. I think it looks very nice 👌 @ming535 Thank you very much.

Comment on lines +78 to +80
ancestor: Ancestor,
ancestor_offset: Option<usize>,
ancestor_limit: Option<usize>,
Contributor


If you wanted to be fancy / more idiomatic Rust and have the compiler check more of the invariants, you could consider encoding the ancestor, offset, and limit together.

Something like

enum Ancestor {
    /// Limit
    FromLimit(usize),
    /// Offset
    FromOffset(usize),
    /// Other nodes that don't affect the adjustment of "Limit"
    NotRelevant,
}
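For instance, an exhaustive match over such an enum forces every new ancestor kind to be handled explicitly. Here `required_rows` is a hypothetical helper, not code from this PR:

```rust
enum Ancestor {
    /// Limit
    FromLimit(usize),
    /// Offset
    FromOffset(usize),
    /// Other nodes that don't affect the adjustment of "Limit"
    NotRelevant,
}

/// Hypothetical helper: the number of rows the child must produce when the
/// current node skips `skip` rows under the given ancestor constraint.
fn required_rows(ancestor: &Ancestor, skip: usize) -> Option<usize> {
    match ancestor {
        // A Limit(n) above means the child must yield n + skip rows.
        Ancestor::FromLimit(n) => Some(n + skip),
        // An Offset above, or no relevant ancestor, places no upper bound.
        Ancestor::FromOffset(_) | Ancestor::NotRelevant => None,
    }
}

fn main() {
    assert_eq!(required_rows(&Ancestor::FromLimit(1000), 10), Some(1010));
    assert_eq!(required_rows(&Ancestor::NotRelevant, 10), None);
}
```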


let expected = "Offset: 10\
\n Limit: 1010\
\n Limit: 1000\
Contributor


This plan change is due to the change in semantics of LogicalPlanBuilder::offset(), right?

let expected = "Limit: 1000\
\n Offset: 10\
\n Projection: #test.a\
\n TableScan: test projection=None, limit=1010";
Contributor


👍

@alamb
Contributor

alamb commented Jun 1, 2022

Like @ming535 said, I think combines "limit" and "offset" them is a good way, avoid create a new plan struct. @alamb need your opinion on this

I agree 👍

@alamb alamb merged commit 45a975d into apache:master Jun 1, 2022
@alamb
Contributor

alamb commented Jun 1, 2022

I merged this PR as it has three reviewers and approvers -- I think any additional work can be done as follow on PRs.

Thanks again @ming535 @Ted-Jiang and @andygrove 🏅

gandronchik pushed a commit to cube-js/arrow-datafusion that referenced this pull request Aug 30, 2022
gandronchik pushed a commit to cube-js/arrow-datafusion that referenced this pull request Aug 31, 2022
gandronchik pushed a commit to cube-js/arrow-datafusion that referenced this pull request Sep 2, 2022

Development

Successfully merging this pull request may close these issues.

limit_push_down is not working properly with OFFSET
