Conversation

@ming535
Contributor

@ming535 ming535 commented May 28, 2022

Which issue does this PR close?

Closes #2624

Rationale for this change

See below.

What changes are included in this PR?

  1. When SqlToRel builds the plan, Limit should be the parent of Offset, per https://www.postgresql.org/docs/current/queries-limit.html:
    "If both OFFSET and LIMIT appear, then OFFSET rows are skipped before starting to count the LIMIT rows that are returned."

  2. Fix limit_push_down to consider every combination of Offset and Limit when traversing down the plan tree.

Are there any user-facing changes?

The semantics of LogicalPlan::offset and LogicalPlan::limit have changed.
Previously, builder.limit(100).offset(10) meant "skip the first 10 rows, then count 100 rows". Now it
means "take 100 rows, then skip the first 10 of them".

The new semantics are consistent with how data flows through the plan tree.
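The difference can be illustrated with plain Rust iterators, where `skip` and `take` play the roles of Offset and Limit (an analogy only, not DataFusion code):

```rust
fn main() {
    let input: Vec<u32> = (0..1000).collect();

    // Previous semantics of builder.limit(100).offset(10):
    // skip the first 10 rows, then count up to 100 rows -> 100 rows.
    let old: Vec<u32> = input.iter().copied().skip(10).take(100).collect();
    assert_eq!(old.len(), 100);

    // New semantics: .limit(100) builds a Limit node, then .offset(10) wraps it
    // in an Offset node. Data flows bottom-up through the tree, so 100 rows are
    // taken first and then 10 of them are skipped -> 90 rows.
    let new: Vec<u32> = input.iter().copied().take(100).skip(10).collect();
    assert_eq!(new.len(), 90);
}
```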

Does this PR break compatibility with Ballista?

@github-actions github-actions bot added core Core DataFusion crate datafusion optimizer Optimizer rules sql SQL Planner labels May 28, 2022
@Ted-Jiang
Member

builder.limit(100).offset(10) means ignore the first 10 rows and then starting to count 100 rows. Now it
means returns 100 rows and then skip 10 rows.
@ming535 If that is the case, does it mean only 90 rows are returned?

@ming535
Contributor Author

ming535 commented May 30, 2022

builder.limit(100).offset(10) means ignore the first 10 rows and then starting to count 100 rows. Now it
means returns 100 rows and then skip 10 rows.
@ming535 If this happened, means only return 90 rows?

Yes. The current design of the plan builder tries to mimic the semantics of the SQL language; this PR makes it mimic the tree structure of the logical operators.

We could add another builder API that combines limit and offset to mimic the SQL semantics. For example:
fetch(offset: Option<usize>, limit: Option<usize>)
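A minimal sketch of what such a combined API could do, assuming SQL semantics (OFFSET applied before LIMIT). The `fetch` function here is hypothetical and operates on a plain vector standing in for a plan's output:

```rust
/// Hypothetical combined API: skip `offset` rows first, then return at most
/// `limit` rows, matching SQL semantics regardless of plan-node nesting.
fn fetch(rows: Vec<u64>, offset: Option<usize>, limit: Option<usize>) -> Vec<u64> {
    let skipped = rows.into_iter().skip(offset.unwrap_or(0));
    match limit {
        // With a limit, return at most `limit` rows after the skip.
        Some(n) => skipped.take(n).collect(),
        // Without a limit, return everything after the skip.
        None => skipped.collect(),
    }
}

fn main() {
    let rows: Vec<u64> = (0..100).collect();
    // Skip 10 rows, then count 20: yields rows 10..30.
    let out = fetch(rows, Some(10), Some(20));
    assert_eq!(out.len(), 20);
    assert_eq!(out[0], 10);
}
```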

@Ted-Jiang
Member

builder.limit(100).offset(10) means ignore the first 10 rows and then starting to count 100 rows. Now it
means returns 100 rows and then skip 10 rows.
@ming535 If this happened, means only return 90 rows?

Yes. The current design of plan builder tries to mimics the semantics of SQL language; this PR tries to mimics the tree structure of logical operator.

We can add another api for the builder to combines limit and offset to mimics SQL language. For example: fetch(offset: Option<usize>, limit: Option<usize>)

Before implementing the rule, I tested this in pg:

postgres=# select * from users offset 1 limit 12 ;
 id |          hash_firstname          |          hash_lastname           | gender
----+----------------------------------+----------------------------------+--------
  2 | f5e8f62332c256ebdceea3050a1da595 | 5d6c111b409bd97b5b8e5357668564d7 | male
  3 | ff82b6f31f4b4a7647ded507a599147a | cb9a959c8e42ee8f1de33c50c15b2267 | male
  4 | a643dd92e5a999e9372d73373cceb407 | 68ae1a597b5b04daa3f029573dde4fb9 | female
  5 | f7f6c7f939202806ebe4cdda0821b1f8 | 7a451c9e30238af75001423f17f300ac | male
  6 | 37f9869968840feaf441f963e2171254 | 6a06f56a6e66ee457ddc547b7df20de4 | female
  7 | e2c4c7cb13dd5f2511d3402d36cb4884 | 7cc3e36824f369b88a3445b5dc2b36d4 | female
  8 | aea96778de92539cc9b80864a969c82b | 92517c35f9f1baf0ed831eab4c23757e | male
  9 | 363547e325516d28f96b69a8e88deb44 | 4e6cadf5f26aacca0f8c03c115825fbe | female
 10 | 340d4e6b3968e80c2fb02704a30fc574 | e0b7b41ae5397ae231c783e4e5d9d848 | female
 11 | 3c8e1102dc3c9551be2fb40e4dbc0bc7 | d20d0155652a23c327734fb0e0127247 | female
 12 | cc70f3378a9364603023924670b8bb0d | ac16e18018a438445e12492bf018cf0f | male
 13 | 78cecc8caaf5c303097eb3816a411e79 | a0a19a4b5b595562fc8b978520401166 | female
(12 rows)

postgres=# select * from users limit 10 offset 1;
 id |          hash_firstname          |          hash_lastname           | gender
----+----------------------------------+----------------------------------+--------
  2 | f5e8f62332c256ebdceea3050a1da595 | 5d6c111b409bd97b5b8e5357668564d7 | male
  3 | ff82b6f31f4b4a7647ded507a599147a | cb9a959c8e42ee8f1de33c50c15b2267 | male
  4 | a643dd92e5a999e9372d73373cceb407 | 68ae1a597b5b04daa3f029573dde4fb9 | female
  5 | f7f6c7f939202806ebe4cdda0821b1f8 | 7a451c9e30238af75001423f17f300ac | male
  6 | 37f9869968840feaf441f963e2171254 | 6a06f56a6e66ee457ddc547b7df20de4 | female
  7 | e2c4c7cb13dd5f2511d3402d36cb4884 | 7cc3e36824f369b88a3445b5dc2b36d4 | female
  8 | aea96778de92539cc9b80864a969c82b | 92517c35f9f1baf0ed831eab4c23757e | male
  9 | 363547e325516d28f96b69a8e88deb44 | 4e6cadf5f26aacca0f8c03c115825fbe | female
 10 | 340d4e6b3968e80c2fb02704a30fc574 | e0b7b41ae5397ae231c783e4e5d9d848 | female
 11 | 3c8e1102dc3c9551be2fb40e4dbc0bc7 | d20d0155652a23c327734fb0e0127247 | female
(10 rows)

postgres=# select * from users offset 1 limit 10 ;
 id |          hash_firstname          |          hash_lastname           | gender
----+----------------------------------+----------------------------------+--------
  2 | f5e8f62332c256ebdceea3050a1da595 | 5d6c111b409bd97b5b8e5357668564d7 | male
  3 | ff82b6f31f4b4a7647ded507a599147a | cb9a959c8e42ee8f1de33c50c15b2267 | male
  4 | a643dd92e5a999e9372d73373cceb407 | 68ae1a597b5b04daa3f029573dde4fb9 | female
  5 | f7f6c7f939202806ebe4cdda0821b1f8 | 7a451c9e30238af75001423f17f300ac | male
  6 | 37f9869968840feaf441f963e2171254 | 6a06f56a6e66ee457ddc547b7df20de4 | female
  7 | e2c4c7cb13dd5f2511d3402d36cb4884 | 7cc3e36824f369b88a3445b5dc2b36d4 | female
  8 | aea96778de92539cc9b80864a969c82b | 92517c35f9f1baf0ed831eab4c23757e | male
  9 | 363547e325516d28f96b69a8e88deb44 | 4e6cadf5f26aacca0f8c03c115825fbe | female
 10 | 340d4e6b3968e80c2fb02704a30fc574 | e0b7b41ae5397ae231c783e4e5d9d848 | female
 11 | 3c8e1102dc3c9551be2fb40e4dbc0bc7 | d20d0155652a23c327734fb0e0127247 | female
(10 rows)

Using OFFSET before LIMIT should also return 10 rows.

@ming535
Contributor Author

ming535 commented May 30, 2022

In pg, the order of "offset" and "limit" doesn't matter. It always means "OFFSET rows are skipped before starting to count the LIMIT rows that are returned".

In the logical plan tree, the order of "offset" and "limit" does matter.

This PR makes several "assumptions":

  1. The builder API should be independent of the SQL language; it only describes the logical tree structure.
  2. When users construct a plan tree with the builder API, if "limit" is the parent of "offset", the semantics are to skip some rows first; otherwise, the semantics are to count the rows first.
  3. When translating SQL to a logical plan in the planner, we make "limit" the parent of "offset" to be compatible with pg.

@ming535
Contributor Author

ming535 commented May 30, 2022

For the plan builder API, I like the way Calcite does it (https://calcite.apache.org/javadocAggregate/org/apache/calcite/tools/RelBuilder.html#limit(int,int)),
which combines "limit" and "offset".

@alamb
Contributor

alamb commented May 31, 2022

I plan to review this later today or tomorrow if no one else has a chance to do so before then.

@Ted-Jiang
Member

I plan to review this later today or tomorrow if no one else has a chance to do so before

@alamb I am reviewing this one; I will finish this evening.

Member

@Ted-Jiang Ted-Jiang left a comment


LGTM, thanks @ming535 👍
I ran this in pg:

explain VERBOSE  select id from users limit 10 offset 1;
                               QUERY PLAN
------------------------------------------------------------------------
 Limit  (cost=0.02..0.26 rows=10 width=8)
   Output: id
   ->  Seq Scan on public.users  (cost=0.00..239.02 rows=10002 width=8)
         Output: id
(4 rows)

For api of plan builder, I like the way Calcite is doing https://calcite.apache.org/javadocAggregate/org/apache/calcite/tools/RelBuilder.html#limit(int,int) which combines "limit" and "offset".

Like @ming535 said, I think combining "limit" and "offset" is a good way to avoid creating a new plan struct. @alamb, we need your opinion on this.

/// it is updated to (None, min(n1, n2))).
/// 2. Ancestor_Limit(n1) -> .. -> Current_Offset(m1)
/// it is updated to (m1, n1 + m1).
/// 3. Ancestor_Offset(m1) -> .. -> Current_Offset(m2)
Member

I think two OFFSETs should not be allowed without a subquery.

Contributor Author

I think without subquery, it should not allowed two offset.

I think the SQL parser should forbid this (two OFFSETs without a subquery). As for building the plan tree with the builder API, I think it is fine that it is not aware of this, as long as the semantics look right.
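The doc comment quoted earlier in this review describes how an ancestor (offset, limit) pair is updated as the rule walks down the plan. A sketch of the first two rules (illustrative names, not the PR's actual code; the third rule is cut off in the quoted fragment and omitted here):

```rust
#[derive(Debug, Clone, Copy)]
enum Node {
    Limit(usize),
    Offset(usize),
}

/// Returns the updated (offset, limit) pair, or None for combinations
/// not covered by the two quoted rules.
fn combine(ancestor: Node, current: Node) -> Option<(Option<usize>, usize)> {
    match (ancestor, current) {
        // Rule 1: Ancestor_Limit(n1) -> Current_Limit(n2)
        // is updated to (None, min(n1, n2)).
        (Node::Limit(n1), Node::Limit(n2)) => Some((None, n1.min(n2))),
        // Rule 2: Ancestor_Limit(n1) -> Current_Offset(m1)
        // is updated to (m1, n1 + m1): the child must produce n1 + m1 rows
        // because m1 of them will be skipped.
        (Node::Limit(n1), Node::Offset(m1)) => Some((Some(m1), n1 + m1)),
        _ => None,
    }
}

fn main() {
    // Limit 1000 over Offset 10: the scan below must yield 1010 rows,
    // matching the `TableScan: test projection=None, limit=1010` test plan.
    assert_eq!(combine(Node::Limit(1000), Node::Offset(10)), Some((Some(10), 1010)));
    assert_eq!(combine(Node::Limit(5), Node::Limit(3)), Some((None, 3)));
}
```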

@ming535 ming535 force-pushed the fix-limit-pushdown branch from 14b7cf1 to 59d9b1f Compare June 1, 2022 14:13
@andygrove
Member

Yes. The current design of plan builder tries to mimics the semantics of SQL language; this PR tries to mimics the tree structure of logical operator.

We can add another api for the builder to combines limit and offset to mimics SQL language. For example: fetch(offset: Option<usize>, limit: Option<usize>)

I wonder if we should consider renaming the DataFrame / LogicalPlanBuilder methods. The terms limit and offset are very SQL-specific and not very intuitive here, which I think adds to the confusion about how they might be interpreted. Perhaps we should consider renaming offset to drop, for example. That would be closer to pandas, based on a quick look at the documentation.

@andygrove
Member

I also like the idea of combining limit and offset in one DataFrame / builder method.

Member

@andygrove andygrove left a comment


LGTM based on the current tests. I left some opinions on possible changes to the builder API.

@alamb alamb changed the title Fix limit pushdown Fix limit + offset pushdown Jun 1, 2022
Contributor

@alamb alamb left a comment


I went through the plan and code changes carefully. I think it looks very nice 👌 @ming535 Thank you very much.

Comment on lines +78 to +80
ancestor: Ancestor,
ancestor_offset: Option<usize>,
ancestor_limit: Option<usize>,
Contributor


If you wanted to be fancy / more idiomatic Rust and have the compiler check more of the invariants, you could consider encoding the ancestor, offset, and limit together.

Something like

enum Ancestor {
    /// Limit
    FromLimit(usize),
    /// Offset
    FromOffset(usize),
    /// Other nodes that don't affect the adjustment of "Limit"
    NotRelevant,
}
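For instance, an exhaustive match over such an enum forces every new ancestor kind to be handled explicitly. Here `required_rows` is a hypothetical helper, not code from this PR:

```rust
enum Ancestor {
    /// Limit
    FromLimit(usize),
    /// Offset
    FromOffset(usize),
    /// Other nodes that don't affect the adjustment of "Limit"
    NotRelevant,
}

/// Hypothetical helper: the number of rows the child must produce when the
/// current node skips `skip` rows under the given ancestor constraint.
fn required_rows(ancestor: &Ancestor, skip: usize) -> Option<usize> {
    match ancestor {
        // A Limit(n) above means the child must yield n + skip rows.
        Ancestor::FromLimit(n) => Some(n + skip),
        // An Offset above, or no relevant ancestor, places no upper bound.
        Ancestor::FromOffset(_) | Ancestor::NotRelevant => None,
    }
}

fn main() {
    assert_eq!(required_rows(&Ancestor::FromLimit(1000), 10), Some(1010));
    assert_eq!(required_rows(&Ancestor::NotRelevant, 10), None);
}
```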


let expected = "Offset: 10\
\n Limit: 1010\
\n Limit: 1000\
Contributor


This plan change is due to the change in semantics of LogicalPlanBuilder::offset(), right?

let expected = "Limit: 1000\
\n Offset: 10\
\n Projection: #test.a\
\n TableScan: test projection=None, limit=1010";
Contributor


👍

@alamb
Contributor

alamb commented Jun 1, 2022

Like @ming535 said, I think combines "limit" and "offset" them is a good way, avoid create a new plan struct. @alamb need your opinion on this

I agree 👍

@alamb alamb merged commit 45a975d into apache:master Jun 1, 2022
@alamb
Contributor

alamb commented Jun 1, 2022

I merged this PR as it has three reviewers and approvers -- I think any additional work can be done as follow on PRs.

Thanks again @ming535 @Ted-Jiang and @andygrove 🏅

gandronchik pushed a commit to cube-js/arrow-datafusion that referenced this pull request Aug 30, 2022
gandronchik pushed a commit to cube-js/arrow-datafusion that referenced this pull request Aug 31, 2022
gandronchik pushed a commit to cube-js/arrow-datafusion that referenced this pull request Sep 2, 2022

Development

Successfully merging this pull request may close these issues.

limit_push_down is not working properly with OFFSET
