Skip to content

branch-4.1: [feature](fe) Push down limit into CTE producer (#63675)#64003

Open
CalvinKirs wants to merge 1 commit into
branch-4.1from
branch-4.1-63675
Open

branch-4.1: [feature](fe) Push down limit into CTE producer (#63675)#64003
CalvinKirs wants to merge 1 commit into
branch-4.1from
branch-4.1-63675

Conversation

@CalvinKirs
Copy link
Copy Markdown
Member

Master PR: #63675

This PR adds CTE producer-side limit pushdown in Nereids.

When all CTE consumers only need a bounded number of rows, the optimizer
collects the required row count from each consumer, takes
the maximum value, and pushes that limit into the CTE producer. The
original consumer-side limit is still kept.

  The rule only handles safe shapes:

  ```text
  LogicalLimit
    LogicalCTEConsumer
  ```

  ```text
  LogicalLimit
    LogicalProject
      LogicalCTEConsumer
  ```

  The project must be row-preserving.

  ## Scenarios

  ### 1. Direct Limit

  ```sql
  WITH cte AS (
      SELECT * FROM orders
  )
  SELECT * FROM cte
  LIMIT 10;
  ```

The consumer only needs 10 rows, so the CTE producer can produce at most
10 rows.

  ### 2. Project + Limit

  ```sql
  WITH cte AS (
      SELECT order_id, total_price, user_id FROM orders
  )
  SELECT order_id, total_price
  FROM cte
  LIMIT 10;
  ```

A normal project only prunes columns and does not change row count, so
the producer can still be limited to 10 rows.

  ### 3. Multiple Consumers + Limit

  ```sql
  WITH cte AS (
      SELECT * FROM orders
  )
  SELECT * FROM cte LIMIT 10
  UNION ALL
  SELECT * FROM cte LIMIT 20;
  ```

  For multiple CTE consumers, the producer limit is:

  ```text
  producerLimit = max(consumerLimit1, consumerLimit2, ...)
  ```

  In this case, the pushed producer limit is 20.

  If any consumer needs full CTE data, pushdown is skipped:

  ```sql
  WITH cte AS (
      SELECT * FROM orders
  )
  SELECT * FROM cte LIMIT 10
  UNION ALL
  SELECT * FROM cte;
  ```

  ### 4. Limit + Offset

  ```sql
  WITH cte AS (
      SELECT * FROM orders
  )
  SELECT * FROM cte
  LIMIT 10 OFFSET 100;
  ```

The consumer needs to skip 100 rows and then return 10 rows, so the
producer must provide at least 110 rows.

  The producer side only truncates rows and does not apply offset:

  ```text
  producerLimit = limit + offset
  producerOffset = 0
  ```

  ### 5. SplitLimit

  ```sql
  WITH cte AS (
      SELECT * FROM orders
  )
  SELECT * FROM cte
  LIMIT 10 OFFSET 100;
  ```

Doris may split this into local/global limits. The local limit closest
to the CTE consumer already represents `limit + offset`.

The collector uses the local limit value directly and does not add
offset again.

  ### 6. Filter + Limit Is Not Matched

  ```sql
  WITH cte AS (
      SELECT * FROM orders
  )
  SELECT * FROM cte
  WHERE order_id > 10
  LIMIT 10;
  ```

Filter can reduce rows before limit, so the producer may need more than
10 input rows. This rule does not push limit through filter.

  ### 7. TopN Is Not Matched

  ```sql
  WITH cte AS (
      SELECT * FROM orders
  )
  SELECT * FROM cte
  ORDER BY order_id
  LIMIT 10;
  ```

`ORDER BY ... LIMIT` is TopN. It needs the first N rows after ordering,
so it cannot be treated as a normal limit.

  ### 8. Join / Aggregate / Window / Sort Are Not Matched

  ```sql
  WITH cte AS (
      SELECT * FROM orders
  )
  SELECT *
  FROM cte JOIN users ON cte.user_id = users.user_id
  LIMIT 10;
  ```

  ```sql
  WITH cte AS (
      SELECT * FROM orders
  )
  SELECT user_id, COUNT(*)
  FROM cte
  GROUP BY user_id
  LIMIT 10;
  ```

  ```sql
  WITH cte AS (
      SELECT * FROM orders
  )
  SELECT *
  FROM (
      SELECT order_id, ROW_NUMBER() OVER (ORDER BY order_id) AS rn
      FROM cte
  ) t
  LIMIT 10;
  ```

  ```sql
  WITH cte AS (
      SELECT * FROM orders
  )
  SELECT *
  FROM (
      SELECT * FROM cte ORDER BY order_id
  ) t
  LIMIT 10;
  ```

These operators can change row cardinality or ordering semantics. Unless
other rules have already rewritten the shape into `Limit ->
CTEConsumer` or `Limit -> Project -> CTEConsumer`, this collector skips
them.

(cherry picked from commit d898a1d)
@CalvinKirs CalvinKirs requested a review from yiguolei as a code owner June 2, 2026 07:10
@CalvinKirs
Copy link
Copy Markdown
Member Author

run buildall

1 similar comment
@CalvinKirs
Copy link
Copy Markdown
Member Author

run buildall

@hello-stephen
Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 77.78% (35/45) 🎉
Increment coverage report
Complete coverage report

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants