[Bug](nereids) distributionPrune doesn't slice nereidsPrunedTabletIds per partition

### Search before asking

- [x] I had searched in the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues.


### Version

4.x

### What's Wrong?

  `OlapScanNode.distributionPrune` returns the entire `nereidsPrunedTabletIds` set whenever the query goes through Nereids. The caller `OlapScanNode.computeTabletInfo` invokes it inside a per-partition loop:

  ```java
  // fe-core/src/main/java/org/apache/doris/planner/OlapScanNode.java
  for (Long partitionId : selectedPartitionIds) {
      final MaterializedIndex selectedTable = partition.getIndex(selectedIndexId);
      Collection<Long> prunedTabletIds = distributionPrune(..., true);
      //   ↑ returns the GLOBAL nereidsPrunedTabletIds set, copied into a new ArrayList,
      //     not just tablets belonging to this partition

      if (prunedTabletIds != null) {
          for (Long id : prunedTabletIds) {
              if (selectedTable.getTablet(id) != null) {       // hash lookup
                  tablets.add(selectedTable.getTablet(id));    // hash lookup again
                  scanTabletIds.add(id);
              }
              // else: id belongs to another partition; result is null and discarded
          }
      }
  }
  ```

  Because `nereidsPrunedTabletIds` is the union of **all** selected tablets across **all** partitions (set in `PhysicalPlanTranslator.visitPhysicalOlapScan`):

  ```java
  olapScanNode.setNereidsPrunedTabletIds(
      new LinkedHashSet<>(olapScan.getSelectedTabletIds()));
  ```

  each per-partition iteration walks the entire global set and does a `MaterializedIndex.getTablet(id)` HashMap lookup on ids that belong to other partitions. The vast majority of these lookups return `null` and are discarded, but the hash work is already paid.


  #### Complexity

  Let `P = selectedPartitionIds.size()` and `M = nereidsPrunedTabletIds.size()`.

  - `getTablet` calls: **O(P × M × 2)** instead of O(M × 2)
  - HashSet → ArrayList copies inside `distributionPrune`: **O(P × M)** elements

  The regression is most pronounced on tables whose distribution layout is **many partitions × few buckets per partition** (e.g. fine-grained hourly partitioning with 2 buckets per partition), because then `M` ≈ `P × bucketsPerPartition`, and the total `getTablet` calls grow as **O(P² × bucketsPerPartition)**.

  #### Concrete example

  A range-partitioned table with **hourly partitions** and **2 buckets per partition**, queried with a one-year time range and a bucket-key equality predicate that prunes one of the two buckets per partition:

  - `P` ≈ 24 × 365 ≈ 8760 partitions
  - `M` ≈ 8760 (one tablet per partition selected after bucket pruning)
  - Total `getTablet` lookups: ≈ 76 million
  - HashSet copies: ≈ 76M element copies

  CPU profiling of the FE plan thread:

  ```
  OlapScanNode.computeTabletInfo                      83%
    ├─ MaterializedIndex.getTablet -> HashMap.getNode 54%
    ├─ ArrayList.<init> -> HashSet.toArray            17%
    └─ OlapScanNode.addScanRangeLocations              2%
  ```

  In practice this means a query that planned in ~0.1 s before #53403 now spends ~2.7 s in `Nereids Translate Time` for the same SQL on the same data.


### What You Expected?

  `distributionPrune` under the Nereids path should only return tablets belonging to the partition currently being processed, matching the semantics of the non-Nereids `HashDistributionPruner` path. Plan time should not depend quadratically on partition count.

### How to Reproduce?

  1. Create a range-partitioned table with **many partitions** and **few buckets per partition**, e.g.:

     ```sql
     CREATE TABLE t (
         event_time DATETIME,
         user_id    BIGINT,
         v          BIGINT
     )
     DUPLICATE KEY(event_time, user_id)
     PARTITION BY RANGE(event_time) (...)              -- hourly, ~8000 partitions
     DISTRIBUTED BY HASH(user_id) BUCKETS 2;
     ```

  2. Run a query that selects most partitions plus a bucket-key predicate that prunes most of the buckets per partition:

     ```sql
     SET enable_profile = true;
     EXPLAIN SELECT * FROM t
       WHERE event_time >= '2025-01-01' AND event_time < '2026-01-01'
         AND user_id = 12345;
     ```

  3. `SHOW QUERY PROFILE "/<query_id>"` and look at `Nereids Translate Time`. Compare against a build prior to #53403, or against an equivalent table whose partition count is small.

  4. Optional — attach an async-profiler CPU profile to the FE process during a stress run; the hottest stack will be `MaterializedIndex.getTablet -> HashMap.getNode` under `OlapScanNode.computeTabletInfo`.


### Anything Else?

_No response_

### Are you willing to submit PR?

- [x] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug](nereids) distributionPrune doesn't slice nereidsPrunedTabletIds per partition #63854

Search before asking

Version

What's Wrong?

Complexity

Concrete example

What You Expected?

How to Reproduce?

Anything Else?

Are you willing to submit PR?

Code of Conduct

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[Bug](nereids) distributionPrune doesn't slice nereidsPrunedTabletIds per partition #63854

Description

Search before asking

Version

What's Wrong?

Complexity

Concrete example

What You Expected?

How to Reproduce?

Anything Else?

Are you willing to submit PR?

Code of Conduct

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions