Skip to content

ORCA: align CBitSet vec_size for grouping-set bitsets#1754

Open
yjhjstz wants to merge 2 commits into
apache:mainfrom
yjhjstz:orca/grouping-sets-bitset-vec-size
Open

ORCA: align CBitSet vec_size for grouping-set bitsets#1754
yjhjstz wants to merge 2 commits into
apache:mainfrom
yjhjstz:orca/grouping-sets-bitset-vec-size

Conversation

@yjhjstz
Copy link
Copy Markdown
Member

@yjhjstz yjhjstz commented May 18, 2026

CreateGroupingSetsForRollup / Cube and the GROUPING_SET_EMPTY case in GetColumnAttnosForGroupBy were constructing their accumulator/seed CBitSets via the default ctor (vec_size = 256), then Union'ing in per-grouping-set bitsets built with vec_size = num_cols. CBitSet::Union just splices in any missing CBitSetLinks wholesale, so the accumulator ended up with a link at offset 0 (vec_size 256) plus a stray link at offset num_cols (vec_size num_cols) covering the high tleSortGroupRef. CBitSet::Get then computed the offset using the destination's m_vector_size = 256 and never consulted the stray link, while CBitSetIter happily walked both -- so Get(k) disagreed with the iterator for any k >= num_cols.

In CreateDXLProjectNullsForGroupingSets this caused tleSortGroupRefs >= num_cols to be misclassified as non-grouping columns and NULL'd out, even in grouping sets that included them. Visible as:

select generate_series(1, a) g, a+b ab
from (values (1,1),(2,2)) t(a,b)
group by rollup(a, ab) order by 1,2;

returning 0 rows instead of 6 -- the rollup(a, ab) branch projected the a column as NULL, so generate_series(1, NULL) produced no rows.

Fix by passing num_cols when constructing the accumulator and seed bitsets so all participants in the Union share m_vector_size.

Add the repro to groupingsets.sql.

Fixes #1753

What does this PR do?

Type of Change

  • Bug fix (non-breaking change)
  • New feature (non-breaking change)
  • Breaking change (fix or feature with breaking changes)
  • Documentation update

Breaking Changes

Test Plan

  • Unit tests added/updated
  • Integration tests added/updated
  • Passed make installcheck
  • Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

Additional Context

CI Skip Instructions


@yjhjstz yjhjstz force-pushed the orca/grouping-sets-bitset-vec-size branch from ae9752e to 6010360 Compare May 18, 2026 16:30
@leborchuk leborchuk self-requested a review May 20, 2026 10:59
Comment thread src/test/regress/expected/groupingsets_optimizer.out
Comment thread src/backend/gpopt/translate/CTranslatorUtils.cpp
yjhjstz added a commit to yjhjstz/cloudberry that referenced this pull request May 20, 2026
Address review feedback on apache#1754:
- Add GPOS_ASSERT in CBitSet::Union/Intersection that destination and
  source share m_vector_size.  The bug fixed in 6010360 was a silent
  miscalculation: Union splices in the source's links verbatim, but
  FindLinkByOffset on the destination uses our own m_vector_size to
  compute offsets, so any link past the smaller vec_size becomes
  unreachable from Get() while still walkable by CBitSetIter.  Catch
  this class of misuse in debug builds rather than letting it surface
  later as wrong results.
- Add a cube() variant of the rollup() repro to groupingsets.sql.
  Before the fix, cube(a, ab) also returned 0 rows for the same
  reason; this guards against regressions in the cube path.
@yjhjstz yjhjstz force-pushed the orca/grouping-sets-bitset-vec-size branch from 6cf1c77 to db8c4e9 Compare May 20, 2026 16:39
yjhjstz added 2 commits May 21, 2026 00:40
CreateGroupingSetsForRollup / Cube and the GROUPING_SET_EMPTY case in
GetColumnAttnosForGroupBy were constructing their accumulator/seed
CBitSets via the default ctor (vec_size = 256), then Union'ing in
per-grouping-set bitsets built with vec_size = num_cols.  CBitSet::Union
just splices in any missing CBitSetLinks wholesale, so the accumulator
ended up with a link at offset 0 (vec_size 256) plus a stray link at
offset num_cols (vec_size num_cols) covering the high tleSortGroupRef.
CBitSet::Get then computed the offset using the destination's
m_vector_size = 256 and never consulted the stray link, while
CBitSetIter happily walked both -- so Get(k) disagreed with the iterator
for any k >= num_cols.

In CreateDXLProjectNullsForGroupingSets this caused tleSortGroupRefs >=
num_cols to be misclassified as non-grouping columns and NULL'd out,
even in grouping sets that included them.  Visible as:

  select generate_series(1, a) g, a+b ab
    from (values (1,1),(2,2)) t(a,b)
    group by rollup(a, ab) order by 1,2;

returning 0 rows instead of 6 -- the rollup(a, ab) branch projected the
a column as NULL, so generate_series(1, NULL) produced no rows.

Fix by passing num_cols when constructing the accumulator and seed
bitsets so all participants in the Union share m_vector_size.

Add the repro to groupingsets.sql.
@yjhjstz yjhjstz force-pushed the orca/grouping-sets-bitset-vec-size branch from db8c4e9 to db6a19f Compare May 20, 2026 16:40
Copy link
Copy Markdown
Contributor

@leborchuk leborchuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Wrong result on ROLLUP/CUBE when a SortGroupRef exceeds num_cols

2 participants