
Upgrade Apache Calcite from 1.40.0 to 1.42.0#18658

Merged
yashmayya merged 1 commit into
apache:masterfrom
yashmayya:calcite-1.42.0-upgrade
Jun 5, 2026


@yashmayya
Contributor

@yashmayya yashmayya commented Jun 2, 2026

Summary

Upgrades Apache Calcite from 1.40.0 to 1.42.0. Pinot's master was never moved to 1.41, so this folds in both the 1.40→1.41 and 1.41→1.42 deltas in a single bump.

The bulk of the upgrade is a faithful re-sync of Pinot's customized SQL parser grammar to upstream 1.42, plus a handful of targeted workarounds for behavioral changes Calcite introduced across these two releases. No public Pinot API or wire/segment format changes.

Changes

Dependency

  • pom.xml: calcite.version → 1.42.0 (covers calcite-core and calcite-babel).
  • Pinned joou-java-6 to 0.9.5 in dependencyManagement to resolve a dependency-convergence conflict: calcite-core 1.42 pulls joou-java-6 0.9.5 while the transitive avatica-core 1.28.0 still wants 0.9.4.
  • LICENSE-binary: bumped the Calcite/Avatica entries to 1.42.0 / 1.28.0 (calcite-core, calcite-babel, calcite-linq4j, avatica-core, avatica-metrics) and added org.jooq:joou-java-6:0.9.5 — matching the binary distribution's DEPENDENCIES manifest.

SQL parser codegen sync (pinot-common/src/main/codegen)

  • Re-synced templates/Parser.jj, config.fmpp, and default_config.fmpp to upstream Calcite 1.42, preserving every PINOT CUSTOMIZATION region.
  • The new babel feature flags introduced upstream — includeStarExclude (SELECT * EXCLUDE/REPLACE), includeSelectBy (SELECT ... BY), and includeIntervalWithoutQualifier — are intentionally kept OFF. The grammar is synced but inactive: the multi-stage engine has no downstream support for these features, so enabling them would parse syntax the planner/runtime can't execute. They are wired through default_config.fmpp so a future change can flip them on deliberately.
  • UNSIGNED is added as a non-reserved keyword (needed for the unsigned integer types below).

Behavioral workarounds for 1.41/1.42 changes

  • CALCITE-7189 (non-strict GROUP BY): in 1.41+, BABEL enables MySQL-style non-strict GROUP BY (wrapping non-grouped columns in ANY_VALUE()), but the implementation throws a NullPointerException when a window function is combined with GROUP BY (e.g. SELECT MIN(col) OVER() FROM t GROUP BY col). The validator now uses a SqlDelegatingConformance over BABEL that overrides isNonStrictGroupBy() to return false. This is also the semantically correct behavior for Pinot, which requires every non-aggregated column to appear in GROUP BY. The feature is still present (not reverted) in 1.42, so the override is retained.
  • CALCITE-7379 (decorrelation type assertion): the upstream fix does not fully cover the correlated-subquery shapes Pinot produces — a post-decorrelation Litmus.THROW type assertion still fires for a nullability-only divergence. PinotRelDecorrelator (new, see below) relaxes that one assertion: it logs a warning when the row types differ only in nullability and continues, but still fails fast on any structural type change.
  • CALCITE-7351: RelDataTypeSystem#getMaxNumericScale/getMaxNumericPrecision became final. TypeSystem drops the now-illegal overrides; the equivalent behavior is preserved via the type-specific getMaxScale/getMaxPrecision(DECIMAL) overrides Pinot already defines.
  • Filtered MIN/MAX nullability: 1.42 exposes SqlOperatorBinding#hasEmptyGroup(). PinotMinMaxReturnTypeInference now also treats a possibly-empty group as nullable (alongside the existing getGroupCount() == 0 / hasFilter() checks), matching the runtime's null-on-empty behavior for MIN(x) FILTER (WHERE ...).
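The CALCITE-7189 override follows a delegating-wrapper pattern: forward every conformance flag to BABEL and change only the one that misbehaves. A minimal sketch of that pattern follows. It uses hypothetical stand-in types (Conformance, BabelLikeConformance, DelegatingConformance, StrictGroupByConformance) instead of Calcite's real SqlConformance and SqlDelegatingConformance so that it compiles on its own; only two flags are modeled, and isLiberal() is included just to show that untouched flags still come from the delegate.

```java
// Hypothetical stand-in for Calcite's SqlConformance; only two flags are modeled.
interface Conformance {
  boolean isLiberal();

  boolean isNonStrictGroupBy();
}

// Hypothetical stand-in for BABEL as of 1.41+: liberal, with non-strict GROUP BY enabled.
final class BabelLikeConformance implements Conformance {
  @Override
  public boolean isLiberal() {
    return true;
  }

  @Override
  public boolean isNonStrictGroupBy() {
    return true;
  }
}

// Forwards every flag to the wrapped conformance; subclasses override only what they change.
class DelegatingConformance implements Conformance {
  private final Conformance delegate;

  DelegatingConformance(Conformance delegate) {
    this.delegate = delegate;
  }

  @Override
  public boolean isLiberal() {
    return delegate.isLiberal();
  }

  @Override
  public boolean isNonStrictGroupBy() {
    return delegate.isNonStrictGroupBy();
  }
}

// Behaves like the delegate in every respect except GROUP BY, which stays strict.
final class StrictGroupByConformance extends DelegatingConformance {
  StrictGroupByConformance(Conformance delegate) {
    super(delegate);
  }

  @Override
  public boolean isNonStrictGroupBy() {
    return false;
  }
}
```

With `new StrictGroupByConformance(new BabelLikeConformance())`, isLiberal() still returns true while isNonStrictGroupBy() returns false, so in this model the validator never enters the non-strict GROUP BY path.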

Unsigned integer types (CALCITE-1466)

BABEL now parses UTINYINT/USMALLINT/UINTEGER/UBIGINT. Pinot has no native unsigned storage, so each is mapped to the narrowest signed type that holds its full range without loss — and UBIGINT (BIGINT UNSIGNED), which has no such type, is rejected rather than silently wrapping:

  • UTINYINT/USMALLINT → INT, UINTEGER → LONG (a signed INT would wrap UINTEGER values of 2³¹ and above). The mapping is applied in RelToPlanNodeConverter / v2 PRelToPlanNodeConverter (convertToColumnDataType), the single-stage DataTypeConversionFunctions.cast, TypeSystem.deriveSumType (widens to signed BIGINT so SUM doesn't overflow a 32-bit INT), and ArithmeticFunctionUtils.normalizeNumericType (keeps arithmetic integral instead of widening to DOUBLE).
  • UBIGINT is rejected at planning (convertToColumnDataType throws): its 0..2⁶⁴−1 range exceeds signed LONG (2⁶³−1), so mapping it to LONG would silently wrap values above Long.MAX_VALUE into negatives — a silent wrong result. Failing fast (with a clear message suggesting CAST … AS BIGINT/DECIMAL) is safer; UBIGINT was a parse error pre-1.41 anyway, and only ever arises from an explicit CAST(… AS BIGINT UNSIGNED). (Per review feedback from @xiangfu0.)
  • PinotEvaluateLiteralRule folds a constant unsigned cast into its signed-equivalent type by delegating to convertToColumnDataType — so the representable types fold, and a UBIGINT literal cast is rejected on the same path.
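A minimal sketch of the mapping rule above, assuming simplified hypothetical enums (SqlIntType, PinotColumnType) in place of Calcite's SqlTypeName and Pinot's ColumnDataType, with a single switch standing in for the several real dispatch sites. The exception text is illustrative, not Pinot's actual message.

```java
// Hypothetical stand-in for the SQL integer types involved (real code dispatches on SqlTypeName).
enum SqlIntType { TINYINT, SMALLINT, INTEGER, BIGINT, UTINYINT, USMALLINT, UINTEGER, UBIGINT }

// Hypothetical stand-in for Pinot's signed integral column types.
enum PinotColumnType { INT, LONG }

final class UnsignedTypeMapper {
  private UnsignedTypeMapper() {
  }

  // Maps to the narrowest signed type that holds the full range, or rejects when none does.
  static PinotColumnType toColumnType(SqlIntType type) {
    switch (type) {
      case TINYINT:
      case SMALLINT:
      case INTEGER:
      case UTINYINT:  // 0..255 fits in INT
      case USMALLINT: // 0..65535 fits in INT
        return PinotColumnType.INT;
      case BIGINT:
      case UINTEGER:  // 0..2^32-1 exceeds INT (max 2^31-1), so widen to LONG
        return PinotColumnType.LONG;
      case UBIGINT:   // 0..2^64-1 exceeds LONG (max 2^63-1): no lossless signed type exists
        throw new IllegalArgumentException(
            "Illustrative message: BIGINT UNSIGNED cannot be represented losslessly; CAST to BIGINT or DECIMAL");
      default:
        throw new IllegalStateException("Unexpected type: " + type);
    }
  }
}
```

Each unsigned type goes to the first signed type whose maximum is at least the unsigned maximum; UBIGINT ends in an exception because no such type exists, which is the fail-fast behavior described above.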

New class

  • org.apache.pinot.calcite.sql2rel.PinotRelDecorrelator — a minimal subclass of Calcite's RelDecorrelator that exists solely to relax the CALCITE-7379 assertion described above. It lives under the org.apache.calcite.sql2rel package because the relevant members (CorelMap, decorrelate, etc.) are package/protected-subclass visible in Calcite.
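The decision PinotRelDecorrelator makes can be sketched as a three-way classification of the row type before and after decorrelation. This is a simplified model under stated assumptions: a row type is a hypothetical list of Field(typeName, nullable) entries, and whether field names participate in the real comparison is not modeled. It illustrates the rule described above (warn and continue on a nullability-only difference, fail fast on anything structural), not Pinot's actual code.

```java
import java.util.List;
import java.util.logging.Logger;

// Hypothetical, simplified field of a row type: a SQL type name plus a nullability flag.
final class Field {
  final String typeName;
  final boolean nullable;

  Field(String typeName, boolean nullable) {
    this.typeName = typeName;
    this.nullable = nullable;
  }
}

enum Divergence { NONE, NULLABILITY_ONLY, STRUCTURAL }

final class DecorrelationTypeCheck {
  private static final Logger LOG = Logger.getLogger(DecorrelationTypeCheck.class.getName());

  private DecorrelationTypeCheck() {
  }

  // Structural if the field count or any field's type differs; nullability-only if just the flags differ.
  static Divergence classify(List<Field> before, List<Field> after) {
    if (before.size() != after.size()) {
      return Divergence.STRUCTURAL;
    }
    boolean nullabilityDiffers = false;
    for (int i = 0; i < before.size(); i++) {
      Field b = before.get(i);
      Field a = after.get(i);
      if (!b.typeName.equals(a.typeName)) {
        return Divergence.STRUCTURAL;
      }
      if (b.nullable != a.nullable) {
        nullabilityDiffers = true;
      }
    }
    return nullabilityDiffers ? Divergence.NULLABILITY_ONLY : Divergence.NONE;
  }

  // Warns and continues on a nullability-only divergence; fails fast on a structural one.
  static void check(List<Field> before, List<Field> after) {
    Divergence divergence = classify(before, after);
    if (divergence == Divergence.NULLABILITY_ONLY) {
      LOG.warning("Row type changed only in nullability during decorrelation; continuing");
    } else if (divergence == Divergence.STRUCTURAL) {
      throw new IllegalStateException("Row type changed structurally during decorrelation");
    }
  }
}
```

Keeping classify() separate from check() mirrors the PR's note that the structural-vs-nullability decision is extracted so it can be unit-tested on its own.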

Testing & validation

  • pinot-query-planner unit suite: 1262/1262 pass.
  • pinot-query-runtime result-correctness vs H2 — ResourceBasedQueriesTest 3571 pass / 0 fail (6 pre-existing skips), QueryRunnerTest 130/130.
  • OfflineClusterIntegrationTest run locally; updated testQueryWithRepeatedColumnsV2 to reflect that 1.42 now accepts repeated columns in GROUP BY but still rejects ambiguous repeated columns in ORDER BY.
  • New/updated regression tests pin every workaround: filtered MIN/MAX nullability, the CALCITE-7379 decorrelation path (the structural-vs-nullability divergence decision is extracted and unit-tested), the non-strict-GROUP-BY-with-window NPE, unsigned-type cast acceptance, SUM/arithmetic return types over unsigned operands, and single-stage unsigned casts.
  • The 8 pinot-query-planner/src/test/resources/queries/*.json EXPLAIN-plan snapshots are mechanically regenerated (label/whitespace deltas from upstream rule changes), not hand-edited.

Behavior & compatibility notes

  • UNSIGNED casts: new accepted/rejected query surface (user-facing). As a consequence of CALCITE-1466, BABEL now parses CAST(x AS <type> UNSIGNED) on both engines. Pinot accepts the representable ones, mapping each to the narrowest lossless signed type (TINYINT/SMALLINT UNSIGNED → INT, INTEGER UNSIGNED → LONG), and rejects BIGINT UNSIGNED at planning with a clear IllegalArgumentException (no signed type holds its full range). Net: some ... UNSIGNED casts that were previously parse errors now succeed, and BIGINT UNSIGNED now produces a specific planning-time rejection. Worth a release note.
  • Repeated GROUP BY key now accepted (MSE): under the multi-stage engine, a query with a repeated grouping key (e.g. SELECT x, COUNT(*) FROM t GROUP BY x, x) previously failed validation and now succeeds — Calcite 1.42 de-duplicates the repeated key. This is a benign relaxation (no previously-working query breaks), but during a mixed-version rolling upgrade the same query is rejected by a 1.40 broker and accepted by a 1.42 broker. Repeated columns in ORDER BY are still rejected as ambiguous (covered by the updated testQueryWithRepeatedColumnsV2).
  • Plan shape: semi → inner join in a few IN/EXISTS shapes. The 1.41 subquery-decorrelation rework rewrites the outer semi-join to an inner join in a small number of nested IN/semi-join shapes (visible in the regenerated JoinPlans.json/PhysicalOptimizerPlans.json). This is a sound, plan-equivalent rewrite — Calcite only applies it where the join's right input is already distinct (e.g. fed by an aggregate) — and result-equivalence is covered by the H2-comparison suites (ResourceBasedQueriesTest) and the integration tests, which all pass.
  • New parseable bitwise infix operators (<<, &, ^). The faithful parser sync makes these parseable for the first time (upstream-stock 1.42 additions to BinaryRowOperator, mapping to SqlStdOperatorTable.BIT_LEFT_SHIFT/BITAND_OPERATOR/BITXOR_OPERATOR); previously they were parse errors. No operator wiring is added by this PR. End-to-end status differs by engine: in the multi-stage engine PinotOperatorTable is a curated allow-list that does not register them, so such expressions fail at operator resolution (validation) — same status as the other synced-but-inert 1.42 grammar. In the single-stage engine the canonical names of &/^ (bitand/bitxor) coincide with Pinot's existing bitAnd/bitXor scalar functions, so a & b / a ^ b may resolve to those (a MySQL-style infix alias), while << has no Pinot equivalent and errors. This v1/v2 divergence is an inherent consequence of faithfully syncing the upstream grammar; explicitly wiring up or rejecting these operators is left as a separate, deliberate decision.
  • Unsigned casts in the single-stage engine. Because the single-stage parser also uses BABEL, CAST(x AS INTEGER UNSIGNED) (and the other representable unsigned types) is now parseable in v1 too. DataTypeConversionFunctions.cast maps them to their signed equivalent (UTINYINT/USMALLINT → INT, UINTEGER → LONG), mirroring the multi-stage converter, and rejects BIGINT UNSIGNED (UBIGINT) with the same clear error. Covered by DataTypeConversionFunctionsTest#testCastToUnsignedTypes.
  • No config keys, SPI signatures, enum/DataType additions, JSON/Protobuf fields, or DataTable/segment-version changes — nothing else has mixed-version visibility.
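The single-stage behavior described for & and ^ comes down to name matching. A minimal sketch, assuming (hypothetically) that scalar functions are looked up by a canonical name formed by removing underscores and lowercasing, and using a tiny hypothetical function set. The operator name used for << is a placeholder, since its real SQL name is not stated here. Only the single-stage lookup is modeled: in the multi-stage engine the curated PinotOperatorTable registers none of the three, so they fail at validation regardless.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

final class InfixOperatorLookup {
  // Operator names for the newly parseable infix operators. BITAND/BITXOR follow the PR text;
  // BIT_LEFT_SHIFT is a placeholder for whatever name the << operator actually carries.
  static final Map<String, String> OPERATOR_NAMES =
      Map.of("&", "BITAND", "^", "BITXOR", "<<", "BIT_LEFT_SHIFT");

  // Hypothetical subset of registered single-stage scalar functions.
  static final Set<String> SCALAR_FUNCTIONS = Set.of("bitAnd", "bitOr", "bitXor");

  private InfixOperatorLookup() {
  }

  // Assumed canonical form: underscores removed, lowercased.
  static String canonicalize(String name) {
    return name.replace("_", "").toLowerCase(Locale.ROOT);
  }

  // Returns the scalar function an infix symbol would resolve to, if any.
  static Optional<String> resolve(String infixSymbol) {
    String operatorName = OPERATOR_NAMES.get(infixSymbol);
    if (operatorName == null) {
      return Optional.empty();
    }
    String key = canonicalize(operatorName);
    return SCALAR_FUNCTIONS.stream().filter(f -> canonicalize(f).equals(key)).findFirst();
  }
}
```

Under these assumptions, & and ^ land on bitAnd and bitXor purely because the canonical names collide, while << finds nothing, which is the v1/v2 divergence the note describes.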

Notes for reviewers

  • The synced-but-disabled grammar (EXCLUDE/REPLACE/SELECT-BY/bare-INTERVAL) is deliberately inert — included to keep Parser.jj a faithful upstream sync rather than a divergent fork. Flipping the three fmpp flags is a separate, future decision.
  • The upstream colon-path field-access grammar (AddOptionalColonPath/ColonBracketSegment) was also synced and is likewise inert under Pinot — but it is gated by the upstream conformance method SqlConformance.isColonFieldAccessAllowed() (which returns false for BABEL), not by an fmpp flag. These productions are byte-for-byte from upstream Calcite 1.42.0 (verified against calcite-1.42.0 Parser.jj), so they intentionally carry no PINOT CUSTOMIZATION markers.
  • All PINOT CUSTOMIZATION markers from the prior grammar are preserved.
  • BIGINT UNSIGNED (UBIGINT) is rejected rather than mapped to a lossy LONG (per @xiangfu0's review) — see the unsigned-types section. The representable unsigned types (TINYINT/SMALLINT/INTEGER UNSIGNED) map losslessly, so no unsigned value silently wraps.
  • Follow-up (not in this PR): the unsigned→signed handling touches several type-dispatch switches (the two converters, DataTypeConversionFunctions, ArithmeticFunctionUtils.normalizeNumericType, TypeSystem.deriveSumType). Each maps to a different target enum and uses case labels (which must be compile-time constants, so they can't delegate to a shared predicate), so the per-switch listing is largely inherent. The one genuinely-collapsible duplication is RelToPlanNodeConverter.convertToColumnDataType and the v2 PRelToPlanNodeConverter.convertToColumnDataType, which are byte-for-byte identical public static copies — a pre-existing smell this diff merely extends. Collapsing those two (have v2 delegate to v1) is worth a dedicated refactor; left out here to keep the diff scoped to the version bump.
  • The filtered MIN/MAX fix (hasEmptyGroup()) is a planning-time type-nullability correction — it lets the query validate under 1.42's stricter nullability checks. Pinot's DataSchema/ColumnDataType erases nullability, so this does not change runtime values; the empty-filtered-group → NULL runtime semantics are pre-existing and unchanged. The fix is covered by a compile-time regression test (testFilteredMinMaxAggregateNullability).
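The nullability rule for filtered MIN/MAX reduces to a disjunction. A minimal sketch with hypothetical names: the parameters stand in for the SqlOperatorBinding facts named above (getGroupCount(), hasFilter(), hasEmptyGroup()), and operandNullable, which models the usual rule that MIN/MAX over a nullable column is nullable, is an assumption not taken from the PR text.

```java
final class MinMaxNullability {
  private MinMaxNullability() {
  }

  // True if MIN/MAX may produce NULL, so its return type must be nullable.
  static boolean isResultNullable(
      boolean operandNullable, // assumption: the input column itself is nullable
      int groupCount,          // stands in for getGroupCount(); 0 means a global aggregate with no keys
      boolean hasFilter,       // stands in for hasFilter(): FILTER (WHERE ...) may remove every row
      boolean hasEmptyGroup) { // stands in for hasEmptyGroup(), exposed in Calcite 1.42
    return operandNullable || groupCount == 0 || hasFilter || hasEmptyGroup;
  }
}
```

The hasEmptyGroup branch is the addition described above; the other checks model the pre-existing ones. As the note says, this only affects planning-time types, since Pinot's DataSchema erases nullability.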

@yashmayya yashmayya added the dependencies Pull requests that update a dependency file label Jun 2, 2026
@yashmayya yashmayya force-pushed the calcite-1.42.0-upgrade branch from b8bf3a6 to 3cbe3eb June 2, 2026 21:14
@codecov-commenter

codecov-commenter commented Jun 2, 2026

Codecov Report

❌ Patch coverage is 97.29730% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 64.46%. Comparing base (f6b930b) to head (a8c9fe5).
⚠️ Report is 32 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...he/pinot/calcite/sql2rel/PinotRelDecorrelator.java | 95.23% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18658      +/-   ##
============================================
+ Coverage     64.45%   64.46%   +0.01%     
- Complexity     1282     1291       +9     
============================================
  Files          3352     3372      +20     
  Lines        207171   208583    +1412     
  Branches      32348    32573     +225     
============================================
+ Hits         133534   134465     +931     
- Misses        62910    63312     +402     
- Partials      10727    10806      +79     
| Flag | Coverage Δ |
|---|---|
| custom-integration1 | 100.00% <ø> (ø) |
| integration | 100.00% <ø> (ø) |
| integration1 | 100.00% <ø> (ø) |
| integration2 | 0.00% <ø> (ø) |
| java-21 | 64.46% <97.29%> (+0.01%) ⬆️ |
| temurin | 64.46% <97.29%> (+0.01%) ⬆️ |
| unittests | 64.46% <97.29%> (+0.01%) ⬆️ |
| unittests1 | 56.90% <97.29%> (+0.08%) ⬆️ |
| unittests2 | 37.08% <24.32%> (-0.08%) ⬇️ |


Contributor

@xiangfu0 xiangfu0 left a comment


Found 1 high-signal issue; see inline comment.

case UINTEGER:
// UBIGINT (0..2^64-1) has no wider signed type, so values above Long.MAX_VALUE wrap (two's-complement) - this is
// unavoidable and acceptable since Pinot has no unsigned storage type.
case UBIGINT:
Contributor


Accepting UBIGINT here turns BIGINT UNSIGNED into a lossy signed LONG. Any value above Long.MAX_VALUE will now silently wrap, so this is a wrong-result regression rather than a harmless type downgrade. Since Pinot cannot represent the full unsigned 64-bit range, this needs to fail validation/planning instead of being mapped to LONG.

Contributor Author


Good catch — agreed, and fixed. BIGINT UNSIGNED (UBIGINT) now fails fast at planning instead of being mapped to a lossy LONG: convertToColumnDataType throws a clear "Unsigned BIGINT is not supported … CAST to BIGINT or DECIMAL instead" error.

I kept the other unsigned types accepted, since those map losslessly and don't have the wrap problem: TINYINT/SMALLINT UNSIGNED → INT, INTEGER UNSIGNED → LONG. UBIGINT is the only one with no signed Pinot type wide enough for its full 0..2⁶⁴−1 range, so it's the one that has to be rejected.

The rejection is applied consistently across every site that touched the type — both (P)RelToPlanNodeConverter.convertToColumnDataType, the single-stage DataTypeConversionFunctions.cast, TypeSystem.deriveSumType, and ArithmeticFunctionUtils.normalizeNumericType — and is covered by regression tests (QueryCompilationTest#testUnsignedBigintCastIsRejected for both the column- and literal-cast planning paths, plus updated RelToPlanNodeConverterTest/PRelToPlanNodeConverterTest/DataTypeConversionFunctionsTest unit assertions). PR description updated too. Thanks!
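The range argument in this thread can be checked with plain JDK arithmetic. A small illustration, not Pinot code: Long.parseUnsignedLong reinterprets an unsigned 64-bit value as a signed long, which is exactly what a naive UBIGINT → LONG mapping would store, and a narrowing cast shows why INTEGER UNSIGNED needs LONG rather than INT.

```java
final class UnsignedRangeDemo {
  // Maximum UBIGINT value, 2^64 - 1.
  static final String UBIGINT_MAX = "18446744073709551615";

  // Maximum UINTEGER value, 2^32 - 1: fits in a signed long but not in a signed int.
  static final long UINTEGER_MAX = (1L << 32) - 1;

  private UnsignedRangeDemo() {
  }

  // Same 64 bits read as two's complement: what a lossy UBIGINT -> LONG mapping would store.
  static long storedAsSignedLong(String unsignedDecimal) {
    return Long.parseUnsignedLong(unsignedDecimal);
  }

  // What a lossy UINTEGER -> INT mapping would store.
  static int storedAsSignedInt(long unsignedIntValue) {
    return (int) unsignedIntValue;
  }
}
```

storedAsSignedLong(UBIGINT_MAX) returns -1 and storedAsSignedLong("9223372036854775808") (that is, 2^63) returns Long.MIN_VALUE, while UINTEGER_MAX survives intact as a long but becomes -1 as an int.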

Bump calcite-core/babel to 1.42.0 (folds in both the 1.40->1.41 and
1.41->1.42 deltas, since master was never moved to 1.41) and pin
joou-java-6 to 0.9.5 to resolve a dependency-convergence conflict
between calcite-core and the transitive avatica-core 1.28.0.

Sync the customized SQL parser (Parser.jj + the fmpp configs) to upstream
1.42, preserving all PINOT CUSTOMIZATION regions. The new babel feature
flags (includeStarExclude, includeSelectBy, includeIntervalWithoutQualifier)
are intentionally kept OFF: the grammar is synced but inactive, as the
multi-stage engine has no downstream support for those features yet. The
upstream colon-path field-access grammar was likewise synced and is inert
under Pinot's BABEL conformance (isColonFieldAccessAllowed() returns false) -
gated by conformance rather than an fmpp flag. The sync also makes the bitwise
infix operators '<<', '&' and infix '^' parseable (upstream-stock 1.42
additions); they are not registered in PinotOperatorTable, so they are not yet
supported end-to-end.

Handle 1.42 behavioral changes:
- CALCITE-7189: Validator disables non-strict GROUP BY (BABEL enables it
  in 1.41+, which NPEs for window functions combined with GROUP BY).
- CALCITE-7379: PinotRelDecorrelator relaxes the post-decorrelation type
  assertion for the nullability-only divergence that still fires on some
  Pinot correlated-subquery shapes; it fails fast on structural changes.
- CALCITE-7351: drop the now-final getMaxNumericScale/getMaxNumericPrecision
  overrides (they delegate to the type-specific overrides Pinot keeps).
- Filtered MIN/MAX are now nullable via SqlOperatorBinding.hasEmptyGroup().
- Unsigned integer types (CALCITE-1466) parse under BABEL. The representable
  ones are mapped to the narrowest lossless signed type (TINYINT/SMALLINT
  UNSIGNED -> INT, INTEGER UNSIGNED -> LONG) throughout (converters, literal
  folding, SUM return type, arithmetic normalization). BIGINT UNSIGNED (UBIGINT)
  has no signed type wide enough for its full 0..2^64-1 range, so it is rejected
  at planning rather than silently wrapping values above Long.MAX_VALUE.

Regenerate the EXPLAIN plan snapshots and update/extend the affected tests.
@yashmayya yashmayya force-pushed the calcite-1.42.0-upgrade branch from 3cbe3eb to a8c9fe5 June 4, 2026 01:08
@yashmayya yashmayya merged commit 9717ba6 into apache:master Jun 5, 2026
13 checks passed

Labels

dependencies Pull requests that update a dependency file
