[improvement](fe) Support hour-offset date_add/date_sub in MTMV partition refresh#62410
Closed
hakanuzum wants to merge 10 commits into apache:master
…tion rollup

### What problem does this PR solve?

Issue Number: close apache#62395

Related PR: None

Problem Summary: MTMV partition rollup only handled `date_trunc(partition_col, unit)`. It could not track partition lineage when the MV partition expression uses an hour-offset conversion such as `date_trunc(date_add(k2, INTERVAL 3 HOUR), 'day')` or `date_trunc(date_sub(k2, INTERVAL 3 HOUR), 'day')`, which is needed for UTC-to-local-day partitioning and incremental maintenance.

### Release note

Support MTMV partition rollup and partition increment check for `date_trunc(date_add/date_sub(partition_col, INTERVAL N HOUR), unit)`.

### Check List (For Author)

- Test: Regression test / Unit Test
    - Unit Test: `./run-fe-ut.sh --run org.apache.doris.mtmv.MTMVRelatedPartitionDescRollUpGeneratorTest`
    - Build: `MVN_OPT='-Dmaven.build.cache.enabled=false' ./build.sh --fe --clean -j$(nproc)`
    - Regression test: `./run-regression-test.sh --run -d mtmv_p0 -s test_rollup_partition_mtmv_date_add` (failed locally because the FE/BE test cluster was not started; connection refused)
- Behavior changed: Yes (MTMV partition expression now supports hour-offset `date_add/date_sub` lineage under `date_trunc`)
- Does this need documentation: No
Contributor
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
Author
run buildall
### What problem does this PR solve?

Issue Number: close apache#62395

Related PR: None

Problem Summary: Partition increment check walks all sub-expressions; datetime plans often wrap slots in `Cast` under `HoursAdd`/`HoursSub`. Without `Cast` in the allowed set, MTMV partition analysis failed with an invalid implicit expression error.

### Release note

None

### Check List (For Author)

- Test: No need to test (with reason)
    - Checkstyle verified: `mvn checkstyle:check -pl fe-core`
- Behavior changed: No
- Does this need documentation: No

Made-with: Cursor

### What problem does this PR solve?

Issue Number: apache#62395

Related PR: apache#62410

Problem Summary: MTMV partition refresh builds base-table predicates directly from MV partition bounds. When the MV partition expression is `date_trunc(date_add/sub(col, INTERVAL N HOUR), unit)`, the bounds must be shifted by the inverse hour offset; otherwise refresh can miss rows or fail with no-partition errors.

### Release note

Supports MTMV partition refresh when the partition expression includes `date_trunc(date_add/sub(..., INTERVAL N HOUR), ...)`.

### Check List (For Author)

- Test: Regression test
    - `./run-regression-test.sh --run -d mtmv_p0 -s test_rollup_partition_mtmv_date_add` (OrbStack Linux)
- Behavior changed: Yes (MV refresh predicate mapping for hour-offset partition expressions)
- Does this need documentation: No
Author
/review
Author
run buildall
Contributor
Findings:

`fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/UpdateMvByPartitionCommand.java` / `fe/fe-core/src/main/java/org/apache/doris/nereids/rules/exploration/mv/StructInfo.java`

The inverse-hour bound shift is only applied in `constructTableWithPredicates()` via `statementContext.setMvRefreshPredicates(...)`, but the optimizer-side union-compensation path still calls `constructPredicates(partitionHasDataItems, partitionSlot)` without the new offset handling in `PredicateAdder.visitLogicalCatalogRelation()`. That means the same `date_trunc(date_add/sub(col, INTERVAL N HOUR), ...)` MV can still build base-table filters from raw MV bounds during union compensation, reintroducing the exact off-by-offset partition mismatch this PR is trying to fix. A concrete path is `AbstractMaterializedViewRule -> StructInfo.addFilterOnTableScan -> PredicateAdder.visitLogicalCatalogRelation`, where partition names are converted back to predicates with unshifted literals.
Critical checkpoint conclusions:

- Goal of current task: Partially achieved. The PR adds support for hour-offset `date_add/date_sub` in partition analysis/rollup and fixes one refresh-predicate path, but it does not update the parallel optimizer filter path, so end-to-end correctness is still incomplete.
- Small, clear, focused modification: Mostly yes, but the behavior change is split across multiple MTMV paths and one equivalent path was missed.
- Concurrency: No new concurrency-sensitive logic identified.
- Lifecycle/static initialization: No special lifecycle or static-init risk introduced.
- Configuration changes: None.
- Compatibility/incompatible changes: No protocol or storage-format compatibility issue identified.
- Functionally parallel code paths: Not fully covered. The union-compensation path still uses old predicate construction semantics.
- Special conditional checks: The new `date_add/date_sub + HOUR` gating is clear, but equivalent consumers of the derived bounds need to share the same transformation.
- Test coverage: Improved for rollup and manual refresh scenarios, but missing coverage for the union-compensation / rewrite path that also synthesizes base-table predicates.
- Test result changes: No `.out` changes involved; added regression and unit tests look consistent with the implemented path.
- Observability: No additional observability appears necessary for this change.
- Transaction/persistence: No persistence or failover-sensitive changes identified.
- Data writes/modifications: This affects refresh/query correctness rather than storage writes; the remaining path inconsistency can still produce wrong row coverage.
- FE-BE variable passing: Not involved.
- Performance: No material performance concern from the added logic.
- Other issues: None beyond the missing parallel-path fix above.
Overall opinion: request changes until the offset adjustment is applied consistently anywhere MTMV partition names are translated back into base-table predicates.
### What problem does this PR solve?

Issue Number: N/A

Related PR: apache#62410

Problem Summary: MTMV partitioned by `date_trunc(date_add/sub(base_col, INTERVAL N HOUR), ...)` needs inverse-hour shifting when translating partition names back into base-table predicates. The refresh path applied the shift, but the optimizer union-compensation path reconstructed predicates without the offset, which could miss rows or produce duplicates.

### Release note

Fix MTMV union-compensation predicate bounds for hour-offset `date_add/date_sub` partition expressions.

### Check List (For Author)

- Test: No need to test (local environment lacks JDK 17)
- Behavior changed: Yes (query rewrite/union-compensation correctness)
- Does this need documentation: No

### What problem does this PR solve?

Issue Number:

Related PR:

Problem Summary: The regression case selected a raw BITMAP/VARBINARY column and compared unstable, driver-dependent bytes; compare only the `bitmap_to_string` output instead.

### Release note

None

### Check List (For Author)

- Test: No need to test (CI regression will cover)
- Behavior changed: No
- Does this need documentation: No

### What problem does this PR solve?

Issue Number:

Related PR: apache#62410

Problem Summary: License Check falls back to a full-repo scan when `git diff` fails under shallow fetch, causing unrelated pre-existing files to fail the PR. Fetch full history for the PR head and base to reliably generate the incremental config.

### Release note

None

### Check List (For Author)

- Test: No need to test (GitHub Actions)
- Behavior changed: No
- Does this need documentation: No
This reverts commit fe56305.
hakanuzum added a commit to hakanuzum/doris that referenced this pull request Apr 15, 2026
hakanuzum added a commit to hakanuzum/doris that referenced this pull request Apr 15, 2026
### What problem does this PR solve?

Issue Number: close #62395

Related PR: None

Problem Summary:

MTMV (Materialized View) partition rollup and incremental refresh logic previously only handled partition expressions of the form `date_trunc(partition_col, 'unit')`.

It could not recognize or process expressions where an hour-level offset is applied to the partition column before truncation, such as `date_trunc(date_add(k2, INTERVAL 3 HOUR), 'day')` or `date_trunc(date_sub(k2, INTERVAL 3 HOUR), 'day')`.

This pattern is widely used in production pipelines where raw data is stored in UTC but the MV must partition by local business-day boundaries (e.g., UTC+3 → `date_add(event_time, INTERVAL 3 HOUR)`). Without this support:

- `PartitionIncrementMaintainer` rejected these expressions as unsupported, blocking partition-aware query rewrite.

The root causes were:

- `MTMVPartitionExprFactory` always returned `MTMVPartitionExprDateTrunc` for any `date_trunc(...)` call, with no branch for the inner `date_add`/`date_sub` case.
- `PartitionIncrementChecker.SUPPORT_EXPRESSION_TYPES` did not include `HoursAdd`, `HoursSub`, or `Cast`, so any MV defined with these expressions was rejected during the increment check.

### What is changed and how does it work?
New class: `MTMVPartitionExprDateTruncDateAddSub`

Implements `MTMVPartitionExprService` for expressions of the form `date_trunc(date_add/date_sub(col, INTERVAL N HOUR), 'unit')`.

Key responsibilities:

- Parses the `date_trunc` outer function and extracts the truncation unit.
- Parses the inner `TimestampArithmeticExpr` (`date_add`/`date_sub`) and converts the hour offset to a signed `long` (`date_sub` → negative offset).
- `getRollUpIdentity()`: applies the offset to each partition value and truncates, returning a stable identity string for lineage mapping.
- `generateRollUpPartitionKeyDesc()`: computes `[lower, upper)` range bounds in the MV partition space, applying the offset before truncation and incrementing by the appropriate time unit (`day`, `week`, `month`, `year`, `quarter`, `hour`).
- `analyze()`: validates that the truncation unit is supported and that the base table uses RANGE partitioning with a DATE/DATETIME column.
- `toSql()`: regenerates the canonical SQL expression for the partition expression.

`MTMVPartitionExprFactory`: routing logic extended

Added a guard inside the `date_trunc` branch: if the first argument is a `TimestampArithmeticExpr` (i.e., `date_add`/`date_sub`), the factory now returns `MTMVPartitionExprDateTruncDateAddSub` instead of the plain `MTMVPartitionExprDateTrunc`.

`PartitionIncrementMaintainer.PartitionIncrementChecker`

Extended `SUPPORT_EXPRESSION_TYPES` to include `Cast`, `HoursAdd`, and `HoursSub`, allowing the increment checker to traverse and validate MV partition expressions that contain these node types.
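The allow-list check described above can be pictured with a small sketch. This is not the Nereids implementation: `AllowListSketch`, `Expr`, the string type tags, and the contents of the `BEFORE` set are all hypothetical stand-ins chosen for illustration. Only the three added node types (`Cast`, `HoursAdd`, `HoursSub`) come from this PR.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: a simplified expression tree, NOT the Doris/Nereids classes.
final class AllowListSketch {

    /** A tree node identified by a type tag such as "DateTrunc" or "HoursAdd". */
    static final class Expr {
        final String type;
        final List<Expr> children;

        Expr(String type, Expr... children) {
            this.type = type;
            this.children = Arrays.asList(children);
        }
    }

    // Assumed pre-PR allow-list; the real set in Doris may differ.
    static final Set<String> BEFORE =
            new HashSet<>(Arrays.asList("DateTrunc", "Slot", "Literal"));

    // Post-PR allow-list: the three node types named in this PR are added.
    static final Set<String> AFTER = new HashSet<>(BEFORE);

    static {
        AFTER.addAll(Arrays.asList("Cast", "HoursAdd", "HoursSub"));
    }

    /** Returns true only if every node in the tree has an allowed type. */
    static boolean allSupported(Expr expr, Set<String> allowed) {
        if (!allowed.contains(expr.type)) {
            return false;
        }
        for (Expr child : expr.children) {
            if (!allSupported(child, allowed)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Shape of date_trunc(hours_add(cast(slot), 3), 'day').
        Expr expr = new Expr("DateTrunc",
                new Expr("HoursAdd", new Expr("Cast", new Expr("Slot")), new Expr("Literal")),
                new Expr("Literal"));
        System.out.println("before: " + allSupported(expr, BEFORE));
        System.out.println("after:  " + allSupported(expr, AFTER));
    }
}
```

Because the check walks every sub-expression, a single unlisted node anywhere in the tree (for example an implicit `Cast` around the slot) is enough to reject the whole partition expression, which is why `Cast` had to be added alongside `HoursAdd` and `HoursSub`.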
`UpdateMvByPartitionCommand` and `MTMVPartitionDefinition`

Minor adjustments to pass the hour-offset partition expression through the incremental refresh predicate builder without being rejected.
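A minimal sketch of the inverse-offset idea behind the refresh predicate, assuming MV partition bounds are aligned to the truncation unit. The names `InverseShiftSketch`, `Range`, and `toBaseRange` are hypothetical and do not appear in Doris. A base row with value `c` lands in the MV partition `[lower, upper)` when `lower <= c + offset < upper`, so the base-table predicate range is `[lower - offset, upper - offset)`.

```java
import java.time.LocalDateTime;

// Illustrative only: hypothetical names, not the UpdateMvByPartitionCommand code.
final class InverseShiftSketch {

    /** Half-open [lower, upper) range of base-table column values. */
    static final class Range {
        final LocalDateTime lower;
        final LocalDateTime upper;

        Range(LocalDateTime lower, LocalDateTime upper) {
            this.lower = lower;
            this.upper = upper;
        }

        @Override
        public String toString() {
            return "[" + lower + ", " + upper + ")";
        }
    }

    /**
     * Maps MV partition bounds back to base-table bounds.
     * offsetHours is signed: +N for date_add(col, INTERVAL N HOUR),
     * -N for date_sub(col, INTERVAL N HOUR).
     */
    static Range toBaseRange(LocalDateTime mvLower, LocalDateTime mvUpper, long offsetHours) {
        // Undo the forward shift by subtracting the same signed offset.
        return new Range(mvLower.minusHours(offsetHours), mvUpper.minusHours(offsetHours));
    }

    public static void main(String[] args) {
        LocalDateTime lower = LocalDateTime.of(2024, 1, 2, 0, 0);
        LocalDateTime upper = lower.plusDays(1);
        // date_add(col, INTERVAL 3 HOUR): base rows start 3 hours earlier.
        System.out.println(toBaseRange(lower, upper, 3));
        // date_sub(col, INTERVAL 3 HOUR): base rows start 3 hours later.
        System.out.println(toBaseRange(lower, upper, -3));
    }
}
```

Using the raw MV bounds instead of the shifted ones would select a window that is off by the offset on both ends, which matches the missed-row symptom described in the commit messages.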
### Example

Base table partitioned by hour-level UTC timestamp:

MTMV partitioned by local business day (UTC+3):

Before this PR: Creating this MV would fail or fall back to full refresh because the partition expression `date_trunc(date_add(...), 'day')` was not recognized.

After this PR: Partition lineage is correctly established. Incremental refresh only recomputes partitions whose source data changed, and the predicate pushed down to the base table correctly accounts for the 3-hour offset window.
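To make the lineage concrete, the sketch below maps a base-table timestamp to the day partition it rolls up into: apply the signed hour offset, then truncate to the day. It uses only `java.time`; `RollUpSketch`, `rollUpDay`, and `dayRange` are hypothetical names, and the sample timestamps are made up for illustration rather than taken from the PR's test data.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

// Illustrative only: hypothetical names, day unit only, not the Doris rollup code.
final class RollUpSketch {

    /** Applies the signed hour offset, then truncates to the start of the day. */
    static LocalDateTime rollUpDay(LocalDateTime baseValue, long offsetHours) {
        return baseValue.plusHours(offsetHours).truncatedTo(ChronoUnit.DAYS);
    }

    /** Returns the half-open [lower, upper) MV day range that contains baseValue. */
    static LocalDateTime[] dayRange(LocalDateTime baseValue, long offsetHours) {
        LocalDateTime lower = rollUpDay(baseValue, offsetHours);
        return new LocalDateTime[] {lower, lower.plusDays(1)};
    }

    public static void main(String[] args) {
        long utcPlus3 = 3; // date_add(col, INTERVAL 3 HOUR)
        LocalDateTime before = LocalDateTime.of(2024, 1, 1, 20, 0);
        LocalDateTime after = LocalDateTime.of(2024, 1, 1, 21, 0);
        // 20:00 UTC is 23:00 local, still the same local day.
        System.out.println(before + " -> " + rollUpDay(before, utcPlus3));
        // 21:00 UTC is 00:00 local, so it belongs to the next local day.
        System.out.println(after + " -> " + rollUpDay(after, utcPlus3));
    }
}
```

Hourly base partitions that produce the same rolled-up value belong to the same MV partition, so a change in any one of them marks only that MV partition for refresh.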
### Behavior change

- `date_trunc(col, 'day')`
- `date_trunc(date_add(col, INTERVAL N HOUR), 'day/week/month/year')`
- `date_trunc(date_sub(col, INTERVAL N HOUR), 'day/week/month/year')`
- `PartitionIncrementChecker` with `HoursAdd`/`HoursSub` nodes

Backward compatible. No change to existing `date_trunc(col, unit)` behavior.

### Release note

Support MTMV partition rollup and incremental refresh for `date_trunc(date_add/date_sub(partition_col, INTERVAL N HOUR), 'unit')`. This enables local-timezone day partitioning on UTC-stored tables via Async Materialized Views.

### Check List (For Author)

- Test
    - Unit Test: `./run-fe-ut.sh --run org.apache.doris.mtmv.MTMVRelatedPartitionDescRollUpGeneratorTest`
    - Regression test: `./run-regression-test.sh --run -d mtmv_p0 -s test_rollup_partition_mtmv_date_add`
    - Build: `MVN_OPT='-Dmaven.build.cache.enabled=false' ./build.sh --fe --clean -j$(nproc)`
- Behavior changed: Yes. `date_trunc(date_add/date_sub(..., INTERVAL N HOUR), unit)` partition expressions are now supported for MTMV partition lineage and incremental refresh.
- Does this need documentation: Yes. The MTMV partition expression documentation should be updated to list `date_trunc(date_add/date_sub(col, INTERVAL N HOUR), unit)` as a supported form.