[improvement](fe) Support hour-offset date_add/date_sub in MTMV partition refresh #62410

Closed
hakanuzum wants to merge 10 commits into apache:master from hakanuzum:feat/mtmv-date-add-interval-partition

Conversation

hakanuzum commented Apr 12, 2026

What problem does this PR solve?

Issue Number: close #62395

Related PR: None

Problem Summary:

MTMV (Materialized View) partition rollup and incremental refresh logic previously only handled partition expressions of the form:

date_trunc(partition_col, 'unit')

It could not recognize or process expressions where an hour-level offset is applied to the partition column before truncation, such as:

date_trunc(date_add(partition_col, INTERVAL N HOUR), 'unit')
date_trunc(date_sub(partition_col, INTERVAL N HOUR), 'unit')

This pattern is widely used in production pipelines where raw data is stored in UTC but the MV must partition by local business-day boundaries (e.g., UTC+3 → date_add(event_time, INTERVAL 3 HOUR)). Without this support:

  • Partition lineage between the base table and MTMV could not be established.
  • Incremental refresh predicate generation failed silently, causing full refresh fallback or incorrect partition mapping.
  • PartitionIncrementMaintainer rejected these expressions as unsupported, blocking partition-aware query rewrite.

The root causes were:

  1. MTMVPartitionExprFactory always returned MTMVPartitionExprDateTrunc for a date_trunc(...) call, with no branch for an inner date_add/date_sub argument.
  2. PartitionIncrementChecker.SUPPORT_EXPRESSION_TYPES did not include HoursAdd, HoursSub, or Cast, so any MV defined with these expressions was rejected during increment check.
  3. There was no rollup service implementation that could compute partition identity and generate range bounds when an hour offset is involved.

What is changed and how does it work?

New class — MTMVPartitionExprDateTruncDateAddSub:

Implements MTMVPartitionExprService for expressions of the form date_trunc(date_add/date_sub(col, INTERVAL N HOUR), 'unit').

Key responsibilities:

  • Parses the date_trunc outer function and extracts the truncation unit.
  • Parses the inner TimestampArithmeticExpr (date_add/date_sub) and converts the hour offset to a signed long (date_sub → negative offset).
  • getRollUpIdentity(): applies the offset to each partition value and truncates, returning a stable identity string for lineage mapping.
  • generateRollUpPartitionKeyDesc(): computes [lower, upper) range bounds in the MV partition space, applying the offset before truncation and incrementing by the appropriate time unit (day, week, month, year, quarter, hour).
  • analyze(): validates that the truncation unit is supported and that the base table uses RANGE partitioning with a DATE/DATETIME column.
  • toSql(): regenerates the canonical SQL expression for the partition expression.
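
The offset-then-truncate step described for getRollUpIdentity() and generateRollUpPartitionKeyDesc() can be sketched roughly as follows. This is a minimal illustration using only java.time, not the code from this PR: the class HourOffsetRollupSketch and its methods truncate, plusOne, and rollUpRange are hypothetical names, and only the hour, day, month, and year units are covered.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

/** Hypothetical sketch; names and structure are assumptions, not the Doris classes. */
final class HourOffsetRollupSketch {
    private HourOffsetRollupSketch() {}

    /** Truncates a timestamp to the start of the given unit. */
    static LocalDateTime truncate(LocalDateTime t, String unit) {
        switch (unit) {
            case "hour":
                return t.truncatedTo(ChronoUnit.HOURS);
            case "day":
                return t.truncatedTo(ChronoUnit.DAYS);
            case "month":
                return t.withDayOfMonth(1).truncatedTo(ChronoUnit.DAYS);
            case "year":
                return t.withDayOfYear(1).truncatedTo(ChronoUnit.DAYS);
            default:
                throw new IllegalArgumentException("unsupported unit: " + unit);
        }
    }

    /** Advances a truncated timestamp by exactly one unit. */
    static LocalDateTime plusOne(LocalDateTime t, String unit) {
        switch (unit) {
            case "hour":
                return t.plusHours(1);
            case "day":
                return t.plusDays(1);
            case "month":
                return t.plusMonths(1);
            case "year":
                return t.plusYears(1);
            default:
                throw new IllegalArgumentException("unsupported unit: " + unit);
        }
    }

    /**
     * Returns {lower, upper} of the MV partition that contains baseValue.
     * signedHourOffset is positive for date_add and negative for date_sub.
     * The range is half-open: [lower, upper).
     */
    static LocalDateTime[] rollUpRange(LocalDateTime baseValue, long signedHourOffset, String unit) {
        LocalDateTime lower = truncate(baseValue.plusHours(signedHourOffset), unit);
        return new LocalDateTime[] {lower, plusOne(lower, unit)};
    }

    public static void main(String[] args) {
        LocalDateTime[] range = rollUpRange(LocalDateTime.of(2026, 4, 12, 22, 30), 3L, "day");
        System.out.println("[" + range[0] + ", " + range[1] + ")");
    }
}
```

In this sketch a base value late in the UTC day rolls into the next local day once the positive offset is applied, which is the case that plain date_trunc(col, 'day') cannot express.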

MTMVPartitionExprFactory — routing logic extended:

Added a guard inside the date_trunc branch: if the first argument is a TimestampArithmeticExpr (i.e., date_add/date_sub), the factory now returns MTMVPartitionExprDateTruncDateAddSub instead of the plain MTMVPartitionExprDateTrunc.

if (paramsExprs.size() == 2 && paramsExprs.get(0) instanceof TimestampArithmeticExpr) {
    return new MTMVPartitionExprDateTruncDateAddSub(functionCallExpr);
}
return new MTMVPartitionExprDateTrunc(functionCallExpr);

PartitionIncrementMaintainer.PartitionIncrementChecker:

Extended SUPPORT_EXPRESSION_TYPES to include Cast, HoursAdd, and HoursSub,
allowing the increment checker to traverse and validate MV partition expressions
that contain these node types:

ImmutableSet.of(Cast.class, DateTrunc.class, HoursAdd.class, HoursSub.class,
    SlotReference.class, Literal.class)
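
To illustrate why Cast has to be in the allowed set, here is a rough sketch of a whitelist walk over an expression tree. The Node hierarchy and the allSupported helper are hypothetical stand-ins invented for this example; they are not the Nereids expression classes or the actual PartitionIncrementChecker logic.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of a whitelist check; not the Doris checker. */
final class SupportedTypeWalkSketch {
    private SupportedTypeWalkSketch() {}

    /** Stand-in for an expression tree node. */
    static class Node {
        final List<Node> children;

        Node(Node... children) {
            this.children = Arrays.asList(children);
        }
    }

    static final class Cast extends Node {
        Cast(Node child) { super(child); }
    }

    static final class DateTrunc extends Node {
        DateTrunc(Node value, Node unit) { super(value, unit); }
    }

    static final class HoursAdd extends Node {
        HoursAdd(Node value, Node hours) { super(value, hours); }
    }

    static final class Slot extends Node {
        Slot() { super(); }
    }

    static final class Literal extends Node {
        Literal() { super(); }
    }

    /** Builds the allowed set from a list of node classes. */
    static Set<Class<?>> setOf(Class<?>... types) {
        return new HashSet<>(Arrays.asList(types));
    }

    /** Returns true only if the root and every descendant has an allowed class. */
    static boolean allSupported(Node root, Set<Class<?>> allowed) {
        if (!allowed.contains(root.getClass())) {
            return false;
        }
        for (Node child : root.children) {
            if (!allSupported(child, allowed)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // date_trunc(hours_add(cast(slot), literal), literal)
        Node expr = new DateTrunc(new HoursAdd(new Cast(new Slot()), new Literal()), new Literal());
        Set<Class<?>> allowed = setOf(Cast.class, DateTrunc.class, HoursAdd.class, Slot.class, Literal.class);
        System.out.println(allSupported(expr, allowed));
    }
}
```

Because the walk visits every sub-expression, a single Cast wrapped around the slot is enough to fail the whole check when Cast is missing from the set.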

UpdateMvByPartitionCommand and MTMVPartitionDefinition:

Minor adjustments to pass the hour-offset partition expr through the incremental
refresh predicate builder without being rejected.


Example

Base table partitioned by hour-level UTC timestamp:

CREATE TABLE orders (
    order_id BIGINT,
    event_time DATETIME NOT NULL
) PARTITION BY RANGE(event_time) ( ... )
DISTRIBUTED BY HASH(order_id);

MTMV partitioned by local business day (UTC+3):

CREATE MATERIALIZED VIEW orders_by_local_day
BUILD IMMEDIATE REFRESH ON MANUAL
PARTITION BY (date_trunc(date_add(event_time, INTERVAL 3 HOUR), 'day'))
DISTRIBUTED BY HASH(order_id)
AS SELECT
    date_trunc(date_add(event_time, INTERVAL 3 HOUR), 'day') AS local_day,
    count(*) AS cnt
FROM orders
GROUP BY local_day;

Before this PR: Creating this MV would fail or fall back to full refresh because the partition expression date_trunc(date_add(...), 'day') was not recognized.

After this PR: Partition lineage is correctly established. Incremental refresh only recomputes partitions whose source data changed, and the predicate pushed down to the base table correctly accounts for the 3-hour offset window.
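
As a rough illustration of how a pushed-down predicate can account for the offset: if date_add(col, INTERVAL N HOUR) falls in [lower, upper), then col falls in [lower - N hours, upper - N hours). The sketch below applies that inverse shift to MV partition bounds. InverseOffsetPredicateSketch and basePredicate are hypothetical names, and the string form of the predicate is an assumption made for readability, not what the FE actually emits.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical sketch of the inverse hour shift; not the Doris predicate builder. */
final class InverseOffsetPredicateSketch {
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    private InverseOffsetPredicateSketch() {}

    /**
     * Maps an MV partition range [mvLower, mvUpper) back to a base-table filter.
     * signedHourOffset is positive for date_add and negative for date_sub,
     * so subtracting it undoes the shift in both cases.
     */
    static String basePredicate(String column, LocalDateTime mvLower, LocalDateTime mvUpper,
            long signedHourOffset) {
        LocalDateTime baseLower = mvLower.minusHours(signedHourOffset);
        LocalDateTime baseUpper = mvUpper.minusHours(signedHourOffset);
        return column + " >= '" + FMT.format(baseLower) + "' AND "
                + column + " < '" + FMT.format(baseUpper) + "'";
    }

    public static void main(String[] args) {
        // MV partition for local day 2026-04-13 with a +3 hour offset.
        System.out.println(basePredicate("event_time",
                LocalDateTime.of(2026, 4, 13, 0, 0),
                LocalDateTime.of(2026, 4, 14, 0, 0),
                3L));
    }
}
```

Using the unshifted MV bounds directly would select a window that is off by N hours on both ends, which is how rows can be missed or counted in the wrong partition.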


Behavior change

| Scenario | Before | After |
|---|---|---|
| date_trunc(col, 'day') | ✅ Supported | ✅ No change |
| date_trunc(date_add(col, INTERVAL N HOUR), 'day/week/month/year') | ❌ Lineage broken, full refresh | ✅ Incremental refresh works |
| date_trunc(date_sub(col, INTERVAL N HOUR), 'day/week/month/year') | ❌ Not recognized | ✅ Supported |
| PartitionIncrementChecker with HoursAdd/HoursSub nodes | ❌ Rejected | ✅ Passes validation |

Backward compatible. No change to existing date_trunc(col, unit) behavior.


Release note

Support MTMV partition rollup and incremental refresh for date_trunc(date_add/date_sub(partition_col, INTERVAL N HOUR), 'unit'). This enables local-timezone day partitioning on UTC-stored tables via Async Materialized Views.


Check List (For Author)

  • Test

    • FE Unit Test:
      ./run-fe-ut.sh --run org.apache.doris.mtmv.MTMVRelatedPartitionDescRollUpGeneratorTest
    • Regression test:
      ./run-regression-test.sh --run -d mtmv_p0 -s test_rollup_partition_mtmv_date_add
    • Manual test: Validated with FE build against a local single-node cluster.
      MVN_OPT='-Dmaven.build.cache.enabled=false' ./build.sh --fe --clean -j$(nproc)
  • Behavior changed: Yes
    date_trunc(date_add/date_sub(..., INTERVAL N HOUR), unit) partition expressions are now supported for MTMV partition lineage and incremental refresh.

  • Does this need documentation: Yes
    The MTMV partition expression documentation should be updated to list date_trunc(date_add/date_sub(col, INTERVAL N HOUR), unit) as a supported form.

…tion rollup

### What problem does this PR solve?

Issue Number: close apache#62395

Related PR: None

Problem Summary: MTMV partition rollup only handled `date_trunc(partition_col, unit)`. It could not track partition lineage when the MV partition expression uses hour offset conversion such as `date_trunc(date_add(k2, INTERVAL 3 HOUR), 'day')` or `date_trunc(date_sub(k2, INTERVAL 3 HOUR), 'day')`, which is needed for UTC-to-local-day partitioning and incremental maintenance.

### Release note

Support MTMV partition rollup and partition increment check for `date_trunc(date_add/date_sub(partition_col, INTERVAL N HOUR), unit)`.

### Check List (For Author)

- Test: Regression test / Unit Test
    - Unit Test: `./run-fe-ut.sh --run org.apache.doris.mtmv.MTMVRelatedPartitionDescRollUpGeneratorTest`
    - Build: `MVN_OPT='-Dmaven.build.cache.enabled=false' ./build.sh --fe --clean -j$(nproc)`
    - Regression test: `./run-regression-test.sh --run -d mtmv_p0 -s test_rollup_partition_mtmv_date_add` (failed locally because FE/BE test cluster was not started; connection refused)
- Behavior changed: Yes (MTMV partition expression now supports hour-offset `date_add/date_sub` lineage under `date_trunc`)
- Does this need documentation: No
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@hakanuzum
Author

run buildall

hakanuzum changed the title from "[improvement](fe) support hour-offset date_add/date_sub in mtmv partition rollup" to "[improvement](fe) Support hour-offset date_add/date_sub in MTMV partition rollup" on Apr 12, 2026
### What problem does this PR solve?

Issue Number: close apache#62395

Related PR: None

Problem Summary: Partition increment check walks all sub-expressions; datetime
plans often wrap slots in Cast under HoursAdd/HoursSub. Without Cast in the
allowed set, MTMV partition analysis failed with invalid implicit expression.

### Release note

None

### Check List (For Author)

- Test: No need to test (with reason)
    - Checkstyle verified: mvn checkstyle:check -pl fe-core
- Behavior changed: No
- Does this need documentation: No

Made-with: Cursor
### What problem does this PR solve?

Issue Number: apache#62395

Related PR: apache#62410

Problem Summary: MTMV partition refresh builds base-table predicates directly from MV partition bounds. When the MV partition expression is date_trunc(date_add/sub(col, INTERVAL N HOUR), unit), the bounds must be shifted by the inverse hour offset; otherwise refresh can miss rows or fail with no-partition errors.

### Release note

Supports MTMV partition refresh when partition expression includes date_trunc(date_add/sub(..., INTERVAL N HOUR), ...).

### Check List (For Author)

- Test: Regression test
    - ./run-regression-test.sh --run -d mtmv_p0 -s test_rollup_partition_mtmv_date_add (OrbStack Linux)
- Behavior changed: Yes (MV refresh predicate mapping for hour-offset partition expressions)
- Does this need documentation: No
hakanuzum changed the title from "[improvement](fe) Support hour-offset date_add/date_sub in MTMV partition rollup" to "[improvement](fe) Support hour-offset date_add/date_sub in MTMV partition refresh" on Apr 14, 2026
@hakanuzum
Author

/review

@hakanuzum
Author

run buildall

hakanuzum

This comment was marked as duplicate.

Contributor

github-actions bot left a comment

Findings:

  1. fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/UpdateMvByPartitionCommand.java / fe/fe-core/src/main/java/org/apache/doris/nereids/rules/exploration/mv/StructInfo.java
    The inverse-hour bound shift is only applied in constructTableWithPredicates() via statementContext.setMvRefreshPredicates(...), but the optimizer-side union-compensation path still calls constructPredicates(partitionHasDataItems, partitionSlot) without the new offset handling in PredicateAdder.visitLogicalCatalogRelation(). That means the same date_trunc(date_add/sub(col, INTERVAL N HOUR), ...) MV can still build base-table filters from raw MV bounds during union compensation, reintroducing the exact off-by-offset partition mismatch this PR is trying to fix. A concrete path is AbstractMaterializedViewRule -> StructInfo.addFilterOnTableScan -> PredicateAdder.visitLogicalCatalogRelation, where partition names are converted back to predicates with unshifted literals.

Critical checkpoint conclusions:

  • Goal of current task: Partially achieved. The PR adds support for hour-offset date_add/date_sub in partition analysis/rollup and fixes one refresh-predicate path, but it does not update the parallel optimizer filter path, so end-to-end correctness is still incomplete.
  • Small, clear, focused modification: Mostly yes, but the behavior change is split across multiple MTMV paths and one equivalent path was missed.
  • Concurrency: No new concurrency-sensitive logic identified.
  • Lifecycle/static initialization: No special lifecycle or static-init risk introduced.
  • Configuration changes: None.
  • Compatibility/incompatible changes: No protocol or storage-format compatibility issue identified.
  • Functionally parallel code paths: Not fully covered. The union-compensation path still uses old predicate construction semantics.
  • Special conditional checks: The new date_add/date_sub + HOUR gating is clear, but equivalent consumers of the derived bounds need to share the same transformation.
  • Test coverage: Improved for rollup and manual refresh scenarios, but missing coverage for the union-compensation / rewrite path that also synthesizes base-table predicates.
  • Test result changes: No .out changes involved; added regression and unit tests look consistent with the implemented path.
  • Observability: No additional observability appears necessary for this change.
  • Transaction/persistence: No persistence or failover-sensitive changes identified.
  • Data writes/modifications: This affects refresh/query correctness rather than storage writes; the remaining path inconsistency can still produce wrong row coverage.
  • FE-BE variable passing: Not involved.
  • Performance: No material performance concern from the added logic.
  • Other issues: None beyond the missing parallel-path fix above.

Overall opinion: request changes until the offset adjustment is applied consistently anywhere MTMV partition names are translated back into base-table predicates.

### What problem does this PR solve?

Issue Number: N/A

Related PR: apache#62410

Problem Summary: MTMV partitioned by `date_trunc(date_add/sub(base_col, INTERVAL N HOUR), ...)` needs inverse-hour shifting when translating partition names back into base-table predicates. The refresh path applied the shift, but the optimizer union-compensation path reconstructed predicates without the offset, which could miss rows or produce duplicates.

### Release note

Fix MTMV union-compensation predicate bounds for hour-offset date_add/date_sub partition expressions.

### Check List (For Author)

- Test: No need to test (local environment lacks JDK 17)
- Behavior changed: Yes (query rewrite/union-compensation correctness)
- Does this need documentation: No
### What problem does this PR solve?

Issue Number:

Related PR:

Problem Summary: Regression case selected raw BITMAP/VARBINARY column and compared unstable/driver-dependent bytes; compare only bitmap_to_string output instead.

### Release note

None

### Check List (For Author)

- Test: No need to test (CI regression will cover)
- Behavior changed: No
- Does this need documentation: No
Author

@hakanuzum hakanuzum left a comment

fixed

### What problem does this PR solve?

Issue Number:

Related PR: apache#62410

Problem Summary: License Check falls back to full-repo scan when git diff fails under shallow fetch, causing unrelated pre-existing files to fail the PR. Fetch full history for PR head and base to reliably generate incremental config.

### Release note

None

### Check List (For Author)

- Test: No need to test (GitHub Actions)
- Behavior changed: No
- Does this need documentation: No
hakanuzum added a commit to hakanuzum/doris that referenced this pull request Apr 15, 2026
hakanuzum added a commit to hakanuzum/doris that referenced this pull request Apr 15, 2026
@hakanuzum
Author

Superseded by #62535 (cleaned-up MTMV hour-offset support) and #62536 (regression bitmap test fix). Closing to reduce confusion; please review the new PRs instead.

@hakanuzum hakanuzum closed this Apr 15, 2026
Development

Successfully merging this pull request may close these issues.

[Feature] Support arbitrary scalar functions in materialized view partition expressions (e.g. CONVERT_TZ, DATE_ADD, CAST)

2 participants