
[Opt](parse) Add fast path for canonical format datetime parse#62757

Merged
zclllyybb merged 8 commits intoapache:masterfrom
zclllyybb:agent/task-b3041d36-parse-datetime-optimization
Apr 30, 2026

Conversation

@zclllyybb
Contributor

| Query | Now | Base | Improve |
|---|:---:|:---:|:---:|
| `CAST(dt_s AS DATETIMEV2)` | 1.31 avg | 3.01 avg | ~ 56.5% |
| `CAST(date_s AS DATEV2)` | 0.81 avg | 1.43 avg | ~ 43.4% |

End-to-end test cases already cover these casts; only BE unit tests were added for the new utility functions.
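The Improve column is consistent with the relative reduction in average time, (Base minus Now) / Base. This is inferred from the numbers; the PR states neither the formula nor the timing unit. A small C++ check of that reading:

```cpp
#include <cassert>
#include <cmath>

// Relative improvement in percent, assuming Improve = (Base - Now) / Base.
// The timing unit is not stated in the PR, so inputs are treated as unitless averages.
double relative_improvement_pct(double now, double base) {
    return (base - now) / base * 100.0;
}
```

`relative_improvement_pct(1.31, 3.01)` is about 56.48 and `relative_improvement_pct(0.81, 1.43)` is about 43.36, which round to the reported ~56.5% and ~43.4%.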

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Add a fast path for canonical date and datetime prefixes during string-to-datelike casts, while preserving fallback parsing semantics for suffixes, timezone handling, and date-only targets.
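A canonical-format fast path of this kind can be sketched in C++ as below. This is a minimal illustration under stated assumptions, not the Doris code: `DateTimeFields`, `read_digits`, and `try_fast_canonical` are hypothetical names, and the sketch assumes the fast path only claims exact-length canonical inputs. Every suffix (fractional seconds, timezone offsets), every out-of-range value, and (in this sketch) zero dates are left to the existing parser.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical result holder; not the Doris date/time value type.
struct DateTimeFields {
    int year = 0, month = 0, day = 0, hour = 0, minute = 0, second = 0;
};

// Reads exactly `n` ASCII digits starting at s[pos]; returns false on any non-digit.
// Callers guarantee pos + n <= s.size().
static bool read_digits(std::string_view s, std::size_t pos, std::size_t n, int& out) {
    int v = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const char c = s[pos + i];
        if (c < '0' || c > '9') return false;
        v = v * 10 + (c - '0');
    }
    out = v;
    return true;
}

// Gregorian month length; `m` must already be in 1..12.
static int days_in_month(int y, int m) {
    static const int kDays[12] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    const bool leap = (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
    return (m == 2 && leap) ? 29 : kDays[m - 1];
}

// Returns true only for an exact "YYYY-MM-DD" or "YYYY-MM-DD[ T]HH:MM:SS" with in-range
// values, writing `out` on success. Any other input returns false and leaves `out`
// untouched, so the caller can hand the original string to the general parser.
bool try_fast_canonical(std::string_view s, DateTimeFields& out) {
    if (s.size() != 10 && s.size() != 19) return false;
    DateTimeFields f;
    if (s[4] != '-' || s[7] != '-' || !read_digits(s, 0, 4, f.year) ||
        !read_digits(s, 5, 2, f.month) || !read_digits(s, 8, 2, f.day))
        return false;
    // Month is checked first so days_in_month never indexes out of range.
    if (f.month < 1 || f.month > 12 || f.day < 1 || f.day > days_in_month(f.year, f.month))
        return false;
    if (s.size() == 19) {
        if ((s[10] != ' ' && s[10] != 'T') || s[13] != ':' || s[16] != ':' ||
            !read_digits(s, 11, 2, f.hour) || !read_digits(s, 14, 2, f.minute) ||
            !read_digits(s, 17, 2, f.second))
            return false;
        if (f.hour > 23 || f.minute > 59 || f.second > 59) return false;
    }
    out = f;
    return true;
}
```

For example, `try_fast_canonical("2024-02-29 23:59:59", f)` returns true, while `"2024-02-30"`, `"2024-1-5"`, `"2024-01-05 12:00:00.123"`, and `"0000-00-00"` all return false and would go to the general parser unchanged.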

### Release note

None

### Check List (For Author)

- Test: BE unit test and benchmark coverage for datelike parsing
    - Unit Test: `./run-be-ut.sh --run --filter=VDateTimeValueTest.*:TimeStampTzValueTest.*:VExprTest.*`
    - Manual test: `LD_LIBRARY_PATH="/mnt/disk6/common/jdk-17.0.16/lib/server:${LD_LIBRARY_PATH}" ./be/build_RELEASE/bin/benchmark_test --benchmark_filter='parse_(date|datev2|datetime|datetimev2|timestamptz)/(hit|hit_suffix|miss)$' --benchmark_repetitions=5 --benchmark_report_aggregates_only=true`
    - Manual test: `./run-regression-test.sh --run -d datatype_p0/datev2 -s test_parse_fast_path` (blocked by local cluster state: no live BE available)
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Restore strict parser correctness when canonical date fast-path detection falls back to the original parser, so valid non-fast-path time forms and date-target casts keep their previous behavior.
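One way to keep that handoff safe is sketched below, assuming a fast parser that may bail out midway. `parse_with_fast_path` is a hypothetical helper, not the Doris API. The fast path writes into a scratch value that is committed only on a hit, so a partial attempt cannot leak into the result; on a miss, the strict parser receives the original input and the caller's untouched output, exactly as if the fast path did not exist.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical handoff between a fast path and the strict parser.
// `fast(s, scratch)` must return true only when it fully parsed `s`.
// `strict(s, out)` is the pre-existing parser and keeps full authority over errors.
template <typename T, typename FastFn, typename StrictFn>
bool parse_with_fast_path(std::string_view s, T& out, FastFn&& fast, StrictFn&& strict) {
    T scratch{};  // fast-path writes land here, never directly in `out`
    if (fast(s, scratch)) {
        out = scratch;  // commit only on a complete hit
        return true;
    }
    // Miss: original input, original output object, original semantics.
    return strict(s, out);
}
```

Because the fast path can only say "hit" or "not mine", a non-canonical but valid form such as `2024-1-5` still reaches the strict parser and keeps its previous result.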

### Release note

None

### Check List (For Author)

- Test: BE unit tests for strict fallback parsing
    - Unit Test: `./run-be-ut.sh --run --filter=VDateTimeValueTest.*:TimeStampTzValueTest.*:TEST_VEXPR.LITERALTEST`
    - No regression test: removed the unbacked regression stub because it had no checked-in expected result file and the local cluster is not runnable in this worktree
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Remove zero-date inputs from the canonical hit benchmark corpus so release-mode parse benchmarks measure successful fast-path inputs only, and assert that benchmark hit samples parse successfully.
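A hypothetical sketch of that corpus preparation follows; `build_hit_corpus` and `is_zero_date_prefix` are illustrative names, and the parse callback stands in for the real fast path. The success check throws rather than using `assert`, because `assert` is compiled out when `NDEBUG` is defined, as it typically is in release builds.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <string_view>
#include <vector>

// Treats any sample whose date part is all zeros as a zero date.
inline bool is_zero_date_prefix(std::string_view s) {
    return s.substr(0, 10) == "0000-00-00";
}

// Keeps only non-zero-date samples and verifies once, up front, that each one parses,
// so the timed benchmark loop measures successful fast-path inputs only.
template <typename ParseFn>
std::vector<std::string> build_hit_corpus(const std::vector<std::string>& raw, ParseFn&& parse) {
    std::vector<std::string> hits;
    for (const auto& sample : raw) {
        if (is_zero_date_prefix(sample)) continue;
        hits.push_back(sample);
    }
    for (const auto& sample : hits) {
        // Not an assert: this check must still run in release-mode benchmark builds.
        if (!parse(sample)) throw std::runtime_error("hit sample failed to parse: " + sample);
    }
    return hits;
}
```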

### Release note

None

### Check List (For Author)

- Test: Release benchmark smoke run
    - Manual test: `LD_LIBRARY_PATH="/mnt/disk6/common/jdk-17.0.16/lib/server:${LD_LIBRARY_PATH}" ./be/build_RELEASE/bin/benchmark_test --benchmark_filter='parse_(date|datev2|datetime|datetimev2|timestamptz)/(hit|hit_suffix|miss)$' --benchmark_repetitions=1 --benchmark_report_aggregates_only=true`
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Extend the canonical datetime prefix optimization to DATEV2 casts, add missing date-family suffix benchmarks, and restore committed SQL regression coverage for canonical datelike cast parsing.
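For a DATEV2 target, a canonical datetime input can still take the fast path if its time part is validated before being discarded. The sketch below assumes the first 10 characters already matched a canonical date; `canonical_time_suffix_droppable` is a hypothetical name. Anything it rejects, such as an out-of-range hour or a fractional-second suffix, is left to the strict parser instead of being silently truncated to a date.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Two ASCII digits at s[pos], s[pos + 1], with a value no greater than `max_value`.
static bool two_digits(std::string_view s, std::size_t pos, int max_value) {
    const char a = s[pos], b = s[pos + 1];
    if (a < '0' || a > '9' || b < '0' || b > '9') return false;
    return (a - '0') * 10 + (b - '0') <= max_value;
}

// `rest` is everything after "YYYY-MM-DD". Returns true if it is empty or exactly
// "[ T]HH:MM:SS" with HH <= 23, MM <= 59, SS <= 59, i.e. a DATEV2 fast path may drop it.
bool canonical_time_suffix_droppable(std::string_view rest) {
    if (rest.empty()) return true;
    if (rest.size() != 9) return false;  // separator + "HH:MM:SS"
    if ((rest[0] != ' ' && rest[0] != 'T') || rest[3] != ':' || rest[6] != ':') return false;
    return two_digits(rest, 1, 23) && two_digits(rest, 4, 59) && two_digits(rest, 7, 59);
}
```

For example, `canonical_time_suffix_droppable(std::string_view("2024-01-05 10:20:30").substr(10))` returns true, whereas a suffix of `" 24:00:00"` or `" 12:00:00.5"` returns false and the whole string goes to the strict parser.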

### Release note

None

### Check List (For Author)

- Test: BE unit tests, release benchmark smoke run, and committed regression case added
    - Unit Test: `./run-be-ut.sh --run --filter=VDateTimeValueTest.*:TimeStampTzValueTest.*:TEST_VEXPR.LITERALTEST`
    - Manual test: `LD_LIBRARY_PATH="/mnt/disk6/common/jdk-17.0.16/lib/server:${LD_LIBRARY_PATH}" ./be/build_RELEASE/bin/benchmark_test --benchmark_filter='parse_(date|datev2|datetime|datetimev2|timestamptz)/(hit|hit_suffix|miss)$' --benchmark_repetitions=1 --benchmark_report_aggregates_only=true`
    - Regression test: added `regression-test/suites/datatype_p0/datev2/test_parse_fast_path.groovy` with self-checking assertions; local execution remains blocked by missing live BE nodes in this worktree cluster
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Remove the temporary datelike benchmark code from the submitted patch set and keep the optimization validated by unit/regression coverage plus reported benchmark results in the task summary.

### Release note

None

### Check List (For Author)

- Test: Focused BE unit tests after benchmark removal
    - Unit Test: `./run-be-ut.sh --run --filter=VDateTimeValueTest.*:TimeStampTzValueTest.*:TEST_VEXPR.LITERALTEST`
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Make the shared canonical datelike helper self-contained and easier to read by using an explicit four-digit ASCII check for the year prefix.
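An explicit four-digit ASCII check can be written as below. This is a sketch with hypothetical names, not the helper from the patch. It uses the common unsigned-subtraction idiom, where `static_cast<unsigned>(c - '0') < 10u` holds only for `'0'` through `'9'`, and spells out the four positions instead of looping.

```cpp
#include <cassert>

// True only for '0'..'9': any other char maps to an unsigned value >= 10.
inline bool is_ascii_digit(char c) {
    return static_cast<unsigned>(c - '0') < 10u;
}

// Explicit four-position check for a "YYYY" prefix; short-circuits on the first miss,
// so a shorter NUL-terminated string stops at its terminator.
inline bool is_four_ascii_digits(const char* p) {
    return is_ascii_digit(p[0]) && is_ascii_digit(p[1]) && is_ascii_digit(p[2]) &&
           is_ascii_digit(p[3]);
}

// Converts four already-validated digits into the year value.
inline int four_digit_value(const char* p) {
    return (p[0] - '0') * 1000 + (p[1] - '0') * 100 + (p[2] - '0') * 10 + (p[3] - '0');
}
```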

### Release note

None

### Check List (For Author)

- Test: Focused BE unit test
    - Unit Test: `./run-be-ut.sh --run --filter=VDateTimeValueTest.*:TimeStampTzValueTest.*:TEST_VEXPR.LITERALTEST`
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Clarify the shared datelike fast-path result semantics so future parser changes preserve the strict fallback handoff and the DATEV2 canonical datetime-prefix validation behavior.

### Release note

None

### Check List (For Author)

- Test: Focused BE unit test
    - Unit Test: `./run-be-ut.sh --run --filter=VDateTimeValueTest.*:TimeStampTzValueTest.*:TEST_VEXPR.LITERALTEST`
- Behavior changed: No
- Does this need documentation: No
Copilot AI review requested due to automatic review settings April 23, 2026 11:31
@Thearas
Contributor

Thearas commented Apr 23, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@zclllyybb
Contributor Author

/review

Contributor

Copilot AI left a comment


Pull request overview

This PR adds a fast-path parser for canonical fixed-format date/time prefixes (e.g. YYYY-MM-DD and YYYY-MM-DD[ T]HH:MM:SS) to speed up strict-mode string casts to DATEV2/DATETIMEV2 (and Date/Datetime v1), with new BE unit tests for the helper utilities.

Changes:

  • Introduce `try_parse_fixed_canonical_datelike_prefix()` plus small ASCII digit helpers in `datelike_serde_common.hpp`.
  • Wire the fast path into strict-mode string parsing in `cast_to_datev2_impl.hpp`, `cast_to_datetimev2_impl.hpp`, and `cast_to_date_or_datetime_impl.hpp`.
  • Add unit tests covering the helpers, supported types, success cases, and failure/zero-date behavior.

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated no comments.

| File | Description |
|---|---|
| `be/test/runtime/timestamptz_value_test.cpp` | Minor formatting/namespace closing line adjustment. |
| `be/test/core/value/vdatetime_value_test.cpp` | Adds BEUT coverage for the new fast-parse helpers and behaviors. |
| `be/src/exprs/function/cast/cast_to_datev2_impl.hpp` | Uses the canonical-prefix fast path in strict DATEV2 parsing. |
| `be/src/exprs/function/cast/cast_to_datetimev2_impl.hpp` | Uses the canonical-prefix fast path in strict DATETIMEV2 parsing (incl. microsecond handling for exact-length cases). |
| `be/src/exprs/function/cast/cast_to_date_or_datetime_impl.hpp` | Uses the canonical-prefix fast path in strict DATE/DATETIME v1 parsing. |
| `be/src/core/data_type_serde/datelike_serde_common.hpp` | Adds canonical fixed-format prefix parsing utilities and result enum. |


@zclllyybb
Contributor Author

run buildall

Contributor

@github-actions github-actions Bot left a comment


No blocking findings.

Critical checkpoints:

  • Goal / correctness: The PR’s goal is to speed up canonical datelike string parsing without changing semantics. The new shared helper only recognizes the fixed YYYY-MM-DD / YYYY-MM-DD HH:MM:SS subset and cleanly falls back to the existing parser for all other forms, so the intended behavior is preserved.
  • Scope / minimality: The change is small and focused: one shared helper, three strict parser call sites, and focused BE unit tests.
  • Concurrency: No concurrency is involved in these parsing paths.
  • Lifecycle / initialization: No new lifecycle or static-initialization risk was introduced.
  • Config: No new config was added. Existing allow_zero_date handling is preserved.
  • Compatibility: No FE/BE protocol, storage-format, ABI, or rolling-upgrade compatibility concerns were introduced.
  • Parallel code paths: The optimization is applied across the relevant shared string-to-datelike strict parsers, including the shared TIMESTAMPTZ path through CastToDatetimeV2.
  • Conditional checks: The fast-path guard is intentionally narrow and documented; when a suffix still needs the legacy parser, the code keeps using it.
  • Test coverage: The PR adds focused BE unit coverage for the helper’s accepted/rejected canonical prefixes, supported target types, fallback semantics, and zero-date behavior. Existing cast / serde / timestamptz tests still cover the public parsing entrypoints.
  • Test result files: None changed.
  • Observability: Not applicable for this local parsing optimization.
  • Transaction / persistence / data write / FE-BE variable passing: Not involved.
  • Performance: The helper avoids generic parsing work on the hot canonical cases and does not introduce an obvious regression on fallback cases.
  • Other issues: None identified in this review.

User focus: No additional user-provided review focus was supplied, and I did not find an extra issue outside the normal review scope.

Residual risk: I did not run the BE unit tests in this review environment, so the remaining risk is limited to implementation details that would only show up under compilation or targeted test execution.

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (126/126) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
|---|---|
| Function Coverage | 71.57% (26786/37426) |
| Line Coverage | 53.93% (279983/519145) |
| Region Coverage | 47.17% (214816/455378) |
| Branch Coverage | 50.58% (97416/192586) |

Mryange
Mryange previously approved these changes Apr 24, 2026
@github-actions github-actions Bot added the approved Indicates a PR has been approved by one committer. label Apr 24, 2026
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

HappenLee
HappenLee previously approved these changes Apr 26, 2026
Contributor

@HappenLee HappenLee left a comment


LGTM

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Replace the higher-level fast-path regression coverage with focused BE unit tests for try_parse_fixed_canonical_datelike_prefix and its helper utilities, so the optimization is validated directly at the helper boundary.

### Release note

None

### Check List (For Author)

- Test: BE Unit Test
    - Unit Test: `./run-be-ut.sh --run --filter='VDateTimeValueTest.datelike_fast_parse_ascii_helpers:VDateTimeValueTest.datelike_fast_parse_supported_types:VDateTimeValueTest.datetime_v2_try_parse_fixed_canonical_datelike_prefix:VDateTimeValueTest.date_v2_try_parse_fixed_canonical_datelike_prefix:VDateTimeValueTest.datetime_try_parse_fixed_canonical_datelike_prefix:VDateTimeValueTest.datelike_try_parse_fixed_canonical_datelike_prefix_failures:VDateTimeValueTest.datelike_try_parse_fixed_canonical_datelike_prefix_zero_date'`
- Behavior changed: No
- Does this need documentation: No
@zclllyybb zclllyybb dismissed stale reviews from HappenLee and Mryange via 205c522 April 29, 2026 07:28
@zclllyybb zclllyybb force-pushed the agent/task-b3041d36-parse-datetime-optimization branch from 1fad937 to 205c522 Compare April 29, 2026 07:28
@zclllyybb
Contributor Author

run buildall

@github-actions github-actions Bot removed the approved Indicates a PR has been approved by one committer. label Apr 29, 2026
@github-actions github-actions Bot added the approved Indicates a PR has been approved by one committer. label Apr 29, 2026
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@zclllyybb
Contributor Author

run buildall

@zclllyybb zclllyybb merged commit fcd4d98 into apache:master Apr 30, 2026
32 of 34 checks passed
@zclllyybb zclllyybb deleted the agent/task-b3041d36-parse-datetime-optimization branch April 30, 2026 08:57
github-actions Bot pushed a commit that referenced this pull request Apr 30, 2026
| Query | Now | Base | Improve |
|---|:---:|:---:|:---:|
| `CAST(dt_s AS DATETIMEV2)` | 1.31 avg | 3.01 avg | ~ 56.5% |
| `CAST(date_s AS DATEV2)` | 0.81 avg | 1.43 avg | ~ 43.4% |

End-to-end test cases already cover these casts; only BE unit tests were added for the new utility
functions.
yiguolei pushed a commit that referenced this pull request May 1, 2026
…parse #62757 (#62976)

Cherry-picked from #62757

Co-authored-by: zclllyybb <zhaochangle@selectdb.com>

Labels

approved Indicates a PR has been approved by one committer. dev/4.1.1-merged reviewed


7 participants