[Opt](parse) Add fast path for canonical format datetime parse #62757
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Add a fast path for canonical date and datetime prefixes during string-to-datelike casts, while preserving fallback parsing semantics for suffixes, timezone handling, and date-only targets.
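To make "canonical prefix" concrete, here is a minimal sketch, not the PR's code. The names `canonical_prefix_length` and `is_ascii_digit` are hypothetical. The sketch checks only the character layout of `YYYY-MM-DD` and `YYYY-MM-DD[ T]HH:MM:SS` and leaves range validation and any remaining suffix to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: plain ASCII range check, no locale involvement.
inline bool is_ascii_digit(char c) { return c >= '0' && c <= '9'; }

// Hypothetical sketch: length of the canonical prefix at the start of `s`.
// Returns 19 for "YYYY-MM-DD HH:MM:SS" / "YYYY-MM-DDTHH:MM:SS",
// 10 for "YYYY-MM-DD", and 0 when no canonical prefix is present.
// Only character classes are checked; ranges (month 1-12, day within month)
// and any characters after the prefix are left to the caller.
inline std::size_t canonical_prefix_length(std::string_view s) {
    // "YYYY-MM-DD": digits everywhere except '-' at offsets 4 and 7.
    if (s.size() < 10) return 0;
    for (std::size_t i = 0; i < 10; ++i) {
        const bool is_sep = (i == 4 || i == 7);
        if (is_sep ? s[i] != '-' : !is_ascii_digit(s[i])) return 0;
    }
    // Optional time part: ' ' or 'T', then "HH:MM:SS" with ':' at 13 and 16.
    if (s.size() < 19 || (s[10] != ' ' && s[10] != 'T')) return 10;
    for (std::size_t i = 11; i < 19; ++i) {
        const bool is_sep = (i == 13 || i == 16);
        if (is_sep ? s[i] != ':' : !is_ascii_digit(s[i])) return 10;
    }
    return 19;
}
```

A malformed time part such as `2024-03-15 8:30:00` only yields the 10-character date prefix here; in this sketch, the caller would hand the rest of the input to the existing parser.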
### Release note
None
### Check List (For Author)
- Test: BE unit test and benchmark coverage for datelike parsing
- Unit Test: `./run-be-ut.sh --run --filter=VDateTimeValueTest.*:TimeStampTzValueTest.*:VExprTest.*`
- Manual test: `LD_LIBRARY_PATH="/mnt/disk6/common/jdk-17.0.16/lib/server:${LD_LIBRARY_PATH}" ./be/build_RELEASE/bin/benchmark_test --benchmark_filter='parse_(date|datev2|datetime|datetimev2|timestamptz)/(hit|hit_suffix|miss)$' --benchmark_repetitions=5 --benchmark_report_aggregates_only=true`
- Manual test: `./run-regression-test.sh --run -d datatype_p0/datev2 -s test_parse_fast_path` (blocked by local cluster state: no live BE available)
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Restore strict parser correctness when canonical date fast-path detection falls back to the original parser, so valid non-fast-path time forms and date-target casts keep their previous behavior.
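The following is a minimal sketch of the strict fallback handoff. It uses hypothetical names (`YMD`, `fast_parse_date`, `parse_date_strict`) and does not reproduce the PR's implementation. The fast path accepts only an exact, valid `YYYY-MM-DD`. For any other input it returns `std::nullopt`, meaning "not handled here" rather than "invalid". Every other form therefore still reaches the original parser and keeps its previous behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

struct YMD { int year; int month; int day; };  // hypothetical parsed value

// Days in month, proleptic Gregorian calendar.
inline int days_in_month(int y, int m) {
    static const int kDays[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    const bool leap = (y % 4 == 0 && y % 100 != 0) || (y % 400 == 0);
    return (m == 2 && leap) ? 29 : kDays[m - 1];
}

// Hypothetical fast path: accepts only an exact, valid "YYYY-MM-DD".
// std::nullopt means "not handled here", never "invalid input".
inline std::optional<YMD> fast_parse_date(std::string_view s) {
    if (s.size() != 10 || s[4] != '-' || s[7] != '-') return std::nullopt;
    auto digits = [s](std::size_t pos, std::size_t n, int& out) {
        out = 0;
        for (std::size_t i = pos; i < pos + n; ++i) {
            if (s[i] < '0' || s[i] > '9') return false;
            out = out * 10 + (s[i] - '0');
        }
        return true;
    };
    YMD d{};
    if (!digits(0, 4, d.year) || !digits(5, 2, d.month) || !digits(8, 2, d.day)) {
        return std::nullopt;
    }
    if (d.month < 1 || d.month > 12 || d.day < 1 || d.day > days_in_month(d.year, d.month)) {
        return std::nullopt;  // e.g. zero dates or 2023-02-29: the strict parser decides
    }
    return d;
}

// Stand-in for the original strict parser (hypothetical signature).
using StrictParser = std::optional<YMD> (*)(std::string_view);

// Strict entry point: try the fast path first, otherwise use the unchanged original parser.
inline std::optional<YMD> parse_date_strict(std::string_view s, StrictParser original) {
    if (auto hit = fast_parse_date(s)) return hit;
    return original(s);
}
```

Design note for this sketch: it also routes inputs with a valid shape but out-of-range values to the fallback. That keeps all error reporting in the original parser.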
### Release note
None
### Check List (For Author)
- Test: BE unit tests for strict fallback parsing
- Unit Test: `./run-be-ut.sh --run --filter=VDateTimeValueTest.*:TimeStampTzValueTest.*:TEST_VEXPR.LITERALTEST`
- No regression test: removed the unbacked regression stub because it had no checked-in expected result file and the local cluster is not runnable in this worktree
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Remove zero-date inputs from the canonical hit benchmark corpus so release-mode parse benchmarks measure successful fast-path inputs only, and assert that benchmark hit samples parse successfully.
### Release note
None
### Check List (For Author)
- Test: Release benchmark smoke run
- Manual test: `LD_LIBRARY_PATH="/mnt/disk6/common/jdk-17.0.16/lib/server:${LD_LIBRARY_PATH}" ./be/build_RELEASE/bin/benchmark_test --benchmark_filter='parse_(date|datev2|datetime|datetimev2|timestamptz)/(hit|hit_suffix|miss)$' --benchmark_repetitions=1 --benchmark_report_aggregates_only=true`
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Extend the canonical datetime prefix optimization to DATEV2 casts, add missing date-family suffix benchmarks, and restore committed SQL regression coverage for canonical datelike cast parsing.
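For a DATEV2 target, a canonical datetime input still carries a time part. The sketch below assumes, based on the later note about "DATEV2 canonical datetime-prefix validation", that the time fields are range-checked before being discarded. The function and struct names are hypothetical, and the real validation rules may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

struct Date { int year; int month; int day; };  // hypothetical output type

// Reads n ASCII digits starting at pos into out; false on any non-digit.
inline bool read_digits(std::string_view s, std::size_t pos, std::size_t n, int& out) {
    out = 0;
    for (std::size_t i = pos; i < pos + n; ++i) {
        if (s[i] < '0' || s[i] > '9') return false;
        out = out * 10 + (s[i] - '0');
    }
    return true;
}

inline bool valid_ymd(int y, int m, int d) {
    static const int kDays[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    if (m < 1 || m > 12 || d < 1) return false;
    const bool leap = (y % 4 == 0 && y % 100 != 0) || (y % 400 == 0);
    return d <= ((m == 2 && leap) ? 29 : kDays[m - 1]);
}

// Hypothetical DATE-target fast path over exactly "YYYY-MM-DD[ T]HH:MM:SS".
// Time fields are validated, then dropped; std::nullopt means "use the
// strict parser", which decides whether the input is an error.
inline std::optional<Date> canonical_datetime_to_date(std::string_view s) {
    if (s.size() != 19 || s[4] != '-' || s[7] != '-' || (s[10] != ' ' && s[10] != 'T') ||
        s[13] != ':' || s[16] != ':') {
        return std::nullopt;
    }
    Date d{};
    int hh = 0, mm = 0, ss = 0;
    if (!read_digits(s, 0, 4, d.year) || !read_digits(s, 5, 2, d.month) ||
        !read_digits(s, 8, 2, d.day) || !read_digits(s, 11, 2, hh) ||
        !read_digits(s, 14, 2, mm) || !read_digits(s, 17, 2, ss)) {
        return std::nullopt;
    }
    if (!valid_ymd(d.year, d.month, d.day) || hh > 23 || mm > 59 || ss > 59) {
        return std::nullopt;
    }
    return d;  // time of day intentionally discarded for a DATE target
}
```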
### Release note
None
### Check List (For Author)
- Test: BE unit tests, release benchmark smoke run, and committed regression case added
- Unit Test: `./run-be-ut.sh --run --filter=VDateTimeValueTest.*:TimeStampTzValueTest.*:TEST_VEXPR.LITERALTEST`
- Manual test: `LD_LIBRARY_PATH="/mnt/disk6/common/jdk-17.0.16/lib/server:${LD_LIBRARY_PATH}" ./be/build_RELEASE/bin/benchmark_test --benchmark_filter='parse_(date|datev2|datetime|datetimev2|timestamptz)/(hit|hit_suffix|miss)$' --benchmark_repetitions=1 --benchmark_report_aggregates_only=true`
- Regression test: added `regression-test/suites/datatype_p0/datev2/test_parse_fast_path.groovy` with self-checking assertions; local execution remains blocked by missing live BE nodes in this worktree cluster
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Remove the temporary datelike benchmark code from the submitted patch set and keep the optimization validated by unit/regression coverage plus reported benchmark results in the task summary.
### Release note
None
### Check List (For Author)
- Test: Focused BE unit tests after benchmark removal
- Unit Test: `./run-be-ut.sh --run --filter=VDateTimeValueTest.*:TimeStampTzValueTest.*:TEST_VEXPR.LITERALTEST`
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Make the shared canonical datelike helper self-contained and easier to read by using an explicit four-digit ASCII check for the year prefix.
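Below is a minimal sketch of an explicit four-digit ASCII year check. The helper names are hypothetical, not the ones in `datelike_serde_common.hpp`. Comparing against `'0'`..`'9'` directly avoids `std::isdigit`, whose argument must be representable as `unsigned char` (or be `EOF`). It also keeps the year test visibly "exactly four digits".

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: ASCII-only digit test, safe for any char value.
constexpr bool is_ascii_digit(char c) { return c >= '0' && c <= '9'; }

// True when s[pos, pos + 4) exists and is exactly four ASCII digits.
// Written out explicitly rather than as a loop so it reads as a year check.
constexpr bool is_four_ascii_digits(std::string_view s, std::size_t pos) {
    return pos + 4 <= s.size() && is_ascii_digit(s[pos]) && is_ascii_digit(s[pos + 1]) &&
           is_ascii_digit(s[pos + 2]) && is_ascii_digit(s[pos + 3]);
}

// Converts four digits already validated by is_four_ascii_digits.
constexpr int four_digits_to_int(std::string_view s, std::size_t pos) {
    return (s[pos] - '0') * 1000 + (s[pos + 1] - '0') * 100 + (s[pos + 2] - '0') * 10 +
           (s[pos + 3] - '0');
}
```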
### Release note
None
### Check List (For Author)
- Test: Focused BE unit test
- Unit Test: `./run-be-ut.sh --run --filter=VDateTimeValueTest.*:TimeStampTzValueTest.*:TEST_VEXPR.LITERALTEST`
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Clarify the shared datelike fast-path result semantics so future parser changes preserve the strict fallback handoff and the DATEV2 canonical datetime-prefix validation behavior.
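One way to pin down these result semantics is sketched below with a hypothetical enum. The PR's own result enum is not reproduced here. The sketch has three states. The "not canonical" state guarantees that nothing was consumed, so the strict parser can restart from the beginning of the input.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical three-state result for a prefix fast path.
enum class FastPathResult {
    kNotCanonical,  // nothing consumed: the strict parser must see the whole input
    kComplete,      // the whole input is the canonical form: no further parsing needed
    kPrefixOnly,    // a canonical prefix was consumed: the suffix still needs the strict parser
};

// Classifies `s` against the canonical "YYYY-MM-DD" layout only (shape, not
// ranges). Sets `consumed` to the bytes the fast path owns; it stays 0 for
// kNotCanonical, which is what keeps the strict fallback handoff safe.
inline FastPathResult classify_date_prefix(std::string_view s, std::size_t& consumed) {
    consumed = 0;
    if (s.size() < 10) return FastPathResult::kNotCanonical;
    for (std::size_t i = 0; i < 10; ++i) {
        const bool is_sep = (i == 4 || i == 7);
        const bool ok = is_sep ? s[i] == '-' : (s[i] >= '0' && s[i] <= '9');
        if (!ok) return FastPathResult::kNotCanonical;
    }
    consumed = 10;
    return s.size() == 10 ? FastPathResult::kComplete : FastPathResult::kPrefixOnly;
}
```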
### Release note
None
### Check List (For Author)
- Test: Focused BE unit test
- Unit Test: `./run-be-ut.sh --run --filter=VDateTimeValueTest.*:TimeStampTzValueTest.*:TEST_VEXPR.LITERALTEST`
- Behavior changed: No
- Does this need documentation: No
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
/review
Pull request overview
This PR adds a fast-path parser for canonical fixed-format date/time prefixes (e.g. YYYY-MM-DD and YYYY-MM-DD[ T]HH:MM:SS) to speed up strict-mode string casts to DATEV2/DATETIMEV2 (and Date/Datetime v1), with new BE unit tests for the helper utilities.
Changes:
- Introduce `try_parse_fixed_canonical_datelike_prefix()` plus small ASCII digit helpers in `datelike_serde_common.hpp`.
- Wire the fast path into strict-mode string parsing in `cast_to_datev2_impl.hpp`, `cast_to_datetimev2_impl.hpp`, and `cast_to_date_or_datetime_impl.hpp`.
- Add unit tests covering the helpers, supported types, success cases, and failure/zero-date behavior.
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| be/test/runtime/timestamptz_value_test.cpp | Minor formatting/namespace closing line adjustment. |
| be/test/core/value/vdatetime_value_test.cpp | Adds BE unit test coverage for the new fast-parse helpers and behaviors. |
| be/src/exprs/function/cast/cast_to_datev2_impl.hpp | Uses the canonical-prefix fast path in strict DATEV2 parsing. |
| be/src/exprs/function/cast/cast_to_datetimev2_impl.hpp | Uses the canonical-prefix fast path in strict DATETIMEV2 parsing (incl. microsecond handling for exact-length cases). |
| be/src/exprs/function/cast/cast_to_date_or_datetime_impl.hpp | Uses the canonical-prefix fast path in strict DATE/DATETIME v1 parsing. |
| be/src/core/data_type_serde/datelike_serde_common.hpp | Adds canonical fixed-format prefix parsing utilities and result enum. |
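The DATETIMEV2 row mentions microsecond handling for exact-length inputs. Here is a minimal sketch under stated assumptions. The hypothetical `parse_micro_suffix` looks only at the text after a canonical `YYYY-MM-DD HH:MM:SS` prefix. It accepts either nothing or `.` followed by 1 to 6 digits. Everything else (more digits, timezone offsets, trailing text) goes back to the general parser, which is assumed to own rounding and timezone rules.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical: parse the fractional part following a canonical
// "YYYY-MM-DD HH:MM:SS" prefix and return it in microseconds.
// Accepts "" (no fraction) or '.' plus 1-6 ASCII digits and nothing else;
// std::nullopt means "hand the input to the general parser".
inline std::optional<std::uint32_t> parse_micro_suffix(std::string_view suffix) {
    if (suffix.empty()) return 0u;
    if (suffix[0] != '.' || suffix.size() < 2 || suffix.size() > 7) return std::nullopt;
    std::uint32_t value = 0;
    for (std::size_t i = 1; i < suffix.size(); ++i) {
        const char c = suffix[i];
        if (c < '0' || c > '9') return std::nullopt;
        value = value * 10 + static_cast<std::uint32_t>(c - '0');
    }
    // Right-pad to six digits: ".5" -> 500000 us, ".123" -> 123000 us.
    for (std::size_t digits = suffix.size() - 1; digits < 6; ++digits) value *= 10;
    return value;
}
```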
run buildall
No blocking findings.
Critical checkpoints:
- Goal / correctness: The PR’s goal is to speed up canonical datelike string parsing without changing semantics. The new shared helper only recognizes the fixed `YYYY-MM-DD` / `YYYY-MM-DD HH:MM:SS` subset and cleanly falls back to the existing parser for all other forms, so the intended behavior is preserved.
- Scope / minimality: The change is small and focused: one shared helper, three strict parser call sites, and focused BE unit tests.
- Concurrency: No concurrency is involved in these parsing paths.
- Lifecycle / initialization: No new lifecycle or static-initialization risk was introduced.
- Config: No new config was added. Existing `allow_zero_date` handling is preserved.
- Compatibility: No FE/BE protocol, storage-format, ABI, or rolling-upgrade compatibility concerns were introduced.
- Parallel code paths: The optimization is applied across the relevant shared string-to-datelike strict parsers, including the shared `TIMESTAMPTZ` path through `CastToDatetimeV2`.
- Conditional checks: The fast-path guard is intentionally narrow and documented; when a suffix still needs the legacy parser, the code keeps using it.
- Test coverage: The PR adds focused BE unit coverage for the helper’s accepted/rejected canonical prefixes, supported target types, fallback semantics, and zero-date behavior. Existing cast / serde / timestamptz tests still cover the public parsing entrypoints.
- Test result files: None changed.
- Observability: Not applicable for this local parsing optimization.
- Transaction / persistence / data write / FE-BE variable passing: Not involved.
- Performance: The helper avoids generic parsing work on the hot canonical cases and does not introduce an obvious regression on fallback cases.
- Other issues: None identified in this review.
User focus: No additional user-provided review focus was supplied, and I did not find an extra issue outside the normal review scope.
Residual risk: I did not run the BE unit tests in this review environment, so the remaining risk is limited to implementation details that would only show up under compilation or targeted test execution.
BE Regression && UT Coverage Report: Increment line coverage / Increment coverage report
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Replace the higher-level fast-path regression coverage with focused BE unit tests for try_parse_fixed_canonical_datelike_prefix and its helper utilities, so the optimization is validated directly at the helper boundary.
### Release note
None
### Check List (For Author)
- Test: BE Unit Test
- Unit Test: `./run-be-ut.sh --run --filter='VDateTimeValueTest.datelike_fast_parse_ascii_helpers:VDateTimeValueTest.datelike_fast_parse_supported_types:VDateTimeValueTest.datetime_v2_try_parse_fixed_canonical_datelike_prefix:VDateTimeValueTest.date_v2_try_parse_fixed_canonical_datelike_prefix:VDateTimeValueTest.datetime_try_parse_fixed_canonical_datelike_prefix:VDateTimeValueTest.datelike_try_parse_fixed_canonical_datelike_prefix_failures:VDateTimeValueTest.datelike_try_parse_fixed_canonical_datelike_prefix_zero_date'`
- Behavior changed: No
- Does this need documentation: No
Branch updated from 1fad937 to 205c522.
run buildall
PR approved by at least one committer and no changes requested.
run buildall
| Query | Now (avg) | Base (avg) | Improvement |
|---|:---:|:---:|:---:|
| `CAST(dt_s AS DATETIMEV2)` | 1.31 | 3.01 | ~56.5% |
| `CAST(date_s AS DATEV2)` | 0.81 | 1.43 | ~43.4% |

End-to-end test cases are already covered; only BE unit tests were added for the new utility functions.
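The Improvement column is consistent with the relative reduction of the average, (Base - Now) / Base. Here is a quick check of that arithmetic; `improvement_percent` is only an illustrative helper. It gives (3.01 - 1.31) / 3.01 ≈ 56.5% and (1.43 - 0.81) / 1.43 ≈ 43.4%.

```cpp
#include <cassert>
#include <cmath>

// Relative reduction of the average, in percent: (base - now) / base * 100.
inline double improvement_percent(double now, double base) {
    return (base - now) / base * 100.0;
}
```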