Skip to content

DX-105463: [C++][Gandiva] Add TimestampIR wrapper for next_day#135

Merged
akravchukdremio merged 1 commit intodremio_27.0_20from
kravchuk/DX-105463-next-day-fix
Apr 20, 2026
Merged

DX-105463: [C++][Gandiva] Add TimestampIR wrapper for next_day#135
akravchukdremio merged 1 commit intodremio_27.0_20from
kravchuk/DX-105463-next-day-fix

Conversation

@akravchukdremio
Copy link
Copy Markdown

@akravchukdremio akravchukdremio commented Apr 17, 2026

Summary

Fixes a silent-wrong-results correctness bug in next_day(timestamp[us/ns], utf8) when the relaxed DataTypeEquals routes non-ms timestamps through Gandiva but no IR wrapper exists.

Follow-up to #134 (TimestampIR for unit-aware timestamp support, now merged).

Background

During end-to-end testing of #134 against real Parquet/Iceberg data (which stores timestamps as timestamp[us] regardless of the original SQL precision), NEXT_DAY(ts_us, 'MO') returned +53425-02-28 — a date ~51,000 years in the future — instead of the correct Monday after the input date.

Root cause

next_day is registered in function_registry_datetime.cc for both date64 and timestamp via the DATE_TYPES macro:

DATE_TYPES(NEXT_DAY_SAFE_NULL_IF_NULL, next_day, {})

But next_day was not included in any of the static tables in timestamp_ir.cc (kExtracts, kTruncs, kDiffs, kCastsFromTs, kDateArithEntries, kFixedAdds, kCalendarAdds). As a result, no next_day_from_timestamp_us / _ns IR wrappers were generated.

With the DataTypeEquals relaxation from #134, the validation step matches timestamp[us] against the registered timestamp[ms] signature. BuildFunctionCall then tries to remap next_day_from_timestampnext_day_from_timestamp_us, fails to find it in IsTimestampIRFunction, and silently falls through to calling the precompiled millis function with raw microsecond values — yielding dates far in the future.

Fix

Adds a BuildNextDayWrapper that follows the cast-from-timestamp pattern:

next_day_from_timestamp_us(ctx, ts_us, day_str, day_len) -> date64 {
  millis = FloorDiv(ts_us, 1000)
  return next_day_from_timestamp(ctx, millis, day_str, day_len)
}

No remainder recombination is needed because next_day returns date64 (midnight of the next weekday), so sub-millisecond input precision is not meaningful in the result.

Files changed

  • cpp/src/gandiva/timestamp_ir.h — declare BuildNextDayWrapper
  • cpp/src/gandiva/timestamp_ir.cc
    • Register next_day_from_timestamp{_us,_ns} in BuildAllFunctionNames
    • Implement BuildNextDayWrapper (modeled after BuildCastFromTimestampWrapper with extra context/string/length args)
    • Call it from AddFunctions for each non-ms unit

Build artifacts

Native library built and published with this fix:

Testing

Verified end-to-end on a local Dremio instance reading Parquet timestamp data.

Test query:

SELECT NEXT_DAY(ts_us, 'MO') FROM tst.dremio_ts_test;

where ts_us is a timestamp[us] column containing 2021-06-15 14:30:45.123456 (a Tuesday).

Before fix (arrow-gandiva 18.3.0-...-28932a0-dremio):

+53425-02-28

(microseconds interpreted as milliseconds → date ~51,000 years in the future)

After fix (arrow-gandiva 18.3.0-...-73313a7-dremio):

2021-06-21

(correct Monday after 2021-06-15)

Are there user-facing changes?

No API changes. NEXT_DAY on TIMESTAMP(6) / TIMESTAMP(9) columns now returns correct results instead of dates ~51,000 years in the future.

🤖 Generated with Claude Code

next_day(timestamp, utf8) was registered in the C++ function registry for
timestamp inputs but was not included in any TimestampIR table, so no _us/_ns
IR wrappers were generated. With the relaxed DataTypeEquals (ignoring TimeUnit),
calls like next_day(timestamp[us], 'MO') pass validation and are routed to
Gandiva, but BuildFunctionCall silently falls through to the precompiled
millis function, which interprets microseconds as milliseconds — producing
dates ~51,000 years in the future (e.g. +53425-02-28 for a 2021 input).

This adds a BuildNextDayWrapper that scales the timestamp input to millis
via FloorDiv and calls the precompiled next_day_from_timestamp(context,
millis, day_str, day_len) function. Pattern follows the cast-from-timestamp
wrapper shape with the additional (context, string, length) args.

No remainder recombination is needed: next_day returns date64 (midnight of
the next weekday), so sub-millisecond input precision is not meaningful in
the result.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@akravchukdremio akravchukdremio force-pushed the kravchuk/DX-105463-next-day-fix branch from d256409 to 73313a7 Compare April 17, 2026 17:43
@akravchukdremio akravchukdremio self-assigned this Apr 17, 2026
@akravchukdremio akravchukdremio merged commit 8fdf3ca into dremio_27.0_20 Apr 20, 2026
14 of 50 checks passed
lriggs pushed a commit to lriggs/arrow that referenced this pull request Apr 23, 2026
…o#135)

next_day(timestamp, utf8) was registered in the C++ function registry for
timestamp inputs but was not included in any TimestampIR table, so no _us/_ns
IR wrappers were generated. With the relaxed DataTypeEquals (ignoring TimeUnit),
calls like next_day(timestamp[us], 'MO') pass validation and are routed to
Gandiva, but BuildFunctionCall silently falls through to the precompiled
millis function, which interprets microseconds as milliseconds — producing
dates ~51,000 years in the future (e.g. +53425-02-28 for a 2021 input).

This adds a BuildNextDayWrapper that scales the timestamp input to millis
via FloorDiv and calls the precompiled next_day_from_timestamp(context,
millis, day_str, day_len) function. Pattern follows the cast-from-timestamp
wrapper shape with the additional (context, string, length) args.

No remainder recombination is needed: next_day returns date64 (midnight of
the next weekday), so sub-millisecond input precision is not meaningful in
the result.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lriggs added a commit that referenced this pull request Apr 24, 2026
…/ns] support (#137)

* DX-105463: [C++][Gandiva] Add TimestampIR for unit-aware timestamp[us/ns] support

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* DX-105463: Fix clang-format violations in TimestampIR, engine, generator, and test files

* Fix compilation issues

* Updated tests and a few tweaks for passing dremio tests.

* add error message for missing timestamp function

* DX-105463: [C++][Gandiva] Add TimestampIR wrapper for next_day (#135)

next_day(timestamp, utf8) was registered in the C++ function registry for
timestamp inputs but was not included in any TimestampIR table, so no _us/_ns
IR wrappers were generated. With the relaxed DataTypeEquals (ignoring TimeUnit),
calls like next_day(timestamp[us], 'MO') pass validation and are routed to
Gandiva, but BuildFunctionCall silently falls through to the precompiled
millis function, which interprets microseconds as milliseconds — producing
dates ~51,000 years in the future (e.g. +53425-02-28 for a 2021 input).

This adds a BuildNextDayWrapper that scales the timestamp input to millis
via FloorDiv and calls the precompiled next_day_from_timestamp(context,
millis, day_str, day_len) function. Pattern follows the cast-from-timestamp
wrapper shape with the additional (context, string, length) args.

No remainder recombination is needed: next_day returns date64 (midnight of
the next weekday), so sub-millisecond input precision is not meaningful in
the result.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Dmitry Chirkov <dmitry.chirkov@dremio.com>
Co-authored-by: Arkadii Kravchuk <arkadii.kravchuk@dremio.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants