Skip to content

DATEDIFF('second', a, b) returns CEIL of duration instead of boundary-count (deviates from Snowflake semantics) #130

@rampage644

Description

@rampage644

Summary

DATEDIFF('second', a, b) on Embucket returns CEIL((b - a) / 1 second) — the ceiling of the true fractional duration. Snowflake's documented semantics is boundary-count: floor(epoch_seconds(b)) - floor(epoch_seconds(a)). Any positive sub-second difference rounds up to 1 on Embucket. No error; silently wrong result.

Minimal standalone reproducer

Run the same query on both engines:

SELECT DATEDIFF('second',
                TIMESTAMP '2020-01-01 00:00:00.100',
                TIMESTAMP '2020-01-01 00:00:00.900');
-- Snowflake: 0   (same second bucket, 0 boundaries crossed)
-- Embucket:  1   (CEIL(0.8) = 1)

Full test matrix

a b true duration Embucket Snowflake
00:00:00.500 00:00:00.500 0.0 sec 0 0
00:00:00.100 00:00:00.101 0.001 sec same sec 1 0
00:00:00.200 00:00:00.700 0.5 sec same sec 1 0
00:00:00.750 00:00:01.250 0.5 sec straddles boundary 1 1
00:00:00.900 00:00:01.100 0.2 sec straddles boundary 1 1
00:00:00.500 00:00:01.500 1.0 sec 1 1
00:00:00.250 00:00:01.750 1.5 sec 2 1
00:00:00.000 00:00:02.000 2.0 sec exact 2 2
00:00:00.500 00:00:02.900 2.4 sec 3 2
00:00:00.000 00:00:02.500 2.5 sec 3 2
00:00:00.999 00:01:00.000 59.001 sec 60 60

Formulas

  • Snowflake (correct / documented): DATEDIFF('second', a, b) = floor(epoch_seconds(b)) - floor(epoch_seconds(a)). Counts how many second-boundaries are crossed. Returns 0 for two timestamps in the same second regardless of sub-second distance.
  • Embucket (current): DATEDIFF('second', a, b) = CEIL((b - a) / 1 second). Returns the ceiling of the true elapsed seconds. Returns 1 for any positive sub-second difference.

The engines agree only when both endpoints sit on exact second boundaries, or when the ceil value happens to equal the boundary count.

Snowflake reference

Snowflake docs on DATEDIFF (https://docs.snowflake.com/en/sql-reference/functions/datediff): "Calculates the difference between two date, time, or timestamp expressions based on the date or time part requested. The function returns the result of subtracting the second argument from the third argument." The result is the count of date_or_time_part boundaries crossed, not the real elapsed duration. This is the same semantics as Microsoft SQL Server's DATEDIFF, BigQuery's DATETIME_DIFF, and Databricks' DATEDIFF.

Impact surfaced in a real pipeline

Found while running dbt-snowplow-web on Embucket vs Snowflake against a shared S3 Tables Iceberg source. Snowplow's snowplow_web_page_views_this_run computes:

DATEDIFF('second', p.derived_tstamp, COALESCE(t.end_tstamp, p.derived_tstamp)) AS absolute_time_in_s

Aggregate inflation on Embucket, 609,671 input events:

  • snowplow_web_page_views.sum_absolute_time_in_s: +1.25% over Snowflake (2,058,065 vs 2,032,667).
  • snowplow_web_sessions.sum_absolute_time_in_s: +0.85% over Snowflake (13,149,140 vs 13,037,353).

Direction is strictly one-sided (Embucket ≥ Snowflake always), consistent with the formula.

Spot checks:

  • Session 7da3c35f-5565-4903-88df-bb3c19a918f9: Embucket 231 sec, Snowflake 230 sec; all other columns identical.
  • Page view 00005058-c0e9-4e2a-8c2a-24db2e1f8fa4: Embucket 40, Snowflake 39; all else equal.

Expected fix

DATEDIFF('second', a, b) should return floor(epoch_seconds(b)) - floor(epoch_seconds(a)). Presumably generalizable: for DATEDIFF(part, a, b), truncate both endpoints to part precision first, then compute the integer difference in part units. This is the semantics all Snowflake-compatible engines implement.

Same fix should be applied consistently to every time unit (minute, hour, day, etc.) — worth auditing whether other units already use boundary-count or also use CEIL-of-duration.

Environment

Embucket version: v0.2.2 (Lambda-deployed; MEM_POOL_SIZE_MB=9216, MEM_POOL_TYPE=greedy). DataFusion-based. Reproduced via the embucket SQL endpoint on an ARM64 Lambda, 10 GB memory.

Related context

Found as part of the broader Embucket-vs-Snowflake parity investigation at Embucket/embucket-snowplow#6 (full report in specs/2026-04-22-embucket-snowflake-investigation.md, "Finding A / Finding 3").

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions