Summary
DATEDIFF('second', a, b) on Embucket returns CEIL((b - a) / 1 second) — the ceiling of the true fractional duration. Snowflake's documented semantics is boundary-count: floor(epoch_seconds(b)) - floor(epoch_seconds(a)). Any positive sub-second difference rounds up to 1 on Embucket. No error; silently wrong result.
Minimal standalone reproducer
Run the same query on both engines:
SELECT DATEDIFF('second',
TIMESTAMP '2020-01-01 00:00:00.100',
TIMESTAMP '2020-01-01 00:00:00.900');
-- Snowflake: 0 (same second bucket, 0 boundaries crossed)
-- Embucket: 1 (CEIL(0.8) = 1)
Full test matrix
| a |
b |
true duration |
Embucket |
Snowflake |
00:00:00.500 |
00:00:00.500 |
0.0 sec |
0 |
0 |
00:00:00.100 |
00:00:00.101 |
0.001 sec same sec |
1 |
0 |
00:00:00.200 |
00:00:00.700 |
0.5 sec same sec |
1 |
0 |
00:00:00.750 |
00:00:01.250 |
0.5 sec straddles boundary |
1 |
1 |
00:00:00.900 |
00:00:01.100 |
0.2 sec straddles boundary |
1 |
1 |
00:00:00.500 |
00:00:01.500 |
1.0 sec |
1 |
1 |
00:00:00.250 |
00:00:01.750 |
1.5 sec |
2 |
1 |
00:00:00.000 |
00:00:02.000 |
2.0 sec exact |
2 |
2 |
00:00:00.500 |
00:00:02.900 |
2.4 sec |
3 |
2 |
00:00:00.000 |
00:00:02.500 |
2.5 sec |
3 |
2 |
00:00:00.999 |
00:01:00.000 |
59.001 sec |
60 |
60 |
Formulas
- Snowflake (correct / documented):
DATEDIFF('second', a, b) = floor(epoch_seconds(b)) - floor(epoch_seconds(a)). Counts how many second-boundaries are crossed. Returns 0 for two timestamps in the same second regardless of sub-second distance.
- Embucket (current):
DATEDIFF('second', a, b) = CEIL((b - a) / 1 second). Returns the ceiling of the true elapsed seconds. Returns 1 for any positive sub-second difference.
The engines agree only when both endpoints sit on exact second boundaries, or when the ceil value happens to equal the boundary count.
Snowflake reference
Snowflake docs on DATEDIFF (https://docs.snowflake.com/en/sql-reference/functions/datediff): "Calculates the difference between two date, time, or timestamp expressions based on the date or time part requested. The function returns the result of subtracting the second argument from the third argument." The result is the count of date_or_time_part boundaries crossed, not the real elapsed duration. This is the same semantics as Microsoft SQL Server's DATEDIFF, BigQuery's DATETIME_DIFF, and Databricks' DATEDIFF.
Impact surfaced in a real pipeline
Found while running dbt-snowplow-web on Embucket vs Snowflake against a shared S3 Tables Iceberg source. Snowplow's snowplow_web_page_views_this_run computes:
DATEDIFF('second', p.derived_tstamp, COALESCE(t.end_tstamp, p.derived_tstamp)) AS absolute_time_in_s
Aggregate inflation on Embucket, 609,671 input events:
snowplow_web_page_views.sum_absolute_time_in_s: +1.25% over Snowflake (2,058,065 vs 2,032,667).
snowplow_web_sessions.sum_absolute_time_in_s: +0.85% over Snowflake (13,149,140 vs 13,037,353).
Direction is strictly one-sided (Embucket ≥ Snowflake always), consistent with the formula.
Spot checks:
- Session
7da3c35f-5565-4903-88df-bb3c19a918f9: Embucket 231 sec, Snowflake 230 sec; all other columns identical.
- Page view
00005058-c0e9-4e2a-8c2a-24db2e1f8fa4: Embucket 40, Snowflake 39; all else equal.
Expected fix
DATEDIFF('second', a, b) should return floor(epoch_seconds(b)) - floor(epoch_seconds(a)). Presumably generalizable: for DATEDIFF(part, a, b), truncate both endpoints to part precision first, then compute the integer difference in part units. This is the semantics all Snowflake-compatible engines implement.
Same fix should be applied consistently to every time unit (minute, hour, day, etc.) — worth auditing whether other units already use boundary-count or also use CEIL-of-duration.
Environment
Embucket version: v0.2.2 (Lambda-deployed; MEM_POOL_SIZE_MB=9216, MEM_POOL_TYPE=greedy). DataFusion-based. Reproduced via the embucket SQL endpoint on an ARM64 Lambda, 10 GB memory.
Related context
Found as part of the broader Embucket-vs-Snowflake parity investigation at Embucket/embucket-snowplow#6 (full report in specs/2026-04-22-embucket-snowflake-investigation.md, "Finding A / Finding 3").
Summary
DATEDIFF('second', a, b)on Embucket returnsCEIL((b - a) / 1 second)— the ceiling of the true fractional duration. Snowflake's documented semantics is boundary-count:floor(epoch_seconds(b)) - floor(epoch_seconds(a)). Any positive sub-second difference rounds up to 1 on Embucket. No error; silently wrong result.Minimal standalone reproducer
Run the same query on both engines:
Full test matrix
00:00:00.50000:00:00.50000:00:00.10000:00:00.10100:00:00.20000:00:00.70000:00:00.75000:00:01.25000:00:00.90000:00:01.10000:00:00.50000:00:01.50000:00:00.25000:00:01.75000:00:00.00000:00:02.00000:00:00.50000:00:02.90000:00:00.00000:00:02.50000:00:00.99900:01:00.000Formulas
DATEDIFF('second', a, b) = floor(epoch_seconds(b)) - floor(epoch_seconds(a)). Counts how manysecond-boundaries are crossed. Returns 0 for two timestamps in the same second regardless of sub-second distance.DATEDIFF('second', a, b) = CEIL((b - a) / 1 second). Returns the ceiling of the true elapsed seconds. Returns 1 for any positive sub-second difference.The engines agree only when both endpoints sit on exact second boundaries, or when the ceil value happens to equal the boundary count.
Snowflake reference
Snowflake docs on
DATEDIFF(https://docs.snowflake.com/en/sql-reference/functions/datediff): "Calculates the difference between two date, time, or timestamp expressions based on the date or time part requested. The function returns the result of subtracting the second argument from the third argument." The result is the count ofdate_or_time_partboundaries crossed, not the real elapsed duration. This is the same semantics as Microsoft SQL Server'sDATEDIFF, BigQuery'sDATETIME_DIFF, and Databricks'DATEDIFF.Impact surfaced in a real pipeline
Found while running dbt-snowplow-web on Embucket vs Snowflake against a shared S3 Tables Iceberg source. Snowplow's
snowplow_web_page_views_this_runcomputes:Aggregate inflation on Embucket, 609,671 input events:
snowplow_web_page_views.sum_absolute_time_in_s: +1.25% over Snowflake (2,058,065 vs 2,032,667).snowplow_web_sessions.sum_absolute_time_in_s: +0.85% over Snowflake (13,149,140 vs 13,037,353).Direction is strictly one-sided (Embucket ≥ Snowflake always), consistent with the formula.
Spot checks:
7da3c35f-5565-4903-88df-bb3c19a918f9: Embucket 231 sec, Snowflake 230 sec; all other columns identical.00005058-c0e9-4e2a-8c2a-24db2e1f8fa4: Embucket 40, Snowflake 39; all else equal.Expected fix
DATEDIFF('second', a, b)should returnfloor(epoch_seconds(b)) - floor(epoch_seconds(a)). Presumably generalizable: forDATEDIFF(part, a, b), truncate both endpoints topartprecision first, then compute the integer difference inpartunits. This is the semantics all Snowflake-compatible engines implement.Same fix should be applied consistently to every time unit (minute, hour, day, etc.) — worth auditing whether other units already use boundary-count or also use CEIL-of-duration.
Environment
Embucket version:
v0.2.2(Lambda-deployed;MEM_POOL_SIZE_MB=9216,MEM_POOL_TYPE=greedy). DataFusion-based. Reproduced via the embucket SQL endpoint on an ARM64 Lambda, 10 GB memory.Related context
Found as part of the broader Embucket-vs-Snowflake parity investigation at Embucket/embucket-snowplow#6 (full report in
specs/2026-04-22-embucket-snowflake-investigation.md, "Finding A / Finding 3").