fix(table): normalize timestamp units for partitioned writes by fallintoplace · Pull Request #1112 · apache/iceberg-go

fallintoplace · 2026-05-21T19:00:05Z

Summary

- Convert Arrow timestamp partition values according to their Arrow unit before applying Iceberg partition transforms.
- Use the table source type to choose microsecond vs nanosecond Iceberg timestamp literals.
- Add partition fanout regression coverage for timestamp seconds, milliseconds, microseconds, and nanoseconds.

Why

Partitioned writes compute partition keys before ToRequestedSchema normalizes Arrow timestamp arrays to the table schema. The old partition path cast raw Arrow timestamp values directly to Iceberg microsecond timestamps, so timestamp[s] and timestamp[ms] inputs could be routed to the wrong day/hour partition even though the data values were later written with normalized units.
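
To make the failure mode concrete, here is a minimal sketch that uses only the standard library. The helper names (`dayFromMicros`, `secondsToMicros`) are illustrative and are not taken from iceberg-go; the day transform is modeled, as an assumption, as floor division of microseconds since the epoch by the number of microseconds per day.

```go
package main

import "fmt"

const microsPerDay = int64(86_400) * 1_000_000

// dayFromMicros models a day partition transform: days since the Unix epoch
// for a microsecond timestamp. Hypothetical helper, not the iceberg-go API.
func dayFromMicros(micros int64) int64 {
	d := micros / microsPerDay
	if micros%microsPerDay < 0 {
		d-- // floor rather than truncate for pre-epoch values
	}
	return d
}

// secondsToMicros rescales a timestamp[s] value to microseconds.
// Overflow handling is omitted here for brevity.
func secondsToMicros(seconds int64) int64 {
	return seconds * 1_000_000
}

func main() {
	raw := int64(1_700_000_000) // timestamp[s]: 2023-11-14T22:13:20Z

	// Old behavior: the raw value is treated as if it were microseconds.
	fmt.Println(dayFromMicros(raw)) // 0

	// Unit-aware behavior: rescale first, then apply the transform.
	fmt.Println(dayFromMicros(secondsToMicros(raw))) // 19675
}
```

The same seconds value lands in day 0 when misread as microseconds and in day 19675 once rescaled, which is the kind of misrouting described above.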

Fixes #1111.

Testing

go test ./table -run TestFanoutWriter -count=1
git diff --check

tanmayrauth · 2026-05-21T19:09:29Z

+	case arrow.Microsecond:
+		return value, nil
+	case arrow.Nanosecond:
+		return value / 1_000, nil


Go's / truncates toward zero, but unit downconversion should floor. For pre-epoch values this rounds the wrong way: ns=-1500 gives -1 here, but the correct μs bin is -2 ([-2000, -1000)). This is the same class of bug the PR is fixing: wrong partition routing, in this case for negative timestamps.

Try a floor-division helper instead (Go's math package has no integer FloorDiv):

case arrow.Nanosecond: return floorDivInt64(value, 1_000), nil

Can you please add a regression test with a negative ns value (e.g. one second before epoch with a sub-μs offset) that asserts the partition path?
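
The difference between truncation and flooring can be checked in isolation. The sketch below is an illustration only: `floorDiv` is a hypothetical helper that assumes a positive divisor, and the second value is one possible reading of "one second before epoch with a sub-μs offset".

```go
package main

import "fmt"

// floorDiv returns the quotient rounded toward negative infinity.
// It assumes b is positive (hypothetical helper for illustration).
func floorDiv(a, b int64) int64 {
	q := a / b
	if a%b < 0 {
		q--
	}
	return q
}

func main() {
	// Truncating division vs floor division for the value from the comment.
	fmt.Println(int64(-1500)/1000, floorDiv(-1500, 1000)) // -1 -2

	// One second before the epoch, minus one more nanosecond.
	ns := int64(-1_000_000_001)
	fmt.Println(ns/1000, floorDiv(ns, 1000)) // -1000000 -1000001
}
```

With truncation the second value would be attributed to the microsecond exactly one second before the epoch, while flooring places it in the microsecond before that.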

tanmayrauth · 2026-05-21T19:13:50Z

Out of scope: Time64 (line ~381) has the same unit-vs-iceberg.Time bug. That can go in a follow-up PR.

zeroshade · 2026-05-22T18:24:44Z

+func floorDivInt64(a, b int64) int64 {
+	d := a / b
+	if (a^b) < 0 && d*b != a {
+		d--
+	}
+
+	return d
+}


This already exists in the root transforms.go file. We should probably move the version at transforms.go:579 into an internal/utils.go file and use it in both places rather than duplicating the function.

zeroshade · 2026-05-22T18:25:28Z

+	if (value > 0 && value > math.MaxInt64/factor) ||
+		(value < 0 && value < math.MinInt64/factor) {
+		return 0, fmt.Errorf("arrow timestamp value %d overflows int64 when scaled by %d", value, factor)
+	}


can you add a test that covers this? I don't think it's covered by the current tests

zeroshade · 2026-05-22T18:27:04Z

+	case iceberg.TimestampType, iceberg.TimestampTzType:
+		micros, err := arrowTimestampToMicros(value, timestampType.Unit)
+		if err != nil {
+			return nil, err
+		}
+
+		return iceberg.NewLiteral(iceberg.Timestamp(micros)), nil
+	case iceberg.TimestampNsType, iceberg.TimestampTzNsType:


The Tz variants don't seem to get tested. Can you add cases that have TimeZone: "UTC" so we hit this case?

zeroshade · 2026-05-22T18:28:14Z

+			return nil, fmt.Errorf("failed to find source field ID %d in schema", sourceField.SourceID())
+		}
+		partitionColumns[i] = record.Column(colIndices[0])
+		partitionFieldsInfo[i] = partitionFieldInfo{&sourceField, sourceField.FieldID, sourceType}


Suggested change

-		partitionFieldsInfo[i] = partitionFieldInfo{&sourceField, sourceField.FieldID, sourceType}
+		partitionFieldsInfo[i] = partitionFieldInfo{
+			sourceField: &sourceField,
+			fieldID:     sourceField.FieldID,
+			sourceType:  sourceType,
+		}

just so we don't accidentally misorder things
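
The general risk with positional struct literals can be shown with a stand-in type. The `fieldInfo` struct below is hypothetical and deliberately uses two fields of the same type, which is the case where a misordering would compile silently; it is not the `partitionFieldInfo` struct from the PR.

```go
package main

import "fmt"

// fieldInfo is a stand-in type with two fields of the same type, which is
// where positional literals become risky.
type fieldInfo struct {
	sourceID int
	fieldID  int
}

func main() {
	// Positional: compiles, but silently depends on declaration order.
	positional := fieldInfo{1, 1000}

	// Keyed: stays correct if fields are reordered or new ones are added.
	keyed := fieldInfo{fieldID: 1000, sourceID: 1}

	fmt.Println(positional == keyed) // true
}
```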

zeroshade · 2026-05-22T18:28:50Z

 }

+type partitionFieldInfo struct {
+	sourceField *iceberg.PartitionField


PartitionField is a small struct, why use a pointer here instead of just using it by value?

zeroshade · 2026-05-22T18:29:40Z

+		}
+		sourceType, ok := schema.FindTypeByID(sourceField.SourceID())
+		if !ok {
+			return nil, fmt.Errorf("failed to find source field ID %d in schema", sourceField.SourceID())


Can we use something like "failed to find type for source field ID" to distinguish this error from the identical one above?

fallintoplace requested a review from zeroshade as a code owner May 21, 2026 19:00

tanmayrauth reviewed May 21, 2026

fallintoplace force-pushed the fix/partition-timestamp-units branch from 9ad9bb9 to 217aa9f Compare May 21, 2026 19:37

fallintoplace mentioned this pull request May 21, 2026

Partitioned writes should handle Arrow Time64 units consistently #1115

Open

fallintoplace force-pushed the fix/partition-timestamp-units branch from 217aa9f to c873da9 Compare May 21, 2026 22:18

zeroshade requested changes May 22, 2026

fallintoplace force-pushed the fix/partition-timestamp-units branch 2 times, most recently from a138222 to a138e3d Compare May 22, 2026 19:17

fallintoplace requested a review from zeroshade May 22, 2026 21:43

fallintoplace added 2 commits May 23, 2026 13:49

fix(table): normalize timestamp units for partitioned writes

0215efa

fix(table): floor timestamp downcasts during writes

f0ee48e

fallintoplace force-pushed the fix/partition-timestamp-units branch from 64f42d6 to f0ee48e Compare May 23, 2026 11:51

fallintoplace wants to merge 2 commits into apache:main from fallintoplace:fix/partition-timestamp-units

3 participants
