
fix(table): normalize timestamp units for partitioned writes #1112

Open
fallintoplace wants to merge 2 commits into apache:main from fallintoplace:fix/partition-timestamp-units

Conversation

@fallintoplace
Contributor

Summary

  • convert Arrow timestamp partition values according to their Arrow unit before applying Iceberg partition transforms
  • use the table source type to choose microsecond vs nanosecond Iceberg timestamp literals
  • add partition fanout regression coverage for timestamp seconds, milliseconds, microseconds, and nanoseconds

Why

Partitioned writes compute partition keys before ToRequestedSchema normalizes Arrow timestamp arrays to the table schema. The old partition path cast raw Arrow timestamp values directly to Iceberg microsecond timestamps, so timestamp[s] and timestamp[ms] inputs could be routed to the wrong day/hour partition even though the data values were later written with normalized units.

Fixes #1111.
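
To make the failure mode concrete, here is a minimal sketch of what happens when a seconds-based value is treated as microseconds. It uses only the standard library; `Unit`, `toMicros`, and `dayOf` are hypothetical stand-ins for illustration, not the names used in the PR, and overflow handling is omitted.

```go
package main

import (
	"fmt"
	"time"
)

// Unit is a hypothetical stand-in for the Arrow time unit.
type Unit int

const (
	Second Unit = iota
	Millisecond
	Microsecond
)

// toMicros scales a raw value to microseconds according to its unit.
// Overflow checks are intentionally left out of this sketch.
func toMicros(v int64, u Unit) int64 {
	switch u {
	case Second:
		return v * 1_000_000
	case Millisecond:
		return v * 1_000
	default:
		return v
	}
}

// dayOf returns the UTC calendar day for a microsecond timestamp.
func dayOf(micros int64) string {
	return time.UnixMicro(micros).UTC().Format("2006-01-02")
}

func main() {
	raw := int64(1_700_000_000) // seconds since the epoch

	// Raw seconds misread as microseconds land on the first day of 1970.
	fmt.Println(dayOf(raw))

	// Scaling by the unit first yields the intended day.
	fmt.Println(dayOf(toMicros(raw, Second)))
}
```

With this input the first line prints 1970-01-01 and the second prints 2023-11-14, which is the kind of partition mismatch described above.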

Testing

  • go test ./table -run TestFanoutWriter -count=1
  • git diff --check

@fallintoplace fallintoplace requested a review from zeroshade as a code owner May 21, 2026 19:00
Comment thread table/partitioned_fanout_writer.go Outdated
case arrow.Microsecond:
    return value, nil
case arrow.Nanosecond:
    return value / 1_000, nil
Contributor

@tanmayrauth May 21, 2026

Go's / truncates toward zero, but unit downconversion should floor. For pre-epoch values this rounds the wrong way: ns=-1500 gives -1 here, but the correct μs bin is -2 (covering [-2000, -1000) ns). This is the same class of bug the PR is fixing: wrong partition routing, here for negative timestamps.

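Try a floor-division helper (Go's math package has no integer FloorDiv), for example the floorDivInt64 helper discussed later in this thread:

case arrow.Nanosecond:
    return floorDivInt64(value, 1_000), nil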

Can you please add a regression test with a negative ns value (e.g. one second before epoch with a sub-μs offset) that asserts the partition path?
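
A small standalone sketch of the difference between truncating and flooring, under the assumption that the divisor is positive. `floorDiv` is a hypothetical helper written for this illustration, not the function from the PR.

```go
package main

import "fmt"

// floorDiv divides a by b and rounds toward negative infinity.
// Go's built-in / rounds toward zero instead.
func floorDiv(a, b int64) int64 {
	q := a / b
	// Adjust only when there is a remainder and the signs differ.
	if a%b != 0 && (a < 0) != (b < 0) {
		q--
	}
	return q
}

func main() {
	const microsPerDay = 86_400_000_000

	for _, ns := range []int64{-1500, -500, 1500} {
		truncUs := ns / 1_000
		floorUs := floorDiv(ns, 1_000)
		fmt.Printf("ns=%d trunc_us=%d (day %d) floor_us=%d (day %d)\n",
			ns,
			truncUs, floorDiv(truncUs, microsPerDay),
			floorUs, floorDiv(floorUs, microsPerDay))
	}
}
```

For ns=-500 truncation yields 0 μs, which falls on day 0 (1970-01-01), while flooring yields -1 μs, which falls on day -1 (1969-12-31). That is a case where the partition path itself changes, so it may be a useful value for the regression test.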

@tanmayrauth
Contributor

Out of scope: Time64 (line ~381) has the same unit-vs-iceberg.Time bug — follow-up PR.

Comment thread table/partitioned_fanout_writer.go Outdated
Comment on lines +539 to +546
func floorDivInt64(a, b int64) int64 {
    d := a / b
    if (a^b) < 0 && d*b != a {
        d--
    }

    return d
}
Member

this already exists in the root transforms.go file, we should probably just move the version in transforms.go:579 into an internal/utils.go file and then use that in both places rather than duplicate this function.

Comment on lines +531 to +534
if (value > 0 && value > math.MaxInt64/factor) ||
    (value < 0 && value < math.MinInt64/factor) {
    return 0, fmt.Errorf("arrow timestamp value %d overflows int64 when scaled by %d", value, factor)
}
Member

can you add a test that covers this? I don't think it's covered by the current tests

Comment on lines +481 to +488
case iceberg.TimestampType, iceberg.TimestampTzType:
    micros, err := arrowTimestampToMicros(value, timestampType.Unit)
    if err != nil {
        return nil, err
    }

    return iceberg.NewLiteral(iceberg.Timestamp(micros)), nil
case iceberg.TimestampNsType, iceberg.TimestampTzNsType:
Member

the Tz variants don't seem to get tested, can you add cases that have TimeZone: "UTC" so we hit this case?

Comment thread table/partitioned_fanout_writer.go Outdated
return nil, fmt.Errorf("failed to find source field ID %d in schema", sourceField.SourceID())
}
partitionColumns[i] = record.Column(colIndices[0])
partitionFieldsInfo[i] = partitionFieldInfo{&sourceField, sourceField.FieldID, sourceType}
Member

Suggested change
- partitionFieldsInfo[i] = partitionFieldInfo{&sourceField, sourceField.FieldID, sourceType}
+ partitionFieldsInfo[i] = partitionFieldInfo{
+     sourceField: &sourceField,
+     fieldID: sourceField.FieldID,
+     sourceType: sourceType,
+ }

just so we don't accidentally misorder things

Comment thread table/partitioned_fanout_writer.go Outdated
}

type partitionFieldInfo struct {
sourceField *iceberg.PartitionField
Member

PartitionField is a small struct, why use a pointer here instead of just using it by value?

Comment thread table/partitioned_fanout_writer.go Outdated
}
sourceType, ok := schema.FindTypeByID(sourceField.SourceID())
if !ok {
return nil, fmt.Errorf("failed to find source field ID %d in schema", sourceField.SourceID())
Member

can we use something like "failed to find type for source field ID" to distinguish this error from the above identical one?

@fallintoplace fallintoplace force-pushed the fix/partition-timestamp-units branch 2 times, most recently from a138222 to a138e3d Compare May 22, 2026 19:17
@fallintoplace fallintoplace requested a review from zeroshade May 22, 2026 21:43
@fallintoplace fallintoplace force-pushed the fix/partition-timestamp-units branch from 64f42d6 to f0ee48e Compare May 23, 2026 11:51
Development

Successfully merging this pull request may close these issues.

Partitioned writes compute timestamp partitions using raw Arrow units

3 participants