fix(cli): resolve partition spec fields by schema name #1379

Open
fallintoplace wants to merge 3 commits into apache:main from fallintoplace:fix/cli-partition-spec-name-resolution
Conversation

@fallintoplace

Contributor

Summary

  • Resolve CLI --partition-spec entries against the parsed table schema instead of positional field IDs.
  • Update table create path to pass schema into partition spec parser.
  • Add regression tests for correct field-id mapping and unknown field rejection.

Testing

  • go test ./cmd/iceberg -run TestParsePartitionSpec -count=1
  • go test ./cmd/iceberg -count=1

@fallintoplace requested a review from zeroshade as a code owner July 4, 2026 15:56

@zeroshade left a comment

Member

Thanks for improving partition-spec parsing against the schema. I found one issue that needs to be fixed before this is safe for REST catalogs.

  • parsePartitionSpec now builds partition fields via AddPartitionFieldByName with a nil field ID. That leaves the resulting PartitionField.FieldID at the internal unassigned value, and NewPartitionSpecOpts only initializes the spec rather than assigning IDs. Non-REST catalogs may normalize this later, but REST create-table sends cfg.PartitionSpec directly in the request payload, so create table --partition-spec ... --catalog rest can send field IDs as 0 instead of 1000+. Please assign stable partition field IDs during parsing and add coverage for parsed field IDs plus the REST create payload.
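To make the failure mode concrete, here is a minimal, self-contained sketch. It does not use the iceberg-go API: `partitionField`, `newField`, and `duplicateIDs` are hypothetical stand-ins, and the only assumption carried over from the review is that a nil field ID leaves the field at the zero value.

```go
package main

import "fmt"

// partitionField is a hypothetical stand-in for a partition field;
// it is not the iceberg-go type.
type partitionField struct {
	Name    string
	FieldID int
}

// newField mimics a builder that takes an optional field ID.
// When id is nil, FieldID keeps Go's zero value (0).
func newField(name string, id *int) partitionField {
	f := partitionField{Name: name}
	if id != nil {
		f.FieldID = *id
	}
	return f
}

// duplicateIDs returns every field ID that appears more than once,
// in the order the first repeat is seen.
func duplicateIDs(fields []partitionField) []int {
	seen := map[int]int{}
	var dups []int
	for _, f := range fields {
		seen[f.FieldID]++
		if seen[f.FieldID] == 2 {
			dups = append(dups, f.FieldID)
		}
	}
	return dups
}

func main() {
	// Two fields built with a nil ID both end up with ID 0.
	fields := []partitionField{newField("region", nil), newField("day", nil)}
	fmt.Println(fields[0].FieldID, fields[1].FieldID, duplicateIDs(fields))
	// prints: 0 0 [0]
}
```

If nothing normalizes the IDs before serialization, both fields would reach the server with the same ID of 0.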

Comment thread cmd/iceberg/utils.go
for _, field := range fields {
    field = strings.TrimSpace(field)
    if field == "" {
        continue

Member

Passing nil here leaves each generated PartitionField.FieldID at the internal unassigned value (0). NewPartitionSpecOpts validates/initializes the spec but does not assign partition field IDs, and REST create-table serializes this spec directly in the request payload. Please assign IDs while parsing (for example iceberg.PartitionDataIDStart + len(opts)) and add assertions that parsed fields receive 1000, 1001, ... plus a REST create-payload regression test.

@tanmayrauth left a comment

Contributor

Agree with the field-id concern. I confirmed the spec goes out with field-id: 0 (and it's a duplicate across fields). One test-gap note too. Details inline.

Comment thread cmd/iceberg/utils.go Outdated
Name: field,
Transform: iceberg.IdentityTransform{},
})
opts = append(opts, iceberg.AddPartitionFieldByName(field, field, iceberg.IdentityTransform{}, schema, nil))

Contributor

In iceberg.AddPartitionFieldByName(field, field, iceberg.IdentityTransform{}, schema, nil), the nil field ID is the problem. NewPartitionSpecOpts only runs initialize(), never assignPartitionFieldIds, so every field keeps the unassigned value (0). rest.go serializes cfg.PartitionSpec directly into the create-table payload, so the server gets field-id: 0 for all of them, which is also a duplicate and invalid.

Assign a stable ID here instead, e.g.:

  id := iceberg.PartitionDataIDStart + len(opts)
  opts = append(opts, iceberg.AddPartitionFieldByName(field, field, iceberg.IdentityTransform{}, schema, &id))

Since empty entries are continued before the append, len(opts) gives dense 1000, 1001, … even for input like a,,b (the old i + PartitionDataIDStart would've left a gap there — small bonus).
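The difference between the two numbering schemes can be checked in isolation. The sketch below is only an illustration: `partitionDataIDStart` is a local constant assumed to mirror iceberg.PartitionDataIDStart (1000), and `denseFieldIDs` / `positionalFieldIDs` are hypothetical helpers that track IDs only, without building real partition fields.

```go
package main

import (
	"fmt"
	"strings"
)

// partitionDataIDStart is a local stand-in, assumed to match
// iceberg.PartitionDataIDStart (1000).
const partitionDataIDStart = 1000

// denseFieldIDs assigns IDs from the number of fields accepted so far,
// so skipped (empty) entries do not consume an ID.
func denseFieldIDs(spec string) []int {
	var ids []int
	for _, field := range strings.Split(spec, ",") {
		if strings.TrimSpace(field) == "" {
			continue
		}
		ids = append(ids, partitionDataIDStart+len(ids))
	}
	return ids
}

// positionalFieldIDs assigns IDs from the loop index, so an empty
// entry leaves a gap in the sequence.
func positionalFieldIDs(spec string) []int {
	var ids []int
	for i, field := range strings.Split(spec, ",") {
		if strings.TrimSpace(field) == "" {
			continue
		}
		ids = append(ids, partitionDataIDStart+i)
	}
	return ids
}

func main() {
	fmt.Println(denseFieldIDs("a,,b"))      // prints: [1000 1001]
	fmt.Println(positionalFieldIDs("a,,b")) // prints: [1000 1002]
}
```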

Comment thread cmd/iceberg/utils_test.go
}
require.Equal(t, len(tt.wantSourceIDs), got.NumFields())
for i, wantIDs := range tt.wantSourceIDs {
    require.Equal(t, wantIDs, got.Field(i).SourceIDs)

Contributor

require.Equal(t, wantIDs, got.Field(i).SourceIDs) — the tests only assert SourceIDs, never FieldID, which is why they pass with field-id: 0 still going out. Could you add a FieldID assertion (1000, 1001, …) and a REST create-payload regression test that marshals the spec and checks field-id isn't 0? That locks in the fix above.
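As a rough idea of what the payload check could look like, here is a standalone sketch. It does not call the real REST catalog code: `partitionFieldJSON` and `partitionSpecJSON` are hypothetical stand-ins that only reuse the JSON key names from the Iceberg spec (`source-id`, `field-id`, `name`, `transform`), and `checkPayloadFieldIDs` is an illustrative helper. The 1000 floor is hard-coded on the assumption that it matches iceberg.PartitionDataIDStart.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical stand-ins for the serialized partition spec.
type partitionFieldJSON struct {
	SourceID  int    `json:"source-id"`
	FieldID   int    `json:"field-id"`
	Name      string `json:"name"`
	Transform string `json:"transform"`
}

type partitionSpecJSON struct {
	SpecID int                  `json:"spec-id"`
	Fields []partitionFieldJSON `json:"fields"`
}

// checkPayloadFieldIDs marshals the spec, decodes the raw JSON again,
// and rejects field IDs that are missing, below 1000, or duplicated.
func checkPayloadFieldIDs(spec partitionSpecJSON) error {
	raw, err := json.Marshal(spec)
	if err != nil {
		return err
	}
	var decoded struct {
		Fields []map[string]interface{} `json:"fields"`
	}
	if err := json.Unmarshal(raw, &decoded); err != nil {
		return err
	}
	seen := map[int]bool{}
	for _, f := range decoded.Fields {
		v, ok := f["field-id"].(float64) // JSON numbers decode as float64
		if !ok {
			return fmt.Errorf("field %v: missing field-id", f["name"])
		}
		id := int(v)
		if id < 1000 {
			return fmt.Errorf("field %v: field-id %d is below 1000", f["name"], id)
		}
		if seen[id] {
			return fmt.Errorf("field %v: duplicate field-id %d", f["name"], id)
		}
		seen[id] = true
	}
	return nil
}

func main() {
	broken := partitionSpecJSON{Fields: []partitionFieldJSON{
		{SourceID: 2, Name: "region", Transform: "identity"},
		{SourceID: 3, Name: "day", Transform: "identity"},
	}}
	fmt.Println(checkPayloadFieldIDs(broken) != nil) // prints: true
}
```

In the actual test, the same assertions would presumably run against the request body that the REST catalog client produces rather than against stand-in structs.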

3 participants