fix(cli): resolve partition spec fields by schema name#1379
fix(cli): resolve partition spec fields by schema name#1379fallintoplace wants to merge 3 commits into
Conversation
zeroshade
left a comment
There was a problem hiding this comment.
Thanks for improving partition-spec parsing against the schema. I found one issue that needs to be fixed before this is safe for REST catalogs.
parsePartitionSpecnow builds partition fields viaAddPartitionFieldByNamewith a nil field ID. That leaves the resultingPartitionField.FieldIDat the internal unassigned value, andNewPartitionSpecOptsonly initializes the spec rather than assigning IDs. Non-REST catalogs may normalize this later, but REST create-table sendscfg.PartitionSpecdirectly in the request payload, socreate table --partition-spec ... --catalog restcan send field IDs as0instead of1000+. Please assign stable partition field IDs during parsing and add coverage for parsed field IDs plus the REST create payload.
| for _, field := range fields { | ||
| field = strings.TrimSpace(field) | ||
| if field == "" { | ||
| continue |
There was a problem hiding this comment.
Passing nil here leaves each generated PartitionField.FieldID at the internal unassigned value (0). NewPartitionSpecOpts validates/initializes the spec but does not assign partition field IDs, and REST create-table serializes this spec directly in the request payload. Please assign IDs while parsing (for example iceberg.PartitionDataIDStart + len(opts)) and add assertions that parsed fields receive 1000, 1001, ... plus a REST create-payload regression test.
tanmayrauth
left a comment
There was a problem hiding this comment.
gree with the field-id concern, confirmed the spec goes out with field-id: 0 (and it's a duplicate across fields). One test-gap note too. Details inline.
| Name: field, | ||
| Transform: iceberg.IdentityTransform{}, | ||
| }) | ||
| opts = append(opts, iceberg.AddPartitionFieldByName(field, field, iceberg.IdentityTransform{}, schema, nil)) |
There was a problem hiding this comment.
iceberg.AddPartitionFieldByName(field, field, iceberg.IdentityTransform{}, schema, nil) — the nil field ID is the problem. NewPartitionSpecOpts only runs initialize(), never assignPartitionFieldIds, so every field keeps the
unassigned value (0). rest.go serializes cfg.PartitionSpec directly into the create-table payload, so the server gets field-id: 0 for all of them — which is also a duplicate and invalid.
Assign a stable ID here instead, e.g.:
id := iceberg.PartitionDataIDStart + len(opts)
opts = append(opts, iceberg.AddPartitionFieldByName(field, field, iceberg.IdentityTransform{}, schema, &id))
Since empty entries are continued before the append, len(opts) gives dense 1000, 1001, … even for input like a,,b (the old i + PartitionDataIDStart would've left a gap there — small bonus).
| } | ||
| require.Equal(t, len(tt.wantSourceIDs), got.NumFields()) | ||
| for i, wantIDs := range tt.wantSourceIDs { | ||
| require.Equal(t, wantIDs, got.Field(i).SourceIDs) |
There was a problem hiding this comment.
require.Equal(t, wantIDs, got.Field(i).SourceIDs) — the tests only assert SourceIDs, never FieldID, which is why they pass with field-id: 0 still going out. Could you add a FieldID assertion (1000, 1001, …) and a REST create-payload regression test that marshals the spec and checks field-id isn't 0? That locks in the fix above.
Summary
Testing