avro schema writer does not sanitize field names that violate avro naming rules #2535

@SreeramGarlapati

problem

when writing manifest files, iceberg-rust copies iceberg field names verbatim into avro record field names. the avro spec requires names to match [A-Za-z_][A-Za-z0-9_]*. partition field names that start with digits (e.g. hash-derived names like 815d3b5701b94c78884835c1bea174bb_day) produce invalid avro schemas.

java handles this correctly in TypeToSchema.java:

String origFieldName = structField.name();
boolean isValidFieldName = AvroSchemaUtil.validAvroName(origFieldName);
String fieldName = isValidFieldName ? origFieldName : AvroSchemaUtil.sanitize(origFieldName);
Schema.Field field = new Schema.Field(fieldName, ...);
if (!isValidFieldName) {
  field.addProp(AvroSchemaUtil.ICEBERG_FIELD_NAME_PROP, origFieldName);
}

sanitization rules (from AvroSchemaUtil.sanitize()):

  • leading digit → prefix with _ (e.g. 9col → _9col)
  • special chars → _x<hex> (e.g. a.b → a_x2Eb)

the original name is preserved in the iceberg-field-name avro field property.
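
a minimal rust sketch of the two rules above, assuming ASCII-only checks that mirror the [A-Za-z_][A-Za-z0-9_]* pattern. the function names is_valid_avro_name and sanitize_avro_name are hypothetical and do not exist in iceberg-rust today; the java implementation may classify non-ASCII letters differently.

```rust
/// Returns true when `name` matches [A-Za-z_][A-Za-z0-9_]*.
/// Hypothetical helper, not an existing iceberg-rust API.
fn is_valid_avro_name(name: &str) -> bool {
    let mut chars = name.chars();
    match chars.next() {
        // The first character must be an ASCII letter or underscore.
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {}
        // Empty names and any other leading character are invalid.
        _ => return false,
    }
    // The remaining characters may also be ASCII digits.
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Escapes a single character that is not allowed at its position.
fn sanitize_char(c: char) -> String {
    if c.is_ascii_digit() {
        // Only reached for a leading digit: prefix it with '_'.
        format!("_{c}")
    } else {
        // Any other character becomes _x<HEX> of its code point.
        format!("_x{:X}", c as u32)
    }
}

/// Rewrites `name` so it satisfies the Avro naming pattern.
/// Hypothetical helper following the rules listed in this issue.
fn sanitize_avro_name(name: &str) -> String {
    let mut out = String::with_capacity(name.len());
    for (i, c) in name.chars().enumerate() {
        let allowed = if i == 0 {
            c.is_ascii_alphabetic() || c == '_'
        } else {
            c.is_ascii_alphanumeric() || c == '_'
        };
        if allowed {
            out.push(c);
        } else {
            out.push_str(&sanitize_char(c));
        }
    }
    out
}
```

with these helpers, the two examples from the rules map as described: 9col becomes _9col and a.b becomes a_x2Eb.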

relevant code

crates/iceberg/src/avro/schema.rs: schema_to_avro_schema uses field.name.clone() directly as the avro field name without validation or sanitization.

impact

  • manifests written by iceberg-rust with digit-leading partition field names are invalid avro
  • other engines (spark, trino, flink) using strict avro readers will reject these manifests
  • any table using hash-based partition transforms can produce such names

expected behavior

implement the same sanitize-on-write protocol as java:

  1. validate name against [A-Za-z_][A-Za-z0-9_]*
  2. if invalid, sanitize and store original in iceberg-field-name property
  3. always store field-id property (already done)
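
the three steps could be wired together roughly as follows. this is a sketch only: AvroFieldSketch and to_avro_field are hypothetical stand-ins, not types or functions from iceberg-rust or apache-avro, and properties are stored as strings for simplicity (a real avro schema would carry field-id as a JSON number).

```rust
use std::collections::BTreeMap;

const FIELD_ID_PROP: &str = "field-id";
const ICEBERG_FIELD_NAME_PROP: &str = "iceberg-field-name";

/// Hypothetical stand-in for an Avro record field with custom properties.
#[derive(Debug)]
struct AvroFieldSketch {
    name: String,
    props: BTreeMap<String, String>,
}

/// True when `c` is allowed at this position by [A-Za-z_][A-Za-z0-9_]*.
fn is_name_char(c: char, first: bool) -> bool {
    c == '_' || c.is_ascii_alphabetic() || (!first && c.is_ascii_digit())
}

/// Builds the field sketch for one Iceberg field (assumed signature).
fn to_avro_field(iceberg_name: &str, field_id: i32) -> AvroFieldSketch {
    // Step 1: validate the name against the Avro naming pattern.
    let valid = !iceberg_name.is_empty()
        && iceberg_name
            .chars()
            .enumerate()
            .all(|(i, c)| is_name_char(c, i == 0));

    let mut props = BTreeMap::new();
    // Step 3: the field id is always recorded.
    props.insert(FIELD_ID_PROP.to_string(), field_id.to_string());

    let name: String = if valid {
        iceberg_name.to_string()
    } else {
        // Step 2: keep the original name in a property, then sanitize.
        props.insert(ICEBERG_FIELD_NAME_PROP.to_string(), iceberg_name.to_string());
        iceberg_name
            .chars()
            .enumerate()
            .map(|(i, c)| {
                if is_name_char(c, i == 0) {
                    c.to_string()
                } else if c.is_ascii_digit() {
                    format!("_{c}") // leading digit
                } else {
                    format!("_x{:X}", c as u32) // special character
                }
            })
            .collect()
    };

    AvroFieldSketch { name, props }
}
```

valid names pass through unchanged and carry no iceberg-field-name property, so existing manifests with valid names would be written the same way as before.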
