What are the correct semantics when projecting a required nested field from an optional struct? #2738

@openinx

Let's say we have an Iceberg schema:

    Schema schema = new Schema(
        Types.NestedField.required(0, "id", Types.LongType.get()),
        Types.NestedField.optional(3, "location", Types.StructType.of(
            Types.NestedField.required(1, "lat", Types.FloatType.get()),
            Types.NestedField.required(2, "long", Types.FloatType.get())
        ))
    );

Suppose someone wants to do a nested projection using this projection schema:

    Schema latOnly = new Schema(
        Types.NestedField.optional(3, "location", Types.StructType.of(
            Types.NestedField.required(1, "lat", Types.FloatType.get())
        ))
    );

If the data row is:

    {
       "id": 10001,
       "location": null
    }

Then what is the expected projected value for the projection schema latOnly? Should we set location.lat to null even though that field is declared required in Types.NestedField.required(1, "lat", Types.FloatType.get())?

I think the current StructProjection does not handle this case correctly: it throws a NullPointerException when projecting the nested required field if the parent struct's value is null.
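To make the two behaviors concrete, here is a minimal sketch in plain Java. It does not use Iceberg's StructProjection; rows are modeled as Map<String, Object>, and the class and method names (ProjectionSketch, projectNaive, projectNullSafe) are hypothetical. It assumes one possible answer to the question above: a null optional parent projects to a null parent, so the required child is never materialized. That is an assumption for illustration, not a settled semantic.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a struct projection; NOT Iceberg's StructProjection.
final class ProjectionSketch {

    // Naive version: dereferences the parent struct without a null check,
    // so a null parent leads to a NullPointerException.
    static Map<String, Object> projectNaive(
            Map<String, Object> row, String parent, List<String> children) {
        @SuppressWarnings("unchecked")
        Map<String, Object> struct = (Map<String, Object>) row.get(parent);
        Map<String, Object> projected = new HashMap<>();
        for (String child : children) {
            projected.put(child, struct.get(child)); // throws if struct is null
        }
        Map<String, Object> out = new HashMap<>();
        out.put(parent, projected);
        return out;
    }

    // Null-safe version (assumed semantic): a null optional parent stays null,
    // and the required child is only read when the parent is present.
    static Map<String, Object> projectNullSafe(
            Map<String, Object> row, String parent, List<String> children) {
        @SuppressWarnings("unchecked")
        Map<String, Object> struct = (Map<String, Object>) row.get(parent);
        Map<String, Object> out = new HashMap<>();
        if (struct == null) {
            out.put(parent, null);
            return out;
        }
        Map<String, Object> projected = new HashMap<>();
        for (String child : children) {
            projected.put(child, struct.get(child));
        }
        out.put(parent, projected);
        return out;
    }

    public static void main(String[] args) {
        // The row from the example: id = 10001, location = null.
        Map<String, Object> row = new HashMap<>();
        row.put("id", 10001L);
        row.put("location", null);

        System.out.println(projectNullSafe(row, "location", List.of("lat")));

        try {
            projectNaive(row, "location", List.of("lat"));
        } catch (NullPointerException e) {
            System.out.println("naive projection threw NullPointerException");
        }
    }
}
```

Under this reading, the required constraint on lat only applies when location itself is non-null, which is why the null check sits on the parent rather than on the child.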

This is related to the broken unit tests from this PR.
