Skip to content

Python: Handle optional fields in Avro conversion#5796

Merged
Fokko merged 1 commit intoapache:masterfrom
joshuarobinson:bugfix_avro_optionals
Sep 20, 2022
Merged

Python: Handle optional fields in Avro conversion#5796
Fokko merged 1 commit intoapache:masterfrom
joshuarobinson:bugfix_avro_optionals

Conversation

@joshuarobinson
Copy link
Contributor

Found by processing fields in manifestentry with empty split_offsets field.

For pos_to_dict, check if values is None before processing as list or dict. Add two unit tests to verify.

Thanks to @Fokko for the fix.

Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, thanks @joshuarobinson

@Fokko Fokko changed the title Python Avro: Handle optional fields in conversion. Python: Handle optional fields in Avro conversion Sep 19, 2022
def _(list_type: ListType, values: List[Any]) -> Any:
"""In the case of a list, we'll go over the elements in the list to handle complex types"""
return [_convert_pos_to_dict(list_type.element_type, value) for value in values]
return [_convert_pos_to_dict(list_type.element_type, value) for value in values] if values is not None else None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't there the same problem when calling values.get(pos) in the method for StructType?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch! Looks like we need another guard there as well:

def test_null_struct_convert_pos_to_dict():
    data = _convert_pos_to_dict(
        Schema(
            NestedField(
                name="field",
                field_id=1,
                field_type=StructType(
                    NestedField(2, "required_field", StringType(), True),
                    NestedField(3, "optional_field", IntegerType())
                ),
                required=False
            )
        ),
        AvroStruct([None]),
    )
    assert data["field"] is None

Raises AttributeError:

>   return {field.name: _convert_pos_to_dict(field.field_type, values.get(pos)) for pos, field in enumerate(struct_type.fields)}
E   AttributeError: 'NoneType' object has no attribute 'get'

pyiceberg/manifest.py:176: AttributeError

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated PR with the StructType fix as well, thanks for the catch

Copy link
Contributor

@rdblue rdblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good to me, but I think there's still a possibility that this would happen in the method that handles structs.

Found by processing fields in manifestentry with empty split_offsets field.

For pos_to_dict, check if values is None before processing as list,
struct, or dict. Added unit tests to verify.

Thanks to @Fokko for the fix.
@Fokko Fokko merged commit 53e056a into apache:master Sep 20, 2022
@Fokko
Copy link
Contributor

Fokko commented Sep 20, 2022

Thanks for fixing this @joshuarobinson

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants