AsyncFuncExpr drops async UDF return field metadata #22662

@Kontinuation

Describe the bug

AsyncFuncExpr rebuilds the output Field for async scalar UDFs from only the field name, data type, and nullability. This drops any metadata attached by the UDF's return_field_from_args(...).

This causes async scalar UDF result batches to lose extension metadata that is present during planning.
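The loss can be illustrated with a small, std-only model. This is a sketch, not DataFusion code: `Field` here is a hypothetical stand-in for Arrow's `Field`, and `planned_return_field` and `rebuild_from_parts` are illustrative names that only mimic the behavior described above (copying name, data type, and nullability, and nothing else).

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for Arrow's `Field`; not the real type.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: String,
    nullable: bool,
    metadata: HashMap<String, String>,
}

impl Field {
    /// Builds a field with empty metadata.
    fn new(name: &str, data_type: &str, nullable: bool) -> Self {
        Field {
            name: name.to_string(),
            data_type: data_type.to_string(),
            nullable,
            metadata: HashMap::new(),
        }
    }

    /// Returns the field with `metadata` attached, replacing any existing map.
    fn with_metadata(mut self, metadata: HashMap<String, String>) -> Self {
        self.metadata = metadata;
        self
    }
}

/// Models what a UDF's `return_field_from_args(...)` might produce at planning time.
fn planned_return_field() -> Field {
    let mut metadata = HashMap::new();
    metadata.insert(
        "ARROW:extension:name".to_string(),
        "test.async.extension".to_string(),
    );
    Field::new("result", "Utf8", true).with_metadata(metadata)
}

/// Models the lossy reconstruction: only name, data type, and nullability
/// are copied, so the rebuilt field starts with an empty metadata map.
fn rebuild_from_parts(planned: &Field) -> Field {
    Field::new(&planned.name, &planned.data_type, planned.nullable)
}
```

In this model, `rebuild_from_parts(&planned_return_field())` keeps the name, type, and nullability but returns an empty `metadata` map, which matches the symptom seen in the collected batches.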

I found this while implementing an async UDF version of RS_FromPath in apache/sedona-db for loading raster data using GDAL.

In that case, the async UDF returned a field representing raster data with extension metadata, but the collected result batches lost that metadata, which broke downstream logic that depended on the logical type.

To Reproduce

A minimal repro is an async scalar UDF that:

  1. returns a normal Utf8 value
  2. overrides return_field_from_args(...) to attach metadata such as:
    • ARROW:extension:name = test.async.extension

Then run:

SELECT async_extension(value) AS result FROM test_table

and inspect the collected batch schema for result.

Without a fix, the field metadata is missing from the result batch schema.

Expected behavior

The async UDF result field in collected batches should preserve the metadata computed by return_field_from_args(...).

Additional context

The root cause appears to be AsyncFuncExpr::field(...) in datafusion/physical-expr/src/async_scalar_function.rs, which reconstructs a new Field instead of preserving the already-planned return_field.
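One possible direction for a fix, sketched with a std-only model rather than the real DataFusion types: keep the planned field and change only what must change. Everything here is an assumption for illustration. `Field` is a hypothetical stand-in for Arrow's `Field`, `AsyncExprModel` is a hypothetical stand-in for the expression, and overriding the output name is a guess at what an actual patch might need.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for Arrow's `Field`; not the real type.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: String,
    nullable: bool,
    metadata: HashMap<String, String>,
}

/// Hypothetical stand-in for an expression that stores the planned return field.
struct AsyncExprModel {
    /// Output column name of the expression.
    name: String,
    /// Field computed at planning time, including any metadata.
    return_field: Arc<Field>,
}

impl AsyncExprModel {
    /// Clones the planned field and overrides only the name, so data type,
    /// nullability, and metadata are carried over unchanged.
    fn field(&self) -> Arc<Field> {
        let mut field = self.return_field.as_ref().clone();
        field.name = self.name.clone();
        Arc::new(field)
    }
}
```

Cloning the planned field costs one extra copy of the metadata map per call. If the output name already matches the planned field's name, returning `Arc::clone(&self.return_field)` would avoid that copy.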
