Describe the bug
AsyncFuncExpr rebuilds the output Field for async scalar UDFs from only the field name, data type, and nullability. This drops any metadata attached by the UDF's return_field_from_args(...).
This causes async scalar UDF result batches to lose extension metadata that is present during planning.
I found this while implementing an async UDF version of RS_FromPath in apache/sedona-db for loading raster data using GDAL. In that case, the async UDF returned a field representing raster data with extension metadata, but the collected result batches lost that metadata. This broke downstream logic that depended on the logical type.
To Reproduce
A minimal repro is an async scalar UDF that:
- returns a normal Utf8 value
- overrides return_field_from_args(...) to attach metadata such as:
  ARROW:extension:name = test.async.extension
Then run:
SELECT async_extension(value) AS result FROM test_table
and inspect the collected batch schema for result.
Without a fix, the field metadata is missing from the result batch schema.
Expected behavior
The async UDF result field in collected batches should preserve the metadata computed by return_field_from_args(...).
Additional context
Root cause appears to be AsyncFuncExpr::field(...) in datafusion/physical-expr/src/async_scalar_function.rs, which reconstructs a new Field instead of preserving the already planned return_field.
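The root cause can be illustrated with a small, self-contained Rust model. It does not use DataFusion or arrow-rs. `Field`, `AsyncFuncExprModel`, `field_rebuilt`, and `field_preserved` are hypothetical stand-ins; the real code uses arrow's `Field`/`FieldRef`. The model shows that rebuilding a field from name, data type, and nullability drops the metadata, while reusing the planned `return_field` keeps it. Renaming the preserved field to the expression's output name is an assumption about how the output column is named, not confirmed DataFusion behavior.

```rust
use std::collections::HashMap;

/// Simplified stand-in for arrow's `Field` (hypothetical model, not the arrow-rs type).
#[derive(Clone, Debug, PartialEq)]
struct Field {
    name: String,
    data_type: String,
    nullable: bool,
    metadata: HashMap<String, String>,
}

impl Field {
    fn new(name: &str, data_type: &str, nullable: bool) -> Self {
        Field {
            name: name.to_string(),
            data_type: data_type.to_string(),
            nullable,
            metadata: HashMap::new(),
        }
    }

    fn with_metadata(mut self, metadata: HashMap<String, String>) -> Self {
        self.metadata = metadata;
        self
    }

    fn with_name(mut self, name: &str) -> Self {
        self.name = name.to_string();
        self
    }
}

/// Hypothetical model of the planned async expression. It holds the output
/// column name and the field computed by `return_field_from_args(...)`.
struct AsyncFuncExprModel {
    name: String,
    return_field: Field,
}

impl AsyncFuncExprModel {
    /// Behavior described in the issue: only name, type, and nullability are copied.
    fn field_rebuilt(&self) -> Field {
        Field::new(&self.name, &self.return_field.data_type, self.return_field.nullable)
    }

    /// Sketch of a fix: start from the planned field so its metadata survives,
    /// then apply the output name (assumed renaming step).
    fn field_preserved(&self) -> Field {
        self.return_field.clone().with_name(&self.name)
    }
}

/// The field a UDF like the one in the repro might return during planning.
fn planned_field() -> Field {
    let mut md = HashMap::new();
    md.insert(
        "ARROW:extension:name".to_string(),
        "test.async.extension".to_string(),
    );
    Field::new("async_extension(value)", "Utf8", true).with_metadata(md)
}

fn main() {
    let expr = AsyncFuncExprModel {
        name: "result".to_string(),
        return_field: planned_field(),
    };
    // The rebuilt field has an empty metadata map; the preserved one keeps the extension name.
    println!("rebuilt:   {:?}", expr.field_rebuilt().metadata);
    println!("preserved: {:?}", expr.field_preserved().metadata);
}
```

In this model, `field_preserved` differs from the planned field only in its name, so any metadata added by `return_field_from_args(...)` flows through to the result schema unchanged.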