AsyncScalarUDFs break when batch_size doesn't divide number of rows #18822

@shivbhatia10

Describe the bug

When ideal_batch_size is set to a value that doesn't evenly divide the total number of rows passed to an AsyncScalarUDFImpl, we get an error like this:

Error: Internal("Arguments has mixed length. Expected length: 2, found length: 1")

The exact numbers in the message depend on both the batch size and the row count. For example, with 3 rows and a batch size of 2, the input is split into two batches, one of size 2 and one of size 1. A bug in the physical expression then causes this error.
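To make the arithmetic concrete, here is a minimal stand-alone sketch of how a row count is split into batches of at most a given size. The helper `chunk_lengths` is hypothetical and written only for illustration; it is not part of DataFusion.

```rust
/// Lengths of the chunks produced when `total_rows` rows are split into
/// chunks of at most `batch_size` rows.
/// Hypothetical helper for illustration only, not a DataFusion API.
fn chunk_lengths(total_rows: usize, batch_size: usize) -> Vec<usize> {
    assert!(batch_size > 0, "batch_size must be positive");
    let mut lengths = Vec::new();
    let mut remaining = total_rows;
    while remaining > 0 {
        // The last chunk is shorter whenever batch_size does not divide total_rows.
        let len = remaining.min(batch_size);
        lengths.push(len);
        remaining -= len;
    }
    lengths
}
```

With 3 rows and a batch size of 2, `chunk_lengths(3, 2)` gives lengths 2 and 1, which match the "Expected length: 2, found length: 1" in the error message.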

To Reproduce

I wrote a test in the fix PR which fails today: https://github.com/apache/datafusion/pull/18819/files

Expected behavior

The batch size shouldn't need to divide the number of rows evenly.

Additional context

I have a fix PR. The issue was in the AsyncFuncExpr physical expression: we were calling ColumnarValue::values_to_arrays on all the batches returned asynchronously, but this method requires all of its inputs to have the same length, which isn't necessary here.
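The difference between the two behaviors can be modeled without DataFusion. The sketch below uses plain `Vec<i64>` chunks in place of Arrow arrays. Both functions are hypothetical stand-ins: `require_equal_lengths` only imitates the same-length check and the error text quoted above, and `concat_chunks` shows one way per-batch results of differing lengths could be combined. It is an assumption that concatenation is what is wanted here; the actual change is in the linked PR.

```rust
/// Stand-in for a same-length check across inputs.
/// Hypothetical model: it only mimics the error text from the report.
fn require_equal_lengths(chunks: &[Vec<i64>]) -> Result<(), String> {
    let expected = match chunks.first() {
        Some(first) => first.len(),
        None => return Ok(()),
    };
    for chunk in chunks {
        if chunk.len() != expected {
            return Err(format!(
                "Arguments has mixed length. Expected length: {}, found length: {}",
                expected,
                chunk.len()
            ));
        }
    }
    Ok(())
}

/// Joins per-batch results end to end, so chunk lengths may differ.
/// Hypothetical helper, assumed here as the desired behavior.
fn concat_chunks(chunks: &[Vec<i64>]) -> Vec<i64> {
    chunks.iter().flatten().copied().collect()
}
```

For results of lengths 2 and 1, the check rejects the input while concatenation yields a single 3-row result, which is what a caller with 3 input rows expects.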

Labels: bug
