Description
Describe the bug
When ideal_batch_size is set to a value that doesn't evenly divide the total number of rows passed to an AsyncScalarUDFImpl, we get this error:
Error: Internal("Arguments has mixed length. Expected length: 2, found length: 1")
The exact numbers in the message depend on both the batch size and the row count. For example, with 3 rows and a batch size of 2, the input is split into two batches, one of size 2 and one of size 1. A bug in the physical expression then turns this into the error above.
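To make the arithmetic concrete, here is a minimal sketch of how the chunk sizes fall out. `chunk_lengths` is a hypothetical helper written for illustration only, not a DataFusion API; it assumes rows are split greedily into chunks of at most `batch_size` rows.

```rust
/// Lengths of the chunks produced when `num_rows` rows are split into
/// chunks of at most `batch_size` rows.
/// Hypothetical helper for illustration; not part of DataFusion.
fn chunk_lengths(num_rows: usize, batch_size: usize) -> Vec<usize> {
    assert!(batch_size > 0, "batch size must be positive");
    let mut lengths = Vec::new();
    let mut remaining = num_rows;
    while remaining > 0 {
        // The last chunk is shorter whenever batch_size does not divide num_rows.
        let len = remaining.min(batch_size);
        lengths.push(len);
        remaining -= len;
    }
    lengths
}
```

With 3 rows and a batch size of 2, `chunk_lengths(3, 2)` returns `[2, 1]`, which matches the "Expected length: 2, found length: 1" in the error message.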
To Reproduce
I wrote a test in the fix PR which fails today: https://github.com/apache/datafusion/pull/18819/files
Expected behavior
The batch size should not need to divide the number of rows evenly.
Additional context
I have a fix PR. The issue was in the AsyncFuncExpr physical expression: we were calling ColumnarValue::values_to_arrays on all the batches returned asynchronously, but this method enforces that all of its inputs have the same length, which isn't necessary here.
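The difference between a strict same-length check and simply combining per-batch results can be sketched in plain Rust. This is a simplified model, not the DataFusion code: `same_length_or_error` and `concat_chunks` are hypothetical names, plain `Vec<i64>` stands in for Arrow arrays, and concatenation is assumed here as one reasonable way to combine the results (the actual fix in the PR may differ).

```rust
/// Models a strict check that every chunk has the same length as the first.
/// Hypothetical stand-in for the behavior described above; not DataFusion code.
fn same_length_or_error(chunks: &[Vec<i64>]) -> Result<usize, String> {
    let expected = chunks.first().map(|c| c.len()).unwrap_or(0);
    for chunk in chunks {
        if chunk.len() != expected {
            return Err(format!(
                "Arguments has mixed length. Expected length: {}, found length: {}",
                expected,
                chunk.len()
            ));
        }
    }
    Ok(expected)
}

/// Combines per-chunk results in order without requiring equal lengths.
/// Assumed approach for illustration only.
fn concat_chunks(chunks: &[Vec<i64>]) -> Vec<i64> {
    chunks.iter().flatten().copied().collect()
}
```

For chunks of lengths 2 and 1, the strict check fails with the same message as in the report, while concatenation yields all 3 rows in order.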