-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Description
Is your feature request related to a problem or challenge?
File an issue to discuss about design in #10193, since it is no longer a minor change
Previously, we always provided a null array if the function supports zero args, like random(), pi(), make_array().
I think the additional null array should not provided for all the function that supports 0 args, instead handle case by case in each function. It turns out the null array is not useful either. random, pi, and uuid takes the number of rows instead of the actual null array.
We need to design an alternative way to communicate the number of rows to the function.
Proposal 1:
Add support_randomness -> bool to ScalarUDFImpl, and we provide the number of rows as the first argument for invoke.
Proposal 2:
We always provide the number of rows to invoke.
Change from
fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue>;to
/// batch_rows is the number of rows in each batch
fn invoke(&self, args: &[ColumnarValue], batch_rows: usize) -> Result<ColumnarValue>;fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue> {
let inputs = self
.args
.iter()
.map(|e| e.evaluate(batch))
.collect::<Result<Vec<_>>>()?;
// evaluate the function
match self.fun {
ScalarFunctionDefinition::UDF(ref fun) => fun.invoke(&inputs, batch.num_rows()),
ScalarFunctionDefinition::Name(_) => {
internal_err!(
"Name function must be resolved to one of the other variants prior to physical planning"
)
}
}
}It is more aggressive and breaks the signature, but batch is part of the evaluate(), provide information about batch and args to invoke() makes sense to me.
Proposol 3 from alamb
Introduce another function with batch rows only.
fn invoke_no_args(number_rows: usize) -> {
not_yet_impl_err!
}We need support_randomness -> bool, so we know to switch to invoke_no_args.
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
I decide to go for 3, since there are already over 100 invoke() used, not worth to break the signature