Skip to content

array functions (remove, resize, extract, replace, repeat) not properly handling nulls for index arguments #22507

@Jefffrey

Description

@Jefffrey

Describe the bug

resize:

let count = count_array.value(row_index).to_usize().ok_or_else(|| {
internal_datafusion_err!("array_resize: failed to convert size to usize")
})?;

  • We aren't checking for nulls

extract:

let index = adjusted_array_index::<O>(indexes.value(row_index), len)?;

  • Same

replace_n:

let ScalarValue::Int64(Some(n)) = scalar_max else {
// null max means no replacements
return Ok(ColumnarValue::Array(list_array));
};
let result = array_replace_with_scalar_args(
&list_array,
scalar_from,
scalar_to,
*n,
)?;
Ok(ColumnarValue::Array(result))
}
(from_arg, to_arg, max_arg) => {
let from_array = from_arg.to_array(num_rows)?;
let to_array = to_arg.to_array(num_rows)?;
let max_array = max_arg.to_array(num_rows)?;
let max_array = as_int64_array(&max_array)?;
let arr_n = (0..max_array.len())
.map(|i| {
if max_array.is_null(i) {
0
} else {
max_array.value(i)
}
})
.collect::<Vec<_>>();
let result =
array_replace_internal(&list_array, &from_array, &to_array, &arr_n)?;

  • We fixed this in perf: optimize array_replace for scalar needle #22387 to equate nulls to be zero, which is what was happening implicitly (since nulls usually have 0 value in the underlying array anyway) but we should probably have proper null semantics here (i.e. make the output null)

remove_n:

fn array_remove_n_inner(args: &[ArrayRef]) -> Result<ArrayRef> {
let [array, element, max] = take_function_args("array_remove_n", args)?;
let arr_n = as_int64_array(max)?.values().to_vec();
array_remove_internal(array, element, &arr_n)
}

repeat:

/// Helper function to get count from count_array at given index
/// Return 0 for null values or non-positive count.
#[inline]
fn get_count_with_validity(count_array: &Int64Array, idx: usize) -> usize {
if count_array.is_null(idx) {
0
} else {
let c = count_array.value(idx);
if c > 0 { c as usize } else { 0 }
}
}

  • Similar to above, it explicitly handles as 0 but other systems like DuckDB have proper null semantics here

To Reproduce

No response

Expected behavior

No response

Additional context

No response

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No fields configured for Bug.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions