Description
Describe the bug
Various parts of the DataFusion codebase assume that converting between ScalarValue <--> Array preserves the data type. This seems like a reasonable assumption, but it does not hold for at least DictionaryArrays.
For example, take a ScalarValue that is converted to an array, cast to a DictionaryArray<_> due to coercion rules, and then converted back to a ScalarValue. When that supposedly cast ScalarValue is converted back to an Array, it does not keep its Dictionary encoding; the result has type DataType::Utf8 instead.
To Reproduce
// Import paths as of the DataFusion/arrow versions current when this was filed;
// they may differ in other releases.
use arrow::compute::kernels;
use arrow::datatypes::DataType;
use datafusion::scalar::ScalarValue;

fn bad_cast() {
    // There is a problem with round-trip casting to/from a dictionary
    // array. It is desired to cast this ScalarValue to a Dictionary
    // (for coercion, for example)
    let scalar = ScalarValue::Utf8(Some("foo".to_string()));
    let desired_type = DataType::Dictionary(
        // key type
        Box::new(DataType::Int32),
        // value type
        Box::new(DataType::Utf8),
    );
    // convert from scalar --> Array to call cast
    let scalar_array = scalar.to_array();
    // cast the actual value
    let cast_array = kernels::cast::cast(&scalar_array, &desired_type).unwrap();
    // turn it back to a scalar
    let cast_scalar = ScalarValue::try_from_array(&cast_array, 0).unwrap();
    // Some time later the "cast" scalar is turned back into an array:
    let array = cast_scalar.to_array_of_size(10);
    // The datatype should be "Dictionary" but is actually Utf8!!!
    assert_eq!(array.data_type(), &desired_type);
}
Running this function results in
left: `Utf8`,
right: `Dictionary(Int32, Utf8)`', src/main.rs:80:5
Expected behavior
The test case should pass: the array rebuilt from the cast ScalarValue should have type Dictionary(Int32, Utf8).
Additional context
I am not sure which fix makes the most sense: adding a ScalarValue::Dictionary variant, adding an is_dictionary flag or something similar, or simply not assuming that a ScalarValue can be round-tripped while keeping its data type.
This is the root cause of #2873 -- I added a patch for that particular case but this problem can occur elsewhere
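To make the first option concrete, here is a minimal, std-only sketch of how a dictionary-aware scalar variant could carry the key type through the round trip. Everything in it is hypothetical: `ToyType`, `ToyScalar`, and the two `scalar_from_array_*` helpers are toy stand-ins invented for illustration, not DataFusion or arrow types, and the sketch models only the data type bookkeeping, not real arrays.

```rust
/// Toy stand-in for Arrow's `DataType` (only the cases needed here).
#[derive(Debug, Clone, PartialEq)]
enum ToyType {
    Int32,
    Utf8,
    /// (key type, value type)
    Dictionary(Box<ToyType>, Box<ToyType>),
}

/// Toy stand-in for `ScalarValue`, extended with a hypothetical
/// `Dictionary` variant (an assumption, not an existing API).
#[derive(Debug, Clone, PartialEq)]
enum ToyScalar {
    Utf8(Option<String>),
    /// Remembers the dictionary key type and wraps the underlying value.
    Dictionary(ToyType, Box<ToyScalar>),
}

impl ToyScalar {
    /// The type an array built from this scalar would have.
    fn data_type(&self) -> ToyType {
        match self {
            ToyScalar::Utf8(_) => ToyType::Utf8,
            ToyScalar::Dictionary(key, value) => {
                ToyType::Dictionary(Box::new(key.clone()), Box::new(value.data_type()))
            }
        }
    }
}

/// Models the reported behavior: the array's dictionary encoding is
/// ignored and a plain Utf8 scalar comes back.
fn scalar_from_array_lossy(_array_type: &ToyType, value: &str) -> ToyScalar {
    ToyScalar::Utf8(Some(value.to_string()))
}

/// Models the proposed behavior: a dictionary-typed array yields a
/// `Dictionary` scalar, so the key type survives the round trip.
fn scalar_from_array_preserving(array_type: &ToyType, value: &str) -> ToyScalar {
    let plain = ToyScalar::Utf8(Some(value.to_string()));
    match array_type {
        ToyType::Dictionary(key, _) => {
            ToyScalar::Dictionary((**key).clone(), Box::new(plain))
        }
        _ => plain,
    }
}
```

In this toy model, extracting "foo" from an array typed `Dictionary(Int32, Utf8)` with the lossy helper gives a scalar whose `data_type()` is `Utf8`, matching the failure above, while the preserving helper gives one whose `data_type()` is again `Dictionary(Int32, Utf8)`. Non-dictionary inputs are unaffected by the preserving path.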