Round trips through ScalarValue's sometimes don't preserve types (e.g. change types from DictionaryArray) #2874

@alamb

Describe the bug

Various parts of the DataFusion codebase assume that converting between ScalarValue and Array (ScalarValue <--> Array) preserves the data type. This seems like a reasonable assumption, but it does not hold for at least DictionaryArrays.

For example, consider a ScalarValue that is converted to an array, cast to a DictionaryArray<_> due to coercion rules, and then converted back to a ScalarValue. When that supposedly cast ScalarValue is later converted back to an Array, it does not keep its dictionary encoding; the result has type DataType::Utf8 instead.

To Reproduce

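// Import paths assume the arrow re-export provided by the datafusion crate
use datafusion::arrow::compute::kernels;
use datafusion::arrow::datatypes::DataType;
use datafusion::scalar::ScalarValue;

fn bad_cast() {
    // There is a problem with round trip casting to/from a dictionary
    // array. It is desired to cast this ScalarValue to a Dictionary
    // (for coercion, for example)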
    let scalar = ScalarValue::Utf8(Some("foo".to_string()));

    let desired_type = DataType::Dictionary(
        // key type
        Box::new(DataType::Int32),
        // value type
        Box::new(DataType::Utf8)
    );

    // convert from scalar --> Array to call cast
    let scalar_array = scalar.to_array();
    // cast the actual value
    let cast_array = kernels::cast::cast(&scalar_array, &desired_type).unwrap();
    // turn it back to a scalar
    let cast_scalar = ScalarValue::try_from_array(&cast_array, 0).unwrap();

    // Some time later the "cast" scalar is turned back into an array:
    let array = cast_scalar.to_array_of_size(10);

    // The datatype should be "Dictionary" but is actually Utf8!!!
    assert_eq!(array.data_type(), &desired_type)
}

Running this function results in the following assertion failure:

  left: `Utf8`,
 right: `Dictionary(Int32, Utf8)`', src/main.rs:80:5

Expected behavior
The test case should pass: the array built from the cast scalar should have type Dictionary(Int32, Utf8).

Additional context
I am not sure which fix makes the most sense. Options include adding a ScalarValue::Dictionary variant, adding an is_dictionary flag or something similar, or simply not assuming that a ScalarValue can be round tripped while keeping its data type.

This is the root cause of #2873. I added a patch for that particular case, but this problem can occur elsewhere.
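
To make the first option more concrete, here is a minimal sketch of how a dictionary-aware scalar variant could carry its type through a cast. It uses toy stand-ins named DataType and Scalar rather than the real arrow and DataFusion types, and the Dictionary variant, data_type and cast_to are hypothetical names chosen for illustration. It only models the idea and is not a proposed implementation.

```rust
// Toy stand-ins for arrow's DataType and DataFusion's ScalarValue.
// These are NOT the real types; they only model the proposal.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Utf8,
    Dictionary(Box<DataType>, Box<DataType>),
}

#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Utf8(Option<String>),
    // Hypothetical variant: remembers the key type and wraps the value scalar
    Dictionary(Box<DataType>, Box<Scalar>),
}

impl Scalar {
    /// The type an array built from this scalar would have.
    fn data_type(&self) -> DataType {
        match self {
            Scalar::Utf8(_) => DataType::Utf8,
            Scalar::Dictionary(key_type, value) => {
                DataType::Dictionary(key_type.clone(), Box::new(value.data_type()))
            }
        }
    }

    /// Model of a cast: wrap the scalar when a matching dictionary type is requested.
    fn cast_to(&self, target: &DataType) -> Option<Scalar> {
        match (self, target) {
            // Already the requested type: nothing to do
            (s, t) if &s.data_type() == t => Some(s.clone()),
            // Utf8 -> Dictionary(_, Utf8): keep the key type alongside the value
            (Scalar::Utf8(_), DataType::Dictionary(key_type, value_type))
                if **value_type == DataType::Utf8 =>
            {
                Some(Scalar::Dictionary(key_type.clone(), Box::new(self.clone())))
            }
            // Other casts are out of scope for this sketch
            _ => None,
        }
    }
}

fn main() {
    let desired_type =
        DataType::Dictionary(Box::new(DataType::Int32), Box::new(DataType::Utf8));
    let scalar = Scalar::Utf8(Some("foo".to_string()));

    let cast_scalar = scalar
        .cast_to(&desired_type)
        .expect("cast is supported in this model");

    // The dictionary type survives because the scalar itself records it
    assert_eq!(cast_scalar.data_type(), desired_type);
    println!("{:?}", cast_scalar.data_type());
}
```

The trade-off in this design is that every place that matches on the scalar has to handle the extra variant, whereas an is_dictionary flag or dropping the round-trip assumption would move that burden to the call sites that rely on the type being preserved.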

Labels: bug