Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add DictionaryArray::try_new() to create dictionaries from pre existing arrays #1300

Merged
merged 2 commits into from
Feb 15, 2022

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Feb 10, 2022

Draft as it builds on #1263

Which issue does this PR close?

Closes #1299

Rationale for this change

Inspired by the challenge writing tests on #1263 from @viirya

Otherwise known as "no one should have to write the following to work with DictionaryArrays":

Click to expand!
  fn get_dict_arraydata(
        keys: Buffer,
        key_type: DataType,
        value_data: ArrayData,
    ) -> ArrayData {
        let value_type = value_data.data_type().clone();
        let dict_data_type =
            DataType::Dictionary(Box::new(key_type), Box::new(value_type));
        ArrayData::builder(dict_data_type)
            .len(3)
            .add_buffer(keys)
            .add_child_data(value_data)
            .build()
            .unwrap()
    }

    #[test]
    fn test_eq_dyn_dictionary_i8_array() {
        let key_type = DataType::Int8;
        // Construct a value array
        let value_data = ArrayData::builder(DataType::Int8)
            .len(8)
            .add_buffer(Buffer::from(
                &[10_i8, 11, 12, 13, 14, 15, 16, 17].to_byte_slice(),
            ))
            .build()
            .unwrap();

        let keys1 = Buffer::from(&[2_i8, 3, 4].to_byte_slice());
        let keys2 = Buffer::from(&[2_i8, 4, 4].to_byte_slice());
        let dict_array1: DictionaryArray<Int8Type> = Int8DictionaryArray::from(
            get_dict_arraydata(keys1, key_type.clone(), value_data.clone()),
        );
        let dict_array2: DictionaryArray<Int8Type> =
            Int8DictionaryArray::from(get_dict_arraydata(keys2, key_type, value_data));

        let result = eq_dyn(&dict_array1, &dict_array2);
        assert!(result.is_ok());
        assert_eq!(result.unwrap(), BooleanArray::from(vec![true, false, true]));
    }

What changes are included in this PR?

  1. Add DictionaryArray::try_new() function
  2. Tests
  3. Doc tests
  4. Update tests from Add DictionaryArray support in eq_dyn kernel #1263 to use the new API

@github-actions github-actions bot added the arrow Changes to the arrow crate label Feb 10, 2022
@alamb alamb changed the title Alamb/better dictionary array creation Add DictionaryArray::try_new() to create dictionaries from pre existing arrays Feb 10, 2022
Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks. This can make writing dictionary array test much easier!

@jhorstmann
Copy link
Contributor

Looks good. The double validation for the dictionary values is a bit unfortunate, but since this is mostly a convenience feature that should be fine.

alamb and others added 2 commits February 13, 2022 08:36
@alamb alamb force-pushed the alamb/better-dictionary-array-creation branch from 329e77b to 9044d94 Compare February 13, 2022 13:36
@alamb
Copy link
Contributor Author

alamb commented Feb 13, 2022

Looks good. The double validation for the dictionary values is a bit unfortunate, but since this is mostly a convenience feature that should be fine.

I agree it is unfortunately

I'll see if I can make a new_unchecked() and then also minimize the checking in try_new() to only verify that all indexes are within the range of the dictionary, which Ido think is required for this to be a safe API.

@alamb
Copy link
Contributor Author

alamb commented Feb 15, 2022

Filed #1313 to track performance improvement

@alamb
Copy link
Contributor Author

alamb commented Feb 15, 2022

Thanks for the comments and review @jhorstmann and @viirya

@alamb alamb merged commit 7d46ac1 into apache:master Feb 15, 2022
@alamb alamb deleted the alamb/better-dictionary-array-creation branch February 15, 2022 20:12
@alamb alamb added the enhancement Any new improvement worthy of a entry in the changelog label Feb 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve ergonomics to construct DictionaryArrays from Key and Value arrays
3 participants