Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dyn compare for binary array #1238

Merged
merged 9 commits into from
Jan 26, 2022

Conversation

HaoYang670
Copy link
Contributor

Which issue does this PR close?

Closes #1108

Rationale for this change

What changes are included in this PR?

  1. Update the API of typed_compare, so that it supports compare 2 binary arrays
  2. Add dynamic dispatched functions to compare a binary array and a scalar
  3. Add tests for functions *_dyn_binary_scalar.
  4. DictionaryArray with binary values is not supported so far, because I could not find an easy way to build a binary dictionary array. Is there any builder such as StringDictionaryBuilder can be used for binary type?

Are there any user-facing changes?

Signed-off-by: remzi <13716567376yh@gmail.com>
Signed-off-by: remzi <13716567376yh@gmail.com>
Signed-off-by: remzi <13716567376yh@gmail.com>
… to build binary dictionary array

Signed-off-by: remzi <13716567376yh@gmail.com>
@github-actions github-actions bot added the arrow Changes to the arrow crate label Jan 25, 2022
Signed-off-by: remzi <13716567376yh@gmail.com>
@codecov-commenter
Copy link

Codecov Report

Merging #1238 (b73b3ed) into master (66e029e) will increase coverage by 0.24%.
The diff coverage is 89.75%.

❗ Current head b73b3ed differs from pull request most recent head fd97848. Consider uploading reports for the commit fd97848 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1238      +/-   ##
==========================================
+ Coverage   82.70%   82.95%   +0.24%     
==========================================
  Files         175      178       +3     
  Lines       51712    51516     -196     
==========================================
- Hits        42770    42736      -34     
+ Misses       8942     8780     -162     
Impacted Files Coverage Δ
arrow/src/alloc/mod.rs 91.42% <ø> (-1.26%) ⬇️
arrow/src/array/mod.rs 100.00% <ø> (ø)
arrow/src/buffer/immutable.rs 98.92% <ø> (ø)
arrow/src/ipc/writer.rs 83.48% <0.00%> (-0.64%) ⬇️
arrow/src/array/transform/mod.rs 84.77% <76.00%> (-0.67%) ⬇️
parquet/src/arrow/array_reader.rs 78.19% <77.77%> (+0.74%) ⬆️
...et/src/arrow/array_reader/byte_array_dictionary.rs 83.50% <83.50%> (ø)
parquet/src/arrow/array_reader/byte_array.rs 85.71% <85.71%> (-1.35%) ⬇️
arrow/src/array/data.rs 83.02% <88.00%> (+0.74%) ⬆️
arrow/src/compute/kernels/comparison.rs 91.67% <88.46%> (-0.30%) ⬇️
... and 22 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 66e029e...fd97848. Read the comment docs.

Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me 👍

Left some minor testing suggestions

@@ -3843,6 +3960,114 @@ mod tests {
assert_eq!(neq_dyn_scalar(&array, 8).unwrap(), expected);
}

#[test]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we could get a test with nulls?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I will add them

let large_array = LargeBinaryArray::from_vec(
vec![b"arrow", b"datafusion", b"flight", b"parquet", &[0xff, 0xf8]],
);
let scalar = "flight".as_bytes();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps one of these tests could use a non-UTF8 string as the scalar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this code looks good -- thank you for the contribution @HaoYang670

I agree with @tustvold on the benefits of tests with Null and non utf-8, though that could be done in a follow on if you prefer.

Signed-off-by: remzi <13716567376yh@gmail.com>
Signed-off-by: remzi <13716567376yh@gmail.com>
Signed-off-by: remzi <13716567376yh@gmail.com>
@HaoYang670 HaoYang670 requested a review from alamb January 26, 2022 04:17
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again @HaoYang670

@alamb alamb merged commit 731e132 into apache:master Jan 26, 2022
@HaoYang670 HaoYang670 deleted the issue1108_part2_dyn_compare_binary branch January 26, 2022 12:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add comparison kernels for BinaryArray
4 participants