Skip to content

Conversation

@HaoYang670
Copy link
Contributor

Which issue does this PR close?

Closes #3300.

Rationale for this change

What changes are included in this PR?

  1. support binary type in InList and InSet
  2. some clean-up of the utf-8 code.

Are there any user-facing changes?

Signed-off-by: remzi <13716567376yh@gmail.com>
Signed-off-by: remzi <13716567376yh@gmail.com>
Signed-off-by: remzi <13716567376yh@gmail.com>
Signed-off-by: remzi <13716567376yh@gmail.com>
@github-actions github-actions bot added the physical-expr Changes to the physical-expr crates label Sep 1, 2022
@andygrove
Copy link
Member

test failure caused by #3327

@codecov-commenter
Copy link

Codecov Report

Merging #3324 (f45c6ce) into master (3956fc2) will increase coverage by 0.00%.
The diff coverage is 86.51%.

@@           Coverage Diff           @@
##           master    #3324   +/-   ##
=======================================
  Coverage   85.82%   85.83%           
=======================================
  Files         294      294           
  Lines       53772    53833   +61     
=======================================
+ Hits        46148    46205   +57     
- Misses       7624     7628    +4     
Impacted Files Coverage Δ
...atafusion/physical-expr/src/expressions/in_list.rs 84.79% <86.51%> (+0.44%) ⬆️
datafusion/expr/src/logical_plan/plan.rs 78.22% <0.00%> (-0.17%) ⬇️
datafusion/common/src/scalar.rs 84.78% <0.00%> (+0.13%) ⬆️
datafusion/expr/src/literal.rs 93.33% <0.00%> (+6.66%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@ozgrakkurt
Copy link
Contributor

@alamb could you merge this when you have time?

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look good to me -- thank you @HaoYang670

cc @liukun4515 and @Ted-Jiang who I believe have worked on this code and may know if there is any benchmark we should run

/// TODO: add switch codeGen in In_List
static OPTIMIZER_INSET_THRESHOLD: usize = 30;

macro_rules! compare_op_scalar {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

Comment on lines +374 to +375
.collect::<Vec<_>>();
let native_set: HashSet<&[u8]> = HashSet::from_iter(native_array);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you skip this intermediate Vec and instead go directly to HashSet?

Like .collect::<HashSet<_>>() ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will do this in a follow-up patch as we have this problem in both binary and string

datatype => unimplemented!("Unexpected type {} for InList", datatype),
},
ColumnarValue::Array(_) => {
unimplemented!("InList does not yet support nested columns.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this case is when one of the arguments to InList is a column (rather than a constant)

So like col1 IN ('1', '2', col2) type thing

(no change needed I was just trying to clarify)

@alamb alamb merged commit 61602f0 into apache:master Sep 4, 2022
@alamb
Copy link
Contributor

alamb commented Sep 4, 2022

Thanks again @HaoYang670

@ursabot
Copy link

ursabot commented Sep 4, 2022

Benchmark runs are scheduled for baseline = 786c319 and contender = 61602f0. 61602f0 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@HaoYang670 HaoYang670 deleted the 3300_inlist_support_binary branch September 4, 2022 12:02
kmitchener pushed a commit to kmitchener/arrow-datafusion that referenced this pull request Sep 4, 2022
* add binary support

Signed-off-by: remzi <13716567376yh@gmail.com>

* support inset and add tests

Signed-off-by: remzi <13716567376yh@gmail.com>

* clean

Signed-off-by: remzi <13716567376yh@gmail.com>

* fmt

Signed-off-by: remzi <13716567376yh@gmail.com>

Signed-off-by: remzi <13716567376yh@gmail.com>
&schema
);

// expression: "a in ([1, 2, 3], [4, 5, 6], null)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the comment should be

// expression: "a not in ([1, 2, 3], [4, 5, 6], null)"

cc @HaoYang670

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Op, my mistake. Sorry.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind.
You can fix it in the next pr

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-expr Changes to the physical-expr crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support binary types in InList expression

7 participants