Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++][Compute] Consolidate fill_null and coalesce #23478

Closed
asfimport opened this issue Nov 15, 2019 · 10 comments
Closed

[C++][Compute] Consolidate fill_null and coalesce #23478

asfimport opened this issue Nov 15, 2019 · 10 comments

Comments

@asfimport
Copy link

asfimport commented Nov 15, 2019

fill_null and coalesce are essentially the same kernel, except the former is binary and doesn't support an array fill value, and the latter is variadic and supports scalar and array fill values.

We should consolidate them into one kernel, picking the faster implementation.

Reporter: Ben Kietzman / @bkietz
Assignee: David Li / @lidavidm

Related issues:

PRs and other links:

Note: This issue was originally created as ARROW-7179. Please see the migration documentation for further details.

@asfimport
Copy link
Author

Wes McKinney / @wesm:
We can implement this either as a Binary or VarArgs scalar kernel

@asfimport
Copy link
Author

Antoine Pitrou / @pitrou:
cc @lidavidm

@asfimport
Copy link
Author

David Li / @lidavidm:
I think this is exactly the same as Coalesce in ARROW-13136?

@asfimport
Copy link
Author

Antoine Pitrou / @pitrou:
Uh, perhaps :-) Those SQL names are so "obvious"...

@asfimport
Copy link
Author

David Li / @lidavidm:
Comparing the two issues, the kernels look identical, so I'm going to close as duplicate.

@asfimport
Copy link
Author

Joris Van den Bossche / @jorisvandenbossche:
We actually already have a "fill_null" kernel for filling with a scalar (added in ), so this issue was about expanding that kernel to also support array fill values. That's indeed the same as coalesce. But, "fill_null" is currently a fully separate implementation (scalar_fill_null.cc), while I suppose "coalesce" might also have a specialized path for scalar fill values? That could potentially be consolidated?

@asfimport
Copy link
Author

David Li / @lidavidm:
Ah, I see, thanks Joris. It looks like coalesce has slightly more complete type support; it's also variadic instead of binary. Maybe the path here then should be to benchmark the common implementations and choose the faster one, then consolidate both into one kernel. (And maybe provide aliases from one to the other?)

@asfimport
Copy link
Author

Joris Van den Bossche / @jorisvandenbossche:
Yes, that sounds good

@asfimport
Copy link
Author

David Li / @lidavidm:
So fill_null is quite a bit faster than coalesce, as being able to assume there are only two arguments (one of which is an array) means it can just make one pass through the input and do block copies from one of the arguments. To get coalesce up to the same speed in the same cases, we'd essentially special-case this in coalesce anyways. Would it be preferable to implement that special case and drop fill_null, or keep fill_null as a convenient alias/optimized case much like we have if_else and case_when?

@asfimport
Copy link
Author

Antoine Pitrou / @pitrou:
Issue resolved by pull request 10987
#10987

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants