Skip to content

Conversation

@felipecrv
Copy link
Contributor

@felipecrv felipecrv commented Apr 13, 2023

Rationale for this change

To make run_end_encode and run_end_decode usable in all contexts, they should gracefully handle Scalar inputs.

What changes are included in this PR?

  • Make run_end_encode turn a regular Scalar into a RUN_END_ENCODED array.
  • Make run_end_decode turn a RUN_END_ENCODED

Are these changes tested?

Yes, by unit tests.

Are there any user-facing changes?

No.

@github-actions
Copy link

@felipecrv
Copy link
Contributor Author

@jorisvandenbossche @benibus @zeroshade

@westonpace is it OK/expected for/from vector functions to handle scalar inputs and always return an array? Because that's what I'm doing here.

@westonpace
Copy link
Member

As far as I know, vector functions aren't used anywhere internally beyond CallFunction, so if it runs, then I think it's ok.

@jorisvandenbossche
Copy link
Member

Checking some other vector functions passing them a scalar, there are that raise a NotImplementedError (eg sort_indices, rank) or that return an array (eg replace_with_mask, cumulative_sum, indices_nonzero).
So returning an array seems fine.

@felipecrv
Copy link
Contributor Author

@jorisvandenbossche I'm now a bit unsure about this PR because in the context I tried to use this implementation, I would need to return an array expanded from the Scalar that had length equal to the other columns in the ExecBatch.

Think SELECT run_end_encode(3) + col FROM ...

@westonpace
Copy link
Member

Perhaps a related question. Why are run end encode/decode vector functions in the first place? Wouldn't they be scalar functions?

@felipecrv
Copy link
Contributor Author

Perhaps a related question. Why are run end encode/decode vector functions in the first place? Wouldn't they be scalar functions?

I just continued the work of Tobias and didn't put much thinking into what they should have been. I don't see any reason for them to not be scalar functions.

@jorisvandenbossche
Copy link
Member

While they follow the typical requirements of a scalar function of same shape for input / output, the kernel is not strictly speaking scalar because it's not just independently element-wise (i.e. the order of the values of the input array impact the exact result array, although you could say that the exact run_ends/values are an implementation detail, as long as they still represent "logically" the same values).
I don't know if this internal/theoretical detail is enough to not consider it as a scalar function, as in practice, I assume you can use this function in most places where the kernel is restricted to be scalar? (eg in projections)

@westonpace
Copy link
Member

Fair points. I don't think I know the function registry well enough to answer.

@felipecrv
Copy link
Contributor Author

@jorisvandenbossche I think the requirements you listed make a good case for this not being a scalar function. I will close the PR because of that and my failed attempt in using this as a function that takes a scalar.

@felipecrv felipecrv closed this Apr 24, 2023
@felipecrv felipecrv deleted the run_end_encode_scalar branch April 24, 2023 14:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[C++] Handle scalars in run_end_encode and run_end_decode kernels

3 participants