Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++] Remove Scalar legacy cast implementation #35560

Closed
danepitkin opened this issue May 11, 2023 · 4 comments · Fixed by #39044
Closed

[C++] Remove Scalar legacy cast implementation #35560

danepitkin opened this issue May 11, 2023 · 4 comments · Fixed by #39044

Comments

@danepitkin
Copy link
Member

danepitkin commented May 11, 2023

Describe the enhancement requested

Let's delete the legacy implementation Scalar.CastTo() from Arrow.

Arrow has a compute kernel for cast operations that is now included in the base library[1]. PyArrow has migrated Scalar cast usage to this computer kernel[2].

[1] #34295
[2] #35395

Sub issues

Component(s)

C++, Python

@llama90
Copy link
Contributor

llama90 commented Dec 2, 2023

Hello,

I am interested in this issue and looking into it.

First, I found functions that are using the legacy Scalar.CastTo() in the code.

I have reorganized these functions to call the Cast compute kernel where tests are possible.

file name code snippet note
file_parquet.cc auto maybe_min = min->CastTo(field.type()); -
file_parquet.cc auto maybe_max = max->CastTo(field.type()); -
partition_test.cc ASSERT_OK_AND_ASSIGN(auto dict_hello, MakeScalar("hello")->CastTo(DictStr("")->type())); 1
partition_test.cc ASSERT_OK_AND_ASSIGN(auto dict_hello, MakeScalar("hello")->CastTo(dict_type)); 1
partition_test.cc ASSERT_OK_AND_ASSIGN(auto day, StringScalar("2020-06-08").CastTo(temporal));
scalar.cc auto maybe_repr = CastTo(utf8()); -
scalar.cc ARROW_ASSIGN_OR_RAISE(auto cast_value, from_.CastTo(dict_type.value_type())); -
scalar.cc return Int32Scalar(0).CastTo(dict_type.index_type()).Value(&out.index); -
scalar.h Result<std::shared_ptr<Scalar>> CastTo(std::shared_ptr<DataType> to) const; -
scalar_test.cc TimestampScalar(value, timestamp(in)).CastTo(timestamp(out)).ValueOrDie(); 2
scalar_test.cc TimestampScalar(1024, timestamp(TimeUnit::MILLI)).CastTo(utf8())); 2
scalar_test.cc TimestampScalar(1024, timestamp(TimeUnit::MILLI)).CastTo(int64())); 2
scalar_test.cc .CastTo(date64())); 2
scalar_test.cc ASSERT_OK_AND_ASSIGN(auto cast_to_other, scalar->CastTo(other_type))
scalar_test.cc ASSERT_OK_AND_ASSIGN(auto cast_from_other, other_scalar->CastTo(type))
scalar_test.cc StringScalar(std::string(repr)).CastTo(type));
scalar_test.cc ASSERT_OK_AND_ASSIGN(auto cast_to_string, scalar->CastTo(utf8()));
scalar_test.cc EXPECT_OK_AND_ASSIGN(auto cast_scalar, scalar.CastTo(to_type)); 3
scalar_test.cc scalar.CastTo(to_type)); 3
scalar_test.cc ASSERT_OK_AND_ASSIGN(auto cast_alpha, alpha->CastTo(ty)); 1
  • 1: Unsupported cast from string to dictionary using function cast_dictionary
  • 2: ValueOrDie called on an error: Invalid: Casting from timestamp[ns] to timestamp[us] would lose data: 1234
  • 3: Unsupported cast from list_view<item: int16> to list using function cast_list
  • In some functions, the same test cases may yield different errors depending on the additional cast kernel implementation.
    • This means that in some cases, the errors are indicated based on the error messages triggered by the functions executed first.

Looking at the table, the calling functions can be organized as follows:

  1. Cases where the cast operation is implemented and calling the Cast function works without additional implementation.
  2. Cases where additional implementation of the cast operation is needed.

For now, I have submitted a PR for the cases that are working happy case, and I am curious if this aligns with the intentions mentioned in the issue.

If PR aligns with the intent, I would like to proceed by separating the happy cases from those that require additional cast implementation, and I would appreciate your feedback.

@kou
Copy link
Member

kou commented Dec 2, 2023

Let's create sub issues and use separated PRs!

@llama90
Copy link
Contributor

llama90 commented Dec 3, 2023

Hello @kou

Thank you for your response.

I've created the issue excluding the happy cases and will first complete the PR for the happy cases.

I'm a bit concerned as I have never dealt with compute kernels and have not yet fully finalized the sorting for the dictionary for ChunkedArray and the other issues.

However, I've recently had more time to look into Arrow, so I will proceed step by step.

@kou
Copy link
Member

kou commented Dec 3, 2023

Thanks!
I've added sub-issues to this issue description.

@kou kou closed this as completed in #39044 Dec 6, 2023
kou pushed a commit that referenced this issue Dec 6, 2023
…9044)

### Rationale for this change

Remove legacy code

### What changes are included in this PR?

* Replace the legacy scalar `CastTo` implementation with the new cast compute kernel for supported types, without requiring additional casting logic for unsupported types.

### Are these changes tested?

Yes. It is passed by existing test cases.

### Are there any user-facing changes?

No.

* Closes: #35560

Authored-by: Hyunseok Seo <hsseo0501@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
@kou kou added this to the 15.0.0 milestone Dec 6, 2023
dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024
…st (apache#39044)

### Rationale for this change

Remove legacy code

### What changes are included in this PR?

* Replace the legacy scalar `CastTo` implementation with the new cast compute kernel for supported types, without requiring additional casting logic for unsupported types.

### Are these changes tested?

Yes. It is passed by existing test cases.

### Are there any user-facing changes?

No.

* Closes: apache#35560

Authored-by: Hyunseok Seo <hsseo0501@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants