[C++] Add Compute Kernel for Casting from String to Dictionary #39049

Closed
llama90 opened this issue Dec 3, 2023 · 1 comment · Fixed by #39362

@llama90
Contributor

llama90 commented Dec 3, 2023

Describe the enhancement requested

This is a sub-issue of a parent issue. The parent issue contains tables listing the affected `CastTo` call sites; the entries relevant to this sub-issue are:

| File name | Code snippet |
| --- | --- |
| partition_test.cc | `ASSERT_OK_AND_ASSIGN(auto dict_hello, MakeScalar("hello")->CastTo(DictStr("")->type()));` |
| partition_test.cc | `ASSERT_OK_AND_ASSIGN(auto dict_hello, MakeScalar("hello")->CastTo(dict_type));` |
| scalar_test.cc | `ASSERT_OK_AND_ASSIGN(auto cast_alpha, alpha->CastTo(ty));` |

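When these calls are switched to the compute `Cast` function (see the snippets below), they fail with the error "Unsupported cast from string to dictionary using function cast_dictionary".

To fix this, we need to implement a compute kernel that casts from String to Dictionary.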

partition_test.cc

Snippet to reproduce

TEST_F(TestPartitioning, DirectoryPartitioningFormatDictionary) {

  ...

- ASSERT_OK_AND_ASSIGN(auto dict_hello, MakeScalar("hello")->CastTo(DictStr("")->type()));
+ ASSERT_OK_AND_ASSIGN(auto dict_hello, Cast(MakeScalar("hello"), DictStr("")->type()));
}

TEST_F(TestPartitioning, DirectoryPartitioningFormatDictionaryCustomIndex) {

  ...

- ASSERT_OK_AND_ASSIGN(auto dict_hello, MakeScalar("hello")->CastTo(dict_type));
+ ASSERT_OK_AND_ASSIGN(auto dict_hello, Cast(MakeScalar("hello"), dict_type));

scalar_test.cc

Snippet to reproduce

TEST(TestDictionaryScalar, Cast) {

  ...

      // Cast string to dict(..., string)
-     ASSERT_OK_AND_ASSIGN(auto cast_alpha, alpha->CastTo(ty));
-     ASSERT_OK(cast_alpha->ValidateFull());
+     ASSERT_OK_AND_ASSIGN(auto cast_alpha, Cast(alpha, ty));
+     const auto& scalar = cast_alpha.scalar();
+     ASSERT_OK(scalar->ValidateFull());
      ASSERT_OK_AND_ASSIGN(
          auto roundtripped_alpha,
-         checked_cast<const DictionaryScalar&>(*cast_alpha).GetEncodedValue());
+         checked_cast<const DictionaryScalar&>(*scalar).GetEncodedValue());

      ...

      // dictionaries differ, though encoded values are identical
-     ASSERT_FALSE(alpha_dict.Equals(*cast_alpha));
+     ASSERT_FALSE(alpha_dict.Equals(*scalar));

Component(s)

C++

@llama90
Contributor Author

llama90 commented Dec 4, 2023

Progress Update

What does casting from String to Dictionary do?

TEST(TestDictionaryScalar, Cast) {
  for (auto index_ty : all_dictionary_index_types()) {
    auto ty = dictionary(index_ty, utf8());
    auto dict = checked_pointer_cast<StringArray>(
        ArrayFromJSON(utf8(), R"(["alpha", null, "gamma"])"));
    for (int64_t i = 0; i < dict->length(); ++i) {
      auto alpha =
          dict->IsValid(i) ? MakeScalar(dict->GetString(i)) : MakeNullScalar(utf8());
      // Cast string to dict(..., string)
      ASSERT_OK_AND_ASSIGN(auto cast_alpha, alpha->CastTo(ty));
      ASSERT_OK(cast_alpha->ValidateFull());
      ASSERT_OK_AND_ASSIGN(
          auto roundtripped_alpha,
          checked_cast<const DictionaryScalar&>(*cast_alpha).GetEncodedValue());
      ASSERT_OK_AND_ASSIGN(auto i_scalar, MakeScalar(index_ty, i));
      auto alpha_dict = DictionaryScalar({i_scalar, dict}, ty);
      ASSERT_OK(alpha_dict.ValidateFull());
      ASSERT_OK_AND_ASSIGN(
          auto encoded_alpha,
          checked_cast<const DictionaryScalar&>(alpha_dict).GetEncodedValue());
      AssertScalarsEqual(*alpha, *roundtripped_alpha);
      AssertScalarsEqual(*encoded_alpha, *roundtripped_alpha);
      // dictionaries differ, though encoded values are identical
      ASSERT_FALSE(alpha_dict.Equals(*cast_alpha));
    }
  }
}

Printing the cast value (cast_alpha->ToString()) for each input gives:

  • "alpha" -> [ "alpha" ][0]
  • null -> null
  • "gamma" -> [ "gamma" ][0]

Legacy CastTo logic

arrow/cpp/src/arrow/scalar.cc

Lines 1264 to 1269 in f7947cc

Status Visit(const DictionaryType& dict_type) {
  auto& out = checked_cast<DictionaryScalar*>(out_)->value;
  ARROW_ASSIGN_OR_RAISE(auto cast_value, from_.CastTo(dict_type.value_type()));
  ARROW_ASSIGN_OR_RAISE(out.dictionary, MakeArrayFromScalar(*cast_value, 1));
  return Int32Scalar(0).CastTo(dict_type.index_type()).Value(&out.index);
}
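
The legacy visitor above appears to do three things: cast the input to the dictionary's value type, wrap it in a one-element dictionary array, and set the index to 0. The sketch below models that behaviour with plain standard-library types so it can be read without an Arrow build. `DictScalarModel` and `CastStringToDict` are hypothetical names made up for illustration, not Arrow APIs, and the null case is simplified to an empty dictionary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a dictionary scalar: an index into a
// dictionary of strings. This is NOT an Arrow type.
struct DictScalarModel {
  bool is_valid = false;
  std::int32_t index = 0;
  std::vector<std::string> dictionary;

  // Models the idea behind GetEncodedValue(): look the index up
  // in the dictionary, or return nothing for a null scalar.
  std::optional<std::string> EncodedValue() const {
    if (!is_valid) return std::nullopt;
    return dictionary.at(static_cast<std::size_t>(index));
  }

  // Two scalars are equal only if index AND dictionary both match.
  bool Equals(const DictScalarModel& other) const {
    if (is_valid != other.is_valid) return false;
    if (!is_valid) return true;
    return index == other.index && dictionary == other.dictionary;
  }

  // Rendering chosen to match the single-entry output quoted in this
  // comment, e.g. [ "alpha" ][0]; other cases are an assumption.
  std::string ToString() const {
    if (!is_valid) return "null";
    std::string out = "[";
    for (std::size_t i = 0; i < dictionary.size(); ++i) {
      out += (i == 0 ? " \"" : ", \"") + dictionary[i] + "\"";
    }
    return out + " ][" + std::to_string(index) + "]";
  }
};

// Models the legacy cast: a valid string becomes a one-entry dictionary
// with index 0; a null input stays null.
DictScalarModel CastStringToDict(const std::optional<std::string>& value) {
  DictScalarModel out;
  if (!value) return out;  // null -> null
  out.is_valid = true;
  out.dictionary.push_back(*value);  // one-element dictionary
  out.index = 0;                     // always points at that element
  return out;
}
```

Under this model, casting "gamma" produces a one-entry dictionary with index 0, while a scalar built against a full three-entry dictionary would carry index 2. Both decode to the same string, yet `Equals` returns false because the dictionaries differ, which is the situation the final `ASSERT_FALSE` in the test checks.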

kou closed this as completed in #39362 Jan 6, 2024
kou pushed a commit that referenced this issue Jan 6, 2024
…in test (#39362)

### Rationale for this change

Remove legacy code

### What changes are included in this PR?

Replace the legacy scalar CastTo implementation for Dictionary Scalar in test.

### Are these changes tested?

Yes. The existing test cases cover this change.

### Are there any user-facing changes?

No.

* Closes: #39049

Authored-by: Hyunseok Seo <hsseo0501@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
kou added this to the 15.0.0 milestone Jan 6, 2024
clayburn pushed a commit to clayburn/arrow that referenced this issue Jan 23, 2024
…calar in test (apache#39362)

dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024
…calar in test (apache#39362)

zanmato1984 pushed a commit to zanmato1984/arrow that referenced this issue Feb 28, 2024
…calar in test (apache#39362)
