Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[R] R tests fail with "NotImplemented: construction from scalar of type numeric(0)" #37091

Closed
felipecrv opened this issue Aug 9, 2023 · 13 comments

Comments

@felipecrv
Copy link
Contributor

Describe the bug, including details regarding any error messages, version, and platform.

R tests failing with

Start test: ExtensionType scalar behaviour
  'test-scalar.R:49:3' [success]
  'test-scalar.R:50:3' [success]
  'test-scalar.R:51:3' [success]
/arrow/cpp/src/arrow/result.cc:28: ValueOrDie called on an error: NotImplemented: construction from scalar of type numeric(0)
/usr/local/lib/libarrow.so.1300(_ZN5arrow4util7CerrLog14PrintBackTraceEv+0x39)[0x7fa51a3ea467]
/usr/local/lib/libarrow.so.1300(_ZN5arrow4util7CerrLogD1Ev+0x5f)[0x7fa51a3ea3d7]
/usr/local/lib/libarrow.so.1300(_ZN5arrow4util7CerrLogD0Ev+0x1c)[0x7fa51a3ea3fc]
/usr/local/lib/libarrow.so.1300(_ZN5arrow4util8ArrowLogD1Ev+0x4b)[0x7fa51a3ea20f]
/usr/local/lib/libarrow.so.1300(_ZN5arrow8internal14DieWithMessageERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x5c)[0x7fa51a1390f0]
/usr/local/lib/libarrow.so.1300(_ZN5arrow8internal17InvalidValueOrDieERKNS_6StatusE+0x88)[0x7fa51a1391b4]
/usr/local/lib/libparquet.so.1300(_ZNO5arrow6ResultISt10shared_ptrINS_5ArrayEEE10ValueOrDieEv+0x4b)[0x7fa51dfb5b95]
/usr/local/lib/libarrow.so.1300(_ZNO5arrow6ResultISt10shared_ptrINS_5ArrayEEEdeEv+0x41)[0x7fa51a1a2309]
/usr/local/lib/libarrow.so.1300(_ZNK5arrow6Scalar8ToStringB5cxx11Ev+0x296)[0x7fa51a1446f0]
/arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so(_Z16Scalar__ToStringB5cxx11RKSt10shared_ptrIN5arrow6ScalarEE+0x25)[0x7fa51ff6efa5]
/arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so(_arrow_Scalar__ToString+0x72)[0x7fa51fe4cc12]
/lib/libR.so(+0xfabce)[0x7fa529797bce]
/lib/libR.so(+0x13dde9)[0x7fa5297dade9]

Example: https://github.com/apache/arrow/actions/runs/5803060349/job/15730480958?pr=35345

Component(s)

R

@felipecrv
Copy link
Contributor Author

@kou @pitrou this could be related to my changes, but seems unrelated.

@thisisnic
Copy link
Member

For more context, the body of the failing test (with extraneous tests truncated out) in R is:

ext_array <- vctrs_extension_array(4)
ext_scalar <- scalar(ext_array)
expect_output(print(ext_scalar), "Scalar\n4")

Which converts an extension array to a scalar and then converts that to a string.

@thisisnic
Copy link
Member

I'll have a look, but also pinging @paleolimbot who knows more about our extension type implementation than I do

@paleolimbot
Copy link
Member

paleolimbot commented Aug 9, 2023

This particular test was added because before #15277 a segfault occurred when attempting to print an ExtensionScalar from R. I forget the details of why the original segfault occurred, but the fix we went with is:

arrow/r/src/scalar.cpp

Lines 60 to 73 in d3ccc83

// [[arrow::export]]
std::shared_ptr<arrow::Array> MakeArrayFromScalar(
const std::shared_ptr<arrow::Scalar>& scalar, int n) {
if (scalar->type->id() == arrow::Type::EXTENSION) {
auto extension_scalar = std::dynamic_pointer_cast<arrow::ExtensionScalar>(scalar);
auto type = std::dynamic_pointer_cast<arrow::ExtensionType>(scalar->type);
auto storage_type = type->storage_type();
auto storage = ValueOrStop(
arrow::MakeArrayFromScalar(*extension_scalar->value, n, gc_memory_pool()));
return type->WrapArray(type, storage);
} else {
return ValueOrStop(arrow::MakeArrayFromScalar(*scalar, n, gc_memory_pool()));
}
}

@felipecrv are there some PRs I could take a look at to look for changes that may have affected this bit?

@thisisnic
Copy link
Member

@paleolimbot #35344

@thisisnic
Copy link
Member

And this is where the error is raised:

Status Visit(const ExtensionType& type) {
return Status::NotImplemented("construction from scalar of type ", *scalar_.type);
}

@paleolimbot
Copy link
Member

Hmm...when I build Arrow off of the PR branch I get

✖ | 5     389 | data-type                                                                           
────────────────────────────────────────────────────────────────────────────────────────────────────
Failure (test-data-type.R:292:3): duration types work as expected
`object` (`actual`) not equal to `expected` (`expected`).

  `actual`: 34
`expected`: 33
Backtrace:
    ▆
 1. └─arrow:::expect_equal(x$id, Type$DURATION) at test-data-type.R:292:2
 2.   └─testthat::expect_equal(...) at tests/testthat/helper-expectation.R:43:4

Failure (test-data-type.R:303:3): duration types work as expected
`object` (`actual`) not equal to `expected` (`expected`).

  `actual`: 34
`expected`: 33
Backtrace:
    ▆
 1. └─arrow:::expect_equal(x$id, Type$DURATION) at test-data-type.R:303:2
 2.   └─testthat::expect_equal(...) at tests/testthat/helper-expectation.R:43:4

Failure (test-data-type.R:380:3): map type works as expected
`object` (`actual`) not equal to `expected` (`expected`).

  `actual`: 31
`expected`: 30
Backtrace:
    ▆
 1. └─arrow:::expect_equal(x$id, Type$MAP) at test-data-type.R:380:2
 2.   └─testthat::expect_equal(...) at tests/testthat/helper-expectation.R:43:4

Failure (test-data-type.R:430:3): struct type works as expected
`object` (`actual`) not equal to `expected` (`expected`).

  `actual`: 27
`expected`: 26
Backtrace:
    ▆
 1. └─arrow:::expect_equal(x$id, Type$STRUCT) at test-data-type.R:430:2
 2.   └─testthat::expect_equal(...) at tests/testthat/helper-expectation.R:43:4

Failure (test-data-type.R:458:3): DictionaryType works as expected (ARROW-3355)
`object` (`actual`) not equal to `expected` (`expected`).

  `actual`: 30
`expected`: 29
Backtrace:
    ▆
 1. └─arrow:::expect_equal(d$id, Type$DICTIONARY) at test-data-type.R:458:2
 2.   └─testthat::expect_equal(...) at tests/testthat/helper-expectation.R:43:4

...so I wonder if the type enum IDs changed and need to be synced with

arrow/r/R/enums.R

Lines 44 to 82 in 7c8f398

Type <- enum("Type::type",
"NA" = 0L,
BOOL = 1L,
UINT8 = 2L,
INT8 = 3L,
UINT16 = 4L,
INT16 = 5L,
UINT32 = 6L,
INT32 = 7L,
UINT64 = 8L,
INT64 = 9L,
HALF_FLOAT = 10L,
FLOAT = 11L,
DOUBLE = 12L,
STRING = 13L,
BINARY = 14L,
FIXED_SIZE_BINARY = 15L,
DATE32 = 16L,
DATE64 = 17L,
TIMESTAMP = 18L,
TIME32 = 19L,
TIME64 = 20L,
INTERVAL_MONTHS = 21L,
INTERVAL_DAY_TIME = 22L,
DECIMAL128 = 23L,
DECIMAL256 = 24L,
LIST = 25L,
STRUCT = 26L,
SPARSE_UNION = 27L,
DENSE_UNION = 28L,
DICTIONARY = 29L,
MAP = 30L,
EXTENSION = 31L,
FIXED_SIZE_LIST = 32L,
DURATION = 33L,
LARGE_STRING = 34L,
LARGE_BINARY = 35L,
LARGE_LIST = 36L
)

I'm not exactly sure how the wrong type ID would end up there, but if the wrong type ID was at

arrow/r/src/scalar.cpp

Lines 60 to 63 in d3ccc83

// [[arrow::export]]
std::shared_ptr<arrow::Array> MakeArrayFromScalar(
const std::shared_ptr<arrow::Scalar>& scalar, int n) {
if (scalar->type->id() == arrow::Type::EXTENSION) {

...it would crash because casting an extension array to utf-8 is not implemented (this was the error that #15277 fixed).

@thisisnic
Copy link
Member

thisisnic commented Aug 9, 2023

If I add LIST_VIEW = 26L to the list of Types in enums.R and bump the other numbers up we no longer get the failures, but I don't know why as I can't see the link between those IDs and the C++ code.

@thisisnic
Copy link
Member

@felipecrv - I have a solution for you in the above comment, which should fix things in your PR even if I'm not totally sure why it does.

@thisisnic
Copy link
Member

As this isn't a bug, I'm going to close this now

@felipecrv
Copy link
Contributor Author

@felipecrv - I have a solution for you in the above comment, which should fix things in your PR even if I'm not totally sure why it does.

Oh, I suspected that this could be due to R assuming a specific numbering of the Type enum. If this is the case, we should be explicit about the enum entry values in the C++ enum.

@bkietz, do you know if the automatic numbering of enum entries is fully specified in C++ or is the compiler free to choose anything?

@felipecrv
Copy link
Contributor Author

@thisisnic @paleolimbot thanks for looking into this and sorry for the false alarm. I opened a PR that adds explicit values on the C++ side as well so no one gets confused about this again.

@thisisnic
Copy link
Member

@felipecrv No worries, and thanks for figuring it out for us all, as I was really confused by the reasoning there! :)

bkietz pushed a commit that referenced this issue Aug 16, 2023
…enum (#37149)

### Rationale for this change

The enum type in C++ leaves the integer values implicitly defined and let the compiler assign the values to them automatically. This means an insertion of a new entry that is not at the end, causes implementations that rely on specific values (like R) to break with confusing error messages [1].

Assigning the values explicitly can communicate that these enum entry values are relied upon and can allow a more natural ordering of the list that is different from the numeric order the entries receive.

[1] #37091

### What changes are included in this PR?

 - Setting numeric values of the enum entries explicitly
 - Completing an equivalent enum on the R side 
 
### Are these changes tested?

N/A

### Are there any user-facing changes?

No.
* Closes: #37148

Authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue Nov 13, 2023
…:type enum (apache#37149)

### Rationale for this change

The enum type in C++ leaves the integer values implicitly defined and let the compiler assign the values to them automatically. This means an insertion of a new entry that is not at the end, causes implementations that rely on specific values (like R) to break with confusing error messages [1].

Assigning the values explicitly can communicate that these enum entry values are relied upon and can allow a more natural ordering of the list that is different from the numeric order the entries receive.

[1] apache#37091

### What changes are included in this PR?

 - Setting numeric values of the enum entries explicitly
 - Completing an equivalent enum on the R side 
 
### Are these changes tested?

N/A

### Are there any user-facing changes?

No.
* Closes: apache#37148

Authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants