Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[R] [CI] Test reorganization triggering valgrind errors #20062

Closed
asfimport opened this issue Jan 6, 2022 · 1 comment
Closed

[R] [CI] Test reorganization triggering valgrind errors #20062

asfimport opened this issue Jan 6, 2022 · 1 comment

Comments

@asfimport
Copy link
Collaborator

It looks like the test reorganization that was part of ARROW-15010 means that some of the tests that were being skipped because re2 is known to (purposefully) not pass valgrind.

For example, the format_ISO8601 tests that are now in test-dplyr-funcs-datetime.R

used to be in test-dplyr-funcs-string.R below a skip_if_not_available("re2")

test_that("format_ISO8601", {
skip_on_os("windows") # https://issues.apache.org/jira/browse/ARROW-13168
times <- tibble(x = c(lubridate::ymd_hms("2018-10-07 19:04:05", tz = "Etc/GMT+6"), NA))
compare_dplyr_binding(
.input %>%
mutate(x = format_ISO8601(x, precision = "ymd", usetz = FALSE)) %>%
collect(),
times
)
if (getRversion() < "3.5") {
# before 3.5, times$x will have no timezone attribute, so Arrow faithfully
# errors that there is no timezone to format:
expect_error(
times %>%
Table$create() %>%
mutate(x = format_ISO8601(x, precision = "ymd", usetz = TRUE)) %>%
collect(),
"Timezone not present, cannot convert to string with timezone: %Y-%m-%d%z"
)
# See comment regarding %S flag in strftime tests
expect_error(
times %>%
Table$create() %>%
mutate(x = format_ISO8601(x, precision = "ymdhms", usetz = TRUE)) %>%
mutate(x = gsub("\\.0*", "", x)) %>%
collect(),
"Timezone not present, cannot convert to string with timezone: %Y-%m-%dT%H:%M:%S%z"
)
} else {
compare_dplyr_binding(
.input %>%
mutate(x = format_ISO8601(x, precision = "ymd", usetz = TRUE)) %>%
collect(),
times
)
# See comment regarding %S flag in strftime tests
compare_dplyr_binding(
.input %>%
mutate(x = format_ISO8601(x, precision = "ymdhms", usetz = TRUE)) %>%
mutate(x = gsub("\\.0*", "", x)) %>%
collect(),
times
)
}
# See comment regarding %S flag in strftime tests
compare_dplyr_binding(
.input %>%
mutate(x = format_ISO8601(x, precision = "ymdhms", usetz = FALSE)) %>%
mutate(x = gsub("\\.0*", "", x)) %>%
collect(),
times
)
})

A bit counter-intuitively "not available" for re2 includes being run on the systems that run valgrind: https://github.com/apache/arrow/blob/master/r/tests/testthat/helper-skip.R#L25-L27

We can add in that re2 skip gatting those tests (though, there might be more that need that gating too!)

https://dev.azure.com/ursacomputing/crossbow/_build/results?buildId=18175&view=logs&j=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb&t=d9b15392-e4ce-5e4c-0c8c-b69645229181&l=8758

==10242==    by 0xE266301: __invoke_impl<arrow::Result<arrow::compute::ExecBatch>, arrow::compute::(anonymous namespace)::ProjectNode::InputReceived(arrow::compute::ExecNode*, arrow::compute::ExecBatch)::<lambda(arrow::compute::ExecBatch)>&, arrow::compute::ExecBatch> (invoke.h:60)
==10242==    by 0xE266301: __invoke_r<arrow::Result<arrow::compute::ExecBatch>, arrow::compute::(anonymous namespace)::ProjectNode::InputReceived(arrow::compute::ExecNode*, arrow::compute::ExecBatch)::<lambda(arrow::compute::ExecBatch)>&, arrow::compute::ExecBatch> (invoke.h:142)
==10242==    by 0xE266301: std::_Function_handler<arrow::Result<arrow::compute::ExecBatch> (arrow::compute::ExecBatch), arrow::compute::(anonymous namespace)::ProjectNode::InputReceived(arrow::compute::ExecNode*, arrow::compute::ExecBatch)::{lambda(arrow::compute::ExecBatch)#1}>::_M_invoke(std::_Any_data const&, arrow::compute::ExecBatch&&) (std_function.h:292)
==10242==  Uninitialised value was created by a heap allocation
==10242==    at 0x483BE63: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==10242==    by 0xEC22BD5: allocate (new_allocator.h:115)
==10242==    by 0xEC22BD5: PODArray (pod_array.h:22)
==10242==    by 0xEC22BD5: re2::SparseSetT<void>::SparseSetT(int) (sparse_set.h:241)
==10242==    by 0xEC208E9: re2::Prog::Optimize() (prog.cc:225)
==10242==    by 0xEC393BB: re2::Compiler::Finish(re2::Regexp*) (compile.cc:1172)
==10242==    by 0xEC3B48E: re2::Compiler::Compile(re2::Regexp*, bool, long) (compile.cc:1156)
==10242==    by 0xEC1B8AE: re2::RE2::Init(re2::StringPiece const&, re2::RE2::Options const&) (re2.cc:223)
==10242==    by 0xEC1C480: re2::RE2::RE2(re2::StringPiece const&, re2::RE2::Options const&) (re2.cc:126)
==10242==    by 0xE7032B7: RegexSubstringReplacer (scalar_string.cc:3178)
==10242==    by 0xE7032B7: make_unique<arrow::compute::internal::(anonymous namespace)::RegexSubstringReplacer<arrow::StringType>, const arrow::compute::ReplaceSubstringOptions&> (make_unique.h:30)
==10242==    by 0xE7032B7: arrow::compute::internal::(anonymous namespace)::RegexSubstringReplacer<arrow::StringType>::Make(arrow::compute::ReplaceSubstringOptions const&) (scalar_string.cc:3158)
==10242==    by 0xE7063C0: arrow::compute::internal::(anonymous namespace)::ReplaceSubstring<arrow::StringType, arrow::compute::internal::(anonymous namespace)::RegexSubstringReplacer<arrow::StringType> >::Exec(arrow::compute::KernelContext*, arrow::compute::ExecBatch const&, arrow::Datum*) (scalar_string.cc:3060)
==10242==    by 0xE2FF61E: __invoke_impl<arrow::Status, arrow::Status (*&)(arrow::compute::KernelContext*, const arrow::compute::ExecBatch&, arrow::Datum*), arrow::compute::KernelContext*, const arrow::compute::ExecBatch&, arrow::Datum*> (invoke.h:60)
==10242==    by 0xE2FF61E: __invoke_r<arrow::Status, arrow::Status (*&)(arrow::compute::KernelContext*, const arrow::compute::ExecBatch&, arrow::Datum*), arrow::compute::KernelContext*, const arrow::compute::ExecBatch&, arrow::Datum*> (invoke.h:142)
==10242==    by 0xE2FF61E: std::_Function_handler<arrow::Status (arrow::compute::KernelContext*, arrow::compute::ExecBatch const&, arrow::Datum*), arrow::Status (*)(arrow::compute::KernelContext*, arrow::compute::ExecBatch const&, arrow::Datum*)>::_M_invoke(std::_Any_data const&, arrow::compute::KernelContext*&&, arrow::compute::ExecBatch const&, arrow::Datum*&&) (std_function.h:292)
==10242==    by 0xE2290E4: operator() (std_function.h:622)
==10242==    by 0xE2290E4: ExecuteBatch (exec.cc:700)
==10242==    by 0xE2290E4: arrow::compute::detail::(anonymous namespace)::ScalarExecutor::Execute(std::vector<arrow::Datum, std::allocator<arrow::Datum> > const&, arrow::compute::detail::ExecListener*) (exec.cc:641)
==10242==    by 0xE243D54: arrow::compute::ExecuteScalarExpression(arrow::compute::Expression const&, arrow::compute::ExecBatch const&, arrow::compute::ExecContext*) (expression.cc:547)
==10242== 

Reporter: Jonathan Keane / @jonkeane
Assignee: Dewey Dunnington / @paleolimbot

PRs and other links:

Note: This issue was originally created as ARROW-15266. Please see the migration documentation for further details.

@asfimport
Copy link
Collaborator Author

Jonathan Keane / @jonkeane:
Issue resolved by pull request 12090
#12090

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants