
[CI][C++] Nightly test-conda-cpp-valgrind is flaky due to arrow-compute-vector-test taking closer to 500 seconds #35636

Closed
raulcd opened this issue May 17, 2023 · 7 comments · Fixed by #36401

Comments

@raulcd
Member

raulcd commented May 17, 2023

Describe the bug, including details regarding any error messages, version, and platform.

Sometimes the test-conda-cpp-valgrind nightly test fails due to a timeout on arrow-compute-vector-test. Examples:

with the failure:

```
The following tests FAILED:
	 25 - arrow-compute-vector-test (Timeout)
```

When it succeeds, the runtime is still close to 500 seconds. Example:

I am not sure whether we should reduce the tests' runtime or split the test so that we stop getting these timeouts.

Component(s)

C++, Continuous Integration

@westonpace
Member

I think we could probably be more selective about which types we test (e.g. testing one or two of the 8 integer types is often sufficient) if we really wanted to trim down the runtimes.
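
As an illustration, here is a minimal GoogleTest sketch (not Arrow's actual test code; the suite and test names are hypothetical) of instantiating a typed test suite over a trimmed list of integer types instead of all eight:

```
// Minimal sketch: a typed suite instantiated for a reduced set of integer
// types. Switching the list passed to TYPED_TEST_SUITE cuts the number of
// instantiations without touching the test bodies.
#include <cstdint>

#include <gtest/gtest.h>

template <typename T>
class IntegerVectorKernelTest : public ::testing::Test {};

// Exhaustive list: every TYPED_TEST below would run eight times.
using AllIntegerTypes =
    ::testing::Types<int8_t, int16_t, int32_t, int64_t,
                     uint8_t, uint16_t, uint32_t, uint64_t>;

// Reduced list: one signed and one unsigned representative.
using ReducedIntegerTypes = ::testing::Types<int32_t, uint64_t>;

TYPED_TEST_SUITE(IntegerVectorKernelTest, ReducedIntegerTypes);

TYPED_TEST(IntegerVectorKernelTest, RoundTrip) {
  // Placeholder body; a real test would exercise a vector kernel with
  // TypeParam values here.
  TypeParam value = static_cast<TypeParam>(42);
  EXPECT_EQ(value, static_cast<TypeParam>(42));
}
```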

@raulcd
Member Author

raulcd commented Jun 27, 2023

@westonpace @pitrou should we do something about this flaky one?

@pitrou
Member

pitrou commented Jun 27, 2023

@raulcd Someone should ideally take a look into it and try to reduce the runtime under Valgrind.

For example, it's not obvious that testing all datatypes exhaustively with random input is useful (why all integers, both signed and unsigned, for example?). Test sizes could also be reduced a bit.

@felipecrv Since you've been working quite a bit on kernels, would you like to take a look at this?
Note that you can use Archery to run the Valgrind CI job, for example archery docker run conda-cpp-valgrind.
I would recommend shelling into the Docker image, though (archery docker run conda-cpp-valgrind bash), which lets you build and test more selectively.
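
Returning to the point about reducing test sizes, here is a minimal GoogleTest sketch (not Arrow's actual test code) of shrinking random-input lengths when running under Valgrind. The kUnderValgrind switch and the length constants are hypothetical; such a switch could be wired to a build define or an environment variable set by the Valgrind CI job:

```
// Minimal sketch: pick a much smaller random-input length when the tests run
// under Valgrind, keeping coverage while cutting runtime.
#include <cstdint>

#include <gtest/gtest.h>

constexpr bool kUnderValgrind = false;  // hypothetical switch
constexpr int64_t kRegularLength = 1 << 16;
constexpr int64_t kValgrindLength = 1 << 10;

int64_t RandomInputLength() {
  return kUnderValgrind ? kValgrindLength : kRegularLength;
}

TEST(VectorSortTest, RandomInput) {
  const int64_t length = RandomInputLength();
  // A real test would generate `length` random values and exercise the sort
  // kernel here.
  ASSERT_GT(length, 0);
}
```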

@pitrou
Member

pitrou commented Jun 27, 2023

An orthogonal solution is to split arrow-compute-vector-test into two or three independent test executables (for example: 1) sort+rank+top/bottomK 2) select+filter 3) all the rest). But while that would work around the timeout problem, it wouldn't reduce the total CI runtime.

@felipecrv felipecrv self-assigned this Jun 27, 2023
@felipecrv
Contributor

> An orthogonal solution is to split arrow-compute-vector-test into two or three independent test executables (for example: 1) sort+rank+top/bottomK 2) select+filter 3) all the rest). But while that would work around the timeout problem, it wouldn't reduce the total CI runtime.

I will try to reduce the number of allocations in the tests (I hope that's possible), but I will also split these tests, as I don't feel safe picking which tests to disable. When developing kernels, the exhaustive tests were very helpful in catching unexpected UB and mistakes in the type-dispatching logic.
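
A minimal GoogleTest sketch (not the actual Arrow tests; all names here are hypothetical) of one way to cut per-test allocations: generate shared random input once per suite instead of once per test, so Valgrind tracks far fewer allocations overall.

```
// Minimal sketch: build the random input once in SetUpTestSuite and let every
// test in the suite reuse it.
#include <algorithm>
#include <cstdint>
#include <memory>
#include <random>
#include <vector>

#include <gtest/gtest.h>

class VectorKernelTest : public ::testing::Test {
 protected:
  // Runs once for the whole suite, not once per test.
  static void SetUpTestSuite() {
    std::mt19937 rng(42);
    std::uniform_int_distribution<int32_t> dist;
    shared_input_ = std::make_shared<std::vector<int32_t>>(1 << 12);
    for (auto& value : *shared_input_) value = dist(rng);
  }

  static std::shared_ptr<std::vector<int32_t>> shared_input_;
};

std::shared_ptr<std::vector<int32_t>> VectorKernelTest::shared_input_;

TEST_F(VectorKernelTest, SortsSharedInput) {
  // Copying is cheap relative to regenerating; generation happens only once.
  std::vector<int32_t> input = *shared_input_;
  std::sort(input.begin(), input.end());
  ASSERT_TRUE(std::is_sorted(input.begin(), input.end()));
}
```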

@felipecrv
Contributor

felipecrv commented Jun 29, 2023

I generate a lot of test cases for run-end encoding (and decoding), but they run really fast compared to the other kernels' tests. The number of variations is a poor predictor of test cost.

```
[----------] Global test environment tear-down
[==========] 423 tests from 1 test suite ran. (326 ms total)
[  PASSED  ] 423 tests.
debug/arrow-compute-vector-test --gtest_filter="EncodeArrayTest*"  0.44s user 0.06s system 96% cpu 0.511 total
```

@felipecrv
Contributor

felipecrv commented Jun 29, 2023

I tried looking for inefficiencies in the test code, but none of the changes I tried made a difference. So I opened a PR (#36401) that only extracts tests into separate executables.

pitrou pushed a commit that referenced this issue Jul 3, 2023
…-test (#36401)

### Rationale for this change

`arrow-compute-vector-test` is too big and takes a long time to run because of that.

### What changes are included in this PR?

Extracting two tests.

Timings on my machine (Debug builds with ASAN).

```
debug/arrow-compute-vector-test > /dev/null  11.54s user 0.47s system 99% cpu 12.023 total
debug/arrow-compute-vector-sort-test > /dev/null  13.30s user 0.26s system 99% cpu 13.579 total
debug/arrow-compute-vector-selection-test > /dev/null  6.97s user 0.22s system 99% cpu 7.207 total
```

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.

* Closes: #35636

Authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
@pitrou pitrou added this to the 13.0.0 milestone Jul 3, 2023
westonpace pushed a commit to westonpace/arrow that referenced this issue Jul 7, 2023
…vector-test (apache#36401)
