[CI][R][C++] test-r-linux-valgrind has started failing #41148

Closed
raulcd opened this issue Apr 11, 2024 · 6 comments

Comments

@raulcd
Member

raulcd commented Apr 11, 2024

Describe the bug, including details regarding any error messages, version, and platform.

The test-r-linux-valgrind job has been failing since these commits were merged (fe38d47...cd607d0), with valgrind reporting several leaks:

2024-04-10T00:57:52.3151980Z ==774== LEAK SUMMARY:
2024-04-10T00:57:52.3152139Z ==774==    definitely lost: 0 bytes in 0 blocks
2024-04-10T00:57:52.3152329Z ==774==    indirectly lost: 0 bytes in 0 blocks
2024-04-10T00:57:52.3152513Z ==774==      possibly lost: 15,813 bytes in 36 blocks
2024-04-10T00:57:52.3152718Z ==774==    still reachable: 121,607,441 bytes in 38,708 blocks
2024-04-10T00:57:52.3152903Z ==774==                       of which reachable via heuristic:
2024-04-10T00:57:52.3153095Z ==774==                         newarray           : 4,264 bytes in 1 blocks
2024-04-10T00:57:52.3153280Z ==774==         suppressed: 0 bytes in 0 blocks
2024-04-10T00:57:52.3153481Z ==774== Reachable blocks (those to which a pointer was found) are not shown.
2024-04-10T00:57:52.3153778Z ==774== To see them, rerun with: --leak-check=full --show-leak-kinds=all
2024-04-10T00:57:52.3153941Z ==774== 
2024-04-10T00:57:52.3154114Z ==774== For lists of detected and suppressed errors, rerun with: -s
2024-04-10T00:57:52.3154330Z ==774== ERROR SUMMARY: 15 errors from 15 contexts (suppressed: 0 from 0)
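
For triage, the summary above can be reduced to a few numbers: how many bytes are definitely lost versus how many errors valgrind reports overall. The following is a minimal sketch, assuming the log lines keep the layout shown above; the function name `parse_valgrind_summary` is hypothetical and not part of any Arrow or valgrind tooling.

```python
import re

# Matches lines such as "==774==    definitely lost: 0 bytes in 0 blocks".
LEAK_RE = re.compile(
    r"==\d+==\s+(definitely lost|indirectly lost|possibly lost|"
    r"still reachable|suppressed):\s+([\d,]+) bytes in ([\d,]+) blocks"
)
# Matches lines such as "==774== ERROR SUMMARY: 15 errors from 15 contexts ...".
ERROR_RE = re.compile(
    r"==\d+== ERROR SUMMARY: ([\d,]+) errors from ([\d,]+) contexts"
)


def _to_int(text):
    """Convert a number with thousands separators, e.g. '15,813', to int."""
    return int(text.replace(",", ""))


def parse_valgrind_summary(log_text):
    """Return (leaks, errors) extracted from a valgrind log.

    leaks maps each leak kind to a (bytes, blocks) tuple.
    errors is an (error_count, context_count) tuple, or None if absent.
    """
    leaks = {}
    errors = None
    for line in log_text.splitlines():
        match = LEAK_RE.search(line)
        if match:
            leaks[match.group(1)] = (_to_int(match.group(2)), _to_int(match.group(3)))
            continue
        match = ERROR_RE.search(line)
        if match:
            errors = (_to_int(match.group(1)), _to_int(match.group(2)))
    return leaks, errors
```

A nonzero error count combined with zero "definitely lost" bytes, as in this log, suggests looking at the individual error contexts rather than only at the leak totals.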

Please see the linked job for more details.

Component(s)

C++, Continuous Integration, R

@zanmato1984
Collaborator

It seems this isn't actually a leak; rather, valgrind is complaining about an unrecognized instruction, which is very likely an AVX512 one.

Maybe we can quickly rule out an environment issue by checking whether the host CPU is SIMD-capable? AFAIK, in certain VM environments some CPU flags, especially SIMD-related ones, can be missing unless explicitly configured.
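
One way to run that check on the runner is to read the CPU flags the kernel exposes. This is a minimal sketch, assuming an x86 Linux host where `/proc/cpuinfo` contains a `flags` line; the helper names `cpu_flags` and `simd_support` are hypothetical, and `avx2` and `avx512f` are the flag names as they usually appear in that file.

```python
def cpu_flags(cpuinfo_text):
    """Collect the CPU feature flags listed in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 Linux lists features on lines of the form "flags : fpu vme ...".
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def simd_support(flags):
    """Report whether the AVX2 and AVX512 foundation flags are present."""
    return {"avx2": "avx2" in flags, "avx512f": "avx512f" in flags}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(simd_support(cpu_flags(f.read())))
    except FileNotFoundError:
        print("/proc/cpuinfo is not available on this platform")
```

Note that this reports what the host advertises, not which instructions valgrind itself can handle.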

@amoeba
Member

amoeba commented Apr 12, 2024

This seems similar to #30368 but it's not clear to me how we end up with an AVX512 instruction. EXTRA_CMAKE_FLAGS=-DARROW_RUNTIME_SIMD_LEVEL=AVX2 gets set but I also see (arrow::compute::SimdLevel::type)4 which looks like it maps to SimdLevel::AVX512. @paleolimbot does this ring any bells?

@paleolimbot
Member

paleolimbot commented Apr 13, 2024

I don't recall an unrecognized instruction in any of my adventures with that valgrind job (only memory leaks!).

@assignUser
Member

I agree with @amoeba: what likely happened is that the runner supports AVX512 (valgrind does not) and, for some reason, the code that gets invoked is not covered by the runtime SIMD level envvar. The quick way to fix this would be to also add ARROW_SIMD_LEVEL=AVX2 to that job so we don't compile anything for AVX512.

Actually fixing it would probably mean #30368 and checking the thing that valgrind instruments, in order to set the SIMD level correctly.
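
The difference between the runtime cap and the compile-time level can be illustrated with a toy model. This is only a sketch of the reasoning above, not Arrow's dispatch code: the `LEVELS` ordering and the `pick_level` function are hypothetical, and the model assumes that a dispatcher picks the highest compiled variant the host supports.

```python
# Toy ordering of SIMD levels for illustration; not Arrow's actual enum.
LEVELS = ["NONE", "SSE4_2", "AVX", "AVX2", "AVX512"]


def pick_level(compiled, host_max, runtime_cap=None):
    """Pick the highest compiled level the host supports.

    compiled    -- levels for which code variants were built
    host_max    -- highest level the host CPU advertises
    runtime_cap -- optional upper bound applied at run time
    """
    limit = LEVELS.index(host_max)
    if runtime_cap is not None:
        limit = min(limit, LEVELS.index(runtime_cap))
    candidates = [lvl for lvl in compiled if LEVELS.index(lvl) <= limit]
    return max(candidates, key=LEVELS.index, default="NONE")


built_with_avx512 = ["NONE", "AVX2", "AVX512"]
built_up_to_avx2 = ["NONE", "AVX2"]

# A code path that honors the runtime cap stays at AVX2.
print(pick_level(built_with_avx512, "AVX512", runtime_cap="AVX2"))
# A code path that ignores the cap selects AVX512 on a capable host.
print(pick_level(built_with_avx512, "AVX512"))
# If no AVX512 variant was compiled, ignoring the cap is harmless.
print(pick_level(built_up_to_avx2, "AVX512"))
```

In this model, limiting the compiled variants removes the AVX512 option entirely, which is why the compile-time setting works as a quick fix even when some code path does not consult the runtime cap.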

@raulcd raulcd removed this from the 16.0.0 milestone Apr 16, 2024
@raulcd
Member Author

raulcd commented Apr 16, 2024

I don't think this is a blocker so I've created RC0 for 16.0.0 without it.

@assignUser
Member

@raulcd Looking at the likely cause, you are right: if this issue were to happen on CRAN, it would also happen with our current version (as we don't control the envvars there).
