Test kernel_regress:skx_avx fails on RISC-V platform #4446

Open
leavelet opened this issue Jan 20, 2024 · 6 comments

@leavelet

Environment:

OpenBLAS version: release 0.3.26
OS: revyos
CPU: Sophgo sg2042, RISC-V rv64imafdc with rvv 0.71
Compiler: g++ 10.4, THead version. https://github.com/revyos/gcc/tree/revyos-gcc10.4-thead-dev
Compile command: make HOSTCC=gcc-10 TARGET=C910V CC=riscv64-linux-gnu-gcc-10 FC=riscv64-linux-gnu-gfortran-10 -j 64

Error log:

TEST 38/40 kernel_regress:skx_avx [FAIL]
  ERR: test_kernel_regress.c:50  expected 0.000e+00, got 2.719e+04 (diff -2.719e+04, tol 1.000e-10)

By the way, the risc-v branch gets stuck on the line below:

OPENBLAS_NUM_THREADS=2 ./cblat3 < ./cblat3.dat

@leavelet
Author

@RevySR

@martin-frbg
Collaborator

martin-frbg commented Jan 20, 2024

kernel_regress:skx_avx is a DGEMM test. Maybe we should rename it, since its history as an AVX512 bug in the SkylakeX kernel is irrelevant today...

In CI it works with a different vendor toolchain based on GCC 10.2 (see .github/workflows/c910v.yml for the URL), but of course the tests there only use qemu instead of the actual hardware.

@leavelet
Author

leavelet commented Jan 20, 2024

Since the CI with GCC 10.2 works fine, maybe it is a vendor problem. I shall work with Revy to resolve it.

@martin-frbg
Collaborator

Any updates on this? I've since merged the risc-v branch as I could not reproduce the problems in CI or local qemu, but I lack real C910V hardware at the moment.

@leavelet
Author

leavelet commented Feb 7, 2024

The GEMM issue is fixed in #4454. We have found another issue in kernel/riscv64/nrm2_vector.c, which hasn't been fixed yet. Keeping this issue open until we fix the nrm2 issue or closing it to open a new one both work fine; I'm not sure which one is better.

@martin-frbg
Collaborator

Thanks. We can keep this one open for simplicity (unless you expect this to take long, in which case opening a new issue with an appropriate title might help others find it faster). Annoying that it seems to depend so much on compiler version, or qemu vs. actual hardware.
