Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

utest failures on various arches #1469

Closed
sharkcz opened this issue Feb 21, 2018 · 38 comments · Fixed by #1538
Closed

utest failures on various arches #1469

sharkcz opened this issue Feb 21, 2018 · 38 comments · Fixed by #1538

Comments

@sharkcz
Copy link
Contributor

sharkcz commented Feb 21, 2018

After commit e7366a4 we see test failures on various non-x86 arches

s390x

TEST 11/22 dsdot:dsdot_n_1 [FAIL]
  ERR: test_dsdot.c:47  expected -2.393e-03, got -2.393e-03 (diff 1.035e-10, tol 1.000e-13)
````

aarch64

TEST 11/22 dsdot:dsdot_n_1 [FAIL]
ERR: test_dsdot.c:47 expected -2.393e-03, got -2.393e-03 (diff 1.035e-10, tol 1.000e-13)


ppc64

TEST 16/21 rot:csrot_inc_0 [FAIL]
ERR: test_rot.c:109 expected 3.125e-01, got -2.148e-01 (diff 5.273e-01, tol 1.000e-04)
TEST 18/21 rot:zdrot_inc_0 [FAIL]
ERR: test_rot.c:71 expected 3.125e-01, got -2.148e-01 (diff 5.273e-01, tol 1.000e-13)

and even more on armv7
````
./openblas_utest
TEST 1/22 amax:samax [OK]
TEST 2/22 drotmg:rotmg_D1eqD2_X1eqX2 [OK]
TEST 3/22 drotmg:rotmg_issue1452 [OK]
TEST 4/22 drotmg:rotmg [OK]
TEST 5/22 axpy:caxpy_inc_0 [FAIL]
  ERR: test_axpy.c:109  expected 2.000e+00, got -3.000e+00 (diff 5.000e+00, tol 1.000e-13)
TEST 6/22 axpy:saxpy_inc_0 [FAIL]
  ERR: test_axpy.c:90  expected 2.000e+00, got 4.000e+00 (diff -2.000e+00, tol 1.000e-13)
TEST 7/22 axpy:zaxpy_inc_0 [FAIL]
  ERR: test_axpy.c:71  expected 2.000e+00, got -3.000e+00 (diff 5.000e+00, tol 1.000e-13)
TEST 8/22 axpy:daxpy_inc_0 [FAIL]
  ERR: test_axpy.c:52  expected 2.000e+00, got 4.000e+00 (diff -2.000e+00, tol 1.000e-13)
TEST 9/22 zdotu:zdotu_offset_1 [OK]
TEST 10/22 zdotu:zdotu_n_1 [OK]
TEST 11/22 dsdot:dsdot_n_1 [FAIL]
  ERR: test_dsdot.c:47  expected -2.393e-03, got -2.393e-03 (diff 1.035e-10, tol 1.000e-13)
TEST 12/22 swap:cswap_inc_0 [OK]
TEST 13/22 swap:sswap_inc_0 [OK]
TEST 14/22 swap:zswap_inc_0 [OK]
TEST 15/22 swap:dswap_inc_0 [OK]
TEST 16/22 rot:csrot_inc_0 [FAIL]
  ERR: test_rot.c:109  expected 1.000e+00, got -2.148e-01 (diff 1.215e+00, tol 1.000e-04)
TEST 17/22 rot:srot_inc_0 [FAIL]
  ERR: test_rot.c:90  expected 1.000e+00, got -2.148e-01 (diff 1.215e+00, tol 1.000e-04)
TEST 18/22 rot:zdrot_inc_0 [FAIL]
  ERR: test_rot.c:71  expected 1.000e+00, got -2.148e-01 (diff 1.215e+00, tol 1.000e-13)
TEST 19/22 rot:drot_inc_0 [FAIL]
  ERR: test_rot.c:51  expected 1.000e+00, got -2.148e-01 (diff 1.215e+00, tol 1.000e-13)
TEST 20/22 potrf:smoketest_trivial [OK]
TEST 21/22 potrf:bug_695 [OK]
TEST 22/22 fork:safety [OK]
RESULTS: 22 tests (13 ok, 9 failed, 0 skipped) ran in 3269 ms
````

This is from Fedora 27 systems.
@martin-frbg
Copy link
Collaborator

martin-frbg commented Feb 21, 2018

Thanks. The dsdot error is most likely spurious as it flags all architectures that use the generic C implementation from kernel/generic/dot.c - I have just seen it myself on arm64. A drawback of using canned "known good" results across all architectures instead of comparing to the reference implementation running on the same cpu. The ROT and AXPY are "interesting"...

@sharkcz
Copy link
Contributor Author

sharkcz commented Feb 21, 2018

Let me know if you need access to the non-x86 machines or more info.

@martin-frbg
Copy link
Collaborator

Actually it is the C implementation in kernel/arm/dot.c that is used by the arm,mips and zarch kernels that provides an inexact dsdot.

@martin-frbg
Copy link
Collaborator

I cannot debug the armv7 issues unfortunately, I can only guess that they may have crept in through the vfp changes of #1221 @ashwinyes
Similarly for ppc - I would not know how to "fix" zrot_ppc440.S (unchanged from libGoto) except by replacing it with arm/zrot.c in the KERNEL file(s).

@ashwinyes
Copy link
Contributor

@martin-frbg Looks like this is an issue with #1462. I went briefly through the commit diff and there are some serious issues with the commit, if I am not mistaken.

In utest, I assume we need to call both ref BLAS implementation and the OpenBLAS implementation and then compare the result. In the commit, all calls to the BLAS reference implementation have been removed. There is no point in comparing the results.

Also, at some places the input arrays have been modified. So even if the reference BLAS calls are restored, most likely it will generate failures.

Could you please explain what this commit is doing ??

@martin-frbg
Copy link
Collaborator

The commit restored older tests, of which only one (test_amax) had been similarly converted by xianyi (5a8447e) from the previously used cutest framework to something that would run on travis&appveyor. I have replaced the calls to the reference implementation with an array of results from running the respective netlib functions locally (on Haswell, but this should not cause large differences). Can you please point out where you think input arrays have been modified ? The x2,y2 which were previously both input and output for the reference implementation now hold the reference results. The only flaw I see with my changes is that currently the calculated and expected values are swapped in the error message.

@martin-frbg
Copy link
Collaborator

Note also that the failed tests are checking a valid but probably highly unusual corner case (zero increments), so I am not happy with just replacing your optimized axpy and rot codes with their generic C implementations that happen to pass the test. Omitting these two tests on arm may be an option if nobody has time to look at them now.

@ashwinyes
Copy link
Contributor

@martin-frbg Apologies. Spoke too soon without understanding your commit completely.

I will also look and try to fix the arm and arm64 codes when I have the bandwidth.

@martin-frbg
Copy link
Collaborator

No problem. I thought I had added the explanation about using canned results instead of live netlib calls in the commit message, but either I forgot or it got lost when I did a compressed merge to get rid of all the intermediate attempts to get prototypes right for all platforms.
The arm64 issue is (in my opinion) only that arm/dot.c should be retired in favor of wernsaar's generic/dot.c as the latter has additional typecasts for providing precise DSDOT. As your ARMV8 dot.S appeared to provide perfect DSDOT in my tests as well, I modified the ARMV8 and CORTEXA57 KERNEL files to use it in #1472. In retrospect I hope this did not hurt your thunderx platforms as their KERNEL files build on ARMV8 (the DSDOT in your dot_thunderx files looks good to me, but I will not consciously stomp on those targets as discussed recently )

@ashwinyes
Copy link
Contributor

In fact, all the kernels for ARMv8 (including ThunderX) uses the KERNEL.ARMV8 as the base for DSDOT. Your changes were affecting ThunderX also. Didnt realise it until now :)

But anyways, PR #1475 has the fix for DSDOT utest errors on ARM64.

@sharkcz
Copy link
Contributor Author

sharkcz commented Feb 27, 2018

I confirm that the "dsdot" failures are now gone on armv7 and s390x

@martin-frbg
Copy link
Collaborator

I promise to execute more restraint in the future regarding ARMV8/ThunderX... though I do wonder what I was testing when I did not see problems with the ARMV8 dot.S , only with the arm/dot.c that it was originally falling back to for dsdot.

@sharkcz
Copy link
Contributor Author

sharkcz commented Feb 27, 2018

Now also aarch64 is green.

@martin-frbg
Copy link
Collaborator

Do your builds include mips architectures as well ? I suspect those might show at least the "dsdot" failure as well.

@sharkcz
Copy link
Contributor Author

sharkcz commented Feb 27, 2018

Unfortunately not, we do builds on official Fedora arches only - armv7, aarch64, ppc64, ppc64le, s390x and x86_64

@martin-frbg
Copy link
Collaborator

Naive question: can somebody (@jcowgill, @sva-img ?) suggest an affordable and easy to use testing environment for mips/mips64 please, or should I just use qemu ?

@loveshack
Copy link

Perhaps https://gcc.gnu.org/wiki/CompileFarm for MIPS? A possible connexion with GCC is gfortran's option to call BLAS for matmul.

@martin-frbg
Copy link
Collaborator

In Qemu, mips32 fails the dsdot test (for the same reason as the others above). mips64 additionally fails both complex axpy tests, complex swap and all rot tests. cpuid_mips.c needed a patch to even compile.

@martin-frbg
Copy link
Collaborator

Thanks for the pointer to the gcc compile farm. Seems their mips64 machines are actually $300 routers, so is this all that is left of that platform nowadays ?

@martin-frbg
Copy link
Collaborator

@sharkcz are you still seeing the failures on ppc ?

@sharkcz
Copy link
Contributor Author

sharkcz commented Apr 20, 2018

yes, I do

TEST 17/22 rot:csrot_inc_0 [FAIL]
  ERR: test_rot.c:109  expected -2.148e-01, got 3.125e-01 (diff -5.273e-01, tol 1.000e-04)
TEST 18/22 rot:srot_inc_0 [OK]
TEST 19/22 rot:zdrot_inc_0 [FAIL]
  ERR: test_rot.c:71  expected -2.148e-01, got 3.125e-01 (diff -5.273e-01, tol 1.000e-13)

with recent develop branch and using POWER6 kernel. The expected/got values are now swapped though.

@martin-frbg
Copy link
Collaborator

Thanks. Actually I was not sure what kernel you are using, and had hoped it might be POWER8 which got a zrot rewrite a few weeks ago.

@sharkcz
Copy link
Contributor Author

sharkcz commented Apr 20, 2018

With POWER8 kernel I see a segfault from the dblat3 test, POWER7 gets same failures as POWER6, build with POWER5 results in assembler errors.
If you would like I can provide more details or give you access to our community VM.

@martin-frbg
Copy link
Collaborator

segfault with POWER8 is not good obviously, is it in any way apparent which of the tests in dblat3 crashes it ? POWER7 maps to POWER6 internally so no surprise, POWER5 I have no idea...
Not sure if I have any idea for fixing the zrot problem (except switching back to its generic c implementation, but not sure if this is warranted as the utest check is for a corner case only)

@quickwritereader
Copy link
Contributor

quickwritereader commented Apr 20, 2018

@martin-frbg anything wrong with z13 or power8le kernels? I have still access to those pcs. I could check at the week end

@sharkcz
Copy link
Contributor Author

sharkcz commented Apr 20, 2018

power8le is OK from what I can see in our CI

@martin-frbg
Copy link
Collaborator

So it is "only" the POWER8 target on the POWER6 hardware (or emulator) that is segfaulting ? In that case I guess it can be ignored as "probably not expected to work". @quickwritereader I believe with my trivial fix for dsdot the z13 target should pass all utests.
I guess a "not so elegant" solution for the upcoming 0.3.0 would be to simply disable these tests again on the "known bad" platforms (temporarily). None of them should be recent regressions and all are basically checking the behaviour in corner cases (X and/or Y increments of zero mostly).

@sharkcz
Copy link
Contributor Author

sharkcz commented Apr 20, 2018

the POWER8 target segfaults on Power8 HW (ppc64 big endian VM), we build POWER6 target (on Power8 HW) as a least common denominator at the distro level

@martin-frbg
Copy link
Collaborator

martin-frbg commented Apr 20, 2018

Pretty sure the POWER8 code is written with power8le in mind, perhaps this needs to be clarified somewhere. (Wrong-endian assembly would still compile I guess, just do unexpected things later)
But quickwritereader is the resident expert for this...

@sharkcz
Copy link
Contributor Author

sharkcz commented Apr 20, 2018

yes, that's very plausible. I'm thinking what is best target for a distro still supporting ppc64, I would prefer correctness over speed ...

@loveshack
Copy link

loveshack commented Apr 23, 2018 via email

@martin-frbg
Copy link
Collaborator

I took his question to mean "what TARGET should I set in the OpenBLAS build process to get something that still works on ppc64".) While I assume the zrot bug has been there for ages and only this one test uncovered it, replacing the POWER6 zrot assembly with the generic implementation will probably not hurt performance too much. .(I am much less sure if I could read from the working srot/drot implementation how to handle the incx=zero case properly in assembly)

@sharkcz
Copy link
Contributor Author

sharkcz commented Apr 23, 2018

yes, Martin's interpretation is what I meant by the question :-)

@sharkcz
Copy link
Contributor Author

sharkcz commented Apr 23, 2018

And as ppc64 is being sunset in the Linux world, then using generic zrot would definitely be acceptable for us.

@martin-frbg
Copy link
Collaborator

Could you try with the two additional lines from #1535 in KERNEL.POWER6 please ?

martin-frbg added a commit that referenced this issue Apr 23, 2018
#1535)

* Use generic C implementation of zrot on ppc64/POWER6 to work around utest failure from #1469
@martin-frbg
Copy link
Collaborator

Thanks for testing. That would seem to leave the ARMV7 axpy and rot implementations, axpy_vfp.S and rot_vfp.S where my current thinking is that the check for incx=0,incy=0 at https://github.com/xianyi/OpenBLAS/blob/8a3b6fa108b15331c2af8777d1ea0206f85673b8/kernel/arm/axpy_vfp.S#L444-L448 (and similarly in the rot file) is premature, causing an exit before the first element of y has been updated. I suspect these lines should be commented out, or if an early exit is desired, perhaps be moved to the end of the KERNEL_S1 implementation(s). I do not have ARMV7 hardware for testing.

@martin-frbg
Copy link
Collaborator

ARMV7 changes verified via QEMU now.

@sharkcz
Copy link
Contributor Author

sharkcz commented Apr 24, 2018

Awesome, I did a build on real hw and all is OK.

zhaofengli added a commit to zhaofengli/OpenBLAS that referenced this issue Jun 7, 2021
The implementation in `riscv64/dot.c` fails the `test_dsdot` test, and
the generic kernel seems to have better precision. Tested on SiFive
FU740 (HiFive Unmatched) and QEMU.

Also see OpenMathLib#1469.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants