utest failures on various arches #1469

sharkcz · 2018-02-21T13:03:45Z

After commit e7366a4 we see test failures on various non-x86 arches

s390x

TEST 11/22 dsdot:dsdot_n_1 [FAIL]
  ERR: test_dsdot.c:47  expected -2.393e-03, got -2.393e-03 (diff 1.035e-10, tol 1.000e-13)
````

aarch64

TEST 11/22 dsdot:dsdot_n_1 [FAIL]
ERR: test_dsdot.c:47 expected -2.393e-03, got -2.393e-03 (diff 1.035e-10, tol 1.000e-13)


ppc64

TEST 16/21 rot:csrot_inc_0 [FAIL]
ERR: test_rot.c:109 expected 3.125e-01, got -2.148e-01 (diff 5.273e-01, tol 1.000e-04)
TEST 18/21 rot:zdrot_inc_0 [FAIL]
ERR: test_rot.c:71 expected 3.125e-01, got -2.148e-01 (diff 5.273e-01, tol 1.000e-13)

and even more on armv7
````
./openblas_utest
TEST 1/22 amax:samax [OK]
TEST 2/22 drotmg:rotmg_D1eqD2_X1eqX2 [OK]
TEST 3/22 drotmg:rotmg_issue1452 [OK]
TEST 4/22 drotmg:rotmg [OK]
TEST 5/22 axpy:caxpy_inc_0 [FAIL]
  ERR: test_axpy.c:109  expected 2.000e+00, got -3.000e+00 (diff 5.000e+00, tol 1.000e-13)
TEST 6/22 axpy:saxpy_inc_0 [FAIL]
  ERR: test_axpy.c:90  expected 2.000e+00, got 4.000e+00 (diff -2.000e+00, tol 1.000e-13)
TEST 7/22 axpy:zaxpy_inc_0 [FAIL]
  ERR: test_axpy.c:71  expected 2.000e+00, got -3.000e+00 (diff 5.000e+00, tol 1.000e-13)
TEST 8/22 axpy:daxpy_inc_0 [FAIL]
  ERR: test_axpy.c:52  expected 2.000e+00, got 4.000e+00 (diff -2.000e+00, tol 1.000e-13)
TEST 9/22 zdotu:zdotu_offset_1 [OK]
TEST 10/22 zdotu:zdotu_n_1 [OK]
TEST 11/22 dsdot:dsdot_n_1 [FAIL]
  ERR: test_dsdot.c:47  expected -2.393e-03, got -2.393e-03 (diff 1.035e-10, tol 1.000e-13)
TEST 12/22 swap:cswap_inc_0 [OK]
TEST 13/22 swap:sswap_inc_0 [OK]
TEST 14/22 swap:zswap_inc_0 [OK]
TEST 15/22 swap:dswap_inc_0 [OK]
TEST 16/22 rot:csrot_inc_0 [FAIL]
  ERR: test_rot.c:109  expected 1.000e+00, got -2.148e-01 (diff 1.215e+00, tol 1.000e-04)
TEST 17/22 rot:srot_inc_0 [FAIL]
  ERR: test_rot.c:90  expected 1.000e+00, got -2.148e-01 (diff 1.215e+00, tol 1.000e-04)
TEST 18/22 rot:zdrot_inc_0 [FAIL]
  ERR: test_rot.c:71  expected 1.000e+00, got -2.148e-01 (diff 1.215e+00, tol 1.000e-13)
TEST 19/22 rot:drot_inc_0 [FAIL]
  ERR: test_rot.c:51  expected 1.000e+00, got -2.148e-01 (diff 1.215e+00, tol 1.000e-13)
TEST 20/22 potrf:smoketest_trivial [OK]
TEST 21/22 potrf:bug_695 [OK]
TEST 22/22 fork:safety [OK]
RESULTS: 22 tests (13 ok, 9 failed, 0 skipped) ran in 3269 ms
````

This is from Fedora 27 systems.

The text was updated successfully, but these errors were encountered:

martin-frbg · 2018-02-21T13:25:17Z

Thanks. The dsdot error is most likely spurious as it flags all architectures that use the generic C implementation from kernel/generic/dot.c - I have just seen it myself on arm64. A drawback of using canned "known good" results across all architectures instead of comparing to the reference implementation running on the same cpu. The ROT and AXPY are "interesting"...

sharkcz · 2018-02-21T13:43:00Z

Let me know if you need access to the non-x86 machines or more info.

martin-frbg · 2018-02-25T12:50:52Z

Actually it is the C implementation in kernel/arm/dot.c that is used by the arm,mips and zarch kernels that provides an inexact dsdot.

martin-frbg · 2018-02-26T20:44:34Z

I cannot debug the armv7 issues unfortunately, I can only guess that they may have crept in through the vfp changes of #1221 @ashwinyes
Similarly for ppc - I would not know how to "fix" zrot_ppc440.S (unchanged from libGoto) except by replacing it with arm/zrot.c in the KERNEL file(s).

ashwinyes · 2018-02-27T01:46:26Z

@martin-frbg Looks like this is an issue with #1462. I went briefly through the commit diff and there are some serious issues with the commit, if I am not mistaken.

In utest, I assume we need to call both ref BLAS implementation and the OpenBLAS implementation and then compare the result. In the commit, all calls to the BLAS reference implementation have been removed. There is no point in comparing the results.

Also, at some places the input arrays have been modified. So even if the reference BLAS calls are restored, most likely it will generate failures.

Could you please explain what this commit is doing ??

martin-frbg · 2018-02-27T06:57:27Z

The commit restored older tests, of which only one (test_amax) had been similarly converted by xianyi (5a8447e) from the previously used cutest framework to something that would run on travis&appveyor. I have replaced the calls to the reference implementation with an array of results from running the respective netlib functions locally (on Haswell, but this should not cause large differences). Can you please point out where you think input arrays have been modified ? The x2,y2 which were previously both input and output for the reference implementation now hold the reference results. The only flaw I see with my changes is that currently the calculated and expected values are swapped in the error message.

martin-frbg · 2018-02-27T07:51:22Z

Note also that the failed tests are checking a valid but probably highly unusual corner case (zero increments), so I am not happy with just replacing your optimized axpy and rot codes with their generic C implementations that happen to pass the test. Omitting these two tests on arm may be an option if nobody has time to look at them now.

ashwinyes · 2018-02-27T08:54:10Z

@martin-frbg Apologies. Spoke too soon without understanding your commit completely.

I will also look and try to fix the arm and arm64 codes when I have the bandwidth.

martin-frbg · 2018-02-27T09:16:56Z

No problem. I thought I had added the explanation about using canned results instead of live netlib calls in the commit message, but either I forgot or it got lost when I did a compressed merge to get rid of all the intermediate attempts to get prototypes right for all platforms.
The arm64 issue is (in my opinion) only that arm/dot.c should be retired in favor of wernsaar's generic/dot.c as the latter has additional typecasts for providing precise DSDOT. As your ARMV8 dot.S appeared to provide perfect DSDOT in my tests as well, I modified the ARMV8 and CORTEXA57 KERNEL files to use it in #1472. In retrospect I hope this did not hurt your thunderx platforms as their KERNEL files build on ARMV8 (the DSDOT in your dot_thunderx files looks good to me, but I will not consciously stomp on those targets as discussed recently )

ashwinyes · 2018-02-27T10:54:43Z

In fact, all the kernels for ARMv8 (including ThunderX) uses the KERNEL.ARMV8 as the base for DSDOT. Your changes were affecting ThunderX also. Didnt realise it until now :)

But anyways, PR #1475 has the fix for DSDOT utest errors on ARM64.

sharkcz · 2018-02-27T11:32:46Z

I confirm that the "dsdot" failures are now gone on armv7 and s390x

martin-frbg · 2018-02-27T11:48:51Z

I promise to execute more restraint in the future regarding ARMV8/ThunderX... though I do wonder what I was testing when I did not see problems with the ARMV8 dot.S , only with the arm/dot.c that it was originally falling back to for dsdot.

sharkcz · 2018-02-27T13:43:34Z

Now also aarch64 is green.

martin-frbg · 2018-02-27T14:20:34Z

Do your builds include mips architectures as well ? I suspect those might show at least the "dsdot" failure as well.

sharkcz · 2018-02-27T14:29:12Z

Unfortunately not, we do builds on official Fedora arches only - armv7, aarch64, ppc64, ppc64le, s390x and x86_64

martin-frbg · 2018-03-11T18:18:56Z

Naive question: can somebody (@jcowgill, @sva-img ?) suggest an affordable and easy to use testing environment for mips/mips64 please, or should I just use qemu ?

loveshack · 2018-03-14T11:57:28Z

Perhaps https://gcc.gnu.org/wiki/CompileFarm for MIPS? A possible connexion with GCC is gfortran's option to call BLAS for matmul.

martin-frbg · 2018-03-19T07:18:07Z

In Qemu, mips32 fails the dsdot test (for the same reason as the others above). mips64 additionally fails both complex axpy tests, complex swap and all rot tests. cpuid_mips.c needed a patch to even compile.

martin-frbg · 2018-03-25T18:45:58Z

Thanks for the pointer to the gcc compile farm. Seems their mips64 machines are actually $300 routers, so is this all that is left of that platform nowadays ?

martin-frbg · 2018-04-19T09:27:33Z

@sharkcz are you still seeing the failures on ppc ?

sharkcz · 2018-04-20T10:30:47Z

yes, I do

TEST 17/22 rot:csrot_inc_0 [FAIL]
  ERR: test_rot.c:109  expected -2.148e-01, got 3.125e-01 (diff -5.273e-01, tol 1.000e-04)
TEST 18/22 rot:srot_inc_0 [OK]
TEST 19/22 rot:zdrot_inc_0 [FAIL]
  ERR: test_rot.c:71  expected -2.148e-01, got 3.125e-01 (diff -5.273e-01, tol 1.000e-13)

with recent develop branch and using POWER6 kernel. The expected/got values are now swapped though.

martin-frbg · 2018-04-20T11:53:41Z

Thanks. Actually I was not sure what kernel you are using, and had hoped it might be POWER8 which got a zrot rewrite a few weeks ago.

sharkcz · 2018-04-20T12:36:14Z

With POWER8 kernel I see a segfault from the dblat3 test, POWER7 gets same failures as POWER6, build with POWER5 results in assembler errors.
If you would like I can provide more details or give you access to our community VM.

martin-frbg · 2018-04-20T13:35:22Z

segfault with POWER8 is not good obviously, is it in any way apparent which of the tests in dblat3 crashes it ? POWER7 maps to POWER6 internally so no surprise, POWER5 I have no idea...
Not sure if I have any idea for fixing the zrot problem (except switching back to its generic c implementation, but not sure if this is warranted as the utest check is for a corner case only)

quickwritereader · 2018-04-20T13:58:11Z

@martin-frbg anything wrong with z13 or power8le kernels? I have still access to those pcs. I could check at the week end

sharkcz · 2018-04-20T14:06:33Z

power8le is OK from what I can see in our CI

martin-frbg · 2018-04-20T14:17:26Z

So it is "only" the POWER8 target on the POWER6 hardware (or emulator) that is segfaulting ? In that case I guess it can be ignored as "probably not expected to work". @quickwritereader I believe with my trivial fix for dsdot the z13 target should pass all utests.
I guess a "not so elegant" solution for the upcoming 0.3.0 would be to simply disable these tests again on the "known bad" platforms (temporarily). None of them should be recent regressions and all are basically checking the behaviour in corner cases (X and/or Y increments of zero mostly).

sharkcz · 2018-04-20T14:35:00Z

the POWER8 target segfaults on Power8 HW (ppc64 big endian VM), we build POWER6 target (on Power8 HW) as a least common denominator at the distro level

martin-frbg · 2018-04-20T14:47:47Z

Pretty sure the POWER8 code is written with power8le in mind, perhaps this needs to be clarified somewhere. (Wrong-endian assembly would still compile I guess, just do unexpected things later)
But quickwritereader is the resident expert for this...

sharkcz · 2018-04-20T17:43:17Z

yes, that's very plausible. I'm thinking what is best target for a distro still supporting ppc64, I would prefer correctness over speed ...

loveshack · 2018-04-23T09:34:51Z

yes, that's very plausible. I'm thinking what is best target for a distro still supporting ppc64, I would prefer correctness over speed ...

I don't have a requirement for ppc, but I was going to suggest RHEL7/Fedora (though I don't know SuSE); then I noticed who provides developer shell access. Why not RHEL7?

martin-frbg · 2018-04-23T12:40:57Z

I took his question to mean "what TARGET should I set in the OpenBLAS build process to get something that still works on ppc64".) While I assume the zrot bug has been there for ages and only this one test uncovered it, replacing the POWER6 zrot assembly with the generic implementation will probably not hurt performance too much. .(I am much less sure if I could read from the working srot/drot implementation how to handle the incx=zero case properly in assembly)

sharkcz · 2018-04-23T12:44:56Z

yes, Martin's interpretation is what I meant by the question :-)

sharkcz · 2018-04-23T12:49:00Z

And as ppc64 is being sunset in the Linux world, then using generic zrot would definitely be acceptable for us.

martin-frbg · 2018-04-23T14:16:13Z

Could you try with the two additional lines from #1535 in KERNEL.POWER6 please ?

#1535) * Use generic C implementation of zrot on ppc64/POWER6 to work around utest failure from #1469

martin-frbg · 2018-04-23T17:55:31Z

Thanks for testing. That would seem to leave the ARMV7 axpy and rot implementations, axpy_vfp.S and rot_vfp.S where my current thinking is that the check for incx=0,incy=0 at https://github.com/xianyi/OpenBLAS/blob/8a3b6fa108b15331c2af8777d1ea0206f85673b8/kernel/arm/axpy_vfp.S#L444-L448 (and similarly in the rot file) is premature, causing an exit before the first element of y has been updated. I suspect these lines should be commented out, or if an early exit is desired, perhaps be moved to the end of the KERNEL_S1 implementation(s). I do not have ARMV7 hardware for testing.

martin-frbg · 2018-04-24T20:58:03Z

ARMV7 changes verified via QEMU now.

sharkcz · 2018-04-24T21:56:46Z

Awesome, I did a build on real hw and all is OK.

The implementation in `riscv64/dot.c` fails the `test_dsdot` test, and the generic kernel seems to have better precision. Tested on SiFive FU740 (HiFive Unmatched) and QEMU. Also see OpenMathLib#1469.

martin-frbg mentioned this issue Feb 25, 2018

Fix limited DSDOT precision on arm,aarch64 and zarch #1472

Merged

martin-frbg mentioned this issue Mar 4, 2018

sdot yields wrong results for odd sizes > 2^24 and all sizes > 2^29 #1326

Closed

martin-frbg mentioned this issue Mar 17, 2018

Test failures for 32-bit macOS compile #1492

Closed

This was referenced Mar 29, 2018

[WIP] armv7 support numba/numba#1968

Merged

linalg test failures on ARMv7 scipy/scipy#8638

Closed

martin-frbg mentioned this issue Apr 10, 2018

Correct precision of mips dsdot #1515

Merged

martin-frbg mentioned this issue Apr 23, 2018

Use generic zrot.c on ppc64/POWER6 to work around utest failure from … #1535

Merged

martin-frbg added a commit that referenced this issue Apr 23, 2018

Use generic zrot.c on ppc64/POWER6 to work around utest failure from … (

8a3b6fa

#1535) * Use generic C implementation of zrot on ppc64/POWER6 to work around utest failure from #1469

martin-frbg mentioned this issue Apr 24, 2018

Fix handling of zero INCX, INCY in ArmV7 AXPY and ROT #1538

Merged

martin-frbg closed this as completed in #1538 Apr 25, 2018

martin-frbg mentioned this issue May 23, 2018

v.0.3.0 test fail on power9 #1571

Closed

atsampson mentioned this issue May 25, 2018

sswap/dswap utests fail with kernel/x86/swap.S #1575

Closed

martin-frbg mentioned this issue Jul 7, 2018

Test failures on mips64 with OpenBLAS 0.3.1 #1672

Closed

martin-frbg mentioned this issue Mar 14, 2019

Compilation with target POWER8 fails in AIX 7.2 #1997

Closed

mepholic mentioned this issue Oct 13, 2019

OpenBLAS on dSPACE realtime hardware #2102

Open

utest failures on various arches #1469

utest failures on various arches #1469

Comments

sharkcz commented Feb 21, 2018

martin-frbg commented Feb 21, 2018 • edited

sharkcz commented Feb 21, 2018

martin-frbg commented Feb 25, 2018

martin-frbg commented Feb 26, 2018

ashwinyes commented Feb 27, 2018

martin-frbg commented Feb 27, 2018

martin-frbg commented Feb 27, 2018

ashwinyes commented Feb 27, 2018

martin-frbg commented Feb 27, 2018

ashwinyes commented Feb 27, 2018

sharkcz commented Feb 27, 2018

martin-frbg commented Feb 27, 2018

sharkcz commented Feb 27, 2018

martin-frbg commented Feb 27, 2018

sharkcz commented Feb 27, 2018

martin-frbg commented Mar 11, 2018

loveshack commented Mar 14, 2018

martin-frbg commented Mar 19, 2018

martin-frbg commented Mar 25, 2018

martin-frbg commented Apr 19, 2018

sharkcz commented Apr 20, 2018

martin-frbg commented Apr 20, 2018

sharkcz commented Apr 20, 2018

martin-frbg commented Apr 20, 2018

quickwritereader commented Apr 20, 2018 • edited

sharkcz commented Apr 20, 2018

martin-frbg commented Apr 20, 2018

sharkcz commented Apr 20, 2018

martin-frbg commented Apr 20, 2018 • edited

sharkcz commented Apr 20, 2018

loveshack commented Apr 23, 2018 via email

martin-frbg commented Apr 23, 2018

sharkcz commented Apr 23, 2018

sharkcz commented Apr 23, 2018

martin-frbg commented Apr 23, 2018

martin-frbg commented Apr 23, 2018

martin-frbg commented Apr 24, 2018

sharkcz commented Apr 24, 2018

martin-frbg commented Feb 21, 2018 •

edited

quickwritereader commented Apr 20, 2018 •

edited

martin-frbg commented Apr 20, 2018 •

edited