Random failure in doctesting matrix_mod2_dense.pyx #475

Closed
strogdon opened this issue Jun 20, 2017 · 32 comments

Comments

@strogdon
Contributor

This is in Prefix where I have Sage built with debugging enabled, CFLAGS="-march=native -O0 -pipe -g -ggdb", ... , etc.

sage -t --long usr/lib64/python2.7/site-packages/sage/matrix/matrix_mod2_dense.pyx
**********************************************************************
File "usr/lib64/python2.7/site-packages/sage/matrix/matrix_mod2_dense.pyx", line 843, in sage.matrix.matrix_mod2_dense.Matrix_mod2_dense._multiply_strassen
Failed example:
    A._multiply_strassen(B, 256) == A._multiply_m4rm(B, 0)
Expected:
    True
Got:
    False
**********************************************************************
1 item had failures:
   1 of  18 in sage.matrix.matrix_mod2_dense.Matrix_mod2_dense._multiply_strassen
    [333 tests, 1 failure, 17.80 s]
----------------------------------------------------------------------
sage -t --long usr/lib64/python2.7/site-packages/sage/matrix/matrix_mod2_dense.pyx  # 1 doctest failed

I have seen the failure for some time. It always occurs when doctesting Sage. However, lately I see the failure when running the test individually - though randomly. Initially I thought the failure was due to my CFLAGS for debugging, but I'm not sure anymore.

@strogdon
Contributor Author

Also, the doctest runs very inefficiently in Prefix when it passes.

Prefix:

sage -t --long usr/lib64/python2.7/site-packages/sage/matrix/matrix_mod2_dense.pyx
    [333 tests, 23.84 s]
----------------------------------------------------------------------
All tests passed!
----------------------------------------------------------------------
Total time for all tests: 24.0 seconds
    cpu time: 137.2 seconds
    cumulative wall time: 23.8 seconds

real    0m26.633s
user    2m19.493s
sys     0m0.360s

Gentoo:

sage -t --long /usr/lib/python2.7/site-packages/sage/matrix/matrix_mod2_dense.pyx
    [333 tests, 12.07 s]
----------------------------------------------------------------------
All tests passed!
----------------------------------------------------------------------
Total time for all tests: 12.3 seconds
    cpu time: 12.7 seconds
    cumulative wall time: 12.1 seconds

real    0m15.739s
user    0m15.693s
sys     0m0.620s

The Gentoo machine is very old, the host machine (w/ Debian) of the Prefix is fairly new.
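The two timing reports can be compared by dividing CPU time by wall time, which gives the average number of cores kept busy. A minimal sketch in plain Python using the figures quoted above; `cpu_to_wall_ratio` is a hypothetical helper name, not something from Sage.

```python
def cpu_to_wall_ratio(cpu_seconds, wall_seconds):
    """Average number of busy cores: CPU time divided by wall-clock time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return cpu_seconds / wall_seconds


# Figures from the two doctest runs quoted above.
prefix_ratio = cpu_to_wall_ratio(137.2, 23.8)
gentoo_ratio = cpu_to_wall_ratio(12.7, 12.1)
print(f"Prefix: {prefix_ratio:.2f}  Gentoo: {gentoo_ratio:.2f}")
```

A ratio well above 1 means the run used several cores at once; a ratio near 1 means it was effectively serial.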

@kiwifb
Collaborator

kiwifb commented Jun 20, 2017

Hum... can you post ldd -r usr/lib64/python2.7/site-packages/sage/matrix/matrix_mod2_dense.so please.

@strogdon strogdon changed the title Randon failure in doctesting matrix_mod2_dense.pyx Random failure in doctesting matrix_mod2_dense.pyx Jun 21, 2017
@strogdon
Contributor Author

I thought that might be of interest.

ldd -r ~/usr/lib/python2.7/site-packages/sage/matrix/matrix_mod2_dense.so 
        linux-vdso.so.1 (0x00007ffe7cfb1000)
        libgmp.so.10 => /storage/strogdon/gentoo-rap/usr/lib64/libgmp.so.10 (0x00007f7f56e4b000)
        libm4ri-0.0.20140914.so => /storage/strogdon/gentoo-rap/usr/lib64/libm4ri-0.0.20140914.so (0x00007f7f56c0b000)
        libgd.so.3 => /storage/strogdon/gentoo-rap/usr/lib64/libgd.so.3 (0x00007f7f56998000)
        libpython2.7.so.1.0 => /storage/strogdon/gentoo-rap/usr/lib64/libpython2.7.so.1.0 (0x00007f7f56552000)
        libc.so.6 => /storage/strogdon/gentoo-rap/lib64/libc.so.6 (0x00007f7f561bc000)
        libpng16.so.16 => /storage/strogdon/gentoo-rap/usr/lib64/libpng16.so.16 (0x00007f7f55f7a000)
        libgomp.so.1 => /storage/strogdon/gentoo-rap/usr/lib/gcc/x86_64-pc-linux-gnu/5.4.0/libgomp.so.1 (0x00007f7f55d59000)
        libpthread.so.0 => /storage/strogdon/gentoo-rap/lib64/libpthread.so.0 (0x00007f7f55b3c000)
        libm.so.6 => /storage/strogdon/gentoo-rap/lib64/libm.so.6 (0x00007f7f55838000)
        libz.so.1 => /storage/strogdon/gentoo-rap/usr/lib64/libz.so.1 (0x00007f7f5561b000)
        libfontconfig.so.1 => /storage/strogdon/gentoo-rap/usr/lib64/libfontconfig.so.1 (0x00007f7f553cb000)
        libfreetype.so.6 => /storage/strogdon/gentoo-rap/usr/lib64/libfreetype.so.6 (0x00007f7f550d8000)
        libjpeg.so.62 => /storage/strogdon/gentoo-rap/usr/lib64/libjpeg.so.62 (0x00007f7f54e43000)
        libdl.so.2 => /storage/strogdon/gentoo-rap/lib64/libdl.so.2 (0x00007f7f54c3f000)
        libutil.so.1 => /storage/strogdon/gentoo-rap/lib64/libutil.so.1 (0x00007f7f54a3c000)
        /storage/strogdon/gentoo-rap/lib64/ld-linux-x86-64.so.2 (0x00007f7f5733c000)
        libexpat.so.1 => /storage/strogdon/gentoo-rap/usr/lib64/libexpat.so.1 (0x00007f7f54808000)
        libbz2.so.1 => /storage/strogdon/gentoo-rap/usr/lib64/libbz2.so.1 (0x00007f7f545f8000)
        libgcc_s.so.1 => /storage/strogdon/gentoo-rap/usr/lib/gcc/x86_64-pc-linux-gnu/5.4.0/libgcc_s.so.1 (0x00007f7f543e2000)

@strogdon
Contributor Author

strogdon commented Jun 21, 2017

I can replicate things from the Sage prompt.

sage: A = random_matrix(GF(2),2701,3000)
sage: B = random_matrix(GF(2),3000,3172)
sage: A._multiply_strassen(B, 256) == A._multiply_m4rm(B, 0)
True
sage: A._multiply_strassen(B, 256) == A._multiply_m4rm(B, 0)
True
sage: A._multiply_strassen(B, 256) == A._multiply_m4rm(B, 0)
False
sage: A._multiply_strassen(B, 256) == A._multiply_m4rm(B, 0)
True
sage: A._multiply_strassen(B, 256) == A._multiply_m4rm(B, 0)
True
sage: A._multiply_strassen(B, 256) == A._multiply_m4rm(B, 0)
True
sage: A._multiply_strassen(B, 256) == A._multiply_m4rm(B, 0)
True
sage: A._multiply_strassen(B, 256) == A._multiply_m4rm(B, 0)
True
sage: A._multiply_strassen(B, 256) == A._multiply_m4rm(B, 0)
True
sage: A._multiply_strassen(B, 256) == A._multiply_m4rm(B, 0)
False
sage: A._multiply_strassen(B, 256) == A._multiply_m4rm(B, 0)
True
sage: A._multiply_strassen(B, 256) == A._multiply_m4rm(B, 0)
True
sage: A._multiply_strassen(B, 256) == A._multiply_m4rm(B, 0)
False
sage: A._multiply_strassen(B, 256) == A._multiply_m4rm(B, 0)
True
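
To put a number on how often the comparison fails, the repeated prompt entries above could be wrapped in a loop. This is only a sketch in plain Python: `count_mismatches` is a hypothetical helper, not part of Sage, and it takes the two products as callables so it can be tried without Sage.

```python
def count_mismatches(compute_a, compute_b, trials=20):
    """Call both functions `trials` times and count how often the results differ."""
    mismatches = 0
    for _ in range(trials):
        # Mirror the doctest: the two results are expected to compare equal.
        if not (compute_a() == compute_b()):
            mismatches += 1
    return mismatches
```

At the Sage prompt it would be called as `count_mismatches(lambda: A._multiply_strassen(B, 256), lambda: A._multiply_m4rm(B, 0))`, with `A` and `B` defined as above.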

@strogdon
Contributor Author

I'm wondering if there is a subtle race condition controlling this. On my Gentoo box, where I have 4 threads, the Sage code

sage: A = random_matrix(GF(2),2701,3000)
sage: B = random_matrix(GF(2),3000,3172)
sage: A._multiply_strassen(B, 256) == A._multiply_m4rm(B, 0)

is always run in serial mode. The same is true for vanilla Sage on the same box. But the computer where Prefix is installed has 12 threads and the above code is run in parallel. I'm not sure why, but when the code is run htop shows all threads to be engaged.

@kiwifb
Collaborator

kiwifb commented Jun 21, 2017

You probably have openmp enabled for m4ri. If you recompile m4ri without it on the prefix (there shouldn't be a need to rebuild sage, I think), does it improve things? Also, do you have openmp enabled for m4ri in pure gentoo? The clue is whether ldd -r shows libgomp in the output.
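
The libgomp check can be done by eye, but a small script makes it repeatable. A minimal sketch, assuming the `ldd` output has already been captured as text; `links_libgomp` is a hypothetical helper and the sample lines are copied from the `ldd -r` listing posted earlier in this thread.

```python
def links_libgomp(ldd_output):
    """Return True if any line of `ldd` output names the GNU OpenMP runtime."""
    for line in ldd_output.splitlines():
        # Each dependency line starts with the library's soname.
        fields = line.split()
        if fields and fields[0].startswith("libgomp.so"):
            return True
    return False


sample = """\
        linux-vdso.so.1 (0x00007ffe7cfb1000)
        libm4ri-0.0.20140914.so => /storage/strogdon/gentoo-rap/usr/lib64/libm4ri-0.0.20140914.so (0x00007f7f56c0b000)
        libgomp.so.1 => /storage/strogdon/gentoo-rap/usr/lib/gcc/x86_64-pc-linux-gnu/5.4.0/libgomp.so.1 (0x00007f7f55d59000)
"""
print(links_libgomp(sample))
```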

@strogdon
Contributor Author

OK, I have openmp enabled everywhere possible on both gentoo and prefix. However, rebuilding just m4ri with -openmp seemed to fix things in prefix - no more parallel execution. There must be a bug in m4ri - the same gcc-5.4 is used in gentoo and prefix.

@kiwifb
Collaborator

kiwifb commented Jun 22, 2017

Hum... Do you have OMP_NUM_THREADS (or any other OpenMP variables) in your environment by default on the prefix?

@strogdon
Contributor Author

Nope. No OPENMP anything defined anywhere. I noticed that m4ri does have a testsuite. Maybe that would be helpful to include in the ebuild? I'm wondering if CFLAGS/LDFLAGS like -march=native -O0 -pipe -g -ggdb -fopenmp -Wl,-O1 -Wl,--as-needed cause odd things?
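
For reference, the variable defined by the OpenMP specification is `OMP_NUM_THREADS`. Setting it to 1 might be another way to check whether the failures depend on parallel execution without rebuilding m4ri. A minimal sketch, assuming the OpenMP runtime in use honors the standard variable; `serial_env` and `openmp_settings` are hypothetical helper names.

```python
import os


def serial_env(base=None):
    """Copy an environment mapping and force OpenMP to use a single thread."""
    env = dict(os.environ if base is None else base)
    env["OMP_NUM_THREADS"] = "1"
    return env


def openmp_settings(env=None):
    """List the OMP_* variables set in an environment mapping."""
    env = os.environ if env is None else env
    return {key: value for key, value in env.items() if key.startswith("OMP_")}
```

The result of `serial_env()` can be passed as the `env` argument of `subprocess.run` when launching the doctest, and `openmp_settings()` shows whether anything OpenMP-related is already set.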

@kiwifb
Collaborator

kiwifb commented Jun 22, 2017

Nope, it shouldn't. Just because there is no src_test in the ebuild doesn't mean you cannot run tests. In this case the default just works. I just ran ebuild m4ri-20140914-r1.ebuild test and the testsuite ran and passed with flying colors.

@strogdon
Contributor Author

test_trsm, test_invert and test_multiplication failed here with openmp enabled. All passed without openmp.

@kiwifb
Collaborator

kiwifb commented Jun 22, 2017

OK everything passes with openmp here on pure Gentoo. There may be a problem with openmp on your prefix. Although it could be a bug in m4ri that only appears in the right circumstances.

@strogdon
Contributor Author

I ran the tests again and this time only test_invert and test_multiplication failed.

@kiwifb
Collaborator

kiwifb commented Jun 22, 2017

Randomness. I may have to check the openmp code. It shouldn't happen.

@strogdon
Contributor Author

The code may have buggy use of L3 cache in tuning. My processor

$ lscpu

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                12
On-line CPU(s) list:   0-11
Thread(s) per core:    2
Core(s) per socket:    6
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 45
Model name:            Intel(R) Xeon(R) CPU E5-1650 0 @ 3.20GHz
Stepping:              7
CPU MHz:               3201.000
CPU max MHz:           3201.0000
CPU min MHz:           1200.0000
BogoMIPS:              6384.46
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              12288K
NUMA node0 CPU(s):     0-11
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid

If I disable L3 cache by adding

--with-cachesize=32000:256000:0

to econf in the ebuild (I believe that's what's happening) then all tests pass with openmp.

@strogdon
Contributor Author

I may have to multiply the L1 and L2 cache sizes by 1024?

@strogdon
Contributor Author

strogdon commented Jun 24, 2017

Well, actually doing the above (setting L3=0) disables parallel testing. Not good, so it's not implemented correctly.

@kiwifb
Collaborator

kiwifb commented Jun 24, 2017

Multiplication by 1024 is necessary, I believe (it says size in bytes). L3=0 disables parallel testing? You mean make -j$N?
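
The conversion can be sketched as follows. The `--with-cachesize=L1:L2:L3` form is the one used earlier in this thread; `cachesize_option` is a hypothetical helper, and reading the `K` suffix in the lscpu output as KiB is an assumption.

```python
def cachesize_option(l1_kib, l2_kib, l3_kib):
    """Format cache sizes given in KiB as a --with-cachesize=L1:L2:L3 argument in bytes."""
    sizes = [kib * 1024 for kib in (l1_kib, l2_kib, l3_kib)]
    return "--with-cachesize=" + ":".join(str(size) for size in sizes)


# lscpu figures from this thread: L1d 32K, L2 256K, L3 12288K.
print(cachesize_option(32, 256, 12288))
# Same L1 and L2, with L3 set to 0.
print(cachesize_option(32, 256, 0))
```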

@kiwifb
Collaborator

kiwifb commented Jun 24, 2017

There is one thing to try: replace ax_gcc_x86_cpuid.m4 in the m4 folder with the latest version from https://www.gnu.org/software/autoconf-archive/ax_gcc_x86_cpuid.html#ax_gcc_x86_cpuid
This is one of the macros needed to figure out the cache sizes, and the only one that has had a significant update in the last few years compared to the dates of the other macros in the m4 folder.

@strogdon
Contributor Author

Let me address the parallel testing; I'll report on updating the macro in another post. I misinterpreted my results. With L3=0 and openmp enabled the testsuite is run in parallel. All 12 threads are used and all tests pass. If I use an m4ri built with L3=0 and openmp enabled to run the Sage code in #475 (comment) then I get no failure whatsoever. If I observe which threads are used in running the Sage code, it would appear it is being run serially and not in parallel. However, on closer examination 12 threads are being used, though very briefly. This briefness is why I thought setting L3=0 had disabled parallel testing, since when L3 != 0 running the Sage code results in 12 threads being engaged for a considerable, unrealistic time. So, in short, I do think there is an L3 cache issue.

@strogdon
Contributor Author

Updating the macro file did not improve things. So far, the only thing that works here is setting L3=0. I was motivated to try L3=0 since my gentoo box, where everything is fine with openmp enabled, has processors without L3 cache.

@strogdon
Contributor Author

It appears that if L3=0 then for building purposes the L3 CACHE value is set equal to the L2 CACHE value. Setting L3=L2 works here.
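
Written out, the fallback described here would look something like the sketch below. It only restates the observation in this comment and has not been checked against m4ri's configure script; `effective_l3` is a hypothetical name.

```python
def effective_l3(l2_bytes, l3_bytes):
    """Fallback as described in the comment: an L3 size of 0 is treated as equal to L2."""
    return l3_bytes if l3_bytes else l2_bytes


# With L2 = 256 KiB, an L3 of 0 ends up as 262144; a nonzero L3 is kept.
print(effective_l3(262144, 0))
print(effective_l3(262144, 12582912))
```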

@kiwifb
Collaborator

kiwifb commented Jun 24, 2017

OK, I will open an issue for upstream m4ri. But I have a feeling the macro for detecting the size of the cache may need an update which is in another upstream.

@kiwifb
Collaborator

kiwifb commented Jun 24, 2017

@strogdon if you don't specify the L3 cache, what value does the autodetection give you?

@strogdon
Contributor Author

After ebuild m4ri-20140914-r1.ebuild configure

$ grep L*_CACHE Makefile

M4RI_CPU_L1_CACHE = 32768
M4RI_CPU_L2_CACHE = 262144
M4RI_CPU_L3_CACHE = 12582912

which are correct, but perhaps the issue is with how the CACHE values are used?
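
The "which are correct" check can be scripted by pulling the values out of the generated Makefile and comparing them with the lscpu figures. A minimal sketch: the variable names come from the grep output above, `read_cache_sizes` is a hypothetical helper, and lscpu's `K` is again read as KiB.

```python
import re


def read_cache_sizes(makefile_text):
    """Extract M4RI_CPU_L<n>_CACHE assignments as a {level: bytes} dict."""
    pattern = re.compile(r"^M4RI_CPU_L(\d)_CACHE\s*=\s*(\d+)\s*$", re.MULTILINE)
    return {int(level): int(size) for level, size in pattern.findall(makefile_text)}


sample = """\
M4RI_CPU_L1_CACHE = 32768
M4RI_CPU_L2_CACHE = 262144
M4RI_CPU_L3_CACHE = 12582912
"""
sizes = read_cache_sizes(sample)
# lscpu reported 32K, 256K and 12288K.
expected = {1: 32 * 1024, 2: 256 * 1024, 3: 12288 * 1024}
print(sizes == expected)
```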

@strogdon
Contributor Author

strogdon commented Jul 5, 2017

Well, this may be a gentoo prefix issue! After unpacking the m4ri tarball and using the Debian toolchain with

./configure --enable-openmp --enable-sse2 --disable-static
make
make check-TESTS

all tests pass. The Debian gcc is gcc version 4.7.2 (Debian 4.7.2-5) and it appears the tests are run using all threads.

@kiwifb
Collaborator

kiwifb commented Jul 5, 2017

Hum... could be issues with gcc or glibc. Still very curious.

@strogdon
Contributor Author

strogdon commented Jul 5, 2017

In prefix changing CFLAGS from CFLAGS="-march=native -O0 -pipe -g -ggdb" to CFLAGS="-march=native -O2 -pipe" allowed all tests to pass. Could optimization be doing this?

@kiwifb
Collaborator

kiwifb commented Jul 5, 2017

This is most unusual. Does it work with -O1 or -O3?

@strogdon
Contributor Author

strogdon commented Jul 5, 2017

O1 and O3 work but O0 doesn't.

@kiwifb
Collaborator

kiwifb commented Jul 5, 2017

I can deal with that. The need for at least some optimisation is rare but not unheard of. It still makes me feel something is wrong in either the compiler or the program. But this can be filtered.
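
As an illustration of what filtering could mean here, the sketch below swaps `-O0` for another level in a CFLAGS string. The thread does not show what the eventual commit does, so this is only a Python illustration of the idea; `replace_o0` is a hypothetical name, and `-O1` is chosen as the fallback because it was reported above to work.

```python
def replace_o0(cflags, fallback="-O1"):
    """Return CFLAGS with any -O0 replaced by `fallback`; other flags keep their order."""
    return " ".join(fallback if flag == "-O0" else flag for flag in cflags.split())


# The debugging CFLAGS from the original report.
print(replace_o0("-march=native -O0 -pipe -g -ggdb"))
```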

@kiwifb
Collaborator

kiwifb commented Jul 5, 2017

Reproduced at -O0 on my gentoo box.

@kiwifb kiwifb closed this as completed in 8955ee5 Jul 6, 2017
kiwifb added a commit that referenced this issue Jul 6, 2017
Package-Manager: portage-2.3.6