-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Windows: --cpuclock-test fails on HP workstation with 32 cores (2 socket) #479
Comments
@bcran Do you have hyperthreading turned on? |
@sitsofe Yes, I do. The CPU configuration is:
|
Can you include the first few lines of cpuclock-test that talk about whether reliable_tsc is yes or no? |
|
Disabling hyperthreading causes it to pass most times, but it still occasionally fails (twice in twenty runs just now). |
I've also run fio-3.0 on the same hardware under Linux (openSUSE). In that environment, it passes every time. |
I'm starting to wonder if it's got something to do with the number of CPUs being used and the scheduling order created by the OS' scheduler. Across multiple runs do you find it is always the same CPUs (e.g. CPU 16/CPU 17) that the failure is reported on? |
fio on Windows with a large number of CPUs/cores frequently fails while running ./fio --cpuclock-test even though "reliable_tsc: yes" is reported. Using clang's thread sanitizer via CC=clang ./configure --extra-cflags="-fsanitize=thread" and running the same on Linux also generates multiple warnings similar to the following on a VM with 16 cores: WARNING: ThreadSanitizer: data race (pid=23780) Atomic write of size 4 at 0x7ffecb865a3c by thread T15 (mutexes: write M169): #0 __tsan_atomic32_fetch_add /home/clang-3.9/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cc:591 (fio+0x000000471505) axboe#1 atomic32_inc_return /home/fio/gettime.c:567:13 (fio+0x0000004c56c1) axboe#2 clock_thread_fn /home/fio/gettime.c:607 (fio+0x0000004c56c1) Previous read of size 4 at 0x7ffecb865a3c by thread T4 (mutexes: write M147): #0 clock_thread_fn /home/fio/gettime.c:611:19 (fio+0x0000004c56e2) Location is stack of main thread. Mutex M169 (0x7d700000f6a0) created at: #0 pthread_mutex_init /home/clang-3.9/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:1119 (fio+0x00000043b695) axboe#1 fio_monotonic_clocktest /home/fio/gettime.c:694:3 (fio+0x0000004c4c12) axboe#2 parse_cmd_line /home/fio/init.c:2710:15 (fio+0x0000004ce8e5) axboe#3 parse_options /home/fio/init.c:2828:14 (fio+0x0000004cf3da) axboe#4 main /home/fio/fio.c:47:6 (fio+0x00000054b991) Mutex M147 (0x7d700000f178) created at: #0 pthread_mutex_init /home/clang-3.9/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:1119 (fio+0x00000043b695) axboe#1 fio_monotonic_clocktest /home/fio/gettime.c:694:3 (fio+0x0000004c4c12) axboe#2 parse_cmd_line /home/fio/init.c:2710:15 (fio+0x0000004ce8e5) axboe#3 parse_options /home/fio/init.c:2828:14 (fio+0x0000004cf3da) axboe#4 main /home/fio/fio.c:47:6 (fio+0x00000054b991) Thread T15 (tid=23796, running) created by main thread at: #0 pthread_create /home/clang-3.9/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:902 (fio+0x00000042c9a6) axboe#1 fio_monotonic_clocktest /home/fio/gettime.c:697:7 (fio+0x0000004c4c38) axboe#2 parse_cmd_line /home/fio/init.c:2710:15 (fio+0x0000004ce8e5) axboe#3 parse_options /home/fio/init.c:2828:14 (fio+0x0000004cf3da) axboe#4 main /home/fio/fio.c:47:6 (fio+0x00000054b991) Thread T4 (tid=23785, finished) created by main thread at: #0 pthread_create /home/clang-3.9/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:902 (fio+0x00000042c9a6) axboe#1 fio_monotonic_clocktest /home/fio/gettime.c:697:7 (fio+0x0000004c4c38) axboe#2 parse_cmd_line /home/fio/init.c:2710:15 (fio+0x0000004ce8e5) axboe#3 parse_options /home/fio/init.c:2828:14 (fio+0x0000004cf3da) axboe#4 main /home/fio/fio.c:47:6 (fio+0x00000054b991) SUMMARY: ThreadSanitizer: data race /home/fio/gettime.c:567:13 in atomic32_inc_return Avoid accessing t->seq directly and use __sync_val_compare_and_swap to get at it. This shuts up the sanitizer, makes the test work on Windows and hopefully means the appropriate memory fencing will be in place preventing unwanted compiler or CPU reordering. Fixes: axboe#479 Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>
@bcran could you see if https://github.com/sitsofe/fio/tree/cycletest is any good for you? If so I'll open a pull request. |
Note that the branch changes the logic associated with seq wraparound - it could be I've misinterpreted how to cope with that... |
That branch does seem to fix the problem - I saw no failures on over 25 runs. |
Frustratingly the branch doesn't seem to fix the problem, only make it less frequent. Using the following shows the problem is still there (but it can take over 30 iterations before the problem appears): cls; $c=0; $LASTEXITCODE=0; while ($LASTEXITCODE -eq 0) { fio.exe --cpuclock-test; $c+=1 }; write-host $c |
Looks to me like all that's missing is actually a memory syncronization at the end of the while loop. But I think the compare-and-exchange is a better approach, so we should switch to that. Can both of you try my cpuclock-test branch? It's just this single patch on top of master: http://git.kernel.dk/cgit/fio/commit/?h=cpuclock-test&id=a250edd0678ac6aef34bfbd01423a7b87a1d5f8d I've tested it on my laptop, test box, and two development skylake boxes. Don't see any failures, but on the other hand, neither of them had any failures with the existing code... So that may not mean a lot. |
@axboe Kudos to you Jens - your solution is both faster AND more correct than my attempts. I'm above 600 runs and no problems reported. |
Thanks for testing! I'm going to pull this in. |
Work here too. Thanks! |
fio on Windows with a 16 or 32 CPUs frequently fails while running ./fio --cpuclock-test even though "reliable_tsc: yes" is reported. Using clang's thread sanitizer via CC=clang ./configure --extra-cflags="-fsanitize=thread" and running the same on Linux also generates multiple warnings similar to the following on a VM with 16 cores: WARNING: ThreadSanitizer: data race (pid=23780) Atomic write of size 4 at 0x7ffecb865a3c by thread T15 (mutexes: write M169): #0 __tsan_atomic32_fetch_add /home/clang-3.9/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cc:591 (fio+0x000000471505) axboe#1 atomic32_inc_return /home/fio/gettime.c:567:13 (fio+0x0000004c56c1) axboe#2 clock_thread_fn /home/fio/gettime.c:607 (fio+0x0000004c56c1) Previous read of size 4 at 0x7ffecb865a3c by thread T4 (mutexes: write M147): #0 clock_thread_fn /home/fio/gettime.c:611:19 (fio+0x0000004c56e2) Location is stack of main thread. Mutex M169 (0x7d700000f6a0) created at: #0 pthread_mutex_init /home/clang-3.9/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:1119 (fio+0x00000043b695) axboe#1 fio_monotonic_clocktest /home/fio/gettime.c:694:3 (fio+0x0000004c4c12) axboe#2 parse_cmd_line /home/fio/init.c:2710:15 (fio+0x0000004ce8e5) axboe#3 parse_options /home/fio/init.c:2828:14 (fio+0x0000004cf3da) axboe#4 main /home/fio/fio.c:47:6 (fio+0x00000054b991) Mutex M147 (0x7d700000f178) created at: #0 pthread_mutex_init /home/clang-3.9/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:1119 (fio+0x00000043b695) axboe#1 fio_monotonic_clocktest /home/fio/gettime.c:694:3 (fio+0x0000004c4c12) axboe#2 parse_cmd_line /home/fio/init.c:2710:15 (fio+0x0000004ce8e5) axboe#3 parse_options /home/fio/init.c:2828:14 (fio+0x0000004cf3da) axboe#4 main /home/fio/fio.c:47:6 (fio+0x00000054b991) Thread T15 (tid=23796, running) created by main thread at: #0 pthread_create /home/clang-3.9/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:902 (fio+0x00000042c9a6) axboe#1 fio_monotonic_clocktest /home/fio/gettime.c:697:7 (fio+0x0000004c4c38) axboe#2 parse_cmd_line /home/fio/init.c:2710:15 (fio+0x0000004ce8e5) axboe#3 parse_options /home/fio/init.c:2828:14 (fio+0x0000004cf3da) axboe#4 main /home/fio/fio.c:47:6 (fio+0x00000054b991) Thread T4 (tid=23785, finished) created by main thread at: #0 pthread_create /home/clang-3.9/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:902 (fio+0x00000042c9a6) axboe#1 fio_monotonic_clocktest /home/fio/gettime.c:697:7 (fio+0x0000004c4c38) axboe#2 parse_cmd_line /home/fio/init.c:2710:15 (fio+0x0000004ce8e5) axboe#3 parse_options /home/fio/init.c:2828:14 (fio+0x0000004cf3da) axboe#4 main /home/fio/fio.c:47:6 (fio+0x00000054b991) SUMMARY: ThreadSanitizer: data race /home/fio/gettime.c:567:13 in atomic32_inc_return Fix the above by doing the following: - Add a configure check for __sync_val_compare_and_swap and add a helper atomic32_cas_return that uses it. - Add comments noting that the atomic32_* functions act as full barriers. - Don't access t->seq directly when protecting a critical region and instead use the atomic32_* helpers to update/read it. The above fixes the sanitizer warnings and makes the test pass on Windows. Fixes: axboe#479 Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Compiling with CC=clang ./configure --extra-cflags='-fsanitize=thread' make and then running ./fio --cpuclock-test generates warnings like WARNING: ThreadSanitizer: unlock of an unlocked mutex (or by a wrong thread) (pid=324) #0 pthread_mutex_unlock <null> (fio+0x44ce3e) axboe#1 clock_thread_fn gettime.c:604:2 (fio+0x4d16c6) Location is heap block of size 480 at 0x7b5000000000 allocated by main thread: #0 malloc <null> (fio+0x42ea4b) axboe#1 fio_monotonic_clocktest gettime.c:690:13 (fio+0x4d0b1a) axboe#2 parse_cmd_line init.c:2792:15 (fio+0x4dad0b) axboe#3 parse_options init.c:2920:14 (fio+0x4db7b7) axboe#4 main fio.c:47 (fio+0x4247fa) Mutex M142 (0x7b5000000038) created at: #0 pthread_mutex_init <null> (fio+0x42f6ba) axboe#1 fio_monotonic_clocktest gettime.c:706:3 (fio+0x4d0c03) axboe#2 parse_cmd_line init.c:2792:15 (fio+0x4dad0b) axboe#3 parse_options init.c:2920:14 (fio+0x4db7b7) axboe#4 main fio.c:47 (fio+0x4247fa) SUMMARY: ThreadSanitizer: unlock of an unlocked mutex (or by a wrong thread) (fio+0x44ce3e) in __interceptor_pthread_mutex_unlock valgrind --tool=helgrind ./fio --cpuclock-test shows a similar warning: ==6607== Thread axboe#3 unlocked lock at 0x639A730 currently held by thread axboe#1 ==6607== at 0x4C3233B: mutex_unlock_WRK (hg_intercepts.c:1094) ==6607== by 0x4C35CE7: pthread_mutex_unlock (hg_intercepts.c:1115) ==6607== by 0x41B872: clock_thread_fn (gettime.c:604) ==6607== by 0x4C349E1: mythread_wrapper (hg_intercepts.c:389) ==6607== by 0x59A836C: start_thread (in /usr/lib64/libpthread-2.25.so) ==6607== by 0x5ED4B4E: clone (in /usr/lib64/libc-2.25.so) ==6607== Lock at 0x639A730 was first observed ==6607== at 0x4C35CA3: pthread_mutex_init (hg_intercepts.c:787) ==6607== by 0x41D2DA: fio_monotonic_clocktest (gettime.c:706) ==6607== by 0x4232EC: parse_cmd_line (init.c:2792) ==6607== by 0x424372: parse_options (init.c:2920) ==6607== by 0x40E2EA: main (fio.c:47) ==6607== Address 0x639a730 is 176 bytes inside a block of size 480 alloc'd ==6607== at 0x4C2EF7B: malloc (vg_replace_malloc.c:299) ==6607== by 0x41D227: fio_monotonic_clocktest (gettime.c:690) ==6607== by 0x4232EC: parse_cmd_line (init.c:2792) ==6607== by 0x424372: parse_options (init.c:2920) ==6607== by 0x40E2EA: main (fio.c:47) ==6607== Block was alloc'd by thread axboe#1 This first issue ("unlock of an unlocked mutex (or by a wrong thread) t->started") occurs because fio uses a mutexes to arrange for all the cycle measurement threads to start their timing together but http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_mutex_lock.html warns: "If a thread attempts to unlock a mutex that it has not locked or a mutex which is unlocked, undefined behavior results". Address this by reworking fio to use a condition plus a condition variable to signal all threads when its safe to proceed. ThreadSanitizer has a second warning too: ================== WARNING: ThreadSanitizer: data race (pid=324) Read of size 4 at 0x7ffffceafdf4 by thread T2 (mutexes: write M143): #0 clock_thread_fn gettime.c:614:10 (fio+0x4d1743) Previous atomic write of size 4 at 0x7ffffceafdf4 by thread T1 (mutexes: write M141): #0 __tsan_atomic32_compare_exchange_val <null> (fio+0x479237) axboe#1 atomic32_compare_and_swap gettime.c:576:9 (fio+0x4d1785) axboe#2 clock_thread_fn gettime.c:619 (fio+0x4d1785) Location is stack of main thread. Mutex M143 (0x7b5000000088) created at: #0 pthread_mutex_init <null> (fio+0x42f6ba) axboe#1 fio_monotonic_clocktest gettime.c:705:3 (fio+0x4d0bf7) axboe#2 parse_cmd_line init.c:2792:15 (fio+0x4dad0b) axboe#3 parse_options init.c:2920:14 (fio+0x4db7b7) axboe#4 main fio.c:47 (fio+0x4247fa) Mutex M141 (0x7b5000000010) created at: #0 pthread_mutex_init <null> (fio+0x42f6ba) axboe#1 fio_monotonic_clocktest gettime.c:705:3 (fio+0x4d0bf7) axboe#2 parse_cmd_line init.c:2792:15 (fio+0x4dad0b) axboe#3 parse_options init.c:2920:14 (fio+0x4db7b7) axboe#4 main fio.c:47 (fio+0x4247fa) Thread T2 (tid=327, running) created by main thread at: #0 pthread_create <null> (fio+0x42f3c6) axboe#1 fio_monotonic_clocktest gettime.c:708:7 (fio+0x4d0c20) axboe#2 parse_cmd_line init.c:2792:15 (fio+0x4dad0b) axboe#3 parse_options init.c:2920:14 (fio+0x4db7b7) axboe#4 main fio.c:47 (fio+0x4247fa) Thread T1 (tid=326, running) created by main thread at: #0 pthread_create <null> (fio+0x42f3c6) axboe#1 fio_monotonic_clocktest gettime.c:708:7 (fio+0x4d0c20) axboe#2 parse_cmd_line init.c:2792:15 (fio+0x4dad0b) axboe#3 parse_options init.c:2920:14 (fio+0x4db7b7) axboe#4 main fio.c:47 (fio+0x4247fa) SUMMARY: ThreadSanitizer: data race gettime.c:614:10 in clock_thread_fn The second issue ("t->seq data race") seems to be because mixing atomic and non-atomic operations on the same address might not be safe (e.g. the compiler may be allowed to make dangerous optimisations). Fix this waring by using a __sync_fetch_and_add() to do the read and remove the no longer needed __sync_synchronize(). Signed-off-by: Sitsofe Wheeler <sitsofe@yahoo.com>
On my HP Z820 workstation, FIO 3.0 running on Windows fails the cpuclock-test, reporting:
The text was updated successfully, but these errors were encountered: