Use acq/rel semantics to pass flags/pointers in getrf_parallel.#2469
Use acq/rel semantics to pass flags/pointers in getrf_parallel.#2469martin-frbg merged 1 commit intoOpenMathLib:developfrom
Conversation
|
I must admit I am a bit confused about the use of barriers vs. locks on ARMV8 - could you comment on #2363 where a memory barrier was seen to be insufficient due to the weak memory on that platform ? |
|
@martin-frbg Is there an official list of the compilers/versions that are supported? Given that manylinux1 is there, while I think it's clearer, this pre-dates support of the c11 atomics. While I think the acq/rel functions make it clearer, if that is requirement I'll probably re-work this to place the existing barrier types in the code in the correct places. |
|
No concise list as such, just a few warnings in the readme for specific platforms. Unfortunately requiring specific minimum versions causes problems for non-priviledged users on sites that use (possibly older) "enterprise" distributions. So if it would be possible to enclose this in appropriate ifdefs (for STDC_VERSION >= 201112L ) that would be preferable. |
62b6787 to
a344c0e
Compare
|
Looks like it's hitting a clang bug now. (I can make it compile by adding an explicit cast but need to convince myself that this is actually still the same thing) |
|
I don't believe declaring these types as _Atomic is needed. _Atomic means that any operation on them is implicitly made atomic by the compiler (E.g. foo++ means atomically read the current value of foo and increment it by one). If this works with just volatile in old compilers, we're not asking for those kinds of guarantees and the _atomic* builtins don't require this either, "The ‘__atomic’ builtins can be used with any integral scalar or pointer type that is 1, 2, 4, or 8 bytes in length. " Removing the ifs declaring the types with _Atomic makes clang happy. |
|
Alright thanks, so let's nuke the atomics then. BTW something in your force-push seems to have added a lot of unrelated changes that got committed to develop in the meantime. I assume git will manage to sort this out during a merge, but anybody trying to cherry-pick by downloading the current patch file would get a surprise. Have not seen this before, would you happen to know how to rectify it ? |
|
Yea, this was my doing as I rebased on top of the develop branch. Let me see if I can undo it. Seems like the #ifdef .. _Atomic should go universally? |
|
Looks like the arm32 gcc build hung the utest fork test? |
The current implementation has locks, but the locks each only have a critical section of one variable so atomic reads/writes with barriers can be used to achieve the same behavior. Like the previous patch, pthread_mutex_lock isn't fair, so in a tight loop the previous thread that has the lock can keep it starving another thread, even if that thread is about to write the data that will stop the current thread from spinning. On a 64c Arm system this improves performance by 20x on sgesv.goto.
|
Looks like it's ready. Thanks a lot for your work on this. |
|
Thanks @martin-frbg. Please feel free to ping me if you see other similar issues, i'm happy to help. |
The current implementation has locks, but the locks each only
have a critical section of one variable so atomic reads/writes
with barriers can be used to achieve the same behavior.
Like the previous patch, pthread_mutex_lock isn't fair, so in a
tight loop the previous thread that has the lock can keep it
starving another thread, even if that thread is about to write
the data that will stop the current thread from spinning.
On a 64c Arm system this reduces variability by 20x on sgesv.goto.