MartinNowak commented Feb 3, 2015
- less overhead than pthread_mutex
- uses test and test-and-set algorithm with configurable backoff
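For readers unfamiliar with the technique, here is a minimal sketch of a test and test-and-set spin lock with a simple yield-based backoff. It is not the code from this pull request: the names `SpinLock`, `yieldThresh`, `gLock`, `gCounter` and `contendedCount` are hypothetical, the threshold of 16 is an assumed value, and the backoff policy (yield after a fixed number of failed probes) is only one possible choice.

```d
import core.atomic : atomicLoad, atomicStore, cas, MemoryOrder;
import core.thread : Thread;

// Hypothetical sketch; names and thresholds are assumptions, not the PR's code.
struct SpinLock
{
    private shared size_t val; // 0 = free, 1 = held
    enum yieldThresh = 16;     // assumed backoff threshold

    void lock() shared
    {
        immutable size_t unlocked = 0;
        size_t spins;
        while (true)
        {
            // "Test": read-only spin, so waiters do not fight over the cache line.
            while (atomicLoad!(MemoryOrder.raw)(val) != unlocked)
            {
                // Backoff: after a few failed probes, give up the time slice.
                if (++spins >= yieldThresh)
                    Thread.yield();
            }
            // "Test-and-set": one atomic attempt once the lock looks free.
            if (cas(&val, unlocked, size_t(1)))
                return;
        }
    }

    void unlock() shared
    {
        // Release store publishes the writes made inside the critical section.
        atomicStore!(MemoryOrder.rel)(val, size_t(0));
    }
}

shared SpinLock gLock;     // hypothetical global lock for the demo
__gshared size_t gCounter; // protected by gLock

// Starts nThreads threads that each increment gCounter perThread times.
size_t contendedCount(size_t nThreads, size_t perThread)
{
    gCounter = 0;
    Thread[] threads;
    foreach (_; 0 .. nThreads)
    {
        auto t = new Thread({
            foreach (i; 0 .. perThread)
            {
                gLock.lock();
                ++gCounter;
                gLock.unlock();
            }
        });
        t.start();
        threads ~= t;
    }
    foreach (t; threads)
        t.join();
    return gCounter;
}
```

If the lock provides mutual exclusion, calling `contendedCount(4, 10_000)` from `main` returns 40_000. The inner read-only loop is what distinguishes this from a plain test-and-set lock: waiters only attempt the expensive atomic exchange once the lock appears free.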
I was tempted to do something similar when seeing the API profiling (#1147). It seems a lot of people tell you not to try to roll your own locking methods, though.
static if (X86)
{
    enum pauseThresh = 16;
    void pause() { asm @trusted nothrow { rep; nop; } }
Most prefer _mm_pause() here, i.e. pause as the instruction.
That's the same and DMD doesn't know pause.
http://stackoverflow.com/questions/7086220/what-does-rep-nop-mean-in-x86-assembly
I (mis?)read somewhere that pause would require less power than rep nop, but if they have the same machine code, that's obviously nonsense.
I didn't know either. I first tried to use pause (Issue 14120), then found a REP_NOP macro somewhere, which the disassembler showed me as pause.
It's a clever encoding, in that it preserves the semantics on older hardware.
Same as the HLE prefixes.
http://www.felixcloutier.com/x86/XACQUIRE:XRELEASE.html
rep; nop is supported by older x86 chips, which is a bonus.
Because it has a lot of gotchas, the deadlock of the testers being one of them.
I thought we wouldn't need a recursive lock, but maybe we do.
It's the newly added runFinalizer tests in rt.lifetime. They try to produce a FinalizeError without an InvalidMemoryError. That revealed that I need to set gcx.running during runFinalizers. The irony is that even throwing a preallocated exception allocates a TraceInfo class, which then fails and triggers an InvalidMemoryError. So it's not possible to see a FinalizeError unless you override the traceHandler.
Still fighting with the FinalizeError issues. I need to explicitly catch Errors and rethrow them whenever finalizers are run, so that I can perform at least some basic cleanup. That still might leave the GC in an invalid state.
How about this instead: we just kill the process on FinalizeError and print the stack trace?
We hit a similar problem with Threads and nothrow functions before.
Ouch. How did we miss that so far? So much for the @nogc attributes of onOutOfMemoryError and onInvalidMemoryOperationError. Could the default trace handler use C-malloced memory instead?
Exactly, it's a mess.
Yes, but only because finalization is done after thread_resumeAll.
Sounds reasonable. It might need a lot of changes to the stack tracing though, as it relies on the GC quite a bit.
We should probably delay the whole story until 2.068, because it seems pretty risky. It's only a tiny gain, and I'm still working on thread-local caches, which heavily reduce lock contention and make the gain even smaller.
Yeah, delaying this is probably better. Wouldn't most problems disappear if we worked towards moving finalization out of the GC lock and allowing allocation in destructors?
That's definitely something I want to do. Not sure if we can move it out of the lock though.
Yes, but it required quite some tricks to allow concurrent access to the freebits metadata.
Reopened #1447.