This repository has been archived by the owner on Oct 12, 2022. It is now read-only.

spinlock for GC #1153

Closed
MartinNowak
Member

  • less overhead than pthread_mutex
  • uses test and test-and-set algorithm with configurable backoff
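
A minimal sketch of a test and test-and-set lock with backoff, to illustrate the two points above. This is not the code from this PR: the names `TTASLock`, `maxBackoff`, and `hammer` are hypothetical, and the backoff here only calls `Thread.yield` (the PR's `pause` instruction is left out for portability).

```d
import core.atomic : atomicLoad, atomicStore, cas, MemoryOrder;
import core.thread : Thread;

/// Hypothetical test and test-and-set spinlock (illustration only).
struct TTASLock
{
    /// Assumed tuning knob: upper bound on yields per backoff round.
    enum size_t maxBackoff = 64;

    private shared bool locked;

    void lock() shared
    {
        size_t delay = 1;
        while (true)
        {
            // "Test": a plain read, so waiters spin on their cached copy
            // instead of hammering the cache line with atomic writes.
            if (!atomicLoad!(MemoryOrder.raw)(locked))
            {
                // "Test-and-set": attempt the atomic write only when the
                // lock looked free.
                if (cas(&locked, false, true))
                    return;
            }
            // Back off: yield `delay` times, doubling up to maxBackoff.
            foreach (_; 0 .. delay)
                Thread.yield();
            if (delay < maxBackoff)
                delay *= 2;
        }
    }

    void unlock() shared
    {
        // Release store publishes the writes made inside the critical section.
        atomicStore!(MemoryOrder.rel)(locked, false);
    }
}

/// Demo helper: several threads increment one counter under the lock.
int hammer(int threads, int iterations)
{
    shared TTASLock spin;
    int counter; // only touched while `spin` is held
    Thread[] workers;
    foreach (i; 0 .. threads)
    {
        workers ~= new Thread({
            foreach (_; 0 .. iterations)
            {
                spin.lock();
                ++counter;
                spin.unlock();
            }
        });
        workers[$ - 1].start();
    }
    foreach (w; workers)
        w.join();
    return counter;
}
```

Note that this lock is not recursive: a thread that calls `lock` twice without unlocking spins forever.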
@MartinNowak
Member Author

master

R tree1            0.843 s,    22 MB,   68 GC  416 ms, Pauses  298 ms <    7 ms
R conmsg           0.866 s,     5 MB,  191 GC   47 ms, Pauses    7 ms <    0 ms
R huge_single      0.028 s,  1501 MB,    3 GC    1 ms, Pauses    0 ms <    0 ms
R tree2            1.223 s,     1 MB,  216 GC   90 ms, Pauses    3 ms <    0 ms
R concpu           0.112 s,     5 MB,   13 GC    6 ms, Pauses    6 ms <    4 ms
R testgc3          2.243 s,   210 MB,   15 GC  564 ms, Pauses  419 ms <   99 ms
R conalloc         0.113 s,     5 MB,   14 GC    2 ms, Pauses    2 ms <    0 ms
R conappend        0.031 s,     5 MB,    4 GC    0 ms, Pauses    0 ms <    0 ms
R words            1.249 s,   341 MB,    9 GC   31 ms, Pauses   31 ms <   16 ms
R rand_large       0.649 s,    92 MB, 3820 GC  270 ms, Pauses  112 ms <    0 ms
R dlist            2.109 s,    22 MB,   53 GC  303 ms, Pauses  186 ms <    9 ms
R rand_small       0.741 s,    12 MB, 2032 GC  394 ms, Pauses  209 ms <    0 ms
R slist            2.043 s,    22 MB,   53 GC  259 ms, Pauses  144 ms <    4 ms

spinLock

R tree1            0.786 s,    22 MB,   68 GC  407 ms, Pauses  296 ms <    7 ms
R conmsg           0.749 s,    12 MB,   33 GC   82 ms, Pauses   32 ms <    1 ms
R huge_single      0.028 s,  1501 MB,    3 GC    1 ms, Pauses    0 ms <    0 ms
R tree2            1.144 s,     1 MB,  216 GC   82 ms, Pauses    3 ms <    0 ms
R concpu           0.112 s,     5 MB,   14 GC    4 ms, Pauses    4 ms <    1 ms
R testgc3          2.219 s,   210 MB,   15 GC  572 ms, Pauses  422 ms <   99 ms
R conalloc         0.111 s,     5 MB,   12 GC    2 ms, Pauses    1 ms <    1 ms
R conappend        0.034 s,     5 MB,    5 GC    0 ms, Pauses    0 ms <    0 ms
R words            1.271 s,   341 MB,    9 GC   48 ms, Pauses   47 ms <   25 ms
R rand_large       0.676 s,    92 MB, 3820 GC  301 ms, Pauses  112 ms <    0 ms
R dlist            2.040 s,    22 MB,   53 GC  304 ms, Pauses  181 ms <    9 ms
R rand_small       0.722 s,    12 MB, 2032 GC  388 ms, Pauses  210 ms <    0 ms
R slist            1.985 s,    22 MB,   53 GC  263 ms, Pauses  142 ms <    4 ms

chart

@rainers
Member

rainers commented Feb 3, 2015

I was tempted to do something similar when seeing the API profiling (#1147). It seems a lot of people tell you not to try to roll your own locking methods, though.

static if (X86)
{
enum pauseThresh = 16;
void pause() { asm @trusted nothrow { rep; nop; } }
Member

Most prefer _mm_pause() here, i.e. pause as instruction.

Member Author

Member

I (mis?)read somewhere that pause would require less power than rep nop, but if they have the same machine code, that's obviously nonsense.

Member Author

I didn't know either. I first tried to use pause (Issue 14120), then found a REP_NOP macro somewhere, which the disassembler showed me as pause.

It's a clever encoding, in that it preserves the semantics on older hardware.
Same as the HLE prefixes.
http://www.felixcloutier.com/x86/XACQUIRE:XRELEASE.html

Member

rep; nop is also supported by older x86 chips, which is a bonus.

@MartinNowak
Member Author

It seems a lot of people tell you not to try to roll your own locking methods, though.

Because it has a lot of gotchas, the dead-lock of the testers being one of them.
Will close this temporarily.

@MartinNowak MartinNowak closed this Feb 3, 2015
@MartinNowak
Member Author

I thought we wouldn't need a recursive lock, but maybe we do.

@MartinNowak
Member Author

It's the newly added runFinalizer tests in rt.lifetime. They try to produce a FinalizeError without an InvalidMemoryError. That revealed that I need to set gcx.running during runFinalizers. The irony is that even throwing a preallocated exception allocates a TraceInfo class, which then fails and triggers an InvalidMemoryError. So it's not possible to see a FinalizeError unless you override the traceHandler.

@MartinNowak
Member Author

Still fighting with the FinalizeError issues.
MartinNowak@f4b989a

I need to explicitly catch Errors and rethrow them whenever finalizers are run so that I can perform at least some basic cleanup. That still might leave the GC in an invalid state.
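
The catch-and-rethrow pattern described here might look roughly like the sketch below. It is not the code from the linked commit: `Gcx` is a hypothetical stand-in with only a `running` flag (the one name taken from this thread), and `runFinalizersGuarded` is an invented helper.

```d
/// Hypothetical stand-in for the collector state; only `running` is
/// a name taken from the discussion.
struct Gcx
{
    bool running;
}

/// Runs `finalize` with `gcx.running` set and resets the flag even when
/// an Error escapes, then lets the Error continue to propagate.
void runFinalizersGuarded(ref Gcx gcx, scope void delegate() finalize)
{
    gcx.running = true;
    try
    {
        finalize();
    }
    catch (Error e)
    {
        // Basic cleanup only; other collector state may still be
        // inconsistent at this point.
        gcx.running = false;
        throw e;
    }
    gcx.running = false;
}
```

The flag is reset on both paths, but nothing else is rolled back, which matches the caveat that the GC may still end up in an invalid state.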
That whole stuff just used to work accidentally.

@MartinNowak
Member Author

How about this instead, we just kill the process on FinalizeError and print the stack trace?
I don't really want to make the GC Exception/Error safe as that costs performance and adds complexity.

@MartinNowak
Member Author

We hit a similar problem with Threads and nothrow functions before.
Issue 7018 – thrown Error from different thread should lead to program abort

@MartinNowak MartinNowak modified the milestones: 2.068, 2.067 Feb 3, 2015
@rainers
Member

rainers commented Feb 3, 2015

even throwing a preallocated exception allocates a TraceInfo class

Ouch. How did we miss that so far? So much for the @nogc attributes of onOutOfMemoryError and onInvalidMemoryOperationError. Could the default trace handler use C-malloced memory instead?
BTW: printing a stack trace needs the GC again, e.g. in toString. I'm not sure how this can work if the Error was raised from within the GC itself. There is no proper cleanup due to nothrow attributes anyway.

@MartinNowak
Member Author

Exactly, it's a mess.

Could the default trace handler use C-malloced memory instead?

Yes, but only because finalization is done after thread_resumeAll.

@rainers
Member

rainers commented Feb 3, 2015

How about this instead, we just kill the process on FinalizeError and print the stack trace?

Sounds reasonable. It might need a lot of changes to the stack tracing though, as it relies on the GC quite a bit.

@MartinNowak
Member Author

We should probably delay the whole story for 2.068, because it seems pretty risky. It's only a tiny gain, and I'm still working on thread-local caches, which heavily reduce lock contention, making the gain even smaller.

@rainers
Member

rainers commented Feb 3, 2015

Yeah, delaying this is probably better. Wouldn't most problems disappear if we'd worked towards moving finalization out of the GC lock and allowing allocation in destructors?

@MartinNowak
Member Author

Wouldn't most problems disappear if we'd worked towards moving finalization out of the GC lock and allowing allocation in destructors?

That's definitely something I want to do. Not sure if we can move it out of the lock though.
I also thought about parallelizing finalization, but it's probably not worth the trouble.

@MartinNowak MartinNowak added the GC garbage collector label Mar 7, 2015
@MartinNowak MartinNowak modified the milestones: 2.069, 2.068 Sep 9, 2015
@MartinNowak
Member Author

Wouldn't most problems disappear if we'd worked towards moving finalization out of the GC lock and allowing allocation in destructors?

Yes, but it required quite some tricks to allow concurrent access to the freebits metadata.
Maybe it could be done by splitting the sweep phase into a finalize phase and a free phase, and doing the latter with the lock held during recover. In any case, quite some work.
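
A toy sketch of that split, under heavy assumptions: `Block`, `sweepTwoPhase`, and the `free` flag are hypothetical stand-ins (the real collector tracks this in its freebits metadata), and a `core.sync.mutex.Mutex` plays the role of the GC lock. It only shows the ordering of the two phases, not how the real sweep works.

```d
import core.sync.mutex : Mutex;

/// Hypothetical heap block for illustration.
struct Block
{
    bool marked;               // survived the mark phase
    bool free;                 // toy stand-in for the freebits metadata
    void delegate() finalizer; // may be null
}

/// Phase 1 runs finalizers without the lock; phase 2 updates the
/// free metadata with the lock held.
void sweepTwoPhase(Block[] heap, Mutex gcLock)
{
    size_t[] dead; // a real collector could not allocate here

    // Finalize phase: no lock held, so finalizers could in principle
    // allocate without deadlocking on the GC lock.
    foreach (i, ref b; heap)
    {
        if (b.marked || b.free)
            continue;
        if (b.finalizer !is null)
            b.finalizer();
        dead ~= i;
    }

    // Free phase: metadata is only touched under the lock.
    gcLock.lock();
    scope (exit) gcLock.unlock();
    foreach (i; dead)
        heap[i].free = true;
}
```

The hard part mentioned above, concurrent access to the metadata while finalizers run, is exactly what this toy avoids by deferring all metadata writes to the locked phase.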

@MartinNowak MartinNowak mentioned this pull request Dec 1, 2015
@MartinNowak
Member Author

Reopened #1447.
