
Conversation

MartinNowak
Member

simplify heap grow code

@rainers
Member

rainers commented Jan 5, 2015

It's a lot more readable this way, but obviously there's still a bug somewhere. One functional change is that the old version still did a collection if the GC is disabled but the OS does not return more memory.

Regarding style: please don't put if(cond) statement; on a single line; that style isn't used elsewhere and doesn't help readability.
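
For illustration, a tiny sketch of the preferred formatting (cond and doSomething are made-up names):

void example(bool cond, void delegate() doSomething)
{
    // discouraged: if (cond) doSomething();
    // preferred:
    if (cond)
        doSomething();
}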

@MartinNowak
Member Author

old version still did a collection if the GC is disabled but the OS does not return more memory.

Yes, it did, but I thought that was rather a bug.
Doesn't look like it; I'll put it back in.

@MartinNowak MartinNowak force-pushed the refactorGC branch 2 times, most recently from 6e02921 to b4f89c2 Compare January 5, 2015 21:22
@yebblies
Contributor

yebblies commented Jan 6, 2015

old version still did a collection if the GC is disabled but the OS does not return more memory.

I sometimes use GC.disable to prevent crashes when using libraries that allocate in destructors, so it really does look like a bug.

@MartinNowak
Member Author

And it doesn't seem to be the reason for the test failure. I'd also say disable should mean disable.

@MartinNowak MartinNowak force-pushed the refactorGC branch 2 times, most recently from 95d38f9 to 7fa750e Compare January 6, 2015 04:44
@rainers
Member

rainers commented Jan 6, 2015

I sometimes use GC.disable to prevent crashes when using libraries that allocate in destructors, so it really does look like a bug.

There is also the use case "I want my program fast for this section of code, but I don't want it to fail arbitrarily", where the current behaviour makes sense. Maybe it should be a multi-level option?
BTW: a collection is run at the very end of the program even if the GC is disabled.

@yebblies
Contributor

yebblies commented Jan 6, 2015

BTW: a collection is run at the very end of the program even if the GC is disabled.

Not if I exit(0).

@MartinNowak
Member Author

OK, found the bug.
Who would have thought that allocPage(bin) might fail even though fullcollect() says it has freed some pages? That can happen when all the freed pages are in pools for large objects and therefore aren't reused for bin-sized allocations.
Now it uses the same logic as bigAlloc.
https://github.com/D-Programming-Language/druntime/pull/1081/files?diff=unified#diff-6f1ab0423fff9dcd084ecf9a677dc426R2089
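
For illustration, a minimal hypothetical sketch of that retry logic; allocPage, fullcollect and newPool are stand-in stubs for the druntime internals, not the actual implementation:

enum Bins { B_16 }                            // stand-in for the real bin sizes

void* allocPage(Bins bin) { return null; }    // stub: serve a bin-sized page from existing pools
size_t fullcollect() { return 1; }            // stub: run a collection, return number of freed pages
void newPool(size_t npages) { }               // stub: ask the OS for more memory

void* smallAlloc(Bins bin, bool gcEnabled)
{
    // Try to serve the request from existing pools first.
    if (auto p = allocPage(bin))
        return p;

    // fullcollect() may report freed pages that all sit in large-object
    // pools, so a successful collection does not guarantee that
    // allocPage() will now succeed.
    if (gcEnabled && fullcollect() > 0)
        if (auto p = allocPage(bin))
            return p;

    // Fall back to growing the heap and retrying, the same strategy bigAlloc uses.
    newPool(1);
    return allocPage(bin);
}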

@MartinNowak
Member Author

This pull contains one semantic change: disable really means disable, and we never collect.
The previous behavior was most likely a bug, and if it's wanted we should add a new allocTillYouBounce switch.

@rainers
Member

rainers commented Jan 7, 2015

The previous behavior was most likely a bug, and if it's wanted we should add a new allocTillYouBounce switch.

I think we should be more conservative here. Hitting the address space limit is not so uncommon for 32-bit processes, especially if you don't have the /LARGEADDRESSAWARE bit specified.
I think the old behavior should be the default, with a switch (gc-opt?) to enable the hard limit.

@MartinNowak
Member Author

I think we should be more conservative here.

Thanks for being so persistent about this; a simple look at the docs would have settled the question.
http://dlang.org/library/core/memory/gc.disable.html
Now fixed; it's also the reason for the tricky logic.
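
For reference, a small usage example of the documented semantics: GC.disable suspends automatic collections, but the implementation may still run one if it deems it necessary, e.g. to satisfy an allocation that would otherwise fail:

import core.memory : GC;

void hotPath()
{
    GC.disable();
    scope (exit) GC.enable();   // disable/enable calls must be balanced

    foreach (i; 0 .. 1_000_000)
    {
        // Normally no collection runs in here, but one may still be
        // triggered if the allocation cannot be satisfied otherwise.
        auto buf = new ubyte[](64);
    }
}

void main() { hotPath(); }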

@rainers
Member

rainers commented Jan 8, 2015

I wasn't aware it was in the docs. LGTM, though I'll probably hate it for the conflicts it will cause with other changes to the GC. ;-)

@rainers
Member

rainers commented Jan 8, 2015

Auto-merge toggled on

rainers added a commit that referenced this pull request Jan 8, 2015
@rainers rainers merged commit 7acbeee into dlang:master Jan 8, 2015
@MartinNowak MartinNowak deleted the refactorGC branch January 8, 2015 08:18
@MartinNowak
Member Author

LGTM, though I'll probably hate it for the conflicts it will cause with other changes to the GC.

True, but the bad/fragile code is one of the biggest hurdles to actually improving the GC.
I want to improve the growth strategy, hence this change; that's the biggest issue currently.
This graph is from running the Higgs unittests: black is heap size, red is used memory, and the x-axis is the number of collections. It could hardly be worse than that.
[plotcli graph: heap size (black) and used memory (red) vs. number of collections]

BTW, I already tried using SIMD in the marking code, with pretty bad results. Either the benchmark was completely memory-bandwidth limited (SIMD just adds more instructions) or it was CPU limited (slist) and suffered from the increased complexity. It should be possible, though, to turn almost the whole mark function into SIMD code, thereby saving somewhat on instruction count, but that would only yield a marginal gain.
I'm not too optimistic about parallel marking either, as it doesn't help with memory bandwidth; it can only help a little by interleaving execution and memory latency.

Ultimately we have to figure out how to do incremental collections; it really hurts having to mark 500MB to collect 10MB.
So I actually looked at write barriers to build a generational GC. There are quite a few caveats, but it's doable. Quite a lot of people would hate the 1-5% slowdown from write barriers, though, and I'm not sure how to deal with that. The write barrier itself could be a function in object.d that might be enabled with some version (UseNewGC), but then we'd need to ship different Phobos libraries?
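
A minimal sketch of what such a barrier hook could look like; the card table, sizes, and names are invented for illustration and are not an existing druntime API:

version = UseNewGC;                         // stand-in for the hypothetical version switch

version (UseNewGC)
{
    enum cardShift = 9;                     // 512-byte cards (arbitrary choice)
    __gshared ubyte[1 << 20] cardTable;     // toy sizing: covers 512 MB of address space

    // Called on every pointer store; marks the card containing the written
    // slot as dirty so a generational collection only re-scans dirty cards
    // for old-to-young references.
    void writeBarrier(void** slot, void* newValue) nothrow @nogc
    {
        *slot = newValue;
        cardTable[(cast(size_t) slot >> cardShift) % cardTable.length] = 1;
    }
}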

I found an extremely interesting paper (Connectivity-Based Garbage Collection) that achieves incremental collection by using type information rather than the age of allocations, and it requires neither barriers nor compacting.
The main idea is that to collect data of type Foo you only need to mark anything that might actually point to Foo. In that sense our NO_SCAN area is already a bipartition.
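
As a small illustration of that existing bipartition, using the real core.memory API: blocks allocated with GC.BlkAttr.NO_SCAN are known to contain no pointers, so the collector never marks through them:

import core.memory : GC;

void main()
{
    // A pixel buffer contains no pointers; telling the GC lets it skip scanning the block.
    auto pixels = cast(ubyte*) GC.malloc(1024 * 1024, GC.BlkAttr.NO_SCAN);

    // This array holds pointers, so its block stays in the "must scan" partition.
    auto roots = new void*[](16);
    roots[0] = pixels;
}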

@MartinNowak
Member Author

The new growth strategy looks really promising: same graph as above, much less time spent collecting.
https://github.com/MartinNowak/druntime/tree/gcGrowth
It still needs a little polishing though.
[plotcli graph: heap size and used memory vs. number of collections with the new growth strategy]

@Orvid
Contributor

Orvid commented Jan 13, 2015

I agree, this definitely is a pain to rebase upon, especially with the change to the location of setBits, as I get to do the rebase and attempt to keep the design it had at the time... (referring to the struct destructors PR)

@MartinNowak
Member Author

I agree, this definitely is a pain to rebase upon, especially with the change to the location of setBits, as I get to do the rebase and attempt to keep the design it had at the time... (referring to the struct destructors PR)

Sorry about that, but there are many more changes to come, because the code base is crap.
Without refactoring, it's not possible to make architectural changes.

@deadalnix
Contributor

@MartinNowak can you post how you generate these graphs? I'd be interested in generating them for SDC if that's possible.

@MartinNowak
Member Author

I patched the GC using this, then fed the output into dub run plotd -- -d lx,ly,ly --margin-bounds 50,1000,50,1000. The output will be somewhere in ~/.dub/packages/plotd-0.6.0/plotcli.png.

diff --git a/src/gc/gc.d b/src/gc/gc.d
index 4ce4ccf..50768d2 100644
--- a/src/gc/gc.d
+++ b/src/gc/gc.d
@@ -2401,6 +2401,8 @@ struct Gcx
     {
         MonoTime start, stop, begin;

+        printStats();
+
         if (GC.config.profile)
         {
             begin = start = currTime;
@@ -2454,16 +2456,34 @@ struct Gcx
         {
             stop = currTime;
             recoverTime += (stop - start);
-            ++numCollections;
         }

         running = false; // only clear on success
+        printStats();
+        ++numCollections;

         updateCollectThresholds();

         return freedLargePages + freedSmallPages;
     }

+    private void printStats() nothrow
+    {
+        size_t heap, free;
+        foreach (p; pooltable[0 .. npools])
+        {
+            heap += p.npages * PAGESIZE;
+            free += p.freepages * PAGESIZE;
+        }
+
+        foreach (n; 0 .. B_PAGE)
+            for (auto el = bucket[n]; el; el = el.next)
+                free += binsize[n];
+
+        assert(free <= heap);
+        printf("%zu,%zu,%zu\n", numCollections, heap, heap - free);
+    }
+
     /**
      * Returns true if the addr lies within a marked block.
      *

@MartinNowak MartinNowak added the GC garbage collector label Mar 7, 2015