[WIP] Sociomantic CDGC port #985

mihails-strasuns-sociomantic · 2014-10-06T16:47:30Z

Here is the initial port result, available for early experiments. It can be compiled with `make -f posix.mak GC_TYPE=concurrent` and passes the test suite with only the shared library tests disabled (ef20b7a).

There are still many issues to be aware of:

- Documentation is largely missing. I am working on it; in the meantime, reading @leandro-lucarella-sociomantic's old posts (http://www.llucax.com.ar/blog/blog/tag/understanding%20the%20current%20gc) may help.
- Code style differs from Phobos standards. To be fixed soon.
- Shared library support is completely missing. Proxy infrastructure similar to the one in the existing GC needs to be added, and I don't know whether the actual implementation will work in such environments or whether more changes will be needed.
- The deadlock issue (http://www.dsource.org/projects/tango/ticket/2087) still remains. It is not critical to our code, which almost never uses threads, so no big effort was put into it, but it can be a huge problem for any other project.

In general this is very far from something that can be merged upstream straight away to replace the default GC on Linux. It may be interesting for other projects with a similar architecture and requirements, and probably helpful for anyone else working on a better D GC.

Contributions are welcome via pull requests against this PR's base branch (https://github.com/mihails-strasuns-sociomantic/druntime-1/tree/sociomantic-cdgc-wip) or as e-mail patches (public@dicebot.lv).

MartinNowak · 2014-10-06T21:08:06Z

Now that we have druntime configuration options (#817) I would suggest to add this as an optional GC that can be selected at program start.

mihails-strasuns-sociomantic · 2014-10-06T23:06:13Z

Have totally missed that PR. I had actually considered implementing a similar runtime approach (as CDGC naturally uses env vars for all configuration), but it seemed wrong to include code for all garbage collectors in the distribution.

I'll change that once the code in general gets at least somewhat close to being mergeable.

mihails-strasuns-sociomantic · 2014-10-06T23:07:29Z

btw, @MartinNowak do you have any hints about shared library support? What needed to be changed in the original GC to implement it, other than the proxy.d thing?

leandro-lucarella-sociomantic · 2014-10-07T09:35:19Z

I recommend reading the blog in chronological order :)
http://www.llucax.com.ar/blog/blog/tag/understanding%20the%20current%20gc?sort=+date

BTW, that documents the Tango "basic" GC, which was the base GC used by druntime. The fork was done years ago, so there are some differences, but I think the basics are the same in both CDGC and druntime.

leandro-lucarella-sociomantic · 2014-10-07T09:37:50Z

> Now that we have druntime configuration options (#817) I would suggest to add this as an optional GC that can be selected at program start.

Oh, cool! BTW, I think at some point both GCs should be merged (this is what I wanted to do when I started my attempt to port the GC, but it was unrealistic in terms of time). CDGC can already be configured to be concurrent or not at runtime via ENV vars.

leandro-lucarella-sociomantic · 2014-10-07T09:40:40Z

I wonder if the code in that PR has any relation to the ENV var parser I wrote for CDGC; they look, at least in concept, mostly the same. I guess the opts module in CDGC can go now...

MartinNowak · 2014-10-07T19:47:07Z

> btw, @MartinNowak do you have any hints about shared library support? What needed to be changed in the original GC to implement it, other than the proxy.d thing?

GC proxy is complete BS, it is a crappy hack to avoid the most obvious ODR violations.
The only thing that I did on the GC for shared libraries was adding the runFinalizers method, which takes a segment and frees every object that depends on a finalizer in that segment.

jacob-carlborg · 2014-10-08T06:31:54Z

src/gc/concurrent/dynarray.d

+ * Authors:   Leandro Lucarella <llucax@gmail.com>
+ */
+
+module gc.concurrent.dynarray;


There already is a module for arrays that doesn't use the GC, rt.util.container.array. They don't seem to have the exact same functionality but perhaps they can be merged into one implementation?

mihails-strasuns-sociomantic · 2014-10-08T12:37:06Z

@jacob-carlborg thanks for your comments but.. I have already mentioned that the sources are pending a full rewrite to adjust to Phobos code style (currently using the original one), so I am afraid it wasn't the most useful time investment :P

@MartinNowak thanks!

jacob-carlborg · 2014-10-08T17:06:10Z

@mihails-strasuns-sociomantic right, I missed that. But why did you create a pull request in the first place?

leandro-lucarella-sociomantic · 2014-10-08T17:09:59Z

On Wed, Oct 08, 2014 at 10:06:12AM -0700, jacob-carlborg wrote:

> @mihails-strasuns-sociomantic right, I missed that. But why did you
> create a pull request in the first place?

I can think of at least 2 reasons:

1. So people can start testing it. Style is very important to get it
   merged but has 0 effect on testing.
2. So people can start reviewing the most important aspects of the PR,
   for example what to do with shared libraries and the (in)famous
   proxy. Doing this with the code at hand is much easier :-)

mihails-strasuns-sociomantic · 2014-10-08T17:14:45Z

> But why did you create a pull request in the first place?

Because I have been asked to do so :) That way anyone can experiment with it or borrow some implementation ideas while it gets slowly tweaked to be merge-ready

Adds malloc overload that returns actual allocation size (which equals capacity at the point of allocation) as an out parameter. gc_qalloc updated to use that overload to fill the BlkInfo.size so that druntime can write necessary metadata to the correct end of block.

leandro-lucarella-sociomantic · 2014-10-10T15:46:03Z

Just for reference: https://issues.dlang.org/show_bug.cgi?id=10184

MartinNowak · 2014-10-18T17:56:40Z

mak/MANIFEST

+		src\gc\basic\bits.d \
+		src\gc\basic\stats.d \
+		src\gc\basic\proxy.d
+endif


The manifest should list all druntime files independent of any settings.

MartinNowak · 2014-10-18T18:27:08Z

I tried our GC benchmark suite (see pull request #998, "fix runbench script", in D-Programming-Language/druntime) and the results were not so good.
Some tests fail or never finish execution, probably due to memory corruption.
Most of the ones that do pass take significantly longer with the concurrent GC than with the basic GC.

While the times might be caused by some bugs and can likely be improved, this GC design is inherently slower than a pausing GC. In addition to the fork overhead, it has to copy memory pages whenever the concurrently running program modifies them. So clearly this might be an interesting alternative GC for low latency applications, but it achieves this at the cost of higher CPU/memory bandwidth usage.
Other schemes like performing GC in idle times (vibe.d) or using a region allocator per server request might be equally interesting to achieve low latency.

leandro-lucarella-sociomantic · 2014-10-20T10:41:28Z

On Sat, Oct 18, 2014 at 11:27:10AM -0700, Martin Nowak wrote:

> I tried our GC benchmark suite (see pull request #998, "fix runbench script", in D-Programming-Language/druntime) and the results were not so good.
> Some tests fail or never finish execution, probably due to memory corruption.
> Most of the ones that do pass take significantly longer with the concurrent GC than with the basic GC.
>
> While the times might be caused by some bugs and can likely be improved, this GC design is inherently slower than a pausing GC. In addition to the fork overhead, it has to copy memory pages whenever the concurrently running program modifies them. So clearly this might be an interesting alternative GC for low latency applications, but it achieves this at the cost of higher CPU/memory bandwidth usage.

This does not match my testing (years ago, so maybe the basic GC in
druntime has improved significantly since then). To my surprise the
concurrent GC did better for a real application (Dil, the only one that
was fairly well maintained for D1 back then), and my guess is that this
is because the program could keep working while the GC was working on
another core. The fork overhead is almost zero (unless you work with a
very big heap; basically the fork time is proportional to the size of
the page table). The COW copying of course depends on the type of
application: if the application is writing to every single page very
fast (before the mark phase is done) you'll end up duplicating the whole
memory. But I don't think that's the most common case for regular
applications (even though there might be a test case stressing this).

So, even though the current implementation could be buggy and suboptimal,
I really don't think the algorithm is inherently slow, and certainly not
slower than the basic one (in terms of CPU cycles, yes, it is less
efficient, it definitely needs to do more work, but that work is
naturally parallelized in the concurrent GC, while in the basic one it is
serialized with the application's work).

MartinNowak · 2014-10-21T20:40:35Z

> but that work is naturally parallelized in the concurrent GC, while in the basic one it is
> serialized with the application's work

True that, there is a good chance that many real world apps benefit from that and the low latency.
Will anyone work on fixing the bugs? I'd really like to have this as an optional GC for 2.067.

mihails-strasuns-sociomantic · 2014-10-22T04:56:08Z

> I'd really like to have this as an optional GC for 2.067

This is very unlikely. I have switched to other tasks related to D2 porting and was planning to get back to CDGC only when we have at least one internal service completely switched to D2. Until then, investing more time into it would be very impractical.

Of course if anyone else wants to contribute in the meanwhile, there will be no objections :P

quickfur · 2014-11-10T19:54:35Z

Wow, totally missed this one! Looking forward to having this merged in the (not-so-near?) future!

DemiMarie · 2015-11-23T04:57:58Z

Any chance of getting this merged?

mihails-strasuns-sociomantic · 2015-11-23T05:08:39Z

Pretty much zero. I explained it during my last DConf talk but should have written a note here too. There are several issues with this PR:

- The GC in this PR is based on the old GC implementation from the Tango runtime, while the existing upstream one has received many performance improvements.
- It isn't widely applicable, as it trades worse throughput for better latency (by design), which is not advantageous for non-real-time cases.
- There is still the issue with the global libc mutex, which makes it almost unusable in multi-threaded code.

So despite the fact that I did this port as a proof of concept to ensure it won't become a migration blocker, it is now almost certain this port won't be used at all. Depending on performance profiling of our ported applications, one of two likely approaches will be taken instead:

- either reimplement similar concurrent fork-based behaviour on top of the existing upstream GC,
- or throw away CDGC completely, switching real-time applications to fully manual std.allocator management and using the stock GC for non-critical apps.

For now this PR mostly remains as a reference for anyone curious.

schveiguy · 2015-11-23T13:01:58Z

Per @mihails-strasuns-sociomantic, this won't be updated/merged, closing so the auto tester doesn't bother with it. Please reopen if things change.

jacob-carlborg · 2015-11-23T16:12:07Z

Why not upstream this as an additional GC? Does Java have multiple GCs to choose from?

mihails-strasuns-sociomantic · 2015-11-24T04:50:02Z

Because there is no point in upstreaming something we don't use and thus won't maintain; it would quickly die from bitrot (or waste someone else's time maintaining it). That would be impolite at best, in my opinion.

leandro-lucarella-sociomantic · 2015-11-25T13:04:35Z

> It isn't widely applicable as it traded worse throughput for better latency (by design) which is not advantageous for non real-time cases

This is not entirely true, not always. If your application is not using all the cores, then the concurrent GC might speed up your application, because now the mark phase of the collection is using a core that wasn't used before, so your application gets some free parallelization.

leandro-lucarella-sociomantic · 2015-11-25T13:08:38Z

But I generally agree this PR should stay closed for now.

mihails-strasuns-sociomantic · 2015-11-25T13:19:54Z

Thanks for the clarification.

jacob-carlborg reviewed Oct 8, 2014

Mihails Strasuns added 12 commits October 9, 2014 14:30

Define dedicated package per gc implementation

f8e3706

Moves the default gc implementation to `gc.basic` and gcstub to `gc.stub`. The build system uses `gc.basic` by default; run `make -f posix.mak GC_TYPE=stub` to pick the stub one.

Initial dump of non-ported CDGC sources

cae2c35

Does not compile; files are copied from the Tango runtime as-is.

Fix thread runtime initialization order

4d041f1

In the D2 runtime, stack bottom data is provided by the thread runtime, and thus it needs to be initialized before any relevant functions get called.

Replace exit() with _Exit()

6a8049d

The forked process must avoid any deinitialization, because it triggers the GC cleanup stage in the D2 runtime (which was added as part of shared library support). There is a special C standard library function, _Exit, for that.

D2 port of CDGC sources

7dcf9fb

Includes all straightforward adjustments for different language constructs, runtime differences and module names. Can be built using `make -f posix.mak GC_TYPE=concurrent`

Handle APPENDABLE memory block flag

d36a670

The D2 runtime introduces new attributes that the GC must support.

Fix realloc + pool cache interaction

6c1f71f

Pool update did not trigger cached size update if the pool was cached

Global gc variable must be __gshared

de6c1df

Probably just `shared` actually, but making CDGC shared-correct is a more advanced task.

HACK: ignore interior pointers in gc_free

63fa608

This would be better fixed in druntime by moving capacity to the end of the block for all sizes. However, this simple hack, similar to the one present in the default D2 GC, is enough to pass the tests.

HACK: disable shared library tests

2c12fcf

CDGC is currently missing important bits of infrastructure for shared library support. This temporarily disables the relevant test case so that the other ones can pass.

Remove all traces of PointerMap

dbbcd74

Originally it was added for an early attempt at precise scanning support, but this approach has been reconsidered, and if precise scanning is to be added it will be done in a different way.

Clean C-style array declarations

05345b4

MartinNowak modified the milestones: 2.065, 2.067 Oct 10, 2014

Fix 32-bit compilation

1df58a2

It worked in D1 because of more permissive implicit conversions.

MartinNowak reviewed Oct 18, 2014

MartinNowak removed this from the 2.067 milestone Jan 23, 2015

MartinNowak added the GC garbage collector label Mar 7, 2015

schveiguy closed this Nov 23, 2015

llucax referenced this pull request in FraMecca/CDGC Aug 28, 2018

Create Milestones.md

2214435
