I've noticed some performance issues with CGC that I need to write down, so we can investigate+fix later.
**CGC scheduling.** The current scheduling strategy for CGC is as follows: (1) snapshot at fork, (2) do the child work, (3) check whether the CGC was stolen, and if not, (4) push the continuation and switch to the CGC.
But this is not great, because it unnecessarily delays the CGC. If we've already committed to the snapshot, it's best to make progress on the snapshot as soon as possible.
A better strategy would be to snapshot, immediately push the continuation, and then switch to the CGC.
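For concreteness, here is a C-style sketch of the two orderings. All names here (`take_snapshot`, `cgc_was_stolen`, `push_continuation`, `run_cgc`, `Work`) are made up for illustration and are not the actual MPL runtime/scheduler API:

```c
#include <stdbool.h>

/* Hypothetical types/operations, standing in for the real scheduler. */
typedef struct Snapshot Snapshot;
typedef void (*Work)(void);

Snapshot *take_snapshot(void);      /* snapshot the heap at the fork */
bool cgc_was_stolen(Snapshot *s);   /* did another worker take the CGC? */
void push_continuation(Work cont);  /* expose the continuation for steals */
void run_cgc(Snapshot *s);          /* switch to the CGC itself */

/* Current strategy: the CGC is delayed until after the child work. */
void fork_current(Work child, Work cont) {
  Snapshot *s = take_snapshot();    /* (1) snapshot at fork */
  child();                          /* (2) do the child work */
  if (!cgc_was_stolen(s)) {         /* (3) check whether the CGC was stolen */
    push_continuation(cont);       /* (4) push the continuation... */
    run_cgc(s);                    /*     ...and switch to the CGC */
  }
}

/* Proposed strategy: once we've committed to the snapshot, start making
 * progress on it as soon as possible. */
void fork_proposed(Work child, Work cont) {
  Snapshot *s = take_snapshot();    /* snapshot at fork */
  push_continuation(cont);          /* immediately push the continuation */
  run_cgc(s);                       /* switch to the CGC right away */
  child();                          /* child work proceeds after the CGC
                                     * (or elsewhere, if stolen meanwhile) */
}
```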
**CGC chaining.** Handling CGC at joins is tricky. To maintain invariants on pointer directions, it's possible to have an uncollected heap in a CGC-chain. (The reason is subtle and difficult to summarize quickly; see case 2 of Fig 4.6 in my thesis for more info.)
However, this might be Very Bad™️, because the uncollected heap might be large (and in need of collection).
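To picture the shape of the problem, a CGC-chain node might look roughly like the following. The field names are purely illustrative, not the actual runtime representation:

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative sketch of a CGC-chain node. */
typedef struct Heap {
  struct Heap *next;  /* next heap in the CGC-chain */
  size_t size;        /* bytes in use; may be large if uncollected */
  bool collected;     /* false for the problematic uncollected heap */
} Heap;
```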
I think I've managed to trigger this with a stress test. In parallel-ml-bench, branch `em`, the benchmark `tangle` sometimes experiences an explosion in memory size with the following command: `bin/tangle.mpl-em.bin @mpl procs 72 -- -p-ent 0.99 -warmup 5 -repeat 20 -elem-size 1000000 -num-elems 200 -num-tangles 100`. I have not seen the explosion when I use the restriction `@mpl max-cc-depth 1 --`, which prevents funky CGC joins. A little bit of investigation with `@mpl log-level cc-collection:info --` seems to confirm: without the restriction, there are (very ineffective) CGCs happening at depth 3.
One possible fix would be to check the size of the uncollected heap, and trigger a collection on it (if necessary) when it is pushed into the chain at a join. The details might be hairy... need to check snapshot invariants.
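As a rough sketch of that fix (reusing the hypothetical `Heap` from above; the threshold and all names are made up, and the snapshot-invariant question is exactly the part not shown):

```c
/* Illustrative size threshold; the real policy would need tuning. */
#define UNCOLLECTED_HEAP_LIMIT ((size_t)64 * 1024 * 1024)

void collect_heap(Heap *h);  /* hypothetical: run a collection on one heap */

/* Sketch of the proposed fix: when an uncollected heap is pushed into
 * the chain at a join, collect it first if it has grown too large.
 * Verifying that this preserves the snapshot invariants is the hard,
 * unaddressed part. */
void chain_push(Heap **chain, Heap *h) {
  if (!h->collected && h->size > UNCOLLECTED_HEAP_LIMIT)
    collect_heap(h);
  h->next = *chain;
  *chain = h;
}
```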