Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

controller.reapAllVats() could reset reapCountdown #8665

Open
warner opened this issue Dec 15, 2023 · 5 comments
Open

controller.reapAllVats() could reset reapCountdown #8665

warner opened this issue Dec 15, 2023 · 5 comments
Assignees
Labels
enhancement New feature or request SwingSet package: SwingSet

Comments

@warner
Copy link
Member

warner commented Dec 15, 2023

What is the Problem Being Solved?

Vats are normally GCed (dispatch.bringOutYourDead()) by a per-vat timer named reapCountdown. This is initialized to reapInterval (set to 1000 on mainnet), and decremented for each delivery to the vat. When it hits zero, a BOYD is performed, and the counter is reset.

We recently landed controller.reapAllVats(), which schedules an immediate (well, at the start of the next controller.run() BOYD to all vats. However this doesn't change the "organic" BOYD timing: reapCountdown is not affected, so the vat will still do the usual BOYD when it hits zero.

For purposes of our benchmarks, we'd like to reduce the variation caused by organically-scheduled BOYDs. We can do a reapAllVats() just before starting the benchmark, and again just afterwards (so the object counts are accurate), but it would be less noisy if we could reduce the frequency of BOYDs in the middle. We should consider configuring the benchmark environment to use defaultReapInterval: 'never', however it would also make a certain amount of sense if reapAllVats() were to reset the counters back to reapInterval.

(Now I'm on the fence about whether this is a good idea.. using defaultReapInterval: 'never' seems best, and if we use that, then it doesn't matter whether we reset the counters. But I'll capture the idea for further discussion).

Description of the Design

If implemented, we would:

  • add vatKeeper.resetReapCountdown()
  • change kernel.reapAllVats() to call resetReapCountdown just after enqueuing the BOYD

Security Considerations

none

Scaling Considerations

none

Test Plan

Update the unit test to inspect the kvStore after calling reapAllVats(), to assert that the counters have all been reset.

Upgrade Considerations

none

cc @FUDCo

@warner warner added enhancement New feature or request SwingSet package: SwingSet labels Dec 15, 2023
@warner
Copy link
Member Author

warner commented Dec 21, 2023

It might also be useful to change reapAllVats to look at this countdown, and skip reaping the vats which haven't received any deliveries since the last one. We might then change the name to reapAllDirtyVats.

In my ghost-replay testing, I'm doing a cleanup pass before submitting the new operation (so I can tell what DB changes are a result of my injected action, versus GC consequences that were just waiting around for a BOYD), and I've found that I need to make two reapAllVats() calls to clean everything up. This is the usual "one hop per BOYD" issue, where vatA gets reaped before vatB, but vatB releases some things that were coming from vatA, and you need a BOYD on vatA to let the cleanup process terminate. (The worst case would be a linked list that travels through multiple vats: you could only free one link per reapAllVats()). Each reapAllVats() must reload all vats to delivery the BOYD, which requires a transcript replay, and the vat-warehouse LRU cache gets thrashed, now that we have more vats than its maxVats size.

Making reapAllVats() free for untouched vats would make it a lot cheaper to do several of them, both avoiding the replay and not knocking out the still-active vats.

@warner
Copy link
Member Author

warner commented Dec 21, 2023

We might even want this to also trigger a heap snapshot for all vats (which, really, would mean removing the enqueue-BOYD step and just follow the "time to make a heap snapshot" path, since that path also does a BOYD). While testing large GC operations, I'm noticing a significant amount of time being spent replaying a vat which just completed a large BOYD. The unfortunate LRU-thrashing property of reapAllVats, combined with needing to do multiple ones (to handle multiple hops through the object graph), means:

  • we start the first reapAllVats()
    • this triggers a large BOYD in v9, which is duly executed and added to the transcript
    • we BOYD all the other vats, which knocks v9 out of the active set
  • we start the second reapAllVats()
    • when this reaches v9 (to do the second BOYD), we have to page v9 back in
    • so we re-execute the previous large BOYD before we can new-execute the second one

This results in duplicate work, some of which is super-expensive (millions of syscalls, 10 minutes of wallclock).

Of course, this would also be handled elegantly by @mhofman 's suggestion to schedule heap snapshots based on computrons, rather than simply counting deliveries. The large BOYD would trigger a snapshot. #6786

We should also change the snapshot trigger to look at (or remember) the last delivery, and refrain from a new BOYD if that was the last thing we did in that vat.

@FUDCo
Copy link
Contributor

FUDCo commented Dec 22, 2023

It might also be useful to change reapAllVats to look at this countdown, and skip reaping the vats which haven't received any deliveries since the last one. We might then change the name to reapAllDirtyVats.

This makes a lot of sense to me. I don't think we need to rename the operation, we just note that reaping vats that haven't had any activity in them since the last time goes really fast.

@FUDCo
Copy link
Contributor

FUDCo commented Dec 22, 2023

The large BOYD would trigger a snapshot

We'd have this weird situation where BOYD triggers a snapshot and snapshot triggers a BOYD, so it seems like it might mainly be a question of which goes first. Perhaps the alternate route is to reframe reapAllVats to directly initiate a snapshot rather than a BOYD, though it seems weird to initiate snapshots on vats that are currently swapped out because they got swapped out in a post-BOYD state to begin with. The observation is that it's the BOYD of other vats that causes stuff to be freed, in which world swapping a vat in just to immediate swap it out again wouldn't necessarily be so crazy, since if significant GC happens the second snapshot could be smaller.

The other thing that might be worth considering is having the BOYD-them-all logic to somehow be in cahoots with the cache, perhaps by using knowledge of what's currently swapped out vs. what's swapped in to tweak the order of sweeps so they don't interact with the cache so pathologically. I have memories of Alan Karp talking about big numerical calculations at IBM where MRU instead of LRU gave optimal cache benefit.

@mhofman
Copy link
Member

mhofman commented Dec 22, 2023

It might also be useful to change reapAllVats to look at this countdown, and skip reaping the vats which haven't received any deliveries since the last one. We might then change the name to reapAllDirtyVats.

Yup I suggested something similar in #8626 (comment), and on slack but using the vatPos instead. reapCountdown would be a lot cleaner.

this would also be handled elegantly by @mhofman 's suggestion to schedule heap snapshots based on computrons, rather than simply counting deliveries. The large BOYD would trigger a snapshot. #6786

We should also change the snapshot trigger to look at (or remember) the last delivery, and refrain from a new BOYD if that was the last thing we did in that vat.

As I mention in that issue, I think we should only run BOYD and snapshotting together, triggered by the computrons of previous deliveries. As such the schedule would not count the cost of BOYD and snapshotting towards the interval, but I think that's fine. If anything, a costly BOYD should increase the interval at which you perform it (which could be captured as adding the computron cost of the last BYOD to the "next reap meter" instead of decreasing from it)

it seems weird to initiate snapshots on vats that are currently swapped out because they got swapped out in a post-BOYD state to begin with

I don't understand. If BOYD and snapshotting is an atomic operation, how can that happen?

Perhaps the alternate route is to reframe reapAllVats to directly initiate a snapshot rather than a BOYD

That doesn't seem right. Snapshotting is conceptually an internal detail of some types of workers, where BOYD is a concept applying to all vats. If anything snapshots should be an implicit effect of BOYD for those workers, and never explicitly performed by the controller.

@warner warner self-assigned this Jan 9, 2024
warner added a commit that referenced this issue Mar 29, 2024
`dispatch.bringOutYourDead()`, aka "reap", triggers garbage collection
inside a vat, and gives it a chance to drop imported c-list vrefs that
are no longer referenced by anything inside the vat.

Previously, each vat has a configurable parameter named
`reapInterval`, which defaults to a kernel-wide
`defaultReapInterval` (but can be set separately for each vat). This
defaults to 1, mainly for unit testing, but real applications set it
to something like 200.

This caused BOYD to happen once every 200 deliveries, plus an extra
BOYD just before we save an XS heap-state snapshot.

This commit switches to a "dirt"-based BOYD scheduler, wherein we
consider the vat to get more and more dirty as it does work, and
eventually it reaches a `reapDirtThreshold` that triggers the
BOYD (which resets the dirt counter).

We continue to track `dirt.deliveries` as before, with the same
defaults. But we add a new `dirt.gcKrefs` counter, which is
incremented by the krefs we submit to the vat in GC deliveries. For
example, calling `dispatch.dropImports([kref1, kref2])` would increase
`dirt.gcKrefs` by two.

The `reapDirtThreshold.gcKrefs` limit defaults to 20. For normal use
patterns, this will trigger a BOYD after ten krefs have been dropped
and retired. We choose this value to allow the #8928 slow vat
termination process to trigger BOYD frequently enough to keep the BOYD
cranks small: since these will be happening constantly (in the
"background"), we don't want them to take more than 500ms or so. Given
the current size of the large vats that #8928 seeks to terminate, 10
krefs seems like a reasonable limit. And of course we don't want to
perform too many BOYDs, so `gcKrefs: 20` is about the smallest
threshold we'd want to use.

External APIs continue to accept `reapInterval`, and now also accept
`reapGCKrefs`.

* kernel config record
  * takes `config.defaultReapInterval` and `defaultReapGCKrefs`
  * takes `vat.NAME.creationOptions.reapInterval` and `.reapGCKrefs`
* `controller.changeKernelOptions()` still takes `defaultReapInterval`
   but now also accepts `defaultReapGCKrefs`

The APIs available to userspace code (through `vatAdminSvc`) are
unchanged (partially due to upgrade/backwards-compatibility
limitations), and continue to only support setting `reapInterval`.
Internally, this just modifies `reapDirtThreshold.deliveries`.

* `E(vatAdminSvc).createVat(bcap, { reapInterval })`
* `E(adminNode).upgrade(bcap, { reapInterval })`
* `E(adminNode).changeOptions({ reapInterval })`

Internally, the kernel-wide state records `defaultReapDirtThreshold`
instead of `defaultReapInterval`, and each vat records
`.reapDirtThreshold` in their `vNN.options` key instead of
`vNN.reapInterval`. The current dirt level is recorded in
`vNN.reapDirt`.

The kernel will automatically upgrade both the kernel-wide and the
per-vat state upon the first reboot with the new kernel code. The old
`reapCountdown` value is used to initialize the vat's
`reapDirt.deliveries` counter, so the upgrade shouldn't disrupt the
existing schedule. Vats which used `reapInterval = 'never'` (eg comms)
will get a `reapDirtThreshold` of all 'never' values, so they continue
to inhibit BOYD. Otherwise, all vats get a `threshold.gcKrefs` of 20.

We do not track dirt when the corresponding threshold is 'never', to
avoid incrementing the comms dirt counters forever.

This design leaves room for adding `.computrons` to the dirt record,
as well as tracking a separate `snapshotDirt` counter (to trigger XS
heap snapshots, ala #6786). We add `reapDirtThreshold.computrons`, but
do not yet expose an API to set it.

Future work includes:
* upgrade vat-vat-admin to let userspace set `reapDirtThreshold`

New tests were added to exercise the upgrade process, and other tests
were updated to match the new internal initialization pattern.

We now reset the dirt counter upon any BOYD, so this also happens to
help with #8665 (doing a `reapAllVats()` resets the delivery counters,
so future BOYDs will be delayed, which is what we want). But we should
still change `controller.reapAllVats()` to avoid BOYDs on vats which
haven't received any deliveries.

closes #8980
warner added a commit that referenced this issue Apr 9, 2024
`dispatch.bringOutYourDead()`, aka "reap", triggers garbage collection
inside a vat, and gives it a chance to drop imported c-list vrefs that
are no longer referenced by anything inside the vat.

Previously, each vat has a configurable parameter named
`reapInterval`, which defaults to a kernel-wide
`defaultReapInterval` (but can be set separately for each vat). This
defaults to 1, mainly for unit testing, but real applications set it
to something like 200.

This caused BOYD to happen once every 200 deliveries, plus an extra
BOYD just before we save an XS heap-state snapshot.

This commit switches to a "dirt"-based BOYD scheduler, wherein we
consider the vat to get more and more dirty as it does work, and
eventually it reaches a `reapDirtThreshold` that triggers the
BOYD (which resets the dirt counter).

We continue to track `dirt.deliveries` as before, with the same
defaults. But we add a new `dirt.gcKrefs` counter, which is
incremented by the krefs we submit to the vat in GC deliveries. For
example, calling `dispatch.dropImports([kref1, kref2])` would increase
`dirt.gcKrefs` by two.

The `reapDirtThreshold.gcKrefs` limit defaults to 20. For normal use
patterns, this will trigger a BOYD after ten krefs have been dropped
and retired. We choose this value to allow the #8928 slow vat
termination process to trigger BOYD frequently enough to keep the BOYD
cranks small: since these will be happening constantly (in the
"background"), we don't want them to take more than 500ms or so. Given
the current size of the large vats that #8928 seeks to terminate, 10
krefs seems like a reasonable limit. And of course we don't want to
perform too many BOYDs, so `gcKrefs: 20` is about the smallest
threshold we'd want to use.

External APIs continue to accept `reapInterval`, and now also accept
`reapGCKrefs`.

* kernel config record
  * takes `config.defaultReapInterval` and `defaultReapGCKrefs`
  * takes `vat.NAME.creationOptions.reapInterval` and `.reapGCKrefs`
* `controller.changeKernelOptions()` still takes `defaultReapInterval`
   but now also accepts `defaultReapGCKrefs`

The APIs available to userspace code (through `vatAdminSvc`) are
unchanged (partially due to upgrade/backwards-compatibility
limitations), and continue to only support setting `reapInterval`.
Internally, this just modifies `reapDirtThreshold.deliveries`.

* `E(vatAdminSvc).createVat(bcap, { reapInterval })`
* `E(adminNode).upgrade(bcap, { reapInterval })`
* `E(adminNode).changeOptions({ reapInterval })`

Internally, the kernel-wide state records `defaultReapDirtThreshold`
instead of `defaultReapInterval`, and each vat records
`.reapDirtThreshold` in their `vNN.options` key instead of
`vNN.reapInterval`. The current dirt level is recorded in
`vNN.reapDirt`.

The kernel will automatically upgrade both the kernel-wide and the
per-vat state upon the first reboot with the new kernel code. The old
`reapCountdown` value is used to initialize the vat's
`reapDirt.deliveries` counter, so the upgrade shouldn't disrupt the
existing schedule. Vats which used `reapInterval = 'never'` (eg comms)
will get a `reapDirtThreshold` of all 'never' values, so they continue
to inhibit BOYD. Otherwise, all vats get a `threshold.gcKrefs` of 20.

We do not track dirt when the corresponding threshold is 'never', to
avoid incrementing the comms dirt counters forever.

This design leaves room for adding `.computrons` to the dirt record,
as well as tracking a separate `snapshotDirt` counter (to trigger XS
heap snapshots, ala #6786). We add `reapDirtThreshold.computrons`, but
do not yet expose an API to set it.

Future work includes:
* upgrade vat-vat-admin to let userspace set `reapDirtThreshold`

New tests were added to exercise the upgrade process, and other tests
were updated to match the new internal initialization pattern.

We now reset the dirt counter upon any BOYD, so this also happens to
help with #8665 (doing a `reapAllVats()` resets the delivery counters,
so future BOYDs will be delayed, which is what we want). But we should
still change `controller.reapAllVats()` to avoid BOYDs on vats which
haven't received any deliveries.

closes #8980
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request SwingSet package: SwingSet
Projects
None yet
Development

No branches or pull requests

3 participants