ic-ref-test: Test stopping state #63

nomeata · 2021-12-04T16:59:38Z

This allows the test driver to withhold the response to a message and
control when it is released, in order to produce situations with
outstanding call contexts.

In an ideal world (from our pov), we could instrument and control the
system's scheduler this way, but we can't. So instead, we use some tricks.
Ideally, the details of this trick are irrelevant to the users of this
function (yay, abstraction), and if we find better tricks, we can swap them
out easily. We'll see if that holds water.

One problem with this approach is that a test failure could mean that the
system doesn't pass the test, but it could also mean that the system has a
bug that prevents this trick from working, so take care.

The current trick is: Create a canister (the "stopper"). Make it its own
controller. Tell the canister to stop itself. This call will now hang,
because a canister cannot stop itself. We can release the call (producing a
reject) by starting the canister again.
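The trick above can be sketched as a toy state machine. This is only an illustrative model in Python, not the ic-ref-test implementation: the class `ToyStopper`, its methods, and the call id are hypothetical names. It also assumes the self-stop hangs because the outstanding call keeps a call context open, so the canister can never finish stopping.

```python
from enum import Enum


class Status(Enum):
    RUNNING = "running"
    STOPPING = "stopping"
    STOPPED = "stopped"


class ToyStopper:
    """Toy model of the "stopper" canister (hypothetical, for illustration)."""

    def __init__(self):
        self.status = Status.RUNNING
        self.open_call_contexts = 0   # calls still awaiting a response
        self.pending_stop_calls = []  # stop requests not yet answered
        self.responses = []           # (call_id, outcome) delivered so far

    def stop_self(self, call_id):
        # The canister asks to be stopped. The request itself counts as an
        # outstanding call (modelling assumption), so it cannot complete.
        self.open_call_contexts += 1
        self.status = Status.STOPPING
        self.pending_stop_calls.append(call_id)

    def can_finish_stopping(self):
        # A stopping canister only becomes stopped once nothing is outstanding.
        return self.status is Status.STOPPING and self.open_call_contexts == 0

    def is_hanging(self, call_id):
        return call_id in self.pending_stop_calls

    def start(self):
        # Starting the canister again rejects every pending stop request,
        # which releases the withheld responses.
        if self.status is Status.STOPPING:
            self.status = Status.RUNNING
            for call_id in self.pending_stop_calls:
                self.responses.append((call_id, "reject"))
                self.open_call_contexts -= 1
            self.pending_stop_calls.clear()


stopper = ToyStopper()
stopper.stop_self("call-1")            # the response is now withheld
assert stopper.is_hanging("call-1")
assert not stopper.can_finish_stopping()
stopper.start()                        # the test driver releases the call
assert stopper.responses == [("call-1", "reject")]
```

In this model the test driver decides when `start` is called, which is the point at which the withheld response (a reject) is delivered.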

Some tests are added, and others are now using this mechanism.

This has uncovered bugs in ic-ref related to the handling of stopped canisters.

nomeata · 2021-12-04T17:01:20Z

This is a draft because I found some problems with the IC.Ref code, related to stopping canisters and the handling of response messages to stopped/empty/deleted canisters, which should not cause rejects, but be treated like traps. This might even be a bug in the spec itself; I didn't check yet, and I'm running out of time now. Will pick this up some other time.


nomeata · 2021-12-04T18:33:01Z

Ok, I patched over the most egregious errors in the reference implementation, so this test now passes.

@marcin-dziadus, do you want to check if it works with the replica as well before we merge it?

We can add more tests related to outstanding messages later. In particular I would like to test upgrading with outstanding messages, or callbacks to deleted canisters etc. This is something that we don't see a lot in the wild, is not tested here a lot, and maybe not so much in the replica's tests, so it's a bit of a blind spot.

I think we can make the universal canister cope with callbacks after upgrades, or after deleting/reinstallation, by serializing the callback table.

nomeata · 2021-12-04T18:34:30Z

(this PR was created while traveling in Ghana, on a mobile phone, with external keyboard, via mosh on a rented server. Also a way to work…)

marcin-dziadus · 2021-12-15T09:14:29Z

Sorry for the delay, I've been offline for a while.
I'm afraid that it wouldn't work with the replica, but this PR is not to blame here - the reference implementation diverged slightly from the replica. I'd suggest moving forward with the PR. Verifying that everything works after the replica catches up with the ref implementation will be on my plate. @nomeata What do you think?

nomeata · 2021-12-15T09:38:02Z

Depends:

If we think the tests are correct with regard to the spec, and the replica isn't quite in compliance, then we should merge this, use the feature in the replica's CI to mark these tests as known-to-fail, and fix it eventually in the replica.

But if they fail in the replica because the tests are not good (maybe they need to wait a bit in some place or another?) we should fix the tests.

How do the tests fail against the replica?

marcin-dziadus · 2021-12-15T09:47:41Z

Thanks for the clarification! Performance counting is not yet covered by the replica, which causes the canister parsing to fail.

marcin-dziadus · 2021-12-15T10:13:54Z

I think it falls under the first type of situation you described.

nomeata · 2021-12-15T16:58:16Z

But this PR is independent of performance counters, isn't it? I'm a bit confused now :-)

marcin-dziadus · 2021-12-15T17:14:34Z

It is :) My point is that running the most recent version of ic-ref against the replica is doomed to fail at the moment (for a reason unrelated to your PR). Does that make sense?

Technically it is possible to check what you asked for, but not without some hassle.

nomeata · 2021-12-15T18:59:28Z

Oh, I see. But it’s kinda unfortunate to be blocked here. And we don’t need to be: the script that runs ic-ref-test against the replica has provisions to exclude individual tests, see
https://github.com/dfinity/ic/blob/master/tests/ic-ref-test/run#L107-L135

So it should be possible to bump ic-ref-test in the replica’s CI to benefit from new tests even if some are not passing, by adding them to that file.
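The idea of gating CI on everything except a known-to-fail list can be sketched as below. This is a hedged illustration only and does not reflect how the linked `run` script works: the function `classify_results`, the test names, and the choice to still run and report excluded tests are all assumptions made for the example.

```python
def classify_results(results, known_to_fail):
    """Split test outcomes into blocking and tolerated failures.

    results: dict mapping test name -> True (passed) / False (failed)
    known_to_fail: set of test names excluded from gating (hypothetical)
    """
    blocking = sorted(
        name for name, ok in results.items()
        if not ok and name not in known_to_fail
    )
    tolerated = sorted(
        name for name, ok in results.items()
        if not ok and name in known_to_fail
    )
    # Excluded tests that pass anyway can be dropped from the list.
    stale = sorted(name for name in known_to_fail if results.get(name) is True)
    return {
        "blocking": blocking,
        "tolerated": tolerated,
        "stale_exclusions": stale,
    }


# Hypothetical test names, for illustration only.
outcome = classify_results(
    {"stopping state": False, "basic calls": True, "upgrade": False},
    known_to_fail={"stopping state"},
)
assert outcome["blocking"] == ["upgrade"]
assert outcome["tolerated"] == ["stopping state"]
```

With such a split, new tests reach the replica's CI right away, and only failures outside the exclusion list block a change.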

Anyways, sounds like we can merge this.

nomeata force-pushed the joachim/stopping branch from 908cc56 to ceb30f0 December 4, 2021 17:02

Work around the issue for now

ab8e23b

nomeata marked this pull request as ready for review December 4, 2021 18:33

nomeata added 4 commits December 5, 2021 12:45

Test that stopping canisters cannot be deleted

c66545d

Redo tests related to uninstallation and outstanding calls

f6de711

Comment

7bb4c82

Cleanup

d3bbf98

marcin-dziadus approved these changes Dec 15, 2021

nomeata merged commit 22f0fff into master Dec 15, 2021

nomeata deleted the joachim/stopping branch December 15, 2021 19:00
