Handle mesh-provider IBC Failure Cases #38

teddyknox · 2022-10-07T02:34:51Z

EDIT 10/19

I went through and implemented some todos related to IBC failures in mesh-provider. The only unintuitive change involves a shift from unilateral staking to bilateral staking. That is, mesh-provider only updates its staking state after it receives confirmation via ack from mesh-consumer on the "stake" IBC call. By contrast, mesh-provider continues to offer unilateral unstaking, since we must assume that the consumer chain might be byzantine.

Qs

@ethanfrey is there a way to mock IBC calls for testing?

Closes #37

JakeHartnell · 2022-10-07T19:32:11Z

How should the mesh-provider ibc handler notify the UI that the stake/unstake operation failed? My intuition is that we'd want to publish an event...?

An event sounds reasonable to me.

After the list_validators IBC call has been attempted 5 times, we give up. Would it be desirable to offer an execution endpoint for restarting the retries?

Perhaps? Can probably hold off on this for now. It's easy enough to add later. Maybe @ethanfrey has different thoughts.

Should the retry count for list_validators be configurable or is a constant OK?

IMO constant is OK for now, but if you want to make it configurable we can.

contracts/mesh-provider/src/ibc.rs

contracts/mesh-provider/src/contract.rs

ethanfrey · 2022-10-18T06:27:03Z

I agree with @JakeHartnell comments on how it should work. Going to review the code now.

ethanfrey

Made comments that handling of unstake should actually be like it was done. But stake is the one that should have such changes. And a bunch of minor points

contracts/mesh-provider/src/contract.rs

contracts/mesh-provider/src/ibc.rs

ethanfrey · 2022-10-18T06:37:00Z

contracts/mesh-provider/src/ibc.rs

+
+    let mut val = VALIDATORS
+        .may_load(deps.storage, &validator)?
+        .ok_or_else(|| ContractError::UnknownValidator(validator.clone()))?;


Maybe we return Response with a custom event saying this?

contracts/mesh-provider/src/ibc.rs

teddyknox

Thanks for the review Ethan. I've pushed with edits.

contracts/mesh-provider/src/contract.rs

contracts/mesh-provider/src/ibc.rs

JakeHartnell

LGTM

Art3miX

I'm not sure what should be covered under the POC, but RETRIES storage is shared "state", so if for example I stake and fail, the next successful stake call will reset RETRIES, or another example, if I stake and fail and you unstake and fail, we both will have 5 retries instead of 5 per each.

I think (and I might be wrong) that we should move retries per case (stake, unstake, all_val) into its own storages, and also use an ID, which can be the sender addr, so we can only check retries for that sender.

Also a question regarding the above, what should we do in the example:
"I stake and fail and next msg I unstake and fail."
I should not be able to unstake when im still retrying to stake.

Maybe we should keep stake and unstake retries shared under 1 storage, and when we call a msg, we first check if we are in a process of retrying, if we are still retrying we just fail the TX with "Still retrying last messages" sort of messages.
Basically allow to move on if we are not retrying.

ethanfrey · 2022-10-25T06:55:30Z

Good point @Art3miX

I mentioned in some comment that retries make sense for the ListValidators query, which is required for startup, to avoid any hickups on startup. This is valid global state - should only be called once.

But we should just handle stake and unstake failures locally. Stake by reverting the deposit (as if they never did anything, no unbonding period). Unstake by just ignoring failure for MVP and open an issue how to better communicate it (point is byzantine network should not be able to prevent honest provider from unstaking)

ethanfrey

Main point is removing retries from stake and unstake. I feel rather strong about stake - we try once and undo if it fails.

For unstake (where we don't take actions based on results), I see no harm in retrying, but it does seem unnecessary and if done properly will need a per-packet/sender counter, which is more complex.

ethanfrey · 2022-10-25T06:56:12Z

contracts/mesh-provider/src/contract.rs

    CONFIG.save(deps.storage, &state)?;
+    RETRIES.save(deps.storage, &RetryState {
+        list_validators_retries_remaining: LIST_VALIDATORS_MAX_RETRIES,
+        stake_retries_remaining: STAKE_MAX_RETRIES,


I would remove both of these stake/unstake retries

contracts/mesh-provider/src/ibc.rs

ethanfrey · 2022-10-25T07:00:46Z

contracts/mesh-provider/src/ibc.rs

+        return Ok(IbcBasicResponse::new().add_event(Event::new("ack_stake_zero_amount")));
+    }
+
+    let mut val = VALIDATORS


This is basically the code from https://github.com/CosmWasm/mesh-security/pull/38/files#diff-ba0374507dc354d977b251871cc89ff4552166c72b7b326edb14a9686ce8d5bdL118-L128 which is good.

I think the below point was not properly resolved (returning an event rather than erroring in ack), but this case should never happen.

contracts/mesh-provider/src/ibc.rs

ethanfrey · 2022-10-25T07:03:33Z

contracts/mesh-provider/src/ibc.rs

 ) -> Result<IbcBasicResponse, ContractError> {
-    // TODO: release the bonded stake, adjust numer
-    unimplemented!();
+    let mut retries = RETRIES.load(deps.storage)?;


Remove all retries logic here. Only on list_validators.

What we do need to do is actually return the stake to the user. We never updated our internal stake/validator book-keeping (that is done on ack_stake), but we did receive some bonded tokens from the lockup contract. We should tell that contract that this user will not be bonding this stake and can release it there. Otherwise, those tokens remain locked forever in failure mode.

(Look at logic called on successful claim, after unbonding, to see how to release it)

Sorry for my absence, I've been onboarding to a new job. Is it safe to assume we should be returning back
coin(amount.u128(), config.rewards_ibc_denom)?

contracts/mesh-provider/src/contract.rs

ethanfrey · 2022-10-25T07:05:21Z

contracts/mesh-provider/src/ibc.rs

 ) -> Result<IbcBasicResponse, ContractError> {
-    // TODO: unrelease the bonded stake, remove claim


I would make this a no-op.
No retries, no action needed.
Same as ack_unstake

JakeHartnell

Please rebase and implement the changes Ethan has suggested.

JakeHartnell requested review from ethanfrey and JakeHartnell October 7, 2022 17:54

JakeHartnell mentioned this pull request Oct 8, 2022

Implement mesh-provider IBC packet failure cases #37

Closed

3 tasks

JakeHartnell reviewed Oct 10, 2022

View reviewed changes

contracts/mesh-provider/src/ibc.rs Outdated Show resolved Hide resolved

JakeHartnell reviewed Oct 10, 2022

View reviewed changes

contracts/mesh-provider/src/ibc.rs Outdated Show resolved Hide resolved

JakeHartnell reviewed Oct 10, 2022

View reviewed changes

contracts/mesh-provider/src/contract.rs Outdated Show resolved Hide resolved

ethanfrey reviewed Oct 18, 2022

View reviewed changes

teddyknox commented Oct 19, 2022

View reviewed changes

contracts/mesh-provider/src/contract.rs Outdated Show resolved Hide resolved

contracts/mesh-provider/src/contract.rs Outdated Show resolved Hide resolved

contracts/mesh-provider/src/ibc.rs Show resolved Hide resolved

contracts/mesh-provider/src/ibc.rs Outdated Show resolved Hide resolved

teddyknox marked this pull request as ready for review October 19, 2022 13:46

JakeHartnell approved these changes Oct 21, 2022

View reviewed changes

Art3miX reviewed Oct 25, 2022

View reviewed changes

ethanfrey reviewed Oct 25, 2022

View reviewed changes

JakeHartnell requested changes Oct 31, 2022

View reviewed changes

Teddy Knox added 3 commits November 8, 2022 20:54

Handle mesh-provider ibc failure cases

5f4d211

Update with code review feedback from Jake, Ethan

abef98d

Update test

ac5de41

teddyknox force-pushed the mp-handle-ibc-failures branch from 546a3d6 to ac5de41 Compare November 9, 2022 02:45

Remove stake, unstake retries

8898193

This was referenced Dec 13, 2022

Continue mesh-provider IBC Failure Cases #67

Closed

Mesh provider IBC Failure #68

Merged

JakeHartnell merged commit 8f581c3 into CosmWasm:main Dec 14, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Handle mesh-provider IBC Failure Cases #38

Handle mesh-provider IBC Failure Cases #38

teddyknox commented Oct 7, 2022 •

edited

JakeHartnell commented Oct 7, 2022

ethanfrey commented Oct 18, 2022

ethanfrey left a comment

ethanfrey Oct 18, 2022

teddyknox left a comment

JakeHartnell left a comment

Art3miX left a comment

ethanfrey commented Oct 25, 2022 •

edited

ethanfrey left a comment

ethanfrey Oct 25, 2022

ethanfrey Oct 25, 2022

ethanfrey Oct 25, 2022

teddyknox Nov 9, 2022

ethanfrey Oct 25, 2022

JakeHartnell left a comment

		) -> Result<IbcBasicResponse, ContractError> {
		// TODO: unrelease the bonded stake, remove claim

Handle mesh-provider IBC Failure Cases #38

Handle mesh-provider IBC Failure Cases #38

Conversation

teddyknox commented Oct 7, 2022 • edited

Qs

JakeHartnell commented Oct 7, 2022

ethanfrey commented Oct 18, 2022

ethanfrey left a comment

Choose a reason for hiding this comment

ethanfrey Oct 18, 2022

Choose a reason for hiding this comment

teddyknox left a comment

Choose a reason for hiding this comment

JakeHartnell left a comment

Choose a reason for hiding this comment

Art3miX left a comment

Choose a reason for hiding this comment

ethanfrey commented Oct 25, 2022 • edited

ethanfrey left a comment

Choose a reason for hiding this comment

ethanfrey Oct 25, 2022

Choose a reason for hiding this comment

ethanfrey Oct 25, 2022

Choose a reason for hiding this comment

ethanfrey Oct 25, 2022

Choose a reason for hiding this comment

teddyknox Nov 9, 2022

Choose a reason for hiding this comment

ethanfrey Oct 25, 2022

Choose a reason for hiding this comment

JakeHartnell left a comment

Choose a reason for hiding this comment

teddyknox commented Oct 7, 2022 •

edited

ethanfrey commented Oct 25, 2022 •

edited