Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle mesh-provider IBC Failure Cases #38

Merged
merged 4 commits into from Dec 14, 2022

Conversation

teddyknox
Copy link

@teddyknox teddyknox commented Oct 7, 2022

EDIT 10/19

I went through and implemented some todos related to IBC failures in mesh-provider. The only unintuitive change involves a shift from unilateral staking to bilateral staking. That is, mesh-provider only updates its staking state after it receives confirmation via ack from mesh-consumer on the "stake" IBC call. By contrast, mesh-provider continues to offer unilateral unstaking, since we must assume that the consumer chain might be byzantine.

Qs

  • @ethanfrey is there a way to mock IBC calls for testing?

Closes #37

@JakeHartnell
Copy link
Collaborator

How should the mesh-provider ibc handler notify the UI that the stake/unstake operation failed? My intuition is that we'd want to publish an event...?

An event sounds reasonable to me.

After the list_validators IBC call has been attempted 5 times, we give up. Would it be desirable to offer an execution endpoint for restarting the retries?

Perhaps? Can probably hold off on this for now. It's easy enough to add later. Maybe @ethanfrey has different thoughts.

Should the retry count for list_validators be configurable or is a constant OK?

IMO constant is OK for now, but if you want to make it configurable we can.

@ethanfrey
Copy link
Member

I agree with @JakeHartnell comments on how it should work. Going to review the code now.

Copy link
Member

@ethanfrey ethanfrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made comments that handling of unstake should actually be like it was done. But stake is the one that should have such changes. And a bunch of minor points

contracts/mesh-provider/src/contract.rs Outdated Show resolved Hide resolved
contracts/mesh-provider/src/contract.rs Outdated Show resolved Hide resolved
contracts/mesh-provider/src/ibc.rs Show resolved Hide resolved
contracts/mesh-provider/src/ibc.rs Outdated Show resolved Hide resolved
contracts/mesh-provider/src/ibc.rs Outdated Show resolved Hide resolved
contracts/mesh-provider/src/ibc.rs Outdated Show resolved Hide resolved

let mut val = VALIDATORS
.may_load(deps.storage, &validator)?
.ok_or_else(|| ContractError::UnknownValidator(validator.clone()))?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we return Response with a custom event saying this?

contracts/mesh-provider/src/ibc.rs Outdated Show resolved Hide resolved
contracts/mesh-provider/src/ibc.rs Outdated Show resolved Hide resolved
contracts/mesh-provider/src/ibc.rs Outdated Show resolved Hide resolved
Copy link
Author

@teddyknox teddyknox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review Ethan. I've pushed with edits.

contracts/mesh-provider/src/contract.rs Outdated Show resolved Hide resolved
contracts/mesh-provider/src/contract.rs Outdated Show resolved Hide resolved
contracts/mesh-provider/src/ibc.rs Show resolved Hide resolved
contracts/mesh-provider/src/ibc.rs Outdated Show resolved Hide resolved
@teddyknox teddyknox marked this pull request as ready for review October 19, 2022 13:46
Copy link
Collaborator

@JakeHartnell JakeHartnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@Art3miX Art3miX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what should be covered under the POC, but RETRIES storage is shared "state", so if for example I stake and fail, the next successful stake call will reset RETRIES, or another example, if I stake and fail and you unstake and fail, we both will have 5 retries instead of 5 per each.

I think (and I might be wrong) that we should move retries per case (stake, unstake, all_val) into its own storages, and also use an ID, which can be the sender addr, so we can only check retries for that sender.

Also a question regarding the above, what should we do in the example:
"I stake and fail and next msg I unstake and fail."
I should not be able to unstake when im still retrying to stake.

Maybe we should keep stake and unstake retries shared under 1 storage, and when we call a msg, we first check if we are in a process of retrying, if we are still retrying we just fail the TX with "Still retrying last messages" sort of messages.
Basically allow to move on if we are not retrying.

@ethanfrey
Copy link
Member

ethanfrey commented Oct 25, 2022

Good point @Art3miX

I mentioned in some comment that retries make sense for the ListValidators query, which is required for startup, to avoid any hickups on startup. This is valid global state - should only be called once.

But we should just handle stake and unstake failures locally. Stake by reverting the deposit (as if they never did anything, no unbonding period). Unstake by just ignoring failure for MVP and open an issue how to better communicate it (point is byzantine network should not be able to prevent honest provider from unstaking)

Copy link
Member

@ethanfrey ethanfrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main point is removing retries from stake and unstake. I feel rather strong about stake - we try once and undo if it fails.

For unstake (where we don't take actions based on results), I see no harm in retrying, but it does seem unnecessary and if done properly will need a per-packet/sender counter, which is more complex.

CONFIG.save(deps.storage, &state)?;
RETRIES.save(deps.storage, &RetryState {
list_validators_retries_remaining: LIST_VALIDATORS_MAX_RETRIES,
stake_retries_remaining: STAKE_MAX_RETRIES,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove both of these stake/unstake retries

contracts/mesh-provider/src/ibc.rs Show resolved Hide resolved
return Ok(IbcBasicResponse::new().add_event(Event::new("ack_stake_zero_amount")));
}

let mut val = VALIDATORS
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is basically the code from https://github.com/CosmWasm/mesh-security/pull/38/files#diff-ba0374507dc354d977b251871cc89ff4552166c72b7b326edb14a9686ce8d5bdL118-L128 which is good.

I think the below point was not properly resolved (returning an event rather than erroring in ack), but this case should never happen.

contracts/mesh-provider/src/ibc.rs Outdated Show resolved Hide resolved
) -> Result<IbcBasicResponse, ContractError> {
// TODO: release the bonded stake, adjust numer
unimplemented!();
let mut retries = RETRIES.load(deps.storage)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove all retries logic here. Only on list_validators.

What we do need to do is actually return the stake to the user. We never updated our internal stake/validator book-keeping (that is done on ack_stake), but we did receive some bonded tokens from the lockup contract. We should tell that contract that this user will not be bonding this stake and can release it there. Otherwise, those tokens remain locked forever in failure mode.

(Look at logic called on successful claim, after unbonding, to see how to release it)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for my absence, I've been onboarding to a new job. Is it safe to assume we should be returning back
coin(amount.u128(), config.rewards_ibc_denom)?

contracts/mesh-provider/src/contract.rs Show resolved Hide resolved
) -> Result<IbcBasicResponse, ContractError> {
// TODO: unrelease the bonded stake, remove claim
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make this a no-op.
No retries, no action needed.
Same as ack_unstake

Copy link
Collaborator

@JakeHartnell JakeHartnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rebase and implement the changes Ethan has suggested.

@JakeHartnell JakeHartnell merged commit 8f581c3 into CosmWasm:main Dec 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement mesh-provider IBC packet failure cases
4 participants