New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle mesh-provider IBC Failure Cases #38
Conversation
An event sounds reasonable to me.
Perhaps? Can probably hold off on this for now. It's easy enough to add later. Maybe @ethanfrey has different thoughts.
IMO constant is OK for now, but if you want to make it configurable we can. |
I agree with @JakeHartnell comments on how it should work. Going to review the code now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made comments that handling of unstake should actually be like it was done. But stake is the one that should have such changes. And a bunch of minor points
|
||
let mut val = VALIDATORS | ||
.may_load(deps.storage, &validator)? | ||
.ok_or_else(|| ContractError::UnknownValidator(validator.clone()))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we return Response with a custom event saying this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review Ethan. I've pushed with edits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what should be covered under the POC, but RETRIES
storage is shared "state", so if for example I stake and fail, the next successful stake call will reset RETRIES
, or another example, if I stake and fail and you unstake and fail, we both will have 5 retries instead of 5 per each.
I think (and I might be wrong) that we should move retries per case (stake, unstake, all_val) into its own storages, and also use an ID, which can be the sender addr, so we can only check retries for that sender.
Also a question regarding the above, what should we do in the example:
"I stake and fail and next msg I unstake and fail."
I should not be able to unstake when im still retrying to stake.
Maybe we should keep stake and unstake retries shared under 1 storage, and when we call a msg, we first check if we are in a process of retrying, if we are still retrying we just fail the TX with "Still retrying last messages" sort of messages.
Basically allow to move on if we are not retrying.
Good point @Art3miX I mentioned in some comment that retries make sense for the ListValidators query, which is required for startup, to avoid any hickups on startup. This is valid global state - should only be called once. But we should just handle stake and unstake failures locally. Stake by reverting the deposit (as if they never did anything, no unbonding period). Unstake by just ignoring failure for MVP and open an issue how to better communicate it (point is byzantine network should not be able to prevent honest provider from unstaking) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Main point is removing retries from stake and unstake. I feel rather strong about stake - we try once and undo if it fails.
For unstake (where we don't take actions based on results), I see no harm in retrying, but it does seem unnecessary and if done properly will need a per-packet/sender counter, which is more complex.
CONFIG.save(deps.storage, &state)?; | ||
RETRIES.save(deps.storage, &RetryState { | ||
list_validators_retries_remaining: LIST_VALIDATORS_MAX_RETRIES, | ||
stake_retries_remaining: STAKE_MAX_RETRIES, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would remove both of these stake/unstake retries
return Ok(IbcBasicResponse::new().add_event(Event::new("ack_stake_zero_amount"))); | ||
} | ||
|
||
let mut val = VALIDATORS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is basically the code from https://github.com/CosmWasm/mesh-security/pull/38/files#diff-ba0374507dc354d977b251871cc89ff4552166c72b7b326edb14a9686ce8d5bdL118-L128 which is good.
I think the below point was not properly resolved (returning an event rather than erroring in ack), but this case should never happen.
) -> Result<IbcBasicResponse, ContractError> { | ||
// TODO: release the bonded stake, adjust numer | ||
unimplemented!(); | ||
let mut retries = RETRIES.load(deps.storage)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove all retries logic here. Only on list_validators.
What we do need to do is actually return the stake to the user. We never updated our internal stake/validator book-keeping (that is done on ack_stake
), but we did receive some bonded tokens from the lockup contract. We should tell that contract that this user will not be bonding this stake and can release it there. Otherwise, those tokens remain locked forever in failure mode.
(Look at logic called on successful claim, after unbonding, to see how to release it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for my absence, I've been onboarding to a new job. Is it safe to assume we should be returning back
coin(amount.u128(), config.rewards_ibc_denom)
?
) -> Result<IbcBasicResponse, ContractError> { | ||
// TODO: unrelease the bonded stake, remove claim |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make this a no-op.
No retries, no action needed.
Same as ack_unstake
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please rebase and implement the changes Ethan has suggested.
546a3d6
to
ac5de41
Compare
EDIT 10/19
I went through and implemented some todos related to IBC failures in mesh-provider. The only unintuitive change involves a shift from unilateral staking to bilateral staking. That is, mesh-provider only updates its staking state after it receives confirmation via ack from mesh-consumer on the "stake" IBC call. By contrast, mesh-provider continues to offer unilateral unstaking, since we must assume that the consumer chain might be byzantine.
Qs
Closes #37