
Time Monotonicity Enforcement #141

Merged
merged 34 commits into main from alderfly-ibc-fix on May 10, 2021

Conversation

@AdityaSripal (Member) commented Apr 26, 2021

Description

  • enforce time monotonicity (a sketch of the condition follows this description)
  • create time misbehaviour
  • early return on identical updates
  • misbehaviour detection within UpdateClient

closes: #XXXX
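For context on the first bullet, a minimal sketch of the time misbehaviour condition being enforced (the helper name and accessors are illustrative, not the PR's exact code): two otherwise-valid headers constitute misbehaviour when height increases but time does not.

// Sketch only: h1 is assumed to be the header submitted at the
// greater height. Time must strictly increase with height.
func isTimeMisbehaviour(h1, h2 *Header) bool {
    if h1.GetHeight().GT(h2.GetHeight()) && !h1.GetTime().After(h2.GetTime()) {
        return true // time stalled or went backwards as height increased
    }
    return false
}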


Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.

  • Targeted PR against correct branch (see CONTRIBUTING.md)
  • Linked to GitHub issue with discussion and accepted design OR link to spec that describes this work.
  • Code follows the module structure standards.
  • Wrote unit and integration tests
  • Updated relevant documentation (docs/) or specification (x/<module>/spec/)
  • Added relevant godoc comments.
  • Added a relevant changelog entry to the Unreleased section in CHANGELOG.md
  • Re-reviewed Files changed in the GitHub PR explorer
  • Review Codecov Report in the comment section below once CI passes

@codecov-commenter commented Apr 26, 2021

Codecov Report

Merging #141 (76e932a) into main (5b9e3c0) will increase coverage by 13.06%.
The diff coverage is 95.40%.


@@             Coverage Diff             @@
##             main     #141       +/-   ##
===========================================
+ Coverage   65.92%   78.99%   +13.06%     
===========================================
  Files         131      109       -22     
  Lines        8382     6516     -1866     
===========================================
- Hits         5526     5147      -379     
+ Misses       2476     1009     -1467     
+ Partials      380      360       -20     
Impacted Files Coverage Δ
...clients/07-tendermint/types/misbehaviour_handle.go 85.45% <69.23%> (-5.85%) ⬇️
modules/core/02-client/keeper/client.go 98.52% <100.00%> (+0.25%) ⬆️
.../light-clients/07-tendermint/types/client_state.go 70.00% <100.00%> (ø)
...odules/light-clients/07-tendermint/types/header.go 100.00% <100.00%> (ø)
.../light-clients/07-tendermint/types/misbehaviour.go 72.13% <100.00%> (-0.89%) ⬇️
modules/light-clients/07-tendermint/types/store.go 89.24% <100.00%> (+0.35%) ⬆️
...odules/light-clients/07-tendermint/types/update.go 81.41% <100.00%> (+3.52%) ⬆️
testing/simapp/encoding.go
testing/simapp/export.go
testing/app.go
... and 19 more

@AdityaSripal (Member, Author) left a comment

I've decided to keep the FrozenHeight as the height of the offending header that caused the misbehaviour. For time misbehaviour I use the larger of the two heights. Since we no longer use its specific value, it doesn't really matter so long as it's non-zero. @colin-axner argues for just making the height {0, 1}, since it isn't used for anything other than a boolean value.

I figure if we're keeping it, we may as well make it meaningful. Care to weigh in, @cwgoes?
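For reference, the {0, 1} sentinel @colin-axner suggests would look roughly like this (a sketch; the constant name is illustrative):

// Any non-zero FrozenHeight means "frozen"; {0, 1} deliberately
// carries no meaning beyond that boolean.
var FrozenHeight = clienttypes.NewHeight(0, 1)

func (cs ClientState) IsFrozen() bool {
    return !cs.FrozenHeight.IsZero()
}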

// If client state is not frozen after clientState CheckHeaderAndUpdateState,
// then write the update state changes, and set new consensus state.
// Else the update was proof of misbehaviour and we must emit appropriate misbehaviour events.
if !newClientState.IsFrozen() {
@AdityaSripal (Member, Author)

There was too much confusion regarding the separation of responsibilities for detecting misbehaviour here: a conflicting header can be detected at this point, but a time monotonicity violation can't be. Thus, it makes more sense to make client developers responsible for doing this correctly, so that we have a clear separation of responsibility.

Here I just check whether the new client state is frozen, and emit the appropriate events and state writes accordingly.

I think the resulting code is cleaner.
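Roughly, the keeper-side flow described here looks like the following (a simplified sketch, not the exact diff; the event emission lines are placeholders):

newClientState, newConsensusState, err := clientState.CheckHeaderAndUpdateState(ctx, k.cdc, clientStore, header)
if err != nil {
    return err
}

if !newClientState.IsFrozen() {
    // valid update: persist the new client and consensus state
    k.SetClientState(ctx, clientID, newClientState)
    k.SetClientConsensusState(ctx, clientID, header.GetHeight(), newConsensusState)
    // emit an update-client event (placeholder)
} else {
    // the update was proof of misbehaviour: persist only the frozen
    // client state and emit a misbehaviour event instead
    k.SetClientState(ctx, clientID, newClientState)
    // emit a client-misbehaviour event (placeholder)
}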

@colin-axner (Contributor)

I don't understand why we cache the context if it is now the full responsibility of the client developers to handle all instances of misbehaviour correctly.

Solo machines store the consensus state in the client state, so the only protection the cached context adds is against metadata writes. It still seems confusing to me, since client developers should be very aware not to write unwanted state changes for an update which is actually evidence of misbehaviour. What if a client wanted to write metadata every time it handled misbehaviour in an update client message?

We should either be as defensive as possible, by assuming client developers miss checks, or be as explicit as possible in saying it is entirely the responsibility of the client developer. If we cache the context, then I think we might as well do the duplicate consensus state check (and return an error if a duplicate update is successful).

I'd actually prefer to be as defensive as possible. In which case, we should keep the cached context and return an error if a duplicate update occurs without the client detecting misbehaviour.

Regardless, these requirements should be clearly documented in a light_client.md under docs/. These are subtle checks that are essential for security.

@AdityaSripal (Member, Author)

I don't understand why we cache the context if it is now the full responsibility of the client developers to handle all instances of misbehaviour correctly

They are responsible for telling core IBC if an update was misbehaviour. They are not responsible for rolling back all state changes.

What if a client wanted to write metadata every time it handled misbehaviour in an update client message?

This is definitely possible; I guess it's up to us what we want to enable. The downside is accidentally leaving in metadata writes after misbehaviour that we intend to write only for valid updates. I think @cwgoes can weigh in on the tradeoff between flexibility and opinionated code. I believe being opinionated here, and having the ClientKeeper write metadata on a valid update, makes more sense. Light client implementations are fully responsible for doing the update logic (UpdateClient will do none of that), but the ClientKeeper will take the returned output and do all of the necessary store writes. I think that's a clean separation of responsibility.

I'd actually prefer to be as defensive as possible. In which case, we should keep the cached context and return an error if a duplicate update occurs without the client detecting misbehaviour

I tried doing this and the code got super ugly, because there was freezing logic in both the ClientKeeper and in tendermint's update function. It would have been much cleaner if that logic lived in just one place.

Furthermore, I think it's possible to take all client developer checks that must be done by every light client and put them in the ClientKeeper to minimize the possibility of light-client developer error.
But I think in practice, this would make things less secure if it trades off too much on separation of concerns.
Critically, I think it just needs to be clear to a reviewer/developer where a particular check is supposed to happen.
My proposal is that we create a very clear separation of concern that acts as a contract between core IBC and light client developer.

Light client implementation must give core IBC the updated clientstate/consensus state. And it must return a frozen client state if the update was evidence of misbehaviour.
Core IBC will in turn store the clientstate (and consensus if valid update), write all callback state changes on successful updates, and emit appropriate events.

This means that there may be redundant checks happening across light clients, and that some implementations may miss them. But it gives a very clear rule for what a light client implementation is responsible for. Even though I place responsibility for all misbehaviour checks on the light client, as a reader and reviewer I can analyze the light-client implementation in isolation and check that it is catching all misbehaviour and holding up its side of the bargain.

Without this, I need to check whether the ClientKeeper and the light client's misbehaviour handling together are catching all misbehaviour, make sure they don't leave a gap between them, and verify that they aren't doing redundant checks. It also gets harder, as time goes on, to determine where a check should go: we would need to make a subjective decision on whether we think a given check is universal or not.

For these reasons I think clear separation of concerns is more important than putting all universal checks (even subtle ones) in the ClientKeeper. But yes, this should absolutely be documented in light_client.md. Will do so once there's consensus on this point.
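The contract proposed above, expressed as a rough interface sketch (simplified from the exported interfaces, not verbatim):

// The light client's side of the contract: return the updated (possibly
// frozen) state; core IBC handles all persistence and events.
type ClientState interface {
    // Must return a frozen client state if the update was
    // evidence of misbehaviour.
    CheckHeaderAndUpdateState(ctx sdk.Context, cdc codec.BinaryCodec, store sdk.KVStore, header Header) (ClientState, ConsensusState, error)
    IsFrozen() bool
}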

@AdityaSripal (Member, Author)

Regardless of whether we allow metadata writes on misbehaviour, we still want to cache so we can discard on error.

Developers shouldn't be forced to revert state themselves on error
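For readers following along, the caching under discussion is the SDK's cached context: writes against it only reach the underlying store when the returned write function is called. A sketch of the pattern, not the exact diff:

cacheCtx, writeFn := ctx.CacheContext()
clientStore := k.ClientStore(cacheCtx, clientID) // store must be derived from the cached ctx

newClientState, newConsensusState, err := clientState.CheckHeaderAndUpdateState(cacheCtx, k.cdc, clientStore, header)
if err != nil {
    return err // writes against cacheCtx are simply dropped
}
if !newClientState.IsFrozen() {
    writeFn() // commit the client's state changes (e.g. metadata)
}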

@colin-axner (Contributor)

I think you make great points.

it must return a frozen client state if the update was evidence of misbehaviour.

I agree with this.

As a reader and reviewer I can analyze the light-client implementation in isolation and check that it is catching all misbehaviour and holding up its side of the bargain.

I like this, and I think we can still achieve this with a duplicate check. My concern is that allowing a duplicate update at an existing height is a critical security vulnerability and I'm hesitant to let it go by when we have the capacity to do the check. This is the code I have in mind:

consState, exists := keeper.GetConsensusState(ctx, clientID, header.GetHeight())

newClientState, newConsensusState, err := clientState.CheckHeaderAndUpdateState(ctx, cdc, clientStore, header)
if err != nil {
    return err
}

// write client state; errors returned later revert state changes

switch {
case newClientState.IsFrozen():
    // use the logic you have
case exists && !reflect.DeepEqual(consState, newConsensusState):
    // light client implementation missed misbehaviour handling
    return fmt.Errorf("conflicting consensus state for height %s", header.GetHeight())
default:
    // regular update code
}

I don't see why this code gets ugly. It allows light clients to fully implement misbehaviour logic without relying on 02-client, and it allows 02-client to prevent duplicate updates which are misbehaviour.

@colin-axner (Contributor)

Regardless of whether we allow metadata writes on misbehaviour, we still want to cache so we can discard on error.

Developers shouldn't be forced to revert state themselves on error

Do you have in mind a use case where update is called by an external module? Messages that result in errors always have their state changes reverted by baseapp. I think this is a safe assumption to make.

@AdityaSripal (Member, Author)

Do you have in mind a use case where update is called by an external module? Messages that result in errors always have their state changes reverted by baseapp. I think this is a safe assumption to make.

Oh yes, you're correct about this. We should only cache if we discard metadata on misbehaviour.

@cwgoes (Contributor)

My concern is that allowing a duplicate update at an existing height is a critical security vulnerability and I'm hesitant to let it go by when we have the capacity to do the check.

Here's a question - is this always true? Certainly it is a problem if unequal consensus states at the same height would allow for violation of exactly-once packet delivery guarantees or timeouts, but there could conceivably be client types which allow duplicate consensus states, just not verification at them (so they are only intermediate update points) - for example, a (non-Tendermint) consensus algorithm could have a block history which looks like this:

[diagram: a block history in which multiple blocks share a height as intermediate update points]

Is this a case we want to consider? There is something to be said for not overly constraining what it means for clients to be "correct", since clients implement all of the packet data / timeout / etc. verification functions anyways.

@colin-axner (Contributor)

Great question! I didn't realize intermediate update points were a possibility.

In light of our discussion yesterday, I don't see the usefulness of adding this check if, in the near future, light client implementations will be responsible for getting/setting client and consensus states. In this design, light clients should definitely take care to guard against duplicate updates which constitute misbehaviour.

@colin-axner added this to the 1.0.0 milestone Apr 27, 2021

@colin-axner (Contributor) left a comment

Overall LGTM, nice work!

Resolved review threads on:

  • modules/core/02-client/keeper/client.go
  • modules/light-clients/07-tendermint/types/update_test.go
@colin-axner (Contributor)
changelog entry is needed (I keep forgetting as well)

@AdityaSripal mentioned this pull request Apr 28, 2021
@colin-axner mentioned this pull request Apr 29, 2021
@colin-axner (Contributor) left a comment

LGTM, will approve once we resolve decisions on:

  • using cache ctx (no preference)
  • returning error if client did not handle misbehaviour (duplicate update produces different consensus state)

I have a preference for doing the consensus state check, but willing to concede if others consider it unnecessary

Resolved review threads on modules/core/02-client/keeper/client_test.go
@cwgoes (Contributor) commented Apr 30, 2021

I figure if we're keeping it, we may as well make it meaningful. Care to weigh in @cwgoes ?

Just to refresh my memory, the goal here is backwards compatibility, right? Otherwise it seems odd to keep a field with a range (an integer) far greater than the set of semantically meaningful values (zero vs. non-zero). For backwards compatibility I guess there isn't too much difference between 0, 1 and a non-zero height, although the latter runs the risk of appearing more semantically relevant than it actually is, and we don't really want IBC users to start using this value for other kinds of data processing; if they really care about misbehaviour heights they should read them from the event logs. Those are both (weak) reasons to prefer 0, 1, I think.

@colin-axner (Contributor) left a comment

Excellent work! In light of our discussion yesterday, I think the current solution aligns perfectly with future changes.

@colin-axner (Contributor)
@AdityaSripal I fixed the merge conflicts. Everything was straightforward except for the check in update client for whether the client is frozen. I updated the code to be semantically equivalent:

Before

if !client.IsFrozen() {

} else {

}

After

if status := client.Status(ctx, clientStore, cdc); status != exported.Frozen {

} else {

}

It is just slightly odd because we aren't checking if the client is active, just that it is not frozen
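For what it's worth, an explicit Active check (an illustrative variant, not what was merged) would avoid that oddness:

if status := client.Status(ctx, clientStore, cdc); status == exported.Active {
    // regular update path
} else {
    // frozen, expired, or unknown
}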

@cwgoes (Contributor) left a comment

ACK modulo quick confirmation question

@colin-axner enabled auto-merge (squash) May 10, 2021 11:02
@colin-axner merged commit 2d3132f into main May 10, 2021
@colin-axner deleted the alderfly-ibc-fix branch May 10, 2021 11:11
faddat referenced this pull request in notional-labs/ibc-go Feb 23, 2022
faddat referenced this pull request in notional-labs/ibc-go Mar 1, 2022