fix(CIP-145): updates from forum discussion #149

Open
wants to merge 4 commits into
base: main

smrz2001
Contributor

@smrz2001 smrz2001 commented Dec 15, 2023

Updates based on this forum discussion.

cc @m0ar

@smrz2001 smrz2001 requested a review from oed December 15, 2023 23:46
@smrz2001 smrz2001 changed the title fix: pairing updates from forum discussion fix: updates from forum discussion Dec 16, 2023
@smrz2001 smrz2001 changed the title fix: updates from forum discussion fix(CIP-145): updates from forum discussion Dec 16, 2023
CIPs/cip-145.md Outdated
Using multi-prev Data Events allows us to reduce the number of uncovered events and converge the stream so that there is
only a single uncovered event, without any data abandoned on pruned branches. The stream's converged/diverged state can
be determined by looking at the `prev` fields of all the Data Events for that stream.
1. If a stream is in a diverged state (see events `A` and `B` in fig. 5), we consider branches that contain dominant
Member

Why would time events get pruned?
Consider Fig. 6:

  • Data B expires at time 3<n<4
  • Data C has a time event at 4
  • If Time 3 is pruned, then it's unclear why Data B should remain valid

@m0ar m0ar Dec 19, 2023

Good point. Since the data A branch is the tip according to the rest of the rules in fig 4, I'm not sure why we should try to merge the descendants of T1 🤔
image

Contributor Author

Yes, good question. @AaronGoldman and I had a good discussion about this scenario.
image
In this example, Data C can only be created after Data B has been validated to either be within the expiration timeout or to already have a valid Time Event.

We can assume that the author of Data C validated Data B because Data C follows Data B.

Of course, if Time 3 for Data B is available before Data C is created, the CID of Time 3 will be included in Data C's multi-prev.

To your point, @m0ar, including Time 1's descendants allows multi-prev to cover existing history (even if pruned) in the stream state, whereas today, pruned events are lost from the stream state.

Member

> We can assume that the author of Data C validated Data B because Data C follows Data B.

Can we really make this assumption? In theory the author of Data C could "import" Data B into the log even though it's invalid, e.g. Time 3 doesn't exist or is too late.

Contributor Author

> > We can assume that the author of Data C validated Data B because Data C follows Data B.
>
> Can we really make this assumption? In theory the author of Data C could "import" Data B into the log even though it's invalid, e.g. Time 3 doesn't exist or is too late.

Events like Time 3 are definitely worth pinning, and potentially worth including in the multi-prev of subsequent Data Events.

We feel that the validity of Data C does not require proof of validity of Data A or Data B. However, establishing the latter's time bounds does require recording Time 2 and Time 3. Time 2 is already part of Data C's history. Including Time 3 in a future multi-prev would ensure that it also becomes part of the stream history.

Member

> We feel that the validity of Data C does not require proof of validity of Data A or Data B. However, establishing the latter's time bounds does require recording Time 2 and Time 3. Time 2 is already part of Data C's history. Including Time 3 in a future multi-prev would ensure that it also becomes part of the stream history.

Agree that Data C's validity is not predicated on the validity of A or B. However, the validity of Data B is predicated on Time 3 being available. Therefore, it doesn't seem right to say that we can prune Time 3.

Contributor Author

> > We feel that the validity of Data C does not require proof of validity of Data A or Data B. However, establishing the latter's time bounds does require recording Time 2 and Time 3. Time 2 is already part of Data C's history. Including Time 3 in a future multi-prev would ensure that it also becomes part of the stream history.
>
> Agree that Data C's validity is not predicated on the validity of A or B. However, the validity of Data B is predicated on Time 3 being available. Therefore, it doesn't seem right to say that we can prune Time 3.

Yes, what would you think about including Time 3 in the multi-prev of the next event to be added to the stream? That's what we meant to say here:

> Events like Time 3 are definitely worth pinning, and potentially worth including in the multi-prev of subsequent Data Events.

What if a new event Data D (occurring after Time 4) had a prev of [Time 4, Time 3]? Even though Time 3 will not take precedence during tip selection, it will always remain part of the stream history.

Or, let's say, Time 3 didn't show up until we already had Time 4 -> Data D -> Time 5 -> Data E -> Time 6. Then Data F (occurring after Time 6) would have a prev of [Time 6, Time 3]. Data B would remain in an "unverified" state until Time 3 was discovered.

Tracking Data B's validity this way would be a little more complicated than the usual flow, but always possible. Moreover, now all events related to the stream would be part of the DAG, which is great.
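
To make the `[Time 4, Time 3]` scenario concrete, here is a minimal Python sketch. The event IDs, the `Event` class, and the exact `prev` layout are assumptions made for illustration (they are not taken from the CIP figures); the point is only that Time 3 becomes reachable from the tip once a later Data Event lists it in its multi-prev.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class Event:
    kind: str              # "data" or "time"
    prev: Tuple[str, ...]  # IDs of the events this event points back to


def history(events: Dict[str, Event], tip: str) -> Set[str]:
    """Collect every event reachable from `tip` by following `prev` links."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        event_id = stack.pop()
        if event_id not in seen:
            seen.add(event_id)
            stack.extend(events[event_id].prev)
    return seen


def is_anchored(events: Dict[str, Event], tip: str, data_id: str) -> bool:
    """True if the history of `tip` holds a Time Event whose prev includes `data_id`."""
    return any(
        events[e].kind == "time" and data_id in events[e].prev
        for e in history(events, tip)
    )


# Hypothetical layout, loosely following the discussion above.
EVENTS = {
    "data_a": Event("data", ()),
    "time_2": Event("time", ("data_a",)),
    "data_b": Event("data", ("time_2",)),
    "time_3": Event("time", ("data_b",)),           # covers Data B, discovered late
    "data_c": Event("data", ("data_b",)),           # created before Time 3 was known
    "time_4": Event("time", ("data_c",)),
    "data_d": Event("data", ("time_4", "time_3")),  # multi-prev picks up Time 3
}

print(is_anchored(EVENTS, "time_4", "data_b"))  # False: Time 3 is not in the history yet
print(is_anchored(EVENTS, "data_d", "data_b"))  # True: Time 3 is now part of the DAG
```

Under these assumptions, Data B stays "unverified" for anyone who only sees the history of Time 4, and becomes verifiable as soon as Data D is the tip.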

Member

Yes, this is what I mean! Just wanted to be clear that Time 3 can't be pruned if we expect the creator of Data D to include it in prev.

Member

@smrz2001 since it seems that we are in agreement, maybe you can update the language to be clear that the Time Event doesn't get pruned?

CIPs/cip-145.md Outdated
Comment on lines 62 to 63
3. The Data Event that is covered by the earliest Time Event wins (see event `A` in fig. 5).
4. If two Data Events share the earliest timestamp, then the branch of the Data Event with the lower CID wins.
Member

What does it mean for a Data Event to "win" in the context of multiple prev?

Contributor Author

@smrz2001 smrz2001 Dec 19, 2023

We updated the wording. A Data Event "winning" here meant that that Data Event would be marked the tip by the protocol. An application would be able to make use of this information to create a merge Data Event, though other candidate CIDs would also be present in the prev field.

Member

Ok I don't understand why we need to distinguish between winning and non-winning tips? Why not just call them all tips and put them all in the prev field? I don't follow why the protocol needs to care about "winning"?

Member

E.g. the protocol just gives the application a list of tips. It's up to the application to decide what is "winning" and what's not. It's also up to the application to choose the order of tips in its prev array when it's being constructed.
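
A minimal Python sketch of what "a list of tips" could look like, assuming a tip is simply an uncovered event, i.e. one that no other event references in its `prev` field. The function name and the event IDs are hypothetical.

```python
from typing import Dict, List


def uncovered_events(prevs: Dict[str, List[str]]) -> List[str]:
    """Return the events that no other event lists in its `prev` field."""
    covered = {parent for parents in prevs.values() for parent in parents}
    return sorted(event for event in prevs if event not in covered)


# Hypothetical stream: two branches fork after "init".
diverged = {
    "init": [],
    "data_a": ["init"],
    "data_b": ["init"],
}

# A merge event with a multi-prev covers both branches again.
converged = {**diverged, "merge": ["data_a", "data_b"]}

print(uncovered_events(diverged))   # ['data_a', 'data_b'] -> diverged, two candidate tips
print(uncovered_events(converged))  # ['merge'] -> converged, single tip
```

In this sketch the stream is converged exactly when the list has one entry; how the application orders or ranks multiple entries is left open.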

Copy link

I get where you're going, but I'm not sure how well interoperability would work if two projects have different ideas on tip consensus 🤔

Or by application here, do you mean the Ceramic node? As in, the stream type implementation would decide on how to solve conflicts? I think that would make sense if so, as there may be other valid interpretations of this depending on the stream type.

Member

Yes, when I say protocol here I'm referring to the event streaming protocol. A stream type handler is an application.

Copy link

Makes full-on sense then 👌

Contributor Author

> Ok I don't understand why we need to distinguish between winning and non-winning tips? Why not just call them all tips and put them all in the prev field? I don't follow why the protocol needs to care about "winning"?
>
> E.g. the protocol just gives the application a list of tips. It's up to the application to decide what is "winning" and what's not. It's also up to the application to choose the order of tips in its prev array when it's being constructed.

To answer your question, @oed, there are two reasons for this:

  • It is simpler for applications to just be given a tip per some default, predictable algorithm. They can choose to override this order, but don't have to.
  • There is always some eventually consistent state of a diverged stream, even if the controller never comes back to create the Merge Event, because the default precedence rules can be used to determine the tip.

Member

Wouldn't the application need to be given all tips anyway? In order to include them all in the prev array? Are we simply talking about the ordering of the CIDs in the returned array here?

CIPs/cip-145.md Outdated
3. This branch later forks into additional branches for `Data E`, `Time 4`, and `Data F`. Based on rules (3) and (4),
the branches for `Data E` and `Data F` are the only branches considered for tip selection.
![Alt text](../assets/cip-145/rules4.png)
4. Based on rule (5), since there is a Time Event for `Data E` but not for `Data F`, only the branch for `Data E` is
Copy link

I think this should be rule 4.

Suggested change
4. Based on rule (5), since there is a Time Event for `Data E` but not for `Data F`, only the branch for `Data E` is
4. Based on rule (4), since there is a Time Event for `Data E` but not for `Data F`, only the branch for `Data E` is

Copy link

Can we end up in the case where there is a yet-unknown, earlier anchor for the other branch in transit?

Contributor Author

Good catch, yes, it should be rule 4. It is time that made the decision because a Data Event without a Time Event is as if it occurred at time infinity.

> Can we end up in the case where there is a yet-unknown, earlier anchor for the other branch in transit?

Yes, that's possible. If there were an as-yet-unknown Time 5.5 corresponding to Data F, then Time 5.5 would become the tip.

While a lot less likely in the absence of a malicious CAS attempting a late-publishing attack, it is also possible, for example, for a Time 1.5 covering Data B to be discovered late. This would rewind the state of the stream, marking Time 5 the tip.

Having said that, this spec provides a way for the application to resolve such a situation without data loss. A user can decide whether to override the default tip with a new event, while keeping the stream history intact.

CIPs/cip-145.md Outdated
be determined by looking at the `prev` fields of all the Data Events for that stream.
1. If a stream is in a diverged state, each uncovered event is a candidate tip.
2. Branches that do not contain dominant Data Events cannot be the tip.
3. For branches that contain dominant Data Events, consider the earliest Data Event after a fork point.
Copy link

Would be nice to clarify that it's not just this event that will be included in the new tip, but its corresponding branch. I like first more than earliest, because the latter made me think of anchor time instead of ordering.

Suggested change
3. For branches that contain dominant Data Events, consider the earliest Data Event after a fork point.
3. For branches that contain dominant Data Events, consider the first Data Event after a fork point when electing a new tip branch.

Contributor Author

Yes, we actually did mean to refer to anchor time here. This helps keep the language consistent with other places.

We hope this earlier definition of a fork point helps clarify what we mean:

* A `fork point` for a branch is the earliest event on that branch that is not on another branch. 

CIPs/cip-145.md Outdated
1. If a stream is in a diverged state, each uncovered event is a candidate tip.
2. Branches that do not contain dominant Data Events cannot be the tip.
3. For branches that contain dominant Data Events, consider the earliest Data Event after a fork point.
4. The branch with the earliest Data Event becomes the tip.
Copy link

It might be worth mentioning the case for step 4 in the example below, where one candidate branch is anchored and one isn't. Is the anchored one earlier by definition?

Contributor Author

Yes, the anchored one is earlier by definition. A Data Event without a Time Event is as if it occurred at time infinity.

We'll update the rule to state this.
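
A minimal Python sketch of that ordering, under stated assumptions: each candidate branch is represented by the CID of its earliest Data Event after the fork point, mapped to the timestamp of the covering Time Event (or `None` if it has none). The `select_tip` name, the plain string comparison of CIDs, and the sample values are all hypothetical and not taken from the CIP text.

```python
import math
from typing import Dict, Optional


def select_tip(candidates: Dict[str, Optional[float]]) -> str:
    """Pick the candidate with the earliest anchor time.

    A candidate without a Time Event is treated as anchored at time infinity.
    Ties on the timestamp are broken by the lower CID (string order here).
    """
    def sort_key(cid: str):
        anchored_at = candidates[cid]
        return (math.inf if anchored_at is None else anchored_at, cid)

    return min(candidates, key=sort_key)


# Anchored beats unanchored, whatever the CIDs are.
print(select_tip({"cid_f": None, "cid_e": 5.0}))    # cid_e

# Same timestamp: the lower CID wins.
print(select_tip({"bafy_b": 3.0, "bafy_a": 3.0}))   # bafy_a

# A late-discovered earlier anchor changes the outcome.
print(select_tip({"cid_e": 5.0, "cid_f": 4.5}))     # cid_f
```

The last call mirrors the late-publishing concern raised elsewhere in this review: the result is deterministic for a given set of known Time Events, but it can change when an earlier one shows up.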

m0ar commented Dec 19, 2023

In general, these are the most important features from my perspective:

  • All known commits, pruned or not, are pinned by the node
  • A new node can learn about all known commits, pruned or not, from other nodes

The reason is that a historical shuffle should not prevent resolution of a once-valid commit, which we should be able to implement in the client using these two properties. Otherwise, there is an avenue for abusing late publish as an undo button. We would like to be able to rely heavily on deterministic resolution of state, and this is OK as long as the commits are preserved and communicated regardless of a consensus change.

If I understand this CIP correctly, I think the scope of a late publishing attack would be equated to just adding a new commit on the tip, which is anyway possible to do for the controller. What I'm not sure about though is what the merge of a divergent branch means for the state of the stream, but I'm not sure if this is relevant in this context.

@smrz2001
Contributor Author

> In general, these are the most important features from my perspective:
>
>   • All known commits, pruned or not, are pinned by the node
>   • A new node can learn about all known commits, pruned or not, from other nodes
>
> The reason is that a historical shuffle should not prevent resolution of a once-valid commit, which we should be able to implement in the client using these two properties. Otherwise, there is an avenue for abusing late publish as an undo button. We would like to be able to rely heavily on deterministic resolution of state, and this is OK as long as the commits are preserved and communicated regardless of a consensus change.
>
> If I understand this CIP correctly, I think the scope of a late publishing attack would be equated to just adding a new commit on the tip, which is anyway possible to do for the controller. What I'm not sure about though is what the merge of a divergent branch means for the state of the stream, but I'm not sure if this is relevant in this context.

The two big advantages of this CIP are:

  • The tip event is a discovery mechanism for and commitment to not only the prev-chain but the prev-DAG.
  • You can at any point in time account for which branches are pruned and which have been merged.
