fix(CIP-145): updates from forum discussion #149

smrz2001 · 2023-12-15T23:46:02Z

Updates based on this forum discussion.

oed · 2023-12-18T09:33:13Z

CIPs/cip-145.md

-Using multi-prev Data Events allows us to reduce the number of uncovered events and converge the stream so that there is
-only a single uncovered event, without any data abandoned on pruned branches. The stream's converged/diverged state can
-be determined by looking at the `prev` fields of all the Data Events for that stream.
+1. If a stream is in a diverged state (see events `A` and `B` in fig. 5), we consider branches that contain dominant


Why would time events get pruned?
Consider Fig. 6:

Data B expires at time 3<n<4

Data C has a time event at 4

If Time 3 is pruned, then it's unclear why Data B should remain valid

Good point. Since the Data A branch is the tip according to the rest of the rules in fig. 4, I'm not sure why we should try to merge the descendants of T1 🤔

Yes, good question. @AaronGoldman and I had a good discussion about this scenario.

In this example, Data C can only be created after Data B has been validated as either being within the expiration timeout or already having a valid Time Event.

We can assume that the author of Data C validated Data B because Data C follows Data B.

Of course, if Time 3 for Data B is available before Data C is created, the CID of Time 3 will be included in Data C's multi-prev.

To your point, @m0ar, including Time 1's descendants allows multi-prev to cover existing history (even if pruned) in the stream state, whereas today, pruned events are lost from the stream state.

Can we really make this assumption? In theory, the author of Data C could "import" Data B into the log even though it's invalid, e.g. Time 3 doesn't exist or is too late.

Events like Time 3 are definitely worth pinning, and potentially worth including in the multi-prev of subsequent Data Events.

We feel that the validity of Data C does not require proof of validity of Data A or Data B. However, establishing the latter's time bounds does require recording Time 2 and Time 3. Time 2 is already part of Data C's history. Including Time 3 in a future multi-prev would ensure that it also becomes part of the stream history.

Agree that Data C's validity is not predicated on the validity of A or B. However, the validity of Data B is predicated on Time 3 being available. Therefore, it doesn't seem right to say that we can prune Time 3.
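To make the dependency on Time 3 concrete, here is a minimal sketch of how a node might classify a Data Event like Data B. It assumes (hypothetically) that each Data Event carries an `expires_at` time and that `covering_times` holds the timestamps of Time Events that directly anchor it; `Validity`, `DataEvent` and `data_event_validity` are illustrative names, not part of the CIP.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class Validity(Enum):
    VALID = "valid"            # some direct Time Event is no later than the expiry
    UNVERIFIED = "unverified"  # no direct Time Event is known yet
    INVALID = "invalid"        # every known direct Time Event is too late


@dataclass(frozen=True)
class DataEvent:
    cid: str
    expires_at: float  # hypothetical field: when the signing capability expires


def data_event_validity(event: DataEvent, covering_times: Iterable[float]) -> Validity:
    """Classify a Data Event from the timestamps of Time Events that anchor it."""
    times = list(covering_times)
    if not times:
        # e.g. Data B while Time 3 has not been discovered yet
        return Validity.UNVERIFIED
    # The earliest anchor gives the tightest upper bound on the creation time.
    if min(times) <= event.expires_at:
        return Validity.VALID
    return Validity.INVALID
```

With `DataEvent("data-b", expires_at=3.5)`, the result is `UNVERIFIED` with no anchors, `VALID` with `[3.0]`, and `INVALID` with `[3.8]`. If Time 3 were pruned, a node could never move Data B out of `UNVERIFIED`, which is the argument for keeping it.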

Yes, what would you think about including Time 3 in the multi-prev of the next event to be added to the stream? That's what we meant to say here:

Events like Time 3 are definitely worth pinning, and potentially worth including in the multi-prev of subsequent Data Events.

What if a new event Data D (occurring after Time 4) had a prev of [Time 4, Time 3]? Even though Time 3 will not take precedence during tip selection, it will always remain part of the stream history.

Or, let's say, Time 3 didn't show up until we already had Time 4 -> Data D -> Time 5 -> Data E -> Time 6. Then Data F (occurring after Time 6) would have a prev of [Time 6, Time 3]. Data B would remain in an "unverified" state until Time 3 was discovered.

Tracking Data B's validity this way would be a little more complicated than the usual flow, but always possible. Moreover, now all events related to the stream would be part of the DAG, which is great.
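The Data F example can be sketched by treating every event (Data or Time) that no other known event lists in its `prev` as uncovered, and then putting all uncovered events into the next multi-prev with the chosen tip first. This is a minimal illustration under those assumptions. The function names and placeholder CIDs (`data-b`, `time-3`, ...) are hypothetical, and ordering the non-tip CIDs by sorting is an arbitrary choice here.

```python
from typing import Dict, List, Set


def uncovered_events(prevs: Dict[str, List[str]]) -> Set[str]:
    """CIDs that no other known event references in its `prev`.

    `prevs` maps each known event CID (Data or Time Event) to its prev CIDs.
    """
    referenced = {p for prev in prevs.values() for p in prev}
    return set(prevs) - referenced


def build_multi_prev(prevs: Dict[str, List[str]], tip: str) -> List[str]:
    """Prev list for the next Data Event: the chosen tip first, followed by
    every other uncovered event (e.g. a late-discovered Time Event)."""
    heads = uncovered_events(prevs)
    if tip not in heads:
        raise ValueError(f"{tip} is not an uncovered event")
    return [tip] + sorted(heads - {tip})
```

For the chain Data B -> Data C -> Time 4 -> Data D -> Time 5 -> Data E -> Time 6, plus a late Time 3 whose `prev` is Data B, the uncovered events are Time 6 and Time 3, so `build_multi_prev(prevs, "time-6")` yields `["time-6", "time-3"]`, matching the prev proposed for Data F.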

Yes, this is what I mean! Just wanted to be clear that Time 3 can't be pruned if we expect the creator of Data D to include it in prev.

@smrz2001 since it seems that we are in agreement, maybe you can update the language to be clear that the TimeEvent doesn't get pruned?

oed · 2023-12-18T09:34:24Z

CIPs/cip-145.md

+3. The Data Event that is covered by the earliest Time Event wins (see event `A` in fig. 5).
+4. If two Data Events share the earliest timestamp, then the branch of the Data Event with the lower CID wins.


What does it mean for a Data Event to "win" in the context of multiple prev?

We updated the wording. A Data Event "winning" here meant that that Data Event would be marked the tip by the protocol. An application would be able to make use of this information to create a merge Data Event, though other candidate CIDs would also be present in the prev field.

OK, I don't understand why we need to distinguish between winning and non-winning tips. Why not just call them all tips and put them all in the prev field? I don't follow why the protocol needs to care about "winning".

For example, the protocol just gives the application a list of tips. It's up to the application to decide what is "winning" and what's not. It's also up to the application to choose the order of tips in its prev array when constructing it.

I get where you're going, but I'm not sure how well interoperability would work if two projects have different ideas on tip consensus 🤔

Or, by application, do you mean the Ceramic node here? As in, the stream type implementation would decide how to resolve conflicts? If so, I think that would make sense, as there may be other valid interpretations of this depending on the stream type.

Yes, when I say protocol here I'm referring to the event streaming protocol. A stream type handler is an application.

Makes full-on sense then 👌

To answer your question, @oed, there are two reasons for this:

1. It is simpler for applications to just be given a tip per some default, predictable algorithm. They can choose to override this order, but don't have to.
2. There is always some eventually consistent state of a diverged stream, even if the controller never comes back to create the Merge Event, because the default precedence rules can be used to determine the tip.
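One way to reconcile both views is for the protocol to hand back all candidate tips, ordered by the default precedence rules, so the first entry is the default tip and the whole list can still go into a prev array. The sketch below assumes each candidate is summarized by the first Data Event after its fork point and that event's earliest known anchor time; `Candidate` and `order_tips` are hypothetical names. Unanchored events sort as time infinity, and comparing CIDs as strings is an assumption (the spec may define a byte-wise comparison).

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    tip_cid: str                  # uncovered event at the end of the branch
    first_data_cid: str           # first Data Event after the branch's fork point
    anchor_time: Optional[float]  # earliest Time Event covering it, if any


def order_tips(candidates: List[Candidate]) -> List[str]:
    """Order candidate tips by the default precedence rules."""
    def key(c: Candidate):
        # A Data Event without a Time Event behaves as if it occurred at infinity.
        t = c.anchor_time if c.anchor_time is not None else math.inf
        # Ties on anchor time go to the lower CID.
        return (t, c.first_data_cid)

    return [c.tip_cid for c in sorted(candidates, key=key)]
```

For instance, an anchored `Data E` branch (tip `time-5`, anchor time 5) is ordered ahead of an unanchored `Data F` branch, and two branches anchored at the same time are ordered by the lower CID.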

Wouldn't the application need to be given all tips anyway? In order to include them all in the prev array? Are we simply talking about the ordering of the CIDs in the returned array here?

m0ar · 2023-12-19T08:30:25Z

CIPs/cip-145.md

+3. This branch later forks into additional branches for `Data E`, `Time 4`, and `Data F`. Based on rules (3) and (4),
+   the branches for `Data E` and `Data F` are the only branches considered for tip selection.
+   ![Alt text](../assets/cip-145/rules4.png)
+4. Based on rule (5), since there is a Time Event for `Data E` but not for `Data F`, only the branch for `Data E` is


I think this should be rule 4.

Suggested change

4. Based on rule (5), since there is a Time Event for `Data E` but not for `Data F`, only the branch for `Data E` is

4. Based on rule (4), since there is a Time Event for `Data E` but not for `Data F`, only the branch for `Data E` is

Can we end up in the case where there is a yet-unknown, earlier anchor for the other branch in transit?

Good catch, yes, it should be rule 4. It is time that made the decision because a Data Event without a Time Event is as if it occurred at time infinity.

Yes, that's possible. If there were an as-yet-unknown Time 5.5 corresponding to Data F, then Time 5.5 would become the tip.

While much less likely in the absence of a malicious CAS attempting a late-publishing attack, it is also possible, for example, for a Time 1.5 covering Data B to be discovered late. This would rewind the state of the stream, marking Time 5 the tip.

Having said that, this spec provides a way for the application to resolve such a situation without data loss. A user can decide whether to override the default tip with a new event, while keeping the stream history intact.

m0ar · 2023-12-19T08:35:48Z

CIPs/cip-145.md

-be determined by looking at the `prev` fields of all the Data Events for that stream.
+1. If a stream is in a diverged state, each uncovered event is a candidate tip.
+2. Branches that do not contain dominant Data Events cannot be the tip.
+3. For branches that contain dominant Data Events, consider the earliest Data Event after a fork point.


It would be nice to clarify that it's not just this event that will be included in the new tip, but its corresponding branch. I prefer "first" over "earliest", because the latter made me think of anchor time rather than ordering.

Suggested change

3. For branches that contain dominant Data Events, consider the earliest Data Event after a fork point.

3. For branches that contain dominant Data Events, consider the first Data Event after a fork point when electing a new tip branch.

Yes, we actually did mean to refer to anchor time here. This helps keep the language consistent with other places.

We hope this earlier clarification of a fork point helps clarify what we mean:

* A `fork point` for a branch is the earliest event on that branch that is not on another branch.
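Read literally, that definition can be sketched as below, with each branch represented as the ordered list of event CIDs from genesis to its uncovered event. It also finds the first Data Event at or after the fork point, which is the event rule (3) considers. Along a single branch, the first Data Event is also the one with the earliest anchor time, since a Time Event covering a later event also covers the earlier ones. The list-based representation, the `is_data_event` predicate, treating the fork point itself as eligible, and the function names are all assumptions for illustration.

```python
from typing import Callable, List, Optional, Sequence


def fork_point(branch: Sequence[str], others: Sequence[Sequence[str]]) -> Optional[str]:
    """Earliest event on `branch` that is not on any other branch."""
    shared = {cid for other in others for cid in other}
    for cid in branch:
        if cid not in shared:
            return cid
    return None  # every event on this branch also appears on another branch


def first_data_event_after_fork(
    branch: Sequence[str],
    others: Sequence[Sequence[str]],
    is_data_event: Callable[[str], bool],
) -> Optional[str]:
    """First Data Event at or after the branch's fork point, if any."""
    fp = fork_point(branch, others)
    if fp is None:
        return None
    events: List[str] = list(branch)
    for cid in events[events.index(fp):]:
        if is_data_event(cid):
            return cid
    return None
```

In the `Data E` / `Time 4` / `Data F` example, the `Time 4` branch has a fork point but no Data Event after it, so it yields `None`. That is consistent with only the `Data E` and `Data F` branches being considered for tip selection.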

m0ar · 2023-12-19T08:38:36Z

CIPs/cip-145.md

+1. If a stream is in a diverged state, each uncovered event is a candidate tip.
+2. Branches that do not contain dominant Data Events cannot be the tip.
+3. For branches that contain dominant Data Events, consider the earliest Data Event after a fork point.
+4. The branch with the earliest Data Event becomes the tip.


It might be worth mentioning the case for step 4 in the example below, where one candidate branch is anchored and one isn't. Is the anchored one earlier by definition?

Yes, the anchored one is earlier by definition. A Data Event without a Time Event is as if it occurred at time infinity.

We'll update the rule to state this.

m0ar · 2023-12-19T09:00:13Z

CIPs/cip-145.md

-Using multi-prev Data Events allows us to reduce the number of uncovered events and converge the stream so that there is
-only a single uncovered event, without any data abandoned on pruned branches. The stream's converged/diverged state can
-be determined by looking at the `prev` fields of all the Data Events for that stream.
+1. If a stream is in a diverged state (see events `A` and `B` in fig. 5), we consider branches that contain dominant


Good point. Since the data A branch is the tip according to the rest of the rules in fig 4, I'm not sure why we should try to merge the descendants of T1 🤔

m0ar · 2023-12-19T09:21:34Z

In general, these are the most important features from my perspective:

* All known commits, pruned or not, are pinned by the node
* A new node can learn about all known commits, pruned or not, from other nodes

The reason is that a historical shuffle should not prevent resolution of a once-valid commit, which we should be able to implement in the client using these two properties. Otherwise, there is an avenue for abusing late publish as an undo button. We would like to be able to rely heavily on deterministic resolution of state, and this is OK as long as the commits are preserved and communicated regardless of a consensus change.

If I understand this CIP correctly, the scope of a late-publishing attack would be reduced to just adding a new commit on the tip, which the controller can do anyway. What I'm less clear on is what merging a divergent branch means for the state of the stream, but I'm not sure whether that is relevant in this context.

smrz2001 · 2024-01-12T22:58:40Z

The two big advantages of this CIP are:

* The tip event is a discovery mechanism for, and commitment to, not only the prev-chain but the prev-DAG.
* You can at any point in time account for which branches are pruned and which have been merged.
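The second point can be illustrated with a reachability walk: everything reachable from the tip through `prev` links is part of the stream history, and any other known event sits on a branch that has not been merged. This is a minimal sketch assuming the node keeps a map from each known event CID to its prev CIDs; `ancestors` and `pruned_events` are hypothetical helpers.

```python
from typing import Dict, List, Set


def ancestors(prevs: Dict[str, List[str]], tip: str) -> Set[str]:
    """All events reachable from `tip` by following prev links (tip included)."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue
        seen.add(cid)
        stack.extend(prevs.get(cid, []))
    return seen


def pruned_events(prevs: Dict[str, List[str]], tip: str) -> Set[str]:
    """Known events outside the tip's prev-DAG, i.e. on branches not merged."""
    return set(prevs) - ancestors(prevs, tip)
```

For example, if a merge event lists both `data-a` and `data-b` in its prev while a third branch `data-c` was never merged, `pruned_events(prevs, "merge")` returns `{"data-c"}`.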

fix: pairing updates from forum discussion

8e8a89f

smrz2001 requested a review from oed December 15, 2023 23:46

smrz2001 assigned smrz2001 and AaronGoldman Dec 15, 2023

smrz2001 changed the title ~~fix: pairing updates from forum discussion~~ fix: updates from forum discussion Dec 16, 2023

smrz2001 changed the title ~~fix: updates from forum discussion~~ fix(CIP-145): updates from forum discussion Dec 16, 2023

oed reviewed Dec 18, 2023

fix: pairing updates

0cfb7d0

m0ar reviewed Dec 19, 2023

fix: pairing updates

7879373

fix: pairing updates

93193e3
