Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

handling of deletes and channel state #16

Open
cryptix opened this issue Oct 24, 2023 · 16 comments
Open

handling of deletes and channel state #16

cryptix opened this issue Oct 24, 2023 · 16 comments

Comments

@cryptix
Copy link
Contributor

cryptix commented Oct 24, 2023

This is more of a conceptual question to get more understanding than about specific wording of the spec.

Why aren't post/deletes part of channel state replies?

The way i'm seeing it, getting for e.g. two topics without the delete in the middle, where the 2nd/older topic would have non-recent links, would require a certain amount of trust to the peer, no?

Adding the delete in the middle would at least show "well this thing is gone" now and maybe i still have that old topic anyway.

tangentially related: can all post's be deleted?

If i create a new channel and then delete the join, what's the most recent state? empty channel? no channel? if somebody else joins and i join again, should i reference the delete?

@cryptix
Copy link
Contributor Author

cryptix commented Oct 24, 2023

given that the verification requires to have the original message (to check that the public key matches), i could see that as an motivation not to hold onto them in any case.

@hackergrrl
Copy link
Member

hackergrrl commented Oct 24, 2023

Great questions cryptix!

Why aren't post/deletes part of channel state replies?

The way i'm seeing it, getting for e.g. two topics without the delete in the middle, where the 2nd/older topic would have non-recent links, would require a certain amount of trust to the peer, no?

Adding the delete in the middle would at least show "well this thing is gone" now and maybe i still have that old topic anyway.

This is a good point. I like the idea of this request returning deletions relevant to the channel state. Curious if @cblgh has any feedback re: the implementation complexity of such an edit.

tangentially related: can all post's be deleted?

Right now one would need to issue a post/delete with EVERY HASH THEY EVER CREATED. This is unwieldy, and there's been discussion of adding another post type or augmenting this post type with a "time range" option for deletes.

If i create a new channel and then delete the join, what's the most
recent state? empty channel? no channel? if somebody else joins and
i join again, should i reference the delete?

Great Q! The spec definitely doesn't address this case.

If we say that it deletes the channel, then as the post deletion propagated, everyone would eventually delete that channel. It is an extra bit of implementation logic though, to check for this case.

If we say that that the channel remains, I could see that being a bit weird from a user perspective -- having an empty channel with no members.

More broadly then, if all posts in a channel are deleted, does that channel remain or be destroyed? I lean toward considering the channel not existing.

@cblgh
Copy link
Member

cblgh commented Oct 24, 2023

Why aren't post/deletes part of channel state replies?

The way i'm seeing it, getting for e.g. two topics without the delete in the middle, where the 2nd/older topic would have non-recent links, would require a certain amount of trust to the peer, no?
Adding the delete in the middle would at least show "well this thing is gone" now and maybe i still have that old topic anyway.

This is a good point. I like the idea of this request returning deletions relevant to the channel state. Curious if @cblgh has any feedback re: the implementation complexity of such an edit.

As indicated in the public cabal I don't quite understand the question being posed. Can someone that understands it rephrase it? I really don't grok where delete is getting into the picture in the quoted paragraph of the opening post

My flawed attempt at understanding it, demonstrated with a scenario:

Let's say we have two users that are setting a topic of a channel, User A and user B. User A posts post/topic (call it topic 1) to channel C. User B then posts another post/topic (call it topic 2) to channel C.

My read of what @cryptix was saying was that someone requesting channel state for channel C would benefit from peers verifiably deleting the obsolete topic, topic 1, with a post/delete. (How off the mark is this @cryptix? ^^')

But since the only posts a user may delete are those posts they have authored themselves, this only makes sense for the case where a single user has posted two topics in a row for the same channel (and as such may delete the first post/topic of the two because they authored it). In my posited scenario, the only user who may delete topic 1 is user A (the author of the post setting topic 1), which is why I think I am misunderstanding something and requested a more complete example over the public cabal X)

cc @hackergrrl

@hackergrrl
Copy link
Member

hackergrrl commented Oct 24, 2023

@cblgh I think I understand the [understandable] confusion! My understanding of the situation is this:

Right now, all deletes are propagated using the Channel Time Range Request: chat message deletes, but also deletions of topic edits, joins, and parts. I think @cryptix is asking why Channel State Request will provide NEW topics but not info on DELETED topics. I agree that it seems strange, since deletions of topics is also conceptually part of channel state.

(I'm going to constrain my responses on this issue to just here, rather than discuss partly here & partly on the pub cabal.)

@hackergrrl
Copy link
Member

Proposals for spec (and thus implementation) changes:

  1. Clarify that if all posts in a channel are deleted such that the clauses in 5.4 Channels are no longer true, the channel no longer exists either.
  2. Specify that Channel Time Range Request returns post/deletes only for post/text posts, and Channel State Request returns post/deletes for post/join, post/leave, and post/topic.

cc @cblgh @cryptix @mycognosist

@cryptix
Copy link
Contributor Author

cryptix commented Oct 25, 2023

👍🏻 on the proposed changed. Seems much more consistent.

why Channel State Request will provide NEW topics but not info on DELETED topics

yup, that part. Holding on to the deletes should keep the links tree intact, i think and make clear that a peer isn't lying or just sending out-of-date info..?

At first i thought about not keeping track of deletes at all but now I'm not 100% sure that would be correct.. in a sense those are only important for people that saw the message referenced in a delete?

can all post's be deleted?

why are words so hard..? 😅 that should have been: can all types of post's be deleted? but I think you also covered that in your change proposal now, @hackergrrl.

@cblgh
Copy link
Member

cblgh commented Oct 25, 2023

Proposals for spec (and thus implementation) changes:

1. Clarify that if all posts in a channel are deleted such that the clauses in _5.4 Channels_ are no longer true, the channel no longer exists either.

Seems reasonable!

2. Specify that `Channel Time Range Request` returns `post/delete`s only for `post/text` posts, and `Channel State Request` returns `post/delete`s for `post/join`, `post/leave`, and `post/topic`.

I have qualms!

My concern: Channel State Request lacks a history limiting mechanism. With the proposed change we could have a really old channel whose response to a Channel State Request blasts a metric ton of hashes for deletes of posts that nobody actually even has anymore; we are stuck returning those post/delete hashes forever for that channel. This kind of everlasting channel state history is a complaint I've seen aired wrt Matrix so I think it is something to be aware about.

Channel Time Range Request's time range enables graceful loss of such hashes over time.

One solution I'm playing with in the request I'm introducing in the moderation work is a field oldest, it has this description currently:

Field oldest is a time, expressed in milliseconds since UNIX Epoch, limiting the amount of history sent if set. If oldest > 0 then the posts whose hashes are being returned should fulfill post.timestamp >= oldest. Implementations are recommended to set a large value for oldest, for example on the order of 1 year from the time of requesting: oldest = - 31536000000.

Though this mechanism may not play well with channel membership, specifically in the case of post/leave, as well as retrieving post/info for old-ass nicknames.

@mycognosist
Copy link
Contributor

In response to the first suggested spec change: I don't have a strong preference either way. It sort of feels like an implementation detail to me (whether the channel is considered empty or non-existent) but I can see that there are side-effects - for example, inclusion of empty channels in channel list requests (where it might be preferable not to include empty channels at all).

Regarding the second proposed change: I am still thinking through the ramifications. I think @cblgh raises an important point about the lack of a time-based limiting mechanism for channel state requests. Need to ferment this some more before I can offer further input.

@cryptix
Copy link
Contributor Author

cryptix commented Oct 27, 2023

Agreed that we shouldn't let the state grow uncontrollably! The more I think about it the more I'm leaning towards not storing post/deletes at all. Use and forget, done.

Forwarding them to "live" (future=1) requests seems sensible to me to, still. I'd like to ensure a peer is not being withheld information or being gamed by a malicious peer, trying to suppress things.

How do other's feel about the holes in the link dag? It could be that my fear there is purely academic. I guess they would heal? A few examples would help me. I will try to see I've I can make one or two.

@mycognosist
Copy link
Contributor

Forwarding them to "live" (future=1) requests seems sensible to me to, still.

Agreed. Ensuring the post/delete has been passed on to at least one peer before forgetting sounds desirable to me. Deletions (the right to be forgotten) seem to be a very important part of the protocol and I wouldn't want to take actions which degraded their efficacy in some way. Keen to hear what @hackergrrl and @cblgh think about it.

How do other's feel about the holes in the link dag?

Since links are primarily intended to provide a causal proof for post ordering - and assuming that non-deleted post/texts will greatly outnumber deleted post/texts - I don't see it as a major problem. I also think it would be useful to think through with some examples.

I haven't implemented links fully in cable.rs yet; @cblgh will likely have a more informed opinion.

@ahdinosaur
Copy link

i was just thinking about the Cable spec and realized the implications of how deletes are implemented (if i understand correctly): a delete removes the entire message (including link metadata), not just the content. versus say bamboo or ppppp.

How do other's feel about the holes in the link dag?

personally, i'd prefer deletes removed the content and not the link metadata, so a delete couldn't affect causality. i love tangles and i too probably have a kink for technical purity. 💜

@ahdinosaur
Copy link

My concern: Channel State Request lacks a history limiting mechanism. With the proposed change we could have a really old channel whose response to a Channel State Request blasts a metric ton of hashes for deletes of posts that nobody actually even has anymore; we are stuck returning those post/delete hashes forever for that channel. This kind of everlasting channel state history is a complaint I've seen aired wrt Matrix so I think it is something to be aware about.

Channel Time Range Request's time range enables graceful loss of such hashes over time.

One solution I'm playing with in the request I'm introducing in the moderation work is a field oldest, it has this description currently:

since i'm on the soapbox, might as well proselytize the magical power of tangles and predsl.

for example, Staltz's new p2p data structures with a history limiting mechanism,

but also sorry if this isn't helpful, feel free to ignore me! 👻

@hackergrrl
Copy link
Member

hackergrrl commented Nov 9, 2023

@cryptix wrote:

The more I think about it the more I'm leaning towards not storing post/deletes at all. Use and forget, done.

@mycognosist wrote:

Ensuring the post/delete has been passed on to at least one peer before forgetting sounds desirable to me.

🙀

It's important to me that deletions are very durable, and are a part of the eventually consistent state. Simultaneously, I definitely agree with y'all that we should be very judicious about not having any responses be grow-only sets! Thanks for pointing that out @cblgh.

Having thought about it again, I like the idea of ALL post/deletes coming in through Channel Time Range Request. The way I'm thinking about it in my head is

  • Channel State Request: "please give me the latest state info so I don't need to dig deep into historic posts"
  • Channel Time Range Request: "what new stuff has happened in this channel since $TIME?"

Fortunately this is already how Channel Time Range Request works! Although it uses some vague-as-heck language, saying " SHOULD include the hashes of all known post/text and post/delete posts made to a channel" which is silly because post/delete has no channel field. :)

So, a new edit proposal:

  • Revise Channel Time Range Request to be very explicit about exactly what ought to be returned. Here is, I think, everything it ought to return, within the time range given:
    1. All post/text
    2. All post/delete that pertain to {post/text, post/topic} for this channel
    3. All post/delete that pertain to a member or ex-member of this channel
    4. If the client doesn't know what kind of post/* type a delete pertains to, return it here too.

(2) means that deletes will generally not propagate for deletions of post/{join,leave,delete}.

(3) means that duplicate hashes can be returned if Alice has other channels in common with Bob if he deletes one of his post/info posts. I think this is ok, since clients are expected to deduplicate & not re-request known posts. Fancy client implementations could deduplicate at the socket level and know not to send that hash down all live Channel Time Range Requests, too.

(4) This is for the case when a client sees a post/delete for a post they don't have: they can't tell if it's a join or a topic or a chat msg or something else. This will actually also happen if a client fully deletes the post in question locally without a tombstone: they won't know what ANY deletions are about when someone makes a Channel Time Range Request. :)

@hackergrrl
Copy link
Member

@cblgh @mycognosist @cryptix I'm going to make these edits pretty soon if there are no objections or calls for further discussion.

@cblgh
Copy link
Member

cblgh commented Nov 17, 2023

So, a new edit proposal:

* Revise `Channel Time Range Request` to be very explicit about exactly what ought to be returned. Here is, I think, everything it ought to return, within the time range given:
  
  1. All `post/text`
  2. All `post/delete` that pertain to `{post/text, post/topic}` for this channel
  3. All `post/delete` that pertain to a member or ex-member of this channel
  4. If the client doesn't know what kind of `post/*` type a delete pertains to, return it here too.

(2) means that deletes will generally not propagate for deletions of post/{join,leave,delete}.

(3) means that duplicate hashes can be returned if Alice has other channels in common with Bob if he deletes one of his post/info posts. I think this is ok, since clients are expected to deduplicate & not re-request known posts. Fancy client implementations could deduplicate at the socket level and know not to send that hash down all live Channel Time Range Requests, too.

(4) This is for the case when a client sees a post/delete for a post they don't have: they can't tell if it's a join or a topic or a chat msg or something else. This will actually also happen if a client fully deletes the post in question locally without a tombstone: they won't know what ANY deletions are about when someone makes a Channel Time Range Request. :)

@hackergrrl A big ol 👍 from me—these revisions will help implementors a lot! I'll have to revisit my logic and the indexing to be certain, but I think these are roughly the interpretations glyph & I landed in when doing our respective implementations :) Once this lands I'll make time to patch the places in cable-core.js that don't conform if any.

To clarify my own understanding by rephrasing - does the revision essentially say:

"A channel time range request, for a given time range, should return all hashes for:"

  1. post/text posted in the channel
  2. post/delete targeting a post/text or post/topic posted to the channel
  3. post/delete targeting any post type authored by a member/ex-member of the channel (i.e. includes delete of post/join, post/info, post/leave)
  4. post/delete for any unknown post types / posts we don't have knowledge about

@hackergrrl
Copy link
Member

@cblgh Yes, 👍🏻 to your rephrase.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants