Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why does merge() take ownership of the other side? #135

Open
exi opened this issue Mar 6, 2023 · 6 comments
Open

Why does merge() take ownership of the other side? #135

exi opened this issue Mar 6, 2023 · 6 comments

Comments

@exi
Copy link

exi commented Mar 6, 2023

With the current merge() API, I'm forced to clone the other side I'm merging from every time, because merge takes ownership.

Why is this necessary and can't there be a variant of merge with merge(&mut self, other: &Self) ?

@davidrusu
Copy link
Member

Hi @exi,

We take ownership because merge typically will want to move the data from the other side into your state.

E.g. say you have two sets:

A = {1,2,3}
B = {3,4,5}

A.merge(B)
// -> A = {1,2,3,4,5} -- 4, 5 have been moved to A (without cloning)

But I'm curious, do you have a need to hold onto the other state after merging? I assumed most cases will be throwing away the other state after they merge it into their state.

@exi
Copy link
Author

exi commented Mar 6, 2023

I actually have multiple such situations.

For merge():
I'm using CRDTs in a multi-party sync client that gets a full Orswot from one client, merges it into the local server and then broadcasts it out to all the other clients. In this case I have to clone the Orswot every time I want to merge it into the server before sending it along to other clients.
Clients already connect with a mostly complete state on their own and will in 99% of the cases not send data to the server that is new to the server, so in that case there would be no reason to consume the contents in most cases.

For apply():
Pretty much same scenario, for incremental updates, the server gets serialized "Op" from a client, applies it locally and then broadcasts them to other clients. I need to clone the Op every time here too.
I'm aware that in case of Op, there is a high chance that the apply would consume or clone the contents anyways, so it's less of an issue.

@davidrusu
Copy link
Member

I see, for the merge case, would it not make more sense to broadcast the merged state rather than the "other" state? This should speed up convergence a little.

You would need a signal to tell you if the other state had any new information, and then only broadcasting your state when you see new information. Perhaps that's a change worth making to the merge api?

// Merge the other state into our state, return true if our state was modified, false otherwise
fn merge(&mut self, other: Self) -> bool;

This way you can decide to broadcast based on the result of the merge?

@exi
Copy link
Author

exi commented Mar 7, 2023

These are good suggestions. Returning bool would already be very helpful.
What would be even better is returning a Vec<Op> of the operations it performed to bring local and other together.
That way I can only broadcast a list of Ops to the other clients instead of lugging the whole thing around.
This is under the assumption that the other clients are in-sync with the server most of the time, but if they are not, I can force a full-sync by sending hashes of the CRDT around.

@davidrusu
Copy link
Member

Interesting idea.

Returning a Vec<Op> from merge feels like it might be expanding the scope of merge beyond it's original purpose.

But computing this delta would be a useful API that we could implement:

fn delta(&self, other: Self) -> Vec<Op>

Thinking through this. You could collapse that Vec<Op> down to a single state by apply those delta Op's to the default CRDT (i.e. the empty ORSWOT) to get a "delta state". This would change the delta API to:

// The return value is the minimal state that when merged would bring you in sync with other.
fn delta(&self, other: Self) -> Self 

The delta state should be more space efficient than a vec of independent ops.

So your flow would be:

// on receive `other` state
let delta = self.state.delta(other);

// delta will be empty if `other` had no new information.
if self.state.merge(delta.clone()) {
    // broadcast delta
}

Impl thoughts:

  • delta() should take ownership of other since the return value will hold only new information moved from other. Anything not present in the delta state is a no-op so it should be thrown away / de-allocated.
  • merge() should continue to take ownership of other since if it's paired with delta, the delta will only hold new information which will be consumed in the merge.
  • merge()/apply() should return the bool so that you can decide to broadcast or not.

What do you think of something like that?

@davidrusu
Copy link
Member

I can force a full-sync by sending hashes of the CRDT around.

Better than hashes would be to send the vector clocks: Orswot::clock()

Use VClock::partial_cmp() to decide whether the sender is:

  • ahead of us: in which case the receiver can pull from the sender
  • behind us / concurrent: in which case the receiver can push state to the sender
  • in sync: do nothing.

Hashes would only tell you that you are out of sync and would result in more messages.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants