Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why single-writer append-only log at all? #11

Closed
Nuhvi opened this issue Feb 24, 2023 · 4 comments
Closed

Why single-writer append-only log at all? #11

Nuhvi opened this issue Feb 24, 2023 · 4 comments

Comments

@Nuhvi
Copy link

Nuhvi commented Feb 24, 2023

Hello @AljoschaMeyer, thanks again for the reply at Iroh.

I have discovered Bamboo as I have been struggling with many aspects of Hypercores, and while I appreciated how much simpler Bamboo is, and the fact that it is implemented in Rust, I finally realized my main use case for append-only logs, is append-only B-tree and a public (not encrypted) one for that matter.

So I started questioning what I get from using a single-writer append-only log at all, as opposed to a combination of Sparse Merkle tree for sync and range queries and Hyper Logical Clocks + Last Write Wins for conflict resolution.

Sure, this focus on the eventual consistency of the state of the key-value store, not the history of "commits" to use "Git" terminology.

But it seems to me that:
1- I win a much simpler Multi-writer
2- I can still signal that my key was compromised if I still have a copy (by publishing a tombstone entry)
3- I can choose to arrange roots / commits into a Bamboo if I really need to linearize them while still have sparse replication for them.
4- I can still create a (list) by creating ordered keys.
5- Write access control becomes much easier than juggling lots of logs and doing topological sorting
6- I can still do sparse replication of data from multiple non-trusted peers, and the worst they can do is hide some commits, just like they can hide more recent blocks in an append-only log, or even older blocks if they are my only peers.
7- While order transparency (happened before X) is lost, anyone can still use OpenTimeStamp to stamp any commit they see, and get an approximate timestamp linked to that too not just order.

So basically, I am trying to find out what do I lose by abandoning append-only logs?

The best use cases I can think of are:
1- Something like Certificate Transparency, when you are more interested to prove history hasn't been rewritten ever. And without using Blockchains.
2- Streaming video or audio or any streamable data where order matters.

Do you think append-only logs are crucial for the security in a p2p system and that my proposed solution contains some flawed reasoning, or would cause catastrophic failures?

@Nuhvi
Copy link
Author

Nuhvi commented Feb 24, 2023

Also, I am hopeful that the same system can work for encrypted data, which is a hope based on the viability of ORE of keys to mask their structure while maintaining range queries.

I would love to know your opinion there as well.

But even if that fails, you can still encode an encrypted append-only B-tree or Cryptree in the system above, and get the same performance you get from reading a Hyperbee from a blind peer who doesn't have its encryption keys. I assume.

@AljoschaMeyer
Copy link
Owner

AljoschaMeyer commented Feb 24, 2023

Generally speaking, logs enable you to determine which data two peers need to exchange in order to reconcile their state extremely efficiently. This comes at the cost of requiring a total order on all items (i.e., being single-writer), and making deletion problematic even when all peers would cooperate.

If your system design does not mind those two drawbacks, go for a log format of your choice. But if any of these drawbacks are problematic (and for very many systems they are), you probably shouldn't use logs. Grow-only sets are perhaps the most straightforward alternative, but all set reconciliation approaches come with costly protocols for determining which data peers need to exchange. That cost might be redundant data transfer, a high number of roundtrips, time-intensive local computations, space-intensive local computations, expensive auxiliary data structures, etc. So if you are thinking about using set reconciliation, you should carefully examine the computational complexity involved in the approaches you consider.

I believe that quite a few projects use logs even though they cannot/shouldn't accept the single-writer restriction or the data permanence, simply because practical set-based approaches have only recently entered the open source p2p sphere. Then again, we have yet to see some more expensive set reconciliation approach demonstrate its feasibility at scale.

As usual in distributed systems, not only is there no free lunch, there might be no enjoyable lunch at all.

@AljoschaMeyer
Copy link
Owner

When you reconciled sets, you can still have the items contain hashes of older items. So in particular, you can always recreate anything you could do with logs. You never lose expressivity when moving away from logs, you only lose efficiency (in exchange for more flexibility).

@Nuhvi
Copy link
Author

Nuhvi commented Feb 24, 2023

Thanks a lot @AljoschaMeyer, I am currently working in JavaScript and React Native code bases so can't pretend that I am fazed by the efficiency cost at the moment.

But would love to see data, benchmarks, or even a rough estimation of the scale of that price.
Alas, I am sure implementation details would dwarf any theoretical differences between set reconciliation and append-only logs.

So I am left with: If set-reconciliation is performant enough that it is not the bottleneck of any application it is used at (I won't use it for VoIP) then it doesn't really matter how much more efficient it would be if it was a log does it?

@Nuhvi Nuhvi closed this as completed Dec 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants