-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Why single-writer append-only log at all? #11
Comments
Also, I am hopeful that the same system can work for encrypted data, which is a hope based on the viability of ORE of keys to mask their structure while maintaining range queries. I would love to know your opinion there as well. But even if that fails, you can still encode an encrypted append-only B-tree or Cryptree in the system above, and get the same performance you get from reading a Hyperbee from a blind peer who doesn't have its encryption keys. I assume. |
Generally speaking, logs enable you to determine which data two peers need to exchange in order to reconcile their state extremely efficiently. This comes at the cost of requiring a total order on all items (i.e., being single-writer), and making deletion problematic even when all peers would cooperate. If your system design does not mind those two drawbacks, go for a log format of your choice. But if any of these drawbacks are problematic (and for very many systems they are), you probably shouldn't use logs. Grow-only sets are perhaps the most straightforward alternative, but all set reconciliation approaches come with costly protocols for determining which data peers need to exchange. That cost might be redundant data transfer, a high number of roundtrips, time-intensive local computations, space-intensive local computations, expensive auxiliary data structures, etc. So if you are thinking about using set reconciliation, you should carefully examine the computational complexity involved in the approaches you consider. I believe that quite a few projects use logs even though they cannot/shouldn't accept the single-writer restriction or the data permanence, simply because practical set-based approaches have only recently entered the open source p2p sphere. Then again, we have yet to see some more expensive set reconciliation approach demonstrate its feasibility at scale. As usual in distributed systems, not only is there no free lunch, there might be no enjoyable lunch at all. |
When you reconciled sets, you can still have the items contain hashes of older items. So in particular, you can always recreate anything you could do with logs. You never lose expressivity when moving away from logs, you only lose efficiency (in exchange for more flexibility). |
Thanks a lot @AljoschaMeyer, I am currently working in JavaScript and React Native code bases so can't pretend that I am fazed by the efficiency cost at the moment. But would love to see data, benchmarks, or even a rough estimation of the scale of that price. So I am left with: If set-reconciliation is performant enough that it is not the bottleneck of any application it is used at (I won't use it for VoIP) then it doesn't really matter how much more efficient it would be if it was a log does it? |
Hello @AljoschaMeyer, thanks again for the reply at Iroh.
I have discovered Bamboo as I have been struggling with many aspects of Hypercores, and while I appreciated how much simpler Bamboo is, and the fact that it is implemented in Rust, I finally realized my main use case for append-only logs, is append-only B-tree and a public (not encrypted) one for that matter.
So I started questioning what I get from using a single-writer append-only log at all, as opposed to a combination of Sparse Merkle tree for sync and range queries and Hyper Logical Clocks + Last Write Wins for conflict resolution.
Sure, this focus on the eventual consistency of the state of the key-value store, not the history of "commits" to use "Git" terminology.
But it seems to me that:
1- I win a much simpler Multi-writer
2- I can still signal that my key was compromised if I still have a copy (by publishing a tombstone entry)
3- I can choose to arrange roots / commits into a Bamboo if I really need to linearize them while still have sparse replication for them.
4- I can still create a (list) by creating ordered keys.
5- Write access control becomes much easier than juggling lots of logs and doing topological sorting
6- I can still do sparse replication of data from multiple non-trusted peers, and the worst they can do is hide some commits, just like they can hide more recent blocks in an append-only log, or even older blocks if they are my only peers.
7- While order transparency (happened before X) is lost, anyone can still use OpenTimeStamp to stamp any commit they see, and get an approximate timestamp linked to that too not just order.
So basically, I am trying to find out what do I lose by abandoning append-only logs?
The best use cases I can think of are:
1- Something like Certificate Transparency, when you are more interested to prove history hasn't been rewritten ever. And without using Blockchains.
2- Streaming video or audio or any streamable data where order matters.
Do you think append-only logs are crucial for the security in a p2p system and that my proposed solution contains some flawed reasoning, or would cause catastrophic failures?
The text was updated successfully, but these errors were encountered: