Comparison to Kappa-db? #37

okdistribute · 2020-08-03T19:10:24Z

What's the problem you want solved?

Hi, I see in the readme you compare to DAT, but it might be best to split out the comparison a bit to be more accurate, since the ecosystem is quite large.

Is there a solution you'd like to recommend?

Hyperspace will be the new RPC module for creating applications that are compatible with the dat ecosystem (https://github.com/hyperspace-org). It has the same concerns you note with Hypercore: multi-writer is not possible out of the box, and it is a bit more complex to do that.

Kappa-db (github.com/kappa-db/) is quite close to Earthstar, but it is less 'batteries-included' and more for customizing database behaviors. I really like the approach Earthstar has taken to make these patterns more accessible to the common dev!

Thanks ~K

cinnamon-bun · 2020-08-03T21:22:28Z

Sure! I'll update README eventually, but here's a start. Please let me know if I got anything wrong about kappa-db!

The short answer is "Earthstar is like CouchDB; Kappa-db is like SSB"

Data structure & mutability

Kappa-db is a bundle of append-only logs (hypercores), one per author per device. It builds indexes by processing messages from the logs, in order, to build up a reduced state. The logs grow forever.

Messages can't be edited, instead you put more messages on the end that can modify the reduced state. Messages can probably be missing, using hypercore sparse mode? So maybe old data can be deleted that way.
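A minimal sketch of the "append-only logs reduced into a state" idea described above. This is not kappa-db's API: the `Log` class, the `set`/`del` message shapes, and the choice to replay logs in sorted order are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Log:
    """An append-only log: entries are added at the end and never edited."""
    entries: list = field(default_factory=list)

    def append(self, message: dict) -> int:
        self.entries.append(message)
        return len(self.entries) - 1  # sequence number of the new entry


def reduce_logs(logs: dict) -> dict:
    """Replay every log in order and fold the messages into one reduced state."""
    state = {}
    for log_id in sorted(logs):  # assumption: a fixed log order so replays are repeatable
        for message in logs[log_id].entries:
            if message["type"] == "set":
                state[message["key"]] = message["value"]
            elif message["type"] == "del":
                state.pop(message["key"], None)
    return state


# One log per author per device, as in the description above.
logs = {"alice-laptop": Log(), "alice-phone": Log()}
logs["alice-laptop"].append({"type": "set", "key": "title", "value": "draft"})
logs["alice-laptop"].append({"type": "set", "key": "title", "value": "final"})
logs["alice-phone"].append({"type": "set", "key": "color", "value": "blue"})

view = reduce_logs(logs)
```

Both `title` messages stay in the log; only the reduced `view` reflects the later one. A real indexer also has to decide how to order messages that come from different logs, which this sketch sidesteps.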

Earthstar is a key-value database (but we use the words "path" and "document" instead). It has fewer guarantees than Kappa-db and more flexibility. You can hold any subset of the documents, sync them in any order, do partial sync, drop ones you don't want.

Documents can be overwritten with newer versions. There's built-in access control so multiple authors can be allowed or forbidden to mutate the same documents.

Keys, Identities, Devices, Multi-writer

Both use ed25519 keys for identity and signatures.

Kappa-db has one identity per device.

Earthstar has one identity per author (across multiple devices).

Kappa-db and Earthstar both allow multi-writer (multiple authors putting data into the same space).

Universes of data, and how people join them

Each kappa-db (bundle of logs) is a separate unrelated universe of data, and it's named by ... the swarm key? Anyone with the swarm key can host or write to it?

Earthstar is split up into "workspaces" which are separate universes of data. Anyone with the workspace address can host or write to it. Soon there will also be invite-only workspaces -- anyone can host, but you need the secret key to write.

Indexing

Kappa-db apps are responsible for writing their own indexer / reducer. It's more work to write that code, but you can customize the index for the complex queries your app needs. If the messages are patch-style and build on each other, you need the whole history for it to make sense. If the messages are more standalone in their design you could safely drop some using sparse mode, if you wanted to.

In Earthstar, the core library does basic indexing for you and provides a way to query your documents based on their properties. It's NoSQL style (think MongoDB). It assumes documents are standalone and independent, not patches that build on each other and need to be reduced.
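To make "query your documents based on their properties" concrete, here is a small sketch of filtering standalone documents. The field names (`path`, `author`, `content`) and the `query` function with its parameters are hypothetical and are not Earthstar's actual query interface.

```python
def query(documents, path_prefix=None, author=None):
    """Return the documents that match every filter given, sorted by path."""
    results = []
    for doc in documents:
        if path_prefix is not None and not doc["path"].startswith(path_prefix):
            continue
        if author is not None and doc["author"] != author:
            continue
        results.append(doc)
    return sorted(results, key=lambda d: d["path"])


documents = [
    {"path": "/wiki/home", "author": "@alice", "content": "Welcome"},
    {"path": "/wiki/faq", "author": "@bob", "content": "Questions"},
    {"path": "/chat/1", "author": "@alice", "content": "hi"},
]

wiki_docs = query(documents, path_prefix="/wiki/")
alice_docs = query(documents, author="@alice")
```

Because each document is independent, a query like this works on whatever subset of documents a peer happens to hold. No history has to be replayed first.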

Syncing

Kappa-db uses hyperswarm to find peers and the hypercore protocol to sync data, with some custom stuff to handle multiple hypercores.

Earthstar is not well developed here. It can connect to cloud peers over HTTP for syncing, in the style of SSB pubs (see earthstar-pub). I'm planning to add hyperswarm also for direct p2p connections.

Kappa-db relies on "logs that sync" as the underlying abstraction. Earthstar relies on a specification for "signed versioned documents".

Conflicts

Kappa-db can't have low-level conflicts because each person (and device) has their own feed. Maybe the feeds disagree about something; that's up to the app's indexing code to figure out.

Earthstar resolves conflicts with a very simple last-write-wins rule, but it keeps the conflicting document versions so apps can do something fancier if they want, or ask the user what to do.
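A rough sketch of last-write-wins that still keeps the competing versions around. It assumes one retained version per author per path and an author-name tie-break for equal timestamps; both are illustrative assumptions rather than details taken from this thread.

```python
def ingest(store, doc):
    """Keep the newest version from each author at each path."""
    versions = store.setdefault(doc["path"], {})
    current = versions.get(doc["author"])
    if current is None or doc["timestamp"] > current["timestamp"]:
        versions[doc["author"]] = doc


def latest(store, path):
    """Last-write-wins: the version with the highest timestamp is the winner."""
    versions = store.get(path, {})
    if not versions:
        return None
    # Assumption: ties on timestamp are broken by author name.
    return max(versions.values(), key=lambda d: (d["timestamp"], d["author"]))


def conflicts(store, path):
    """All retained versions, oldest first, so an app can do something fancier."""
    return sorted(store.get(path, {}).values(), key=lambda d: d["timestamp"])


store = {}
# Documents can arrive in any order; the result is the same.
ingest(store, {"path": "/wiki/home", "author": "@bob", "timestamp": 20, "content": "bob's edit"})
ingest(store, {"path": "/wiki/home", "author": "@alice", "timestamp": 10, "content": "alice's edit"})
ingest(store, {"path": "/wiki/home", "author": "@alice", "timestamp": 5, "content": "stale edit"})

winner = latest(store, "/wiki/home")
history = conflicts(store, "/wiki/home")
```

Here `winner` is Bob's version, while `history` still holds Alice's newest version for an app or a user to inspect.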

Clocks and timestamps

Low-level kappa-db will work with clock skew but the app might rely on timestamps in other ways (e.g. cabal sorts messages by timestamp?)

Earthstar won't work well if peers have very inaccurate clocks. It refuses to sync documents from more than 10 minutes in the future. This could be loosened; it could be fixed if we require each path to be restricted to one author, and forbid paths that anyone can write to. (Details)
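The "10 minutes in the future" rule can be sketched as a single comparison. The function name and the use of Unix seconds are assumptions for illustration, not Earthstar's actual validation code.

```python
FUTURE_LIMIT_SECONDS = 10 * 60  # the 10-minute tolerance mentioned above


def accepts(doc_timestamp: float, local_now: float) -> bool:
    """Accept a document unless it claims to be from too far in the future."""
    return doc_timestamp <= local_now + FUTURE_LIMIT_SECONDS


true_now = 1_596_490_000.0  # an arbitrary moment in August 2020, in Unix seconds
fresh_doc = true_now        # written by a peer with an accurate clock

on_time_peer = accepts(fresh_doc, local_now=true_now)
slightly_slow_peer = accepts(fresh_doc, local_now=true_now - 300)  # clock 5 minutes behind
slow_peer = accepts(fresh_doc, local_now=true_now - 3600)          # clock 1 hour behind
```

The last case shows why skewed clocks hurt: a peer whose clock runs an hour slow sees a perfectly fresh document as coming from the future and refuses it.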

Maturity & adoption

Kappa-db is medium maturity; hypercore is high maturity and widely used.

Earthstar is new and nobody is using it yet :)

okdistribute · 2020-08-04T04:33:39Z

Awesome thanks @cinnamon-bun! Thorough review!

Sure! I'll update README eventually, but here's a start. Please let me know if I got anything wrong about kappa-db!

The short answer is "Earthstar is like CouchDB; Kappa-db is like SSB"

Data structure & mutability

Kappa-db is a bundle of append-only logs (hypercores), one per author per device. It builds indexes by processing messages from the logs, in order, to build up a reduced state. The logs grow forever.

Universes of data, and how people join them

Each kappa-db (bundle of logs) is a separate unrelated universe of data, and it's named by ... the swarm key? Anyone with the swarm key can host or write to it?

Earthstar is split up into "workspaces" which are separate universes of data. Anyone with the workspace address can host or write to it. Soon there will also be invite-only workspaces -- anyone can host, but you need the secret key to write.

Multiple authors can be made per Kappa-db using .writer(name). Multiple Kappa-db can be made per device! Seems like a Kappa-db might be more like an Earthstar's Workspace?

Keys, Identities, Devices, Multi-writer

Both use ed25519 keys for identity and signatures.

Kappa-db has one identity per device.

See above.

Messages can't be edited, instead you put more messages on the end that can modify the reduced state. Messages can probably be missing, using hypercore sparse mode? So maybe old data can be deleted that way.

Yes that's right, sparse mode gives one the power of deletion! Editing a k/v is also possible by using a materialized view. See http://npmjs.com/unordered-materialized-kv for an example, but there are other approaches as well!
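A simplified sketch of how a materialized key/value view can be built from messages that arrive in any order, using back-links to earlier messages. It is written in the spirit of the approach mentioned above but is not the unordered-materialized-kv API; the class, the message fields (`id`, `key`, `value`, `links`), and the assumption that links only point at messages for the same key are all hypothetical.

```python
class MaterializedKV:
    def __init__(self):
        self.heads = {}          # key -> ids that no known message links back to
        self.superseded = set()  # ids that some known message links back to
        self.values = {}         # id -> value

    def insert(self, msg):
        """Index one message; messages may arrive in any order."""
        self.values[msg["id"]] = msg["value"]
        heads = self.heads.setdefault(msg["key"], set())
        for old_id in msg["links"]:
            self.superseded.add(old_id)  # remembered even if old_id has not arrived yet
            heads.discard(old_id)
        if msg["id"] not in self.superseded:
            heads.add(msg["id"])

    def get(self, key):
        """Current values for a key; more than one means concurrent edits."""
        return sorted(self.values[i] for i in self.heads.get(key, set()))


# Out-of-order arrival: the newest message is indexed before the ones it replaces.
kv = MaterializedKV()
kv.insert({"id": "c", "key": "k", "value": "merged", "links": ["a", "b"]})
kv.insert({"id": "a", "key": "k", "value": "first", "links": []})
kv.insert({"id": "b", "key": "k", "value": "second", "links": []})

# Two devices edit the same earlier value without seeing each other.
forked = MaterializedKV()
forked.insert({"id": "a", "key": "k", "value": "original", "links": []})
forked.insert({"id": "b1", "key": "k", "value": "phone edit", "links": ["a"]})
forked.insert({"id": "b2", "key": "k", "value": "laptop edit", "links": ["a"]})
```

Editing a key means appending a message that links to the versions it replaces. When two heads remain, the view exposes both and leaves the resolution to the app.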

Indexing

Kappa-db apps are responsible for writing their own indexer / reducer. It's more work to write that code, but you can customize the index for the complex queries your app needs. If the messages are patch-style and build on each other, you need the whole history for it to make sense. If the messages are more standalone in their design you could safely drop some using sparse mode, if you wanted to.

In Earthstar, the core library does basic indexing for you and provides a way to query your documents based on their properties. It's NoSQL style (think MongoDB). It assumes documents are standalone and independent, not patches that build on each other and need to be reduced.

Applications like Cabal, Mapeo, Cobox, and Sonar, which are built on kappa-db, aren't using patches. See Automerge/Hypermerge for an approach in which patches are used. Kappa-db apps usually use LevelDB in production, which also has a query interface, but I really like that Earthstar has a SQLite adapter! Would be very cool to pull that into kappa world :)

Syncing

Kappa-db uses hyperswarm to find peers and the hypercore protocol to sync data, with some custom stuff to handle multiple hypercores.

Earthstar is not well developed here. It can connect to cloud peers over HTTP for syncing, in the style of SSB pubs (see earthstar-pub). I'm planning to add hyperswarm also for direct p2p connections.

Kappa-db relies on "logs that sync" as the underlying abstraction. Earthstar relies on a specification for "signed versioned documents".

With this PR (kappa-db/kappa-core#14), kappa-db will no longer require the use of hypercore.

Hyperswarm isn't required in kappa-db's dependencies at all, since hypercore can work over any Node.js stream (e.g., we aren't using Hyperswarm with kappa-db in Mapeo :))

Conflicts

Kappa-db can't have low-level conflicts because each person (and device) has their own feed. Maybe the feeds disagree about something; that's up to the app's indexing code to figure out.

Earthstar resolves conflicts with a very simple last-write-wins rule, but it keeps the conflicting document versions so apps can do something fancier if they want, or ask the user what to do.

Clocks and timestamps

Low-level kappa-db will work with clock skew but the app might rely on timestamps in other ways (e.g. cabal sorts messages by timestamp?)

Some kappa apps have relied on timestamps, others rely on back-links (a DAG-like approach). Because hypercore has a sequence defined already, wall-clock timestamps ought to only be used if there's a conflict, but even then, you can sort of get a lamport clock for free in that regard!
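For readers unfamiliar with the term, a Lamport clock is a logical counter that orders events without consulting the wall clock. The sketch below is the generic textbook version, not kappa-db or hypercore code; the class and method names are illustrative.

```python
class LamportClock:
    """A logical clock: a counter that only moves forward."""

    def __init__(self):
        self.time = 0

    def tick(self) -> int:
        """Advance for a local event, such as appending a message."""
        self.time += 1
        return self.time

    def receive(self, remote_time: int) -> int:
        """Advance past whatever the sender had seen when it sent the message."""
        self.time = max(self.time, remote_time) + 1
        return self.time


alice = LamportClock()
bob = LamportClock()

first = alice.tick()          # alice writes a message
second = alice.tick()         # alice writes another
seen = bob.receive(second)    # bob syncs alice's latest message
reply = bob.tick()            # bob replies
synced = alice.receive(reply) # alice syncs bob's reply
```

A message that was written after seeing another always carries a larger number, no matter what either device's wall clock says.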

Earthstar won't work well if peers have very inaccurate clocks. It refuses to sync documents from more than 10 minutes in the future. This could be loosened; it could be fixed if we require each path to be restricted to one author, and forbid paths that anyone can write to. (Details)

I know this is unrelated to the topic, but does Earthstar guard against very old messages? In practice with our work in the field, devices that go offline will sometimes revert to a date far in the past, like some day in 2017 (when the phone was born, perhaps? :p) or 3 November 1971 ;p

Maturity & adoption

Kappa-db is medium maturity; hypercore is high maturity and widely used.

Earthstar is new and nobody is using it yet :)

cinnamon-bun · 2020-08-04T18:08:18Z

Thanks @okdistribute ! Just thinking through a few more details here:

Multiple authors can be made per Kappa-db using .writer(name). Multiple Kappa-db can be made per device! Seems like a Kappa-db might be more like an Earthstar's Workspace?

Aha, right! The unique thing about Earthstar is one author can use the same identity on multiple devices. You don't have to worry about forking your feed because it's not a feed.

Yes, a Kappa-db is the equivalent of a Workspace; they're both a "collection of people's feeds" / a "unit of community".

sparse mode gives one the power of deletion!

Nice! Is sparse mode deletion under the control of the reader, not the author? (Maybe unless the app has a special message type which is a deletion request.)

And in Earthstar an author can overwrite their own data with an empty document, and separately a reader can choose to locally delete documents (called "forgetting"). Earthstar really wants to physically delete data whenever possible, for privacy.
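The two kinds of deletion described above can be sketched as follows. The store layout and both function names are hypothetical; this only illustrates the difference between an author-side overwrite, which propagates on sync, and a reader-side "forget", which is purely local.

```python
def overwrite_with_empty(store, path, author, timestamp):
    """Author-side deletion: replace your own document with a newer, empty version."""
    current = store.get(path)
    if current is None or current["author"] != author:
        return False  # authors can only blank out their own data
    if timestamp <= current["timestamp"]:
        return False  # the empty version must be newer to win
    store[path] = {"path": path, "author": author, "timestamp": timestamp, "content": ""}
    return True


def forget(store, path):
    """Reader-side deletion: drop the local copy; other peers are unaffected."""
    return store.pop(path, None) is not None


alice_store = {"/notes/1": {"path": "/notes/1", "author": "@alice", "timestamp": 10, "content": "secret"}}
bob_store = {"/notes/1": dict(alice_store["/notes/1"])}

rejected = overwrite_with_empty(alice_store, "/notes/1", "@mallory", 20)
blanked = overwrite_with_empty(alice_store, "/notes/1", "@alice", 20)
forgotten = forget(bob_store, "/notes/1")
```

After this, Alice's store holds an empty document that will replace the old content on peers that sync it, while Bob's store no longer holds the path at all.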

kappa-db will no longer require the use of hypercore

Huh! Maybe Earthstar could work as an alternative backend for kappa-next? It might break some assumptions though:

* Messages from a single author can arrive out of order
* You can locally delete messages, do kappa-db indexes expect that?

I think both of those cases could already happen in kappa-db sparse mode, so maybe it would work!

does Earthstar guard against very old messages?

Thanks for asking, I'm starting to realize how often device clocks can be unreliable. How do you handle this in your work? Can the device at least store previous timestamps so when it reboots it can continue where it left off instead of resetting to 1970?

Earthstar has a minimum allowed timestamp which is in 1970. It doesn't inherently limit documents by relative time like N days ago, but you can filter them that way when syncing.

The big problem is that the devices in 1970 won't accept documents "from the future" e.g. the devices with accurate clocks.

I'll probably add options to help with inaccurate clocks, which will come with some caveats.

okdistribute · 2020-08-04T18:27:14Z

Nice! Is sparse mode deletion under the control of the reader, not the author? (Maybe unless the app has a special message type which is a deletion request.)

Yeah, the app would have to create a special 'tombstone' request. Cobox is interested in implementing this for hyperdrive-backed kappa-db, called kappa-drive; Mapeo has this in production, although we don't give users the ability to clear histories... yet. One day we plan on implementing sparse mode in Mapeo, but the datasets people are working with aren't quite big enough yet to make that a necessity (not counting media files, just the dataset itself).

And in Earthstar an author can overwrite their own data with an empty document, and separately a reader can choose to locally delete documents (called "forgetting"). Earthstar really wants to physically delete data whenever possible, for privacy.

This is a great idea!

kappa-db will no longer require the use of hypercore

Huh! Maybe Earthstar could work as an alternative backend for kappa-next? It might break some assumptions though:
* Messages from a single author can arrive out of order

* You can locally delete messages, do kappa-db indexes expect that?

Yeah, messages could arrive out of order or be missing from the same author already with the sparse implementation of hypercore. That really just depends on how you write your index. See https://github.com/frando/kappa-sparse-indexer for an implementation, although this is still a relatively new approach! It seems to work for cobox and sonar though. cc @Frando

Thanks for asking, I'm starting to realize how often device clocks can be unreliable. How do you handle this in your work? Can the device at least store previous timestamps so when it reboots it can continue where it left off instead of resetting to 1970?

Earthstar has a minimum allowed timestamp which is in 1970. It doesn't inherently limit documents by relative time like N days ago, but you can filter them that way when syncing.

The big problem is that the devices in 1970 won't accept documents "from the future" e.g. the devices with accurate clocks.

I'll probably add options to help with inaccurate clocks, which will come with some caveats.

Good idea. Many researchers in this space say to never use wall clocks. I agree with them! Although it can be useful and/or important for the user to know the wall clock time, for usability purposes. In those cases, you could recommend using a timeserver if users are online?

A DAG approach can cause performance issues due to having to walk the tree, and a vector clock has disk space/network tradeoffs. We had a conversation about this in cabal-core which has some insights but still remains unresolved. So I think it really depends on your use case: if users will have lots of space and not super low bandwidth, a vector clock seems like a good bet.
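As a reference point for the vector clock option, here is a generic sketch of comparing and merging vector clocks. It is not taken from cabal-core or any kappa-db module; the function names and the dict-of-counters representation are assumptions.

```python
def merge(a: dict, b: dict) -> dict:
    """Combine two vector clocks by taking the larger counter for every writer."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


def compare(a: dict, b: dict) -> str:
    """Classify a relative to b: 'equal', 'before', 'after', or 'concurrent'."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


base = {"alice": 1}
alice_edit = {"alice": 2}            # alice writes again
bob_edit = {"alice": 1, "bob": 1}    # bob writes without seeing alice's second write

ordering = compare(base, alice_edit)
fork = compare(alice_edit, bob_edit)
merged = merge(alice_edit, bob_edit)
```

The space and bandwidth tradeoff is visible in the representation: every message carries one counter per writer it knows about, so the clock grows with the number of writers.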

I really like this talk by @jlongster! https://www.dotconferences.com/2019/12/james-long-crdts-for-mortals.

(EDIT: Mapeo uses a DAG https://www.npmjs.com/package/unordered-materialized-kv)

cinnamon-bun added the "documentation" label (Improvements or additions to documentation) Aug 3, 2020

cinnamon-bun added the "question" label (Further information is requested) Aug 5, 2020

arj03 mentioned this issue Aug 18, 2020 in "Define good / hard problems we need to tackle to move ssb forward" (scuttlebutt-eu/important-documents#39, Open)

cinnamon-bun added this to the (discussion) milestone Feb 10, 2021

earthstar-project locked and limited conversation to collaborators Feb 17, 2022

sgwilym converted this issue into discussion #228 Feb 17, 2022
