
Backend architecture: Datomic, datahike, OpenCrux, datalevin, Fluree #9

Open
tangjeff0 opened this issue Apr 30, 2020 · 23 comments

@tangjeff0 (Collaborator) commented Apr 30, 2020

|  | Datomic | datahike | OpenCrux |
| --- | --- | --- | --- |
| scalability | 100B datoms | millions of entities | dependent on document size |
| time | uni-temporal | uni-temporal | bi-temporal |
| license | closed-source | EPL 1.0 | MIT |
| storage services | DynamoDB, Cassandra, JDBC SQLs | LevelDB, Redis, PostgreSQL | RocksDB, LMDB, Kafka, JDBC SQLs |
@tangjeff0 tangjeff0 changed the title Decide on a backend architecture: Datomic, datahike, or OpenCrux Architect backend: Datomic, datahike, or OpenCrux Apr 30, 2020
@tangjeff0 tangjeff0 mentioned this issue May 1, 2020
@tangjeff0 (Collaborator, Author)

From Jeroen in the Slack:

Maybe start with Datomic (the best known and most mature option) and postpone this decision? Unless someone has a clear vision on this. I think these three should be mostly compatible at query and data model level. If something becomes difficult with Datomic, reconsider (e.g. when implementing collaboration features)? Or when people have trouble setting up the free version, and don’t want to pay for the commercial version, reconsider. If this is an upfront certainty, go for datahike or OpenCrux right away? (edited)
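Jeroen's compatibility point can be illustrated with a query sketch: the `:find`/`:where` Datalog shape below (using hypothetical Athens-style attributes, not actual Athens schema) is common to Datomic, datahike, and DataScript, while Crux's dialect is close but runs against an explicit db snapshot of schemaless documents.

```clojure
;; Hypothetical attribute names, for illustration only.
;; Find the titles of pages containing a block that mentions "backend".
'[:find ?title
  :where
  [?page :node/title ?title]
  [?block :block/page ?page]
  [?block :block/string ?s]
  [(clojure.string/includes? ?s "backend")]]
```

Swapping stores then mostly means changing how the query is executed (e.g. `d/q` against a connection's db value vs. `crux/q` against a node snapshot), not the query itself.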

@refset commented May 2, 2020

I can only speak for Crux on these points...

Pros:

  1. Regular, fully-featured releases w/ transparent roadmap (e.g. upcoming JSON and SQL support might help non-Clojure Athens users to build tools/integrations): https://github.com/juxt/crux/projects/1
  2. Low memory requirements makes it particularly suitable for self-hosting (this is mostly because the query engine is lazy)
  3. Setting up a collaborative Crux-backed environment could be as simple as having a group of users share access to a managed Kafka service, see https://juxt.pro/blog/posts/crux-confluent-cloud.html (vs. always having to maintain a bunch of centralised DB infrastructure somewhere)
  4. Dev team that is excited and keen to see Athens succeed
  5. There's a tantalising possibility that bitemporality could be an invaluable capability in a collaborative context. We're already thinking about the feasibility of using Hybrid Logical Clocks in place of a simple valid-time timestamp (see: https://jaredforsyth.com/posts/hybrid-logical-clocks/ & CockroachDB)
  6. An EQL-like syntax is available for "pull": xtdb/xtdb#849

Cons:

  1. We're still in Beta - so there may be a few API changes, but nothing too fundamental
  2. Crux is schemaless, but not magic, so you still need to have some idea of what your schema looks like :)
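To illustrate the "schemaless, but not magic" point, here is a minimal sketch using the beta-era `crux.api` namespace (names as of Crux circa 2020, before the XTDB rename; exact node options are version-dependent): documents need no declared schema, but the application must still agree on attribute names for queries to work.

```clojure
(require '[crux.api :as crux])

;; in-memory node; a real deployment would configure Kafka/RocksDB etc.
(def node (crux/start-node {}))

;; put a document: no schema is declared, but :block/string is still a
;; naming convention the application has to commit to
(crux/submit-tx node [[:crux.tx/put
                       {:crux.db/id :block-1
                        :block/string "Hello Athens"}]])
(crux/sync node) ;; wait for the transaction to be indexed

;; query a point-in-time db snapshot
(crux/q (crux/db node)
        '{:find [?e ?s]
          :where [[?e :block/string ?s]]})
```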

Hope that helps!

Edit: this might be of interest: https://findka.com/blog/migrating-to-biff/ (Firebase-like stack on top of Crux)

@tangjeff0 (Collaborator, Author)

From Christopher Small, author of datsync

My hope is that DatSync will be able to support Datahike on the backend, and I have no objections to supporting Crux if there aren't technical blockers.
Is it [Athens] mainly focused on small deployments for a sort of DIY self-hosted Roam? If so, and you'd mostly be expecting data from small numbers of users, you can probably get things working with any of these tools.
If however, you are hoping to have large centralized (but OSS) hosting available, then you'd need to think about scalability, and I think your best option there would be Datomic.
Datahike has pretty decent query performance, but writes have the potential to be a bottleneck, so you can look into where that pain point hits. For lots of (relatively) small deployments though, datahike would be perfect.
If I knew more about Crux I might be able to say more about its advantages, but if you are looking to use DataScript on the client, Datahike is a fork, and so likely to be a better impedance match.

@refset commented May 2, 2020

We've not looked at datsync in any detail but we have spent some time thinking about crux->datascript replication already: https://github.com/crux-labs/crux-datascript/blob/master/src/crux_datascript/core.clj

@whilo commented May 4, 2020

Just to also chime in and add a few things that have not been said yet:

Yes, Datahike is still very much compatible with DataScript, and moreover we are aiming to port our query engine with durability back over to ClojureScript in our next release as well (after 0.3.0, which is pending). So Datahike will be able to substitute for DataScript and optionally provide client-side durability at the same time. We have implemented all our abstractions as replikativ libraries in a platform-neutral way from the start; the main thing missing is ClojureScript asynchronous IO support in Datahike's query engine code. This is a very doable task, it was just easier and more attractive to get the JVM version working well first.

Replicating Datahike will be possible with P2P web technology, as demonstrated in https://lambdaforge.io/2019/12/08/replicate-datahike-wherever-you-go.html. We are convinced that we need to find better business models than the current data-silo approach.

We also provide a Datomic compatible core API that is used by our commercial clients, so if you decide to stick to the common subset, you will be able to swap Datahike in at any point. If you hit missing features or incompatibilities, please open an issue. We are currently working on our write throughput and I am confident that we can scale to Datomic size deployments in principle, it was just a matter of priorities.
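The Datomic-compatible core API mentioned above can be sketched as follows (a minimal example, assuming Datahike's 0.3.x-style map config; the `:block/*` attributes are hypothetical, and `:schema-flexibility :read` is used here so the sketch can transact without declaring a schema first):

```clojure
(require '[datahike.api :as d])

;; in-memory store for the sketch; real use would pick :file, :pg, etc.
(def cfg {:store {:backend :mem :id "athens-demo"}
          :schema-flexibility :read})

(d/create-database cfg)
(def conn (d/connect cfg))

;; transact a couple of blocks, then query them back with Datalog,
;; using the same d/transact + d/q shape as datomic.api
(d/transact conn [{:block/uid "b1" :block/string "Hello Athens"}
                  {:block/uid "b2" :block/string "Backed by Datahike"}])

(d/q '[:find ?uid ?s
       :where
       [?e :block/uid ?uid]
       [?e :block/string ?s]]
     (d/db conn))
```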

We, the members of LambdaForge, are also big fans of the Zettelkasten method (even before we were aware of Roam) and use https://org-roam.readthedocs.io/en/latest/ at the moment. We would be super happy to see a reliable open-source implementation like Athens succeed, so keep going 💯!

I think ideally the backends should be exchangeable, so even if you decide on one, be mindful of where you buy into its specific semantics.

@jelmerderonde (Contributor)

Although I don't consider myself an expert in databases, I guess one of the (future) advantages of Datahike would be that it could potentially enable "local first" as described here: https://www.inkandswitch.com/local-first.html. For me this would be great to have in a tool like Athens because you could easily edit offline on multiple machines, while having confidence that your edits could later on combine seamlessly.

@tangjeff0 (Collaborator, Author)

Thanks so much for sharing that link. Several engineers (including myself) are quite interested in local first applications. We've discussed databases like OrbitDB, Gun, and Scuttlebutt. Datahike is very interesting for this reason.

@tangjeff0 tangjeff0 changed the title Architect backend: Datomic, datahike, or OpenCrux Research and debate backend architecture: Datomic, datahike, OpenCrux, OrbitDB May 15, 2020
@jelmerderonde (Contributor)

@tangjeff0 no problem. I guess Datahike isn't quite there yet, but maybe @whilo can share something about whether Datahike would allow a local-first workflow in the future?

@whilo commented May 16, 2020

Yes. Since our early work on http://replikativ.io/, which predated most of these other local-first approaches (but did not attract a large community back then, and lacked a nice programming model such as Datalog), we have wanted to be local-first. We aim to port Datahike back to ClojureScript in our next iteration. Do you think Open Collective would work to fund this work? Any help would be appreciated, as we are currently still hammering out Datomic compatibility and some scalability issues in the JVM version.

@tangjeff0 (Collaborator, Author)

Will re-open after v1 is complete.

@tangjeff0 tangjeff0 reopened this May 29, 2020
@tangjeff0 tangjeff0 changed the title Research and debate backend architecture: Datomic, datahike, OpenCrux, OrbitDB Backend architecture: Datomic, datahike, OpenCrux, OrbitDB... May 29, 2020
@tangjeff0 tangjeff0 mentioned this issue Jun 10, 2020
4 tasks
@pepoospina

TL;DR:

Do you plan to support block-level access control and notifications/subscriptions? If so, how do you plan to do this? Maybe the DB is a deal-breaker.


Hi there. I've been discussing with @tangjeff0 a little bit on Twitter about your plans and how they could be linked with ours.

I also had some experience working with heavily nested and linked content with my previous project www.collectiveone.org and I have a couple of comments regarding the DB and how to handle the multi-player case:

  • access control at the block level: Ideally you want access control at the block level, but you need some sort of "default" inheritance logic to be able to switch access control of a whole area at once. Notion does this at the page level, with rules like "permissions of this page are defined by this other workspace...". Besides inheritance, I would like composability, so that you can say things like "those with access to A AND/OR access to B can access this". Also, access control must be super fast, as it is computed almost every time a block is read.

  • subscriptions and notifications: Ideally you also need some sort of inheritance logic here, so that if I want to be notified of changes to a block, I get notified every time any of its child blocks changes. Each user can have different notification settings for each object, and one block can be in many places at the same time, so once there is an event on one block it is very hard to determine who should receive that email/push notification.

I did this in Postgres the last time I tried and relied heavily on recursion, navigating the DB in many directions before determining what to do or who to send a message to. This was too slow. I am not an expert in big-data systems, so I really wonder how these problems should actually be handled.
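The inheritance rule described above can be sketched as a toy resolver (not Athens code; block IDs and ACL shapes are invented for illustration): a block either carries an explicit ACL or inherits its nearest ancestor's, resolved by walking up the parent chain. Memoizing or caching that walk is what keeps per-read permission checks fast.

```clojure
;; Toy block store: each block has a parent and an optional explicit ACL.
(def blocks
  {:page-a  {:parent nil      :acl #{"alice" "bob"}}
   :block-1 {:parent :page-a  :acl nil}           ; inherits from :page-a
   :block-2 {:parent :block-1 :acl #{"carol"}}})  ; explicit override

(defn effective-acl
  "Return the ACL that applies to id: its own, or the nearest ancestor's."
  [blocks id]
  (when-let [{:keys [parent acl]} (get blocks id)]
    (or acl (effective-acl blocks parent))))

(effective-acl blocks :block-1) ;; => #{"alice" "bob"}
(effective-acl blocks :block-2) ;; => #{"carol"}
```

Composability ("access to A AND/OR access to B") would replace the plain sets with a small expression language evaluated at the same resolution step.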

@tangjeff0 tangjeff0 pinned this issue Jul 9, 2020
@tangjeff0 (Collaborator, Author)

Another factor I'd like to point out is the conflict resolution story, whether it be distributed or centralized.

@vHanda commented Oct 16, 2020

Another option could be to use a Git repository as a backend. This would require creating a REST API on top of the Git repo to parse the documents, but it would result in greater compatibility with existing tools. One would easily be able to have the files locally, and even use other markdown editors or more advanced editors like Obsidian. And there is already a mobile app ready (GitJournal; I'm the author).

This would result in a very different architecture though. I'm willing to help, if you want to go down this route. I would love more tools to be compatible with each other.

@almereyda

You could also consider https://github.com/terminusdb/terminusdb-server

@tangjeff0 tangjeff0 unpinned this issue Jan 19, 2021
@tangjeff0 tangjeff0 pinned this issue Jan 19, 2021
@tangjeff0 tangjeff0 unpinned this issue Jan 22, 2021
@agentydragon (Contributor)

I'd just like to add that for me, Athens being open-source is a significant advantage over Roam, and if Athens ends up requiring a closed-source backend to be most useful, that advantage would be diminished.

Also it would be nice to abstract the backend-talking code to allow people to potentially run Athens on other backends, as long as they support some defined protocol.

@tangjeff0 (Collaborator, Author) commented Feb 25, 2021

A protocol is always most ideal but hardest to pull off. Crux, Datomic, DataScript, and Datahike will inevitably have some differences from each other.

Agree that a closed-source backend diminishes value. Inevitably parts of our infrastructure will be closed, but if there is a fully open-source full-stack solution for users to self-host, super great.
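The kind of backend abstraction being discussed could be sketched as a Clojure protocol (a hypothetical design sketch, not Athens's actual code; the operation names and `:block/uid` attribute are invented, and the DataScript example assumes `:block/uid` is declared `:db.unique/identity` in the schema):

```clojure
(require '[datascript.core :as ds])

;; Athens-facing code would depend only on these operations; each store
;; (DataScript, Datahike, Crux, ...) supplies its own implementation.
(defprotocol AthensBackend
  (transact! [this tx-data] "Apply a vector of block/page assertions.")
  (q [this query] "Run a Datalog query against the current db value.")
  (pull-block [this uid] "Fetch one block's attributes by :block/uid."))

;; e.g. an in-memory DataScript implementation:
(defrecord DataScriptBackend [conn]
  AthensBackend
  (transact! [_ tx-data] (ds/transact! conn tx-data))
  (q [_ query] (ds/q query @conn))
  (pull-block [_ uid] (ds/pull @conn '[*] [:block/uid uid])))
```

The differences mentioned above would surface exactly at this boundary, e.g. Crux's document model vs. datom-based pull, which is what makes a fully uniform protocol hard.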

Also just learned about https://github.com/fluree/ from Matei. Clojure, Web3, open-source.

https://www.youtube.com/watch?v=uSum3uynHy4&feature=youtu.be

@pepoospina

Hi there! I'm glad to see some movement here 🙂

We have been working on an interface specification for our Athens-like app so that the backend is abstracted. We have also been working hard on a NodeJS + DGraph backend API that is open-sourced under an AGPL-like license.

I'd bet the interface supports (or will support) all the needs of Athens. Who knows! 💪. It includes backlinks and search features, granular access control (and thus multi-player), and fast data creation and fetching.

Reusing our backend, or just the interface, would also provide interoperability among our apps. Users would be able to embed and edit blocks from Athens in Intercreativity, for example. They could also "fork" them, as we want to support Git-like flows with content.

Oh, and eventually Athens could connect to other data storage solutions. We have prototypes for OrbitDB, Ethereum, Kusama, and IndexedDB (local).

This is a recent demo of our latest milestone (a simple case where users mix private with public content). We are about to release a new version where users can explore a feed of blog posts.

If you want to run it, this repo should run ok on Ubuntu or Mac. It is our latest development version.

Oh, and this is our discord in case you want to reach us. 👋

@mateicanavra commented Feb 26, 2021

The video @tangjeff0 mentioned above covers both the broad vision and technical details of Fluree better than I could, but here's my quick take:

Fluree is an in-memory, semantic graph database backed by a permissioned blockchain, built with Clojure, and open source.

It can be containerized (with Kubernetes support) and optionally decentralized (e.g. using StorJ via Tardigrade), run as a standalone JVM service, or embedded inside the browser as a web worker. Read more about the query server (fluree-db) and the ledger server (fluree-ledger).

Since Fluree extends RDF (the official W3C standard for data interchange), it immediately becomes interoperable with the linked open datasets of the semantic web. One interesting use case would be to directly query DBpedia or Wikidata from within Athens and combine the results with your own data at runtime, without an API. Additionally, an RDF foundation means you can build ontologies with any of the modeling languages built on top of it (RDFS, OWL, etc., which are official W3C recommendations), which opens up capabilities for inferencing and automated reasoning.

From my view, Fluree could be a powerhouse tool to strongly differentiate Athens from Roam and every other "tool for networked thought." Between RDF standards and a permissioned blockchain (which allows for block/cell-level access control), you could seamlessly and securely deploy Athens at an individual, team, or enterprise level using the same scalable infrastructure.

Would love to get the Fluree team's thoughts here...

@lambduhh (Contributor)

@quoll I would like to advocate for the adoption of https://github.com/threatgrid/asami but feel like it would be better left up to the expert :) Athens is currently ClojureScript/re-frame/DataScript/posh (I'm working on sunsetting posh rn actually)

What are your thoughts on whether Asami would be a good fit as a graph-DB for us?

Selfishly, I will admit I would LOVE the excuse to combine our opensource powers to leverage the benefits of bi-directional knowledge linking, use Asami in the wild and possibly have the opportunity to work with you in a technical aspect to help implement it if we do end up going this way... and I don't think I'd be the only one!

@quoll commented Feb 28, 2021

Love to help. I hope to have Asami 2.0-alpha out by the end of the week. This will have storage when on the JVM. JavaScript is coming, but in the meantime it will have save/load functions.
Unfortunately, Asami doesn’t have all the APIs of the other stores, e.g. the Pull API.

@agentydragon (Contributor)

I've looked a bit into Datahike.

As someone new to Clojure, this makes me less nervous about depending on a backend that has a Datomic-like API, and optimistic about Datahike, because it would still allow freedom in the backing storage system.

@Tr3yb0 commented Apr 15, 2021

@mateicanavra laid out Fluree for us very well in his comment above. I will elaborate a little on some of the points made and bring up one additional one, which is one of the most powerful parts of Fluree.

The foundation of RDF is intended to enable data interoperability across the semantic web and provides a very flexible data model for the applications built on top of it. Our immutable ledger brings both decentralization and horizontal scaling in the transaction tier, if that is needed, as well as the benefits of querying historical data states from earlier in the blockchain.

We have segregated the query and transaction tiers, such that the query engine and an LRU cache of data can be loaded in-memory on the client device using a service worker. I would imagine that for a personal Athens graph, that may be the entire thing, which enables millisecond query responses. The db (query peer) is also linearly scalable, but I'm not sure that really applies to the use case here.

The biggest advantage Fluree brings is SmartFunctions. Because each transaction is encrypted with the user's private key, the data can be permissioned at the individual RDF element level. You could write the SmartFunctions in such a way that no one else would have access to them, and a user could share as desired.

@refset commented Apr 20, 2021

Noting this work-in-progress Datahike backend for the benefit of those following this issue: https://github.com/athensresearch/athens-backend

Also, I recently pulled together a comparison matrix for various Clojure-Datalog stores: https://clojurelog.github.io/

@tangjeff0 tangjeff0 changed the title Backend architecture: Datomic, datahike, OpenCrux, OrbitDB... Backend architecture: Datomic, datahike, OpenCrux, datalevin Apr 27, 2021
@tangjeff0 tangjeff0 changed the title Backend architecture: Datomic, datahike, OpenCrux, datalevin Backend architecture: Datomic, datahike, OpenCrux, datalevin, Fluree May 14, 2021
@sid597 sid597 added the i/high impact label Jun 9, 2021
sid597 added a commit to sid597/athens that referenced this issue Jun 9, 2021
Made code better based on jeff's great review.
neotyk added a commit that referenced this issue Jul 14, 2021
Sid migrate `:drop/...` events