
Current limitations and future directions


The current functionality (DataScript databases as materialized views of a Datomic database) is pretty limited, but leaves a lot of room for extensibility.

But there's a lot missing:

  • Data scoping mechanisms: Currently, the assumption is that we sync the whole db.
  • Optimistic client-side db updates and offline functionality: Currently, all transactions have to go through the Datomic transactor before any changes can affect client-side state.
  • Security protocols: If all data is synced and clients can submit any transaction they want, there will obviously be problems.
  • Scalability in general: Because of the data scoping problem, we have some pretty hard limits on the amount of data that can live in the system.

On the bright side though, it should be sufficient for building small systems with strong consistency at the cost of availability and partition tolerance.

A lot of the things mentioned above can be addressed without even modifying the library as it stands. But additional functionality around standardizing these things would mean more automated setup and reduced potential for errors at some of the more critical points in the system. Below are some descriptions of existing characteristics and ideas for how we can improve things in the future.

Current setup

Presently, transactions from clients get sent directly to the server and transacted there, with datoms from those transactions being sent out to all clients. Once those datoms are transacted on the clients, all UIs update. If a transaction fails due to conflicts, the client can be notified and an error message can be rendered (Datsync does not currently provide any automation here, though).
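
To make that flow concrete, here's a rough sketch of what the server-side forwarding loop might look like using Datomic's tx-report-queue. The clients atom and send-to-client! transport function are placeholders for whatever connection management and websocket layer the application uses; none of this is current Datsync API.

```clojure
(require '[datomic.api :as d])

;; Placeholder transport; in practice this would go over the app's
;; websocket (e.g. Sente) channel to a connected client.
(defn send-to-client! [client msg]
  (println "sending to" client ":" msg))

(defn start-tx-broadcast!
  "Forward the datoms of every completed transaction to all clients."
  [conn clients]
  (let [queue (d/tx-report-queue conn)]
    (future
      (loop []
        (let [{:keys [tx-data]} (.take queue)]
          (doseq [client @clients]
            (send-to-client! client
                             {:event   :datsync/tx-data
                              ;; serialize datoms as [e a v tx added] tuples
                              :tx-data (mapv (juxt :e :a :v :tx :added) tx-data)})))
        (recur)))))
```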

Strong eventual consistency

One of the first future goals we hope to accomplish for Datsync is support for strong eventual consistency, enabling optimistic and offline updates on clients. There are many routes we could take here:

Transaction metadata and history

DataScript doesn't track transaction metadata or history. This is unfortunate, since these features of Datomic would give us what I think is the best path forward for implementing eventual consistency.

The general idea here is that if we track transaction data and history on clients, we can implement a transaction metadata system which enables us to track what changes have been confirmed by the server, and which haven't. Garbage collection could be built around this metadata, such that history is only retained for transactions which have been transacted on the server.

Keeping transaction metadata as datoms is actually not particularly challenging; it just requires using a customized version of the d/transact! function (for this reason, it would be nice if DataScript had some kind of DBRef protocol...).
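
As a sketch of what such a wrapper might look like (the :datsync.tx/* attributes here are hypothetical bookkeeping names, not part of the current Datsync API):

```clojure
(require '[datascript.core :as d])

;; :datsync.tx/id must be unique so confirmations can upsert onto the
;; same bookkeeping entity.
(def tx-meta-schema
  {:datsync.tx/id {:db/unique :db.unique/identity}})

(defn transact-with-meta!
  "Transact tx-data, then record a bookkeeping entity so this
  transaction can later be marked as confirmed by the server."
  [conn tx-data]
  (let [report (d/transact! conn tx-data)
        tx-id  (:max-tx (:db-after report))]
    (d/transact! conn [{:datsync.tx/id        tx-id
                        :datsync.tx/confirmed false}])
    report))

(defn confirm-tx!
  "Called when the server acknowledges the transaction."
  [conn tx-id]
  (d/transact! conn [{:datsync.tx/id        tx-id
                      :datsync.tx/confirmed true}]))
```

Garbage collection could then query for bookkeeping entities where :datsync.tx/confirmed is true and prune whatever history is being retained for those transactions.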

Keeping the history may be a bit more difficult, since DataScript prevents saving history by defining equality and hashing on datoms such that the tx id and the added flag are ignored.

It's not clear whether defining our own Datom type which did not have these properties would solve this problem, or whether there would be other work involved.

One possible solution is that the top node in our client app's reactive atom tree is not the DataScript db, but a log of every message that has been sent or received (somewhat along the lines of the re-frame idea).

The DataScript db can actually be written as a reactively materialized view of that log! The main problem with this approach is that the logic of garbage collecting the log is rather open-ended.
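
A minimal sketch of that idea, with illustrative names only (a real implementation would presumably update the db incrementally rather than replaying the whole log each time):

```clojure
(require '[datascript.core :as d])

(defonce tx-log (atom []))   ; every tx message sent or received

(defn log->db
  "Materialize a DataScript db by replaying the log against an empty db."
  [schema log]
  (reduce (fn [db {:keys [tx-data]}]
            (d/db-with db tx-data))
          (d/empty-db schema)
          log))

;; Appending to the log *is* the write; views derive their db from it.
(defn log-tx! [tx-msg]
  (swap! tx-log conj tx-msg))
```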

An alternative approach to tracking this metadata would be to do it externally to the DataScript DB, such as in LocalStorage, in the form of an index or even a simple value holding the transaction id of the last server-committed transaction. In a normal DB, the index is also stored and maintained separately from the data, since that provides a lot of benefits and ensures a clear separation.
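
In ClojureScript that could be as simple as the following (the storage key is arbitrary):

```clojure
(defn save-last-confirmed-tx! [tx-id]
  (.setItem js/localStorage "datsync/last-confirmed-tx" (str tx-id)))

(defn load-last-confirmed-tx []
  (some-> (.getItem js/localStorage "datsync/last-confirmed-tx")
          js/parseInt))
```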

Conflict resolution

The first step will be to let whoever writes to the transactor first win if there are transaction conflicts in our eventually consistent system. Ideally though, we'd have support for more nuanced control of the conflict resolution process, potentially even allowing user interaction as part of the resolution process.
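
A sketch of what first-write-wins might look like on the server, assuming a hypothetical notify-client! transport function back to the submitting client:

```clojure
(require '[datomic.api :as d])

;; Placeholder transport back to the submitting client.
(defn notify-client! [client msg]
  (println "notifying" client ":" msg))

(defn handle-client-tx!
  "Transact a client's tx; transactor ordering decides conflicts, and
  a losing client is simply told its transaction failed."
  [conn client tx-data]
  (try
    @(d/transact conn tx-data)
    (catch Exception e
      (notify-client! client {:event :datsync/tx-failed
                              :error (.getMessage e)}))))
```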

Side note: transaction commit systems

This work also feeds into the possibility of transaction commit systems, where form views can persist "WIP" changes between clients (allowing for collaborative editing), but those changes don't show up in the main views of data until the user has somehow "committed" or "confirmed" them.

Confirmed and unconfirmed client dbs

Another approach to solving this problem is to have two databases: one with only confirmed changes, and one that also includes unconfirmed changes. If a transaction sent to the server fails, a client-side handler could respond by rolling back to the confirmed database, discarding the unconfirmed changes. While this may be a little easier to implement, the story here for conflict resolution and commit systems could be more difficult. It's rather compelling and intriguing that building history and time into the databases so elegantly solves not only the main problem of attaining strong eventual consistency, but also these other related issues.
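
Roughly, the shape of that setup might look like the following (all names here are illustrative, not Datsync API):

```clojure
(require '[datascript.core :as d])

(defonce state (atom {:confirmed   (d/empty-db)
                      :unconfirmed (d/empty-db)}))

(defn optimistic-tx!
  "Apply a local tx only to the unconfirmed (speculative) db."
  [tx-data]
  (swap! state update :unconfirmed d/db-with tx-data))

(defn confirmed-tx!
  "Apply a server-confirmed tx to the confirmed db, then rebase the
  unconfirmed db from it plus any still-pending local txes."
  [tx-data pending-txes]
  (swap! state
         (fn [{:keys [confirmed]}]
           (let [confirmed' (d/db-with confirmed tx-data)]
             {:confirmed   confirmed'
              :unconfirmed (reduce d/db-with confirmed' pending-txes)}))))

(defn rollback!
  "On a failed tx, discard speculative state by resetting to the confirmed db."
  []
  (swap! state #(assoc % :unconfirmed (:confirmed %))))
```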

Restricting server -> client notifications

There are situations where syncing an entire database is fine (when there are few clients and the data is relatively small). But in general, this is not a safe assumption for scalability. We need some way to restrict the data which gets sent to each client. This is important not only for performance and scalability, but also for security: we shouldn't be sending data where it's not supposed to go.

Security

The idea for security has already been discussed in the community. Datomic allows you to create filtered dbs that only contain datoms for which certain predicates are true. Using predicates which filter out all entities for which a specific user does not have read authorization solves the read problem. Similarly, transactions can be tested against write authorization predicates, creating a safety net around the central server database. More pertinent to the discussion at hand, though, is the issue of space.
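
Read filtering, for example, might look something like this, where readable? stands in for whatever per-user authorization predicate the application defines:

```clojure
(require '[datomic.api :as d])

;; Placeholder authorization predicate over entities.
(defn readable? [db user-id eid]
  true)

(defn readable-db
  "A filtered view of db containing only datoms whose entity the given
  user is authorized to read."
  [db user-id]
  (d/filter db (fn [db datom]
                 (readable? db user-id (:e datom)))))
```

Writes could be checked symmetrically by running proposed tx-data through a write-authorization predicate before handing it to the transactor.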

Scope

As long as the data pertinent to any given user is relatively small, you can probably get away with just doing security filters. However, the problem becomes more challenging when users are expected to have access to more data than could comfortably fit in their client's memory. When this becomes a problem, our strategy will be to compose reactive streams of the data.

What this may look like is that the client specifies the data it wishes to have in scope ("checked out"). There might be categorical things which should just always be kept in sync for convenience, like tag or user preference data. There might be other data though -- like collections of some sort relevant to the domain model -- which you want to selectively sync based on some more restricted definition of what's in scope.

Query scopes

Ideally, this scope definition would just be composed of Dat(aScript|omic) query or pull expressions, much like you find in om-next (but potentially more flexible). Even here, there might have to be some gating of updates in situations where perfect real-time behavior can be sacrificed for scalability.
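
As a sketch of how a query-based scope might be used on the server to restrict which datoms get forwarded (the scope shape and attribute names are purely illustrative):

```clojure
(require '[datomic.api :as d])

;; An illustrative scope: "all tasks in the project I have checked out".
(def example-scope
  '{:find  [?e]
    :in    [$ ?project-id]
    :where [[?e :task/project ?p]
            [?p :project/id ?project-id]]})

(defn in-scope-datoms
  "Restrict a transaction's datoms to entities matched by the scope
  before sending them to a client."
  [db scope scope-args tx-data]
  (let [eids (into #{} (map first) (apply d/q scope db scope-args))]
    (filter #(contains? eids (:e %)) tx-data)))
```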

On the horizon for scalability: if you had chains of progressive filters, you could potentially build an Onyx workflow programmatically which ran the scope definitions in stages, pushing changes through to clients at the very end. This will take some time to implement, and will come after the in-process server-side reaction workflow. But think of it as potentially on the horizon.