[WIP] Sync primitives #142

Closed
wants to merge 23 commits into from

Conversation

Collaborator

@radex radex commented Nov 8, 2018

No description provided.

@radex radex self-assigned this Nov 8, 2018
@peril-nozbe

peril-nozbe bot commented Nov 8, 2018

Fails
🚫 Please add a description to your PR.

Warnings
⚠️ Please add a changelog entry for your changes. (Add #trivial to skip this.)
⚠️ There are library changes, but no tests. That's OK as long as you're refactoring existing code.

Generated by 🚫 dangerJS

@radex radex mentioned this pull request Nov 9, 2018
@radex radex changed the base branch from master to sync-primitives-1 November 14, 2018 12:16
@radex radex changed the base branch from sync-primitives-1 to sync-primitives-2 November 14, 2018 13:17
Contributor

@flettling flettling left a comment


I left a few comments below; additionally, here are two remarks regarding the questions you asked in the other thread, @radex:

a) why isn't it possible for you to use per-database sync, or at least something closer to it, but with this GraphQL abstraction

There’s probably a slight misunderstanding. We do sync the whole database, not a single collection. (We have slices of the database that correspond to projects in our data model, but they span all collections; that’s a different story, though.) We just sync per collection because in the WatermelonDB schema a collection corresponds to one field of our GraphQL API (think: one “endpoint” in REST terms). The configuration per field (“endpoint”) is necessary because every endpoint has different requirements in terms of inputs. That’s a shortcoming of our backend; it’s basically a convention-vs-configuration problem. Ideally all endpoints would have the same requirements, but our backend specifically doesn’t have that (yet).

In a sync run we then run the sync for all collections, in an order we determined such that relations are resolved correctly.

Does that make sense? Feel free to ask again if I didn’t explain it well enough.

b) how can we structure the standard sync adapter implementation so that it's easier for people with special cases to reuse as much standard 🍉 code as possible

Great question! Currently there’s mainly the applyRemoteChanges function. I’d find three more things helpful:

  • Some function to retrieve the locally changed/deleted records (let’s call it gatherLocalChanges here)
  • Some infrastructure to handle conflicts. In KintoJS, for instance, they have the concept of a SyncResultObject, where they store an array of tuples of remote and local objects. One can then iterate over this array and decide on the resolution.
  • Some infrastructure to handle client vs server side IDs (see my other comment why I find that necessary)

The procedure for syncing and the handoff between standard sync adapter and custom functions would roughly be

  1. Import changes (custom code)
  2. ApplyRemoteChanges (standard sync adapter)
  3. GatherLocalChanges (standard sync adapter)
  4. Push Changes (custom code)
    1. Push Single Change (Custom code)
    2. Apply remote response to local database (standard sync adapter)
  5. Resolve Conflicts (standard sync adapter)
  6. Push Conflicts (custom code)

(Note 4.1 and 4.2: for ApplyRemoteChanges there should be a version for multiple records as well as for a single record. That’s because after each push the backend responds with the now-updated remote record, and one would want to use a standard sync adapter function to apply the new last_modified timestamp as well as the new sync status to the local database.)
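For concreteness, here's a minimal JavaScript sketch of that handoff. Only applyRemoteChanges and fetchLocalChanges correspond to primitives discussed in this PR; pullRemoteChanges, pushSingleChange, recordsIn and changeSetFor are hypothetical placeholders for custom code, and conflict handling is glossed over:

async function synchronize(db) {
  // 1. Import changes (custom code, e.g. a GraphQL or REST call)
  const remoteChanges = await pullRemoteChanges()
  // 2. Apply remote changes locally (standard sync adapter)
  await applyRemoteChanges(db, remoteChanges)
  // 3. Gather local changes (standard sync adapter)
  const localChanges = await fetchLocalChanges(db)
  // 4. Push changes (custom code), one record at a time
  for (const change of recordsIn(localChanges)) {
    const serverRecord = await pushSingleChange(change) // 4.1 (custom code)
    // 4.2 Apply the server's response with the single-record variant
    await applyRemoteChanges(db, changeSetFor(serverRecord))
  }
  // 5./6. Resolving and pushing conflicts would follow here
}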

// - error handling — preventing data corruption in case sync fails
// - error logging - logging invalid scenarios (suggesting a bug) — e.g.
// - sync adapters - should this be THE implementation? A collection of helpers for others to use to build their own sync engines/adapters? Should this be a class, similar to DatabaseAdapter?
// - What happens if I create then delete something locally - do i send it to sync?
Contributor

I'd argue generally no. One notable exception might be if one wants to track a history of all objects, including those that were never synced.

In Kinto JS they don't sync deleted records that were never pushed. See line 1118 in https://github.com/Kinto/kinto.js/blob/master/src/collection.js

Collaborator Author

Agreed.
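A sketch of what that could look like when gathering local changes. wasEverSynced is a hypothetical check (WatermelonDB tracks sync state in the raw record's _status field, but the exact test is up to the adapter implementation); db.batch and prepareDestroyPermanently are the public WatermelonDB batching APIs:

// Deletions of records the server has never seen don't need to be pushed;
// they can be destroyed locally right away (inside a database action).
async function partitionLocalDeletions(db, locallyDeleted) {
  const toPush = locallyDeleted.filter(record => wasEverSynced(record))
  const toDestroy = locallyDeleted.filter(record => !wasEverSynced(record))
  await db.batch(...toDestroy.map(record => record.prepareDestroyPermanently()))
  return toPush // only these go to the server
}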

// - failure to push changes
// - failure to update after push
// - bad timestamps?
// - batching (i.e. splitting into smaller chunks) — necessary? how much can wmelon take?
Contributor

One note here: there are two kinds of batching.

  1. Batching DB writes
  2. Batching network requests

Regarding 2, one needs to be careful with the last_modified timestamp. Ideally one would want to write the last_modified timestamp after every successfully pushed object, so that unnecessary re-syncing is avoided when the synchronisation fails in the middle of a run. However, this won't work that easily when one parallelises network requests; then one would need to write the last_modified timestamp after each batch to be safe.
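A sketch of that per-batch checkpointing, where pushRecord and saveLastModified are hypothetical custom functions:

// Requests within one batch run in parallel, but the last_modified
// checkpoint only advances once the whole batch has succeeded, so a
// mid-run failure can't cause unsynced records to be skipped next time.
async function pushInBatches(changes, batchSize) {
  for (let i = 0; i < changes.length; i += batchSize) {
    const batch = changes.slice(i, i + batchSize)
    const responses = await Promise.all(batch.map(pushRecord))
    const newest = Math.max(...responses.map(r => r.last_modified))
    await saveLastModified(newest) // checkpoint per batch, not per record
  }
}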

Collaborator Author

Ah, I didn't even consider batching (in the sense of “splitting into multiple batches”) for the network. It seems too complex and error-prone, and probably has more perf issues than benefits on mobile networks.

// - markLocalChangesAsSynced()
// - destroy deleted records permanently
// - mark `created` and `updated` records as `synced` + reset _changed
// - END ACTION
Contributor

What about resolving conflicts? The procedure should continue with (sketched below):

  • resolve conflicts locally
  • push local resolutions
  • do a final pull to get server-side changes that happened in the meantime
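
In pseudo-JavaScript, with all three helpers as hypothetical placeholders:

// Hypothetical continuation of a sync run after conflicts were detected:
const resolutions = await resolveConflictsLocally(db, syncResult) // decide per conflict
await pushLocalResolutions(resolutions) // send the resolved records up
await pullAndApplyRemoteChanges(db, lastModified) // final catch-up pull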


// - if already exists (error), update
// - destroy deleted records permanently
// - if already deleted, ignore
// - TODO: Should we delete with descendants? Or just let sync point to record to delete?
Contributor

By descendants, do you mean relations?

I'd argue against automatically deleting children permanently, because then one relies on the backend having exactly the same logic for cascading deletes as the client.

Collaborator Author

Agreed.

  remoteChanges: SyncDatabaseChangeSet,
): Promise<void> {
  return db.action(async action => {
    // TODO: Does the order of collections matter? Should they be done one by one? Or all at once?
Contributor

It depends on how your backend works, but generally the order of the collections should be such that the parents of a relation are synced first. If you sync a child first (and let's assume your backend doesn't even reject that), then you end up with an invalid relation in your backend database for a short time, until the parent is synced.

(I haven't thought through the case of cyclic relations yet; fortunately we don't have those in our data model.)
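For example, a fixed topological order over hypothetical collection names, with deletions processed in the reverse direction:

// Parents of a relation come before their children:
const insertOrder = ['projects', 'task_lists', 'tasks', 'comments']
// Deletions must go the opposite way, children before parents:
const deleteOrder = [...insertOrder].reverse()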

Collaborator Author

In the case of this function, that part doesn't matter, since we're talking strictly about local DB writes. I decided it's best (for performance and consistency) to do all DB writes in a single big batch.
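Roughly like this, using WatermelonDB's batch API; buildPreparedOperations is a hypothetical helper that turns the change set into prepared operations (prepareCreate/prepareUpdate/prepareDestroyPermanently):

// All writes for all collections are prepared first, then committed in
// one atomic batch inside the action:
await db.action(async () => {
  const prepared = buildPreparedOperations(db, remoteChanges)
  await db.batch(...prepared)
})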

@radex
Collaborator Author

radex commented Nov 27, 2018

There’s probably a slight misunderstanding. We do sync the whole database, not a single collection. (We have slices of the database that correspond to projects in our data model, but they span all collections; that’s a different story, though.) We just sync per collection because in the WatermelonDB schema a collection corresponds to one field of our GraphQL API (think: one “endpoint” in REST terms). The configuration per field (“endpoint”) is necessary because every endpoint has different requirements in terms of inputs. That’s a shortcoming of our backend; it’s basically a convention-vs-configuration problem. Ideally all endpoints would have the same requirements, but our backend specifically doesn’t have that (yet).
In a sync run we then run the sync for all collections, in an order we determined such that relations are resolved correctly.

Right. I didn't mean to suggest that you sync only one collection; I meant “why do you sync per-collection”.

I understand the difficulty of having a backend that wants to sync per-table. We have the same problem right now, but have decided to solve it on the backend side.

We came to the conclusion that per-collection sync is highly undesirable because:

  • during sync, the app cycles through multiple incomplete states (e.g. a task exists locally but its project doesn't yet), which can lead to bugs, poor UX, poor performance (multiple re-renders), and crashes (if a missing relation is not expected by the code)
  • if sync fails mid-way, the app will be left in an incomplete/inconsistent state
  • performance will suck, especially on the go, since with 10 collections you might need to make 20 HTTP requests
  • doing sync in a specific order could at least partly resolve the first issue, but at the cost of extra complexity, because deletions need to be resolved in the opposite order to insertions

Why not add an extra call to your backend that acts as a proxy and calls all the sub-endpoints in a transaction? (Rough sketch below.)
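Purely as an illustration (an Express-style handler; withTransaction and the per-collection fetchers are hypothetical):

// One pull endpoint that aggregates all per-collection sub-endpoints
// inside a single database transaction, so the client always receives
// a consistent snapshot in one HTTP round trip.
app.post('/sync/pull', async (req, res) => {
  const { lastPulledAt } = req.body
  const changes = await withTransaction(async tx => ({
    projects: await fetchProjectChanges(tx, lastPulledAt),
    tasks: await fetchTaskChanges(tx, lastPulledAt),
    // ...one field per collection, in the shape the client expects
  }))
  res.json({ changes, timestamp: Date.now() })
})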

@radex
Collaborator Author

radex commented Nov 27, 2018

Great question! Currently there’s mainly the applyRemoteChanges function. I’d find three more things helpful:

Some function to retrieve the locally changed/deleted records (let’s call it gatherLocalChanges here)

Done:

export function fetchLocalChanges(db: Database): Promise<SyncLocalChanges> {

Some infrastructure to handle conflicts. In KintoJS, for instance, they have the concept of a SyncResultObject, where they store an array of tuples of remote and local objects. One can then iterate over this array and decide on the resolution.

export function resolveConflict(local: RawRecord, remote: DirtyRaw): DirtyRaw {

This is implemented as a fixed conflict resolver. The strategy is per-column client wins, i.e. in case of a record conflict, the resolved version takes the server's version of the record (to get new changes), but the columns that were changed locally since the last sync are kept.

In rare circumstances it might make sense to have extra per-collection logic (e.g. resolving JSON fields or preserving some important multi-column invariants), so I'm open to suggestions for the API and implementation.
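For reference, a sketch of that per-column strategy. It relies on WatermelonDB raw records keeping the names of locally changed columns in a comma-separated _changed field; it's illustrative, not necessarily the exact shipped code:

// Start from the server's version, then re-apply locally changed columns.
// (A real implementation would also preserve local sync metadata such as
// id, _status and _changed.)
function resolveConflict(local, remote) {
  const resolved = { ...remote }
  const locallyChanged = local._changed.split(',').filter(Boolean)
  for (const column of locallyChanged) {
    resolved[column] = local[column]
  }
  return resolved
}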

@radex
Collaborator Author

radex commented Nov 27, 2018

  1. Import changes (custom code)
  2. ApplyRemoteChanges (standard sync adapter)
  3. GatherLocalChanges (standard sync adapter)
  4. Push Changes (custom code)
    1. Push Single Change (custom code)
    2. Apply remote response to local database (standard sync adapter)
  5. Resolve Conflicts (standard sync adapter)
  6. Push Conflicts (custom code)

I don't understand this part. How do you want to apply remote changes locally without resolving conflicts at that point in time? In my design, resolution happens during application of remote changes locally. When pushing local changes, the local state ALREADY takes into account remote changes, including conflict resolutions. So you only need two HTTP requests.

(Note 4.1. and 4.2. For ApplyRemoteChanges there should be a version for multiple as well as for a single record. That’s because after each push the backend would respond with the now updated remote record and one would want to use a standard sync adapter function to apply the new last_modified timestamp as well as the new sync status to the local database)

Right, but you can just apply a subset of remote changes using the same function.
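For example (the change-set shape here is illustrative):

// Wrap the server's response to a single pushed record in a minimal
// change set and reuse the same entry point:
await applyRemoteChanges(db, {
  tasks: { created: [], updated: [serverResponseRecord], deleted: [] },
})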

@radex
Collaborator Author

radex commented Nov 27, 2018

@flettling Please take a look at my responses if you can; I'd like to finish the initial implementation of a universal sync adapter this week.

@radex
Collaborator Author

radex commented Nov 28, 2018

@flettling I rewrote the sync implementation and user guide documentation from scratch here: https://github.com/Nozbe/WatermelonDB/pull/169/files#diff-595321de84a9dfb7ab970bf2b68131e1R40. I'd very much appreciate your feedback :) I know it's probably not the whole thing you need for your project right now, but we both want to get this stuff right in our apps, and I want you to be able to use as much of this shared code as possible for your implementation.

@radex radex closed this Feb 12, 2019
@radex radex deleted the sync-primitives branch February 12, 2019 10:28