-
Notifications
You must be signed in to change notification settings - Fork 575
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Sync primitives #142
Conversation
Generated by 🚫 dangerJS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a few comments below, additionally here's two remarks regarding the questions you asked in the other thread @radex :
a) why isn't it possible for you to use per-database sync, or at least something closer to it but this GraphSQL abstraction
There’s probably a slight misunderstanding. We do sync the whole database, not a single collection (We have the slices of the database that correspond to projects in our data model but they span across all collections - that s a different story though). We just sync per collection because in the watermelon DB schema a collection corresponds with one field of our GraphQL API (think: with one “endpoint” in REST terms). The configuration per field (“endpoint”) is necessary because every endpoint has different requirements in terms of the inputs (that’s a shortcoming of our backend. It’s basically a convention vs configuration problem. Ideally all endpoints would have the same requirements but specifically our backend doesn’t have that (yet)).
In a sync run we then run the sync for all collections. In an order that we determined such that relations are solved correctly.
Does that make sense? Feel free to ask again if I didn’t explain it good enough.
b) how can we structure the standard sync adapter implementation so that it's easier for people with special cases to reuse as much standard 🍉 code as possible
Great question! Currently there’s mainly the applyRemoteChanges
function. I’d find three more things helpful:
- Some function to retrieve the locally changed/deleted records (Let’s call it
gatherLocalChanges
for here) - Some infrastructure to handle conflicts. In KintoJS for instance they have the concept of a
SyncResultObject
where they store an array with tuples of remote and local objects. On can then iterate this array, and decide for the resolution. - Some infrastructure to handle client vs server side IDs (see my other comment why I find that necessary)
The procedure for syncing and the handoff between standard sync adapter and custom functions would roughly be
- Import changes (custom code)
- ApplyRemoteChanges (standard sync adapter)
- GatherLocalChanges (standard sync adapter)
- Push Changes (custom code)
- Push Single Change (Custom code)
- Apply remote response to local database (standard sync adapter)
- Resolve Conflicts (standard sync adapter)
- Push Conflicts (custom code)
(Note 4.1. and 4.2. For ApplyRemoteChanges there should be a version for multiple as well as for a single record. That’s because after each push the backend would respond with the now updated remote record and one would want to use a standard sync adapter function to apply the new last_modified timestamp as well as the new sync status to the local database)
// - error handling — preventing data corruption in case sync fails | ||
// - error logging - logging invalid scenarios (suggesting a bug) — e.g. | ||
// - sync adapters - should this be THE implemention? A collection of helpers for others to use to build their own sync engines/adapters? Should this be a class, similar to DatabaseAdapter? | ||
// - What happens if I create then delete something locally - do i send it to sync? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd argue generally no. One notable exception might be if one would want to track a history of all objects including those that were never synced.
In Kinto JS they don't sync deleted records that were never pushed. See line 1118 in https://github.com/Kinto/kinto.js/blob/master/src/collection.js
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
// - failure to push changes | ||
// - failure to update after push | ||
// - bad timestamps? | ||
// - batching (i.e. splitting into smaller chunks) — necessary? how much can wmelon take? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One note here:
There's two kinds of batching.
- Batching DB writes
- Batching network requests.
Regarding 2 one needs to be carefull with the last_modified timestamp. Ideally one would want to write the last_modified timestamp after every successful object such that unnecessary resyncing is avoided when the synchronisation fails in the middle of a run. However this won't work that easily when one parallelises network requests. Then one would need to write the last_modified timestamp after each batch to be safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, i didn't even consider batching (in the sense of "splitting into multiple batches") for network. Seems too complex, and error prone, and probably with more perf issues than benefits on mobile networks.
// - markLocalChangesAsSynced() | ||
// - destroy deleted records permanently | ||
// - mark `created` and `updated` records as `synced` + reset _changed | ||
// - END ACTION |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about resolving conflicts?
The procedure should go on with
- resolve conflicts locally
- push local resolutions
- do a final pull to get server side changes that happened in the meantime
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See:
#142 (comment)
// - if already exists (error), update | ||
// - destroy deleted records permanently | ||
// - if already deleted, ignore | ||
// - TODO: Should we delete with descendants? Or just let sync point to record to delete? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with descendants you mean relations?
I'd argue to not automatically permanently delete children - because then one relies on the backend having the exact same logics for cascading deletes as the client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
remoteChanges: SyncDatabaseChangeSet, | ||
): Promise<void> { | ||
return db.action(async action => { | ||
// TODO: Does the order of collections matter? Should they be done one by one? Or all at once? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it depends on how your backend is working but generally the order of the collection should be such that the parents of a relation are synced first. If you sync a child first (and let's assume your backend doesn't even reject that) then you end up with an invalid relation in your backend database for a short time until the parent is synced.
(I haven't thought through the case of cyclic relations yet, fortunately we don't have that in our datamodel)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case of this function that part doesn't matter since we're talking strictly local DB writes only. I decided it's best (for performance and consistency) to do all DB writes in a single big batch
Right. I didn't mean to suggest you sync one collection, I did mean "why you sync per-collection". I understand the difficulty of having a backend that wants to sync per-table. We have the same problem right now, but have decided to solve this on backend side. We came to the conclusion that per-collection sync is highly undesirable because:
Why not add an extra call to your backend, that acts as a proxy, and that calls all the sub-endpoints in a transaction? |
Done: Line 164 in fae1583
WatermelonDB/src/sync/syncHelpers.js Line 11 in fae1583
This is implemented as a fixed conflict resolver. The strategy is per-column client wins, i.e. in case of record conflict, the resolved version will have server's version of the record to get new changes, but the columns that were changed locally since last sync will remain. In rare circumstances it might make sense to have extra per-collection logic (e.g. resolving JSON fields or preserving some imporant multi-column invariants), so I'm open to suggestions for the API and implementation. |
I don't understand this part. How do you want to apply remote changes locally without resolving conflicts at that point in time? In my design, resolution happens during application of remote changes locally. When pushing local changes, the local state ALREADY takes into account remote changes, including conflict resolutions. So you only need two HTTP requests.
right, but you can just apply a subset of remote changes using the same function |
@fletling Please take a look at my responses this week if you can. I'd like to finish the initial implementation of a universal sync adapter this week. |
@fletling I re-wrote from scratch the sync implementation and user guide documentation here: https://github.com/Nozbe/WatermelonDB/pull/169/files#diff-595321de84a9dfb7ab970bf2b68131e1R40 — I'd very much appreciate your feedback :) I know it's probably not what you need for your project right now as the whole thing, but we both want to get this stuff right in our apps, and I want you to be able to use as much of this shared code as possible for your implementation. |
No description provided.