[WIP] Sync primitives #142

Closed
wants to merge 23 commits into from

Conversation

Collaborator

@radex radex commented Nov 8, 2018

No description provided.

@radex radex self-assigned this Nov 8, 2018
@peril-nozbe

peril-nozbe bot commented Nov 8, 2018

Fails
🚫 Please add a description to your PR.

Warnings
⚠️ Please add a changelog entry for your changes. (Add #trivial to skip this.)
⚠️ There are library changes, but no tests. That's OK as long as you're refactoring existing code.

Generated by 🚫 dangerJS

@radex radex mentioned this pull request Nov 9, 2018
@radex radex changed the base branch from master to sync-primitives-1 November 14, 2018 12:16
@radex radex changed the base branch from sync-primitives-1 to sync-primitives-2 November 14, 2018 13:17
Contributor

@flettling flettling left a comment


I left a few comments below; additionally, here are two remarks regarding the questions you asked in the other thread, @radex:

a) why isn't it possible for you to use per-database sync, or at least something closer to it, but with this GraphQL abstraction

There’s probably a slight misunderstanding. We do sync the whole database, not a single collection. (We have slices of the database that correspond to projects in our data model, but they span all collections; that’s a different story, though.) We just sync per collection because in the WatermelonDB schema a collection corresponds to one field of our GraphQL API (think: one “endpoint” in REST terms). The configuration per field (“endpoint”) is necessary because every endpoint has different requirements in terms of inputs. That’s a shortcoming of our backend; it’s basically a convention-vs-configuration problem. Ideally all endpoints would have the same requirements, but our backend specifically doesn’t have that (yet).

In a sync run we then run the sync for all collections, in an order we determined such that relations are resolved correctly.

Does that make sense? Feel free to ask again if I didn’t explain it well enough.

b) how can we structure the standard sync adapter implementation so that it's easier for people with special cases to reuse as much standard 🍉 code as possible

Great question! Currently there’s mainly the applyRemoteChanges function. I’d find three more things helpful:

  • Some function to retrieve the locally changed/deleted records (let’s call it gatherLocalChanges here)
  • Some infrastructure to handle conflicts. In KintoJS, for instance, they have the concept of a SyncResultObject, where they store an array of tuples of remote and local objects. One can then iterate over this array and decide on the resolution.
  • Some infrastructure to handle client vs server side IDs (see my other comment why I find that necessary)

The procedure for syncing and the handoff between standard sync adapter and custom functions would roughly be

  1. Import changes (custom code)
  2. ApplyRemoteChanges (standard sync adapter)
  3. GatherLocalChanges (standard sync adapter)
  4. Push Changes (custom code)
    1. Push Single Change (Custom code)
    2. Apply remote response to local database (standard sync adapter)
  5. Resolve Conflicts (standard sync adapter)
  6. Push Conflicts (custom code)

(Note 4.1 and 4.2: for ApplyRemoteChanges there should be a version for multiple records as well as for a single record. That’s because after each push the backend responds with the now-updated remote record, and one would want to use a standard sync adapter function to apply the new last_modified timestamp as well as the new sync status to the local database.)
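For concreteness, here's a minimal JavaScript sketch of that handoff. Only applyRemoteChanges and fetchLocalChanges correspond to primitives discussed in this PR; pullRemoteChanges, pushSingleChange, recordsIn and changeSetFor are hypothetical placeholders for custom code, and conflict handling is glossed over:

async function synchronize(db) {
  // 1. Import changes (custom code, e.g. a GraphQL or REST call)
  const remoteChanges = await pullRemoteChanges()
  // 2. Apply remote changes locally (standard sync adapter)
  await applyRemoteChanges(db, remoteChanges)
  // 3. Gather local changes (standard sync adapter)
  const localChanges = await fetchLocalChanges(db)
  // 4. Push changes (custom code), one record at a time
  for (const change of recordsIn(localChanges)) {
    const serverRecord = await pushSingleChange(change) // 4.1 (custom code)
    // 4.2 Apply the server's response with the single-record variant
    await applyRemoteChanges(db, changeSetFor(serverRecord))
  }
  // 5./6. Resolving and pushing conflicts would follow here
}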

// - error handling — preventing data corruption in case sync fails
// - error logging - logging invalid scenarios (suggesting a bug) — e.g.
// - sync adapters - should this be THE implementation? A collection of helpers for others to use to build their own sync engines/adapters? Should this be a class, similar to DatabaseAdapter?
// - What happens if I create then delete something locally - do i send it to sync?
Contributor

I'd argue generally no. One notable exception might be if one wants to track a history of all objects, including those that were never synced.

In Kinto JS they don't sync deleted records that were never pushed. See line 1118 in https://github.com/Kinto/kinto.js/blob/master/src/collection.js

Collaborator Author

Agreed.
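A sketch of what that could look like when gathering local changes. wasEverSynced is a hypothetical check (WatermelonDB tracks sync state in the raw record's _status field, but the exact test is up to the adapter implementation); db.batch and prepareDestroyPermanently are the public WatermelonDB batching APIs:

// Deletions of records the server has never seen don't need to be pushed;
// they can be destroyed locally right away (inside a database action).
async function partitionLocalDeletions(db, locallyDeleted) {
  const toPush = locallyDeleted.filter(record => wasEverSynced(record))
  const toDestroy = locallyDeleted.filter(record => !wasEverSynced(record))
  await db.batch(...toDestroy.map(record => record.prepareDestroyPermanently()))
  return toPush // only these go to the server
}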

// - failure to push changes
// - failure to update after push
// - bad timestamps?
// - batching (i.e. splitting into smaller chunks) — necessary? how much can wmelon take?
Contributor

One note here: there are two kinds of batching.

  1. Batching DB writes
  2. Batching network requests

Regarding 2, one needs to be careful with the last_modified timestamp. Ideally one would want to write the last_modified timestamp after every successfully pushed object, so that unnecessary re-syncing is avoided when the synchronisation fails in the middle of a run. However, this won't work that easily when one parallelises network requests; then one would need to write the last_modified timestamp after each batch to be safe.
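A sketch of that per-batch checkpointing, where pushRecord and saveLastModified are hypothetical custom functions:

// Requests within one batch run in parallel, but the last_modified
// checkpoint only advances once the whole batch has succeeded, so a
// mid-run failure can't cause unsynced records to be skipped next time.
async function pushInBatches(changes, batchSize) {
  for (let i = 0; i < changes.length; i += batchSize) {
    const batch = changes.slice(i, i + batchSize)
    const responses = await Promise.all(batch.map(pushRecord))
    const newest = Math.max(...responses.map(r => r.last_modified))
    await saveLastModified(newest) // checkpoint per batch, not per record
  }
}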

Collaborator Author

Ah, I didn't even consider batching (in the sense of “splitting into multiple batches”) for the network. It seems too complex and error-prone, and probably has more perf issues than benefits on mobile networks.

// - markLocalChangesAsSynced()
// - destroy deleted records permanently
// - mark `created` and `updated` records as `synced` + reset _changed
// - END ACTION
Contributor

What about resolving conflicts? The procedure should continue with (sketched below):

  • resolve conflicts locally
  • push local resolutions
  • do a final pull to get server-side changes that happened in the meantime
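
In pseudo-JavaScript, with all three helpers as hypothetical placeholders:

// Hypothetical continuation of a sync run after conflicts were detected:
const resolutions = await resolveConflictsLocally(db, syncResult) // decide per conflict
await pushLocalResolutions(resolutions) // send the resolved records up
await pullAndApplyRemoteChanges(db, lastModified) // final catch-up pull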


// - if already exists (error), update
// - destroy deleted records permanently
// - if already deleted, ignore
// - TODO: Should we delete with descendants? Or just let sync point to record to delete?
Contributor

By descendants, do you mean relations?

I'd argue against automatically deleting children permanently, because then one relies on the backend having exactly the same logic for cascading deletes as the client.

Collaborator Author

Agreed.

  remoteChanges: SyncDatabaseChangeSet,
): Promise<void> {
  return db.action(async action => {
    // TODO: Does the order of collections matter? Should they be done one by one? Or all at once?
Contributor

It depends on how your backend works, but generally the order of the collections should be such that the parents of a relation are synced first. If you sync a child first (and let's assume your backend doesn't even reject that), then you end up with an invalid relation in your backend database for a short time, until the parent is synced.

(I haven't thought through the case of cyclic relations yet; fortunately we don't have those in our data model.)
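For example, a fixed topological order over hypothetical collection names, with deletions processed in the reverse direction:

// Parents of a relation come before their children:
const insertOrder = ['projects', 'task_lists', 'tasks', 'comments']
// Deletions must go the opposite way, children before parents:
const deleteOrder = [...insertOrder].reverse()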

Collaborator Author

In the case of this function, that part doesn't matter, since we're talking strictly about local DB writes. I decided it's best (for performance and consistency) to do all DB writes in a single big batch.
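Roughly like this, using WatermelonDB's batch API; buildPreparedOperations is a hypothetical helper that turns the change set into prepared operations (prepareCreate/prepareUpdate/prepareDestroyPermanently):

// All writes for all collections are prepared first, then committed in
// one atomic batch inside the action:
await db.action(async () => {
  const prepared = buildPreparedOperations(db, remoteChanges)
  await db.batch(...prepared)
})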

@radex
Collaborator Author

radex commented Nov 27, 2018

There’s probably a slight misunderstanding. We do sync the whole database, not a single collection. (We have slices of the database that correspond to projects in our data model, but they span all collections; that’s a different story, though.) We just sync per collection because in the WatermelonDB schema a collection corresponds to one field of our GraphQL API (think: one “endpoint” in REST terms). The configuration per field (“endpoint”) is necessary because every endpoint has different requirements in terms of inputs. That’s a shortcoming of our backend; it’s basically a convention-vs-configuration problem. Ideally all endpoints would have the same requirements, but our backend specifically doesn’t have that (yet).
In a sync run we then run the sync for all collections, in an order we determined such that relations are resolved correctly.

Right. I didn't mean to suggest that you sync only one collection; I meant “why do you sync per-collection”.

I understand the difficulty of having a backend that wants to sync per-table. We have the same problem right now, but have decided to solve it on the backend side.

We came to the conclusion that per-collection sync is highly undesirable because:

  • during sync, the app cycles through multiple incomplete states (e.g. a task exists locally but its project doesn't yet), which can lead to bugs, poor UX, poor performance (multiple re-renders), and crashes (if a missing relation is not expected by the code)
  • if sync fails mid-way, the app will be left in an incomplete/inconsistent state
  • performance will suck, especially on the go, since with 10 collections you might need to make 20 HTTP requests
  • doing sync in a specific order could at least partly resolve the first issue, but at the cost of extra complexity, because deletions need to be resolved in the opposite order to insertions

Why not add an extra call to your backend that acts as a proxy and calls all the sub-endpoints in a transaction? (Rough sketch below.)
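Purely as an illustration (an Express-style handler; withTransaction and the per-collection fetchers are hypothetical):

// One pull endpoint that aggregates all per-collection sub-endpoints
// inside a single database transaction, so the client always receives
// a consistent snapshot in one HTTP round trip.
app.post('/sync/pull', async (req, res) => {
  const { lastPulledAt } = req.body
  const changes = await withTransaction(async tx => ({
    projects: await fetchProjectChanges(tx, lastPulledAt),
    tasks: await fetchTaskChanges(tx, lastPulledAt),
    // ...one field per collection, in the shape the client expects
  }))
  res.json({ changes, timestamp: Date.now() })
})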

@radex
Collaborator Author

radex commented Nov 27, 2018

Great question! Currently there’s mainly the applyRemoteChanges function. I’d find three more things helpful:

Some function to retrieve the locally changed/deleted records (let’s call it gatherLocalChanges here)

Done:

export function fetchLocalChanges(db: Database): Promise<SyncLocalChanges> {

Some infrastructure to handle conflicts. In KintoJS, for instance, they have the concept of a SyncResultObject, where they store an array of tuples of remote and local objects. One can then iterate over this array and decide on the resolution.

export function resolveConflict(local: RawRecord, remote: DirtyRaw): DirtyRaw {

This is implemented as a fixed conflict resolver. The strategy is per-column client wins, i.e. in case of a record conflict, the resolved version takes the server's version of the record (to get new changes), but the columns that were changed locally since the last sync are kept.

In rare circumstances it might make sense to have extra per-collection logic (e.g. resolving JSON fields or preserving some important multi-column invariants), so I'm open to suggestions for the API and implementation.
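For reference, a sketch of that per-column strategy. It relies on WatermelonDB raw records keeping the names of locally changed columns in a comma-separated _changed field; it's illustrative, not necessarily the exact shipped code:

// Start from the server's version, then re-apply locally changed columns.
// (A real implementation would also preserve local sync metadata such as
// id, _status and _changed.)
function resolveConflict(local, remote) {
  const resolved = { ...remote }
  const locallyChanged = local._changed.split(',').filter(Boolean)
  for (const column of locallyChanged) {
    resolved[column] = local[column]
  }
  return resolved
}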

@radex
Collaborator Author

radex commented Nov 27, 2018

  1. Import changes (custom code)
  2. ApplyRemoteChanges (standard sync adapter)
  3. GatherLocalChanges (standard sync adapter)
  4. Push Changes (custom code)
    1. Push Single Change (custom code)
    2. Apply remote response to local database (standard sync adapter)
  5. Resolve Conflicts (standard sync adapter)
  6. Push Conflicts (custom code)

I don't understand this part. How do you want to apply remote changes locally without resolving conflicts at that point in time? In my design, resolution happens during application of remote changes locally. When pushing local changes, the local state ALREADY takes into account remote changes, including conflict resolutions. So you only need two HTTP requests.

(Note 4.1. and 4.2. For ApplyRemoteChanges there should be a version for multiple as well as for a single record. That’s because after each push the backend would respond with the now updated remote record and one would want to use a standard sync adapter function to apply the new last_modified timestamp as well as the new sync status to the local database)

Right, but you can just apply a subset of remote changes using the same function.
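For example (the change-set shape here is illustrative):

// Wrap the server's response to a single pushed record in a minimal
// change set and reuse the same entry point:
await applyRemoteChanges(db, {
  tasks: { created: [], updated: [serverResponseRecord], deleted: [] },
})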

@radex
Collaborator Author

radex commented Nov 27, 2018

@flettling Please take a look at my responses if you can; I'd like to finish the initial implementation of a universal sync adapter this week.

@radex
Collaborator Author

radex commented Nov 28, 2018

@flettling I rewrote the sync implementation and user guide documentation from scratch here: https://github.com/Nozbe/WatermelonDB/pull/169/files#diff-595321de84a9dfb7ab970bf2b68131e1R40. I'd very much appreciate your feedback :) I know it's probably not the whole thing you need for your project right now, but we both want to get this stuff right in our apps, and I want you to be able to use as much of this shared code as possible for your implementation.

@radex radex closed this Feb 12, 2019
@radex radex deleted the sync-primitives branch February 12, 2019 10:28