Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replication of a new document causes multiple conflicts #3518

Closed
episage opened this issue Apr 30, 2018 · 15 comments
Closed

Replication of a new document causes multiple conflicts #3518

episage opened this issue Apr 30, 2018 · 15 comments

Comments

@episage
Copy link

episage commented Apr 30, 2018

Sync Gateway version

{"committed_update_seq":6,"compact_running":false,"db_name":"testing_1525080771752","disk_format_version":0,"instance_start_time":1525080782148506,"purge_seq":0,"state":"Online","update_seq":6}

Operating system

version: '2'

services:

  sgw1:
    image: couchbase/sync-gateway:2.0.0-community
    restart: always
    command: -adminInterface :4985 /etc/sync_gateway/config.json
    ports:
    - "127.0.0.1:5001:4984"
    - "127.0.0.1:5002:4985"

Config file

default, nothing changed, look at command above

Log output

sgw1_1  | 2018-04-30T09:33:02.146Z HTTP:  #001: PUT /testing_1525080771752/ (as ADMIN)
sgw1_1  | 2018-04-30T09:33:02.146Z Opening db /testing_1525080771752 as bucket "testing_1525080771752", pool "default", server <walrus:>
sgw1_1  | 2018-04-30T09:33:02.146Z Opening Walrus database testing_1525080771752 on <walrus:>
sgw1_1  | 2018-04-30T09:33:02.147Z Design docs for current view version (2.0) do not exist - creating...
sgw1_1  | 2018-04-30T09:33:02.147Z Design docs successfully created for view version 2.0.
sgw1_1  | 2018-04-30T09:33:02.147Z Verifying view availability for bucket testing_1525080771752...
sgw1_1  | 2018-04-30T09:33:02.148Z Views ready for bucket testing_1525080771752.
sgw1_1  | 2018-04-30T09:33:02.148Z Initializing changes cache for database testing_1525080771752 with sequence: 0
sgw1_1  | 2018-04-30T09:33:02.148Z Using default sync function 'channel(doc.channels)' for database "testing_1525080771752"
sgw1_1  | 2018-04-30T09:33:02.148Z HTTP+: #001:     --> 201 created  (2.7 ms)
sgw1_1  | 2018-04-30T09:33:02.149Z HTTP:  #002: PUT /testing_1525080771763/ (as ADMIN)
sgw1_1  | 2018-04-30T09:33:02.150Z Opening db /testing_1525080771763 as bucket "testing_1525080771763", pool "default", server <walrus:>
sgw1_1  | 2018-04-30T09:33:02.153Z Opening Walrus database testing_1525080771763 on <walrus:>
sgw1_1  | 2018-04-30T09:33:02.153Z Design docs for current view version (2.0) do not exist - creating...
sgw1_1  | 2018-04-30T09:33:02.153Z Design docs successfully created for view version 2.0.
sgw1_1  | 2018-04-30T09:33:02.154Z Verifying view availability for bucket testing_1525080771763...
sgw1_1  | 2018-04-30T09:33:02.156Z Views ready for bucket testing_1525080771763.
sgw1_1  | 2018-04-30T09:33:02.156Z Initializing changes cache for database testing_1525080771763 with sequence: 0
sgw1_1  | 2018-04-30T09:33:02.157Z Using default sync function 'channel(doc.channels)' for database "testing_1525080771763"
sgw1_1  | 2018-04-30T09:33:02.158Z HTTP+: #002:     --> 201 created  (8.8 ms)
sgw1_1  | 2018-04-30T09:33:02.177Z HTTP:  #003: POST /testing_1525080771752/_bulk_docs (as ADMIN)
sgw1_1  | 2018-04-30T09:33:02.179Z HTTP+: #003:     --> 201   (1.9 ms)

Expected behavior

SGW should not return conflicts for "rev": "2-A" and "rev": "1-A" when querying: http://localhost:5002/testing_1525080771752/_changes?active_only=true&style=all_docs
becasue each replicated document via post_bulk_docs has a unique revision and number

Actual behavior

SGW returns conflicts on http://localhost:5002/testing_1525080771752/_changes?active_only=true&style=all_docs.
Every single document is conflicted (probably because each document's parent is equal to -1).

{
  "results": [
    {
      "seq": 4,
      "id": "[docX.mouse]-ID_X",
      "changes": [
        {
          "rev": "3-B"
        },
        {
          "rev": "1-A"
        },
        {
          "rev": "2-A"
        },
        {
          "rev": "3-A"
        }
      ]
    },
    {
      "seq": 6,
      "id": "[docX.test]",
      "changes": [
        {
          "rev": "1-KA"
        },
        {
          "rev": "1-K"
        }
      ]
    }
  ],
  "last_seq": "6"
}

Steps to reproduce

            const docs = [];
            var sameId = `ID_X`;
            var counter = 0;
            docs.push({
              _id: `[docX.mouse]-${sameId}`,
              _rev: `1-A`,
              sound: 'squeek',
              random: `(${counter++})`,
            });
            docs.push({
              _id: `[docX.mouse]-${sameId}`,
              _rev: `2-A`,
              sound: 'moo',
              random: `(${counter++})`,
            });
            docs.push({
              _id: `[docX.mouse]-${sameId}`,
              _rev: `3-A`,
              sound: 'moom',
              random: `(${counter++})`,
            });
            docs.push({
              _id: `[docX.mouse]-${sameId}`,
              _rev: `3-B`,
              sound: 'pi',
              random: `(${counter++})`,
            });

            docs.push({
              _id: `[docX.test]`,
              _rev: `1-K`,
              keks: `(${counter++})`,
            });
            docs.push({
              _id: `[docX.test]`,
              _rev: `1-KA`,
              keks: `(${counter++})`,
            });
            
            // create conflicts
            sourceAdapter.docsBulkWrite(docs, false, (error, writeResults) => {
              callback(error, writeResults);
            });
            
@tleyden
Copy link
Contributor

tleyden commented Apr 30, 2018

@episage it looks like you're creating conflicts with the SG bulk_docs API, and then you're surprised to get conflicts back from the SG changes API. If you don't want conficts back on the SG changes API, then why are you creating them?

That's my interpretation on what I'm seeing based on:

           docs.push({
              _id: `[docX.mouse]-${sameId}`,
              _rev: `3-A`,
              sound: 'moom',
              random: `(${counter++})`,
            });
            docs.push({
              _id: `[docX.mouse]-${sameId}`,
              _rev: `3-B`,
              sound: 'pi',
              random: `(${counter++})`,
            });
           // create conflicts
            sourceAdapter.docsBulkWrite(docs, false, (error, writeResults) => {
              callback(error, writeResults);
            });

Here it looks like you are creating two conflicting revisions (3-A, 3-B) for the [docX.mouse]-${sameId} doc. I'm assuming that sourceAdapter.docsBulkWrite() eventually calls down to the _bulk_docs SG API endpoint with new_edits=false to indicate that SG should allow conflicts? (side note: the docs might need some improvement on that parameter, and so I filed a docs ticket)

If you can point to the code that is called from sourceAdapter.docsBulkWrite(), that might be helpful. Basically I'd like to confirm the value of the new_edits parameter you're passing to the _bulk_docs SG API endpoint.

Also, taking a step back, if you could give a high level description of what you're trying to accomplish (and avoid), it might help in terms of recommending a best approach.

Generally speaking, you want to avoid creating conflicting revisions in Sync Gateway, and so you'd never really want to purposefully insert conflicting revisions (if that's what you're actually doing).

@episage
Copy link
Author

episage commented Apr 30, 2018

@tleyden
I want to support offline-first functionality in my app.

The use case:
Client A was offline for 1 hour.

  • he created a doc [docX.mouse]-${sameId}
  • doc [docX.mouse]-${sameId} is completely new, with sameId equal to some random UUID
  • he modified doc [docX.mouse]-${sameId} multiple times
  • doc [docX.mouse]-${sameId} has multiple revisions in his database
  • client A wants to publish his changes, aka. replicate [docX.mouse]-${sameId} to SGW
  • logically thinking conflicts on [docX.mouse]-${sameId} cannot happen because the [docX.mouse]-${sameId} is completely new (with UUID), nobody knew of it's existence before

Here is a better example:

            const docs = [];
            var sameId = generateUuid();

            // this is the initial document
            // client is OFFLINE
            docs.push({ 
              _id: `[docX.mouse]-${sameId}`,
              _rev: `1-A1`,
              sound: 'squeek',
            });

            // later on client realized that the mouse should do "meow" sound
            docs.push({
              _id: `[docX.mouse]-${sameId}`,
              _rev: `2-A2`,
              sound: 'meow',
            });

            // later on client thought that mouses do "roar" sounds
            docs.push({
              _id: `[docX.mouse]-${sameId}`,
              _rev: `3-A3`,
              sound: 'roar',
            });

            // client went ONLINE, thus the replication executed docsBulkWrite
            // I expect no conflicts with the mouse document
            
            sourceAdapter.docsBulkWrite(docs, false, (error, writeResults) => {
              callback(error, writeResults);
            });

Here is the code of the adapter docsBulkWrite:

  function docsBulkWrite(docsArray, newEdits, callback) {
    const bulkDocsBody = {
      docs: docsArray,
      new_edits: !!newEdits,
      all_or_nothing: false,
    };

    client.database.post_bulk_docs(
      databaseName,
      bulkDocsBody,
      (networkError, isSuccess, data, http) => {
        if (networkError) {
          return callback({ networkError, http });
        }
        if (!isSuccess) {
          return callback({ data, http });
        }

        const responsesByIdAndRev = data.reduce((acc, workingResponse) => {
          const id = workingResponse.id;
          if (typeof acc[id] === 'undefined') {
            acc[id] = {};
          }
          if (workingResponse.rev) {
            acc[id][workingResponse.rev] = workingResponse;
          } else {
            if (typeof acc[id].unknown === 'undefined') {
              acc[id].unknown = [];
            }
            acc[id].unknown.push(workingResponse);
          }
          return acc;
        }, {});

        let writeResults;
        if (bulkDocsBody.new_edits === false) {
          // response doesnt contain 'ok' nut contains id & rev

          writeResults = docsArray.map((doc) => {
            var id = doc._id;
            var rev = doc._rev;

            const responsesByRev = responsesByIdAndRev[id];
            const workingResponse = responsesByRev[rev]
              ? responsesByRev[rev]
              : responsesByRev.unknown.length > 0
                ? responsesByRev.unknown.shift()
                : new Error(`Missing response for ${id}/${rev}`);

            if (!workingResponse) {
              // assume error when no response
              return new WriteResult(
                WriteStatus.ERROR,
                doc,
                doc._id,
                doc._rev,
                null,
              );
            } else if (workingResponse.ok) {
              return new WriteResult(
                WriteStatus.OK,
                doc,
                doc._id,
                doc._rev,
                null,
              );
            } else if (
              workingResponse.ok === undefined &&
              workingResponse.id &&
              workingResponse.rev
            ) {
              return new WriteResult(
                WriteStatus.OK,
                doc,
                doc._id,
                doc._rev,
                null,
              );
            }
            return new WriteResult(
              WriteStatus.ERROR,
              doc,
              doc._id,
              doc._rev,
              workingResponse,
            );
          });
        } else {
          writeResults = docsArray.map((doc) => {
            const idRev = doc._id + doc._rev;

            const workingResponse = responsesByIdAndRev[idRev];
            if (!workingResponse) {
              // assume error when no response
              return new WriteResult(
                WriteStatus.ERROR,
                doc,
                doc._id,
                doc._rev,
                null,
              );
            } else if (workingResponse.ok) {
              return new WriteResult(
                WriteStatus.OK,
                doc,
                doc._id,
                doc._rev,
                null,
              );
            }
            return new WriteResult(
              WriteStatus.ERROR,
              doc,
              doc._id,
              doc._rev,
              workingResponse,
            );
          });
        }

        return callback(null, writeResults);
      },
    );
  }

After the code executes, I expect the SGW to contain NO conflicts since the REVs are in-tact.
Inside the docsBulkWrite callback:
image
error is null
writeResults are all ok

HOWEVER, the database testing_1525110520386 - which is the source database, shows conflicts as you can see in the picture below:
image

Extra: Here is the raw document in question (parent = -1):
image

I would expect the document to have NO conflicts.

Why does it happen? Is it an SGW error?

@episage
Copy link
Author

episage commented Apr 30, 2018

NB, the documentation for SGW regarding replication is not-for-humans. Every time replication topic comes back, it makes me sick.
Especially that meaningless nesting of variables/arrays/objects such as "ok" in JSON data structures.

@tleyden
Copy link
Contributor

tleyden commented Apr 30, 2018

I want to support offline-first functionality in my app.

What kind of app is this? I take it this isn't using a Couchbase Lite SDK? Is this a Phonegap-style app with "custom sync" to Sync Gateway?

Client A was offline for 1 hour.

Is Client A creating conflicting revisions with it's own changes, or changes from another client?

@tleyden
Copy link
Contributor

tleyden commented Apr 30, 2018

From a high level design perspective, I would say that conflicts should only arise during concurrent offline updates to a doc by different devices:

  • Device 1: Pull doc1, rev1-a
  • Device 2: Pull doc1, rev1-a
  • Device 1 + 2: Go offline
  • Device 1: Write doc1, rev2-a
  • Device 2: Write doc1, rev2-b
  • Device 1 + 2: Go online

at this point they will need to push conflicting revisions that are in conflict, since they diverged off of "doc1, rev1-a".

OTOH, if the updates are on a single device, then you would always base changes on the previous revision, and you'd get a linear (non-conflicting) history of updates.

Just want to focus on the high level usage before trying to drill in on the lower level potential API bugs.

@tleyden
Copy link
Contributor

tleyden commented Apr 30, 2018

I think I see what's happening:

Extra: Here is the raw document in question (parent = -1):

So it's creating three separate root revisions. In other words, your revision 2-A2 is being specified as having no parent, so it ends up as a root revision.

In the documentation for the bulk_docs endpoint, it says:

_rev | stringRevision identifier of the parent revision the new one should replace. (Not used when creating a new document.)

What might be happening this that you are sending _rev: 2-A2" but Sync Gateway doesn't have an existing _rev: 2-A2, it's probably creating a new root revision for it. Regardless though, I'm pretty sure that you aren't' using the SG replication API the same way that Couchbase Lite is doing.

Overall, it looks like you are recreating a lot of the functionality that Couchbase Lite provides rather than using Couchbase Lite, since maybe it doesn't suit your use case. If that's the case, you'll probably have to dig deeper into the Couchbase Lite code to see how it works (before it moved to the websocket based replication protocol, if you want to push over HTTP/REST).

Another approach that might be easier to read the Couchbase Lite code is just to run the Todo sample app and sniff the replication using tools like:

  • Wireshark
  • ngrep
  • Charles Proxy / HTTPScoop

That way you'll be able to see examples of the replication API usage.

Closing the issue because I don't see enough evidence of this being a Sync Gateway bug.

@tleyden tleyden closed this as completed Apr 30, 2018
@tleyden
Copy link
Contributor

tleyden commented Apr 30, 2018

@episage one other thing that might interest you is the /{db}/_revtree/{doc} endpoint:

https://developer.couchbase.com/documentation/mobile/2.0/references/sync-gateway/admin-rest-api/index.html?v=2.0#/document/get__db___revtree__doc_

It returns a graphviz diagram of the revtree that can be visualized using graphviz (or webgraphviz.com if you don't want to install anything)

Some examples of what the rendered revtrees look like (some with multiple-roots): #2847 (comment)

@episage
Copy link
Author

episage commented Apr 30, 2018

@tleyden I'm writing SGW JS client (for browsers).
Thanks for the links

@tleyden
Copy link
Contributor

tleyden commented Apr 30, 2018

Have you checked out PouchDB?

You could probably leverage a lot of their existing work .. or at least use it as comparison for debugging (especially coupled w/ the network sniffing tools I posted).

There has been few SGW tickets from people who are using PouchDB and ran into some minor compatibility issues.

@episage
Copy link
Author

episage commented Apr 30, 2018

@tleyden I did many times and had many attempts at it but it's not compatible with SGW. It used to be a couple of years ago.

@episage
Copy link
Author

episage commented Apr 30, 2018

and the minor compatibility issues are really major

@tleyden
Copy link
Contributor

tleyden commented Apr 30, 2018

Ok good to know

@jamesnocentini
Copy link
Contributor

the documentation for SGW regarding replication is not-for-humans...Especially that meaningless nesting of variables/arrays/objects such as "ok" in JSON data structures.

@episage could you share a link to an example of that? It probably needs to be updated.

@episage
Copy link
Author

episage commented May 3, 2018

@jamiltz AFAIK Couchbase/Sync Gateway don't have any documentation regarding writing a replicator. Thus, it's very difficult to write one.
I made a lot of effort to combine CouchDB documentation and own findings to write a working replicator in JS (browser-compatible JS).
By working, I mean one that doesn't contain bugs. It's very easy to make one in the replication protocol.

In terms of conflict resolution the only article that has some meaningful code and notes is this one:
https://developer.couchbase.com/documentation/mobile/current/guides/sync-gateway/resolving-conflicts/index.html
In terms of quality of code in the article, ehm... it's disputable. I had hard time understanding how to resolve a conflict and how to create it.
Why is the article hard to comprehend?

  • follows JS anti-patterns such as well-known callback-hell
  • It's one, huge wall of text instead of step by step instruction with examples
  • It's missing critical information (I guess the code will work when you deploy it in testing env but it's definitely not production ready)
  • the missing bit, which is also missing in the SGW/CBS documentation, is poorly documented _revisions parameter which is crucial when creating/updaing documents (that was my problem)
  • I think there are more issues but I have finally got the code working so I did not focus on proofreading

@jamesnocentini
Copy link
Contributor

@episage Thanks for the detailed feedback! Those are all valid points and noted https://issues.couchbase.com/browse/DOC-3575

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants