Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement _bulk_get aka _bulk_revs #3424

Closed
nolanlawson opened this issue Jan 20, 2015 · 22 comments
Closed

Implement _bulk_get aka _bulk_revs #3424

nolanlawson opened this issue Jan 20, 2015 · 22 comments
Labels
enhancement Feature request

Comments

@nolanlawson
Copy link
Member

Counterpart to pouchdb/express-pouchdb#196.

@nolanlawson
Copy link
Member Author

It's debatable, though, whether we actually want to implement a db.bulkGet() or db.bulkRevs(). For local-local replication it doesn't offer any benefit, and may actually confuse users.

As a first pass, though, we can simply update replicate.js to use a server-side bulk endpoint if it exists.

@dholth
Copy link
Contributor

dholth commented Jan 27, 2015

@robertkeizer
Copy link
Contributor

+1

@nolanlawson
Copy link
Member Author

@daleharvey On the CouchDB side (COUCHDB-2310), there definitely seems to be an "intent to ship" in both 1.x and 2.x.

So how do you feel about adding db.bulkRevs()? I'm leaning towards implementing it at the PouchDB API level, i.e. not simply an endpoint on PouchDB Server that wraps Promise.all and db.get. Although most endusers will never use it themselves, it's nice for consistency's sake.

@daleharvey
Copy link
Member

I am not so sure about this, if couchdb do add it then we probably have to for compat, but I think we would be able to spec and implement stream before this bulkrevs is ever released and replication-stream is a much more desirable end goal, going from what we have now to adding bulkDocs / bulkRevs (+detection) to adding Streaming + detection doesnt seem ideal

@nolanlawson
Copy link
Member Author

So you think we should just jump directly to the replication-stream stuff?

@robertkeizer
Copy link
Contributor

I'm finding that a middleware layer that performs the replication-stream stuff works fairly well. Have some hacky code working for express-pouchdb but need to clean it up before committing.

Personally I'd like to see Pouch implement this without replication-stream functionality as soon as couch has it.

@dholth
Copy link
Contributor

dholth commented Feb 9, 2015

_bulk_revs (a more efficient API that can replace the 1-at-a-time
GET API) is useful for replication, but it's not necessarily just
about replication. You might just want to get a lot of things with a
single call.

On Mon, Feb 9, 2015, at 08:50 AM, Robert Keizer wrote:

I'm finding that a middleware layer that performs the
replication-stream stuff works fairly well. Have some hacky code
working for express-pouchdb but need to clean it up before committing.

Personally I'd like to see Pouch implement this without
replication-stream functionality as soon as couch has it.

— Reply to this email directly or view it on GitHub[1].

Links:

  1. Implement _bulk_get aka _bulk_revs #3424 (comment)

@daleharvey
Copy link
Member

You might just want to get a lot of things with a single call.

Hrm, I thought allDocs supported fetching multiple documents, doesnt look like so.

If couch implements it then we should, but in terms of focus for replication improvements I think -stream is the best thing to focus on, it is going to be very hard to implement any of these new changes well since they will be implemented at different times across different platforms, I think its worth focussing on what gets us the best bang for buck

@dholth
Copy link
Contributor

dholth commented Feb 9, 2015

Maybe it does. The thing I found irritating was that none of the proposed or existing bulk APIs supported ALL of the parameters allowed in a GET request, they are arbitrarily different making them harder to learn.

nolanlawson added a commit that referenced this issue May 2, 2015
Related to pouchdb/express-pouchdb#196 and
apache/couchdb-chttpd#33

This adds a `bulkGet()` function to all PouchDB
objects, which in the case of local databases runs
a shim that simply collates a bunch of `get()` requests.
This logic is in `bulk-get-shim.js`.

In the case of the http adapter, the client checks if
the server responds to `_bulk_get` with a 40x error. If
no error is returned, then the server is assumed to
implement `_bulk_get` and that API is used. Else the shim
is used.

I have tested this against CouchDB 1.6 and PouchDB Server
(82b297e) with 100% success in both. Also manual testing
showed that the `_bulk_get` API was used for PouchDB Server
but not CouchDB.

Three cheers for bulk replication!
nolanlawson added a commit to pouchdb/express-pouchdb that referenced this issue May 2, 2015
nolanlawson added a commit that referenced this issue May 2, 2015
Related to pouchdb/express-pouchdb#196 and
apache/couchdb-chttpd#33

This adds a `bulkGet()` function to all PouchDB
objects, which in the case of local databases runs
a shim that simply collates a bunch of `get()` requests.
This logic is in `bulk-get-shim.js`.

In the case of the http adapter, the client checks if
the server responds to `_bulk_get` with a 40x error. If
no error is returned, then the server is assumed to
implement `_bulk_get` and that API is used. Else the shim
is used.

I have tested this against CouchDB 1.6 and PouchDB Server
(82b297e) with 100% success in both. Also manual testing
showed that the `_bulk_get` API was used for PouchDB Server
but not CouchDB.

Three cheers for bulk replication!
nolanlawson added a commit to pouchdb/express-pouchdb that referenced this issue May 2, 2015
nolanlawson added a commit that referenced this issue May 2, 2015
Related to pouchdb/express-pouchdb#196 and
apache/couchdb-chttpd#33

This adds a `bulkGet()` function to all PouchDB
objects, which in the case of local databases runs
a shim that simply collates a bunch of `get()` requests.
This logic is in `bulk-get-shim.js`.

In the case of the http adapter, the client checks if
the server responds to `_bulk_get` with a 40x error. If
no error is returned, then the server is assumed to
implement `_bulk_get` and that API is used. Else the shim
is used.

I have tested this against CouchDB 1.6 and PouchDB Server
(82b297e) with 100% success in both. Also manual testing
showed that the `_bulk_get` API was used for PouchDB Server
but not CouchDB.

Three cheers for bulk replication!
nolanlawson added a commit to pouchdb/express-pouchdb that referenced this issue May 3, 2015
@daleharvey daleharvey added the enhancement Feature request label Aug 4, 2015
nolanlawson added a commit to pouchdb/express-pouchdb that referenced this issue Aug 16, 2015
nolanlawson added a commit that referenced this issue Aug 16, 2015
nolanlawson added a commit that referenced this issue Aug 16, 2015
@willholley
Copy link
Member

@nolanlawson when testing against the CouchDB implementation, our attachment tests are failing. Looking specifically at suite2 test.attachments.js- local:http / Attachments replicate back and forth, I see the following requests around the failure:

without _bulk_get:

GET http://localhost:2020/test_attach_remote/doc?revs=true&open_revs=[%221-67421d156cc0ba65d13b9c601c558381%22]&_nonce=1441282816675

returns

[{ "ok": {
  "_id": "doc",
  "_rev": "1-67421d156cc0ba65d13b9c601c558381",
  "_revisions": {
     "start": 1,
     "ids": [
        "67421d156cc0ba65d13b9c601c558381"
     ]
  },
  "_attachments": {
     "foo.txt": {
        "content_type": "text/plain",
        "revpos": 1,
        "digest": "md5-UVHkefsf5MqAl/2ALjx5Zg==",
        "length": 3,
        "stub": true
     }
  }
}}]

with _bulk_get:

 POST http://localhost:2020/test_attach_remote/_bulk_get?_nonce=1441284778447
 {"revs":true,"attachments":true,"binary":true,"docs":[{"id":"doc","rev":"1-67421d156cc0ba65d13b9c601c558381"}]}

returns

 {"results": [{
  "id": "doc", 
   "docs": [{
    "ok": {
     "_id":"doc",
     "_rev":"1-67421d156cc0ba65d13b9c601c558381",
     "_attachments":{
      "foo.txt":{
       "content_type":"text/plain",
       "revpos":1,
       "digest":"md5-UVHkefsf5MqAl/2ALjx5Zg==",
       "length":3,
       "stub":true
      }
     }
    }
   }]
 }]}

so the core information around the attachment looks the same.

However, when this is processed at

for (var key in docInfo.data._attachments) {
, the docInfo object is very different.

without _bulk_get:

    {
  "metadata": {
    "id": "doc2",
    "rev": "1-67421d156cc0ba65d13b9c601c558381",
    "revisions": {
      "start": 1,
      "ids": [
        "67421d156cc0ba65d13b9c601c558381"
      ]
    },
    "rev_tree": [
      {
        "pos": 1,
        "ids": [
          "67421d156cc0ba65d13b9c601c558381",
          {
            "status": "available"
          },
          []
        ]
      }
    ]
  },
  "data": {
    "_attachments": {
      "foo.txt": {
        "data": {},
        "digest": "md5-rL0Y20zC+Fzt72VPzMSk2A==",
        "content_type": "text/plain",
        "revpos": 1,
        "length": 3
      }
    }
  }
}

with bulk_get:

    {
  "metadata": {
    "id": "doc2",
    "rev": "1-67421d156cc0ba65d13b9c601c558381",
    "rev_tree": [
      {
        "pos": 1,
        "ids": [
          "67421d156cc0ba65d13b9c601c558381",
          {
            "status": "available"
          },
          []
        ]
      }
    ]
  },
  "data": {
    "_attachments": {
      "foo.txt": {
        "digest": "md5-UVHkefsf5MqAl/2ALjx5Zg==",
        "content_type": "text/plain",
        "length": 3,
        "revpos": 1,
        "stub": true
      }
    }
  }
}

Note that when using _bulk_get, the attachment remains as a stub and the digest is different (but matches what was returned from the server in both cases).

@willholley
Copy link
Member

I think the problem is simply that we don't call fetchAttachments for attachment stubs returned by _bulk_get

willholley added a commit that referenced this issue Sep 14, 2015
/_bulk_get requires that revs=true / attachments=true
are specified as query parameters rather than in the
document body.
nolanlawson added a commit that referenced this issue Sep 14, 2015
willholley added a commit that referenced this issue Sep 14, 2015
/_bulk_get requires that revs=true / attachments=true
are specified as query parameters rather than in the
document body.
@nolanlawson
Copy link
Member Author

_bulk_get will be released in pouchdb 5.0.0 and available in the next release of pouchdb server

@nolanlawson
Copy link
Member Author

this is merged now

@dholth
Copy link
Contributor

dholth commented Oct 5, 2015

:-)

@dholth
Copy link
Contributor

dholth commented Oct 6, 2015

So this means PouchDB will attempt to use _bulk_get for replication now?

@willholley
Copy link
Member

yep - if the endpoint is there, it will use it.

@ronycohen
Copy link

which current couchDB version will work with _bulk_get ?

@nolanlawson
Copy link
Member Author

CouchDB 2.0+ (unreleased) and PouchDB Server 1.0.0. Cloudant soon.

@ronycohen
Copy link

Thank you @nolanlawson

@yaronyg
Copy link
Member

yaronyg commented Oct 8, 2015

@nolanlawson @daleharvey At the risk of sounding like a hopeless fan boy this is super cool! Thank you!!!!

BTW, as a side note today we did a demo using PouchDB with memdown on top of node.ch on top of Windows IoT on top of a Raspberry PI 2. It worked super well! So my thanks only increases!

@nolanlawson
Copy link
Member Author

@yaronyg woop woop! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Feature request
Projects
None yet
Development

No branches or pull requests

7 participants