Add support for bulk get with Accept:"multipart/mixed" or "multipart/related" #1195

Closed
wants to merge 1 commit into
base: master
from

@AlexanderKaraberov
Contributor

AlexanderKaraberov commented Mar 2, 2018

Overview

CouchDB currently supports _bulk_get requests, which are intended to improve the performance of client pull replications by letting the client request multiple documents in one request. The current implementation has some limitations, though. First, attachment bodies can only be encoded inline (base64), not as MIME multipart bodies. Second, and I believe much more important: the current _bulk_get implementation is not compatible with Couchbase Lite 1.4.x for iOS and Android, which powers plenty of CouchDB mobile clients (in our company as well) as a replication and persistence layer. This is because CBL's BulkDownloader accepts the multipart/related content type and not application/json. I understand this was probably done to make life easier for web clients and in-browser DBs such as PouchDB, but the functionality could still be extended to cover more use cases for CouchDB's _bulk_get. To sum up, I believe this is a must-have feature.
This PR adds support for a multipart/related or multipart/mixed _bulk_get response that is compatible with the Couchbase Lite and Sync Gateway implementations. I first implemented this in our own CouchDB 2.x fork, which we run in production, because it was crucial for us, as mentioned above. We then realised it would be great to contribute it to upstream CouchDB as well.
The logic and implementation are pretty straightforward and compact: depending on whether MochiReq:accepts_content_type("multipart/mixed") or MochiReq:accepts_content_type("multipart/related") holds, I pick either CouchDB's default _bulk_get implementation or a multipart one based on the existing couch_doc:doc_to_multi_part_stream() and couch_httpd_db:encode_multipart_stream() functionality.
We’ve already tested this with some of our production CouchDB databases and the speedup was clearly perceptible. I think this feature will be very useful because there may be a lot of Couchbase Lite <-> CouchDB users who can't benefit from _bulk_get because of the application/json-only response. Moreover, this is a completely additive change which doesn't break or change existing functionality. The test suite is based entirely on chttpd_db_bulk_get_test; I've only changed the test fixture and some helpers a little.
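
To illustrate the dispatch described above outside of Erlang, here is a minimal Python sketch of choosing the response format from the Accept header. It is not the PR's code: the function name pick_bulk_get_format and the preference order (multipart/related before multipart/mixed) are assumptions for illustration, and unlike a full content negotiation it ignores q-values and wildcards.

```python
def pick_bulk_get_format(accept_header: str) -> str:
    """Return the response content type a _bulk_get handler might choose.

    Hypothetical helper: a simplified stand-in for the Accept check
    described in the PR, not the actual CouchDB implementation.
    """
    # Keep only the media type of each Accept entry; parameters such as
    # ";q=0.8" are dropped, which is a deliberate simplification.
    accepted = {
        entry.split(";", 1)[0].strip().lower()
        for entry in accept_header.split(",")
    }
    # Assumed preference order; the PR text does not state one.
    for mime in ("multipart/related", "multipart/mixed"):
        if mime in accepted:
            return mime
    # Anything else falls back to the existing JSON response.
    return "application/json"
```

Because the JSON path stays the default, clients that never send a multipart Accept header see no change in behaviour, which matches the "completely additive" claim above.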

Testing recommendations

To test, create a database db1 and put some docs in it:

curl -u admin:pass -X PUT http://127.0.0.1:5983/db1
curl -u admin:pass -H "Content-Type: application/json" -X PUT http://127.0.0.1:5983/db1/doc1 -d '{"type":"a"}'
curl -u admin:pass -H "Content-Type: application/json" -X PUT http://127.0.0.1:5983/db1/doc2 -d '{"type":"a"}'
curl -u admin:pass -H "Content-Type: application/json" -X PUT http://127.0.0.1:5983/db1/doc3 -d '{"type":"b"}'

Also add attachments for some of the docs (.png images or whatnot).
Add this request body to a bulkGetBody.txt file (the revisions have to match those of the corresponding documents):

{
	"docs": [
		{
			"id": "doc1",
			"rev": "3-22e6a8c0ef498f4d23b76ecaa58125ab"
		},
		{
			"id": "doc2",
			"rev": "4-cf4a89993599f6e8c0e3b22d617284b5"
		},
		{
			"id": "doc3",
			"rev": "14-db7f140e90ac3c45f7610eb496d1f883"
		}
	]
}

After this you can execute the following curl command (the URL is quoted so that the shell does not interpret the & in the query string):

curl -u admin:pass -X POST -H "Content-Type: application/json" -H "Accept: multipart/related" -H "X-Accept-Part-Encoding: gzip" --data "@bulkGetBody.txt" -o "response.txt" "http://127.0.0.1:5983/db1/_bulk_get?revs=true&attachments=true"
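
For anyone who prefers scripting the test, the same request can be sketched with Python's standard library. This only builds the request object; the helper name build_bulk_get_request is hypothetical, and the host, port, credentials, and revision values are the example values from this description, not anything the server mandates.

```python
import base64
import json
import urllib.request
from urllib.parse import urlencode


def build_bulk_get_request(base_url, db, docs, user, password):
    """Build (but do not send) a multipart _bulk_get request.

    `docs` is a list of (doc_id, rev) pairs. Hypothetical helper that
    mirrors the curl command above.
    """
    body = json.dumps(
        {"docs": [{"id": doc_id, "rev": rev} for doc_id, rev in docs]}
    ).encode("utf-8")
    query = urlencode({"revs": "true", "attachments": "true"})
    # Equivalent of curl's -u user:password (HTTP Basic auth).
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{base_url}/{db}/_bulk_get?{query}",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Accept": "multipart/related",
            "X-Accept-Part-Encoding": "gzip",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
```

Sending it is then a matter of passing the result to urllib.request.urlopen() against a running node and reading the response body together with its Content-Type header, which carries the outer boundary.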

response.txt will then contain something like this:


-----------------------------6ee0167da37f8532e116048afff16018
Content-Type: application/json

{"_id":"doc1","_rev":"3-22e6a8c0ef498f4d23b76ecaa58125ab","type":"global-local","_revisions":{"start":3,"ids":["22e6a8c0ef498f4d23b76ecaa58125ab","2a72890299a3d2039bdbcf8ef995e11a","3b39bc699524dd81add4d6c082a7246c"]}}
-----------------------------6ee0167da37f8532e116048afff16018
X-Doc-Id: doc2
X-Rev-Id: 4-cf4a89993599f6e8c0e3b22d617284b5
Content-Type: multipart/related; boundary=---------------------------57057c1b37c205a20e0e005511f26fea

-----------------------------57057c1b37c205a20e0e005511f26fea
Content-Type: application/json

{"_id":"doc2","_rev":"4-cf4a89993599f6e8c0e3b22d617284b5","type":"abcd","_revisions":{"start":4,"ids":["cf4a89993599f6e8c0e3b22d617284b5","575eaa2b49d3ffac7130596fc5d087b8","704d58f7a66bf744f6944223eabc7c11","3b39bc699524dd81add4d6c082a7246c"]},"_attachments":{"Screen Shot 2018-03-02 at 17.12.04.png":{"content_type":"image/png","revpos":4,"digest":"md5-8MvuKqISDqfpg6YbUNhAXQ==","length":5942,"follows":true}}}
-----------------------------57057c1b37c205a20e0e005511f26fea
Content-Disposition: attachment; filename="Screen Shot 2018-03-02 at 17.12.04.png"
Content-Type: image/png
Content-Length: 5942

[5942 bytes of binary PNG data omitted]
-----------------------------57057c1b37c205a20e0e005511f26fea--
-----------------------------6ee0167da37f8532e116048afff16018
Content-Type: application/json

{"_id":"doc3","_rev":"14-db7f140e90ac3c45f7610eb496d1f883","type":"device desc","owner":"test1","_revisions":{"start":14,"ids":["db7f140e90ac3c45f7610eb496d1f883","75b189962e4db1506bdefba4e8a53506","2697b5853e6c32f1739c41482aa5f1ef","63c4159c84a80d37323b01aec7c161ca","5595d98f421c07b52a0f0cb11258c477","768778b61cbdd6b0a494983ba8ffb5da","656300127ed067c36c36f28f0e6103ed","645f7fe82e9f657079081aac88cfbfa7","c1dc26224c314cda781e630f116a691a","a1f39d16e73806dc1d65809714b1a95f","8623367b1c730c7d40e6b389dbdd8ae4","a7e171c070e7652a2b2a601d5e6ab49f","c4b0a5d0bfa102ba443a42f7e3262099","27a6e24202a782a849a36f25559a6773"]}}
-----------------------------6ee0167da37f8532e116048afff16018--

This format is completely compatible with the Couchbase Lite and Sync Gateway implementations.
For testing you can also use a Couchbase Lite database: set the _canBulkGet variable to YES and then replicate db1 to an iOS mobile device.
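
A response in the shape shown above can also be inspected with a short Python script. The following is a minimal sketch using the standard email package, assuming the outer boundary is taken from the HTTP Content-Type response header (curl -o saves only the body). The function name parse_bulk_get is hypothetical, and I have not verified this approach against large binary attachments; a streaming parser that honours Content-Length would be more robust there.

```python
import json
from email import policy
from email.parser import BytesParser


def parse_bulk_get(content_type: str, body: bytes):
    """Return a list of (doc, attachments) pairs from a multipart body.

    `content_type` is the value of the HTTP Content-Type response header,
    including its boundary parameter. Hypothetical helper for illustration.
    """
    # The MIME parser needs the boundary, so prepend the header to the body.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = BytesParser(policy=policy.HTTP).parsebytes(raw)
    results = []
    for part in message.iter_parts():
        if part.is_multipart():
            # Document with attachments: first sub-part is the JSON doc,
            # the remaining sub-parts are the attachment bodies.
            sub_parts = list(part.iter_parts())
            doc = json.loads(sub_parts[0].get_payload(decode=True))
            attachments = {
                sub.get_filename(): sub.get_payload(decode=True)
                for sub in sub_parts[1:]
            }
            results.append((doc, attachments))
        elif part.get_content_type() == "application/json":
            # Document without attachments: a plain JSON part.
            results.append((json.loads(part.get_payload(decode=True)), {}))
    return results
```

Each attachment is keyed by the filename from its Content-Disposition header, which corresponds to the key in the document's _attachments object where "follows" is true.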

I checked the "Documentation reflects the changes" checkbox because I described the change quite meticulously, and if this PR is merged I would be delighted to update the CouchDB docs as well. By the way, I wasn't able to find a description of the current _bulk_doc request in the docs either, so I can contribute to both.

Checklist

  • Code is written and works correctly;
  • Changes are covered by tests;
  • Documentation reflects the changes;

@AlexanderKaraberov AlexanderKaraberov changed the title from Add support for bulk get with Accept Content Type: "multipart/mixed", "multipart/related" to Add support for bulk get with Accept:"multipart/mixed" or "multipart/related" Mar 2, 2018

@janl

Member

janl commented Mar 9, 2018

Hi @AlexanderKaraberov,

this is pretty cool. Before I do a more thorough review, I’m wondering why you didn’t opt to re-use couch_doc:doc_to_multi_part_stream() / couch_httpd_db:encode_multipart_stream() for the multipart streaming?

@AlexanderKaraberov

Contributor

AlexanderKaraberov commented Mar 9, 2018

Hi @janl , thanks for a good question.
Long story short: I tried to re-use couch_doc:doc_to_multi_part_stream() but it didn’t work out for me, because the content headers weren’t encoded properly. Hence I decided to write a custom function instead of modifying or breaking existing functionality. I now understand that this happened because I didn’t use (and didn’t know about) couch_httpd_db:encode_multipart_stream(). Judging by its name, it is exactly what I needed at the time. If you don’t mind, I can try to reimplement this functionality using the suggested function and test whether it works with Sync Gateway.

@janl

Member

janl commented Mar 10, 2018

Yeah, I really don't feel comfortable having two multipart implementations in CouchDB. We already don't like the one we have. ;)

@AlexanderKaraberov

Contributor

AlexanderKaraberov commented Mar 15, 2018

Hi @janl, thank you for your advice to stick to the couch_doc:doc_to_multi_part_stream() and couch_httpd_db:encode_multipart_stream() functions. I realised that I had written a lot of duplicated code :)
I pushed a revised implementation.

@wohali wohali requested a review from nickva Mar 26, 2018

@wohali wohali added the enhancement label Mar 27, 2018

@AlexanderKaraberov

Contributor

AlexanderKaraberov commented Jul 18, 2018

Hi,
is there a small chance this PR might be reviewed before the 2.2.0 release? It would be nice to have this functionality in CouchDB 2.2.

@wohali

Member

wohali commented Aug 6, 2018

@AlexanderKaraberov Sorry about not being able to get this included in 2.2.0, but it might land for 2.2.1 or 2.3.0.

@mikerhodes @davisp another one that could use your reviews. Would also love any comments from the PouchDB crew e.g. @garrensmith @daleharvey

@wohali

Member

wohali commented Nov 7, 2018

@AlexanderKaraberov Can you resolve the conflicts on this PR please?

@garrensmith do you think you could review this PR please?

@nickva

Great work! Thanks for contributing and sorry for taking so long to review. Added a few comments. There are some functional changes and some style comments.

Don't forget to rebase on master and squash the commits. There is a merge conflict, where we updated options to take user's context.

Thank you!

@AlexanderKaraberov AlexanderKaraberov force-pushed the Spotme:spotme/bulk_get_multipart branch 4 times, most recently from 2194d92 to 9f0a7ca Nov 9, 2018

@AlexanderKaraberov

Contributor

AlexanderKaraberov commented Nov 9, 2018

Hello @nickva
Once again, thank you for the thorough review and your comments. I've addressed all the requested changes and also fixed the formatting style of the whole multipart case clause. All commits are squashed into one, and the branch is rebased onto master.

@AlexanderKaraberov AlexanderKaraberov force-pushed the Spotme:spotme/bulk_get_multipart branch from 9f0a7ca to b3ddb8f Nov 9, 2018

@nickva

Contributor

nickva commented Nov 9, 2018

Looks much better. Nice work cleaning it up.

I see eunit test failures locally and in Travis. Not sure exactly why, maybe because of the name change of the function or maybe something on master changed?

Also see a minor comment about simplifying meck unloading.

@AlexanderKaraberov

Contributor

AlexanderKaraberov commented Nov 9, 2018

Valid point. meck:unload() has been incorporated. As for the failed test, the failure was due to the fact that the Options list passed to fabric_doc_open_revs:go() has to contain user_ctx. Fixed as well. I reran the chttpd tests locally: all 232 tests passed.

@AlexanderKaraberov

Contributor

AlexanderKaraberov commented Nov 9, 2018

@nickva I checked the Travis logs and seemingly only one job, Job #6237.5 for OTP 17.5, failed for some odd reason. All the other jobs are successful, and so are the chttpd tests in the failed one. I've skimmed through the logs of this job and I believe the error has nothing to do with my functionality. Perhaps some configs are wrong:

 Failed: 0.  Skipped: 0.  Passed: 1004.
One or more tests were cancelled.
ERROR: One or more eunit tests failed.
ERROR: eunit failed while processing /home/travis/build/apache/couchdb/src/couch: rebar_abort
make[1]: *** [eunit] Error 1
make[1]: Leaving directory `/home/travis/build/apache/couchdb'
make: *** [check] Error 2
The command "make check" exited with 2.
@nickva

Contributor

nickva commented Nov 9, 2018

Ah it's probably flaky, I'll try to restart it

@AlexanderKaraberov

Contributor

AlexanderKaraberov commented Nov 9, 2018

@nickva Thanks! Indeed it was some intermittent failure. All is good now. Let me know if I need to address something else before this can be merged into master.

@nickva

You can also squash the commits if you fix it up and rebase again. There were some recent changes in the replicator but shouldn't affect this PR.

Thank you again for your contribution.

@AlexanderKaraberov AlexanderKaraberov force-pushed the Spotme:spotme/bulk_get_multipart branch from cb2dca5 to a3c1098 Nov 12, 2018

@AlexanderKaraberov AlexanderKaraberov force-pushed the Spotme:spotme/bulk_get_multipart branch from a3c1098 to 4434bc4 Nov 12, 2018

@nickva

nickva approved these changes Nov 12, 2018

@nickva

Contributor

nickva commented Nov 12, 2018

@AlexanderKaraberov

+1

Thank you for your contribution and for your patience when reviewing the PR

Let's do a final ping on this to other developers who looked at this before

@garrensmith @rnewson @janl @wohali
Any objections or concerns to merging this?

@nickva

Contributor

nickva commented Nov 14, 2018

Added a rebased on master PR #1195

Not sure why "update branch" button in GH doesn't allow rebasing on master :-)

@nickva

Contributor

nickva commented Nov 14, 2018

@AlexanderKaraberov

Thank you again for your contribution. I rebased and merged your PR.

Please make a documentation PR to go along with it.

@nickva nickva closed this Nov 14, 2018

@AlexanderKaraberov

Contributor

AlexanderKaraberov commented Nov 16, 2018

@nickva Thanks! Sure, I shall allocate some time in order to update related documentation and submit a PR for it.

@wohali

Member

wohali commented Nov 16, 2018

@AlexanderKaraberov Thanks, please be advised I am moving swiftly into 2.3.0 release time, so a little urgency would be much appreciated. :)
