New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce size of #leaf.atts keys #1560

Merged
merged 1 commit into from Aug 15, 2018

Conversation

Projects
None yet
4 participants
@nickva
Contributor

nickva commented Aug 15, 2018

#leaf.atts data structure is a [{Position, AttachmentLength}, ...] proplist
which keeps track of attachment lengths and it is used when calculating
external data size of documents. Position is supposed to uniquely identify an
attachment in a file stream. Initially it was just an integer file offset. Then,
after some refactoring work it became a list of {Position, Size} tuples.

During the PSE work streams were abstracted such that each engine can supply
its own stream implementation. The position in the stream then became a tuple
that looks like {couch_bt_engine_stream,{<0.1922.0>,[{4267,21}]}}. This was
written to the file the #leaf.atts data structure. While still correct, it is
unnecessarily verbose wasting around 100 bytes per attachment, per leaf.

To fix it use the disk serialized version of the stream position as returned
from couch_stream:to_disk_term. In case of the default CouchDB engine
implementation, this should avoid writing the module name and the pid value for
each attachment entry.

Reduce size of #leaf.atts keys
`#leaf.atts` data structure is a `[{Position, AttachmentLength}, ...]` proplist
which keeps track of attachment lengths and it is used when calculating
external data size of documents. `Position` is supposed to uniquely identify an
attachment in a file stream. Initially it was just an integer file offset. Then,
after some refactoring work it became a list of `{Position, Size}` tuples.

During the PSE work streams were abstracted such that each engine can supply
its own stream implementation. The position in the stream then became a tuple
that looks like `{couch_bt_engine_stream,{<0.1922.0>,[{4267,21}]}}`. This was
written to the file the `#leaf.atts` data structure. While still correct, it is
unnecessarily verbose wasting around 100 bytes per attachment, per leaf.

To fix it use the disk serialized version of the stream position as returned
from `couch_stream:to_disk_term`. In case of the default CouchDB engine
implementation, this should avoid writing the module name and the pid value for
each attachment entry.

@nickva nickva force-pushed the cloudant:reduce-leaf-atts-key-size branch from 1e44c6d to 861a3c0 Aug 15, 2018

@eiri

This comment has been minimized.

Member

eiri commented Aug 15, 2018

This seems like a good fix to me and I'm even fine with a lack of tests, this is an awkward place to test, but I'd love to get @davisp opinion on this. Reading around attachment, updater and streamer code made me very confident in my ability to miss something.

@jaydoane

This comment has been minimized.

Contributor

jaydoane commented Aug 15, 2018

Would it be useful to make some of the types explicit with the addition of some judicious typespecs?

@davisp

davisp approved these changes Aug 15, 2018

+1

@nickva

This comment has been minimized.

Contributor

nickva commented Aug 15, 2018

@jaydoane type spec might not be as useful in this case as stream pointer is essentially opaque given we have the PSE stream abstraction. It would be something line [{any(), non_neg_integer()}] which would still be correct for the old and new implementation. Just that any() got smaller now.

@nickva nickva merged commit 28ba48d into apache:master Aug 15, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@nickva nickva deleted the cloudant:reduce-leaf-atts-key-size branch Aug 15, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment