Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid document body unmarshalling on changes requests #2428

Closed
adamcfraser opened this issue Mar 28, 2017 · 3 comments
Closed

Avoid document body unmarshalling on changes requests #2428

adamcfraser opened this issue Mar 28, 2017 · 3 comments
Assignees
Milestone

Comments

@adamcfraser
Copy link
Collaborator

adamcfraser commented Mar 28, 2017

Currently changes requests that specify style=all_docs have unnecessary performance overhead when retrieving branched documents.

changes.addDocToChangeEntry will retrieve the document from the bucket (to identify the set of leaf revisions). In the process it will unmarshal the entire doc.

Two enhancements:

  • Store the list of leaf rev ids in the revision cache along with the active revision, to avoid retrieving the document from the bucket when it's resident in the rev cache
  • When bucket retrieval is required, unmarshal only the rev history, not the full doc body.
@adamcfraser adamcfraser added this to the 1.4.1 milestone Mar 28, 2017
@djpongh djpongh modified the milestones: 1.4.2, 1.4.1, 2.0.0 Apr 5, 2017
@djpongh djpongh added icebox and removed backlog labels Apr 10, 2017
@djpongh djpongh modified the milestones: 1.4.1, 2.0.0 Apr 11, 2017
@djpongh djpongh added ready and removed icebox labels Apr 11, 2017
@adamcfraser
Copy link
Collaborator Author

For the second item ("When bucket retrieval is required, unmarshal only the rev history, not the full doc body.") - in the context of this ticket, this is specifically about the call to db.GetDoc in changes.addDocToChangeEntry.

One option would be to add a new db method (like 'GetDocSyncData') that only returns the sync data.

Another would be to modify the document struct to always lazy unmarshal the document body. This has the potential for more benefits across the entire codebase, but would require more significant refactoring - it might make more sense to hold that refactoring until 2.0, when the sync data moves to xattrs.

@tleyden
Copy link
Contributor

tleyden commented Apr 11, 2017

Notes to self:

  1. Get doc bytes by calling GetRaw
  2. Call UnmarshalDocumentSyncData

Benchmark:

  1. Pre-timer setup: Create 2k docs of size 50k, 1000 keys with branches, 1 parent + 2 child branches -- doesn't matter which API .. bucket api
  2. Start the timer
  3. Changes request of all docs (could also do GetDoc call, but misses other possible things). One shot, .. etc

@tleyden
Copy link
Contributor

tleyden commented Apr 12, 2017

Test results

Fix Status Millisconds/Op Bytes/Op Allocations/Op
Without Fix 5074 699042928 1031780
Without Fix 5127 699030160 1031734
Without Fix 4689 699035600 1031753
With Fix 1366 341696136 493655
With Fix 1334 341695456 493651
With Fix 1307 341695536 493652

Without fix: 8ef9a00
With fix: 341bf80

So it looks like a ~3.5X speedup

tleyden pushed a commit that referenced this issue Apr 12, 2017
NOTE: it uses the *testing.TB interface just where it needs to, but we should fix this everywhere in a different PR
adamcfraser pushed a commit that referenced this issue Apr 13, 2017
* Reproduces #2428 by adding BenchmarkChangesFeedDocUnmarashalling

NOTE: it uses the *testing.TB interface just where it needs to, but we should fix this everywhere in a different PR

* Fixes #2428 by using GetRaw and only parsing syncMeta in certain cases
@djpongh djpongh closed this as completed Apr 18, 2017
adamcfraser pushed a commit that referenced this issue May 5, 2017
* Reproduces #2428 by adding BenchmarkChangesFeedDocUnmarashalling

NOTE: it uses the *testing.TB interface just where it needs to, but we should fix this everywhere in a different PR

* Fixes #2428 by using GetRaw and only parsing syncMeta in certain cases
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants