Avoid CPU spikes when abandoning skipped sequences #3823

Closed
adamcfraser opened this issue Nov 8, 2018 · 3 comments

Comments

@adamcfraser
Contributor

commented Nov 8, 2018

Before abandoning skipped sequences, Sync Gateway issues a query as a final check for the existence of each sequence in the bucket. When a large number of skipped sequences are abandoned at once, these queries create a CPU spike that can impact other operations.

The skipped sequence query check needs to be treated as a low-priority background task that should not impact normal SG operations.
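One way to read "low-priority background task" is to throttle the existence checks so they cannot all run in a single burst. The following is a minimal sketch under that assumption only; `checkThrottled`, its parameters, and the `exists` callback are hypothetical names for illustration, not Sync Gateway APIs, and the callback stands in for the real bucket query.

```go
package main

import (
	"fmt"
	"time"
)

// checkThrottled calls exists for each sequence, starting at most one check
// per interval (the first check also waits one interval). It returns the
// sequences that were not found, and whether the run completed before done
// was closed. interval must be positive, because time.NewTicker panics
// otherwise. All names here are hypothetical.
func checkThrottled(done <-chan struct{}, seqs []uint64, interval time.Duration, exists func(uint64) bool) (missing []uint64, completed bool) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for _, seq := range seqs {
		select {
		case <-done:
			// Caller asked us to stop; report what we have so far.
			return missing, false
		case <-ticker.C:
			// Our turn to issue one more check.
		}
		if !exists(seq) {
			missing = append(missing, seq)
		}
	}
	return missing, true
}

func main() {
	// Stand-in for the bucket: only sequence 10 exists.
	inBucket := map[uint64]bool{10: true}
	exists := func(seq uint64) bool { return inBucket[seq] }

	// A nil done channel never fires, so the run always completes.
	missing, completed := checkThrottled(nil, []uint64{9, 10, 11}, 10*time.Millisecond, exists)
	fmt.Println(missing, completed) // [9 11] true
}
```

A fixed interval trades a longer clean for a bounded query rate; whether that is the right trade-off for SG would need measuring.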

@adamcfraser adamcfraser added this to the Iridium milestone Nov 8, 2018

@adamcfraser adamcfraser added the ffc label Nov 8, 2018

@JFlath

Collaborator

commented Nov 9, 2018

One thing that's come to mind in the past (particularly when looking at View ops) is that we do (did?) this individually per sequence number.

As I understand it, we do this checking on an interval, and then check for any that have been in the list for longer than MaxChannel.... This is important as it means that whenever we do this, there's a reasonably high chance that we'll be doing a bunch of queries at once as a "block", rather than individually exactly 1hr after each went missing. With this in mind, is it worth pushing some batching down to the View/Query engine?

Say you're waiting for [9,10,11,12,13,14,15], you could run one query for the range, rather than 7 separate queries. Obviously, if your list is more like [9,1000,502000] then a single query might be too large to be efficient. If I'm thinking this through correctly, we'd have to assume that all sequence numbers not in the list were active (i.e. we can't assume that [9-1000] would only return two results), so maybe a reasonable threshold on what to lump together...?
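To make the threshold idea concrete, here is a rough sketch of grouping skipped sequences into query ranges. `groupSkipped`, `SeqRange`, and `maxGap` are hypothetical names, and the value 10 is an arbitrary placeholder rather than a recommended setting. Two neighbouring skipped sequences share a range only when their difference is at most `maxGap`.

```go
package main

import (
	"fmt"
	"sort"
)

// SeqRange is an inclusive range of sequence numbers that one query would cover.
type SeqRange struct {
	Start, End uint64
}

// groupSkipped merges skipped sequences into inclusive ranges. A sequence
// joins the current range when it is at most maxGap above that range's End;
// otherwise it starts a new range. The input slice is not modified.
func groupSkipped(skipped []uint64, maxGap uint64) []SeqRange {
	if len(skipped) == 0 {
		return nil
	}
	// Sort a copy so the caller's slice keeps its order.
	seqs := append([]uint64(nil), skipped...)
	sort.Slice(seqs, func(i, j int) bool { return seqs[i] < seqs[j] })

	ranges := []SeqRange{{Start: seqs[0], End: seqs[0]}}
	for _, seq := range seqs[1:] {
		last := &ranges[len(ranges)-1]
		if seq-last.End <= maxGap {
			last.End = seq // close enough: extend the current range
		} else {
			ranges = append(ranges, SeqRange{Start: seq, End: seq})
		}
	}
	return ranges
}

func main() {
	fmt.Println(groupSkipped([]uint64{9, 10, 11, 12, 13, 14, 15}, 10)) // [{9 15}]
	fmt.Println(groupSkipped([]uint64{9, 1000, 502000}, 10))           // [{9 9} {1000 1000} {502000 502000}]
}
```

With this shape, the first list becomes one range query and the second stays as three single-sequence lookups. A range query may still return active sequences that sit between the skipped ones, so the caller would need to filter the results against the skipped list.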

@adamcfraser

Contributor Author

commented Nov 9, 2018

Batching queries is one of the options being considered. As you've identified, one of the challenges is calculating the point at which gaps in the skipped sequences make single queries more efficient.

@adamcfraser adamcfraser self-assigned this Nov 19, 2018

@adamcfraser adamcfraser added ready and removed backlog labels Jan 7, 2019

@adamcfraser

Contributor Author

commented Jan 8, 2019

Should also refactor to minimize locking on skippedSeqLock during CleanSkippedSequenceQueue, to avoid blocking normal DCP processing during clean.
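A common pattern for this kind of refactor is to hold the lock only while copying the candidate list and while removing entries, and to run the slow existence checks with the lock released. The sketch below illustrates that pattern only; `skippedQueue`, `push`, `clean`, and `existsInBucket` are hypothetical stand-ins, not the actual Sync Gateway types, and handing found sequences back to normal processing is an assumption that is left out of the code.

```go
package main

import (
	"fmt"
	"sync"
)

// skippedQueue is a hypothetical stand-in for the skipped sequence queue.
type skippedQueue struct {
	lock sync.Mutex // plays the role of skippedSeqLock
	seqs map[uint64]struct{}
}

func newSkippedQueue() *skippedQueue {
	return &skippedQueue{seqs: make(map[uint64]struct{})}
}

// push records a skipped sequence; it only blocks for the short critical
// sections in clean, never for the slow checks.
func (q *skippedQueue) push(seq uint64) {
	q.lock.Lock()
	defer q.lock.Unlock()
	q.seqs[seq] = struct{}{}
}

// clean checks every queued sequence with existsInBucket and removes the
// checked sequences from the queue. It returns which were found and which
// were abandoned.
func (q *skippedQueue) clean(existsInBucket func(uint64) bool) (found, abandoned []uint64) {
	// Phase 1: snapshot the candidates under the lock.
	q.lock.Lock()
	candidates := make([]uint64, 0, len(q.seqs))
	for seq := range q.seqs {
		candidates = append(candidates, seq)
	}
	q.lock.Unlock()

	// Phase 2: run the slow checks without holding the lock.
	for _, seq := range candidates {
		if existsInBucket(seq) {
			found = append(found, seq)
		} else {
			abandoned = append(abandoned, seq)
		}
	}

	// Phase 3: remove only the sequences we checked. Anything pushed
	// during phase 2 stays queued for the next clean.
	q.lock.Lock()
	for _, seq := range candidates {
		delete(q.seqs, seq)
	}
	q.lock.Unlock()
	return found, abandoned
}

func main() {
	q := newSkippedQueue()
	q.push(9)
	q.push(10)
	found, abandoned := q.clean(func(seq uint64) bool { return seq == 10 })
	fmt.Println(found, abandoned, len(q.seqs)) // [10] [9] 0
}
```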

Should also ensure there aren't any feedback loops associated with pushing to the skipped sequence queue during Clean. Expectation is that this should generate back pressure on the DCP feed overall (and so TimeReceived isn't set), but should validate.

@adamcfraser adamcfraser added in progress review and removed ready labels Jan 18, 2019

@bbrks bbrks closed this in #3927 Jan 18, 2019

adamcfraser added a commit that referenced this issue Feb 7, 2019

adamcfraser added a commit that referenced this issue Feb 13, 2019

adamcfraser added a commit that referenced this issue Feb 13, 2019

2.1.2.1 backports (#3962)
* Remove histogram expvars

* Backport #3827 (fixes #3823) to 2.1.2