Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

common/OpHistory: move insert/cleanup into separate thread #20540

Merged
merged 6 commits into from Feb 28, 2018

Conversation

Projects
None yet
4 participants
@branch-predictor
Copy link
Contributor

commented Feb 22, 2018

Cluster that's flooded with incoming ops (and enabled optracker) is bottlenecked by OpHistory::insert. Reduce that by:

  • pushing incoming ops into separate queue that'll be processed by separate thread.
  • using std::atomic_bool for shutdown flag so ops_history_lock doesn't need to be taken as often

My initial testing has shown this noticeably reduced optracker impact on cluster perfornance:

optracker

Using separate thread ("threaded optracker") didn't improve things by much, neither did replacing OpHistorySvc thread mutex with spinlock ("threaded optracker + spin"). Removing the ops_history_lock from the processing path (by either removing it entirely or replacing shutdown bool flag with atomic) did the trick and the optracker perf impact is still there, albeit much smaller.
Note that I intentionally used spin loop with scaling sleep, as conditional variables/signaling turned out to be too slow for this purpose and it actually made it work much worse. Side effect of scaling sleep is that it reduces cpu time consumed by OpHistorySvc thread as it processes data in batches. This might incur some data latency in OpHistory, but up to around 128ms - data is still guaranteed to go in FIFO order.

Signed-off-by: Piotr Dałek piotr.dalek@corp.ovh.com

branch-predictor added some commits Feb 20, 2018

common/TrackedOp: use emplace_back
Replace push_back with explicit constructor with push_back for
minor perf increase.

Signed-off-by: Piotr Dałek <piotr.dalek@corp.ovh.com>
common/TrackedOp: get duration just once when inserting
No need to do this twice.

Signed-off-by: Piotr Dałek <piotr.dalek@corp.ovh.com>
_break_thread(false) { }

void BreakThread();
void InsertOp(utime_t& now, TrackedOpRef op);

This comment has been minimized.

Copy link
@liewegas

liewegas Feb 22, 2018

Member

lower_case not CamelCaps for these please

This comment has been minimized.

Copy link
@branch-predictor

branch-predictor Feb 23, 2018

Author Contributor

Oops, forgot.

return;

opsvc.InsertOp(now, op);
}

This comment has been minimized.

Copy link
@liewegas

liewegas Feb 22, 2018

Member

would it make sense to put this function in teh header so the bool check gets inlined? that'll let us skip one additional stack frame/function call

This comment has been minimized.

Copy link
@branch-predictor

branch-predictor Feb 23, 2018

Author Contributor

I'll check. Maybe it'll make sense to inline also OpServiceThread::insert_op().

@liewegas
Copy link
Member

left a comment

awesome to see this mitigates most of the current optracker overhead! just a few style nits

@gregsfortytwo
Copy link
Member

left a comment

Yep, good design but a few more nits. :)

~OpHistory() {
assert(arrived.empty());
assert(duration.empty());
assert(slow_op.empty());
}
void insert(utime_t now, TrackedOpRef op);
void insert(utime_t& now, TrackedOpRef op);
void _insert_delayed(utime_t& now, TrackedOpRef op);

This comment has been minimized.

Copy link
@gregsfortytwo

gregsfortytwo Feb 23, 2018

Member

Please make any pass-by-reference values const. (It's part of our style guide and assures users that the function won't modify their value.)

opsvc.InsertOp(now, op);
}

void OpHistory::_insert_delayed(utime_t& now, TrackedOpRef op)

This comment has been minimized.

Copy link
@gregsfortytwo

gregsfortytwo Feb 23, 2018

Member

Can we come up with a better name than _insert_delayed(), given that it's now finished being delayed? I think I'd prefer even just _insert() if nothing better suggests itself.


void OpHistoryServiceThread::InsertOp(utime_t& now, TrackedOpRef op) {
queue_spinlock.lock();
_external_queue.emplace_back(now, op);

This comment has been minimized.

Copy link
@gregsfortytwo

gregsfortytwo Feb 23, 2018

Member

This allocates, right? Is using a mutex really not okay here?

This comment has been minimized.

Copy link
@branch-predictor

branch-predictor Feb 23, 2018

Author Contributor

I wanted threads to wait for lock to be freed without being preempted. See the graph, there's a slight difference in how it affects machinery performance.

@branch-predictor branch-predictor force-pushed the ovh:bp-optracker-cleanup branch from 4a368ab to 313a35c Feb 23, 2018

@gregsfortytwo

This comment has been minimized.

Copy link
Member

commented Feb 23, 2018

gregsfortytwo wrote

This allocates, right? Is using a mutex really not okay here?

branch-predictor wrote

I wanted threads to wait for lock to be freed without being preempted. See the graph, there's a slight difference in how it affects machinery performance.

Yeah, I get that, but there are reasons people tell you not to use spinlocks when memory allocation might happen. So I wonder if we can do some trick to allocate the memory and then put it on the end of the list. I'm probably worrying about it too much given the simple case presented here, though.

(Or, probably out of scope here, but did you consider giving a separate queue to each OSD op thread and having the OpTracker one pick off the front of each? That would greatly reduce the sharing...I notice you emphasize the preserved ordering but the OpTracker worker could handle that by walking forward with timestamp comparisons, and I don't think it's that big a deal anyway.)

@branch-predictor

This comment has been minimized.

Copy link
Contributor Author

commented Feb 26, 2018

gregsfortytwo wrote

gregsfortytwo wrote

This allocates, right? Is using a mutex really not okay here?

branch-predictor wrote

I wanted threads to wait for lock to be freed without being preempted. See the graph, there's a slight difference in how it affects machinery performance.

Yeah, I get that, but there are reasons people tell you not to use spinlocks when memory allocation might happen. So I wonder if we can do some trick to allocate the memory and then put it on the end of the list. I'm probably worrying about it too much given the simple case presented here, though.

That's what I'm thinking too. I mean, sure - your concerns are perfectly valid, but it's not like I'm allocating megabytes or even kilobytes of data. Just a few hundred of bytes max, should be little enough to not get slowed down badly by memory allocator.

(Or, probably out of scope here, but did you consider giving a separate queue to each OSD op thread and having the OpTracker one pick off the front of each? That would greatly reduce the sharing...I notice you emphasize the preserved ordering but the OpTracker worker could handle that by walking forward with timestamp comparisons, and I don't think it's that big a deal anyway.)

Yeah, I've been thinking about it too, but at this point I think it's too early for that. Let's see how far this one takes us, and then let's optimize further. I already see few possibilities to optimize it without complicating matters much.

@branch-predictor branch-predictor force-pushed the ovh:bp-optracker-cleanup branch from abbbcd2 to 52d8f09 Feb 26, 2018

branch-predictor added some commits Feb 22, 2018

common/OpHistory: move insert/cleanup into separate thread
Cluster that's flooded with incoming ops (and enabled optracker)
is bottlenecked by OpHistory::insert. Reduce that by:
- pushing incoming ops into separate queue that'll be processed by
  separate thread.
- using std::atomic_bool for shutdown flag so ops_history_lock doesn't
  need to be taken as often

Signed-off-by: Piotr Dałek <piotr.dalek@corp.ovh.com>
mon/OSDMonitor: remove op_tracker
It's unused anyway.

Signed-off-by: Piotr Dałek <piotr.dalek@corp.ovh.com>
mon/Monitor: add missing shutdown of OpTracker
Now that it has its own processing thread, it must be shut down
explicitly or it'll sigsegv randomly.

Signed-off-by: Piotr Dałek <piotr.dalek@corp.ovh.com>
common/OpTracker: reorder fields
Reorder smaller fields around so they're aligning naturally,
regaining a few bytes of storage.

Signed-off-by: Piotr Dałek <piotr.dalek@corp.ovh.com>

@branch-predictor branch-predictor force-pushed the ovh:bp-optracker-cleanup branch from 52d8f09 to 136642d Feb 26, 2018

@gregsfortytwo

This comment has been minimized.

Copy link
Member

commented Feb 26, 2018

Be nice to squash a few of those down, but looks good.
@liewegas, looks like he got your named issues, can you approve?

@gregsfortytwo

This comment has been minimized.

Copy link
Member

commented Feb 26, 2018

Eek, wrong button.

@tchaikov tchaikov merged commit 1b8ad4a into ceph:master Feb 28, 2018

5 checks passed

Docs: build check OK - docs built
Details
Signed-off-by all commits in this PR are signed
Details
Unmodified Submodules submodules for project are unmodified
Details
make check make check succeeded
Details
make check (arm64) make check succeeded
Details
@branch-predictor

This comment has been minimized.

Copy link
Contributor Author

commented Feb 28, 2018

@liewegas @gregsfortytwo does this qualify for backport to luminous?

@gregsfortytwo

This comment has been minimized.

Copy link
Member

commented Feb 28, 2018

I don't have strong feelings either way. It probably qualifies but it's enough of a change I wouldn't do it immediately or casually, I guess?

@liewegas

This comment has been minimized.

Copy link
Member

commented Feb 28, 2018

Yeah, let's give it some time in master to make sure there isn't fallout before backporting

@branch-predictor branch-predictor deleted the ovh:bp-optracker-cleanup branch May 27, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.