common/OpHistory: move insert/cleanup into separate thread #20540

Merged
merged 6 commits into from Feb 28, 2018

branch-predictor
Contributor

A cluster that's flooded with incoming ops (with optracker enabled) is bottlenecked by OpHistory::insert. Reduce that by:

  • pushing incoming ops into a separate queue that'll be processed by a separate thread.
  • using std::atomic_bool for the shutdown flag so ops_history_lock doesn't need to be taken as often

My initial testing has shown this noticeably reduced optracker impact on cluster performance:

[Graph: optracker]

Using a separate thread ("threaded optracker") alone didn't improve things by much, and neither did replacing the OpHistorySvc thread mutex with a spinlock ("threaded optracker + spin"). Removing ops_history_lock from the processing path (either by removing it entirely or by replacing the shutdown bool flag with an atomic) did the trick. The optracker perf impact is still there, albeit much smaller.
Note that I intentionally used a spin loop with scaling sleep, as condition variables/signaling turned out to be too slow for this purpose and actually made it perform much worse. A side effect of the scaling sleep is that it reduces the CPU time consumed by the OpHistorySvc thread, as it processes data in batches. This might add some latency before data shows up in OpHistory, up to around 128ms, but data is still guaranteed to go in in FIFO order.
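
For readers who want to see the shape of the approach described above, here is a minimal sketch. It is not the code from this PR: `SpinLock`, `BatchWorker`, the `int` payload, and the doubling back-off (1, 2, 4, ... ms) are assumptions made for illustration. Only the ideas come from the description: a spinlock-protected queue, an atomic shutdown flag, batch processing in FIFO order, and a sleep capped at around 128 ms.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal spinlock (not Ceph's); works with std::lock_guard.
class SpinLock {
  std::atomic<bool> locked{false};
public:
  void lock() { while (locked.exchange(true, std::memory_order_acquire)) {} }
  void unlock() { locked.store(false, std::memory_order_release); }
};

// Hypothetical worker: producers append under a short spinlock, one
// consumer thread drains the queue in batches.
class BatchWorker {
  SpinLock queue_lock;
  std::vector<int> external_queue;   // producers append here
  std::atomic<bool> stop{false};     // shutdown flag, read without any lock
  std::function<void(int)> handler;
  std::thread worker;                // declared last so it starts last

  void run() {
    std::vector<int> batch;
    int sleep_ms = 0;
    for (;;) {
      // Read the flag before draining, so anything pushed before
      // shutdown() is still picked up on the final pass.
      const bool stopping = stop.load();
      {
        std::lock_guard<SpinLock> l(queue_lock);
        batch.swap(external_queue);  // take the whole backlog at once
      }
      if (!batch.empty()) {
        for (int item : batch) handler(item);  // FIFO order is preserved
        batch.clear();
        sleep_ms = 0;                // busy: poll again immediately
        continue;
      }
      if (stopping) break;           // queue drained and shutdown requested
      // Idle: back off 1, 2, 4, ... ms, capped at 128 ms (assumed schedule).
      sleep_ms = (sleep_ms == 0) ? 1 : std::min(sleep_ms * 2, 128);
      std::this_thread::sleep_for(std::chrono::milliseconds(sleep_ms));
    }
  }

public:
  explicit BatchWorker(std::function<void(int)> h)
    : handler(std::move(h)), worker([this] { run(); }) {}

  void push(int item) {
    std::lock_guard<SpinLock> l(queue_lock);
    external_queue.push_back(item);
  }

  // Must be called after producers have stopped pushing.
  void shutdown() {
    stop.store(true);
    if (worker.joinable()) worker.join();
  }

  ~BatchWorker() { shutdown(); }
};
```

The worker never blocks on a condition variable; it polls, and the back-off is what keeps an idle thread from burning CPU. The cost is that an op arriving while the worker sleeps waits for the rest of that sleep.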

Signed-off-by: Piotr Dałek <piotr.dalek@corp.ovh.com>

Replace push_back with explicit constructor with emplace_back for
minor perf increase.

Signed-off-by: Piotr Dałek <piotr.dalek@corp.ovh.com>
No need to do this twice.

Signed-off-by: Piotr Dałek <piotr.dalek@corp.ovh.com>
_break_thread(false) { }

void BreakThread();
void InsertOp(utime_t& now, TrackedOpRef op);
Member

lower_case not CamelCaps for these please

Contributor Author

Oops, forgot.

return;

opsvc.InsertOp(now, op);
}
Member

would it make sense to put this function in the header so the bool check gets inlined? that'll let us skip one additional stack frame/function call

Contributor Author

I'll check. Maybe it'll also make sense to inline OpServiceThread::insert_op().
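
A rough sketch of what the suggestion amounts to, using hypothetical names (`History`, `insert_slow`, `on_shutdown`) rather than the real Ceph classes: the cheap flag check lives in the class body, where it is implicitly inline, and only the expensive path is an out-of-line call.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical class, for illustration only.
class History {
  std::atomic<bool> shutdown{false};
  int slow_path_calls = 0;
  void insert_slow(int op);  // would normally be defined in the .cc file

public:
  // Defined in the class body, so implicitly inline: callers can test
  // the flag without paying for a function call when shutting down.
  void insert(int op) {
    if (shutdown.load(std::memory_order_relaxed))
      return;
    insert_slow(op);
  }

  void on_shutdown() { shutdown.store(true); }
  int calls() const { return slow_path_calls; }
};

// Kept in this file only so the sketch is self-contained.
inline void History::insert_slow(int /*op*/) { ++slow_path_calls; }
```

Whether the compiler actually inlines the check is its decision; putting the definition in the header only makes that possible across translation units.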

Member

@liewegas left a comment

awesome to see this mitigates most of the current optracker overhead! just a few style nits

Member

@gregsfortytwo left a comment

Yep, good design but a few more nits. :)

~OpHistory() {
assert(arrived.empty());
assert(duration.empty());
assert(slow_op.empty());
}
void insert(utime_t now, TrackedOpRef op);
void insert(utime_t& now, TrackedOpRef op);
void _insert_delayed(utime_t& now, TrackedOpRef op);
Member

Please make any pass-by-reference values const. (It's part of our style guide and assures users that the function won't modify their value.)
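
To illustrate the difference the reviewer is pointing at, here is a small sketch with a hypothetical `Stamp` type standing in for `utime_t`. Both functions avoid a copy, but only the const version promises the caller that the argument is left alone, and only it accepts temporaries.

```cpp
#include <cassert>

struct Stamp { long sec; };  // hypothetical stand-in for utime_t

// Non-const reference: the callee may modify the caller's value,
// and temporaries cannot be passed.
void bump(Stamp& now) { now.sec += 1; }

// Const reference: still no copy, but the callee cannot modify the
// argument, and temporaries are accepted.
long read_sec(const Stamp& now) { return now.sec; }
```

With these definitions, `read_sec(Stamp{5})` compiles while `bump(Stamp{5})` does not.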

opsvc.InsertOp(now, op);
}

void OpHistory::_insert_delayed(utime_t& now, TrackedOpRef op)
Member

Can we come up with a better name than _insert_delayed(), given that it's now finished being delayed? I think I'd prefer even just _insert() if nothing better suggests itself.


void OpHistoryServiceThread::InsertOp(utime_t& now, TrackedOpRef op) {
queue_spinlock.lock();
_external_queue.emplace_back(now, op);
Member

This allocates, right? Is using a mutex really not okay here?

Contributor Author

I wanted threads to wait for lock to be freed without being preempted. See the graph, there's a slight difference in how it affects machinery performance.

@gregsfortytwo
Member

gregsfortytwo wrote

This allocates, right? Is using a mutex really not okay here?

branch-predictor wrote

I wanted threads to wait for lock to be freed without being preempted. See the graph, there's a slight difference in how it affects machinery performance.

Yeah, I get that, but there are reasons people tell you not to use spinlocks when memory allocation might happen. So I wonder if we can do some trick to allocate the memory and then put it on the end of the list. I'm probably worrying about it too much given the simple case presented here, though.
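
One way the "allocate first, then put it on the end of the list" trick could look is sketched below. This is an assumption about what was meant, not code from the PR: `SpliceQueue` is a hypothetical name, and `std::mutex` stands in for whatever lock is used. The node is allocated in a local `std::list` before the lock is taken, and `std::list::splice` then relinks it into the shared list without allocating.

```cpp
#include <cassert>
#include <list>
#include <mutex>
#include <utility>

// Hypothetical queue, for illustration only.
template <typename T>
class SpliceQueue {
  std::mutex lock;     // stands in for the spinlock under discussion
  std::list<T> items;

public:
  void push(T value) {
    std::list<T> node;
    node.push_back(std::move(value));   // allocation happens here, unlocked
    std::lock_guard<std::mutex> l(lock);
    items.splice(items.end(), node);    // relinks the node, no allocation
  }

  // Hand the whole backlog to the caller in one short critical section.
  std::list<T> drain() {
    std::list<T> out;
    std::lock_guard<std::mutex> l(lock);
    out.swap(items);
    return out;
  }
};
```

The critical section shrinks to a few pointer updates, which is the case where spinning is least risky.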

(Or, probably out of scope here, but did you consider giving a separate queue to each OSD op thread and having the OpTracker one pick off the front of each? That would greatly reduce the sharing...I notice you emphasize the preserved ordering but the OpTracker worker could handle that by walking forward with timestamp comparisons, and I don't think it's that big a deal anyway.)
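
The per-thread-queue idea in the parenthetical above could be sketched roughly as follows. Everything here is hypothetical (`Entry`, `merge_by_stamp`, integer timestamps), and the sketch shows only the single-threaded merge step; the synchronization between each producer and the consumer is deliberately left out. It assumes each producer's queue is already in timestamp order.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

struct Entry {           // hypothetical queued op
  std::uint64_t stamp;   // arrival time
  int id;
};

// Each deque belongs to one producer and is already sorted by stamp.
// Repeatedly take whichever front entry is oldest, which restores a
// global timestamp order across all producers.
std::vector<Entry> merge_by_stamp(std::vector<std::deque<Entry>>& queues) {
  std::vector<Entry> out;
  for (;;) {
    std::deque<Entry>* oldest = nullptr;
    for (auto& q : queues) {
      if (!q.empty() &&
          (!oldest || q.front().stamp < oldest->front().stamp))
        oldest = &q;
    }
    if (!oldest) break;          // every queue is empty
    out.push_back(oldest->front());
    oldest->pop_front();
  }
  return out;
}
```

The linear scan over queue fronts is fine for a handful of producers; with many producers a heap keyed on the front timestamp would be the usual replacement.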

@branch-predictor
Contributor Author

gregsfortytwo wrote

gregsfortytwo wrote

This allocates, right? Is using a mutex really not okay here?

branch-predictor wrote

I wanted threads to wait for lock to be freed without being preempted. See the graph, there's a slight difference in how it affects machinery performance.

Yeah, I get that, but there are reasons people tell you not to use spinlocks when memory allocation might happen. So I wonder if we can do some trick to allocate the memory and then put it on the end of the list. I'm probably worrying about it too much given the simple case presented here, though.

That's what I'm thinking too. I mean, sure - your concerns are perfectly valid, but it's not like I'm allocating megabytes or even kilobytes of data. Just a few hundred bytes max, which should be little enough to not get slowed down badly by the memory allocator.

(Or, probably out of scope here, but did you consider giving a separate queue to each OSD op thread and having the OpTracker one pick off the front of each? That would greatly reduce the sharing...I notice you emphasize the preserved ordering but the OpTracker worker could handle that by walking forward with timestamp comparisons, and I don't think it's that big a deal anyway.)

Yeah, I've been thinking about it too, but at this point I think it's too early for that. Let's see how far this one takes us, and then let's optimize further. I already see a few possibilities to optimize it without complicating matters much.

Cluster that's flooded with incoming ops (and enabled optracker)
is bottlenecked by OpHistory::insert. Reduce that by:
- pushing incoming ops into separate queue that'll be processed by
  separate thread.
- using std::atomic_bool for shutdown flag so ops_history_lock doesn't
  need to be taken as often

Signed-off-by: Piotr Dałek <piotr.dalek@corp.ovh.com>
It's unused anyway.

Signed-off-by: Piotr Dałek <piotr.dalek@corp.ovh.com>
Now that it has its own processing thread, it must be shut down
explicitly or it'll sigsegv randomly.

Signed-off-by: Piotr Dałek <piotr.dalek@corp.ovh.com>
Reorder smaller fields around so they're aligning naturally,
regaining a few bytes of storage.

Signed-off-by: Piotr Dałek <piotr.dalek@corp.ovh.com>
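
The field-reordering commit above relies on a general property of struct layout, illustrated here with two hypothetical structs (these are not the fields touched by the commit). Small members placed between larger ones force padding; grouping them lets them share what would otherwise be padding bytes.

```cpp
#include <cassert>
#include <cstdint>

// Small fields interleaved with 8-byte fields: each bool is typically
// followed by padding so the next uint64_t stays aligned.
struct Scattered {
  bool a;
  std::uint64_t x;
  bool b;
  std::uint64_t y;
  bool c;
};

// Same fields, large ones first and small ones grouped at the end.
struct Grouped {
  std::uint64_t x;
  std::uint64_t y;
  bool a;
  bool b;
  bool c;
};
```

On a typical 64-bit ABI these come out at 40 and 24 bytes respectively, though the exact numbers are implementation-defined.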
@gregsfortytwo
Member

Be nice to squash a few of those down, but looks good.
@liewegas, looks like he got your named issues, can you approve?

@gregsfortytwo
Member

Eek, wrong button.

@tchaikov merged commit 1b8ad4a into ceph:master Feb 28, 2018
@branch-predictor
Contributor Author

@liewegas @gregsfortytwo does this qualify for backport to luminous?

@gregsfortytwo
Member

I don't have strong feelings either way. It probably qualifies but it's enough of a change I wouldn't do it immediately or casually, I guess?

@liewegas
Member

Yeah, let's give it some time in master to make sure there isn't fallout before backporting

@branch-predictor deleted the bp-optracker-cleanup branch May 27, 2019 09:06