
osdc/Objecter: take budgets across a LingerOp instead of on child Ops #20519

Merged
merged 2 commits into ceph:master from gregsfortytwo:wip-22882-linger-locking on Mar 6, 2018

Conversation

@gregsfortytwo (Member) commented Feb 21, 2018

No description provided.

osdc: pass OSDOp vector instead of Op* to calc_op_budget
We don't need any other portion of the struct and it makes "faking"
things easier for pre-calculating budget requirements.

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
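
A minimal sketch of what this commit is getting at, not the actual Objecter code: the struct fields and the budget formula below are illustrative assumptions; only the shift from passing an `Op*` to passing the `OSDOp` vector reflects the commit.

```cpp
// Illustrative stand-ins only; the real types live in src/osdc/Objecter.h.
#include <cstddef>
#include <vector>

struct OSDOp {
  std::size_t indata_len = 0;        // bytes this sub-op sends (assumed field)
  std::size_t outdata_expected = 0;  // bytes this sub-op expects back (assumed field)
};

// Before (roughly): calc_op_budget(Op *op) walked op->ops.
// After: only the vector is needed, so a caller can budget a hand-built
// vector of OSDOps (e.g. for a watch/notify) without constructing a full Op.
int calc_op_budget(const std::vector<OSDOp>& ops) {
  int budget = 0;
  for (const auto& o : ops) {
    budget += o.indata_len + o.outdata_expected;
  }
  return budget;
}
```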
@dillaman (Contributor) left a comment

Since this changes the behavior to have the linger ops always pre-budgeted, that would prevent more than 1024 watches per client (default settings). For the case of rbd-mirror, each image being mirrored establishes a watch, so the budget will quickly be taken.

It also appears that nothing stops more than one concurrent linger op message (for a given linger op) from being in-flight. That's an existing issue, but now the budget for linger ops is even less accurate if we hit a slowdown where multiple pings for a single linger op are waiting for a response from the OSD.

Would it make sense to just stop budgeting linger ops?

```diff
 {
   int op_budget = 0;
-  for (vector<OSDOp>::iterator i = op->ops.begin();
-       i != op->ops.end();
+  for (vector<OSDOp>::const_iterator i = ops.begin();
```

@dillaman (Contributor) commented Feb 22, 2018

Nit: since you're changing this line, you could use `for (auto it = ops.begin(); ...` or `for (auto& op : ops)` syntax
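
For reference, the two loop forms the nit suggests (a sketch only; `ops` stands for the `OSDOp` vector being iterated):

```cpp
// Iterator form with auto:
for (auto it = ops.begin(); it != ops.end(); ++it) {
  // ... use *it ...
}

// Range-based form:
for (auto& op : ops) {
  // ... use op ...
}
```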

@liewegas changed the title from "Objecter: take budgets across a LingerOp instead of on child Ops" to "osdc/Objecter: take budgets across a LingerOp instead of on child Ops" on Feb 22, 2018

@liewegas (Member) commented Feb 22, 2018

I wouldn't mind dropping the budget for linger ops. Since they are persistent, it doesn't really make sense to block when trying to do a linger op since we'll just deadlock.

Alternatively, we could add a separate cap on linger ops, and make the linger call return -EAGAIN or similar. I'm not sure it's really necessary, though... I think the only purpose would be to catch a buggy client that forgot to close out old lingers.

@gregsfortytwo (Member, Author) commented Feb 22, 2018

Ah, yep, I didn’t consider the mirror daemon. Given that Lingers arbitrarily create (fairly small) messages, I’d rather we just auto-set a different value for the mirror daemon.

How do you see multiple ops being in-flight per Linger? Off the top of my head that should only happen if an OSD connection dies and we have to retransmit (in which case the old one gets cleaned up quickly).

@gregsfortytwo (Member, Author) commented Feb 22, 2018

@liewegas, note that we quite carefully won't deadlock on grabbing the budget here, because we aren't holding any locks when doing so: the calling thread just goes to sleep and eventually other IO will drain out (unless they hit the cap, yes; we could check the number of Lingers and do some kind of bail if they've filled it up, I guess, as you sort of suggest with the separate cap).

@dillaman (Contributor) commented Feb 22, 2018

> Ah, yep, I didn’t consider the mirror daemon. Given that Lingers arbitrarily create (fairly small) messages, I’d rather we just auto-set a different value for the mirror daemon.

Not sure how we would compute the limit for the rbd-mirror daemon. It doesn't appear to be dynamically configurable at runtime, so it would just need to override the value at startup to some "maximum number of possible mirrored images" + <current config setting>?

> How do you see multiple ops being in-flight per Linger? Off the top of my head that should only happen if an OSD connection dies and we have to retransmit (in which case the old one gets cleaned up quickly).

It's something I've seen when attached to a client process w/ a slow / blocked OSD. The network connection is still up, but the linger op timer will just keep sending pings every X seconds regardless of whether it is still waiting on a ping it's already sent.

@liewegas (Member) commented Feb 22, 2018

Ah, the ping thing seems like a bug we should fix.

As for the cap... I would still lean toward (1) no linger cap or (2) a separate linger cap. The situation I'm imagining is rbd-mirror with a large pool (1000s of images) but maybe no io... it will simply block/break on startup since the lingers don't drain. (1) is simpler, requires no tuning, and I see pretty low value for a cap, but there are certainly potential bugs that (2) would catch. I don't see the cap being useful for a user, though.

@gregsfortytwo (Member, Author) commented Feb 22, 2018

I've pushed a commit that stops actually getting or dropping throttle (although it still pretends to, so we can go through the normal paths).

@gregsfortytwo (Member, Author) commented Feb 22, 2018

I've looked a bit at the Linger pings as well and it looks like things should work okay if we pause pings when we're not getting replies. But I'm not sure what the right strategy would be here. Should we just only send a new ping when we get a reply on our previous one? (That's easy and has the wonderful advantage of no longer having to iterate through lingers during tick.) Or should we do some kind of exponential backoff?
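
For illustration, a rough sketch of the "only one ping in flight per linger" option being floated here (hypothetical names and state; not how Objecter's tick or reply handling is actually written):

```cpp
// Hypothetical per-LingerOp state; assumes the tick path and the
// watch-ping reply handler both have access to it.
struct LingerPingState {
  bool ping_in_flight = false;

  // Called from the periodic tick: skip this interval if the previous
  // ping has not been acknowledged yet.
  bool should_send_ping() {
    if (ping_in_flight)
      return false;
    ping_in_flight = true;
    return true;
  }

  // Called when the ping reply arrives; allows the next ping to go out.
  void handle_ping_reply() {
    ping_in_flight = false;
  }
};
```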

@gregsfortytwo (Member, Author) commented Feb 22, 2018

I also didn't dig all the way into the OSD watch timeouts, but I think they only go away if the client session does. Part of the purpose of these pings is to keep those sessions alive, so that may put an upper bound on how infrequently we can send pings.

@gregsfortytwo (Member, Author) commented Feb 22, 2018

@dillaman, hoping we can merge this fix and handle the pings as a separate bug/PR, though! :)

@dillaman (Contributor) commented Feb 22, 2018

@gregsfortytwo that plan is fine w/ me (I was mostly concerned about the deadlock)

@tchaikov self-requested a review on Feb 28, 2018

@dillaman (Contributor) commented Mar 1, 2018

@gregsfortytwo (Member, Author) commented Mar 3, 2018

@dillaman, @tchaikov, can I get an actual review statement to merge with? :)

@dillaman (Contributor) left a comment

lgtm -- didn't know if you wanted to squash the last two commits since they cancel each other out.

osdc: create and use take_linger_budget for prefetching watch/notify budgets

We don't actually take any budget, as we don't want to hit the
total op limits (especially on eg rbd-mirror), but this prevents
our per-message budgeting from causing a deadlock.

Fixes: http://tracker.ceph.com/issues/22882

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
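
A self-contained sketch of the idea in this commit. The types and the budget arithmetic are assumptions carried over from the earlier sketch; only the name take_linger_budget and the "compute the budget but never take it from the throttle" behavior come from the commit message.

```cpp
#include <cstddef>
#include <vector>

struct OSDOp {                       // assumed stand-in for the real OSDOp
  std::size_t indata_len = 0;
  std::size_t outdata_expected = 0;
};

// Same shape as the earlier calc_op_budget sketch (assumed formula).
static int calc_op_budget(const std::vector<OSDOp>& ops) {
  int budget = 0;
  for (const auto& o : ops)
    budget += o.indata_len + o.outdata_expected;
  return budget;
}

// Report how much budget a linger op *would* need, so the normal
// put-budget paths stay balanced, but never actually draw from the
// throttle; a client with thousands of watches (e.g. rbd-mirror)
// therefore cannot exhaust the op budget or deadlock on it.
int take_linger_budget(const std::vector<OSDOp>& ops) {
  return calc_op_budget(ops);
}
```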

@gregsfortytwo force-pushed the gregsfortytwo:wip-22882-linger-locking branch from cd717d7 to aba55b5 on Mar 6, 2018

@gregsfortytwo (Member, Author) commented Mar 6, 2018

Yeah, I've squashed it down.

@gregsfortytwo merged commit 017c846 into ceph:master on Mar 6, 2018

1 of 5 checks passed:
- Docs: build check: building
- Unmodified Submodules: checking if PR has modified submodules
- make check: running make check
- make check (arm64): running make check
- Signed-off-by: all commits in this PR are signed

@gregsfortytwo deleted the gregsfortytwo:wip-22882-linger-locking branch on Mar 6, 2018
