New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

osd, messages/MOSDPing: bunch of fixes related to ping inflation #15727

Merged
merged 3 commits into from Jun 21, 2017

Conversation

Projects
None yet
5 participants
@branch-predictor
Member

branch-predictor commented Jun 16, 2017

For users who don't care about jumbo frames, or are certain that their setup is correct, the new diagnostic feature (#15535) will provide only unnecessary load. Due to a bug, users can't really disable it and this PR fixes this. Because both "osd heartbeat interval" and "osd heartbeat min size" aren't marked as observed, this causes confusion for users who try to change them at runtime (new options values are used, but the message makes this uncertain). Finally, initial code for heartbeat inflater was slightly optimized to not move dummy data as often.

Signed-off-by: Piotr Dałek piotr.dalek@corp.ovh.com

@branch-predictor

This comment has been minimized.

Member

branch-predictor commented Jun 16, 2017

@liewegas

This comment has been minimized.

Member

liewegas commented Jun 16, 2017

@branch-predictor can you rebase on master? i suspect hte bug was fixed by #15714

@branch-predictor

This comment has been minimized.

Member

branch-predictor commented Jun 16, 2017

@liewegas nope, size_t is also unsigned. The only difference #15714 gives is that the memory usage is low enough to actually make it try to push almost 4GB of data over the wire.

@liewegas

This comment has been minimized.

Member

liewegas commented Jun 16, 2017

right. can you rebase your fix on master?

OSD: mark two heartbeat config opts as observed
"osd heartbeat min size" and "osd heartbeat interval" can be changed
at runtime, because their values, when used, are always taken from
global Ceph configuration. Mark them as observed, so the message
the user sees once they're changed doesn't confuse them.

Signed-off-by: Piotr Dałek <piotr.dalek@corp.ovh.com>
@branch-predictor

This comment has been minimized.

Member

branch-predictor commented Jun 16, 2017

On it.

::encode(op, payload);
::encode(peer_stat, payload);

This comment has been minimized.

@gregsfortytwo

gregsfortytwo Jun 16, 2017

Member

You can't reorder these without making it a backwards-incompatible change ( COMPAT_VERSION).

This comment has been minimized.

@branch-predictor

branch-predictor Jun 16, 2017

Member

Right, I forgot we have that. I'll just drop this commit.

@ceph-jenkins

This comment has been minimized.

Collaborator

ceph-jenkins commented Jun 16, 2017

make check succeeded

@liewegas

This comment has been minimized.

Member

liewegas commented Jun 16, 2017

@branch-predictor

This comment has been minimized.

Member

branch-predictor commented Jun 16, 2017

@gregsfortytwo @liewegas does 9fcf0df look reasonable?

@liewegas

looks good to me!

@branch-predictor

This comment has been minimized.

Member

branch-predictor commented Jun 16, 2017

Cool, unless there are any other comments/suggestions/etc., it's ready for QA.

@liewegas liewegas added the needs-qa label Jun 16, 2017

@gregsfortytwo

Not quite right.

In addition to the specific code thing, people want to backport this. So I'd actually prefer we just don't reorder stuff, or that we do it as another version increment. So HEADER_VERSION 3 could be adding the buffer pointer to increase the size, and keep a COMPAT_VERSION 2.
And then HEADER_VERSION 4 could remove the dummy_epoch and dummy_stat and change COMPAT_VERSION to 4. Is that reasonable?

// with luminous, we drop peer_as_of_epoch and peer_stat, and we add extra
// payload to artificially inflate the ping packet size.
if (!HAVE_FEATURE(features, SERVER_LUMINOUS)) {

This comment has been minimized.

@gregsfortytwo

gregsfortytwo Jun 16, 2017

Member

This still isn't quite right. You're triggering which encoding to use based on SERVER_LUMINOUS, which is good.
But we still need to set the COMPAT_VERSION to 3 when we remove elements from the middle.

This comment has been minimized.

@branch-predictor

branch-predictor Jun 16, 2017

Member

Fixing that as well.

payload.append(buffer::create_static(s, zeros));
}
} else {
::encode((uint32_t)s, payload);

This comment has been minimized.

@gregsfortytwo

gregsfortytwo Jun 16, 2017

Member

Shouldn't you be encoding s unconditionally? I don't see it getting written at all in the case where we actually append a big buffer, which is when we need it!

This comment has been minimized.

@branch-predictor

branch-predictor Jun 16, 2017

Member

Good catch, fixing. It didn't crash on me probably because of protocol robustness...

@gregsfortytwo gregsfortytwo removed the needs-qa label Jun 16, 2017

branch-predictor added some commits Jun 16, 2017

messages/MOSDPing: fix the inflation amount calculation
If user specifies a min_message_size small enough (or zero to disable
it altogether), OSDs will crash and burn while trying to allocate
almost 4GB of payload (both min_message_size and payload.length() are
unsigned, so it'll roll over back to 4GB and MAX(4GB, 0) will use 4GB).
If the size of dummy payload is 0, don't bother constructing bufferptr
and bufferlist, then encoding that.

Signed-off-by: Piotr Dałek <piotr.dalek@corp.ovh.com>
messages/MOSDPing: optimize encode and decode of dummy payload
The dummy payload doesn't need to be processed, we can just skip over
it when decoding and we can use a single bufferptr instead of entire
bufferlist to encode it.

Signed-off-by: Piotr Dałek <piotr.dalek@corp.ovh.com>
@branch-predictor

This comment has been minimized.

Member

branch-predictor commented Jun 16, 2017

@gregsfortytwo That makes sense, even if it's bit complicated.
@liewegas do you know how many ping packets per second are processed on Cern's cluster(s)? Maybe saving these 12 bytes isn't really worth the code obfuscation.

@gregsfortytwo

This comment has been minimized.

Member

gregsfortytwo commented Jun 16, 2017

I was thinking about this when setting up the minimum size stuff. Basically, we're pinging all our peers every 5 seconds. So if we've got 100 PGs/OSD (more than most?), that's 20 every second. 240 bytes, or with the minimum size some 40000 bytes. (<40KB, out of 125 or 1250 MB).
Which just didn't seem worth fussing over to me. shrug

That said, I don't think the extra versioning actually changes much, just the values which get put in the if-block you already set up. (We can append extra data and older nodes will just ignore it, remember.)

@liewegas

This comment has been minimized.

Member

liewegas commented Jun 16, 2017

@branch-predictor

This comment has been minimized.

Member

branch-predictor commented Jun 17, 2017

I'll move the field-dropping commit into another PR, and leave the rest here, at least backporters won't accidently break things in pre-luminous releases.

@branch-predictor

This comment has been minimized.

Member

branch-predictor commented Jun 19, 2017

@liewegas liewegas added the needs-qa label Jun 20, 2017

payload.append(buffer::create_static(sizeof(zeros), zeros));
s -= sizeof(zeros);
}
if (s) {

This comment has been minimized.

@gregsfortytwo

gregsfortytwo Jun 20, 2017

Member

This is already inside in if (s) block?

This comment has been minimized.

@branch-predictor

branch-predictor Jun 21, 2017

Member

It's not really redundant. First "if" (line 108) checks whether there's extra payload to be added, "while" loop (#114) decreases value of "s" by sizeof(zeros), leaving possibility for "s" to be nonzero when exiting a loop and second "if" (line 118) append remaining - less than or equal to sizeof(zeros) - payload.

@gregsfortytwo

This comment has been minimized.

Member

gregsfortytwo commented Jun 20, 2017

LGTM except for that redundant if-block.

@gregsfortytwo gregsfortytwo merged commit 0e51ab1 into ceph:master Jun 21, 2017

4 checks passed

Signed-off-by all commits in this PR are signed
Details
Unmodified Submodules submodules for project are unmodified
Details
arm64 make check arm64 make check succeeded
Details
make check make check succeeded
Details

@branch-predictor branch-predictor referenced this pull request Jun 22, 2017

Merged

messages/MOSDPing.h: drop unused fields #15843

1 of 2 tasks complete

@branch-predictor branch-predictor deleted the ovh:bp-runtime-cfg-of-hb-inflator branch Jan 24, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment