Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

messages/MForward: fix encoding features #11180

Merged
merged 1 commit into from Sep 28, 2016

Conversation

liewegas
Copy link
Member

We were encoding the message with the sending client's
features, which makes no sense: we need to encode with
the recipient's features so that it can decode the
message.

The simplest way to fix this is to rip out the bizarre
msg_bl handling code and simply keep a decoded Message
reference, and encode it when we send.

Fixes: http://tracker.ceph.com/issues/17365
Signed-off-by: Sage Weil sage@redhat.com

@liewegas
Copy link
Member Author

@gregsfortytwo @jecluis Do you mind looking at this? I'm a little surprised this hasn't bitten us before. Am I missing something?

Copy link
Member

@gregsfortytwo gregsfortytwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm pretty sure this is (partly) deliberate. We can encode the MForward itself with whatever features we want, but the forwarded message needs to be encoded the way it was originally by the client — otherwise the receiving monitor might not know when it should and should not fill in fields in specific ways.
Similarly, on reply the leader needs to encode the message for the client, not for the forwarding monitor.

We've definitely had bugs around encoding here before, so I wouldn't toss out the entire thing without searching git logs and the tracker for how it happened. Did you see an actual bug or just notice it on inspection?

@gregsfortytwo
Copy link
Member

Oh, and indeed that's how the existing implementation seems to work. I guess if both the client and the peon monitor have newer features+code than the leader, we could get in trouble. So we might want to do a logical AND of the client's features and the quorum features for encoding, but that's the only issue I see with current implementation.

@liewegas
Copy link
Member Author

liewegas commented Sep 21, 2016 via email

@liewegas
Copy link
Member Author

Yeah, the scenario you describe is exactly what's going on (leader mon is older than peon and client). So the question is, should we AND the features or just use the target? I suppose there's no harm in ANDing them, and it's possible there's some subtle dependency. I'll change that.

Unrelated weirdness, though: in this particular case the message came from loopback, and for those virtual Connections we set features to the feature mask passed into the messenger ctor... which is 0 in ceph_mon.cc (and most other places). The defautl arg is 0, and everyone passes 0, except one caller that passes CEPH_FEATURES_ALL. And the value is only used by the messenger for this loopback connection. I'm going to do a separate PR to just rip that all out...

@liewegas liewegas force-pushed the wip-mon-forward branch 2 times, most recently from 9464b0f to 77d5730 Compare September 26, 2016 14:10
@liewegas
Copy link
Member Author

@gregsfortytwo It's doing the & now on the feature set. And passed the jewel-x test that it was previously failing due to the encoding disparity (http://pulpito.ceph.com/yuriw-2016-09-27_21:00:16-upgrade:jewel-x-wip-mon-forward_2016_09_27-distro-basic-smithi/)

Copy link
Member

@gregsfortytwo gregsfortytwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The actual code changes look good to me; let's just fix the comment.

public:
void encode_payload(uint64_t features) {
::encode(tid, payload);
::encode(client, payload, features);
::encode(client_caps, payload, features);
payload.append(msg_bl);
// encode client message with intersection of target and source
// features. this probably doesn't matter, but might.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a helpful comment. Say when it does matter rather than that it probably doesn't.

We were encoding the message with the sending client's
features, which makes no sense: we need to encode with
the recipient's features so that it can decode the
message.

The simplest way to fix this is to rip out the bizarre
msg_bl handling code and simply keep a decoded Message
reference, and encode it when we send.

We encode the encapsulated message with the intersection
of the target mon's features and the sending client's
features.  This probably doesn't matter, but it's
conceivable that there is some feature-dependent
behavior in the message encode/decode that is important.

Fixes: http://tracker.ceph.com/issues/17365
Signed-off-by: Sage Weil <sage@redhat.com>
@liewegas
Copy link
Member Author

@gregsfortytwo updated

@gregsfortytwo gregsfortytwo merged commit b9f15e5 into ceph:master Sep 28, 2016
@liewegas liewegas deleted the wip-mon-forward branch September 28, 2016 19:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants