Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

client: check if a mds rank is up before fetching connection addr #41875

Merged
merged 1 commit into from
Jun 23, 2021

Conversation

vshankar
Copy link
Contributor

Client segfaults when trying to infer which mds rank a connection
reset call is coming from. It does this by iterating mds_sessions
and checking the mds addr (in mdsmap) to the Connection *.

However, cases where the mds is blocklisted, the client receives
an updated mdsmap in which the corresponding mds is not in up
set thereby resulting in a segfault when calling mdsmap->get_addrs
since it expects that the mds should be in up state.

Note that this leaves the Connection * as it is and does not clean
it up. That needs to fixed separately probably by maintaining a map
of Connection * to mds rank for lookup.

Fixes: http://tracker.ceph.com/issues/50530
Signed-off-by: Venky Shankar vshankar@redhat.com

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

Client segfaults when trying to infer which mds rank a connection
reset call is coming from. It does this by iterating `mds_sessions`
and checking the mds addr (in mdsmap) to the `Connection *`.

However, cases where the mds is blocklisted, the client receives
an updated mdsmap in which the corresponding mds is not in `up`
set thereby resulting in a segfault when calling `mdsmap->get_addrs`
since it expects that the mds should be in `up` state.

Note that this leaves the `Connection *` as it is and does not clean
it up. That needs to fixed separately probably by maintaining a map
of `Connection *` to mds rank for lookup.

Fixes: http://tracker.ceph.com/issues/50530
Signed-off-by: Venky Shankar <vshankar@redhat.com>
@vshankar vshankar added the cephfs Ceph File System label Jun 16, 2021
@vshankar vshankar requested a review from a team June 16, 2021 05:05
Copy link
Contributor

@mchangir mchangir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Comment on lines 14972 to 14976
for (auto &p : mds_sessions) {
if (mdsmap->get_addrs(p.first) == con->get_peer_addrs()) {
if (mdsmap->have_inst(p.first) && mdsmap->get_addrs(p.first) == con->get_peer_addrs()) {
mds = p.first;
s = &p.second;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is old code and you are just adding a check, but it'd be good if the first and second references were converted to structured binding.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you probably just need to move mds and s to the for loop initializer so that the have_inst() and get_addrs() calls look more sensible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. But the whole client is littered with such .firs/.second semantics. I don't mind converting this to use structured binding, but we will still have the problem at large.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I know.
I was just trying to nudge you to do that so that we could get the changes done bit by bit in relevant PRs rather than one huge PR to move the Client to structured bindings :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's safe to leave this as-is for now. But Milind is right it'd be nice to do that but it should go in a cleanup PR.

@vshankar
Copy link
Contributor Author

@batrick we probably want to start maintaining Connection * -> mds_rank_t to lookup a rank when handling remote resets.

@batrick
Copy link
Member

batrick commented Jun 19, 2021

@batrick we probably want to start maintaining Connection * -> mds_rank_t to lookup a rank when handling remote resets.

Yes, I think that makes sense.

@batrick
Copy link
Member

batrick commented Jun 23, 2021

@batrick batrick merged commit c6e137b into ceph:master Jun 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants