Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mgr: preventing blank hostname in DaemonState #17138

Merged

Conversation

Projects
None yet
4 participants
@liuchang0812
Copy link
Contributor

commented Aug 22, 2017

see

daemon_meta.erase("hostname");
,
we do not update daemon_meta's hostname even the existing DaemonState's hostname is "". MMgrReport
will insert a DaemonState which contains blank hostname.
see
// FIXME: we should avoid this case by rejecting MMgrReport from

This commit also simple the logical of MetadataUpdate.

Fixes: http://tracker.ceph.com/issues/20887
Fixes: http://tracker.ceph.com/issues/21060

Signed-off-by: liuchang0812 liuchang0812@gmail.com
Signed-off-by: Yanhu Cao gmayyyha@gmail.com

@liuchang0812

This comment has been minimized.

Copy link
Contributor Author

commented Aug 22, 2017

@jcsp @tchaikov @liewegas would you mind taking a look, thanks

// daemons without sessions, and ensuring that session open
// always contains metadata.
// we don't know the hostname at this stage, reject MMgrReport here.
dout(4) << "rejecting report from non-daemon client " << m->daemon_name

This comment has been minimized.

Copy link
@jcsp

jcsp Aug 22, 2017

Contributor

This message isn't quite right: the client could be a daemon, it just doesn't have anything in DaemonState because we have not read its metadata yet. We already ignore clients in the get_peer_type check above.

}


DaemonStatePtr state;

This comment has been minimized.

Copy link
@jcsp

jcsp Aug 22, 2017

Contributor

Because DaemonState contains things like a short history of perf counters, it is not desirable to create a new instance whenever the metadata updates. We should be checking for an existing instance and update the metadata on that, like the existing code (sort of) did

This comment has been minimized.

Copy link
@liuchang0812

liuchang0812 Aug 23, 2017

Author Contributor

thank you. @jcsp

but, I saw that we clear all metadatas when we update an existing instance. is it a bug?

This comment has been minimized.

Copy link
@liuchang0812

liuchang0812 Aug 24, 2017

Author Contributor

ping

This comment has been minimized.

Copy link
@jcsp

jcsp Aug 28, 2017

Contributor

We clear the metadata (map of strings) before re-populating it from the update, but not the whole DaemonState (everything else, including perf counters) object

This comment has been minimized.

Copy link
@liuchang0812

liuchang0812 Aug 28, 2017

Author Contributor

thank you very much.

// we don't know the hostname at this stage, reject MMgrReport here.
dout(4) << "rejecting report from non-daemon client " << m->daemon_name
<< dendl;
m->put();

This comment has been minimized.

Copy link
@jcsp

jcsp Aug 22, 2017

Contributor

I don't think it's safe to drop this message -- the MgrClient sessions are stateful, and the client assumes that the server has seen all previous perf counter types it declared (MMgrReport::declare_types).

We need to either delay the message instead of dropping it, or we need to kill the session at this point (hoping that by the time the client re-opens the session, we have created a DaemonState in MetadataUpdate)

This comment has been minimized.

Copy link
@liuchang0812

liuchang0812 Aug 23, 2017

Author Contributor
we need to kill the session at this point

could we do it as following:

diff --git a/src/mgr/DaemonServer.cc b/src/mgr/DaemonServer.cc
index a867754..8502f72 100644
--- a/src/mgr/DaemonServer.cc
+++ b/src/mgr/DaemonServer.cc
@@ -391,10 +391,17 @@ bool DaemonServer::handle_report(MMgrReport *m)
     daemon = daemon_state.get(key);
   } else {
     // we don't know the hostname at this stage, reject MMgrReport here.
-    dout(4) << "rejecting report from non-daemon client " << m->daemon_name
+    dout(1) << "rejecting report from " << k ", since we do not have its metadata now."
            << dendl;
-    m->put();
-    return true;
+    // kill session
+    MgrSessionRef session(static_cast<MgrSession*>(m->get_connection()->get_priv()));
+    if (!session) {
+      return false;
+    }
+    m->get_connection()->mark_down();
+    session->put();
+
+    return false;
   }

@liuchang0812 liuchang0812 force-pushed the liuchang0812:wip-updata-osd-hostname-in-metaupdater branch from b25df8b to b84adf6 Aug 24, 2017

@liuchang0812 liuchang0812 changed the title mgr: update DaemonState's hostname in MetadataUpdate context mgr: preventing blank hostname in DaemonState Aug 24, 2017

@liuchang0812

This comment has been minimized.

Copy link
Contributor Author

commented Aug 24, 2017

jenkins test this please

@jcsp jcsp added bluestore bug fix mgr and removed bluestore labels Aug 24, 2017

@liuchang0812 liuchang0812 force-pushed the liuchang0812:wip-updata-osd-hostname-in-metaupdater branch from b84adf6 to 893e45f Aug 25, 2017

@liuchang0812

This comment has been minimized.

Copy link
Contributor Author

commented Aug 25, 2017

always failed on ceph-disk's reuse osd id test.I tried to solve it, did not understand why Ceph only puts all pg on OSD.0, not OSD.13. any thoughts? very thanks

@liuchang0812

This comment has been minimized.

Copy link
Contributor Author

commented Aug 26, 2017

jenkins test this please

1 similar comment
@liuchang0812

This comment has been minimized.

Copy link
Contributor Author

commented Sep 5, 2017

jenkins test this please

@liuchang0812

This comment has been minimized.

Copy link
Contributor Author

commented Sep 12, 2017

always failed on ceph-disk's reuse osd id test.I tried to solve it, did not understand why Ceph only puts all pg on OSD.0, not OSD.13. any thoughts? very thanks

I got the reason finally 😭 , It's a bug in our void Mgr::handle_osd_map() function, we check [osd.0, osd.[osd_map.get_num_osds()]), so we will ignore osd.13's event and not update metadata. we should check [osd.0 ~ osd.[osd_map.get_max_osd()].

    for (unsigned int osd_id = 0; osd_id < osd_map.get_num_osds(); ++osd_id) {
      if (!osd_map.exists(osd_id)) {
        continue;
      }
mgr: kill MgrSession when MMgrReport come from daemon without metadat…
…a info

Signed-off-by: liuchang0812 <liuchang0812@gmail.com>

@liuchang0812 liuchang0812 force-pushed the liuchang0812:wip-updata-osd-hostname-in-metaupdater branch from 893e45f to 2908013 Sep 12, 2017

@liuchang0812

This comment has been minimized.

Copy link
Contributor Author

commented Sep 13, 2017

@jcsp ping. would you mind reviewing this patch again when you have a few of minutes? thank you

@@ -410,7 +410,7 @@ void Mgr::handle_osd_map()
* reload the metadata.
*/
objecter->with_osdmap([this, &names_exist](const OSDMap &osd_map) {
for (unsigned int osd_id = 0; osd_id < osd_map.get_num_osds(); ++osd_id) {
for (unsigned int osd_id = 0; osd_id < osd_map.get_max_osd(); ++osd_id) {

This comment has been minimized.

Copy link
@tchaikov

tchaikov Sep 13, 2017

Contributor

nice catch!

@jcsp

jcsp approved these changes Sep 13, 2017

@liewegas liewegas added the needs-qa label Sep 13, 2017

@liuchang0812

This comment has been minimized.

Copy link
Contributor Author

commented Sep 18, 2017

@tchaikov it needs CI test

@tchaikov

This comment has been minimized.

Copy link
Contributor

commented Sep 18, 2017

agreed, that's what "needs-qa" is for.

@liuchang0812

This comment has been minimized.

Copy link
Contributor Author

commented Sep 18, 2017

agreed, that's what "needs-qa" is for.

it seems that we should put a "wip-xxx-testing" label to trigger a CI-testing?

@tchaikov

This comment has been minimized.

Copy link
Contributor

commented Sep 18, 2017

there is no such a "trigger". it's completely up to the developer to add this label or not.

@tchaikov tchaikov merged commit b49dd81 into ceph:master Sep 20, 2017

5 checks passed

Docs: build check OK - docs built
Details
Signed-off-by all commits in this PR are signed
Details
Unmodified Submodules submodules for project are unmodified
Details
make check make check succeeded
Details
make check (arm64) make check succeeded
Details

@liuchang0812 liuchang0812 deleted the liuchang0812:wip-updata-osd-hostname-in-metaupdater branch Sep 20, 2017

@liuchang0812

This comment has been minimized.

Copy link
Contributor Author

commented Sep 20, 2017

thank you, kefu

@liewegas

This comment has been minimized.

Copy link
Member

commented Oct 6, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.