mgr/cephadm/schedule: fix message #41257

Merged
merged 1 commit into from May 13, 2021


liewegas
Member

@liewegas liewegas commented May 10, 2021

No description provided.

@liewegas liewegas requested a review from a team as a code owner May 10, 2021 13:04
@liewegas liewegas changed the title from "cephadm: normalize image digest in 'ls' output" to "mgr/cephadm/schedule: fix message" May 10, 2021
liewegas referenced this pull request May 10, 2021
Signed-off-by: Sage Weil <sage@newdream.net>
@tchaikov
Contributor

cc @huww98

@huww98
Contributor

huww98 commented May 10, 2021

Following up on 1bf09d1#r50566843.

This still doesn't make sense to me. If a host does not belong to the mon public_network, why would this happen during an upgrade, when I haven't changed any IP addresses? And why does it resolve by itself?

@liewegas
Member Author

> Following up on 1bf09d1#r50566843.
>
> This still doesn't make sense to me. If a host does not belong to the mon public_network, why would this happen during an upgrade, when I haven't changed any IP addresses? And why does it resolve by itself?

I'm a bit confused by that as well. My best guess is that you'll see the same message again if you restart the mgr daemon or do 'ceph orch apply mon 5' (or whatever the current placement is)

@@ -304,7 +304,7 @@ def get_candidates(self) -> List[DaemonPlacement]:
                     ls.append(h)
                 else:
                     logger.info(
-                        f"Filtered out host {h.hostname}: could not verify host allowed virtual ips")
+                        f"Filtered out host {h.hostname}: host does not belong to mon public_network")

We're trying to invent a message here, but only self.filter_new_host can provide a proper one. Can we move this log message into the implementation of self.filter_new_host?
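
A minimal sketch of that suggestion, assuming a much simplified scheduler. The names `get_candidates`, `make_public_network_filter`, and `in_public_network` below are hypothetical stand-ins, not the actual cephadm code. The idea is that the filter, which knows why it rejects a host, writes the log message, while the scheduler only acts on the boolean result.

```python
import logging
from typing import Callable, List, Optional, Set

logger = logging.getLogger(__name__)


def get_candidates(
    hostnames: List[str],
    filter_new_host: Optional[Callable[[str], bool]] = None,
) -> List[str]:
    """Return the hosts accepted by the optional filter (hypothetical helper)."""
    if filter_new_host is None:
        return list(hostnames)
    # The scheduler does not guess at a reason; the filter logs its own.
    return [h for h in hostnames if filter_new_host(h)]


def make_public_network_filter(allowed: Set[str]) -> Callable[[str], bool]:
    """Build a filter that explains its own rejections (illustrative only)."""
    def in_public_network(hostname: str) -> bool:
        if hostname in allowed:
            return True
        logger.info(
            f"Filtered out host {hostname}: does not belong to mon public_network")
        return False
    return in_public_network
```

With this split, a different filter can log a different reason without the scheduler needing to change.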

@liewegas liewegas force-pushed the cephadm-filter-net-message branch from 56b8b88 to ae9bdb8 Compare May 10, 2021 14:11
This is now only used when scheduling mons.  (Units now enable the kernel
features needed instead of checking for them during placement.)  Move the
message to the filter itself.

Signed-off-by: Sage Weil <sage@newdream.net>
@liewegas liewegas force-pushed the cephadm-filter-net-message branch from ae9bdb8 to d5aba1e Compare May 10, 2021 14:12
@huww98
Contributor

huww98 commented May 10, 2021

> I'm a bit confused by that as well. My best guess is that you'll see the same message again if you restart the mgr daemon or do 'ceph orch apply mon 5' (or whatever the current placement is)

No, the log looks like this:

cephadm 2021-05-10T14:06:21.298582+0000 mgr.gpu024.bapbcz (mgr.6832120) 96 : cephadm [INF] Schedule restart daemon mgr.gpu024.bapbcz
cluster 2021-05-10T14:06:33.512186+0000 mon.gpu023 (mon.0) 27177 : cluster [INF] Active manager daemon gpu024.bapbcz restarted
cluster 2021-05-10T14:06:33.514667+0000 mon.gpu023 (mon.0) 27178 : cluster [INF] Activating manager daemon gpu024.bapbcz
cluster 2021-05-10T14:06:33.627520+0000 mon.gpu023 (mon.0) 27215 : cluster [INF] Manager daemon gpu024.bapbcz is now available
cephadm 2021-05-10T14:06:44.939369+0000 mgr.gpu024.bapbcz (mgr.6832210) 8 : cephadm [INF] Saving service mon spec with placement cpu0;gpu023;gpu024;gpu006;gpu013

The logs during the upgrade look like this:

cephadm 2021-05-08T17:03:59.678460+0000 mgr.gpu014.kwbqcf (mgr.6664231) 490838 : cephadm [INF] Upgrade: Need to upgrade myself (mgr.gpu014.kwbqcf)
cephadm 2021-05-08T17:03:59.684683+0000 mgr.gpu014.kwbqcf (mgr.6664231) 490839 : cephadm [INF] Failing over to other MGR
cluster 2021-05-08T17:04:04.752923+0000 mon.cpu0 (mon.0) 192255 : cluster [INF] Active manager daemon gpu024.bapbcz restarted
cluster 2021-05-08T17:04:04.761191+0000 mon.cpu0 (mon.0) 192256 : cluster [INF] Activating manager daemon gpu024.bapbcz
cluster 2021-05-08T17:04:05.508183+0000 mon.cpu0 (mon.0) 192259 : cluster [INF] Manager daemon gpu024.bapbcz is now available
cephadm 2021-05-08T17:04:12.750241+0000 mgr.gpu024.bapbcz (mgr.6727372) 4 : cephadm [INF] Filtered out host cpu0: could not verify host allowed virtual ips
cephadm 2021-05-08T17:04:12.750354+0000 mgr.gpu024.bapbcz (mgr.6727372) 5 : cephadm [INF] Filtered out host gpu023: could not verify host allowed virtual ips
cephadm 2021-05-08T17:04:12.750443+0000 mgr.gpu024.bapbcz (mgr.6727372) 6 : cephadm [INF] Filtered out host gpu024: could not verify host allowed virtual ips
cephadm 2021-05-08T17:04:12.750525+0000 mgr.gpu024.bapbcz (mgr.6727372) 7 : cephadm [INF] Filtered out host gpu006: could not verify host allowed virtual ips
cephadm 2021-05-08T17:04:12.750603+0000 mgr.gpu024.bapbcz (mgr.6727372) 8 : cephadm [INF] Filtered out host gpu013: could not verify host allowed virtual ips
cephadm 2021-05-08T17:04:12.762441+0000 mgr.gpu024.bapbcz (mgr.6727372) 9 : cephadm [INF] Upgrade: Need to upgrade myself (mgr.gpu024.bapbcz)
cephadm 2021-05-08T17:04:13.272071+0000 mgr.gpu024.bapbcz (mgr.6727372) 10 : cephadm [INF] Upgrade: Pulling cr.scut-smil.cn/infra/ceph:v16.2.3 on gpu014
cephadm 2021-05-08T17:04:36.641321+0000 mgr.gpu024.bapbcz (mgr.6727372) 23 : cephadm [INF] Upgrade: Updating mgr.gpu014.kwbqcf
cephadm 2021-05-08T17:04:36.681668+0000 mgr.gpu024.bapbcz (mgr.6727372) 24 : cephadm [INF] Deploying daemon mgr.gpu014.kwbqcf on gpu014
cluster 2021-05-08T17:04:58.494557+0000 mon.gpu023 (mon.1) 24900 : cluster [INF] mon.gpu023 calling monitor election
cluster 2021-05-08T17:04:58.510456+0000 mon.gpu013 (mon.0) 31261 : cluster [INF] mon.gpu013 calling monitor election
cluster 2021-05-08T17:04:58.511531+0000 mon.gpu006 (mon.3) 147 : cluster [INF] mon.gpu006 calling monitor election
cluster 2021-05-08T17:05:03.533817+0000 mon.gpu013 (mon.0) 31262 : cluster [INF] mon.gpu013 is new leader, mons gpu013,gpu023,gpu006 in quorum (ranks 0,1,3)
cephadm 2021-05-08T17:04:58.431465+0000 mgr.gpu024.bapbcz (mgr.6727372) 36 : cephadm [INF] Filtered out host cpu0: could not verify host allowed virtual ips
cephadm 2021-05-08T17:04:58.431548+0000 mgr.gpu024.bapbcz (mgr.6727372) 37 : cephadm [INF] Filtered out host gpu024: could not verify host allowed virtual ips
cephadm 2021-05-08T17:04:58.431613+0000 mgr.gpu024.bapbcz (mgr.6727372) 38 : cephadm [INF] Filtered out host gpu006: could not verify host allowed virtual ips
cephadm 2021-05-08T17:04:58.431671+0000 mgr.gpu024.bapbcz (mgr.6727372) 39 : cephadm [INF] Filtered out host gpu013: could not verify host allowed virtual ips
cephadm 2021-05-08T17:04:58.441150+0000 mgr.gpu024.bapbcz (mgr.6727372) 40 : cephadm [INF] Safe to remove mon.cpu0: new quorum should be ['gpu013', 'gpu023', 'gpu024', 'gpu006'] (from ['gpu013', 'gpu023', 'gpu024', 'gpu006'])
cephadm 2021-05-08T17:04:58.441290+0000 mgr.gpu024.bapbcz (mgr.6727372) 41 : cephadm [INF] Removing monitor cpu0 from monmap...
cephadm 2021-05-08T17:04:58.446958+0000 mgr.gpu024.bapbcz (mgr.6727372) 42 : cephadm [INF] Removing daemon mon.cpu0 from cpu0
cluster 2021-05-08T17:05:00.468861+0000 mon.gpu024 (mon.2) 36 : cluster [INF] mon.gpu024 calling monitor election
cephadm 2021-05-08T17:05:01.883890+0000 mgr.gpu024.bapbcz (mgr.6727372) 47 : cephadm [INF] Reconfiguring crash.cpu0 (monmap changed)...
cluster 2021-05-08T17:05:05.493338+0000 mon.gpu013 (mon.0) 31373 : cluster [INF] mon.gpu013 calling monitor election
cluster 2021-05-08T17:05:05.497049+0000 mon.gpu023 (mon.1) 24901 : cluster [INF] mon.gpu023 calling monitor election
cluster 2021-05-08T17:05:05.518236+0000 mon.gpu013 (mon.0) 31374 : cluster [INF] mon.gpu013 is new leader, mons gpu013,gpu023,gpu024,gpu006 in quorum (ranks 0,1,2,3)
cluster 2021-05-08T17:05:05.526561+0000 mon.gpu013 (mon.0) 31379 : cluster [INF] Health check cleared: MON_DOWN (was: 1/4 mons down, quorum gpu013,gpu023,gpu006)
cephadm 2021-05-08T17:05:05.643759+0000 mgr.gpu024.bapbcz (mgr.6727372) 50 : cephadm [INF] Reconfiguring daemon crash.cpu0 on cpu0
...
cephadm 2021-05-08T17:07:50.304523+0000 mgr.gpu014.kwbqcf (mgr.6750496) 177 : cephadm [INF] Filtered out host cpu0: could not verify host allowed virtual ips
cephadm 2021-05-08T17:07:50.304893+0000 mgr.gpu014.kwbqcf (mgr.6750496) 178 : cephadm [INF] Filtered out host gpu024: could not verify host allowed virtual ips
cephadm 2021-05-08T17:07:50.304981+0000 mgr.gpu014.kwbqcf (mgr.6750496) 179 : cephadm [INF] Filtered out host gpu006: could not verify host allowed virtual ips
cephadm 2021-05-08T17:07:50.305078+0000 mgr.gpu014.kwbqcf (mgr.6750496) 180 : cephadm [INF] Filtered out host gpu013: could not verify host allowed virtual ips
cephadm 2021-05-08T17:07:50.320808+0000 mgr.gpu014.kwbqcf (mgr.6750496) 181 : cephadm [INF] Upgrade: It is NOT safe to stop mon.gpu023 at this time: not enough monitors would be available (gpu024) after stopping mons [gpu023]
cephadm 2021-05-08T17:08:05.337629+0000 mgr.gpu014.kwbqcf (mgr.6750496) 189 : cephadm [INF] Upgrade: It is NOT safe to stop mon.gpu023 at this time: not enough monitors would be available (gpu024) after stopping mons [gpu023]
cephadm 2021-05-08T17:08:20.352143+0000 mgr.gpu014.kwbqcf (mgr.6750496) 198 : cephadm [INF] Upgrade: It is NOT safe to stop mon.gpu023 at this time: not enough monitors would be available (gpu024) after stopping mons [gpu023]
cephadm 2021-05-08T17:08:35.369326+0000 mgr.gpu014.kwbqcf (mgr.6750496) 206 : cephadm [INF] Upgrade: It is NOT safe to stop mon.gpu023 at this time: not enough monitors would be available (gpu024) after stopping mons [gpu023]
cephadm 2021-05-08T17:08:51.415507+0000 mgr.gpu014.kwbqcf (mgr.6750496) 215 : cephadm [INF] Filtered out host cpu0: could not verify host allowed virtual ips
cephadm 2021-05-08T17:08:51.415706+0000 mgr.gpu014.kwbqcf (mgr.6750496) 216 : cephadm [INF] Filtered out host gpu024: could not verify host allowed virtual ips
cephadm 2021-05-08T17:08:51.415872+0000 mgr.gpu014.kwbqcf (mgr.6750496) 217 : cephadm [INF] Filtered out host gpu006: could not verify host allowed virtual ips
cephadm 2021-05-08T17:08:51.415991+0000 mgr.gpu014.kwbqcf (mgr.6750496) 218 : cephadm [INF] Filtered out host gpu013: could not verify host allowed virtual ips
cephadm 2021-05-08T17:08:51.434103+0000 mgr.gpu014.kwbqcf (mgr.6750496) 219 : cephadm [INF] Upgrade: It is NOT safe to stop mon.gpu023 at this time: not enough monitors would be available (gpu024) after stopping mons [gpu023]
cephadm 2021-05-08T17:09:06.449302+0000 mgr.gpu014.kwbqcf (mgr.6750496) 228 : cephadm [INF] Upgrade: It is NOT safe to stop mon.gpu023 at this time: not enough monitors would be available (gpu024) after stopping mons [gpu023]
cephadm 2021-05-08T17:09:21.466434+0000 mgr.gpu014.kwbqcf (mgr.6750496) 236 : cephadm [INF] Upgrade: It is NOT safe to stop mon.gpu023 at this time: not enough monitors would be available (gpu024) after stopping mons [gpu023]
cephadm 2021-05-08T17:09:36.476534+0000 mgr.gpu014.kwbqcf (mgr.6750496) 245 : cephadm [INF] Upgrade: It is NOT safe to stop mon.gpu023 at this time: not enough monitors would be available (gpu024) after stopping mons [gpu023]
...(repeat many times)
cephadm 2021-05-08T17:18:16.678844+0000 mgr.gpu014.kwbqcf (mgr.6750496) 571 : cephadm [INF] Filtered out host cpu0: could not verify host allowed virtual ips
cephadm 2021-05-08T17:18:16.678988+0000 mgr.gpu014.kwbqcf (mgr.6750496) 572 : cephadm [INF] Filtered out host gpu024: could not verify host allowed virtual ips
cephadm 2021-05-08T17:18:16.679118+0000 mgr.gpu014.kwbqcf (mgr.6750496) 573 : cephadm [INF] Filtered out host gpu013: could not verify host allowed virtual ips
cephadm 2021-05-08T17:18:16.685029+0000 mgr.gpu014.kwbqcf (mgr.6750496) 574 : cephadm [INF] Deploying daemon mon.gpu006 on gpu006
cluster 2021-05-08T17:18:23.970362+0000 mon.gpu024 (mon.1) 188 : cluster [INF] mon.gpu024 calling monitor election
cluster 2021-05-08T17:18:23.984279+0000 mon.gpu023 (mon.0) 26751 : cluster [INF] mon.gpu023 calling monitor election
cluster 2021-05-08T17:18:29.005961+0000 mon.gpu023 (mon.0) 26761 : cluster [INF] mon.gpu023 is new leader, mons gpu023,gpu024 in quorum (ranks 0,1)
cephadm 2021-05-08T17:18:29.228513+0000 mgr.gpu014.kwbqcf (mgr.6750496) 581 : cephadm [INF] Filtered out host cpu0: could not verify host allowed virtual ips
cephadm 2021-05-08T17:18:29.234000+0000 mgr.gpu014.kwbqcf (mgr.6750496) 582 : cephadm [INF] Filtered out host gpu024: could not verify host allowed virtual ips
cephadm 2021-05-08T17:18:29.235023+0000 mgr.gpu014.kwbqcf (mgr.6750496) 583 : cephadm [INF] Filtered out host gpu013: could not verify host allowed virtual ips
cephadm 2021-05-08T17:18:29.257156+0000 mgr.gpu014.kwbqcf (mgr.6750496) 584 : cephadm [INF] Upgrade: It appears safe to stop mon.gpu006
cluster 2021-05-08T17:18:25.964655+0000 mon.gpu006 (mon.2) 1 : cluster [INF] mon.gpu006 calling monitor election
cluster 2021-05-08T17:18:30.988246+0000 mon.gpu006 (mon.2) 2 : cluster [INF] mon.gpu006 calling monitor election
cluster 2021-05-08T17:18:30.989345+0000 mon.gpu024 (mon.1) 189 : cluster [INF] mon.gpu024 calling monitor election
cluster 2021-05-08T17:18:31.002518+0000 mon.gpu023 (mon.0) 26883 : cluster [INF] mon.gpu023 calling monitor election
cluster 2021-05-08T17:18:31.030779+0000 mon.gpu023 (mon.0) 26886 : cluster [INF] mon.gpu023 is new leader, mons gpu023,gpu024,gpu006 in quorum (ranks 0,1,2)
cluster 2021-05-08T17:18:31.039878+0000 mon.gpu023 (mon.0) 26891 : cluster [INF] Health check cleared: MON_DOWN (was: 1/3 mons down, quorum gpu023,gpu024)
cephadm 2021-05-08T17:18:59.654067+0000 mgr.gpu014.kwbqcf (mgr.6750496) 602 : cephadm [INF] Upgrade: Updating mon.gpu006
cephadm 2021-05-08T17:18:59.670067+0000 mgr.gpu014.kwbqcf (mgr.6750496) 603 : cephadm [INF] Deploying daemon mon.gpu006 on gpu006
cluster 2021-05-08T17:19:04.118763+0000 mon.gpu006 (mon.2) 1 : cluster [INF] mon.gpu006 calling monitor election
cluster 2021-05-08T17:19:04.121041+0000 mon.gpu024 (mon.1) 190 : cluster [INF] mon.gpu024 calling monitor election
cluster 2021-05-08T17:19:04.128956+0000 mon.gpu023 (mon.0) 27065 : cluster [INF] mon.gpu023 calling monitor election
cluster 2021-05-08T17:19:04.149828+0000 mon.gpu023 (mon.0) 27066 : cluster [INF] mon.gpu023 is new leader, mons gpu023,gpu024,gpu006 in quorum (ranks 0,1,2)
cephadm 2021-05-08T17:19:10.996242+0000 mgr.gpu014.kwbqcf (mgr.6750496) 609 : cephadm [INF] Filtered out host cpu0: could not verify host allowed virtual ips
cephadm 2021-05-08T17:19:10.996423+0000 mgr.gpu014.kwbqcf (mgr.6750496) 610 : cephadm [INF] Filtered out host gpu024: could not verify host allowed virtual ips
cephadm 2021-05-08T17:19:11.001299+0000 mgr.gpu014.kwbqcf (mgr.6750496) 611 : cephadm [INF] Deploying daemon mon.gpu013 on gpu013
...

This message first appears when a new mgr becomes active. The check then prevents new mons from being deployed, which stalls the mon upgrade. After about 10 minutes, at 17:18:16, host gpu006 is no longer filtered out, and a mon can finally be deployed on it.

My guess is that a newly started manager does not have enough information yet, so it blocks deployment to be safe. But what keeps it blocked for 10 minutes?
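
The timeline described above can be checked mechanically. The following sketch is only an illustration: the regular expression assumes the log line layout shown in this comment, and `filtered_hosts_by_time` is a hypothetical helper. It groups the "Filtered out host" entries by timestamp (truncated to the second) so that the moment a host drops off the list becomes visible.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches e.g. "cephadm 2021-05-08T17:18:16.678988+0000 ... Filtered out host gpu024: ..."
FILTERED = re.compile(
    r"^cephadm (?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\S* .*"
    r"Filtered out host (?P<host>[^:]+):"
)


def filtered_hosts_by_time(lines: List[str]) -> Dict[str, List[str]]:
    """Map each timestamp (to the second) to the hosts filtered out then."""
    result: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        m = FILTERED.match(line)
        if m:
            result[m.group("ts")].append(m.group("host"))
    return dict(result)


# Three lines copied from the log excerpt above.
sample = [
    "cephadm 2021-05-08T17:08:51.415872+0000 mgr.gpu014.kwbqcf (mgr.6750496) 217 : cephadm [INF] Filtered out host gpu006: could not verify host allowed virtual ips",
    "cephadm 2021-05-08T17:18:16.678988+0000 mgr.gpu014.kwbqcf (mgr.6750496) 572 : cephadm [INF] Filtered out host gpu024: could not verify host allowed virtual ips",
    "cephadm 2021-05-08T17:18:16.685029+0000 mgr.gpu014.kwbqcf (mgr.6750496) 574 : cephadm [INF] Deploying daemon mon.gpu006 on gpu006",
]
timeline = filtered_hosts_by_time(sample)
```

Grouping by second is a rough heuristic; a batch of messages that straddles a second boundary would be split across two keys.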

@liewegas
Member Author

Oh! I know what the problem is. 1897d1c changed the way we store the per-host network interface/network info. On upgrade, cephadm thinks there are no networks on each host until the device refresh happens.
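
To illustrate the explanation above, here is a hedged sketch, not the actual cephadm implementation. It assumes a per-host cache that maps each hostname to the list of subnets seen on that host; the `host_networks` dict, the `host_in_public_network` function, and the addresses are all hypothetical. With an empty cache right after the upgrade, every host fails the check even though no IP address changed. Once a refresh repopulates the entry, the same host passes.

```python
import ipaddress
from typing import Dict, List


def host_in_public_network(
    hostname: str,
    public_network: str,
    host_networks: Dict[str, List[str]],
) -> bool:
    """Return True if any cached subnet of the host overlaps public_network."""
    target = ipaddress.ip_network(public_network)
    for subnet in host_networks.get(hostname, []):
        net = ipaddress.ip_network(subnet)
        # Only compare subnets of the same IP version.
        if net.version == target.version and net.overlaps(target):
            return True
    return False


# Right after the upgrade: the cache in the new format is still empty.
cache: Dict[str, List[str]] = {}
before_refresh = host_in_public_network("gpu006", "10.0.0.0/24", cache)

# After the host refresh repopulates the cache (addresses are made up).
cache["gpu006"] = ["10.0.0.0/24"]
after_refresh = host_in_public_network("gpu006", "10.0.0.0/24", cache)
```

Under these assumptions, `before_refresh` is false and `after_refresh` is true, which matches a host being filtered out at first and accepted later without any configuration change.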

@tchaikov
Contributor

jenkins, retest this please.

@tchaikov
Contributor

[ RUN      ] TestLibRBD.TestPendingAio
using new format!
/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/run-rbd-unit-tests.sh: line 20: 1174225 Segmentation fault      RBD_FEATURES=$i unittest_librbd

@tchaikov tchaikov merged commit f773c0b into ceph:master May 13, 2021
@liewegas liewegas deleted the cephadm-filter-net-message branch May 20, 2021 14:34