octopus: mgr: update mon metadata when monmap is updated #39219
a2cfb01
to
11ab7c5
Compare
|
jenkins test api |
teuthology passed: https://trello.com/c/sbaQayOI/1161-wip-yuri-testing-2021-03-01-1125-octopus
failures not related
|
|
jenkins test api |
|
@epuertat FYI |
|
|
jenkins test api |
|
jenkins test api |
|
jenkins test api |
|
jenkins test this please |
|
Mmm, it's failing with the same error as before: In fact it reported the same issue three times [1], [2], [3]. This doesn't look like a 'flapping' test but a real change (I don't mean a bug yet, but at least the test behaves differently between master and octopus). Main difference regarding the test itself is that in octopus it uses the That test consumes Additionally I see in the Jenkins console log the following trace, but it appears in successful runs too: |
|
Just pushed a commit to increase verbosity in api tests output |
|
jenkins retest this please |
646e4f5
to
0cdf54b
Compare
|
jenkins test api |
37f2823
to
1c71ddb
Compare
1c71ddb
to
ed6add5
Compare
|
|
The {
"0":{
"hostname":"braggi16",
"": "...",
"rotational":"0"
},
"1":{
"hostname":"braggi16",
"": "...",
"rotational":"0"
},
"2":{
"hostname":"braggi16",
"": "...",
"rotational":"0"
},
"3":{
"hostname":"braggi16",
"": "...",
"rotational":"0"
},
"a":{
"hostname":"braggi16",
"addrs":"[v2:172.21.2.16:40564/0,v1:172.21.2.16:40565/0]",
"": "...",
"os":"Linux"
},
"b":{
"hostname":"braggi16",
"addrs":"[v2:172.21.2.16:40566/0,v1:172.21.2.16:40567/0]",
"os":"Linux"
},
"c":{
"hostname":"braggi16",
"addrs":"[v2:172.21.2.16:40568/0,v1:172.21.2.16:40569/0]",
"": "...",
"os":"Linux"
}
} |
ed6add5
to
11ab7c5
Compare
should be fixed by #39937 |
|
Kefu, I'll cherry-pick #39937 tomorrow |
|
@k0ste add |
eda3796
to
aa7c5ee
Compare
there is a chance that one or more monitors are updated / upgraded in a single monmap update without first being removed from the cluster state's metadata. without this change, we would not update the metadata associated with those monitors, so the mgr modules which consume the metadata would not be updated accordingly and would keep reporting stale information.

in this change, we always update the metadata associated with every monitor included in the latest monmap.

multiple "mon metadata" commands are sent to the monitor to retrieve the updated metadata, instead of a single one, so that we can reuse "MetadataUpdate" to update the metadata of a given daemon. as the number of monitors in a typical cluster is relatively small and monmap updates are infrequent, this overhead should be fine.

unlike other places in the Mgr class where we ask the mon for metadata, the code sending the mon command for the updated monitor metadata is located outside of the `cluster_state.with_monmap()` block. the reason is that `with_monmap()` is guarded by the monc_lock under the hood, while `start_mon_command()` also needs to acquire the monc_lock, which is not a recursive lock. so we have to do this outside of the `with_monmap()` block.

Fixes: https://tracker.ceph.com/issues/48905
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit c037f4c)

backport:
- path: src/mgr/Mgr.cc
  comment: octopus doesn't declare `fmt`
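The locking constraint described in the commit message above can be illustrated with a small sketch. It is written in Python rather than the C++ of `src/mgr/Mgr.cc`, and it is not the actual Ceph code: the `ClusterState` class, its toy `start_mon_command()` reply, and the `refresh_mon_metadata()` helper are all hypothetical stand-ins. The only point is the pattern of copying the monitor names while the non-recursive lock is held and then sending one "mon metadata" command per monitor after the lock has been released.

```python
import threading


class ClusterState:
    """Toy stand-in for the mgr cluster state (hypothetical, not Ceph's API)."""

    def __init__(self, mon_names):
        # threading.Lock is non-recursive, playing the role of monc_lock here.
        self._lock = threading.Lock()
        self._mon_names = list(mon_names)
        self.metadata = {}

    def with_monmap(self, callback):
        # The callback runs while the lock is held.
        with self._lock:
            return callback(self._mon_names)

    def start_mon_command(self, cmd):
        # Needs the same lock. In this single-threaded sketch we raise
        # instead of blocking forever when the lock is already held.
        if not self._lock.acquire(blocking=False):
            raise RuntimeError("lock already held: this call would deadlock")
        try:
            # Fake reply; a real monitor would return the daemon's metadata.
            return {"name": cmd["id"], "hostname": "host-" + cmd["id"]}
        finally:
            self._lock.release()


def refresh_mon_metadata(state):
    # Step 1: only copy the monitor names while the lock is held.
    names = state.with_monmap(lambda mons: list(mons))
    # Step 2: send one command per monitor after the lock is released.
    for name in names:
        reply = state.start_mon_command({"prefix": "mon metadata", "id": name})
        state.metadata[name] = reply
    return names
```

Calling `start_mon_command()` from inside the `with_monmap()` callback raises `RuntimeError` in this sketch, which is the stand-in for the deadlock the commit message avoids by moving the command outside the block.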
aa7c5ee
to
a487353
Compare
this change addresses a regression introduced by c037f4c.

also remove the "P" before the json command.

see also: https://tracker.ceph.com/issues/48905
Fixes: https://tracker.ceph.com/issues/49661
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 8fc290b)
a487353
to
db97ea4
Compare
|
|
backport trackers:
backport of
parent trackers:
this backport was staged using ceph-backport.sh version 16.0.0.6848
find the latest version at https://github.com/ceph/ceph/blob/master/src/script/ceph-backport.sh