mon/OSDMonitor: do not propose on error in prepare_update #50502
Conversation
Force-pushed from f048021 to 863a45d
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
jenkins test make check
src/mon/OSDMonitor.cc (Outdated)
  case -EAGAIN:
    wait_for_finished_proposal(op, new C_RetryMessage(this, op));
-   return true;
+   return false;
@ljflores @rzarzynski The reason for the failure 7311772 is this line. The method OSDMonitor::prepare_pool_crush_rule actually changes the crush rules and signals -EAGAIN. It's expected that ::prepare_command_impl returns true here to trigger a PAXOS proposal, and that the pool creation is then retried.
This is really strange to me. Is it actually necessary to split this pool creation across two OSDMap epochs?
2023-06-22T15:48:31.674+0000 7f79267f3700 1 -- v1:172.21.15.182:6789/0 <== client.4347 v1:172.21.15.182:0/3391191032 8 ==== mon_command({"prefix": "osd pool create", "pool": "unique_pool_0", "pg_num": 1, "pgp_num": 1, "pool_type": "erasure", "erasure_code_profile": "backfill_toofull"} v 0) v1 ==== 191+0+0 (unknown 3185792114 0 0) 0x558ec2cf4900 con 0x558ec2f0cc00
2023-06-22T15:48:31.674+0000 7f79267f3700 20 mon.a@0(leader) e1 _ms_dispatch existing session 0x558ec30a2b40 for client.?
2023-06-22T15:48:31.674+0000 7f79267f3700 20 mon.a@0(leader) e1 entity_name client.admin global_id 4347 (new_ok) caps allow *
2023-06-22T15:48:31.674+0000 7f79267f3700 0 mon.a@0(leader) e1 handle_command mon_command({"prefix": "osd pool create", "pool": "unique_pool_0", "pg_num": 1, "pgp_num": 1, "pool_type": "erasure", "erasure_code_profile": "backfill_toofull"} v 0) v1
2023-06-22T15:48:31.674+0000 7f79267f3700 20 MonCap is_capable service=osd command=osd pool create read write addr v1:172.21.15.182:0/3391191032 on cap allow *
2023-06-22T15:48:31.674+0000 7f79267f3700 20 MonCap allow so far , doing grant allow *
2023-06-22T15:48:31.674+0000 7f79267f3700 20 MonCap allow all
2023-06-22T15:48:31.674+0000 7f79267f3700 10 mon.a@0(leader) e1 _allowed_command capable
2023-06-22T15:48:31.674+0000 7f79267f3700 0 log_channel(audit) log [INF] : from='client.? v1:172.21.15.182:0/3391191032' entity='client.admin' cmd=[{"prefix": "osd pool create", "pool": "unique_pool_0", "pg_num": 1, "pgp_num": 1, "pool_type": "erasure", "erasure_code_profile": "backfill_toofull"}]: dispatch
2023-06-22T15:48:31.674+0000 7f79267f3700 1 -- v1:172.21.15.182:6789/0 --> v1:172.21.15.182:6789/0 -- log(1 entries from seq 78 at 2023-06-22T15:48:31.675841+0000) v1 -- 0x558ec3060a80 con 0x558ec1bd1800
2023-06-22T15:48:31.674+0000 7f79267f3700 10 mon.a@0(leader).paxosservice(osdmap 1..13) dispatch 0x558ec2cf4900 mon_command({"prefix": "osd pool create", "pool": "unique_pool_0", "pg_num": 1, "pgp_num": 1, "pool_type": "erasure", "erasure_code_profile": "backfill_toofull"} v 0) v1 from client.4347 v1:172.21.15.182:0/3391191032 con 0x558ec2f0cc00
2023-06-22T15:48:31.674+0000 7f79267f3700 5 mon.a@0(leader).paxos(paxos active c 1..51) is_readable = 1 - now=2023-06-22T15:48:31.675871+0000 lease_expire=2023-06-22T15:48:36.349426+0000 has v0 lc 51
2023-06-22T15:48:31.674+0000 7f79267f3700 10 mon.a@0(leader).osd e13 preprocess_query mon_command({"prefix": "osd pool create", "pool": "unique_pool_0", "pg_num": 1, "pgp_num": 1, "pool_type": "erasure", "erasure_code_profile": "backfill_toofull"} v 0) v1 from client.4347 v1:172.21.15.182:0/3391191032
2023-06-22T15:48:31.674+0000 7f79267f3700 7 mon.a@0(leader).osd e13 prepare_update mon_command({"prefix": "osd pool create", "pool": "unique_pool_0", "pg_num": 1, "pgp_num": 1, "pool_type": "erasure", "erasure_code_profile": "backfill_toofull"} v 0) v1 from client.4347 v1:172.21.15.182:0/3391191032
2023-06-22T15:48:31.674+0000 7f79267f3700 1 mon.a@0(leader).osd e13 implicitly use rule named after the pool: unique_pool_0
2023-06-22T15:48:31.674+0000 7f79267f3700 10 mon.a@0(leader).osd e13 prepare_pool_crush_rule returns -11
(from the mon log)
I took another look. There doesn't appear to be a good reason, except that some code would need to be adjusted to look at the pending incremental OSDMap for the erasure code profiles.
For the purposes of this PR I will undo this change but add a comment protesting the current state of the code :)
@rzarzynski requesting your review if you have a moment.
Review in progress.
jenkins test api
Apologies, this comment was confusing. I had finished that batch, but forgot this one had been dropped. It needs to be added to a new batch.
jenkins test api
Fixes: https://tracker.ceph.com/issues/58972 Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>