New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mon: check new crush for unknown name/type #4726
mon: check new crush for unknown name/type #4726
Conversation
|
per @jecluis this pull request only addresses a small portion of the problem, we should do more check on the new crush map. IIUC, which means we should try to map the concepts of clusters and pools to the ones in crush map, to see if it is capable to help us to route the request to the right place. |
|
@tchaikov I think we should try to keep this simple and get concerned only about empty maps. I think those will be the main source of potential user errors, as I can see it being more likely for a user to mistakenly inject an empty map than a modified crush map. Besides, I think we can't assume erroneous behavior when the user purposefully (even if my mistake) removes osds or any buckets from the crushmap and injects it to the cluster. But I'm sure we can assume it was not the user's intention to inject an empty map. My take on this is to check if the new crushmap is empty, and if so deny that to the user. We can add a '--force-empty-crushmap' and allow it if the user specifies said option, but never by default. Does someone else has an opposing view? Maybe @liewegas or @athanatos ? |
agreed. but i think monitor should never crash even because of user's mistake. so i added this vulnerable check at the expense of making things more complicated, in hope that monitor can reject a bad crush map, which could bring down this daemon when handling "ceph osd tree". |
|
Letting the monitor die/crash when the harm is already done is, imo, the only way out of a really really bad situation. I'd rather the monitor dies on 'ceph osd tree' because we have a badly formed crushmap than to cope with it and cause who knows how much damage in the log run. But the problem here is not the monitor crashing on 'ceph osd tree' when the crush is badly formed; the problem is allowing said map in the first place. But in order to prevent a bad crushmap from being injected, we must know what a bad crushmap looks like, and I for one am not entirely sure what such a map looks like besides "not being empty" -- at least of that one I'm pretty sure. Yeah, we should make sure the monitor does not die on a whim, just because we expected something on the crushmap and got a null pointer. Your patch does address a portion of that by checking if the crushmap has the osds the osdmap expects. I think we can even make that a hard requirement: if you want to remove osds from your crushmap, you first have to remove them from the osdmap -- makes sense to me and I don't see a scenario in which we would want to allow otherwise. But I still think this only solves part of the problem. |
yes (reluctantly). if we have no better way to prevent our user from doing so.
i start to understand you. actually we do some dry run with the crush rules using
so we can keep my check against osdmap, right ? will see what i can do to make it a hard requirement. |
fa5d7de
to
d49d712
Compare
@jecluis i realized that i should check for dangling name references in crush map. if an OSD does not exists in the crush map, there is chance that it is just a stray OSD, and is not added to the crush map yet. so this pull request is updated:
[0] this basically checks for an empty crush map. |
d49d712
to
d0f0d3c
Compare
|
I think the ceph-erasure-code-corpus change is not intended. |
| run_osd $dir 0 || return 1 | ||
|
|
||
| local empty_map=$dir/empty_map | ||
| :> $empty_map.txt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cute ;-)
d0f0d3c
to
5ac55b6
Compare
| local empty_map=$dir/empty_map | ||
| :> $empty_map.txt | ||
| ./crushtool -c $empty_map.txt -o $empty_map.map || return 1 | ||
| ./ceph osd setcrushmap -i $empty_map.map |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should remove this line.
5ac55b6
to
c29f00f
Compare
|
@dachary thanks for your review. i updated this pull request to address your comments. but joao's assertion still holds, this change only solves part of the problem. per the latest log from Harish, there is chance that we can fail to find an OSD in odsmap, but the OSD does exist in the crush map. |
|
After a rados run (needs-qa or equivalent). Reviewed-by: Loic Dachary <ldachary@redhat.com> It would be great to have a summary of the discussions in the commit message for the record. |
|
@tchaikov the bot failure is because of the rebase which jenkins git can't handle. |
|
thanks @dachary ! will revise the commit message to help future readers. |
c29f00f
to
c971422
Compare
|
@dachary @jecluis , i added more commits on top of this pull request:
it is possible that |
c406416
to
4fc2dce
Compare
|
After a rados run (needs-qa or equivalent). Reviewed-by: Loic Dachary <ldachary@redhat.com> this series is a nice adventure to read. It could be argued that it needs to be squashed but I like it this way because it explains how the --check option came to be. |
|
@ktdreyer thanks for your review =) the latest commits should address your comments. |
4fc2dce
to
9955b04
Compare
9955b04
to
49e1cdd
Compare
* check for dangling bucket name or type names referenced by the
buckets/items in the crush map.
* also check for the references from Item(0, 0, 0) which does not
necessarily exist in the crush map under testing. the rationale
behind this is: the "ceph osd tree" will also print stray OSDs
whose id is greater or equal to 0. so it would be useful to
check if the crush map offers the type name indexed by "0"
(the name of OSDs is always "OSD.{id}", so we don't need to
look up the name of an OSD item in the crushmap).
Signed-off-by: Kefu Chai <kchai@redhat.com>
* so one is able to verify that the "ceph osd tree" won't chock on the new crush map because of dangling name/type references Signed-off-by: Kefu Chai <kchai@redhat.com>
* the "osd tree dump" command enumerates all buckets/osds found in either the crush map or the osd map. but the newly set crushmap is not validated for the dangling references, so we need to check to see if any item in new crush map is referencing unknown type/name when a new crush map is sent to monitor, reject it if any. Fixes: ceph#11680 Signed-off-by: Kefu Chai <kchai@redhat.com>
Signed-off-by: Kefu Chai <kchai@redhat.com>
add an argument "max_id" for "--check-names" to check if any item has an id greater or equal to given "max_id" in crush map. Signed-off-by: Kefu Chai <kchai@redhat.com>
Fixes: ceph#11680 Signed-off-by: Kefu Chai <kchai@redhat.com>
* because "--check" also checks for the max_id Signed-off-by: Kefu Chai <kchai@redhat.com>
so we don't need to call CrushTester::check_name_maps() in OSDMonitor.cc anymore. Fixes: ceph#11680 Signed-off-by: Kefu Chai <kchai@redhat.com>
49e1cdd
to
c6e6348
Compare
mon: check new crush for unknown name/type
|
@jecluis @dachary tested at http://pulpito.ceph.redhat.com/kchai-2015-06-01_00:05:54-rados-wip-kefu-testing---basic-magna, all of the tests but 1 passed. |
|
there is one job still running but it's rados/basic/{clusters/fixed-2.yaml fs/btrfs.yaml msgr-failures/many.yaml tasks/repair_test.yaml} and I don't see how it could be related to this series. Good to merge unless @jecluis wants to wait for his own run of testing to complete. Reviewed-by: Loic Dachary <ldachary@redhat.com> |
|
@tchaikov you can merge whenever you feel like it ;-) |
mon: check new crush for unknown name/type Reviewed-by: Loic Dachary <ldachary@redhat.com>
crush map, reject it if any OSD is absent.
Fixes: #11680
Signed-off-by: Kefu Chai kchai@redhat.com