New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mon: Monitor: validate prefix on handle_command() #9700
Conversation
return; | ||
} | ||
|
||
if (!valid_cmd) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest moving this check just below the cmd_getval() call. No need to check if prefix is empty if the call failed and it will better outline the what comes before what.
Have a few comments on your patch. Further, please replaces all occurrences of 'crush' with 'crash'. While this would otherwise be a minor typo, we don't want to pollute the history with a commit that may unintentionally refer CRUSH. Also, adjust your commit message title. Instead of
please consider
Also, this pull request lacks tests. I recommend implementing a few test cases using an existing test [1] or create a new unit test. [1] - test/mon/test-mon-msg.cc -- please be aware that the test may require some tweaks if you want to wait for a reply message to consider success in case of error. |
thanks @jecluis |
thanks @jecluis @xiaoxichen Now I'm writing test cases: I've find below testcases which may cause exists code to crash. Case 1 If no
Ceph monitor will crash with logs (I've add some outputs)
Case 2 if no
Will raise the same error as Case 3 If set prefix="" in parameter:
Will get error as below:
Case 4 set prefix=" ...(a lot of spaces).."
Will get the error as below:
Case 5 set prefix=";;;,,,;;;;"
Will get the error as below:
Case 6 if set prefix=";x;a;" etc, monitors will not crash, continue to running, clients will get error message. |
@jecluis @xiaoxichen [1] - test/mon/test-mon-msg.cc |
@dachary , seems the check failed with timeout, is that any issue in jenkins slave? |
jenkins, could you test this please (timeout) ? |
Why this could happened? CI running a fixed time, longer that that time, tests will be terminated?
|
I've run Case 1
Case 2
|
Fixes: http://tracker.ceph.com/issues/16297 Signed-off-by: You Ji <youji@ebay.com>
@jecluis this PR passed jenkins bot, would you mind take a review? |
@liewegas , thanks, can we go ahead for backport to hammer+ jewel (infernalis is eol soon?) |
Hammer backport is important. Gentoo just broke the no downtime upgrade path from its latest stable package (Firefly) because they removed Firefly and Hammer packages and stabilised Infernalis at the same time. Current Gentoo users must stop their whole cluster and upgrade to Infernalis or manually install Hammer and not have any supported stable package. |
@jtek hammer is backported. are you asking for infernalis backport? |
I suppose that 0.94.7 doesn't have the fix and this is scheduled for 0.94.8 ? In the meantime Gentoo devs may be confused about which version to use and only marked Infernalis stable (previously the only stable release version on Gentoo was Firefly 0.80.10) : Until this is cleared up Gentoo users upgrading Ceph or installing new servers will have a nasty surprise. Is there a patch available for 0.94.7 I can refer to in the bug above ? |
@xiaoxichen thanks, I would probably have spent quite some time finding it. |
This patch mainly fix issue:
http://tracker.ceph.com/issues/16297
Ceph monitors will crash when
mon_command
receive empty prefix sent by rados.py. Such as: use below script to query pool stats info, will cause ceph monitors crash.If
mon_command
does not containprefix=xxx
parameter, ceph monitor will get empty prefix, then monitor will crash.Signed-off-by: You Ji youji@ebay.com