[RM-15451] Gather keys directly from the mon#393
Conversation
|
Can one of the admins verify this patch? |
|
Please note this patch reverses the logic, Since connecting to hosts is a more expensive operation than calculating things locally, this code minimises this. The old code connected for each keyring, to each mon until the mon returned with the keyring. The new logic code connects to a mon and then retrieves all keyrings, from that one mon, and only if it fails to retrieve all keyrings does it then to the next mon. This should make the number of connections created and destroyed reduce from 5 to 1. As well as the benefit listed above. |
|
Fix for bug: http://tracker.ceph.com/issues/15451 Please note the discussion: http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/30552 |
|
OK to test |
|
@oms4suse this is a lot of code with no tests. Care to include a few of these? Also, for such a big change in logic I would like to see thorough documentation on the behavioral change (even if no docs existed before for this specific action) |
|
@alfredodeza yes we need some tests. I added some comments as a start but yes we need some tests. Most annoying is the key we get back after making the mon's is different from the key we get back. In white space and in quoting. |
|
java.lang.IllegalStateException: no longer a configured node for 158.69.80.112+trusty_small_unique__35591ec5-fe53-49c8-a01b-a87f5e5ac69a Does not explain what is failing to me. Can some one please help? |
|
test this please |
|
@oms4suse you have valid failures now. Just a reminder: the automated PR checks are a convenience, we expect contributors to run the test suite throughout their code changes and before sending a PR. There hasn't been any tests added to this PR as well. |
|
@alfredodeza Some help would have been nice to find understand what this means. java.lang.IllegalStateException: no longer a configured node for I found a bug but have no way of knowing if it has anything to do with the java.lang.IllegalStateException: no longer a configured node for |
16897b9 to
c7c3b07
Compare
|
Well I added a unit test. |
| rlogger.error('"ceph mon_status %s" returned %s' % (host, code)) | ||
| for line in err: | ||
| rlogger.debug(line) | ||
| return False |
There was a problem hiding this comment.
gatherkeys_with_mon returns either True or False.
We have a test where it returns True and False.
Can you be more specific what you want tested here?
There was a problem hiding this comment.
I am not seeing a test that checks that this is returning False.
There was a problem hiding this comment.
test_gatherkeys_fail
uses mock of gatherkeys_with_mon on failure.
It checks for RuntimeError.
There was a problem hiding this comment.
it patches gatherkeys_with_mon, I am saying that gatherkeys_with_mon needs to be tested :) This return False needs a test.
… message imperative. Prevfious log message was not clear. Signed-off-by: Owen Synge <osynge@suse.com>
… failure of gatherkeys_missing Make this code as simple and clear as possible removing variable and shortening the failure path. Signed-off-by: Owen Synge <osynge@suse.com>
…-json output of 'mon status' We dump the output of mon status when it is not json. Signed-off-by: Owen Synge <osynge@suse.com>
…ys_missing: gatherkeys.gatherkeys_missing Add tests for gatherkeys.gatherkeys_missing Signed-off-by: Owen Synge <osynge@suse.com>
…a-separated values in logs It is prefured to use comma-separated values in logs Signed-off-by: Owen Synge <osynge@suse.com>
…ys_missing: Remove unneeded imports Remove unneeded imports Signed-off-by: Owen Synge <osynge@suse.com>
…umentation The admin command did not have and documentation. Signed-off-by: Owen Synge <osynge@suse.com>
…erkeys documentation The gatherkeys subcommand did not have and documentation. Signed-off-by: Owen Synge <osynge@suse.com>
…nged behavior The gatherkeys subcommand does not depend upon ceph-create-keys so does not have to have the side effect of making mons admin nodes. Signed-off-by: Owen Synge <osynge@suse.com>
…install Updated the basic install instructions. (1) Added a creating a new configuration section. (2) Added reference to the 'new' subcommand (3) Added reference to the 'mon' subcommand (4) Improved Gather keys (5) Added reference to the 'gatherkeys' subcommand (6) Moved Admin hosts section as now it is more important. (7) Added reference to the 'admin' subcommand The gatherkeys subcommand does not depend upon ceph-create-keys so does not have to have the side effect of making mons admin nodes. To get this in I must add documentation to ceph-deploy. Signed-off-by: Owen Synge <osynge@suse.com>
6f657a4 to
6d7d323
Compare
|
Signed-off-by: Yuri Weinstein <yweinste@redhat.com>
|
Trying to test this. Scheduled tests as:
Run: http://pulpito.ceph.com/teuthology-2016-05-20_10:24:22-ceph-deploy-jewel-distro-basic-vps/ @oms4suse @alfredodeza @liewegas Tests failed on ubuntu pls review if my configuration was valid and suggest next steps |
|
@oms4suse it would speed up testing a bit if you could update |
|
@yuriw Thanks for the initial test, time zones are a pain :) Shows that gather keys runs just fine in this context, Yet when run the next time as: It looks from the logs that the mon keyring is not placed at the expected location: The related code is: Hence no further operations can occur as we need this keyring to connect to the admin nodes. Now we need to understand why this test is deleting the mon keyring after doing a |
|
Oh and just to clarify the path for the mon key ring is: Would you like me to add more to the error message to confirm that this file has been removed from the cluster between these two calls by the test suite? |
|
Sure. Also, Owen, we can set you up with access to hte test cluster if
you don't already have it (I forget who does)...
|
… keyring location While testing we had issues finding the mon keyring. Signed-off-by: Owen Synge <osynge@suse.com>
Improve logging message so it is clear the error is not connecting the mon Signed-off-by: Owen Synge <osynge@suse.com>
We should always use the short hostname for mon interactions. Signed-off-by: Owen Synge <osynge@suse.com>
ffdda90 to
246e3b9
Compare
|
We found interesting issue with the extra logging. When running with the test suite: We get this log message: When the test suite runs as: We get this error: This clearly states that we are looking for a keyring with the long hostname which I think is wrong Hence the last patch ceph_deploy.gatherkeys: Normalize hostname This should now pass all tests provided github is not broken. |
Since gatherkeys now uses distro.conn.remote_module.shortname() We need to mock this also. Signed-off-by: Owen Synge <osynge@suse.com>
|
Since gatherkeys now uses distro.conn.remote_module.shortname() |
|
all tests passed on the latest ceph-deply wip-test-PR393-master |
Use the mon keyring to gather keys directly from the mon.
This is better than relying on:
ceph-create-keys --cluster ${cluster_name} --id ${mon_id}
Which is started as a side effect of booting a mon. This side effect has
numerous issues as stated in this thread on ceph-devel:
The conclusion is that ceph will remove this side effect, after ceph-deploy
has a fix. This fix removes that dependency.
Signed-off-by: Owen Synge osynge@suse.com