This repository was archived by the owner on Mar 16, 2026. It is now read-only.

[RM-15451] Gather keys directly from the mon #393

Merged
alfredodeza merged 27 commits into ceph:master from SUSE:fix-remove-ceph-create-keys-dep
May 24, 2016

@oms4suse
Contributor

Use the mon keyring to gather keys directly from the mon.
This is better than relying on:

ceph-create-keys --cluster ${cluster_name} --id ${mon_id}

which is started as a side effect of booting a mon. This side effect has
numerous issues, as described in this thread on ceph-devel:

http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/30552

The conclusion there is that Ceph will remove this side effect once ceph-deploy
has a fix. This patch removes that dependency.

Signed-off-by: Owen Synge <osynge@suse.com>

@ceph-jenkins
Collaborator

Can one of the admins verify this patch?

@oms4suse
Contributor Author

Please note that this patch reverses the logic. Since connecting to hosts is a more expensive operation than calculating things locally, this code minimises the number of connections.

The old code connected, for each keyring, to each mon in turn until a mon returned that keyring.

The new code connects to a mon and retrieves all keyrings from that one mon; only if it fails to retrieve all keyrings does it move on to the next mon.

This should reduce the number of connections created and destroyed from 5 to 1, in addition to the benefit listed above.
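
The reversed loop described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `gather_from_first_complete_mon`, `fetch_keyrings`, `fake_fetch` and the mon names are hypothetical, and `fetch_keyrings(mon)` stands in for "open one connection to that mon and fetch whatever keyrings it can provide".

```python
def gather_from_first_complete_mon(mons, fetch_keyrings, wanted):
    """Return (mon, keyrings) for the first mon that yields every wanted keyring.

    fetch_keyrings is a hypothetical callable: it represents a single
    connection to one mon and returns a dict of keyring name -> contents.
    """
    for mon in mons:
        found = fetch_keyrings(mon)  # one connection per mon, not per keyring
        missing = [name for name in wanted if name not in found]
        if not missing:
            return mon, found
        # Incomplete result: fall through and try the next mon.
    raise RuntimeError("Failed to connect any mon")


# Usage with a fake fetcher that records how many connections were made.
WANTED = [
    "client.admin",
    "client.bootstrap-mds",
    "client.bootstrap-osd",
    "client.bootstrap-rgw",
]
connections = []


def fake_fetch(mon):
    connections.append(mon)
    if mon == "mon-a":
        return {"client.admin": "key"}  # mon-a can only provide one keyring
    return dict((name, "key") for name in WANTED)


mon, keys = gather_from_first_complete_mon(
    ["mon-a", "mon-b", "mon-c"], fake_fetch, WANTED
)
```

In this sketch the third mon is never contacted, because the second one already returned every keyring.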

@alfredodeza
Contributor

OK to test

@alfredodeza
Contributor

@oms4suse this is a lot of code with no tests. Care to include a few? Also, for such a big change in logic I would like to see thorough documentation on the behavioral change (even if no docs existed before for this specific action).

@oms4suse
Contributor Author

@alfredodeza yes, we need some tests. I added some comments as a start.

Most annoying is that the key we get back after making the mons is different from the key we get back, in whitespace and in quoting.

@oms4suse
Contributor Author

java.lang.IllegalStateException: no longer a configured node for 158.69.80.112+trusty_small_unique__35591ec5-fe53-49c8-a01b-a87f5e5ac69a

This does not tell me what is failing. Can someone please help?

@alfredodeza
Contributor

test this please

@alfredodeza
Contributor

@oms4suse you have valid failures now. Just a reminder: the automated PR checks are a convenience; we expect contributors to run the test suite throughout their code changes and before sending a PR.

No tests have been added to this PR either.

@oms4suse
Contributor Author

@alfredodeza Some help understanding what this means would have been nice:

java.lang.IllegalStateException: no longer a configured node for

I found a bug but have no way of knowing if it has anything to do with the

java.lang.IllegalStateException: no longer a configured node for

@oms4suse oms4suse force-pushed the fix-remove-ceph-create-keys-dep branch from 16897b9 to c7c3b07 Compare April 12, 2016 13:12
@oms4suse
Contributor Author

Well I added a unit test.

Comment thread ceph_deploy/gatherkeys.py
rlogger.error('"ceph mon_status %s" returned %s' % (host, code))
for line in err:
    rlogger.debug(line)
return False
Contributor

this needs a test

Contributor Author

gatherkeys_with_mon returns either True or False.

We have a test where it returns True and False.

Can you be more specific what you want tested here?

Contributor

I am not seeing a test that checks that this is returning False.

Contributor Author

test_gatherkeys_fail uses a mock of gatherkeys_with_mon that fails, and checks for RuntimeError.

Contributor

It patches gatherkeys_with_mon; I am saying that gatherkeys_with_mon itself needs to be tested :) This return False needs a test.
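
The distinction in this thread is between mocking `gatherkeys_with_mon` and exercising its own failure branch. A minimal sketch of the second kind of test follows, under stated assumptions: `gatherkeys_with_mon_sketch` is a hypothetical stand-in that only reproduces the quoted error path, and `run_mon_status` is an assumed injectable callable returning `(out, err, code)`. The real function's signature and dependencies are not shown in this conversation.

```python
import logging
import unittest

rlogger = logging.getLogger("gatherkeys_sketch")


def gatherkeys_with_mon_sketch(run_mon_status, host):
    """Hypothetical stand-in for the function under review.

    run_mon_status(host) is assumed to return (out, err, code), the way a
    remote command runner might.
    """
    out, err, code = run_mon_status(host)
    if code != 0:
        rlogger.error('"ceph mon_status %s" returned %s' % (host, code))
        for line in err:
            rlogger.debug(line)
        return False
    return True


class TestMonStatusFailure(unittest.TestCase):
    def test_nonzero_exit_returns_false(self):
        # Fake only the remote command, so the function's own branch runs.
        def failing(host):
            return [], ["connection refused"], 1

        self.assertFalse(gatherkeys_with_mon_sketch(failing, "mon1"))

    def test_zero_exit_returns_true(self):
        def succeeding(host):
            return ["{}"], [], 0

        self.assertTrue(gatherkeys_with_mon_sketch(succeeding, "mon1"))
```

The point of the design is that the fake replaces the remote call rather than the function being tested, so the `return False` line is actually executed.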

oms4suse added 10 commits May 20, 2016 14:40

… message imperative.

Previous log message was not clear.

Signed-off-by: Owen Synge <osynge@suse.com>

… failure of gatherkeys_missing

Make this code as simple and clear as possible, removing a variable
and shortening the failure path.

Signed-off-by: Owen Synge <osynge@suse.com>

…-json output of 'mon status'

We dump the output of mon status when it is not JSON.

Signed-off-by: Owen Synge <osynge@suse.com>

…ys_missing: gatherkeys.gatherkeys_missing

Add tests for gatherkeys.gatherkeys_missing

Signed-off-by: Owen Synge <osynge@suse.com>

…a-separated values in logs

It is preferred to use comma-separated values in logs

Signed-off-by: Owen Synge <osynge@suse.com>

…ys_missing: Remove unneeded imports

Remove unneeded imports

Signed-off-by: Owen Synge <osynge@suse.com>

…umentation

The admin command did not have any documentation.

Signed-off-by: Owen Synge <osynge@suse.com>

…erkeys documentation

The gatherkeys subcommand did not have any documentation.

Signed-off-by: Owen Synge <osynge@suse.com>

…nged behavior

The gatherkeys subcommand does not depend upon ceph-create-keys so does not
have to have the side effect of making mons admin nodes.

Signed-off-by: Owen Synge <osynge@suse.com>

…install

Updated the basic install instructions.

(1) Added a 'creating a new configuration' section.
(2) Added reference to the 'new' subcommand
(3) Added reference to the 'mon' subcommand
(4) Improved Gather keys
(5) Added reference to the 'gatherkeys' subcommand
(6) Moved Admin hosts section as now it is more important.
(7) Added reference to the 'admin' subcommand

The gatherkeys subcommand does not depend upon ceph-create-keys so does not
have to have the side effect of making mons admin nodes. To get this in I
must add documentation to ceph-deploy.

Signed-off-by: Owen Synge <osynge@suse.com>
@oms4suse oms4suse force-pushed the fix-remove-ceph-create-keys-dep branch from 6f657a4 to 6d7d323 Compare May 20, 2016 12:48
@oms4suse
Contributor Author

@alfredodeza

  1. I have added documentation.
  2. Sorry, I don’t know how to run teuthology.

yuriw added a commit that referenced this pull request May 20, 2016
Signed-off-by: Yuri Weinstein <yweinste@redhat.com>
yuriw added a commit to ceph/ceph-qa-suite that referenced this pull request May 20, 2016
Signed-off-by: Yuri Weinstein <yweinste@redhat.com>
@yuriw

yuriw commented May 20, 2016

Trying to test this.
Made a copy of SUSE:fix-remove-ceph-create-keys-dep in the ceph-deploy repo as branch wip-test-PR393-master.
Modified the yaml https://github.com/ceph/ceph-qa-suite/blob/wip-test-PR393-master/suites/ceph-deploy/basic/tasks/ceph-admin-commands.yaml to point to ceph-deploy branch wip-test-PR393-master (ceph-qa-tests branch wip-test-PR393-master as well).

Scheduled tests as:

CEPH_BRANCH=jewel; MACHINE_NAME=vps; teuthology-suite -v -c $CEPH_BRANCH -m $MACHINE_NAME -k distro -s ceph-deploy -e $CEPH_QA_EMAIL --suite-branch="wip-test-PR393-master" --filter ubuntu_ -p 100

Run: http://pulpito.ceph.com/teuthology-2016-05-20_10:24:22-ceph-deploy-jewel-distro-basic-vps/

@oms4suse @alfredodeza @liewegas

Tests failed on Ubuntu. Please review whether my configuration was valid and suggest next steps.

@yuriw

yuriw commented May 20, 2016

@oms4suse it would speed up testing a bit if you could update ceph-deploy branch wip-test-PR393-master in ceph repo for future testing (teuthology can't pull from other repos - speculating).

@oms4suse
Contributor Author

@yuriw Thanks for the initial test, time zones are a pain :)

'cd /home/ubuntu/cephtest/ceph-deploy && ./ceph-deploy mon create-initial'

shows that gatherkeys runs just fine in this context:

2016-05-20T10:49:15.060 INFO:teuthology.orchestra.run.vpm065.stderr:[ceph_deploy.mon][INFO  ] Running gatherkeys...
2016-05-20T10:49:15.061 INFO:teuthology.orchestra.run.vpm065.stderr:[ceph_deploy.gatherkeys][INFO  ] Storing keys in temp directory /tmp/tmpXx_4zU
2016-05-20T10:49:15.085 INFO:teuthology.orchestra.run.vpm065.stderr:[vpm065][DEBUG ] connection detected need for sudo
2016-05-20T10:49:15.110 INFO:teuthology.orchestra.run.vpm065.stderr:[vpm065][DEBUG ] connected to host: vpm065
2016-05-20T10:49:15.111 INFO:teuthology.orchestra.run.vpm065.stderr:[vpm065][DEBUG ] detect platform information from remote host
2016-05-20T10:49:15.129 INFO:teuthology.orchestra.run.vpm065.stderr:[vpm065][DEBUG ] detect machine type
2016-05-20T10:49:15.132 INFO:teuthology.orchestra.run.vpm065.stderr:[vpm065][DEBUG ] find the location of an executable
2016-05-20T10:49:15.133 INFO:teuthology.orchestra.run.vpm065.stderr:[vpm065][INFO  ] Running command: sudo /sbin/initctl version
2016-05-20T10:49:15.143 INFO:teuthology.orchestra.run.vpm065.stderr:[vpm065][DEBUG ] fetch remote file
2016-05-20T10:49:15.144 INFO:teuthology.orchestra.run.vpm065.stderr:[vpm065][INFO  ] Running command: sudo /usr/bin/ceph --connect-timeout=25 --cluster=ceph --admin-daemon=/var/run/ceph/ceph-mon.vpm065.asok mon_status
2016-05-20T10:49:15.264 INFO:teuthology.orchestra.run.vpm065.stderr:[vpm065][INFO  ] Running command: sudo /usr/bin/ceph --connect-timeout=25 --cluster=ceph --name mon. --keyring=/var/lib/ceph/mon/ceph-vpm065/keyring auth get-or-create client.admin osd allow * mds allow * mon allow *
2016-05-20T10:49:15.780 INFO:teuthology.orchestra.run.vpm065.stderr:[vpm065][INFO  ] Running command: sudo /usr/bin/ceph --connect-timeout=25 --cluster=ceph --name mon. --keyring=/var/lib/ceph/mon/ceph-vpm065/keyring auth get-or-create client.bootstrap-mds mon allow profile bootstrap-mds
2016-05-20T10:49:16.047 INFO:teuthology.orchestra.run.vpm065.stderr:[vpm065][INFO  ] Running command: sudo /usr/bin/ceph --connect-timeout=25 --cluster=ceph --name mon. --keyring=/var/lib/ceph/mon/ceph-vpm065/keyring auth get-or-create client.bootstrap-osd mon allow profile bootstrap-osd
2016-05-20T10:49:16.315 INFO:teuthology.orchestra.run.vpm065.stderr:[vpm065][INFO  ] Running command: sudo /usr/bin/ceph --connect-timeout=25 --cluster=ceph --name mon. --keyring=/var/lib/ceph/mon/ceph-vpm065/keyring auth get-or-create client.bootstrap-rgw mon allow profile bootstrap-rgw
2016-05-20T10:49:16.532 INFO:teuthology.orchestra.run.vpm065.stderr:[ceph_deploy.gatherkeys][INFO  ] Storing ceph.client.admin.keyring
2016-05-20T10:49:16.532 INFO:teuthology.orchestra.run.vpm065.stderr:[ceph_deploy.gatherkeys][INFO  ] Storing ceph.bootstrap-mds.keyring
2016-05-20T10:49:16.533 INFO:teuthology.orchestra.run.vpm065.stderr:[ceph_deploy.gatherkeys][INFO  ] keyring 'ceph.mon.keyring' already exists
2016-05-20T10:49:16.533 INFO:teuthology.orchestra.run.vpm065.stderr:[ceph_deploy.gatherkeys][INFO  ] Storing ceph.bootstrap-osd.keyring
2016-05-20T10:49:16.533 INFO:teuthology.orchestra.run.vpm065.stderr:[ceph_deploy.gatherkeys][INFO  ] Storing ceph.bootstrap-rgw.keyring
2016-05-20T10:49:16.533 INFO:teuthology.orchestra.run.vpm065.stderr:[ceph_deploy.gatherkeys][INFO  ] Destroy temp directory /tmp/tmpXx_4zU

Yet when run the next time as:

'cd /home/ubuntu/cephtest/ceph-deploy && ./ceph-deploy gatherkeys vpm065.front.sepia.ceph.com'

It looks from the logs that the mon keyring is not placed at the expected location:

2016-05-20T10:51:09.789 INFO:teuthology.orchestra.run.vpm065.stderr:[ceph_deploy.gatherkeys][WARNING] No mon key found in host: vpm065.front.sepia.ceph.com
2016-05-20T10:51:09.789 INFO:teuthology.orchestra.run.vpm065.stderr:[ceph_deploy.gatherkeys][ERROR ] Failed to connect to host:vpm065.front.sepia.ceph.com
2016-05-20T10:51:09.790 INFO:teuthology.orchestra.run.vpm065.stderr:[ceph_deploy.gatherkeys][INFO  ] Destroy temp directory /tmp/tmpRX0aIN
2016-05-20T10:51:09.790 INFO:teuthology.orchestra.run.vpm065.stderr:[ceph_deploy][ERROR ] RuntimeError: Failed to connect any mon

The related code is:

distro = hosts.get(host, username=args.username)
dir_keytype_mon = ceph_deploy.util.paths.mon.path(args.cluster, host)
path_keytype_mon = "%s/keyring" % (dir_keytype_mon)
mon_key = distro.conn.remote_module.get_file(path_keytype_mon)
if mon_key is None:
    LOG.warning("No mon key found in host: %s", host)
    return False

Hence no further operations can occur as we need this keyring to connect to the admin nodes.

Now we need to understand why this test is deleting the mon keyring after doing a

ceph-deploy mon create-initial
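
As a side note on the successful run logged earlier in this comment, the four `auth get-or-create` invocations share one shape. The sketch below rebuilds that argument list using only the flags visible in the log. The helper name `auth_get_or_create_argv` is hypothetical, and passing each capability string (for example `allow *`) as a single argument is an assumption, since the log prints the command with any quoting stripped.

```python
def auth_get_or_create_argv(cluster, mon_keyring, entity, caps):
    """Build an argv list mirroring the commands shown in the log.

    caps is a list of (daemon_type, capability) pairs. Each capability is
    kept as one argument (assumption: the log shows them unquoted).
    """
    argv = [
        "/usr/bin/ceph",
        "--connect-timeout=25",
        "--cluster=%s" % cluster,
        "--name", "mon.",
        "--keyring=%s" % mon_keyring,
        "auth", "get-or-create", entity,
    ]
    for daemon_type, capability in caps:
        argv.extend([daemon_type, capability])
    return argv


keyring = "/var/lib/ceph/mon/ceph-vpm065/keyring"
admin_argv = auth_get_or_create_argv(
    "ceph", keyring, "client.admin",
    [("osd", "allow *"), ("mds", "allow *"), ("mon", "allow *")],
)
mds_argv = auth_get_or_create_argv(
    "ceph", keyring, "client.bootstrap-mds",
    [("mon", "allow profile bootstrap-mds")],
)
```

Keeping the command as a list rather than a single string means the `*` in `allow *` is never exposed to shell globbing if the list is handed to a runner that does not invoke a shell.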

@oms4suse
Contributor Author

Oh, and just to clarify, the path for the mon keyring is:

/var/lib/ceph/mon/ceph-vpm065/keyring

Would you like me to add more to the error message to confirm that this file has been removed from the cluster between these two calls by the test suite?

@liewegas
Member

liewegas commented May 23, 2016 via email

… keyring location

While testing we had issues finding the mon keyring.

Signed-off-by: Owen Synge <osynge@suse.com>
yuriw added a commit that referenced this pull request May 23, 2016
Added more logging from oms101

Signed-off-by: Yuri Weinstein <yweinste@redhat.com>
oms4suse added 2 commits May 23, 2016 19:14

Improve logging message so it is clear the error is not connecting the mon

Signed-off-by: Owen Synge <osynge@suse.com>

We should always use the short hostname for mon interactions.

Signed-off-by: Owen Synge <osynge@suse.com>
@oms4suse oms4suse force-pushed the fix-remove-ceph-create-keys-dep branch from ffdda90 to 246e3b9 Compare May 23, 2016 17:41
@oms4suse
Contributor Author

We found an interesting issue thanks to the extra logging.

When running with the test suite:

./ceph-deploy mon create-initial

We get this log message:

[ceph_deploy.gatherkeys][INFO  ] Found keyring at mira046:/var/lib/ceph/mon/ceph-mira046/keyring

When the test suite runs as:

'cd /home/ubuntu/cephtest/ceph-deploy && ./ceph-deploy gatherkeys mira046.front.sepia.ceph.com'

We get this error:

[WARNING] No mon key not found mira046.front.sepia.ceph.com:/var/lib/ceph/mon/ceph-mira046.front.sepia.ceph.com/keyring

This clearly shows that we are looking for a keyring under the long hostname, which I think is wrong.

Hence the last patch, "ceph_deploy.gatherkeys: Normalize hostname".

This should now pass all tests provided github is not broken.
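
The mismatch between the two log lines above can be reproduced with a small sketch. It is illustrative only: `short_hostname` and `mon_keyring_path` are hypothetical helpers that derive the short name by splitting the string locally, whereas the patch itself asks the remote host via `distro.conn.remote_module.shortname()`. The path layout is taken from the log messages.

```python
def short_hostname(host):
    """Return the part of a hostname before the first dot."""
    return host.split(".", 1)[0]


def mon_keyring_path(cluster, host):
    """Build the mon keyring path from the short hostname.

    Layout taken from the log lines in this comment:
    /var/lib/ceph/mon/<cluster>-<short hostname>/keyring
    """
    return "/var/lib/ceph/mon/%s-%s/keyring" % (cluster, short_hostname(host))


# The long and short forms of the same host resolve to the same path.
from_fqdn = mon_keyring_path("ceph", "mira046.front.sepia.ceph.com")
from_short = mon_keyring_path("ceph", "mira046")
```

Both calls give the path reported by the successful `mon create-initial` run, which is the behaviour the normalization is meant to guarantee.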

Since gatherkeys now uses distro.conn.remote_module.shortname()
We need to mock this also.

Signed-off-by: Owen Synge <osynge@suse.com>
@oms4suse
Contributor Author

Since gatherkeys now uses distro.conn.remote_module.shortname()
We had to mock this in the tests.
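
A rough sketch of what mocking that call can look like, assuming Python 3's `unittest.mock` (the standalone `mock` package offers the same interface on Python 2). `keyring_path_for` is a hypothetical function written for this example. Only the attribute chain `distro.conn.remote_module.shortname()` comes from the conversation.

```python
from unittest import mock


def keyring_path_for(distro, cluster):
    """Hypothetical code under test: asks the remote host for its short name."""
    shortname = distro.conn.remote_module.shortname()
    return "/var/lib/ceph/mon/%s-%s/keyring" % (cluster, shortname)


# MagicMock creates the nested attributes on demand, so only the final
# call needs a configured return value.
distro = mock.MagicMock()
distro.conn.remote_module.shortname.return_value = "mira046"

path = keyring_path_for(distro, "ceph")

# Verify the remote call was made exactly once, with no arguments.
distro.conn.remote_module.shortname.assert_called_once_with()
```

Without the configured `return_value`, `shortname()` would return another `MagicMock`, and the formatted path would contain the mock's repr instead of a hostname.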

@yuriw

yuriw commented May 24, 2016

All tests passed on the latest ceph-deploy wip-test-PR393-master:
http://pulpito.ceph.com/yuriw-2016-05-24_08:19:19-ceph-deploy-jewel-distro-basic-vps/

@alfredodeza alfredodeza merged commit 6f33173 into ceph:master May 24, 2016
@smithfarm smithfarm deleted the fix-remove-ceph-create-keys-dep branch June 30, 2016 17:03