
mon: don't blow away bootstrap-mgr on upgrades #18399

Merged: 1 commit into ceph:master on Nov 2, 2017
Conversation

@jcsp (Contributor, Author) commented Oct 19, 2017

@vasukulkarni please could you try the upgrade suite on this?

// ceph-create-keys)
EntityName bootstrap_mgr_name;
bootstrap_mgr_name.from_str("client.bootstrap-mgr");
if (!mon->key_server.contains(bootstrap_mgr_name)) {
KeyServerData::Incremental auth_inc;
bool r = auth_inc.name.from_str("client.bootstrap-mgr");
Review comment (Member):

Can't we reuse bootstrap_mgr_name here for auth_inc.name?

@jcsp (Contributor, Author) replied:
Sure! done.

@vasukulkarni (Contributor) commented:

@jcsp sure, I will remove the workaround I added here (https://github.com/ceph/ceph/blob/master/qa/tasks/ceph_deploy.py#L767-L779), retest with that branch, and update the results here.

@vasukulkarni (Contributor) commented:

@jcsp this will need a shaman build; can you please push this branch to cephci?

Fixes: http://tracker.ceph.com/issues/20950
Signed-off-by: John Spray <john.spray@redhat.com>
vasukulkarni added a commit that referenced this pull request Oct 19, 2017
fixed by #18399

Signed-off-by: Vasu Kulkarni <vasu@redhat.com>
@vasukulkarni (Contributor) commented:

Haven't seen this before: after the upgrade it tries to restart the service, but the restart failed. The mgr node is installed after the service restart, so I can try to shuffle the restart order, but this looks new since it worked before.

http://qa-proxy.ceph.com/teuthology/vasu-2017-10-19_20:39:33-upgrade:jewel-x:ceph-deploy:-wip-20950-distro-basic-vps/1752183/teuthology.log

2017-10-19T21:12:26.248 INFO:teuthology.orchestra.run.vpm011.stderr:[vpm181][DEBUG ]   python-cephfs.x86_64 2:13.0.0-2169.g57229ea.el7
2017-10-19T21:12:26.248 INFO:teuthology.orchestra.run.vpm011.stderr:[vpm181][DEBUG ]   python-rados.x86_64 2:13.0.0-2169.g57229ea.el7
2017-10-19T21:12:26.248 INFO:teuthology.orchestra.run.vpm011.stderr:[vpm181][DEBUG ]   python-rbd.x86_64 2:13.0.0-2169.g57229ea.el7
2017-10-19T21:12:26.248 INFO:teuthology.orchestra.run.vpm011.stderr:[vpm181][DEBUG ]
2017-10-19T21:12:26.248 INFO:teuthology.orchestra.run.vpm011.stderr:[vpm181][DEBUG ] Replaced:
2017-10-19T21:12:26.248 INFO:teuthology.orchestra.run.vpm011.stderr:[vpm181][DEBUG ]   libcephfs1.x86_64 1:10.2.10-0.el7
2017-10-19T21:12:26.248 INFO:teuthology.orchestra.run.vpm011.stderr:[vpm181][DEBUG ]
2017-10-19T21:12:26.248 INFO:teuthology.orchestra.run.vpm011.stderr:[vpm181][DEBUG ] Complete!
2017-10-19T21:12:26.367 INFO:teuthology.orchestra.run.vpm011.stderr:[vpm181][INFO  ] Running command: sudo ceph --version
2017-10-19T21:12:26.492 INFO:teuthology.orchestra.run.vpm011.stderr:[vpm181][DEBUG ] ceph version 13.0.0-2169-g57229ea (57229ea2a4369518c7a16b7a09b045b7896f5a70) mimic (dev)
2017-10-19T21:12:26.498 INFO:teuthology.orchestra.run.vpm181:Running: 'sudo systemctl restart ceph.target'
2017-10-19T21:12:26.594 INFO:teuthology.orchestra.run.vpm011:Running: 'sudo ceph -s'
2017-10-19T21:17:26.753 INFO:teuthology.orchestra.run.vpm011.stderr:2017-10-19 21:17:26.754 7ff3696b4700  0 monclient(hunting): authenticate timed out after 300
2017-10-19T21:17:26.790 INFO:teuthology.orchestra.run.vpm011.stderr:2017-10-19 21:17:26.754 7ff3696b4700  0 librados: client.admin authentication error (110) Connection timed out
2017-10-19T21:17:26.791 INFO:teuthology.orchestra.run.vpm011.stderr:[errno 110] error connecting to the cluster
2017-10-19T21:17:26.791 ERROR:teuthology.run_tasks:Saw exception from tasks.

@vasukulkarni (Contributor) commented:

Tried one more run with some minor modifications but can't get it working; maybe this is a 13.x issue, since previously the upgrade was tested from jewel -> 12.x. I don't see any monitor crash in any of the logs.

http://pulpito.ceph.com/vasu-2017-10-26_21:09:34-upgrade:jewel-x:ceph-deploy:-wip-20950-distro-basic-vps/

@jcsp (Contributor, Author) commented Oct 31, 2017

@vasukulkarni can you try testing this cherry-picked onto luminous please?

@vasukulkarni (Contributor) commented:

I will pick this up on luminous and test and update here.

@vasukulkarni (Contributor) commented:

Tested this after cherry-picking the commit on top of luminous and without the mgr workaround and it is working fine, full logs here: http://pulpito.ceph.com/vasu-2017-11-02_00:30:23-upgrade:jewel-x:ceph-deploy:-wip-qa-mgr-testing-distro-basic-vps/1800269/

@jcsp jcsp merged commit 737877f into ceph:master Nov 2, 2017
@jcsp jcsp deleted the wip-20950 branch November 2, 2017 10:37