
rolling_update: migrate ceph-disk osds to ceph-volume #3727

Merged · 5 commits merged into master on Apr 18, 2019

Conversation

andrewschoen
Contributor

When upgrading to nautilus, run ceph-volume simple scan and
ceph-volume simple activate --all to migrate any running
ceph-disk osds to ceph-volume.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1656460

Signed-off-by: Andrew Schoen aschoen@redhat.com

Contributor

@dsavineau dsavineau left a comment


I think you need to add a condition because this will only work for non-containerized deployment.

@andrewschoen
Contributor Author

> I think you need to add a condition because this will only work for non-containerized deployment.

You're right, I've got that added now.
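
For reference, a minimal sketch of what such conditional tasks could look like in rolling_update.yml. The task names match the run log below, but the `containerized_deployment` and `ceph_release` variables and the exact condition syntax are assumptions for illustration, not a verbatim copy of this PR:

```yaml
# Sketch only: skip the ceph-disk migration on containerized deployments,
# where ceph-volume runs inside the container rather than on the host.
- name: scan ceph-disk osds with ceph-volume if deploying nautilus
  command: "ceph-volume --cluster={{ cluster }} simple scan"
  when:
    - not containerized_deployment | bool
    - ceph_release == "nautilus"

- name: activate scanned ceph-disk osds and migrate to ceph-volume if deploying nautilus
  command: "ceph-volume --cluster={{ cluster }} simple activate --all"
  when:
    - not containerized_deployment | bool
    - ceph_release == "nautilus"
```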

@andrewschoen
Contributor Author

The dev-centos-non_container-update test is passing now, but it's deploying mimic initially with ceph-volume created osds. It's good to test that the updates here will noop when run against a ceph-volume created osd, but I'd like to add another testing scenario that upgrades from luminous with ceph-disk created osds to nautilus.

@andrewschoen andrewschoen force-pushed the ceph-disk-update branch 5 times, most recently from 30ca2ba to c9f9446 Compare April 1, 2019 14:36
@andrewschoen
Contributor Author

I manually ran the new testing scenario added here and it was successful. https://2.jenkins.ceph.com/job/ceph-ansible-scenario/548/consoleFull

You can see the ceph-volume simple commands in the log here:

TASK [scan ceph-disk osds with ceph-volume if deploying nautilus] **************
task path: /home/jenkins-build/build/workspace/ceph-ansible-scenario/rolling_update.yml:422
Monday 01 April 2019  15:17:30 +0000 (0:00:00.054)       0:14:48.096 ********** 
changed: [osd0] => changed=true 
  cmd:
  - ceph-volume
  - --cluster=test
  - simple
  - scan
  delta: '0:00:01.706350'
  end: '2019-04-01 15:17:34.692910'
  rc: 0
  start: '2019-04-01 15:17:32.986560'
  stderr: ''
  stderr_lines: []
  stdout: |2-
     stderr: lsblk: /var/lib/ceph/osd/test-0: not a block device
     stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or /sys expected.
    Running command: /usr/sbin/cryptsetup status /dev/sdb1
    --> OSD 0 got scanned and metadata persisted to file: /etc/ceph/osd/0-c93e4979-6ff6-4d7e-aba3-34c4c3bb7184.json
    --> To take over management of this scanned OSD, and disable ceph-disk and udev, run:
    -->     ceph-volume simple activate 0 c93e4979-6ff6-4d7e-aba3-34c4c3bb7184
     stderr: lsblk: /var/lib/ceph/osd/test-1: not a block device
     stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or /sys expected.
    Running command: /usr/sbin/cryptsetup status /dev/sdc1
    --> OSD 1 got scanned and metadata persisted to file: /etc/ceph/osd/1-1e090b86-0ce4-4db1-90e9-071873b17d2d.json
    --> To take over management of this scanned OSD, and disable ceph-disk and udev, run:
    -->     ceph-volume simple activate 1 1e090b86-0ce4-4db1-90e9-071873b17d2d
  stdout_lines: <omitted>

TASK [activate scanned ceph-disk osds and migrate to ceph-volume if deploying nautilus] ***
task path: /home/jenkins-build/build/workspace/ceph-ansible-scenario/rolling_update.yml:430
Monday 01 April 2019  15:17:35 +0000 (0:00:04.542)       0:14:52.639 ********** 
changed: [osd0] => changed=true 
  cmd:
  - ceph-volume
  - --cluster=test
  - simple
  - activate
  - --all
  delta: '0:00:01.209180'
  end: '2019-04-01 15:17:38.771386'
  rc: 0
  start: '2019-04-01 15:17:37.562206'
  stderr: ''
  stderr_lines: []
  stdout: |-
    --> activating OSD specified in /etc/ceph/osd/0-c93e4979-6ff6-4d7e-aba3-34c4c3bb7184.json
    Running command: /bin/ln -snf /dev/sdb2 /var/lib/ceph/osd/test-0/block
    Running command: /bin/chown -R ceph:ceph /dev/sdb2
    Running command: /bin/systemctl enable ceph-volume@simple-0-c93e4979-6ff6-4d7e-aba3-34c4c3bb7184
     stderr: Created symlink from /etc/systemd/system/multi-user.target.wants/ceph-volume@simple-0-c93e4979-6ff6-4d7e-aba3-34c4c3bb7184.service to /usr/lib/systemd/system/ceph-volume@.service.
    Running command: /bin/ln -sf /dev/null /etc/systemd/system/ceph-disk@.service
    --> All ceph-disk systemd units have been disabled to prevent OSDs getting triggered by UDEV events
    Running command: /bin/systemctl enable --runtime ceph-osd@0
    Running command: /bin/systemctl start ceph-osd@0
    --> Successfully activated OSD 0 with FSID c93e4979-6ff6-4d7e-aba3-34c4c3bb7184
    --> activating OSD specified in /etc/ceph/osd/1-1e090b86-0ce4-4db1-90e9-071873b17d2d.json
    Running command: /bin/ln -snf /dev/sdc2 /var/lib/ceph/osd/test-1/block
    Running command: /bin/chown -R ceph:ceph /dev/sdc2
    Running command: /bin/systemctl enable ceph-volume@simple-1-1e090b86-0ce4-4db1-90e9-071873b17d2d
     stderr: Created symlink from /etc/systemd/system/multi-user.target.wants/ceph-volume@simple-1-1e090b86-0ce4-4db1-90e9-071873b17d2d.service to /usr/lib/systemd/system/ceph-volume@.service.
    Running command: /bin/ln -sf /dev/null /etc/systemd/system/ceph-disk@.service
    --> All ceph-disk systemd units have been disabled to prevent OSDs getting triggered by UDEV events
    Running command: /bin/systemctl enable --runtime ceph-osd@1
    Running command: /bin/systemctl start ceph-osd@1
    --> Successfully activated OSD 1 with FSID 1e090b86-0ce4-4db1-90e9-071873b17d2d
  stdout_lines: <omitted>

There is a caveat to upgrading to nautilus: for the upgrade to complete, the user must switch osd_scenario to lvm. If the devices config option was previously used to deploy ceph-disk osds, those devices cannot be reused with ceph-volume. To get around this I've made it so no new osds will be created during an update. Afterwards, the user will need to remove from devices any entries that were previously used by ceph-disk. These devices cannot be used by ceph-volume because 1) they are already running OSDs and 2) they are rejected immediately because they have GPT headers.
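
As a hypothetical way to identify which entries in devices still carry ceph-disk GPT headers, one could run an ad-hoc task such as the following. This is not part of this PR; the blkid invocation and variable names are assumptions:

```yaml
# Hypothetical helper, not part of this PR: report the partition table
# type of every configured device. Entries that print "gpt" were used
# by ceph-disk and must be removed from `devices` before ceph-volume
# can create new osds on them.
- name: report partition table type for each device
  command: "blkid -p -o value -s PTTYPE {{ item }}"
  loop: "{{ devices }}"
  register: pttype_results
  changed_when: false
  failed_when: false
```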

Collaborator

@guits guits left a comment


The idea in ceph-ansible@master is to test an upgrade from ceph@nautilus to ceph@dev

The same scenario to upgrade ceph@luminous to ceph@nautilus will be tested in ceph-ansible@stable-4.0

(6 review comments on tox.ini, resolved)
@andrewschoen
Contributor Author

> The idea in ceph-ansible@master is to test an upgrade from ceph@nautilus to ceph@dev
>
> The same scenario to upgrade ceph@luminous to ceph@nautilus will be tested in ceph-ansible@stable-4.0

What value are we really gaining by running tests that upgrade from nautilus to ceph@master? The point is that I must upgrade from luminous (or mimic) to nautilus so that I can verify that the new ceph-volume simple additions are working correctly. Deploying nautilus initially doesn't give me ceph-disk created OSDs to test with.

@guits
Collaborator

guits commented Apr 9, 2019

so it should be done in stable-4.0 branch, you upgrade from luminous to nautilus

@andrewschoen
Contributor Author

> so it should be done in stable-4.0 branch, you upgrade from luminous to nautilus

I'm not sure I understand. Are you saying that I should deploy luminous with stable-3.2 and then deploy nautilus with stable-4.0?

@guits guits requested a review from dsavineau April 10, 2019 16:30
@guits
Collaborator

guits commented Apr 11, 2019

jenkins test pipeline

(5 review comments on infrastructure-playbooks/rolling_update.yml, outdated and resolved)
When upgrading to nautilus run ``ceph-volume simple scan`` and
``ceph-volume simple activate --all`` to migrate any running
ceph-disk osds to ceph-volume.

Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1656460

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
This test deploys a luminous cluster with ceph-disk created osds
and then upgrades to nautilus and migrates those osds to ceph-volume.
The nodes are then rebooted and cluster state verified.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
When performing a rolling update do not try to create
any new osds with `ceph-volume lvm batch`. This is troublesome
because when upgrading to nautilus the devices list might contain
devices that are currently being used by ceph-disk and have GPT
headers on them, which will cause ceph-volume to fail when
trying to use such a device. Any devices originally created
by ceph-disk will need to be removed from the devices list
before any new osds can be created.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
We do this so that the ceph-config role can most accurately
report the number of osds for the generation of the ceph.conf
file.

We don't want to use ceph-volume to determine the number of
osds because in an upgrade to nautilus ceph-volume won't be able to
accurately count osds created by ceph-disk.

Signed-off-by: Andrew Schoen <aschoen@redhat.com>
Signed-off-by: Andrew Schoen <aschoen@redhat.com>
@andrewschoen
Contributor Author

@guits I think I've addressed everything here, would you mind taking another look? I reran my test manually and it still succeeds as well. https://2.jenkins.ceph.com/job/ceph-ansible-scenario/572/consoleFull

@guits
Collaborator

guits commented Apr 18, 2019

failures are unrelated, merging anyway

@guits guits merged commit e2529dc into master Apr 18, 2019
@guits guits deleted the ceph-disk-update branch April 18, 2019 08:55