
osd.remove fails to zap devices on ceph version 14.2.3-349 #1747

Open
jschmid1 opened this issue Sep 16, 2019 · 10 comments

@jschmid1
Contributor

jschmid1 commented Sep 16, 2019

ceph version 14.2.3-349-g7b1552ea82 (7b1552ea827cf5167b6edbba96dd1c4a9dc16937) nautilus (stable)

salt-run osd.remove $id

uses ceph-volume lvm zap --osd-id $id --destroy to zap a disk remotely on the minion.

In previous releases we expected the string "Zapping successful for OSD" in the return message.

With this release we get: --> Zapping: /dev/ceph-a8e4a78d-e3....

Since there are no significant changes that would indicate a change in the return string, I assume it's due to the logging changes in recent commits (mlogger vs. terminal.success).
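
For context, the failing check on the DeepSea side is roughly of this shape (illustrative sketch only; the function name and structure are assumptions, not the runner's actual code):

```python
def zap_succeeded(cv_output: str) -> bool:
    # Pre-14.2.3 the marker below appeared in the captured ceph-volume output;
    # with 14.2.3-349 only "--> Zapping: /dev/..." comes back, so this fails.
    return "Zapping successful" in cv_output
```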

This raises the question of whether invoking shell commands is the right approach when there is a Python API we could use. A couple of things need to be verified first, though (see the sketch after this list):

  1. Is the zap command consumable via the API?
  2. Does it return meaningful messages?
  3. Is it more efficient, given that it needs to be wrapped in a minion module to be callable from the master (via a runner)?
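
A rough sketch of what such a minion module could look like, assuming ceph_volume.devices.lvm.zap exposes a Zap class that takes an argv-style list and runs via main(); that assumption is exactly what point 1 needs to verify:

```python
# Hypothetical minion execution module; not DeepSea code.
from ceph_volume.devices.lvm import zap


def zap_osd(osd_id):
    """Zap (and destroy) all devices belonging to the given OSD id."""
    try:
        # Assumed equivalent of `ceph-volume lvm zap --osd-id <id> --destroy`.
        zap.Zap(['--osd-id', str(osd_id), '--destroy']).main()
        return {'success': True}
    except Exception as err:
        # Point 2: whether this surfaces anything more meaningful than the
        # terminal output is an open question.
        return {'success': False, 'error': str(err)}
```
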
@jschmid1 jschmid1 added the bug label Sep 16, 2019
@jschmid1 jschmid1 self-assigned this Sep 16, 2019
@smithfarm
Contributor

Ideally, DeepSea will still work with earlier versions of nautilus even after this is fixed.

@jschmid1
Contributor Author

Maybe @jan--f can confirm my assumption regarding the logging changes.

@jan--f
Contributor

jan--f commented Sep 16, 2019

In 14.2.3, ceph-volume should only print logging messages to stderr. I guess the runner returns stderr? I'll look into it.
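
A quick way to check which stream the arrow-prefixed messages land on (plain subprocess run directly on a minion; /dev/vdb is just an example device, and this is not the actual runner plumbing):

```python
import subprocess

proc = subprocess.run(
    ['ceph-volume', 'lvm', 'zap', '--destroy', '/dev/vdb'],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,
)
print('rc:', proc.returncode)
print('stdout:\n' + proc.stdout)
print('stderr:\n' + proc.stderr)  # the "--> ..." log lines are expected here
```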

@jan--f
Contributor

jan--f commented Sep 16, 2019

Hmm, I just saw this: I ran salt '*' cmd.run 'for d in b c d e f; do ceph-volume lvm zap --destroy /dev/vd$d; done' from the salt master and got output like:

data2-6.virt1.home.fajerski.name:
    --> Zapping: /dev/vdb
    --> Zapping: /dev/vdc
    --> Zapping: /dev/vdd
    --> Zapping: /dev/vde
    --> Zapping: /dev/vdf

But a subsequent lsblk revealed that no LV got zapped. Running the same command directly on the minion (minus the salt part, of course) zapped the disks just fine. No idea what is going on here; I will investigate more tomorrow.

@jan--f
Contributor

jan--f commented Sep 17, 2019

Here is the issue:

[2019-09-17 11:01:39,104][ceph_volume][ERROR ] exception caught by decorator
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 59, in newfunc
    return f(*a, **kw)
  File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 148, in main
    terminal.dispatch(self.mapper, subcommand_args)
  File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 205, in dispatch
    instance.main()
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/main.py", line 40, in main
    terminal.dispatch(self.mapper, self.argv)
  File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 205, in dispatch
    instance.main()
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/zap.py", line 355, in main
    self.zap()
  File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/zap.py", line 233, in zap
    self.zap_lvm_member(device)
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/zap.py", line 198, in zap_lvm_member
    self.zap_lv(Device(lv.lv_path))
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/zap.py", line 144, in zap_lv
    self.unmount_lv(lv)
  File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/zap.py", line 133, in unmount_lv
    mlogger.info("Unmounting %s", lv_path)
  File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 190, in info
    info(record)
  File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 142, in info
    return _Write(prefix=blue_arrow).raw(msg)
  File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 117, in raw
    self.write(string)
  File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 120, in write
    self._writer.write(self.prefix + line + self.suffix)
ValueError: I/O operation on closed file.
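
The ValueError itself is just Python refusing to write to a stream that has already been closed; a minimal standalone reproduction of the failure mode (not ceph-volume's actual code) is:

```python
import io

writer = io.StringIO()   # stand-in for the terminal writer ceph_volume keeps around
writer.close()           # the hosting process (here: the salt minion) closes the stream
try:
    writer.write("--> Zapping: /dev/vdb\n")
except ValueError as err:
    print(err)           # I/O operation on closed file.
```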

@smithfarm
Contributor

So it would seem that in the CI tests which are now passing with the temporary fix, the OSDs are being removed but the underlying disk is not really getting zapped (AFAIK the tests do not include any logic for verifying the zap).
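
If we wanted the tests to verify the zap, a check along these lines could work (hedged sketch, not existing DeepSea test code): after zapping, no ceph-* LVM volume groups should remain on the node.

```python
import subprocess

def ceph_vgs_remaining():
    """Return any ceph-* LVM volume groups still present on this node."""
    out = subprocess.run(
        ['vgs', '--noheadings', '-o', 'vg_name'],
        stdout=subprocess.PIPE, universal_newlines=True, check=True,
    ).stdout
    return [vg.strip() for vg in out.splitlines() if vg.strip().startswith('ceph-')]

# In a test: assert ceph_vgs_remaining() == []
```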

@jan--f
Contributor

jan--f commented Sep 17, 2019

> So it would seem that in the CI tests which are now passing with the temporary fix, the OSDs are being removed but the underlying disk is not really getting zapped (AFAIK the tests do not include any logic for verifying the zap).

That would be entirely plausible.

@jan--f
Contributor

jan--f commented Sep 17, 2019

OK, I can confirm that the ceph build fixes this issue. However, there still seems to be something up with purge, where the OSDs are not stopped; once they are stopped, zapping them works just fine.

salt 'data1*' cmd.run 'for d in b c d e f; do ceph-volume lvm zap --destroy /dev/vd$d; echo $?; done'
data1-6.virt1.home.fajerski.name:
    --> Zapping: /dev/vdb
    --> Unmounting /var/lib/ceph/osd/ceph-3
    Running command: /bin/umount -v /var/lib/ceph/osd/ceph-3
     stderr: umount: /var/lib/ceph/osd/ceph-3 unmounted
    Running command: /usr/sbin/wipefs --all /dev/ceph-58736349-a5d6-4966-9598-f7ed4082441b/osd-data-bcb8791a-954d-417b-b016-fa08d9a62885
    Running command: /bin/dd if=/dev/zero of=/dev/ceph-58736349-a5d6-4966-9598-f7ed4082441b/osd-data-bcb8791a-954d-417b-b016-fa08d9a62885 bs=1M count=10
    --> Only 1 LV left in VG, will proceed to destroy volume group ceph-58736349-a5d6-4966-9598-f7ed4082441b
    Running command: /usr/sbin/vgremove -v -f ceph-58736349-a5d6-4966-9598-f7ed4082441b
     stderr: Removing ceph--58736349--a5d6--4966--9598--f7ed4082441b-osd--data--bcb8791a--954d--417b--b016--fa08d9a62885 (253:2)
     stderr: Archiving volume group "ceph-58736349-a5d6-4966-9598-f7ed4082441b" metadata (seqno 21).
        Releasing logical volume "osd-data-bcb8791a-954d-417b-b016-fa08d9a62885"
     stderr: Creating volume group backup "/etc/lvm/backup/ceph-58736349-a5d6-4966-9598-f7ed4082441b" (seqno 22).
     stdout: Logical volume "osd-data-bcb8791a-954d-417b-b016-fa08d9a62885" successfully removed
     stderr: Removing physical volume "/dev/vdb" from volume group "ceph-58736349-a5d6-4966-9598-f7ed4082441b"
     stdout: Volume group "ceph-58736349-a5d6-4966-9598-f7ed4082441b" successfully removed
    Running command: /usr/sbin/wipefs --all /dev/vdb
     stdout: /dev/vdb: 8 bytes were erased at offset 0x00000218 (LVM2_member): 4c 56 4d 32 20 30 30 31
    Running command: /bin/dd if=/dev/zero of=/dev/vdb bs=1M count=10
    --> Zapping successful for: <Raw Device: /dev/vdb>
    0
    --> Zapping: /dev/vdc
    --> Unmounting /var/lib/ceph/osd/ceph-9
    Running command: /bin/umount -v /var/lib/ceph/osd/ceph-9
     stderr: umount: /var/lib/ceph/osd/ceph-9 unmounted
    Running command: /usr/sbin/wipefs --all /dev/ceph-df617cae-42bb-43ca-97ba-0d01b8ef2d39/osd-data-b8281429-e4d0-4d6e-ac64-38dae8d1a270
    Running command: /bin/dd if=/dev/zero of=/dev/ceph-df617cae-42bb-43ca-97ba-0d01b8ef2d39/osd-data-b8281429-e4d0-4d6e-ac64-38dae8d1a270 bs=1M count=10
    --> Only 1 LV left in VG, will proceed to destroy volume group ceph-df617cae-42bb-43ca-97ba-0d01b8ef2d39
    Running command: /usr/sbin/vgremove -v -f ceph-df617cae-42bb-43ca-97ba-0d01b8ef2d39
     stderr: Removing ceph--df617cae--42bb--43ca--97ba--0d01b8ef2d39-osd--data--b8281429--e4d0--4d6e--ac64--38dae8d1a270 (253:4)
     stderr: Archiving volume group "ceph-df617cae-42bb-43ca-97ba-0d01b8ef2d39" metadata (seqno 21).
        Releasing logical volume "osd-data-b8281429-e4d0-4d6e-ac64-38dae8d1a270"
     stderr: Creating volume group backup "/etc/lvm/backup/ceph-df617cae-42bb-43ca-97ba-0d01b8ef2d39" (seqno 22).
     stdout: Logical volume "osd-data-b8281429-e4d0-4d6e-ac64-38dae8d1a270" successfully removed
     stderr: Removing physical volume "/dev/vdc" from volume group "ceph-df617cae-42bb-43ca-97ba-0d01b8ef2d39"
     stdout: Volume group "ceph-df617cae-42bb-43ca-97ba-0d01b8ef2d39" successfully removed
    Running command: /usr/sbin/wipefs --all /dev/vdc
     stdout: /dev/vdc: 8 bytes were erased at offset 0x00000218 (LVM2_member): 4c 56 4d 32 20 30 30 31
    Running command: /bin/dd if=/dev/zero of=/dev/vdc bs=1M count=10
    --> Zapping successful for: <Raw Device: /dev/vdc>
    0
    --> Zapping: /dev/vdd
    --> Unmounting /var/lib/ceph/osd/ceph-14
    Running command: /bin/umount -v /var/lib/ceph/osd/ceph-14
     stderr: umount: /var/lib/ceph/osd/ceph-14 unmounted
    Running command: /usr/sbin/wipefs --all /dev/ceph-686e5dc6-00e8-46ba-b109-d93623a9f60d/osd-data-65a0439a-9543-4e7b-a94f-11a2ff373241
    Running command: /bin/dd if=/dev/zero of=/dev/ceph-686e5dc6-00e8-46ba-b109-d93623a9f60d/osd-data-65a0439a-9543-4e7b-a94f-11a2ff373241 bs=1M count=10
    --> Only 1 LV left in VG, will proceed to destroy volume group ceph-686e5dc6-00e8-46ba-b109-d93623a9f60d
    Running command: /usr/sbin/vgremove -v -f ceph-686e5dc6-00e8-46ba-b109-d93623a9f60d
     stderr: Removing ceph--686e5dc6--00e8--46ba--b109--d93623a9f60d-osd--data--65a0439a--9543--4e7b--a94f--11a2ff373241 (253:0)
     stderr: Archiving volume group "ceph-686e5dc6-00e8-46ba-b109-d93623a9f60d" metadata (seqno 21).
        Releasing logical volume "osd-data-65a0439a-9543-4e7b-a94f-11a2ff373241"
     stderr: Creating volume group backup "/etc/lvm/backup/ceph-686e5dc6-00e8-46ba-b109-d93623a9f60d" (seqno 22).
     stdout: Logical volume "osd-data-65a0439a-9543-4e7b-a94f-11a2ff373241" successfully removed
     stderr: Removing physical volume "/dev/vdd" from volume group "ceph-686e5dc6-00e8-46ba-b109-d93623a9f60d"
     stdout: Volume group "ceph-686e5dc6-00e8-46ba-b109-d93623a9f60d" successfully removed
    Running command: /usr/sbin/wipefs --all /dev/vdd
     stdout: /dev/vdd: 8 bytes were erased at offset 0x00000218 (LVM2_member): 4c 56 4d 32 20 30 30 31
    Running command: /bin/dd if=/dev/zero of=/dev/vdd bs=1M count=10
    --> Zapping successful for: <Raw Device: /dev/vdd>
    0
    --> Zapping: /dev/vde
    Running command: /usr/sbin/wipefs --all /dev/vde
     stdout: /dev/vde: 8 bytes were erased at offset 0x00000218 (LVM2_member): 4c 56 4d 32 20 30 30 31
    Running command: /bin/dd if=/dev/zero of=/dev/vde bs=1M count=10
    --> Zapping successful for: <Raw Device: /dev/vde>
    0
    --> Zapping: /dev/vdf
    --> Unmounting /var/lib/ceph/osd/ceph-24
    Running command: /bin/umount -v /var/lib/ceph/osd/ceph-24
     stderr: umount: /var/lib/ceph/osd/ceph-24 unmounted
    Running command: /usr/sbin/wipefs --all /dev/ceph-860f342f-8a0c-4746-a906-0eb32c9847dc/osd-data-9c48ec80-46f9-4184-ad77-fd4ac6e94126
    Running command: /bin/dd if=/dev/zero of=/dev/ceph-860f342f-8a0c-4746-a906-0eb32c9847dc/osd-data-9c48ec80-46f9-4184-ad77-fd4ac6e94126 bs=1M count=10
    --> Only 1 LV left in VG, will proceed to destroy volume group ceph-860f342f-8a0c-4746-a906-0eb32c9847dc
    Running command: /usr/sbin/vgremove -v -f ceph-860f342f-8a0c-4746-a906-0eb32c9847dc
     stderr: Removing ceph--860f342f--8a0c--4746--a906--0eb32c9847dc-osd--data--9c48ec80--46f9--4184--ad77--fd4ac6e94126 (253:1)
     stderr: Archiving volume group "ceph-860f342f-8a0c-4746-a906-0eb32c9847dc" metadata (seqno 21).
        Releasing logical volume "osd-data-9c48ec80-46f9-4184-ad77-fd4ac6e94126"
     stderr: Creating volume group backup "/etc/lvm/backup/ceph-860f342f-8a0c-4746-a906-0eb32c9847dc" (seqno 22).
     stdout: Logical volume "osd-data-9c48ec80-46f9-4184-ad77-fd4ac6e94126" successfully removed
     stderr: Removing physical volume "/dev/vdf" from volume group "ceph-860f342f-8a0c-4746-a906-0eb32c9847dc"
     stdout: Volume group "ceph-860f342f-8a0c-4746-a906-0eb32c9847dc" successfully removed
    Running command: /usr/sbin/wipefs --all /dev/vdf
     stdout: /dev/vdf: 8 bytes were erased at offset 0x00000218 (LVM2_member): 4c 56 4d 32 20 30 30 31
    Running command: /bin/dd if=/dev/zero of=/dev/vdf bs=1M count=10
    --> Zapping successful for: <Raw Device: /dev/vdf>
    0
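
So the order of operations that works is: stop the OSD first, then zap its devices. A hypothetical helper (not DeepSea's purge code) would look roughly like:

```python
import subprocess

def stop_and_zap(osd_id, device):
    # Stop the OSD unit so its LVs are no longer in use, then zap the device.
    subprocess.run(['systemctl', 'stop', 'ceph-osd@{}'.format(osd_id)], check=True)
    subprocess.run(['ceph-volume', 'lvm', 'zap', '--destroy', device], check=True)
```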

@smithfarm
Contributor

I can also confirm that this problem doesn't happen with 14.2.4, which would indicate this is just another symptom of the ceph-volume regression that found its way into 14.2.3.

@smithfarm
Contributor

. . . and since users on SUSE will not see 14.2.3, there's no reason for DeepSea to do anything special to work around that ceph-volume regression.
