
Cannot create snapshot #108

Closed
shootkin opened this issue Feb 27, 2020 · 10 comments

Comments

@shootkin

Hello!

My environment:

Ubuntu 18.04
1 controller and 1 satellite on separate nodes
version: 1.4.1
storage pool: thinpool

Bug description:

I create a volume with these commands:

linstor resource-definition create backups
linstor volume-definition create backups 5G
linstor resource create backups --auto-place 1 --storage-pool linstor-pool-remote

After that, I successfully create an XFS filesystem on top of the created DRBD device and mount it to a folder. But when I try to create a snapshot with this command:

linstor snapshot create backups snap1

the command hangs, and linstor s l shows the snapshot in the Incomplete state.
Moreover, when I try these commands on the satellite:

lvdisplay
fdisk -l

they hang as well, though the LINSTOR controller shows everything as fine (except for the snapshot, which is in the Incomplete state).

Please tell me how to overcome this.

@raltnoeder
Member

This sounds most likely like one of:

  • incorrectly configured LVM filters
  • a deadlocked OS kernel, due to a bug in DRBD, LVM or block I/O to physical devices

Going from there, I'd suggest:

  • end the running lvdisplay and fdisk processes (try Ctrl-C, SIGTERM, SIGABRT, SIGKILL)
  • if that doesn't work, reboot
  • check whether the LVM filters exclude DRBD devices (they should; otherwise LVM will attempt to scan DRBD devices that are in the Secondary role, which doesn't work); a filter sketch follows below
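
A minimal sketch of such a filter in /etc/lvm/lvm.conf (an illustration only, not the exact configuration from this setup; adjust the patterns to your devices):

  # devices { } section of /etc/lvm/lvm.conf
  # reject DRBD devices so LVM never scans them, accept everything else
  global_filter = [ "r|^/dev/drbd.*|", "a|.*|" ]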

@shootkin
Author

shootkin commented Mar 2, 2020

Thank you for the response!

  1. lvdisplay process is in D state, so I cannot kill it.
  2. I haven't rebooted it yet, but I checked the satellite logs; maybe they can tell you something:
    satellite:
[DeviceManager] ERROR LINSTOR/Satellite - SYSTEM - com.linbit.linstor.storage.StorageException: Failed to find major:minor of device /dev/drbd1000

/proc/lv_display_pid/stack:

[<0>] __drbd_make_request+0x293/0x5c0 [drbd]
[<0>] drbd_make_request+0x3e/0x70 [drbd]
[<0>] generic_make_request+0x124/0x300
[<0>] submit_bio+0x73/0x150
[<0>] __blkdev_direct_IO_simple+0x19d/0x360
[<0>] blkdev_direct_IO+0x3a2/0x3f0
[<0>] generic_file_read_iter+0xc6/0xbf0
[<0>] blkdev_read_iter+0x35/0x40
[<0>] new_sync_read+0xe4/0x130
[<0>] __vfs_read+0x29/0x40
[<0>] vfs_read+0x8e/0x130
[<0>] SyS_read+0x55/0xc0
[<0>] do_syscall_64+0x73/0x130
[<0>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[<0>] 0xffffffffffffffff
  3. I didn't configure any LVM filter; I just installed the lvm2 package and that's all. A week ago, I created a LINSTOR satellite with the same Ansible role as now and it worked fine.

@shootkin
Author

shootkin commented Mar 2, 2020

After the reboot, lvdisplay and fdisk -l are working again, but I don't want to reboot the satellite node each time :-) Do you have any suggestions?

@shootkin
Author

shootkin commented Mar 2, 2020

Also, after the reboot, when I run linstor volume list, I see this output:

root@linstor-controller-lc-795d778b49-qtckw:/# linstor v l
+--------------------------------------------------------------------------------------------------------------+
| Node      | Resource | StoragePool         | VolumeNr | MinorNr | DeviceName    | Allocated | InUse  | State |
|==============================================================================================================|
| linstor-1 | backups  | linstor-pool-remote | 0        | 1000    | /dev/drbd1000 |           | Unused | Error |
| linstor-1 | backups2 | linstor-pool-remote | 0        | 1001    | /dev/drbd1001 |           | Unused | Error |
+--------------------------------------------------------------------------------------------------------------+
ERROR:
Description:
    Node: 'linstor-1', resource: 'backups', volume: 0 - Device provider threw a storage exception
Details:
    Command 'blockdev --getsize64 /dev/vg-omni-data/backups_00000' returned with exitcode 1. 
    
    Standard out: 
    
    
    Error message: 
    blockdev: cannot open /dev/vg-omni-data/backups_00000: No such file or directory
    
ERROR:
Description:
    Node: 'linstor-1', resource: 'backups2', volume: 0 - Device provider threw a storage exception
Details:
    Command 'blockdev --getsize64 /dev/vg-omni-data/backups2_00000' returned with exitcode 1. 
    
    Standard out: 
    
    
    Error message: 
    blockdev: cannot open /dev/vg-omni-data/backups2_00000: No such file or directory

@shootkin
Author

shootkin commented Mar 2, 2020

BTW, lvdisplay shows this:

 --- Logical volume ---
  LV Path                /dev/vg-omni-data/backups_00000
  LV Name                backups_00000
  VG Name                vg-omni-data
  LV UUID                qszboM-STjm-5zLg-tGqx-ZjFW-hwSG-CwPQKp
  LV Write Access        read/write
  LV Creation host, time linstor-1, 2020-02-27 07:28:55 -0800
  LV Pool name           thinpool
  LV Status              available
  # open                 0
  LV Size                5.00 GiB
  Mapped size            0.23%
  Current LE             1281
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:6
   
  --- Logical volume ---
  LV Path                /dev/vg-omni-data/backups2_00000
  LV Name                backups2_00000
  VG Name                vg-omni-data
  LV UUID                WB4a28-Kj7i-M6ux-FjFC-Ap3i-GxJS-7mOnSJ
  LV Write Access        read/write
  LV Creation host, time linstor-1, 2020-02-27 08:20:48 -0800
  LV Pool name           thinpool
  LV Status              available
  # open                 0
  LV Size                5.00 GiB
  Mapped size            0.02%
  Current LE             1281
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:7

@shootkin
Author

shootkin commented Mar 2, 2020

And one more detail: I run the linstor-satellite inside a Docker container. Maybe that's the reason? The container runs in privileged mode, as described here: https://www.linbit.com/en/container-kubernetes-docker-csi-linstor/

@ghernadi
Contributor

ghernadi commented Mar 2, 2020

And one more detail. I run linstor-satellite inside a docker container. Maybe that's the reason?

Not just maybe :)

When running the satellite in a container, it still interacts with the host system's kernel through lvm2. That means the resulting /dev/${volume_group}/${lv_name} will be visible on the host system, but not (by default) within the container, not even in the container that created it in the first place.

You can pass through (-v) /dev/, or at least /dev/${volume_group}/. However, you should still be aware of what is happening in the host's kernel. Most notably: do not run multiple satellites in different containers on the same host where all of them have -v /dev/ added. You can easily (and most likely will) run into trouble with that kind of setup...
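
For illustration only, a satellite container with the host's /dev passed through might be started roughly like this (the image name is a placeholder, not an official one; see the article linked above for the authoritative invocation):

  docker run -d --name linstor-satellite \
      --privileged \
      --net=host \
      -v /dev:/dev \
      linstor-satellite-image   # placeholder image name

With --net=host the controller can reach the satellite on the host's address, and -v /dev:/dev makes the device nodes that lvm2 creates on the host visible inside the container.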

After reboot, lvdisplay and fdisk -l are working, but I don't want to reboot satellite node each time :-) Do you have any suggestions?

As Robert already mentioned, you should configure your lvm.conf properly. Usually you can simply exclude everything but your /dev/${volume_group} devices (or all of your volume groups, obviously); a sketch follows below.
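
As a sketch of that allowlist approach (the /dev/sdb path is purely an assumption for the physical volume backing vg-omni-data; replace it with whatever actually backs your volume groups):

  # devices { } section of /etc/lvm/lvm.conf
  # accept only the listed device(s), reject everything else (including DRBD devices)
  global_filter = [ "a|^/dev/sdb$|", "r|.*|" ]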

@shootkin
Author

shootkin commented Mar 2, 2020

Thank you for the reply.
I added the /dev folder as a volume to the satellite container, but after creating a snapshot I see the same behavior: linstor snapshot create backups snap1 is stuck, and lvdisplay and fdisk -l hang on the host as well as inside the container.
As for lvm.conf, I didn't get how to configure it. As I understand it, I don't have to change anything in lvm.conf inside the Docker container, because I'm using the original Dockerfile from the linstor-server repo.
But I should change lvm.conf on the host side, right?
As for "exclude everything but the /dev/${volume_group}": should I use the global_filter or the plain filter setting?

@shootkin
Author

shootkin commented Mar 2, 2020

If I exclude DRBD devices with this rule:
global_filter = ["r|/dev/drbd.*|"]
then I see this error when trying to create a snapshot:

root@linstor-controller-lc-795d778b49-qtckw:/# linstor snapshot create backups snap1
SUCCESS:
Description:
    New snapshot 'snap1' of resource 'backups' registered.
Details:
    Snapshot 'snap1' of resource 'backups' UUID is: 95cbf563-80ec-46b4-9539-921b5dba5870
SUCCESS:
    Suspended IO of 'backups' on 'linstor-1' for snapshot
SUCCESS:
    Suspended IO of 'backups' on 'linstor-1' for snapshot
ERROR:
Description:
    (Node: 'linstor-1') Failed to create snapshot vg-omni-data/backups_00000_snap1 from backups_00000 within thin volume group vg-omni-data/thinpool
Details:
    Command 'lvcreate --snapshot --name vg-omni-data/backups_00000_snap1 vg-omni-data/backups_00000' returned with exitcode 3. 
    
    Standard out: 
    
    
    Error message: 
      WARNING: Failed to connect to lvmetad. Falling back to device scanning.
      /usr/sbin/modprobe failed: 1
      snapshot: Required device-mapper target(s) not detected in your kernel.
      Run `lvcreate --help' for more information.
    
Show reports:
    linstor error-reports show 5E5D34AA-600FB-000000
SUCCESS:
    Aborted snapshot of 'backups' on 'linstor-1'

@shootkin
Author

shootkin commented Mar 2, 2020

OK, that's the solution to my problem:

  1. modprobe dm_snapshot
  2. add global_filter = ["r|/dev/drbd.*|"] line to lvm.conf

Now everything seems to be working. But it's strange that this is not described in the docs.
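
For reference, a minimal sketch of those two steps on the satellite host (the modules-load.d line is an extra, assuming a systemd host, so that the module is loaded again after a reboot; it is not part of the fix quoted above):

  modprobe dm_snapshot
  echo dm_snapshot > /etc/modules-load.d/dm_snapshot.conf

  # in the devices { } section of /etc/lvm/lvm.conf:
  global_filter = ["r|/dev/drbd.*|"]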
