-
Notifications
You must be signed in to change notification settings - Fork 6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ceph-volume: Using --readonly for {vg|pv|lv}s commands #21409
Conversation
The actual code is detecting {vg|pv|lv}s by running the usual {vg|pv|lv}s commands. Those calls expect lvmetad to be aware of the actual state of them. This works actually pretty well in most of the cases. When ceph-volume is run from a container, this code also works from the container itself but don't on the host. On the host side, running {vg|pv|lv}s commands reports nothing, making ceph-volume reporting "No valid Ceph devices found" The root cause is lvmetad not receiving the udev notification and so, {vg|pv|lv}s commands reports the 'known' state instead of the 'real' state. This is a major issue as it means that it exists cases or maybe races where ceph-volume reports "No valid Ceph devices found" while the disk actually have some Ceph data & metadata on them. This could be interpreted like disks are free/available while they are not. This will mislead users or configuration tools trying to understand the current state of a node. In July 2015, as per https://www.redhat.com/archives/lvm-devel/2015-July/msg00086.html, a new option called "--readonly" have been added to lvm2. One of the most interesting part of it is : "disable using lvmetad so VGs are read from disk" In our case, that feature is really interesting as it means that everytime ceph-volume calls a {vg|pv|lv}s command, it will read the metadata from the disks instead of considering the lvmetad status. This patch change all the {vg|pv|lv}s call to use --readonly. It solves the bug exposed here and doesn't affect the traditional use-case. The main benefit of this patch is to avoid a false report of a disk not having metadata. Fixes: https://tracker.ceph.com/issues/23693 Signed-off-by: Erwan Velu <erwan@redhat.com>
jenkins test ceph-volume all |
1 similar comment
jenkins test ceph-volume all |
@ErwanAliasr1 have you pushed to ceph-ci.git ? We need to have that pushed there so that we have packages to test against. If you could just keep the same branch name, that would help as well. |
jenkins test ceph-volume all |
All tests passed with evelu-readonly branch on ceph-ci: |
Ah. So this is only needed when lvmetad is running, and when it's not, things are reread from disk each time anyway, correct? |
@dmick for the experiment I had on my server, it wasn't able to reach lvmetad and didn't read the disk directly neither. Only the --readonly forced it to reread the real status from the disks. |
Hmm. Wonder why then? |
@dmick from the man page, "run the command in a special read-only mode which will read on-disk metadata without needing to take any locks". We haven't found the root cause and why --readonly works, at least this is just assumptions. But the fact we disabled udev_sync and udev_rules has something to do with this. When you do this lv will mknode inside the container but the host won't see it and it'll remain inside the container even if |
The actual code is detecting {vg|pv|lv}s by running the usual {vg|pv|lv}s commands.
Those calls expect lvmetad to be aware of the actual state of them.
This works actually pretty well in most of the cases.
When ceph-volume is run from a container,
this code also works from the container itself but don't on the host.
On the host side, running {vg|pv|lv}s commands reports nothing,
making ceph-volume reporting "No valid Ceph devices found"
The root cause is lvmetad not receiving the udev notification and so,
{vg|pv|lv}s commands reports the 'known' state instead of the 'real' state.
This is a major issue as it means that it exists cases or maybe races where
ceph-volume reports "No valid Ceph devices found" while the disk
actually have some Ceph data & metadata on them.
This could be interpreted like disks are free/available while they are not.
This will mislead users or configuration tools trying to understand the
current state of a node.
In July 2015, as per https://www.redhat.com/archives/lvm-devel/2015-July/msg00086.html,
a new option called "--readonly" have been added to lvm2.
One of the most interesting part of it is : "disable using lvmetad so VGs are read from disk"
In our case, that feature is really interesting as it means that
everytime ceph-volume calls a {vg|pv|lv}s command, it will read the
metadata from the disks instead of considering the lvmetad status.
This patch change all the {vg|pv|lv}s call to use --readonly.
It solves the bug exposed here and doesn't affect the traditional use-case.
The main benefit of this patch is to avoid a false report of a disk not having metadata.
Fixes: https://tracker.ceph.com/issues/23693
Signed-off-by: Erwan Velu erwan@redhat.com
Please consider the backport to luminous.