ceph-volume: Using --readonly for {vg|pv|lv}s commands #21409

Merged · 1 commit · Apr 17, 2018

Conversation

ErwanAliasr1
Contributor

The current code detects VGs/PVs/LVs by running the usual {vg|pv|lv}s commands.
Those calls expect lvmetad to be aware of their actual state.
This works well in most cases.
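For illustration, a minimal sketch of the kind of call involved, assuming a plain subprocess wrapper (the helper name and field list are illustrative, not ceph-volume's actual API):

```python
import subprocess

def list_lvs():
    # Run the usual `lvs` command. With lvmetad enabled, lvm answers
    # from lvmetad's cached view of the system, not from the disks.
    out = subprocess.check_output([
        'lvs', '--noheadings', '--separator=;',
        '-o', 'lv_name,vg_name,lv_path',
    ])
    return [line.strip().split(';')
            for line in out.decode('utf-8').splitlines() if line.strip()]
```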

When ceph-volume is run from a container,
this code works from inside the container but not on the host.

On the host side, the {vg|pv|lv}s commands report nothing,
making ceph-volume report "No valid Ceph devices found".
The root cause is that lvmetad never receives the udev notification, so
the {vg|pv|lv}s commands report the 'known' state instead of the 'real' state.

This is a major issue: there are cases, or possibly races, where
ceph-volume reports "No valid Ceph devices found" while the disks
actually have Ceph data & metadata on them.
This could be interpreted as the disks being free/available when they are not,
misleading users or configuration tools trying to understand the
current state of a node.

In July 2015, as per https://www.redhat.com/archives/lvm-devel/2015-July/msg00086.html,
a new option called "--readonly" was added to lvm2.
The most interesting part of it for us: "disable using lvmetad so VGs are read from disk".

In our case, that feature is exactly what we need: every time
ceph-volume calls a {vg|pv|lv}s command, it will read the
metadata from the disks instead of relying on the lvmetad state.

This patch changes all the {vg|pv|lv}s calls to use --readonly.
It solves the bug described here and doesn't affect the traditional use case.
The main benefit is avoiding a false report that a disk has no metadata.
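In terms of the command line being built, the change amounts to inserting the flag (a hypothetical before/after, mirroring the sketch above rather than the actual ceph-volume source):

```python
# Before: the report comes from lvmetad's cached state.
cmd = ['lvs', '--noheadings', '--separator=;', '-o', 'lv_name,vg_name,lv_path']

# After: --readonly makes lvm read VG metadata from the disks,
# bypassing lvmetad entirely.
cmd = ['lvs', '--noheadings', '--readonly', '--separator=;', '-o', 'lv_name,vg_name,lv_path']
```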

Fixes: https://tracker.ceph.com/issues/23693
Signed-off-by: Erwan Velu <erwan@redhat.com>

Please consider backporting this to luminous.

@leseb
Member

leseb commented Apr 13, 2018

jenkins test ceph-volume all

@leseb
Member

leseb commented Apr 13, 2018

jenkins test ceph-volume all

@alfredodeza
Contributor

alfredodeza commented Apr 13, 2018

@ErwanAliasr1 have you pushed to ceph-ci.git? We need it pushed there so that we have packages to test against. If you could keep the same branch name, that would help as well.

@ErwanAliasr1
Contributor Author

@leseb
Member

leseb commented Apr 13, 2018

jenkins test ceph-volume all

@alfredodeza
Contributor

All tests passed with the evelu-readonly branch on ceph-ci:

https://jenkins.ceph.com/job/ceph-volume-test/9/

@dmick
Member

dmick commented Apr 17, 2018

Ah. So this is only needed when lvmetad is running, and when it's not, things are reread from disk each time anyway, correct?

@ErwanAliasr1
Contributor Author

@dmick in the experiment on my server, it wasn't able to reach lvmetad and didn't read the disks directly either. Only --readonly forced it to reread the real status from the disks.

@dmick
Member

dmick commented Apr 17, 2018

Hmm. Wonder why then?

@leseb
Member

leseb commented Apr 17, 2018

@dmick from the man page: "run the command in a special read-only mode which will read on-disk metadata without needing to take any locks". We haven't found the root cause of why --readonly works; this is still only an assumption. But the fact that we disabled udev_sync and udev_rules has something to do with it. When you do this, lv will mknod inside the container, but the host won't see the node and it will remain inside the container even if /dev is bind-mounted. That's all we know for now, and we don't have time to investigate further. We might resume this later though.
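One way to see the drift concretely on an affected host is to compare the two modes; a minimal sketch (the script is illustrative, the flags are standard lvm2):

```python
import subprocess

def lvs_names(readonly=False):
    cmd = ['lvs', '--noheadings', '-o', 'lv_name,vg_name']
    if readonly:
        # Bypass lvmetad and read the metadata from the disks.
        cmd.insert(1, '--readonly')
    return subprocess.check_output(cmd).decode('utf-8')

cached = lvs_names()                 # what lvmetad believes
on_disk = lvs_names(readonly=True)   # what the disks actually contain
if cached != on_disk:
    print("lvmetad's view has drifted from the on-disk metadata")
```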
