
rbd: add support for network namespaces (Multus CNI) #1450

Merged Jan 7, 2021 (4 commits)

Conversation

idryomov
Contributor

@idryomov idryomov commented Sep 1, 2020

Depends on ceph/ceph#36927, see discussion in #1323.

Only compile tested at this point.

@idryomov
Contributor Author

idryomov commented Sep 1, 2020

E2E tests are failing with

rbd: unknown map option 'noudev'
rbd: couldn't parse map options

as expected.

@Madhu-1 Madhu-1 added component/rbd Issues related to RBD DNM DO NOT MERGE labels Sep 2, 2020
@humblec
Collaborator

humblec commented Sep 2, 2020

We could add this mount option to the example SC of RBD in the e2e directory if it's going to be a default mount option for a particular mounter (krbd, rbd-nbd). Also, we could have different SC examples in E2E for different mounters. Just a suggestion.

@Madhu-1
Collaborator

Madhu-1 commented Sep 2, 2020

We could add this mount option to the example SC of RBD in the e2e directory if it's going to be a default mount option for a particular mounter (krbd, rbd-nbd). Also, we could have different SC examples in E2E for different mounters. Just a suggestion.

This is not a mount option; these are the rbd map-specific options passed when we are mapping and unmapping rbd devices.
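To illustrate the distinction, here is a minimal sketch (the argv shapes below are illustrative, not ceph-csi's actual command builders): map options are consumed by `rbd map` itself, while mount options are consumed by mount(8) after the device exists.

```go
package main

import "fmt"

// rbdMapCmd: map options (e.g. noudev) go to "rbd map" itself.
func rbdMapCmd(image, mapOpts string) []string {
	return []string{"rbd", "map", "--options", mapOpts, image}
}

// mountCmd: mount options (e.g. noatime, nodev) go to mount(8).
func mountCmd(device, target, mountOpts string) []string {
	return []string{"mount", "-o", mountOpts, device, target}
}

func main() {
	fmt.Println(rbdMapCmd("pool/image", "noudev"))
	fmt.Println(mountCmd("/dev/rbd0", "/mnt/data", "noatime"))
}
```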

@humblec
Collaborator

humblec commented Sep 2, 2020

We could add this mount option to the example SC of RBD in the e2e directory if it's going to be a default mount option for a particular mounter (krbd, rbd-nbd). Also, we could have different SC examples in E2E for different mounters. Just a suggestion.

This is not a mount option; these are the rbd map-specific options passed when we are mapping and unmapping rbd devices.

Ah, disregard. I misread noudev as the nodev mount option :(

@leseb
Member

leseb commented Sep 2, 2020

Once we get the backport in Octopus, we can temporarily switch the Ceph base image for ceph-csi to use: ceph/daemon-base:latest-octopus-devel to validate a few things.

@idryomov
Contributor Author

idryomov commented Sep 2, 2020

@leseb As mentioned elsewhere, I would like to get the whole thing smoke tested with Multus before merging ceph/ceph#36927 into master and release branches.

@idryomov
Contributor Author

idryomov commented Sep 2, 2020

@leseb Did you mean nautilus instead of octopus?

@dillaman

dillaman commented Sep 2, 2020

@leseb Did you mean nautilus instead of octopus?

Upstream pulls from the ceph/ceph:v14.2 tag

@dillaman dillaman closed this Sep 2, 2020
@dillaman dillaman reopened this Sep 2, 2020
@humblec
Collaborator

humblec commented Sep 3, 2020

Side note to this PR: currently, mount options can be passed and are adjusted dynamically, but we don't have a mechanism for specifying map options for rbd mapping. We have to hardcode them for now. Ideally this should be configurable.

@Madhu-1
Collaborator

Madhu-1 commented Sep 3, 2020

Side note to this PR: currently, mount options can be passed and are adjusted dynamically, but we don't have a mechanism for specifying map options for rbd mapping. We have to hardcode them for now. Ideally this should be configurable.

An issue to provide support for map options already exists: #291.

@Madhu-1
Collaborator

Madhu-1 commented Sep 3, 2020

@dillaman @idryomov currently cephcsi is using v15 as the base image

BASE_IMAGE=ceph/ceph:v15

@leseb
Member

leseb commented Sep 3, 2020

@dillaman @idryomov currently cephcsi is using v15 as the base image

BASE_IMAGE=ceph/ceph:v15

@idryomov that's why I said Octopus devel :)

@leseb
Member

leseb commented Sep 3, 2020

@leseb As mentioned elsewhere, I would like to get the whole thing smoke tested with Multus before merging ceph/ceph#36927 into master and release branches.

We would need this PR for that, or a ceph-csi build of that branch. Also, ceph-csi needs to be able to detect that it's being configured with Multus; we can check the pod annotations for that. If Multus is in use, then we map with noudev. We still need this logic in this PR.

I don't want to disrupt ceph-csi too much with this; that's why it's probably best to map with noudev only when Multus is configured, and not make it the default yet.

@idryomov
Contributor Author

idryomov commented Sep 3, 2020

@idryomov that's why I said Octopus devel :)

OK, I'll prepare the octopus backport.

(As mentioned elsewhere, the nautilus backport is at ceph/ceph-ci@6b5a851, but the container didn't get built because nautilus
container builds have been disabled by ceph/ceph-build#1607, pending the merge of ceph/ceph-container#1706 and ceph/ceph-build#1628.)

@idryomov
Contributor Author

idryomov commented Sep 3, 2020

I don't want to disrupt ceph-csi too much with this; that's why it's probably best to map with noudev only when Multus is configured, and not make it the default yet.

I hard-coded noudev without conditioning it on Multus or anything else because I don't want to bifurcate. If it suits Multus, it should be enabled for all CSI mappings.

I don't see a reason to wait for Multus to become fully supported before rolling out noudev, but I want to make sure it actually works in a Multus environment because it's a little subtle (e.g. noudev requires the network namespace to be owned by the initial user namespace, etc).

@humblec
Collaborator

humblec commented Sep 3, 2020

I don't want to disrupt ceph-csi too much with this; that's why it's probably best to map with noudev only when Multus is configured, and not make it the default yet.

I hard-coded noudev without conditioning it on Multus or anything else because I don't want to bifurcate. If it suits Multus, it should be enabled for all CSI mappings.

Sure, as long as noudev does not impact non-Multus setups, we can make it the default. We can wrap a better mechanism around it later, whether based on detecting Multus or on configurable RBD map options driven by user input, etc.

I don't see a reason to wait for Multus to become fully supported before rolling out noudev, but I want to make sure it actually works in a Multus environment because it's a little subtle (e.g. noudev requires the network namespace to be owned by the initial user namespace, etc).

Sure. That would be good.

@idryomov
Contributor Author

idryomov commented Sep 3, 2020

Yeah, making it possible to supply custom rbd map and rbd unmap options as per #291 is good, but by default we should strive for as much uniformity as possible.

@idryomov
Contributor Author

idryomov commented Sep 3, 2020

The octopus backport is at ceph/ceph-ci@21998a2 and the container is at quay.ceph.io/ceph-ci/ceph:wip-krbd-noudev-octopus.

@rohan47

rohan47 commented Sep 21, 2020

Tested out madhupr001/cephcsi:multus and it works perfectly @Madhu-1 @idryomov

@Madhu-1
Collaborator

Madhu-1 commented Sep 21, 2020

Tested out madhupr001/cephcsi:multus and it works perfectly @Madhu-1 @idryomov

@rohan47 Thanks for testing it out.

@leseb
Member

leseb commented Sep 21, 2020

Now I believe this should detect whether Multus annotations are present; if so, we turn on the noudev flag when mapping.

@idryomov
Contributor Author

idryomov commented Sep 21, 2020

No, I think CSI should default to noudev (see discussion above). In case of breakage, users will be able to revert through the mapOptions and unmapOptions storage class parameters. I'll rebase on top of #1460 once it merges to make that possible.
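For reference, once #1460 merges, overriding the defaults could look roughly like the sketch below. This is a hedged sketch: the clusterID is a placeholder, and the option values are illustrative examples only, not recommendations.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-rbd-sc
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: <cluster-id>   # placeholder
  pool: rbd
  # illustrative values; users could override map/unmap behavior here
  mapOptions: "lock_on_read,queue_depth=1024"
  unmapOptions: "force"
reclaimPolicy: Delete
```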

@humblec
Collaborator

humblec commented Sep 21, 2020

Thanks @rohan47 for testing this out! Hopefully we can get this in soon. Thanks @idryomov for the noudev support to enable Multus 👍

@mergify
Contributor

mergify bot commented Sep 21, 2020

This pull request now has conflicts with the target branch. Could you please resolve conflicts and force push the corrected changes? 🙏

@idryomov
Contributor Author

The bottom line is that this is going to need to wait for 15.2.8, unless someone produces a multi-arch container (either based on the octopus branch or just 15.2.6 + the rbd change) that the ceph-csi CI can be temporarily switched to. We have an x86 container which was successfully used by multiple people for testing and benchmarking, but the ceph-csi CI needs a multi-arch (x86 + arm64) one.

@idryomov idryomov force-pushed the wip-krbd-noudev branch 3 times, most recently from 463f551 to 298593f Compare December 17, 2020 09:23
@idryomov
Contributor Author

So 15.2.8 is out, but I'm hitting the same issue: it's there on docker.io but not in registry-ceph-csi.apps.ocp.ci.centos.org. In fact, the internal registry is still on 15.2.6, meaning we missed 15.2.7 -- it looks like @nixpanic's bdf8fe7 ("build: use docker.io/ceph/ceph:v15 as BASE_IMAGE") change isn't working (or maybe I'm misinterpreting it)...

Is pushing images to the internal registry really supposed to be a manual process? Why not set it up to act as a cache?

@idryomov
Contributor Author

@nixpanic Ping, can we get ceph v15.2.8 pushed to registry-ceph-csi.apps.ocp.ci.centos.org?

@nixpanic
Member

nixpanic commented Jan 6, 2021

@nixpanic Ping, can we get ceph v15.2.8 pushed to registry-ceph-csi.apps.ocp.ci.centos.org?

Done! ceph/ceph:v15.2.8 has been pushed, and ceph/ceph:v15 has been updated too.

@nixpanic
Member

nixpanic commented Jan 6, 2021

Is pushing images to the internal registry really supposed to be a manual process? Why not set it up to act as a cache?

There were (and possibly still are) issues when using a registry mirror, as it connects to Docker Hub a lot too and caused failures once Docker reduced its image-pull limits. Doing this manually is a pain, and I hope we can set up an automated job that checks for updated images (daily or so).

With the new support for passing --options, referring to ExecCommand()
argument slices as mapOptions and options is confusing.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Fix "ndb" typo.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Factor out --device-type and --options formatting.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Make rbdplugin pod work in a non-initial network namespace (i.e. with
"hostNetwork: false") by skipping waiting for udev events when mapping
and unmapping images.  CSI use case is very simple: all that is needed
is a device node which is immediately fed to mkfs, so we should be able
to tolerate udev not being finished with the device just fine.

Fixes: ceph#1323
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
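Based on the commit messages above, the factored-out helper might look roughly like the sketch below. The signature and names are assumptions inferred from the commit descriptions, not the exact ceph-csi code.

```go
package main

import "fmt"

// appendDeviceTypeAndOptions sketches the factored-out formatting of
// --device-type and --options described in the third commit. Assumed
// shape; the real ceph-csi function may differ.
func appendDeviceTypeAndOptions(cmdArgs []string, isNbd bool, userOptions string) []string {
	accessType := "krbd"
	if isNbd {
		accessType = "nbd"
	}
	cmdArgs = append(cmdArgs, "--device-type", accessType)

	// Always skip waiting for udev events, so mapping and unmapping work
	// in a non-initial network namespace (hostNetwork: false).
	options := "noudev"
	if userOptions != "" {
		options = userOptions + ",noudev"
	}
	return append(cmdArgs, "--options", options)
}

func main() {
	fmt.Println(appendDeviceTypeAndOptions([]string{"map", "pool/image"}, false, ""))
	fmt.Println(appendDeviceTypeAndOptions([]string{"map", "pool/image"}, true, "try-netlink"))
}
```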
@idryomov idryomov changed the title [DNM] rbd: add support for network namespaces (Multus CNI) rbd: add support for network namespaces (Multus CNI) Jan 6, 2021
@idryomov
Contributor Author

idryomov commented Jan 7, 2021

With ceph:v15.2.8 pushed, this is good to go!

@nixpanic nixpanic added enhancement New feature or request and removed DNM DO NOT MERGE labels Jan 7, 2021
Member

@leseb leseb left a comment


LGTM. Unit tests for appendDeviceTypeAndOptions() would certainly have been a good addition, but I'm not familiar enough with the repo to request them.

@mergify mergify bot merged commit 04644c1 into ceph:master Jan 7, 2021
@idryomov idryomov deleted the wip-krbd-noudev branch January 8, 2021 10:30
@joed96

joed96 commented Jan 10, 2021

Hi all, I believe I've run into a race condition, introduced by this patch, that results in data loss: when re-attaching a PVC with volumeMode: Filesystem, the csi-rbdplugin re-runs mkfs and deletes my data.

The basic idea of the race condition is that:

  • csi-rbdplugin maps the RBD image successfully, with the noudev option
  • csi-rbdplugin runs blkid to determine whether the RBD image is formatted. However, udev has not yet created the /dev/rbd* device, so blkid exits with an error code and the plugin believes the image is not formatted.
  • udev catches up and creates the /dev/rbd* device
  • csi-rbdplugin runs mkfs which successfully re-formats the RBD image, deleting the previously existing data.

Detailed diagnostics are in Issue #1820.

I am new to ceph-csi, so could be missing something obvious. Please feel free to challenge my test cases and diagnostics.

@idryomov
Contributor Author

Replied in #1820.
