Trident should ship Kubernetes FlexVolume plugin instead of using iSCSI PV #101
Comments
Hi @redbaron. Thanks for bringing the issue to our attention. We'll bring up this issue in the next Storage SIG meeting. Flex Plugins are no longer maintained as CSI plugins will be replacing them soon, so fixing the bug upstream seems like the most logical solution to me. NetApp is looking at submitting a patch upstream. Just curious, what NetApp platform (e.g., ONTAP, SolidFire, E-Series) and what host OS are you using? I'll ask more questions about the bug on the k8s GitHub issue.
AFAIK FlexVolume plugins are shipped as GA and not going anywhere, not until Kubernetes 2.0 at least :) CSI is alpha and much more complicated. At the end of the day, either would work, but FlexVolume plugins fit nicely with the current Trident architecture; all that is needed is to provide a ~100-line bash script and create a PV object of a different type. Converting Trident to CSI is a much bigger architectural change.
ONTAP and CoreOS
@redbaron We have also noticed that the iSCSI driver doesn't delete a device upon detach, as there is no rescan following unmount. Once a session is established, a NetApp LUN appears as a SCSI device on the host. I tried to recreate the problem of reusing LUN numbers (kubernetes/kubernetes#59946) using the following steps. First, I created two pods that attach two LUNs from the same target; there are two pods so that the session isn't terminated once one of the pods is gone:
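A sketch of how the result of this step could be verified on the host; the SCSI addresses are illustrative ([3:0:0:4] is the LUN 4 address referenced later in this comment):

```sh
iscsiadm -m session              # expect a single session to the ONTAP target
lsscsi                           # expect two LUNs over that session, e.g. [3:0:0:3] and [3:0:0:4]
lsblk -o NAME,SIZE,MOUNTPOINT    # both devices formatted and mounted by kubelet
```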
Next, I detach pvcontapsan2 (/dev/sdk or LUN 4):
We see that /dev/sdk has been unmounted, but it's still present on the system:
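A sketch of the kind of check that shows the lingering device (the actual output isn't reproduced here):

```sh
findmnt /dev/sdk        # prints nothing -- the filesystem is no longer mounted
lsblk /dev/sdk          # ...but the block device is still present
lsscsi | grep -w sdk    # and its SCSI entry [3:0:0:4] still exists
```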
LUN 4 is obviously still present on the storage backend as we haven't deleted it yet:
Next I delete the LUN:
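The exact deletion command isn't shown; a sketch of two ways it could be done, assuming pvcontapsan2 is a Trident-provisioned claim with a Delete reclaim policy (the SVM name and LUN path in the ONTAP alternative are placeholders):

```sh
kubectl delete pvc pvcontapsan2    # Trident deletes the backing LUN on the SVM
# or directly on the ONTAP backend:
#   lun offline -vserver svm0 -path /vol/trident_pvcontapsan2/lun0
#   lun delete  -vserver svm0 -path /vol/trident_pvcontapsan2/lun0
```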
Once the LUN is deleted, I try to reuse LUN 4 by creating pvcontapsan2 on the same backend again:
Note the name change for LUN 4. Now, if I try to attach this new LUN (the new LUN 4), everything works as expected:
There are some errors in dmesg, but nothing related to [3:0:0:4]:
However, if I delete another device (say sdl) along with sdk, and don't recreate sdl when sdk is recreated and attached, I see errors about the stale device in the dmesg output.
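Such a stale device can be cleared by hand once the LUN is really gone on the backend; a sketch (sdl is just the example device from above, and the multipath map name is a placeholder):

```sh
multipath -f <map-name>                  # only needed if sdl was part of a multipath map
blockdev --flushbufs /dev/sdl
echo 1 > /sys/block/sdl/device/delete    # drop the stale SCSI entry from the kernel
```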
I just want to make sure that we're talking about the same problem. The state may linger on the host, but I haven't noticed any complications as a result of that. It would be helpful if you could provide the exact steps for reproducing the problem on your side as well as any logs that are insightful. Also, are you using multipath with Trident? You mentioned multipath under the k8s GitHub issue.
Thanks for trying to look into it. I attributed the mount errors to LUN reuse, as it was easy for me to draw a line between the errors we saw when mounting a fresh PV and the lingering SCSI devices, but I have no easily reproducible test case :( Also keep in mind that iSCSI tech is alien to me and device mapper (multipath) is almost beyond comprehension, so I mostly don't know what I am talking about and am stabbing in the dark. On our system with 32 days of uptime, dmesg is full of errors (there is no connection between them, I'm just showing the classes of errors)
Needless to say, on a freshly rebooted node everything looks neat for some time, until there is enough churn in PV provisioning/deletion. So when, on a fresh deployment, I see
and multipath looks like
I simply don't know where to start looking. Turning to the dm-devel mailing list and finding numerous references to how LUN reuse should be carefully handled on the server side just fuelled my frustration.

Another thing which contributed to my belief that it might be LUN reuse, or something close to it: when it happened the first time (and I didn't know what to expect), I fsck'ed the problematic device with autorepair and it was fixing and reporting pages of inodes, so the FS was definitely populated! How could that possibly be, if not some kind of data reuse? I might have made a typo and just smashed somebody else's LUN, though; we'll never know now.

Unfortunately, despite the total chaos in the logs, everything continues to work and we very rarely see actual pod startup errors, and it is hard to get all the necessary context quickly when it does happen. Now I am trying to reproduce it in a slightly more controlled environment, but no luck so far. There was an iSCSI bug in the 4.14 kernel which was super painful to track down; it is fixed now, but unfortunately that means version updates: new kernel 4.14.24, ALUA modules enabled, hyperkube image 1.9.3 (it was 1.8.3 on the problematic node), so I am starting over.

And I am hitting a problem immediately: kubelet doesn't mount new PVs as a multipath device, but as a /dev/sd* device. I opened kubernetes/kubernetes#60894, but why wasn't this a problem before? The kubelet code didn't change in that part. I am simply lost. I'll come back to you once I have something more tangible that you can reproduce. Or I'll just give up and tell the devs to nuke the PV and pods so that they can be recreated when the problem occurs again.
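For reference, a sketch of how one could tell whether kubelet mounted the multipath map or the raw SCSI device for a given iSCSI PV (the pod UID and PV name in the path are placeholders):

```sh
lsblk -o NAME,TYPE,MOUNTPOINT    # TYPE is "mpath" for multipath maps, plain "disk"/"part" for sdX
findmnt -no SOURCE,FSTYPE \
  /var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~iscsi/<pv-name>
```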
Why does the new LUN continue to be /dev/sdk? Where is the old /dev/sdk (asking because kubelet definitely doesn't delete old devices)? What is the ID_SERIAL of the new SCSI devices? I did a small experiment: I created 10 pods with PVCs, deleted them, and created them again, and kubelet seems to have mounted the already-existing multipath devices from the initial creation! But that would mean that new SCSI devices are created with the same WWIDs. Shouldn't a WWID be unique globally and over time?

After the first creation:

After the second:
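A sketch of the kind of check involved; the actual outputs from the two runs aren't reproduced here, and the device name is a placeholder:

```sh
/lib/udev/scsi_id -g -u /dev/sdk                                    # the WWID that multipath keys its maps on
udevadm info -q property -n /dev/sdk | grep -E 'ID_SERIAL|ID_WWN'
multipath -ll                                                       # maps are listed by WWID
```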
I think I'm starting to understand what is happening.
If multipath doesn't have
Thanks for more information! We haven't done much testing around multipathing with Trident in Kubernetes, but I can take advantage of multipathing with Trident for Docker (Trident as a Docker Volume Plugin) using the same SVM and initiator that I used in my examples. I noticed you had earlier created #69. So can you use multipathing with Trident-provisioned volumes without specifying multiple portals in the PV?
Yes, if you log in to all portals on a host before starting kubelet (which reminds me that kubelet shouldn't log out from iSCSI, because it nukes devices discovered from the "host" session too; I'll create a ticket for that), a rescan on one session discovers devices from the other sessions too. In my case I run the following commands before kubelet starts (found them by trial and error, might not be optimal):
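A rough sketch of such a sequence; this is not necessarily the exact set of commands used, and the portal address is a placeholder:

```sh
iscsiadm -m discovery -t sendtargets -p 10.0.0.10:3260   # discover all portals/targets on the SVM
iscsiadm -m node --login                                 # log in to every discovered portal
iscsiadm -m session --rescan                             # rescan existing sessions for new LUNs
```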
And current multipath config:
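A minimal illustration of the kind of /etc/multipath.conf in question (not the actual config from this setup; the options shown are standard multipath-tools settings):

```sh
cat > /etc/multipath.conf <<'EOF'
defaults {
    user_friendly_names no     # keep WWID-based map names
    find_multipaths     yes    # only build maps for devices that really have multiple paths
}
blacklist {
    devnode "^(ram|loop|fd|md|sr|scd|st)[0-9]*"
}
EOF
systemctl restart multipathd
```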
The plan is to make the native Kubernetes iSCSI support better, and that work is underway. We have no intention of shipping our own driver at this time.
We did some testing and found that this problem is caused by the portals not being set in the iscsi section of the PV. After manually creating a PV and PVC connected to the ONTAP backend, the iSCSI device was removed as expected.
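For illustration, a manually created PV of that shape might look like the following (the addresses, IQN, and size are placeholders):

```sh
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ontap-iscsi-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  iscsi:
    targetPortal: 10.0.0.10:3260
    portals:
      - 10.0.0.11:3260
      - 10.0.0.12:3260
    iqn: iqn.1992-08.com.netapp:sn.0123456789
    lun: 4
    fsType: ext4
    readOnly: false
EOF
```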
The iSCSI PV implementation in Kubernetes handles LUN number reuse incorrectly (see kubernetes/kubernetes#59946), but there is zero interest in fixing it.
Until it is fixed, Trident is unsafe to use. One way to overcome this is to stop provisioning iSCSI PVs and use Kubernetes FlexVolumes (https://github.com/kubernetes/community/blob/master/contributors/devel/flexvolume.md) instead, shipping a Trident FlexVolume plugin (script) to be installed on the nodes.
That script would handle iSCSI LUN discovery, mount, and unmount by itself, and if done right (possibly with the help of sg3_utils) it would make LUN number reuse safe again.
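For illustration only, a skeleton of the kind of FlexVolume script being proposed here, following the call convention described in the linked flexvolume.md; all of the iSCSI- and sg3_utils-specific logic is left as TODO comments:

```sh
#!/bin/bash
# Sketch of a Trident FlexVolume driver script (not an actual implementation).

op=$1; shift

case "$op" in
  init)
    # No controller-side attach/detach: kubelet will only call mount/unmount.
    echo '{"status": "Success", "capabilities": {"attach": false}}'
    ;;
  mount)
    mount_dir=$1
    json_opts=$2   # JSON with target portal(s), IQN, LUN number, fsType, ...
    # TODO: iscsiadm discovery/login, wait for the device to appear,
    #       verify the LUN's identity (e.g. WWID via sg_inq from sg3_utils)
    #       so a reused LUN number can never match a stale device,
    #       then mkfs if needed and: mount /dev/mapper/<wwid> "$mount_dir"
    echo '{"status": "Success"}'
    ;;
  unmount)
    mount_dir=$1
    # TODO: umount "$mount_dir", flush the multipath map, and delete the
    #       underlying SCSI devices so nothing stale is left on the host.
    echo '{"status": "Success"}'
    ;;
  *)
    echo '{"status": "Not supported"}'
    ;;
esac
```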