Trident should ship Kubernetes FlexVolume plugin instead of using iSCSI PV #101

Closed
redbaron opened this issue Mar 7, 2018 · 10 comments

@redbaron

redbaron commented Mar 7, 2018

iSCSI PV in Kubernetes handles LUN number reuse incorrectly (see kubernetes/kubernetes#59946), but there is zero interest in fixing it.

Until it is fixed, Trident is unsafe to use. One way to work around it is to stop provisioning iSCSI PVs and instead use Kubernetes FlexVolumes (https://github.com/kubernetes/community/blob/master/contributors/devel/flexvolume.md), shipping a Trident FlexVolume plugin (a script) to be installed on nodes.

That script would handle iSCSI LUN discovery, mount, and unmount by itself and, if done right (possibly with the help of sg3_utils), would make LUN number reuse safe again.
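
For illustration, the node-side plugin could be shaped roughly like this (a skeleton only; the driver name, the option keys, and the commented-out commands are placeholders, not an actual Trident driver):

#!/bin/bash
# Sketch of a FlexVolume driver for Trident-provisioned iSCSI LUNs (illustration only).
# It would be installed on every node under
# /usr/libexec/kubernetes/kubelet-plugins/volume/exec/<vendor>~<driver>/<driver>.

ok()   { echo '{"status": "Success"}'; exit 0; }
fail() { echo "{\"status\": \"Failure\", \"message\": \"$1\"}"; exit 1; }

case "${1:-}" in
  init)
    # No controller-side attach/detach; kubelet only calls mount/unmount.
    echo '{"status": "Success", "capabilities": {"attach": false}}'
    exit 0
    ;;
  mount)
    mnt="$2"; opts="$3"  # opts is a JSON blob carrying e.g. portal/IQN/LUN (hypothetical keys)
    # 1. iscsiadm -m node -T "$iqn" -p "$portal" --login   (if not already logged in)
    # 2. iscsiadm -m session --rescan                       (discover the new LUN)
    # 3. verify the device's serial/WWID matches the expected volume (sg_inq / scsi_id),
    #    so a recycled LUN number can never be confused with the previous device
    # 4. mkfs if blank, then mount the device (or its multipath map) onto "$mnt"
    ok
    ;;
  unmount)
    mnt="$2"
    umount "$mnt" || fail "umount of $mnt failed"
    # Then flush the multipath map and delete the underlying sdX paths
    # (echo 1 > /sys/block/sdX/device/delete) so the LUN number can be reused safely.
    ok
    ;;
  *)
    echo '{"status": "Not supported"}'
    exit 0
    ;;
esac

Kubelet would invoke it as, e.g., "trident-flex mount <mount-dir> <json-options>" and expect the JSON status on stdout.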

@kangarlou
Contributor

Hi @redbaron. Thanks for bringing the issue to our attention. We'll bring up this issue in the next Storage SIG meeting. Flex Plugins are no longer maintained as CSI plugins will be replacing them soon, so fixing the bug upstream seems like the most logical solution to me. NetApp is looking at submitting a patch upstream.

Just curious, what NetApp platform (e.g., ONTAP, SolidFire, E-Series) and what host OS are you using?

I'll ask more questions about the bug on the k8s GitHub issue.

@redbaron
Author

redbaron commented Mar 7, 2018

Flex Plugins are no longer maintained

AFAIK FlexVolume plugins have shipped as GA and aren't going anywhere, not until Kubernetes 2.0 at least :)

CSI is alpha and much more complicated. At the end of the day, either would work, but FlexVolume plugins fit nicely with the current Trident architecture: all that is needed is to provide a ~100-line bash script and to create a PV object of a different type. Converting Trident to CSI is a much bigger architectural change.
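
For illustration, such a PV of a different type might look roughly like this (the driver name and option keys are made up, and the portal/IQN values are only placeholders):

kubectl create -f - <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-trident-flex-example
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  flexVolume:
    driver: "netapp/trident-flex"    # hypothetical driver name
    fsType: ext4
    options:                         # hypothetical keys the node script would parse
      targetPortal: "10.0.207.103:3260"
      iqn: "iqn.1992-08.com.netapp:sn.b8339c981bd011e8a900080027358c88:vs.2"
      lun: "4"
EOF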

Just curious, what NetApp platform (e.g., ONTAP, SolidFire, E-Series) and what host OS are you using?

ONTAP and CoreOS

@kangarlou
Contributor

@redbaron We have also noticed that the iSCSI driver doesn't delete a device upon detach, as there is no rescan following unmount. Once a session is established, a NetApp LUN appears under /dev/disk/by-path (e.g., /dev/disk/by-path/ip-10.0.207.103:3260-iscsi-iqn.1992-08.com.netapp:sn.b8339c981bd011e8a900080027358c88:vs.2-lun-2). Upon detaching a volume, the iSCSI plugin doesn't remove this device and everything stays the same.
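
As a side note, such a leftover device can be removed by hand once nothing is using it anymore; a rough sketch (sdX stands for the stale device):

# Tell the SCSI layer to drop a single stale device:
echo 1 > /sys/block/sdX/device/delete

# Or, with sg3_utils installed, prune all devices whose LUNs have gone away:
rescan-scsi-bus.sh -r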

I tried to recreate the problem of reusing LUN numbers (kubernetes/kubernetes#59946) using the following steps:

First, I created two pods that attach two LUNs from the same target. There are two pods so that the session isn't terminated once one of the pods is gone:

$ kubectl get pod -aw
NAME                       READY     STATUS      RESTARTS   AGE
pod-nginx1                 1/1       Running     0          1h
pod-nginx2                 1/1       Running     0          18m
$ cat pod-nginx-ontapsan2.yaml 
kind: Pod
apiVersion: v1
metadata:
  name: pod-nginx2
spec:
  containers:
    - name: nginx 
      image: nginx
      volumeMounts:
      - mountPath: "/usr/share/nginx/html"
        name: nginx-vol
  volumes:
    - name: nginx-vol 
      persistentVolumeClaim:
        claimName: pvcontapsan2
$ tree /var/lib/kubelet/plugins/kubernetes.io/iscsi/
/var/lib/kubelet/plugins/kubernetes.io/iscsi/
└── iface-default
    ├── 10.0.207.103:3260-iqn.1992-08.com.netapp:sn.b8339c981bd011e8a900080027358c88:vs.2-lun-3
    │   └── lost+found
    ├── 10.0.207.103:3260-iqn.1992-08.com.netapp:sn.b8339c981bd011e8a900080027358c88:vs.2-lun-4
    │   └── lost+found
$ lsscsi
[1:0:0:0]    cd/dvd  VBOX     CD-ROM           1.0   /dev/sr0 
[2:0:0:0]    disk    ATA      VBOX HARDDISK    1.0   /dev/sda 
[3:0:0:0]    disk    NETAPP   LUN C-Mode       8300  /dev/sdb 
[3:0:0:1]    disk    NETAPP   LUN C-Mode       8300  /dev/sdd 
[3:0:0:2]    disk    NETAPP   LUN C-Mode       8300  /dev/sdf 
[3:0:0:3]    disk    NETAPP   LUN C-Mode       8300  /dev/sdj 
[3:0:0:4]    disk    NETAPP   LUN C-Mode       8300  /dev/sdk
# ls -l /dev/disk/by-path
total 0
lrwxrwxrwx 1 root root  9 Mar  7 11:32 ip-10.0.207.103:3260-iscsi-iqn.1992-08.com.netapp:sn.b8339c981bd011e8a900080027358c88:vs.2-lun-0 -> ../../sdb
lrwxrwxrwx 1 root root  9 Mar  7 11:49 ip-10.0.207.103:3260-iscsi-iqn.1992-08.com.netapp:sn.b8339c981bd011e8a900080027358c88:vs.2-lun-1 -> ../../sdd
lrwxrwxrwx 1 root root  9 Mar  7 11:50 ip-10.0.207.103:3260-iscsi-iqn.1992-08.com.netapp:sn.b8339c981bd011e8a900080027358c88:vs.2-lun-2 -> ../../sdf
lrwxrwxrwx 1 root root  9 Mar  7 10:51 ip-10.0.207.103:3260-iscsi-iqn.1992-08.com.netapp:sn.b8339c981bd011e8a900080027358c88:vs.2-lun-3 -> ../../sdj
lrwxrwxrwx 1 root root  9 Mar  7 11:34 ip-10.0.207.103:3260-iscsi-iqn.1992-08.com.netapp:sn.b8339c981bd011e8a900080027358c88:vs.2-lun-4 -> ../../sdk

Next, I detach pvcontapsan2 (/dev/sdk or LUN 4):

$ kubectl delete pod pod-nginx2
pod "pod-nginx2" deleted
$ tree /var/lib/kubelet/plugins/kubernetes.io/iscsi/
/var/lib/kubelet/plugins/kubernetes.io/iscsi/
└── iface-default
    ├── 10.0.207.103:3260-iqn.1992-08.com.netapp:sn.b8339c981bd011e8a900080027358c88:vs.2-lun-3
    │   └── lost+found

We see that /dev/sdk has been unmounted, but it's still present on the system:

$ lsscsi
[1:0:0:0]    cd/dvd  VBOX     CD-ROM           1.0   /dev/sr0 
[2:0:0:0]    disk    ATA      VBOX HARDDISK    1.0   /dev/sda 
[3:0:0:0]    disk    NETAPP   LUN C-Mode       8300  /dev/sdb 
[3:0:0:1]    disk    NETAPP   LUN C-Mode       8300  /dev/sdd 
[3:0:0:2]    disk    NETAPP   LUN C-Mode       8300  /dev/sdf 
[3:0:0:3]    disk    NETAPP   LUN C-Mode       8300  /dev/sdj 
[3:0:0:4]    disk    NETAPP   LUN C-Mode       8300  /dev/sdk 
$ ls -l /dev/disk/by-path
total 0
lrwxrwxrwx 1 root root  9 Mar  7 11:32 ip-10.0.207.103:3260-iscsi-iqn.1992-08.com.netapp:sn.b8339c981bd011e8a900080027358c88:vs.2-lun-0 -> ../../sdb
lrwxrwxrwx 1 root root  9 Mar  7 11:49 ip-10.0.207.103:3260-iscsi-iqn.1992-08.com.netapp:sn.b8339c981bd011e8a900080027358c88:vs.2-lun-1 -> ../../sdd
lrwxrwxrwx 1 root root  9 Mar  7 11:50 ip-10.0.207.103:3260-iscsi-iqn.1992-08.com.netapp:sn.b8339c981bd011e8a900080027358c88:vs.2-lun-2 -> ../../sdf
lrwxrwxrwx 1 root root  9 Mar  7 10:51 ip-10.0.207.103:3260-iscsi-iqn.1992-08.com.netapp:sn.b8339c981bd011e8a900080027358c88:vs.2-lun-3 -> ../../sdj
lrwxrwxrwx 1 root root  9 Mar  7 11:34 ip-10.0.207.103:3260-iscsi-iqn.1992-08.com.netapp:sn.b8339c981bd011e8a900080027358c88:vs.2-lun-4 -> ../../sdk

LUN 4 is obviously still present on the storage backend as we haven't deleted it yet:

> lun mapping show
Vserver    Path                                      Igroup   LUN ID  Protocol
---------- ----------------------------------------  -------  ------  --------
iscsi_svm_1
           /vol/netappdvp_test1/lun0                 trident       0  iscsi
iscsi_svm_1
           /vol/netappdvp_test2mp/lun0               trident       1  iscsi
iscsi_svm_1
           /vol/netappdvp_test3mp/lun0               trident       2  iscsi
iscsi_svm_1
           /vol/trident_default_pvcontapsan1_e22ad/lun0
                                                     trident       3  iscsi
iscsi_svm_1
           /vol/trident_default_pvcontapsan2_4134c/lun0
                                                     trident       4  iscsi

Next I delete the LUN:

$ kubectl delete pvc pvcontapsan2
persistentvolumeclaim "pvcontapsan2" deleted
> lun mapping show
Vserver    Path                                      Igroup   LUN ID  Protocol
---------- ----------------------------------------  -------  ------  --------
iscsi_svm_1
           /vol/netappdvp_test1/lun0                 trident       0  iscsi
iscsi_svm_1
           /vol/netappdvp_test2mp/lun0               trident       1  iscsi
iscsi_svm_1
           /vol/netappdvp_test3mp/lun0               trident       2  iscsi
iscsi_svm_1
           /vol/trident_default_pvcontapsan1_e22ad/lun0
                                                     trident       3  iscsi
4 entries were displayed.

Once the LUN is deleted, I try to reuse LUN4 by creating pvcontapsan2 on the same backend again:

$ kubectl create -f pvc-ontapsan2.yaml 
persistentvolumeclaim "pvcontapsan2" created
> lun mapping show
Vserver    Path                                      Igroup   LUN ID  Protocol
---------- ----------------------------------------  -------  ------  --------
iscsi_svm_1
           /vol/netappdvp_test1/lun0                 trident       0  iscsi
iscsi_svm_1
           /vol/netappdvp_test2mp/lun0               trident       1  iscsi
iscsi_svm_1
           /vol/netappdvp_test3mp/lun0               trident       2  iscsi
iscsi_svm_1
           /vol/trident_default_pvcontapsan1_e22ad/lun0
                                                     trident       3  iscsi
iscsi_svm_1
           /vol/trident_default_pvcontapsan2_2ed96/lun0
                                                     trident       4  iscsi
5 entries were displayed.

Note the name change for LUN 4. Now, if I try to attach this new LUN (the new LUN 4), everything works as expected:

$ kubectl create -f pod-nginx-ontapsan2.yaml 
pod "pod-nginx2" created
$ kubectl get pod -aw
NAME                       READY     STATUS      RESTARTS   AGE
pod-nginx1                 1/1       Running     0          2h
pod-nginx2                 1/1       Running     0          12s
$ lsscsi
[1:0:0:0]    cd/dvd  VBOX     CD-ROM           1.0   /dev/sr0 
[2:0:0:0]    disk    ATA      VBOX HARDDISK    1.0   /dev/sda 
[3:0:0:0]    disk    NETAPP   LUN C-Mode       8300  /dev/sdb 
[3:0:0:1]    disk    NETAPP   LUN C-Mode       8300  /dev/sdd 
[3:0:0:2]    disk    NETAPP   LUN C-Mode       8300  /dev/sdf 
[3:0:0:3]    disk    NETAPP   LUN C-Mode       8300  /dev/sdj 
[3:0:0:4]    disk    NETAPP   LUN C-Mode       8300  /dev/sdk 
$ tree /var/lib/kubelet/plugins/kubernetes.io/iscsi/
/var/lib/kubelet/plugins/kubernetes.io/iscsi/
└── iface-default
    ├── 10.0.207.103:3260-iqn.1992-08.com.netapp:sn.b8339c981bd011e8a900080027358c88:vs.2-lun-3
    │   └── lost+found
    ├── 10.0.207.103:3260-iqn.1992-08.com.netapp:sn.b8339c981bd011e8a900080027358c88:vs.2-lun-4
    │   └── lost+found

There are some errors in dmesg, but nothing related to [3:0:0:4]:

[110900.769123] sd 3:0:0:1: Warning! Received an indication that the LUN assignments on this target have changed. The Linux SCSI layer does not automatical
[110902.807492] sd 3:0:0:0: Warning! Received an indication that the LUN assignments on this target have changed. The Linux SCSI layer does not automatical

However, if I delete another device (say sdl) along with sdk, and don't recreate sdl when sdk is recreated and attached, I see the following in the dmesg output:

[111347.890143] sd 3:0:0:5: [sdl] Unit Not Ready
[111347.890148] sd 3:0:0:5: [sdl] Sense Key : Illegal Request [current] 
[111347.890162] sd 3:0:0:5: [sdl] Add. Sense: Logical unit not supported
[111347.901283] sd 3:0:0:5: [sdl] Read Capacity(16) failed: Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[111347.901286] sd 3:0:0:5: [sdl] Sense Key : Illegal Request [current] 
[111347.901287] sd 3:0:0:5: [sdl] Add. Sense: Logical unit not supported
[111347.903360] sd 3:0:0:5: [sdl] Read Capacity(10) failed: Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[111347.903361] sd 3:0:0:5: [sdl] Sense Key : Illegal Request [current] 
[111347.903363] sd 3:0:0:5: [sdl] Add. Sense: Logical unit not supported
[111347.912053] sd 3:0:0:5: [sdl] 0 512-byte logical blocks: (0 B/0 B)
[111347.912056] sd 3:0:0:5: [sdl] 4096-byte physical blocks
[111347.917217] sdl: detected capacity change from 20971520 to 0
[111348.011754] EXT4-fs (sdk): VFS: Can't find ext4 filesystem

The "EXT4-fs (sdk): VFS: Can't find ext4 filesystem" message is fine, as sdk is a new volume that needs to be formatted. However, we still have sdl lingering on the host:

$ lsscsi
[1:0:0:0]    cd/dvd  VBOX     CD-ROM           1.0   /dev/sr0 
[2:0:0:0]    disk    ATA      VBOX HARDDISK    1.0   /dev/sda 
[3:0:0:0]    disk    NETAPP   LUN C-Mode       8300  /dev/sdb 
[3:0:0:1]    disk    NETAPP   LUN C-Mode       8300  /dev/sdd 
[3:0:0:2]    disk    NETAPP   LUN C-Mode       8300  /dev/sdf 
[3:0:0:3]    disk    NETAPP   LUN C-Mode       8300  /dev/sdj 
[3:0:0:4]    disk    NETAPP   LUN C-Mode       8300  /dev/sdk 
[3:0:0:5]    disk    NETAPP   LUN C-Mode       8300  /dev/sdl 

I just want to make sure that we're talking about the same problem. The state may linger on the host, but I haven't noticed any complications as a result of that. It would be helpful if you could provide the exact steps for reproducing the problem on your side as well as any logs that are insightful. Also, are you using multipath with Trident? You mentioned multipath under the k8s GitHub issue.

@redbaron
Author

redbaron commented Mar 7, 2018

Thanks for trying to look into it. I attributed the mount errors to LUN reuse because it was easy for me to draw a line from the errors we saw when mounting a fresh PV to the lingering SCSI devices, but I have no easily reproducible test case :( Also keep in mind that iSCSI is alien to me and device mapper (multipath) is almost beyond comprehension, so I mostly don't know what I am talking about and am stabbing in the dark.

On our system with 32 days of uptime, dmesg is full of errors (there is no connection between the ones below; I am just showing the classes of errors):

[2810068.440279] sd 6:0:0:13: [sdbd] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[2810068.440595] sd 6:0:0:13: [sdbd] tag#0 Sense Key : Illegal Request [current] 
[2810068.440860] sd 6:0:0:13: [sdbd] tag#0 Add. Sense: Logical unit not supported
[2810068.441119] sd 6:0:0:13: [sdbd] tag#0 CDB: Read(10) 28 00 01 3f ff 80 00 00 08 00
[2810068.441461] print_req_error: I/O error, dev sdbd, sector 20971392
[2810068.441645] Buffer I/O error on dev sdbd, logical block 2621424, async page read
...
[2813175.325523] sd 6:0:0:28: [sdda] Unit Not Ready
[2813175.325642] sd 6:0:0:28: [sdda] Sense Key : Illegal Request [current] 
[2813175.325788] sd 6:0:0:28: [sdda] Add. Sense: Logical unit not supported
[2813175.326387] sd 6:0:0:28: [sdda] Read Capacity(16) failed: Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[2813175.326614] sd 6:0:0:28: [sdda] Sense Key : Illegal Request [current] 
[2813175.326757] sd 6:0:0:28: [sdda] Add. Sense: Logical unit not supported
[2813175.327300] sd 6:0:0:28: [sdda] Read Capacity(10) failed: Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[2813175.327528] sd 6:0:0:28: [sdda] Sense Key : Illegal Request [current] 
[2813175.327671] sd 6:0:0:28: [sdda] Add. Sense: Logical unit not supported
[2813175.328179] sd 6:0:0:28: [sdda] 0 512-byte logical blocks: (0 B/0 B)
[2813175.328320] sd 6:0:0:28: [sdda] 4096-byte physical blocks
[2813175.329368] sdda: detected capacity change from 10737418240 to 0
...
[2814506.503527] sd 6:0:0:0: Warning! Received an indication that the LUN assignments on this target have changed. The Linux SCSI layer does not automatical
[2814506.503882] sd 3:0:0:0: Warning! Received an indication that the LUN assignments on this target have changed. The Linux SCSI layer does not automatical
...
[2815877.629429] device-mapper: multipath: Failing path 70:48.
[2815881.935175] device-mapper: multipath: Reinstating path 70:48.

Needless to say, on a freshly rebooted node everything looks neat for some time, until there is enough churn in PV provisioning/deletion.

So on a fresh deployment I see

Feb 27 17:23:53 internal1-worker1 kubelet-wrapper[4888]: I0227 17:23:53.526890    4888 operation_generator.go:416] MountVolume.WaitForAttach entering for volume "dev-apps-report-dev-auth-bmqszy-postgresql-3d721" (UniqueName: "kubernetes.io/iscsi/10.1.72.201:iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13:9") pod "apps-report-dev-auth-bmqszy-postgresql-8cdfd778b-q4knl" (UID: "3d775a9a-1bdf-11e8-9e43-005056bced2e") DevicePath ""
Feb 27 17:23:54 internal1-worker1 kubelet-wrapper[4888]: E0227 17:23:54.154212    4888 nestedpendingoperations.go:264] Operation for "\"kubernetes.io/iscsi/10.1.72.201:iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13:9\"" failed. No retries permitted until 2018-02-27 17:25:56.154183651 +0000 UTC (durationBeforeRetry 2m2s). Error: MountVolume.WaitForAttach failed for volume "dev-apps-report-dev-auth-bmqszy-postgresql-3d721" (UniqueName: "kubernetes.io/iscsi/10.1.72.201:iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13:9") pod "apps-report-dev-auth-bmqszy-postgresql-8cdfd778b-q4knl" (UID: "3d775a9a-1bdf-11e8-9e43-005056bced2e") : 'fsck' found errors on device /dev/dm-10 but could not correct them: fsck from util-linux 2.25.2

and multipath looks like

mpathl (3600a09804d542d744d2b4871417a4b46) dm-10 NETAPP,LUN C-Mode
size=10G features='4 queue_if_no_path pg_init_retries 50 retain_attached_hw_handle' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 6:0:0:9  sdan 66:112 active ready  running
| `- 3:0:0:9  sdam 66:96  active ready  running
`-+- policy='service-time 0' prio=10 status=enabled
  |- #:#:#:#  sdao 66:128 active faulty running
  `- 5:0:0:9  sdal 66:80  active ready  running

I simply don't know where to start looking. Turning to the dm-devel mailing list and finding numerous references to how carefully LUN reuse must be handled on the server side just fuelled my frustration.

Another thing which contributed to my belief that it might be LUN reuse, or something close to it: when it first happened (and I didn't know what to expect) I fsck'ed the problematic device with auto-repair, and it kept fixing and reporting pages of inodes; that FS was definitely populated! How could that possibly be, if not some data reuse? I might have made a typo and just smashed somebody else's LUN, though; we'll never know now.

Unfortunately, despite the total chaos in the logs, everything continues to work; we only very rarely see actual pod startup errors, and it is hard to gather all the necessary context quickly when they happen.

Now I am trying to reproduce it in a slightly more controlled environment, but no luck so far. There was an iSCSI bug in the 4.14 kernel which was super painful to track down; it is fixed now, but unfortunately that means version updates: a new kernel (4.14.24), ALUA modules enabled, hyperkube image 1.9.3 (it was 1.8.3 on the problematic node), so I am starting over.

And I am hitting a problem immediately: kubelet doesn't mount new PVs as a multipath device, but as a /dev/sd* device. I opened kubernetes/kubernetes#60894, but why wasn't that a problem before? The kubelet code didn't change in that part. I am simply lost.

I'll come back to you once I have something more tangible that you can reproduce.

Or I'll just give up and tell the devs to nuke the PV and pods so that they can be recreated when the problem occurs again.

@redbaron
Author

redbaron commented Mar 7, 2018

Note the name change for LUN 4. Now, if I try to attach this new LUN (the new LUN 4), everything works as expected:

[3:0:0:4]    disk    NETAPP   LUN C-Mode       8300  /dev/sdk 
    ├── 10.0.207.103:3260-iqn.1992-08.com.netapp:sn.b8339c981bd011e8a900080027358c88:vs.2-lun-4
    │   └── lost+found

Why does the new LUN continue to be /dev/sdk? Where is the old /dev/sdk (asking because kubelet 100% doesn't delete old devices)? What is the ID_SERIAL of the new SCSI device?
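
(For what it's worth, the serial/WWID a device currently reports can be checked with something like the following; /dev/sdk is the device from the example above:)

# udev-reported identifiers for the device:
udevadm info --query=property --name=/dev/sdk | grep -E 'ID_SERIAL|ID_WWN'

# Or ask the SCSI layer directly (the path to scsi_id varies by distro):
/lib/udev/scsi_id --whitelisted --device=/dev/sdk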

I did a small experiment: created 10 pods with PVCs, deleted them, and created them again; kubelet seems to have mounted the already-existing multipath devices from the initial creation! But that would mean the new SCSI devices are created with the same WWIDs. Shouldn't a WWID be unique globally and over time?

After the first creation:

/dev/mapper/3600a09804d542d744d2b4871417a4c6a /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/10.1.72.201:3260-iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13-lun-10 ext4 rw,seclabel,relatime,stripe=16,data=ordered 0 0
/dev/mapper/3600a09804d542d744d2b4871417a4c6b /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/10.1.72.201:3260-iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13-lun-11 ext4 rw,seclabel,relatime,stripe=16,data=ordered 0 0
/dev/mapper/3600a09804d542d744d2b4871417a5245 /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/10.1.72.201:3260-iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13-lun-6 ext4 rw,seclabel,relatime,stripe=16,data=ordered 0 0
/dev/mapper/3600a09804d542d744d2b4871417a5246 /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/10.1.72.201:3260-iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13-lun-7 ext4 rw,seclabel,relatime,stripe=16,data=ordered 0 0
/dev/mapper/3600a09804d542d744d2b4871417a542f /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/10.1.72.201:3260-iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13-lun-25 ext4 rw,seclabel,relatime,stripe=16,data=ordered 0 0
/dev/mapper/3600a09804d542d744d2b4871417a5453 /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/10.1.72.201:3260-iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13-lun-9 ext4 rw,seclabel,relatime,stripe=16,data=ordered 0 0
/dev/mapper/3600a09804d542d744d2b4871417a5454 /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/10.1.72.201:3260-iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13-lun-12 ext4 rw,seclabel,relatime,stripe=16,data=ordered 0 0
/dev/mapper/3600a09804d542d744d2b4871417a5455 /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/10.1.72.201:3260-iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13-lun-13 ext4 rw,seclabel,relatime,stripe=16,data=ordered 0 0
/dev/mapper/3600a09804d542d744d2b4871417a5456 /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/10.1.72.201:3260-iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13-lun-14 ext4 rw,seclabel,relatime,stripe=16,data=ordered 0 0
/dev/mapper/3600a09804d542d744d2b4871417a5457 /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/10.1.72.201:3260-iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13-lun-16 ext4 rw,seclabel,relatime,stripe=16,data=ordered 0 0
/dev/mapper/3600a09804d542d744d2b4871417a5458 /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/10.1.72.201:3260-iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13-lun-18 ext4 rw,seclabel,relatime,stripe=16,data=ordered 0 0
/dev/mapper/3600a09804d542d744d2b4871417a5459 /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/10.1.72.201:3260-iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13-lun-19 ext4 rw,seclabel,relatime,stripe=16,data=ordered 0 0
/dev/mapper/3600a09804d542d744d2b4871417a545a /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/10.1.72.201:3260-iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13-lun-20 ext4 rw,seclabel,relatime,stripe=16,data=ordered 0 0
/dev/mapper/3600a09804d542d744d2b4871417a5461 /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/10.1.72.201:3260-iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13-lun-26 ext4 rw,seclabel,relatime,stripe=16,data=ordered 0 0

After the second creation:

/dev/mapper/3600a09804d542d744d2b4871417a4c6a /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/10.1.72.201:3260-iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13-lun-10 ext4 rw,seclabel,relatime,stripe=16,data=ordered 0 0
/dev/mapper/3600a09804d542d744d2b4871417a4c6b /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/10.1.72.201:3260-iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13-lun-11 ext4 rw,seclabel,relatime,stripe=16,data=ordered 0 0
/dev/mapper/3600a09804d542d744d2b4871417a5245 /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/10.1.72.201:3260-iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13-lun-6 ext4 rw,seclabel,relatime,stripe=16,data=ordered 0 0
/dev/mapper/3600a09804d542d744d2b4871417a5246 /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/10.1.72.201:3260-iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13-lun-7 ext4 rw,seclabel,relatime,stripe=16,data=ordered 0 0
/dev/mapper/3600a09804d542d744d2b4871417a542f /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/10.1.72.201:3260-iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13-lun-25 ext4 rw,seclabel,relatime,stripe=16,data=ordered 0 0
/dev/mapper/3600a09804d542d744d2b4871417a5453 /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/10.1.72.201:3260-iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13-lun-9 ext4 rw,seclabel,relatime,stripe=16,data=ordered 0 0
/dev/mapper/3600a09804d542d744d2b4871417a5454 /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/10.1.72.201:3260-iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13-lun-12 ext4 rw,seclabel,relatime,stripe=16,data=ordered 0 0
/dev/mapper/3600a09804d542d744d2b4871417a5455 /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/10.1.72.201:3260-iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13-lun-13 ext4 rw,seclabel,relatime,stripe=16,data=ordered 0 0
/dev/mapper/3600a09804d542d744d2b4871417a5456 /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/10.1.72.201:3260-iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13-lun-14 ext4 rw,seclabel,relatime,stripe=16,data=ordered 0 0
/dev/mapper/3600a09804d542d744d2b4871417a5457 /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/10.1.72.201:3260-iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13-lun-16 ext4 rw,seclabel,relatime,stripe=16,data=ordered 0 0
/dev/mapper/3600a09804d542d744d2b4871417a5458 /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/10.1.72.201:3260-iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13-lun-18 ext4 rw,seclabel,relatime,stripe=16,data=ordered 0 0
/dev/mapper/3600a09804d542d744d2b4871417a5459 /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/10.1.72.201:3260-iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13-lun-19 ext4 rw,seclabel,relatime,stripe=16,data=ordered 0 0
/dev/mapper/3600a09804d542d744d2b4871417a545a /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/10.1.72.201:3260-iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13-lun-20 ext4 rw,seclabel,relatime,stripe=16,data=ordered 0 0
/dev/mapper/3600a09804d542d744d2b4871417a5461 /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/10.1.72.201:3260-iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13-lun-26 ext4 rw,seclabel,relatime,stripe=16,data=ordered 0 0
/dev/sddf /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/10.1.72.201:3260-iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13-lun-27 ext4 rw,seclabel,relatime,stripe=16,data=ordered 0 0
/dev/sddg /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/10.1.72.201:3260-iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13-lun-28 ext4 rw,seclabel,relatime,stripe=16,data=ordered 0 0

@redbaron
Author

redbaron commented Mar 8, 2018

I think I am starting to understand what is happening.

  1. LUN X is mapped to the host, discovered, and an sdX device is created
  2. multipathd picks up that device and creates a multipath device
  3. kubelet mounts the multipath device
  4. the pod is destroyed, the PV is destroyed, the LUN is unmapped
  5. the sdX device remains on the node
  6. a new LUN X is mapped to the host; it is discovered and the same sdX device is "reassigned" to it
  7. multipath sees this as a path recovering from a failure
  8. the same multipath device is mounted into the new container

If multipath doesn't have disable_changed_wwids "yes" in its config, it doesn't detect that the SCSI device was completely replaced underneath. In theory there should be no outstanding IO after unmount and no data corruption should happen, but in practice, with 4 paths to the same LUN and PVs constantly coming and going across multiple nodes, something occasionally doesn't work quite right.
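
In other words, the reuse would be harmless if node-side teardown looked roughly like this before the LUN number came back (a sketch only; the map WWID and path devices are taken from my earlier multipath output, and the mount path is abbreviated):

# 1. Unmount the PV's mount point:
umount /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/<portal>-<iqn>-lun-<N>

# 2. Flush the multipath map so a recycled LUN number cannot silently "reinstate" it:
multipath -f 3600a09804d542d744d2b4871417a4b46

# 3. Delete every underlying path device so the next LUN <N> shows up as a fresh sdX with a fresh WWID:
for dev in sdam sdan sdal sdao; do
  echo 1 > /sys/block/$dev/device/delete
done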

@kangarlou
Contributor

Thanks for the additional information! We haven't done much testing around multipathing with Trident in Kubernetes, but I can take advantage of multipathing with Trident for Docker (Trident as a Docker Volume Plugin) using the same SVM and initiator that I used in my examples.

I noticed you had earlier created #69. So can you use multipathing with Trident-provisioned volumes without specifying multiple portals in the PV?

@redbaron
Author

redbaron commented Mar 9, 2018

Yes. If you log in to all portals on a host before starting kubelet (which reminds me: kubelet shouldn't log out from iSCSI, because it nukes devices discovered from the "host" session too; I'll create a ticket for that), a rescan on one session discovers devices from the other sessions too.

In my case I run the following commands before kubelet starts (found them by trial and error, so they might not be optimal):

ExecStartPost=/sbin/iscsiadm -m discovery -t st  -p 10.1.72.201 -p 10.1.74.204
ExecStartPost=/sbin/iscsiadm -m node -L all  -p 10.1.72.201 -p 10.1.74.204 --login

And current multipath config:

defaults {
  disable_changed_wwids "yes"
  skip_kpartx "yes"
}
blacklist {
  devnode "^sda$"
}

@korenaren
Contributor

The plan is to make Kubernetes' native iSCSI support better, and that work is underway. We have no intention of shipping our own driver at this time.

@dsmithfv

dsmithfv commented Jun 15, 2018

We did some testing and found that this problem is caused by portals not being set in the PV.

iscsi:
  fsType: ext4
  iqn: iqn.1992-08.com.netapp:sn.12345678910:vs.4
  iscsiInterface: default
  lun: 6
  targetPortal: 10.28.52.128
  portals:
  - 10.28.52.128
  - 10.28.52.129
  - 10.28.52.130
  - 10.28.52.131
  - 10.28.52.136
  - 10.28.52.137
  - 10.28.52.138
  - 10.28.52.139

After manually creating a PV and PVC connected to the ONTAP backend this way, the iSCSI device was removed as expected.
Using Trident to dynamically provision the volume, the device was not removed. Is it possible to just update Trident to populate the portals section above for its LUNs?
