This repository has been archived by the owner on Oct 16, 2020. It is now read-only.

[1632.2.1] kernel deathspin on iSCSI session logout without deleting scsi devices before #2357

Closed
redbaron opened this issue Feb 13, 2018 · 35 comments

Comments

@redbaron

Bug

Container Linux Version

$ cat /etc/os-release
DISTRIB_ID="Container Linux by CoreOS"
DISTRIB_RELEASE=1632.2.1
DISTRIB_CODENAME="Ladybug"
DISTRIB_DESCRIPTION="Container Linux by CoreOS 1632.2.1 (Ladybug)"

Environment

OEM=vmware_raw

iSCSI mounted disks

Actual Behavior

No new containers can be started. Already-running containers seem to be fine.

Other Information

$ docker ps -a
CONTAINER ID        IMAGE                                      COMMAND                  CREATED              STATUS                     PORTS               NAMES
b52baae24f14        gcr.io/google_containers/pause-amd64:3.0   "/pause"                 About a minute ago   Created                                        k8s_POD_apps-sigpage-dev-email-templat-vgt40i-minio-7458565f84-zxhd4_dev_e1f31859-10d8-11e8-9e43-005056bced2e_5
$ dmesg
[18310.079477] Workqueue: netns cleanup_net
[18310.079551] Call Trace:
[18310.079628]  ? __schedule+0x2e3/0x890
[18310.079699]  schedule+0x32/0x80
[18310.079761]  schedule_preempt_disabled+0xa/0x10
[18310.079844]  __mutex_lock.isra.4+0x29a/0x510
[18310.079958]  ? cleanup_net+0x99/0x2c0
[18310.080029]  cleanup_net+0x99/0x2c0
[18310.080099]  process_one_work+0x144/0x350
[18310.080174]  worker_thread+0x4d/0x3e0
[18310.080254]  kthread+0xfc/0x130
[18310.080317]  ? rescuer_thread+0x310/0x310
[18310.080405]  ? kthread_park+0x60/0x60
[18310.080476]  ? do_syscall_64+0x59/0x1c0
[18310.080551]  ? SyS_exit_group+0x10/0x10
[18310.080623]  ret_from_fork+0x35/0x40
[18678.714037] INFO: task kworker/u24:4:13862 blocked for more than 120 seconds.
[18678.714176]       Not tainted 4.14.16-coreos #1
[18678.714258] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[18678.714396] kworker/u24:4   D    0 13862      2 0x80000080
[18678.714500] Workqueue: netns cleanup_net
[18678.714576] Call Trace:
[18678.714635]  ? __schedule+0x2e3/0x890
[18678.714707]  schedule+0x32/0x80
[18678.714771]  schedule_preempt_disabled+0xa/0x10
[18678.714854]  __mutex_lock.isra.4+0x29a/0x510
[18678.714935]  ? cleanup_net+0x99/0x2c0
[18678.715027]  cleanup_net+0x99/0x2c0
[18678.715097]  process_one_work+0x144/0x350
[18678.715173]  worker_thread+0x4d/0x3e0
[18678.715243]  kthread+0xfc/0x130
[18678.715305]  ? rescuer_thread+0x310/0x310
[18678.715380]  ? kthread_park+0x60/0x60
[18678.715450]  ? do_syscall_64+0x59/0x1c0
[18678.715523]  ? SyS_exit_group+0x10/0x10
[18678.715595]  ret_from_fork+0x35/0x40
[18801.593885] INFO: task kworker/u24:4:13862 blocked for more than 120 seconds.
[18801.594195]       Not tainted 4.14.16-coreos #1
[18801.594400] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[18801.594570] kworker/u24:4   D    0 13862      2 0x80000080
[18801.594665] Workqueue: netns cleanup_net
[18801.594758] Call Trace:
[18801.594810]  ? __schedule+0x2e3/0x890
[18801.594877]  schedule+0x32/0x80
[18801.594935]  schedule_preempt_disabled+0xa/0x10
[18801.595011]  __mutex_lock.isra.4+0x29a/0x510
[18801.595083]  ? cleanup_net+0x99/0x2c0
[18801.595146]  cleanup_net+0x99/0x2c0
[18801.595676]  process_one_work+0x144/0x350
[18801.595861]  worker_thread+0x4d/0x3e0
[18801.595928]  kthread+0xfc/0x130
[18801.596547]  ? rescuer_thread+0x310/0x310
[18801.596617]  ? kthread_park+0x60/0x60
[18801.596692]  ? do_syscall_64+0x59/0x1c0
[18801.596761]  ? SyS_exit_group+0x10/0x10
[18801.596829]  ret_from_fork+0x35/0x40

$ top

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                                       
21248 root      20   0       0      0      0 R  99.1  0.0  17:00.98 kworker/u24:2                                                                                                                                 
 4098 root      20   0 1527328 169428  69072 S   2.2  0.5   7:58.11 kubelet                                                                                                                                       
  855 root      20   0 1214240  25560   8160 S   1.1  0.1   0:21.64 containerd                                                                                                                                    
 4839 root      20   0   47324  34792  21292 S   1.1  0.1   3:01.51 calico-felix   

Other Information

I'll add more info once I have it. The server is live and I am not rebooting it, so if you want me to collect diagnostics, I can run commands for you.

@redbaron
Author

redbaron commented Feb 13, 2018

$ echo workqueue:workqueue_queue_work > /sys/kernel/debug/tracing/set_event
$ grep '21248' /sys/kernel/debug/tracing/trace_pipe 
   kworker/u24:2-21248 [003] d.s. 19533.130008: workqueue_queue_work: work struct=ffff9e306be24400 function=iscsi_requeue_task [libiscsi] workqueue=ffff9e3069befa00 req_cpu=512 cpu=4294967295
   kworker/u24:2-21248 [003] d.s. 19533.641997: workqueue_queue_work: work struct=ffff9e3069a9bc00 function=iscsi_requeue_task [libiscsi] workqueue=ffff9e2c6c1ed800 req_cpu=512 cpu=4294967295
   kworker/u24:2-21248 [003] d.s. 19533.642005: workqueue_queue_work: work struct=ffff9e306bcec400 function=iscsi_requeue_task [libiscsi] workqueue=ffff9e2c6ca18000 req_cpu=512 cpu=4294967295
   kworker/u24:2-21248 [003] d.s. 19538.249949: workqueue_queue_work: work struct=ffff9e306be24400 function=iscsi_requeue_task [libiscsi] workqueue=ffff9e3069befa00 req_cpu=512 cpu=4294967295
   kworker/u24:2-21248 [003] d.s. 19538.761944: workqueue_queue_work: work struct=ffff9e3069a9bc00 function=iscsi_requeue_task [libiscsi] workqueue=ffff9e2c6c1ed800 req_cpu=512 cpu=4294967295
   kworker/u24:2-21248 [003] d.s. 19538.761956: workqueue_queue_work: work struct=ffff9e306bcec400 function=iscsi_requeue_task [libiscsi] workqueue=ffff9e2c6ca18000 req_cpu=512 cpu=4294967295
   kworker/u24:2-21248 [003] d.s. 19538.761960: workqueue_queue_work: work struct=ffff9e30691a9400 function=iscsi_requeue_task [libiscsi] workqueue=ffff9e2c6ca1b600 req_cpu=512 cpu=4294967295
   kworker/u24:2-21248 [003] d.h. 19540.445322: workqueue_queue_work: work struct=ffff9e296eeaf918 function=cfq_kick_queue workqueue=ffff9e2c6da09a00 req_cpu=512 cpu=3
   kworker/u24:2-21248 [003] d.s. 19543.369902: workqueue_queue_work: work struct=ffff9e306be24400 function=iscsi_requeue_task [libiscsi] workqueue=ffff9e3069befa00 req_cpu=512 cpu=4294967295
   kworker/u24:2-21248 [003] d.s. 19543.881890: workqueue_queue_work: work struct=ffff9e3069a9bc00 function=iscsi_requeue_task [libiscsi] workqueue=ffff9e2c6c1ed800 req_cpu=512 cpu=4294967295
   kworker/u24:2-21248 [003] d.s. 19543.881901: workqueue_queue_work: work struct=ffff9e30691a9400 function=iscsi_requeue_task [libiscsi] workqueue=ffff9e2c6ca1b600 req_cpu=512 cpu=4294967295
   kworker/u24:2-21248 [003] d.s. 19543.881903: workqueue_queue_work: work struct=ffff9e306bcec400 function=iscsi_requeue_task [libiscsi] workqueue=ffff9e2c6ca18000 req_cpu=512 cpu=4294967295
   kworker/u24:2-21248 [003] d.h. 19544.457798: workqueue_queue_work: work struct=ffff9e296eeaf918 function=cfq_kick_queue workqueue=ffff9e2c6da09a00 req_cpu=512 cpu=3
   kworker/u24:2-21248 [003] d.s. 19546.441862: workqueue_queue_work: work struct=ffffffffc0206580 function=dm_bufio_prefetch [dm_bufio] workqueue=ffff9e296e06f200 req_cpu=512 cpu=3
   kworker/u24:2-21248 [003] d.s. 19548.489844: workqueue_queue_work: work struct=ffff9e306be24400 function=iscsi_requeue_task [libiscsi] workqueue=ffff9e3069befa00 req_cpu=512 cpu=4294967295
   kworker/u24:2-21248 [003] d.s. 19549.001838: workqueue_queue_work: work struct=ffff9e3069a9bc00 function=iscsi_requeue_task [libiscsi] workqueue=ffff9e2c6c1ed800 req_cpu=512 cpu=4294967295
   kworker/u24:2-21248 [003] d.s. 19549.001855: workqueue_queue_work: work struct=ffff9e30691a9400 function=iscsi_requeue_task [libiscsi] workqueue=ffff9e2c6ca1b600 req_cpu=512 cpu=4294967295
   kworker/u24:2-21248 [003] d.s. 19549.001858: workqueue_queue_work: work struct=ffff9e306bcec400 function=iscsi_requeue_task [libiscsi] workqueue=ffff9e2c6ca18000 req_cpu=512 cpu=4294967295
   kworker/u24:2-21248 [003] d.s. 19553.609797: workqueue_queue_work: work struct=ffff9e306be24400 function=iscsi_requeue_task [libiscsi] workqueue=ffff9e3069befa00 req_cpu=512 cpu=4294967295
   kworker/u24:2-21248 [003] d.s. 19554.121785: workqueue_queue_work: work struct=ffff9e3069a9bc00 function=iscsi_requeue_task [libiscsi] workqueue=ffff9e2c6c1ed800 req_cpu=512 cpu=4294967295
   kworker/u24:2-21248 [003] d.s. 19554.121797: workqueue_queue_work: work struct=ffff9e30691a9400 function=iscsi_requeue_task [libiscsi] workqueue=ffff9e2c6ca1b600 req_cpu=512 cpu=4294967295
   kworker/u24:2-21248 [003] d.s. 19554.121798: workqueue_queue_work: work struct=ffff9e306bcec400 function=iscsi_requeue_task [libiscsi] workqueue=ffff9e2c6ca18000 req_cpu=512 cpu=4294967295

@redbaron
Author

$ echo l > /proc/sysrq-trigger

It shows that PID 21248 is the one spinning:

[19806.968294] CPU: 3 PID: 21248 Comm: kworker/u24:2 Not tainted 4.14.16-coreos #1
[19806.968295] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/21/2015
[19806.968302] Workqueue: scsi_wq_3 iscsi_free_session [scsi_transport_iscsi]
[19806.968302] task: ffff9e2c6de19e40 task.stack: ffffbd9711270000
[19806.968305] RIP: 0010:mutex_lock+0x1e/0x30
[19806.968305] RSP: 0018:ffffbd9711273de0 EFLAGS: 00000246
[19806.968306] RAX: 0000000000000000 RBX: ffff9e2c6d827040 RCX: 0000000000000000
[19806.968307] RDX: ffff9e2c6de19e40 RSI: 0000000000000206 RDI: ffff9e2c6d827040
[19806.968307] RBP: ffff9e2c6c3a9000 R08: 0000000000000001 R09: 0000000000000000
[19806.968308] R10: ffffbd9703263e38 R11: 0000000000000358 R12: ffff9e2c6d827000
[19806.968308] R13: ffff9e306be25128 R14: ffff9e2c6c3a9000 R15: ffff9e2c6d827010
[19806.968309] FS:  0000000000000000(0000) GS:ffff9e2c6fcc0000(0000) knlGS:0000000000000000
[19806.968309] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[19806.968310] CR2: 000000c420276748 CR3: 000000012d00a003 CR4: 00000000001606e0
[19806.968333] Call Trace:
[19806.968347]  scsi_remove_device+0x19/0x60 [scsi_mod]
[19806.968354]  scsi_remove_target+0x167/0x1b0 [scsi_mod]
[19806.968355]  iscsi_free_session+0x383/0x430 [scsi_transport_iscsi]
[19806.968366]  process_one_work+0x144/0x350
[19806.968367]  worker_thread+0x4d/0x3e0
[19806.968369]  kthread+0xfc/0x130
[19806.968370]  ? rescuer_thread+0x310/0x310
[19806.968371]  ? kthread_park+0x60/0x60
[19806.968372]  ? do_syscall_64+0xe9/0x1c0
[19806.968374]  ret_from_fork+0x35/0x40

@redbaron
Author

Somewhat related https://patchwork.kernel.org/patch/7987711/

@redbaron
Author

Differences from the "stable" nodes:

  1. Container Linux 1632.2.1 vs 1576.3.0
  2. Kubelet 1.9.3 vs 1.8.4

It seems I can reliably reproduce the test case. All that's needed is to run a pod with a PersistentVolumeClaim, which dynamically provisions a PersistentVolume (we use netapp/trident for that), then delete the pod so that no more iSCSI volumes are in use on the server. AFAIK the kubelet then triggers an iSCSI session disconnect, which might be triggering a bug in the kernel.
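The repro steps above could be driven by a minimal manifest along these lines (all names are hypothetical, and the Trident-backed storage class is an assumption — adjust to the cluster's actual class):

```yaml
# Hypothetical repro: a dynamically provisioned PVC plus a pod that
# mounts it. Creating both, then deleting them, exercises kubelet's
# iSCSI attach/logout path described above.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: iscsi-repro-pvc            # hypothetical name
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
  storageClassName: trident-iscsi  # assumed Trident-backed class
---
apiVersion: v1
kind: Pod
metadata:
  name: iscsi-repro-pod            # hypothetical name
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: iscsi-repro-pvc
```

Apply it, wait for the PV to bind and the pod to start, then delete the pod and PVC while watching dmesg on the node.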

@redbaron
Author

$ echo 1 > /sys/module/libiscsi/parameters/debug_libiscsi_conn
$ echo 1 > /sys/module/libiscsi/parameters/debug_libiscsi_eh
$ echo 1 > /sys/module/libiscsi/parameters/debug_libiscsi_session

This doesn't show any iSCSI commands that differ from a freshly started and cordoned node (i.e. no iSCSI volumes mounted, but iSCSI discovery completed and the disks visible in multipath -ll):

$ dmesg -wH
...

[  +5.110978]  connection3:0: iscsi_check_transport_timeouts Sending nopout as ping
[  +0.000157]  connection3:0: iscsi_check_transport_timeouts Setting next tmo 4296848776
[  +0.000155]  connection1:0: iscsi_check_transport_timeouts Sending nopout as ping
[  +0.000136]  connection1:0: iscsi_check_transport_timeouts Setting next tmo 4296848776
[  +0.000140]  connection4:0: iscsi_check_transport_timeouts Sending nopout as ping
[  +0.000282]  connection4:0: iscsi_check_transport_timeouts Setting next tmo 4296848776
[  +0.000169]  connection2:0: iscsi_check_transport_timeouts Sending nopout as ping
[  +0.000174]  connection2:0: iscsi_check_transport_timeouts Setting next tmo 4296848777
[  +0.000166]  session3: iscsi_prep_mgmt_task mgmtpdu [op 0x0 hdr->itt 0x1e datalen 0]
[  +0.000008]  session1: iscsi_prep_mgmt_task mgmtpdu [op 0x0 hdr->itt 0x21 datalen 0]
[  +0.000008]  session4: iscsi_prep_mgmt_task mgmtpdu [op 0x0 hdr->itt 0x59 datalen 0]
[  +0.000005]  session2: iscsi_prep_mgmt_task mgmtpdu [op 0x0 hdr->itt 0x2e datalen 0]
[  +0.000665]  session4: __iscsi_complete_pdu [op 0x20 cid 0 itt 0x59 len 0]
[  +0.000143]  session4: iscsi_complete_task complete task itt 0x59 state 3 sc           (null)
[  +0.000152]  session4: iscsi_free_task freeing task itt 0x59 state 1 sc           (null)
[  +0.000170]  session1: __iscsi_complete_pdu [op 0x20 cid 0 itt 0x21 len 0]
[  +0.000123]  session1: iscsi_complete_task complete task itt 0x21 state 3 sc           (null)
[  +0.000151]  session1: iscsi_free_task freeing task itt 0x21 state 1 sc           (null)
[  +0.000201]  session2: __iscsi_complete_pdu [op 0x20 cid 0 itt 0x2e len 0]
[  +0.000154]  session2: iscsi_complete_task complete task itt 0x2e state 3 sc           (null)
[  +0.000154]  session2: iscsi_free_task freeing task itt 0x2e state 1 sc           (null)
[  +0.000237]  session3: __iscsi_complete_pdu [op 0x20 cid 0 itt 0x1e len 0]
[  +0.000123]  session3: iscsi_complete_task complete task itt 0x1e state 3 sc           (null)
[  +0.000152]  session3: iscsi_free_task freeing task itt 0x1e state 1 sc           (null)
[  +0.004162]  connection1:0: iscsi_check_transport_timeouts Setting next tmo 4296848778
[  +0.000148]  connection4:0: iscsi_check_transport_timeouts Setting next tmo 4296848778
[  +0.000850]  connection3:0: iscsi_check_transport_timeouts Setting next tmo 4296848780
[  +0.000144]  connection2:0: iscsi_check_transport_timeouts Setting next tmo 4296848779

@redbaron
Author

Enabling full SCSI logging (from https://github.com/hreinecke/sg3_utils/blob/master/scripts/scsi_logging_level):

$ echo 460175067 > /sys/module/scsi_mod/parameters/scsi_logging_level

shows no commands going to the iSCSI devices, just occasional sda writes, so this is not some sort of retry or SAN error.
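For reference, 460175067 is octal 03333333333 — ten 3-bit fields all set to level 3, one per SCSI logging facility. A quick decode sketch (the facility order is my assumption, following the shift order I recall from the kernel's include/scsi/scsi_logging.h):

```shell
#!/bin/bash
# Decode scsi_mod's scsi_logging_level parameter: ten 3-bit fields,
# one per logging facility, starting at bit 0.
level=460175067
names=(error timeout scan mlqueue mlcomplete llqueue llcomplete hlqueue hlcomplete ioctl)
for i in "${!names[@]}"; do
  printf '%-11s level %d\n' "${names[$i]}" $(( (level >> (i * 3)) & 7 ))
done
```

Every facility comes out at level 3, i.e. the "log everything" setting the sg3_utils script produces.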

@redbaron
Author

kubelet errors:

$ journalctl -b0  -u kubelet |grep iscsi.*go:
Feb 13 19:39:38 erepnk13 kubelet-wrapper[4119]: E0213 19:39:38.596868    4119 iscsi_util.go:338] iscsi: failed to mount iscsi volume /dev/dm-9 [ext4] to /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/10.1.72.201:3260-iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13-lun-9, error mount failed: exit status 32
Feb 13 19:39:54 erepnk13 kubelet-wrapper[4119]: E0213 19:39:54.355119    4119 iscsi_util.go:338] iscsi: failed to mount iscsi volume /dev/dm-14 [ext4] to /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/10.1.72.201:3260-iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13-lun-12, error mount failed: exit status 32
Feb 13 19:42:26 erepnk13 kubelet-wrapper[4119]: W0213 19:42:26.055943    4119 iscsi_util.go:353] Warning: Unmount skipped because path does not exist: /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-/dev-apps-sigpage-dev-email-templat-qbf8m5-minio-9ec04:3260-dev-apps-sigpage-dev-email-templat-qbf8m5-minio-9ec04-lun-0
Feb 13 19:42:29 erepnk13 kubelet-wrapper[4119]: I0213 19:42:29.127381    4119 iscsi_util.go:408] iscsi: log out target 10.1.72.201:3260 iqn iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13 iface default
Feb 13 19:42:29 erepnk13 kubelet-wrapper[4119]: W0213 19:42:29.664023    4119 iscsi_util.go:353] Warning: Unmount skipped because path does not exist: /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-/dev-apps-sigpage-dev-email-templat-qbf8m5-postgresql-9ec8c:3260-dev-apps-sigpage-dev-email-templat-qbf8m5-postgresql-9ec8c-lun-0
Feb 13 19:43:00 erepnk13 kubelet-wrapper[4119]: E0213 19:43:00.953910    4119 iscsi_util.go:338] iscsi: failed to mount iscsi volume /dev/dm-9 [ext4] to /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/10.1.72.201:3260-iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13-lun-9, error mount failed: exit status 32

@redbaron
Author

redbaron commented Feb 13, 2018

Remaining mounts on the host:

/dev/mapper/mpathr /var/lib/rkt/pods/run/9ef1244e-c5c3-4379-ab5a-ab2a88fb03f9/stage1/rootfs/opt/stage2/hyperkube/rootfs/var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/10.1.72.201:3260-iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13-lun-9 ext4 rw,seclabel,relatime,stripe=16,data=ordered 0 0
/dev/mapper/mpathr /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/10.1.72.201:3260-iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13-lun-9 ext4 rw,seclabel,relatime,stripe=16,data=ordered 0 0
/dev/mapper/mpathr /var/lib/rkt/pods/run/9ef1244e-c5c3-4379-ab5a-ab2a88fb03f9/stage1/rootfs/opt/stage2/hyperkube/rootfs/var/lib/kubelet/pods/13f7b778-10f6-11e8-9e43-005056bced2e/volumes/kubernetes.io~iscsi/dev-apps-sigpage-dev-email-templat-qbf8m5-minio-13f41 ext4 rw,seclabel,relatime,stripe=16,data=ordered 0 0
/dev/mapper/mpathr /var/lib/kubelet/pods/13f7b778-10f6-11e8-9e43-005056bced2e/volumes/kubernetes.io~iscsi/dev-apps-sigpage-dev-email-templat-qbf8m5-minio-13f41 ext4 rw,seclabel,relatime,stripe=16,data=ordered 0 0

The rkt pod is the kubelet.

@redbaron
Author

Running the command which kubelet tries to run:

$ iscsiadm -m node  -p 10.1.72.201 3260 -T iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13 --logout -I default
Logging out of session [sid: 1, target: iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13, portal: 10.1.72.201,3260]
iscsiadm: Could not logout of [sid: 1, target: iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13, portal: 10.1.72.201,3260].
iscsiadm: initiator reported error (9 - internal error)
iscsiadm: Could not logout of all requested sessions

iscsid logs:

Feb 13 20:42:20 erepnk13 iscsid[884]: session in invalid state for logout. Try again later

I deleted all scsi devices from that session manually:

for d in $(iscsiadm -m session -P3 |grep -A 1 scsi3  |awk '/scsi disk/ {print $4}'); do echo 1 > /sys/block/$d/device/delete; done

Now the session looks like the following:

iscsiadm -m session -r 1 -P3
iSCSI Transport Class version 2.0-870
version 2.0-873
Target: iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13
        Current Portal: 10.1.72.201:3260,1046
        Persistent Portal: 10.1.72.201:3260,1046
                **********
                Interface:
                **********
                Iface Name: default
                Iface Transport: tcp
                Iface Initiatorname: iqn.2016-04.com.coreos.iscsi:erepnk13
                Iface IPaddress: 10.1.72.7
                Iface HWaddress: <empty>
                Iface Netdev: <empty>
                SID: 1
                iSCSI Connection State: LOGGED IN
                iSCSI Session State: LOGGED_IN
                Internal iscsid Session State: NO CHANGE
                *********
                Timeouts:
                *********
                Recovery Timeout: 5
                Target Reset Timeout: 30
                LUN Reset Timeout: 30
                Abort Timeout: 15
                *****
                CHAP:
                *****
                username: <empty>
                password: ********
                username_in: <empty>
                password_in: ********
                ************************
                Negotiated iSCSI params:
                ************************
                HeaderDigest: None
                DataDigest: None
                MaxRecvDataSegmentLength: 262144
                MaxXmitDataSegmentLength: 65536
                FirstBurstLength: 65536
                MaxBurstLength: 65536
                ImmediateData: Yes
                InitialR2T: No
                MaxOutstandingR2T: 1
                ************************
                Attached SCSI devices:
                ************************
                Host Number: 3  State: running

@redbaron
Author

redbaron commented Feb 14, 2018

It seems that nothing even needs to be mounted. I masked kubelet and rebooted the node. A simple logout, iscsiadm -m node -p 10.1.72.201 3260 -T iqn.1992-08.com.netapp:sn.440d6a6ad6a311e79b3500a0981e9162:vs.13 --logout -I default, sends it into a death spiral.

iscsid debug output on logout:

Feb 14 17:21:36 erepnk13 iscsid[872]: poll result 1
Feb 14 17:21:36 erepnk13 iscsid[872]: in kunbind_session
Feb 14 17:21:36 erepnk13 iscsid[872]: in __kipc_call
Feb 14 17:21:36 erepnk13 iscsid[872]: in kwritev
Feb 14 17:21:36 erepnk13 iscsid[872]: in nlpayload_read
Feb 14 17:21:36 erepnk13 iscsid[872]: in nlpayload_read

@redbaron
Author

Logging out from all targets sends 4 kworkers into an infinite spin; same with iscsiadm -m node --logoutall=all.

Found a safe logout sequence which doesn't cause problems and succeeds:

multipath -F
for d in $(iscsiadm -m session -P3 |awk '/scsi disk/ {print $4}'); do echo 1 > /sys/block/$d/device/delete; done
iscsiadm -m node --logoutall=all
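The device-name extraction in step 2 above is just an awk match on the "Attached scsi disk" lines of `iscsiadm -m session -P3` output. As a reusable sketch (the function name is mine, and it reads captured -P3 output from stdin rather than calling iscsiadm):

```shell
#!/bin/bash
# Sketch: pull sd* device names out of `iscsiadm -m session -P3`
# output, the same pattern the deletion loop above relies on.
list_session_disks() {
  awk '/scsi disk/ {print $4}'
}

# Example against a captured fragment of -P3 output:
list_session_disks <<'EOF'
                Host Number: 3  State: running
                        Attached scsi disk sdb          State: running
                        Attached scsi disk sdc          State: running
EOF
# prints: sdb
#         sdc
```

Each printed name can then be fed to /sys/block/$d/device/delete as in the loop above.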

@redbaron redbaron changed the title [1632.2.1] Docker containers stuck in "Created" + "Workqueue: netns cleanup_net" hung task [1632.2.1] kernel deathspin on iSCSI session logout without deleting scsi devices before Feb 15, 2018
@redbaron
Author

Asked for help in open-iscsi maillist: https://groups.google.com/forum/#!topic/open-iscsi/Tc6ERb1QOBg

@redbaron
Author

"Bisecting":

  • 1576.3.0 (4.13.16-coreos-r1) - OK
  • 1576.4.0 (4.14.11-coreos) - FAIL

Interesting observation: on 1632.2.1 (didn't try others), after a clean logout (multipath -F, delete the scsi devices, iscsiadm logout):

$ lsmod |grep '^scsi_dh_alua'
scsi_dh_alua           20480  92

which is exactly 23 (number of LUNs) * 4 (number of portals), so even though the scsi devices are gone, the alua reference count didn't change.
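The arithmetic behind that observation, as a one-liner sanity check (the LUN and portal counts are the ones from this setup):

```shell
#!/bin/sh
# 23 LUNs visible through each of 4 portals should account for the
# lsmod use count of 92 seen on scsi_dh_alua above.
luns=23
portals=4
echo $(( luns * portals ))  # prints: 92
```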

@redbaron
Author

Hm, interesting! After masking multipathd, rebooting and coming back, logout works just fine. Without multipathd, the iscsi_tcp module reference count on startup is 4 (== number of iSCSI portals), but with it running it is 92...

@redbaron
Author

With multipathd running, while the kworkers are spinning on CPU, the iscsi_tcp reference count slowly goes down (starting from 188 for some reason :( ):

iscsi_tcp              20480  151

@redbaron
Author

"Bisecting":
1576.3.0 (4.13.16-coreos-r1) - OK
1590.0.0 (4.14.0-rc8-coreos) - FAIL

@redbaron
Author

Reproduced on Fedora. Fedora kernels print more info to dmesg:

https://bugzilla.redhat.com/show_bug.cgi?id=1546228

@redbaron
Author

git bisect points to fbce4d97fd ("scsi: fixup kernel warning during rmmod()"); posted findings to LKML: https://lkml.org/lkml/2018/2/18/89

@redbaron
Author

redbaron commented Feb 19, 2018

Cherry-picking commit 81b6c9998979 ("scsi: core: check for device state in __scsi_remove_target()") on top of 4.14.20 fixes the problem for me.

@dm0-

dm0- commented Feb 19, 2018

Thanks for tracking this down. I'll cherry-pick the commit onto our 4.14.20 branch so it should be in the next releases.

@redbaron
Author

I suppose there's no reason why this patch can't be cherry-picked into the 4.14 stable branch by upstream. I'll ask them; let's see what they say.

@redbaron
Author

@dm0- the patch was pulled into the 4.14-stable branch and will be released in 4.14.21: https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/commit/?id=a9ca15212fc81273f4f9f5efdececda0d4c06808

@dm0-

dm0- commented Feb 21, 2018

Thanks, it looks like it will be released Friday, so I will update to it then to get it into next week's alpha.

@redbaron
Author

Any chance the kernel can be updated in stable without the rest of the updates?

@dm0-

dm0- commented Feb 21, 2018

Yes, we update the LTS kernel in all channels. The next stable minor release isn't scheduled, though, so the kernel update will be queued for release whenever there is an important security issue or bug fix that warrants it. (Unless a beta gets promoted to stable first, which happens in about five weeks.)

@redbaron
Author

@dm0- , is there an easy way to rebuild stable image with 4.14.21 kernel locally?

@dm0-

dm0- commented Feb 26, 2018

You could run this in the SDK:

repo init -b build-1632
repo sync
./build_packages
./build_image prod

Then use the image_to_vm script if necessary to build a platform-specific image format.

@dm0-

dm0- commented Feb 26, 2018

Oh, and I forgot our test images are public, so you could download this image (which also has a signature): http://builds.developer.core-os.net/boards/amd64-usr/1632.3.0%2Bjenkins2-build-1632%2Blocal-1195/coreos_production_image.bin.bz2 . Note that it is stable with 4.14.22.

@redbaron
Author

Interesting! So I don't need to build anything? :) Where is the full index of test images? http://builds.developer.core-os.net/boards/amd64-usr/ doesn't list 1632.3.0, even though your full URL works.

@dm0-

dm0- commented Feb 26, 2018

Sorry, indexes are not generated for the development bucket, and unfortunately the directory names are unpredictable due to Jenkins build numbers being included in the OS version. There also is no easy way to determine the changes in a development build without access to an internal CI interface. (You could technically look up the build number in https://github.com/coreos/manifest-builds/tags and trace back manifest commits if you were so determined.)

Images for production releases such as 1632.3.0 are in a different bucket (which does generate indexes): https://stable.release.core-os.net/amd64-usr/1632.3.0

@euank
Contributor

euank commented Feb 27, 2018

The bucket doesn't have up-to-date html indexes generated, but it's a google-cloud-storage bucket that allows anonymous read.

It can be accessed via the google cloud web console (https://console.cloud.google.com/storage/browser/builds.developer.core-os.net/boards/amd64-usr) or via gsutil ls and such commands.

The console link requires some form of google login, but the api doesn't (e.g. the json list of objects works with no auth).

Of course, none of the above is documented... maybe we should generate indexes again...

@dm0-

dm0- commented Feb 27, 2018

I'm going to close this issue since the fixed kernel version is queued in all channels, and beta and alpha are releasing tomorrow.

@r7vme

r7vme commented Apr 13, 2018

We are hitting exactly the same issue. Big thanks to @redbaron for tracking this down.

@dm0-

dm0- commented Apr 13, 2018

@r7vme Fixed kernels are in every channel now, so you might be seeing a different problem. Do you have error messages or backtraces?

@r7vme

r7vme commented Apr 16, 2018

@dm0- We are on the older kernel 4.14.11-coreos. Yes, we are planning to upgrade to the latest stable.
