
iscsiadm: initiator reported error (19 - encountered non-retryable iSCSI login failure) / Could not log into all portals #140

Closed
ahgraber opened this issue Dec 23, 2021 · 11 comments


@ahgraber

I am suddenly getting this new error message; it seems similar to #112. I had not changed my democratic-csi config since mid-October, and the error started a few days ago. I'm running democratic-csi chart 0.8.3 with the freenas-api-iscsi driver.

MountVolume.MountDevice failed for volume "pvc-5a7349c4-b25e-45a6-9475-c18e4bc7cc6c" : rpc error: code = Internal desc = {"code":19,"stdout":"Logging in to [iface: default, target: iqn.2005-10.org.freenas.ctl:flux-nextcloud-nextcloud-db, portal: 10.2.1.1,3260] (multiple)\n","stderr":"iscsiadm: Could not login to [iface: default, target: iqn.2005-10.org.freenas.ctl:flux-nextcloud-nextcloud-db, portal: 10.2.1.1,3260].\niscsiadm: initiator reported error (19 - encountered non-retryable iSCSI login failure)\niscsiadm: Could not log into all portals\n"}

Per #112, I have tried running systemctl restart scst on SCALE, although several targets were already available when I started receiving the error. I also tried restarting SCALE, updating SCALE from TrueNAS-SCALE-22.02-RC.1 to TrueNAS-SCALE-22.02-RC.2, and restarting nodes, to no avail.

LMK if I can hunt down other logs or provide additional config files that might be of use.
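For reference, the login that democratic-csi attempts can be reproduced by hand from the affected node; a rough sketch using the target/portal from the error above:

```sh
# Discover targets exposed by the TrueNAS portal (portal taken from the error above)
iscsiadm -m discovery -t sendtargets -p 10.2.1.1:3260

# Attempt the same login the CSI node plugin performs
iscsiadm -m node \
  -T iqn.2005-10.org.freenas.ctl:flux-nextcloud-nextcloud-db \
  -p 10.2.1.1:3260 --login
```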

@travisghansen
Member

Yeah, get the scst logs from the server to see what's going on server-side.

@ahgraber
Author

ahgraber commented Dec 23, 2021

systemctl status scst reports a bunch of

Dec 23 15:55:47 truenas.mydomain.com iscsi-scstd[1735709]: Connect from 10.2.113.31:35336 to 10.2.1.1:3260
Dec 23 15:55:47 truenas.mydomain.com iscsi-scstd[1735709]: Initiator iqn.1993-08.org.debian:01:4efdaa48c143 not allowed to connect to target iqn.2005-10.org.freenas.>

Edit --
Looks like the freenas box might be denying the connection, but that portal is set to allow all initiators...

Also, two days ago (when I first started seeing these errors), the error reported a different reason (cannot allocate memory). A reboot seems to have resolved that problem, though.

Dec 21 09:17:18 truenas.mydomain.com iscsi-scstd[13361]: Connect from 10.2.113.31:50938 to 10.2.1.1:3260
Dec 21 09:17:18 truenas.mydomain.com iscsi-scstd[13361]: Can't create sess 0x4941000003d0200 (tid 8, initiator iqn.1993-08.org.debian:01:4efdaa48c143): Cannot allocate memory
Dec 21 09:17:19 truenas.mydomain.com iscsi-scstd[13361]: Connect from 10.2.113.30:53482 to 10.2.1.1:3260

@travisghansen
Member

Use journalctl to get the full logs. Probably send over the scst.conf file and the output of lsmod as well.
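Something along these lines should capture it all (assuming scst.conf lives at /etc/scst.conf on SCALE; adjust the path if not):

```sh
# Full scst service log for the current boot
journalctl -u scst -b > scst-journal.log

# Generated SCST configuration and currently loaded kernel modules
cp /etc/scst.conf scst.conf
lsmod > lsmod.txt
```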

@ahgraber
Author

Requested data attached

debug_logs.zip

@travisghansen
Member

travisghansen commented Dec 23, 2021

Things generally look sane with the exception of the scst.conf file. It appears the extents have disappeared (for the nextcloud volumes). If you look at the SCALE admin UI, how many extents do you see in the list?

Essentially the targets are pointing to non-existent extents in the config file, which would explain the failures I'm guessing. If you look in the TARGET sections you'll see a line like `LUN 0 flux-nextcloud-nextcloud-db`, where `flux-nextcloud-nextcloud-db` matches exactly a DEVICE name/id in the earlier part of the file (and those DEVICE entries are clearly missing).

If the extents do show up in the admin UI then there's some breakdown in the config file generation process; if they do not show up in the admin UI then it begs the question how/why did they get deleted?
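To illustrate what a consistent mapping looks like, here is a rough sketch of the two halves of scst.conf that have to line up (the handler name and zvol path are illustrative, not taken from your config):

```
HANDLER vdisk_blockio {
    DEVICE flux-nextcloud-nextcloud-db {
        filename /dev/zvol/tank/flux-nextcloud-nextcloud-db
    }
}

TARGET_DRIVER iscsi {
    TARGET iqn.2005-10.org.freenas.ctl:flux-nextcloud-nextcloud-db {
        LUN 0 flux-nextcloud-nextcloud-db
        enabled 1
    }
}
```

In your file the TARGET block is present but the matching DEVICE block is not, so scst has nothing to back LUN 0 with.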

@ahgraber
Author

ahgraber commented Dec 23, 2021

I do see the extents in SCALE ui:

Screen Shot 2021-12-23 at 17 30 21

I had an extra extent present in the UI that should have been removed previously. I deleted it and will see if that was offsetting the index somehow and preventing the extents from mapping.

@travisghansen
Member

Can you send over a screenshot of the associated targets tab as well?

What exactly do you mean by extra?

@ahgraber
Author

ahgraber commented Dec 23, 2021

Targets:

Screen Shot 2021-12-23 at 18 45 55

Associated Targets:

Screen Shot 2021-12-23 at 18 46 01

I had a leftover extent `flux-vaultwarden-test-config-vaultwarden` whose backing zfs volume (and associated PV and PVC) had already been deleted.

After removing this "leftover" extent, restarting the scst service, and redeploying the k8s workload, the error no longer occurs and the deployment succeeds.
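For anyone hitting the same thing, the recovery boiled down to roughly this (stale extent deleted via the SCALE UI first; the namespace/workload names below are placeholders for my setup):

```sh
# On the TrueNAS SCALE box, after deleting the stale extent in the UI
systemctl restart scst

# On the k8s side, recreate the pods so the mount is retried
kubectl -n <namespace> rollout restart deployment <workload>
```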

I'm unsure why the extent was left behind after the targets were removed. Perhaps because the iscsi storageClass and volumeSnapshotClass are set to Retain, so even if I kubectl delete the PV and PVC and then zfs destroy the associated volumes, something lingers in the iscsi config?

@travisghansen
Member

If it was provisioned by this project then the extent should be deleted and for sure everything should get torn down (assuming a Delete policy on the pv).

The second issue seems to be that the TrueNAS middleware should handle that scenario more gracefully when generating the config file: ignore invalid entries but continue with the valid ones.

@ahgraber
Author

With a Retain policy, what is the appropriate way to remove a volume?

@travisghansen
Member

Just kubectl delete the pv. Retain doesn't really do anything special other than prevent the pv from being deleted when a bound pvc is deleted.
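In command form that's roughly the following (the PVC name/namespace are placeholders; the PV name is the example from the error earlier in this thread):

```sh
# Remove the claim if it still exists, then the PV object itself
kubectl delete pvc <claim-name> -n <namespace>
kubectl delete pv pvc-5a7349c4-b25e-45a6-9475-c18e4bc7cc6c
```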
