ENOLCK (no locks available) errors because rpc-statd is not running or GlusterFS is installed #175
Please follow https://netapp-trident.readthedocs.io/en/master/kubernetes/troubleshooting.html. I would pay close attention to the etcd logs.
I've attached some detailed logs below. Can you give me some advice on resolving the issue?

```
[xadmop01@devrepo1 trident-installer]$ ./tridentctl logs -l all -n trident
etcd log: goroutine 75 [running]:
[xadmop01@devrepo1 trident-installer]$ ./tridentctl install -n trident -d
```
We're aware of this issue, and if I'm not mistaken, I've already helped you or your account team determine etcd is the problem when a question was asked using our internal mailing list. My understanding is there is a support case open, so please be patient and wait for the process to go through. In the meantime, we'll update this issue once we have a solution. Thanks!
We have validated the problem. It seems to be with obtaining a lock on an NFS file and getting ENOLCK; any application that issues the flock system call may experience the same problem. Our NFS experts are investigating this.
@ceojinhak To confirm NFS locking is the issue, you can follow these steps (a shell sketch follows the list):

1. Manually create an NFS share.
2. Manually mount the NFS share to the host.
3. Create a file on this share.
4. Use flock (man 1 flock) to obtain a lock on this file (e.g., `flock /mnt/nfs/myfile -c cat`).

If flock fails, this confirms NFS locking is the issue.
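A minimal shell sketch of steps 2–4, assuming a hypothetical server 192.0.2.10 exporting /vol001 (both placeholders, not values from this thread):

```sh
# Create a mountpoint, mount the NFS export, and create a test file on it
mkdir -p /mnt/nfs
mount -t nfs 192.0.2.10:/vol001 /mnt/nfs
touch /mnt/nfs/myfile

# Attempt to take a lock on the file; "No locks available" (ENOLCK)
# confirms that NFS locking is broken in this environment
flock /mnt/nfs/myfile -c cat
```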
Okay, I will try it tomorrow and let you know the result.
I just ran into this while trying to deploy Trident to a different k8s cluster. If you have multiple installations of Trident on the same NetApp instance, you will probably want to change the name of the etcd volume.
Thanks @nlowe. As the first comment indicates, this is an issue with our ontap-nas driver and with a new install, so the etcd volume doesn't exist already. However, you're right that the majority of the etcd problems are because users inadvertently share the same etcd volume between different instances of Trident. You can also run `tridentctl install --volume-name` to specify a different name for the etcd volume, but it's a good practice to use different storage prefixes for different instances of Trident (currently you should only deploy one Trident instance per k8s cluster).
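For illustration only: a hypothetical ontap-nas backend definition that sets a per-cluster storage prefix, so two Trident instances sharing an SVM don't collide. The LIF address, SVM name, credentials, and prefix are placeholders, not values from this thread:

```sh
# Hypothetical backend.json for a second cluster; all values are placeholders.
# storagePrefix keeps this instance's volumes distinct on the shared SVM.
cat > backend.json <<'EOF'
{
  "version": 1,
  "storageDriverName": "ontap-nas",
  "managementLIF": "192.0.2.20",
  "svm": "svm0",
  "username": "admin",
  "password": "changeme",
  "storagePrefix": "cluster2_"
}
EOF

# Register the backend with Trident
./tridentctl create backend -f backend.json -n trident
```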
The customer tested flock and sent the following result:

```
# mount xxx.xxx.xxx.xxx:/vol001 /mnt
# touch /mnt/test.txt
# flock /mnt/test.txt -c cat
flock: /mnt/test.txt: No locks available
```

Does that mean the issue is caused not by Trident and k8s but by the NFS protocol? Is that right?
That's exactly what it means. Something with NFS isn't configured properly in your customer environment. For NFS locking to work over NFSv3, the rpc-statd service must be running on the client nodes (and, as the issue title notes, an installed GlusterFS can also interfere with locking).
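As a quick client-side check (standard Linux commands, shown as an illustrative sketch rather than steps taken from this thread):

```sh
# Verify the NSM status daemon is running on the client (systemd hosts)
systemctl status rpc-statd

# Confirm the status and lock-manager (NLM) services are registered with rpcbind
rpcinfo -p | grep -E 'status|nlockmgr'
```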
The customer verified that rpc-statd was active on all nodes. In the end, he succeeded in installing Trident after enabling NFSv4 on the FAS. Many thanks for your help so far.
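For reference, NFSv4 implements locking within the protocol itself, so it does not depend on the separate NLM/rpc-statd machinery that NFSv3 requires; once NFSv4 is enabled on the storage side, forcing a v4 mount sidesteps a broken NLM setup. An illustrative mount (server and export are placeholders):

```sh
# Mount with NFSv4 explicitly; file locking is part of the NFSv4 protocol,
# so rpc-statd/nlockmgr are not involved
mount -t nfs -o nfsvers=4 192.0.2.10:/vol001 /mnt/nfs
```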
Glad to hear you figured it out! I'm adding some notes for the future reference of anyone who may encounter this problem. Sources:

- https://www.netapp.com/us/media/tr-4067.pdf
- http://people.redhat.com/steved/Netapp_NFS_BestPractice.pdf
- https://www.centos.org/docs/5/html/Deployment_Guide-en-US/s1-nfs-client-config-options.html
- https://www.centos.org/docs/5/html/Deployment_Guide-en-US/ch-nfs.html
- https://wiki.wireshark.org/Network_Lock_Manager
We should write this up in the troubleshooting section.
Hello, I'm running into a similar problem right now. I've tried installing/uninstalling the latest Trident driver several times. We previously used the same NetApp SVM for tests with OpenShift and want to reuse it for Kubernetes/Rancher. Is it possible to use the same SVM for more than one cluster? If so, how and where can the attributes you mentioned be changed?
OK, I stumbled across a discussion of this issue on a NetApp Slack channel. The "first" installation used the default volume name for etcd storage on the SVM; when a second installation tries to create/use that volume, an error is raised. In this case, a customized installation is necessary (see the sketch below).
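Based on the `--volume-name` flag mentioned earlier in the thread, a customized install for a second cluster (on pre-19.07 releases that still used etcd) might look like the following; the volume name is a placeholder:

```sh
# Install Trident with a non-default etcd volume name so it does not
# collide with another Trident instance on the same SVM (pre-19.07 only)
./tridentctl install -n trident --volume-name trident_cluster2
```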
Trident 19.07 no longer uses the etcd volume, so this is no longer an issue.
I tried to install Trident v18.07 on a K8s cluster with a FAS (ontap-nas).
During the installation, I got the following error logs in debug mode:
```
[xadmop01@devrepo1 trident-installer]$ ./tridentctl install -n trident
INFO Trident pod started. namespace=trident pod=trident-797f547579-d572m
INFO Waiting for Trident REST interface.
ERRO Trident REST interface was not available after 180.00 seconds.
FATA Install failed; exit status 1; Error: Get http://127.0.0.1:8000/trident/v1/version: dial tcp 127.0.0.1:8000: connect: connection refused
command terminated with exit code 1; use 'tridentctl logs' to learn more. Resolve the issue; use 'tridentctl uninstall' to clean up; and try again.
```
There is no high latency between the k8s nodes and the FAS storage, and several re-installations didn't help.
What can be the cause of the issue?