nfsserver fixes and cleanup #420

davidvossel merged 9 commits into ClusterLabs:master on May 13, 2014

davidvossel

I've been working with the nfsserver and exportfs agents for the past couple of weeks. I'm sure I'll have more fixes/changes, but this current set of changes I have now seems like a good place to start.

Most of what I've done does not affect anyone already using these agents. The only major change is that the nfsserver agent's nfs_ip option no longer requires a set of floating IP addresses. The nfs_ip option was used in conjunction with the sm-notify/rpc.statd binaries in an attempt to send SM_NOTIFY requests originating from the floating IPs to the NFS clients. There are use-cases where this functionality is neither necessary nor desired. All I did was make the nfs_ip argument optional rather than required.

The other changes are minor and mostly affect only the systemd use-case I've been working with on RHEL.

…y or else lock recovery fails

sm-notify drops root privileges and executes as the rpc user.
If we do not maintain the statd ownership correctly, sm-notify
cannot access the lock state data.

If nfsserver restarts on a node without a reboot, the lock
notifications are not sent out properly because of a stale
sm-notify.pid file. In normal non-HA operation, this file prevents
sm-notify from notifying clients multiple times, but in an HA
environment we need the repeated notification to occur.
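
A minimal sketch of the stale pid file cleanup described above, not the agent's actual code. The function name `clear_stale_sm_notify_pid` is hypothetical, and the default path `/var/run/sm-notify.pid` is an assumption that may differ between distributions.

```sh
# Hypothetical helper: remove the sm-notify pid file so that a later
# sm-notify run is not skipped after a restart without reboot.
# $1 (optional): path to the pid file; the default below is an assumption.
clear_stale_sm_notify_pid() {
    pidfile="${1:-/var/run/sm-notify.pid}"
    # rm -f succeeds whether or not the file exists.
    rm -f -- "$pidfile"
}
```

The agent would call something like this before triggering notifications on start, so that clients are re-notified on every failover.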
nfs clients use lock services outside of ha nfs server
Previously the nfsserver agent took in what binary to use
in order to send the NSM lock notifications. We don't need to know
what binary to use. It only makes sense to use sm-notify if rpc.statd
is already running as sm-notify re-notifies the clients of a change.
If rpc.statd is down, it has to be started for NSM locking to occur.
By default rpc.statd is going to invoke sm-notify on initialization.

So, if rpc.statd is down, we use rpc.statd to notify on start.
If rpc.statd is up, we use sm-notify to re-notify clients.
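
The selection rule above can be sketched roughly as follows. Both function names are hypothetical, and detecting the daemon with `pidof` is an assumption; the agent may check for rpc.statd differently.

```sh
# Hypothetical check: succeeds when an rpc.statd process exists.
# Using pidof here is an assumption, not necessarily what the agent does.
statd_is_running() {
    pidof rpc.statd >/dev/null 2>&1
}

# Print the name of the binary that should send the NSM notifications:
# sm-notify when rpc.statd is already up, otherwise rpc.statd itself,
# which notifies clients as part of its own startup.
choose_notifier() {
    if statd_is_running; then
        echo "sm-notify"
    else
        echo "rpc.statd"
    fi
}
```

Keeping the decision in one place means the agent no longer needs a user-supplied parameter naming the notify binary.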

I've also made the nfs_ip argument optional. There are use-cases
where we do not want to bind rpc.statd to the floating ip. If no
nfs_ip is specified, rpc.statd binds to the wildcard address.
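
As an illustration of the optional nfs_ip handling, here is a small sketch that builds the rpc.statd bind option. The function name `statd_bind_args` is hypothetical, it handles a single address only, and it assumes rpc.statd's `-n` option is what selects the bind address.

```sh
# Hypothetical helper: print the rpc.statd option for an optional
# floating IP. With an empty argument nothing is printed, so rpc.statd
# keeps its default wildcard binding.
statd_bind_args() {
    nfs_ip="$1"
    if [ -n "$nfs_ip" ]; then
        echo "-n $nfs_ip"
    fi
}
```

For example, `statd_bind_args 192.0.2.10` prints `-n 192.0.2.10`, while `statd_bind_args ""` prints nothing.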

One more change to rpc.statd has been made. I removed the ability
to accidentally execute rpc.statd in the foreground. rpc.statd never
returns. It is a daemon... This would cause start to time out if
anyone ever set this option and rpc.statd was used to notify
the clients.

This increases the recommended stop timeout from 10s
to 120s. This will help users avoid the problem where
they need to wait out the nfsv4 lease time during stop,
but the stop timeout is too short, which causes the
resource to fail. It is very common for the nfsv4
lease timeout to be 90 seconds. Increasing the stop timeout
to 120s gives us plenty of room to work with here for the
majority of use-cases.
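
The arithmetic behind the new default can be checked with a sketch like this one. The function name `stop_timeout_covers_lease` is hypothetical, and both values are passed in as plain seconds.

```sh
# Hypothetical check: succeed when the configured stop timeout (seconds)
# is strictly longer than the nfsv4 lease time (seconds), so that stop
# can wait out the lease without the resource failing.
stop_timeout_covers_lease() {
    stop_timeout="$1"
    lease_time="$2"
    [ "$stop_timeout" -gt "$lease_time" ]
}
```

With the common 90 second lease, `stop_timeout_covers_lease 120 90` succeeds and `stop_timeout_covers_lease 10 90` fails. On Linux the lease time can typically be read from `/proc/fs/nfsd/nfsv4leasetime` while nfsd is mounted, though that path is an assumption about the local setup.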
davidvossel

I feel confident in the direction I'm taking here with these patches. I'm going to merge what I have because I already have another set of patches that need to go on top of this work. If my changes happen to cause a regression for your use-case, file a bug and I'll address it.

davidvossel added a commit that referenced this pull request on May 13, 2014
davidvossel merged commit ffe7cdf into ClusterLabs:master on May 13, 2014