@hosungsmsft

(@rgardler -- Just an informational PR, no need to review. I manually validated that this feature works.)

Implements a highly available NFS cluster as a file server type option. It uses DRBD and Pacemaker/Corosync to build an active-passive (master-slave) failover NFS cluster. The usual non-cloud approach is a fixed secondary IP address that fails over from one machine to the other. On clouds (at least on Azure), even a secondary IP address must be assigned to a specific VM at the platform level, so we had to use a load balancer-based solution, as implemented here.

Huge thanks to @kermat from LINBIT who helped tremendously (even with an Azure dedicated guide document) while I was struggling with getting the HA clustering right.

There are a few things to address in the future:

  • STONITH is disabled, so split-brain can occur. STONITH on clouds requires something more sophisticated, and @kermat has already shared an idea.
  • Currently the DRBD replication traffic uses the same NIC as all other traffic, which might congest the network and degrade overall performance. I'll also need to load test this file server option to see whether (and by how much) it is better than Gluster, and whether it is comparable to the non-HA NFS option.
  • When testing failover, I saw a prolonged period (about a minute) during which file access on the NFS share appeared to hang. The cluster failover itself is very quick, but actual file access after the failover seems delayed. Need to investigate why.
  • Currently the DRBD kernel module is compiled and installed at deployment time, because linux-azure kernels don't ship the DRBD kernel module by default. This will be a problem when the kernel is upgraded and the VM is rebooted. We need to make sure the module is available through the Azure Ubuntu repos and that module upgrades always accompany kernel upgrades.
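
For the kernel module item above, a post-upgrade check could look roughly like the following. This is a minimal sketch, not something in this PR: `module_available` is a hypothetical helper, and it assumes `modinfo` (from kmod) is installed and that the module is named `drbd`.

```shell
#!/bin/sh
# Sketch: report whether a kernel module exists for a given kernel version.
# module_available is a hypothetical helper, not part of this PR's scripts.
module_available() {
    module="$1"
    kernel="${2:-$(uname -r)}"   # default to the running kernel
    # modinfo -k looks the module up under /lib/modules/<kernel>.
    modinfo -k "$kernel" "$module" >/dev/null 2>&1
}

# Example: warn when the running kernel has no drbd module available.
if ! module_available drbd; then
    echo "drbd module not found for kernel $(uname -r); rebuild needed" >&2
fi
```

Passing a kernel version as the second argument would allow checking a newly installed kernel before rebooting into it.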

I tried to make the added templates as independent as possible so that they can be added to the Azure quickstart templates repo without modification. I'll do that once we have a few more successful test deployments.

Hosung Song and others added 9 commits June 20, 2018 18:33
setup_nfs_ha.sh: Added static port assignments to NFS to allow access
through a load balancer. The required ports are 111 (TCP/UDP), 2049
(TCP/UDP), 2000 (TCP/UDP), 2001 (TCP), and 2002 (UDP).
* Reread sysctl tunables for static NFS port

setup_nfs_ha.sh: Modified so that static port assignments take effect
without a reboot.
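
As a rough illustration of the two commits above, static lockd ports can be pinned through sysctl and then reread without a reboot. This is a sketch under assumptions, not the contents of setup_nfs_ha.sh: the mapping of 2001/TCP and 2002/UDP to lockd is my guess from the port list, and `write_lockd_ports` and the file name are hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes 2001/tcp and 2002/udp are the lockd (NLM) ports,
# which are controlled by the fs.nfs.nlm_tcpport / fs.nfs.nlm_udpport sysctls.
write_lockd_ports() {
    conf_file="$1"   # e.g. /etc/sysctl.d/30-nfs-ports.conf (hypothetical name)
    cat > "$conf_file" <<EOF
fs.nfs.nlm_tcpport = 2001
fs.nfs.nlm_udpport = 2002
EOF
}

# On a real node (as root), write the file and reread all sysctl settings:
#   write_lockd_ports /etc/sysctl.d/30-nfs-ports.conf
#   sysctl --system
# The NFS services may still need a restart to bind to the new ports.
```

The function takes the target path as an argument so it can be tried against a scratch file before touching /etc.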

* Increased DRBD resync performance

setup_nfs_ha.sh: Edited the DRBD configuration to increase the resync
speed. This should decrease the amount of time it takes for the initial
sync to complete.
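
For readers unfamiliar with DRBD resync tuning, the kind of change described in the commit above might look like the sketch below. The option names are real DRBD 8.4+ `disk` settings for the dynamic resync controller, but the values are illustrative guesses, not the ones committed in setup_nfs_ha.sh, and `print_resync_tuning` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: emit a DRBD "disk" section with resync tuning options.
# Values are illustrative only, not the ones used in setup_nfs_ha.sh.
print_resync_tuning() {
    max_rate="${1:-100M}"   # upper bound for the dynamic resync controller
    cat <<EOF
disk {
    c-plan-ahead 20;
    c-fill-target 1M;
    c-min-rate 10M;
    c-max-rate ${max_rate};
}
EOF
}

# Print a fragment that could be pasted into a DRBD resource definition.
print_resync_tuning 200M
```

Suitable values depend on the disk and network throughput of the VM size in use, so they would need to be chosen from measurements.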
@hosungsmsft merged commit 0b6a407 into master Jun 23, 2018
@hosungs deleted the hs-nfs-ha branch June 23, 2018 03:05
naioja pushed a commit that referenced this pull request Dec 29, 2025
Add highly available NFS option
