Full configuration of FredHutch Scratch File System using commodity servers & disks, Ubuntu ZFS and BeeGFS
- describe configuration choices
- only provide details where you are making config changes instead of accepting the defaults
- Prereqs, network install
- Hardware, disk configuration, BIOS
- OS install, Ubuntu 16.04.2 with Kernel 4.8 & chef bootstrapping with scicomp-base
- Get 4.8+ kernels by running
apt-get install --install-recommends linux-generic-hwe-16.04
- Ubuntu ZFS config, RAID choices and decision
- BeeGFS install (supports Kernel 4.8), Kernel tuning, etc.
- Benchmarks / Discussion
Each BeeGFS storage node is configured with ZFS for several reasons including:
- ease of administration
- well integrated file system and logical volume manager
- integral compression
- sequentialization of disk writes
- and very importantly, elaborate error checking and correction
Each storage node (Supermicro SC847) contains:
- 34 2TB SATA drives (/dev/sd[a-z] + /dev/sda[a-h]; 33 intended for ZFS data, 1 as a spare)
- 2 240GB SSDs (/dev/sda[ij] formatted as mdraid RAID1/ext4 and used for boot/OS)
- 2 200GB SSDs (/dev/sda[kl] intended for ZFS intent log)
- 2 400GB SSDs (/dev/sda[mn] intended for ZFS cache)
Install ZFS utilities from the Ubuntu 16.04 LTS standard repositories:
apt install zfsutils-linux
In ZFS, RAIDZ2 is much like RAID6: two drives in each group can fail without risk to data. It requires more computation than RAIDZ or conventional RAID5, but these storage nodes are well provisioned with fast CPU cores.
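A quick back-of-the-envelope capacity check for this layout (3 RAIDZ2 groups of 11 × 2TB drives, 2 parity drives per group):

```shell
# 3 groups, each with 11 drives minus 2 parity, of 2TB each
echo "$(( 3 * (11 - 2) * 2 ))TB raw usable before ZFS overhead and compression"   # prints 54TB ...
```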
Note: the following ZFS commands are available in a unified script, beegfs-zfs-storage.sh, in this repository.
Configure each BeeGFS storage node with 3 RAIDZ2 groups, each group with 11 drives:
zpool create -f chromium_data raidz2 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk
zpool add -f chromium_data raidz2 /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv
zpool add -f chromium_data raidz2 /dev/sdw /dev/sdx /dev/sdy /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag
Allocate the remaining SATA drive, /dev/sdah, as a global spare to the ZFS pool:
zpool add -f chromium_data spare /dev/sdah
Allocate the pair of 200GB SSDs, /dev/sda[kl], as the ZFS intent log, mirrored for integrity:
zpool add -f chromium_data log mirror /dev/sdak /dev/sdal
Allocate the pair of 400GB SSDs, /dev/sda[mn], as ZFS cache:
zpool add -f chromium_data cache /dev/sdam /dev/sdan
The cache drives are added individually rather than mirrored to maximize capacity; cache contents are checksummed, so a failing cache device cannot silently corrupt data.
Important note: while drive names such as /dev/sda are simple to use, they are assigned dynamically and will change across reboots if a device is missing due to failure. It is strongly recommended to use the invariant names under /dev/disk/by-id/ so that RAID groups, and especially spares, are not confused during failure modes.
ls -l /dev/disk/by-id (to see correspondence)
zpool remove chromium_data /dev/sdaf (sample spare)
zpool add chromium_data spare /dev/disk/by-id/wwn-0x5000c50094b863c3
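The invariant names can also be gathered programmatically. A minimal sketch, assuming these drives appear with a wwn- prefix under /dev/disk/by-id (the collect_disks helper name is hypothetical; adjust the filter to match your hardware):

```shell
# List whole-disk invariant names, skipping partition symlinks,
# for pasting into the zpool create/add commands above.
collect_disks() {
    dir="$1"
    disks=""
    for id in "$dir"/wwn-*; do
        case "$id" in
            *-part*) ;;                # skip wwn-...-partN entries
            *) disks="$disks $id" ;;
        esac
    done
    echo "$disks"
}

# On a storage node:
#   collect_disks /dev/disk/by-id
```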
With the ZFS pool successfully created and populated, create a file system with LZ4 compression enabled:
zfs create -o compression=lz4 chromium_data/beegfs_data
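Once created, the full layout and the compression setting can be double-checked on the storage node with read-only commands:

```
zpool status chromium_data    # expect 3 raidz2 groups, 1 spare, a mirrored log and 2 cache devices
zfs get compression chromium_data/beegfs_data
```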
All nodes - storage, metadata, management (and client) - will have BeeGFS installed.
In the described system, metadata and management share the same physical node.
Retrieve the current BeeGFS distribution list for Debian (and related distributions like Ubuntu):
wget http://www.beegfs.com/release/latest-stable/dists/beegfs-deb8.list
Copy the retrieved distribution list file into each node's package sources directory:
cp beegfs-deb8.list /etc/apt/sources.list.d/
Add the repository signing key (optional but recommended):
wget -q http://www.beegfs.com/release/latest-stable/gpg/DEB-GPG-KEY-beegfs -O- | apt-key add -
Update the configured repositories:
apt update
Hostnames: chromium-storeN
Install storage package(s):
apt install beegfs-storage
Edit /etc/beegfs/beegfs-storage.conf, altering the following 2 lines to match:
sysMgmtdHost = chromium-meta
storeStorageDirectory = /chromium_data/beegfs_data
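These edits can be scripted. A minimal sketch, assuming the stock config file's key = value layout (the set_storage_conf helper name is hypothetical; back up the file first):

```shell
# Rewrite the two settings in a beegfs-storage.conf-style file in place
set_storage_conf() {
    conf="$1"
    sed -i \
        -e 's|^sysMgmtdHost.*|sysMgmtdHost = chromium-meta|' \
        -e 's|^storeStorageDirectory.*|storeStorageDirectory = /chromium_data/beegfs_data|' \
        "$conf"
}

# On a storage node:
#   set_storage_conf /etc/beegfs/beegfs-storage.conf
```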
Edit /lib/systemd/system/beegfs-storage.service, uncommenting the line containing the PIDFile setting. Then reload systemd by running systemctl daemon-reload. This enables proper systemd management of the beegfs-storage service.
Hostname: chromium-meta
Install metadata package(s):
apt install beegfs-meta
Create metadata directory:
mkdir -p /var/beegfs/meta
Edit /etc/beegfs/beegfs-meta.conf, altering the following 2 lines to match:
sysMgmtdHost = chromium-meta
storeMetaDirectory = /var/beegfs/meta
Edit /lib/systemd/system/beegfs-meta.service, uncommenting the line containing the PIDFile setting. Then reload systemd by running systemctl daemon-reload. This enables proper systemd management of the beegfs-meta service.
Hostname: chromium-meta
Install management package(s):
apt install beegfs-mgmtd beegfs-utils
Create storage location for management logging:
mkdir -p /var/beegfs/mgmt
Edit /etc/beegfs/beegfs-mgmtd.conf, altering the following lines to match:
storeMgmtdDirectory = /var/beegfs/mgmt
logLevel = 3
Edit /lib/systemd/system/beegfs-mgmtd.service, uncommenting the line containing the PIDFile setting. Then reload systemd by running systemctl daemon-reload. This enables proper systemd management of the beegfs-mgmtd service.
Note: if the client is not installed, create /etc/beegfs/beegfs-client.conf containing:
sysMgmtdHost = chromium-meta
This allows the use of beegfs-ctl without a client installation.
Hostnames: gizmoN, rhinoN, chromium-meta (different conf)
Install client package(s):
apt-get install beegfs-client beegfs-helperd beegfs-utils
Note: This will pull in many other packages as the client requires a kernel module to be built.
Build client kernel module:
/etc/init.d/beegfs-client rebuild
Edit /etc/beegfs/beegfs-client.conf, altering the following line to match:
sysMgmtdHost = chromium-meta
Edit /etc/beegfs/beegfs-mounts.conf to specify where to mount BeeGFS. The first column is the mount point, e.g. /mnt/beegfs; the second column is the client configuration file for that mount point, e.g. /etc/beegfs/beegfs-client.conf.
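For example, a single mount at /mnt/beegfs (the path is illustrative, not mandated) needs just one line in /etc/beegfs/beegfs-mounts.conf:

```
/mnt/beegfs /etc/beegfs/beegfs-client.conf
```

Create the mount point directory before starting the beegfs-client service.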
FhGFS and BeeGFS have different client names, configuration directories/files and kernel modules. With appropriate configuration they can both be installed and function properly on the same node. This is helpful in easing migration from old to new scratch file systems.
BeeGFS 6 should be installed explicitly from packages via dpkg rather than by adding the repository to apt. Although the existing nodes run Ubuntu 14.04.2 LTS (identifying themselves as debian jessie/sid - deb8), the BeeGFS deb7 packages must be installed instead due to version mismatches.
Get client packages from BeeGFS site or previously installed node and install in order:
dpkg -i beegfs-opentk-lib_6.11-debian7_amd64.deb
dpkg -i beegfs-common_6.11-debian7_amd64.deb
dpkg -i beegfs-helperd_6.11-debian7_amd64.deb
dpkg -i beegfs-client_6.11-debian7_all.deb
Edit config files to avoid conflicts with existing FhGFS client:
Edit /etc/beegfs/beegfs-client.conf, altering the following lines to match:
sysMgmtdHost = chromium-meta
connClientPortUDP = 8014
connHelperdPortTCP = 8016
Edit /etc/beegfs/beegfs-helperd.conf, altering the following line to match:
connHelperdPortTCP = 8016
Start the services in order:
/etc/init.d/beegfs-mgmtd start
/etc/init.d/beegfs-meta start
/etc/init.d/beegfs-storage start
/etc/init.d/beegfs-helperd start
/etc/init.d/beegfs-client start
There are two utilities that can be used to verify that all BeeGFS components are running and visible. Both should be accessible from the management server.
beegfs-check-servers will show all reachable node types and their BeeGFS system IDs. This program is actually a wrapper for the more general beegfs-ctl.
Running beegfs-ctl without parameters will print a list of its many options. This tool can be used to:
- list cluster components
- delete nodes
- explicitly migrate data between storage servers
- display system stats
- configure redundancy
- run read/write benchmarks
- and much, much more
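A few representative invocations, run from the management node (mode names as in BeeGFS 6):

```
beegfs-ctl --listnodes --nodetype=storage   # storage nodes and their IDs
beegfs-ctl --listnodes --nodetype=meta      # metadata nodes
beegfs-ctl --listtargets                    # storage targets
```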
Further information and debugging can be done with the aid of BeeGFS log files on each node. These have been configured to reside in /var/log/beegfs-[storage,mgmtd,meta] on each server type.
The scratch file system has 3 folders: delete10, delete30 and delete90. Files are deleted when mtime, atime AND ctime are all more than 10, 30 or 90 days older than the current date. Each folder contains a work folder per Principal Investigator, using the naming convention lastname_f.
/etc/cron.d/fs-cleaner-scratch has 3 cron jobs that execute fs-cleaner daily for delete10, delete30 and delete90, removing expired files or sending warning emails about files soon to be deleted (see https://github.com/FredHutch/fs-cleaner).
/etc/cron.d/new-pi-folders has 3 cron jobs that trigger createPIfolders, an internal shell script that checks for the existence of AD security groups and creates a folder for each PI that has a security group for accessing the POSIX file system.
This documents the manual work done to add the storage servers from the previous FhGFS cluster as new, fresh storage servers to the current BeeGFS chromium cluster.
Steps to create ZFS:
- install 16.04 on chromium-store[1234]
- install zfs with
apt-get install zfsutils-linux
- configure RAIDZ2 as above, but with different drive device names (the reason for the different enumeration is unclear):
zpool create -f chromium_data raidz2 /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm
zpool add -f chromium_data raidz2 /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx
zpool add -f chromium_data raidz2 /dev/sdy /dev/sdz /dev/sdaa /dev/sdab /dev/sdac /dev/sdad /dev/sdae /dev/sdaf /dev/sdag /dev/sdah /dev/sdai
- add the remaining spinning drive (/dev/sdaj) as a spare with
zpool add -f chromium_data spare /dev/sdaj
- add ZIL drives (same as above) with
zpool add -f chromium_data log mirror /dev/sdak /dev/sdal
- add the L2ARC drive with
zpool add -f chromium_data cache /dev/sda
- create the zfs (same as above) with
zfs create -o compression=lz4 chromium_data/beegfs_data
Please refer to the earlier note regarding the desirability of using invariant device names, especially for spares.
Steps to install BeeGFS:
- install BeeGFS packages as above; mixing minor versions within a major release is OK and supported