Set up the storage
GlusterFS is a scalable network filesystem. Using common off-the-shelf hardware, you can create large, distributed storage solutions for media streaming, data analysis, and other data- and bandwidth-intensive tasks. More information is available on the official website and on the project's GitHub page.
Within SecurityCloud, we use two volumes: conf and flow. Volume conf works as shared storage for the configuration files, while volume flow stores the flow files. Make sure there is enough disk space for the flow files on each node.
Configuration options in install.conf for GlusterFS are all mandatory and share the gfs_ prefix. Option gfs_conf_brick determines the path to the conf brick directory (a brick is a place where data are stored). Options gfs_flow_primary_brick and gfs_flow_backup_brick determine the paths to the primary and backup brick directories of the flow volume, respectively. Volume flow uses two bricks because data are stored redundantly on two nodes. Options gfs_conf_mount and gfs_flow_mount determine the paths to the mount points of the GlusterFS volumes.
In the example below, the paths are set according to the naming convention. This is not mandatory, but we recommend using the paths from the example configuration. If any of the directories does not exist, it will be created by the installation script.
gfs_conf_brick=/data/glusterfs/conf/brick
gfs_flow_primary_brick=/data/glusterfs/flow/brick1
gfs_flow_backup_brick=/data/glusterfs/flow/brick2
gfs_conf_mount=/data/conf
gfs_flow_mount=/data/flow
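The installation script creates any missing directories for you, but upstream GlusterFS documentation recommends placing bricks on a dedicated XFS filesystem with a 512-byte inode size. If you prefer to prepare the bricks yourself, a minimal sketch could look like this; the device name /dev/sdb1 is an assumption for your environment:
# assumption: /dev/sdb1 is a spare partition dedicated to the GlusterFS bricks
$ mkfs.xfs -i size=512 /dev/sdb1
$ mkdir -p /data/glusterfs
$ mount /dev/sdb1 /data/glusterfs
$ mkdir -p /data/glusterfs/conf/brick /data/glusterfs/flow/brick1 /data/glusterfs/flow/brick2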
The GlusterFS daemon has to be running on all the nodes.
# start and enable the GlusterFS daemon on CentOS
$ systemctl start glusterd.service
$ systemctl enable glusterd.service
# start and enable the GlusterFS daemon on Debian
$ systemctl start glusterfs-server
$ systemctl enable glusterfs-server
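With more than a few nodes, it is easier to script this step. A small sketch, assuming passwordless SSH access as root and CentOS-style unit names; the hostnames are placeholders:
# start and enable glusterd on every node of the cluster
$ for node in sub1.example.org sub2.example.org sub3.example.org; do
>     ssh "$node" "systemctl start glusterd.service && systemctl enable glusterd.service"
> done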
Then, on an arbitrary node, run:
$ ./install.sh glusterfs
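The script assembles the trusted pool and then creates, starts, and mounts both volumes. Conceptually, it boils down to gluster commands like the ones below. This is a simplified illustration for a three-node pool, not the actual script contents; the hostnames and the replica count are assumptions:
# form the trusted pool (run on sub1)
$ gluster peer probe sub2.example.org
$ gluster peer probe sub3.example.org
# create and start the replicated conf volume, one brick per node
$ gluster volume create conf replica 3 \
>     sub1.example.org:/data/glusterfs/conf/brick \
>     sub2.example.org:/data/glusterfs/conf/brick \
>     sub3.example.org:/data/glusterfs/conf/brick
$ gluster volume start conf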
Now you can verify that all the actions were successful and GlusterFS is ready.
GlusterFS services should be running:
$ ps -C glusterd,glusterfs,glusterfsd
PID TTY TIME CMD
7596 ? 00:00:00 glusterd
8325 ? 00:00:00 glusterfsd
8550 ? 00:00:00 glusterfs
8777 ? 00:00:00 glusterfs
8843 ? 00:00:00 glusterfs
...
Connections between nodes should be established:
$ netstat -tavn | grep "2400[78]"
tcp 0 0 0.0.0.0:24007 0.0.0.0:* LISTEN
tcp 0 0 10.4.0.25:49144 10.4.0.41:24007 ESTABLISHED
tcp 0 0 127.0.0.1:24007 127.0.0.1:49069 ESTABLISHED
tcp 0 0 10.4.0.25:49142 10.4.0.25:24007 ESTABLISHED
tcp 0 0 10.4.0.25:49149 10.4.0.37:24007 ESTABLISHED
tcp 0 0 127.0.0.1:24007 127.0.0.1:49121 ESTABLISHED
tcp 0 0 10.4.0.25:24007 10.4.0.39:49143 ESTABLISHED
tcp 0 0 127.0.0.1:49121 127.0.0.1:24007 ESTABLISHED
...
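On systems that ship ss instead of netstat, the equivalent check is:
$ ss -tan | grep "2400[78]"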
All nodes should be present in the trusted pool in the Connected state:
$ gluster pool list
UUID Hostname State
b6a46565-45c1-4b54-8611-950616cbc765 sub1.example.org Connected
9435070c-0f2c-40b9-be94-da91c4a4c0d3 sub2.example.org Connected
609e386e-ca6f-4a89-932f-0d70557bac12 sub3.example.org Connected
...
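If you want to perform this check from a script, for example in a monitoring job, a minimal sketch that fails when any pool member is not Connected could look like this:
# exit non-zero if any peer in the pool is not in the Connected state
$ gluster pool list | awk 'NR > 1 && $3 != "Connected" { bad++ } END { exit bad ? 1 : 0 }'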
Check information about the volumes:
$ gluster volume info conf
Volume Name: conf
Type: Replicate
Volume ID: c37231e4-1e7b-48a7-86db-a3f0635bc6e8
Status: Started
Number of Bricks: 1 x 10 = 10
Transport-type: tcp
Bricks:
Brick1: sub1.example.org:/data/glusterfs/conf/brick
Brick2: sub2.example.org:/data/glusterfs/conf/brick
Brick3: sub3.example.org:/data/glusterfs/conf/brick
...
Options Reconfigured:
network.ping-timeout: 10
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: true
$ gluster volume info flow
Volume Name: flow
Type: Distributed-Replicate
Volume ID: 7c620b8c-8b09-4ada-8a4b-86fd2cc1e263
Status: Started
Number of Bricks: 8 x 2 = 16
Transport-type: tcp
Bricks:
Brick1: sub1.example.org:/data/glusterfs/flow/brick1
Brick2: sub2.example.org:/data/glusterfs/flow/brick2
Brick3: sub2.example.org:/data/glusterfs/flow/brick1
...
Options Reconfigured:
cluster.nufa: enable
network.ping-timeout: 10
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: true
Check the status of the volumes:
$ gluster volume status conf
Status of volume: conf
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick sub1.example.org:/data/glusterf
s/conf/brick 49152 0 Y 9370
Brick sub2.example.org:/data/glusterf
s/conf/brick 49152 0 Y 9005
Brick sub3.example.org:/data/glusterf
s/conf/brick 49152 0 Y 8964
...
Self-heal Daemon on sub1.example.org N/A N/A Y 9701
Self-heal Daemon on sub2.example.org N/A N/A Y 9242
Self-heal Daemon on sub3.example.org N/A N/A Y 9201
...
Task Status of Volume conf
------------------------------------------------------------------------------
There are no active volume tasks
$ gluster volume status flow
Status of volume: flow
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick sub1.example.org:/data/glusterf
s/flow/brick1 49153 0 Y 9660
Brick sub2.example.org:/data/glusterf
s/flow/brick2 49153 0 Y 9201
Brick sub2.example.org:/data/glusterf
s/flow/brick1 49154 0 Y 9220
...
Self-heal Daemon on sub1.example.org N/A N/A Y 9701
Self-heal Daemon on sub3.example.org N/A N/A Y 9201
Self-heal Daemon on sub2.example.org N/A N/A Y 9242
...
Task Status of Volume flow
------------------------------------------------------------------------------
There are no active volume tasks
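Because both volumes are replicated, it is also worth verifying that no files are waiting to be self-healed. Each brick should report zero entries (output abridged):
$ gluster volume heal conf info
Brick sub1.example.org:/data/glusterfs/conf/brick
Status: Connected
Number of entries: 0
...
$ gluster volume heal flow info
...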
Check that the volumes are mounted on all the nodes:
$ mount | grep glusterfs
localhost:/conf on /data/conf type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
localhost:/flow on /data/flow type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
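These mounts are created by the installation script. Should you ever need to recreate them by hand and make them persistent across reboots, typical /etc/fstab entries for a FUSE mount of a local GlusterFS volume look like this (a sketch; the _netdev option defers mounting until the network is up):
localhost:/conf /data/conf glusterfs defaults,_netdev 0 0
localhost:/flow /data/flow glusterfs defaults,_netdev 0 0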
And finally, try to write some data to the volumes. You should be able to access the data from all the nodes:
$ dd if=/dev/urandom of=/data/conf/test.bin bs=4M count=1
$ dd if=/dev/urandom of=/data/flow/test.bin bs=4M count=1
$ ls -l /data/conf/test.bin /data/flow/test.bin
-rw-r--r-- 1 root root 4194304 Jul 28 12:36 /data/conf/test.bin
-rw-r--r-- 1 root root 4194304 Jul 28 12:37 /data/flow/test.bin
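To double-check the replication, you can compare the file's checksum on two different nodes; the hostname sub2.example.org is a placeholder:
# the two checksums must be identical
$ md5sum /data/conf/test.bin
$ ssh sub2.example.org md5sum /data/conf/test.bin
Afterwards, the test files can be removed from any single node:
$ rm /data/conf/test.bin /data/flow/test.bin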
The SecurityCloud project is supported by the Technology Agency of the Czech Republic under grant No. TA04010062, "Technology for processing and analysis of network data in big data concept".