Set up the storage
GlusterFS is a scalable network filesystem. Using common off-the-shelf hardware, you can create large, distributed storage solutions for media streaming, data analysis, and other data- and bandwidth-intensive tasks. More information is available on the official website and on the project's GitHub page.
Within SecurityCloud, we use two volumes: conf and flow. Volume conf works as shared storage for the configuration files, while volume flow stores the flow files. Make sure there is enough disk space for the flow files on each node.
Configuration options in install.conf for GlusterFS are all mandatory and share the gfs_ prefix. Option gfs_conf_brick determines the path to the conf brick directory (a brick is a place where data are stored). Options gfs_flow_primary_brick and gfs_flow_backup_brick determine the paths to the primary and backup brick directories of the flow volume, respectively. Volume flow uses two bricks because data are stored redundantly on two nodes. Options gfs_conf_mount and gfs_flow_mount determine the paths to the mount points of the GlusterFS volumes.
In the example below, the paths are set according to the naming convention. This is not mandatory, but we recommend using the paths from the example configuration. If any of the directories does not exist, it will be created by the installation script.
gfs_conf_brick=/data/glusterfs/conf/brick
gfs_flow_primary_brick=/data/glusterfs/flow/brick1
gfs_flow_backup_brick=/data/glusterfs/flow/brick2
gfs_conf_mount=/data/conf
gfs_flow_mount=/data/flow
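The installation script creates any missing directories for you, but upstream GlusterFS documentation recommends placing bricks on a dedicated XFS filesystem with a 512-byte inode size. If you prefer to prepare the bricks yourself, a minimal sketch could look like this; the device name /dev/sdb1 is an assumption for your environment:
# assumption: /dev/sdb1 is a spare partition dedicated to the GlusterFS bricks
$ mkfs.xfs -i size=512 /dev/sdb1
$ mkdir -p /data/glusterfs
$ mount /dev/sdb1 /data/glusterfs
$ mkdir -p /data/glusterfs/conf/brick /data/glusterfs/flow/brick1 /data/glusterfs/flow/brick2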
The GlusterFS daemon has to be running on all the nodes.
# start and enable the GlusterFS daemon on CentOS
$ systemctl start glusterd.service
$ systemctl enable glusterd.service
# start and enable the GlusterFS daemon on Debian
$ systemctl start glusterfs-server
$ systemctl enable glusterfs-server
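With more than a few nodes, it is easier to script this step. A small sketch, assuming passwordless SSH access as root and CentOS-style unit names; the hostnames are placeholders:
# start and enable glusterd on every node of the cluster
$ for node in sub1.example.org sub2.example.org sub3.example.org; do
>     ssh "$node" "systemctl start glusterd.service && systemctl enable glusterd.service"
> done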
Then, on an arbitrary node, run:
$ ./install.sh glusterfs
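The script assembles the trusted pool and then creates, starts, and mounts both volumes. Conceptually, it boils down to gluster commands like the ones below. This is a simplified illustration for a three-node pool, not the actual script contents; the hostnames and the replica count are assumptions:
# form the trusted pool (run on sub1)
$ gluster peer probe sub2.example.org
$ gluster peer probe sub3.example.org
# create and start the replicated conf volume, one brick per node
$ gluster volume create conf replica 3 \
>     sub1.example.org:/data/glusterfs/conf/brick \
>     sub2.example.org:/data/glusterfs/conf/brick \
>     sub3.example.org:/data/glusterfs/conf/brick
$ gluster volume start conf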
Now you can verify that all the actions were successful and GlusterFS is ready.
GlusterFS services should be running:
$ ps -C glusterd,glusterfs,glusterfsd
PID TTY TIME CMD
7596 ? 00:00:00 glusterd
8325 ? 00:00:00 glusterfsd
8550 ? 00:00:00 glusterfs
8777 ? 00:00:00 glusterfs
8843 ? 00:00:00 glusterfs
...
Connections between nodes should be established:
$ netstat -tavn | grep "2400[78]"
tcp 0 0 0.0.0.0:24007 0.0.0.0:* LISTEN
tcp 0 0 10.4.0.25:49144 10.4.0.41:24007 ESTABLISHED
tcp 0 0 127.0.0.1:24007 127.0.0.1:49069 ESTABLISHED
tcp 0 0 10.4.0.25:49142 10.4.0.25:24007 ESTABLISHED
tcp 0 0 10.4.0.25:49149 10.4.0.37:24007 ESTABLISHED
tcp 0 0 127.0.0.1:24007 127.0.0.1:49121 ESTABLISHED
tcp 0 0 10.4.0.25:24007 10.4.0.39:49143 ESTABLISHED
tcp 0 0 127.0.0.1:49121 127.0.0.1:24007 ESTABLISHED
...
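On systems that ship ss instead of netstat, the equivalent check is:
$ ss -tan | grep "2400[78]"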
All nodes should be present in the trusted pool in the Connected state:
$ gluster pool list
UUID Hostname State
b6a46565-45c1-4b54-8611-950616cbc765 sub1.example.org Connected
9435070c-0f2c-40b9-be94-da91c4a4c0d3 sub2.example.org Connected
609e386e-ca6f-4a89-932f-0d70557bac12 sub3.example.org Connected
...
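If you want to perform this check from a script, for example in a monitoring job, a minimal sketch that fails when any pool member is not Connected could look like this:
# exit non-zero if any peer in the pool is not in the Connected state
$ gluster pool list | awk 'NR > 1 && $3 != "Connected" { bad++ } END { exit bad ? 1 : 0 }'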
Check information about the volumes:
$ gluster volume info conf
Volume Name: conf
Type: Replicate
Volume ID: c37231e4-1e7b-48a7-86db-a3f0635bc6e8
Status: Started
Number of Bricks: 1 x 10 = 10
Transport-type: tcp
Bricks:
Brick1: sub1.example.org:/data/glusterfs/conf/brick
Brick2: sub2.example.org:/data/glusterfs/conf/brick
Brick3: sub3.example.org:/data/glusterfs/conf/brick
...
Options Reconfigured:
network.ping-timeout: 10
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: true
$ gluster volume info flow
Volume Name: flow
Type: Distributed-Replicate
Volume ID: 7c620b8c-8b09-4ada-8a4b-86fd2cc1e263
Status: Started
Number of Bricks: 8 x 2 = 16
Transport-type: tcp
Bricks:
Brick1: sub1.example.org:/data/glusterfs/flow/brick1
Brick2: sub2.example.org:/data/glusterfs/flow/brick2
Brick3: sub2.example.org:/data/glusterfs/flow/brick1
...
Options Reconfigured:
cluster.nufa: enable
network.ping-timeout: 10
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: true
Check the status of the volumes:
$ gluster volume status conf
Status of volume: conf
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick sub1.example.org:/data/glusterf
s/conf/brick 49152 0 Y 9370
Brick sub2.example.org:/data/glusterf
s/conf/brick 49152 0 Y 9005
Brick sub3.example.org:/data/glusterf
s/conf/brick 49152 0 Y 8964
...
Self-heal Daemon on sub1.example.org N/A N/A Y 9701
Self-heal Daemon on sub2.example.org N/A N/A Y 9242
Self-heal Daemon on sub3.example.org N/A N/A Y 9201
...
Task Status of Volume conf
------------------------------------------------------------------------------
There are no active volume tasks
$ gluster volume status flow
Status of volume: flow
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick sub1.example.org:/data/glusterf
s/flow/brick1 49153 0 Y 9660
Brick sub2.example.org:/data/glusterf
s/flow/brick2 49153 0 Y 9201
Brick sub2.example.org:/data/glusterf
s/flow/brick1 49154 0 Y 9220
...
Self-heal Daemon on sub1.example.org N/A N/A Y 9701
Self-heal Daemon on sub3.example.org N/A N/A Y 9201
Self-heal Daemon on sub2.example.org N/A N/A Y 9242
...
Task Status of Volume flow
------------------------------------------------------------------------------
There are no active volume tasks
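Because both volumes are replicated, it is also worth verifying that no files are waiting to be self-healed. Each brick should report zero entries (output abridged):
$ gluster volume heal conf info
Brick sub1.example.org:/data/glusterfs/conf/brick
Status: Connected
Number of entries: 0
...
$ gluster volume heal flow info
...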
Check that the volumes are mounted on all the nodes:
$ mount | grep glusterfs
localhost:/conf on /data/conf type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
localhost:/flow on /data/flow type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
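These mounts are created by the installation script. Should you ever need to recreate them by hand and make them persistent across reboots, typical /etc/fstab entries for a FUSE mount of a local GlusterFS volume look like this (a sketch; the _netdev option defers mounting until the network is up):
localhost:/conf /data/conf glusterfs defaults,_netdev 0 0
localhost:/flow /data/flow glusterfs defaults,_netdev 0 0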
And finally, try to write some data to the volumes. You should be able to access the data from all the nodes:
$ dd if=/dev/urandom of=/data/conf/test.bin bs=4M count=1
$ dd if=/dev/urandom of=/data/flow/test.bin bs=4M count=1
$ ls -l /data/conf/test.bin /data/flow/test.bin
-rw-r--r-- 1 root root 4194304 Jul 28 12:36 /data/conf/test.bin
-rw-r--r-- 1 root root 4194304 Jul 28 12:37 /data/flow/test.bin
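To double-check the replication, you can compare the file's checksum on two different nodes; the hostname sub2.example.org is a placeholder:
# the two checksums must be identical
$ md5sum /data/conf/test.bin
$ ssh sub2.example.org md5sum /data/conf/test.bin
Afterwards, the test files can be removed from any single node:
$ rm /data/conf/test.bin /data/flow/test.bin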
The SecurityCloud project is supported by the Technology Agency of the Czech Republic under grant No. TA04010062, "Technology for processing and analysis of network data in big data concept".