Brad Bebee edited this page Feb 13, 2020 · 1 revision


This page describes setting up bigdata on a CentOS 5.3 minimal install. It presumes that you have root privileges and will install bigdata to run as root (the latter is not necessary, but it is what is shown here). See the ClusterGuide for more general information on a bigdata cluster install.


Install the following packages. Several of these are optional, as noted in the comment on each line.

yum -y install man # optional (man page support).
yum -y install mlocate # optional (used to locate procmail's lockfile, which is at /usr/bin/lockfile).
yum -y install emacs-nox # optional.
yum -y install vixie-cron # install vixie cron (cron is used to manage the bigdata runstate).
yum -y install telnet # optional (useful for testing services and firewall settings)
yum -y install nfs-utils  # optional (used iff you will use NFS for the shared volume).
yum -y install ntp # optional, but highly recommended.
yum -y install subversion # used to checkout bigdata from its SVN repository (only necessary for the main server).
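
After the installs, a quick check that the expected commands are on the PATH can catch a package that failed to install. The following is only a sketch of one way to do that. The function name missing_commands is hypothetical, and the command names in the example (lockfile, crontab, svn) are taken from the package comments above.

```shell
#!/bin/sh
# Print each named command that cannot be found on the PATH.
# Returns 0 if every command was found, 1 otherwise.
missing_commands() {
    rc=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            rc=1
        fi
    done
    return $rc
}

# Example: lockfile comes from procmail, crontab from vixie-cron, svn from subversion.
# missing_commands lockfile crontab svn
```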


CentOS 5.3 ships an older build of sysstat which does not include pidstat, so DO NOT install the RPM. If it is already installed, it must be removed. Instead, download the sysstat source tarball and build and install it as follows. You will have to do this on each node (or you can build it once on a shared volume and then just do 'make install' on each node).

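cd /tmp
# sysstat-9.0.6.tar.gz must already have been downloaded into /tmp.
tar xvfz sysstat-9.0.6.tar.gz
cd sysstat-9.0.6
./configure   # generate the Makefile
make          # build pidstat and the other tools
make install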


CentOS 5.3 uses an earlier build of ant, so DO NOT install the RPM. Download and install an appropriate ant binary instead.

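cd /tmp
# apache-ant-1.8.0RC1-bin.tar.gz must already have been downloaded into /tmp.
tar xvfz apache-ant-1.8.0RC1-bin.tar.gz
mkdir -p /usr/java   # ensure the target exists so ant lands in /usr/java/apache-ant-1.8.0RC1
cp -r apache-ant-1.8.0RC1 /usr/java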


Linux, like many other operating systems, has a very aggressive posture towards free memory. With the default vm.swappiness of 60, the kernel will begin to swap out application memory well before RAM is exhausted in order to keep memory available for the file cache. You can avoid this by turning the swappiness parameter down to ZERO.

sysctl -w vm.swappiness=0
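
A value set with sysctl -w lasts only until the next reboot. One common way to make it persistent is to record it in /etc/sysctl.conf, which is read at boot. The following is a minimal sketch and not part of the original instructions. The helper name set_sysctl_conf is hypothetical.

```shell
#!/bin/sh
# Set "key = value" in a sysctl-style config file, replacing any existing
# uncommented line for the same key.
# Arguments: key, value, file (defaults to /etc/sysctl.conf).
set_sysctl_conf() {
    key=$1
    value=$2
    file=${3:-/etc/sysctl.conf}
    tmp=$(mktemp) || return 1
    # Copy every line except an earlier setting of exactly this key.
    if [ -f "$file" ]; then
        awk -v key="$key" -F= '{ k = $1; gsub(/[ \t]/, "", k); if (k != key) print }' "$file" > "$tmp"
    fi
    echo "$key = $value" >> "$tmp"
    # Write back through redirection so the file keeps its owner and mode.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example (as root): set_sysctl_conf vm.swappiness 0 && sysctl -p
```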

Host Configuration

You MUST be able to resolve the hostnames in the cluster using DNS. Normally someone is administering DNS and you don't have to worry about this. If that is not true, then the easy fix is to edit /etc/hosts to make sure each host in the cluster knows the name and IP associated with all the hosts in the cluster.

Here is a sample /etc/hosts file. Your file must reflect the IP addresses and host names in your cluster.

127.0.0.1     localhost localhost.localdomain
x.y.z.129     BigData0
x.y.z.130     BigData1
x.y.z.131     BigData2
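
Before going further, it can help to confirm that every cluster host name has an entry on each node. The sketch below inspects a hosts-format file directly and does not query DNS. The function name hosts_missing is hypothetical, and the host names in the example are the ones from the sample file above.

```shell
#!/bin/sh
# Print each host name that has no entry in the given hosts-format file.
# Usage: hosts_missing FILE NAME...
hosts_missing() {
    file=$1
    shift
    for name in "$@"; do
        # Strip comments, then look for the name in any column after the address.
        if ! awk -v n="$name" '
            { sub(/#.*/, "") }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$file"; then
            echo "$name"
        fi
    done
}

# Example: hosts_missing /etc/hosts BigData0 BigData1 BigData2
```

No output means every name was found.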

VNC (optional, "main" host only)

VNC can be used to log in remotely to the X-Windows desktop on the machines in the cluster. This can be very useful and it can be done securely using an ssh tunnel. The following steps install X-Windows, the KDE desktop, and the VNC server. See 1 for more information.

# install X and KDE
yum -y install xorg*
yum -y install xfce*
yum -y update # required to get around kdebase-wallpapers conflict for fc10.
yum -y install kde*

It appears that NetworkManager (the network-manager package) can cause a conflict if you are using static IPs, in which case it should be removed. The following command lists the installed packages to look for:

rpm -qa | grep -i network | egrep -i 'manager|management'

Once you have removed those packages, continue with the vnc install.

Install vnc.

yum -y install vnc-server #(0:4.1.3-1.fc10)

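Set the vnc password. The vncpasswd command, run as the user who will own the VNC session, prompts for the password and stores it under ~/.vnc.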


Edit /etc/sysconfig/vncservers. You must define at least one vncserver here. Choose your own display resolution. Use the "-localhost" option to restrict connections to SSH tunnels. The remote machine should port forward local 5901 to remote localhost:5901 and then connect using "localhost:1".

VNCSERVERS="1:root"
VNCSERVERARGS[1]="-geometry 1280x1024 -nolisten tcp -nohttpd -localhost"
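
VNC display N listens on TCP port 5900 + N, which is why display 1 corresponds to port 5901 in the tunnel described above. The sketch below shows the client side under some assumptions: OpenSSH on the remote machine, a login as root on the host BigData0 from the sample hosts file, and a hypothetical helper named vnc_tunnel_cmd that only prints the command.

```shell
#!/bin/sh
# Print the ssh command that forwards the local VNC port for a display
# to the same port on the cluster host. VNC display N listens on 5900 + N.
vnc_tunnel_cmd() {
    host=$1
    display=${2:-1}
    port=$((5900 + display))
    echo "ssh -N -L $port:localhost:$port $host"
}

# Run the printed command in one terminal, then point a VNC viewer
# at localhost:1 from another.
vnc_tunnel_cmd root@BigData0 1
```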

Specify KDE as the display manager by editing /etc/sysconfig/display. This only takes effect each time you start vncserver. It will not affect a session which is already running.


Start vncserver and configure the vncserver runlevels.

/etc/init.d/vncserver start
chkconfig vncserver on

Edit ~/.vnc/xstartup

# Uncomment the following two lines for normal desktop:
unset SESSION_MANAGER
exec /etc/X11/xinit/xinitrc

See the notes above on how to connect using an ssh tunnel.

NFS (optional, done differently for the NFS server and the clients)

Bigdata requires a shared volume to hold the JARs, configuration files, and similar things. This volume must be mounted by each host in the cluster. One way to do this is to use NFS. This section shows you how to set up NFS while leaving iptables enabled. See 2 and 3 for more details.

Note: Most of these steps are performed only on the node which will provide the NFS service. Once you have everything set up, you can mount that NFS share from the other nodes as specified at the end of this section.

Edit /etc/sysconfig/nfs to specify the ports that will be used for the services required to support NFS. These port choices are arbitrary, but the same ports MUST be opened up in the iptables firewall in the next step below.
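
As an illustration only, the fragment below shows what such settings might look like. The variable names are the ones found commented out in the stock /etc/sysconfig/nfs on Red Hat style systems. The assignment of ports to services is an assumption made for this sketch: only the port numbers 48620 through 48623 come from the firewall rules on this page.

```shell
# Sketch of /etc/sysconfig/nfs (the file consists of shell variable assignments).
# ASSUMPTION: which service gets which port is arbitrary here; what matters
# is that the same ports are opened in iptables.
RQUOTAD_PORT=48620
LOCKD_TCPPORT=48621
LOCKD_UDPPORT=48621
MOUNTD_PORT=48622
STATD_PORT=48623
```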


Modify iptables to open your firewall for nfs on the ports configured in /etc/sysconfig/nfs

/sbin/iptables -I INPUT -m state --state NEW \
   -m tcp -p tcp --dport 111 -j ACCEPT
/sbin/iptables -I INPUT -m state --state NEW \
   -m tcp -p tcp --dport 2049 -j ACCEPT
/sbin/iptables -I INPUT -m state --state NEW \
   -m tcp -p tcp --dport 48620 -j ACCEPT
/sbin/iptables -I INPUT -m state --state NEW \
   -m tcp -p tcp --dport 48621 -j ACCEPT
/sbin/iptables -I INPUT -m state --state NEW \
   -m tcp -p tcp --dport 48622 -j ACCEPT
/sbin/iptables -I INPUT -m state --state NEW \
   -m tcp -p tcp --dport 48623 -j ACCEPT
/sbin/iptables -I INPUT -m state --state NEW \
   -m udp -p udp --dport 111 -j ACCEPT
/sbin/iptables -I INPUT -m state --state NEW \
   -m udp -p udp --dport 2049 -j ACCEPT
/sbin/iptables -I INPUT -m state --state NEW \
   -m udp -p udp --dport 48620 -j ACCEPT
/sbin/iptables -I INPUT -m state --state NEW \
   -m udp -p udp --dport 48621 -j ACCEPT
/sbin/iptables -I INPUT -m state --state NEW \
   -m udp -p udp --dport 48622 -j ACCEPT
/sbin/iptables -I INPUT -m state --state NEW \
   -m udp -p udp --dport 48623 -j ACCEPT

# save the changes to iptables
/etc/init.d/iptables save
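
The twelve rules above differ only in protocol and port, so they can also be generated by a loop. The sketch below prints the commands rather than running them, so the output can be reviewed first. The function name nfs_firewall_rules is hypothetical, and the port list must match whatever you configured in /etc/sysconfig/nfs.

```shell
#!/bin/sh
# Print one iptables command per protocol and port, in the same order as
# the hand-written rules: all tcp ports first, then all udp ports.
nfs_firewall_rules() {
    for proto in tcp udp; do
        for port in "$@"; do
            echo "/sbin/iptables -I INPUT -m state --state NEW -m $proto -p $proto --dport $port -j ACCEPT"
        done
    done
}

# 111 is the portmapper and 2049 is nfsd; the rest are the ports chosen above.
nfs_firewall_rules 111 2049 48620 48621 48622 48623
```

Once the printed commands look right, they can be applied as root by piping the output to sh and then saving the iptables configuration as shown above.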

Next, edit /etc/hosts.allow and /etc/hosts.deny to restrict access to the NFS services. The /etc/hosts.allow file only needs to be modified on the host actually providing the NFS share. The other hosts will be clients, so they do not need to allow anything.

This example explicitly enumerates the IP addresses which are allowed to access the NFS services, but you can specify these constraints in a variety of ways. See the hosts.allow man page for more details.

Edit /etc/hosts.allow:

portmap: localhost, x.y.z.129, x.y.z.130, x.y.z.131
lockd: localhost, x.y.z.129, x.y.z.130, x.y.z.131
rquotad: localhost, x.y.z.129, x.y.z.130, x.y.z.131
mountd: localhost, x.y.z.129, x.y.z.130, x.y.z.131
statd: localhost, x.y.z.129, x.y.z.130, x.y.z.131

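Edit /etc/hosts.deny to deny all other hosts access to the same services. A typical approach is one line per service named in /etc/hosts.allow with ALL as the client, such as "portmap: ALL".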


Edit /etc/exports on the NFS server. You need to either enumerate all of the IP addresses which can access the NFS share or use a combination of a network address and a bitmask, etc. to accomplish the same ends.

/nas x.y.z.130(rw) x.y.z.131(rw)

Create the directory that you want to export.

mkdir /nas

Start NFS on the server.

/etc/init.d/rpcbind start
/etc/init.d/nfslock start
/etc/init.d/nfs start

Set the run levels for NFS (rpcbind and nfslock should already be running).

chkconfig nfs on

NFS is now running on the server. The next steps need to be done for each node in the cluster that will mount that NFS share. Note that you have an option to either mount the NFS share by hand or to have it automount. However, the client can hang if there is a problem with the NFS server or the network connectivity if you choose to automount the NFS share.

You will need to install the set of packages listed at the top of this page, which includes rpcbind and nfs-utils, to start NFS on the client.

# Start NFS on the clients (if not running, then also do chkconfig service on).
/etc/init.d/rpcbind start
/etc/init.d/nfslock start

# ensure the mount point exists.
mkdir -p /nas

# Make the files on that mount point visible to root on other hosts in the cluster.
chown -R root:wheel /nas

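# Either mount the shared volume by hand (not restart safe).
# x.y.z.129 is the NFS server in this example.
mount -t nfs x.y.z.129:/nas /nas

# -or-

# Edit /etc/fstab and add this line (mounts at boot, but will hang if the NFS server is not available):
#   x.y.z.129:/nas /nas nfs rw,addr=x.y.z.129 0 0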
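
Because a manual mount does not survive a reboot, it can be worth checking that the share is really mounted before starting anything that depends on it. The following is a minimal sketch, assuming a Linux /proc/mounts style table whose columns are device, mount point, and filesystem type. The function name is_nfs_mounted is hypothetical.

```shell
#!/bin/sh
# Return 0 if MOUNTPOINT appears with an nfs-family type in the mount table.
# Usage: is_nfs_mounted MOUNTPOINT [TABLE]   (TABLE defaults to /proc/mounts)
is_nfs_mounted() {
    awk -v mp="$1" '
        $2 == mp && $3 ~ /^nfs/ { found = 1 }
        END { exit !found }' "${2:-/proc/mounts}"
}

# Example: is_nfs_mounted /nas || echo "/nas is not mounted" >&2
```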

Open up the iptables firewall for log4j, zookeeper and jini

If this is necessary in your environment, then see ClusterSetupGuide for information on how to configure the firewall.

Install JDK

Install the JDK on each node in the cluster. The JDK must be installed into the same location on each machine. If you like, you can install it on the shared volume instead. We recommend Sun JDK 1.6.0_16 or later. We have not tested with recent OpenJDK releases.
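
To compare the installed JDK against the recommended minimum on each node, the version string can be split on '.' and '_' and compared numerically field by field. The sketch below assumes version strings of the form 1.6.0_16, as printed by java -version. The function name version_at_least is hypothetical.

```shell
#!/bin/sh
# Return 0 if version HAVE is greater than or equal to version WANT.
# Usage: version_at_least HAVE WANT   (for example: 1.6.0_20 1.6.0_16)
version_at_least() {
    awk -v have="$1" -v want="$2" 'BEGIN {
        n = split(have, h, /[._]/)
        m = split(want, w, /[._]/)
        max = (n > m) ? n : m
        # Missing fields count as 0, so 1.7.0 compares as 1.7.0_0.
        for (i = 1; i <= max; i++) {
            a = h[i] + 0
            b = w[i] + 0
            if (a > b) exit 0
            if (a < b) exit 1
        }
        exit 0
    }'
}

# Example (java -version writes to stderr):
# have=$(java -version 2>&1 | awk -F'"' '/version/ { print $2; exit }')
# version_at_least "$have" 1.6.0_16 || echo "JDK $have is older than 1.6.0_16" >&2
```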

Checkout, configure and install bigdata

Now that you have the cluster nodes prepped, please see the ClusterGuide for details on how to checkout, configure and install bigdata.
