
HBase Dev

Table of Contents

  1. Requirements
  2. Install HBase on all nodes
  3. Configure HBase on all Nodes
  4. Start HBase on the Cluster
     4.1. Starting Zookeeper on all nodes
     4.2. Starting HDFS on all nodes, from the Master Node
     4.3. Starting HBase
  5. Using HBase
## Requirements
  • At least 4 EC2 instances (see AWS Intro)
  • A running Hadoop cluster (see Hadoop Intro)
  • A running Zookeeper ensemble (see Zookeeper Dev)
## Install HBase on all nodes

Install the latest stable version of HBase (1.1.2 at the time of writing) on each node in your Hadoop cluster, including the Master and the RegionServers (i.e. the workers).
all-nodes:~$ wget http://apache.mirrors.pair.com/hbase/stable/hbase-1.1.2-bin.tar.gz -P ~/Downloads
all-nodes:~$ sudo tar zxvf ~/Downloads/hbase*.gz -C /usr/local
all-nodes:~$ sudo mv /usr/local/hbase* /usr/local/hbase
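
After the move, the HBase distribution should sit directly under /usr/local/hbase; listing it should show subdirectories such as bin, conf, and lib:

all-nodes:~$ ls /usr/local/hbase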

Add the following environment variables to ~/.profile

export HBASE_HOME=/usr/local/hbase
export PATH=$PATH:$HBASE_HOME/bin

Be sure to source the profile

all-nodes:~$ . ~/.profile
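
As a quick, optional sanity check, confirm that the new PATH entry took effect; the command below should print /usr/local/hbase/bin/hbase (hbase version will also work once JAVA_HOME is configured in a later step):

all-nodes:~$ which hbase
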
## Configure HBase on all Nodes

Edit the /usr/local/hbase/conf/hbase-site.xml file on all the nodes

all-nodes:~$ sudo nano /usr/local/hbase/conf/hbase-site.xml

Change the placeholder values below (the Master's public DNS in hbase.rootdir and the Zookeeper nodes' private DNS names in hbase.zookeeper.quorum) to match your setup

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master-public-dns:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zookeeper1-private-DNS,zookeeper2-private-DNS,zookeeper3-private-DNS,zookeeper4-private-DNS</value>
  </property>
</configuration>

For example, this last property might read:

<value>ip-172-31-17-115.ec2.internal,ip-172-31-17-114.ec2.internal,ip-172-31-17-113.ec2.internal,ip-172-31-17-112.ec2.internal</value>
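
If you’re unsure of a node’s private DNS name, one way to look it up from the node itself (assuming the default EC2 networking setup, and that the instance metadata service is reachable over IMDSv1) is:

all-nodes:~$ hostname -f
all-nodes:~$ curl http://169.254.169.254/latest/meta-data/local-hostname

Both should print the ip-…ec2.internal name used above.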

Edit the /usr/local/hbase/conf/hbase-env.sh file on all the nodes

all-nodes:~$ sudo nano /usr/local/hbase/conf/hbase-env.sh

Find the JAVA_HOME line shown below, remove the # to uncomment it, and change the value to /usr

# The java implementation to use.  Java 1.7+ required.
export JAVA_HOME=/usr
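
Pointing JAVA_HOME at /usr works because HBase’s scripts invoke $JAVA_HOME/bin/java, and on a standard Ubuntu install the java binary is symlinked at /usr/bin/java. You can confirm that on your nodes with:

all-nodes:~$ readlink -f /usr/bin/java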

Find the HBASE_MANAGES_ZK line shown below, remove the # to uncomment it, and set it to false so that HBase doesn’t manage Zookeeper (otherwise Zookeeper will stop when HBase stops)

# Tell HBase whether it should manage it's own instance of Zookeeper or not.
export HBASE_MANAGES_ZK=false

Edit the /usr/local/hbase/conf/regionservers file on all the nodes

all-nodes:~$ sudo nano /usr/local/hbase/conf/regionservers

Remove localhost and replace it with the private DNS names of ONLY the worker nodes. For example:

ip-172-31-17-114.ec2.internal
ip-172-31-17-113.ec2.internal
ip-172-31-17-112.ec2.internal
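
If you’d rather not edit the file interactively, one way to write it in a single command (using the example worker DNS names above; substitute your own) is:

all-nodes:~$ printf '%s\n' ip-172-31-17-114.ec2.internal ip-172-31-17-113.ec2.internal ip-172-31-17-112.ec2.internal | sudo tee /usr/local/hbase/conf/regionservers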

Edit the /usr/local/hbase/conf/backup-masters file on all the nodes

all-nodes:~$ sudo nano /usr/local/hbase/conf/backup-masters

Remove localhost and replace it with the private DNS name of one or more worker nodes. These will take over if the master becomes unavailable. For example:

ip-172-31-17-114.ec2.internal

Change ownership of the HBase directory on all nodes

all-nodes:~$ sudo chown -R ubuntu $HBASE_HOME
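
To confirm the ownership change, list the directory; the owner column should now read ubuntu:

all-nodes:~$ ls -ld $HBASE_HOME $HBASE_HOME/conf
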
## Start HBase on the Cluster

### Starting Zookeeper on all nodes

Prior to starting HBase, make sure that Zookeeper is running correctly, as described in Zookeeper Dev. You can check with:

all-nodes:~$ echo srvr | nc localhost 2181

If you don’t see any output, start Zookeeper with:

all-nodes:~$ sudo /usr/local/zookeeper/bin/zkServer.sh start
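
Once Zookeeper is running on every node, re-running the srvr check should report each server’s role; in a healthy ensemble, one node shows Mode: leader and the others show Mode: follower:

all-nodes:~$ echo srvr | nc localhost 2181 | grep Mode
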
### Starting HDFS on all nodes, from the Master Node

You’ll also need HDFS running, which you can check from the Master node:

master-node:~$ hdfs dfs -ls /

If the command hangs or returns an error, start HDFS with:

master-node:~$ sudo $HADOOP_HOME/sbin/start-dfs.sh
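
Another way to verify the HDFS daemons is to list the running Java processes with jps (part of the JDK); you should see a NameNode process on the Master node and a DataNode process on each worker:

master-node:~$ jps
worker-nodes:~$ jps
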
### Starting HBase
master-node:~$ sudo $HBASE_HOME/bin/start-hbase.sh

You can go to http://namenode-public-dns:16010 (the HBase Master web UI) in your browser to check that it’s working
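
You can also verify from the command line: after start-hbase.sh, jps should show an HMaster process on the Master node and an HRegionServer process on each worker:

master-node:~$ jps
worker-nodes:~$ jps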

## Using HBase

Begin using the shell by going through the examples [here](http://hbase.apache.org/book.html#shell_exercises).
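
To give a flavor of those exercises, here is a minimal shell session (the table name test and column family cf are arbitrary examples) that creates a table, writes a cell, reads it back, and then removes the table:

master-node:~$ hbase shell
hbase(main):001:0> create 'test', 'cf'
hbase(main):002:0> put 'test', 'row1', 'cf:a', 'value1'
hbase(main):003:0> scan 'test'
hbase(main):004:0> get 'test', 'row1'
hbase(main):005:0> disable 'test'
hbase(main):006:0> drop 'test'
hbase(main):007:0> exit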