HBase Kinesis-Enabled Sample Application
This sample application demonstrates how to launch an Apache HBase cluster on Amazon EMR using the AWS SDK for Java as well as how to extend the Amazon Kinesis Connector Library to emit data in real-time to your HBase cluster running on Amazon EMR.
Using the HBase Kinesis-enabled application -- The best way to get started with this sample application is to first read Using HBase on Amazon EMR post in the AWS Big Data Blog. The following instructions show you how to configure, build and execute this sample application. You can checkout, configure and execute this application using your favorite Java IDE configured with a Maven plugin.
This section shows you how to configure your application properties and setup your HBase cluster on Amazon EMR. Application configuration You will need to configure your application properties in EMRHBase.properties. Key properties include:
emrEndpoint: This property allows you to select a regional endpoint to make your requests.
emrClusterIdentifier: This property is optional. You can specify a cluster identifier of an existing Amazon EMR cluster with HBase installed. If this property is left blank, this sample application with automatically launch a new Amazon EMR cluster with HBase installed.
ec2KeyPair: Specify an Amazon EC2 key pair to allow you to SSH into your cluster’s master node.
emrLogUri: Specify an Amazon S3 bucket to store Amazon EMR logs that your cluster generates. These logs are useful for debugging.
HBase on Amazon EMR Configuration Once your HBase cluster on Amazon EMR is available you will need to perform the following steps as described in the blog post:
- SSH into master node and start HBase Rest Server.
- Configure the master node’s security group to accept traffic on port 8080.
- Establish SSH tunnel for dynamic port forwarding on the master node.
- Launch HBase shell.
Running the Sample Application
- Edit EMRHBase.properties as described in the Application Configuration section. Note: HBaseExecutor uses the DefaultAWSCredentialsProviderChain, which looks for credentials supplied by environment variables, system properties, or IAM role on Amazon EC2.
- Confirm the required AWS resources exist or specify that they should be created when the sample is run.
- Execute HBaseExecutor.
- Query your HBase table from the HBase shell to verify that indeed data loading is occurring via this Amazon Kinesis-HBase connector.
Important: These resources will incur charges on your AWS bill. It is your responsibility to delete these resources.