This repository has been archived by the owner. It is now read-only.

# Gearpump HBase

Gearpump integration for Apache HBase

Usage

HBaseSink can handle the following message types:

  1. Tuple4[String, String, String, String] which means (rowKey, columnGroup, columnName, value)
  2. Tuple4[Array[Byte], Array[Byte], Array[Byte], Array[Byte]] which means (rowKey, columnGroup, columnName, value)
  3. A sequence of messages of type 1 or type 2
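
The three message shapes listed above can be sketched with plain Scala types, without Gearpump on the classpath. The row keys, column group, column names and values are illustrative placeholders, and the `utf8` helper is hypothetical; whether a single sequence may mix String and byte-array tuples is not stated here, so the sketch keeps the sequence homogeneous.

```scala
// 1. All four fields as Strings: (rowKey, columnGroup, columnName, value).
val stringMsg: (String, String, String, String) =
  ("row1", "group1", "name", "value")

// Hypothetical helper: encode a String as UTF-8 bytes.
def utf8(s: String): Array[Byte] = s.getBytes("UTF-8")

// 2. The same four fields as byte arrays.
val bytesMsg: (Array[Byte], Array[Byte], Array[Byte], Array[Byte]) =
  (utf8("row1"), utf8("group1"), utf8("name"), utf8("value"))

// 3. A sequence of such tuples (String form shown here).
val batch: Seq[(String, String, String, String)] = Seq(
  ("row1", "group1", "name", "value1"),
  ("row2", "group1", "name", "value2")
)
```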

Suppose a DataSource task outputs the messages described above. You can then write a simple application:

val sink = new HBaseSink(UserConfig.empty, "$tableName")
val sinkProcessor = DataSinkProcessor(sink, "$sinkNum")
val split = Processor[DataSource]("$splitNum")
val computation = split ~> sinkProcessor
val application = StreamApplication("HBase", Graph(computation), UserConfig.empty)

Launch the application

The HBase cluster should run where Gearpump is deployed. Suppose HBase is installed at /usr/lib/hbase on every node and your application is already built into a jar file. Before submitting the application, add the HBase lib and conf folders to gearpump.executor.extraClasspath in conf/gear.conf, for example /usr/lib/hbase/lib/*:/usr/lib/hbase/conf. Note that only the client-side configuration needs to change. After that, you can submit the application.
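
As a rough sketch, the classpath value from the example can be assembled as follows. The install path is the one assumed in the text; the names `hbaseHome`, `entries` and `gearConfLine` are hypothetical, and the printed `key = "value"` form assumes conf/gear.conf uses ordinary HOCON syntax, so check it against your own gear.conf before copying it.

```scala
// Assumed install location, taken from the example in the text.
val hbaseHome = "/usr/lib/hbase"

// The lib folder uses a wildcard so the jars inside it are picked up;
// the conf folder is added as a plain directory.
val entries = Seq(s"$hbaseHome/lib/*", s"$hbaseHome/conf")

// Entries are joined with ':' as in the example value.
val extraClasspath = entries.mkString(":")

// Candidate line for conf/gear.conf (HOCON key = value form is an assumption).
val gearConfLine =
  "gearpump.executor.extraClasspath = \"" + extraClasspath + "\""
println(gearConfLine)
```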

If you need to supply the HBase user for the connection

Some HBase deployments have authorization configured (some users are allowed to read or write selected namespaces or tables, others are not).

In these cases you may need to configure the user that connects to HBase.

When creating an HBaseSink you can pass a UserConfig object. If the object contains the "hbase.user" property, its value is used as the user name for the HBase connection.
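
The optional nature of the property can be illustrated with a small sketch. To stay runnable without Gearpump, it uses a plain `Map[String, String]` as a hypothetical stand-in for UserConfig; `hbaseUser`, `withUser`, `withoutUser` and the user name "alice" are illustrative only and are not part of the connector.

```scala
// Hypothetical stand-in for UserConfig: an immutable map of string properties.
// With Gearpump available, the property would presumably be set with
// UserConfig.empty.withString("hbase.user", "alice"), following the
// withString usage in the Kerberos example in this README.
val withUser: Map[String, String] = Map("hbase.user" -> "alice")
val withoutUser: Map[String, String] = Map.empty

// Returns the configured HBase user if the property is present.
// The key "hbase.user" comes from the text; the lookup logic is a sketch.
def hbaseUser(config: Map[String, String]): Option[String] =
  config.get("hbase.user")

println(hbaseUser(withUser))    // property present
println(hbaseUser(withoutUser)) // property absent, no user override
```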

Working with Kerberized HBase

When the remote HBase cluster has security enabled, the gearpump-hbase connector needs a Kerberos keytab and the corresponding principal name. Specifically, the UserConfig object passed to HBaseSink should contain the entries ("gearpump.keytab.file", "$keytab") and ("gearpump.kerberos.principal", "$principal"). The following example shows an application that writes to a secured HBase:

val principal = "gearpump/fully.qualified.domain.name@YOUR-REALM.COM"
// Files here is Guava's com.google.common.io.Files; File is java.io.File
val keytabContent = Files.toByteArray(new File("path_to_keytab_file"))
val appConfig = UserConfig.empty
      .withString("gearpump.kerberos.principal", principal)
      .withBytes("gearpump.keytab.file", keytabContent)
val sink = new HBaseSink(appConfig, "$tableName")
val sinkProcessor = DataSinkProcessor(sink, "$sinkNum")
val split = Processor[Split]("$splitNum")
val computation = split ~> sinkProcessor
val application = StreamApplication("HBase", Graph(computation), UserConfig.empty)

Note that the keytab must be stored in the config as a byte array (the file's contents), not as a file path.
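
If Guava is not available, the keytab contents can also be read with the JDK alone. The sketch below swaps in `java.nio.file.Files.readAllBytes` for the Guava call used in the example; `readKeytab` is a hypothetical helper, and a throwaway temp file stands in for a real keytab so the snippet runs anywhere.

```scala
import java.nio.file.{Files, Path}

// Hypothetical helper: read a keytab file fully into memory as a byte array.
def readKeytab(path: Path): Array[Byte] = Files.readAllBytes(path)

// Demonstration only: a temp file with three bytes stands in for a real keytab.
val tmp: Path = Files.createTempFile("sketch", ".keytab")
Files.write(tmp, Array[Byte](1, 2, 3))

// This byte array is what would be passed to withBytes("gearpump.keytab.file", ...).
val keytabBytes: Array[Byte] = readKeytab(tmp)

// Clean up the temp file.
Files.delete(tmp)
```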
