hive-bulkload-hbase

Import a Hive table into HBase as fast as possible.

Directories

  • bin: Contains the shell script that starts the program.
  • src: Contains the source code and the test code.
  • schema: Contains the schema file of a table.

Compilation

$ mvn clean compile

$ mvn clean package

$ mvn assembly:assembly

Description

HBase gives random read and write access to your big data, but getting that data into HBase in the first place can be a challenge. There are three ways to do it:

  1. Use the client API to put the data one row at a time (a minimal sketch follows this list).
  2. Use the Hive HBase integration (see HBaseIntegration).
  3. Use HBase's built-in bulk load capabilities.
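
For contrast with the bulk load path, here is a minimal sketch of the first method, assuming the classic HTable/Put client API (HBase 0.9x era); the table name, column family, and values are placeholders.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class SinglePutExample {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "hbase_table");  // placeholder table name
        // Each Put travels the full HBase write path (WAL, MemStore, flush),
        // which is what makes this approach slow for bulk data.
        Put put = new Put(Bytes.toBytes("row1"));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("col1"), Bytes.toBytes("value1"));
        table.put(put);
        table.close();
    }
}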

However, the first two methods are slower than the third, which bypasses the normal write path entirely: you create the HFiles yourself and place them directly in HDFS. The HBase bulk load process consists of two steps if Hive and HBase are on the same cluster:

  1. HFile preparation via a MapReduce job.
  2. Importing the HFiles into HBase using LoadIncrementalHFiles.doBulkLoad() (e.g. Driver2.java).

The bulk load process consists of three steps if Hive and HBase are on different clusters:

  1. HFile preparation via a MapReduce job.
  2. Copying the HFiles from the Hive cluster to the HBase cluster.
  3. Importing the HFiles into HBase via the HBase command line on the HBase cluster.

Usage

The aim of the MapReduce job is to generate HBase data files (HFiles) from your input RCFile using HFileOutputFormat. Before you generate the HFiles, you need the Hive table's schema, which you can obtain in one of the following ways:

  • Reading the Hive metadata (a JDBC sketch follows this list).
    • Using JDBC to read it from MySQL.
    • Using HCatalog to read it from MySQL.
  • Parsing a file that records the schema. In my opinion, this is more efficient than reading the metadata, even if a table contains several thousand columns.
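
As an illustration of the JDBC route, here is a minimal sketch that reads a table's column names and types from the metastore database. The JDBC URL, credentials, and metastore table names (TBLS, SDS, COLUMNS_V2) are assumptions; the metastore schema varies across Hive versions.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class MetastoreSchemaReader {
    public static void main(String[] args) throws Exception {
        // Assumed metastore location and credentials -- adjust to your setup.
        Connection conn = DriverManager.getConnection(
                "jdbc:mysql://metastore-host:3306/hive", "hive", "secret");
        // Join the table, storage descriptor, and column tables to list the
        // columns of one Hive table in declaration order.
        String sql = "SELECT c.COLUMN_NAME, c.TYPE_NAME "
                   + "FROM TBLS t JOIN SDS s ON t.SD_ID = s.SD_ID "
                   + "JOIN COLUMNS_V2 c ON s.CD_ID = c.CD_ID "
                   + "WHERE t.TBL_NAME = ? ORDER BY c.INTEGER_IDX";
        PreparedStatement ps = conn.prepareStatement(sql);
        ps.setString(1, "my_hive_table");  // placeholder table name
        ResultSet rs = ps.executeQuery();
        while (rs.next()) {
            System.out.println(rs.getString(1) + "\t" + rs.getString(2));
        }
        rs.close(); ps.close(); conn.close();
    }
}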

The Mapper class outputs ImmutableBytesWritable, KeyValue pairs, which the subsequent partitioner and reducer use to create the HFiles.
There is no need to write your own reducer: HFileOutputFormat.configureIncrementalLoad(), called in the driver code, sets up the correct reducer and partitioner for you.
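
Here is a minimal sketch of such a mapper plus the driver wiring, assuming an HBase 0.9x-era API; the row key and column layout are placeholders, and RCFile parsing is elided in favor of plain text input.

import java.io.IOException;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class HFileMapper
        extends Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("\t");  // placeholder parsing
        byte[] rowKey = Bytes.toBytes(fields[0]);
        // Emit one KeyValue per output cell, keyed by the row key.
        KeyValue kv = new KeyValue(rowKey, Bytes.toBytes("cf"),
                Bytes.toBytes("col1"), Bytes.toBytes(fields[1]));
        context.write(new ImmutableBytesWritable(rowKey), kv);
    }
}

// In the driver, configureIncrementalLoad() wires up the reducer and the
// TotalOrderPartitioner against the target table's region boundaries:
//   job.setMapperClass(HFileMapper.class);
//   job.setMapOutputKeyClass(ImmutableBytesWritable.class);
//   job.setMapOutputValueClass(KeyValue.class);
//   HFileOutputFormat.configureIncrementalLoad(job, new HTable(conf, "hbase_table"));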
Then, if Hive and HBase are on different clusters, copy the generated HFiles from one cluster to the other:

hadoop distcp hdfs://mycluster-hive/hfile/hbase hdfs://mycluster-hbase/hbase/test

Finally, import the HFiles into HBase via the HBase command line on the HBase cluster:

hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /hbase/test hbase_table

Or import the HFiles into HBase via Java code on the HBase cluster (e.g. Driver2.java):

// Importing the generated HFiles into an HBase table
LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
HTable htable = new HTable(conf, "hbase_table");
loader.doBulkLoad(new Path(outputPath), htable);
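
Note that doBulkLoad() moves the HFiles into the table's region directories rather than copying them, so a successful load consumes the job output directory.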
