This R package provides basic connectivity to HBase through the Thrift server. R programmers can browse, read, write, and modify tables stored in HBase. The following functions are part of this package:
Installing the package requires that you first install and build Thrift. Once the libraries are built, make sure they are in a path where the R client can find them (e.g. /usr/lib). This package was built and tested using Thrift 0.8.
Here is an example for building the libraries on CentOS:
pkg-config --cflags thrift, returns:
sudo cp /usr/local/lib/libthrift-0.8.0.so /usr/lib/
The Thrift server by default starts on port 9090.
[hbase-root]/bin/hbase thrift start
If the Thrift server is running on a different hostname:port, you will have to change how the package is initialized.
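A minimal sketch of initializing against a non-default server. It assumes hb.init accepts host and port arguments; the hostname and port below are placeholders, not values from this package. The requireNamespace guard is only there so the snippet does nothing on a machine where rhbase is not installed.

```r
# Placeholder connection settings (assumptions for illustration only).
thrift_host <- "hbase-thrift.example.com"
thrift_port <- 9091

# Connect only when rhbase is available; this needs a reachable Thrift server.
if (requireNamespace("rhbase", quietly = TRUE)) {
  library(rhbase)
  # Assumed argument names: host and port.
  hb.init(host = thrift_host, port = thrift_port)
}
```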
By default, rhbase uses "native" R serialization (serialize/unserialize) to read and write data from HBase. You can switch to "raw" serialization (i.e. treat everything as a string) by specifying serialization="raw" when initializing the package.
See the sample /rhbase/pkg/inst/samples/StringSerializer.R for details.
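The difference between the two modes can be illustrated with base R alone. This is a sketch of the general idea, not of how rhbase implements it internally: "native" corresponds to serialize/unserialize, which preserves R types, while "raw" is approximated here with charToRaw/rawToChar, which store only the bytes of a string.

```r
# "native" style: any R object round-trips with its type intact,
# but the stored bytes are only meaningful to R.
value <- c(1.5, 2.5)
native_bytes <- serialize(value, connection = NULL)
restored <- unserialize(native_bytes)

# "raw" style (approximation): only the characters of the string are stored,
# so non-R clients can read the cell as plain text.
text <- "hello"
raw_bytes <- charToRaw(text)
back <- rawToChar(raw_bytes)

# The native encoding of the same string carries R metadata and is larger.
native_text_bytes <- serialize(text, connection = NULL)
```

In practice this means data written with one mode should be read back with the same mode.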
In version 1.1 of rhbase, a new function, hb.scan.ex, was introduced. This function allows the use of a 'filterString' for HBase table scans (HBase 0.92 or later).
Please see the Apache docs (http://hbase.apache.org/book/thrift.html) for details on filterString syntax (be aware that as of this writing, there are some inaccuracies in this documentation).
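Because a misspelled filter is costly, it can help to build filter strings with small helpers instead of typing them by hand. The sketch below uses PrefixFilter and ValueFilter from the HBase filter language; the helper names are hypothetical and not part of rhbase, and the sketch does not escape quotes inside the arguments.

```r
# Hypothetical helpers that assemble HBase filter-language strings.
prefix_filter <- function(prefix) {
  sprintf("PrefixFilter('%s')", prefix)
}

value_filter <- function(op, value) {
  # 'binary:' selects the byte-wise comparator.
  sprintf("ValueFilter(%s, 'binary:%s')", op, value)
}

# Combine any number of filters with AND.
and_filters <- function(...) {
  paste(..., sep = " AND ")
}

filter_string <- and_filters(
  prefix_filter("user_"),
  value_filter("=", "active")
)
print(filter_string)
```

The resulting string is the kind of value that would be supplied as the filterString of a scan.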
HBase/Thrift is very unforgiving if you get the syntax or spelling wrong. An exception will be thrown:
rhbase<hbScannerOpenFilterEx>:: (TTransportException) No more data to read.
This basically means that the socket connection to the Thrift server is dead. The only way to recover is to reinitialize your connection.
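One way to make this recovery routine is to wrap risky calls so that a failure always triggers a reconnect. The wrapper below is a generic sketch using base R's tryCatch; with_reconnect is a hypothetical helper, not an rhbase function, and the rhbase usage in the comments is an assumption rather than tested code.

```r
# Run `action`; if it fails, call `reconnect` so later calls start from a
# fresh connection, then re-signal the original error to the caller.
with_reconnect <- function(action, reconnect) {
  tryCatch(
    action(),
    error = function(e) {
      reconnect()
      stop(e)
    }
  )
}

# Hypothetical usage with rhbase (not run here):
#   with_reconnect(
#     action    = function() run_my_scan(),  # your own wrapper around hb.scan.ex
#     reconnect = function() hb.init()       # assumes hb.init's defaults suit your server
#   )
```

The error is still raised after reconnecting, so the caller can correct the filter string before retrying.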
An example of a filterstring has been added to the sample: