
#RImpala

RImpala is an R package that lets you connect to Cloudera Impala and execute distributed queries from R. Impala supports JDBC integration, and RImpala uses this feature to establish a connection between R and Impala.

##Installing RImpala

To use this package, you need access to a Hadoop cluster running Cloudera Impala with at least one populated table defined in the Hive Metastore.

###Install JDBC jars for RImpala

  • Download the Impala JDBC zip file to the client machine that you will use to connect to Impala servers.
  • Extract the contents of the zip file to a location of your choosing. For example:
    • On Linux, you might extract this to a location such as /opt/jars/.
    • On Windows, you might extract this to a folder such as C:\Program Files\impala-jars.
  • Note this location; it is passed to rimpala.init() through the libs argument.

###Install RImpala

  1. Compressed package: R CMD INSTALL RImpala-0.1.6.tar.gz

  2. Source code: R CMD INSTALL ./RImpala

##Loading RImpala and connecting to Impala

  1. Find the IP address of the machine and the port where the Impala service is running.

  2. Find the location where you unzipped the JDBC jars in the section above.

  3. Launch R.

  4. Load the package, initialize it with the location of the JDBC jars, and run a query:

         library("RImpala")
         rimpala.init(libs="/path/to/JDBC/jars/")
         result = rimpala.query("your query")

     By default, rimpala.init() searches "/usr/lib/impala" for the JDBC jars.
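
The steps above can be sketched as one R session. This is a minimal sketch, not a definitive recipe: the host address and port are placeholder values, and the calls to rimpala.connect() and rimpala.close() are assumptions based on the package's reference manual, since they do not appear in the steps above. It needs a reachable Impala service to run.

```r
library("RImpala")

# Placeholder values (hypothetical); replace with your own details.
jar_dir     <- "/path/to/JDBC/jars/"  # where the JDBC jars were extracted
impala_host <- "192.0.2.10"           # example IP of a machine running Impala
impala_port <- "21050"                # assumed port; check your cluster

# Point RImpala at the JDBC jars.
rimpala.init(libs = jar_dir)

# Assumption: rimpala.connect() takes the host and the port, in that order.
rimpala.connect(impala_host, impala_port)

# Run a query and inspect the first rows of whatever comes back.
result <- rimpala.query("your query")
print(head(result))

# Assumption: rimpala.close() ends the connection.
rimpala.close()
```

If rimpala.init() is called without the libs argument, it falls back to "/usr/lib/impala", so the jar_dir line is only needed when the jars live elsewhere.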

Here are links to more information on Cloudera Impala:

##Requirements