RImpala is an R package that lets you connect to Cloudera Impala and run distributed queries from R. Impala supports JDBC integration, and RImpala uses this to establish the connection between R and Impala.
To use this package you must also have access to a Hadoop cluster running Cloudera Impala with at least one populated table defined in the Hive Metastore.
###Install JDBC jars for RImpala
- Download the Impala JDBC zip file to the client machine that you will use to connect to Impala servers.
- Extract the contents of the zip file to a location of your choosing.
- On Linux, you might extract this to a location such as /opt/jars/.
- On Windows, you might extract this to a folder such as C:\Program Files\impala-jars.
- We will use this location as the libs argument of rimpala.init() when loading RImpala.
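
Before pointing RImpala at the extraction directory, it can help to confirm that the jars are actually there. The following is a minimal base-R sketch, not part of RImpala: the helper name find_jdbc_jars is hypothetical, and it assumes the .jar files sit directly in the chosen directory rather than in a subfolder.

```r
# Hypothetical helper: list the .jar files found in the extraction directory.
find_jdbc_jars <- function(jar_dir) {
  # file.info() returns NA for a missing path, so isTRUE() covers that case
  if (!isTRUE(file.info(jar_dir)$isdir)) {
    stop("Directory not found: ", jar_dir)
  }
  jars <- list.files(jar_dir, pattern = "\\.jar$", full.names = TRUE)
  if (length(jars) == 0) {
    stop("No .jar files found in ", jar_dir)
  }
  jars
}
```

For example, find_jdbc_jars("/opt/jars/") would return the full paths of the jars, or stop with an error if the directory is missing or empty.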
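
Install RImpala from the command line, either from the release tarball or from a local source directory:

    R CMD INSTALL RImpala-0.1.6.tar.gz
    R CMD INSTALL ./RImpala

##Loading RImpala and connecting to Impala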
Find the IP address of the machine and the port where the Impala service is running.
Find the location where you extracted the JDBC jars in the section above.

    library("RImpala")
    rimpala.init(libs="/path/to/JDBC/jars/")
    result = rimpala.query("your query")

By default, rimpala.init() searches "/usr/lib/impala" for the JDBC jars.
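
The snippet above does not show where the IP address and port are used. A minimal sketch of a full session might look like the following. It assumes that the package exposes rimpala.connect(IP, port) and rimpala.close(), as recalled from the package manual, so check the signatures against your installed version. The wrapper name run_impala_query, the host and the port are placeholders.

```r
# Sketch of a complete session. rimpala.connect() and rimpala.close() are
# assumed from the package manual; host, port and query are placeholders.
run_impala_query <- function(query,
                             host = "127.0.0.1",        # placeholder IP address
                             port = "21050",            # placeholder port
                             jar_dir = "/usr/lib/impala") {
  library("RImpala")
  rimpala.init(libs = jar_dir)             # make the JDBC jars available
  rimpala.connect(IP = host, port = port)  # assumed signature
  on.exit(rimpala.close(), add = TRUE)     # assumed: closes the connection
  rimpala.query(query)                     # value is returned to the caller
}
```

Calling run_impala_query("your query") would then initialise the driver, connect, run the query and close the connection even if the query fails.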
Here are links to more information on Cloudera Impala:

RImpala requires the following software:
- Java (>= 1.5)
- R (>= 2.7.0)
- rJava (>= 0.5-0)
- Impala JDBC driver jars
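
As a rough sketch, the R and rJava version requirements can be checked from an R session using base functions only. The variable names are illustrative, and the Java version and the JDBC jars still need to be verified separately.

```r
# Sketch: compare the local setup with the minimum versions listed above.
r_ok <- getRversion() >= "2.7.0"

# rJava must be installed before its version can be compared
rjava_ok <- "rJava" %in% rownames(installed.packages()) &&
  packageVersion("rJava") >= "0.5-0"

cat("R >= 2.7.0:    ", r_ok, "\n")
cat("rJava >= 0.5-0:", rjava_ok, "\n")
```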