

piccolbo edited this page · 129 revisions


  • 24/1/2012 - Merged branch binary-io into dev. The merge was hastened by the discovery of an R bug that degrades the performance of large reduces in all branches. A workaround has been developed, but the patch written for binary-io couldn't be backported, hence the decision to accelerate the merge of binary-io into dev. While dev passes all checks in the usual testing, the new binary-io features are still under development; please consider them more experimental than what normally goes into dev. Also note that the API has some non-backward-compatible changes, intended to strike a compromise between flexibility and ease of use in the IO department.
  • 12/7/2011 - Version 1.1 of the package rmr is available. See the Changelog for details.
  • 9/29/2011 - Version 1.0.1 available - fixes some minor defects with R CMD check tests on the packages
  • 8/10/2011 - Wiki gone live


RHadoop is a collection of three R packages that allow users to manage and analyze data with Hadoop. The packages have been implemented and tested with Cloudera's distribution of Hadoop (CDH3) and R 2.13.0. They have also been tested with Revolution R 4.3 and 5.0.

RHadoop consists of the following packages:

  • rmr - functions providing Hadoop MapReduce functionality in R
  • rhdfs - functions providing file management of the HDFS from within R
  • rhbase - functions providing database management for the HBase distributed database from within R

More information about RHadoop

Overview of RHadoop, from the Revolution Analytics blog.

Slides and Replay of 30-minute presentation about RHadoop, "Leveraging R in Hadoop Environments".

Contribute to the RHadoop project

Questions: you can use the above address or, if you don't mind sharing your question with everyone, just create a new issue and tag it as type-question.
