RHadoop is a collection of five R packages that allow users to manage and analyze data with Hadoop. The packages are tested regularly, and always before a release, on recent releases of the Cloudera and Hortonworks Hadoop distributions, and should be broadly compatible with open source Hadoop and MapR's distribution. We normally test on recent Revolution R and CentOS releases, but we expect all the RHadoop packages to work on a recent release of open source R and Linux.
RHadoop consists of the following packages:
Questions: Please participate in our discussion group.
VAR helps when using plyrmr in programs. More basic data frame functions in their big data version. Transparent caching of intermediate results. Fast aggregation functions for the small groups case. See the Changelog.
dfs.ls and Avro input format, different default Hadoop settings and bug fixes. See the Changelog.
ungroup, extension packs and improved
count.cols, plus a raft of bug fixes. See the Changelog.
rmr.str for debugging, better error messages and many bug fixes. See the Changelog.