README.md

Introduction

CrunchR is an R wrapper for Apache Crunch that allows you to write MapReduce pipelines in R.

Prerequisites

For starters, you need to be running Linux; we're not quite ready for OS X.

To use CrunchR, you need to have the following R packages installed locally:

  • rJava
  • roxygen2
  • bitops
  • RProtoBuf (optional)

You will also need version 2.4.1 of the protocol buffer compiler, protoc, installed on your PATH.
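A quick way to confirm the protoc requirement is a small shell check. This is only a sketch: `check_protoc_version` is a hypothetical helper name, and it assumes `protoc --version` reports its version in the form `libprotoc 2.4.1`.

```shell
#!/bin/sh
# Sketch: verify that the protoc found on PATH is version 2.4.1.
# check_protoc_version is a hypothetical helper, not part of CrunchR.

check_protoc_version() {
    # $1: the text printed by `protoc --version`, e.g. "libprotoc 2.4.1"
    [ "$1" = "libprotoc 2.4.1" ]
}

if command -v protoc >/dev/null 2>&1; then
    if check_protoc_version "$(protoc --version)"; then
        echo "protoc 2.4.1 found"
    else
        echo "protoc found, but version is not 2.4.1: $(protoc --version)"
    fi
else
    echo "protoc not found on PATH"
fi
```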

Your map and reduce tasks on the cluster also need access to everything listed above, and they must be able to load the JRI library. One way to do this is to pass -Djava.library.path to the child JVMs, as follows:

<property>
   <name>mapred.child.java.opts</name>
   <value>-Djava.library.path=/home/dmitriy/R/x86_64-pc-linux-gnu-library/2/rJava/jri </value>
   <final>false</final>
</property>

Another option is to soft-link libjri.so into your Hadoop native libs folder. You can find the location of libjri.so by running

system.file("jri",package="rJava")

after installing the rJava package.
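The soft-link approach could be scripted roughly as below. This is a sketch under assumptions: `link_jri` is a hypothetical helper, and `$HADOOP_HOME/lib/native` is only a guess at where your Hadoop native libs live, so adjust it for your installation.

```shell
#!/bin/sh
# Sketch: soft-link libjri.so into a Hadoop native library directory.
# link_jri is a hypothetical helper, not part of CrunchR.

link_jri() {
    jri_dir="$1"      # directory that contains libjri.so
    native_dir="$2"   # Hadoop native libs directory (location is an assumption)

    if [ ! -f "$jri_dir/libjri.so" ]; then
        echo "libjri.so not found in $jri_dir" >&2
        return 1
    fi
    # -s: symbolic link, -f: replace an existing link of the same name
    ln -sf "$jri_dir/libjri.so" "$native_dir/libjri.so"
}

# Example usage (requires R with rJava installed; the target path is assumed):
#   JRI_DIR=$(Rscript -e 'cat(system.file("jri", package="rJava"))')
#   link_jri "$JRI_DIR" "$HADOOP_HOME/lib/native"
```

The link has to exist on every node that runs map or reduce tasks, since each task loads the library locally.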

Getting Started

We're still in the phase of getting everything to play nicely together. You can try everything out by running the following:

cd crunchR
./install-snapshot-rpkg.sh

Assuming that works well, there are some example R scripts under the examples directory that you can use to test out loading the crunchR library and running simple pipelines.

License

Apache 2.0
