Basic R interface
There are two ways to assign variables in R: through the method assign or with the '=' method. To retrieve an R variable, just access it in the R namespace.
# The NULL value
# Variable 'null' is NULL. It exists in the R namespace and can be
# accessed normally in a call to 'eval'.
R.eval("null = NULL")
R.eval("print(null)")
> NULL
# Basic integration with R can always be done by calling eval and passing it a valid
# R expression. Here we create variable 'r.i' in R.
R.eval("r.i = 10L")
R.eval("print(r.i)")
> [1] 10
R.eval("vec = c(10, 20, 30, 40, 50)")
R.eval("print(vec)")
> [1] 10 20 30 40 50
R.eval("print(vec[1])")
> [1] 10
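Note that the index in vec[1] is evaluated by R, which counts from 1, whereas Ruby arrays count from 0. The small sketch below uses a plain Ruby array (not an R vector, and no call into R) to show the offset; the name r_index is only illustrative.

```ruby
# A plain Ruby array holding the same values as the R vector 'vec'.
vec = [10, 20, 30, 40, 50]

r_index = 1                 # index as written in the R expression vec[1]
puts vec[r_index - 1]       # prints 10: R's first element is Ruby's index 0
puts vec[r_index]           # prints 20: the same number used as a Ruby index
```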
# Use assign and pull to set and get data from R.
# Use method assign to assign NULL to variable 'null' in the R namespace.
R.assign("null", nil)
R.eval("print(null)")
> NULL
# Variable 'res' is available only in the Ruby namespace, not in the R namespace.
# A NULL object in R is converted to nil in Ruby.
res = R.pull("null")
p res
> nil
# Assign a value to an R variable, 'n2', with the '=' method.
R.n2 = nil
R.eval("print(n2)")
> NULL
One can access variables created in the R namespace by calling the variable's name as a method on R. Variables in R that have a '.' in their name, such as 'r.i3', need to have the '.' replaced by '__':
R.eval("r.i3 = 10.235")
R.r__i3.pp
> [1] 10,235
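To make the '__' convention concrete, here is a minimal sketch of how such a name mapping could be written in plain Ruby with method_missing. This is an illustration only, not the library's actual implementation: the class FakeR, its helper r_name, and the Hash that stands in for the R namespace are all hypothetical.

```ruby
# Hypothetical stand-in for an R session: a Hash keyed by R variable name.
class FakeR
  def initialize
    @vars = {}
  end

  # Map a Ruby method name to an R variable name: '__' becomes '.'.
  def r_name(method_name)
    method_name.to_s.chomp("=").gsub("__", ".")
  end

  def method_missing(name, *args)
    key = r_name(name)
    if name.to_s.end_with?("=")
      @vars[key] = args.first      # r.r__i3 = 10.235 stores under 'r.i3'
    elsif @vars.key?(key)
      @vars[key]                   # r.r__i3 reads the value stored as 'r.i3'
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s.end_with?("=") || @vars.key?(r_name(name)) || super
  end
end

r = FakeR.new
r.r__i3 = 10.235
puts r.r_name(:r__i3)   # prints r.i3
puts r.r__i3            # prints 10.235
```

A double underscore is a convenient choice because '.' cannot appear inside a Ruby method name, while '__' can.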
R.eval <<EOF
r.i2 = 10L
print(r.i2)
EOF
Variables created in Ruby can be used in an eval clause through Ruby string interpolation:
val = "10L"
R.eval <<EOF
r.i3 = #{val}
print(r.i3)
EOF
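The interpolation happens in Ruby, before the text is handed to R. The sketch below builds the same heredoc as a plain string, without calling R, to show what the R side would receive. It also shows the single-quoted heredoc form, which turns interpolation off; the variable names interpolated and literal are illustrative.

```ruby
val = "10L"

# Interpolating heredoc: #{val} is replaced by Ruby before R sees the text.
interpolated = <<EOF
r.i3 = #{val}
print(r.i3)
EOF

# Single-quoted heredoc tag: no interpolation, the text is passed through as-is.
literal = <<'EOF'
r.i3 = #{val}
print(r.i3)
EOF

puts interpolated   # first line reads: r.i3 = 10L
puts literal        # first line still contains the characters #{val}
```

Because the value is pasted in as text, it must already be valid R source (here the integer literal 10L).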
This example uses a dataset from Baseball-Reference.com to predict a baseball team's number of wins from its runs allowed (RA) and runs scored (RS). The model tests whether the run difference (RD), i.e., RS - RA, is a good predictor of the number of wins. The dataset also contains later seasons, but we only look at seasons before 2002 (Year < 2002), which is the data used for the book Moneyball (Michael Lewis).
R.eval <<EOF
# This dataset comes from Baseball-Reference.com.
baseball = read.csv("baseball.csv")
# Let's look at the data available for Moneyball.
moneyball = subset(baseball, Year < 2002)
# Let's see if we can predict the number of wins by looking at
# runs allowed (RA) and runs scored (RS). RD is the run difference.
# We are making a linear model for predicting wins (W) based on RD.
moneyball$RD = moneyball$RS - moneyball$RA
WinsReg = lm(W ~ RD, data=moneyball)
print(summary(WinsReg))
EOF
> Call:
> lm(data = moneyball, formula = W ~ RD)
> Residuals:
>      Min      1Q  Median      3Q     Max
>  -14,266  -2,651   0,123   2,936  11,657
> Coefficients:
>              Estimate  Std. Error  t value  Pr(>|t|)
> (Intercept)    80,881       0,131  616,675        <0  ***
> RD              0,106       0,001   81,554        <0  ***
> ---
> Signif. codes: 0 '***' 0,001 '**' 0,01 '*' 0,05 '.' 0,1 ' ' 1
> Residual standard error: 3,939 on 900 degrees of freedom
> Multiple R-squared: 0,8808, Adjusted R-squared: 0,8807
> F-statistic: 6.650,9926 on 1 and 900 DF, p-value: < 0
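The fitted coefficients can be read as the line W = 80.881 + 0.106 * RD, taking the commas in the printed output as decimal separators. As a small worked example, the Ruby sketch below plugs numbers into that line. The constants are copied by hand from the summary above, and the helper names, the sample run difference of 100, and the target of 95 wins are illustrative assumptions, not part of the original analysis.

```ruby
# Coefficients copied from the printed summary (decimal commas read as points).
INTERCEPT = 80.881
SLOPE     = 0.106

# Predicted wins for a given run difference (RD = RS - RA).
def predicted_wins(rd)
  INTERCEPT + SLOPE * rd
end

# Run difference the fitted line associates with a given number of wins.
def run_difference_for(wins)
  (wins - INTERCEPT) / SLOPE
end

puts predicted_wins(100).round(1)      # prints 91.5
puts run_difference_for(95).round(1)   # prints 133.2
```

With these coefficients, each additional run of difference is worth about 0.106 wins, so roughly ten runs correspond to one win.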