R from NodeJS, the right way.
hordes can be installed from npm with
npm install hordes
Maybe you don't have time to read the background and you just want to jump straight to the examples:
hordes makes R available from NodeJS.
The general idea of hordes is that NodeJS is the perfect tool when it comes to HTTP I/O, hence we can leverage the strength of this ecosystem to build web services that can serve R results.
For example, if you have a web service that needs authentication, using hordes lets you reuse existing NodeJS modules, which are widely used and tested inside the NodeJS ecosystem.
Another good example is NodeJS native cluster mode, and external modules like pm2, which are designed to launch your app in multicore mode, watch that your app is still running, and relaunch it if one of the processes stops (kind of handy for a production application that handles a lot of load).
It also makes things easier when it comes to mixing various languages in the same API: for example, you can serve standard html on an endpoint, and R on others.
And don't get me started on scaling NodeJS applications.
From the R point of view, the general idea with hordes is that every R function call should be stateless.
Keeping this idea in mind, you can build a package where functions are to be considered as 'endpoints' which are then called from NodeJS.
In other words, there is no "shared state" between two calls to R: if you want this to happen, you should either register the values inside Node, save them on disk, or use a database as a backend (which should be the preferred solution if you ask me).
Examples below will probably make this idea clearer.
The hordes module contains the following functions:
The library() and mlibrary() functions will be talking to Rserve through node-rio.
You can either launch Rserve by hand, or from Node by calling hordes_init() at the top of your script.
You can serve several instances of Rserve by calling hordes_init(port = XXX), where XXX is a port.
That also means that you can open and call several instances of Rserve using a Node load balancer.
library behaves like R's library() function, except that the output is a JavaScript object with all the functions from the package.
For example, library("stats") will return an object with all the functions from {stats}.
By doing const stats = library("stats");, you will have access to all the functions from {stats}, for example stats.lm().
Note that if you want to call functions with a dot in their name (for example as.numeric()), you should do it using the [ notation, not the dot one (i.e. base['as.numeric'], not base.as.numeric).
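To make the note about dotted names concrete, here is a minimal sketch. The base object below is a hand-written stand-in, not the object hordes actually returns; it only mimics the shape (one property per R function name) so that the difference between the two notations can be seen without a running Rserve.

```javascript
// Stand-in for the object returned by library("base"). The real object is
// built by hordes; this stub only mimics its shape for illustration.
const base = {
  sample: async (code) => `base::sample(${code})`,
  'as.numeric': async (code) => `base::as.numeric(${code})`,
};

// Bracket notation looks up the key "as.numeric" as a whole.
const asNumeric = base['as.numeric'];
console.log(typeof asNumeric); // prints "function"

// Dot notation is parsed as (base.as).numeric, and base.as does not exist,
// so calling base.as.numeric(...) would throw a TypeError.
console.log(base.as); // prints undefined
```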
Calling stats.lm("code") will launch R, run stats::lm("code") and return the output to Node.
// Here, we suppose you already have Rserve running in the background on port 6311
const {library} = require('hordes');
const stats = library("stats");
stats.lm("Sepal.Length ~ Sepal.Width, data = iris")
  .then((e) => console.log(e.join("\n")))
  .catch((err) => console.error(err))
Call:
stats::lm(formula = Sepal.Length ~ Sepal.Width, data = iris)
Coefficients:
(Intercept) Sepal.Width
6.5262 -0.2234
As they are promises, you can use them in an async/await pattern or with then/catch.
The rest of this README will use async/await.
const { library, hordes_init } = require('hordes');
const stats = library("stats");
(async() => {
// You can ignore this if you already have Rserve running in the background
await hordes_init();
try {
const a = await stats.lm("Sepal.Length ~ Sepal.Width, data = iris")
console.log(a.join("\n"))
} catch (e) {
console.log(e)
}
try {
const a = stats.lm("Sepal.Length ~ Sepal.Width, data = iris")
const b = stats.lm("Sepal.Length ~ Petal.Width, data = iris")
const ab = await Promise.all([a, b])
console.log(ab[0].join("\n"))
console.log(ab[1].join("\n"))
} catch (e) {
console.log(e)
}
})();
Call:
stats::lm(formula = Sepal.Length ~ Sepal.Width, data = iris)
Coefficients:
(Intercept) Sepal.Width
6.5262 -0.2234
Call:
stats::lm(formula = Sepal.Length ~ Sepal.Width, data = iris)
Coefficients:
(Intercept) Sepal.Width
6.5262 -0.2234
Call:
stats::lm(formula = Sepal.Length ~ Petal.Width, data = iris)
Coefficients:
(Intercept) Petal.Width
4.7776 0.8886
By default, these functions will return an array of characters, corresponding to the output of R.
If you want to return one of the types supported by node-rio, you can pass {capture_output: false} as the option parameter to library(): that could improve the performance of your application if you have a lot of load.
mlibrary does the same job as library, except the functions are natively memoized.
This is probably the mode you will want to use on a regular basis, unless your data are changing regularly.
const {library, mlibrary} = require('hordes');
const base = library("base");
const mbase = mlibrary("base");
(async () => {
try {
const a = await base.sample("1:100, 5")
console.log("a:", a)
const b = await base.sample("1:100, 5")
console.log("b:", b)
} catch(e){
console.log(e)
}
try {
const a = await mbase.sample("1:100, 5")
console.log("a:", a)
const b = await mbase.sample("1:100, 5")
console.log("b:", b)
} catch(e){
console.log(e)
}
}
)();
a: [1] 49 13 37 25 91
b: [1] 5 17 68 26 29
a: [1] 96 17 6 4 75
b: [1] 96 17 6 4 75
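The output above shows the effect of memoization: the second call with the same argument returns the cached result instead of running R again. A minimal sketch of the idea follows. It is not the hordes implementation (the source does not show it); memoize, sample and msample are hypothetical names, and the R call is replaced by a stand-in that returns a random value.

```javascript
// Hypothetical helper: cache results by the exact code string passed in.
function memoize(fn) {
  const cache = new Map();
  return (code) => {
    if (!cache.has(code)) {
      // Store the promise itself so concurrent calls share one execution.
      cache.set(code, fn(code));
    }
    return cache.get(code);
  };
}

// Stand-in for a call to R: returns a different random draw on every run.
let calls = 0;
const sample = async (code) => {
  calls += 1;
  return `${code} -> ${Math.random()}`;
};
const msample = memoize(sample);

(async () => {
  const a = await msample('1:100, 5');
  const b = await msample('1:100, 5');
  console.log(a === b); // prints true
  console.log(calls); // prints 1
})();
```

This is also why memoization is a poor fit for functions whose result is expected to change between calls, such as random draws or queries on data that are updated regularly.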
If you want to exchange data between R and NodeJS, you can rely on the default node-rio conversion, which can share a series of formats (strings, numbers...), by passing {capture_output: false} as the option parameter to library().
Otherwise, the function calls will return a string, so use an interchange format that can be converted in Node: JSON, arrow, base64 for images, raw strings...
// hordes_init is imported here because it is called below
const {library, hordes_init} = require('hordes');
const jsonlite = library("jsonlite");
const base = library("base");
(async () => {
  await hordes_init();
  try {
    const a = await jsonlite.toJSON("iris")
    console.log(JSON.parse(a)[0])
  } catch(e){
    console.log(e)
  }
  try {
    const b = await base.cat("21")
    console.log(parseInt(b) * 2)
  } catch(e){
    console.log(e)
  }
}
)();
{
'Sepal.Length': 5.1,
'Sepal.Width': 3.5,
'Petal.Length': 1.4,
'Petal.Width': 0.2,
Species: 'setosa'
}
42
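Since the captured output is described earlier as an array of characters, it can help to normalize it before parsing. The sketch below is one way to do that; parseJsonOutput is a hypothetical helper, not part of hordes, and the input is simulated rather than taken from a real R call.

```javascript
// Hypothetical helper: accept either a single string or an array of output
// lines, and parse the result as JSON.
function parseJsonOutput(output) {
  const text = Array.isArray(output) ? output.join('\n') : String(output);
  return JSON.parse(text);
}

// Simulated output, shaped like JSON printed by R over two lines.
const lines = [
  '[{"Sepal.Length":5.1,"Species":"setosa"},',
  '{"Sepal.Length":4.9,"Species":"setosa"}]',
];
const rows = parseJsonOutput(lines);
console.log(rows.length); // prints 2
console.log(rows[0].Species); // prints "setosa"
```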
Note that there is a hordes R package in the r-hordes folder, and that it contains some functions to facilitate the data translation.
It can be installed with
remotes::install_github("colinfay/hordes", subdir = "r-hordes")
For example, to share images, you can create a function in a package (here named {hordesx}) that does:
ggpoint <- function(n) {
gg <- ggplot(iris[1:n, ], aes(Sepal.Length, Sepal.Width)) +
geom_point()
hordes::base64_img_ggplot(gg)
}
Then in NodeJS:
const express = require('express');
const {mlibrary} = require('hordes');
const app = express();
const hordesx = mlibrary("hordesx")
app.get('/ggplot', async (req, res) => {
try {
const im = await hordesx.ggpoint(`n = ${req.query.n}`);
const img = Buffer.from(im, 'base64');
res.writeHead(200, {
'Content-Type': 'image/png',
'Content-Length': img.length
});
res.end(img);
} catch(e){
res.status(500).send(e)
}
})
app.listen(2811, function () {
console.log('Example app listening on port 2811!')
})
http://localhost:2811/ggplot?n=5 http://localhost:2811/ggplot?n=50 http://localhost:2811/ggplot?n=150
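The route above interpolates req.query.n directly into the R code string, so whatever the client sends ends up being evaluated by R. A small validation step before building the string is a reasonable precaution. The helper below is a sketch with a hypothetical name (parseBoundedInt); the bounds 1 and 150 are assumptions based on the number of rows in iris.

```javascript
// Hypothetical helper: return an integer between min and max, or null.
function parseBoundedInt(value, min, max) {
  // Only plain digit strings are accepted, so R code cannot be smuggled in.
  if (typeof value !== 'string' || !/^\d+$/.test(value)) return null;
  const n = Number(value);
  return n >= min && n <= max ? n : null;
}

console.log(parseBoundedInt('50', 1, 150)); // prints 50
console.log(parseBoundedInt('5); system("ls"', 1, 150)); // prints null
console.log(parseBoundedInt('500', 1, 150)); // prints null
```

In the route, a null result could be answered with a 400 status instead of calling R at all.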
Before calling library() or mlibrary(), you can check that the installed package still matches a hash previously computed with get_hash.
This hash is computed from the DESCRIPTION of the package called.
That way, if the DESCRIPTION file ever changes (version update, or stuff like that...), you get alerted (the app won't launch).
Just ignore this if you don't care about checking this hash (but you should in a production setting, so you can be sure that the package you are using stays the same).
const { check_hash, get_hash } = require('hordes');
console.log(get_hash("golem"))
'fdfe0166629045e6ae8f7ada9d9ca821742e8135efec62bc2226cf0811f44ef3'
Then if you call check_hash() with another hash, the app will fail.
check_hash("golem", "blabla")
throw new Error("Hash from DESCRIPTION doesn't match specified hash.")
check_hash("golem", 'e2167f289a708b2cd3b774dd9d041b9e4b6d75584b9421185eb8d80ca8af4d8a')
var golem = library("golem")
Object.keys(golem).length
104
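To illustrate why a hash of the DESCRIPTION file detects package changes, here is a sketch using Node's built-in crypto module. The algorithm hordes uses is not stated here; SHA-256 is an assumption (the digest shown above has 64 hexadecimal characters, which is consistent with it but does not prove it), and hashDescription is a hypothetical helper.

```javascript
const crypto = require('crypto');

// Hypothetical helper: SHA-256 hex digest of a DESCRIPTION file's content.
function hashDescription(content) {
  return crypto.createHash('sha256').update(content, 'utf8').digest('hex');
}

// Two made-up DESCRIPTION contents that differ only by the version number.
const v1 = 'Package: mypkg\nVersion: 0.1.0\n';
const v2 = 'Package: mypkg\nVersion: 0.2.0\n';

console.log(hashDescription(v1).length); // prints 64
console.log(hashDescription(v1) === hashDescription(v1)); // prints true
console.log(hashDescription(v1) === hashDescription(v2)); // prints false
```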
You can launch an R process that streams data and wait for a specific output in stdout.
The specificity of waiter is that it doesn't rely on node-rio, but spawns a real R process and reads the elements streamed on stdout.
The promise resolves with an object {proc, raw_output}: proc is the process object created by Node, and raw_output is the output buffer, which can be turned into a string with .toString().
A streaming process here is considered in a loose sense: what we mean here is anything that prints various elements to the console.
For example, when you create a new application using the {golem} package, the app is ready once this last line is printed to the console.
This is exactly what waiter does: it waits for this last line to be printed to the R stdout before resolving.
> golem::create_golem('pouet')
-- Checking package name -------------------------------------------------------
v Valid package name
-- Creating dir ----------------------------------------------------------------
v Created package directory
-- Copying package skeleton ----------------------------------------------------
v Copied app skeleton
-- Setting the default config --------------------------------------------------
v Configured app
-- Done ------------------------------------------------------------------------
A new golem named pouet was created at /private/tmp/pouet .
To continue working on your app, start editing the 01_start.R file.
const { waiter } = require("hordes")
const express = require('express');
const app = express();
app.get('/creategolem', async(req, res) => {
try {
await waiter("golem::create_golem('pouet')", {solve_on: "To continue working on your app"});
res.send("Created ")
} catch (e) {
console.log(e)
res.status(500).send("Error creating the golem project")
}
})
app.listen(2811, function() {
console.log('Example app listening on port 2811!')
})
-> http://localhost:2811/creategolem
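The "spawn a process and resolve once a given text shows up on stdout" pattern described above can be sketched with Node's child_process module. This is not the hordes source code: waitForOutput is a hypothetical function, and the demo spawns a Node child process in place of R so that it runs without an R installation.

```javascript
const { spawn } = require('child_process');

// Hypothetical helper, not the hordes implementation: spawn a command and
// resolve once `marker` appears in what it has written to stdout so far.
function waitForOutput(command, args, marker) {
  return new Promise((resolve, reject) => {
    const proc = spawn(command, args);
    const chunks = [];
    let settled = false;
    proc.stdout.on('data', (chunk) => {
      chunks.push(chunk);
      // Search the whole buffer, in case the marker spans two chunks.
      const raw_output = Buffer.concat(chunks);
      if (!settled && raw_output.toString().includes(marker)) {
        settled = true;
        resolve({ proc, raw_output });
      }
    });
    proc.on('error', (err) => {
      if (!settled) {
        settled = true;
        reject(err);
      }
    });
    proc.on('close', (code) => {
      if (!settled) {
        settled = true;
        reject(new Error(`Process exited with code ${code} before printing "${marker}"`));
      }
    });
  });
}

// Demo with a Node child process standing in for R.
waitForOutput(process.execPath, ['-e', 'console.log("step 1"); console.log("all done")'], 'all done')
  .then(({ raw_output }) => console.log(raw_output.toString()))
  .catch((err) => console.error(err));
```

Rejecting when the process closes without printing the marker keeps a request from hanging forever if the R code fails early.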
By default, the R code is launched by RScript, but you can specify another executable (for example if you need another version of R):
const { waiter } = require("hordes")
const express = require('express');
const app = express();
app.get('/creategolem', async(req, res) => {
try {
await waiter("golem::create_golem('pouet')", {solve_on: "To continue working on your app", process: '/usr/local/bin/RScript'});
res.send("Created ")
} catch (e) {
console.log(e)
res.status(500).send("Error creating the golem project")
}
})
app.listen(2811, function() {
console.log('Example app listening on port 2811!')
})
const { mlibrary } = require('hordes');
const dplyr = mlibrary("dplyr");
const stats = mlibrary("stats");
(async() => {
try {
const sample = await dplyr.sample_n("iris, 5")
console.log(sample)
} catch (e) {
console.log(e)
}
try {
const pull = await dplyr.pull("airquality, Month")
console.log(pull)
} catch (e) {
console.log(e)
}
try {
const lm = await stats.lm("Sepal.Length ~ Sepal.Width, data = iris")
console.log(lm)
} catch (e) {
console.log(e)
}
}
)();
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.7 3.8 1.7 0.3 setosa
2 6.7 2.5 5.8 1.8 virginica
3 6.9 3.1 5.1 2.3 virginica
4 6.4 2.9 4.3 1.3 versicolor
5 5.1 3.3 1.7 0.5 setosa
[1] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6
[38] 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 7 7 7
[75] 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
[112] 8 8 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9
[149] 9 9 9 9 9
Call:
stats::lm(formula = Sepal.Length ~ Sepal.Width, data = iris)
Coefficients:
(Intercept) Sepal.Width
6.5262 -0.2234
const express = require('express');
const { mlibrary } = require('hordes');
const app = express();
const stats = mlibrary("stats");
app.get('/lm', async(req, res) => {
try {
const output = await stats.lm(`${req.query.left} ~ ${req.query.right}`)
res.send('<pre>' + output + '</pre>')
} catch (e) {
res.status(500).send(e)
}
})
app.get('/rnorm', async(req, res) => {
try {
const output = await stats.rnorm(req.query.left)
res.send('<pre>' + output + '</pre>')
} catch (e) {
res.status(500).send(e)
}
})
app.listen(2811, function() {
console.log('Example app listening on port 2811!')
})
-> http://localhost:2811/lm?left=iris$Sepal.Length&right=iris$Petal.Length
-> http://localhost:2811/rnorm?left=10
const { waiter } = require("hordes")
const express = require('express');
const app = express();
app.get('/creategolem', async(req, res) => {
  try {
    // Options are passed as an object, as in the earlier waiter examples
    await waiter(`golem::create_golem('${req.query.name}')`, {solve_on: "To continue working on your app"});
    res.send("Created ")
  } catch (e) {
    console.log(e)
    res.status(500).send("Error creating the golem project")
  }
})
app.listen(2811, function() {
  console.log('Example app listening on port 2811!')
})