Slow performance on large data set #55

Open
xcompass opened this issue Jun 15, 2017 · 2 comments

Comments

@xcompass

The library seems to perform slowly on larger data sets. Here is an example:

use Statistics::R::IO;

my $c = Statistics::R::IO::Rserve->new(server => 'r', _usesocket => 1);

$c->eval('n <- 1500');
$c->eval('x1 <- round(runif(n, min=-1, max=7),3)');
$c->eval('x2 <- round(runif(n, min=-1, max=7),3)');
$c->eval('x3 <- round(rnorm(n, sd=3),3)');
$c->eval('x4 <- round(x3 + abs(rnorm(n, sd=2)),3)');
$c->eval('x5 <- round(runif(n, min=-1, max=7),3)');
$c->eval('x6 <- round(sample(x1),3)');
$c->eval('x7 <- round(sample(x3),3)');
$c->eval('x8 <- round(sample(x2),3)');
$c->eval('x9 <- round(x4 + x1 + rnorm(n, sd=.5),3)');
$c->eval('x10 <- round(x2 + rnorm(n, sd=.5),3)');
my $r = $c->eval('x <- cbind(x1, x2, x3, x4, x5, x6, x7, x8, x9, x10)');
print $r;
$c->close();
root@6b3ba909a6e2:/opt/webwork# time perl test.pl
....
real    0m3.931s
user    0m3.890s
sys     0m0.030s

The same R statements run through the Python client:

import pyRserve

conn = pyRserve.connect(host='r')

conn.eval('n <- 1500')
conn.eval('x1 <- round(runif(n, min=-1, max=7),3)')
conn.eval('x2 <- round(runif(n, min=-1, max=7),3)')
conn.eval('x3 <- round(rnorm(n, sd=3),3)')
conn.eval('x4 <- round(x3 + abs(rnorm(n, sd=2)),3)')
conn.eval('x5 <- round(runif(n, min=-1, max=7),3)')
conn.eval('x6 <- round(sample(x1),3)')
conn.eval('x7 <- round(sample(x3),3)')
conn.eval('x8 <- round(sample(x2),3)')
conn.eval('x9 <- round(x4 + x1 + rnorm(n, sd=.5),3)')
conn.eval('x10 <- round(x2 + rnorm(n, sd=.5),3)')
result = conn.eval('x <- cbind(x1, x2, x3, x4, x5, x6, x7, x8, x9, x10)')
print(result)
conn.close()
root@6b3ba909a6e2:/opt/webwork# time python test.py
...
real    0m0.291s
user    0m0.080s
sys     0m0.110s

Native Rscript:

n <- 1500
x1 <- round(runif(n, min=-1, max=7),3)
x2 <- round(runif(n, min=-1, max=7),3)
x3 <- round(rnorm(n, sd=3),3)
x4 <- round(x3 + abs(rnorm(n, sd=2)),3)
x5 <- round(runif(n, min=-1, max=7),3)
x6 <- round(sample(x1),3)
x7 <- round(sample(x3),3)
x8 <- round(sample(x2),3)
x9 <- round(x4 + x1 + rnorm(n, sd=.5),3)
x10 <- round(x2 + rnorm(n, sd=.5),3)
x <- cbind(x1, x2, x3, x4, x5, x6, x7, x8, x9, x10)
print(x)
root@8f3ee1c31d84:/# time Rscript test.R
...
real    0m0.146s
user    0m0.090s
sys     0m0.030s

The Python client is slower than native Rscript, but still comparable (0.291s vs. 0.146s real time). The Perl client, at 3.931s, is more than 10x slower than the Python client. I wonder whether something in the client is inefficient or buggy.
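As a quick check of those ratios, here is a small Python sketch that uses only the numbers from the three `time` outputs above. The helper names (`slowdown`, `cpu_fraction`) are made up for illustration. Each figure comes from a single run and includes interpreter startup, so treat the results as rough.

```python
# Timings copied from the three `time` outputs above, in seconds.
timings = {
    "perl": {"real": 3.931, "user": 3.890, "sys": 0.030},
    "python": {"real": 0.291, "user": 0.080, "sys": 0.110},
    "rscript": {"real": 0.146, "user": 0.090, "sys": 0.030},
}


def slowdown(client, baseline):
    """Ratio of wall-clock time between two clients."""
    return timings[client]["real"] / timings[baseline]["real"]


def cpu_fraction(client):
    """Share of wall-clock time spent on CPU (user + sys) in the client process."""
    t = timings[client]
    return (t["user"] + t["sys"]) / t["real"]


for name in timings:
    print("%-8s %5.1fx Rscript, CPU share %.2f"
          % (name, slowdown(name, "rscript"), cpu_fraction(name)))
print("perl vs python: %.1fx" % slowdown("perl", "python"))
```

The Perl run spends nearly all of its wall-clock time in user CPU, which suggests the cost is in the client process itself rather than in waiting on Rserve or the network.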

I tested against Rserve on both a remote and a local host.

R version 3.1.1
Rserve 1.8-5, 1.7-3, and 0.6-8 all give similar results.

@cubranic
Owner

Can you check the performance of just a single assignment (e.g., x1)? I want to know whether any returned object is slow, even a simple vector like x1, or whether it's mainly the more complex objects, like matrices (e.g., the cbind result).

The library was never optimized for performance; it's essentially a recursive parser, written that way to make it easy to work from the protocol spec.
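To give a feel for why an element-at-a-time parser can cost more than a bulk decode, here is a rough Python sketch. It is not the library's code and makes no claim about how the Perl client is implemented. The payload layout (packed little-endian 8-byte doubles) and both function names are assumptions chosen for illustration. The payload holds 15000 values, matching the 1500 x 10 matrix in the report.

```python
import struct
import timeit


def decode_per_element(payload):
    """Decode one double per call, as a simple spec-following parser might."""
    values = []
    offset = 0
    while offset < len(payload):
        (value,) = struct.unpack_from("<d", payload, offset)
        values.append(value)
        offset += 8
    return values


def decode_bulk(payload):
    """Decode the whole vector with a single unpack call."""
    count = len(payload) // 8
    return list(struct.unpack("<%dd" % count, payload))


# Hypothetical payload: 15000 doubles, the size of the 1500 x 10 matrix.
payload = struct.pack("<15000d", *[i / 1000.0 for i in range(15000)])

# Both strategies must produce identical values.
assert decode_per_element(payload) == decode_bulk(payload)

per_element = timeit.timeit(lambda: decode_per_element(payload), number=20)
bulk = timeit.timeit(lambda: decode_bulk(payload), number=20)
print("per element: %.4fs  bulk: %.4fs" % (per_element, bulk))
```

The absolute numbers depend on the machine and interpreter. The point is only that per-element overhead grows with the size of the returned object, which fits the symptoms in the report.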

@cubranic
Owner

Also, I suggested that the problem authors avoid returning those objects to WeBWorK if they don't really need them. In other words, run:

$c->eval('x1 <- round(runif(10000, min=-1, max=7),3); NULL');

unless x1 is actually needed in the PG problem. I know it's just a workaround, but I don't have the time right now to do a deep dive into performance issues in the library.
