Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The 'optim' function with 'L-BFGS' method fails in multithreading #448

Open
tlucz opened this issue Feb 5, 2019 · 4 comments
Open

The 'optim' function with 'L-BFGS' method fails in multithreading #448

tlucz opened this issue Feb 5, 2019 · 4 comments

Comments

@tlucz
Copy link

tlucz commented Feb 5, 2019

Hello,

We faced a problem with multithreading in Renjin. The 'optim' function with 'L-BFGS' method calculates different results when is invoked from many threads.

A simple Java project was prepared to reproduce the problem. Also the code is attached at the bottom of this mail.

https://github.com/tlucz/parallel-optim

When we run the code with THREADS_NUMBER=1 we get following correct results (every time the same):

list(par = c(0,99980003783379, 0,99960011579784), value = 3.9984867919192357E-8, counts = c(49L, 49L), convergence = c(0L), message = CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH)
list(par = c(0,99980003783379, 0,99960011579784), value = 3.9984867919192357E-8, counts = c(49L, 49L), convergence = c(0L), message = CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH)
list(par = c(0,99980003783379, 0,99960011579784), value = 3.9984867919192357E-8, counts = c(49L, 49L), convergence = c(0L), message = CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH)

But when we run it with more threads e.g. THREADS_NUMBER = 2 we get different results for the calls and sometimes even an errors:

list(par = c(-1,02632082910288, 1,06079411634746), value = 24.199999999999996, counts = c(2L, 2L), convergence = c(52L), message = ERROR: ABNORMAL_TERMINATION_IN_LNSRCH)
list(par = c(-1,2, 1), value = 24.199999999999996, counts = c(2L, 2L), convergence = c(52L), message = ERROR: ABNORMAL_TERMINATION_IN_LNSRCH)
list(par = c(-1,2, 1), value = 0.0, counts = c(0L, 0L), convergence = c(52L), message = ERROR: ABNORMAL_TERMINATION_IN_LNSRCH)
list(par = c(0,37771903024192, 1,64396551742973), value = 225.77555649737445, counts = c(0L, 0L), convergence = c(0L), message = CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH)
list(par = c(0,99987487694407, 0,99974046721066), value = 2.4309119528957442E-8, counts = c(52L, 52L), convergence = c(0L), message = CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH)

Separate ScriptEngine is created for each thread as it is described in Renjin documentation.

code:

package com.example;

import javax.script.ScriptEngine;
import org.renjin.script.RenjinScriptEngineFactory;
import org.renjin.sexp.SEXP;

public class Example {

 private static final ThreadLocal<ScriptEngine> engines = new ThreadLocal<>();
 private static final int RUNS = 100;
 private static final int THREADS_NUMBER = 2;

 public static void main(String[] args) {
   new Example().execute();
 }

 private void execute() {
   for (int i = 0; i < THREADS_NUMBER; i++) {
     new Thread(this::task).start();
   }
 }

 private void task() {
   ScriptEngine engine = getEngine();
   try {
     engine.eval("fr <- function(x) { ## Rosenbrock Banana function\\n\" +\n"
     + " x1 <- x[1]\n"
     + " x2 <- x[2]\n"
     + " 100 * (x2 - x1 * x1) ^ 2 + (1 - x1) ^ 2\n"
     + "}");
   } catch (Exception e) {
     System.err.println(e.getMessage());
   }

   for (int i = 0; i < Example.RUNS; i++) {
     try {
       SEXP result = (SEXP) engine.eval("optim(c(-1.2,1), fr, method = \"L-BFGS\")");
       System.out.println(result);
     } catch (Exception e) {
       System.err.println(e.getMessage());
     }
   }
 }

 private ScriptEngine getEngine() {
   ScriptEngine engine = engines.get();
   if (engine == null) {
     RenjinScriptEngineFactory factory = new RenjinScriptEngineFactory();
     engine = factory.getScriptEngine();
     engines.set(engine);
   }
   return engine;
 }
}

@akbertram
Copy link
Member

it is most likely due to the use of global variables in the C code we have imported from GNU R. I thought we had rewritten offending code, but will need to check on this.

@akbertram
Copy link
Member

Indeed, one of the support routines declares a large number of static local variables that we missed:

static int lsnd, ld, lr, lt, lz, lwa, lwn, lss, lws, lwt, lsy, lwy;

I will see if it's easier to refactor the many many global variables into a working data structure, or if we can use the global variable transformer that has worked so well with the graphics package.

akbertram added a commit that referenced this issue Feb 5, 2019
This extends the GlobalVarRewriter to automatically transform the
remaining global variables in C and Fortran code in the stats package
to use Session-scoped values so that it can be safely called from
multiple threads.

The performance impact needs to be evaluated before merging.
@tlucz
Copy link
Author

tlucz commented Feb 8, 2019

Hi,
Thanks for a quick reaction. When do you plan to release a new version with this fix?

@tlucz
Copy link
Author

tlucz commented Jul 20, 2020

Hello, Is there a plan to merge the fix?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants