Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thread safety question #17

Closed
dselivanov opened this issue Nov 11, 2015 · 4 comments
Closed

Thread safety question #17

dselivanov opened this issue Nov 11, 2015 · 4 comments

Comments

@dselivanov
Copy link
Contributor

Is is safe to not use RVector<T> in threads, but use regular Rcpp types? Consider following word-count example (I need to use ListOf<CharacterVector> as an input ), I tested it OS X and linux and have no problems.

#include <Rcpp.h>
// [[Rcpp::depends(RcppParallel)]]
#include <RcppParallel.h>
using namespace std;
using namespace Rcpp;
using namespace RcppParallel;

struct WordCounter : public Worker {

  const ListOf<const CharacterVector> input;
  unordered_map<string, uint32_t> wc;

  vector<string> temp;
  //constructors
  WordCounter(const ListOf<const CharacterVector> docs) : input(docs) {}
  WordCounter( WordCounter& wc, Split) : input(wc.input) {}
  //word count
  void operator()(size_t begin, size_t end) {
    for(size_t i = begin; i < end; i++ ) {
      //this produces quite big overhead
      temp = as<vector< string>>(input [i]);
      for( auto word: temp)
        wc [word]++;
    }
  }
  // join maps
  void join( WordCounter& rhs) {
    for (auto it: rhs.wc)
      wc[it.first] += it.second;
  }
};

And corresponding export:

// [[Rcpp::export]]
IntegerVector word_count( const ListOf< const CharacterVector> docs, uint32_t GRAIN_SIZE) {
  WordCounter wc( docs );
  // do the job
  parallelReduce(0, docs.size(), wc, GRAIN_SIZE);
  // construct output
  IntegerVector res(wc.wc.size());
  CharacterVector res_names(wc.wc.size());
  size_t i = 0;
  for(auto it:wc.wc) {
    res[i] = it.second;
    res_names[i] = it.first;
    i++;
  }
  res.names() = res_names;
  return res;
}
@eddelbuettel
Copy link
Member

In general you should use RVector. Using the Rcpp types exposes you to possible gc() events.

@jjallaire
Copy link
Member

As you probably observed we don't have a specialization of RVector for
RVector<const char*> (I've tried a couple of times but always get tripped
up, I'm actually not sure it's possible). We'd definitely welcome a pull
request for either a successful specialization of RVector<const char*> or
worst case for an RCharacterVector class.

The problem with using Rcpp types as Dirk pointed out is that the Rcpp API
makes no guarantees about whether code calls back into R (and thus
potentially triggers a gc which would be disastrous). The fact that your
test code works doesn't prove that a gc isn't possible, just that it
doesn't happen in those specific runs of your test code :-.

On Wed, Nov 11, 2015 at 6:53 AM, Dirk Eddelbuettel <notifications@github.com

wrote:

In general you should use RVector. Using the Rcpp types exposes you to
possible gc() events.


Reply to this email directly or view it on GitHub
#17 (comment)
.

@dselivanov
Copy link
Contributor Author

@eddelbuettel , @jjallaire, thank you very much for clarification. Actually I already crashed R with code above. I'm not C++ or R internals expert, but will have a look.

@romainfrancois
Copy link
Collaborator

As soon as you don't use references for Rcpp types, you are not safe.

If you use references, it all depends on what you do with them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants