Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Subsetter indexing uses int, breaks for matrices with > MAXINT elements #919

Closed
nolanw123 opened this issue Nov 8, 2018 · 4 comments
Closed

Comments

@nolanw123
Copy link

Various parts of Subsetter use int for indexing instead of R_xlen_t, causing it to break for matrices with > MAXINT elements (x86_64 arch).

Example (subset.cc) below:

//
// Reproduce subsetting error caused by use of int type for indexing.
// NB requires > 16GB of memory to execute.
//

#include <Rcpp.h>
#include <RInside.h>

using namespace Rcpp;

void testMat(size_t nrow_, size_t ncol_)
{
  std::cout << "Testing " << nrow_ << " x " << ncol_ << " matrix subsetting." << std::endl;
  if(static_cast<int>(nrow_ * ncol_) < 0) std::cout << "Should throw exception as total # of elements exceeds MAXINT on this system." << std::endl;
  NumericMatrix m(nrow_, ncol_);
  std::cout << "Allocated " << m.nrow() << " x " << m.ncol() << " matrix." << std::endl;
  auto sz = m.size();
  if(sz != nrow_ * ncol_) throw std::runtime_error("Total # of elements incorrect");
  std::cout << "Matrix size is " << sz << std::endl;
  IntegerVector sub(1);       // all we need is one attempt to subset for it to fail
  sub[0] = 0;
  static_cast<NumericVector &>(m)[sub] = 0;
  std::cout << "Completed test of " << m.nrow() << " x " << m.ncol() << " matrix." << std::endl;
}

int main(int argc_, char **argv_)
{
  RInside R;

  try { testMat(10, 10); } catch(std::exception &e) { std::cerr << "Caught exception '" << e.what() << "', exiting." << std::endl; }
  try { testMat(65536, 32768); } catch(std::exception &e) { std::cerr << "Caught exception '" << e.what() << "', exiting." << std::endl; }

}

Compiled via:

g++ -g --std=c++11 $(R CMD config --cppflags) $(R CMD config --ldflags) $(echo 'Rcpp:::CxxFlags()' | $(R RHOME)/bin/R --vanilla --slave) $(echo 'Rcpp:::LdFlags()' | $(R RHOME)/bin/R --vanilla --slave) $(echo 'RInside:::CxxFlags()' | $(R RHOME)/bin/R --vanilla --slave) $(echo 'RInside:::LdFlags()' | $(R RHOME)/bin/R --vanilla --slave) subset.cc

Results from running a.out:

[nolanw@jerry ~]$ ./a.out
Testing 10 x 10 matrix subsetting.
Allocated 10 x 10 matrix.
Matrix size is 100
Completed test of 10 x 10 matrix.
Testing 65536 x 32768 matrix subsetting.
Should throw exception as total # of elements exceeds MAXINT on this system.
Allocated 65536 x 32768 matrix.
Matrix size is 2147483648
Caught exception 'index error', exiting.

Will try to put together a fix to revamp Subsetter to use the R_xlen_t internally.

@nolanw123
Copy link
Author

Interesting caveat here. Even if the code were fixed to use R_xlen_t internally, using an IntegerVector as the subsetting array type will ultimately not be able to address the deeper elements of a "too big" matrix, as under the covers the get_indices specialization for IntegerVector casts it to INTEGER, which is of course an int *.

String and bool indexing is unaffected. Real indexing currently is broken (creates a temp IntegerVector -- same problem). But that's fixable.

It should be possible to modify the specialization for NumericVector to do the right thing. Numerics can represent up to 2^53 in R_xlen_t values when cast to them, which is fine.

Would it be better to throw an exception with a helpful warning ("use NumericVector for objects of this size") in get_indices if size() > 2^31 and it was called with an IntegerVector?

@nolanw123
Copy link
Author

Okay, I have a fix that seems to work and handles the issues above. Need to do some reading on how to submit pull request and run unit tests first.

@eddelbuettel
Copy link
Member

We're here to help, also on the small stuff. Try creating a local source tarball (R CMD build ...) and checking that (R CMD check ...) Be careful that if the version number is still 1.0.0 it will NOT trigger all tests (see tests/doRUnit.R for details) so maybe make it a local 1.0.0.1 in DESCRIPTION.

I use helper scripts build.r and rcc.r (from my littler package) but they are just candy around other functions (from base R and/or helper, here rcmdcheck).

@eddelbuettel
Copy link
Member

Now taken care of in #920. Thanks for that, and the report ;-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants