jaganmn
Contributor

@jaganmn jaganmn commented Jan 15, 2022

Fixes integer overflow bug detected here so that Rcpp::wrap(<Eigen::Matrix>) actually supports long vectors.
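For illustration only: a minimal, self-contained C++ sketch of why the width of the element count matters, assuming the count was previously held in a 32-bit `int` (the old code is not shown in this thread). The helper names `element_count` and `overflows_int` are hypothetical, and `std::int64_t` stands in for `R_xlen_t`, which is 64 bits wide on 64-bit platforms.

```cpp
#include <climits>
#include <cstdint>

// Hypothetical helper: number of elements in a rows x cols matrix,
// computed in 64-bit arithmetic (a stand-in for R_xlen_t).
constexpr std::int64_t element_count(std::int64_t rows, std::int64_t cols) {
    return rows * cols;
}

// True when the element count is too large for a 32-bit int, i.e. when
// computing rows * cols in int arithmetic would overflow.
constexpr bool overflows_int(std::int64_t rows, std::int64_t cols) {
    return element_count(rows, cols) > INT_MAX;
}
```

Under these assumptions, a 46341 x 46341 integer matrix already has more elements than `INT_MAX` even though each dimension is small, so the count has to be computed in the wider type.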

@eddelbuettel
Member

Nice. I'll give this a good look.

@eddelbuettel
Member

eddelbuettel commented Jan 15, 2022

Do you think you can cook up an example or test? (bad and wrong math deleted)

Member

@eddelbuettel eddelbuettel left a comment

Looks good to me -- added some whitespace changes and a ChangeLog entry

@eddelbuettel eddelbuettel merged commit be96204 into RcppCore:master Jan 15, 2022
@jaganmn
Contributor Author

jaganmn commented Jan 15, 2022

Sorry about the whitespace. Probably an Emacs glitch on my end, since it looked fine in my buffer. Here is a basic test - I only have 16 GB RAM myself...

Rcpp::sourceCpp(code = '
#include <RcppEigen.h>
// [[Rcpp::depends(RcppEigen)]]
// [[Rcpp::export]]
Rcpp::IntegerVector vector_wrap(R_xlen_t n) {
    Eigen::VectorXi x(n, 1);
    for (R_xlen_t i = 0; i < n; ++i) {
        x(i) = (int) (i % 10);
    }
    return Rcpp::wrap(x);
}
// [[Rcpp::export]]
Rcpp::IntegerMatrix matrix_wrap(R_xlen_t n) {
    Eigen::MatrixXi x(n, 1);
    for (R_xlen_t i = 0; i < n; ++i) {
        x(i, 0) = (int) (i % 10);
    }
    return Rcpp::wrap(x);
}
')

gc(FALSE)
n <- floor(2^31.5)
x <- vector_wrap(n)
stopifnot(is.vector(x, "integer"),
          length(x) == n,
          identical(x[seq_len(2^10)], rep_len(0:9, 2^10)))
rm(x)
gc(FALSE)
h <- function(cond) grepl("INT_MAX", conditionMessage(cond))
stopifnot(tryCatch({matrix_wrap(n); FALSE}, error = h))

@eddelbuettel
Member

Hehe, I use Emacs too. See the (old) first line in the file; these days I prefer .editorconfig files.

Maybe my math was wrong, but I was trying to convince myself we need more. Oh, but now I know what my error was. Better way:

  • 2^31 = 2147483648
  • or about 2.15 billion cells
  • times 4 bytes per cell for int32_t yields just under 9 GB
  • I have 32 GB here so I can play too
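The arithmetic in the list above can be checked with a small C++ sketch. The helper names `bytes_for_ints` and `gib` are hypothetical and exist only for this illustration; the element counts are the ones mentioned in this thread (2^31, and the test's `n = floor(2^31.5) = 3037000499`).

```cpp
#include <cstdint>

// Hypothetical helper: bytes needed to store n 32-bit integers.
constexpr std::int64_t bytes_for_ints(std::int64_t n) {
    return n * static_cast<std::int64_t>(sizeof(std::int32_t));
}

// Hypothetical helper: convert a byte count to GiB (2^30 bytes).
constexpr double gib(std::int64_t bytes) {
    return bytes / (1024.0 * 1024.0 * 1024.0);
}
```

With these helpers, 2^31 cells need 8589934592 bytes (exactly 8 GiB, about 8.59 GB), and the test's `n` needs 12148001996 bytes (about 12.1 GB), which is consistent with the test being runnable on a machine with 16 GB of RAM.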

@eddelbuettel
Member

Yep, works well here too. Matrix case does not error. Was it supposed to?

@jaganmn
Contributor Author

jaganmn commented Jan 15, 2022

I expect matrix_wrap(n) to throw the error "array dimensions cannot exceed INT_MAX", and stopifnot to not throw an error as a result, because the error handler returns TRUE for that message. That is what I observe on my machine...

@eddelbuettel
Member

Well played -- because that error-throwing was not in your code I didn't run it so I didn't see it. Concur on what the test script does.

And maybe today just isn't my day -- but a vector of size N should have the same memory requirement as an N x 1 matrix. What am I missing?

@jaganmn
Contributor Author

jaganmn commented Jan 15, 2022

Memory isn't really the problem. The dim attribute of an R array is by definition an integer vector, so no element can exceed INT_MAX even if it is possible to allocate a prod(dim)-length vector. You throw an error only to avoid creating an invalid R object.

The constraint on dim is mentioned in ?`Memory-limits`:

The number of bytes in a character string is limited to 2^31 - 1 ~ 2*10^9, which is also the limit on each dimension of an array.

and also discussed in R-ints here.
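A minimal sketch of the constraint described above, written in plain C++ so that it compiles without R. The names `dims_fit_r_array` and `check_dims` are hypothetical and are not part of RcppEigen; `std::int64_t` stands in for `R_xlen_t`, and `std::range_error` stands in for whatever error mechanism the package actually uses.

```cpp
#include <climits>
#include <cstdint>
#include <stdexcept>

// Hypothetical helper: true when both dimensions can be stored in R's
// integer-typed dim attribute, regardless of the total element count.
constexpr bool dims_fit_r_array(std::int64_t rows, std::int64_t cols) {
    return rows <= INT_MAX && cols <= INT_MAX;
}

// Hypothetical helper: refuse to build an array whose dim attribute
// would be invalid, instead of creating an invalid object.
inline void check_dims(std::int64_t rows, std::int64_t cols) {
    if (!dims_fit_r_array(rows, cols)) {
        throw std::range_error("array dimensions cannot exceed INT_MAX");
    }
}
```

Note that the check is on each dimension, not on the product: under these assumptions a 46341 x 46341 matrix passes even though its length exceeds `INT_MAX`, while a 3037000499 x 1 matrix is rejected although a plain vector of the same length is fine.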

@eddelbuettel
Member

You are spot on, and I once knew that by heart too. With R_xlen_t we got larger matrices and vectors (double as index counter) but the dims is the remaining constraint even if total size no longer is. I had plainly forgotten about the dim attribute type constraint, and that also explains why you had to add the check.

That is actually a really nice one. We could add that to the unit tests, possibly behind another opt-in boolean toggle (just as Rcpp has most of its tests off by default because they take a moment). So if, say, RcppEigen_Large_Vector_Test is TRUE we run them just as you have it here with one expect_equal or expect_true and one expect_error. I may toss that in tomorrow.

R_xlen_t m = obj.rows(), n = obj.cols(), size = m * n;
SEXP ans = PROTECT(::Rcpp::wrap(objCopy.data(), objCopy.data() + size));
if ( T::ColsAtCompileTime != 1 ) {
if (m > INT_MAX || n > INT_MAX) {
Contributor

Shouldn't this be immediately after the dimension definition?

Contributor Author

Yes... I've just committed again to my fork to address that, but it won't appear here until the PR is reopened. (I think?)

Contributor Author

Tacking on: Is objCopy really a copy of obj? Why do we need objCopy.data() here, rather than obj.data()?

Member

Agreed that it may need a cleanup, as the Rcpp::stop() also breaks the protect/unprotect dance.

The const T& obj to <.... const T&>::type objCopy(obj) dance is likely due to some TMP magic we need.

I'll move the check up a line now.

Contributor Author

@jaganmn jaganmn Jan 16, 2022

See my latest commit, unless you are suggesting something different?

Member

I see no commits, this has been merged to the repo. See the purple 'Merged' button here.

Note that I also just committed, so if you want to make changes relative to the repo you need to update your fork by pulling from the main repo here.

Contributor Author

Your change seems perfectly fine - I often forget how GitHub works past midnight... Glad this was caught and fixed.

Contributor Author

Ugh - I spoke too soon. See my PR #106.
