Fix querying and creation of a large matrix due to integer overflow #338

fplazaonate · 2015-08-12T19:56:42Z

No description provided.

eddelbuettel · 2015-08-12T20:20:56Z

Thanks for the PR, that looks interesting.

Would you be able to supply an example where the old code breaks, and ideally where the new code passes? Even better, could you add that as a unit test?

thirdwing · 2015-08-12T20:29:22Z

I think nrow and ncol should be int to be consistent with R source code.

https://github.com/wch/r-source/blob/trunk/src/main/array.c#L202-#L213

fplazaonate · 2015-08-12T20:34:55Z

Actually, I think that almost all occurrences of 'int' and 'size_t' should be replaced by 'R_xlen_t', which is a typedef of 'ptrdiff_t' that "acts as the signed counterpart of std::size_t" (cf. Rinternals.h)

thirdwing · 2015-08-12T20:46:22Z

In my opinion, if nrow and ncol are both R_xlen_t and large enough, the overall size of the matrix, nrow * ncol, can still be too big to fit into an R_xlen_t.

So in R's own source code, nrow and ncol are int and the overall size is R_xlen_t.

https://github.com/wch/r-source/blob/trunk/src/main/array.c#L202-#L213

fplazaonate · 2015-08-12T20:59:22Z

You are right.
ncol and nrow are "int" and bounded by 2^31-1.
However, some casts to "R_xlen_t" are required to avoid integer overflows while multiplying dimensions.

fplazaonate · 2015-08-12T21:02:35Z

This code doesn't produce the expected result.

#include <Rcpp.h>
using namespace Rcpp;


// [[Rcpp::export]]
NumericMatrix test() {
  const int nrow = 4000000;
  const int ncol = 700;
  NumericMatrix v(nrow,ncol);
  return v;
}

/*** R
test()
*/

> test()
Error in .Primitive(".Call")(<pointer: 0x000000006f941db0>) : 
  negative length vectors are not allowed
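
The negative length is consistent with the arithmetic: 4000000 * 700 = 2,800,000,000 exceeds the largest 32-bit int (2^31 - 1 = 2,147,483,647), so a product computed in int can wrap to a negative value. Below is a minimal standalone sketch of that wraparound, not Rcpp code. `wrapped_product` is a hypothetical helper; it multiplies in unsigned arithmetic because signed overflow is undefined behaviour in C++, and the conversion back to a signed type assumes a two's-complement platform.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Rcpp): emulates what a 32-bit int product
// wraps to, without triggering signed-overflow undefined behaviour.
inline std::int32_t wrapped_product(std::int32_t nrow, std::int32_t ncol) {
    assert(nrow >= 0 && ncol >= 0);
    // Unsigned multiplication is well defined and wraps modulo 2^32.
    const std::uint32_t bits =
        static_cast<std::uint32_t>(nrow) * static_cast<std::uint32_t>(ncol);
    // Reinterpreting as signed assumes two's complement (guaranteed since C++20).
    return static_cast<std::int32_t>(bits);
}
```

With the dimensions from the example, `wrapped_product(4000000, 700)` evaluates to -1494967296, the kind of negative length that R rejects.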

kevinushey · 2015-08-12T21:05:37Z

I agree that changing from int to size_t isn't the right choice here; we should match the interface with (current) R's. But we do need to perform some casts to prevent overflow.

E.g.

https://github.com/wch/r-source/blob/trunk/src/main/array.c#L148-L151
https://github.com/wch/r-source/blob/trunk/src/main/array.c#L134

eddelbuettel · 2015-08-12T21:13:04Z

Quick check:

R> 4000000 * 700 * 8 / 1e9
[1] 22.4
R>

22 GB of RAM for a single matrix is not all that common, but not impossible. And I am on board with the rest of the discussion: we ought to have the proper interface to R.

fplazaonate · 2015-08-12T21:15:17Z

This code probably produces a segfault, but I will not have access to a machine with enough RAM to test it until tomorrow.

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
void test(const NumericMatrix& m) {
  Rcout << m(m.nrow()-1,m.ncol()-1)  << std::endl;
}

/*** R
m=matrix(rep(1.1,4000000*700), nrow=4000000,ncol=700)
test(m)
*/

fplazaonate · 2015-08-12T21:17:00Z

For your information, a lot of laboratories are using R to analyse matrices with many more than 2^31-1 elements.

thirdwing · 2015-08-12T21:22:08Z

The problem is in the dimension class: https://github.com/RcppCore/Rcpp/blob/master/inst/include/Rcpp/Dimension.h#L59-L61

nrow and ncol should be int, and their product should be an R_xlen_t.
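
The split described here (int extents, wide product) can be sketched in standalone C++. This is an illustration under assumptions, not the actual `Rcpp::Dimension` code: `Dims` is a hypothetical simplified class, and `xlen_t` is a stand-in for `R_xlen_t` defined as `std::ptrdiff_t`, which is assumed to be 64 bits wide.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Stand-in for R_xlen_t so the sketch compiles without R headers;
// assumes a 64-bit platform where std::ptrdiff_t is 64 bits wide.
typedef std::ptrdiff_t xlen_t;

// Hypothetical, simplified dimension holder (not the real Rcpp::Dimension).
class Dims {
public:
    Dims(int nrow, int ncol) : dims_() {
        assert(nrow >= 0 && ncol >= 0);
        dims_.push_back(nrow);
        dims_.push_back(ncol);
    }
    // Individual extents stay int, as in R.
    int operator[](std::size_t i) const { return dims_[i]; }
    // The accumulator starts as xlen_t, so every multiplication is carried
    // out in the wide type rather than in int.
    xlen_t prod() const {
        return std::accumulate(dims_.begin(), dims_.end(),
                               static_cast<xlen_t>(1),
                               std::multiplies<xlen_t>());
    }
private:
    std::vector<int> dims_;
};
```

The design point is the type of the initial value passed to `std::accumulate`: an `int` seed would make the running product an `int` again.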

eddelbuettel · 2015-08-12T21:25:08Z

Nicely done, KK. Agree with the issue in Dimension.h.

And yes, the code does segfault (on our largest server with lots of ram):

R> cppFunction("void mattest(NumericMatrix m) { Rcout <<  m(m.nrow()-1,m.ncol()-1)  << std::endl; }")  
R> m <- matrix(1.1, nrow=4000000,ncol=700)    
R> mattest(m)

 *** caught segfault ***
address 0x7f96c8671040, cause 'memory not mapped'

Traceback:
 1: .Primitive(".Call")(<pointer: 0x7f9ec8675c40>, m)
 2: mattest(m)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:

fplazaonate · 2015-08-12T21:27:45Z

This constructor is also problematic

    template <typename Iterator>
    Matrix( const int& nrows_, const int& ncols, Iterator start ) :
        VECTOR( start, start + (nrows_*ncols) ), // Potential overflow. Need to cast to R_xlen_t
        nrows(nrows_)
    {
        VECTOR::attr( "dim" ) = Dimension( nrows, ncols ) ;
    }

thirdwing · 2015-08-12T21:29:48Z

So can you update your PR according to our discussion? @FPlaza

fplazaonate · 2015-08-12T21:31:39Z

What about those operators?
Should 'size_t' be replaced by int?

    inline Proxy operator()( const size_t& i, const size_t& j) {
      return static_cast< Vector<RTYPE>* >( this )->operator[]( offset( i, j ) ) ;
    }
    inline const_Proxy operator()( const size_t& i, const size_t& j) const {
       return static_cast< const Vector<RTYPE>* >( this )->operator[]( offset( i, j ) ) ;
    }

thirdwing · 2015-08-12T21:36:40Z

They should be int and actually are cast to int by offset.

https://github.com/RcppCore/Rcpp/blob/master/inst/include/Rcpp/vector/Matrix.h#L160
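
A plausible reading of the segfault is that the element offset itself overflows when it is computed in int. The sketch below assumes the usual column-major formula `i + nrow * j`; `offset` here is a hypothetical free function rather than the Rcpp member, and `xlen_t` is a stand-in for `R_xlen_t` on a 64-bit platform.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for R_xlen_t; assumes std::ptrdiff_t is 64 bits wide.
typedef std::ptrdiff_t xlen_t;

// Hypothetical free function: column-major offset of element (i, j).
// The indices stay int, but nrow is widened before the multiplication so
// the intermediate product cannot overflow a 32-bit int.
inline xlen_t offset(int nrow, int i, int j) {
    assert(nrow >= 0 && i >= 0 && j >= 0);
    return i + static_cast<xlen_t>(nrow) * j;
}
```

For the last element of the 4000000 x 700 matrix, `offset(4000000, 3999999, 699)` is 2799999999, which does not fit in a 32-bit int.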

fplazaonate · 2015-08-12T21:40:26Z

I am not very familiar with git.
Could you briefly explain how to update my PR?

eddelbuettel · 2015-08-12T21:41:22Z

Change your file. Commit it again.

thirdwing · 2015-08-12T21:58:43Z

inst/include/Rcpp/vector/Matrix.h

@@ -58,7 +58,7 @@ class Matrix : public Vector<RTYPE, StoragePolicy>, public MatrixBase<RTYPE, tru

    template <typename Iterator>
    Matrix( const int& nrows_, const int& ncols, Iterator start ) :
-        VECTOR( start, start + (nrows_*ncols) ),
+        VECTOR( start, start + (static_cast<R_xlen_t>(nrows)_*ncols) ),


You have a typo here, static_cast<R_xlen_t>(nrows_)

eddelbuettel · 2015-08-12T22:21:26Z

inst/include/Rcpp/vector/Matrix.h

@@ -58,7 +58,7 @@ class Matrix : public Vector<RTYPE, StoragePolicy>, public MatrixBase<RTYPE, tru

    template <typename Iterator>
    Matrix( const int& nrows_, const int& ncols, Iterator start ) :
-        VECTOR( start, start + (nrows_*ncols) ),
+        VECTOR( start, start + (static_cast<R_xlen_t>(nrows_)*ncols) ),


Or rather VECTOR( start, start + (static_cast<R_xlen_t>(nrows_ * ncols))) ?

I think we need to cast before multiply, otherwise, there is still an overflow, right?

Ahh. Good point.
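
The ordering point can be checked at compile time without running an overflowing multiplication. This is a small sketch, assuming C++11 and using `xlen_t` as a stand-in for `R_xlen_t`; `element_count` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Stand-in for R_xlen_t so the sketch compiles without R headers.
typedef std::ptrdiff_t xlen_t;

// int * int is evaluated as int; casting that result afterwards comes too late.
static_assert(std::is_same<decltype(int() * int()), int>::value,
              "the product of two ints has type int");

// Casting one operand first converts the other, so the product is xlen_t.
static_assert(std::is_same<decltype(static_cast<xlen_t>(int()) * int()),
                           xlen_t>::value,
              "widening one operand widens the multiplication");

// Hypothetical helper: number of elements, multiplied in the wide type.
inline xlen_t element_count(int nrows, int ncols) {
    assert(nrows >= 0 && ncols >= 0);
    return static_cast<xlen_t>(nrows) * ncols;
}
```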

fplazaonate · 2015-08-14T10:00:57Z

Do the changes satisfy you?
If so, when do you plan to publish a new version of Rcpp which includes them?
Regards.

eddelbuettel · 2015-08-14T10:50:35Z

inst/include/Rcpp/Dimension.h

@@ -39,14 +39,14 @@ namespace Rcpp{
 	            dims = other.dims ;
 	        return *this ;
 	    }
-	    Dimension(const size_t& n1) : dims(1){
+	    Dimension(const int& n1) : dims(1){


But int is signed and size_t is not. I think you just reduced the range of numbers which can be expressed here.

Actually, dims is an 'int' vector, so a 'size_t' will be downcast to 'int' in the end.

We are not that lighthearted about changing existing interfaces. Right now I am inclined not to take the patch. Lines 59 and 60 in Dimension.h look fine as does line 61 in Matrix.h. I am not so sure about the rest.

The patches make the Rcpp interface coherent with RInternals (which is currently not the case).
Tell me what your concerns are and I will address them.

For coherence with R's interface, I think it's best to use int here. The std::size_t items used would still get cast to int later, which of course would cause problems if we had a std::size_t which did not fit in int.

That said, this is a change that could potentially break ABI and the ultimate effect is 'correctness' but nothing really user-visible unless one attempts to construct a dimension that doesn't fit in an int so it's a bit more difficult to accept.
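
One way to make the narrowing concern concrete is an explicit range check before the conversion. The helper below is purely illustrative: `checked_dim` is a hypothetical name, nothing in this thread proposes it, and throwing `std::range_error` is an assumption about how such a failure might be reported.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper (not part of Rcpp): narrow a size_t extent to int,
// refusing values that an int dimension cannot represent.
inline int checked_dim(std::size_t n) {
    if (n > static_cast<std::size_t>(INT_MAX)) {
        throw std::range_error("dimension does not fit in int");
    }
    const int dim = static_cast<int>(n);
    assert(static_cast<std::size_t>(dim) == n);  // narrowing round-trips exactly
    return dim;
}
```

A silent `static_cast<int>` would instead yield an implementation-dependent, typically negative, value for oversized inputs.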

eddelbuettel · 2015-08-14T10:51:40Z

I think it needs more discussion.

ChangeLog and NEWS.Rd will give you a good idea about our release frequency.

fplazaonate · 2015-08-15T06:14:42Z

I am closing this PR.
I will open an issue to build consensus about the changes needed.

Florian Plaza Oñate added 2 commits August 12, 2015 21:55

Fix integer overflow while querying large matrix.

7afafd4

Fix creation of very large matrix.

932165b

fplazaonate changed the title ~~Fix integer overflow while querying large matrix.~~ Fix querying and creation of a large matrix due to integer overflow Aug 12, 2015

Use 'int' for dimensions and R_xlen_t for product of dimensions.

fc770e5

thirdwing reviewed Aug 12, 2015

Fix typo.

ea9b2ea

eddelbuettel reviewed Aug 12, 2015

Florian Plaza Oñate added 2 commits August 13, 2015 10:46

Use int instead of size_t in Dimension constructors.

450be09

Use R_xlen_t instead of size_t in bounds checked Vector accessors.

5a8911e

eddelbuettel reviewed Aug 14, 2015

Fix wrong types in Vector accessor

076a7fe

fplazaonate closed this Aug 15, 2015

fplazaonate mentioned this pull request Aug 19, 2015

Add at accessors with bounds checking #342

Merged

fplazaonate mentioned this pull request Sep 8, 2015

Rcpp is inconsistent with RInternals integers types #369

Closed
