Long vector Rcpp function fails on Windows server, C function works #460

Closed
kendonB opened this issue Apr 13, 2016 · 26 comments

@kendonB
Contributor

kendonB commented Apr 13, 2016

Here is the Rcpp function (you'll need to sourceCpp):

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector plus1Vec(NumericVector x){
  R_xlen_t nrow = ::Rf_xlength(x);
  NumericVector out(nrow);
  for (R_xlen_t i = 0; i < nrow; ++i) {
      out[i] = x[i] + 1;
  }
  return out;
}

// [[Rcpp::export]]
double xlenout(NumericVector x){
  R_xlen_t out = ::Rf_xlength(x);
  return out;
}

// [[Rcpp::export]]
double unusedVec(NumericVector x){
  R_xlen_t nrow = ::Rf_xlength(x);
  NumericVector out(nrow);
  return 1.0;
}

And the call in R:

stop("Need around 40GB")
tmpVec <- 1:(.Machine$integer.max + 2)*0 + 1

# works
plus1Vec(head(tmpVec))

# Doesn't work
tmp2 <- plus1Vec(tmpVec)
# Error in .Primitive(".Call")(<pointer: 0x000000006c501b00>, x) :
# negative length vectors are not allowed

# xlength works
xlenout(tmpVec)
# [1] 2147483649

# Problem is in the constructor
unusedVec(tmpVec)
# Error in .Primitive(".Call")(<pointer: 0x000000006f9427b0>, x) : 
#  negative length vectors are not allowed

Dirk had suggested that I try and see if this works with a plain C function, and it works:

library(inline)
stop("Need around 40GB")
tmpVec <- 1:(.Machine$integer.max + 2)*0 + 1

add_one <- cfunction(c(x = "numeric"), "
  R_xlen_t n = xlength(x);
  SEXP out = PROTECT(allocVector(REALSXP, n));

  for (R_xlen_t i = 0; i < n; i++) {
    REAL(out)[i] = REAL(x)[i] + 1;
  }
  UNPROTECT(1);

  return out;
")
tmp2 <- add_one(tmpVec)
length(tmp2)
# [1] 2147483649

Here is my sessionInfo():

R version 3.2.4 Revised (2016-03-16 r70336)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server 2008 R2 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] tools_3.2.4   Rcpp_0.12.4.3

Hope this helps.

@thirdwing
Member

I think I might know what's wrong. I will fix it soon.

@thirdwing
Member

@kendonB I am afraid this is really related to Windows.

On an Ubuntu machine with 64GB of memory, I ran your code and got:

> Rcpp::sourceCpp("test.cpp")
> tmpVec <- 1:(.Machine$integer.max + 2)*0 + 1
> tmp2 <- plus1Vec(tmpVec)
> length(tmp2)
[1] 2147483649

@eddelbuettel
Member

KK do you remember if either we or else R has an #ifdef somewhere on R_xlen_t that would make this different on Windows?

@thirdwing
Member

@eddelbuettel
Member

Being conditional on what is just before it.

This is starting to ring a bell. Could someone please tell me what the size of size_t on Windoze is?

@eddelbuettel
Member

Turns out we get 8 on both OSs. Back to square one.

R> library(Rcpp)
R> cppFunction("int foo() { return sizeof(size_t); }")
R> foo()
[1] 8
R> 

@kendonB
Contributor Author

kendonB commented Apr 13, 2016

Not sure if this is completely obvious to y'all, but it seems the problem is in the constructor stage. See my edits to the main comment above.

@eddelbuettel
Member

Sorry, but I have no idea what you are referring to. If you indeed know what the source of the problem is, I would encourage you to submit a pull request.

@thirdwing
Member

@kendonB I know the error comes from the ctor.

First, your code works fine on the Linux machine:

> Rcpp::sourceCpp("test.cpp")
> tmpVec <- 1:(.Machine$integer.max + 2)*0 + 1
> tmp2 <- plus1Vec(tmpVec)
> length(tmp2)
[1] 2147483649
> unusedVec(tmpVec)
[1] 1
> sessionInfo()
R version 3.2.4 Revised (2016-03-16 r70336)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.4 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Rcpp_0.12.4

Second, since I don't have access to a Windows machine with that much RAM, I can only guess. It might be related to how ptrdiff_t and size_t are defined on your platform.

Can you check the size of ptrdiff_t and size_t on your machine?

Also, can you check whether the macro RCPP_HAS_LONG_LONG_TYPES is defined on your machine?

@kevinushey
Contributor

Is it possible that, for some reason, the Vector constructor on Windows is choosing the int overload versus an appropriate R_xlen_t overload?

Cross-ref: https://github.com/RcppCore/Rcpp/blob/master/inst/include/Rcpp/vector/Vector.h#L120-L131

Maybe is_arithmetic is failing to detect R_xlen_t or other non-int size types?
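
To illustrate why an int overload would produce exactly this error, here is a minimal standalone C++11 sketch (no Rcpp involved). It assumes the usual two's-complement wraparound when a 64-bit length is stored in a 32-bit int; `wrap_to_int32` is a hypothetical helper written for this illustration, not an Rcpp or R function.

```cpp
#include <cstdint>

// Hypothetical helper, not part of Rcpp or R: emulates the usual
// two's-complement result of storing a 64-bit length in a 32-bit int.
constexpr std::int64_t wrap_to_int32(std::int64_t n) {
  return (n & 0xFFFFFFFFLL) >= 0x80000000LL
             ? (n & 0xFFFFFFFFLL) - 0x100000000LL  // high bit set: negative
             : (n & 0xFFFFFFFFLL);                 // fits: unchanged
}

// Length of tmpVec from the report: .Machine$integer.max + 2
constexpr std::int64_t long_len = 2147483647LL + 2;

// The wrapped value is negative, which would be consistent with
// "negative length vectors are not allowed".
static_assert(wrap_to_int32(long_len) == -2147483647LL, "wraps negative");
static_assert(wrap_to_int32(6) == 6, "short lengths are unaffected");
```

This would also fit the observation that `plus1Vec(head(tmpVec))` works: a length of 6 survives the narrowing unchanged.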

@kevinushey
Contributor

Do we need R_len_t and R_xlen_t helpers in here? https://github.com/RcppCore/Rcpp/blob/master/inst/include/Rcpp/traits/is_arithmetic.h

@thirdwing
Member

R_xlen_t is defined to be ptrdiff_t, which is implementation-defined.

In the is_arithmetic, we have int, unsigned int, long, unsigned long, long long, unsigned long long.

I am a little curious how R_xlen_t is defined on Windows.
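
One way to answer that on any machine is a compile-time probe. The following is a minimal sketch in plain C++11; `ptrdiff_kind` and `long_is_narrower_than_pointer` are hypothetical names chosen for this illustration, and the probe inspects `std::ptrdiff_t` directly on the assumption stated above that `R_xlen_t` is a typedef for it.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical probe: names the built-in type behind std::ptrdiff_t.
constexpr const char* ptrdiff_kind() {
  return std::is_same<std::ptrdiff_t, long long>::value ? "long long"
       : std::is_same<std::ptrdiff_t, long>::value      ? "long"
       : std::is_same<std::ptrdiff_t, int>::value       ? "int"
                                                        : "other";
}

// True when long is narrower than a pointer, as on 64-bit Windows (LLP64),
// where long stays 32 bits wide even though size_t is 8 bytes.
constexpr bool long_is_narrower_than_pointer() {
  return sizeof(long) < sizeof(void*);
}
```

Both functions could be wrapped with `// [[Rcpp::export]]` (returning `std::string` and `bool`) and called from R in the same way as the `sizeof` checks elsewhere in this thread.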

@kendonB
Contributor Author

kendonB commented Apr 13, 2016

library(Rcpp)
cppFunction("int foo1() { return sizeof(ptrdiff_t); }")
cppFunction("int foo2() { return sizeof(size_t); }")
foo1()
# [1] 8
foo2()
# [1] 8

@thirdwing The size of both is 8. How do I check whether RCPP_HAS_LONG_LONG_TYPES is defined?

@eddelbuettel
Member

We generally have that only with C++11:

#include <Rcpp.h>

// [[Rcpp::export]]
int foo() {
#ifdef RCPP_HAS_LONG_LONG_TYPES
  return 1;
#else
  return 0;
#endif
}

/*** R
foo()
*/

Then in R:

R> Rcpp::sourceCpp("/tmp/longlong.cpp")

R> foo()
[1] 0
R> 

If you add the line

// [[Rcpp::plugins(cpp11)]]

and re-run you get a 1. So that's not it.

@kendonB
Contributor Author

kendonB commented Apr 13, 2016

Adding // [[Rcpp::plugins(cpp11)]] makes everything work now:

> tmp2 <- plus1Vec(tmpVec)
> length(tmp2)
[1] 2147483649

@kendonB
Contributor Author

kendonB commented Apr 13, 2016

Is there any reason why adding

// [[Rcpp::plugins(cpp11)]]

would be undesirable?

@eddelbuettel
Member

Ahhhh, nice! Especially as we get C++11 on Windows soon too. Right now it kinda/sorta/not quite works on Windows and just gets us past some truly absurd old limits.

Now, @thirdwing @kevinushey, do we know why this helps? We get the right behaviour (with compilers from this century) on OS X and Linux. What on earth breaks with 4.6.* and the old stuff?

@eddelbuettel
Member

Yes, trust us, that has been a concern for quite some time. The best piece of advice is to just add this locally.

We cannot depend on C++11 as we are fully committed to supporting all reasonable platforms. Which for RHEL4 or other dinosaurs may mean g++ 4.4.* or something older than you have.

@kendonB
Contributor Author

kendonB commented Apr 13, 2016

When you say "locally", do you mean just for personal use cases? Or would you be comfortable having it added in a CRAN package?

@eddelbuettel
Member

So as I recall, one of the benefits of the C++11 plugin was to enable long long everywhere. Part of the CRAN braindeadness was to pretend the C 2003 (yes, ANSI C, not C++) standard never happened, so C++98 it was for us -- and hence no long long. That probably means indexing was off, and hence the bug found by @kendonB.

I have long relied on adding C++11 to my packages just to get long long -- I think the oldest examples of mine are RcppBDT and RcppCNPy.

So @kendonB you can add the requirement locally to your builds (via ~/.R/Makevars) and of course also to your packages. It is a fudge, and on Windows it does not mean full C++11 support -- but it means 'better than the really stodgy C++98'. Once we have R 3.3.0 and the new toolchain you can also opt into real C++11 at the price of foregoing many pre-R 3.3.0 installations, or installations with older compilers.

I put something into the Rcpp FAQ. Ok to close this once we document it?

And thanks for finding the bug and your help. Your build with the plain C example made it clear it was us / our C++ environment. So big thanks all!

@kendonB
Contributor Author

kendonB commented Apr 13, 2016

No problem at all. I'm certainly a net drain on the open source world, so happy to contribute in small ways when I can. Happy for you to close this when you wish, of course!

@thirdwing
Member

@eddelbuettel I can offer a guess and might confirm it when I have time. This depends on how ptrdiff_t is defined on Windows and Linux.

When we test whether ptrdiff_t is arithmetic (https://github.com/RcppCore/Rcpp/blob/master/inst/include/Rcpp/traits/is_arithmetic.h#L71), if it is defined as long long and we don't use C++11, we will get an error.

@kevinushey
Contributor

For reference, I see this in the MinGW toolchain sources (in stddef.h):

#ifndef __PTRDIFF_TYPE__
#ifdef _WIN64
#define __PTRDIFF_TYPE__ long long int
#else
#define __PTRDIFF_TYPE__ long int
#endif
#endif
#ifndef _PTRDIFF_T_DEFINED
#define _PTRDIFF_T_DEFINED
__MINGW_EXTENSION typedef __PTRDIFF_TYPE__ ptrdiff_t;
#endif

So I guess if you're compiling on Windows 64, and you don't have C++11 support enabled, then the is_arithmetic overloads for long long don't kick in (and hence, no R_xlen_t / ptrdiff_t support).

(I think this just confirms all the investigation already done in this thread)
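
A minimal standalone sketch of that mechanism, assuming (as discussed above) that the size constructor is a template enabled only for types the trait recognises, with a plain `int` overload as the fallback. Everything in the `sketch` namespace, and the `SKETCH_HAS_LONG_LONG` macro, are hypothetical stand-ins for illustration and not Rcpp's actual code.

```cpp
#include <type_traits>

namespace sketch {

// Stand-in for a hand-written is_arithmetic trait: false by default.
template <typename T> struct is_arithmetic : std::false_type {};
template <> struct is_arithmetic<int> : std::true_type {};
template <> struct is_arithmetic<long> : std::true_type {};

// long long is only recognised when the (hypothetical) macro is defined,
// mirroring a guard such as RCPP_HAS_LONG_LONG_TYPES.
#ifdef SKETCH_HAS_LONG_LONG
template <> struct is_arithmetic<long long> : std::true_type {};
#endif

// Fallback overload: an unrecognised integer type is converted to int
// here, which is where a long length would be truncated.
constexpr bool uses_template_overload(int) { return false; }

// Preferred overload: only participates in overload resolution when the
// trait reports the argument type as arithmetic.
template <typename T>
constexpr bool uses_template_overload(
    T, typename std::enable_if<is_arithmetic<T>::value>::type* = nullptr) {
  return true;
}

}  // namespace sketch
```

With the macro undefined, `sketch::uses_template_overload(0L)` selects the template while `sketch::uses_template_overload(0LL)` silently falls back to the `int` overload. Compiling with `-DSKETCH_HAS_LONG_LONG` moves the `long long` case to the template, which is analogous to what enabling C++11 appears to do in this thread.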

@thirdwing
Member

Thank you!

Exactly what I wanted to confirm. As I remember, ptrdiff_t is defined as long on Linux.


@coatless
Contributor

Fourth entry in Section 5: Known Issues

Title: Long Vector Support on Windows

Proposed Text:

Prior to \R v3.0.0, a vector could hold at most $2^{31} - 1$ elements. \R v3.0.0 added long vector support, raising the limit to $2^{52}$ elements on 64-bit platforms (cf. \href{https://stat.ethz.ch/R-manual/R-devel/library/base/html/LongVectors.html}{Long Vectors help entry}). Support for long vectors in Rcpp followed with Rcpp version 0.12.0 (cf. \href{http://dirk.eddelbuettel.com/blog/2015/07/25/}{Rcpp 0.12.0 announcement}). However, using long vectors in Rcpp requires compiler support for \code{R_xlen_t}, whose underlying type depends on how the platform implements \code{ptrdiff_t}. On 64-bit Windows, \code{ptrdiff_t} is defined as \code{long long}, a type that Rcpp does not treat as arithmetic when compiling under the \proglang{C++98} specification, so long lengths are not handled correctly there. To solve this issue, compile under the \proglang{C++11} specification or later.

There are three options to trigger compilation with \proglang{C++11}. The first, and the one most likely to be used, is the plugin support offered by Rcpp attributes. This is engaged by adding \code{// [[Rcpp::plugins(cpp11)]]} to the top of the \proglang{C++} script. For diagnostic and illustrative purposes, consider the following code, which checks whether \code{long long} support (and with it long vector support) is enabled on your platform:

#include <Rcpp.h>
// Force compilation mode to C++11
// [[Rcpp::plugins(cpp11)]]

// [[Rcpp::export]]
int test_long_vector_support() {
#ifdef RCPP_HAS_LONG_LONG_TYPES
  return 1;
#else
  return 0;
#endif
}

/*** R
test_long_vector_support()
*/

The remaining two options are for users who have opted to embed Rcpp code within an R package. The second option requires adding \code{CXX_STD = CXX11} to a \code{Makevars} file in the package's \code{src/} directory. The third option is to add \code{SystemRequirements: C++11} to the package's \code{DESCRIPTION} file.

Please note that the support for C++11 prior to \R v3.3.0 on Windows is limited. Therefore, plan accordingly if the goal is to support older versions of \R.

eddelbuettel added a commit that referenced this issue Mar 31, 2017
Adds Known Issues section to Rcpp FAQ (closes #628, #563, #552, #460, #419, and #251)
@coatless
Contributor

@eddelbuettel: This issue can now be closed.
