Long vector Rcpp function fails on Windows server, C function works #460

kendonB · 2016-04-13T01:35:08Z

Here is the Rcpp function (you'll need to sourceCpp):

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector plus1Vec(NumericVector x){
  R_xlen_t nrow = ::Rf_xlength(x);
  NumericVector out(nrow);
  for (R_xlen_t i = 0; i < nrow; ++i) {
      out[i] = x[i] + 1;
  }
  return out;
}

// [[Rcpp::export]]
double xlenout(NumericVector x){
  R_xlen_t out = ::Rf_xlength(x);
  return out;
}

// [[Rcpp::export]]
double unusedVec(NumericVector x){
  R_xlen_t nrow = ::Rf_xlength(x);
  NumericVector out(nrow);
  return 1.0;
}

And the call in R:

stop("Need around 40GB")
tmpVec <- 1:(.Machine$integer.max + 2)*0 + 1

# works
plus1Vec(head(tmpVec))

# Doesn't work
tmp2 <- plus1Vec(tmpVec)
# Error in .Primitive(".Call")(<pointer: 0x000000006c501b00>, x) :
# negative length vectors are not allowed

# xlength works
xlenout(tmpVec)
# [1] 2147483649

# Problem is in the constructor
unusedVec(tmpVec)
# Error in .Primitive(".Call")(<pointer: 0x000000006f9427b0>, x) : 
#  negative length vectors are not allowed
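
The "negative length" message is what one would expect if the 64-bit length were squeezed through a 32-bit int before reaching the allocator. A minimal sketch in plain C++ (no Rcpp involved; `narrow_to_int`, `reported_length` and `demo_narrowing` are hypothetical names, and the wrap-around is what GCC and other two's-complement compilers do, guaranteed by the standard only from C++20):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: mimics a 64-bit length being passed through a
// 32-bit signed parameter on its way to an allocator.
inline std::int32_t narrow_to_int(std::int64_t n) {
    return static_cast<std::int32_t>(n);
}

// Length used in the report: .Machine$integer.max + 2, i.e. 2^31 + 1.
const std::int64_t reported_length = 2147483649LL;

// Small self-check: short lengths survive, the long one turns negative.
inline void demo_narrowing() {
    assert(narrow_to_int(6) == 6);
    assert(narrow_to_int(reported_length) < 0);
}
```

If something like this happens inside the constructor, it would explain why head(tmpVec) works while the full vector does not.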

Dirk had suggested that I try and see if this works with a plain C function, and it works:

library(inline)
stop("Need around 40GB")
tmpVec <- 1:(.Machine$integer.max + 2)*0 + 1

add_one <- cfunction(c(x = "numeric"), "
  R_xlen_t n = xlength(x);
                     SEXP out = PROTECT(allocVector(REALSXP, n));

                     for (R_xlen_t i = 0; i < n; i++) {
                     REAL(out)[i] = REAL(x)[i] + 1;
                     }
                     UNPROTECT(1);

                     return out;
                     ")
tmp2 <- add_one(tmpVec)
length(tmp2)
# [1] 2147483649

Here is my sessionInfo():

R version 3.2.4 Revised (2016-03-16 r70336)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server 2008 R2 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] tools_3.2.4   Rcpp_0.12.4.3

Hope this helps.

thirdwing · 2016-04-13T13:42:29Z

I think I might know what's wrong. I will fix it soon.

thirdwing · 2016-04-13T13:55:21Z

@kendonB I am afraid this is really related to Windows.

On an Ubuntu machine with 64GB of memory, I ran your code and got

> Rcpp::sourceCpp("test.cpp")
> tmpVec <- 1:(.Machine$integer.max + 2)*0 + 1
> tmp2 <- plus1Vec(tmpVec)
> length(tmp2)
[1] 2147483649

eddelbuettel · 2016-04-13T14:00:15Z

KK do you remember if either we or else R has an #ifdef somewhere on R_xlen_t that would make this different on Windows?

thirdwing · 2016-04-13T14:07:07Z

R_xlen_t is defined here: https://github.com/wch/r-source/blob/trunk/src/include/Rinternals.h#L74-L83

eddelbuettel · 2016-04-13T14:09:23Z

Being conditional on what is just before it.

This is starting to ring a bell. Could someone please tell me what the size of size_t on Windoze is?

eddelbuettel · 2016-04-13T14:18:24Z

Turns out we get 8 on both OSs. Back to square one.

R> library(Rcpp)
R> cppFunction("int foo() { return sizeof(size_t); }")
R> foo()
[1] 8
R>

kendonB · 2016-04-13T21:45:00Z

Not sure if this is completely obvious to y'all, but it seems the problem is in the constructor stage. See my edits to the main comment above.

eddelbuettel · 2016-04-13T21:53:38Z

Sorry, but I have no idea what you are referring to. If you indeed know what the source of the problem is, I would encourage you to submit a pull request.

thirdwing · 2016-04-13T22:06:55Z

@kendonB I know the error comes from the constructor.

First, your code works fine on the Linux machine:

> Rcpp::sourceCpp("test.cpp")
> tmpVec <- 1:(.Machine$integer.max + 2)*0 + 1
> tmp2 <- plus1Vec(tmpVec)
> length(tmp2)
[1] 2147483649
> unusedVec(tmpVec)
[1] 1
> sessionInfo()
R version 3.2.4 Revised (2016-03-16 r70336)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.4 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Rcpp_0.12.4

Second, since I really don't have access to a Windows machine with that much RAM, I can only guess. It might be related to how ptrdiff_t and size_t are defined on your platform.

Can you check the size of ptrdiff_t and size_t on your machine?

Besides, can you check if the macro RCPP_HAS_LONG_LONG_TYPES is defined on your machine?

kevinushey · 2016-04-13T22:07:02Z

Is it possible that, for some reason, the Vector constructor on Windows is choosing the int overload versus an appropriate R_xlen_t overload?

Cross-ref: https://github.com/RcppCore/Rcpp/blob/master/inst/include/Rcpp/vector/Vector.h#L120-L131

Maybe is_arithmetic is failing to detect R_xlen_t or other non-int size types?
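
To make that hypothesis concrete, here is a sketch in plain C++ of how a hand-written trait that does not list long long can push a call onto an int overload. Everything here (`toy_is_arithmetic`, `toy_enable_if`, `ToyVector`, `demo_dispatch`) is hypothetical and only mimics the shape of the problem; it is not Rcpp's actual code.

```cpp
#include <cassert>

// Hypothetical stand-in for a hand-written trait: only the types listed
// here are recognized. long long is deliberately omitted, mimicking a
// build where long long support is switched off.
template <typename T> struct toy_is_arithmetic { static const bool value = false; };
template <> struct toy_is_arithmetic<int>  { static const bool value = true; };
template <> struct toy_is_arithmetic<long> { static const bool value = true; };

// Minimal enable_if so the sketch does not depend on C++11 headers.
template <bool B, typename T = void> struct toy_enable_if {};
template <typename T> struct toy_enable_if<true, T> { typedef T type; };

// Toy container that only records the size it was asked for.
struct ToyVector {
    long long requested;     // size as seen by the selected constructor
    bool used_int_overload;  // true if the int fallback was chosen

    // Fallback overload taking a 32-bit int.
    ToyVector(const int& size) : requested(size), used_int_overload(true) {}

    // Generic overload, only viable for types the trait recognizes.
    template <typename T>
    ToyVector(T size,
              typename toy_enable_if<toy_is_arithmetic<T>::value>::type* = 0)
        : requested(size), used_int_overload(false) {}
};

// Self-check: long goes through the generic overload, long long does not.
inline void demo_dispatch() {
    long long big = 2147483649LL;  // 2^31 + 1, as in the report
    ToyVector a(big);              // trait rejects long long: int fallback
    ToyVector b(5L);               // trait accepts long: generic overload
    assert(a.used_int_overload);
    assert(!b.used_int_overload && b.requested == 5);
}
```

In this toy setup the long long argument is silently converted to int, so the size that reaches the constructor body is no longer the one the caller passed.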

kevinushey · 2016-04-13T22:07:43Z

Do we need R_len_t and R_xlen_t helpers in here? https://github.com/RcppCore/Rcpp/blob/master/inst/include/Rcpp/traits/is_arithmetic.h

thirdwing · 2016-04-13T22:13:20Z

R_xlen_t is defined to be ptrdiff_t, which is implementation-defined.

In is_arithmetic, we cover int, unsigned int, long, unsigned long, long long, and unsigned long long.

I am a little curious how R_xlen_t is defined on Windows.
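
One way to find out without reading toolchain headers, assuming a C++11 compiler (for `std::is_same`), is to ask the compiler directly. Since sizeof reports 8 for both long and long long on many 64-bit platforms, comparing types is more informative than comparing sizes. `ptrdiff_kind` and `demo_ptrdiff_kind` are hypothetical helper names:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Returns a label for the type that std::ptrdiff_t aliases on this platform.
inline const char* ptrdiff_kind() {
    if (std::is_same<std::ptrdiff_t, long long>::value) return "long long";
    if (std::is_same<std::ptrdiff_t, long>::value)      return "long";
    if (std::is_same<std::ptrdiff_t, int>::value)       return "int";
    return "other";
}

// Sanity check: the label is never empty.
inline void demo_ptrdiff_kind() {
    assert(ptrdiff_kind()[0] != '\0');
}
```

Wrapped in an exported function that returns the label as a string, this could be run from R on both machines in the same way as the sizeof checks in this thread.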

kendonB · 2016-04-13T22:19:33Z

library(Rcpp)
cppFunction("int foo1() { return sizeof(ptrdiff_t); }")
cppFunction("int foo2() { return sizeof(size_t); }")
foo1()
# [1] 8
foo2()
# [1] 8

@thirdwing The size of both is 8. How do I check whether RCPP_HAS_LONG_LONG_TYPES is defined?

eddelbuettel · 2016-04-13T22:26:50Z

We generally have that only with C++11:

#include <Rcpp.h>

// [[Rcpp::export]]
int foo() {
#ifdef RCPP_HAS_LONG_LONG_TYPES
  return 1;
#else
  return 0;
#endif
}

/*** R
foo()
*/

Then in R:

R> Rcpp::sourceCpp("/tmp/longlong.cpp")

R> foo()
[1] 0
R>

If you add the line

// [[Rcpp::plugins(cpp11)]]

and re-run you get a 1. So that's not it.

kendonB · 2016-04-13T22:31:28Z

Adding // [[Rcpp::plugins(cpp11)]] makes everything work now:

> tmp2 <- plus1Vec(tmpVec)
> length(tmp2)
[1] 2147483649

kendonB · 2016-04-13T22:33:24Z

Is there any reason why adding

// [[Rcpp::plugins(cpp11)]]

would be undesirable?

eddelbuettel · 2016-04-13T22:34:23Z

Ahhhh, nice! Especially as we get C++11 on Windows soon too. Right now it kinda/sorta/not quite works on Windows and just gets us past some truly absurd old limits.

Now, @thirdwing @kevinushey do we know why this helps? We get the right behaviour (with compilers from this century) on OS X and Linux. What on earth breaks with 4.6.* and the old stuff?

eddelbuettel · 2016-04-13T22:35:51Z

Yes, trust us, that has been a concern for quite some time. The best piece of advice is to just add this locally.

We cannot depend on C++11 as we are fully committed to supporting all reasonable platforms. Which for RHEL4 or other dinosaurs may mean g++ 4.4.* or something older than you have.

kendonB · 2016-04-13T22:38:27Z

When you say "locally", do you mean just for personal use cases? Or would you be comfortable having it added in a CRAN package?

eddelbuettel · 2016-04-13T22:43:07Z

So as I recall, one of the benefits of the C++11 plugin was to enable long long everywhere. Part of the CRAN braindeadness was to pretend the C 2003 (yes, ANSI C, not C++) standard never happened, so C++98 it was for us -- and hence no long long. That probably means indexing was off, and hence the bug found by @kendonB.

I have long relied on adding C++11 to my packages just to get long long -- I think the oldest examples of mine are RcppBDT and RcppCNPy.

So @kendonB you can add the requirement locally to your builds (via ~/.R/Makevars) and of course also to your packages. It is a fudge, and on Windows it does not mean full C++11 support -- but it means 'better than the really stodgy C++98'. Once we have R 3.3.0 and the new toolchain you can also opt into real C++11 at the price of forgoing many pre-R 3.3.0 installations, or installations with older compilers.

I put something into the Rcpp FAQ. Ok to close this once we document it?

And thanks for finding the bug and your help. Your build with the plain C example made it clear it was us / our C++ environment. So big thanks all!

kendonB · 2016-04-13T22:52:25Z

No problem at all. I'm certainly a net drain on the open source world, so happy to contribute in small ways when I can. Happy for you to close this when you wish, of course!

thirdwing · 2016-04-13T23:22:07Z

@eddelbuettel I can offer a guess and might confirm it when I have time. This depends on how ptrdiff_t is defined on Windows and Linux.

When we test whether ptrdiff_t is arithmetic (https://github.com/RcppCore/Rcpp/blob/master/inst/include/Rcpp/traits/is_arithmetic.h#L71), if it is defined as long long and we don't use C++11, we will get an error.

kevinushey · 2016-04-14T04:19:00Z

For reference, I see this in the MinGW toolchain sources (in stddef.h):

#ifndef __PTRDIFF_TYPE__
#ifdef _WIN64
#define __PTRDIFF_TYPE__ long long int
#else
#define __PTRDIFF_TYPE__ long int
#endif
#endif
#ifndef _PTRDIFF_T_DEFINED
#define _PTRDIFF_T_DEFINED
__MINGW_EXTENSION typedef __PTRDIFF_TYPE__ ptrdiff_t;
#endif

So I guess if you're compiling on Windows 64, and you don't have C++11 support enabled, then the is_arithmetic overloads for long long don't kick in (and hence, no R_xlen_t / ptrdiff_t support).
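
The gating described here can be sketched in plain C++ as follows. `TOY_HAS_LONG_LONG`, `gated_is_size_type`, `ptrdiff_is_recognized` and `demo_gating` are hypothetical names, and tying the switch to `__cplusplus` is an assumption made for illustration, not a claim about how Rcpp decides when to define RCPP_HAS_LONG_LONG_TYPES:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical switch standing in for a "long long is available" macro.
// Assumption for this sketch: it follows the reported language standard.
#if __cplusplus >= 201103L
#define TOY_HAS_LONG_LONG 1
#endif

// Hand-written trait: a type counts only if it is listed explicitly.
template <typename T> struct gated_is_size_type { static const bool value = false; };
template <> struct gated_is_size_type<int>  { static const bool value = true; };
template <> struct gated_is_size_type<long> { static const bool value = true; };
#ifdef TOY_HAS_LONG_LONG
// Only compiled in when the switch is on.
template <> struct gated_is_size_type<long long> { static const bool value = true; };
#endif

// True if this build's trait knows the type behind std::ptrdiff_t.
inline bool ptrdiff_is_recognized() {
    return gated_is_size_type<std::ptrdiff_t>::value;
}

// Self-check for the types that are always listed.
inline void demo_gating() {
    assert(gated_is_size_type<int>::value);
    assert(gated_is_size_type<long>::value);
}
```

Where ptrdiff_t is long, the result is true with or without the switch. Where it is long long, as in the MinGW header quoted above for _WIN64, the result depends entirely on whether the gated specialization was compiled in.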

(I think this just confirms all the investigation already done in this thread)

thirdwing · 2016-04-14T04:29:45Z

Thank you!

That is exactly what I wanted to confirm. As I remember, ptrdiff_t is defined as long on Linux.

coatless · 2017-03-26T05:04:19Z

Fourth entry in Section 5: Known Issues

Title: Long Vector Support on Windows

Proposed Text:

Prior to \R v3.0.0, a vector could hold at most $2^{31} - 1$ elements. \R v3.0.0 added long vector support, raising the limit to $2^{52}$ elements on 64-bit platforms (cf. \href{https://stat.ethz.ch/R-manual/R-devel/library/base/html/LongVectors.html}{Long Vectors help entry}). Rcpp gained support for long vectors in version 0.12.0 (cf. \href{http://dirk.eddelbuettel.com/blog/2015/07/25/}{Rcpp 0.12.0 announcement}). However, using long vectors in Rcpp requires compiler support for the type behind \code{R_xlen_t}, which is platform dependent because \code{R_xlen_t} is defined in terms of \code{ptrdiff_t}. On 64-bit Windows, \code{ptrdiff_t} is \code{long long}, a type that Rcpp does not enable when compiling under the \proglang{C++98} specification. Therefore, to solve this issue one must compile under the \proglang{C++11} specification or a later version.

There are three options to trigger compilation with \proglang{C++11}. The first -- and most likely the one to use -- is the plugin support offered by Rcpp attributes. It is engaged by adding \code{// [[Rcpp::plugins(cpp11)]]} to the top of the \proglang{C++} script. For diagnostic and illustrative purposes, consider the following code, which checks whether \code{long long} support (and hence a usable \code{R_xlen_t}) is enabled on your platform:

#include <Rcpp.h>
// Force compilation mode to C++11
// [[Rcpp::plugins(cpp11)]]

// [[Rcpp::export]]
int test_long_vector_support() {
#ifdef RCPP_HAS_LONG_LONG_TYPES
  return 1;
#else
  return 0;
#endif
}

/*** R
test_long_vector_support()
*/

The remaining two options are for users who have embedded Rcpp code within an R package. The second option requires adding \code{CXX_STD = CXX11} to a \code{Makevars} file in the package's \code{src/} directory. Finally, the third option is to add \code{SystemRequirements: C++11} to the package's \code{DESCRIPTION} file.

Please note that the support for C++11 prior to \R v3.3.0 on Windows is limited. Therefore, plan accordingly if the goal is to support older versions of \R.

Adds Known Issues section to Rcpp FAQ (closes #628, #563, #552, #460, #419, and #251)

coatless · 2017-03-31T01:40:13Z

@eddelbuettel: This issue can now be closed.

eddelbuettel mentioned this issue Apr 14, 2016

cannot allocate vector of length 1790997466 / loadRcppModules / loadModule - new windows toolchain #462

Closed

eddelbuettel added the documentation label May 6, 2016

eddelbuettel self-assigned this May 6, 2016

coatless mentioned this issue Jul 16, 2016

Documentation Update / Issue Tracker Cleaning #506

Closed

43 tasks

coatless mentioned this issue Mar 28, 2017

Adds Known Issues section to Rcpp FAQ (closes #628, #563, #552, #460, #419, and #251) #661

Merged

eddelbuettel added a commit that referenced this issue Mar 31, 2017

Merge pull request #661 from coatless/feature/faq-init-known-issues

886f5df

Adds Known Issues section to Rcpp FAQ (closes #628, #563, #552, #460, #419, and #251)

eddelbuettel closed this as completed Mar 31, 2017

krlmlr mentioned this issue Jan 18, 2018

Calling vector constructors with R_xlen_t or size_t values > 2³¹ fails in 64-bit R on Windows #804

Closed

marcosci mentioned this issue Nov 5, 2018

Error: requested size is too large r-spatialecology/landscapemetrics#52

Closed

kendonB mentioned this issue Apr 22, 2019

digest can't detect small change to large dataset eddelbuettel/digest#97

Closed
