Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory leak in Rcpp::sample #1074

Closed
kkrismer opened this issue Apr 25, 2020 · 10 comments
Closed

Memory leak in Rcpp::sample #1074

kkrismer opened this issue Apr 25, 2020 · 10 comments

Comments

@kkrismer
Copy link

First of all, thanks for the great work, Rcpp is awesome!

I've noticed that memory usage blows up when repeatedly calling Rcpp::sample().

Running this minimal reproducible example allocates multiple GB of memory:

Rcpp::cppFunction("void call_sample_n_times(const int n) {
        for(int i(0); i < n; ++i) {
            Rcpp::sample(15, 5);
        }
    }")

set.seed(1)
call_sample_n_times(10^8)

The expected memory footprint of call_sample_n_times(10^8) is pretty much zero. Note that the parameters to Rcpp::sample do not depend on n.

Rcpp::RcppArmadillo::sample does not exhibit this behavior. The equivalent RcppArmadillo code uses almost no memory, as expected:

Rcpp::IntegerVector vec(Rcpp::seq_len(15))
for(int i(0); i < n; ++i) {
    Rcpp::RcppArmadillo::sample(vec, 5, false);
}

Tested on Windows 10 and Ubuntu 18.04.

Ubuntu 18.04:

> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_3.6.3 tools_3.6.3    Rcpp_1.0.4.6

Windows 10:

> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] usethis_1.6.0

loaded via a namespace (and not attached):
[1] compiler_3.6.3 tools_3.6.3    fs_1.4.1       glue_1.4.0     Rcpp_1.0.4.6   rlang_0.4.5
@eddelbuettel
Copy link
Member

eddelbuettel commented Apr 25, 2020

Thanks for the report but ... I'm not sure. You are allocated 10^8 as R objects in the context of a memory-managed dynamic interpreted environment. These are not supposed to disappear at end of scope as in C/C++; it simply is different by design.

@eddelbuettel
Copy link
Member

Then again, something seems to stick around longer indeed. There are two calls to R_alloc() which per Section 6.1.1 of Writing R Extensions should be transient. You could try and replace those with direct IntegerVector x(n) use and see if that makes a difference.

@kkrismer
Copy link
Author

The behavior does not depend on Rcpp::cppFunction. I just used that for the sake of brevity. Same story in stand-alone C++ files.

@eddelbuettel
Copy link
Member

That is not something I implied.

Scope in C++ is curlies, but R is still reference counted. Different story. Either an R function gets created which gets called by .Call(). Per WRE R_alloc() should free, maybe it doesn't? I see no user memory use.

Feel free to debug if it is an itch that needs scratching. Help is always welcome :)

@eddelbuettel
Copy link
Member

From a very casual first glance this seems to make a difference. Can you try it too, please?

modified   inst/include/Rcpp/sugar/functions/sample.h
@@ -348,7 +348,8 @@ inline Vector<INTSXP> EmpiricalSample(int n, int size, bool replace, bool one_ba
         return ans;
     }
 
-    int* x = reinterpret_cast<int*>(R_alloc(n, sizeof(int)));
+    //int* x = reinterpret_cast<int*>(R_alloc(n, sizeof(int)));
+    IntegerVector x(n);
     for (int i = 0; i < n; i++) {
         x[i] = i;
     }
@@ -378,7 +379,8 @@ inline Vector<RTYPE> EmpiricalSample(int size, bool replace, const Vector<RTYPE>
         return ans;
     }
 
-    int* x = reinterpret_cast<int*>(R_alloc(n, sizeof(int)));
+    //int* x = reinterpret_cast<int*>(R_alloc(n, sizeof(int)));
+    IntegerVector x(n);
     for (int i = 0; i < n; i++) {
         x[i] = i;
     }

@kkrismer
Copy link
Author

Already did. This resolves the issue. Thanks! Should I create a PR?

@eddelbuettel
Copy link
Member

No, I got as it as I have a branch up.

Any idea why that goes belly up? R_alloc() should return the temp memory. Very strange.

Anyway, good catch! Thanks for that.

@kevinushey
Copy link
Contributor

The memory is also cleared up for me if I just manually call gc() after running the example ... so perhaps R is just holding onto the memory but not explicitly freeing it immediately?

@kevinushey
Copy link
Contributor

Still, makes sense to be deterministic here and just use an IntegerVector (preferably with no_init() to avoid the cost of zero-initialization here since we don't need it)

@eddelbuettel
Copy link
Member

eddelbuettel commented Apr 25, 2020

I tried the usual for (i in 1:N) {gc(); Sys.sleep(0.1)} for various n and htop showed it as R holding on. Very weird. (Machine moderately idle, no other demand for memory at the time.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants