-
-
Notifications
You must be signed in to change notification settings - Fork 208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PreserveStorage doesn't clean up ref counters, leading to unnecessary object copies #1203
Comments
Thanks for taking the time to write all this down. As I hinted when you first raised this at StackOverflow, this is mostly an R-internals discussion you should have with the likes of Luke Tierney over on r-devel. Rcpp does not do R's memory management, and does not call Now, is it possible that we fail to incerment or decrement a usage counter? Quite possibly. Does it matter? I don't think so, but I could of course be wrong -- and would be happy to be convinced otherwise. Lastly, R's reference counting is both somewhat robust and under what you may call permanent rewrite. We have the basic framework for ALTREP, and Luke Tierney sometimes hints that, time permitting, the R internals may get another update / refactoring. All that is worth keeping in mind. |
Firstly, apologies if I've come across as inflammatory. That certainly wasn't my intention. I've just written out the facts of what I'm seeing. The way things stand, SET_TAG in Rcpp_PreciousPreserve increments ref counter on object. I can see that there's a case for picking up with Luke, because this is something that garbage collection could, and perhaps should, detect and resolve in the R internals. I'll do just that. And I can see why you could legitimately wash your hands of it and just say "i've removed the pairlist element, anything else is part of what gc should be doing". But I think in this case we could be a better citizen and ensure that the Release code does the cleanup by unsetting TAG. The reason this matters is the copy that gets created on modify after returning control to R. But it shouldn't strictly be necessary for me to do that to avoid the performance penalty. Thanks for engaging with me on this btw. I've been coding for an age, but never in the open source arena until recently. The time and effort you, and others, put in to this is awesome. And the fact that I can talk directly to you about this is something I'd never have imagined 25yrs ago. |
Howdy -- thanks for keeping this very constructive. It's been a while since I look at it but as I recall it (which may well be faulty) the refcounter is not exactly 'linear'. The net effect between values n and n+1 is non-exactly meaningful. I think the scheme changed a little in the last few years but what matters really is (again, my recollection, may be wrong) is values 'none', 'one', 'more'. I think what I would lean towards is trying to do something along the lines of your hypthesis (i.e. setting/unsetting the tag) with plain C(++) code (no Rcpp) and seeing if R 'picks it up as you expect' and releases object. If you then mirror that with Rcpp and we do something that prevents then we do have an issue. But until we actually show that objects go away on the R side I am just not so sure we have something here. And I with you: the scope of open source and what / and how we can do things these is so cool compared to where we where mid-nineties. Anyway. gotta run now... |
Refcounting in R has been updated more recently to be a genuine counter, moving away from the NAMED 0/1/more that it used to be. As I said previously, until that change happened I don't think it would have been possible to spot this, since it relies on the counter having true decrement support. What you suggest with pure C code is, I believe, precisely what I've done in the code sample above. I've verified that I am able to modify-in-place with the fix enabled, which can only happen if ref count is genuinely 1 as expected. Likewise, verified I get a copy with fix disabled which replicates current Rcpp behaviour. I trust that the R garbage collection is working correctly once we get the ref count to decrement to where it should be. If there's a clearer way for me to present the findings then I can do so. Just let me know what you need. |
Right. And some of our code may predate that so there may be room for improvement.
Yes. Still, there may be room for improvement.
I think I would feel more compelled if we got R to show, say, "see here 1000 calls under the old scheme and X mb remain allocated" and magically they would not be with tags set and whatnot. But the memory management inside R is a little more misterious in its ways... Anyway, I should find some time and study your proposal some more. |
This is good stuff, thanks for putting it together, @mb147. I agree that we should add a That said, there is no memory leak. The garbage collector knows better. You can quickly see that by creating a big object, passing it through your empty And that said, still Dirk's advice applies: avoid going back and forth. There are just so many places in which the ref count could be updated (legitimately or not). Even after implementing this fix, you are doomed to check and inspect everything you are doing, and you'll never be entirely sure. So it's best if you just go full C++ and then return to R when you are done. |
Thanks both for your input on this. I'm feeling like a doofus for not having done the obvious check of looping a large object. I've taken the liberty of renaming this issue to remove the reference to memory leak, which was disproved - so there's a better history of what this issue really was. |
Reverse dependency check is almost done so about to merge this -- this is making things better so many thanks again for very diligent work here! |
Brief background
Rcpp keeps a precious pairlist when PreserveStorage is used.
Releasing an element of that list leaves a ref behind, creating a later memory leak.
As of v4 (certainly) (but possibly as early as Nov 2019), R moved from the NAMED mechanism to ref counting.
This has exposed an issue that likely went undetected before then, because object copies were more common.
https://stat.ethz.ch/pipermail/r-devel/2019-November/078728.html
Relates to my recent question on SO: https://stackoverflow.com/questions/71417422/
Proposed fix
In Rcpp, barrier.cpp, Line 122, add:
SET_TAG(token, R_NilValue);
Use case description
Reproducible example
See below, which does the following:
System detail
R: 4.1.2
Windows 11
Rcpp: 1.0.8
Runnable code
Code output
The text was updated successfully, but these errors were encountered: