gc() race in data.table with R-devel #2882
Codecov Report
@@            Coverage Diff             @@
##           master    #2882      +/-   ##
==========================================
+ Coverage   93.47%   93.47%   +<.01%
==========================================
  Files          61       61
  Lines       12355    12356       +1
==========================================
+ Hits        11549    11550       +1
  Misses        806      806


Closes #2866
Closes #2767
The approach of this PR is to ensure ALTREP vectors are not allowed as columns in a data.table. We like and benefit from ALTREP very much in R code such as [.data.table, where sequence vectors are used a lot. But as columns in a data.table, ALTREPs are not so appropriate. Internals like := (assign by reference) are written on the basis that columns are already materialized (expanded).
setDT() now expands ALTREP columns. The reproducible examples used setDT() to create the test data.table because data.table() already expanded ALTREP columns (by happy accident) in CcopyNamedInList. That function now checks for ALTREP explicitly (as well as MAYBE_REFERENCED, as before) to be safe.
Luke said that ALTREPs may in future be more than just sequence vectors; e.g. distributed arrays that cannot be expanded. But in that case data.table will need code changes anyway to deal with such arrays. If and when that happens, the expansion will fail on such ALTREPs, which is reasonable, graceful behaviour; much better than a subsequent gc race at least.
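To make the "expand up front, fail gracefully" idea concrete, here is a minimal sketch in plain C. It is a toy model only: ToyVec, toy_is_altrep, toy_expand and toy_expand_columns are hypothetical names invented for illustration and are not part of R's C API or of data.table. A compact sequence is modelled as a vector with no payload yet, and the expandable flag stands in for a possible future ALTREP class that cannot be materialized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for an R integer vector. This is NOT R's real SEXP layout. */
typedef struct {
    int   *data;       /* materialized payload; NULL while still compact */
    int    start;      /* compact form represents start, start+1, ... */
    size_t n;          /* number of elements */
    bool   expandable; /* false models a hypothetical non-expandable ALTREP */
} ToyVec;

/* A vector counts as "ALTREP-like" while it has no materialized payload. */
bool toy_is_altrep(const ToyVec *v) {
    assert(v != NULL);
    return v->data == NULL;
}

/* Materialize one vector in place. Returns 0 on success, -1 on failure. */
int toy_expand(ToyVec *v) {
    assert(v != NULL);
    if (!toy_is_altrep(v)) return 0;   /* already a regular vector */
    if (!v->expandable) return -1;     /* fail cleanly, touch nothing */
    int *buf = malloc((v->n ? v->n : 1) * sizeof *buf);
    if (buf == NULL) return -1;
    for (size_t i = 0; i < v->n; i++) buf[i] = v->start + (int)i;
    v->data = buf;
    return 0;
}

/* Expand every column before any other code sees the table.
   Returns -1 if all columns are now regular vectors, otherwise the
   index of the first column that could not be expanded. */
int toy_expand_columns(ToyVec *cols, size_t ncol) {
    for (size_t j = 0; j < ncol; j++)
        if (toy_expand(&cols[j]) != 0) return (int)j;
    return -1;
}
```

In this model, code that runs after toy_expand_columns has reported success can rely on every column having a plain data pointer, which is the assumption the by-reference internals make. A column that cannot be expanded is reported at that one serial point instead of surfacing later as a race.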
The above is the long-term approach, with no plans to change it; i.e. data.table is unlikely ever to support ALTREP columns.
What is short term, though, is that this PR adds a check before every parallel region that no SEXP used inside the region is an ALTREP (and fails if one is). In R-devel, the only problem is with INTEGER(), REAL() etc. on ALTREP vectors; those functions are currently still thread-safe on regular vectors. This is a short-term solution in the interests of getting an update to CRAN, where data.table is intermittently in error state on R-devel due to the gc race. In future we will still take all API use out of parallel regions, as Luke suggested. That involves a new approach around the parallel regions, which will take time to work through.
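The guard described above can be sketched as follows. This is a self-contained toy in plain C, not data.table's actual code: ToyCol, toy_any_altrep and toy_column_sums are hypothetical names, and a NULL data pointer is merely an assumed stand-in for "this vector is still an ALTREP". The point it illustrates is the ordering: the check runs serially, before the parallel region starts, and the region itself only reads raw pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy column: a NULL payload models an unexpanded ALTREP vector. */
typedef struct {
    const int *data;
    size_t     n;
} ToyCol;

/* Serial scan: is any column still ALTREP-like? */
bool toy_any_altrep(const ToyCol *cols, size_t ncol) {
    assert(cols != NULL || ncol == 0);
    for (size_t j = 0; j < ncol; j++)
        if (cols[j].data == NULL) return true;
    return false;
}

/* Sum each column into out[j].
   Returns 0 on success, or -1 without touching out if any column is
   ALTREP-like, so the failure happens before any thread is started. */
int toy_column_sums(const ToyCol *cols, size_t ncol, long long *out) {
    if (toy_any_altrep(cols, ncol)) return -1;

    /* Only plain pointer reads happen inside the region. Built without
       OpenMP support, the pragma is ignored and the loop runs serially. */
    #pragma omp parallel for
    for (long j = 0; j < (long)ncol; j++) {
        long long s = 0;
        for (size_t i = 0; i < cols[j].n; i++) s += cols[j].data[i];
        out[j] = s;
    }
    return 0;
}
```

Refusing the whole operation up front keeps the failure deterministic: the caller sees an error on every run rather than an intermittent crash that depends on when the garbage collector happens to run.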
- setDT() expands ALTREPs (data.table() already did)
- Add checks before all parallel regions that no ALTREPs are present.
All files with parallel regions:
The following were moved to follow-up PR #2899: between.c, freadR.c, fwrite.c, fsort.c, reorder.c