Skip to content

Provide R < 4.6 entry points for older versions of data.table in atime tests#7731

Open
aitap wants to merge 12 commits intomasterfrom
atime-backports
Open

Provide R < 4.6 entry points for older versions of data.table in atime tests#7731
aitap wants to merge 12 commits intomasterfrom
atime-backports

Conversation

@aitap
Copy link
Copy Markdown
Member

@aitap aitap commented Apr 29, 2026

Fixes: #7693

With these changes, atime::atime_pkg() passes for me on R-4.6.0 and R-devel.

aitap added 11 commits April 5, 2026 16:59
At least the following entry points were used by older data.table
versions and need to be available to reproduce benchmarks:

    void SETLENGTH(SEXP x, R_xlen_t n);
    R_xlen_t TRUELENGTH(SEXP x);
    void SET_TRUELENGTH(SEXP x, R_xlen_t n);
    void SET_GROWABLE_BIT(SEXP);
    int LEVELS(SEXP);
    int NAMED(SEXP);
    #define isFrame(x) isDataFrame(x)
    #define GetOption(x, none) GetOption1(x)
Since there's no more USE_RINTERNALS, STRING_PTR_RO is no longer a macro.
Older versions of data.table access elements in [LENGTH(x),
max(LENGTH(x), TRUELENGTH(x))), which is not a read overflow but
forbidden nowadays. Skip the length checks.
Since we USE_RINTERNALS, we get old macros for MAYBE_SHARED(),
NO_REFERENCES() from Rinternals.h that rely on REFCNT() existing.
@github-actions
Copy link
Copy Markdown

  • HEAD=atime-backports slower P<0.001 for memrecycle regression fixed in #5463
    Comparison Plot

Generated via commit a3d2921

Download link for the artifact containing the test results: ↓ atime-results.zip

Task Duration
R setup and installing dependencies 6 minutes and 32 seconds
Installing different package versions 11 minutes and 49 seconds
Running and plotting the test cases 5 minutes and 23 seconds

@tdhock
Copy link
Copy Markdown
Member

tdhock commented Apr 29, 2026

Wow Ivan!
This looks great!!
I’m impressed!!
Thanks very much.

Copy link
Copy Markdown
Member

@tdhock tdhock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing before merging, can you please add a comment to explain how you landed on this result? (I see several commits in this branch, suggesting that some trials did not work. Maybe explain what did not work and what you learned from that? Or if it is too complex for a comment, provide a link to a blog etc? Or do you think the explanation not necessary?)

@aitap
Copy link
Copy Markdown
Member Author

aitap commented Apr 29, 2026

The process was mostly try to run tests, fix an error (following the approach of #6640), repeat. I had to backtrack once after I defined VECTOR_PTR(x) as SEXPPTR(x) and an earlier version of data.table failed to compile because it didn't have SEXPPTR yet, but I overwrote that commit before pushing.

Compilation or linking errors were easiest to fix. For example, SETLENGTH does not exist, data.table.so compiles with warnings and links but fails to load. Solution: copy enough definitions from Defn.h to implement the function. Another example: VECTOR_PTR does not exist. The code uses VECTOR_PTR(foo)[bar]. The implicit declaration provided by the C compiler is int VECTOR_PTR(SEXP), and dereferencing an int like an array/pointer doesn't make sense, which results in a compiler error. Solution: provide a VECTOR_PTR.

An interesting workaround is the -DSTRING_PTR_RO=STRING_PTR_RO flag in Makevars. There used to be a test for R ≥ 3.5 in src/data.table.h: if STRING_PTR_RO is not defined as a macro, use the preprocessor to remap it to STRING_PTR. It worked well as long as our use of USE_RINTERNALS caused R to define some internal entry points as macros. Modern versions of R only offer STRING_PTR_RO as a function, not a macro; thus the code tries to rename STRING_PTR_RO to STRING_PTR, which no longer exists. Solution: satisfy the test for the macro without breaking anything else.

Some problems were only visible as compilation warnings. For example, int OBJECT(SEXP) still remains exported from R despite not being declared, so when data.table tries to call OBJECT(foo), the implicit function declaration provided by the C compiler is enough to produce working code. It is important to notice and fix them as well, because C uses the int type for the return value of undeclared functions, and an int foo(...) in place of SEXP foo(...) may cause very confusing crashes after the code successfully compiles and links.

Another interesting workaround is #define REFCNT(x) NAMED(x). An old version of data.table defines USE_RINTERNALS, and Rinternals.h in R-4.6.0 still contains an #ifdef USE_RINTERNALS branch that defines MAYBE_SHARED() as a macro that calls REFCNT(). REFCNT() is still exported, but no longer declared, so the code compiles with a warning but links and works. Since we already have NAMED() because some other data.table version demanded it, it's easy enough to silence this compilation warning as well.

An old version of data.table did not include Rversion.h, so I had to include it a second time.

@MichaelChirico
Copy link
Copy Markdown
Member

This is great! I wonder if other packages would benefit from something similar. Would adding this to {backports} or as a standalone package make sense?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

performance tests error with R ⩾4.6.0

3 participants