-
Notifications
You must be signed in to change notification settings - Fork 3
/
InformationForPackageAuthors.Rmd
388 lines (311 loc) · 17.9 KB
/
InformationForPackageAuthors.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
---
title: "Information for package authors"
author: "Konrad Krämer"
output: html_document
classoption: twocolumn
vignette: >
%\VignetteIndexEntry{Information for package authors}
%\VignetteEngine{knitr::rmarkdown}
\usepackage{UTF-8}
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, eval = FALSE)
```
<style>
body {
text-align: justify}
</style>
```{css echo=FALSE}
/* Define a margin before h2 element */
h2 {
margin-top: 6em;
}
/* Define a margin after every first p elements */
p:first-of-type {
margin-bottom: 3em;
}
```
- [Guidance for Package Authors](#guidance-for-package-authors)
- [Naming rationalization ast2ast](#naming-rationalization-ast2ast)
- [Comparison of R and ETR code](#comparison-of-r-and-etr-code)
- [Internals of ETR](#internals-of-etr)
- [How to use the XPtr interface](#How-to-use-the-xptr-interface)
## Guidance for Package Authors
This section of the documentation describes how the external pointers to C++ function that are produced by *ast2ast* can be used in other packages. This information is intended for package authors who want to use the translated functions within their own code. Additionally, this section endeavors to present pertinent implementation details, providing valuable insights into the inner workings of the process.
## Naming rationalization ast2ast
The nomenclature of the package, ast2ast, is rooted in the abbreviation *ast*, signifying abstract syntax tree. Originally I planned to convert the abstract syntax tree of R to the C++ tree as the literature recommended for transpilers. However, through iterative refinement, a more optimal methodology emerged. The development incorporated an Expression Template Library in C++ known as ETR, meticulously crafted to mimic R. Thus, R code is translated into ETR code, which is subsequently compiled. The original ETR library is accessible at https://github.com/Konrad1991/ETR. It's imperative to note that the version integrated into ast2ast has undergone substantial enhancements, amplifying its efficacy and adaptability.
## Comparison of R and ETR code
Displayed below is a basic bubble sort function implemented in R on the left, juxtaposed with its ETR counterpart on the right. It is obvious that the two code snippets are quite similar. Remarkably, the overall structure of the R code remains unaltered. Instead, the substitution of individual functions with their ETR equivalents forms the crux of the transformation. \
In the C++ code, all functions are located in the *etr* namespace. Certain functions share identical names in both R and C++, such as the *length* function. To mitigate potential conflicts, these calls are modified to explicitly reference the etr namespace, resulting in expressions like *etr::length*. Other functions as for example *`:`* and *`[`* cannot be defined in C++ (at least not in the way they are used in R) therefore they are replaced by functions with new names e.g. *etr::colon* and *etr::subset*. \
Additionally, C++ necessitates explicit declaration of variable types. Within this example for all variables the type *sexp* is used. A detailed explanation is deferred to the subsequent section.
:::::::::::::: {.columns}
::: {.column width="35%"}
\
\
\
\
\
\
\
```{r, eval = FALSE, echo = TRUE, attr.source='.numberLines'}
bubbleSort <- function(a) {
size <- length(a)
for (i in 1:size) {
for (j in 1:(size - 1)) {
if (a[j] > a[j + 1]) {
temp <- a[j]
a[j] <- a[j + 1]
a[j + 1] <- temp
}
}
}
return(a)
}
```
:::
::: {.column width="65%"}
```{Rcpp, eval = FALSE, echo = TRUE, attr.source='.numberLines'}
// [[Rcpp::depends(ast2ast)]]
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::plugins(cpp2a)]]
#include "etr.hpp"
// [[Rcpp::export]]
SEXP getXPtr();
sexp bubbleSort(sexp a) {
sexp size;
sexp temp;
size = etr::length(a);
for (auto&i: etr::colon(etr::i2d(1), size)) {;
for (auto&j: etr::colon(etr::i2d(1), (size - etr::i2d(1)))) {;
if (etr::subset(a, j) > etr::subset(a, j + etr::i2d(1))) {;
temp = etr::subset(a, j);
etr::subset(a, j) = etr::subset(a, j + etr::i2d(1));
etr::subset(a, j + etr::i2d(1)) = temp;
};
};
};
return(a);
}
SEXP getXPtr () {
typedef sexp (*fct_ptr) (sexp a);
return Rcpp::XPtr<fct_ptr>(new fct_ptr(&bubbleSort));
}
```
:::
::::::::::::::
## Internals of ETR
### Fundamental Type in Expression Template Library R (ETR): Vec Class
The core type in the expression template library R (ETR) is a class called *Vec*. Presuming a foundational familiarity with classes and templates in C++, we embark on a detailed exploration of the design inherent in this class. The Vec class incorporates three templates, namely *T*, *R*, and *Trait*. In this context, the typename *T* signifies a fundamental data type, while *R* represents another class (more details forthcoming). The third template, *Trait*, plays a crucial role in endowing the class with identifiable properties as Vec during the compile-time phase.
In the majority of instances, the template *T* is instantiated as a *double*, referred to as numeric in the context of R types. On certain occasions, *T* is a *bool*, denoted as *logical* in the realm of R types. The typename *R* represents another class this can be either: Buffer, BorrowSEXP, Borrow, Subset, UnaryOperation or BinaryOperation. Each of these classes contributes distinct functionalities and features to the *Vec* class. It is recommended to directly use only the classes Buffer, BorrowSEXP and Borrow. The Subset class is yielded by calls to the function subset. Whereas UnaryOperation or BinaryOperation are produced by invoking functions like *sin*, *cos*, *+* and *-*. \
Under the hood, regardless whether they represent dataframes, vectors, functions or any other R type, all variables are of type *SEXP* in R.
As an allusion to this fact the basic type used for most variables is called *sexp*. Where *sexp* is an alias for *Vec<etr::BaseType>* (where *etr::BaseType* itself is an alias for *double*). The *sexp* class uses an instance of class *Buffer<T>* which manages the memory of the variable. To do this a raw pointer of the form *T\** is used. Thus, all objects are located at the heap. \
As mentioned in other sections of the documentation an R function can be either translated to an R function which calls a C++ function or to an external pointer to a C++ function. If an R function is desired as output form the function expects that all arguments passed to it are of type *SEXP* and furthermore that the function always returns an element of *SEXP*. Due to this a class called *WrapperSEXP* was introduced which is equivalent to *Vec<etr::BaseType, etr::BorrowSEXP>*. Whenever, you want to use a *SEXP* object in C++ it is necessary to construct an object of type *WrapperSEXP* and assign the *SEXP* object to it (see the code snippet below). Notably, the *WrapperSEXP* class works with the original memory section as long as it is not increased above the size of the *SEXP* object. Thus, R objects can be changed as side effect, which differs from the normal R behaviour but increases the performance. Moreover, each object that is returned is converted into a *SEXP* object. This is done by the function *cpp2R*. If no return statement is found the command *return(R_NilValue);* is added to the code. Thereby, the function returns *NULL*.
```{Rcpp, eval = FALSE, echo = TRUE}
SEXP handleSEXP(SEXP aSEXP) {
WrapperSEXP a; a = aSEXP;
/* rest of code*/
return(etr::cpp2R(a));
}
```
### The XPtr interface
In contrast if the external pointer interface is used a broader range of argument types can be used. However, *SEXP* cannot be used. The user who translates the function can choose between: *sexp*, *double*, *ptr_vec*, *ptr_mat* and *BorrowPtr*. This arguments can be either passed by copy or by reference. This is handled by the boolean *reference* argument in R.
For ptr_vec or ptr_mat, two (for vectors) and three (for matrices) arguments are passed to the function. The first one is the pointer pointing to the start of the memory area. In case of *vec* the second argument is of type int defining the size of the memory section. If it is a matrix two *int* arguments are used defining the number of rows and columns respectivly. In case the values should be copied the elements are passed to a *sexp* object which copies the memory section to a new one which is than handled by the class. In contrast if the elements are passed by reference an object of type *BorrowPtr* is used. *BorrowPtr* is an alias for *Vec<etr::BaseType, etr::Borrow>*. The class *BorrowPtr* behaves in the same way as a *sexp* object except that it cannot be resized. Thus, it behaves as a wrapper around the raw pointer providing the convenience of the library. Moreover, the memory section is not deleted at the end of the function. Therefore, the user which passes the pointer and the size arguments has to handle the memory itself. The advantage is that something can be written in the memory section as side effect of the function which is very efficient.
#### Using raw pointers
```{Rcpp, eval = TRUE, echo = TRUE}
// [[Rcpp::depends(ast2ast)]]
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::plugins(cpp2a)]]
#include "etr.hpp"
sexp ptrVecCopy(etr::BaseType* a_double_ptr , int a_int_size) {
sexp a(a_double_ptr,a_int_size);
for (auto&i: etr::colon(etr::i2d(1), etr::length(a))) {
etr::subset(a, i) = i * etr::i2d(2);
}
return(a);
}
sexp ptrVecReference(etr::BaseType* a_double_ptr , int a_int_size) {
BorrowPtr a(a_double_ptr,a_int_size);
for (auto&i: etr::colon(etr::i2d(1), etr::length(a))) {
etr::subset(a, i) = i * etr::i2d(2);
}
return(a);
}
// [[Rcpp::export]]
void callFct () {
int size = 3;
double* ptr = new double[size];
for(int i = 0; i < size; i++) ptr[i] = 0.0;
ptrVecCopy(ptr, size);
for(int i = 0; i < size; i++) Rcpp::Rcout << ptr[i] << "\t";
Rcpp::Rcout << std::endl;
ptrVecReference(ptr, size);
for(int i = 0; i < size; i++) Rcpp::Rcout << ptr[i] << "\t";
delete[] ptr;
}
```
```{r, eval = TRUE, echo = TRUE}
callFct()
```
#### Using Borrow
Instead of passing the raw pointer directly it is also possible to pass it wrapped in a class BorrowPtr. As a side note it does not make a difference whether the argument is passed by reference or not. Furthermore, BorrowPtr cannot be returned from a function. Here is an example for this:
```{Rcpp, eval = TRUE, echo = TRUE}
// [[Rcpp::depends(ast2ast)]]
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::plugins(cpp2a)]]
#include "etr.hpp"
void otherFct(BorrowPtr a) {
subset(a, true) = 3.14;
}
// [[Rcpp::export]]
void callFct () {
int size = 3;
double* ptr = new double[size];
for(int i = 0; i < size; i++) ptr[i] = 0.0;
BorrowPtr a(ptr, size);
otherFct(a);
etr::print(a);
delete[] ptr;
}
```
```{r, eval = TRUE, echo = TRUE}
callFct()
```
#### Using sexp instead of pointers
Notably, the same thing can be achieved with a *sexp* object without using raw pointers. Here the code to illustrate it:
```{Rcpp, eval = TRUE, echo = TRUE, cache = TRUE}
// [[Rcpp::plugins(cpp2a)]]
// [[Rcpp::depends(RcppArmadillo, ast2ast)]]
#include "etr.hpp"
void byRef(sexp& a) {
for (auto&i: etr::colon(etr::i2d(1), etr::length(a))) {
etr::subset(a, i) = i * etr::i2d(2);
}
}
// [[Rcpp::export]]
void call_fct() {
sexp a = etr::vector_numeric(3);
etr::print(a);
byRef(a);
etr::print(a);
}
```
```{r, eval=TRUE, echo=TRUE}
call_fct()
```
### Thread safety
The class *Vec* is thread safe in the sense that no memory is associated with functions or global variables. Moreover, almost no static methods are defined. An exception are the methods used for comparison. Notably, these comparison methods (*==*, *<=*, *>=*, *<*, *>* and *!=*) except two doubles (by copy) as arguments and return a bool. Thus, it is still possible to use instances of the class in parallel. However, the user has to take care that only one thread edits an object at a time.
### Conversions
A *sexp* can be converted to Rcpp::NumericVectors, Rcpp::NumericMatrices, arma::vec or to a arma::mat.
### Rcpp Interface
In the last section, the usage of *ast2ast* was described. However, only *sexp* variables were defined. Which are most likely not used in your package. Therefore interfaces to common libraries are defined. First of all, *ast2ast* can communicate with Rcpp which alleviates working with the library substantially. The code below shows that it is possible to pass a *sexp* object to a variable of type *NumericVector* or *NumericMatrix* and *vice versa*. Here, the data is always copied.
```{Rcpp, eval = TRUE, cache = TRUE}
// [[Rcpp::depends(ast2ast, RcppArmadillo)]]
#include <RcppArmadillo.h>
#include <Rcpp.h>
// [[Rcpp::plugins(cpp2a)]]
using namespace Rcpp;
#include <etr.hpp>
using namespace etr;
// [[Rcpp::export]]
void fct() {
// NumericVector to sexp
NumericVector a{1, 2};
sexp a_ = a;
print(a_);
// sexp to NumericVector
sexp b_ = coca(3, 4);
NumericVector b = b_;
Rcpp::Rcout << b << std::endl;
// NumericMatrix to sexp
NumericMatrix c(3, 3);
sexp c_ = c;
print(c_);
// sexp to NumericMatrix
sexp d_ = matrix(colon(1, 16), 4, 4);
NumericMatrix d = d_;
Rcpp::Rcout << d << std::endl;
}
```
```{r, eval = TRUE, cache = TRUE}
trash <- fct()
```
### RcppArmadillo Interface
Besides Rcpp types, *sexp* objects can transfer data to *RcppArmadillo* objects and it is also possible to copy the data from *RcppArmadillo* types to *sexp* objects using the operator *=*. The code below shows that it is possible to pass a *sexp* object to a variable of type *vec* or *mat* and *vice versa*. Here the data is always copied.
```{Rcpp, eval = TRUE, cache = TRUE}
// [[Rcpp::depends(ast2ast, RcppArmadillo)]]
#include <RcppArmadillo.h>
#include <Rcpp.h>
// [[Rcpp::plugins(cpp2a)]]
using namespace arma;
#include <etr.hpp>
using namespace etr;
// [[Rcpp::export]]
void fct() {
// vec to sexp
arma::vec a(4, fill::value(30.0));
sexp a_ = a;
print(a_);
// sexp to vec
sexp b_ = coca(3, 4);
vec b = b_;
b.print();
// mat to sexp
mat c(3, 3, fill::value(31.0));
sexp c_ = c;
print(c_);
// sexp to mat
sexp d_ = matrix(colon(1, 16), 4, 4);
mat d = d_;
d.print();
}
```
```{r, eval = TRUE, cache = TRUE}
trash <- fct()
```
## How to use the XPtr interface
In this paragraph, a basic example demonstrates how to write the R code, translate it to an external pointer and call it from C++. Particular emphasis is placed on the C++ code. First of all, the R function is defined which accepts one argument called *a*, adds two to *a* and stores it into *b*. The variable *b* is returned at the end of the function. The R function called *f* is translated to an external pointer to the *C++* function.
```{r, eval = TRUE, cache = TRUE}
f <- function(a) {
b <- a + 2
return(b)
}
library(ast2ast)
f_cpp <- translate(f, output = "XPtr", types_of_args = "sexp", return_type = "sexp")
```
The C++ function depends on RcppArmadillo and ast2ast therefore the required macros and headers were included. Moreover, *ETR* requires std=c++20 therefore the corresponding plugin is added. The function *getXPtr* is defined which is the function available in R. Subsequently, the actual translated code is depicted. The function *f* returns a *sexp* and gets one argument of type *sexp* called *a*. The body of the function looks almost identical to the R function. Except that the variable *b* is defined in the first line of the body with the type *sexp*. The function *i2d* converts an integer to a double variable. This is necessary since C++ would identify the *2* as an integer which is not what the user wants in this case.
```{Rcpp, eval = FALSE}
// [[Rcpp::depends(ast2ast, RcppArmadillo)]]
// [[Rcpp::plugins(cpp2a)]]
#include <etr.hpp>
// [[Rcpp::export]]
SEXP getXPtr();
sexp f(sexp a) {
sexp b;
b = a + etr::i2d(2);
return(b);
}
SEXP getXPtr () {
typedef sexp (*fct_ptr) (sexp a);
return Rcpp::XPtr<fct_ptr>(new fct_ptr(&f));
}
```
Afterwards, the translated R function has to be used in C++ code. This would be your package code for example. First, the macros were defined for *RcppArmadillo* and *ast2ast*. Subsequently, the necessary header files were included. As already mentioned *ast2ast* requires std=c++20 thus the required plugin is included. To use the function, it is necessary to dereference the pointer. The result of the dereferenced pointer has to be stored in a function pointer. Later the function pointer can be used to call the translated function. Therefore, a function pointer called *fp* is defined. It is critical that the signature of the function pointer matches the one of the translated function. Perhaps it would be a good idea to check the R function or the produced C++ code before it is translated. After defining the function pointer, a function is defined which is called later by the user (called *call_package*). This function accepts the external pointer. Within the function body, a variable *f* is defined of type *fp* and *inp* is assigned to it. Next, a *sexp* object called *a* is defined which stores a vector of length 3 containing 1, 2 and 3. The function *coca* is equivalent to the *c* function R. Afterwards *a* is printed. Followed by the call of the function *f* and storing the result in *a*. The variable *a* is printed again to show that the values are changed according to the code defined in the R function.
```{Rcpp, eval = TRUE, cache = TRUE}
// [[Rcpp::depends(RcppArmadillo, ast2ast)]]
// [[Rcpp::plugins("cpp2a")]]
#include "etr.hpp"
typedef sexp (*FP)(sexp a);
// [[Rcpp::export]]
void call_package(Rcpp::XPtr<FP> inp) {
FP f = *inp;
sexp a = etr::coca(1, 2, 3);
etr::print(a);
a = f(a);
etr::print("a is now:");
etr::print(a);
}
```
The user can call now the package code and pass the R function to it. Thus, the user only has to install the compiler or Rtools depending on the operating system. But it is not necessary to write the function in Rcpp.
```{r, eval = TRUE, cache = TRUE}
call_package(f_cpp)
```