-
Notifications
You must be signed in to change notification settings - Fork 10
/
z-advanced-topic-running-r-or-shell-code-in-nix-from-r.Rmd
515 lines (381 loc) · 16.6 KB
/
z-advanced-topic-running-r-or-shell-code-in-nix-from-r.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
---
title: "z - Advanced topic: Running R or Shell Code in Nix from R"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{z-advanced-topic-running-r-or-shell-code-in-nix-from-r}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
```{r, include=FALSE}
library(rix)
```
## **Testing code in evolving software dependency environments with confidence**
Adhering to sound versioning practices is crucial for ensuring the
reproducibility of software. Despite the expertise in software engineering, the
ever-growing complexity and continuous development of new, potentially
disruptive features present significant challenges in maintaining code
functionality over time. This pertains not only to backward compatibility but
also to future-proofing. When code handles critical production loads and relies
on numerous external software libraries, it's likely that these dependencies
will evolve. Infrastructure-as-code and other DevOps principles shine in
addressing these challenges. However, they may appear less approachable and more
labor-intensive to set up for the average R developer.
Are you ready to test your custom R functions and system commands in a a
different environment with isolated software builds that are both pure at build
and at runtime, without leaving the R console?
Let's introduce `with_nix()`. `with_nix()` will evaluate custom R code or shell
commands with command line interfaces provided by Nixpkgs in a Nix environment,
and thereby bring the read-eval-print-loop feeling. Not only can you evaluate
custom R functions or shell commands in Nix environments, but you can also bring
the results back to your current R session as R objects.
## **Two operational modes of computations in environments: 'System-to-Nix' and 'Nix-to-Nix'**
We aim to accommodate various use cases, considering a gradient of declarativity
in individual or sets of software environments based on personal preferences.
There are two main modes for defining and comparing code running through R and
system commands (command line interfaces; CLIs)
1. **'System-to-Nix'** environments: We assume that you launch an R session
with an R version defined on your host operating system, either from the
terminal or an integrated development environment like RStudio. You need to
make sure that you actively control and know where you installed R and R
packages from, and at what versions. You may have interactively tested that
your custom function pipeline worked for the current setup. Most
importantly, you want to check whether you get your computations running and
achieve identical results when going back to a Nix revision that represent
either newer or also older versions of R and package sources.
2. **'Nix-to-Nix'** environments: Your goals of testing code are the same as in
1., but you want more fine-grained control in the source environment where
you launch `with_nix()` from, too. You are probably on the way of getting a
passionate Nix user.
## **Case study 1: Evolution of base R**
Carefully curated software improves over time, so does R. We pick an example
from the R changelog, the following [literal entry in R
4.2.0](https://cran.r-project.org/doc/manuals/r-release/NEWS.html):
- "`as.vector()` gains a `data.frame` method which returns a simple named
list, also clearing a long standing 'FIXME' to enable
`as.vector(<data.frame>, mode ="list")`. This breaks code relying on
`as.vector(<data.frame>)` to return the unchanged data frame."
The goal is to illustrate this change in behavior from R versions 4.1.3 and
before to R versions 4.2.0 and later.
### Setting up the (R) software environment with Nix
We first create a isolated directory to prepare for a Nix environment, and write
a custom `.Rprofile` file as well. By default, the R derivation in Nixpkgs
includes the user library at first position (returned by `.libPaths()`). Startup
code written to this local `.Rprofile` will make sure that the system's user
library (`R_LIBS_USER`) is excluded from library paths to load packages from.
This is nice to install packages from a Nix-R session environment in ad-hoc and
interactive manner. However, this comes at the cost that one needs be aware of
potential run-time pollution of packages outside the pool of paths per package
from the nix store. On macOS, we experienced a high-chance of segmentation
faults when accidentally loading packages and linked system libraries from the
system's user library, to give an example. `rix::rix_init()` writes a configuration
that takes care of runtime-pure R package libraries from declaratively defined
Nix builds. Additionally, it modifies `.libPaths()` in the running R session.
```{r parsermd-chunk-1}
library("rix")
path_env_1 <- file.path(".", "_env_1_R-4-1-3")
rix_init(
project_path = path_env_1,
rprofile_action = "overwrite",
message_type = "simple"
)
list.files(path = path_env_1, all.files = TRUE)
```
This will generate the following `.Rprofile` file.
```{r parsermd-chunk-2, echo = FALSE}
cat(readLines(file.path(path_env_1, ".Rprofile")), sep = "\n")
```
Next, we write a `default.nix` file containing Nix expressions that pin R
version 4.2.0 from Nixpkgs.
```{r parsermd-chunk-3}
rix(
r_ver = "4.1.3",
overwrite = TRUE,
project_path = path_env_1
)
```
The following expression is written to default.nix in the subfolder
`./_env_1_R-4-1-3/`.
```{r parsermd-chunk-4, echo = FALSE}
cat(readLines(file.path(path_env_1, "default.nix")), sep = "\n")
```
### Defining and interactively testing custom R code with function(s)
We know have set up the configuration for R 4.1.3 set up in a `default.nix` file
in the folder `./_env_1_R-4-1-3`. Since you are sure you are using an R version
higher 4.2.0 available on your system, you can check what that
`as.vector.data.frame()` S3 method returns a list.
```{r parsermd-chunk-5}
df <- data.frame(a = 1:3, b = 4:6)
as.vector(x = df, mode ="list")
```
This is is different for R versions 4.1.3 and below, where you should get an
identical data frame back.
### Run functioned up code and investigate results produced in pure Nix Rsoftware environments
To formally validate in a 'System-to-Nix' approach that the object returned from
`as.vector.data.frame()` is before `R` \< 4.2.0, we define a function that runs
the computation above.
```{r parsermd-chunk-6}
df_as_vector <- function(x) {
out <- as.vector(x = x, mode = "list")
return(out)
}
(out_system_1 <- df_as_vector(x = df))
```
Then, we will evaluate this test code through a `nix-shell` R session. This adds
both build-time and run-time purity with the declarative Nix software
configuration we have made earlier. `with_nix()` leverages the following
principles under the hood:
1. **Computing on the Language:** Manipulating language objects using code.
2. **Static Code Analysis:** Detecting global objects and package environments
in the function call stack of 'expr'. This involves utilizing essential
functionality from the 'codetools' package, which is recursively iterated.
3. **Serialization of Dependent R objects:** Saving them to disk and
deserializing them back into the R session's RAM via a temporary folder.
This process establishes isolation between two distinct computational
environments, accommodating both 'System-to-Nix' and 'Nix-to-Nix'
computational modes. Simultaneously, it facilitates the transfer of input
arguments, dependencies across the call stack, and outputs of `expr` between
the Nix-R and the system's R sessions.
This approach guarantees reproducible side effects and effectively streams
messages and errors into the R session. Thereby, the {sys} package facilitates
capturing standard outputs and errors as text output messages. Please be aware
that `with_nix()` will invoke `nix-shell`, which will itself run `nix-build` in
case the Nix derivation (package) for R version 4.1.3 is not yet in your Nix
store. This will take a bit of time to get the cache. When you use the
`exec_mode == "non-blocking"` argument of `with_nix()`, you will see in your
current R console the specific Nix paths that will be downloaded and copied into
your Nix store automatically.
```{r parsermd-chunk-7, eval = FALSE}
# now run it in `nix-shell`; `with_nix()` takes care
# of exporting global objects of `df_as_vector` recursively
out_nix_1 <- with_nix(
expr = function() df_as_vector(x = df), # wrap to avoid evaluation
program = "R",
exec_mode = "non-blocking", # run as background process
project_path = path_env_1,
message_type = "simple" # you can do `"verbose"`, too
)
# compare results of custom codebase with indentical
# inputs and different software environments
identical(out_system_1, out_nix_1)
# should return `FALSE` if your system's R versions in
# current interactive R session is R >= 4.2.0
```
### Syntax option for specifying function in `expr` argument of `with_nix()`
In the previous code snippet we wrapped the top-level `expr` function with
`function()` or `function(){}`. As an alternative, you can also provide default
arguments when assigning the function used as `expr` input like this:
```{r parsermd-chunk-8}
df_as_vector <- function(x = df) {
out <- as.vector(x = x, mode = "list")
return(out)
}
```
Then, you just supply the name of the function to evaluate with default
arguments.
```{r parsermd-chunk-9, eval = FALSE}
out_nix_1_b <- with_nix(
expr = df_as_vector, # provide name of function
program = "R",
exec_mode = "non-blocking", # run as background process
project_path = path_env_1,
message_type = "simple" # you can do `"verbose"`, too
)
```
It yields the same results.
```{r parsermd-chunk-10, eval = FALSE}
Reduce(f = identical, list(out_nix_1, out_nix_1_b))
```
### Comparing `as.vector.data.frame()` for both R versions 4.1.3 and 4.2.0 from Nixpkgs
Here follows an example a `Nix-to-Nix` solution, with two subshells to track the
evolution of base R in this specific case. We can verify the breaking changes in
case study 1 in more declarative manner when we use both R 4.1.3 and R 4.2.0
from Nixpkgs. Since we already have defined R 4.1.3 in the *`env`*`_1_R-4-1-3`
subshell, we can use it as a source environment where with_nix() is launched
from. Accordingly, we define the R 4.2.0 environment in a
*`env`*`_1_2_R-4-2-0`using Nix via `rix::rix()`. The latter environment will be
the target environment where `df_as_vector()` will be evaluated in.
```{r parsermd-chunk-11}
library("rix")
path_env_1_2 <- file.path(".", "_env_1_2_R-4-2-0")
rix_init(
project_path = path_env_1_2,
rprofile_action = "overwrite",
message_type = "simple"
)
rix(
r_ver = "4.2.0",
overwrite = TRUE,
project_path = path_env_1_2,
shell_hook = "R"
)
list.files(path_env_1_2)
```
Now, initiate a new R session as development environment using `nix-shell`. Open
a new terminal at the current working directory of your R session. The provided
expression `default.nix`. defines R 4.1.3 in a "subfolder per subshell"
approach. `nix-shell` will use the expression by `default.nix` and prefer it
over any other `.nix` files, except when you put a `shell.nix` file in that
folder, which takes precedence.
```{sh parsermd-chunk-12, eval = FALSE}
nix-shell --pure ./_env_1_R-4-1-3
```
After some time downloading caches and doing builds, you will enter an R console
session with R 4.1.3. You did not need to type in R first, because we set up a R
shell hook via `rix::rix()`. Next, we define again the target function to test
in R 4.2.0, too.
```{r parsermd-chunk-13, eval = FALSE}
# current Nix-R session with R 4.1.3
df_as_vector <- function(x) {
out <- as.vector(x = x, mode = "list")
return(out)
}
(out_nix_1 <- df_as_vector(x = df))
```
```{r parsermd-chunk-14, eval = FALSE}
out_nix_1_2 <- with_nix(
expr = function() df_as_vector(x = df),
program = "R",
exec_mode = "non-blocking", # run as background process
project_path = path_env_1_2,
message_type = "simple" # you can do `"verbose"`, too
)
```
You can now formally compare the outputs of the computation of the same code in
R 4.1.3 vs. R 4.2.0 environments controlled by Nix.
```{r parsermd-chunk-15, eval = FALSE}
identical(out_nix_1, out_nix_1_2)
# yields FALSE
```
## **Case study 2: Breaking changes in {stringr} 1.5.0**
We add one more layer to the reproducibility of the R ecosystem. User libraries
from CRAN or GitHub, one thing that makes R shine is the huge collection of
software packages available from the community.
There was a change introduce in {stringr} 1.5.0; in earlier versions, this
line of code:
```{r parsermd-chunk-16, eval = FALSE}
stringr::str_subset(c("", "a"), "")
```
would return the character `"a"`. However, this behaviour is unexpected:
it really should return an error. This was addressed in versions after
1.5.0:
```{r parsermd-chunk-17, eval = FALSE}
out_system_stringr <- tryCatch(
expr = {stringr::str_subset(c("", "a"), "")},
error = function(e)NULL)
```
Since the code returns an error, we wrap it inside `tryCatch()` and return
`NULL` instead of an error (if we wouldn't do that, this vignette could not
compile!).
Let's build a subshell with the latest version of R, but an older version
of `{stringr}`:
```{r parsermd-chunk-18}
library("rix")
path_env_stringr <- file.path(".", "_env_stringr_1.4.1")
rix_init(
project_path = path_env_stringr,
rprofile_action = "overwrite",
message_type = "simple"
)
list.files(path = path_env_stringr, all.files = TRUE)
```
The call below generates a `default.nix` file for the subshell with the
old version of `{stringr}`:
```{r parsermd-chunk-19, eval = FALSE}
rix(
r_ver = "latest",
r_pkgs = "stringr@1.4.1",
overwrite = TRUE,
project_path = path_env_stringr
)
```
We can now run the code in the subshell
```{r parsermd-chunk-20, eval = FALSE}
out_nix_stringr <- with_nix(
expr = function() stringr::str_subset(c("", "a"), ""),
program = "R",
exec_mode = "non-blocking",
project_path = path_env_stringr,
message_type = "simple"
)
```
Here are the last few lines printed on screen:
```
==> `expr` succeeded!
### Finished code evaluation in `nix-shell` ###
* Evaluating `expr` in `nix-shell` returns:
[1] "a"
```
Not only do we see the result of evaluating the code in the subshell, we also
have access to it: `out_nix_stringr` holds this result.
We can now compare the two: the result of the code running in our main session
with the latest version of `{stringr}` and the result of the code running in the
subshell with the old version of `{stringr}`:
```{r parsermd-chunk-21, eval = FALSE}
identical(out_system_stringr, out_nix_stringr)
```
As expected, the result is `FALSE`.
## **Case study 3: Using a subshell to get hard to install dependencies**
Nix subshells are quite useful in cases where you need to use a package that
might be difficult to install, such as `{arrow}`, or other packages that must be
compiled. Depending on your operating system you need to compile `{arrow}` from
source, which can be a frustrating experience, especially if you only need it to
load data and bring it down to a manageable size (using `select()` and
`filter()` for instance). This use cases illustrates how to achieve this.
Let's start by building a subshell that is based on a distinct revision of
nixpkgs, for which we know that arrow compiles on both linux and macOS (darwin).
```{r parsermd-chunk-22, eval = FALSE}
library("rix")
path_env_arrow <- file.path("env_arrow")
rix_init(
project_path = path_env_arrow,
rprofile_action = "overwrite",
message_type = "simple"
)
rix(
r_ver = "af63e7a15daf283b4ce634006b3767f9c0eb0c58",
r_pkgs = c("dplyr", "arrow"),
overwrite = TRUE,
project_path = path_env_arrow
)
```
This specific revision of R contains `{arrow}` 13. Let's now
suppose that you already have a script with some code to
load and transform some data using `{arrow}`. It may look
something like this:
```{r parsermd-chunk-23, eval = FALSE}
library(arrow)
library(dplyr)
arrow_cars <- arrow_table(cars)
arrow_cars %>%
filter(speed > 10) %>%
as.data.frame()
```
To run this code in a subshell, we recommend wrapping it inside a function:
```{r parsermd-chunk-24, eval = FALSE}
arrow_script <- function() {
library(arrow)
library(dplyr)
arrow_cars <- arrow_table(cars)
arrow_cars %>%
filter(speed > 10) %>%
as.data.frame()
}
```
Which we can then run in the subshell:
```{r parsermd-chunk-25, eval = FALSE}
out_nix_arrow <- with_nix(
expr = function() arrow_script(),
program = "R",
exec_mode = "non-blocking",
project_path = path_env_arrow,
message_type = "simple"
)
```
This will run the function in the subshell, and its output will be saved in the
`out_nix_arrow` variable, for further manipulation in your main shell/session.