```{r, include = FALSE}
source("common.R")
```
# (PART) Techniques {-}
\stepcounter{chapter}
# Measuring performance
<!-- 23 -->
\stepcounter{section}
## Profiling
<!-- 23.2 -->
__[Q1]{.Q}__: Profile the following function with `torture = TRUE`. What is surprising? Read the source code of `rm()` to figure out what's going on.
```{r}
f <- function(n = 1e5) {
  x <- rep(1, n)
  rm(x)
}
```
__[A]{.solved}__: We expect `f()` to create a vector (`x`) of length `n`, which is then removed, so that `f()` just returns `NULL`. When we profile this function as is, it executes too quickly for the sampling profiler to produce meaningful results.
```{r, error = TRUE}
profvis::profvis(f())
```
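As a side note, when a function finishes too quickly to be picked up by the sampling profiler, a common workaround is to repeat the call until it becomes visible, for example (a sketch, not evaluated here):
```{r, eval = FALSE}
# Repeat f() so that the sampling profiler records enough samples.
profvis::profvis({
  for (i in 1:1000) f()
})
```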
Setting `torture = TRUE` triggers garbage collection after every memory allocation call, which may be useful for more exact memory profiling.
```{r, eval = FALSE}
profvis::profvis(f(), torture = TRUE)
```
Surprisingly, profiling `f()` like this takes a very long time. What could be the reason?
We follow the hint in the question and inspect the source code of `rm()`:
```{r, eval = FALSE}
function (..., list = character(), pos = -1,
          envir = as.environment(pos),
          inherits = FALSE)
{
  dots <- match.call(expand.dots = FALSE)$...
  if (
    length(dots) && !all(
      vapply(dots, function(x) is.symbol(x) ||
               is.character(x), NA, USE.NAMES = FALSE)
    )
  )
    stop("... must contain names or character strings")
  names <- vapply(dots, as.character, "")
  if (length(names) == 0L)
    names <- character()
  list <- .Primitive("c")(list, names)
  .Internal(remove(list, envir, inherits))
}
```
`rm()` does a surprising amount of work to get the name of the object to delete because it relies on non-standard evaluation.
We can make the job of `rm()` considerably simpler by using the `list` argument:
```{r, eval = FALSE}
f2 <- function(n = 1e5) {
  x <- rep(1, n)
  rm(list = "x")
}
profvis::profvis(f2(), torture = TRUE)
```
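As an aside, the cost of the non-standard-evaluation path can be measured directly. The following sketch compares the two calling styles with `bench::mark()`; each expression recreates `x` first so that `rm()` always has something to remove:
```{r}
bench::mark(
  nse  = {x <- 1; rm(x)},
  list = {x <- 1; rm(list = "x")}
)
```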
Unfortunately, profiling `f2()` with `torture = TRUE` still takes far too long, and we remain stuck in the profiling step.
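To see why this is so slow, note that with GC torture a full garbage collection runs at every single allocation, so even trivially small allocations become expensive. The same effect can be reproduced without `profvis` via base R's `gctorture()` (presumably what `torture = TRUE` builds on; a sketch, not evaluated here because it can run for a very long time):
```{r, eval = FALSE}
gctorture(TRUE)   # force a full garbage collection at every allocation
system.time({
  x <- rep(1, 1e5)
  rm(list = "x")
})
gctorture(FALSE)  # back to normal
```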
Anecdotally, one of the authors once managed to complete the profiling under an older version of R, but the output did not seem very meaningful.
In conclusion, this question appears to be unanswerable for us, even for Hadley.
## Microbenchmarking
<!-- 23.3 -->
__[Q1]{.Q}__: Instead of using `bench::mark()`, you could use the built-in function `system.time()`. But `system.time()` is much less precise, so you'll need to repeat each operation many times with a loop, and then divide to find the average time of each operation, as in the code below.
```{r, eval = FALSE}
n <- 1e6
system.time(for (i in 1:n) sqrt(x)) / n
system.time(for (i in 1:n) x ^ 0.5) / n
```
How do the estimates from `system.time()` compare to those from `bench::mark()`? Why are they different?
__[A]{.solved}__: We first microbenchmark these two expressions using `bench::mark()` [@bench] and observe that the mean is not reported (as it is generally more affected by outliers).
```{r}
n <- 1e6
x <- runif(100)
bench_df <- bench::mark(
  sqrt(x),
  x ^ 0.5,
  iterations = n
)
bench_df
```
We need to access the raw data so that we can compare the results of both benchmarking approaches.
```{r}
t1_bench <- mean(unlist(bench_df[1, "time"]))
t2_bench <- mean(unlist(bench_df[2, "time"]))
t1_systime <- system.time(for (i in 1:n) sqrt(x)) / n
t2_systime <- system.time(for (i in 1:n) x ^ 0.5) / n
```
We see that both approaches get the order of magnitude right. We assume that the `bench::mark()` results may be a little more accurate, because of its high-precision timer. There may also be overhead introduced by the `for` loop in the `system.time()` approach.
```{r}
# Compare the results
t1_systime["elapsed"]
t1_bench
t2_systime["elapsed"]
t2_bench
```
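To get a rough idea of how much of the `system.time()` estimates is pure loop overhead, we can time a loop with an empty body (a quick sketch):
```{r}
# Per-iteration cost of the bare for loop itself.
t0_systime <- system.time(for (i in 1:n) NULL) / n
t0_systime["elapsed"]
```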
Side note: take a look at `?proc.time` if you want to learn about the differences between "user", "system", and "elapsed" time.
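For example, sleeping consumes wall-clock time but almost no CPU time, so "elapsed" is close to one second while "user" and "system" stay near zero:
```{r}
system.time(Sys.sleep(1))
```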
__[Q2]{.Q}__: Here are two other ways to compute the square root of a vector. Which do you think will be fastest? Which will be slowest? Use microbenchmarking to test your answers.
```{r, eval = FALSE}
x ^ (1 / 2)
exp(log(x) / 2)
```
__[A]{.solved}__: To compare these approaches, we'll `bench::mark()` them and sort the result by the median execution time.
```{r}
x <- runif(100)
bm <- bench::mark(
  sqrt(x),
  x ^ 0.5,
  x ^ (1 / 2),
  exp(log(x) / 2)
)
bm[order(bm$median), ]
```
As one might expect, the idiomatic primitive function `sqrt()` is the fastest. The approach `exp(log(x) / 2)`, which builds on two other primitive functions, comes second, though it is already considerably slower. The remaining calculations are slower still: `x ^ 0.5` is faster than `x ^ (1 / 2)`, because `0.5` requires less computation than `(1 / 2)`.
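As a quick sanity check (a sketch), most of the gap between `x ^ 0.5` and `x ^ (1 / 2)` should disappear once the exponent is computed ahead of time:
```{r}
p <- 1 / 2  # the division now happens once, not on every evaluation
bm2 <- bench::mark(
  x ^ 0.5,
  x ^ (1 / 2),
  x ^ p
)
bm2[order(bm2$median), ]
```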