-
Notifications
You must be signed in to change notification settings - Fork 3
/
slides.qmd
318 lines (201 loc) · 8.92 KB
/
slides.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
---
title: 'Debugging and Profiling'
author: 'Jonathan Chipman, Ph.D.'
format:
html:
toc: TRUE
toc-depth: 2
toc-location: left
highlight: pygments
font_adjustment: -1
css: styles.css
# code-fold: true
code-tools: true
smooth-scroll: true
embed-resources: true
---
# Introduction
## Readings
In the RStudio (aka Posit) menu bar, notice the `Debug` and `Profile` drop-down menus. These are quick starting points for finding code bugs/bottlenecks. Studying the below chapters will give insight into finding where and how to improve code efficiency.
Primary reading:
Norman Matloof's [Ch. 13.1-13.3.5: Debugging](https://diytranscriptomics.com/Reading/files/The%20Art%20of%20R%20Programming.pdf#page=311) and Hadley Wickham's [Ch. 23: Measuring Performance](https://adv-r.hadley.nz/perf-measure.html) and [Ch. 24: Improving Performance](https://adv-r.hadley.nz/perf-improve.html).
Supplemental reading:
Matloof [Ch. 14: Performance Enhancement: Speed and Memory](https://diytranscriptomics.com/Reading/files/The%20Art%20of%20R%20Programming.pdf#page=331) is slightly outdated by recommending `RProf` to measure performance (as compared to `profvis` used in RStudio) but has good content on speed and memory allocation.
## Warm-up problem
[Urn problem](https://diytranscriptomics.com/Reading/files/The%20Art%20of%20R%20Programming.pdf#page=331)
"Urn 1 contains ten blue marbles and eight yellow ones. In urn 2, the mixture is six blue and six yellow. We draw a marble at random from
urn 1, transfer it to urn 2, and then draw a marble at random from urn 2. What is the probability that that second marble is blue? This is easy to find analytically, but we’ll use simulation." Create a function that generates 100K replicates and returns the estimated probability.
<!-- How can you "modularize" design 1 and 2 of [lab 2](https://uofuepibio.github.io/PHS7045-advanced-programming/week-02-lab.html)? -->
<!-- * What are common tasks of design 1 and 2 that can be performed using the same user-defined function? -->
Post your solution [here](https://docs.google.com/document/d/1NDNEbi4HA074DaS6sta9UKbXouS8r0FCwMP6o9fRE3Y/edit?usp=sharing)
```{r, eval=FALSE}
library(bench)
bench::mark( <solutions>, relative==TRUE, check = FALSE)
```
## This week's lesson
### Key concepts
1. Debugging strategies
* Modular programming
* traceback
* `debug` functions
* set break points
2. Profiling code
* `system.time` and `bench::mark`
* `profvis::provis`
3. Improving efficiency
* [Ch. 24: Improving Performance](https://adv-r.hadley.nz/perf-improve.html)
* Use vectorization
* Beware of copy-on-change (example: continual `rbind`'ing)
* Use `data.table` as appropriate
* Use parallel computing
* Compile code in C (`rcpp`)
# Debugging
## Debug in a Modular, Top-Down Manner
How do you organize files on your computer?
A friend of mine stored _all_ documents in the downloads folder! When it came time to find a file, it was like finding a needle in a haystack (there also wasn't a uniform naming convention).
[Quoting from Matloff](https://diytranscriptomics.com/Reading/files/The%20Art%20of%20R%20Programming.pdf#page=312)
"Most good software developers agree that code should be written in a modular manner. Your first-level code should not be longer than, say, a dozen lines, with much of it consisting of function calls. And those functions should not be too lengthy and should call other functions if necessary. This makes the code easier to organize during the writing stage and easier for others to understand when it comes time for the code to be extended."
Question: What can be "modularized" in homework 1?
Top-down modular level:
1. Analysis / Summary
2. Design functions
3. Function to draw posterior distributions
With debugging start top-down. Assume lower-level functions are correct until finishing debugging / checking top-level modules.
An example from collaboration.
## Antibugging
Pro-actively stop a function (or provide a warning or message) if a condition is not met:
```{r, error=TRUE}
x <- 1
stopifnot(is.character(x))
```
Here, is an equivalent stop command using if-logic. `deparse(subsitute([var]))` prints the name of the object:
```{r, error=TRUE}
x <- 1
if(!is.character(x)){
stop(paste0(deparse(substitute(x)), " is not a character"))
}
```
```{r}
x <- 1
if(!is.character(x)){
warning(paste0(deparse(substitute(x)), " is not a character"))
}
```
## Example
An R package example of modular programming and anti-bugging: [seqSGPV](https://github.com/chipmanj/SeqSGPV).
## Interactive debugging
When an error occurs, use the `traceback` function to get a sense of where the error occurred.
Options to debug a function from top-to-bottom:
* `debug (undebug)`
* `debugonce`
This will interactively walk you through the function call with the following commands:
* `n`: next line of code
* `s`: step into the next function
* `f`: continue forward through the current `{}` code block (example, a loop or the rest of the function)
* `c`: continue the rest of the function to see if it'll complete as anticipated
* `Q`: quit interactive
* `where`: shows the trace stack (the possibly multiple layers of a function call)
* `any r command` may be run when in debugger mode
Some examples ...
```{r, eval=FALSE}
f <- function(a) g(a)
g <- function(b) h(b)
h <- function(c) i(c)
i <- function(d) {
if (!is.numeric(d)) {
stop("`d` must be numeric", call. = FALSE)
}
d + 10
}
# Traceback and debug using Rstudio 'Rerun with Debug'
f("a")
# Enter debugging mode for a function until deactivating debugging mode
debug(f)
f("a")
f("a")
undebug(f)
# Enter debug mode for a single call
debugonce(f)
f("a")
f("a")
```
`View` and `debug` are good ways to learn the underlying code to a function.
```{r,eval=FALSE}
View(lm)
debugonce(lm)
lm(1:10~1,method = "model.frame")
```
You can use the `browser` function to enter debugging mode partially through a function
```{r, eval=FALSE}
f <- function(x, remainder){
# Step 1, add 1 to even elements
x[x%%2] <- x[x%%2] + 1
# Step 2, see if each value in sequence is even or odd
out <- x %% remainder
if(length(unique(out))!= 2) browser()
# Step 2, for elements with 0 remainder, add 1
x[out == 0] <- x[out == 0] + 1
# Step 3, report number of unique elements
length(unique(x))
}
f(x=1:10,remainder=3)
```
The browser function can be helpful if you want to debug a loop mid-way through iterations.
```{r, eval=FALSE}
f <- function(){
for (i in 1:10^5){
print(i)
if(i == 100) browser()
}
}
f()
```
While embedding `browser` into code can be helpful, you must remember to remove the browser calls when done. Rstudio also provides a feature to set breakpoints by clicking on the line-number where you'd want to start debugging. See also [breakpoints](https://adv-r.hadley.nz/debugging.html#breakpoints).
Using your warm up problem, set breakpoints.
## Practice
Find a set of code you've written and know well to share with a classmate.
As the classmate,
* Do you have any feedback on how to make the code more readable? More efficient?
* Insert a bug(s) into the code
When you get your code back, can you find and fix the bug?
# Profiling
## system.time
For a quick profiling of code, use `system.time()`.
```{r}
x <- runif(10000000)
y <- runif(10000000)
t1 <- system.time(z <- x + y)
z <- numeric(10000000)
t2 <- system.time({
for (i in 1:length(x)) {
z[i] <- x[i] + y[i]
}
})
# Time duration
t1
t2
# The loop is longer by a factor of:
t2["elapsed"] / t1["elapsed"]
```
Why is `t2` `r round(t2["elapsed"] / t1["elapsed"],2)`? times longer?
* Seemingly hidden functions are `:` and `[` (subsetting) which add to the collective set of functions called (i.e. the "stack frame").
`system.time` is not as precise for quick calls and can require many iterations to get a meaningful timing (example: 10e6 in above example). `bench::mark` can provide more precise timings (see warm up problem).
* ns (nanoseconds) < $\mu$s (microseconds) < ms (milliseconds) < s (seconds) < m (minutes) < h (hours) < d (days) < w (weeks)
* `relative` option in `mark` does not report in raw time but in comparing different calls
* `check=FALSE` allows for running different code which may return different results (example: due to sampling)
## profvis
`profvis::profvis` is a good go-to for evaluating performance of multiple calls.
We'll work through the [tutorial](http://rstudio.github.io/profvis/) together.
Walk through profiling SeqSGPV.
Profile the code you earlier shared with your classmate.
## Improving efficiency
[Ch. 24: Improving Performance](https://adv-r.hadley.nz/perf-improve.html)
Parts to highlight:
* [introduction](https://adv-r.hadley.nz/perf-improve.html#introduction-23)
* [vectorize](https://adv-r.hadley.nz/perf-improve.html#vectorise)
* [do as little as possible](https://adv-r.hadley.nz/perf-improve.html#be-lazy)
* [avoid copies](https://adv-r.hadley.nz/perf-improve.html#avoid-copies)
# Session info
```{r}
devtools::session_info()
```