- Describe common sources of errors
- Fail gracefully with assertions
- Diagnose issues with informative error messages
If your code isn't going to work, you want to know as soon as possible. This is the opposite of what you want:
long_fun <- function(x) {
  # Do
  # something
  # that
  # takes
  # a
  # really
  # long
  # time
  long_running_subprocess()
  # Assume `x` is a number
  needs_a_number(x)
}
long_fun("definitely not a number")
If long_fun() started out by first checking if x was a number, then you could have saved all the time burned by long_running_subprocess(). In this example, the source of the error was x having the wrong type. A few common sources of bugs in R are:
- Wrong type. E.g., you used a numeric vector where the function expected a character vector
- Wrong size. E.g., a function needs two vectors of the same length, but you called it with one longer than the other
- Missing data. E.g., unexpected NULL or NA values
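What makes these bugs dangerous is that R often does not complain about them at all. The snippet below is a minimal sketch of each one slipping through silently; the variable names and values are made up for illustration.

```r
# Wrong type: comparing a number to a string silently compares them as text.
threshold <- "9"  # hypothetical value that was read in as character
type_result <- 10 < threshold
type_result
# TRUE, because "10" sorts before "9" as text

# Wrong size: R recycles the shorter vector without complaint when the
# longer length is a multiple of the shorter one.
size_result <- 1:6 + 1:2
size_result
# 2 4 4 6 6 8

# Missing data: a single NA propagates through the calculation.
na_result <- mean(c(1, 2, NA))
na_result
# NA

# Missing data: NULL has length zero, so summing it gives 0, not an error.
null_result <- sum(NULL)
null_result
# 0
```

None of these lines raises an error, so the mistake only shows up later, far from where it was made.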
Q1 What are some other sources of bugs you've encountered?
Software engineers call these error checks assertions. For example, what if long_fun() looked like this?
long_fun <- function(x) {
  stopifnot(is.numeric(x))
  # Do
  # something
  # that
  # takes
  # a
  # really
  # long
  # time
  long_running_subprocess()
  # Assume `x` is a number
  needs_a_number(x)
}
long_fun("definitely not a number")
Q2 Check the help for stopifnot(). What would happen in this revised version?
Assertions are a cornerstone of defensive programming. Each one states a condition that must hold before the code continues. E.g., stopifnot(is.numeric(x)) asserts x must be numeric. Assertions can prevent long-running code that's doomed to fail (as above). They also serve as an extra layer of documentation, by specifically stating the function's assumptions. Perhaps most importantly, assertions can provide context and clues for diagnosing where things went sideways. Using the unrevised long_fun() example again, the error with long_fun("definitely not a number") happens inside needs_a_number(x). But let's say needs_a_number() looks like this:
needs_a_number <- function(y) {
  1 - y
}
Notice needs_a_number() takes an argument called y, not x. So the error you get from long_fun("definitely not a number") would be Error in 1 - y : non-numeric argument to binary operator. To which the reasonable reply is "what the heck is y? I called a function that takes x". Conversely, the revised version uses stopifnot() to assert x must be numeric, so the error would be Error in long_fun("definitely not a number") : is.numeric(x) is not TRUE. That's way more helpful! But we can do even better.
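To see the two messages side by side, here is a minimal sketch that captures each error with tryCatch() instead of halting. The names long_fun_unchecked(), long_fun_checked(), and error_message() are hypothetical, and long_running_subprocess() is replaced by a do-nothing stand-in because the real one is not shown in this lesson.

```r
# Stand-in for the slow step (assumption: the real function is not shown).
long_running_subprocess <- function() invisible(NULL)

needs_a_number <- function(y) {
  1 - y
}

# Original version: no assertion.
long_fun_unchecked <- function(x) {
  long_running_subprocess()
  needs_a_number(x)
}

# Revised version: assert the assumption up front.
long_fun_checked <- function(x) {
  stopifnot(is.numeric(x))
  long_running_subprocess()
  needs_a_number(x)
}

# Return the error message as a string instead of stopping the script.
error_message <- function(expr) {
  tryCatch(expr, error = function(e) conditionMessage(e))
}

msg_unchecked <- error_message(long_fun_unchecked("definitely not a number"))
msg_checked <- error_message(long_fun_checked("definitely not a number"))

msg_unchecked
# "non-numeric argument to binary operator"
msg_checked
# "is.numeric(x) is not TRUE"
```

Only the second message mentions x, the argument the caller actually supplied. The messages shown assume an English locale.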
Default stopifnot() error messages are better than the alternative, but they can quickly become useless for more complex assertions. Read Chapter 9 of the tidyverse style guide on error messages.
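As a rough illustration of how a complex assertion can produce an unhelpful message, the sketch below bundles several assumptions into one stopifnot() call, then checks them one at a time with stop(). Both functions, weighted_total() and weighted_total_verbose(), are hypothetical and exist only for this example. The wording of the custom messages is one possible style, not the only acceptable one.

```r
# Hypothetical function: one assertion bundles three assumptions.
weighted_total <- function(values, weights) {
  stopifnot(
    is.numeric(values) && is.numeric(weights) &&
      length(values) == length(weights)
  )
  sum(values * weights)
}

# Same checks, one at a time, each with its own message.
weighted_total_verbose <- function(values, weights) {
  if (!is.numeric(values) || !is.numeric(weights)) {
    stop("`values` and `weights` must both be numeric.", call. = FALSE)
  }
  if (length(values) != length(weights)) {
    stop(
      "`values` and `weights` must have the same length, not ",
      length(values), " and ", length(weights), ".",
      call. = FALSE
    )
  }
  sum(values * weights)
}

# Return the error message as a string instead of stopping the script.
error_message <- function(expr) {
  tryCatch(expr, error = function(e) conditionMessage(e))
}

msg_default <- error_message(weighted_total(1:3, 1:2))
msg_verbose <- error_message(weighted_total_verbose(1:3, 1:2))

msg_default
# echoes the condition (possibly abbreviated) followed by "is not TRUE",
# without saying which part failed
msg_verbose
# "`values` and `weights` must have the same length, not 3 and 2."
```

The verbose version costs a few more lines, but the caller learns which assumption was violated and what was found instead.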
Q3 Rewrite long_fun() with an improved error message.
Open an issue in FlukeAndFeather/jese4sci-VAL. Put "VAL102" in the title. In the comment, include your answers to the questions above. Then answer the following.
- Find a function in an R package you use frequently. Look at the source code. Are there any assertions?
- Write an assertion for this function. Include a link to the function.
- Search this package for uses of the stop() or stopifnot() functions. What do their error messages look like? How could you improve them?
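For the first task, typing a function's name without parentheses prints its source. If you would rather check programmatically, the sketch below is one possible starting point. has_assertion() is a hypothetical helper, and checked() and unchecked() are toy functions written for this example.

```r
# Does a function's body call stop() or stopifnot() anywhere?
has_assertion <- function(f) {
  stopifnot(is.function(f))
  # all.names() lists every symbol in the body, including called functions.
  any(c("stop", "stopifnot") %in% all.names(body(f)))
}

# Two toy functions to try it on.
checked <- function(x) {
  stopifnot(is.numeric(x))
  x + 1
}
unchecked <- function(x) x + 1

has_assertion(checked)
# TRUE
has_assertion(unchecked)
# FALSE
```

This only looks for stop() and stopifnot() written directly in the function body. It will miss checks delegated to helper functions or made with other packages' error functions, so reading the source is still worthwhile.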