Skip to content

Latest commit

 

History

History
95 lines (69 loc) · 3.64 KB

VAL102.md

File metadata and controls

95 lines (69 loc) · 3.64 KB

VAL102: Defensive programming

Learning goals

  1. Describe common sources of errors
  2. Fail gracefully with assertions
  3. Diagnose issues with informative error messages

Fail as soon as you can

If your code isn't going to work, you want to know as soon as possible. This is the opposite of what you want:

long_fun <- function(x) {
  # Do
  # something
  # that 
  # takes
  # a
  # really
  # long
  # time
  long_running_subprocess()
  
  # Assume `x` is a number
  needs_a_number(x)
}
long_fun("definitely not a number")

If long_fun() started out by first checking if x was a number, then you could have saved all the time burned by long_running_subprocess(). In this example, the source of the error was x having the wrong type. A few common sources of bugs in R are:

  • Wrong type E.g., you used a numeric vector where the function expected a character
  • Wrong size E.g., a function needs to vectors of the same length, but you called it with one longer than the other
  • Missing data E.g., unexpected NULL or NA values

Q1 What are some other sources of bugs you've encountered?

Assert yourself

Software engineers call these error checks assertions. For example, what if long_fun() looked like this?

long_fun <- function(x) {
  stopifnot(is.numeric(x))
  
  # Do
  # something
  # that 
  # takes
  # a
  # really
  # long
  # time
  long_running_subprocess()
  
  # Assume `x` is a number
  needs_a_number(x)
}
long_fun("definitely not a number")

Q2

Check the help for stopifnot(). What would happen in this revised version?

Assertions are a cornerstone of defensive programming. They assert what the condition should be. E.g., stopifnot(is.numeric(x)) asserts x must be numeric. Assertions can prevent long-running code that's doomed to fail (as above). They also serve as an extra layer of documentation, by specifically stating the function's assumptions. Perhaps most importantly, assertions can provide context and clues for diagnosing where things went sideways. Using the unrevised long_fun() example again, the error with long_fun("definitely not a number") happens inside needs_a_number(x). But let's say needs_a_number() looks like this:

needs_a_number <- function(y) {
  1 - y
}

Notice needs_a_number() takes an argument called y, not x. So the error you get from long_fun("definitely not a number") would be Error in 1 - y : non-numeric argument to binary operator. To which the reasonable reply is "what the heck is y? I called a function that takes x". Conversely, the revised version that uses stopifnot() to assert x must be numeric, so the error would be Error in long_fun("a") : is.numeric(x) is not TRUE. That's way more helpful! But we can do even better.

What seems to be the problem here?

Default stopifnot() error messages are better than the alternative, but they can quickly become useless for more complex assertions. Read Chapter 9 of the tidyverse style guide on error messages.

Q3

Rewrite long_fun() with an improved error message.

Submit assignment

Open an issue in FlukeAndFeather/jese4sci-VAL. Put "VAL102" in the title. In the comment, include your answers to the questions above. Then answer the following.

  1. Find a function in an R package you use frequently. Look at the source code. Are there any assertions?
  2. Write an assertion for this function. Include a link to the function.
  3. Search this package for uses of the stop() or stopifnot() function. What do their error messages look like? How could you improve them?