-
Notifications
You must be signed in to change notification settings - Fork 0
/
04-reference-data.Rmd
166 lines (120 loc) · 5.43 KB
/
04-reference-data.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
---
title: "Reference data"
author: "Kamil Foltyński"
date: "11/04/2019"
output:
html_document:
fig_caption: yes
keep_md: no
toc: yes
toc_float:
collapsed: no
smooth_scroll: yes
runtime: shiny
---
```{r, echo=FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = ">")
```
# What is reference data? And why should we care?
Along with development process, code changes, but its behaviour and results should not -
and if they must change, let them change into a way we control.
Reference data is used as input for unit tests. It's frozen, so it ensures **reproducibility** of results.
During development process **code changes**, but thanks to **reproducibility** it is possible to run
code (with reference data as input) and **always get same, unchanged results**. However if results vary, then
it's a sign something went wrong and code is probably broken.
# How to prepare reference data?
* `dput` - the easiest way to prepare reference data is to use `dput` function. It prints object's structure to terminal, so
it is just about copy-pasting from there to our test script, see example:
```{r}
library(dplyr)
result <- band_members %>% inner_join(band_instruments)
dput(result)
```
so unit test can be witten as:
```{r}
library(dplyr)
ref_data <- structure(
list(
name = c("John", "Paul"),
band = c("Beatles",
"Beatles"),
plays = c("guitar", "bass")
),
row.names = c(NA, -2L),
class = c("tbl_df", "tbl", "data.frame")
)
testthat::expect_equal(band_members %>% inner_join(band_instruments), ref_data)
testthat::expect_identical(band_members %>% inner_join(band_instruments), ref_data)
```
BTW: `dput` is also recommended way to provide Minimal Working Example (MWE) when
asking on [stackoverflow.com](https://stackoverflow.com).
* `.rds`/`.RData` - `dput` is useful form small _handy_ data, but imagine reference `data.frame` counting thousands of rows
and pasted into test script...
<br>
<details>
<summary>Click to expand</summary>
```{r}
dput(iris)
```
</details>
<br>
In such cases it's worth keeping reference data in R formats such as `.rds`/`.RData`.
But when we work with R package test data should always go with package, because when installing R package
on multiple environments we can run unit tests to see whether everything works properly or not.
**So where test data should be stored**? Every package is installed together with `inst` directory.
The recommended location for test data is `inst/testdata` directory in R package.
To get path to reference data `system.file` functionshould be used:
```{r,eval=FALSE}
ref_data <- system.file(file.path("testdata", "test_data.rds"), package = "SpendWorx")
testthat::expect_equal(functionTotest(), ref_data)
```
* `snapshot`/`screenshot` - sometimes we want to would like to test output from plotting function.
The simplest way to do that is comparing screenshots of plot:
```{r,eval=FALSE}
testthat::context("Testing plot(s) output")
testthat::test_that("plotly renders correctly", {
library(plotly)
target.path <- file.path("inst", "testdata", "plot1.png")
p <- economics %>% plot_ly(x = ~date, y = ~unemploy/pop) %>% add_lines()
# compareImages function already contains testing function testthat::expect_lte
unitTestsWorkshopPackage::compareImages(takeScreenshot(p), target.path)
})
```
# Recreating reference data
> _Along with development process, code changes, but its behaviour and results should not -
and **if they must change, let them change into a way we control**_.
Code is being improved, it evolves and there will always be a moment when
results, outputs change (but we konw about that, **it is a conscious change**),
so reference data must be updated. I really like approach where developer
puts few lines of **code to easily recreate reference data**.
For example, if we go back to plotly unit test, the whole test could look like this:
```{r,eval=FALSE}
testthat::context("Testing plot(s) output")
testthat::test_that("plotly renders correctly", {
library(plotly)
target.path <- system.file(file.path("testdata", "plot1.png"), package = "unitTestsWorkshopPackage")
p <- economics %>% plot_ly(x = ~date, y = ~unemploy/pop) %>% add_lines()
# compareImages function already contains testing function testthat::expect_lte
unitTestsWorkshopPackage::compareImages(takeScreenshot(p), target.path)
# following code recreates reference screenshot
if (exists("RECREATE_REF_DATA") && isTRUE(RECREATE_REF_DATA)) {
message("Recreating plotly reference screenshot...")
testdata_dir <- file.path("unitTestsWorkshopPackage", "inst", "testdata")
if (!dir.exists(file.path(testdata_dir)))
dir.create(file.path(testdata_dir), recursive = TRUE)
target.path <- file.path(testdata_dir, "plot1.png")
unitTestsWorkshopPackage::takeScreenshot(p, target.path)
}
})
```
so to recreate reference screenshot just set `RECREATE_REF_DATA` to `TRUE` and run test.
# Practice
## Task #1
Test output of function `unitTestsWorkshopPackage::renderBoxplot`.
* Use following code as input data:
```{r,eval=FALSE}
dat <- data.frame(xval = sample(100, 1000, replace = TRUE),
group = as.factor(sample(c("a", "b", "c"), 1000, replace = TRUE)))
```
* Use `unitTestsWorkshopPackage::takeScreenshot` to prepare reference plot
* Use function `unitTestsWorkshopPackage::compareImages` to test output