
Implement Httptest to cache test and vignette responses #34

Closed
wants to merge 9 commits

Conversation

annakrystalli
Collaborator

I’ve been going round in circles on my own and haven’t shared this so
far because of issues with the approach (mainly the size of the cached
data, see below). However, I realised that if I share just the code
alterations, they obviously won’t work on GitHub Actions, but others
could at least pull the branch, experiment locally and give feedback.

So I’ve decided to write up the issues I’ve been running into, together
with the steps taken and solutions proposed so far, and ask for your
feedback!

This PR relates to #26 and describes using httptest to cache test &
vignette queries, along with the problems encountered so far:

Using httptest

Approach:

Tests

Wrap test code in with_mock_dir(), providing a unique folder name that
will be used or created under tests/testthat/.
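
For example, a minimal sketch of the pattern (the folder name, service
and expectations below are illustrative, not taken from the actual test
suite):

    library(httptest)

    with_mock_dir("wfs_info", {
      test_that("emodnet_get_wfs_info() works offline", {
        # First run: real responses are recorded under tests/testthat/wfs_info/.
        # Later runs: the recorded responses are replayed instead of hitting
        # the live services.
        wfs <- emodnet_init_wfs_client(service = "bathymetry")
        expect_s3_class(emodnet_get_wfs_info(wfs), "data.frame")
      })
    })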

Vignettes

Use start_vignette() at the start of any vignette to either use
previously recorded responses, if they exist, or capture real responses
for future use. Use end_vignette() at the end.
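
In the vignette itself this looks roughly like the following (the chunk
placement and directory name are illustrative):

    # in a setup chunk at the top of the vignette
    library(httptest)
    start_vignette("my-vignette")  # replay recorded responses if present,
                                   # otherwise record the real ones

    # ... vignette chunks making WFS requests as usual ...

    # in the final chunk
    end_vignette()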

Result:

HTTP request responses for tests are cached in named folders under
tests/testthat/, whereas for vignettes they are cached in the vignettes/ folder.

Issues

  • 1. Paths to the created files are too long, triggering a
    “non-portable file paths” error during checks. We need to add a
    request preprocessing function as described in the httptest FAQ
    vignette, e.g. (an adapted sketch for this package follows at the
    end of this list):

    set_requester(function(request) {
      gsub_request(request, "https\\://language.googleapis.com/v1/", "api/")
    })
  • 2. The cached files can be very large and add significantly to
    the size of the repo. Currently the tests/testthat/ folder is
    37.8 MB, whereas the vignettes folder is a whopping 1.5 GB! To run
    tests with this framework on GitHub Actions, though, all these
    files would need to be committed to git. For this reason, although
    I’ve committed the code alterations I’ve made to incorporate
    httptest, I am git-ignoring the actual cached data folders for the
    time being until a solution is decided.

  • 3. Doesn’t work out of the box for WFS! According to the docs,
    once the tests have been run once, they should run successfully on
    subsequent runs even without an internet connection. However, when
    run without internet, some (but not all) tests fail or throw
    warnings.

    ℹ Testing EMODnetWFS
    ✓ | F W S  OK | Context
    ✓ |         4 | client [1.7s]
    ✓ |   4     6 | filter [0.6s] 
    ✓ |        10 | info [2.8s]                                                 
    x | 1 3    11 | layer_attributes [4.1s] 
    x | 1 11   24 | layers [9.2s] 
    
    ══ Results ════════════════════════════════════════════════════════════════════════
    Duration: 18.4 s
    
    [ FAIL 2 | WARN 18 | SKIP 0 | PASS 55 ]
    Warning message:
    In XML::xmlParse(xsdFile, isSchema = TRUE, xinclude = TRUE, error = function(msg,  :
      NULL value for external reference
    • WARNINGS:
      • categorical filters work

        with_mock_dir("categorical_filters", {
          test_that("categorical filters work", {
            wfs <- emodnet_init_wfs_client(service = "geology_seabed_substrate_maps")
            expect_equal(
              emodnet_get_layers(wfs = wfs,
                                 layers = "seabed_substrate_1m",
                                 cql_filter = "country='Baltic Sea'",
                                 reduce_layers = TRUE)$country %>% unique(),
              'Baltic Sea'
            )

            or_filter_sf <- emodnet_get_layers(wfs = wfs,
                                               layers = "seabed_substrate_1m",
                                               cql_filter = "country='Baltic Sea' OR country='Bulgaria'",
                                               reduce_layers = TRUE)

            expect_equal(or_filter_sf$country %>% unique(), c("Bulgaria", "Baltic Sea"))
          })
        })

        Warning during test run:

        Warning (test-filter.R:5:5): categorical filters work
        GDAL Error 1: Could not resolve host: drive.emodnet-geology.eu
        Backtrace:
          1. testthat::expect_equal(...) test-filter.R:5:4
          6. EMODnetWFS::emodnet_get_layers(...)
          9. purrr::map2(...)
         10. EMODnetWFS:::.f(.x[[1L]], .y[[1L]], ...)
         11. EMODnetWFS::ews_get_layer(.x, wfs, cql_filter = .y)
         19. wfs$getFeatures(x, cql_filter = utils::URLencode(cql_filter))
         20. ft$getFeatures(...)
         22. sf:::st_read.character(destfile, quiet = TRUE)
         23. sf:::CPL_read_ogr(...)
        
    • ERROR:
      • layer_attributes_summarise works

        with_mock_dir("layer_attributes_summarise", {
          test_that("layer_attributes_summarise works", {
            wfs <- suppressMessages(emodnet_init_wfs_client(service = "human_activities"))
            maritimebnds_summ <- layer_attributes_summarise(wfs, layer = "maritimebnds")
            expect_equal("table", class(maritimebnds_summ))
            expect_equal(maritimebnds_summ[, "  sitename"][2:3],
                         structure(c("Class :character  ", "Mode  :character  "),
                                   .Names = c("", "")))
            expect_equal(maritimebnds_summ[, "  shape_leng"][1],
                         structure("Min.   :   0.0000  ", .Names = ""))
            expect_equal(maritimebnds_summ[, "   objectid"][1],
                         structure("Min.   :   1.0  ", .Names = ""))
            expect_equal(ncol(maritimebnds_summ),
                         length(layer_attributes_get_names(wfs, layer = "maritimebnds")))
          })
        })
        • Error message during tests:
        Error (test-layer_attributes.R:48:9): layer_attributes_summarise works
        Error in `sf::st_drop_geometry(.)`: st_drop_geometry only works with objects of class sf
        Backtrace:
         1. EMODnetWFS::layer_attributes_summarise(wfs, layer = "maritimebnds") test-layer_attributes.R:48:8
         6. sf::st_drop_geometry(.)
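
(Coming back to issue 1 above: a hypothetical adaptation of the FAQ
example for this package might look something like the sketch below.
The URL pattern is a guess and would need checking against the requests
we actually make, and, if I read the httptest docs correctly, the
function can live in inst/httptest/request.R so it gets picked up
automatically.)

    set_requester(function(request) {
      # Shorten the long EMODnet host names so the mock file paths stay
      # portable. The pattern below is illustrative only.
      gsub_request(request, "https\\://[a-z.-]+\\.emodnet[a-z-]*\\.eu/", "emodnet/")
    })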
        

Solutions [WIP]

  1. Follow the instructions to implement an appropriate set_requester()
    function (will tackle this last).

  2. One option to tackle the size of the cached data is to refactor our
    tests and vignettes (vignettes especially) to request smaller
    layers. To that end, I wrote a short script to query each layer
    of each service (timing out and returning NA after 5 secs for
    efficiency) and return the size of each layer. You can find both
    the script and the resulting csv in the attic/ dir; a rough sketch
    of the idea is also included after the next paragraph. Using that
    info, we can focus on the smaller layers in our examples, tests and
    vignettes.

From a similar question in the rOpenSci slack (where this super useful
resource was shared: 😉
https://blog.r-hub.io/2020/05/29/distribute-data/#data-outside-of-your-packag
), a suggestion was to store large cached data externally from the
package repo and pull it in during testing. That obviously requires
internet and some effort to implement, as not only do we need the data
but we also need a mechanism for periodically updating it. Setting this
up, at first thought, feels like it would require duplicating code
somewhere else and keeping the two copies in sync, so it would need
some thought as to how to do this successfully.
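
For reference, here is a rough sketch of the kind of layer-size survey
script mentioned in point 2 above (this is not the actual attic/ script;
using emodnet_get_wfs_info() to list layers, R.utils::withTimeout() for
the 5-second cut-off, and object.size() as the size measure are all
assumptions):

    library(EMODnetWFS)

    # Return the in-memory size of each layer of one service,
    # or NA if fetching it takes longer than `timeout` seconds.
    layer_sizes <- function(service, timeout = 5) {
      wfs <- emodnet_init_wfs_client(service = service)
      info <- emodnet_get_wfs_info(wfs)
      sizes <- vapply(info$layer_name, function(layer) {
        tryCatch(
          R.utils::withTimeout(
            as.numeric(object.size(
              emodnet_get_layers(wfs = wfs, layers = layer, reduce_layers = TRUE)
            )),
            timeout = timeout, onTimeout = "error"
          ),
          error = function(e) NA_real_
        )
      }, numeric(1))
      data.frame(service = service, layer = info$layer_name, size_bytes = sizes)
    }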

  3. Still troubleshooting this, and whether httptest is the right
    way to go. It feels like most of the warnings / errors generated
    with no internet result from how the sf package handles the
    responses once they are received. Indeed, most of the tests throw
    warnings rather than actual errors, so they still pass.

When there IS an internet connection (which there always will be on
GitHub Actions), they all pass, and more quickly than on master, so I’m
assuming the cached data is being used, with some of the time difference
coming from processing the cached responses into sf objects during the
test run.

Testing on http-tests with cache & internet (2 fewer tests):

ℹ Testing EMODnetWFS
✓ | F W S  OK | Context
✓ |         4 | client [1.6s]                                     
✓ |         6 | filter [1.2s]                                     
✓ |        10 | info [2.9s]                                       
✓ |        16 | layer_attributes [3.8s]                           
✓ |        25 | layers [10.3s]                                    

══ Results ═══════════════════════════════════════════════════════
Duration: 19.7 s

Testing on master without cache (one test fails because of a change in
the crs definition on the server, corrected in this PR):

ℹ Testing EMODnetWFS
✓ | F W S  OK | Context
✓ |         4 | client [2.0s]                                     
✓ |         6 | filter [2.9s]                                     
✓ |        13 | info [6.3s]                                       
✓ |        16 | layer_attributes [2.1s]                           
x | 1      24 | layers [13.8s]   

Duration: 27.1 s

So I’m not sure how much of a problem this is. It’s just a bit
unsatisfactory, as technically these should now work without internet.


Overall, I just wanted to update you, make the issues a bit more
visible, and see if anyone has feedback or suggestions.

@codecov-commenter

Codecov Report

Merging #34 (61b6bdf) into master (b4f2f9b) will increase coverage by 1.06%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #34      +/-   ##
==========================================
+ Coverage   86.09%   87.16%   +1.06%     
==========================================
  Files           4        4              
  Lines         187      187              
==========================================
+ Hits          161      163       +2     
+ Misses         26       24       -2     
Impacted Files   Coverage          Δ
R/info.R         95.00% <0.00%>    (+5.00%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b4f2f9b...61b6bdf. Read the comment docs.

@annakrystalli annakrystalli mentioned this pull request Feb 4, 2022
@annakrystalli annakrystalli marked this pull request as ready for review February 4, 2022 16:25
@maelle
Collaborator

maelle commented Mar 1, 2022

Alternatives: https://blog.r-hub.io/2020/06/03/vignettes/#how-to-include-a-compute-intensive--authentication-dependent-vignette

@maelle maelle mentioned this pull request Mar 2, 2022
@maelle maelle changed the base branch from master to simpler March 2, 2022 13:38
Base automatically changed from simpler to master March 14, 2022 12:37
@annakrystalli
Collaborator Author

@maelle, I imagine I can just close this PR now, right?

@maelle
Collaborator

maelle commented Mar 14, 2022

Yes!

@maelle maelle closed this Mar 14, 2022
Labels: CI, enhancement (New feature or request), services