
Implement Httptest to cache test and vignette responses #34

Closed
wants to merge 9 commits

Conversation

annakrystalli
Collaborator

I’ve been going round in circles on my own and haven’t shared this so
far because of issues with the approach (mainly the size of the cached
data, see below). However, I realised that if I share just the code
alterations, they obviously won’t work on GitHub Actions, but others
could at least pull the branch, experiment locally and give feedback.

So I’ve decided to write up the issues I’ve been running into, together
with the steps taken and solutions proposed so far, and ask for your
feedback!

This PR relates to #26 and describes using httptest to cache test &
vignette queries, along with the problems encountered so far:

Using httptest

Approach:

Tests

Wrap test code in with_mock_dir(), providing a unique folder name that
will be used or created under tests/testthat/.
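
For example, a minimal sketch of the pattern (the folder name, service
and expectations below are illustrative, not taken from the actual test
suite):

    library(httptest)

    with_mock_dir("wfs_info", {
      test_that("emodnet_get_wfs_info() works offline", {
        # First run: real responses are recorded under tests/testthat/wfs_info/.
        # Later runs: the recorded responses are replayed instead of hitting
        # the live services.
        wfs <- emodnet_init_wfs_client(service = "bathymetry")
        expect_s3_class(emodnet_get_wfs_info(wfs), "data.frame")
      })
    })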

Vignettes

Use start_vignette() at the start of any vignette to either use
previously recorded responses, if they exist, or capture real responses
for future use. Use end_vignette() at the end.
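
In the vignette itself this looks roughly like the following (the chunk
placement and directory name are illustrative):

    # in a setup chunk at the top of the vignette
    library(httptest)
    start_vignette("my-vignette")  # replay recorded responses if present,
                                   # otherwise record the real ones

    # ... vignette chunks making WFS requests as usual ...

    # in the final chunk
    end_vignette()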

Result:

HTTP request responses for tests are cached in named folders under
tests/testthat/, whereas for vignettes they are cached in the vignettes/ folder.

Issues

  • 1. Paths to the created files are too long, triggering a
    “non-portable file paths” error during checks. We need to add a
    request preprocessing function as described in the httptest FAQ
    vignette, e.g. (an adapted sketch for this package follows at the
    end of this list):

    set_requester(function(request) {
      gsub_request(request, "https\\://language.googleapis.com/v1/", "api/")
    })
  • 2. The cached files can be very large and add significantly to
    the size of the repo. Currently the tests/testthat/ folder is
    37.8 MB, whereas the vignettes folder is a whopping 1.5 GB! To run
    tests with this framework on GitHub Actions, though, all these
    files would need to be committed to git. For this reason, although
    I’ve committed the code alterations I’ve made to incorporate
    httptest, I am git-ignoring the actual cached data folders for the
    time being until a solution is decided.

  • 3. Doesn’t work out of the box for WFS! According to the docs,
    once the tests have been run once, they should run successfully on
    subsequent runs even without an internet connection. However, when
    run without internet, some (but not all) tests fail or throw
    warnings.

    ℹ Testing EMODnetWFS
    ✓ | F W S  OK | Context
    ✓ |         4 | client [1.7s]
    ✓ |   4     6 | filter [0.6s] 
    ✓ |        10 | info [2.8s]                                                 
    x | 1 3    11 | layer_attributes [4.1s] 
    x | 1 11   24 | layers [9.2s] 
    
    ══ Results ════════════════════════════════════════════════════════════════════════
    Duration: 18.4 s
    
    [ FAIL 2 | WARN 18 | SKIP 0 | PASS 55 ]
    Warning message:
    In XML::xmlParse(xsdFile, isSchema = TRUE, xinclude = TRUE, error = function(msg,  :
      NULL value for external reference
    • WARNINGS:
      • categorical filters work

        with_mock_dir("categorical_filters", {
          test_that("categorical filters work", {
            wfs <- emodnet_init_wfs_client(service = "geology_seabed_substrate_maps")
            expect_equal(
              emodnet_get_layers(wfs = wfs,
                                 layers = "seabed_substrate_1m",
                                 cql_filter = "country='Baltic Sea'",
                                 reduce_layers = TRUE)$country %>% unique(),
              'Baltic Sea'
            )

            or_filter_sf <- emodnet_get_layers(wfs = wfs,
                                               layers = "seabed_substrate_1m",
                                               cql_filter = "country='Baltic Sea' OR country='Bulgaria'",
                                               reduce_layers = TRUE)

            expect_equal(or_filter_sf$country %>% unique(), c("Bulgaria", "Baltic Sea"))
          })
        })

        Warning during test run:

        Warning (test-filter.R:5:5): categorical filters work
        GDAL Error 1: Could not resolve host: drive.emodnet-geology.eu
        Backtrace:
          1. testthat::expect_equal(...) test-filter.R:5:4
          6. EMODnetWFS::emodnet_get_layers(...)
          9. purrr::map2(...)
         10. EMODnetWFS:::.f(.x[[1L]], .y[[1L]], ...)
         11. EMODnetWFS::ews_get_layer(.x, wfs, cql_filter = .y)
         19. wfs$getFeatures(x, cql_filter = utils::URLencode(cql_filter))
         20. ft$getFeatures(...)
         22. sf:::st_read.character(destfile, quiet = TRUE)
         23. sf:::CPL_read_ogr(...)
        
    • ERROR:
      • layer_attributes_summarise works

        with_mock_dir("layer_attributes_summarise", {
          test_that("layer_attributes_summarise works", {
            wfs <- suppressMessages(emodnet_init_wfs_client(service = "human_activities"))
            maritimebnds_summ <- layer_attributes_summarise(wfs, layer = "maritimebnds")
            expect_equal("table", class(maritimebnds_summ))
            expect_equal(maritimebnds_summ[, "  sitename"][2:3],
                         structure(c("Class :character  ", "Mode  :character  "),
                                   .Names = c("", "")))
            expect_equal(maritimebnds_summ[, "  shape_leng"][1],
                         structure("Min.   :   0.0000  ", .Names = ""))
            expect_equal(maritimebnds_summ[, "   objectid"][1],
                         structure("Min.   :   1.0  ", .Names = ""))
            expect_equal(ncol(maritimebnds_summ),
                         length(layer_attributes_get_names(wfs, layer = "maritimebnds")))
          })
        })
        • Error message during tests:
        Error (test-layer_attributes.R:48:9): layer_attributes_summarise works
        Error in `sf::st_drop_geometry(.)`: st_drop_geometry only works with objects of class sf
        Backtrace:
         1. EMODnetWFS::layer_attributes_summarise(wfs, layer = "maritimebnds") test-layer_attributes.R:48:8
         6. sf::st_drop_geometry(.)
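
(Coming back to issue 1 above: a hypothetical adaptation of the FAQ
example for this package might look something like the sketch below.
The URL pattern is a guess and would need checking against the requests
we actually make, and, if I read the httptest docs correctly, the
function can live in inst/httptest/request.R so it gets picked up
automatically.)

    set_requester(function(request) {
      # Shorten the long EMODnet host names so the mock file paths stay
      # portable. The pattern below is illustrative only.
      gsub_request(request, "https\\://[a-z.-]+\\.emodnet[a-z-]*\\.eu/", "emodnet/")
    })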
        

Solutions [WIP]

  1. Follow the instructions to implement an appropriate set_requester()
    function (will tackle this last).

  2. One option to tackle the size of the cached data is to refactor our
    tests and vignettes (vignettes especially) to request smaller
    layers. To that end, I wrote a short script to query each layer
    of each service (timing out and returning NA after 5 secs for
    efficiency) and return the size of each layer. You can find both
    the script and the resulting csv in the attic/ dir; a rough sketch
    of the idea is also included after the next paragraph. Using that
    info, we can focus on the smaller layers in our examples, tests and
    vignettes.

From a similar question in the rOpenSci slack (where this super useful
resource was shared: 😉
https://blog.r-hub.io/2020/05/29/distribute-data/#data-outside-of-your-packag
), a suggestion was to store large cached data externally from the
package repo and pull it in during testing. That obviously requires
internet and some effort to implement, as not only do we need the data
but we also need a mechanism for periodically updating it. Setting this
up, at first thought, feels like it would require duplicating code
somewhere else and keeping the two copies in sync, so it would need
some thought as to how to do this successfully.
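
For reference, here is a rough sketch of the kind of layer-size survey
script mentioned in point 2 above (this is not the actual attic/ script;
using emodnet_get_wfs_info() to list layers, R.utils::withTimeout() for
the 5-second cut-off, and object.size() as the size measure are all
assumptions):

    library(EMODnetWFS)

    # Return the in-memory size of each layer of one service,
    # or NA if fetching it takes longer than `timeout` seconds.
    layer_sizes <- function(service, timeout = 5) {
      wfs <- emodnet_init_wfs_client(service = service)
      info <- emodnet_get_wfs_info(wfs)
      sizes <- vapply(info$layer_name, function(layer) {
        tryCatch(
          R.utils::withTimeout(
            as.numeric(object.size(
              emodnet_get_layers(wfs = wfs, layers = layer, reduce_layers = TRUE)
            )),
            timeout = timeout, onTimeout = "error"
          ),
          error = function(e) NA_real_
        )
      }, numeric(1))
      data.frame(service = service, layer = info$layer_name, size_bytes = sizes)
    }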

  3. Still troubleshooting this, and whether httptest is the right
    way to go. It feels like most of the warnings / errors generated
    with no internet result from how the sf package handles the
    responses once they are received. Indeed, most of the tests throw
    warnings rather than actual errors, so they still pass.

When there IS an internet connection (which there always will be on
GitHub Actions), they all pass, and more quickly than on master, so I’m
assuming the cached data is being used, with some of the time difference
coming from processing the cached responses into sf objects during the
test run.

Testing on http-tests with cache & internet (2 fewer tests):

ℹ Testing EMODnetWFS
✓ | F W S  OK | Context
✓ |         4 | client [1.6s]                                     
✓ |         6 | filter [1.2s]                                     
✓ |        10 | info [2.9s]                                       
✓ |        16 | layer_attributes [3.8s]                           
✓ |        25 | layers [10.3s]                                    

══ Results ═══════════════════════════════════════════════════════
Duration: 19.7 s

Testing on master without cache (one test fails because of a change in
the crs definition on the server, corrected in this PR):

ℹ Testing EMODnetWFS
✓ | F W S  OK | Context
✓ |         4 | client [2.0s]                                     
✓ |         6 | filter [2.9s]                                     
✓ |        13 | info [6.3s]                                       
✓ |        16 | layer_attributes [2.1s]                           
x | 1      24 | layers [13.8s]   

Duration: 27.1 s

So I’m not sure how much of a problem this is. It’s just a bit
unsatisfactory, as technically these should now work without internet.


Overall, I just wanted to update you, make the issues a bit more
visible, and see if anyone has feedback or suggestions.

@codecov-commenter

Codecov Report

Merging #34 (61b6bdf) into master (b4f2f9b) will increase coverage by 1.06%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #34      +/-   ##
==========================================
+ Coverage   86.09%   87.16%   +1.06%     
==========================================
  Files           4        4              
  Lines         187      187              
==========================================
+ Hits          161      163       +2     
+ Misses         26       24       -2     
Impacted Files   Coverage          Δ
R/info.R         95.00% <0.00%>    (+5.00%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b4f2f9b...61b6bdf. Read the comment docs.

@annakrystalli annakrystalli mentioned this pull request Feb 4, 2022
@annakrystalli annakrystalli marked this pull request as ready for review February 4, 2022 16:25
@maelle
Collaborator

maelle commented Mar 1, 2022

Alternatives: https://blog.r-hub.io/2020/06/03/vignettes/#how-to-include-a-compute-intensive--authentication-dependent-vignette

@maelle maelle mentioned this pull request Mar 2, 2022
@maelle maelle changed the base branch from master to simpler March 2, 2022 13:38
Base automatically changed from simpler to master March 14, 2022 12:37
@annakrystalli
Collaborator Author

@maelle, I imagine I can just close this PR now, right?

@maelle
Collaborator

maelle commented Mar 14, 2022

Yes!

@maelle maelle closed this Mar 14, 2022
Labels: CI, enhancement (New feature or request), services