getIndex() should obey age= even if index is cached #584

Closed
j-harbin opened this issue Mar 1, 2023 · 10 comments
@j-harbin
Collaborator

j-harbin commented Mar 1, 2023

This is problematic because if a user does:

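library(argoFloats)
ai <- getIndex("core")        # first call: reads the index and caches it for this session
ai <- getIndex("core", age=0) # age=0 ought to force a fresh download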

the second call will not force a redownload, because getIndex() first checks whether the index is cached and, if it is, uses the cached copy regardless of age.

@j-harbin added the bug label Mar 1, 2023
@dankelley
Collaborator

@j-harbin please see the commit notes for e234f90 for a suggestion of how we should coordinate our edits for the same file. I'll let you do your things first and then please email me when it's my turn. We can also do a Z tomorrow if that would help.

@dankelley changed the title from "age argument in getIndex() checks if the index is cached" to "getIndex() should obey age= even if index is cached" Mar 2, 2023
@dankelley
Collaborator

A key line is

if (argoFloatsIsCached(filenameOrig, debug=debug-1)) {

which is a bit tricky as things stand now. We want to check the age of the cached index and compare that with the age argument. Therefore, the cache ought to record the time at which each item was stored. I'll look into that.
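To make the idea concrete, here is a minimal sketch of a memory cache that records when each item was stored and refuses to hand back anything that is too old. This is not the argoFloats code: toyCache, toyStore() and toyGetIfFresh() are hypothetical names invented for illustration, and the only thing borrowed from the package is the convention that age is measured in days.

```r
# Hypothetical in-memory cache: each entry keeps its value and the time it was stored.
toyCache <- new.env(parent = emptyenv())

toyStore <- function(name, value) {
    assign(name, list(value = value, time = Sys.time()), envir = toyCache)
    invisible(NULL)
}

# Return the cached value only if it is younger than `age` (in days); otherwise NULL.
toyGetIfFresh <- function(name, age = 1) {
    if (!exists(name, envir = toyCache, inherits = FALSE)) {
        return(NULL)
    }
    entry <- get(name, envir = toyCache, inherits = FALSE)
    cacheAge <- as.numeric(difftime(Sys.time(), entry$time, units = "days"))
    if (cacheAge >= age) NULL else entry$value
}

toyStore("core", "pretend index")
fresh <- toyGetIfFresh("core", age = 1)   # young enough, so the cached value comes back
forced <- toyGetIfFresh("core", age = 0)  # no cached item is younger than 0 days, so NULL
```

With a test of this form, age=0 always bypasses the cache, which is the behaviour requested in this issue.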

@dankelley
Collaborator

I'm taking notes here, so there is no action for co-authors and nothing pushed to GH until I say so...

I have the following working

library(argoFloats)
i <- getIndex("core")
> argoFloatsGetFromCache("coreTime")
[1] "2023-03-02 08:08:56.710 AST"
> argoFloatsGetFromCache("core")
argoFloats object of type "index" with 2772915 items

so I can make it obey the age argument even for cached things.

@dankelley
Collaborator

I added

#' @family functions relating to cached values

to the cache-related functions, so the docs will point users to the related functions. That can help when looking through the documentation.

@dankelley
Collaborator

I've added new functions

argoFloatsClearCache()
argoFloatsListCache()
argoFloatsWhenCached()

which will help with debugging.
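For readers wondering what helpers of this kind might look like, the following is a rough sketch built on a plain R environment. The toy* names are hypothetical and the real argoFloats functions may well differ in arguments and return values; the one detail taken from this thread is that the storage time for "core" sits under the key "coreTime".

```r
# Hypothetical stand-ins for list/when/clear helpers on a toy cache.
toyCache <- new.env(parent = emptyenv())

# Store the value under `name` and its storage time under `<name>Time`.
toyStore <- function(name, value) {
    assign(name, value, envir = toyCache)
    assign(paste0(name, "Time"), Sys.time(), envir = toyCache)
    invisible(NULL)
}

toyListCache <- function() sort(ls(toyCache))
toyWhenCached <- function(name) get(paste0(name, "Time"), envir = toyCache)
toyClearCache <- function() rm(list = ls(toyCache), envir = toyCache)

toyStore("core", 1:3)
listed <- toyListCache()      # names of everything currently cached
when <- toyWhenCached("core") # POSIXct time at which "core" was stored
toyClearCache()
remaining <- toyListCache()   # nothing left after clearing
```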

@dankelley
Collaborator

dankelley commented Mar 2, 2023

I think the new R method of handling text encoding may be slowing down the interpretation of profile types (deep etc) because they are using grep() calls. In any case, I want to be able to see what actions are taking a long time, and so I am adding lots of timing indications to the debugging info. Output chunks will look like

listening for dogs barking ...
   heard 5 dogs
... took 0.1 seconds
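A wrapper along the following lines would produce chunks of that shape. It is only a sketch: timedStep() is a hypothetical helper, not something in argoFloats, and it uses base R's proc.time() to measure elapsed seconds.

```r
# Hypothetical helper: announce a step, run it, then report the elapsed time.
timedStep <- function(label, expr) {
    message(label, " ...")
    start <- proc.time()[["elapsed"]]
    value <- expr # lazy evaluation: the expression is actually run here
    elapsed <- proc.time()[["elapsed"]] - start
    message("... took ", format(elapsed, digits = 3), " seconds")
    invisible(value)
}

dogs <- timedStep("listening for dogs barking", {
    Sys.sleep(0.1)
    message("   heard 5 dogs")
    5
})
```

Because R evaluates function arguments lazily, the timed block does not run until the line that assigns value, so the start time is taken before the work begins.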

@dankelley
Collaborator

Although I think it's a net gain to be displaying elapsed times in the debugging output, the numbers can be strange (e.g. see below, where loading the file is reported as taking longer than the whole getIndex() call). I am finding that if I use Sys.time() before a fast operation and then after, the results show it took negative time. Perhaps the R interpreter is doing some fancy tricks by isolating code blocks that do not depend on each other and doing them out of order. (Other compilers do that. They do a lot of wacky things, like doing both the "true" and "false" parts of an "if" statement at the same time as the test is done. The idea is that the machine can do 3 things at once without problem, so it's faster to do this and discard results you don't need.)

> i<-getIndex("core",debug=3)
getIndex(filename="core", server=c("ifremer-https,usgodae)", destdir="~/data/argo") { ...
  Converted filename='core' to filename='ar_index_global_prof.txt.gz'.
  Set destfileRda="~/data/argo/ar_index_global_prof.rda".
  getIndex() is about to check the cache
  argoFloatsIsCached(name="core") {
    returning FALSE 
  } argoFloatsIsCached()
  This destfileRda already exists, and its age is 0.784 days.
  Using existing destfileRda, since its age is under 1 days.
  About to load '~/data/argo/ar_index_global_prof.rda'...
  ... took 3.82633 seconds
  Storing this index in memory for this R session.
  argoFloatsStoreInCache(name="core")
  } argoFloatsStoreInCache()
} # getIndex() took 3.723904 seconds
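As a cross-check on numbers like these, base R's system.time() can time a block directly, which avoids subtracting two separate Sys.time() readings by hand. The sketch below is illustrative only; Sys.sleep() stands in for a slow step such as loading the .rda file, and nothing here is taken from the getIndex() source.

```r
# Time a stand-in for a slow step; system.time() returns user, system and elapsed times.
timing <- system.time({
    Sys.sleep(0.2)
})
elapsed <- timing[["elapsed"]] # wall-clock seconds for the whole block
cat(sprintf("... took %.3f seconds\n", elapsed))
```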

@dankelley
Collaborator

Done in commit 1616646 of "develop_kelley" branch. That branch (as it is now, or with slight changes) will likely be merged into "develop" by some time next week.

@dankelley
Collaborator

I merged "develop_kelley" into "develop" about an hour ago (for another issue). Therefore, I am asking @j-harbin to take a look, and to either close this issue or to add more comments so I know what to look at next.

PS. I'm sorry that this and a lot of other issues from March have been ignored for so long. I guess other things came up, and I don't often look at the issues page for this repo (unlike for 'oce', which I check several times per week).

@dankelley
Collaborator

I just noticed this months-old issue. I ran a test, as below, and this convinces me that the code does what it is supposed to do, and so I'm closing the issue. (@j-harbin is now working on other projects, so I will give myself permission to close issues with approval.)

Note that it took nearly 5 minutes (284.6 seconds) to download the index file. That's a new record, I think. Sheesh. Maybe if Argo charged 10 cents per download they could pay for better servers.


R version 4.3.0 (2023-04-21) -- "Already Tomorrow"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin20 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(argoFloats)
> ai <- getIndex("core", debug=3)
getIndex(filename="core", server=c("ifremer-https", "usgodae")", destdir="~/data/argo") { ...
  Set useLocalFile=FALSE
  Converted filename="core" to "ar_index_global_prof.txt.gz".
  Set destfileRda="~/data/argo/ar_index_global_prof.rda"
  argoFloatsIsCached(name="core") {
    returning FALSE 
  } argoFloatsIsCached()
  This destfileRda already exists, and its age is 0.087 days.
  Using existing destfileRda, since its age is under 1 days.
  About to load '~/data/argo/ar_index_global_prof.rda'...
  ... took 7.363779 seconds
  Storing this index in a memory cache, for this R session.
  argoFloatsStoreInCache(name="core")
  } argoFloatsStoreInCache()
} # getIndex() took 7.372535 seconds (returning from location B)
> ai <- getIndex("core", age=0, debug=3)
getIndex(filename="core", server=c("ifremer-https", "usgodae")", destdir="~/data/argo") { ...
  Set useLocalFile=FALSE
  Converted filename="core" to "ar_index_global_prof.txt.gz".
  Set destfileRda="~/data/argo/ar_index_global_prof.rda"
  argoFloatsIsCached(name="core") {
    returning TRUE 
  } argoFloatsIsCached()
  getIndex() is about to check the cache
  argoFloatsWhenCached(name="core") {
    returning 2023-08-07 10:38:23.148 
  } argoFloatsWhenCached()
  cacheAge=1.705079e-08 days, age=0 days
  cacheAge >= age, so the cached index is not being used
  This destfileRda already exists, and its age is 0.087 days.
  Must update destfileRda, since its age equals or exceeds 0.00 days
  downloading a remote file ...
  Allocated temporary file
    '/var/folders/8b/l4h64m1j22v5pb7vj049ff140000gn/T//RtmpGeKA1b/argo1221560eeb49a.gz'.
  About to try downloading an index file ...
     trying 'https://data-argo.ifremer.fr/ar_index_global_prof.txt.gz'.
  ... took 284.5999 seconds
  About to read the header at the start of the index file ...
    ftpRoot=c("ftp://ftp.ifremer.fr/ifremer/argo/dac", "ftp://usgodae.org/pub/outgoing/argo/dac")
    names=c("file", "date", "latitude", "longitude", "ocean", "profiler_type", "institution", "date_update")
  ... took 0.001012087 seconds
  Reading index file contents ...
  ... took 16.93504 seconds
  Setting out-of-range longitude and latitude to NA ...
  ... took 1.850429 seconds
  Decoding times ...
  ... took 1.542963 seconds
  Saving the RDA file as " ~/data/argo/ar_index_global_prof.rda " ...
  ... took 13.83181 seconds
  Cleaning up ...
  removing "/var/folders/8b/l4h64m1j22v5pb7vj049ff140000gn/T//RtmpGeKA1b/argo1221560eeb49a.gz"
  ... took 0.004605055 seconds
  Storing newly-read index in memory for this R session ...
  argoFloatsStoreInCache(name="core")
  } argoFloatsStoreInCache()
} getIndex() took  318.768 seconds (returning from location D)
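To see where the time went, the step timings in the log above can be added up. The snippet below just does that arithmetic; the numbers are copied from the log, while the step labels (download, readHeader and so on) are shorthand made up here, not names used by getIndex().

```r
# Elapsed seconds for each step, copied from the debug output above.
steps <- c(
    download = 284.5999,
    readHeader = 0.001012087,
    readContents = 16.93504,
    fixLonLat = 1.850429,
    decodeTimes = 1.542963,
    saveRda = 13.83181,
    cleanup = 0.004605055
)
total <- 318.768 # as reported on the final line of the log

# Percentage of the total run time spent in each step.
share <- round(100 * steps / total, 1)
print(share)

# Convert two of the logged values into friendlier units.
downloadMinutes <- steps[["download"]] / 60
cacheAgeMs <- 1.705079e-08 * 86400 * 1000 # days -> milliseconds
cat(sprintf("download: %.2f minutes\n", downloadMinutes))
cat(sprintf("cacheAge: %.2f milliseconds\n", cacheAgeMs))
```

The download accounts for roughly nine-tenths of the total, so the server speed, not the local parsing or saving, dominates a forced refresh.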
