cache parsed neuron data #20

Open
jefferis opened this issue Apr 27, 2015 · 4 comments

Comments

@jefferis
Collaborator

Not sure of a good strategy for this yet. One simple thing would be to hash the returned JSON and at least save ourselves the trouble of re-parsing.

As a little test, a call like this:

read.neurons.catmaid(<42pnids>)

breaks down roughly as:

  • 20% GET
  • 20% parse JSON to an R list of lists
  • 20% list2df (i.e. convert the JSON list structures to data.frames)
  • 40% parse data.frame to neuron

So it looks like this strategy could give a 3-5x speedup, which sounds interesting. But then the question is where we would do this. If we insert something in catmaid_fetch we could make it very general and save the JSON parsing. But if we worked at the level of read.neuron.catmaid, we should be able to save everything.
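As a rough sketch of the JSON-hashing idea (parse_json_cached and the in-memory cache are hypothetical names, and this assumes the digest and jsonlite packages):

library(digest)   # digest() computes md5 and other hashes
library(jsonlite) # fromJSON() parses JSON text

# in-memory cache keyed by the md5 of the raw JSON text, so that
# identical server responses are only parsed once
.parse_cache <- new.env(parent = emptyenv())

parse_json_cached <- function(json_text) {
  key <- digest(json_text, algo = "md5")
  if (!exists(key, envir = .parse_cache, inherits = FALSE))
    assign(key, fromJSON(json_text, simplifyVector = FALSE), envir = .parse_cache)
  get(key, envir = .parse_cache, inherits = FALSE)
}

On its own this would only save the 20% JSON-parsing step; caching at the level of read.neuron.catmaid instead would save the list2df and neuron-parsing steps too, as noted above.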

Another strategy would be to cache the request itself – this could again involve catmaid_fetch, with a hash of the URL/POST data along with some kind of timestamp checking.

@jefferis
Collaborator Author

Some more thoughts on the above. It seems to me that both types of caching

  • request caching
  • parsed result caching

would be interesting.

For the request caching, I would think one should basically create a directory hierarchy under a root directory specified by an option:

options(catmaid.cache.root=TRUE)

Here TRUE would imply something like rappdirs::user_cache_dir("rpkg-catmaid").

The hierarchy should mirror the request URL, e.g.

"rpkg-catmaid/<rooturl>/1/10418394/0/0/compact-skeleton"

Underneath that there should be an .rds object named by the md5 hash of the content (or perhaps the ETag).

One could then imagine having a second option:

options(catmaid.cache.expiry=3600)

which sets the cache expiry time in seconds.
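A minimal sketch of how these two options might combine (every helper name here is hypothetical; nothing like this exists in the package yet):

# root of the on-disk cache, resolved from the option suggested above
cache_root <- function() {
  root <- getOption("catmaid.cache.root")
  if (isTRUE(root)) root <- rappdirs::user_cache_dir("rpkg-catmaid")
  root
}

# e.g. "<root>/<rooturl>/1/10418394/0/0/compact-skeleton/<md5>.rds"
# (the URL would need sanitising before being used as a path)
cache_path <- function(request_path, content_md5) {
  file.path(cache_root(), request_path, paste0(content_md5, ".rds"))
}

# a cached file is usable if it exists and is younger than the expiry
cache_fresh <- function(path) {
  expiry <- getOption("catmaid.cache.expiry", 3600)
  file.exists(path) &&
    difftime(Sys.time(), file.mtime(path), units = "secs") < expiry
}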

@jefferis
Collaborator Author

For the parsed result caching, something like the md5 of the raw contents as the directory and the function name (match.call()[[1]]) as the file name would be an option.
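A sketch of that layout, reusing the hypothetical cache_root helper from the previous comment:

# directory = md5 of the raw response, file = name of the calling function
parsed_cache_path <- function(raw_content, fun_name) {
  dir <- file.path(cache_root(), digest::digest(raw_content, algo = "md5"))
  file.path(dir, paste0(fun_name, ".rds"))
}

# inside e.g. read.neurons.catmaid one might then write:
#   fname <- as.character(match.call()[[1]])
#   saveRDS(parsed, parsed_cache_path(raw, fname))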

@jefferis jefferis reopened this Aug 9, 2016
@jefferis
Collaborator Author

jefferis commented Oct 20, 2017

I have noticed a couple of options for this, but nothing looks perfect so far.

  1. https://github.com/nealrichardson/httpcache provides replacement GET/POST commands for the httr equivalents. However, since we need to be able to change caching behaviour at runtime, we need some more control. Furthermore, we need to use the cachedPOST function because the regular POST function is assumed to change state (see the sketch after the list below).
  2. https://rud.is/b/2017/08/22/caching-httr-requests-this-means-warc/

Option 1 has the big advantage of being on CRAN. Either option may need

  • some special logic in catmaid_fetch
  • an argument controlling whether caching should be allowed (set by the caller – it is not always possible to infer from a POST request whether it changes state, and POST requests are used extensively by the CATMAID API)
  • options to control the cache location, invalidation, etc.
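As a sketch of how option 1 might slot into catmaid_fetch (the function body is invented for illustration; GET, POST and cachedPOST are the httpcache replacements mentioned above, and cache is the hypothetical caller-set flag):

library(httpcache)

# the caller sets `cache`, since only the caller knows whether a given
# POST is a pure query or actually changes server state
catmaid_fetch_sketch <- function(url, body = NULL, cache = FALSE) {
  if (is.null(body)) {
    GET(url)                     # httpcache caches GET requests by default
  } else if (cache) {
    cachedPOST(url, body = body) # read-only POST queries can be cached
  } else {
    POST(url, body = body)       # state-changing POST, never cached
  }
}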

@jefferis
Collaborator Author

See #119 for a cache/mock testing approach.
