cache parsed neuron data #20

Open
jefferis opened this issue Apr 27, 2015 · 4 comments

Comments

@jefferis
Collaborator

Not sure of a good strategy for this yet. One simple thing would be to hash the returned JSON and at least save ourselves the trouble of re-parsing.

As a little test, a call like this:

read.neurons.catmaid(<42pnids>)

breaks down roughly as:

  • 20% GET
  • 20% parse JSON to an R list of lists
  • 20% list2df (i.e. convert the JSON list structures to data.frames)
  • 40% parse data.frame to neuron

So it looks like this strategy could give a 3-5x speedup, which sounds interesting. But then the question is where we would do this. If we insert something in catmaid_fetch we could make it very general and save the JSON parsing. But if we worked at the level of read.neuron.catmaid, we should be able to save everything.
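As a rough sketch of the JSON-hashing idea (parse_json_cached and the in-memory cache are hypothetical names, and this assumes the digest and jsonlite packages):

library(digest)   # digest() computes md5 and other hashes
library(jsonlite) # fromJSON() parses JSON text

# in-memory cache keyed by the md5 of the raw JSON text, so that
# identical server responses are only parsed once
.parse_cache <- new.env(parent = emptyenv())

parse_json_cached <- function(json_text) {
  key <- digest(json_text, algo = "md5")
  if (!exists(key, envir = .parse_cache, inherits = FALSE))
    assign(key, fromJSON(json_text, simplifyVector = FALSE), envir = .parse_cache)
  get(key, envir = .parse_cache, inherits = FALSE)
}

On its own this would only save the 20% JSON-parsing step; caching at the level of read.neuron.catmaid instead would save the list2df and neuron-parsing steps too, as noted above.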

Another strategy would be to cache the request itself – this could again involve catmaid_fetch, with a hash of the URL/POST data along with some kind of timestamp checking.

@jefferis
Collaborator Author

Some more thoughts on the above. It seems to me that both types of caching

  • request caching
  • parsed result caching

would be interesting.

For the request caching, I would think one should basically create a directory hierarchy under a root directory specified by an option:

options(catmaid.cache.root=TRUE)

Here TRUE would imply something like rappdirs::user_cache_dir("rpkg-catmaid").

The hierarchy should mirror the request URL, e.g.

"rpkg-catmaid/<rooturl>/1/10418394/0/0/compact-skeleton"

Underneath that there should be an .rds object named by the md5 hash of the content (or perhaps the ETag).

One could then imagine having a second option:

options(catmaid.cache.expiry=3600)

which sets the cache expiry time in seconds.
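A minimal sketch of how these two options might combine (every helper name here is hypothetical; nothing like this exists in the package yet):

# root of the on-disk cache, resolved from the option suggested above
cache_root <- function() {
  root <- getOption("catmaid.cache.root")
  if (isTRUE(root)) root <- rappdirs::user_cache_dir("rpkg-catmaid")
  root
}

# e.g. "<root>/<rooturl>/1/10418394/0/0/compact-skeleton/<md5>.rds"
# (the URL would need sanitising before being used as a path)
cache_path <- function(request_path, content_md5) {
  file.path(cache_root(), request_path, paste0(content_md5, ".rds"))
}

# a cached file is usable if it exists and is younger than the expiry
cache_fresh <- function(path) {
  expiry <- getOption("catmaid.cache.expiry", 3600)
  file.exists(path) &&
    difftime(Sys.time(), file.mtime(path), units = "secs") < expiry
}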

@jefferis
Collaborator Author

For the parsed result caching, something like the md5 of the raw contents as the directory and the function name (match.call()[[1]]) as the file name would be an option.
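A sketch of that layout, reusing the hypothetical cache_root helper from the previous comment:

# directory = md5 of the raw response, file = name of the calling function
parsed_cache_path <- function(raw_content, fun_name) {
  dir <- file.path(cache_root(), digest::digest(raw_content, algo = "md5"))
  file.path(dir, paste0(fun_name, ".rds"))
}

# inside e.g. read.neurons.catmaid one might then write:
#   fname <- as.character(match.call()[[1]])
#   saveRDS(parsed, parsed_cache_path(raw, fname))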

@jefferis jefferis reopened this Aug 9, 2016
@jefferis
Collaborator Author

jefferis commented Oct 20, 2017

I have noticed a couple of options for this, but nothing looks perfect so far.

  1. https://github.com/nealrichardson/httpcache provides replacement GET/POST commands for the httr equivalents. However, since we need to be able to change caching behaviour at runtime, we need some more control. Furthermore, we need to use the cachedPOST function because the regular POST function is assumed to change state (see the sketch after the list below).
  2. https://rud.is/b/2017/08/22/caching-httr-requests-this-means-warc/

Option 1 has the big advantage of being on CRAN. Either option may need

  • some special logic in catmaid_fetch
  • an argument controlling whether caching should be allowed (set by the caller – it is not always possible to infer from a POST request whether it changes state, and POST requests are used extensively by the CATMAID API)
  • options to control the cache location, invalidation, etc.
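As a sketch of how option 1 might slot into catmaid_fetch (the function body is invented for illustration; GET, POST and cachedPOST are the httpcache replacements mentioned above, and cache is the hypothetical caller-set flag):

library(httpcache)

# the caller sets `cache`, since only the caller knows whether a given
# POST is a pure query or actually changes server state
catmaid_fetch_sketch <- function(url, body = NULL, cache = FALSE) {
  if (is.null(body)) {
    GET(url)                     # httpcache caches GET requests by default
  } else if (cache) {
    cachedPOST(url, body = body) # read-only POST queries can be cached
  } else {
    POST(url, body = body)       # state-changing POST, never cached
  }
}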

@jefferis
Collaborator Author

See #119 for a cache/mock testing approach.
