
Add time based update for caching in fs.wrap.WrapCachedDir #140

Closed
media-proxy wants to merge 11 commits into PyFilesystem:master from media-proxy:master

Conversation

@media-proxy

A new feature for WrapCachedDir:
livetime: after this time a cached entry is considered too old and will be re-cached.
speedup: the cache time of up-to-date entries is renewed, so cached entries with many accesses are served from the cache more often.
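The proposed behaviour could be sketched roughly like this (a minimal illustration with made-up names; `livetime` and `speedup` mirror the parameters described above, not the actual WrapCachedDir internals):

```python
import time

class TimedCache:
    """Illustrative sketch: entries expire after `livetime` seconds,
    and with speedup=True an entry's timestamp is renewed on each hit,
    so frequently accessed entries stay cached longer."""

    def __init__(self, livetime=10.0, speedup=False):
        self.livetime = livetime
        self.speedup = speedup
        self._cache = {}  # key -> (timestamp, value)

    def put(self, key, value):
        self._cache[key] = (time.time(), value)

    def get(self, key):
        entry = self._cache.get(key)
        if entry is None:
            return None
        timestamp, value = entry
        if time.time() - timestamp > self.livetime:
            # Entry is too old; drop it so the caller re-fetches.
            del self._cache[key]
            return None
        if self.speedup:
            # Renew the timestamp on access.
            self._cache[key] = (time.time(), value)
        return value
```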

To do:
Clean up old entries from time to time.

Question:
The scandir implementation does not use the page parameter in cache_key, so I think calling scandir with page=1 first and page=None second would return the same (cached) result.
One idea is to generate cache_key like:
cache_key = (_path, frozenset(namespaces or ()), page)
Am I right?
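For illustration, the suggested key with `page` included could be built like this (`make_cache_key` is a hypothetical helper, not the actual implementation):

```python
def make_cache_key(_path, namespaces=None, page=None):
    # Including `page` in the key means scandir(path, page=1) and
    # scandir(path, page=None) are cached as separate entries.
    return (_path, frozenset(namespaces or ()), page)
```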

@coveralls

Coverage Status

Coverage decreased (-0.9%) to 99.027% when pulling 0364b59 on media-proxy:master into 3a691e0 on PyFilesystem:master.


@coveralls

coveralls commented Feb 11, 2018

Coverage Status

Coverage increased (+0.0001%) to 99.951% when pulling c1157ff on media-proxy:master into 3a691e0 on PyFilesystem:master.

@media-proxy
Author

A second question: is it necessary to use a threading.Lock() for the caching dict?
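A sketch of what locking the cache dict could look like (illustrative class, not part of WrapCachedDir). In CPython, single dict operations are atomic under the GIL, but compound check-then-update sequences, such as expiring an entry and re-fetching it, are not, so a lock is the safe choice:

```python
import threading

class LockedCache:
    """Guard the cache dict with a threading.Lock so concurrent
    lookups and evictions can't leave it in an inconsistent state."""

    def __init__(self):
        self._lock = threading.Lock()
        self._cache = {}

    def get(self, key, default=None):
        with self._lock:
            return self._cache.get(key, default)

    def put(self, key, value):
        with self._lock:
            self._cache[key] = value

    def evict(self, key):
        with self._lock:
            # pop with a default so evicting a missing key is a no-op.
            self._cache.pop(key, None)
```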

@merlink01

Third question: why not use a cached listdir here?

@merlink01

OK, here are all the questions in a row:

  1. scandir seems buggy if page is not saved in the cache key:
    cache_key = (_path, frozenset(namespaces or ()), page)
    I think this could fix it.

  2. Do we need a lock for the cache object?

  3. Why not serve listdir from the cache?

  4. fs.time caused some problems for me; wouldn't it be better to rename it so it doesn't share the name of the standard library's time module?

@willmcgugan
Member

This looks interesting. I'm afraid I'm going away on holiday tomorrow for 3 weeks. Will take a look when I get back...

@willmcgugan
Member

Sorry for the delay, haven't been back long.

  1. Good point. That needs to be fixed.

  2. I would say so. I'd suspect that two threads working with the cache might get it into an inconsistent state.

  3. Not sure why I didn't implement a listdir. Looks like it should be there!

  4. I guess you fixed it with an absolute import? It's not part of any public interface, so I wouldn't object to renaming it.
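One possible shape for the missing listdir, assuming it can be derived from the already-cached scandir results (`CachedDirSketch` and `Info` are illustrative stand-ins, not the real fs classes):

```python
from collections import namedtuple

# Minimal stand-in for an fs Info resource object.
Info = namedtuple("Info", "name")

class CachedDirSketch:
    """Illustrative: scandir is assumed to serve results from cache,
    so listdir can reuse it instead of hitting the backing filesystem."""

    def __init__(self, entries):
        self._entries = entries  # pretend these came from the cache

    def scandir(self, path, namespaces=None, page=None):
        return iter(self._entries)

    def listdir(self, path):
        # Derive the name list from the (cached) directory scan.
        return [info.name for info in self.scandir(path)]
```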

To be honest, I'm not convinced the 'live time' functionality is useful enough to be a supported feature of the cache dir. The only time you would need to refresh the cache is if some other process is writing to the filesystem, and then the live time becomes somewhat arbitrary. Even 10 seconds is enough for the filesystem to get out of sync with the cache.

So I wouldn't want to overload the CacheDir with this functionality. If it's to go into the main lib I would favour it being a different class, perhaps one extending CacheDir.

Thanks for the input, but I don't think I'm going to merge this one. You have highlighted that CacheDir needs some work. I'll put that on the to do list...

with self.assertRaises(errors.ResourceNotFound):
    fs.getinfo('/foofoo')

def test_cachedir_time(self):
Member

Are you familiar with https://github.com/spulec/freezegun? You can get rid of those sleeps in the tests.
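The pattern freezegun enables, controlling the clock instead of sleeping, can also be sketched with only the standard library by patching time.time (a hedged illustration, not the actual test):

```python
import time
from unittest import mock

def test_cache_expiry_without_sleep():
    """Same idea as freezegun's freeze_time: the test controls the
    clock instead of calling time.sleep()."""
    fake_now = [1000.0]
    with mock.patch("time.time", side_effect=lambda: fake_now[0]):
        cached_at = time.time()
        fake_now[0] += 30.0  # "advance" the clock by 30 seconds
        # An entry cached 30 seconds ago now looks expired.
        assert time.time() - cached_at == 30.0
```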

@media-proxy
Author

Hi Will,

thank you for your reply.

I was also on holiday, so sorry for the late reply.

There are two ideas behind livetime:

  1. If you use a storage backend like an FTP server, it can happen that another program writes into the folder. Without some sort of refresh you will never see those changes.

  2. RAM usage: if the cache only ever grows, you can't use this in long-running processes with many files.
    With livetime we can clean up the dict from time to time.

Another idea could be to implement a refresh function inside the filesystem, so the external app could decide when the update should happen.
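The explicit-refresh idea could look roughly like this (hypothetical class and method names; in practice the method would hang off the filesystem wrapper):

```python
class RefreshableCacheSketch:
    """Illustrative: the application decides when cached entries
    should be invalidated, instead of relying on a timeout."""

    def __init__(self):
        self._cache = {}  # path -> cached value

    def put(self, path, value):
        self._cache[path] = value

    def get(self, path):
        return self._cache.get(path)

    def refresh(self, path=None):
        """Drop all cached entries, or only those under `path`."""
        if path is None:
            self._cache.clear()
        else:
            for key in [k for k in self._cache if k.startswith(path)]:
                del self._cache[key]
```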

It would be OK for me to create an extended cache object based on this.

Thanks for the tip about freezegun; I hadn't heard of it before. I'll have a look at it.
