Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New class zict.Cache #65

Merged
merged 5 commits into from
Mar 15, 2022
Merged

New class zict.Cache #65

merged 5 commits into from
Mar 15, 2022

Conversation

crusaderky
Copy link
Collaborator

@crusaderky crusaderky commented Mar 11, 2022

@jakirkham
Copy link
Member

Have we looked at cachetools?

@crusaderky
Copy link
Collaborator Author

crusaderky commented Mar 14, 2022

cachetools.Cache does a lot more than this new class - because it's not composable - and also a lot less, for the exact same reason.

Technically speaking,
cachetools.Cache(maxsize, weightfunc)
is functionally identical to
zict.Cache({}, zict.LRU(maxsize, {}, weight=weightfunc).

Everything else that zict allows - namely,

  • having a custom subclass of zict.Func as the data, or
  • having a weakref.WeakValueDict as the cache

both of which are key features here, cachetools does not do.

Copy link
Member

@fjetter fjetter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should briefly talk about naming but otherwise this looks good!

cache: MutableMapping
Fast cache for reads from data. This mapping may lose keys on its own; e.g. it
could be a LRU.
update_on_set: bool, optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have an example where update_on_set=False is helpful?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not in distributed.
It boils down to the temporal locality patterns of the library user. Sometimes writing a key/value pair does not imply likelihood that the same key will be read back sooner than the rest.

For example, somebody may write the output of a nightly backup batch job to s3fs or some other very slow filesystem; being a backup, 9 times out of 10 nobody is going to read it back. When people do want to read it back though, they are very likely to access the same files multiple times (assuming the backup is not just dumped back on top of live).

zict/cache.py Outdated

else:

class WeakRefCache(weakref.WeakValueDictionary):
Copy link
Member

@fjetter fjetter Mar 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a bit confusing having the Cache and WeakRefCache classes in here although WeakRefCache is not a Cache. Maybe one of the classes should be renamed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this one should be called WeakValueMapping to stick to zict nomenclature?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed

@jakirkham
Copy link
Member

Guessing the conflicts are related to PR ( #66 )?

@crusaderky
Copy link
Collaborator Author

fixed conflicts

doc/source/index.rst Outdated Show resolved Hide resolved
@crusaderky crusaderky merged commit 911779c into dask:main Mar 15, 2022
@crusaderky crusaderky deleted the cache branch March 15, 2022 09:27
crusaderky added a commit to crusaderky/zict that referenced this pull request Mar 15, 2022
@jrbourbeau jrbourbeau mentioned this pull request Apr 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Spill to disk may cause data duplication
3 participants