
ENH: on_full policy #17

Closed · wants to merge 9 commits

Conversation

@llllllllll (Member)

Allows a user to set a maximum disk usage, plus a policy defining what to do when that maximum is reached and a new entry must be added.

The two options are currently:

  1. raise_: raise an OSError indicating you ran out of space.
  2. pop_lru: rotate the least recently used element out of the chest.

Users may pass any callable here; however, these two are defined for them.
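
To make the hook's contract concrete, here is a minimal sketch of what the two built-in policies could look like, assuming (as the diff excerpt later in this thread shows via self.on_full(self)) that the policy is called with the chest itself. The least_recently_used_key accessor and the constructor keywords in the usage comment are illustrative assumptions, not the exact chest API.

def raise_(chest):
    """Refuse to grow: signal that the disk budget is exhausted."""
    raise OSError("chest is full: maximum disk usage exceeded")

def pop_lru(chest):
    """Reclaim disk space by evicting the least recently used entry."""
    key = chest.least_recently_used_key()  # hypothetical accessor
    del chest[key]

# Illustrative usage, with assumed constructor keywords:
# c = Chest(available_disk=100 * 2**20, on_full=pop_lru)  # 100 MiB cap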

@llllllllll (Member Author)

Hey, this feature has been working pretty well for me. Do you think we could merge this when you get a chance to look it over?

self._dump(data, tmp)
bs = tmp.getvalue()
# Invoke the user's policy repeatedly until the pending write
# fits within the disk budget.
while self.disk_usage + len(bs) > self.available_disk:
    self.on_full(self)
Member
What is the overhead of this like when there are many keys on disk? This seems like potentially a lot of file system access.

Member

Should we maintain a total instead?

Member Author

The issue is that I couldn't find a general way to account for the filesystem overhead. I wasn't positive how big all of the files would actually be on disk, so I figured it was safest to ask the OS.
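
For context, asking the OS amounts to something like the following sketch (chest's actual key-to-path layout is not shown; this only illustrates the per-key cost):

import os

def disk_usage(directory):
    # One getsize (stat) call per spilled file, so the cost grows
    # linearly with the number of keys on disk.
    return sum(os.path.getsize(os.path.join(directory, name))
               for name in os.listdir(directory))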

Member

Perhaps we could maintain a total, as we do for memory_usage, adding to or subtracting from it as we add or remove files from disk.

Alternatively, can you measure the overhead of this approach when we have one million files? I expect it to be non-negligible at that scale.
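
A sketch of that bookkeeping approach, with hypothetical internal method names (chest's real write/remove paths will differ):

import os

class DiskTotalSketch:
    """Keep a running disk_usage total instead of re-stat-ing every file."""

    def __init__(self):
        self.disk_usage = 0  # bytes currently written to disk

    def _write(self, path, payload):
        # Hypothetical internal write path, for illustration only.
        with open(path, 'wb') as f:
            f.write(payload)
        self.disk_usage += len(payload)  # O(1) bookkeeping per write

    def _remove(self, path):
        self.disk_usage -= os.path.getsize(path)
        os.remove(path)

Note that this counts logical bytes rather than actual on-disk blocks, which is exactly the filesystem-overhead caveat raised above.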

@mrocklin (Member)

I apologize for letting this linger for so long. Thanks for the ping.

@llllllllll (Member Author)

No worries. This isn't blocking me because I can always deploy from my branch.

@llllllllll closed this Sep 13, 2018