Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up temporary space more proactively #35

Closed
jacobsa opened this issue Apr 2, 2015 · 5 comments
Closed

Clean up temporary space more proactively #35

jacobsa opened this issue Apr 2, 2015 · 5 comments
Assignees
Milestone

Comments

@jacobsa
Copy link
Contributor

jacobsa commented Apr 2, 2015

Currently we destroy an inode's temporary files only when the kernel tells us to forget it. I believe the kernel does this only when it is running out of space in its inode cache, at least if the file hasn't been unlinked. In contrast, one temporary file per inode can grow to a lot of disk space, and we may want to clean up earlier.

Consider setting a (configurable) limit on the amount of disk space devoted to temporary files. The limit may be exceeded if we have dirty inodes that add up to more than the limit. But if we are over the limit and have any clean inodes, we will throw away their content (in least recently used order?) until we are under the limit or run out of clean inodes.

How to plumb this in? Some sort of central object used by the object proxies. They use a "grab file if still exists" method when clean, and a "register dirty file" and "unregister dirty file" methods when transitioning into and out of the dirty state. Something like that.

@jacobsa jacobsa self-assigned this Apr 2, 2015
@jacobsa jacobsa added this to the ga milestone Apr 2, 2015
@jacobsa
Copy link
Contributor Author

jacobsa commented May 5, 2015

Rambling stream of consciousness plan:

  • Add a gcsproxy.FileLeaser type:
    • LeaseForRead: You give it an os.File, and it gives you back an io.ReaderAt. The contract is that at any moment it is allowed to revoke the lease for the (temporary, read-only file), close the file, and cause the io.ReaderAt to return a sentinel error indicating this.
    • LeaseForReadWrite: Similarly, you give it an os.File and it returns a file proxy object. It cannot be revoked. The proxy object internally informs the leaser of changes to the file size. It has a Downgrade method that turns it into a regular lease, for use when it is no longer dirty. This must be called eventually. Probably call the proxy object WriteLease.
    • Actually, instead of io.ReaderAt, return something that is io.ReaderAt plus contains a Discard method, for throwing away the lease explicitly. Probably call this ReadLease. This may be slightly better than just using LRU eviction when the kernel tells us to forget an inode.
    • Also allow upgrading a read lease to a write lease if the file is still available.
  • Add a gcsproxy.ReadProxy type:
    • Like ObjectProxy, but only for reading. Only one source generation.
    • Internally, has a read lease for the entire object. Later it can maintain a set of leased files for chunks within the object (for Support small random reads within large files #38).
    • Has a destroy method that drops all of the leases.
    • Can be created with an initial read lease for the entire object.
    • Exposes a method for upgrading, if we still have a valid read lease for the entire object.
  • Update gcsproxy.ObjectProxy:
    • Uses a gcsproxy.ReadProxy when clean.
    • When dirtied, takes out a write lease.
    • Upgrades a read lease from the read proxy if possible.
    • Serves from the write lease while dirty.
  • Unit tests for all of these things.

This will work about how we want for reading, I think. Transitioning from dirty to clean, this will also work fine. From clean to dirty it might not be so great, especially if the work in #38 causes the file to be busted up. Oh well, this may not be a frequent case, and anyway this is a start.

@jacobsa jacobsa modified the milestones: beta, ga May 5, 2015
jacobsa added a commit that referenced this issue May 5, 2015
@jacobsa
Copy link
Contributor Author

jacobsa commented May 5, 2015

By the way, for now let's just have the write lease type stat the temporary file to get the length anew (for reporting to the leaser) on every modification. We can make this more subtle later if we really need to.

jacobsa added a commit that referenced this issue May 5, 2015
jacobsa added a commit that referenced this issue May 7, 2015
@jacobsa
Copy link
Contributor Author

jacobsa commented May 7, 2015

To do:

  • Add lease.NewAutoRefreshingReadLease:
    • Accepts f func() (io.Reader, error) for getting hold of a reader that gives appropriate contents. f must guarantee to always return the same contents (or fail).
    • Returns lease.ReadLease, but guarantees it will never be automatically revoked (only voluntarily)
    • Whenever a read comes in, upgrades an internal read lease if possible, reads from that, and then downgrades.
    • If not, gets a reader through f and a read/write lease, copies into read/write lease, serves from that, then downgrades.
    • Automatically benefits from LRU expiration if it is not frequently used, or never used again.
    • Upgrade then downgrade not expected to continue to be magic.
    • Unit test with a mock lease manager.
  • Rename gcsproxy.ObjectProxy to MutableObject.
  • Create gcsproxy.ReadProxy:
    • For a specific object generation.
    • Accepts a lease.ReadLease for reading, expects it to not automatically be revoked (apparently).
    • Uses that.
    • Has a destroy method that manually revokes, for use by the object proxy.
    • Has a ReleaseForWriting method that returns a read/write lease for the full object contents. Read proxy can no longer be used.
    • Unit tests with mocks.
  • Later, for Support small random reads within large files #38, we can make the constructor for ReadProxy decide to keep an array of wrapped ReadOnlyFiles for pieces of the object instead.
  • Update gcsproxy.MutableObject:
    • Uses a gcsproxy.ReadProxy when clean.
    • When dirtied, uses ReleaseForWriting to upgrade.
    • When synced, creates a new auto-refreshing read lease (maybe through an injected factory function) and reverts to a read proxy.
    • Unit tests with mocks.
  • Add an integration test for gcsproxy.MutableObject:
    • Fake GCS bucket.
    • Small object reads, writes then reads
    • Large object reads, writes then reads

jacobsa added a commit that referenced this issue May 8, 2015
jacobsa added a commit that referenced this issue May 11, 2015
It is a simple wrapper around lease.ReadProxy.

For #35.
jacobsa added a commit that referenced this issue May 11, 2015
For now, we don't do anything useful regarding file leasers.

For #35.
@jacobsa
Copy link
Contributor Author

jacobsa commented May 11, 2015

Still to do:

  • Update documentation reflecting rename of --gcsproxy.temp_dir to --fs.temp_dir.
  • Add an integration test for gcsproxy.MutableObject:
    • Fake GCS bucket.
    • Small object reads, writes then reads
    • Large object reads, writes then reads
  • Expose the file leaser limit with a flag.
  • End to end testing: hack up the binary to not unlink temporary files on creation, then look at actual temp dir usage with various settings of the limit flag.
  • Fix the race TODO added during the work above.

jacobsa added a commit that referenced this issue May 11, 2015
jacobsa added a commit that referenced this issue May 11, 2015
jacobsa added a commit that referenced this issue May 11, 2015
@jacobsa
Copy link
Contributor Author

jacobsa commented May 11, 2015

Confirmed everything works with hacked up binary at 2e3c25e.

jacobsa added a commit that referenced this issue May 11, 2015
@jacobsa jacobsa closed this as completed May 11, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant