
Add support for caching in trade-off for reduced guarantees #29

Closed
jacobsa opened this issue Mar 30, 2015 · 5 comments

jacobsa commented Mar 30, 2015

Right now, gcsfuse caches nothing and allows the kernel to cache nothing, in order to support the consistency guarantees documented in semantics.md. But this makes things slow, particularly during kernel path resolution (which happens very frequently).

There is probably room for a --go_fast flag that users can enable if they are okay with relaxed guarantees. I would start with just allowing the kernel to cache attributes and entries, and see if anything else is truly needed. If so, the following additional things may or may not be helpful (measure first to find out!):

  • A "stat cache" mapping object name to most recent gcs.Object record for it, perhaps with TTL. Probably also supports negative entries.
  • A "listing cache" that caches listings for a prefix. This is more subtle, so hopefully we don't need it.
jacobsa commented Apr 28, 2015

Plan

  • Add a gcs.NewFastStatBucket function
    • Accepts wrapped bucket and TTL
    • Maintains a mapping from name to latest record known for that name.
    • Serves from this for StatObject when possible.
    • Invalidates in DeleteObject.
    • Invalidates then updates in CreateObject and UpdateObject.
    • When updating, don't clobber a newer generation.
    • Unit test with mock wrapped bucket.
    • Integration test that sanity checks each method, plus invalidation.
  • Set up gcsfuse to use this when enabled. Document this.
  • See how the ls -l case performs (ls takes a long time for medium-sized directories #39).
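The "don't clobber a newer generation" rule from the plan above can be shown in a few lines. This is a hedged sketch, not the actual `gcs.NewFastStatBucket` implementation; `Object` and `insert` are hypothetical names for the example.

```go
package main

import "fmt"

// Object is a hypothetical, simplified stand-in for gcs.Object.
type Object struct {
	Name       string
	Generation int64
}

// insert adds o to the cache unless a newer generation is already
// present, implementing "when updating, don't clobber a newer
// generation": a stale record must never replace a fresher one.
func insert(cache map[string]*Object, o *Object) {
	if existing, ok := cache[o.Name]; ok && existing.Generation > o.Generation {
		return // keep the newer record
	}
	cache[o.Name] = o
}

func main() {
	cache := make(map[string]*Object)
	insert(cache, &Object{Name: "foo", Generation: 2})
	insert(cache, &Object{Name: "foo", Generation: 1}) // stale; ignored
	fmt.Println(cache["foo"].Generation) // 2
}
```

This matters because responses from concurrent CreateObject/UpdateObject/StatObject calls can arrive out of order, so an older response may be processed after a newer one.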

jacobsa commented Apr 29, 2015

This is mostly implemented at e82b91f, but it doesn't yet make things fast because we have no negative cache. One is needed because each lookup stats both the file name and the directory name.

So, still to do:

jacobsa commented Apr 29, 2015

By the way, negative caching will make the second ls -l very fast, but the first will benefit only from positive caching.

The only way I see to make the first very fast is to cache a mapping from name to file/dir/both flags on the directory inode when reading it, then use that to decide what to stat. Yet another kind of caching, with an inconsistent source. Ugh.
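The name-to-file/dir/both mapping described above could be built while listing the directory. The sketch below is hypothetical (names like `recordListing` and `entryType` are invented for illustration); it relies only on the GCS convention that directory placeholder objects have names ending in "/".

```go
package main

import "fmt"

// entryType records whether a name in a directory is backed by a file
// object ("foo"), a directory object ("foo/"), or both — gleaned while
// listing, and usable later to decide which stat calls to skip.
type entryType uint8

const (
	typeFile entryType = 1 << iota
	typeDir
)

// recordListing folds a bucket listing into a name -> type map. Object
// names ending in "/" are directory placeholders; others are files.
// A name present both ways gets both bits set.
func recordListing(names []string) map[string]entryType {
	types := make(map[string]entryType)
	for _, n := range names {
		if n == "" {
			continue
		}
		if last := len(n) - 1; n[last] == '/' {
			types[n[:last]] |= typeDir
		} else {
			types[n] |= typeFile
		}
	}
	return types
}

func main() {
	types := recordListing([]string{"a", "b/", "c", "c/"})
	fmt.Println(types["a"]&typeFile != 0, types["a"]&typeDir != 0) // true false
	fmt.Println(types["c"]&typeFile != 0, types["c"]&typeDir != 0) // true true
}
```

With such a map on the directory inode, a lookup for "a" could skip the directory stat and a lookup for "b" could skip the file stat — at the cost of a second cache whose source can drift out of sync with the bucket, as the comment above laments.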

jacobsa added a commit to jacobsa/gcloud that referenced this issue Apr 30, 2015
jacobsa commented Apr 30, 2015

At 711b5f1a38dbe3ea7387e9b1accd68ee86a99277: ls -l on a 500-file bucket without --stat_caching_ttl set currently takes about 1m. With --stat_caching_ttl 10m, the first takes about 20s and subsequent ones take about 2s. (These numbers are highly variable, seemingly due to either GCS or GCE performance.) Still much slower than I would like, but I think this particular bug is closed. Further work in #39.

(All of this was from asia-east1-a to a standard bucket in Asia.)

jacobsa closed this as completed Apr 30, 2015
jacobsa commented Apr 30, 2015

Oops, need to document the flag.

jacobsa reopened this Apr 30, 2015