
Allow for parallelism within gcsfuse #23

Closed
jacobsa opened this issue Mar 24, 2015 · 11 comments


jacobsa commented Mar 24, 2015

As of the changes in d054a1a, every operation runs on a single goroutine, blocking further fuse operations until it completes. This is good because it fixes the race discussed in jacobsa/fuse#3, but it is certainly not optimal.

We should decide which types of operations need to be serialized (for example, WriteFile/SyncFile/FlushFile for a particular inode) and build queues for those, spawning a goroutine whenever a queue goes from empty to non-empty. All other operations can simply receive their own goroutine. As an optimization, operations that don't block don't need a goroutine at all.
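A minimal sketch of that queueing idea, using hypothetical names (opQueue, Enqueue) rather than anything actually in gcsfuse:

```go
package sketch

import "sync"

// opQueue serializes operations for a single inode. A worker goroutine
// exists only while the queue is non-empty: Enqueue spawns one on the
// empty -> non-empty transition, and drain exits once the queue drains.
type opQueue struct {
	mu      sync.Mutex
	pending []func()
	running bool
}

// Enqueue adds op to the queue, spawning a worker if none is running.
func (q *opQueue) Enqueue(op func()) {
	q.mu.Lock()
	defer q.mu.Unlock()

	q.pending = append(q.pending, op)
	if !q.running {
		q.running = true
		go q.drain()
	}
}

// drain runs queued operations in FIFO order until none remain.
func (q *opQueue) drain() {
	for {
		q.mu.Lock()
		if len(q.pending) == 0 {
			q.running = false
			q.mu.Unlock()
			return
		}
		op := q.pending[0]
		q.pending = q.pending[1:]
		q.mu.Unlock()

		op()
	}
}
```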

@jacobsa jacobsa self-assigned this Mar 24, 2015
@jacobsa jacobsa added this to the beta milestone Mar 24, 2015
jacobsa added a commit that referenced this issue Mar 24, 2015
jacobsa added a commit that referenced this issue Apr 1, 2015
Making it clearer it doesn't block. For #23.

jacobsa commented Apr 1, 2015

Operations where we block on something long-running, and for which we may therefore want to consider dropping the file system lock (see the sketch after these lists):

  • LookUpInode calls StatObject.
  • GetInodeAttributes calls StatObject.
  • SetInodeAttributes may fault in data from GCS, and calls StatObject.
  • MkDir creates an object.
  • CreateFile creates an object.
  • RmDir deletes an object.
  • Unlink deletes an object.
  • ReadDir calls ListObjects.
  • ReadFile may read from GCS.
  • WriteFile may read from GCS.
  • SyncFile writes to GCS.
  • FlushFile writes to GCS.

Others:

  • Init does nothing.
  • ForgetInode may delete local files, but doesn't call out to the network.
  • OpenDir only consults local indexes.
  • ReleaseDirHandle only manipulates local indexes.
  • OpenFile only consults local indexes.
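To make the distinction concrete, here is a minimal sketch of dropping the file system lock around a network call. The types and signatures are invented for illustration; they are not the real gcsfuse or GCS client APIs:

```go
package sketch

import "sync"

// Illustrative types only; the real gcsfuse types differ.
type attrs struct{ Size int64 }

type gcsBucket interface {
	// StatObject stands in for a blocking GCS round trip.
	StatObject(name string) (attrs, error)
}

type fileSystem struct {
	mu        sync.Mutex
	attrCache map[string]attrs
	bucket    gcsBucket
}

// statAttributes holds fs.mu only while consulting local state, and
// releases it before making the network call.
func (fs *fileSystem) statAttributes(name string) (attrs, error) {
	fs.mu.Lock()
	cached, ok := fs.attrCache[name]
	fs.mu.Unlock()

	if ok {
		return cached, nil
	}

	// Long-running; fs.mu must not be held here.
	return fs.bucket.StatObject(name)
}
```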

jacobsa added a commit to jacobsa/fuse that referenced this issue Apr 1, 2015

jacobsa commented Apr 1, 2015

Since I've thought hardest about the write/flush/sync case, I will try this there first.


jacobsa commented Apr 1, 2015

Note to self: make sure to test with the Go race detector.
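(For reference, the race detector is enabled with the standard `-race` flag on the Go toolchain, e.g. `go test -race ./...` or `go build -race`.)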


jacobsa commented Apr 1, 2015

I've become less confident that operations on a single inode can be parallelized. For example: will fuse send concurrent read and write requests for the same region of a file, where the user expects the write to happen before the read?

Started a fuse-devel thread asking for documentation. In the meantime, it's probably safest to serialize all operations on a particular inode.

jacobsa added a commit that referenced this issue Apr 1, 2015
Properties of the new definition:

 *  The key of the index is an invariant property of the indexed inode.

 *  The key of the index doesn't contain a source generation, since we
    never strictly need to search by this.

 *  In order to add an entry for an object, you must hold the lock for
    any other inode for that object's name in the index, excluding
    concurrent changes to that inode's generation.

I hope that the new definition will aid in relaxing which locks we hold
during long-lived operations (issue #23). In particular, I don't think
we'll need to hold the file system lock around FlushFile or SyncFile.
jacobsa added a commit that referenced this issue Apr 1, 2015
The new spec allows fileSystem.CheckInvariants to not be racy, since it
doesn't lock inodes (and ideally we don't want it to).

For #23.

jacobsa commented Apr 1, 2015

Notes on lock ordering

Because I'm thinking about it.

Rule: Define a strict partial order < for the locks in your program. Let A and B be locks. Acquire B while holding A only if A < B.

Theorem: If a program follows the rule, it will not deadlock due to lock ordering.

Proof: Assume that a program deadlocks due to lock ordering, i.e. there are locks A and B such that thread 1 acquires A then B, and thread 2 acquires B then A. If A == B, then the rule is violated by both threads, so assume that A != B. If A is incomparable with B, then the rule is violated by both threads, so assume without loss of generality that A < B. Then by asymmetry of <, it is not the case that B < A, so thread 2 is violating the rule. (The same argument extends to deadlock cycles involving more than two locks: if every thread followed the rule, the cycle would give A < B < … < A, and hence A < A by transitivity, contradicting the irreflexivity of a strict partial order.) QED

In our case, the partial order is boring: all inode locks come before the file system lock, and inode locks are mutually incomparable. It makes sense for the inode locks to come before the file system lock because the latter should be held only for brief non-blocking operations, while the inode locks are held while potentially talking to the network. If we reversed the order, we would block on acquiring inode locks, transitively blocking on the network, while holding the file system lock.
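Sketched in code, the discipline looks like this (hypothetical types; the real gcsfuse structures differ):

```go
package sketch

import "sync"

type inode struct {
	mu sync.Mutex // may be held across network calls
}

type fileSystem struct {
	mu sync.Mutex // held only for brief, non-blocking work
}

// syncInode respects the partial order: it takes in.mu first (and keeps
// it while doing slow work), then takes fs.mu only briefly afterwards.
// Taking fs.mu first and then in.mu would violate the order.
func (fs *fileSystem) syncInode(in *inode) {
	in.mu.Lock()
	defer in.mu.Unlock()

	// ... potentially slow work, e.g. a GCS round trip ...

	fs.mu.Lock()
	// ... brief, non-blocking index bookkeeping ...
	fs.mu.Unlock()
}
```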


jacobsa commented Apr 1, 2015

Half-thought brain dump. Eventually we want the "look up or create" interface to work like this:

Acquire the file system lock and look up the index entry. If the record's generation is less than the cached generation, unlock the FS and return nil. If they're equal, we've found our inode (unlock the FS and lock the inode).

Otherwise, unlock the FS and lock the inode, then compare again. If the record's generation is less than the inode's, return nil. If it's equal, we've found our inode. If it's greater, lock the FS and update the index with a new inode, provided the entry hasn't changed in the meantime; if it has changed, start over.

Some of this might be redundant. Maybe we don't need the cache? Maybe it's good enough to use the lock-free source generation "lower bound"?
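A rough transcription of the scheme above into code. The types and generation bookkeeping are invented for illustration, and inode-lock handling on the return paths is elided:

```go
package sketch

import "sync"

// Hypothetical types; the real gcsfuse code differs.
type objectRecord struct {
	name       string
	generation int64
}

type inode struct {
	mu  sync.Mutex
	gen int64
}

type fileSystem struct {
	mu    sync.Mutex
	index map[string]*inode
}

// lookUpOrCreateIfNotStale reconciles a stat/listing record with the
// index: it returns nil if the record is stale, the existing inode if
// the generations match, and otherwise installs a fresh inode, retrying
// if the index entry changes in the meantime.
func (fs *fileSystem) lookUpOrCreateIfNotStale(r objectRecord) *inode {
	for {
		fs.mu.Lock()
		existing := fs.index[r.name]
		fs.mu.Unlock()

		if existing == nil {
			// No entry yet; try to install one.
			in := &inode{gen: r.generation}
			fs.mu.Lock()
			if fs.index[r.name] == nil {
				fs.index[r.name] = in
				fs.mu.Unlock()
				return in
			}
			fs.mu.Unlock()
			continue // raced with a concurrent insert; start over
		}

		existing.mu.Lock()
		gen := existing.gen
		existing.mu.Unlock()

		if r.generation < gen {
			return nil // the record predates the inode: stale
		}
		if r.generation == gen {
			return existing // same generation: found our inode
		}

		// The record is newer. Replace the index entry, but only if
		// it hasn't changed in the meantime; otherwise start over.
		in := &inode{gen: r.generation}
		fs.mu.Lock()
		if fs.index[r.name] == existing {
			fs.index[r.name] = in
			fs.mu.Unlock()
			return in
		}
		fs.mu.Unlock()
	}
}
```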

jacobsa added a commit that referenced this issue Apr 2, 2015
The file system lock should now never be held around network operations.

For #23.
jacobsa added a commit that referenced this issue Apr 2, 2015
In particular, wrote better documentation about what it represents and who
modifies it. Removed cachedGen, which was an unnecessary complication.

For #23.
jacobsa added a commit that referenced this issue Apr 2, 2015
This was an okay idea while it lasted, but with the subtle retry loop logic in
and around lookUpOrCreateInodeIfNotStale, the implicit directory thing is just
a distraction.

For #23.

jacobsa commented Apr 2, 2015

Branch issue-23 seems to be working decently.

To do:

  • Run the integration test with a real bucket. Make sure no disasters.
  • Ditto on Linux.
  • Add some (temporary) instrumentation that panics when the FS lock is held for very long. Run integration tests again.


jacobsa commented Apr 27, 2015

A user reports that issue-23 at a6fd96d is working fine, but that even with GOMAXPROCS=4, gcsfuse still uses at most one core when copying several files into a single bucket. Need to confirm.

So, to do:

  • Run the integration test with a real bucket. Make sure no disasters.
  • Ditto on Linux.
  • Add some (temporary) instrumentation that panics when the FS lock is held for very long. Run integration tests again.
  • Ensure that we actually do get real parallelism. (The above helps with that.)


jacobsa commented Apr 27, 2015

Integration test passes on opensuse-13 at a6fd96d.


jacobsa commented Apr 27, 2015

Panicking when the lock is held for >= 50 ms is implemented on branch tmp-panic-long-lock (currently at e3d947e).
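A sketch of what such instrumentation might look like (hypothetical; the actual code on tmp-panic-long-lock may differ):

```go
package sketch

import (
	"sync"
	"time"
)

// timedMutex is a stand-in for the file system lock that panics if any
// holder keeps it for 50 ms or more.
type timedMutex struct {
	mu       sync.Mutex
	acquired time.Time
}

func (m *timedMutex) Lock() {
	m.mu.Lock()
	m.acquired = time.Now()
}

func (m *timedMutex) Unlock() {
	if held := time.Since(m.acquired); held >= 50*time.Millisecond {
		panic("lock held for " + held.String())
	}
	m.mu.Unlock()
}
```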

jacobsa added a commit that referenced this issue Apr 27, 2015
Doing so may cause a call to the local file system to block, which we don't
want to hold the lock around.

For #23.

jacobsa commented Apr 27, 2015

As far as I can tell, this is working fine. I can't even reproduce hitting 100% CPU, so I can't say whether locks are causing parallelism problems. Merging and closing as successful.
