
Allow for parallelism within gcsfuse #23

Closed
jacobsa opened this issue Mar 24, 2015 · 11 comments


jacobsa commented Mar 24, 2015

As of the changes in d054a1a, every operation runs on a single goroutine, blocking further fuse operations until it completes. This is good because it fixes the race discussed in jacobsa/fuse#3, but it is certainly not optimal.

We should decide which types of operations need to be serialized (for example, WriteFile/SyncFile/FlushFile for a particular inode) and build queues for those, spawning a goroutine whenever a queue goes from empty to non-empty. All other operations can simply receive their own goroutine. As an optimization, operations that don't block don't need a goroutine at all.
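A minimal sketch of that queueing idea, using hypothetical names (opQueue, Enqueue) rather than anything actually in gcsfuse:

```go
package sketch

import "sync"

// opQueue serializes operations for a single inode. A worker goroutine
// exists only while the queue is non-empty: Enqueue spawns one on the
// empty -> non-empty transition, and drain exits once the queue drains.
type opQueue struct {
	mu      sync.Mutex
	pending []func()
	running bool
}

// Enqueue adds op to the queue, spawning a worker if none is running.
func (q *opQueue) Enqueue(op func()) {
	q.mu.Lock()
	defer q.mu.Unlock()

	q.pending = append(q.pending, op)
	if !q.running {
		q.running = true
		go q.drain()
	}
}

// drain runs queued operations in FIFO order until none remain.
func (q *opQueue) drain() {
	for {
		q.mu.Lock()
		if len(q.pending) == 0 {
			q.running = false
			q.mu.Unlock()
			return
		}
		op := q.pending[0]
		q.pending = q.pending[1:]
		q.mu.Unlock()

		op()
	}
}
```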

@jacobsa jacobsa self-assigned this Mar 24, 2015
@jacobsa jacobsa added this to the beta milestone Mar 24, 2015
jacobsa added a commit that referenced this issue Mar 24, 2015
jacobsa added a commit that referenced this issue Apr 1, 2015
Making it clearer it doesn't block. For #23.

jacobsa commented Apr 1, 2015

Operations where we block on something long-running, and for which we may therefore want to consider dropping the file system lock (see the sketch after these lists):

  • LookUpInode calls StatObject.
  • GetInodeAttributes calls StatObject.
  • SetInodeAttributes may fault in data from GCS, and calls StatObject.
  • MkDir creates an object.
  • CreateFile creates an object.
  • RmDir deletes an object.
  • Unlink deletes an object.
  • ReadDir calls ListObjects.
  • ReadFile may read from GCS.
  • WriteFile may read from GCS.
  • SyncFile writes to GCS.
  • FlushFile writes to GCS.

Others:

  • Init does nothing.
  • ForgetInode may delete local files, but doesn't call out to the network.
  • OpenDir only consults local indexes.
  • ReleaseDirHandle only manipulates local indexes.
  • OpenFile only consults local indexes.
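To make the distinction concrete, here is a minimal sketch of dropping the file system lock around a network call. The types and signatures are invented for illustration; they are not the real gcsfuse or GCS client APIs:

```go
package sketch

import "sync"

// Illustrative types only; the real gcsfuse types differ.
type attrs struct{ Size int64 }

type gcsBucket interface {
	// StatObject stands in for a blocking GCS round trip.
	StatObject(name string) (attrs, error)
}

type fileSystem struct {
	mu        sync.Mutex
	attrCache map[string]attrs
	bucket    gcsBucket
}

// statAttributes holds fs.mu only while consulting local state, and
// releases it before making the network call.
func (fs *fileSystem) statAttributes(name string) (attrs, error) {
	fs.mu.Lock()
	cached, ok := fs.attrCache[name]
	fs.mu.Unlock()

	if ok {
		return cached, nil
	}

	// Long-running; fs.mu must not be held here.
	return fs.bucket.StatObject(name)
}
```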

jacobsa added a commit to jacobsa/fuse that referenced this issue Apr 1, 2015

jacobsa commented Apr 1, 2015

Since I've thought hardest about the write/flush/sync case, I will try this there first.


jacobsa commented Apr 1, 2015

Note to self: make sure to test with the Go race detector.
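(For reference, the race detector is enabled with the standard `-race` flag on the Go toolchain, e.g. `go test -race ./...` or `go build -race`.)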


jacobsa commented Apr 1, 2015

I've become less confident that operations on a single inode can be parallelized. For example: will fuse send concurrent read and write requests for the same region of a file, where the user expects the write to happen before the read?

Started a fuse-devel thread asking for documentation. In the meantime, it's probably safest to serialize all operations on a particular inode.

jacobsa added a commit that referenced this issue Apr 1, 2015
Properties of the new definition:

 *  The key of the index is an invariant property of the indexed inode.

 *  The key of the index doesn't contain a source generation, since we
    never strictly need to search by this.

 *  In order to add an entry for an object, you must hold the lock for
    any other inode for that object's name in the index, excluding
    concurrent changes to that inode's generation.

I hope that the new definition will aid in relaxing which locks we hold
during long-lived operations (issue #23). In particular, I don't think
we'll need to hold the file system lock around FlushFile or SyncFile.
jacobsa added a commit that referenced this issue Apr 1, 2015
The new spec allows fileSystem.CheckInvariants to not be racy, since it
doesn't lock inodes (and ideally we don't want it to).

For #23.

jacobsa commented Apr 1, 2015

Notes on lock ordering

Because I'm thinking about it.

Rule: Define a strict partial order < for the locks in your program. Let A and B be locks. Acquire B while holding A only if A < B.

Theorem: If a program follows the rule, it will not deadlock due to lock ordering.

Proof: Assume that a program deadlocks due to lock ordering, i.e. there are locks A and B such that thread 1 acquires A then B, and thread 2 acquires B then A. If A == B, then the rule is violated by both threads, so assume that A != B. If A is incomparable with B, then the rule is violated by both threads, so assume without loss of generality that A < B. Then by asymmetry of <, it is not the case that B < A, so thread 2 is violating the rule. (The same argument extends to deadlock cycles involving more than two locks: if every thread followed the rule, the cycle would give A < B < … < A, and hence A < A by transitivity, contradicting the irreflexivity of a strict partial order.) QED

In our case, the partial order is boring: all inode locks come before the file system lock, and inode locks are mutually incomparable. It makes sense for the inode locks to come before the file system lock because the latter should be held only for brief non-blocking operations, while the inode locks are held while potentially talking to the network. If we reversed the order, we would block on acquiring inode locks, transitively blocking on the network, while holding the file system lock.
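Sketched in code, the discipline looks like this (hypothetical types; the real gcsfuse structures differ):

```go
package sketch

import "sync"

type inode struct {
	mu sync.Mutex // may be held across network calls
}

type fileSystem struct {
	mu sync.Mutex // held only for brief, non-blocking work
}

// syncInode respects the partial order: it takes in.mu first (and keeps
// it while doing slow work), then takes fs.mu only briefly afterwards.
// Taking fs.mu first and then in.mu would violate the order.
func (fs *fileSystem) syncInode(in *inode) {
	in.mu.Lock()
	defer in.mu.Unlock()

	// ... potentially slow work, e.g. a GCS round trip ...

	fs.mu.Lock()
	// ... brief, non-blocking index bookkeeping ...
	fs.mu.Unlock()
}
```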


jacobsa commented Apr 1, 2015

Half-thought brain dump. Eventually we want the "look up or create" interface to work like this:

Acquire the file system lock and look up the index entry. If the record's generation is less than the cached generation, unlock the FS and return nil. If they're equal, we've found our inode (unlock the FS and lock the inode).

Otherwise, unlock the FS and lock the inode, then compare again. If the record's generation is less than the inode's, return nil. If it's equal, we've found our inode. If it's greater, lock the FS and update the index with a new inode, provided the entry hasn't changed in the meantime; if it has changed, start over.

Some of this might be redundant. Maybe we don't need the cache? Maybe it's good enough to use the lock-free source generation "lower bound"?
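A rough transcription of the scheme above into code. The types and generation bookkeeping are invented for illustration, and inode-lock handling on the return paths is elided:

```go
package sketch

import "sync"

// Hypothetical types; the real gcsfuse code differs.
type objectRecord struct {
	name       string
	generation int64
}

type inode struct {
	mu  sync.Mutex
	gen int64
}

type fileSystem struct {
	mu    sync.Mutex
	index map[string]*inode
}

// lookUpOrCreateIfNotStale reconciles a stat/listing record with the
// index: it returns nil if the record is stale, the existing inode if
// the generations match, and otherwise installs a fresh inode, retrying
// if the index entry changes in the meantime.
func (fs *fileSystem) lookUpOrCreateIfNotStale(r objectRecord) *inode {
	for {
		fs.mu.Lock()
		existing := fs.index[r.name]
		fs.mu.Unlock()

		if existing == nil {
			// No entry yet; try to install one.
			in := &inode{gen: r.generation}
			fs.mu.Lock()
			if fs.index[r.name] == nil {
				fs.index[r.name] = in
				fs.mu.Unlock()
				return in
			}
			fs.mu.Unlock()
			continue // raced with a concurrent insert; start over
		}

		existing.mu.Lock()
		gen := existing.gen
		existing.mu.Unlock()

		if r.generation < gen {
			return nil // the record predates the inode: stale
		}
		if r.generation == gen {
			return existing // same generation: found our inode
		}

		// The record is newer. Replace the index entry, but only if
		// it hasn't changed in the meantime; otherwise start over.
		in := &inode{gen: r.generation}
		fs.mu.Lock()
		if fs.index[r.name] == existing {
			fs.index[r.name] = in
			fs.mu.Unlock()
			return in
		}
		fs.mu.Unlock()
	}
}
```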

jacobsa added a commit that referenced this issue Apr 2, 2015
The file system lock should now never be held around network operations.

For #23.
jacobsa added a commit that referenced this issue Apr 2, 2015
In particular, wrote better documentation about what it represents and who
modifies it. Removed cachedGen, which was an unnecessary complication.

For #23.
jacobsa added a commit that referenced this issue Apr 2, 2015
This was an okay idea while it lasted, but with the subtle retry loop logic in
and around lookUpOrCreateInodeIfNotStale, the implicit directory thing is just
a distraction.

For #23.

jacobsa commented Apr 2, 2015

Branch issue-23 seems to be working decently.

To do:

  • Run the integration test with a real bucket. Make sure no disasters.
  • Ditto on Linux.
  • Add some (temporary) instrumentation that panics when the FS lock is held for very long. Run integration tests again.


jacobsa commented Apr 27, 2015

A user reports that issue-23 at a6fd96d is working fine, but that even with GOMAXPROCS=4, gcsfuse still uses at most one core when copying several files into a single bucket. Need to confirm.

So, to do:

  • Run the integration test with a real bucket. Make sure no disasters.
  • Ditto on Linux.
  • Add some (temporary) instrumentation that panics when the FS lock is held for very long. Run integration tests again.
  • Ensure that we actually do get real parallelism. (The above helps with that.)


jacobsa commented Apr 27, 2015

Integration test passes on opensuse-13 at a6fd96d.


jacobsa commented Apr 27, 2015

Panicking when the lock is held for >= 50 ms is implemented on branch tmp-panic-long-lock (currently at e3d947e).
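A sketch of what such instrumentation might look like (hypothetical; the actual code on tmp-panic-long-lock may differ):

```go
package sketch

import (
	"sync"
	"time"
)

// timedMutex is a stand-in for the file system lock that panics if any
// holder keeps it for 50 ms or more.
type timedMutex struct {
	mu       sync.Mutex
	acquired time.Time
}

func (m *timedMutex) Lock() {
	m.mu.Lock()
	m.acquired = time.Now()
}

func (m *timedMutex) Unlock() {
	if held := time.Since(m.acquired); held >= 50*time.Millisecond {
		panic("lock held for " + held.String())
	}
	m.mu.Unlock()
}
```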

jacobsa added a commit that referenced this issue Apr 27, 2015
Doing so may cause a call to the local file system to block, which we don't
want to hold the lock around.

For #23.

jacobsa commented Apr 27, 2015

As far as I can tell, this is working fine. I can't even reproduce hitting 100% CPU, so I can't say whether locks are causing parallelism problems. Merging and closing as successful.
