Skip to content

Conversation

tillrohrmann
Copy link
Contributor

This PR is based on #3873 and #3864.

This commit introduces a BlobServer#readWriteLock in order to synchronize file creation
and deletion operations in BlobServerConnection and BlobServer. This will prevent
that multiple put, delete and get operations interfere with each other.

The get operations are synchronized using the read lock in order to guarantee some kind of
parallelism.

What this PR does not address is the handling of concurrent writes and reads to the BlobStore. This could be solved via SUCCESS files in order to indicate the completion of a file. However, the first read operation should now happen strictly after the write operation due to the locking.

if (!blobFile.exists()) {
// first we have to release the read lock in order to acquire the write lock
readLock.unlock();
writeLock.lock();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In between upgrading from read lock to write lock, multiple threads can reach this point and as far as I see, then a file can be written more often then required. I would assume the code still produces correct result, but could do duplicate work. An obvious fix would be to re-check blobFile.exists() under the write lock, but now sure if the costs of another meta data query per write could offset occasional, but unlikely duplicate work.

Copy link
Contributor Author

@tillrohrmann tillrohrmann May 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, I'm also not sure which version would be more efficient. Given that the file transfer dominates the execution time here, I assume that the additional check won't hurt. Will change it accordingly

Copy link
Contributor

@StefanRRichter StefanRRichter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except for my one inline comment, the changes look good.
One more thing that I noticed is that the PR also does some changes to the delete path, which are not covered by a (new) test.

@tillrohrmann
Copy link
Contributor Author

Thanks for the quick review @StefanRRichter. I will address your comment and also add a test case for the delete path.

@tillrohrmann tillrohrmann force-pushed the concurrentBlobUploads branch from d308546 to b3f427a Compare May 15, 2017 13:42
tillrohrmann added a commit to tillrohrmann/flink that referenced this pull request May 15, 2017
…reation and deletion

This commit introduces a BlobServer#readWriteLock in order to synchronize file creation
and deletion operations in BlobServerConnection and BlobServer. This will prevent
that multiple put and get operations interfere with each other and with get operations.

The get operations are synchronized using the read lock in order to guarantee some kind of
parallelism.

Add Get and Delete operation tests

This closes apache#3888.
tillrohrmann added a commit to tillrohrmann/flink that referenced this pull request May 15, 2017
…reation and deletion

This commit introduces a BlobServer#readWriteLock in order to synchronize file creation
and deletion operations in BlobServerConnection and BlobServer. This will prevent
that multiple put and get operations interfere with each other and with get operations.

The get operations are synchronized using the read lock in order to guarantee some kind of
parallelism.

Add Get and Delete operation tests

This closes apache#3888.
tillrohrmann added a commit to tillrohrmann/flink that referenced this pull request May 16, 2017
…reation and deletion

This commit introduces a BlobServer#readWriteLock in order to synchronize file creation
and deletion operations in BlobServerConnection and BlobServer. This will prevent
that multiple put and get operations interfere with each other and with get operations.

The get operations are synchronized using the read lock in order to guarantee some kind of
parallelism.

Add Get and Delete operation tests

This closes apache#3888.
tillrohrmann added a commit to tillrohrmann/flink that referenced this pull request May 16, 2017
…reation and deletion

This commit introduces a BlobServer#readWriteLock in order to synchronize file creation
and deletion operations in BlobServerConnection and BlobServer. This will prevent
that multiple put and get operations interfere with each other and with get operations.

The get operations are synchronized using the read lock in order to guarantee some kind of
parallelism.

Add Get and Delete operation tests

This closes apache#3888.
The ConjunctFuture now returns the set of future values once it is completed.

Introduce WaitingConjunctFuture; Fix thread safety issue with ResultConjunctFuture

The WaitingConjunctFuture waits for the completion of its futures. The future values
are discarded making it more efficient than the ResultConjunctFuture which returns
the futures' values. The WaitingConjunctFuture is instantiated via
FutureUtils.waitForAll(Collection<Future>).

This closes apache#3873.
…teHandleStore

In order to store completed checkpoints in an increasing order in ZooKeeper,
the paths for the completed checkpoint is no generated by
String.format("/%019d", checkpointId) instead of String.format("/%s", checkpointId).
This makes sure that the converted long will always have the same length with
leading 0s.

Fix failing ZooKeeperCompletedCheckpointStoreITCase

This closes apache#3884.
…abilityServices

The HighAvailabilityService creates a single BlobStoreService instance which is
shared by all BlobServer and BlobCache instances. The BlobStoreService's lifecycle
is exclusively managed by the HighAvailabilityServices. This means that the
BlobStore's content is only cleaned up if the HighAvailabilityService's HA data
is cleaned up. Having this single point of control, makes it easier to decide when
to discard HA data (e.g. in case of a successful job execution) and when to retain
the data (e.g. for recovery).

Close and cleanup all data of BlobStore in HighAvailabilityServices

Use HighAvailabilityServices to create BlobStore

Introduce BlobStoreService interface to hide close and closeAndCleanupAllData methods

This closes apache#3864.
…reation and deletion

This commit introduces a BlobServer#readWriteLock in order to synchronize file creation
and deletion operations in BlobServerConnection and BlobServer. This will prevent
that multiple put and get operations interfere with each other and with get operations.

The get operations are synchronized using the read lock in order to guarantee some kind of
parallelism.

Add Get and Delete operation tests

This closes apache#3888.
@tillrohrmann tillrohrmann force-pushed the concurrentBlobUploads branch from b3f427a to 19e4fac Compare May 16, 2017 22:36
asfgit pushed a commit that referenced this pull request May 17, 2017
…reation and deletion

This commit introduces a BlobServer#readWriteLock in order to synchronize file creation
and deletion operations in BlobServerConnection and BlobServer. This will prevent
that multiple put and get operations interfere with each other and with get operations.

The get operations are synchronized using the read lock in order to guarantee some kind of
parallelism.

Add Get and Delete operation tests

This closes #3888.
@asfgit asfgit closed this in 7ad489d May 17, 2017
@tillrohrmann tillrohrmann deleted the concurrentBlobUploads branch July 6, 2017 15:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants