New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-6008][blob] collection of BlobServer improvements #4146
Conversation
please note that there are four unrelated test failures:
|
if (key == null) { | ||
throw new IllegalArgumentException("BLOB key must not be null"); | ||
} | ||
checkArgument(key != null, "BLOB key must not be null."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
requireNonNull
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unfortunately, checkNotNull
throws a NullPointerException
instead and I did not want to change that here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but it would be cleaner and less surprising for future users. However I have no strong feelings about it.
// we should make the local and remote file deletion atomic, otherwise we might risk not | ||
// removing the remote file in case of a concurrent put operation | ||
if (blobFile.exists() && !blobFile.delete()) { | ||
throw new IOException("Cannot delete BLOB file " + blobFile.getAbsolutePath()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you want to add a test coverage for not throwing if delete fails?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the (included) changes in BlobServerDeleteTest
should cover this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, sorry, I didn't find it first time I was looking for it.
@@ -227,7 +227,7 @@ static File getStorageLocation(File storageDir, JobID jobID, String key) { | |||
private static File getJobDirectory(File storageDir, JobID jobID) { | |||
final File jobDirectory = new File(storageDir, JOB_DIR_PREFIX + jobID.toString()); | |||
|
|||
if (!jobDirectory.exists() && !jobDirectory.mkdirs()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you change the order of those operations? Is that fix for something? If so could you add explanation to the commit message what's going on here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the new way is thread-safe - imagine a directory being concurrently created after another thread has failed the exists()
part - not sure we actually fix a bug nowadays since the use may be guarded...so this is probably more of a "just-in-case" thing or if usage patterns change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, please add comment or commit message info about this.
|
||
@Override | ||
public void go() throws Exception { | ||
server.getStorageLocation(key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't you call this at least couple/couple of dozens/couple of hundred times? Otherwise won't this complete before next thread starts up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, the job directory is only created once so it doesn't help adding more calls. What could help make this happen more often, is to add more threads to BlobServerPutTest#testServerContentAddressableGetStorageLocationConcurrent()
otherwise the failure only happens once in a while if the implementation is not thread-safe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So maybe you could call in a loop multiple times gets
and deletes
(if there is such operation) interleaved? Otherwise I don't see a real value of those tests and I would prefer to drop them (so that we don't have to maintain tests that do not check for anything).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, concurrency with delete
operations does not work either if not guarded - the directory may not exist anymore between jobDirectory.mkdirs()
and jobDirectory.exists()
. I was able to reproduce the error with the existing test though - if you want to try it, just change the order of these two commands back - the test will not hit every time, but some times.
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove extra line
also extend the delete tests and remove one code duplication
also add according unit tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution @NicoK. Changes look good to me. I had two minor comments. After addressing them, we can merge this PR.
@@ -540,7 +526,7 @@ else if (type == NAME_ADDRESSABLE) { | |||
// we should make the local and remote file deletion atomic, otherwise we might risk not | |||
// removing the remote file in case of a concurrent put operation | |||
if (blobFile.exists() && !blobFile.delete()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the order of the operations be changed like in the other cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're right, this could be improved as well - the next PR in this series is removing this code though
catch (Exception e) { | ||
e.printStackTrace(); | ||
fail(e.getMessage()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove this and simply let the exception be thrown? I think catching an exception, printing the stack trace and then failing with the exception message is an anti-pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely right - I was told this was an issue some versions ago with JUnit or so... was just copying the code from the other tests in this class. Let me create a cleanup-PR on top of the newest PR in this series, i.e. #4238
merging, |
This PR is a light-weight version of #3512 that only includes the improvements and can serve as a base for FLIP-19. It improves the following things around the
BlobServer
/BlobCache
:config.md
with non-deprecated ones, e.g.high-availability.cluster-id
andhigh-availability.storageDir
BlobServer
when a delete operation failsPreconditions.checkArgument