Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-6008][blob] collection of BlobServer improvements #4146

Closed
wants to merge 11 commits into from

Conversation

NicoK
Copy link
Contributor

@NicoK NicoK commented Jun 20, 2017

This PR is a light-weight version of #3512 that only includes the improvements and can serve as a base for FLIP-19. It improves the following things around the BlobServer/BlobCache:

  • replace config options in config.md with non-deprecated ones, e.g. high-availability.cluster-id and high-availability.storageDir
  • do not fail the BlobServer when a delete operation fails
  • add more unit tests
  • general code style and docs improvements, like using Preconditions.checkArgument

@NicoK
Copy link
Contributor Author

NicoK commented Jun 21, 2017

please note that there are four unrelated test failures:

  • 2x Kafka010ITCase
  • 2x YARNSessionCapacitySchedulerITCase

@NicoK NicoK changed the title [FLINK-6008] collection of BlobServer improvements [FLINK-6008][blob] collection of BlobServer improvements Jun 26, 2017
if (key == null) {
throw new IllegalArgumentException("BLOB key must not be null");
}
checkArgument(key != null, "BLOB key must not be null.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requireNonNull?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unfortunately, checkNotNull throws a NullPointerException instead and I did not want to change that here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but it would be cleaner and less surprising for future users. However I have no strong feelings about it.

// we should make the local and remote file deletion atomic, otherwise we might risk not
// removing the remote file in case of a concurrent put operation
if (blobFile.exists() && !blobFile.delete()) {
throw new IOException("Cannot delete BLOB file " + blobFile.getAbsolutePath());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you want to add a test coverage for not throwing if delete fails?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the (included) changes in BlobServerDeleteTest should cover this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, sorry, I didn't find it first time I was looking for it.

@@ -227,7 +227,7 @@ static File getStorageLocation(File storageDir, JobID jobID, String key) {
private static File getJobDirectory(File storageDir, JobID jobID) {
final File jobDirectory = new File(storageDir, JOB_DIR_PREFIX + jobID.toString());

if (!jobDirectory.exists() && !jobDirectory.mkdirs()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you change the order of those operations? Is that fix for something? If so could you add explanation to the commit message what's going on here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the new way is thread-safe - imagine a directory being concurrently created after another thread has failed the exists() part - not sure we actually fix a bug nowadays since the use may be guarded...so this is probably more of a "just-in-case" thing or if usage patterns change

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, please add comment or commit message info about this.


@Override
public void go() throws Exception {
server.getStorageLocation(key);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't you call this at least couple/couple of dozens/couple of hundred times? Otherwise won't this complete before next thread starts up?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the job directory is only created once so it doesn't help adding more calls. What could help make this happen more often, is to add more threads to BlobServerPutTest#testServerContentAddressableGetStorageLocationConcurrent() otherwise the failure only happens once in a while if the implementation is not thread-safe

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So maybe you could call in a loop multiple times gets and deletes (if there is such operation) interleaved? Otherwise I don't see a real value of those tests and I would prefer to drop them (so that we don't have to maintain tests that do not check for anything).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, concurrency with delete operations does not work either if not guarded - the directory may not exist anymore between jobDirectory.mkdirs() and jobDirectory.exists(). I was able to reproduce the error with the existing test though - if you want to try it, just change the order of these two commands back - the test will not hit every time, but some times.

}


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove extra line

Copy link
Contributor

@tillrohrmann tillrohrmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution @NicoK. Changes look good to me. I had two minor comments. After addressing them, we can merge this PR.

@@ -540,7 +526,7 @@ else if (type == NAME_ADDRESSABLE) {
// we should make the local and remote file deletion atomic, otherwise we might risk not
// removing the remote file in case of a concurrent put operation
if (blobFile.exists() && !blobFile.delete()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the order of the operations be changed like in the other cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're right, this could be improved as well - the next PR in this series is removing this code though

catch (Exception e) {
e.printStackTrace();
fail(e.getMessage());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this and simply let the exception be thrown? I think catching an exception, printing the stack trace and then failing with the exception message is an anti-pattern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely right - I was told this was an issue some versions ago with JUnit or so... was just copying the code from the other tests in this class. Let me create a cleanup-PR on top of the newest PR in this series, i.e. #4238

@zentol
Copy link
Contributor

zentol commented Jul 4, 2017

merging,

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants