-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-978] Support bulk get file size in GcsUtil. #1359
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Failures are checkstyle, FYI.
} | ||
throw new IOException("Unable to get file size", e); | ||
} | ||
List<Long> fileSize(Collection<GcsPath> paths) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Readability nit: I would prefer fileSizes
. At call sites, this makes the meaning of the collection more clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer sizes
; is files not implicit here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to fileSizes:
I feel there could be ambiguity for "GcsUtil.sizes(paths)".
@@ -483,12 +475,39 @@ public Void call() throws IOException { | |||
Thread.currentThread().interrupt(); | |||
throw new IOException("Interrupted while executing batch GCS request", e); | |||
} catch (ExecutionException e) { | |||
if (e.getCause() instanceof FileNotFoundException) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to get all the exceptions from failed requests? Are they already contained in this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, e will be the first exception from all Futures.
I am going to propose a bulk get metadata API. And, with Metadata, we can wait for all requests to finish, and store the exception in Metadata if any request fails.
MockLowLevelHttpResponse notFoundResponse = new MockLowLevelHttpResponse(); | ||
notFoundResponse.setContent(""); | ||
notFoundResponse.setStatusCode(HttpStatusCodes.STATUS_CODE_NOT_FOUND); | ||
String content = "Content-Type: application/http\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test seems overspecified. It should not require any of the details, just the HTTP 404 error code and perhaps the nested {error:...}
bit.
+ " \"generation\": \"1470763581808000\",\n" | ||
+ " \"metageneration\": \"2\",\n" | ||
+ " \"contentType\": \"application/txt\",\n" | ||
+ " \"timeCreated\": \"2016-08-09T17:26:21.801Z\",\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very overspecified. Anything that could change between requests, such as creation time, etc, should not be included.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems as though we should be able to generate a response of the type we desire instead of dumping in a response string. Is that not the case?
String content) throws Exception { | ||
MockLowLevelHttpResponse notFoundResponse = new MockLowLevelHttpResponse() | ||
.setContentType("multipart/mixed; boundary=batch_foobarbaz") | ||
.setContent(content) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, where you mock the low level response, I think you should either omit or fill in junk for pieces that GcsUtil
should not depend on. But you shouldn't mock the low level response unless our code relies on it. Instead, mock the storageClient
.
@@ -266,16 +266,16 @@ Integer getUploadBufferSizeBytes() { | |||
* Returns the file size from GCS or throws {@link FileNotFoundException} | |||
* if the resource does not exist. | |||
*/ | |||
public long fileSize(GcsPath path) throws IOException { | |||
return fileSize(ImmutableList.of(path)).get(0); | |||
public long fileSizes(GcsPath path) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should still be fileSize
c95aa4e
to
57b8c74
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to defer improving the bulk get fileSizes together with https://issues.apache.org/jira/browse/BEAM-989
Since copy() and remove() need similar tests, I will just do them together.
I propose to merge this PR. And, I will send another one to fix all tests. Otherwise, I can append a commit maybe few days later.
} | ||
throw new IOException("Unable to get file size", e); | ||
} | ||
List<Long> fileSize(Collection<GcsPath> paths) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to fileSizes:
I feel there could be ambiguity for "GcsUtil.sizes(paths)".
@@ -483,12 +475,39 @@ public Void call() throws IOException { | |||
Thread.currentThread().interrupt(); | |||
throw new IOException("Interrupted while executing batch GCS request", e); | |||
} catch (ExecutionException e) { | |||
if (e.getCause() instanceof FileNotFoundException) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, e will be the first exception from all Futures.
I am going to propose a bulk get metadata API. And, with Metadata, we can wait for all requests to finish, and store the exception in Metadata if any request fails.
Refer to this link for build results (access rights to CI server needed): |
I need mockito 2 to mock the final class BatchRequest. So, I suggest to move forward with this PR and address https://issues.apache.org/jira/browse/BEAM-989 later on. thoughts? |
Our use of bytebuddy should be entirely private. Can you please file a JIRA to shade it? Unfortunately, we have hit blockers in getting the SDK's private dependencies shaded by default, so we need to still just aggressively shade things right away when they cause any trouble. |
Filed https://issues.apache.org/jira/browse/BEAM-1019 More issues came out when updating to mockito 2. (Looks like there are few tests depends on the internal details of mock classes.) I am not going to jump into the rabbit hole right now, will address them one by one. |
Following up on my last comment: good shading by Mockito would help us, but our own shading cannot help. But we can probably upgrade bytebuddy without too much difficulty, despite backwards-incompatible changes between 1.4.x and 1.5.x |
+ " }\n" | ||
+ "}\n" | ||
+ "\n" | ||
+ "--batch_CnzAnRit7rQ_AAax2S5XfdM--\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This boundary should be in a constant, and match below.
+ "HTTP/1.1 200 OK\n" | ||
+ "Content-Length: 708\n" | ||
+ "\n" | ||
+ "{\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does there exist a Java API to create an object that will dump this JSON, so you don't have to know and maintain the exact format?
I'm OK with moving on with this since PRs to clean it up are already in place. |
Refer to this link for build results (access rights to CI server needed): |
PTAL I removed the test that uses long fragile string, but kept the shorter one. |
Be sure to do all of the following to help us incorporate your contribution
quickly and easily:
[BEAM-<Jira issue #>] Description of pull request
mvn clean verify
. (Even better, enableTravis-CI on your fork and ensure the whole test matrix passes).
<Jira issue #>
in the title with the actual Jira issuenumber, if there is one.
Individual Contributor License Agreement.