-
Notifications
You must be signed in to change notification settings - Fork 13.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-8801][yarn/s3] Fix jars downloading issues due to inconsistent timestamp in S3 Filesystem #8215
[FLINK-8801][yarn/s3] Fix jars downloading issues due to inconsistent timestamp in S3 Filesystem #8215
Conversation
…on consistent read-after-write" This reverts commit c90a757.
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community Review Progress
Please see the Pull Request Review Guide for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commandsThe @flinkbot bot supports the following commands:
|
@yanyan300300 thanks for your contribution. would you mind creating an issue for this, and update the title and description of this patch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, it looks like this is the better approach. Aside from some minor details in the code, I would only have one concern which I'm not sure about:
- Are we allowed to change the local resource's time? What are the implications?
I guess, this is the local resource file that was uploaded to the JM and there are no further implications but I'm not 100% sure.
Yes, Flink will need to set the local resource's time explicitly. This is also the case before your change. |
@tillrohrmann @aljoscha Could you kindly review this change? Thanks! |
@tillrohrmann @aljoscha @NicoK Could I get a review for this PR? Thanks! |
Can you clarify a bit on this? Where was this done? My problem is only that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this change @yanyan300300. LGTM +1 for merging.
Concerning @NicoK comment, I think we don't change the timestamp of the local config file but we set the timestamp of the LocalResource
to make it consistent with the file which we just uploaded to the user's home directory. I think this should be fine since we did the same before c90a757. Correct me if I understood your question incorrectly.
|
||
// now create the resource instance | ||
LocalResource resource = registerLocalResource(dst, localFile.length(), dstModificationTime > 0 ? dstModificationTime | ||
: localFile.lastModified()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: actually, this line is quite long and then broken strangely - how about instead integrating that distinction in the variable itself, when it is created? Also, while we're at it, we could log that we are using the local file's timestamp instead:
final long dstModificationTime;
if (fss != null && fss.length > 0) {
dstModificationTime = fss[0].getModificationTime();
LOG.debug("Got modification time {} from remote path {}", dstModificationTime, dst);
} else {
dstModificationTime = localFile.lastModified();
LOG.debug("Failed to fetch remote modification time from {}, using local timestamp {}", dst, dstModificationTime);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion. When a var can be assigned to different values, personally I prefer to use a default and then override it for other scenarios. There are two benefits: 1) reduce the required number of if/else branches 2) higher line coverage. In this relatively simple case, yes what you suggest also fits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I'm not convinced by your arguments @yanyan300300. The ternary operator implicitly introduces if/else branches which need to be checked by test cases separately (as you would do it if you have explicit if/else branches). Moreover, the higher line coverage only results because one and the same line can result in two values.
In fact, I think these arguments are window dressing and even a bit dangerous because they only aim to improve some static test criteria which, if handled this way, give you a false impression that your program is correctly tested. I mean what does it help you if the line coverage says you've covered all lines but not the case where the ternary operator returns the third argument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I should have made it clearer. For 1) I meant it requires one less level of indentation with else
. Personally I felt it is easier to read. For 2) Usually to check how many conditions are tested, branch coverage is looked at along with line coverage.
For this case, sure I can update according to @NicoK 's suggestion. Could you confirm if that is also what you are suggesting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is some value in the improved debug logging statements. Apart from that I don't have a strong opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I updated as what @NicoK suggested.
Thanks @tillrohrmann, yes, I must have mixed things up there - you are right about setting the timestamp of the |
@NicoK Could you approve this PR if you don't have further concerns? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @yanyan300300
lgtm
@yanyan300300 can you squash your commits so that we'll have one commit for reverting the old implementation and one with your changes? Please also tag your commit with "[FLINK-8801]" then. I'll merge it afterwards. |
@NicoK Thanks for the approval! If I squash in my feature branch, this PR will be closed due to altered commit history. Could you do it at your end? https://help.github.com/en/articles/about-pull-request-merges |
@yanyan300300 strange - I do this quite frequently in my PRs - squashing work-in-progress git history, then force-push to my branch and the PR gets updated. But I can also do this on my end. |
merged via 770a404 |
Thanks @NicoK Will do this next time. |
What is the purpose of the change
This change reverts the original fix for FLINK-8801's due to its insufficient in some scenarios where S3AFilesystem does not implement
setTimes
methods.Instead, it uses retry to wait for the remote file to be available and overwrite the local file timestamp.
Brief change log
FileNotFoundException
Verifying this change
This change is already covered by existing tests, such as
YarnFileStageTestS3ITCase
for the upload path via S3.Does this pull request potentially affect one of the following parts:
@Public(Evolving)
: noDocumentation