New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-26015] Fixes object store bug #18692
Conversation
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community Automated ChecksLast check on commit 6fc55ff (Wed Feb 09 19:40:50 UTC 2022) Warnings:
Mention the bot in a comment to re-run the automated checks. Review Progress
Please see the Pull Request Review Guide for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commandsThe @flinkbot bot supports the following commands:
|
572371c
to
6d3fd5e
Compare
The license checker failed because of the |
|
||
final JobGraph jobGraph = JobGraphTestUtils.singleNoOpJobGraph(); | ||
|
||
flinkCluster.runDetached(jobGraph); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case of the FileSystemJobResultStore
issue covered by this PR, this test would run forever because the error appeared during the initialization. I didn't find a better way to assert the state of the MiniCluster
here. Checking the isRunning()
method wouldn't help because it's switched immediately into state running.
IMHO, we should make the FatalErrorHandler available through the MiniClusterResource
. Right now, it's instantiated internally calling a closeAsync()
on the MiniCluster
. Making this one accessible in the test would help adding an assert on that. But it felt like a separate change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm planning to create a ticket to cover this functionality after my current approach was reviewed to make sure that I don't miss anything
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zentol I would like to get your opinion on this issue. Would you agree that exposing the FatalErrorHandler
for testing purposes is a valid improvement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That'd be fine I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although I'm curious for what event you'd be waiting on. The handler doesn't give you more information than running the job, does it?
If you'd disable restarts and run the job non-detached we could simplify the test a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is that MiniCluster is running in a separate thread. It fails while I'm submitting the Job which keeps the job submission to never complete. But using the non-detached job submission is a good point: I added a timeout to that to make the test fail earlier. That way, the test would run forever if there was an initialization error.
Initially, my idea was that I could wait for the FatalErrorHandler
(if it would be exposed outside of the MiniCluster) to report an error and fail in that case. Waiting for a reasonable long time with no error occurring would have been an indication that the initialization of the MiniCluster succeeded and I could have continued with the test.
But that approach has the downside that the test has to wait in cases where it actually succeeds which is bad in terms of test runtime. Making the job submission fail after a specific timeout sounds like the better approach. I adapted the test accordingly.
flink-core/src/main/java/org/apache/flink/core/fs/FileSystem.java
Outdated
Show resolved
Hide resolved
flink-core/src/main/java/org/apache/flink/core/fs/FileSystem.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a licensing issue with the PR.
...ystems/flink-s3-fs-presto/src/test/java/org/apache/flink/fs/s3presto/MinioTestContainer.java
Outdated
Show resolved
Hide resolved
@@ -45,4 +45,6 @@ | |||
public static final String PULSAR = "apachepulsar/pulsar:2.8.0"; | |||
|
|||
public static final String CASSANDRA_3 = "cassandra:3.0"; | |||
|
|||
public static final String MINIO = "minio/minio:edge"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what stability does "edge" provide is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minio/minio:edge
seems to be an old image version which caused weird errors with the hadoop s3 filesystem. I updated the tag to the most recent release which resolved the issue (@rmetzger used this old image in his test as well). That explains why we observed the same error with the s3 hadoop file system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked into the versioning again. edge
seems to be a development version:
$ docker run --entrypoint="" \
-p 9000:9000 \
-p 9001:9001 \
-v /tmp/minio/data2:/data \
-e "MINIO_ROOT_USER=minio" \
-e "MINIO_ROOT_PASSWORD=minio123" \
-it minio/minio:edge /opt/bin/minio --version
minio version DEVELOPMENT.2021-11-23T00-07-23Z
It includes a bug that causes the getFileStatus
to fail on an empty directory to fail. The observed behavior is also present in the released version RELEASE.2021-11-24T23-19-33Z
but fixed in the subsequent version RELEASE.2021-12-09T06-19-41Z
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does that mean all this was caused by minio being buggy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it looks like it the hadoop S3 FS issue on the Flink side was caused by Minio... I'm trying to understand a bit better still digging through the diff on the minio side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debugging the behavior from the Flink side brings me to the conclusion that it's caused by the response coming from a listObjects
call. It looks like Minio PR #13804 is the fix that was added in RELEASE.2021-12-09T06-19-41Z
that's relevant for us here. The PR description also points to S3AFileSystem
"utilizing the behavior that is fixed in that PR". I leave it like that without verifying it through a custom build to not put more time into that investigation.
Sadly, the JRS still doesn't work on K8s, using a minio s3 implementation:
The directory exists:
|
...k-test-utils-junit/src/main/java/org/apache/flink/core/testutils/TestContainerExtension.java
Show resolved
Hide resolved
...ystems/flink-s3-fs-presto/src/test/java/org/apache/flink/fs/s3presto/MinioTestContainer.java
Outdated
Show resolved
Hide resolved
...ystems/flink-s3-fs-presto/src/test/java/org/apache/flink/fs/s3presto/MinioTestContainer.java
Outdated
Show resolved
Hide resolved
...ystems/flink-s3-fs-presto/src/test/java/org/apache/flink/fs/s3presto/MinioTestContainer.java
Outdated
Show resolved
Hide resolved
...ms/flink-s3-fs-presto/src/test/java/org/apache/flink/fs/s3presto/MinioTestContainerTest.java
Outdated
Show resolved
Hide resolved
...ms/flink-s3-fs-presto/src/test/java/org/apache/flink/fs/s3presto/MinioTestContainerTest.java
Outdated
Show resolved
Hide resolved
@@ -155,18 +155,15 @@ public boolean hasCleanJobResultEntryInternal(JobID jobId) throws IOException { | |||
@Override | |||
public Set<JobResult> getDirtyResultsInternal() throws IOException { | |||
final Set<JobResult> dirtyResults = new HashSet<>(); | |||
final FileStatus fs = fileSystem.getFileStatus(this.basePath); | |||
if (fs.isDir()) { | |||
FileStatus[] statuses = fileSystem.listStatus(this.basePath); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand how this can fix anything, because the S3AFileSystem#listStatus uses getFileStatus internally. The presto implementation for both methods also do the same request against the server (#listPrefix).
Are we sure we're drawing the right conclusions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The presto test still fails with this change being reverted. PrestoS3FileSystem
does not support getFileStatus
on empty directories. Removing the isDir
check fixes that issue. I do another round over the presto implementation tomorrow to understand why it is. FLINK-26061 covers the difference between presto and hadoop s3 fs
updated the comment to make more sense out of it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify: I looked through the code once more. The PrestoS3FileSystem
does not support getFileStatus
on empty directories. You're right when you said that the mkdir
call doesn't create anything (see PrestoS3FileSystem:520). But the getFileStatus
method tries to get the FileStatus
of the object at the given path. If that does not exist, it will look for objects having the path as a prefix (through listObject
). A FileNotFoundException
is thrown if no objects live underneath the passed path (which corresponds to an empty directory, see PrestoS3FileSystem:361).
...-runtime/src/test/java/org/apache/flink/runtime/highavailability/AbstractHaJobRunITCase.java
Outdated
Show resolved
Hide resolved
...-runtime/src/test/java/org/apache/flink/runtime/highavailability/AbstractHaJobRunITCase.java
Outdated
Show resolved
Hide resolved
...ystems/flink-s3-fs-presto/src/test/java/org/apache/flink/fs/s3presto/MinioTestContainer.java
Outdated
Show resolved
Hide resolved
#18692 (comment) would be the most pressing comment. |
ad21c08
to
e62d230
Compare
Apparently, the failure was caused by an old Docker image being used for the test which I also used in my initial testing efforts. I addressed this in this comment above. In the mean time @rmetzger ran his test with a new Minio version without problems. Additionally, he ran the test on AWS S3 without running into the issue. All tests were performed with Hadoop S3 filesystem loaded. |
Thanks @zentol for your thorough. I addressed/responded to all your comments. I squashed the commits and rebased the branch. |
@flinkbot run azure |
I created FLINK-26087 to have the issue covered that AzureCI is not properly setup to run the S3-related tests (it looks like the environment variables are missing in AzureCI). |
...-runtime/src/test/java/org/apache/flink/runtime/highavailability/AbstractHaJobRunITCase.java
Outdated
Show resolved
Hide resolved
...-runtime/src/test/java/org/apache/flink/runtime/highavailability/AbstractHaJobRunITCase.java
Outdated
Show resolved
Hide resolved
...ink-s3-fs-base/src/test/java/org/apache/flink/fs/s3/common/HAJobRunOnMinioS3StoreITCase.java
Outdated
Show resolved
Hide resolved
@zentol I addressed your comments. Shall we proceed with squashing the PR or did I miss something? |
Well we no longer need the changes in the filestore#getDirtyResultsInternal then, correct? |
...untime/src/main/java/org/apache/flink/runtime/highavailability/FileSystemJobResultStore.java
Outdated
Show resolved
Hide resolved
...untime/src/main/java/org/apache/flink/runtime/highavailability/FileSystemJobResultStore.java
Show resolved
Hide resolved
We do. The Presto S3 Filesystem requires it. That's what FLINK-26061 is about. |
getFileStatus does not work with object stores like s3 on directories. I created FLINK-26061 to cover a proper solution for this. We should align the contracts here and make it more obvious.
… object store as a backend storage
Thanks for the second review @zentol. I addressed/responded to your comments. I reorganized the commits and squashed the individual commits together after addressing changes. Additionally, the branch got rebased |
I'm wondering if that isn't yet again a minio issue. From everything I have read presto assumes that everything is an empty directory by default (hence why mkdirs doesn't do anything). So unless you check that a directory does not exist things should be fine. https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/directory_markers.html Did you step through PrestoS3FileSystem to see which call fails? How does the test (FileSystemBehaviorTestSuite.testMkdirsCreatesParentDirectories) actually fail? Is the assertion failing, or is the FS throwing a FileNotFoundException. |
After looking through the Presto code once, we can conclude that it's not a Minio issue and the change is required. See the explanation below
I guess, I see where the confusion is coming: I missed the fact that the Hence, change in question (removing the directory check) is still necessary for |
What is the purpose of the change
FileSystem
implementations based on object stores like S3 do not support directories.FileSystem.getFileStatus
fails with aFileNotFoundException
in such a case. This also happens despite the fact thatFileSystem.mkdirs
doesn't fail.Additionally, I created FLINK-26061 as a follow-up to cover the inconsistency of the
FileSystem
contract.Brief change log
FileSystemJobResultStore
since it's actually obsoleteVerifying this change
MinioTestContainerTest
was added to test rudimentary functionality of theMinioTestContainer
Does this pull request potentially affect one of the following parts:
@Public(Evolving)
: noDocumentation