[FLINK-13633][coordination] Move submittedJobGraph and completedCheckpoint to cluster-id subdirectory of high-availability storage #9598
Conversation
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request.

Automated Checks: last check on commit 140ccb0 (Wed Oct 16 08:18:32 UTC 2019). Warnings: mention the bot in a comment to re-run the automated checks.

Review Progress: please see the Pull Request Review Guide for a full explanation of the review process. The bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands: the @flinkbot bot supports the following commands:
Thanks for your contribution @wangyang0918! I left several comments.
Also, I think it is worth adding some javadoc in ZooKeeperUtils to describe the layout the HA storage should have. It would help contributors, including ourselves, to understand the design later. See also ZooKeeperHaServices.
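The layout under discussion could be sketched in such a javadoc roughly like this (the directory structure shown is inferred from this PR's description and commit messages, not copied from Flink's source):

```java
/**
 * Possible layout description for the high-availability storage directory.
 * With this change, per-cluster metadata lives under the cluster-id subdirectory:
 *
 * <pre>
 * {high-availability.storageDir}/
 *   └── {high-availability.cluster-id}/
 *         ├── submittedJobGraph...    (serialized JobGraphs)
 *         ├── completedCheckpoint...  (serialized checkpoint metadata)
 *         └── blob/                   (HA blobs, see BlobUtils)
 * </pre>
 */
```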
flink-tests/pom.xml (Outdated)

@@ -189,6 +189,22 @@ under the License.
 		<scope>test</scope>
 	</dependency>

+	<dependency>
I notice you mentioned
"Dependencies (does it add or upgrade a dependency): (yes / no)"
in the pull request description, but it actually does upgrade a dependency. Could you please explain why this upgrade is needed and update the description accordingly?
Hi @tisonkun
Do we need to add the doc describing the layout to the util classes? I think it is more reasonable for it to stay in ZooKeeperHaServices.
For the HDFS-related dependencies, we have to add them to use MiniDFSCluster in the tests. And the scope is test only.
> For the HDFS-related dependencies, we have to add them to use MiniDFSCluster in the tests. And the scope is test only.

OK, I see.

> I think it is more reasonable for it to stay in ZooKeeperHaServices.

Makes sense to me.
String rootPath = configuration.getValue(HighAvailabilityOptions.HA_STORAGE_PATH);

if (rootPath == null || StringUtils.isBlank(rootPath)) {
	throw new IllegalConfigurationException("Missing high-availability storage path for metadata." +
		" Specify via configuration key '" + HighAvailabilityOptions.HA_STORAGE_PATH + "'.");
} else {
-	return new FileSystemStateStorageHelper<T>(rootPath, prefix);
+	final String clusterId = configuration.getValue(HighAvailabilityOptions.HA_CLUSTER_ID);
What if HighAvailabilityOptions.HA_CLUSTER_ID is configured wrongly? Shall we perform a runtime check to guard that clusterId is valid?
What do you mean by configured wrongly? In standalone mode, it is configured by the user. In Yarn/Mesos mode, it will be set automatically. Also, it has a default value and will not be null.
Yes, you're right. It was my mistake.
I think it is partially done, because Till checks that the clusterId is not empty or whitespace-only. My previous concern here was that if the concatenated path is not a valid path, it might fail later.
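The kind of guard discussed here could be sketched as follows. Note that `validateClusterId` is a hypothetical helper written only to illustrate the check, not Flink's actual API:

```java
public class ClusterIdCheck {

    /**
     * Reject a null, empty, or whitespace-only cluster-id before it is
     * concatenated into the HA storage path. Hypothetical helper, not Flink API.
     */
    static String validateClusterId(String clusterId) {
        if (clusterId == null || clusterId.trim().isEmpty()) {
            throw new IllegalArgumentException(
                "high-availability.cluster-id must not be empty or whitespace-only");
        }
        return clusterId;
    }

    public static void main(String[] args) {
        // A normal id (e.g. a YARN application id) passes through unchanged.
        System.out.println(validateClusterId("application_1234_0001"));

        // A whitespace-only id fails fast instead of producing a broken path later.
        try {
            validateClusterId("   ");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```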
Any updates? Also pinging @azagrebin, could you take a look at this?
Thanks a lot for this improvement @wangyang0918. The change itself looks good to me.
The thing I would like to improve is the test case. I think we don't need to spin up an HDFS testing cluster just to test a path. Instead I would simply test what getClusterHighAvailabilityStoragePath returns. Of course this does not give you the same test coverage, but I think it would be good enough. I'll push a commit which simplifies the test a bit.
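A minimal sketch of what such a simplified test could look like, assuming the helper appends the cluster-id to the configured HA storage path. The method body below is a stand-in; the real Flink method reads both values from a Configuration and its signature may differ:

```java
public class StoragePathTest {

    /**
     * Simplified stand-in for the PR's getClusterHighAvailableStoragePath helper;
     * the real implementation reads both values from a Flink Configuration.
     */
    static String getClusterHighAvailableStoragePath(String haStorageDir, String clusterId) {
        return haStorageDir + "/" + clusterId;
    }

    public static void main(String[] args) {
        // Assert directly on the returned path; no MiniDFSCluster needed.
        String path = getClusterHighAvailableStoragePath("hdfs:///flink/ha", "cluster_one");
        if (!path.equals("hdfs:///flink/ha/cluster_one")) {
            throw new AssertionError("unexpected path: " + path);
        }
        System.out.println(path);
    }
}
```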
Force-pushed from 22408a3 to c13c748.
…eStoragePath Let BlobUtils and ZooKeeperUtils call HighAvailabilityServiceUtils.getClusterHighAvailableStoragePath to obtain cluster wide high available storage path. This closes apache#9598.
I've pushed an update which simplifies the test and makes sure that
Thanks for your update @tillrohrmann! The de-duplication commits look good to me.
Checkstyle complains about an unused import. Please fix it.
@@ -23,9 +23,9 @@
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.ConfigurationUtils;
import org.apache.flink.configuration.HighAvailabilityOptions;
Unused import that checkstyle complains about.
Good catch. Will fix it.
…point to cluster-id subdirectory of high-availability storage
…eStoragePath Let BlobUtils and ZooKeeperUtils call HighAvailabilityServiceUtils.getClusterHighAvailableStoragePath to obtain cluster wide high available storage path. This closes apache#9598.
Force-pushed from c44c9e7 to 140ccb0.
I've updated the PR to remove the unused import.
Travis passed. I will merge this PR now. Thanks a lot for the initial authoring @wangyang0918 and the review @tisonkun.
What is the purpose of the change
This pull request moves submittedJobGraph and completedCheckpoint files to the cluster-id subdirectory of the high-availability storage. If the Flink cluster terminates exceptionally, external tools can then be used to clean up these residual files.
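The effect can be illustrated as follows (the configuration values and abbreviated file names below are example assumptions, not taken from the PR):

```java
public class HaLayoutDemo {
    public static void main(String[] args) {
        String storageDir = "hdfs:///flink/ha";       // high-availability.storageDir (example value)
        String clusterId = "application_1234_0001";   // high-availability.cluster-id (example value)
        String clusterDir = storageDir + "/" + clusterId;

        // With this change, both kinds of metadata live under the cluster-id
        // subdirectory, so a terminated cluster's leftovers can be removed with
        // a single recursive delete of clusterDir.
        System.out.println(clusterDir + "/submittedJobGraph...");
        System.out.println(clusterDir + "/completedCheckpoint...");
    }
}
```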
Brief change log
- Use the cluster-id sub-directory instead of the HA storage root in ZookeeperUtils#createCompletedCheckpoints() and ZookeeperUtils#createJobGraphs().

Verifying this change
This change added tests and can be verified as follows:
- Added ZooKeeperHaStorageITCase to check the high availability storage directory structure.

Does this pull request potentially affect one of the following parts:
- @Public(Evolving): (yes / no)

Documentation