FLINK-4253 - Rename "recovery.mode" config key to "high-availability" (Ram) #2342
Conversation
```diff
@@ -502,7 +503,8 @@ object TestingUtils {
       configuration: Configuration)
     : ActorGateway = {

-    configuration.setString(ConfigConstants.HIGH_AVAILABILITY, ConfigConstants.DEFAULT_HIGH_AVAILABILTY)
+    configuration.setString(ConfigConstants.HIGH_AVAILABILITY,
```
After wrapping here:

```
[INFO] force-shading ...................................... SUCCESS [ 11.954 s]
[INFO] flink .............................................. SUCCESS [01:01 min]
[INFO] flink-annotations .................................. SUCCESS [  7.724 s]
[INFO] flink-shaded-hadoop ................................ SUCCESS [  0.364 s]
[INFO] flink-shaded-hadoop2 ............................... SUCCESS [ 15.982 s]
[INFO] flink-shaded-include-yarn-tests .................... SUCCESS [ 16.367 s]
[INFO] flink-shaded-curator ............................... SUCCESS [  0.278 s]
[INFO] flink-shaded-curator-recipes ....................... SUCCESS [  1.953 s]
[INFO] flink-shaded-curator-test .......................... SUCCESS [  0.738 s]
[INFO] flink-metrics ...................................... SUCCESS [  0.326 s]
[INFO] flink-metrics-core ................................. SUCCESS [  8.961 s]
[INFO] flink-test-utils-parent ............................ SUCCESS [  0.302 s]
[INFO] flink-test-utils-junit ............................. SUCCESS [  7.145 s]
[INFO] flink-core ......................................... SUCCESS [01:27 min]
[INFO] flink-java ......................................... SUCCESS [01:03 min]
[INFO] flink-runtime ...................................... SUCCESS [03:31 min]
[INFO] flink-optimizer .................................... SUCCESS [ 43.211 s]
[INFO] flink-clients ...................................... SUCCESS [ 33.062 s]
[INFO] flink-streaming-java ............................... SUCCESS [01:11 min]
[INFO] flink-test-utils ................................... SUCCESS [ 29.337 s]
[INFO] flink-scala ........................................ SUCCESS [01:48 min]
[INFO] flink-runtime-web .................................. SUCCESS [ 28.635 s]
[INFO] flink-examples ..................................... SUCCESS [  3.177 s]
[INFO] flink-examples-batch ............................... SUCCESS [01:09 min]
[INFO] flink-contrib ...................................... SUCCESS [  0.847 s]
[INFO] flink-statebackend-rocksdb ......................... SUCCESS [ 11.992 s]
[INFO] flink-tests ........................................ SUCCESS [03:27 min]
[INFO] flink-streaming-scala .............................. SUCCESS [01:21 min]
[INFO] flink-streaming-connectors ......................... SUCCESS [  0.532 s]
[INFO] flink-connector-flume .............................. SUCCESS [  8.794 s]
[INFO] flink-libraries .................................... SUCCESS [  0.285 s]
[INFO] flink-table ........................................ SUCCESS [02:22 min]
[INFO] flink-metrics-jmx .................................. SUCCESS [  6.208 s]
[INFO] flink-connector-kafka-base ......................... SUCCESS [ 14.832 s]
[INFO] flink-connector-kafka-0.8 .......................... SUCCESS [ 10.985 s]
[INFO] flink-connector-kafka-0.9 .......................... SUCCESS [  7.464 s]
[INFO] flink-connector-elasticsearch ...................... SUCCESS [ 11.186 s]
[INFO] flink-connector-elasticsearch2 ..................... SUCCESS [ 11.327 s]
[INFO] flink-connector-rabbitmq ........................... SUCCESS [  5.484 s]
[INFO] flink-connector-twitter ............................ SUCCESS [  6.641 s]
[INFO] flink-connector-nifi ............................... SUCCESS [  6.188 s]
[INFO] flink-connector-cassandra .......................... SUCCESS [ 11.955 s]
[INFO] flink-connector-redis .............................. SUCCESS [  7.246 s]
[INFO] flink-connector-filesystem ......................... SUCCESS [  8.218 s]
[INFO] flink-batch-connectors ............................. SUCCESS [  0.283 s]
[INFO] flink-avro ......................................... SUCCESS [  9.166 s]
[INFO] flink-jdbc ......................................... SUCCESS [  7.117 s]
[INFO] flink-hadoop-compatibility ......................... SUCCESS [  9.769 s]
[INFO] flink-hbase ........................................ SUCCESS [  8.976 s]
[INFO] flink-hcatalog ..................................... SUCCESS [ 15.990 s]
[INFO] flink-examples-streaming ........................... SUCCESS [ 35.892 s]
[INFO] flink-gelly ........................................ SUCCESS [ 21.147 s]
[INFO] flink-gelly-scala .................................. SUCCESS [ 35.202 s]
[INFO] flink-gelly-examples ............................... SUCCESS [ 31.837 s]
[INFO] flink-python ....................................... SUCCESS [  8.998 s]
[INFO] flink-ml ........................................... SUCCESS [02:34 min]
[INFO] flink-cep .......................................... SUCCESS [ 21.447 s]
[INFO] flink-cep-scala .................................... SUCCESS [ 46.998 s]
[INFO] flink-scala-shell .................................. SUCCESS [ 50.330 s]
[INFO] flink-quickstart ................................... SUCCESS [  4.723 s]
[INFO] flink-quickstart-java .............................. SUCCESS [  2.686 s]
[INFO] flink-quickstart-scala ............................. SUCCESS [  0.957 s]
[INFO] flink-storm ........................................ SUCCESS [ 24.464 s]
[INFO] flink-storm-examples ............................... SUCCESS [03:51 min]
[INFO] flink-streaming-contrib ............................ SUCCESS [ 18.222 s]
[INFO] flink-tweet-inputformat ............................ SUCCESS [  9.515 s]
[INFO] flink-operator-stats ............................... SUCCESS [  8.018 s]
[INFO] flink-connector-wikiedits .......................... SUCCESS [  6.316 s]
[INFO] flink-yarn ......................................... SUCCESS [ 26.015 s]
[INFO] flink-dist ......................................... SUCCESS [01:04 min]
[INFO] flink-metrics-dropwizard ........................... SUCCESS [  8.947 s]
[INFO] flink-metrics-ganglia .............................. SUCCESS [  5.176 s]
[INFO] flink-metrics-graphite ............................. SUCCESS [  4.891 s]
[INFO] flink-metrics-statsd ............................... SUCCESS [  7.830 s]
[INFO] flink-fs-tests ..................................... SUCCESS [  3.448 s]
[INFO] flink-java8 ........................................ SUCCESS [ 14.214 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 39:09 min
```
Looks like only this build failed.
Looks good in general. I think this could use a test case that ensures that the configuration parsing accepts, for now, both the old and the new config key. Can you add such a test?
I think we need to change more than just that single variable. The other variables should match, e.g.
Sure I can. The existing test cases helped to ensure that, if there are no old configs, we are able to fetch from the new config. I can add a test case to ensure both configs are accepted.
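The backward-compatibility test being requested could be sketched roughly as below. This is a hypothetical, self-contained stand-in, not Flink's actual `Configuration` API: the class name, the `haMode` helper, and the 'standalone' → 'none' default mapping are assumptions based on the keys discussed in this PR.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the lookup such a test should cover:
// prefer the new key, fall back to the deprecated one.
public class HighAvailabilityKeyCompat {
    static final String NEW_KEY = "high-availability";   // key introduced by this PR
    static final String OLD_KEY = "recovery.mode";       // deprecated, still accepted

    static String haMode(Map<String, String> conf) {
        String value = conf.get(NEW_KEY);
        if (value == null) {
            value = conf.get(OLD_KEY);
            // the old default 'standalone' corresponds to the new default 'none'
            if ("standalone".equals(value)) {
                value = "none";
            }
        }
        return value == null ? "none" : value;
    }

    public static void main(String[] args) {
        Map<String, String> oldStyle = new HashMap<>();
        oldStyle.put(OLD_KEY, "zookeeper");
        Map<String, String> newStyle = new HashMap<>();
        newStyle.put(NEW_KEY, "zookeeper");

        // both the old and the new key must yield the same mode
        if (!haMode(oldStyle).equals("zookeeper")) throw new AssertionError();
        if (!haMode(newStyle).equals("zookeeper")) throw new AssertionError();
        if (!haMode(new HashMap<>()).equals("none")) throw new AssertionError();
    }
}
```

A real test would exercise Flink's `Configuration` and `ConfigConstants` instead of this stand-in map, asserting the same three cases: new key set, old key set, neither set.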
I see. I had this doubt, but since the PR was talking about this specific param I thought that would be enough. +1 to changing all the relevant ones. I can update it in the next PR. Thanks all for the comments.
Just a query,
Regarding the priority: Yes, I think so.
Updated the PR. It handles all the other configs and renames them from 'recovery.*' to 'high-availability.*'.
The GitHub build shows two failures. Both seem to be unrelated. Not sure of the problem.
There are some test instabilities that we are trying to address. Seems unrelated to your changes at first glance. We'll look at the tests again anyway before merging.
Ran those two tests individually and they seem to pass. But the RocksDB version failed due to some initialization error. Thank you @StephanEwen for your comments.
@uce
```diff
@@ -285,7 +285,7 @@ of the JobManager, because the same ActorSystem is used. Its not possible to use

 ## High Availability Mode

-- `recovery.mode`: (Default 'standalone') Defines the recovery mode used for the cluster execution. Currently, Flink supports the 'standalone' mode where only a single JobManager runs and no JobManager state is checkpointed. The high availability mode 'zookeeper' supports the execution of multiple JobManagers and JobManager state checkpointing. Among the group of JobManagers, ZooKeeper elects one of them as the leader which is responsible for the cluster execution. In case of a JobManager failure, a standby JobManager will be elected as the new leader and is given the last checkpointed JobManager state. In order to use the 'zookeeper' mode, it is mandatory to also define the `recovery.zookeeper.quorum` configuration value.
+- `high-availability`: (Default 'none') Defines the recovery mode used for the cluster execution. Currently, Flink supports the 'none' mode where only a single JobManager runs and no JobManager state is checkpointed. The high availability mode 'zookeeper' supports the execution of multiple JobManagers and JobManager state checkpointing. Among the group of JobManagers, ZooKeeper elects one of them as the leader which is responsible for the cluster execution. In case of a JobManager failure, a standby JobManager will be elected as the new leader and is given the last checkpointed JobManager state. In order to use the 'zookeeper' mode, it is mandatory to also define the `recovery.zookeeper.quorum` configuration value. Previously this config was named 'recovery.mode' and the default config was 'standalone'.

 - `recovery.zookeeper.quorum`: Defines the ZooKeeper quorum URL which is used to connet to the ZooKeeper cluster when the 'zookeeper' recovery mode is selected
```
Docs not updated (same for jobmanager_high_availability.md)
Oh yes. Will change them too.
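For illustration only, a sketch of how the renamed keys might look in `flink-conf.yaml` once the docs are updated, following the 'recovery.*' → 'high-availability.*' renaming described in this PR. The hostnames and ports are placeholders, not values from the PR:

```yaml
# Sketch only: placeholder hosts, keys per the renaming in this PR.
# Old key: recovery.mode: zookeeper
high-availability: zookeeper
# Old key: recovery.zookeeper.quorum
high-availability.zookeeper.quorum: host1:2181,host2:2181
```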
@uce
@uce
@uce
@uce
@uce
I will check this later today or on Monday. You will have to wait a little, sorry.
@uce
I had to address some things when reviewing this: https://github.com/uce/flink/tree/ram In general the changes were good, but they overlooked many occurrences that are not possible to catch with automatic refactoring tools. I went through those manually. I will merge this after my local Travis build passes.
@uce
Builds are passing here: https://travis-ci.org/uce/flink/builds/154236174 I'm going to merge this later today. Thanks for starting this renaming.
Thanks for contributing to Apache Flink. Before you open your pull request, please take the following checklist into consideration.
If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the How To Contribute guide.
In addition to going through the list, please provide a meaningful description of your changes.
`mvn clean verify` has been executed successfully locally or a Travis build has passed

I ran `mvn clean verify` - all the tests in flink-runtime passed and this test failed:

Failed tests:
BlobServerDeleteTest.testDeleteAll:157 DELETE operation failed: Server side error: Unable to delete directory C:\Users\rsvasude\AppData\Local\Temp\blobStore-18502f30-ee19-4c4c-9cb7-7b51c9bdeffb\job_801e21ed42b26de3c813cfe4917d029d.
LeaderChangeStateCleanupTest.testReelectionOfSameJobManager:244 TaskManager should not be able to register at JobManager.

I think it is an environment issue. Ran the other tests changed as part of this PR and they all seem to pass.
Handled backward compatibility if the config file has the older config `recovery.mode`. The same has been handled in the `config.sh` script also. Suggestions/feedback welcome.
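The shell-side fallback described above could be sketched as follows. This is a rough, hypothetical stand-in, not the actual `config.sh` code: the function name and the awk-based parsing are assumptions; only the key names and the old-default mapping come from the PR discussion.

```shell
# Hypothetical sketch of a config.sh-style fallback: read the new
# "high-availability" key, falling back to the deprecated "recovery.mode".
read_ha_mode() {
  local conf_file="$1"
  local value
  # keys in flink-conf.yaml have the form "key: value"
  value=$(awk -F': *' '$1 == "high-availability" {print $2}' "$conf_file")
  if [ -z "$value" ]; then
    # fall back to the deprecated key
    value=$(awk -F': *' '$1 == "recovery.mode" {print $2}' "$conf_file")
    # the old default 'standalone' corresponds to the new default 'none'
    if [ "$value" = "standalone" ]; then
      value="none"
    fi
  fi
  echo "${value:-none}"
}
```

Either key then yields the same mode, so existing config files keep working during the deprecation window.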