
FLINK-4253 - Rename "recovery.mode" config key to "high-availability" (Ram) #2342

Closed
wants to merge 4 commits

Conversation

ramkrish86
Contributor

Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the How To Contribute guide.
In addition to going through the list, please provide a meaningful description of your changes.

  • General
    • The pull request references the related JIRA issue ("[FLINK-XXX] Jira title text")
    • The pull request addresses only one issue
    • Each commit in the PR has a meaningful commit message (including the JIRA id)
  • Documentation
    • Documentation has been added for new functionality
    • Old documentation affected by the pull request has been updated
    • JavaDoc for public methods has been added
  • Tests & Build
    • Functionality added by the pull request is covered by tests
    • mvn clean verify has been executed successfully locally or a Travis build has passed

I ran mvn clean verify. All the tests in flink-runtime passed, but these two tests failed:
Failed tests:
BlobServerDeleteTest.testDeleteAll:157 DELETE operation failed: Server side error: Unable to delete directory C:\Users\rsvasude\AppData\Local\Temp\blobStore-18502f30-ee19-4c4c-9cb7-7b51c9bdeffb\job_801e21ed42b26de3c813cfe4917d029d.
LeaderChangeStateCleanupTest.testReelectionOfSameJobManager:244 TaskManager should not be able to register at JobManager.
I think it is an environment issue. I ran the other tests changed as part of this PR and they all seem to pass.
Backward compatibility is handled if the config file still has the older config key recovery.mode. The same has been handled in the config.sh script as well.
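
For illustration, here is a minimal sketch of the fallback behaviour described above (not the actual PR code; the helper class and the literal key/default strings are illustrative, while the real change uses the ConfigConstants entries shown in the diff below):

import org.apache.flink.configuration.Configuration;

// Hypothetical helper illustrating the backward-compatibility lookup:
// prefer the new "high-availability" key, fall back to the deprecated
// "recovery.mode" key, and use the new default "none" otherwise.
public class HighAvailabilityModeResolver {

    private static final String HA_KEY = "high-availability";      // new key
    private static final String DEPRECATED_KEY = "recovery.mode";  // old key
    private static final String DEFAULT_HA_MODE = "none";          // new default

    public static String resolveMode(Configuration config) {
        if (config.containsKey(HA_KEY)) {
            // The new key always wins, even if the old key is also set.
            return config.getString(HA_KEY, DEFAULT_HA_MODE);
        }
        if (config.containsKey(DEPRECATED_KEY)) {
            // Old configs used 'standalone' where the new scheme uses 'none'.
            String legacy = config.getString(DEPRECATED_KEY, "standalone");
            return "standalone".equalsIgnoreCase(legacy) ? DEFAULT_HA_MODE : legacy;
        }
        return DEFAULT_HA_MODE;
    }
}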

Suggestions/feedback welcome.

@@ -502,7 +503,8 @@ object TestingUtils {
configuration: Configuration)
: ActorGateway = {

configuration.setString(ConfigConstants.HIGH_AVAILABILITY, ConfigConstants.DEFAULT_HIGH_AVAILABILTY)
configuration.setString(ConfigConstants.HIGH_AVAILABILITY,
Contributor Author

After wrapping here:

[INFO] force-shading ...................................... SUCCESS [ 11.954 s]
[INFO] flink .............................................. SUCCESS [01:01 min]
[INFO] flink-annotations .................................. SUCCESS [ 7.724 s]
[INFO] flink-shaded-hadoop ................................ SUCCESS [ 0.364 s]
[INFO] flink-shaded-hadoop2 ............................... SUCCESS [ 15.982 s]
[INFO] flink-shaded-include-yarn-tests .................... SUCCESS [ 16.367 s]
[INFO] flink-shaded-curator ............................... SUCCESS [ 0.278 s]
[INFO] flink-shaded-curator-recipes ....................... SUCCESS [ 1.953 s]
[INFO] flink-shaded-curator-test .......................... SUCCESS [ 0.738 s]
[INFO] flink-metrics ...................................... SUCCESS [ 0.326 s]
[INFO] flink-metrics-core ................................. SUCCESS [ 8.961 s]
[INFO] flink-test-utils-parent ............................ SUCCESS [ 0.302 s]
[INFO] flink-test-utils-junit ............................. SUCCESS [ 7.145 s]
[INFO] flink-core ......................................... SUCCESS [01:27 min]
[INFO] flink-java ......................................... SUCCESS [01:03 min]
[INFO] flink-runtime ...................................... SUCCESS [03:31 min]
[INFO] flink-optimizer .................................... SUCCESS [ 43.211 s]
[INFO] flink-clients ...................................... SUCCESS [ 33.062 s]
[INFO] flink-streaming-java ............................... SUCCESS [01:11 min]
[INFO] flink-test-utils ................................... SUCCESS [ 29.337 s]
[INFO] flink-scala ........................................ SUCCESS [01:48 min]
[INFO] flink-runtime-web .................................. SUCCESS [ 28.635 s]
[INFO] flink-examples ..................................... SUCCESS [ 3.177 s]
[INFO] flink-examples-batch ............................... SUCCESS [01:09 min]
[INFO] flink-contrib ...................................... SUCCESS [ 0.847 s]
[INFO] flink-statebackend-rocksdb ......................... SUCCESS [ 11.992 s]
[INFO] flink-tests ........................................ SUCCESS [03:27 min]
[INFO] flink-streaming-scala .............................. SUCCESS [01:21 min]
[INFO] flink-streaming-connectors ......................... SUCCESS [ 0.532 s]
[INFO] flink-connector-flume .............................. SUCCESS [ 8.794 s]
[INFO] flink-libraries .................................... SUCCESS [ 0.285 s]
[INFO] flink-table ........................................ SUCCESS [02:22 min]
[INFO] flink-metrics-jmx .................................. SUCCESS [ 6.208 s]
[INFO] flink-connector-kafka-base ......................... SUCCESS [ 14.832 s]
[INFO] flink-connector-kafka-0.8 .......................... SUCCESS [ 10.985 s]
[INFO] flink-connector-kafka-0.9 .......................... SUCCESS [ 7.464 s]
[INFO] flink-connector-elasticsearch ...................... SUCCESS [ 11.186 s]
[INFO] flink-connector-elasticsearch2 ..................... SUCCESS [ 11.327 s]
[INFO] flink-connector-rabbitmq ........................... SUCCESS [ 5.484 s]
[INFO] flink-connector-twitter ............................ SUCCESS [ 6.641 s]
[INFO] flink-connector-nifi ............................... SUCCESS [ 6.188 s]
[INFO] flink-connector-cassandra .......................... SUCCESS [ 11.955 s]
[INFO] flink-connector-redis .............................. SUCCESS [ 7.246 s]
[INFO] flink-connector-filesystem ......................... SUCCESS [ 8.218 s]
[INFO] flink-batch-connectors ............................. SUCCESS [ 0.283 s]
[INFO] flink-avro ......................................... SUCCESS [ 9.166 s]
[INFO] flink-jdbc ......................................... SUCCESS [ 7.117 s]
[INFO] flink-hadoop-compatibility ......................... SUCCESS [ 9.769 s]
[INFO] flink-hbase ........................................ SUCCESS [ 8.976 s]
[INFO] flink-hcatalog ..................................... SUCCESS [ 15.990 s]
[INFO] flink-examples-streaming ........................... SUCCESS [ 35.892 s]
[INFO] flink-gelly ........................................ SUCCESS [ 21.147 s]
[INFO] flink-gelly-scala .................................. SUCCESS [ 35.202 s]
[INFO] flink-gelly-examples ............................... SUCCESS [ 31.837 s]
[INFO] flink-python ....................................... SUCCESS [ 8.998 s]
[INFO] flink-ml ........................................... SUCCESS [02:34 min]
[INFO] flink-cep .......................................... SUCCESS [ 21.447 s]
[INFO] flink-cep-scala .................................... SUCCESS [ 46.998 s]
[INFO] flink-scala-shell .................................. SUCCESS [ 50.330 s]
[INFO] flink-quickstart ................................... SUCCESS [ 4.723 s]
[INFO] flink-quickstart-java .............................. SUCCESS [ 2.686 s]
[INFO] flink-quickstart-scala ............................. SUCCESS [ 0.957 s]
[INFO] flink-storm ........................................ SUCCESS [ 24.464 s]
[INFO] flink-storm-examples ............................... SUCCESS [03:51 min]
[INFO] flink-streaming-contrib ............................ SUCCESS [ 18.222 s]
[INFO] flink-tweet-inputformat ............................ SUCCESS [ 9.515 s]
[INFO] flink-operator-stats ............................... SUCCESS [ 8.018 s]
[INFO] flink-connector-wikiedits .......................... SUCCESS [ 6.316 s]
[INFO] flink-yarn ......................................... SUCCESS [ 26.015 s]
[INFO] flink-dist ......................................... SUCCESS [01:04 min]
[INFO] flink-metrics-dropwizard ........................... SUCCESS [ 8.947 s]
[INFO] flink-metrics-ganglia .............................. SUCCESS [ 5.176 s]
[INFO] flink-metrics-graphite ............................. SUCCESS [ 4.891 s]
[INFO] flink-metrics-statsd ............................... SUCCESS [ 7.830 s]
[INFO] flink-fs-tests ..................................... SUCCESS [ 3.448 s]
[INFO] flink-java8 ........................................ SUCCESS [ 14.214 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 39:09 min

@ramkrish86
Contributor Author

Looks like only this build failed:
https://travis-ci.org/apache/flink/jobs/150992610, and that was due to a cassandra-connector test case. It should be unrelated to this PR.

@StephanEwen
Contributor

Looks good in general. I think this could use a test case that ensures that, for now, the configuration parsing accepts both the old and the new config key.

Can you add such a test?

@uce
Contributor

uce commented Aug 11, 2016

I think we need to change more than just that single variable. The other variables should match, e.g.
recovery.jobmanager.port => high-availability.jobmanager.port and also recovery.zookeeper.* => high-availability.zookeeper.*, etc.

@ramkrish86
Contributor Author

Can you add such a test?

Sure, I can. The existing test cases helped to ensure that if there are no old configs we are able to fetch from the new config. I can add a test case to ensure that both configs are accepted.
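
For instance, a rough sketch of such a test (illustrative only; it uses the hypothetical HighAvailabilityModeResolver helper sketched in the PR description above, not the actual class added by this PR):

import static org.junit.Assert.assertEquals;

import org.apache.flink.configuration.Configuration;
import org.junit.Test;

public class HighAvailabilityBackwardCompatibilityTest {

    @Test
    public void testDeprecatedRecoveryModeKeyIsStillAccepted() {
        Configuration config = new Configuration();
        // Only the old key is set, as in a pre-rename flink-conf.yaml.
        config.setString("recovery.mode", "zookeeper");

        // The deprecated key should still resolve to the 'zookeeper' HA mode.
        assertEquals("zookeeper", HighAvailabilityModeResolver.resolveMode(config));
    }
}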

The other variables should match, e.g.
recovery.jobmanager.port => high-availability.jobmanager.port and also recovery.zookeeper.* => high-availability.zookeeper.*, etc.

I see. I had this doubt, but since the PR was about this specific param I thought that would be enough. +1 to changing all the relevant ones. I can update this in the next revision of the PR.

Thanks all for the comments.

@ramkrish86
Contributor Author

Just a query: if both the old and new configs are available, we should give priority to the new one, right? Even if the values for the old and new configs are different?

@uce
Contributor

uce commented Aug 12, 2016

Regarding the priority: Yes, I think so.

@ramkrish86
Contributor Author

Updated the PR. It handles all the other configs and renames them from 'recovery.*' to 'high-availability.*'.
Also added a test case and suitably modified the code to ensure that if both old and new configs are present, only the new value is taken. If neither is set, then the default is taken. Feedback and suggestions are welcome.
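
As an illustration of the precedence rules described above, here is a sketch of the corresponding assertions (again using the hypothetical HighAvailabilityModeResolver from the earlier sketch, not the actual test code in the PR):

import static org.junit.Assert.assertEquals;

import org.apache.flink.configuration.Configuration;
import org.junit.Test;

public class HighAvailabilityKeyPrecedenceTest {

    @Test
    public void testNewKeyTakesPriorityWhenBothAreSet() {
        Configuration config = new Configuration();
        config.setString("recovery.mode", "standalone");     // old key
        config.setString("high-availability", "zookeeper");  // new key

        // The new key's value wins, even though the old key is also present.
        assertEquals("zookeeper", HighAvailabilityModeResolver.resolveMode(config));
    }

    @Test
    public void testDefaultIsUsedWhenNeitherKeyIsSet() {
        // Neither key is set: fall back to the new default 'none'.
        assertEquals("none", HighAvailabilityModeResolver.resolveMode(new Configuration()));
    }
}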

@ramkrish86
Contributor Author

The GitHub build shows two failures:

org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase.org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
org.apache.flink.test.checkpointing.StreamCheckpointingITCase.org.apache.flink.test.checkpointing.StreamCheckpointingITCase

Both seem to be unrelated.
Two Travis builds hung; the rest of them passed.

https://travis-ci.org/apache/flink/jobs/151785550
https://travis-ci.org/apache/flink/jobs/151785556

Not sure of the problem.

@StephanEwen
Contributor

There are some test instabilities that we are trying to address. They seem unrelated to your changes at first glance. We'll look at the tests again anyway before merging.

@ramkrish86
Contributor Author

I ran those two tests individually and they seem to pass, but the RocksDB version failed due to some initialization error. Thank you @StephanEwen for your comments.

@ramkrish86
Contributor Author

@uce
Updated with changes so that all the configs are renamed from 'recovery' to 'high-availability'.

@@ -285,7 +285,7 @@ of the JobManager, because the same ActorSystem is used. Its not possible to use

## High Availability Mode

- `recovery.mode`: (Default 'standalone') Defines the recovery mode used for the cluster execution. Currently, Flink supports the 'standalone' mode where only a single JobManager runs and no JobManager state is checkpointed. The high availability mode 'zookeeper' supports the execution of multiple JobManagers and JobManager state checkpointing. Among the group of JobManagers, ZooKeeper elects one of them as the leader which is responsible for the cluster execution. In case of a JobManager failure, a standby JobManager will be elected as the new leader and is given the last checkpointed JobManager state. In order to use the 'zookeeper' mode, it is mandatory to also define the `recovery.zookeeper.quorum` configuration value.
- `high-availability`: (Default 'none') Defines the recovery mode used for the cluster execution. Currently, Flink supports the 'none' mode where only a single JobManager runs and no JobManager state is checkpointed. The high availability mode 'zookeeper' supports the execution of multiple JobManagers and JobManager state checkpointing. Among the group of JobManagers, ZooKeeper elects one of them as the leader which is responsible for the cluster execution. In case of a JobManager failure, a standby JobManager will be elected as the new leader and is given the last checkpointed JobManager state. In order to use the 'zookeeper' mode, it is mandatory to also define the `recovery.zookeeper.quorum` configuration value. Previously this config was named 'recovery.mode' and the default config was 'standalone'.

- `recovery.zookeeper.quorum`: Defines the ZooKeeper quorum URL which is used to connet to the ZooKeeper cluster when the 'zookeeper' recovery mode is selected
Contributor

Docs not updated (same for jobmanager_high_availability.md)

Contributor Author

Oh yes. Will change them too.

@ramkrish86
Contributor Author

@uce
Updated the PR with suitable doc updates and also renamed the enum RecoveryMode to HighAvailabilityMode.
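
For orientation, a hedged sketch of what the renamed enum could look like (the actual HighAvailabilityMode in Flink may differ in values and helper methods; the key names and defaults here are illustrative):

import org.apache.flink.configuration.Configuration;

public enum HighAvailabilityMode {
    NONE,       // replaces the old RecoveryMode.STANDALONE
    ZOOKEEPER;

    // Illustrative parser: prefer the new key and fall back to the deprecated one.
    public static HighAvailabilityMode fromConfig(Configuration config) {
        String value = config.getString("high-availability",
                config.getString("recovery.mode", "none"));
        if ("zookeeper".equalsIgnoreCase(value)) {
            return ZOOKEEPER;
        }
        // 'standalone' (old default) and 'none' (new default) both map to NONE.
        return NONE;
    }
}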

@ramkrish86
Contributor Author

@uce
The test case failures seem to be unrelated. Do you want to have a look at the updated PR?

@ramkrish86
Contributor Author

@uce
I saw that there were conflicts after yesterday's updates to master, so I have rebased my PR and force-pushed the changes. Thanks.

@ramkrish86
Contributor Author

@uce
Got a green build. Any feedback/reviews here? I know you guys are busy; just a gentle reminder.

@ramkrish86
Contributor Author

@uce
Got some time to check this? A gentle reminder!

@uce
Contributor

uce commented Aug 19, 2016

I will check this later today or on Monday. You will have to wait a little, sorry.

@ramkrish86
Contributor Author

@uce
Thanks. No problem. I can wait.

@uce
Contributor

uce commented Aug 22, 2016

I had to address some things when reviewing this: https://github.com/uce/flink/tree/ram

In general the changes were good, but they overlooked many occurrences that are not possible to catch with automatic refactoring tools. I went through those manually. I will merge this after my local Travis build passes.

@ramkrish86
Contributor Author

@uce
Thank you very much for checking it out. I can see that in the branch you created, you have updated some doc comments I had added (changing 'config' to 'key') and covered some places I had missed. Good on you. Thanks once again. Do let me know if you want me to help out if you feel any more places are missing.

@uce
Contributor

uce commented Aug 23, 2016

Builds are passing here: https://travis-ci.org/uce/flink/builds/154236174

I'm going to merge this later today. Thanks for starting this renaming.
