
[FLINK-8393] [flip6] Reconnect to last known JobMaster when connection is lost #5267

Closed
tillrohrmann wants to merge 1 commit into apache:master from tillrohrmann:resumeLostJobMasterConnection


@tillrohrmann
Contributor

What is the purpose of the change

Reconnect to the last known location of a lost JobMaster connection.

Brief change log

  • In case of a heartbeat timeout or a disconnect call, the TaskExecutor tries to reconnect to the last known JobMaster location
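The behavior described above can be illustrated with a minimal sketch. This is not the code from this pull request or Flink's actual `RegisteredRpcConnection`; the class `JobMasterConnectionSketch`, its methods, the `dialer` callback, and the address string are all hypothetical names chosen for illustration. It only shows the idea of remembering the last known address and retrying it when the connection is lost.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/**
 * Hypothetical sketch (not Flink code): remembers the last known JobMaster
 * address and tries it again after a heartbeat timeout or a disconnect call.
 */
final class JobMasterConnectionSketch {
    // Assumed stand-in for the real RPC connect logic; returns true on success.
    private final Predicate<String> dialer;
    private final List<String> dialedAddresses = new ArrayList<>();
    private String lastKnownAddress; // null until the first connect attempt
    private boolean connected;

    JobMasterConnectionSketch(Predicate<String> dialer) {
        this.dialer = dialer;
    }

    /** Connects to the given address and records it as the last known location. */
    boolean connect(String address) {
        lastKnownAddress = address;
        dialedAddresses.add(address);
        connected = dialer.test(address);
        return connected;
    }

    /** Both loss signals share the same reconnect path. */
    boolean onHeartbeatTimeout() {
        return reconnect();
    }

    boolean onDisconnect() {
        return reconnect();
    }

    private boolean reconnect() {
        connected = false;
        if (lastKnownAddress == null) {
            return false; // nothing to reconnect to yet
        }
        return connect(lastKnownAddress);
    }

    boolean isConnected() {
        return connected;
    }

    /** Returns a copy of every address that was dialed, in order. */
    List<String> dialedAddresses() {
        return new ArrayList<>(dialedAddresses);
    }
}
```

In this sketch a single reconnect attempt is made per loss signal; whether the real change retries repeatedly or backs off is not stated in the description, so that is left out.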

Verifying this change

  • Added RegisteredRpcConnection#testReconnect

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not applicable)

@tillrohrmann force-pushed the resumeLostJobMasterConnection branch 2 times, most recently from c50636f to 9b20ba6, January 9, 2018 22:35
…n is lost

In case of a heartbeat timeout or a disconnect call, the TaskExecutor tries to
reconnect to the last known JobMaster location.

This closes apache#5267.
@tillrohrmann force-pushed the resumeLostJobMasterConnection branch from 9b20ba6 to 3c4d845, January 10, 2018 12:52
@tillrohrmann deleted the resumeLostJobMasterConnection branch January 10, 2018 16:28
asfgit pushed a commit that referenced this pull request Jan 10, 2018
…n is lost

In case of a heartbeat timeout or a disconnect call, the TaskExecutor tries to
reconnect to the last known JobMaster location.

This closes #5267.