@rice668 rice668 commented Feb 9, 2018

What is the purpose of the change

When a heartbeat timeout occurs, retry the connection to the JobManager/ResourceManager.

Brief change log

On timeout, invoke requestHeartbeat in the HeartbeatMonitor thread instead of directly invoking notifyHeartbeatTimeout and closing the connection.
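The behavior described above can be illustrated with a minimal sketch. It is not the code in this pull request: the types `HeartbeatTargetSketch`, `HeartbeatListenerSketch`, and `RetryingHeartbeatMonitorSketch` are hypothetical stand-ins rather than Flink's interfaces, and the `maxRetries` bound is an assumption, since the description does not say whether retries are limited.

```java
// Hypothetical stand-ins for illustration only; these are NOT Flink's interfaces.
interface HeartbeatTargetSketch {
    void requestHeartbeat();
}

interface HeartbeatListenerSketch {
    void notifyHeartbeatTimeout();
}

class RetryingHeartbeatMonitorSketch {
    private final HeartbeatTargetSketch target;
    private final HeartbeatListenerSketch listener;
    // Assumption: a retry bound, so that a dead peer is eventually reported.
    private final int maxRetries;
    private int retries = 0;

    RetryingHeartbeatMonitorSketch(
            HeartbeatTargetSketch target,
            HeartbeatListenerSketch listener,
            int maxRetries) {
        this.target = target;
        this.listener = listener;
        this.maxRetries = maxRetries;
    }

    // Called when the heartbeat timeout fires.
    void onTimeout() {
        if (retries < maxRetries) {
            // Retry: ask the peer for another heartbeat instead of giving up.
            retries++;
            target.requestHeartbeat();
        } else {
            // Retries exhausted: report the timeout to the listener.
            listener.notifyHeartbeatTimeout();
        }
    }

    // Called when a heartbeat arrives; the peer is alive, so reset the counter.
    void onHeartbeatReceived() {
        retries = 0;
    }

    int getRetries() {
        return retries;
    }
}
```

With `maxRetries` set to 2, the first two timeouts trigger `requestHeartbeat`, and only the third reaches `notifyHeartbeatTimeout`.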

Verifying this change

This change is already covered by existing tests, with one minor modification: in TaskExecutorTest.java, testHeartbeatTimeoutWithResourceManager was changed so that disconnectTaskManager is not invoked on timeout.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not documented)

Author

rice668 commented Mar 6, 2018

@tillrohrmann Could you please take a look at this when you have time? Thanks.

Contributor

@tillrohrmann tillrohrmann left a comment

Thanks for the contribution @zhangminglei. I don't think this is the right way to solve the problem. Instead, I would suggest simply creating a new ResourceManagerConnection in ResourceManagerHeartbeatListener#notifyHeartbeatTimeout in the JobMaster.
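One possible reading of this suggestion, as a minimal sketch: on a heartbeat timeout, the listener closes the stale connection and opens a fresh one to the same address. The classes `ConnectionSketch` and `ReconnectingListenerSketch` and all of their methods are hypothetical and do not reproduce Flink's ResourceManagerConnection or JobMaster code.

```java
// Hypothetical stand-in for a connection to the ResourceManager; not Flink's class.
class ConnectionSketch {
    private final String address;
    private boolean started;
    private boolean closed;

    ConnectionSketch(String address) {
        this.address = address;
    }

    // In a real system this would begin registration with the remote side.
    void start() {
        started = true;
    }

    void close() {
        closed = true;
    }

    boolean isStarted() {
        return started;
    }

    boolean isClosed() {
        return closed;
    }

    String getAddress() {
        return address;
    }
}

// Hypothetical listener that reconnects when the heartbeat times out.
class ReconnectingListenerSketch {
    private final String resourceManagerAddress;
    private ConnectionSketch connection;

    ReconnectingListenerSketch(String resourceManagerAddress) {
        this.resourceManagerAddress = resourceManagerAddress;
        this.connection = new ConnectionSketch(resourceManagerAddress);
        this.connection.start();
    }

    // On timeout, drop the stale connection and create a new one
    // to the same address, rather than retrying the heartbeat itself.
    void notifyHeartbeatTimeout() {
        connection.close();
        connection = new ConnectionSketch(resourceManagerAddress);
        connection.start();
    }

    ConnectionSketch getConnection() {
        return connection;
    }
}
```

In this sketch the retry logic lives in the listener rather than in the heartbeat monitor, so the monitor's timeout semantics stay unchanged.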
