@tillrohrmann
Contributor

What is the purpose of the change

AkkaOptions.RETRY_GATE_CLOSED_FOR makes it possible to configure how long a
remote ActorSystem is gated after a connection loss. The default value is
50 ms.
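
As a rough illustration only, a millisecond value like this could be rendered into Akka's `akka.remote.retry-gate-closed-for` remoting setting as sketched below. The class and helper names are hypothetical, and this is not the code changed in this pull request; only the 50 ms default is taken from the description above.

```java
public class RetryGateConfigSketch {

    // Default taken from this pull request's description (50 ms).
    public static final long DEFAULT_RETRY_GATE_CLOSED_FOR_MS = 50L;

    // Hypothetical helper: renders a HOCON fragment for Akka remoting
    // from a gate duration given in milliseconds.
    public static String akkaRemoteFragment(long gateClosedForMs) {
        if (gateClosedForMs < 0) {
            throw new IllegalArgumentException(
                "gate duration must be non-negative: " + gateClosedForMs);
        }
        return "akka.remote.retry-gate-closed-for = " + gateClosedForMs + " ms";
    }

    public static void main(String[] args) {
        // Print the fragment for the default and for a 5-second gate.
        System.out.println(akkaRemoteFragment(DEFAULT_RETRY_GATE_CLOSED_FOR_MS));
        System.out.println(akkaRemoteFragment(5000L));
    }
}
```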

Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not applicable)

@tillrohrmann tillrohrmann force-pushed the hardenJobManagerFailsITCase branch from f14f100 to a25a6dd Compare October 25, 2017 17:03
@StephanEwen
Contributor

Change looks good.
Is 50 ms also akka's default value?
Out of curiosity, what triggered the need to introduce this option?

@tillrohrmann
Contributor Author

tillrohrmann commented Oct 26, 2017

Akka's default value is actually 5 seconds, which I think is a bit too high.

I was actually trying to track down an instability in the JobManagerFailsITCase and noticed that this test took roughly 16 s to execute (the ITCase contains only 2 tests where we restart the JM). Part of the reason was that Akka gated the JobManager ActorSystem for 5 seconds after we let the JM fail.

The actual solution to speed up this test was to not reuse the same port for the new JobManager system, but I couldn't think of a good reason to keep the 5-second default. Moreover, other tests that also run into gated connections could benefit from the change. I think lowering the gate interval should allow us to reestablish a lost connection faster.
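
To make the effect of the gate interval concrete, here is a toy model. It is not Akka's implementation, and the class and method names are made up for illustration. It only assumes that after a connection loss, reconnect attempts are rejected until the gate interval has elapsed.

```java
public class GateSketch {

    private final long gateClosedForMs;
    // Timestamp until which the remote address is gated; initially never gated.
    private long gatedUntilMs = Long.MIN_VALUE;

    public GateSketch(long gateClosedForMs) {
        this.gateClosedForMs = gateClosedForMs;
    }

    // Called when the connection to the remote system is lost.
    public void onConnectionLost(long nowMs) {
        gatedUntilMs = nowMs + gateClosedForMs;
    }

    // A reconnect attempt is only allowed once the gate has reopened.
    public boolean canConnect(long nowMs) {
        return nowMs >= gatedUntilMs;
    }

    public static void main(String[] args) {
        GateSketch akkaDefault = new GateSketch(5000L); // 5 s, Akka's default
        GateSketch lowered = new GateSketch(50L);       // 50 ms, as in this PR

        // Both lose their connection at t = 0 ms and retry at t = 100 ms.
        akkaDefault.onConnectionLost(0L);
        lowered.onConnectionLost(0L);

        System.out.println("5 s gate, retry at 100 ms allowed: "
            + akkaDefault.canConnect(100L));
        System.out.println("50 ms gate, retry at 100 ms allowed: "
            + lowered.canConnect(100L));
    }
}
```

In this model a restarted JobManager that reuses the same address cannot be reached for the full gate interval, so a shorter interval directly shortens the wait.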

However, I wasn't able to reproduce the test instability I've seen on Travis wrt JobManagerFailsITCase.

@StephanEwen
Contributor

Sounds fair, +1

tillrohrmann added a commit to tillrohrmann/flink that referenced this pull request Nov 3, 2017
The AkkaOptions.RETRY_GATE_CLOSED_FOR allows to configure how long a remote
ActorSystem is gated in case of a connection loss. The default value is set
to 50 ms.

This closes apache#4903.
tillrohrmann added a commit to tillrohrmann/flink that referenced this pull request Nov 6, 2017
tillrohrmann added a commit to tillrohrmann/flink that referenced this pull request Nov 7, 2017
@asfgit asfgit closed this in 3e36fd6 Nov 7, 2017
@tillrohrmann tillrohrmann deleted the hardenJobManagerFailsITCase branch November 7, 2017 14:38
GJL pushed a commit to GJL/flink that referenced this pull request Nov 8, 2017