Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-4052] using non-local unreachable ip:port to fix unstable Test testReturnLocalHostAddressUsingHeuristics #6853

Closed
wants to merge 1 commit into from

Conversation

leanken-zz
Copy link
Contributor

What is the purpose of the change

See. FLINK-4052 and also FLINK-3687.

Fix testReturnLocalHostAddressUsingHeuristics unstable failure.

Diagnose

I was able to 100% reproduce the failure in local environment as below.

java.lang.AssertionError: 
Expected :linxuewei.local/30.5.17.21
Actual   :/127.0.0.1
 <Click to see difference>

  at org.junit.Assert.fail(Assert.java:88)
  at org.junit.Assert.failNotEquals(Assert.java:834)
  at org.junit.Assert.assertEquals(Assert.java:118)
  at org.junit.Assert.assertEquals(Assert.java:144)
  at org.apache.flink.runtime.net.ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics(ConnectionUtilsTest.java:64)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

In general, in my local env, there are two ip the test case will try to use to connect to the target unreachable endpoint. 30.5.17.21 and 127.0.0.1.

Reproduce Step:

  • when case try with 30.5.17.21, do nothing.
  • when case try with 127.0.0.1, create a local process that listened on target unreachable port, in this case, the unreachable endpoint is now reachable now.
  • before the case return 127.0.0.1, there are still some code will trying to connect to the endpoint with 30.5.17.21 again which is return by InetAddress.getLocalHost()
# See. ConnectionUtils.java line: 276

case SLOW_CONNECT:
  LOG.debug("Trying to connect to {} from local address {} with timeout {}",
      targetAddress, interfaceAddress, strategy.getTimeout());

  if (tryToConnect(interfaceAddress, targetAddress, strategy.getTimeout(), logging)) {
    return tryLocalHostBeforeReturning(interfaceAddress, targetAddress, logging);
  }
  break;
  • before the tryLocalHostBeforeReturning called, kill the local process that listened on the target unreachable port. In this case, the test case will return 127.0.0.1 as assert compare result. Hans. the test case will failed.

Proposal

In genernal, in the build environment, local port used corrupt could happened time to time, So I suggest we change the local unreachable endpoint to a outside unreachable endpoint, for instance 8.8.8.8:65535 in my pull request. I think this way can prevent the test failure happen again.

Brief change log

  • using non-local unreachable ip:port to fix unstable Test testReturnLocalHostAddressUsingHeuristics

Verifying this change

Test testReturnLocalHostAddressUsingHeuristics fixed.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not documented)

… testReturnLocalHostAddressUsingHeuristics

Change-Id: If6dfbbde2190dead5e918dbdaf15b551975acd17
@leanken-zz
Copy link
Contributor Author

leanken-zz commented Oct 16, 2018

cc @tillrohrmann
Saw a couple of issue relate to testReturnLocalHostAddressUsingHeuristics unstable failure. Here is my diagnose procedure and proposal to fix this unstable case.

@StefanRRichter StefanRRichter self-assigned this Oct 16, 2018
Copy link
Contributor

@StefanRRichter StefanRRichter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@leanken thanks for your contribution, I think your explanation makes sense and the solution looks good to me 👍 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants