Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-8073][kafka-tests] Disable timeout in tests #5718

Closed
wants to merge 1 commit into from

Conversation

pnowojski
Copy link
Contributor

To get stacktraces in case of deadlock do not timeout tests programatically.

This is a trivial change in tests.

To get stacktraces in case of deadlock do not timeout tests programatically.
@zentol
Copy link
Contributor

zentol commented Mar 19, 2018

This change appears to be focus mostly around Travis which i dislike a bit.
I'd rather add a custom Timeout that prints the thread-dump so we still get nice behavior when running things locally/in the IDE.

	@Rule
	public final Timeout timeout = new MyTimeOut(10, TimeUnit.MILLISECONDS);

	public static class MyTimeOut extends Timeout {

		public MyTimeOut(long timeout, TimeUnit timeUnit) {
			super(timeout, timeUnit);
		}

		@Override
		public Statement apply(Statement base, Description description) {
			Statement fail = super.apply(base, description);
			return new Statement() {
				@Override
				public void evaluate() throws Throwable {
					System.out.println(crunchifyGenerateThreadDump());
					fail.evaluate();
				}
			};
		}
	}

	public static String crunchifyGenerateThreadDump() {
		final StringBuilder dump = new StringBuilder();
		final ThreadMXBean threadMXBean = ManagementFactory.getThreadMXBean();
		final ThreadInfo[] threadInfos = threadMXBean.getThreadInfo(threadMXBean.getAllThreadIds(), 100);
		for (ThreadInfo threadInfo : threadInfos) {
			dump.append('"');
			dump.append(threadInfo.getThreadName());
			dump.append("\" ");
			final Thread.State state = threadInfo.getThreadState();
			dump.append("\n   java.lang.Thread.State: ");
			dump.append(state);
			final StackTraceElement[] stackTraceElements = threadInfo.getStackTrace();
			for (final StackTraceElement stackTraceElement : stackTraceElements) {
				dump.append("\n        at ");
				dump.append(stackTraceElement);
			}
			dump.append("\n\n");
		}
		return dump.toString();
	}

@pnowojski
Copy link
Contributor Author

I think that timeouts in IDE are only making things worse:

  1. you can not debug, since longer debugging session will fail
  2. in case of deadlocks, locally you can always manually dump/view stack traces but you also can do full thread/memory dump and/or attach a debugger.

On travis you can not debug it in any way, thus automated actions make sense (and they preserve some travis resources for other builds).

Furthermore, such rule would have to be implemented in all of our tests, not only here. Also IMO if you want to replace/remove travis_mvn_watchdog with such rule, this is out of scope of this ticket and it should be discussed separately (I would be against it), so that it doesn't hinder this ticket.

When I originally created those tests (with @Test(timeout = xxxx)) I did it by copying without thinking from a test, that was printing something to logs once every checkpoint, thus it was interrupting travis watchdog and one could argue that it required this special case timeout. This commit makes it more consistent with rest of the tests.

@tillrohrmann
Copy link
Contributor

I agree with @pnowojski. The watchdog and Travis itself should time out the test for us if things take too long.

@zentol
Copy link
Contributor

zentol commented Mar 19, 2018

I don't want to make a fuss over a single test, so for the purposes of this test it's fine to just remote the timeout.

In general however i think it would be pretty neat if we would add such a Timeout to the TestLogger class. I see the problem of debugging tests that have timeouts, but removing timeouts altogether is the wrong conclusion imo. Instead, we could implement the Timeout such that it doesn't fail the test if a certain profile/property is set.

With this we keep the benefits of test timeouts (CI service independence, less special behavior locally vs CI, fixed upper bound for test times which is particularly useful for new tests, "guarantee" that the test terminates) while still allowing debugging. In fact we may end up improving the debugging situation by consolidating how timeouts are implemented instead of each test rolling their own solution that you can't disable.

The travis watchdog cannot be removed as it covers the entire maven process that from time to time locks up outside of tests. That said, the fact that we have to rely on the travis watchdog to ensure that tests terminate is a bad sign. Not to mention that it already forced us to introduce workarounds to make tests "compatible", like the kafka tests and others that print stuff for the sole purposes of not triggering it.

@pnowojski
Copy link
Contributor Author

In that case can we merge it as it is and if you want @zentol, we can start a discussion about future direction of this?

@StephanEwen
Copy link
Contributor

Having a custom timeout like @zentol suggests could be helpful in the long run.

For now, though, I personally agree that the timeouts are counter productive (I actually never understood what they are supposed to solve). I prefer a deadlocking test to stay around, so I can jstack to the process, attach a debugger, pull a heap dump, whatever. The only environment where we cannot do that is the CI server, which has a timeout and thread dump already...

@zentol
Copy link
Contributor

zentol commented Mar 21, 2018

merging.

zentol pushed a commit to zentol/flink that referenced this pull request Mar 21, 2018
To get stacktraces in case of deadlock do not timeout tests programatically.

This closes apache#5718.
zentol pushed a commit to zentol/flink that referenced this pull request Mar 21, 2018
To get stacktraces in case of deadlock do not timeout tests programatically.

This closes apache#5718.
@pnowojski
Copy link
Contributor Author

Thanks!

zentol pushed a commit to zentol/flink that referenced this pull request Mar 21, 2018
To get stacktraces in case of deadlock do not timeout tests programatically.

This closes apache#5718.
@asfgit asfgit closed this in e273d5f Mar 21, 2018
sampathBhat pushed a commit to sampathBhat/flink that referenced this pull request Jul 26, 2018
To get stacktraces in case of deadlock do not timeout tests programatically.

This closes apache#5718.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants