Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix timeout in KafkaSupervisorTest.testCheckpointForInactiveTaskGroup #6207

Merged
merged 4 commits into from
Aug 27, 2018

Conversation

jihoonson
Copy link
Contributor

Let's see what is the error message.


Assert.assertNull(serviceEmitter.getStackTrace());
Assert.assertNull(serviceEmitter.getExceptionMessage());
Assert.assertNull(serviceEmitter.getStackTrace(), serviceEmitter.getStackTrace());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a fix or just temp debugging arrangement to see what is in the stacktrace so that later a fix can be done ?

test times out because serviceEmitter.getStackTrace() is non-null , so this assertion is gonna fail in those cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this doesn't fix the root cause, but fixes the infinite loop and print the stack trace so that we can fix it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR title misled me :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, do you want this PR to be merged or you're trying builds on this PR to see things whenever this test fails ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the title looks somewhat confusing :(. Do you have a better idea?

I thought it would be great if the CI for this PR shows something interesting, but unfortunately it succeeded. I can restart it until I find something, or check other PRs if they fail with some stack traces. I don't have a strong opinion here, but I would say this PR gives us a small benefit by making CI jobs fail faster.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe, "Add debug information to expose transient failure cause in KafkaSupervisorTest.testCheckpointForInactiveTaskGroup", if this PR is merged without actually fixing.

I'm approving it so feel free to merge whenever you want after trying out the builds here. I triggered one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. It catches the exception. I'll check it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exception was

  KafkaSupervisorTest.testCheckpointForInactiveTaskGroup:2118 java.lang.AssertionError: 
  Unexpected method call TaskRunner.getRunningTasks():
	at org.easymock.internal.MockInvocationHandler.invoke(MockInvocationHandler.java:44)
	at org.easymock.internal.ObjectMethodsFilter.invoke(ObjectMethodsFilter.java:94)
	at com.sun.proxy.$Proxy43.getRunningTasks(Unknown Source)
	at io.druid.indexing.kafka.supervisor.KafkaSupervisor$1.getTaskLocation(KafkaSupervisor.java:296)
	at io.druid.indexing.kafka.supervisor.KafkaSupervisor.checkpointTaskGroup(KafkaSupervisor.java:1503)
	at io.druid.indexing.kafka.supervisor.KafkaSupervisor.checkTaskDuration(KafkaSupervisor.java:1433)
	at io.druid.indexing.kafka.supervisor.KafkaSupervisor.runInternal(KafkaSupervisor.java:879)
	at io.druid.indexing.kafka.supervisor.KafkaSupervisor$RunNotice.handle(KafkaSupervisor.java:595)
	at io.druid.indexing.kafka.supervisor.KafkaSupervisor$2.run(KafkaSupervisor.java:369)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

So, I added expect(taskRunner.getRunningTasks()).andReturn(workItems).anyTimes();. Hopefully this fixes this issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@leventov thanks. I've also noticed that before. The problem is that the supervisor periodically executes runNotice and this can potentially call every method of the supervisor which in turn requires to mock everything using EasyMock.

I think it's better to refactor the whole KafkaSupervisorTest to be based on a sort of more controllable mockup environment by creating taskRunner, taskMaster, taskStorage, and taskClient for test purpose instead of making them using EasyMock.

Raised #6296.

@fjy fjy merged commit 64d33ee into apache:master Aug 27, 2018
@dclim dclim added this to the 0.13.0 milestone Oct 9, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants