Fix flaky BatchIndex IT failures. #13855

Merged
paul-rogers merged 1 commit into apache:master from tejaswini-imply:fix-flaky_BatchIndex_IT
Feb 28, 2023
@tejaswini-imply
Member

Fixed the "Lock for interval was revoked" bug that affected kill tasks in the BatchIndex ITs. The failing peon log and the resulting test failure:

2023-02-16T06:10:14,026 INFO [main] org.apache.druid.indexing.worker.executor.ExecutorLifecycle - Attempting to lock file[/tmp/persistent/task/api-issued_kill_wikipedia_parallel_index_test_jbhgipfi_2013-08-31T00:00:00.000Z_2013-09-02T00:00:00.000Z_2023-02-16T06:10:07.749Z/lock].
2023-02-16T06:10:14,029 INFO [main] org.apache.druid.indexing.worker.executor.ExecutorLifecycle - Acquired lock file[/tmp/persistent/task/api-issued_kill_wikipedia_parallel_index_test_jbhgipfi_2013-08-31T00:00:00.000Z_2013-09-02T00:00:00.000Z_2023-02-16T06:10:07.749Z/lock] in 2ms.
2023-02-16T06:10:14,131 ERROR [main] org.apache.druid.cli.CliPeon - Error when starting up.  Failing.
java.lang.reflect.InvocationTargetException: null
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_342]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_342]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_342]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_342]
	at org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler.start(Lifecycle.java:446) ~[druid-core-2023.02.0-iap-SNAPSHOT.jar:2023.02.0-iap-SNAPSHOT]
	at org.apache.druid.java.util.common.lifecycle.Lifecycle.start(Lifecycle.java:341) ~[druid-core-2023.02.0-iap-SNAPSHOT.jar:2023.02.0-iap-SNAPSHOT]
	at org.apache.druid.guice.LifecycleModule$2.start(LifecycleModule.java:152) ~[druid-core-2023.02.0-iap-SNAPSHOT.jar:2023.02.0-iap-SNAPSHOT]
	at org.apache.druid.cli.GuiceRunnable.initLifecycle(GuiceRunnable.java:136) ~[druid-services-2023.02.0-iap-SNAPSHOT.jar:2023.02.0-iap-SNAPSHOT]
	at org.apache.druid.cli.GuiceRunnable.initLifecycle(GuiceRunnable.java:94) ~[druid-services-2023.02.0-iap-SNAPSHOT.jar:2023.02.0-iap-SNAPSHOT]
	at org.apache.druid.cli.CliPeon.run(CliPeon.java:310) ~[druid-services-2023.02.0-iap-SNAPSHOT.jar:2023.02.0-iap-SNAPSHOT]
	at org.apache.druid.cli.Main.main(Main.java:112) ~[druid-services-2023.02.0-iap-SNAPSHOT.jar:2023.02.0-iap-SNAPSHOT]
Caused by: org.apache.druid.java.util.common.ISE: Failed to run task[api-issued_kill_wikipedia_parallel_index_test_jbhgipfi_2013-08-31T00:00:00.000Z_2013-09-02T00:00:00.000Z_2023-02-16T06:10:07.749Z] isReady
	at org.apache.druid.indexing.worker.executor.ExecutorLifecycle.start(ExecutorLifecycle.java:173) ~[druid-indexing-service-2023.02.0-iap-SNAPSHOT.jar:2023.02.0-iap-SNAPSHOT]
	... 11 more
Caused by: org.apache.druid.java.util.common.ISE: Lock for interval [2013-08-31T00:00:00.000Z/2013-09-02T00:00:00.000Z] was revoked.
	at org.apache.druid.indexing.common.task.AbstractFixedIntervalTask.isReady(AbstractFixedIntervalTask.java:92) ~[druid-indexing-service-2023.02.0-iap-SNAPSHOT.jar:2023.02.0-iap-SNAPSHOT]
	at org.apache.druid.indexing.worker.executor.ExecutorLifecycle.start(ExecutorLifecycle.java:168) ~[druid-indexing-service-2023.02.0-iap-SNAPSHOT.jar:2023.02.0-iap-SNAPSHOT]
	... 11 more

ITBestEffortRollupParallelIndexTest.testIndexData:142->AbstractIndexerTest.lambda$unloader$0:71->AbstractIndexerTest.unloadAndKillData:85 » ISE Error while making request to indexer [404 Not Found No task reports were found for this task. The task may not exist, or it may not have completed yet.]
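
The 404 above says the task "may not have completed yet", which suggests the reports were requested before they were available. One general way to tolerate that kind of timing in a test is to poll with a bounded number of retries. The sketch below is illustrative only and is not the change made in this PR: `RetryUntilReady`, `retryUntilPresent`, and the `Supplier` standing in for the report request are hypothetical names, not Druid APIs.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical helper, not part of Druid: retries a lookup that may
// legitimately return nothing until the underlying work has finished.
final class RetryUntilReady {
    private RetryUntilReady() {}

    /**
     * Calls {@code attempt} up to {@code maxAttempts} times, sleeping
     * {@code sleepMillis} between tries, and returns the first non-empty
     * result. Returns Optional.empty() if every attempt comes back empty
     * or the thread is interrupted while waiting.
     */
    static <T> Optional<T> retryUntilPresent(
            Supplier<Optional<T>> attempt, int maxAttempts, long sleepMillis) {
        for (int i = 0; i < maxAttempts; i++) {
            Optional<T> result = attempt.get();
            if (result.isPresent()) {
                return result;
            }
            if (i < maxAttempts - 1) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and give up early.
                    Thread.currentThread().interrupt();
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();
    }
}
```

In a test, the supplier would wrap whatever call fetches the report and map a "not found" response to `Optional.empty()`. Bounding the attempts keeps a genuinely missing task from hanging the test run.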

Contributor

@paul-rogers paul-rogers left a comment

Thanks for fixing this! LGTM.

@paul-rogers
Contributor

An IT failed waiting for one of the services to become ready. This failure seems independent of the fix in this PR. Going to try to rerun to see if we get better results. That is, there is something flaky, but not the flakiness that this PR fixes.

@paul-rogers paul-rogers merged commit e2461c2 into apache:master Feb 28, 2023
@clintropolis clintropolis added this to the 26.0 milestone Apr 10, 2023