Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIVE-24713 HS2 never shut down after reconnecting to Zookeeper #1932

Closed
wants to merge 3 commits into from

Conversation

EugeneChung
Copy link

@EugeneChung EugeneChung commented Feb 1, 2021

What changes were proposed in this pull request?

Register ZKDeRegisterWatcher again for NodeDataChanged event after ZK connection reestablishment

Why are the changes needed?

It's a bug. After ZK connection is reestablished, NodeDataChanged event for the server node is occurred. Then it is never notified again. HS2 eventually never knows deregistering from Zookeeper.

Does this PR introduce any user-facing change?

No

How was this patch tested?

On my company's environment. I described reproduction process of the problem on the ticket.

@EugeneChung
Copy link
Author

EugeneChung commented Feb 2, 2021

All of the failed 26 tests are under org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.
And all errors are related with test tables. I think they are related to metastore.
Does the metastore of test configuration use zookeeper discovery?

java.lang.RuntimeException: Failed to clean up txn tables
	at org.apache.hadoop.hive.metastore.utils.TestTxnDbUtil.cleanDb(TestTxnDbUtil.java:261)
	at org.apache.hadoop.hive.ql.TestTxnCommands2.tearDown(TestTxnCommands2.java:197)
	at sun.reflect.GeneratedMethodAccessor188.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at org.junit.internal.runners.statements.RunAfters.invokeMethod(RunAfters.java:46)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
	at org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:258)
	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
	at org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:43)
	at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
	at java.util.Iterator.forEachRemaining(Iterator.java:116)
	at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
	at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
	at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
	at org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:82)
	at org.junit.vintage.engine.VintageTestEngine.execute(VintageTestEngine.java:73)
	at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:248)
	at org.junit.platform.launcher.core.DefaultLauncher.lambda$execute$5(DefaultLauncher.java:211)
	at org.junit.platform.launcher.core.DefaultLauncher.withInterceptedStreams(DefaultLauncher.java:226)
	at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:199)
	at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:132)
	at org.apache.maven.surefire.junitplatform.JUnitPlatformProvider.invokeAllTests(JUnitPlatformProvider.java:154)
	at org.apache.maven.surefire.junitplatform.JUnitPlatformProvider.invoke(JUnitPlatformProvider.java:123)
	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:377)
	at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:138)
	at org.apache.maven.surefire.booter.ForkedBooter.run(ForkedBooter.java:465)
	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:451)
org.opentest4j.MultipleFailuresError: 
Multiple Failures (2 failures)
	java.lang.RuntimeException: drop table if exists acidTbl failed: (responseCode = 10280, errorMessage = FAILED: LockException [Error 10280]: Error communicating with the metastore, hiveErrorCode = 10280, SQLState = 42000, exception = Error communicating with the metastore)
	java.lang.RuntimeException: drop table if exists acidTbl failed: (responseCode = 10280, errorMessage = FAILED: LockException [Error 10280]: Error communicating with the metastore, hiveErrorCode = 10280, SQLState = 42000, exception = Error communicating with the metastore)
	at org.junit.vintage.engine.execution.TestRun.getStoredResultOrSuccessful(TestRun.java:196)
	at org.junit.vintage.engine.execution.RunListenerAdapter.fireExecutionFinished(RunListenerAdapter.java:226)
	at org.junit.vintage.engine.execution.RunListenerAdapter.testFinished(RunListenerAdapter.java:192)
	at org.junit.vintage.engine.execution.RunListenerAdapter.testFinished(RunListenerAdapter.java:79)
	at org.junit.runner.notification.SynchronizedRunListener.testFinished(SynchronizedRunListener.java:87)
	at org.junit.runner.notification.RunNotifier$9.notifyListener(RunNotifier.java:225)
	at org.junit.runner.notification.RunNotifier$SafeNotifier.run(RunNotifier.java:72)
	at org.junit.runner.notification.RunNotifier.fireTestFinished(RunNotifier.java:222)
	at org.junit.internal.runners.model.EachTestNotifier.fireTestFinished(EachTestNotifier.java:38)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:372)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
	at org.junit.vintage.engine.execution.RunnerExecutor.execute(RunnerExecutor.java:43)
	at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
	at java.util.Iterator.forEachRemaining(Iterator.java:116)
	at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
	at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
	at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
	at org.junit.vintage.engine.VintageTestEngine.executeAllChildren(VintageTestEngine.java:82)
	at org.junit.vintage.engine.VintageTestEngine.execute(VintageTestEngine.java:73)
	at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:248)
	at org.junit.platform.launcher.core.DefaultLauncher.lambda$execute$5(DefaultLauncher.java:211)
	at org.junit.platform.launcher.core.DefaultLauncher.withInterceptedStreams(DefaultLauncher.java:226)
	at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:199)
	at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:132)
	at org.apache.maven.surefire.junitplatform.JUnitPlatformProvider.invokeAllTests(JUnitPlatformProvider.java:154)
	at org.apache.maven.surefire.junitplatform.JUnitPlatformProvider.invoke(JUnitPlatformProvider.java:123)
	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:377)
	at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:138)
	at org.apache.maven.surefire.booter.ForkedBooter.run(ForkedBooter.java:465)
	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:451)

@EugeneChung
Copy link
Author

EugeneChung commented Feb 2, 2021

image

Passed on my PC with the same commit.

@EugeneChung
Copy link
Author

I just changed the comment to trigger the test. It looks there are some unstable points in the test process itself.

@EugeneChung
Copy link
Author

As indicated by Apache curator, https://github.com/apache/curator/blob/2b68d13f6d1e57e019ee19f96f047361c8b88516/curator-recipes/src/main/java/org/apache/curator/framework/recipes/nodes/PersistentNode.java#L81-L81 ,
DeregisterWatcher must be registered again for any NodeDataChanged events.

@github-actions
Copy link

github-actions bot commented Jul 3, 2021

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Feel free to reach out on the dev@hive.apache.org list if the patch is in need of reviews.

@github-actions github-actions bot added the stale label Jul 3, 2021
Eugene Chung added 3 commits July 8, 2021 10:13
@EugeneChung EugeneChung changed the title HIVE-24713 HS2 never knows deregistering from Zookeeper in the particular case HIVE-24713 HS2 never shut down after reconnecting to Zookeeper Jul 8, 2021
@github-actions github-actions bot closed this Jul 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants