HDDS-3365. Ensure OzoneConfiguration is initialized in OzoneClientFactory#getOzoneClient. #798

Merged
merged 1 commit into from
Apr 9, 2020

Conversation

@xiaoyuyao xiaoyuyao commented Apr 9, 2020

What's changed?

Use OzoneConfiguration when checking the OM HA related configuration, because some of these APIs may be invoked with a plain Hadoop Configuration.
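
To illustrate the idea only (this is not the code from the patch), here is a minimal, self-contained sketch. `PlainConf` and `OzoneConf` are hypothetical stand-ins for Hadoop's `Configuration` and for `OzoneConfiguration`, and `isOmHaEnabled` is a hypothetical check. The sketch also assumes that the OM service id is read from the `ozone.om.service.ids` key; the `fs.defaultFS` value is purely illustrative.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified model of the problem; these are NOT the real Hadoop or Ozone classes. */
public class OmHaConfigSketch {

  /** Stand-in for a plain Hadoop Configuration: it only knows keys set explicitly. */
  static class PlainConf {
    protected final Map<String, String> props = new HashMap<>();

    void set(String key, String value) {
      props.put(key, value);
    }

    String get(String key) {
      return props.get(key);
    }
  }

  /** Stand-in for OzoneConfiguration: wraps a plain config and layers Ozone site settings under it. */
  static class OzoneConf extends PlainConf {
    OzoneConf(PlainConf base, Map<String, String> ozoneSite) {
      props.putAll(ozoneSite);   // Ozone-specific resources are loaded first
      props.putAll(base.props);  // keys set explicitly by the caller take precedence
    }
  }

  /** Hypothetical HA check: HA is considered enabled when a service id is configured. */
  static boolean isOmHaEnabled(PlainConf conf) {
    String ids = conf.get("ozone.om.service.ids"); // assumed key name
    return ids != null && !ids.trim().isEmpty();
  }

  public static void main(String[] args) {
    Map<String, String> ozoneSite = new HashMap<>();
    ozoneSite.put("ozone.om.service.ids", "ozone1");

    // What a caller such as the RM might pass in: no Ozone keys at all.
    PlainConf fromCaller = new PlainConf();
    fromCaller.set("fs.defaultFS", "o3fs://bucket.volume.ozone1"); // illustrative value

    System.out.println(isOmHaEnabled(fromCaller));                           // false
    System.out.println(isOmHaEnabled(new OzoneConf(fromCaller, ozoneSite))); // true
  }
}
```

The point of the sketch is that the HA check gives the wrong answer whenever it runs directly on the caller's configuration, so the conversion has to happen before the check.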

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-3365

How was this patch tested?

Tested with a Spark wordcount job. Before the patch, the RM failed to renew the token because the OM configuration was not available from the Hadoop Configuration.

Before
20/04/09 00:40:02 ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1586392793624_0001 to YARN : Failed to renew token: Kind: OzoneToken, Service: 172.27.129.0:9862,172.27.11.66:9862,172.27.20.1:9862, Ident: (OzoneToken owner=hrt_qa@ROOT.HWX.SITE, renewer=yarn, realUser=, issueDate=1586392799156, maxDate=1586997599156, sequenceNumber=65, masterKeyId=1, strToSign=null, signature=null, awsAccessKeyId=null, omServiceId=ozone1)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:322)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:185)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:60)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:185)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:505)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:834)

After
...
20/04/09 04:09:25 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 6.0 (TID 16) in 94 ms on quasar-kekkuz-8.quasar-kekkuz.root.hwx.site (executor 1) (2/4)
20/04/09 04:09:25 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 6.0 (TID 18) in 64 ms on quasar-kekkuz-6.quasar-kekkuz.root.hwx.site (executor 2) (3/4)
20/04/09 04:09:25 INFO scheduler.TaskSetManager: Finished task 3.0 in stage 6.0 (TID 19) in 66 ms on quasar-kekkuz-8.quasar-kekkuz.root.hwx.site (executor 1) (4/4)
20/04/09 04:09:25 INFO cluster.YarnScheduler: Removed TaskSet 6.0, whose tasks have all completed, from pool
20/04/09 04:09:25 INFO scheduler.DAGScheduler: ResultStage 6 (collect at /tmp/spark_script.py:40) finished in 0.172 s
20/04/09 04:09:25 INFO scheduler.DAGScheduler: Job 2 finished: collect at /tmp/spark_script.py:40, took 0.781710 s
**** ,357
**** felis,197
**** dictum,142
**** semper,160
**** justo,217
**** purus,215
**** ante,247
...

@xiaoyuyao

Looks like the it-client tests timed out even though all the tests passed. The most time-consuming tests are highlighted below. Have we considered disabling them until they are fixed? cc: @elek

[INFO] Running org.apache.hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures
Time elapsed: 416.222 s 
[INFO] Running org.apache.hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures
Time elapsed: 164.728 s 
[INFO] Running org.apache.hadoop.ozone.client.rpc.TestContainerStateMachine
Time elapsed: 121.868 s

[INFO] -------------------------------------------------------
[INFO] T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis
[WARNING] Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0.052 s - in org.apache.hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis
[INFO] Running org.apache.hadoop.ozone.client.rpc.TestReadRetries
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 60.805 s - in org.apache.hadoop.ozone.client.rpc.TestReadRetries
[INFO] Running org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower
[WARNING] Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0.035 s - in org.apache.hadoop.ozone.client.rpc.TestDeleteWithSlowFollower
[INFO] Running org.apache.hadoop.ozone.client.rpc.TestBlockOutputStream
[INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 42.556 s - in org.apache.hadoop.ozone.client.rpc.TestBlockOutputStream
[INFO] Running org.apache.hadoop.ozone.client.rpc.TestCommitWatcher
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 109.317 s - in org.apache.hadoop.ozone.client.rpc.TestCommitWatcher
[INFO] Running org.apache.hadoop.ozone.client.rpc.TestWatchForCommit
[WARNING] Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0.059 s - in org.apache.hadoop.ozone.client.rpc.TestWatchForCommit
[INFO] Running org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailures
[WARNING] Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0.031 s - in org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailures
[INFO] Running org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient
[WARNING] Tests run: 5, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 254.3 s - in org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient
[INFO] Running org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient
[WARNING] Tests run: 8, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 70.51 s - in org.apache.hadoop.ozone.client.rpc.TestCloseContainerHandlingByClient
[INFO] Running org.apache.hadoop.ozone.client.rpc.TestOzoneRpcClientForAclAuditLog
[WARNING] Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0.094 s - in org.apache.hadoop.ozone.client.rpc.TestOzoneRpcClientForAclAuditLog
[INFO] Running org.apache.hadoop.ozone.client.rpc.TestOzoneRpcClient
[INFO] Running org.apache.hadoop.ozone.client.rpc.TestBCSID
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 38.188 s - in org.apache.hadoop.ozone.client.rpc.TestBCSID
[INFO] Running org.apache.hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 164.728 s - in org.apache.hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures
[INFO] Running org.apache.hadoop.ozone.client.rpc.TestContainerStateMachine
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 121.868 s - in org.apache.hadoop.ozone.client.rpc.TestContainerStateMachine
[INFO] Running org.apache.hadoop.ozone.client.rpc.TestOzoneAtRestEncryption
[WARNING] Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0.055 s - in org.apache.hadoop.ozone.client.rpc.TestOzoneAtRestEncryption
[INFO] Running org.apache.hadoop.ozone.client.rpc.TestSecureOzoneRpcClient
[WARNING] Tests run: 71, Failures: 0, Errors: 0, Skipped: 3, Time elapsed: 58.274 s - in org.apache.hadoop.ozone.client.rpc.TestSecureOzoneRpcClient
[INFO] Running org.apache.hadoop.ozone.client.rpc.TestHybridPipelineOnDatanode
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 32.353 s - in org.apache.hadoop.ozone.client.rpc.TestHybridPipelineOnDatanode
[INFO] Running org.apache.hadoop.ozone.client.rpc.TestOzoneClientRetriesOnException
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 81.1 s - in org.apache.hadoop.ozone.client.rpc.TestOzoneClientRetriesOnException
[INFO] Running org.apache.hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures
[INFO] Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 416.222 s - in org.apache.hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures
[INFO] Running org.apache.hadoop.ozone.client.rpc.TestContainerReplicationEndToEnd
[WARNING] Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0.061 s - in org.apache.hadoop.ozone.client.rpc.TestContainerReplicationEndToEnd
[INFO] Running org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailureOnRead
[WARNING] Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0.045 s - in org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailureOnRead
[INFO] Running org.apache.hadoop.ozone.client.rpc.Test2WayCommitInRatis
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 41.679 s - in org.apache.hadoop.ozone.client.rpc.Test2WayCommitInRatis
[INFO] Running org.apache.hadoop.ozone.client.rpc.TestKeyInputStream
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 40.687 s - in org.apache.hadoop.ozone.client.rpc.TestKeyInputStream
[INFO]
[INFO] Results:
[INFO]
[WARNING] Tests run: 120, Failures: 0, Errors: 0, Skipped: 13
[INFO]
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Skipping Apache Hadoop Ozone Mini Ozone Chaos Tests
[INFO] This project has been banned from the build due to previous failures.
[INFO] ------------------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Apache Hadoop Ozone Integration Tests 0.6.0-SNAPSHOT:
[INFO]
[INFO] Apache Hadoop Ozone Integration Tests .............. FAILURE [41:14 min]
[INFO] Apache Hadoop Ozone Mini Ozone Chaos Tests ......... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 41:15 min
[INFO] Finished at: 2020-04-09T06:48:22Z
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M1:test (default-test) on project hadoop-ozone-integration-test: There was a timeout or other error in the fork -> [Help 1]
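
A ranking like the one highlighted at the top of this log can be extracted mechanically. The following is a rough sketch, assuming summary lines of the form `Tests run: N, ... Time elapsed: X s - in <class>` as they appear above; `SlowTestReport`, `Timing`, and `slowest` are hypothetical names, not part of surefire or Ozone.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Ranks test classes by elapsed time, based on surefire summary lines. */
public class SlowTestReport {

  /** One summary line: the test class and its elapsed time in seconds. */
  static final class Timing {
    final String testClass;
    final double seconds;

    Timing(String testClass, double seconds) {
      this.testClass = testClass;
      this.seconds = seconds;
    }
  }

  // Matches e.g. "Tests run: 8, ..., Time elapsed: 416.222 s - in some.pkg.TestClass".
  private static final Pattern SUMMARY =
      Pattern.compile("Tests run: \\d+,.*Time elapsed: ([0-9.]+) s - in (\\S+)");

  /** Returns at most {@code limit} timings, slowest first; non-summary lines are ignored. */
  static List<Timing> slowest(List<String> logLines, int limit) {
    List<Timing> timings = new ArrayList<>();
    for (String line : logLines) {
      Matcher m = SUMMARY.matcher(line);
      if (m.find()) {
        timings.add(new Timing(m.group(2), Double.parseDouble(m.group(1))));
      }
    }
    timings.sort(Comparator.comparingDouble((Timing t) -> t.seconds).reversed());
    return new ArrayList<>(timings.subList(0, Math.min(limit, timings.size())));
  }

  public static void main(String[] args) {
    // Three lines copied from the log above.
    List<String> log = Arrays.asList(
        "[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 164.728 s - in org.apache.hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures",
        "[INFO] Running org.apache.hadoop.ozone.client.rpc.TestContainerStateMachine",
        "[INFO] Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 416.222 s - in org.apache.hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures");
    for (Timing t : slowest(log, 2)) {
      System.out.println(t.seconds + " s  " + t.testClass);
    }
  }
}
```

Classes that never print a summary line, such as TestOzoneRpcClient in the log above, are not counted by this sketch, so a fork that hangs would have to be spotted separately.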

@bharatviswa504 bharatviswa504 left a comment

+1 LGTM.

@bharatviswa504 bharatviswa504 merged commit e71c383 into apache:master Apr 9, 2020
@bharatviswa504

Thank you @xiaoyuyao for the contribution.
