HDDS-3013. Fix TestBlockOutputStreamWithFailures.java. #592

Merged
merged 5 commits into from Mar 3, 2020

bshashikant
Contributor

What changes were proposed in this pull request?

Added configs for the RPC and watch request timeouts of the Ratis client and server. Modified the unit tests to remove the assertions on write chunk and putblock metric counts after a failure is induced. Given the nature of retries and async writes, along with pipeline destruction, these counters cannot be deterministic after a failure.
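
To illustrate why an exact metric count is only safe to assert before a failure is induced, here is a minimal, self-contained sketch. It is not Ozone code: `RetryMetricSketch`, `OpCounter`, and `write` are hypothetical names, and the retry count is passed in explicitly to stand in for timing-dependent retries that a real test cannot control.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a per-request metric such as a write chunk counter. */
public class RetryMetricSketch {

    /** Counts every attempt, including retries, the way a per-request metric would. */
    static final class OpCounter {
        private final AtomicLong attempts = new AtomicLong();

        void record() {
            attempts.incrementAndGet();
        }

        long get() {
            return attempts.get();
        }
    }

    /**
     * Issues `ops` logical writes. Each write is attempted once, plus
     * `retriesPerOp` extra times to model retries after an induced failure.
     * Returns the counter value seen afterwards.
     */
    static long write(OpCounter counter, int ops, int retriesPerOp) {
        for (int i = 0; i < ops; i++) {
            for (int attempt = 0; attempt <= retriesPerOp; attempt++) {
                counter.record();
            }
        }
        return counter.get();
    }

    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        // Before any failure there are no retries, so an exact count is safe to assert.
        OpCounter healthy = new OpCounter();
        check(write(healthy, 4, 0) == 4, "exact count holds without failures");

        // After a failure the number of retries is not under the test's control
        // (assumption: modelled here by trying several retry counts).
        // Only a lower bound holds across all of them.
        for (int retries = 0; retries <= 2; retries++) {
            OpCounter afterFailure = new OpCounter();
            long observed = write(afterFailure, 4, retries);
            check(observed >= 4, "lower bound holds for any retry count");
            System.out.println("retries=" + retries + " observed=" + observed);
        }
    }
}
```

In this sketch the same four logical writes produce a different counter value for each retry count, which is why an equality assertion placed after the induced failure would pass or fail depending on how many retries happened to occur.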

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-3013

How was this patch tested?

Ran the unit test.

@adoroszlai
Contributor

@bshashikant there were various failures in 10 runs (at almost 10 minutes per run, this one cannot be run 20 times): https://github.com/adoroszlai/hadoop-ozone/runs/465142838

[ERROR]   TestBlockOutputStreamWithFailures.test2DatanodesFailure:548 expected:<14> but was:<18>
[ERROR]   TestBlockOutputStreamWithFailures.testWatchForCommitDatanodeFailure:390 expected:<800> but was:<450>
[ERROR]   TestBlockOutputStreamWithFailures.testWatchForCommitWithCloseContainerException:273 expected:<94> but was:<92>

@bshashikant
Contributor Author

Thanks @adoroszlai. The fix was not complete, as the metric count validation checks after inducing failure had not been removed from a few tests. The latest patch addresses these.

@adoroszlai
Contributor

Thanks @bshashikant for updating the patch. There are still some failures:

[ERROR] testWatchForCommitDatanodeFailure(org.apache.hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures)  Time elapsed: 52.234 s  <<< FAILURE!
java.lang.AssertionError: expected:<800> but was:<450>
...
  at org.apache.hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures.testWatchForCommitDatanodeFailure(TestBlockOutputStreamWithFailures.java:385)
[ERROR] testWatchForCommitWithCloseContainerException(org.apache.hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures)  Time elapsed: 56.644 s  <<< FAILURE!
java.lang.AssertionError: expected:<80> but was:<84>
...
  at org.apache.hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures.testWatchForCommitWithCloseContainerException(TestBlockOutputStreamWithFailures.java:175)
[ERROR] testFailureWithPrimeSizedData(org.apache.hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures)  Time elapsed: 58.116 s  <<< FAILURE!
java.lang.AssertionError: expected:<2> but was:<3>
...
  at org.apache.hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures.testFailureWithPrimeSizedData(TestBlockOutputStreamWithFailures.java:635)

https://github.com/adoroszlai/hadoop-ozone/runs/468047506

@bshashikant
Contributor Author

Addressed failures.

@adoroszlai
Contributor

Thanks @bshashikant. Yet another one:

[ERROR] test2DatanodesFailure(org.apache.hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures)  Time elapsed: 64.298 s  <<< FAILURE!
java.lang.AssertionError: expected:<4> but was:<8>
...
  at org.apache.hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures.test2DatanodesFailure(TestBlockOutputStreamWithFailures.java:406)

https://github.com/adoroszlai/hadoop-ozone/runs/472201293

@adoroszlai
Contributor

@bshashikant the run with the latest update had 1 error:

[ERROR] Tests run: 8, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 644.647 s <<< FAILURE! - in org.apache.hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures
[ERROR] testFailureWithPrimeSizedData(org.apache.hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures)  Time elapsed: 55.401 s  <<< ERROR!
java.lang.NullPointerException
  at org.apache.hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures.testFailureWithPrimeSizedData(TestBlockOutputStreamWithFailures.java:469)

https://github.com/adoroszlai/hadoop-ozone/runs/473056970

@bshashikant
Contributor Author

@adoroszlai, can you please share the log and output files for the failure?

@adoroszlai
Contributor

can you please share log and output file for the failure?

Sure, it's available in the integration artifact from the run page:

https://github.com/adoroszlai/hadoop-ozone/suites/486485662/artifacts/2311772

Contributor

@adoroszlai adoroszlai left a comment

Thanks @bshashikant for the update, 10/10 tests passed with 78d0e6e.

@adoroszlai
Contributor

TestRandomKeyGenerator (freon) timeout is unrelated, reported in HDDS-3086.

@adoroszlai adoroszlai merged commit e6f1428 into apache:master Mar 3, 2020