HDDS-3179 Pipeline placement based on Topology does not have fallback #678

Merged: 3 commits, Mar 27, 2020

timmylicheng
Contributor

What changes were proposed in this pull request?

When rack awareness and network topology are enabled, pipeline placement can fail when there is only one node on a rack.
This change adds fallback logic that searches for nodes on other racks.
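
The fallback idea described above can be sketched roughly as follows. This is a minimal illustration, not the code from this patch: the class `RackFallbackSketch`, the method `chooseNode`, and the map-of-racks representation are all hypothetical names chosen for the example, and the real placement policy in Ozone works with its own topology and node types.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of rack-aware selection with a fallback.
// None of these names come from the Ozone code base.
public class RackFallbackSketch {

  /**
   * Picks a node, preferring the given rack. If that rack has no
   * usable node left, falls back to searching the other racks.
   */
  public static Optional<String> chooseNode(
      Map<String, List<String>> nodesByRack,
      String preferredRack,
      Set<String> excluded) {
    // First attempt: stay on the preferred rack.
    Optional<String> sameRack =
        firstFree(nodesByRack.getOrDefault(preferredRack, List.of()), excluded);
    if (sameRack.isPresent()) {
      return sameRack;
    }
    // Fallback: look at every other rack in map iteration order.
    for (Map.Entry<String, List<String>> entry : nodesByRack.entrySet()) {
      if (entry.getKey().equals(preferredRack)) {
        continue;
      }
      Optional<String> other = firstFree(entry.getValue(), excluded);
      if (other.isPresent()) {
        return other;
      }
    }
    // No usable node anywhere.
    return Optional.empty();
  }

  // Returns the first node in the list that is not excluded.
  private static Optional<String> firstFree(List<String> nodes, Set<String> excluded) {
    return nodes.stream().filter(n -> !excluded.contains(n)).findFirst();
  }

  public static void main(String[] args) {
    Map<String, List<String>> topology = new LinkedHashMap<>();
    topology.put("/rack1", List.of("dn1"));
    topology.put("/rack2", List.of("dn2", "dn3"));
    // dn1 is the only node on /rack1 and is already taken,
    // so the search has to fall back to another rack.
    System.out.println(chooseNode(topology, "/rack1", Set.of("dn1")));
  }
}
```

Without the fallback loop, the single-node rack case would return an empty result, which corresponds to the placement failure this pull request describes.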

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-3179


How was this patch tested?

Unit tests.

@timmylicheng timmylicheng force-pushed the HDDS-3179 branch 2 times, most recently from 9342cb5 to b007fae Compare March 16, 2020 07:29
@timmylicheng timmylicheng force-pushed the HDDS-3179 branch 3 times, most recently from 6696d42 to c4576bc Compare March 18, 2020 03:21
@timmylicheng timmylicheng force-pushed the HDDS-3179 branch 7 times, most recently from 4e9ce5b to b7e6aff Compare March 26, 2020 11:24
@sodonnel
Contributor

Acceptance is failing with:

 ==============================================================================
Execute PI calculation                                                | FAIL |
1 != 0
------------------------------------------------------------------------------
Execute WordCount                                                     | FAIL |
1 != 0
------------------------------------------------------------------------------
ozonesecure-mr-mapreduce :: Execute MR jobs                           | FAIL |

This is a known issue and will be fixed soon: HDDS-3284.

Integration tests are flaky. it-Freon:

[ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 395.168 s <<< FAILURE! - in org.apache.hadoop.ozone.freon.TestRandomKeyGenerator
[ERROR] bigFileThan2GB(org.apache.hadoop.ozone.freon.TestRandomKeyGenerator)  Time elapsed: 326.297 s  <<< FAILURE!
java.lang.AssertionError: expected:<1> but was:<0>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:743)
	at org.junit.Assert.assertEquals(Assert.java:118)

One of these issues covers it: https://issues.apache.org/jira/browse/HDDS-3266 and it-freon: https://issues.apache.org/jira/browse/HDDS-3257

I am not sure about it-client.

@sodonnel
Contributor

Thanks for the updates here. I think the code looks much cleaner now with the debug statements and refactored block in getResultSet().

There are just a couple of minor changes needed to finish this one off.

@timmylicheng
Contributor Author

> Thanks for the updates here. I think the code looks much cleaner now with the debug statements and refactored block in getResultSet().
>
> There are just a couple of minor changes needed to finish this one off.

Thanks for the detailed review. It really helps. @sodonnel

@sodonnel
Contributor

Thanks for quickly addressing the final issues.

I am +1 on this now. I will commit it later pending the CI checks looking good.

@sodonnel
Contributor

All tests are green except this one, which I have seen fail in several other PRs, so it is flaky:

 [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 52.235 s - in org.apache.hadoop.ozone.freon.TestDataValidateWithUnsafeByteOperations
[INFO] Running org.apache.hadoop.ozone.freon.TestRandomKeyGenerator
[ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 338.009 s <<< FAILURE! - in org.apache.hadoop.ozone.freon.TestRandomKeyGenerator
[ERROR] bigFileThan2GB(org.apache.hadoop.ozone.freon.TestRandomKeyGenerator)  Time elapsed: 276.915 s  <<< FAILURE!
java.lang.AssertionError: expected:<1> but was:<0>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:743)
	at org.junit.Assert.assertEquals(Assert.java:118)

@sodonnel sodonnel merged commit 7d132ce into apache:master Mar 27, 2020
@timmylicheng timmylicheng deleted the HDDS-3179 branch March 27, 2020 07:42
@elek
Member

elek commented Mar 27, 2020

If you see a flaky test, please disable it (without an issue), create a new open issue, and repeat the build. There is some risk that one flaky test hides another. Merging patches without a green build makes it harder to debug flaky tests.

isahkemat pushed a commit to isahkemat/hadoop-ozone that referenced this pull request Mar 29, 2020