Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDDS-2995. Add integration test for Recon's Passive SCM state. #594

Closed
wants to merge 16 commits into from

Conversation

avijayanhwx
Copy link
Contributor

What changes were proposed in this pull request?

  • Verify Recon gets pipeline, node and container report from Datanode.
  • Verify SCM metadata state == Recon metadata state (Create pipeline , Close pipeline, create container)
  • Fix issue in DN StateContext.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-2995

How was this patch tested?

Manually tested.

@avijayanhwx
Copy link
Contributor Author

The IT failures are related. Working on fixing them.

Copy link
Contributor

@swagle swagle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@avijayanhwx
Copy link
Contributor Author

I am not able to reproduce this issue locally. Trying to root cause this based of PR runs.

@avijayanhwx
Copy link
Contributor Author

/retest

@vivekratnavel
Copy link
Contributor

/retest

@avijayanhwx
Copy link
Contributor Author

The test failures seem to be unrelated. I will try one more time.

Copy link
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @avijayanhwx for working on this. I ran the new test 20 times, it had 2 problems one time each:

Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 112.145 s <<< FAILURE! - in org.apache.hadoop.ozone.recon.TestReconAsPassiveScm
testDatanodeRegistrationAndReports(org.apache.hadoop.ozone.recon.TestReconAsPassiveScm)  Time elapsed: 25.52 s  <<< ERROR!
picocli.CommandLine$ExecutionException: Error while calling command (org.apache.hadoop.ozone.recon.ReconServer@1c68f6b8): java.lang.NullPointerException
        at picocli.CommandLine.execute(CommandLine.java:1180)
        ...
        at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:65)
        at org.apache.hadoop.ozone.MiniOzoneClusterImpl$Builder.build(MiniOzoneClusterImpl.java:536)
        at org.apache.hadoop.ozone.recon.TestReconAsPassiveScm.init(TestReconAsPassiveScm.java:72)
...
Caused by: java.lang.NullPointerException
        at org.apache.hadoop.ozone.recon.ReconServer.start(ReconServer.java:118)
Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 126.987 s <<< FAILURE! - in org.apache.hadoop.ozone.recon.TestReconAsPassiveScm
testDatanodeRegistrationAndReports(org.apache.hadoop.ozone.recon.TestReconAsPassiveScm)  Time elapsed: 40.297 s  <<< FAILURE!
java.lang.AssertionError: expected:<3> but was:<0>
        ...
        at org.apache.hadoop.ozone.recon.TestReconAsPassiveScm.testDatanodeRegistrationAndReports(TestReconAsPassiveScm.java:117)

https://github.com/adoroszlai/hadoop-ozone/runs/474216109
Logs: https://github.com/adoroszlai/hadoop-ozone/suites/487647417/artifacts/2328018

Comment on lines +287 to +288
protected Optional<Integer> reconHttpPort = Optional.empty();
protected Optional<Integer> reconDatanodePort = Optional.empty();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: please use OptionalInt.

Comment on lines +300 to +301
reconServer = new ReconServer();
reconServer.execute(new String[]{});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can call startRecon() instead to avoid duplicate code?

LOG.error("Exception while getting new container info from SCM",
ioEx);
return;
synchronized (getContainerManager()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it sync on containerManager variable?

@avijayanhwx
Copy link
Contributor Author

avijayanhwx commented Feb 28, 2020

Thanks for the review @adoroszlai & @swagle. Closing this PR since it is taking longer than expected to fix the flakiness of Recon's integration tests. This patch contains other important refactoring that need not wait for the integration tests to be added. I will be creating separate JIRAs for that.

@avijayanhwx avijayanhwx deleted the HDDS-2995-master branch March 16, 2020 14:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants