HDDS-5606. Intermittent failure in TestBlockDeletion#testContainerStatisticsAfterDelete #2528

Merged
adoroszlai merged 2 commits into apache:master from ChenSammi:HDDS-5606
Aug 18, 2021

@adoroszlai adoroszlai left a comment

Thanks @ChenSammi for working on this.

  • Can you please explain the problem and the solution?
  • Does this also address HDDS-5605?
  • Do we have any data on repeated run success rate?

ChenSammi commented Aug 18, 2021

> Thanks @ChenSammi for working on this.
>
> * Can you please explain the problem and the solution?

The problem is that the check for the container state change from DELETING to DELETED fails. The change will happen without question; the only question is the timing. So the fix is to add a retry of the state check.
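A minimal sketch of what such a retry could look like, for illustration only. This is not the code from the patch: the `RetryStateCheck` class, the `State` enum, the `waitFor` helper, and the interval and timeout values are all hypothetical. The idea is to poll the state until it reaches DELETED or a deadline passes, instead of asserting once.

```java
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BooleanSupplier;

public class RetryStateCheck {
  // Hypothetical stand-in for the container lifecycle states discussed above.
  enum State { DELETING, DELETED }

  /**
   * Polls the check every intervalMs until it returns true,
   * or throws TimeoutException once timeoutMs has elapsed.
   */
  public static void waitFor(BooleanSupplier check, long intervalMs, long timeoutMs)
      throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (!check.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException("condition not met within " + timeoutMs + " ms");
      }
      Thread.sleep(intervalMs);
    }
  }

  public static void main(String[] args) throws Exception {
    AtomicReference<State> state = new AtomicReference<>(State.DELETING);

    // Simulate the asynchronous transition that eventually happens.
    Thread transition = new Thread(() -> {
      try {
        Thread.sleep(200);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      state.set(State.DELETED);
    });
    transition.start();

    // Retry the state check instead of asserting immediately.
    waitFor(() -> state.get() == State.DELETED, 50, 5000);
    System.out.println("state = " + state.get());
  }
}
```

A one-shot assertion placed right after triggering the deletion would race with the transition; the polling loop only fails if the state never changes within the timeout.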

> * Does this also address [HDDS-5605](https://issues.apache.org/jira/browse/HDDS-5605)?

This doesn't address HDDS-5605. HDDS-5605 is still under investigation.

> * Do we have any data on repeated run success rate?

I have run the CI three times: two runs succeeded, and the third failed with another UT failure, such as TestRootedOzoneFileSystemWithFSO. It seems there are several tests which fail randomly and need to be fixed too.

@adoroszlai adoroszlai left a comment

Thanks @ChenSammi for the fix. It seems to work fine, passed 40x:

https://github.com/adoroszlai/hadoop-ozone/runs/3354755683

@adoroszlai

This only changes TestBlockDeletion, which is passing:

[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 200.333 s - in org.apache.hadoop.ozone.container.common.statemachine.commandhandler.TestBlockDeletion

and the only failure is in TestOzoneManagerBootstrap, which is being fixed in #2550.

Therefore I'm merging this without retriggering CI.

@adoroszlai adoroszlai merged commit c654ed8 into apache:master Aug 18, 2021
@ChenSammi

Thanks @adoroszlai for the code review.
