HDDS-10644. Intermittent failure in testBalancer.robot #6481

Merged: 1 commit, Apr 5, 2024

@afilpp afilpp commented Apr 4, 2024

What changes were proposed in this pull request?

HDDS-10644. Intermittent failure in testBalancer.robot

The test failed intermittently because of a delay between the close-container event being sent to the event queue and that event being processed.
To improve test stability, the test now ignores the exception caused by a duplicate container close request, so that it no longer fails the acceptance test.
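
The tolerance described above can be sketched outside Robot Framework. This is a minimal Python model, not the patch itself: `close_container`, `run`, and `fake_run_factory` are hypothetical names, the fake error message is made up, and the only detail taken from the patch is the message fragment `is in closing state`.

```python
from typing import Callable, Tuple

# Message fragment the patch checks for; everything else here is illustrative.
ALREADY_CLOSING = "is in closing state"


def close_container(container_id: str, run: Callable[[str], Tuple[int, str]]) -> None:
    """Request a close, tolerating only the 'already closing' error.

    `run` is a hypothetical helper that executes a command line and
    returns (exit_code, combined_output).
    """
    code, output = run(f'ozone admin container close "{container_id}"')
    if code == 0:
        return
    if ALREADY_CLOSING in output:
        # Duplicate request: an earlier close event is still queued or being
        # processed, so the container is already on its way to CLOSED.
        return
    raise RuntimeError(f"closing container {container_id} failed: {output}")


def fake_run_factory():
    """Stand-in for a cluster: the first close succeeds, repeats are rejected."""
    closing = set()

    def run(command: str) -> Tuple[int, str]:
        container_id = command.split('"')[1]
        if container_id in closing:
            # Made-up message text; only the fragment matters for this sketch.
            return 1, f"container {container_id} {ALREADY_CLOSING}"
        closing.add(container_id)
        return 0, ""

    return run


run = fake_run_factory()
close_container("1", run)  # first request is accepted
close_container("1", run)  # duplicate request is tolerated, no exception
```

Any other failure still raises, so the sketch only hides the one error that the race is expected to produce.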

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-10644

How was this patch tested?

The test passed more than 10 times in a row.

@ivandika3 ivandika3 left a comment

Thank you for the fix. LGTM.

I left some questions and comments, but there is no need to update the patch unless you find it necessary.

Comment on lines -106 to 108
${output} = Execute ozone admin container list --state OPEN
${output} = Execute ozone admin container list --state OPEN
Should Be Empty ${output}

@ivandika3 ivandika3 Apr 5, 2024

According to ContainerBalancerSelectionCriteria#shouldBeExcluded, a container must be CLOSED to be chosen (ContainerBalancerCriteria#isContainerClosed). The "All container is closed" check only verifies that there are no OPEN containers. Therefore, I think there is a chance that a container is still CLOSING when the container balancer starts, in which case it will be excluded during the container balancer iteration.

However, I think the chance is very small, since enough time should pass between the close container command and the start of the container balancer for the container to be closed. So it should be fine as it is now.
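
To make the gap in this comment concrete, here is a small Python model of the two conditions. It is a simplification under stated assumptions: the `State` enum lists only three lifecycle states, and `no_open_containers` and `balancer_candidates` are hypothetical stand-ins for the test check and the balancer's selection rule, not the actual Ozone code.

```python
from enum import Enum


class State(Enum):
    # Simplified subset of container lifecycle states, for illustration only.
    OPEN = "OPEN"
    CLOSING = "CLOSING"
    CLOSED = "CLOSED"


def no_open_containers(states):
    """What the test's check verifies: nothing is listed as OPEN."""
    return all(s is not State.OPEN for s in states)


def balancer_candidates(states):
    """What the balancer requires, per the review comment: CLOSED only."""
    return [s for s in states if s is State.CLOSED]


states = [State.CLOSED, State.CLOSING]
print(no_open_containers(states))        # True
print(len(balancer_candidates(states)))  # 1
```

The check passes even though one of the two containers is not yet eligible for balancing, which is the window the comment describes.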

Execute ozone admin container close "${container}"
EXIT FOR LOOP IF "${container}" == "${EMPTY}"
${message} = Execute And Ignore Error ozone admin container close "${container}"
Run Keyword If '${message}' != '${EMPTY}' Should Contain ${message} is in closing state
${output} = Execute ozone admin container info "${container}"
Should contain ${output} CLOS
Contributor

Is the reason for matching CLOS to cover both CLOSING and CLOSED?

Contributor Author

Yes, for this test we don't need to wait for the containers to be completely closed (this may take too long). Therefore we accept both the “closed” and “closing” statuses.
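
The effect of matching on the shared prefix can be checked with a few lines of Python. This is only an illustration of the substring logic; `looks_closed_or_closing` is a hypothetical name, and the real check runs against the full output of `ozone admin container info`, not against a bare state name.

```python
def looks_closed_or_closing(info_output: str) -> bool:
    """Mirror of the substring check: any occurrence of 'CLOS' passes."""
    return "CLOS" in info_output


for state in ("OPEN", "CLOSING", "CLOSED"):
    print(state, looks_closed_or_closing(state))
# OPEN False
# CLOSING True
# CLOSED True
```

Because this is a plain substring match, any other text containing `CLOS` would also pass, for example a state name such as `QUASI_CLOSED`.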

Contributor

Thank you for the explanation.

@myskov myskov left a comment

Thanks @afilpp for the patch, LGTM

@myskov myskov merged commit 6b92a37 into apache:master Apr 5, 2024
24 checks passed

myskov commented Apr 5, 2024

@ivandika3 thank you for reviewing the patch
