
HDDS-2281. ContainerStateMachine#handleWriteChunk should ignore close container exception #54

Merged
2 commits merged into apache:master on Oct 20, 2019

Conversation


@bshashikant bshashikant commented Oct 18, 2019

Since write chunks happen in parallel on the datanode, a writeChunk executed as part of writeStateMachineData may fail with a close container exception. This leads to a log append failure in Ratis, which in turn triggers a pipeline close action on the datanode, resulting in frequent destruction of pipelines in the system.

Currently, ContainerStateMachine#applyTransaction ignores the close container exception. Similarly, ContainerStateMachine#handleWriteChunk should also ignore the close container exception.

The patch was tested by adding a unit test: after allocating a container, multiple threads write to it in parallel while one thread closes the container at a random point. The test verifies that, despite the container close, the state machine is not marked unhealthy, new snapshots can still be taken, and the pipeline does not halt.
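As a hedged illustration of the policy described above (a sketch with invented names, not the actual Ozone code), a write that fails only because the container was closed by a parallel thread is treated as benign, while any other failure marks the state machine unhealthy:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of the HDDS-2281 policy; class and enum names are
// invented for illustration and are not Ozone APIs.
public class CloseContainerSketch {
    // Stand-in outcomes for a chunk write on the datanode.
    enum WriteOutcome { OK, CONTAINER_CLOSED, IO_ERROR }

    static final AtomicBoolean stateMachineHealthy = new AtomicBoolean(true);

    // A write that failed only because the container was closed by a parallel
    // thread is ignored; any other failure marks the state machine unhealthy.
    static boolean handleWriteChunk(WriteOutcome outcome) {
        switch (outcome) {
            case OK:
                return true;
            case CONTAINER_CLOSED:
                // Benign race: do NOT flip stateMachineHealthy, so Ratis
                // snapshots and the pipeline keep working.
                return false;
            default:
                stateMachineHealthy.set(false);
                return false;
        }
    }

    public static void main(String[] args) {
        handleWriteChunk(WriteOutcome.CONTAINER_CLOSED);
        // Still healthy after a close-container failure.
        System.out.println("healthy after close: " + stateMachineHealthy.get());
        handleWriteChunk(WriteOutcome.IO_ERROR);
        // Unhealthy after any other failure.
        System.out.println("healthy after io error: " + stateMachineHealthy.get());
    }
}
```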

@bshashikant bshashikant changed the title Hdds 2281. ContainerStateMachine#handleWriteChunk should ignore close container exception HDDS-2281. ContainerStateMachine#handleWriteChunk should ignore close container exception Oct 18, 2019
@mukul1987 mukul1987 merged commit bfaa640 into apache:master Oct 20, 2019
@anuengineer (Contributor) commented:
@bshashikant can we please fill in the JIRA template? That helps people who read the JIRA understand what it is about. I was reading the JIRA description and the patch and could not make head or tail of it.

@mukul1987 when you commit or review, can you please comment on this?

```java
metrics.incNumWriteDataFails();
// Write chunks go in parallel. It's possible that one write chunk
// sees the stateMachine marked unhealthy by another parallel thread.
stateMachineHealthy.set(false);
```

So question: if a thread has marked the container as unhealthy, why should a write be successful at all?


I know this patch is merged, but I have no way of understanding what this means -- so I would appreciate some comments or feedback explaining what happens here.

@bshashikant (Contributor, Author) replied:

> So question: if a thread has marked the container as unhealthy, why should a write be successful at all?

If a container is marked unhealthy, the write will be marked failed and the log append will fail in Ratis. This code is just incrementing the failure count metric and marking the stateMachine for the pipeline unhealthy, so that no new Ratis snapshots can be taken.
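A minimal sketch of the snapshot gating described in this reply (invented names, not the actual ContainerStateMachine code): once the healthy flag is false, snapshot attempts fail rather than persisting possibly-inconsistent state.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch; not Ozone's real snapshot code.
public class SnapshotGateSketch {
    static final AtomicBoolean stateMachineHealthy = new AtomicBoolean(true);

    // Refuse to take a Ratis snapshot once the state machine is unhealthy.
    static long takeSnapshot(long lastAppliedIndex) throws IOException {
        if (!stateMachineHealthy.get()) {
            throw new IOException("state machine unhealthy, refusing snapshot");
        }
        // Pretend we persisted state up to lastAppliedIndex.
        return lastAppliedIndex;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("snapshot index: " + takeSnapshot(42));
        stateMachineHealthy.set(false);
        try {
            takeSnapshot(43);
        } catch (IOException e) {
            System.out.println("refused: " + e.getMessage());
        }
    }
}
```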

@bshashikant (Contributor, Author) replied:

> I know this patch is merged, but I have no way of understanding what this means -- so I would appreciate some comments or feedback explaining what happens here.

Updated the description to add clarity to this.

A project Member commented:

How will an unhealthy pipeline be recovered? I got a lot of exceptions because the pipeline is marked as unhealthy and remains in the unhealthy state...
