HDDS-5249. Race Condition between Full and Incremental Container Reports #2268
sodonnel merged 1 commit into apache:master
Conversation
cc @GlenGeng
+1. Thanks for the fix of this subtle issue. BTW, about issue 2: right now the order of FCR and ICR is not guaranteed, so we may add some monotonically increasing sequence number.
@GlenGeng Thanks for the review. I will go ahead and commit this one, and raise a new Jira for issue 2. I think you are correct that we may need a monotonically increasing sequence number in the datanode, plus maybe some locking in the DN to guarantee the contents of the FCR. The hard part is figuring out what to do with that information on SCM. I wonder if we need to attach the "report sequence number" to every replica in SCM. Then we could compare a replica's sequence number against an FCR's sequence number before removing it.
However this approach adds a new field to every replica, which would have a memory overhead, so we need to try to think of a better way.
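The sequence-number idea could be sketched roughly as follows. This is a hypothetical illustration of the approach, not the Ozone API: `ReplicaTracker`, `onReport`, and `canRemove` are made-up names, and the real design would need to handle locking in the DN as noted above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: each replica remembers the sequence number of the
// report that last confirmed it. An FCR may only drop a replica whose
// sequence number is older than the FCR's own, so a replica added by a
// newer ICR survives a stale full report.
class ReplicaTracker {
  // containerId -> sequence number of the report that last confirmed it
  private final Map<Long, Long> replicaSeq = new ConcurrentHashMap<>();

  /** Record that a report with the given sequence confirmed this container. */
  void onReport(long containerId, long reportSeq) {
    replicaSeq.merge(containerId, reportSeq, Math::max);
  }

  /** May an FCR with sequence fcrSeq remove a container it did not list? */
  boolean canRemove(long containerId, long fcrSeq) {
    Long seen = replicaSeq.get(containerId);
    return seen != null && seen < fcrSeq;
  }
}
```

The memory concern mentioned above applies here: this adds one long per replica tracked in SCM.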
HDDS-5267 raised to track / fix the second issue mentioned here.
What changes were proposed in this pull request?
During testing we came across an issue with ICR and FCR handling.
The following log shows the issue:
In the above log, SCM is processing both an ICR and FCR for the same Datanode at the same time. The FCR does not contain container #1001.
The FCR starts first, and it takes a snapshot of the containers on the node via NodeManager.
Then it starts processing the containers one by one.
The ICR then starts, and it adds #1001 to the ContainerManager and to the NodeManager.
When the FCR completes, it replaces the list of containers in NodeManager with those in the FCR.
At this point, container #1001 is in the ContainerManager, but it is not listed against the node in NodeManager.
This would get fixed by the next FCR, but then the node goes dead. The dead node handler runs and uses the list of containers in NodeManager to remove all containers for the node. As #1001 is not listed, it is not removed by the dead node handler. This means the container will never be seen as under-replicated, as 3 copies will exist forever in the ContainerManager.
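The interleaving described above can be replayed deterministically in a few lines. This is a simplified stand-in, not the real Ozone handlers: the sets model the state in ContainerManager and NodeManager, and the method name is made up.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Sequential replay of the race: an ICR lands between the FCR taking its
// snapshot and the FCR replacing the node's container list.
class RaceDemo {
  /** Returns whether NodeManager still tracks #1001 after the FCR completes. */
  static boolean nodeStillTracks1001() {
    Set<Long> containerManager = new HashSet<>();
    Set<Long> nodeContainers = new HashSet<>();

    // 1. FCR handler starts; the report does not include container #1001
    Set<Long> fcrContainers = new HashSet<>(Arrays.asList(1L, 2L));

    // 2. ICR arrives mid-FCR and registers #1001 in both managers
    containerManager.add(1001L);
    nodeContainers.add(1001L);

    // 3. FCR finishes and replaces the node's list wholesale
    nodeContainers = fcrContainers;

    // ContainerManager still knows #1001, but the node's list has lost it
    return nodeContainers.contains(1001L);
  }
}
```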
This issue is quite tricky to fully fix. There are two issues:
Parallel processing of ICR and FCR can lead to data inconsistency between the ContainerManager and NodeManager. This is what caused the bug above.
An FCR can wipe out a reference to a container that was recently sent in an ICR but is not included in the FCR.
The second issue is less serious, as the next FCR will fix the problem, as the FCRs are produced approximately every 60 seconds by default.
We can fix problem 1 quite easily by synchronising on the datanode when processing FCRs and ICRs, which will ensure the data inconsistency does not happen.
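A minimal sketch of that fix direction, assuming a per-datanode lock keyed by UUID. `ReportDispatcher` and its methods are illustrative names, not the actual classes touched by this patch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch: serialize ICR and FCR processing for the same
// datanode by synchronizing on a per-datanode lock object, while reports
// for different datanodes still proceed in parallel.
class ReportDispatcher {
  private final Map<String, Object> dnLocks = new ConcurrentHashMap<>();

  private Object lockFor(String datanodeUuid) {
    return dnLocks.computeIfAbsent(datanodeUuid, k -> new Object());
  }

  void processFullReport(String dnUuid, Runnable handler) {
    synchronized (lockFor(dnUuid)) { // FCR and ICR for one DN never overlap
      handler.run();
    }
  }

  void processIncrementalReport(String dnUuid, Runnable handler) {
    synchronized (lockFor(dnUuid)) {
      handler.run();
    }
  }
}
```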
This PR is for issue 1, and we should probably create a followup issue for 2.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-5249
How was this patch tested?
Added a new test to reproduce the race condition and verified it passes after the code change.