HDDS-8417. Cap on queue of commands at DN #4618

Merged
adoroszlai merged 8 commits into apache:master from sumitagrawl:HDDS-8417 on Apr 27, 2023

Conversation

@sumitagrawl (Contributor)

What changes were proposed in this pull request?

  • Cap the Block Delete Command queue at 5
  • Cap all other command queues at 5000

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-8417

How was this patch tested?

Impact is tested using existing unit tests and integration tests.

@adoroszlai adoroszlai requested a review from sodonnel April 26, 2023 05:57
Contributor
This description isn't really accurate: it's really the maximum number of commands of each command type that can be queued on a datanode, right? In many cases the commands go from the command queue into sub-queues in the various handlers.

Contributor Author

Updated...

Contributor

For the replication supervisor, I think the limit check should be in supervisor.addTask(). Then we don't need to add this check in two places (Reconstruction and replication handlers).
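A minimal sketch of the single-enforcement-point idea being suggested here. The names (`Supervisor`, `Task`, `maxQueueSize`) are illustrative, not the actual Ozone code; it only shows the shape of a cap check living in `addTask()` so callers don't have to duplicate it:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: a queue cap enforced once, inside addTask(),
// instead of in each calling handler.
public class QueueCapSketch {
  static class Task {
    final long containerId;
    Task(long containerId) { this.containerId = containerId; }
  }

  static class Supervisor {
    private final Queue<Task> queue = new ConcurrentLinkedQueue<>();
    private final int maxQueueSize;
    private final AtomicLong skipped = new AtomicLong();

    Supervisor(int maxQueueSize) { this.maxQueueSize = maxQueueSize; }

    boolean addTask(Task task) {
      // Single enforcement point: both the replication and the EC
      // reconstruction handlers would go through here.
      if (queue.size() >= maxQueueSize) {
        skipped.incrementAndGet();
        return false; // command dropped; SCM can resend it later
      }
      return queue.offer(task);
    }

    int getQueueSize() { return queue.size(); }
  }

  public static void main(String[] args) {
    Supervisor s = new Supervisor(2);
    System.out.println(s.addTask(new Task(1))); // true
    System.out.println(s.addTask(new Task(2))); // true
    System.out.println(s.addTask(new Task(3))); // false: cap reached
    System.out.println(s.getQueueSize());       // 2
  }
}
```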

Contributor Author

@sodonnel Updated...

Contributor

I'm not sure about this check, since we have limit checks in other places too. E.g. we could have 5000 replications queued on the replication queue and then another 5000 here, so it's a little confusing.

The command queue implementation is single threaded, and processes each command in turn. This is the list of handlers it currently has:

        .addHandler(new CloseContainerCommandHandler())  // Blocks
        .addHandler(new DeleteBlocksCommandHandler(getContainer(),  // async
            conf, dnConf.getBlockDeleteThreads(),
            dnConf.getBlockDeleteQueueLimit()))
        .addHandler(new ReplicateContainerCommandHandler(conf, supervisor, // async
            pullReplicatorWithMetrics, pushReplicatorWithMetrics))
        .addHandler(reconstructECContainersCommandHandler)  // async
        .addHandler(new DeleteContainerCommandHandler(         // async
            dnConf.getContainerDeleteThreads(), clock))
        .addHandler(new ClosePipelineCommandHandler())  // blocks
        .addHandler(new CreatePipelineCommandHandler(conf))  // blocks
        .addHandler(new SetNodeOperationalStateCommandHandler(conf)) // blocks
        .addHandler(new FinalizeNewLayoutVersionCommandHandler()) // blocks
        .addHandler(new RefreshVolumeUsageCommandHandler())  // blocks

So quite a few of the commands block processing of other commands on the queue while they run. Most of these are probably very fast. The RefreshVolumeUsageCommandHandler worries me though. I am not sure what it does or how often it gets called, but if it calls du it could be slow. Perhaps it should be async. Could close and create pipeline potentially be slow too, if there are issues with the Ratis election or with the pipeline when closing it?

When you saw problems with the DNs having a lot of queued commands, where were the commands queued? On the replication supervisor / delete block queues, or was the command queue also backed up with a lot of messages, indicating that its thread was not able to process the messages onto the sub-queues fast enough, or that some blocking command was taking a long time to run?
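A simplified model of the blocking-vs-async distinction discussed above (this is not the Ozone code, just an illustration): a blocking handler occupies the single dispatcher thread until it returns, while an async handler only pays for a submit() to its own pool and lets the dispatcher move on:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of a single-threaded command queue with two handler styles.
public class CommandDispatchSketch {
  final BlockingQueue<Runnable> commandQueue = new LinkedBlockingQueue<>();
  final ExecutorService asyncPool = Executors.newFixedThreadPool(2);
  final AtomicInteger handled = new AtomicInteger();

  // Blocking style: the next queued command waits until this returns.
  Runnable blockingCommand() {
    return handled::incrementAndGet;
  }

  // Async style: the dispatcher only pays for the submit() call;
  // the real work runs on the handler's own pool.
  Runnable asyncCommand() {
    return () -> asyncPool.submit(handled::incrementAndGet);
  }

  int drain() throws InterruptedException {
    // Single dispatcher thread, processing each command in turn.
    Thread dispatcher = new Thread(() -> {
      Runnable cmd;
      while ((cmd = commandQueue.poll()) != null) {
        cmd.run(); // a slow blocking handler stalls everything behind it
      }
    });
    dispatcher.start();
    dispatcher.join();
    asyncPool.shutdown();
    asyncPool.awaitTermination(5, TimeUnit.SECONDS);
    return handled.get();
  }

  public static void main(String[] args) throws InterruptedException {
    CommandDispatchSketch s = new CommandDispatchSketch();
    s.commandQueue.add(s.blockingCommand()); // like a "blocks" handler
    s.commandQueue.add(s.asyncCommand());    // like an "async" handler
    System.out.println(s.drain()); // 2
  }
}
```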

Contributor Author

@sodonnel
In an earlier version, delete block was executed synchronously, causing many commands to accumulate. But this is resolved in the later version.
The probability of the command queue accumulating is low, as commands taking a lot of time are handed over to sub-queues, and only a few commands are executed synchronously.
This is just an upper cap on all command queues, as good practice, to avoid any such issue if it happens due to some bug or abnormal scenario.

@sodonnel (Contributor)

It's worth noting here that there have been changes in the Replication Manager which mean it will not keep adding to the replication queue on the datanode, so problems where thousands of replication commands end up on the datanode should not happen any more. Also, the datanode now drops commands that are too old, for both replication and delete container commands.

I believe you saw some scenarios where the DN queue had a lot of messages, resulting in a crash. Do you have a breakdown of what commands were queued and where? E.g. replication commands on the replication queue; what about the commandQueue where commands get placed first - was it also suffering from many messages stuck on it?

I don't see any new tests in this PR - if we are going to have logic to enforce limits we probably need a couple of simple tests to check they are working ok.

Contributor

This test class is not enabled, as it is not prefixed with Test...:

  • Helper to check scheduling efficiency.
  • This unit test is not enabled (doesn't start with Test) but can be used to validate changes manually.

It needs to be moved into TestReplicationSupervisor.

@sodonnel (Contributor) Apr 26, 2023

We should try to avoid sleeps in tests. It leads to slow tests and potentially flaky tests. Instead of a sleep please use some sort of barrier to block execution - eg a countdown latch. This test takes just over 4 seconds to run.

In TestReplicationSupervisor there is a BlockingTask class defined that makes use of latches to block execution for a period of time, so you could probably make use of that.

  // schedule a few tasks up to the limit
  // try to schedule one more and the queue size should not increase
  // unblock the execution
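The latch-based pattern being suggested can be sketched like this. The names here are illustrative (in the spirit of, but not copied from, the BlockingTask class in TestReplicationSupervisor): the task signals when it starts and then holds its thread until the test releases it, so no Thread.sleep() is needed anywhere:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Deterministic test blocking via latches instead of sleeps.
public class LatchTaskSketch {
  static class BlockingTask implements Runnable {
    final CountDownLatch started = new CountDownLatch(1);
    final CountDownLatch release = new CountDownLatch(1);

    @Override public void run() {
      started.countDown();        // signal: the task is now running
      try {
        release.await();          // hold the worker thread until released
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
  }

  public static boolean demo() throws InterruptedException {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    BlockingTask task = new BlockingTask();
    pool.submit(task);
    task.started.await();         // wait on the latch, not a timer
    // ... here a test would schedule more tasks and assert that the
    // queue size does not grow past the configured limit ...
    task.release.countDown();     // unblock the execution
    pool.shutdown();
    return pool.awaitTermination(5, TimeUnit.SECONDS);
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(demo()); // true: the task completed once released
  }
}
```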

Contributor

I think here we should use getTotalInFlightReplications() instead of getQueueSize() as it will include running tasks as well as those queued.
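One way to picture why the two numbers differ, using hypothetical counters (the real ReplicationSupervisor tracks these internally; this is only an illustration of queued vs. in-flight):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative counters: queue size excludes tasks a worker has already
// picked up, while total in-flight includes them.
public class InFlightSketch {
  private final AtomicInteger queued = new AtomicInteger();
  private final AtomicInteger running = new AtomicInteger();

  void onSchedule() { queued.incrementAndGet(); }
  void onStart()    { queued.decrementAndGet(); running.incrementAndGet(); }
  void onFinish()   { running.decrementAndGet(); }

  int getQueueSize() { return queued.get(); }
  // waiting tasks plus tasks currently executing
  int getTotalInFlight() { return queued.get() + running.get(); }

  public static void main(String[] args) {
    InFlightSketch s = new InFlightSketch();
    s.onSchedule();
    s.onSchedule();
    s.onStart(); // one task begins executing
    System.out.println(s.getQueueSize());     // 1
    System.out.println(s.getTotalInFlight()); // 2
  }
}
```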

Contributor

Suggest rewording this log and including the task class in it, e.g.:

LOG.warn("Ignored a {} command for container {} in the Replication Supervisor as the queue has reached the max size of {}.",
    task.getClass().getSimpleName(), task.getContainerId(), maxQueueSize());

@adoroszlai (Contributor)

@sumitagrawl please fix checkstyle

hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/common/statemachine/commandhandler/TestDeleteContainerCommandHandler.java
 28: Unused import - org.mockito.Mock.
 37: Unused import - org.mockito.ArgumentMatchers.eq.
 39: Unused import - org.mockito.Mockito.doNothing.

@sumitagrawl (Contributor Author)


Fixed...

@sumitagrawl sumitagrawl requested a review from sodonnel April 27, 2023 10:29
@adoroszlai
Copy link
Contributor

@sumitagrawl

new one:

hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/common/statemachine/commandhandler/TestDeleteContainerCommandHandler.java
 38: Unused import - org.mockito.Mockito.atMost.

@sodonnel (Contributor) left a comment

LGTM. Thanks for implementing the changes.

@adoroszlai adoroszlai merged commit c616e8d into apache:master Apr 27, 2023