
HDDS-10874. Freon tool DNRPCLoadGenerator should not cache client objects. #6705

Open
wants to merge 2 commits into master
Conversation

jojochuang
Contributor

What changes were proposed in this pull request?

HDDS-10874. Freon tool DNRPCLoadGenerator should not cache client objects.

The original XceiverClientSpi API always caches client objects for a specific pipeline, but in this Freon tool we want multiple clients to parallelize throughput.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-10874

How was this patch tested?

Ran it on a real cluster; other than that, it leverages the existing tests.

After this change:

ozone freon dne --clients=32 --container-id=1 -t 32 -n 1000000 --ratis --read-only
rpc-payload
mean rate = 38794.07 calls/second

ozone freon dne --clients=1 --container-id=1 -t 32 -n 1000000 --ratis --read-only
count = 1000000
mean rate = 9987.42 calls/second

However, the number of clients has no impact for Ratis in write mode, or for gRPC.

HDDS-10874. Freon tool DNRPCLoadGenerator should not cache client objects.

Change-Id: Icc1adf11db4d07e970414e7d1cafcd22ad4ba40d
Change-Id: Ia22492a6633c65e00ef5f23d84e38db1a364c61f
@kerneltime
Contributor

@jojochuang why not make this the default behavior for reads on the client side? Maybe we can change the logic on the client side to have a pool of clients per container. I guess the benefit of the client cache is to avoid the setup cost when there is sequential access to the same container, but when there is parallel access to the same container we see a performance hit. Having a pool of clients per container and no upper bound on the number of clients across containers might be the right way forward. We might still need a TTL or LRU for the clients.
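
A rough, hypothetical sketch of that per-container pool idea, using made-up types rather than Ozone's actual client classes (no upper bound across containers; TTL/LRU eviction would still need to be layered on top):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch only -- not Ozone's API. Each container keeps its own
// queue of idle clients; acquire reuses one if available, otherwise creates a
// new client, so parallel readers of one container no longer share a client.
class PerContainerClientPool {
  private final ConcurrentHashMap<Long, ConcurrentLinkedQueue<Client>> pools =
      new ConcurrentHashMap<>();

  Client acquire(long containerId) {
    Client idle = pools
        .computeIfAbsent(containerId, id -> new ConcurrentLinkedQueue<>())
        .poll();
    return idle != null ? idle : new Client(containerId);   // grow on demand
  }

  void release(Client client) {
    pools.computeIfAbsent(client.containerId, id -> new ConcurrentLinkedQueue<>())
        .offer(client);                                      // return for reuse
  }

  static class Client {
    final long containerId;
    Client(long containerId) { this.containerId = containerId; }
  }
}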

@adoroszlai
Contributor

The original XceiverClientSpi API always caches client objects to a specific pipeline. But in this freon tool we wanted to have multiple clients to parallelize throughput.

Distinct clients can be acquired simply by providing a pipeline with a unique ID.
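
For example, a rough sketch of that (builder call names assumed from Ozone's client code, with pipeline and clientFactory assumed to be in scope): copying an existing pipeline but giving it a fresh random ID makes the cache treat it as a new entry, so a distinct client is created.

// Sketch only; verify the builder calls against the actual Pipeline API.
Pipeline uncachedPipeline = Pipeline.newBuilder(pipeline)
    .setId(PipelineID.randomId())
    .build();
XceiverClientSpi client = clientFactory.acquireClient(uncachedPipeline);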

@smengcl
Contributor

smengcl commented May 21, 2024

@jojochuang why not make this the default behavior for reads on client side?

@kerneltime I want to take a different angle: I wonder where the overhead is coming from in the first place.

In the case of Ratis, XceiverClientRatis itself does not appear to have much synchronization overhead, so I did a bit of digging. I think the limiting factor may be this config in each RaftClient:

raft.client.async.outstanding-requests.max

By default each RaftClient allows 100 outstanding requests, so at most 100 parallel RaftClient#send calls can be in flight at any point:

https://github.com/apache/ratis/blob/release-3.0.1/ratis-client/src/main/java/org/apache/ratis/client/impl/OrderedAsync.java#L160-L183

We should try setting raft.client.async.outstanding-requests.max to 100000 and see what happens. cc @jojochuang @szetszwo
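
For reference, a minimal sketch of one way to try that from an Ozone client, assuming the hdds.ratis.-prefixed pass-through key that is referenced later in this thread:

import org.apache.hadoop.hdds.conf.OzoneConfiguration;

// Sketch only: raise the Ratis client's outstanding-request limit through the
// Ozone configuration; the hdds.ratis. prefix is assumed to pass the value
// through to raft.client.async.outstanding-requests.max.
OzoneConfiguration conf = new OzoneConfiguration();
conf.setInt("hdds.ratis.raft.client.async.outstanding-requests.max", 100000);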

@smengcl
Contributor

smengcl commented May 21, 2024

And by the way, for gRPC we also have a similar semaphore config in XceiverClientGrpc that limits parallelization for each XceiverClient instance:

this.semaphore =
    new Semaphore(HddsClientUtils.getMaxOutstandingRequests(config));

and it is set to a default of 32:

description =
    "Controls the maximum number of outstanding async requests that can"
        + " be handled by the Standalone as well as Ratis client.")
private int maxOutstandingRequests = 32;
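
For illustration, a minimal self-contained sketch (not the actual XceiverClientGrpc code) of how such a per-client semaphore caps the number of outstanding async requests:

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;

// Illustrative only: once 32 requests are in flight, the 33rd caller blocks in
// acquire() until an earlier request completes and releases a permit.
class BoundedAsyncClient {
  private final Semaphore semaphore = new Semaphore(32);

  CompletableFuture<String> sendAsync(String request) throws InterruptedException {
    semaphore.acquire();
    return CompletableFuture.supplyAsync(() -> "reply to " + request)
        .whenComplete((reply, error) -> semaphore.release());
  }
}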

@jojochuang
Contributor Author

jojochuang commented May 22, 2024

Worth pointing out that the Echo request is implemented as a blocking call, so the number of threads limits the number of concurrent requests. That is consistent across all ContainerProtocolCalls APIs.

For the gRPC client, increasing hdds.ratis.raft.client.async.outstanding-requests.max made no difference; the threads were all blocked waiting for responses from the DN.

I haven't compared the RPC client. But given that 32 threads imply 32 in-flight requests, I don't expect tuning raft.client.async.outstanding-requests.max would make a difference.
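
To make the argument concrete, a small self-contained sketch (blockingEcho is a hypothetical stand-in for the blocking Echo RPC): with a fixed pool of 32 threads, each performing one blocking call at a time, no more than 32 requests can ever be in flight, so an async outstanding-request limit above 32 is never reached.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch only: the thread count, not the client's async limit, bounds concurrency.
class BlockingLoad {
  public static void main(String[] args) {
    ExecutorService pool = Executors.newFixedThreadPool(32);
    for (int i = 0; i < 1_000_000; i++) {
      pool.submit(BlockingLoad::blockingEcho);   // at most 32 run concurrently
    }
    pool.shutdown();
  }

  // hypothetical stand-in for the blocking Echo call issued per request
  static void blockingEcho() { }
}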

Comment on lines +45 to +46
XceiverClientSpi acquireClientUncached(Pipeline pipeline, boolean topologyAware)
throws Exception;
Contributor


Instead of adding a new method to the XceiverClientFactory interface, how about creating a new, non-caching implementation as a parent class of XceiverClientManager?
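
A hypothetical sketch of that shape, with made-up types rather than Ozone's actual classes (not the actual implementation in the commit referenced below): the parent factory always builds a fresh client and closes it on release, and the caching subclass layers the cache on top.

import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch only -- not Ozone's real factory classes.
class UncachedClientFactory {
  Client acquireClient(String pipelineId) {
    return new Client(pipelineId);            // fresh client, no cache lookup
  }

  void releaseClient(Client client) {
    client.close();                           // nothing else shares it
  }

  static class Client {
    final String pipelineId;
    Client(String pipelineId) { this.pipelineId = pipelineId; }
    void close() { }
  }
}

// Caching behavior added on top by a subclass, mirroring what
// XceiverClientManager could keep doing for regular callers.
class CachingClientFactory extends UncachedClientFactory {
  private final ConcurrentHashMap<String, Client> cache = new ConcurrentHashMap<>();

  @Override
  Client acquireClient(String pipelineId) {
    return cache.computeIfAbsent(pipelineId, super::acquireClient);
  }

  @Override
  void releaseClient(Client client) {
    // cached clients stay open for reuse; eviction would close them
  }
}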

Contributor


see ad95eb8
