
HDDS-10972. Reduce the default watch timeout configuration in DatanodeRatisServerConfig #6772

Merged: 2 commits merged into apache:master on Jun 6, 2024

Conversation

@ivandika3 (Contributor) commented Jun 4, 2024

What changes were proposed in this pull request?

Reduce the default duration of the Ratis server watch timeout hdds.ratis.raft.server.watch.timeout to reduce the amount of time the client needs to wait during a two-way commit (i.e. when one DN is not reachable).

The watch timeout configuration bounds how long a watch request will wait during a block write. Let's set it to 10s as a good starting point.
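For illustration only (not part of this patch): a minimal sketch of overriding the key programmatically, assuming OzoneConfiguration's Hadoop-style setTimeDuration API; the value shown is the 10s starting point discussed here.

```java
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.hdds.conf.OzoneConfiguration;

public class WatchTimeoutExample {
  public static void main(String[] args) {
    OzoneConfiguration conf = new OzoneConfiguration();
    // Override the Ratis server watch timeout; clusters with heavier write
    // loads may prefer a larger value (see the review discussion below).
    conf.setTimeDuration("hdds.ratis.raft.server.watch.timeout",
        10, TimeUnit.SECONDS);
    System.out.println(conf.get("hdds.ratis.raft.server.watch.timeout"));
  }
}
```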

See the related tickets for context.

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-10972

How was this patch tested?

Existing tests (thanks to @smengcl for validating the CI earlier). Clean CI: https://github.com/ivandika3/ozone/actions/runs/9365515025

Tested with the ozone-ha docker-compose environment with one DN down (using "docker container pause").

Without the watch timeout configuration change, the client needs to wait about 3 minutes (the default hdds.ratis.raft.server.watch.timeout of 180s):

```
time ozone sh key put /vol1/bucket1/key1 ~/LICENSE.txt --type RATIS -r THREE

real    3m3.037s
user    0m9.957s
sys     0m1.106s
```

With hdds.ratis.raft.server.watch.timeout=30s, the client needs to wait around 30s:

```
time ozone sh key put /vol1/bucket1/key1 ~/LICENSE.txt --type RATIS -r THREE

real    0m33.897s
user    0m9.990s
sys     0m1.346s
```

With hdds.ratis.raft.server.watch.timeout=10s, the client needs to wait around 10s:

```
time ozone sh key put /vol1/bucket1/key1 ~/LICENSE.txt --type RATIS -r THREE

real    0m13.319s
user    0m9.966s
sys     0m1.102s
```

@ivandika3 ivandika3 marked this pull request as ready for review June 4, 2024 12:50
@ChenSammi (Contributor) commented
@ivandika3, thanks for proposing this change. I also feel 180s is a bit too long, so changing it to a smaller default value is good. However, 10s seems too aggressive: in a write-heavy environment, even with all three DNs healthy, 10s is easy to hit, which means most write requests' watch will fall back to a MAJORITY_COMMITTED watch.
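A rough sketch of the ALL_COMMITTED-to-MAJORITY_COMMITTED fallback idea mentioned above (not Ozone's actual client code; it assumes Ratis's RaftClient async watch API):

```java
import java.util.concurrent.CompletableFuture;
import org.apache.ratis.client.RaftClient;
import org.apache.ratis.proto.RaftProtos.ReplicationLevel;
import org.apache.ratis.protocol.RaftClientReply;

final class WatchFallbackSketch {
  // Wait for the log entry at `index` to be replicated on all peers.
  // If that watch fails (e.g. the watch timeout fires because one DN is
  // unreachable), fall back to a watch that only requires a majority.
  static CompletableFuture<RaftClientReply> watchWithFallback(
      RaftClient client, long index) {
    return client.async().watch(index, ReplicationLevel.ALL_COMMITTED)
        .exceptionally(ignored -> null)
        .thenCompose(reply -> reply != null
            ? CompletableFuture.completedFuture(reply)
            : client.async().watch(index, ReplicationLevel.MAJORITY_COMMITTED));
  }
}
```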

This review thread is on the changed default in DatanodeRatisServerConfig:

```diff
   )
-  private long watchTimeOut = Duration.ofSeconds(180).toMillis();
+  private long watchTimeOut = Duration.ofSeconds(10).toMillis();
```
Contributor commented

BTW, it looks like this getWatchTimeOut() is not called anywhere.

Contributor Author (@ivandika3) commented

Thanks for the review. Yes, getWatchTimeOut is not called anywhere, which confused me at first too.

However, the configuration hdds.ratis.raft.server.watch.timeout is processed in RatisHelper#createRaftServerProperties, which trims the hdds.ratis. prefix and sets the remaining key in the Ratis properties. I previously checked that changing the configuration actually updates the Ratis server watch timeout.
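A minimal sketch of that prefix-trimming idea (not the actual RatisHelper code; the class and method names here are hypothetical):

```java
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.ratis.conf.RaftProperties;

final class RaftPropertiesMappingSketch {
  private static final String HDDS_RATIS_PREFIX = "hdds.ratis.";

  // Copy every hdds.ratis.* key into RaftProperties with the prefix removed.
  static RaftProperties toRaftProperties(Configuration ozoneConf) {
    RaftProperties raftProps = new RaftProperties();
    for (Map.Entry<String, String> entry : ozoneConf) {
      String key = entry.getKey();
      if (key.startsWith(HDDS_RATIS_PREFIX)) {
        // e.g. hdds.ratis.raft.server.watch.timeout -> raft.server.watch.timeout
        raftProps.set(key.substring(HDDS_RATIS_PREFIX.length()), entry.getValue());
      }
    }
    return raftProps;
  }
}
```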

This means that some calls to DatanodeRatisServerConfig in XceiverServerRatis might not be necessary (e.g. DatanodeRatisServerConfig#getClientPoolSize). However, we need to double-check before removing these calls, since some of the DatanodeRatisServerConfig property keys do not correspond to any Raft server property keys and need to be set explicitly (e.g. DatanodeRatisServerConfig#getStreamRequestThreads).

We can remove the getWatchTimeOut if needed.

Contributor commented

I see. Thanks for the explanation. It's OK to keep the getWatchTimeOut.

@ivandika3 (Contributor Author) commented Jun 5, 2024

@ChenSammi Thanks for the review.

> However, 10s seems too aggressive: in a write-heavy environment, even with all three DNs healthy, 10s is easy to hit, which means most write requests' watch will fall back to a MAJORITY_COMMITTED watch.

I see, we can increase it to a higher value (e.g. 30s). Currently I don't think our cluster has encountered a workload where a normal block write lasts more than 10s. Could you suggest a reasonable value based on the write-heavy workload? Thank you.

@ChenSammi ChenSammi requested a review from szetszwo June 5, 2024 06:05
@szetszwo (Contributor) left a comment

@ivandika3 , thanks a lot for working on this!

+1 the change looks good.

@ChenSammi (Contributor) commented
> Could you suggest a reasonable value based on the write-heavy workload?

Here is some data I collected from a stress write test a while ago:

[image: stress write test results]

Based on this data, I would suggest using 30s too.

@ivandika3 (Contributor Author) commented
@ChenSammi Thank you for providing the data. Updated to 30s. PTAL.

@ChenSammi (Contributor) left a comment

Thanks @ivandika3.

@ivandika3 ivandika3 merged commit d11752f into apache:master Jun 6, 2024
39 checks passed
@ivandika3 (Contributor Author) commented
Thank you for the reviews @szetszwo @ChenSammi, merged.

@ivandika3 ivandika3 self-assigned this Jun 6, 2024
@ChenSammi (Contributor) commented
@ivandika3, after the patch is merged, you can resolve the JIRA https://issues.apache.org/jira/browse/HDDS-10972 and add "1.5.0" as the fixed version.

@ivandika3 (Contributor Author) commented
Thanks for the reminder @ChenSammi. Updated the ticket.

jojochuang pushed a commit to jojochuang/ozone that referenced this pull request Jun 17, 2024