DAOS-18607 object: misc fixes to improve server side congestion#18032
Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-18032/1/testReport/
test_dfuse_daos_build_wb failed because of DAOS-18813, which is not related to this patch.
	 * Direct use current ULT instead of dss_chore to send DTX RPC to avoid being
	 * blocked by some slow dss_chore users. DAOS-18607.
	 */
	if (0 && dss_has_enough_helper()) {
I am not sure whether it is possible, but rather than totally disabling dss_core, we might expose a new helper like dss_core_is_busy() and do something like:
if (dss_has_enough_helper() && !dss_core_is_busy())
In theory, that is possible. But the difficulty is how to define and measure the busy status.
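One possible way to define and measure "busy", offered only as a sketch and not as anything DAOS provides: keep a smoothed queue depth per helper and flip a busy flag with hysteresis, so a single burst does not toggle the decision back and forth. The `busy_tracker` structure, its functions, and the thresholds below are all hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical load tracker for one helper xstream; not a DAOS structure. */
struct busy_tracker {
	uint32_t bt_ewma_x16; /* smoothed queue depth, fixed point (value * 16) */
	uint32_t bt_low;      /* clear the busy flag at or below this depth */
	uint32_t bt_high;     /* set the busy flag at or above this depth */
	bool     bt_busy;     /* current decision */
};

static void
busy_tracker_init(struct busy_tracker *bt, uint32_t low, uint32_t high)
{
	assert(low < high);
	bt->bt_ewma_x16 = 0;
	bt->bt_low      = low;
	bt->bt_high     = high;
	bt->bt_busy     = false;
}

/*
 * Feed one queue-depth sample and return the updated busy decision.
 * The smoothed value is 3/4 of the old value plus 1/4 of the new sample.
 */
static bool
busy_tracker_sample(struct busy_tracker *bt, uint32_t depth)
{
	uint64_t next = ((uint64_t)bt->bt_ewma_x16 * 3 + (uint64_t)depth * 16) / 4;

	bt->bt_ewma_x16 = next > UINT32_MAX ? UINT32_MAX : (uint32_t)next;

	/* Two thresholds give hysteresis: enter busy high, leave busy low. */
	if (!bt->bt_busy && bt->bt_ewma_x16 >= (uint64_t)bt->bt_high * 16)
		bt->bt_busy = true;
	else if (bt->bt_busy && bt->bt_ewma_x16 <= (uint64_t)bt->bt_low * 16)
		bt->bt_busy = false;

	return bt->bt_busy;
}
```

A check such as the suggested dss_core_is_busy() could then return the tracker's flag, assuming something samples the queue depth periodically. Which quantity to sample (pending chores, queued ULTs, request latency) and where to put the thresholds is exactly the open question raised in the reply above.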
d5d0ba5 to 3e0806f
Include the following improvements:
1. Increase RPC retry latency to reduce server load that is caused by resent IO requests.
2. Use current ULT to send DTX RPC instead of via dss_chore to avoid being blocked by some slow dss_chore users.

Signed-off-by: Fan Yong <fan.yong@hpe.com>
3e0806f to 0337f5e
Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-18032/6/testReport/
test_dfuse_daos_build_wb still failed because of DAOS-18813, which is not related to this patch.
Include the following improvements:
1. Increase RPC retry latency to reduce server load that is caused by resent IO requests.
2. Use current ULT to send DTX RPC instead of via helper::dss_chore to avoid being blocked when the helper xstream is too busy.
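The actual retry change is not visible in this conversation, so the following is only an illustration of the general idea behind the first item: spacing out resends so that a congested server is not hit again immediately. It uses capped exponential backoff with jitter, which is a common technique and not necessarily what the patch does. The function name `retry_delay_ms` and all of its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a DAOS function: compute how long to wait
 * before resend number 'attempt' (0 for the first resend).
 *
 * The delay doubles per attempt starting from base_ms and is capped at
 * max_ms. The caller passes a random value 'rnd' that is used to pick
 * a point in [delay / 2, delay], so that many clients retrying at once
 * do not all resend at the same moment.
 */
static uint32_t
retry_delay_ms(uint32_t attempt, uint32_t base_ms, uint32_t max_ms, uint32_t rnd)
{
	uint64_t delay = base_ms;
	uint64_t half;
	uint32_t i;

	assert(base_ms > 0 && base_ms <= max_ms);

	/* Stop doubling once the cap is reached to avoid overflow. */
	for (i = 0; i < attempt && delay < max_ms; i++)
		delay *= 2;
	if (delay > max_ms)
		delay = max_ms;

	half = delay / 2;
	return (uint32_t)(half + rnd % (half + 1));
}
```

Raising base_ms or max_ms lengthens the retry latency and lowers the rate of resent requests, at the cost of slower recovery once the server is no longer congested.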