
osd: Handle oncommits and wait for future work items from mClock queue #47216

Merged 1 commit into ceph:main on Aug 6, 2022

Conversation

@sseshasa (Contributor) commented Jul 21, 2022:

When the worker thread with the smallest thread index waits for future work
items from the mClock queue, oncommit callbacks are called. After the
callbacks run, the thread has to continue waiting instead of returning to
the ShardedThreadPool::shardedthreadpool_worker() loop. Returning causes
the threads with the smallest index across all shards to busy-loop,
resulting in very high CPU utilization.

The fix involves reacquiring the shard_lock and waiting on sdata_cond
until notified or until the wait period lapses. After this, the thread
with the smallest index repopulates the oncommit queue from the
context_queue if there were any additions.

Fixes: https://tracker.ceph.com/issues/56530
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
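The fixed wait path described above can be sketched as follows. This is a simplified, standalone illustration rather than Ceph's actual ShardedOpWQ::_process code: the Shard struct and drain_and_wait() function are hypothetical stand-ins, although shard_lock, sdata_cond, and context_queue mirror the member names mentioned in this PR.

```cpp
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <vector>

// Hypothetical stand-in for one OSD op queue shard.
struct Shard {
  std::mutex shard_lock;
  std::condition_variable sdata_cond;
  std::deque<std::function<void()>> context_queue;  // source of oncommits
};

// Drain pending oncommit callbacks and run them without holding the
// lock, then reacquire shard_lock and wait on sdata_cond until notified
// or until the timeout lapses -- instead of returning to the worker
// loop, which is what caused the busy loop / high CPU utilization.
// Returns the number of callbacks fired.
int drain_and_wait(Shard& shard, std::chrono::milliseconds timeout) {
  std::vector<std::function<void()>> oncommits;
  std::unique_lock<std::mutex> l(shard.shard_lock);
  while (!shard.context_queue.empty()) {
    oncommits.push_back(std::move(shard.context_queue.front()));
    shard.context_queue.pop_front();
  }
  l.unlock();
  int fired = 0;
  for (auto& cb : oncommits) {
    cb();
    ++fired;
  }
  // The fix: reacquire the lock and keep waiting for future work.
  l.lock();
  shard.sdata_cond.wait_for(l, timeout);
  // On wakeup, the smallest-index thread would repopulate the oncommit
  // queue from context_queue if there were any additions.
  return fired;
}
```

The key design point is that the timed wait parks the thread in the kernel until work arrives or the period lapses, whereas returning to the worker loop made the smallest-index thread on every shard spin through the same empty-queue path repeatedly.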


@sseshasa requested a review from a team as a code owner on July 21, 2022
@github-actions bot added the core label on Jul 21, 2022
@neha-ojha (Member) commented:

@sseshasa can you please attach a link to your before/after CPU utilization comparison plots to this PR?

@sseshasa (Contributor, Author) commented:

Test CPU Utilization During Client Ops + Recoveries
The charts below illustrate the effect of the fix, before and after. In both charts, the test runs client ops and recoveries/backfills concurrently. A few seconds after both are started, client ops are stopped to check whether recoveries alone consume a lot of CPU.

Test using HDDs - Before Fix
The following chart shows the CPU usage during client ops and recoveries/backfills without the fix. CPU usage is very high both while client ops are running and after they are stopped; one OSD continues to consume 100% of the CPU even when the recovery rate is low towards the tail end.

[Image: mClock_OSD_CPU_Utlization_Before_Fix_HDD]

Test using HDDs - After Fix
The initial usage, while client ops and recoveries are both running, is low, in the range of 5%-11%. After client ops are stopped, with recoveries still going on, there is a steep drop in CPU utilization across all the OSDs, and it stays low. For one of the OSDs towards the tail end of recovery, the CPU utilization hovers at around 1%.

[Image: mClock_OSD_CPU_Utilization_After_Fix_HDD]

Test using SSDs - Before Fix
The following chart shows CPU usage before the fix for the test using SSDs. It illustrates the same point made above with HDDs: CPU usage remains very high with only recoveries going on after client ops are stopped midway.

[Image: mClock_OSD_CPU_Utlization_Before_Fix_SSD]

Test using SSDs - After Fix
The initial high usage occurs while client ops and recoveries are both running, as expected. After client ops are stopped, with recoveries still going on, CPU usage drops to 45% and below for the rest of the recovery phase.

[Image: mClock_OSD_CPU_Utilization_After_Fix_SSD]

@athanatos (Contributor) commented:

This looks right, though ShardedOpWQ::_process has become rather tough to follow.

@sseshasa (Contributor, Author) commented Aug 2, 2022:

jenkins test windows

@sseshasa (Contributor, Author) commented Aug 3, 2022:

@tchaikov merged commit d80e7dd into ceph:main on Aug 6, 2022
14 of 16 checks passed
@sseshasa deleted the wip-fix-high-cpu-util-in-process branch on August 18, 2022