repair/repair.cc: do_repair_ranges(): prevent stalls when skipping ranges

We have observed do_repair_ranges() occasionally receiving tens of
thousands of ranges to repair. do_repair_ranges() repairs all ranges in
parallel, with parallel_for_each(). This is normally fine, because the
lambda passed to parallel_for_each() acquires a unit from a semaphore,
which limits concurrency.
However, in some cases most of these ranges are skipped. The lambda then
completes synchronously, only logging a message, without ever
suspending. With many skipped ranges this causes stalls, because there
are no opportunities to yield. Solve this by adding an explicit yield at
the start of the lambda.

Fixes: scylladb#14330
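The failure mode can be illustrated with a toy, single-threaded model. This is a sketch under stated assumptions, not Seastar code: the names simulate() and stall_stats are hypothetical, each skipped range is assumed to cost a fixed amount of CPU time for its log message, and the preemption check is modeled as a simple time budget (task quota) rather than Seastar's actual need_preempt() mechanism.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy model, NOT Seastar code: a single-threaded "reactor" runs
// work until that work voluntarily gives control back. We track the longest
// stretch of simulated CPU time during which nothing else could run.
struct stall_stats {
    std::uint64_t longest_stall_us = 0; // longest uninterrupted run (simulated us)
    std::uint64_t yields = 0;           // how often control went back to the reactor
};

// costs_us[i] is the assumed CPU cost of handling range i; for a skipped range
// this is just the cost of logging a message.
// If check_preempt is true, a maybe_yield()-like check runs before each range
// and yields only once task_quota_us of CPU time has been used since the last
// yield. If false, ranges run back to back, like a lambda that never suspends.
stall_stats simulate(const std::vector<std::uint64_t>& costs_us,
                     bool check_preempt, std::uint64_t task_quota_us) {
    assert(task_quota_us > 0);
    stall_stats stats;
    std::uint64_t since_last_yield = 0;
    for (std::uint64_t cost : costs_us) {
        if (check_preempt && since_last_yield >= task_quota_us) {
            ++stats.yields;        // let other tasks run
            since_last_yield = 0;
        }
        since_last_yield += cost;
        stats.longest_stall_us = std::max(stats.longest_stall_us, since_last_yield);
    }
    return stats;
}
```

With 20,000 skipped ranges at an assumed 5 µs each and an assumed 500 µs quota, simulate(skipped, false, 500) reports a 100,000 µs (100 ms) uninterrupted run, while simulate(skipped, true, 500) caps the longest run at 500 µs, at the price of 199 yields.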
denesb committed Oct 30, 2023
1 parent 460bc7d commit 6a7dff0
Showing 1 changed file with 7 additions and 0 deletions.
7 changes: 7 additions & 0 deletions repair/repair.cc
@@ -974,6 +974,13 @@ future<> repair::shard_repair_task_impl::do_repair_ranges() {
         rlogger.info("repair[{}]: Started to repair {} out of {} tables in keyspace={}, table={}, table_id={}, repair_reason={}",
                 global_repair_id.uuid(), idx + 1, table_ids.size(), _status.keyspace, table_info.name, table_info.id, _reason);
         co_await coroutine::parallel_for_each(ranges, [this, table_info] (auto&& range) -> future<> {
+            // It is possible that most of the ranges are skipped. In this case
+            // this lambda will just log a message and exit. With a lot of
+            // ranges, this can result in stalls, as there are no opportunities
+            // to yield when ranges are skipped. The yield below is meant to
+            // prevent this.
+            co_await coroutine::maybe_yield();
+
             // Get the system range parallelism
             auto permit = co_await seastar::get_units(rs.get_repair_module().range_parallelism_semaphore(), 1);
             // Get the range parallelism specified by user
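The patch uses coroutine::maybe_yield(), which only gives up the CPU when the reactor has asked for preemption, rather than yielding unconditionally for every range. A rough way to see why that matters is to count scheduler round-trips in a toy model. This is a hypothetical sketch, not Seastar code: count_yields() and yield_policy are made-up names, and the per-range cost and quota are assumed values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy model, NOT Seastar code. Counts how many times control is
// handed back to the scheduler while handling n_ranges ranges that each cost
// cost_us of simulated CPU time.
enum class yield_policy {
    always,      // yield before every range, regardless of elapsed time
    when_needed, // yield only once the task quota is used up (maybe_yield-like)
};

std::uint64_t count_yields(std::uint64_t n_ranges, std::uint64_t cost_us,
                           std::uint64_t task_quota_us, yield_policy policy) {
    assert(task_quota_us > 0);
    std::uint64_t yields = 0;
    std::uint64_t since_last_yield = 0;
    for (std::uint64_t i = 0; i < n_ranges; ++i) {
        bool quota_used = since_last_yield >= task_quota_us;
        if (policy == yield_policy::always || quota_used) {
            ++yields;              // one trip through the scheduler
            since_last_yield = 0;
        }
        since_last_yield += cost_us;
    }
    return yields;
}
```

For 20,000 ranges at 5 µs each with a 500 µs quota, yield_policy::always yields 20,000 times (once per range), while yield_policy::when_needed yields 199 times; for a short run of 50 such ranges it does not yield at all.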
