
Fine-tune unified scheduler loops by select_biased #1437

Merged
merged 1 commit into from
May 22, 2024

Conversation

ryoqun
Member

@ryoqun ryoqun commented May 20, 2024

Problem

The unified scheduler's loops are written as if channel selection were biased, which they need for optimum performance, but they aren't actually using biased selection yet.

Summary of Changes

Now that select_biased! is available (#1434), let's use it.
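For context, crossbeam's select_biased! checks its arms in declaration order rather than picking a ready arm at random, so the first-listed channel is always drained first. Here is a minimal std-only sketch of the same biased-polling idea (recv_biased and the channel names are hypothetical, just illustrating the ordering guarantee, not the scheduler's actual code):

```rust
use std::sync::mpsc::{channel, Receiver, TryRecvError};

// Biased polling: always check the high-priority channel before even
// looking at the low-priority one. crossbeam's `select_biased!` gives the
// same ordering guarantee while still blocking when all arms are empty.
fn recv_biased<T>(high: &Receiver<T>, low: &Receiver<T>) -> Option<T> {
    match high.try_recv() {
        Ok(v) => return Some(v),
        Err(TryRecvError::Empty) => {}
        Err(TryRecvError::Disconnected) => return None,
    }
    low.try_recv().ok()
}

fn main() {
    let (high_tx, high_rx) = channel();
    let (low_tx, low_rx) = channel();
    low_tx.send("new task").unwrap();
    high_tx.send("finished task").unwrap();
    // Even though the low-priority message arrived first, the biased poll
    // returns the high-priority one.
    assert_eq!(recv_biased(&high_rx, &low_rx), Some("finished task"));
    println!("ok");
}
```

In the scheduler's case, the high-priority arm is the one that deschedules finished tasks, so blocked dependent tasks get unblocked as early as possible.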

perf numbers

An increase of about 12%-13% can be attained under ideal conditions, with a caveat:

before
ledger processed in 1 second, 370 ms
ledger processed in 1 second, 323 ms
ledger processed in 1 second, 399 ms
after
ledger processed in 1 second, 200 ms
ledger processed in 1 second, 202 ms
ledger processed in 1 second, 243 ms
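As a sanity check on the 12%-13% figure, taking ledger-replay wall-clock time as inversely proportional to throughput and averaging the three runs on each side gives roughly a 12% throughput increase (this is just arithmetic on the numbers above):

```rust
fn main() {
    // Wall-clock times (ms) from the three runs before and after the change.
    let before = [1370.0_f64, 1323.0, 1399.0];
    let after = [1200.0_f64, 1202.0, 1243.0];
    let mean = |xs: &[f64]| xs.iter().sum::<f64>() / xs.len() as f64;
    let (b, a) = (mean(&before), mean(&after));
    // Throughput is inversely proportional to replay time, so the speedup
    // is the ratio of mean times minus one.
    let speedup_pct = (b / a - 1.0) * 100.0;
    println!("mean before: {b:.0} ms, after: {a:.0} ms, speedup: {speedup_pct:.1}%");
}
```

The means come out to about 1364 ms before and 1215 ms after, i.e. roughly 12% more throughput (about 11% less wall-clock time).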

Here's the caveat: unlike previous optimization PRs (#627, #1037, #1192 (comment), #1197 (comment), and #1250 (comment), the last of which is not merged yet), the perf gain is, to be honest, small in the real world today. I was actually forced to compromise here. ;) The above benchmark was conducted with a synthesized setup combining busy looping and a nop handler, like this:

diff --git a/unified-scheduler-pool/src/lib.rs b/unified-scheduler-pool/src/lib.rs
index 348b6bb038..b7cfdb0be1 100644
--- a/unified-scheduler-pool/src/lib.rs
+++ b/unified-scheduler-pool/src/lib.rs
@@ -236,23 +236,6 @@ impl TaskHandler for DefaultTaskHandler {
         index: usize,
         handler_context: &HandlerContext,
     ) {
-        // scheduler must properly prevent conflicting tx executions. thus, task handler isn't
-        // responsible for locking.
-        let batch = bank.prepare_unlocked_batch_from_single_tx(transaction);
-        let batch_with_indexes = TransactionBatchWithIndexes {
-            batch,
-            transaction_indexes: vec![index],
-        };
-
-        *result = execute_batch(
-            &batch_with_indexes,
-            bank,
-            handler_context.transaction_status_sender.as_ref(),
-            handler_context.replay_vote_sender.as_ref(),
-            timings,
-            handler_context.log_messages_bytes_limit,
-            &handler_context.prioritization_fee_cache,
-        );
     }
 }
 
@@ -802,6 +785,9 @@ impl<S: SpawnableScheduler<TH>, TH: TaskHandler> ThreadManager<S, TH> {
                                 state_machine.deschedule_task(&executed_task.task);
                                 Self::accumulate_result_with_timings(&mut result_with_timings, executed_task);
                             },
+                            default => {
+                                continue;
+                            },
                         };
 
                         is_finished = session_ending && state_machine.has_no_active_task();
@@ -843,6 +829,9 @@ impl<S: SpawnableScheduler<TH>, TH: TaskHandler> ThreadManager<S, TH> {
                             continue;
                         }
                     },
+                    default => {
+                        continue;
+                    },
                 };
                 let mut task = ExecutedTask::new_boxed(task);
                 Self::execute_task_with_handler(

So, this explains the outsized time reduction compared to previous results.
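The `default => continue` arms added in the diff above turn the blocking selection into a busy loop for the benchmark. Conceptually (a hypothetical sketch using a std channel in place of crossbeam's select macro):

```rust
use std::sync::mpsc::{channel, TryRecvError};
use std::thread;

fn main() {
    let (tx, rx) = channel();
    thread::spawn(move || tx.send(42_u32).unwrap());
    // A `default` arm makes the selection non-blocking: when no channel is
    // ready we spin (`continue`) instead of parking the thread, trading CPU
    // for latency. This isolates channel-selection overhead in a benchmark.
    let value = loop {
        match rx.try_recv() {
            Ok(v) => break v,
            Err(TryRecvError::Empty) => continue, // the `default => continue` arm
            Err(TryRecvError::Disconnected) => panic!("sender dropped"),
        }
    };
    assert_eq!(value, 42);
    println!("got {value}");
}
```

Busy looping amplifies the cost of each selection pass, which is why the biased/fair difference shows up so clearly here.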

However, I believe the above result still stands on its own and this PR is justified, because:

  1. the result aligns with theoretical understanding.
  2. the actual locking pattern is realistic, because the same usual dataset is used as in previous benchmarks (note that scheduling, i.e. locking & dep-graph building, is now the dominant processing).
  3. the code change is quite small (the diff is +3/-3 lines, aside from removing an outdated comment), so it adds no complexity or risk.
  4. the synthesized benchmark is needed to show the improvement only because current mainnet-beta blocks are too sparse and execute_batch is too slow; both will be addressed in the future.

@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.7%. Comparing base (94af1aa) to head (b353804).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1437   +/-   ##
=======================================
  Coverage    82.7%    82.7%           
=======================================
  Files         872      872           
  Lines      370361   370354    -7     
=======================================
+ Hits       306478   306538   +60     
+ Misses      63883    63816   -67     

@ryoqun ryoqun marked this pull request as ready for review May 21, 2024 07:07
@ryoqun ryoqun requested a review from apfitzge May 21, 2024 07:07
@ryoqun ryoqun changed the title [wip] Fine-tune unified scheduler loops by select_biased Fine-tune unified scheduler loops by select_biased May 21, 2024

@apfitzge apfitzge left a comment


lgtm - think adding biased selection makes sense for prioritization of unblocking completed tasks.

@ryoqun ryoqun merged commit e227d25 into anza-xyz:master May 22, 2024
40 checks passed