Reduce ProgramCache write lock contention #1037
Conversation
svm/src/transaction_processor.rs
Outdated
```
@@ -292,6 +292,7 @@ impl<FG: ForkGraph> TransactionBatchProcessor<FG> {
        execution_time.stop();

        /*
```
firstly, let's see which test should fail. :)
hehe, turned out there's no test to test this code specifically...
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@            Coverage Diff            @@
##           master    #1037    +/-   ##
========================================
  Coverage    82.0%    82.0%
========================================
  Files         860      860
  Lines      232898   232911      +13
========================================
+ Hits       191071   191104      +33
+ Misses      41827    41807      -20
```
Force-pushed from 86c433e to 1073cf7.
Force-pushed from 1073cf7 to cc12279.
runtime/src/bank.rs

```diff
@@ -4162,7 +4162,7 @@ impl Bank {
             programs_modified_by_tx,
         } = execution_result
         {
-            if details.status.is_ok() {
+            if details.status.is_ok() && !programs_modified_by_tx.is_empty() {
```
hope this one is fairly uncontroversial. haha
svm/src/transaction_processor.rs
Outdated
```rust
        // ProgramCache entries. Note that this flag is deliberately defined, so that there's still
        // at least one other batch, which will evict the program cache, even after the occurrences
        // of cooperative loading.
        if programs_loaded_for_tx_batch.borrow().loaded_missing {
```
i guess this one isn't so straightforward. better ideas are very welcome.
The global cache can also grow via `cache.merge(programs_modified_by_tx)` above, not just by loading missing entries.
how about this? d50c11c
Better, but the number of insertions and evictions can still be unbalanced because it is only a boolean.
Also, maybe we should move eviction to the place where we merge in new deployments? That way they could share a write lock.
> Better, but the number of insertions and evictions can still be unbalanced because it is only a boolean.

I intentionally chose a boolean, thinking the number of insertions and evictions doesn't need to be balanced. That's because `evict_using_2s_random_selection()` keeps evicting entries until they're under 90% of `MAX_LOADED_ENTRY_COUNT` (= 256) in a single invocation. So, we just need to ensure it's called with sufficient frequency/timing to avoid a cache-bomb DoS attack.

> Also, maybe we should move eviction to the place where we merge in new deployments? That way they could share a write lock.

This is possible, and it looks appealing; however, it isn't trivial. Firstly, `load_and_execute_sanitized_transactions()` can be entered via 3 code paths: replaying, banking, and rpc tx simulation. I guess that's why this eviction was placed here to begin with, as the most shared code path for all transaction executions. The place where we merge in new deployments is `commit_transactions()`, which isn't touched by rpc tx simulation for obvious reasons. So, moving this eviction there would expose an unbounded program-cache-entry-growth DoS (theoretically; it assumes no new blocks for an extended duration). Also, replaying and banking take the commit code path under slightly different semantics, so moving this eviction would need a bit of care regardless, even if we ignore the rpc concern...

All that said, I think the current code change should be good enough and safe enough?
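For illustration, the single-invocation eviction behavior described above could be sketched like this (a sketch only; `CacheEntry`, the seed-based sampling, and `evict_to_watermark` are stand-ins, not the actual `evict_using_2s_random_selection()` implementation):

```rust
const MAX_LOADED_ENTRY_COUNT: usize = 256;

struct CacheEntry {
    last_access_slot: u64, // illustrative recency metric
}

// One invocation keeps evicting until the cache is back under 90% of
// capacity. This is why a single boolean trigger per batch can be enough:
// insertions and evictions need not be balanced one-to-one, as long as
// the trigger fires often enough to bound growth.
fn evict_to_watermark(entries: &mut Vec<CacheEntry>, mut seed: u64) {
    let target = MAX_LOADED_ENTRY_COUNT * 9 / 10; // 90% watermark
    while entries.len() > target {
        // two-random-choices: sample two entries pseudo-randomly and
        // evict the one accessed least recently
        let mut next = || {
            seed = seed.wrapping_mul(6364136223846793005).wrapping_add(1);
            (seed >> 33) as usize
        };
        let (a, b) = (next() % entries.len(), next() % entries.len());
        let victim = if entries[a].last_access_slot <= entries[b].last_access_slot {
            a
        } else {
            b
        };
        entries.swap_remove(victim);
    }
}
```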
runtime/src/bank.rs
Outdated
```diff
@@ -4162,7 +4162,7 @@ impl Bank {
             programs_modified_by_tx,
         } = execution_result
         {
-            if details.status.is_ok() {
+            if details.status.is_ok() && !programs_modified_by_tx.is_empty() {
                 let mut cache = self.transaction_processor.program_cache.write().unwrap();
```
actually, i noticed while writing this (#1037 (comment)) that this write-lock is taken per tx, if the batch contains 2 or more transactions
You are right. How about:

```rust
if execution_results.iter().any(|execution_result| {
    matches!(
        execution_result,
        TransactionExecutionResult::Executed { details, programs_modified_by_tx }
            if details.status.is_ok() && !programs_modified_by_tx.is_empty()
    )
}) {
    let mut cache = self.transaction_processor.program_cache.write().unwrap();
    for execution_result in &execution_results {
        if let TransactionExecutionResult::Executed { programs_modified_by_tx, .. } =
            execution_result
        {
            cache.merge(programs_modified_by_tx);
        }
    }
}
```
hmm, that incurs two-pass looping in the worst case (totaling O(2·N)). also rather heavy code duplication.

Considering `!programs_modified_by_tx.is_empty()` should be rare (barring malice), I think a quick and dirty memoization like this will be enough (the worst case's overall cost is O(Cm·N), where Cm << 2 and Cm is the memoization overhead): cce3075
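Sketched out, that single-pass memoization could look like this (minimal stand-in types; the actual change in cce3075 may differ in detail):

```rust
use std::collections::HashMap;
use std::sync::{RwLock, RwLockWriteGuard};

// Minimal stand-ins for the real types; illustrative only.
struct ProgramCache(HashMap<[u8; 32], u64>);
impl ProgramCache {
    fn merge(&mut self, modified: &HashMap<[u8; 32], u64>) {
        self.0.extend(modified.iter().map(|(k, v)| (*k, *v)));
    }
}
struct Details {
    status: Result<(), ()>,
}
enum TransactionExecutionResult {
    Executed {
        details: Details,
        programs_modified_by_tx: HashMap<[u8; 32], u64>,
    },
    NotExecuted,
}

// Single pass over the results: the write lock is taken lazily on the first
// transaction that actually deployed programs, then reused for the rest of
// the batch; the common case (no deployments in the batch) takes no lock.
fn merge_modified_programs(
    program_cache: &RwLock<ProgramCache>,
    execution_results: &[TransactionExecutionResult],
) {
    let mut cache_guard: Option<RwLockWriteGuard<ProgramCache>> = None;
    for execution_result in execution_results {
        if let TransactionExecutionResult::Executed {
            details,
            programs_modified_by_tx,
        } = execution_result
        {
            if details.status.is_ok() && !programs_modified_by_tx.is_empty() {
                cache_guard
                    .get_or_insert_with(|| program_cache.write().unwrap())
                    .merge(programs_modified_by_tx);
            }
        }
    }
}
```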
LGTM + some nits about the comments
Changes seem fine to me, but I'm less familiar with program-cache stuff. In terms of lock contention, I just ran banking-bench as a sanity check - no programs should be compiled, but if locks are grabbed less often we might expect a small boost in throughput. Results seem to show around the same, if not slightly better:
this result isn't surprising considering:

So, banking stage is much like the blockstore processor in this regard...
Problem

`ProgramCache`'s current locking behavior is needlessly hurting the unified scheduler's performance.

The unified scheduler is a new transaction scheduler for block verification (i.e. the replaying stage). As a design choice, it doesn't employ batching at all. Technically, the unified scheduler still uses `TransactionBatch`es, but with only 1 transaction each. That means it bears all the extra overheads which batching normally amortizes.

Sidenote: I believe these overheads are all solvable (but not SOON, except this one...). Also note that it's already 1.8x faster than the to-be-replaced `blockstore_processor` after this pr.

Among all those overhead sources, one of the most visible is `ProgramCache`'s write-lock contention. Currently, `ProgramCache` is write-locked 3 times unconditionally per 1-tx batch: for loading by `replenish_program_cache()`, for evictions by `load_and_execute_sanitized_transactions()`, and for committing updated programs by `commit_transactions()`. So, it acutely hampers the unified scheduler's concurrency.
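To make the locking pattern concrete, here is a minimal before/after sketch (the method names are illustrative stand-ins for the call sites named above, not the actual code; only the lock-acquisition pattern matters):

```rust
use std::sync::RwLock;

struct ProgramCache;
impl ProgramCache {
    fn replenish(&mut self) {}          // stand-in: load missing programs
    fn evict_excess(&mut self) {}       // stand-in: evict cache entries
    fn merge_deployments(&mut self) {}  // stand-in: commit updated programs
}

// Before: three unconditional write locks per (1-tx) batch, so concurrent
// scheduler threads serialize on the cache three times per transaction.
fn process_batch_before(cache: &RwLock<ProgramCache>) {
    cache.write().unwrap().replenish();         // replenish_program_cache()
    cache.write().unwrap().evict_excess();      // load_and_execute_sanitized_transactions()
    cache.write().unwrap().merge_deployments(); // commit_transactions()
}

// After: eviction and merging are gated on per-batch conditions that are
// rarely true, leaving only the replenish lock in the usual case.
fn process_batch_after(cache: &RwLock<ProgramCache>, loaded_missing: bool, deployed: bool) {
    cache.write().unwrap().replenish();
    if loaded_missing {
        cache.write().unwrap().evict_excess();
    }
    if deployed {
        cache.write().unwrap().merge_deployments();
    }
}
```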
Summary of Changes

Reduce write-locking of `ProgramCache` to the bare minimum. The usual case is now 1 write lock, for loading by `replenish_program_cache()`, while the worst case still remains at 3.

This gives a roughly ~5% consistent improvement. Also note that the `blockstore-processor` isn't affected, even though both take the same code path.

perf improvements
before
after
after the pr, the speedup factor is now 1.8x:
(for the record) the merged commit
before(3f7b352):
after(90bea33):
context: extracted from #593
cc: @apfitzge