Skip to content

Commit

Permalink
Do not double-check transaction effects when catching up to checkpoint (
Browse files Browse the repository at this point in the history
#4178)

When syncing to checkpoint we download effects and insert them in DB. If this fails we abort checkpoint sync process.

As such catching up to check point does not need to double-check that just inserted effects exist in DB, so this check can be skipped in this particular place.

Other places where new checkpoint is set still perform the check.

This will fix frequent "Checkpoint blocked by pending certificates" that we can see in devnet validator logs. In fact, currently any attempt to catch up to checkpoint by validator will fail first with this error and only will succeed on second attempt, this PR fixed this unfortunate behaviour.

This will also unblock full node being able to catch up to checkpoint during startup (in the future PR).

This PR is somewhat related to the #4168 - we remove redundant check when catching up to checkpoint. We still need another PR for that issue to check effects instead of batch service inclusion for other use cases.
  • Loading branch information
andll committed Aug 20, 2022
1 parent 597f899 commit 68755c7
Show file tree
Hide file tree
Showing 2 changed files with 15 additions and 24 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -618,11 +618,10 @@ where
return Err(SuiError::CheckpointingError { error });
}

checkpoint_db.lock().process_new_checkpoint_certificate(
checkpoint_db.lock().process_synced_checkpoint_certificate(
&past,
&contents,
&net.committee,
active_authority.state.database.clone(),
metrics,
)?;
}
Expand Down
36 changes: 14 additions & 22 deletions crates/sui-core/src/checkpoints/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -396,7 +396,7 @@ impl CheckpointStore {
let checkpoint = AuthenticatedCheckpoint::Signed(
SignedCheckpointSummary::new_from_summary(summary, self.name, &*self.secret),
);
self.handle_internal_set_checkpoint(&checkpoint, &ordered_contents, effects_store)
self.handle_internal_set_checkpoint(&checkpoint, &ordered_contents)
}

/// Call this function internally to update the latest checkpoint.
Expand All @@ -407,7 +407,6 @@ impl CheckpointStore {
&mut self,
checkpoint: &AuthenticatedCheckpoint,
contents: &CheckpointContents,
effects_store: impl CausalOrder + PendCertificateForExecution,
) -> SuiResult {
let summary = checkpoint.summary();
let checkpoint_sequence_number = *summary.sequence_number();
Expand All @@ -419,7 +418,6 @@ impl CheckpointStore {
.is_none());
debug_assert!(self.next_checkpoint() == checkpoint_sequence_number);

self.check_checkpoint_transactions(contents.transactions.iter(), &effects_store)?;
debug!(
"Number of transactions in checkpoint {:?}: {:?}",
checkpoint_sequence_number,
Expand Down Expand Up @@ -775,6 +773,19 @@ impl CheckpointStore {
committee: &Committee,
effects_store: impl CausalOrder + PendCertificateForExecution,
metrics: &CheckpointMetrics,
) -> SuiResult {
self.check_checkpoint_transactions(contents.transactions.iter(), &effects_store)?;
self.process_synced_checkpoint_certificate(checkpoint, contents, committee, metrics)
}

/// Unlike process_new_checkpoint_certificate this does not verify that transactions are executed
/// Checkpoint sync process executes it because it verifies transactions when downloading checkpoint
pub fn process_synced_checkpoint_certificate(
&mut self,
checkpoint: &CertifiedCheckpointSummary,
contents: &CheckpointContents,
committee: &Committee,
metrics: &CheckpointMetrics,
) -> SuiResult {
let seq = checkpoint.summary.sequence_number();
debug_assert!(self.tables.checkpoints.get(seq)?.is_none());
Expand All @@ -784,7 +795,6 @@ impl CheckpointStore {
self.handle_internal_set_checkpoint(
&AuthenticatedCheckpoint::Certified(checkpoint.clone()),
contents,
effects_store,
)?;
metrics.checkpoint_sequence_number.set(*seq as i64);
self.clear_proposal(*seq + 1)?;
Expand Down Expand Up @@ -1015,24 +1025,6 @@ impl CheckpointStore {
&mut self, // We take by &mut to prevent concurrent access.
transactions: &[(TxSequenceNumber, ExecutionDigests)],
) -> Result<(), SuiError> {
let in_checkpoint = self
.tables
.transactions_to_checkpoint
.multi_get(transactions.iter().map(|(_, tx)| tx))?;
let already_in_checkpoint_tx: Vec<_> = transactions
.iter()
.zip(&in_checkpoint)
.filter_map(|((_seq, tx), in_chk)| in_chk.map(|_| tx))
.collect();
if !already_in_checkpoint_tx.is_empty() {
// This should never happen, but if it happens, we should not let it block checkpoint
// progress either. Log the error so that we can keep track.
error!(
"Transactions are processed and updated from batch more than once: {:?}",
already_in_checkpoint_tx
);
}

let batch = self.tables.extra_transactions.batch();
let batch = batch.insert_batch(
&self.tables.extra_transactions,
Expand Down

0 comments on commit 68755c7

Please sign in to comment.