Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add consensus time metrics on SUI side #4584

Merged
merged 1 commit into from
Sep 14, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
23 changes: 23 additions & 0 deletions crates/sui-core/src/authority.rs
Original file line number Diff line number Diff line change
Expand Up @@ -94,6 +94,7 @@ pub mod authority_store_tables;

mod authority_store;
use crate::epoch::epoch_store::EpochStore;
use crate::metrics::TaskUtilizationExt;
pub use authority_store::{
AuthorityStore, GatewayStore, ResolverWrapper, SuiDataStore, UpdateType,
};
Expand Down Expand Up @@ -141,6 +142,8 @@ pub struct AuthorityMetrics {
handle_node_sync_certificate_latency: Histogram,

total_consensus_txns: IntCounter,
handle_consensus_duration_mcs: IntCounter,
verify_narwhal_transaction_duration_mcs: IntCounter,

pub follower_items_streamed: IntCounter,
pub follower_items_loaded: IntCounter,
Expand Down Expand Up @@ -280,6 +283,18 @@ impl AuthorityMetrics {
registry,
)
.unwrap(),
handle_consensus_duration_mcs: register_int_counter_with_registry!(
"handle_consensus_duration_mcs",
"Total duration of handle_consensus_transaction",
registry,
)
.unwrap(),
verify_narwhal_transaction_duration_mcs: register_int_counter_with_registry!(
"verify_narwhal_transaction_duration_mcs",
"Total duration of verify_narwhal_transaction",
registry,
)
.unwrap(),
follower_items_streamed: register_int_counter_with_registry!(
"follower_items_streamed",
"Number of transactions/signed batches streamed to followers",
Expand Down Expand Up @@ -1876,6 +1891,10 @@ impl AuthorityState {
}

fn verify_narwhal_transaction(&self, certificate: &CertifiedTransaction) -> SuiResult {
let _timer = self
.metrics
.verify_narwhal_transaction_duration_mcs
.utilization_timer();
// Ensure the input is a shared object certificate. Remember that Byzantine authorities
// may input anything into consensus.
fp_ensure!(
Expand Down Expand Up @@ -1905,6 +1924,10 @@ impl ExecutionState for AuthorityState {
transaction: Self::Transaction,
) -> Result<Self::Outcome, Self::Error> {
self.metrics.total_consensus_txns.inc();
let _timer = self
.metrics
.handle_consensus_duration_mcs
.utilization_timer();
let tracking_id = transaction.get_tracking_id();
match transaction.kind {
ConsensusTransactionKind::UserTransaction(certificate) => {
Expand Down
14 changes: 12 additions & 2 deletions crates/sui-core/src/consensus_adapter.rs
Original file line number Diff line number Diff line change
Expand Up @@ -11,11 +11,11 @@ use narwhal_executor::SubscriberResult;
use narwhal_types::TransactionProto;
use narwhal_types::TransactionsClient;
use parking_lot::Mutex;
use prometheus::register_int_counter_with_registry;
use prometheus::register_int_gauge_with_registry;
use prometheus::IntCounter;
use prometheus::IntGauge;
use prometheus::Registry;
use prometheus::{register_histogram_with_registry, register_int_counter_with_registry, Histogram};
use std::collections::VecDeque;
use std::sync::atomic::AtomicU64;
use std::sync::atomic::Ordering;
Expand Down Expand Up @@ -71,8 +71,9 @@ pub struct ConsensusAdapterMetrics {
pub sequencing_certificate_success: IntCounter,
pub sequencing_certificate_timeouts: IntCounter,
pub sequencing_certificate_control_delay: IntGauge,
pub sequencing_certificate_latency: Histogram,

// Certificate sequencing metrics
// Fragment sequencing metrics
pub sequencing_fragment_attempt: IntCounter,
pub sequencing_fragment_success: IntCounter,
pub sequencing_fragment_timeouts: IntCounter,
Expand Down Expand Up @@ -113,6 +114,12 @@ impl ConsensusAdapterMetrics {
registry,
)
.unwrap(),
sequencing_certificate_latency: register_histogram_with_registry!(
"sequencing_certificate_latency",
"The latency for certificate sequencing.",
registry,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should define buckets here until #4601 is merged (I will try to read today), the default values won't tell us much useful insight

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Let's see if we can merge better histograms today, if not i'll make PR with better buckets [this one got merged via automerge]

)
.unwrap(),
sequencing_fragment_attempt: register_int_counter_with_registry!(
"sequencing_fragment_attempt",
"Counts the number of sequenced fragments submitted.",
Expand Down Expand Up @@ -351,6 +358,9 @@ impl ConsensusAdapter {
metrics
.sequencing_certificate_control_delay
.set(new_delay as i64);
metrics
.sequencing_certificate_latency
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't past just the amount of time to send a transaction to consensus?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(replied to Lu below)

.observe(past_ms as f64);
Comment on lines +361 to +363
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I noticed that metric after I added mine. There is slight difference between them(one is pure consensus vs another adds some crypto checks), so I think we can keep both.

});

self.delay_ms.store(new_delay, Ordering::Relaxed);
Expand Down
30 changes: 29 additions & 1 deletion crates/sui-core/src/metrics.rs
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
// Copyright (c) 2022, Mysten Labs, Inc.
// SPDX-License-Identifier: Apache-2.0

use prometheus::Histogram;
use prometheus::{Histogram, IntCounter};
use tokio::time::Instant;

pub fn start_timer(metrics: Histogram) -> impl Drop {
Expand All @@ -10,3 +10,31 @@ pub fn start_timer(metrics: Histogram) -> impl Drop {
metrics.observe(start_ts.elapsed().as_secs_f64());
})
}

pub struct TaskUtilizationGuard<'a> {
metric: &'a IntCounter,
start: Instant,
}

pub trait TaskUtilizationExt {
/// Measures amount of time spent until guard is dropped and increments the counter by duration in mcs
/// Primary usage for this counter is to measure 'utilization' of the single task
/// E.g. having rate(metric) / 1_000_000 can tell what portion of time this task is busy
/// For the tasks that are run in single thread this indicates how close is this task to a complete saturation
fn utilization_timer(&self) -> TaskUtilizationGuard;
}

impl TaskUtilizationExt for IntCounter {
fn utilization_timer(&self) -> TaskUtilizationGuard {
TaskUtilizationGuard {
start: Instant::now(),
metric: self,
}
}
}

impl<'a> Drop for TaskUtilizationGuard<'a> {
fn drop(&mut self) {
self.metric.inc_by(self.start.elapsed().as_micros() as u64);
}
}