quic batched send metrics are unreliable on error #283

Open
t-nelson opened this issue Mar 17, 2024 · 0 comments
Problem

rpc sts increments its send failure count only once per error, regardless of batch size, when sends are batched

    let result = if wire_transactions.len() == 1 {
        Self::send_transaction(tpu_address, wire_transactions[0], connection_cache)
    } else {
        Self::send_transactions_with_metrics(tpu_address, wire_transactions, connection_cache)
    };
    if let Err(err) = result {
        warn!(
            "Failed to send transaction transaction to {}: {:?}",
            tpu_address, err
        );
        stats.send_failure_count.fetch_add(1, Ordering::Relaxed);
    }

it can't really do any better because the quic tpu client's batched send implementation bails on the first error, discarding the statuses of any prior and pending sends:
    for f in futures {
        f.await
            .into_iter()
            .try_for_each(|res| res)
            .map_err(Into::<ClientErrorKind>::into)?;
    }
    Ok(())
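
to illustrate why the per-send statuses are lost, here is a minimal synchronous sketch of the same short-circuit pattern. `SendResult` and `bail_on_first_error` are hypothetical names made up for this example, not part of the quic client; the only real API used is the standard library's `Iterator::try_for_each`, which stops at the first `Err`.

```rust
/// Hypothetical stand-in for the status of one send in a batch.
type SendResult = Result<(), String>;

/// Mirrors the shape of the quoted loop: `try_for_each` stops at the
/// first `Err`, so any results after it are never inspected.
/// Returns how many results were looked at, plus the overall outcome.
fn bail_on_first_error(results: Vec<SendResult>) -> (usize, SendResult) {
    let mut inspected = 0;
    let outcome = results.into_iter().try_for_each(|res| {
        inspected += 1;
        res
    });
    (inspected, outcome)
}
```

with a batch of `[Ok, Err, Err, Ok]`, only two results are inspected and the caller sees a single error, so it has no way to tell that two sends failed.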

Proposed Solution

investigate whether waiting to collect all results of a batched send has a significant performance penalty.
if so, split rpc sts metrics to report failed single and batched sends separately.
if not, rework the quic client's batched send such that it can return the error count.
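
a rough sketch of the "if not" branch, assuming every per-send result is collected before returning. `BatchSendSummary`, `summarize_batch`, and `record_failures` are hypothetical names for illustration only, and the `String` error type is a placeholder for whatever the client actually returns. It does not address the performance question above, only the shape of the return value and the metric update.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for the status of one send in a batch.
type SendResult = Result<(), String>;

/// Hypothetical summary a reworked batched send could return
/// instead of a single `Result`.
#[derive(Debug, Default, PartialEq)]
struct BatchSendSummary {
    sent: u64,
    failed: u64,
}

/// Walks every result instead of bailing on the first error,
/// counting successes and failures.
fn summarize_batch(results: impl IntoIterator<Item = SendResult>) -> BatchSendSummary {
    let mut summary = BatchSendSummary::default();
    for res in results {
        match res {
            Ok(()) => summary.sent += 1,
            Err(_) => summary.failed += 1,
        }
    }
    summary
}

/// Adds the number of failed sends to the counter, rather than a fixed 1.
fn record_failures(send_failure_count: &AtomicU64, summary: &BatchSendSummary) {
    send_failure_count.fetch_add(summary.failed, Ordering::Relaxed);
}
```

the caller would then bump `send_failure_count` by `summary.failed`, so a batch with two failed sends adds 2 to the metric instead of 1.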
