Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

txHandler: add more metric #4786

Merged
merged 14 commits into from
Nov 17, 2022
Merged

Conversation

algorandskiy
Copy link
Contributor

@algorandskiy algorandskiy commented Nov 14, 2022

Summary

Add more metrics to tx handler to better understand txn traffic:

  • Number of duplicate or error transaction messages after txhandler backlog
  • Number of transaction messages with invalid txgroup fee
  • Number of transaction messages not well formed
  • Number of transaction messages with bad formed signature
  • Number of transaction messages with bad formed msig
  • Number of transaction messages with invalid logic sig
  • Number of transaction messages with signature verification failed
  • Number of transaction messages with some validation error
  • Number of transaction messages remembered in TX handler
  • Number of transaction groups remembered via txsync
  • Number of duplicate or error transaction groups received via txsync

Test Plan

Added unit tests

data/txHandler.go Outdated Show resolved Hide resolved
data/txHandler.go Show resolved Hide resolved
if request.tags[i] == protocol.ProposalPayloadTag {
networkPrioPPNonCompressedSize.AddUint64(uint64(len(d)), nil)
}
messageTags[request.tags[i]] = struct{}{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get it, what is this for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is for correcting the PP metric. without it it counts all prio messages, not only PP

Copy link
Contributor Author

@algorandskiy algorandskiy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cce what other metrics we want to collect?

data/transactions/verify/txn.go Outdated Show resolved Hide resolved
data/transactions/verify/txn.go Outdated Show resolved Hide resolved
data/txHandler.go Outdated Show resolved Hide resolved
data/txHandler.go Show resolved Hide resolved
Copy link
Contributor

@algonautshant algonautshant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some changes seem not related to adding the metric.

@@ -146,6 +154,7 @@ func (handler *TxHandler) backlogWorker() {
return
}
if handler.checkAlreadyCommitted(wi) {
transactionMessagesDupPreBacklog.Inc(nil)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor note: a duplicate may be detected here because the node got the transactions through the pool sync.
Although the txSync transactions also go through the txHandler, they don't get check for duplication through this code path.
I am not saying this is wrong, I just wanted to put out some more information.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I was just looking at txSyncer.go and noticed it doesn't have any counters, and neither does the txHandler.processDecoded() call it makes when it adds txns from txSyncer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is fine, we want to count duplicates from tx handler

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also want to add some counters for how often txSyncer is actually putting txns in the pool (or even counting their failures too), which seems to be happening more often than we think

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added AlreadyCommitted and Remembered counters

network/wsNetwork.go Outdated Show resolved Hide resolved
algonautshant
algonautshant previously approved these changes Nov 14, 2022
Copy link
Contributor

@algonautshant algonautshant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Some comments maybe for other considerations.

@codecov
Copy link

codecov bot commented Nov 14, 2022

Codecov Report

Merging #4786 (d998917) into master (526cb89) will decrease coverage by 0.00%.
The diff coverage is 66.32%.

@@            Coverage Diff             @@
##           master    #4786      +/-   ##
==========================================
- Coverage   54.67%   54.67%   -0.01%     
==========================================
  Files         416      416              
  Lines       53596    53662      +66     
==========================================
+ Hits        29306    29340      +34     
- Misses      21860    21895      +35     
+ Partials     2430     2427       -3     
Impacted Files Coverage Δ
data/transactions/verify/txn.go 76.03% <48.14%> (-0.16%) ⬇️
data/txHandler.go 33.33% <53.65%> (+7.40%) ⬆️
network/wsNetwork.go 66.99% <100.00%> (+0.20%) ⬆️
network/wsPeer.go 67.06% <100.00%> (-1.39%) ⬇️
catchup/service.go 68.14% <0.00%> (-1.24%) ⬇️
ledger/testing/randomAccounts.go 56.00% <0.00%> (-1.24%) ⬇️
ledger/tracker.go 74.04% <0.00%> (-0.86%) ⬇️
ledger/acctupdates.go 69.51% <0.00%> (-0.49%) ⬇️
... and 4 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

algonautshant
algonautshant previously approved these changes Nov 15, 2022
Copy link
Contributor

@brianolson brianolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one tweak? Should we have a counter for 'other reason'? otherwise good.
I wish we didn't make new definitions of constants like algod_transaction_messages_already_committed in util/metrics instead of in the place they're used, but I recognize it matches the established pattern for that source.

data/txHandler.go Show resolved Hide resolved
iansuvak
iansuvak previously approved these changes Nov 16, 2022
Copy link
Contributor

@iansuvak iansuvak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@algorandskiy algorandskiy changed the title txhandler: add more metric txHandler: add more metric Nov 16, 2022
@algorandskiy
Copy link
Contributor Author

@brianolson @algonautshant renamed algod_transaction_messages_already_committed to algod_transaction_messages_err_or_committed since I got two complains on this name.

algonautshant
algonautshant previously approved these changes Nov 16, 2022
if prepErr != nil {
// re-wrap the error, take underlying one and copy the reason code
err = &ErrTxGroupError{
err: fmt.Errorf("transaction %+v invalid : %w", stxn, prepErr.err),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think txnBatchPrep is already returning ErrTxGroupError, maybe you would just want to keep the error it created (not make a new one), which preserves .Reason, and set .err on it to have this wrapped fmt?

@@ -185,6 +246,8 @@ func (handler *TxHandler) postprocessCheckedTxn(wi *txBacklogMsg) {
return
Copy link
Contributor

@cce cce Nov 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to break out any of the errors where Remember() fails? Remember failing is transactionMessagesHandled - transactionMessagesRemember so we already know the total

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

follow-up: will do in separate PR

Copy link
Contributor

@algonautshant algonautshant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. However, many of the added code is not covered by tests.

Comment on lines +1467 to +1468
if prio {
if request.tags[i] == protocol.ProposalPayloadTag {
Copy link
Contributor

@algonautshant algonautshant Nov 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to make two levels of if?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, we want containsPrioPPTag

@@ -185,6 +246,8 @@ func (handler *TxHandler) postprocessCheckedTxn(wi *txBacklogMsg) {
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

follow-up: will do in separate PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants