Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suppress p2p logs #1875

Merged
merged 9 commits into from Aug 1, 2019

Conversation

@abitmore
Copy link
Member

commented Jul 27, 2019

PR for #978.

This PR reduces logging level of some common exceptions, greatly reduce size of p2p log files.

Current implementation doesn't address the txid mismatch issue mentioned in #978 (comment).

@abitmore abitmore added this to the 3.3.0 - Feature Release milestone Jul 27, 2019

@abitmore abitmore added this to In development in Feature Release (3.3.0) via automation Jul 27, 2019

Reduce log level of exceptions in order_cancel_op
Some bots place limit orders then try to cancel some of them before a
new block arrives. However, the order's id (in pending state) may be
different in individual nodes, thus caused the exceptions.

A potential solution to the order_id inconsistent issue:
#556

@abitmore abitmore force-pushed the pr-978-p2p-logging branch from 2fc099d to eea889d Jul 27, 2019

@abitmore

This comment has been minimized.

Copy link
Member Author

commented Jul 28, 2019

The next large message in p2p log is

2019-07-28T21:00:00 p2p:message read_loop on_closing_connectio ] Peer xx.xx.xx.xx:xxxxx is disconnecting us because of an error: You offered us a block that we reject
as invalid, exception: { ... "Rejecting block with timestamp in t
he future" ...

which followed with a JSON dump of a whole block due to the capture and re-throw of blk_msg in application_impl::handle_block():

} FC_CAPTURE_AND_RETHROW( (blk_msg)(sync_mode) ) return false; }

The reason why this message appears in a normal node's log file, is when a normal node has a peer whose system clock is behind, the "stale" node may repeatedly reject / disconnect to the normal node and re-connect, the exception message with the whole block will be sent to the normal node when disconnecting. One stale node can generate 200+ such messages in normal nodes' log files every hour, which is 2-3 MB with recent blocks (depends on size of blocks). The more stale nodes connected, the more the messages, thus the larger the log files.

On the other hand, the message could be helpful for nodes whose system clock is ahead.

I'm not sure what's the best way to deal with this. My gut says we'd better send shorter message when disconnecting due to this reason, E.G. only includes the block header but not the whole block.

Thoughts?

@pmconrad

This comment has been minimized.

Copy link
Contributor

commented Jul 29, 2019

Agree to send only message header and block header in this case. Returning the full block is waste of bandwidth + disk space.

@jmjatlanta

This comment has been minimized.

Copy link
Contributor

commented Jul 29, 2019

The code looks good. Any suggestions on how to effectively and easily test your changes?

Side note: Do we need an appender that could help us with testing (i.e. appends to a vector)?

@abitmore

This comment has been minimized.

Copy link
Member Author

commented Jul 29, 2019

@jmjatlanta one of my mainnet public API nodes is running the patched code.

Reduce log level of exceptions in order_create_op
Some misconfigured bots were trying to place limit orders in
unauthorized markets, or place orders when don't have sufficient
balance (this could also be caused by inconsistent pending state).
case 3050105: // limit_order_create_receiving_asset_unauthorized
case 3050106: // limit_order_create_insufficient_balance
case 3050201: // limit_order_cancel_nonexist_order
case 3050202: // limit_order_cancel_owner_mismatch

This comment has been minimized.

Copy link
@abitmore

abitmore Jul 29, 2019

Author Member

I feel this code (a hard-coded list of magic numbers) is a bit ugly. Any idea to improve?

This comment has been minimized.

Copy link
@pmconrad

pmconrad Jul 30, 2019

Contributor

Does using e. g. limit_order_cancel_nonexist_order::code_enum::code_value work?

This comment has been minimized.

Copy link
@abitmore

abitmore Jul 30, 2019

Author Member

Thanks, it works.

But I still think hard-coding a list here is not very good, since we may need to add more items later. It would be better if we can make it somewhat configurable, e.g. can define the log level when throwing the exception, or use a config file, etc. Perhaps not worth the efforts though.

@abitmore

This comment has been minimized.

Copy link
Member Author

commented Jul 30, 2019

New log messages about "timestamp-in-the-future" blocks look like this:

2019-07-30T20:44:39 p2p:message read_loop on_closing_connectio ] Peer 127.0.0.1:39889 is disconnecting us because of an error: You offered us a block that we reject as invalid, exception: {"code":90007,"name":"block_timestamp_in_future_exception","message":"block timestamp in the future","stack":[{"context":{"level":"warn","file":"node.cpp","line":2589,"method":"send_sync_block_to_node_delegate","hostname":"","thread_name":"p2p","timestamp":"2019-07-30T20:44:39"},"format":"","data":{"block_header":{"previous":"00000014aebc77a146c7e989ea4cc3f4f6fe3b58","timestamp":"2019-07-30T20:44:45","witness":"1.6.3","transaction_merkle_root":"0000000000000000000000000000000000000000","extensions":[]},"block_num":21,"block_id":"00000015a564b87c8fffa7c442d8dee508e1ddc1"}}]} node.cpp:2432

Average size of such messages reduced from around 12KB (data of 2019-07-29) to 800 bytes.

@pmconrad
Copy link
Contributor

left a comment

Looks good.

@abitmore abitmore merged commit d62131e into develop Aug 1, 2019

3 checks passed

ci/dockercloud Your tests passed in Docker Cloud
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details

Feature Release (3.3.0) automation moved this from In development to Done Aug 1, 2019

@abitmore abitmore deleted the pr-978-p2p-logging branch Aug 1, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
3 participants
You can’t perform that action at this time.