
Per-Peer Message Capture #19509

Merged: 6 commits merged into bitcoin:master on Feb 2, 2021

Conversation

troygiorshev
Contributor

@troygiorshev troygiorshev commented Jul 14, 2020

This PR introduces per-peer message capture into Bitcoin Core.

Purpose

The purpose and scope of this feature is intentionally limited. It answers a question anyone new to Bitcoin's P2P protocol has had: "Can I see what messages my node is sending and receiving?".

Functionality

When a new debug-only command line argument capturemessages is set, any message that the node receives or sends is captured. The capture occurs in the MessageHandler thread. When receiving a message, it is captured as soon as the MessageHandler thread takes the message off of the vProcessMsg queue. When sending, the message is captured just before the message is pushed onto the vSendMsg queue.

The message capture is as minimal as possible to reduce the performance impact on the node. Messages are captured to a new message_capture folder in the datadir. Each peer has its own subfolder, named with its IP address and port. Inside, received and sent messages are captured into two binary files, msgs_recv.dat and msgs_sent.dat, like so:

message_capture/203.0.113.7_56072/msgs_recv.dat
message_capture/203.0.113.7_56072/msgs_sent.dat

Because the messages are raw binary dumps, included in this PR is a Python parsing tool to convert the binary files into human-readable JSON. This script has been placed on its own and out of the way in the new contrib/message-capture folder. Its usage is simple and easily discovered by the autogenerated -h option.
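
For a sense of what the parser does, here is a minimal standalone reader for one plausible record layout: an 8-byte little-endian microsecond timestamp, a 12-byte null-padded message type, and a 4-byte little-endian payload length. These sizes mirror the constants referenced in the functional test discussed later; treat the exact layout as an assumption, not the canonical format:

```python
from io import BytesIO

# Sizes assumed from the functional test's constants; treat as assumptions.
TIME_SIZE, MSGTYPE_SIZE, LENGTH_SIZE = 8, 12, 4
HEADER_SIZE = TIME_SIZE + MSGTYPE_SIZE + LENGTH_SIZE

def read_records(stream):
    """Yield (timestamp_usec, msg_type, payload) tuples from a capture stream."""
    while True:
        header = stream.read(HEADER_SIZE)
        if len(header) < HEADER_SIZE:
            return  # end of file (or truncated trailing record)
        time_usec = int.from_bytes(header[:TIME_SIZE], "little")
        msg_type = header[TIME_SIZE:TIME_SIZE + MSGTYPE_SIZE].rstrip(b"\x00").decode("ascii")
        length = int.from_bytes(header[TIME_SIZE + MSGTYPE_SIZE:], "little")
        yield time_usec, msg_type, stream.read(length)

# Round-trip a synthetic "ping" record to show the layout.
record = ((1234).to_bytes(TIME_SIZE, "little")
          + b"ping".ljust(MSGTYPE_SIZE, b"\x00")
          + (8).to_bytes(LENGTH_SIZE, "little")
          + b"payload!")
print(list(read_records(BytesIO(record))))  # [(1234, 'ping', b'payload!')]
```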

Future Maintenance

I sympathize greatly with anyone who says "the best code is no code".

The future maintenance of this feature will be minimal. The logic to deserialize the payload of the p2p messages exists in our testing framework. As long as our testing framework works, so will this tool.

Additionally, I hope that the simplicity of this tool will mean that it gets used frequently, so that problems will be discovered and solved when they are small.

FAQ

"Why not just use Wireshark"

Yes, Wireshark has the ability to filter and decode Bitcoin messages. However, the purpose of the message capture added in this PR is to assist with debugging, primarily for new developers looking to improve their knowledge of the Bitcoin protocol. This drives the design in a different direction than Wireshark, in two ways.

First, this tool must be convenient and simple to use. Using an external tool like Wireshark requires setup and interpretation of the results. To a new user who doesn't necessarily know what to expect, this is unnecessary difficulty. This tool, on the other hand, "just works": turn on the command line flag, run your node, run the script, read the JSON.

Second, because this tool is being used for debugging, we want it to be as close to the true behavior of the node as possible. A lot can happen in the SocketHandler thread that would be missed by Wireshark.

Additionally, if we are to use Wireshark, we are at the mercy of whoever is maintaining the Bitcoin protocol dissector in Wireshark, both in terms of accuracy and currency. As can be seen from the many previous attempts to include Bitcoin in Wireshark (google "bitcoin dissector"), this is easier said than done.

Lastly, I truly believe that this tool will be used significantly more by being included in the codebase. It's just that much more discoverable.

@troygiorshev troygiorshev changed the title from "Message dump" to "Per-Peer Message Logging" on Jul 14, 2020
@troygiorshev troygiorshev marked this pull request as draft July 14, 2020 02:14
@naumenkogs
Member

Concept ACK on the high-level idea. I think this feature may be quite useful for debugging, and it requires very few lines of code.

I have two questions though:

  1. How to better integrate it with the existing codebase? Should the format be similar to what we have now:
    received: addr (30003 bytes) peer=0? Or maybe it doesn't matter.
  2. Is there any threat of this feature being used for DoS? Perhaps on a system where file creation is expensive, an attacker could disconnect/reconnect (from a new net address?) to make us create a new file. Maybe the file should be created after 1 minute of connection lifetime? (Although the messages from the very beginning should not be lost.)

@DrahtBot
Contributor

DrahtBot commented Jul 14, 2020

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

Contributor

@jnewbery jnewbery left a comment

Concept ACK. I'm looking forward to playing around with this.

My first thought is that this would be much more useful if the user was able to specify an individual peer to log messages for, either through a command line argument -logmessages=<ip_addr> or through an RPC to enable and disable logging. Perhaps that can be done in a follow-up PR?

Contributor

@practicalswift practicalswift left a comment

Concept ACK: Very nice feature! Looking forward to using it. Thanks for adding it.

Feature request: Would it be possible to add a flag to the Python script that writes the recorded messages to one file each in the binary format the ProcessMessage(…) fuzzers expect (src/test/fuzz/process_message.cpp and src/test/fuzz/process_messages.cpp)?

Ideally with a file naming convention matching that of what libFuzzer and many other fuzzers generate: the file name being the SHA-1 of the file content.

That way this feature could be used to record live network messages that could be used directly as fuzzing inputs. A great way to generate a fuzzing seed corpus. That would be super neat! :)
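
The naming convention described above is easy to sketch. A hedged illustration (the helper name `corpus_filename` is made up for this example):

```python
import hashlib

def corpus_filename(msg_payload: bytes) -> str:
    # libFuzzer convention: a seed file is named after the SHA-1 of its content
    return hashlib.sha1(msg_payload).hexdigest()

print(corpus_filename(b""))  # da39a3ee5e6b4b0d3255bfef95601890afd80709
```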

@troygiorshev
Contributor Author

git range-diff master cf5966a 9898fa1

Made suggested changes and fixed the Travis problem.

Member

@MarcoFalke MarcoFalke left a comment

Looks surprisingly simple, so Concept ACK

@troygiorshev
Contributor Author

@naumenkogs thanks for the concept ack!

How to better integrate it with the existing codebase? Should the format be similar to what we have now:
received: addr (30003 bytes) peer=0? Or maybe it doesn't matter.

IMO that format is good for a debug log, where things need to be compact on a single line. For this I much prefer JSON.

Is there any threat of this feature being used for DoS? ...

At the moment, with this locked behind a debug flag, I'm not worried about DoS vectors. For both debugging and learning someone should be at the keyboard and the node shouldn't be working with anything real. Additionally, I imagine that if writing a very small amount to disk in this way (every time a message is sent or received) is a DoS vector then there are many more serious ones available.

That said, this is worth consideration, especially if we think this will one day be used for wider purposes.

@troygiorshev troygiorshev force-pushed the message-dump branch 2 times, most recently from 45db287 to cbb154e on July 22, 2020
@troygiorshev
Contributor Author

git range-diff master 9898fa1 cbb154e

  • Trivial Rebase
  • Added a commit at the beginning cleaning up PushMessage's and ProcessMessages's if statements
  • LogMessage now uses Span
  • command -> msg_type
  • python import is now relative to file
  • other small comments and fixes

@troygiorshev
Contributor Author

git range-diff master cbb154e 863c6b5

  • Added Test
  • Fixed Travis warning
  • Made Time Mockable

@troygiorshev troygiorshev marked this pull request as ready for review July 23, 2020 20:13
@theStack
Contributor

theStack commented Jul 26, 2020

Huge Concept ACK! Very nice idea with a simple implementation; I'm also pretty sure this will be very helpful for debugging purposes. Will definitely code-review this within the next few days.

Just one detail I find quite odd: why create two files instead of just one and not rather include the direction as part of the serialization for each message? Is it only about saving disk space (1 byte per message) or is there another reason I'm not aware of? I've never seen a logging/sniffing tool that divides up into multiple files before and tend to think it would only complicate re-assembling the whole send/receive order later.

@troygiorshev
Contributor Author

troygiorshev commented Jul 27, 2020

@theStack Thanks for the concept ACK!

why create two files instead of just one

Good question; I considered a few ways of doing this. Just to be clear, I create two files per peer. Partially, yes, this saves 1 byte per message. I could have also just logged everything into one file, which meant I would have had to save the peer address alongside each message as well. Ultimately, I chose the current way with the intention of simplifying the usage of the parser. As it stands, the parser is a simple "just give me what you want me to parse" sort of program. This allows the user to take advantage of globbing when selecting which files to parse (e.g. **/*.dat or **/msgs_recv.dat). Additionally the parser, being written in python, is actually noticeably slow. Forcing it to process all of the messages and then filter would be unnecessarily inconvenient.

@theStack
Contributor

why create two files instead of just one

Good question; I considered a few ways of doing this. Just to be clear, I create two files per peer. Partially, yes, this saves 1 byte per message. I could have also just logged everything into one file, which meant I would have had to save the peer address alongside each message as well. Ultimately, I chose the current way with the intention of simplifying the usage of the parser. As it stands, the parser is a simple "just give me what you want me to parse" sort of program. This allows the user to take advantage of globbing when selecting which files to parse (e.g. **/*.dat or **/msgs_recv.dat). Additionally the parser, being written in python, is actually noticeably slow. Forcing it to process all of the messages and then filter would be unnecessarily inconvenient.

Generally I'd say it makes sense to divide up into files/folders by criteria that I would likely use as a frequent search key for accessing the logging data. Would I often want to access logs for a certain peer? Absolutely. Would I want to see only incoming or outgoing messages for a certain peer? Very unlikely, as the resulting data is probably not very valuable without the whole request/response stream in both directions. On the other hand, your arguments for simplification, efficiency (if looking only for incoming or outgoing messages is really such a frequent operation) and flexibility are also very strong, hence this solution is also okay with me.

I reviewed the code in more detail and noticed that there may be a serialization endianness problem, see the comments soon following below.

@jnewbery
Contributor

Especially if we'd go in a direction of separating separate sessions to the same ip/port, I think there is an elegance to having exactly 1 file = 1 session.

I agree that this seems very nice. I don't have any objection to changing the output to be this way.

What about a directory structure of <CLIENT_NAME>_<CLIENT_VERSION>/<ip>_<port>/<timestamp>_<peerid>.dat?

I'm less convinced of this. For one, those details can easily be fished out of the received version message in the file. I also like the directory structure of one directory per peer in .bitcoin/message_capture (and one file per session in each of those directories). But again, if other people prefer that I won't strongly oppose it.

@sipa
Member

sipa commented Jan 22, 2021

I'm less convinced of this.

Yes, just a suggestion to hear what people think.

For one, those details can easily be fished out of the received version message in the file.

Note that I'm talking about the version of the dumping node, not of the peer we're connecting to (that may also be interesting of course, but is a bit harder as it means you only know the filename after receiving VERSION).

I also like the directory structure of one directory per peer in .bitcoin/message_capture

If the intent is really "one directory per peer", the port perhaps shouldn't be included (or at least not for incoming connections), as the outgoing TCP port is newly chosen for every connection.

@troygiorshev
Contributor Author

why not capture messages after CNode::ReceiveMsgBytes?

This has been brought up a couple times now, and it's a great question, so it's worth a full response (that won't be buried in resolved conversations).

My priority in this PR is that the capture of messages always occurs in order from an application layer perspective. Because of this, I have to capture messages in the MessageHandler thread.

If I were to instead capture incoming messages in the SocketHandler thread, say, right after ReceiveMsgBytes, the following could possibly occur:

  • receive and capture message X
  • receive and capture message Y
  • send and capture message A

whereas on the processing side (since X and Y are in the vRecvMsg queue before being processed), the node actually did the following:

  • process message X
  • respond with message A
  • process message Y

So the order would be broken. Remember, this PR isn't just a fun replacement for Wireshark. The point is that someone improving/debugging/learning Bitcoin's P2P protocol can see what's actually going on from the node's perspective, without having to worry about how the timing of the queues is complicating things.
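
The reordering described above can be simulated with two toy capture lists. This is purely illustrative, not Bitcoin Core code; the queue and thread names are borrowed from the discussion:

```python
from collections import deque

socket_capture = []   # what a hypothetical SocketHandler-thread capture would record
process_capture = []  # what the MessageHandler-thread capture in this PR records
vRecvMsg = deque()

# SocketHandler thread receives X and Y back to back
for msg in ("X", "Y"):
    socket_capture.append(("recv", msg))
    vRecvMsg.append(msg)

# MessageHandler thread processes X, responds with A, and only then processes Y
while vRecvMsg:
    msg = vRecvMsg.popleft()
    process_capture.append(("recv", msg))
    if msg == "X":
        # the response to X is sent (and captured) before Y is processed
        process_capture.append(("send", "A"))
        socket_capture.append(("send", "A"))

print(socket_capture)   # [('recv', 'X'), ('recv', 'Y'), ('send', 'A')]
print(process_capture)  # [('recv', 'X'), ('send', 'A'), ('recv', 'Y')]
```

The socket-level view puts Y before A even though, from the application layer's perspective, A was the node's response to X alone.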

@sipa
Member

sipa commented Jan 22, 2021

@troygiorshev That's a great point. I was thinking that the difference between net-level and net_processing-level ordering would only be different interleaving across peers, but you're right to point out it even obscures the processing within one peer. I agree it's valuable to dump in processing order.

@troygiorshev
Contributor Author

[It] is really nice to be able to have the messages in correct order without having to correlate timestamps from two different files to produce an "ordered" file covering the entire flow.

This is already done for you by the parser, just give it both files and the messages will be interleaved automatically in the output file. If I'm understanding you correctly, the following should do what you're looking for:

./message-capture-parser.py <folder>/*.dat -o out.json

That said...

I think there is an elegance to having exactly 1 file = 1 session.

I agree completely, I'll figure this out!

This commit adds the CaptureMessage function.  This will later be called
when any message is sent or received.  The capture directory is fixed,
in a new folder "message_capture" in the datadir.  Peers will then have
their own subfolders, named with their IP address and port, replacing
colons with underscores to keep compatibility with Windows.  Inside,
received and sent messages will be captured into two binary files,
msgs_recv.dat and msgs_sent.dat.

e.g.
message_capture/203.0.113.7_56072/msgs_recv.dat
message_capture/203.0.113.7_56072/msgs_sent.dat

The format has been designed so as to result in a minimal performance
impact.  A parsing script is added in a later commit.

These calls are toggled by a debug-only "capturemessages" flag.  Default
disabled.

This commit adds contrib/message-capture/message-capture-parser.py, a Python
script to be used alongside -capturemessages to parse the captured
messages.

It is complete with arguments and will parse any file given, sorting the
messages in the files when creating the output.  If an output file is
specified with -o or --output, it will dump the messages in json format
to that file, otherwise it will print to stdout.

The small change to the unused msg_generic is to bring it in line with
the other message classes, purely to avoid a bug in the future.

Add a functional test for CaptureMessage.  This connects and then
disconnects a peer so that the handshake can be used to check if capture
is being done correctly.

Included in a docstring in the test is the following:

From the data file we'll only check the structure.

We won't care about things like:
- Deserializing the payload of the message
    - This is managed by the deserialize methods in
      test_framework.messages
- The order of the messages
    - There's no reason why we can't, say, change the order of the
      messages in the handshake
- Message Type
    - We can add new message types

We're ignoring these because they're simply too brittle to test here.

This commit adds brief documentation for this feature.  Included in the
justification is the purpose of this feature as well as usage and
functionality tips.
@troygiorshev
Contributor Author

git range-diff master a17b250 bff7c66

  • Message capture now more gracefully handles unrecognized and corrupted messages
  • Corrected typo in docs
  • Added comment to clarify how timestamps are being used here
  • A couple small improvements to the test

Thanks so much for the review everyone!

Up soon: 1 file = 1 session

@MarcoFalke
Member

re-ACK bff7c66, only some minor changes:

  • doc fixups
  • append messages in the python parser even when the msg type is unknown or something can not be decoded

@michaelfolkson
Contributor

Still need to do a lot of playing around with this. Interested in whether you've given any thought, @troygiorshev, to what questions this tool can easily answer (without additional digging) and what is possible for future improvements/future PRs. I don't want to derail review of this PR though, so I've set up a StackExchange question.

Contributor

@theStack theStack left a comment

re-ACK bff7c66
Verified that the error handling code that was introduced since my last ACK works, by corrupting a message-type in the middle of a random capture file (with a hex editor) and diffing the json outputs of the original and corrupted captures:

s/sendcmpct/sendcmpcx/

53,57c53,55
<     "msgtype": "sendcmpct",
<     "body": {
<       "announce": false,
<       "version": 2
<     }
---
>     "msgtype": "sendcmpcx",
>     "body": "000200000000000000",
>     "error": "Unrecognized message type."

s/sendcmpct/<random_unprintable_gibberish>/

<     "msgtype": "sendcmpct",
<     "body": {
<       "announce": false,
<       "version": 2
<     }
---
>     "msgtype": "UNREADABLE",
>     "body": "000200000000000000",
>     "error": "Unrecognized message type."

@MarcoFalke
Member

The nit here can be addressed once you have to rebase: #19509 (comment)

@jnewbery
Contributor

jnewbery commented Feb 2, 2021

Up soon: 1 file = 1 session

@troygiorshev Are you still planning to do this?

@jnewbery
Contributor

jnewbery commented Feb 2, 2021

utACK bff7c66

This isn't part of the public interface, so the file layout can be changed in a follow-up.

@MarcoFalke MarcoFalke merged commit 384e090 into bitcoin:master Feb 2, 2021
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Feb 3, 2021
if not tmp_header_raw:
break
tmp_header = BytesIO(tmp_header_raw)
int.from_bytes(tmp_header.read(TIME_SIZE), "little") # type: int
Contributor

Not sure if this is the right place to ask, but when I'm running vulture against this file, it complains:

bitcoin/test/functional/p2p_message_capture.py:45: invalid syntax at "int.from_bytes(tmp_header.read(TIME_SIZE), "little")      # type: int"

Is that a bug in vulture or a problem in the code? If I assign the return value of from_bytes to a variable, it passes.

Contributor

I think it is a problem in the code. The return value of from_bytes is not being used.

Contributor

I guess the fix is:

-            int.from_bytes(tmp_header.read(TIME_SIZE), "little")      # type: int
+            tmp_header.read(TIME_SIZE)  # skip over the timestamp field


@leonardojobim

leonardojobim commented Feb 10, 2021

I ran the message-capture-parser.py and received the following error:

Error:
Traceback (most recent call last):
  File "./message-capture-parser.py", line 214, in <module>
    main()
  File "./message-capture-parser.py", line 199, in main
    process_file(str(capture), messages, "recv" in capture.stem, progress_bar)
  File "./message-capture-parser.py", line 159, in process_file
    msg_dict["body"] = to_jsonable(msg)
  File "./message-capture-parser.py", line 82, in to_jsonable
    elif slot in HASH_INT_VECTORS and isinstance(val[0], int):
IndexError: list index out of range

So I changed line 82 from
elif slot in HASH_INT_VECTORS and isinstance(val[0], int):
to
elif isinstance(val, list) and len(val) > 0 and slot in HASH_INT_VECTORS and isinstance(val[0], int):

The script worked after this change.

The reason for the error is that the val variable had the value [], so accessing val[0] crashed the script.
The error happened with the following values for msgtype and for the type of the obj variable (in the to_jsonable() function):

msgtype: 'tx'       - obj type: <class 'test_framework.messages.CTxWitness'> 
msgtype: 'headers'  - obj type: <class 'test_framework.messages.msg_headers'>,
msgtype: 'blocktxn' - obj type: <class 'test_framework.messages.CTxWitness'> 
msgtype: 'block'    - obj type: <class 'test_framework.messages.CTxWitness'>

I tested it on Ubuntu 20.04 on VMWare.
The command used was ./message-capture-parser.py -o out2.json ~/.bitcoin/message_capture/**/*.dat

MarcoFalke pushed a commit that referenced this pull request Feb 17, 2021
鈥sion

de85af5 test: store subversion (user agent) as string in msg_version (Sebastian Falbesoner)

Pull request description:

  It seems more natural to treat the "subversion" field (=user agent string, see [BIP 14](https://github.com/bitcoin/bips/blob/master/bip-0014.mediawiki#Proposal)) of a node as a pure string rather than a bytestring within the test framework. This is also suggested by the naming prefix in `msg_version.strSubVer`: one probably wouldn't expect a field starting with "str" to be a bytestring that needs further decoding to be useful. This PR moves the encoding/decoding parts to the serialization/deserialization routines so that the user doesn't have to bother with that anymore.

  Note that currently, in the master branch the `msg_version.strSubVer` is never read (only in `msg_version.__repr__`); however, one issue that is solved by this PR came up while testing #19509 (not merged yet): a decoding script for binary message capture files makes use of the functional test framework to convert captures into JSON format. Bytestrings will be converted to hexstrings, while pure strings will (surprise surprise) end up without modification in the file.

  So without this patch, we get:

  ```
  $ jq . out.json | grep -m5 strSubVer
        "strSubVer": "2f5361746f7368693a32312e39392e302f"
        "strSubVer": "2f5361746f7368693a302e32302e312f"
        "strSubVer": "2f5361746f7368693a32312e39392e302f"
        "strSubVer": "2f5361746f7368693a302e32302e312f"
        "strSubVer": "2f5361746f7368693a32312e39392e302f"
  ```

  After this patch:

  ```
  $ jq . out2.json | grep -m5 strSubVer
        "strSubVer": "/Satoshi:21.99.0/"
        "strSubVer": "/Satoshi:0.20.1/"
        "strSubVer": "/Satoshi:21.99.0/"
        "strSubVer": "/Satoshi:0.20.1/"
        "strSubVer": "/Satoshi:21.99.0/"
  ```

ACKs for top commit:
  jnewbery:
    utACK de85af5

Tree-SHA512: ff23642705c858e8387a625537dfec82e6b8a15da6d99b8d12152560e52d243ba17431b602b26f60996d897e00e3f37dcf8dc8a303ffb1d544df29a5937080f9
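
The hexstring-vs-string difference in the before/after outputs above is just Python's bytes/str split. A tiny illustration using a value from the example output:

```python
# The "before" output stores the bytestring's hex; the "after" stores the decoded string.
subver_bytes = b"/Satoshi:0.20.1/"
print(subver_bytes.hex())            # 2f5361746f7368693a302e32302e312f
print(subver_bytes.decode("ascii"))  # /Satoshi:0.20.1/
```
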
@jnewbery
Contributor

Nice find @leonardojobim! You could also test for the list's non-emptiness by checking its booleanness, which is maybe more pythonic:

elif isinstance(val, list) and val and slot in HASH_INT_VECTORS and isinstance(val[0], int):

Do you mind opening a PR for this?
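
A minimal sketch of that guard as a standalone function (the contents of HASH_INT_VECTORS are hypothetical here; the real set lives in the parser script):

```python
HASH_INT_VECTORS = {"hashes"}  # hypothetical contents, for illustration only

def is_hash_int_vector(slot, val):
    # `bool(val)` short-circuits on an empty list before val[0] is ever touched
    return (isinstance(val, list) and bool(val)
            and slot in HASH_INT_VECTORS and isinstance(val[0], int))

print(is_hash_int_vector("hashes", [1, 2]))  # True
print(is_hash_int_vector("hashes", []))      # False, no IndexError
```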

Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Nov 18, 2022
Summary:
The purpose and scope of this feature is intentionally limited. It answers a question anyone new to Bitcoin's P2P protocol has had: "Can I see what messages my node is sending and receiving?".

When a new debug-only command line argument `capturemessages` is set, any message that the node receives or sends is captured. The capture occurs in the `MessageHandler` thread. When receiving a message, it is captured as soon as the `MessageHandler` thread takes the message off of the `vProcessMsg` queue. When sending, the message is captured just before the message is pushed onto the `vSendMsg` queue.

The message capture is as minimal as possible to reduce the performance impact on the node. Messages are captured to a new message_capture folder in the datadir. Each peer has its own subfolder named with its IP address and port. Inside, received and sent messages are captured into two binary files, msgs_recv.dat and msgs_sent.dat, like so:
```
message_capture/203.0.113.7_56072/msgs_recv.dat
message_capture/203.0.113.7_56072/msgs_sent.dat
```
Because the messages are raw binary dumps, included in this PR is a Python parsing tool to convert the binary files into human-readable JSON. This script has been placed on its own and out of the way in the new contrib/message-capture folder. Its usage is simple and easily discovered by the autogenerated -h option.

This is a backport of [[bitcoin/bitcoin#19509 | core#19509]]

Test Plan:
`ninja all check-all`
`src/bitcoind -capturemessages`
`./contrib/message-capture/message-capture-parser.py -o out.json /bitcoinddata/message_capture/**/*.dat`

Reviewers: #bitcoin_abc, Fabien

Reviewed By: #bitcoin_abc, Fabien

Subscribers: Fabien

Differential Revision: https://reviews.bitcoinabc.org/D12531
@bitcoin bitcoin locked and limited conversation to collaborators Jul 11, 2023