New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
eBPF Linux tracepoints #19866
eBPF Linux tracepoints #19866
Conversation
Concept ACK, thanks for working on this, I think this is a great way to have more visibility into internal metrics (for diagnosing issues, and statistics), without intrusive changes, or having to hand-roll all kind of custom tooling. |
Concept ACK: more tooling for allowing visibility into internals for power users is good. I don't think this Linux only improvement removes the need for #19509 at all: I would use both in different contexts :) |
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsNo conflicts as of last run. |
Concept ACK, will review/test. |
@jonatack thanks! you need |
Thanks for the heads-up! which probably saved me some time even though you mentioned it in the PR description. Done: |
Could list the probe points in gdb:
Is it really necessary to run this as root? Edit: yes it is necessary to run |
I'm trying the first example, but it complains about a missing function
|
"Wladimir J. van der Laan" <notifications@github.com> writes:
I'm trying the first example, but it complains about a missing function `buf`. Any idea? Maybe too old bptrace version?
yeah buf is a new thing from bpftrace 0.11 and bcc 0.16.0, you can just
remove that if you're testing.
bpftrace is a pretty high level thing, you can write lower level bpf
programs with bcc, I haven't tried that yet but it looks like its more
powerful. bpftrace is great for simple stuff though.
|
right, you can run bitcoin without root and then attach to it with bpftrace afterwards. |
I wasn't sure how we should name the probes, right now I just have the mapped to class::method but perhaps they could be more abstract like |
Good to know! I've built bpftrace from git and it works now (which is version
I think I prefer that. It has the probe point name describe more accurately the action taken and data that is traced.
I think these would be good to have too:
BTW: is it possible to do binary logging with |
Rebased version here: https://github.com/laanwj/bitcoin/tree/usdt-probes |
src/net.cpp
Outdated
auto sanitizedType = SanitizeString(msg.m_type); | ||
|
||
LogPrint(BCLog::NET, "sending %s (%d bytes) peer=%d\n", sanitizedType, nMessageSize, pnode->GetId()); | ||
TRACE4(net, push_message, sanitizedType.c_str(), msg.data, nMessageSize, pnode->GetId()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we want to pass the sanitized packet type to the probe here or simply the raw data.
Edit: Nah, I guess anyone that wants the raw data already has a pointer to the packet, which is passed too. It's extra information.
I've also made a start with documentation |
@laanwj writes:
I'm not sure we want to pass the sanitized packet type to the probe here or simply the raw data.
I'm not super familiar with the network packet format, is this easy to
pull from the raw data? If so we probably don't need it. I wasn't really
happy with the current binary capabilities of bpftrace but I imagine
this will get better over time once the tooling matures.
Rebased version here: https://github.com/laanwj/bitcoin/tree/usdt-probes
Thanks for moving this along. I reviewed the range diff and looks ok.
I've also made a start with documentation doc/tracing.md: https://github.com/laanwj/bitcoin/blob/35cad1adbf2bdf98d437e499ea10904d967b0083/doc/tracing.md
Amazing, thank you!
|
Yea the header is very simple, given that eBPF is used for complex packet fltering and pre-processing setups I'm a bit surprised the binary handling capabilities are still a weak point but it should be able to handle it:
|
doc/tracing.md
Outdated
|
||
### net | ||
|
||
- `net:process_message(char* addr_str, int node_id, char* msgtype_str, uint8_t* rx_data, size_t rx_size)` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to rearrange the arguments to push_message
to be in the same order as process_message (eg move node_id up front before msgtype_str
):
net:push_message(int node_id, char* msgtype_str, uint8_t* tx_data, size_t tx_size)
I think this is ready to go out of draft and get review for the build system changes. It adds the basic infrastructure for tracing. Further tracepoints and examples can be added in later PRs. |
I think this is ready to go out of draft and get review for the build
system changes. It adds the basic infrastructure for tracing. Further
tracepoints and examples can be added in later PRs.
ok I'll push just those for now
|
Given that eBPF is used for complex packet fltering and pre-processing
setups I'm a bit surprised the binary handling capabilities are still
a weak point
This was really a comment on the bpftrace tool, not the capabilities of
ebpf itself. You can always write a C/BCC file and load the low level
probes that way, but I'm lazy and haven't tried that yet.
|
I remember when you posted about this on Twitter. |
Yes, that would be interesting. But yes, we can experiment with that later, Tested ACK 22eb793 |
Tested ACK 22eb793 Played around with the IBD example and added a I did try to extend the IBD example to log the hash of the connected block, but haven't had any luck getting this to work with bpftrace yet. Might try with |
0xB10C <notifications@github.com> writes:
I did try to extend the IBD example to log the hash of the connected
block, but haven't had any luck getting this to work with bpftrace
yet. Might try with `bcc` later.
I had the same experience. perhaps it will be easier once bpftrace has
better tooling and more control around printing arbitrary data. This
feature was only added recently:
bpftrace/bpftrace#1107
|
Example bcc python script for block-connected printing the block height and hash via USDT probes: click me for code.py and output.txtfrom __future__ import print_function
from bcc import BPF, USDT
import sys
from datetime import datetime
if len(sys.argv) < 2:
print("USAGE:", sys.argv[0], "path/to/bitcoind")
exit()
path = sys.argv[1]
debug = False
bpf_text = """
#include <uapi/linux/ptrace.h>
struct connect_block_t
{
u32 height;
u8 hash[32];
};
BPF_PERF_OUTPUT(connected_blocks);
int do_trace(struct pt_regs *ctx) {
struct connect_block_t cb = {};
bpf_usdt_readarg(1, ctx, &cb.height);
bpf_usdt_readarg_p(2, ctx, &cb.hash, sizeof(cb.hash));
connected_blocks.perf_submit(ctx, &cb, sizeof(cb));
return 0;
};
"""
u = USDT(path=str(path))
u.enable_probe(probe="connect_block", fn_name="do_trace")
if debug:
print(u.get_text())
b = BPF(text=bpf_text, usdt_contexts=[u])
def print_connected_blocks(cpu, data, size):
event = b["connected_blocks"].event(data)
height = event.height
hash_bytes = bytearray(event.hash)
hash_bytes.reverse()
hash = hash_bytes.hex()
print(datetime.now(), "height:", height, "hash:", hash)
b["connected_blocks"].open_perf_buffer(print_connected_blocks)
while 1:
try:
b.perf_buffer_poll()
except KeyboardInterrupt:
exit() Output on testnet:
or gist |
Add documentation for bitcoin#19866.
Add documentation for bitcoin#19866.
22eb793 tracing: add tracing framework (William Casarin) 933ab8a build: detect sys/sdt.h for eBPF tracing (William Casarin) Pull request description: Instead of writing ad-hoc logging everywhere (eg: bitcoin#19509), we can take advantage of linux user static defined traces, aka. USDTs ( not the stablecoin 😅 ) The linux kernel can hook into these tracepoints at runtime, but otherwise they have little to no performance impact. Traces can pass data which can be printed externally via tools such as bpftrace. For example, here's one that prints incoming and outgoing network messages: # Examples ## Network Messages ``` #!/usr/bin/env bpftrace BEGIN { printf("bitcoin net msgs\n"); @start = nsecs; } usdt:./src/bitcoind:net:push_message { $ip = str(arg0); $peer_id = (int64)arg1; $command = str(arg2); $data_len = arg3; $data = buf(arg3,arg4); $t = (nsecs - @start) / 100000; printf("%zu outbound %s %s %zu %d %r\n", $t, $command, $ip, $peer_id, $data_len, $data); @outbound[$command]++; } usdt:./src/bitcoind:net:process_message { $ip = str(arg0); $peer_id = (int64)arg1; $command = str(arg2); $data_len = arg3; $data = buf(arg3,arg4); $t = (nsecs - @start) / 100000; printf("%zu inbound %s %s %zu %d %r\n", $t, $command, $ip, $peer_id, $data_len, $data); @inbound[$ip, $command]++; } ``` $ sudo bpftrace netmsg.bt output: https://jb55.com/s/b11312484b601fb3.txt if you look at the bottom of the output you can see a histogram of all the messages grouped by message type and IP. nice! ## IBD Benchmarking ``` #!/usr/bin/env bpftrace BEGIN { printf("IBD to 500,000 bench\n"); } usdt:./src/bitcoind:CChainState:ConnectBlock { $height = (uint32)arg0; if ($height == 1) { printf("block 1 found, starting benchmark\n"); @start = nsecs; } if ($height >= 500000) { @EnD = nsecs; @duration = @EnD - @start; exit(); } } END { printf("duration %d ms\n", @duration / 1000000) } ``` This one hooks into ConnectBlock and prints the IBD time to height 500,000 starting from the first call to ConnectBlock Userspace static tracepoints give lots of flexibility without invasive logging code. It's also more flexible than ad-hoc logging code, allowing you to instrument many different aspects of the system without having to enable per-subsystem logging. Other ideas: tracepoints for lock contention, threads, what else? Let me know what ya'll think and if this is worth adding to bitcoin. ## TODO - [ ] docs? - [x] Integrate systemtap-std-dev/libsystemtap into build (provides the <sys/sdt.h> header) - [x] ~dtrace macos support? (is this still a thing?)~ going to focus on linux for now ACKs for top commit: laanwj: Tested ACK 22eb793 0xB10C: Tested ACK 22eb793 Tree-SHA512: 69242242112b679c8a12a22b3bc50252c305894fb3055ae6e13d5f56221d858e58af1d698af55e23b69bdb7abedb5565ac6b45fa5144087b77a17acd04646a75
…tion on User-Space, Statically Defined Tracing (USDT) 8f37f5c tracing: Tracepoint for connected blocks (0xb10c) 4224dec tracing: Tracepoints for in- and outbound P2P msgs (0xb10c) 469b71a doc: document systemtap dependency (0xb10c) 84ace9a doc: Add initial USDT documentation (0xb10c) Pull request description: This PR adds documentation for User-Space, Statically Defined Tracing (USDT) as well as three tracepoints (including documentation and usage examples). ## Context The `TRACEx` macros for tracepoints and build system changes for USDT were merged in bitcoin/bitcoin#19866 earlier this year. Issue bitcoin/bitcoin#20981 contains discussion about potential tracepoints and guidelines for adding them (also documented with this PR). USDT was a topic in a [core-dev-meeting discussion](https://bitcoin.jonasschnelli.ch/ircmeetings/logs/bitcoin-core-dev/2021/bitcoin-core-dev.2021-01-21-19.00.moin.txt) on 21st Jan, 2021. - [collabora.com: An eBPF overview, part 1: Introduction](https://www.collabora.com/news-and-blog/blog/2019/04/05/an-ebpf-overview-part-1-introduction/) - [collabora.com: An eBPF overview, part 2: Machine & bytecode](https://www.collabora.com/news-and-blog/blog/2019/04/15/an-ebpf-overview-part-2-machine-and-bytecode/) - [Brendan D. Gregg's blog posts, and book on on eBPF](http://www.brendangregg.com/) - [Brendan D. Gregg: Linux bcc/BPF Node.js USDT Tracing](http://www.brendangregg.com/blog/2016-10-12/linux-bcc-nodejs-usdt.html) ## USDT? Stablecoin? User-space, Statically Defined Tracing (USDT) allows for more observability during development, debugging, code review, and production usage. The tracepoints make it possible to keep track of custom statistics and enable detailed monitoring of otherwise hidden internals and have little to no performance impact when unused. Linux kernels (4.x or newer) can hook into the tracepoints and execute [eBPF] programs in a kernel VM once the tracepoint is called. This PR includes, for example, tracepoints for in- and outbound P2P messages. ``` USDT and eBPF Overview ====================== ┌──────────────────┐ ┌──────────────┐ │ tracing script │ │ bitcoind │ │==================│ 2. │==============│ │ eBPF │ tracing │ hooks │ │ │ code │ logic │ into┌─┤►tracepoint 1─┼───┐ 3. └────┬───┴──▲──────┘ ├─┤►tracepoint 2 │ │ pass args 1. │ │ 4. │ │ ... │ │ to eBPF User compiles │ │ pass data to │ └──────────────┘ │ program Space & loads │ │ tracing script │ │ ─────────────────┼──────┼─────────────────┼────────────────────┼─── Kernel │ │ │ │ Space ┌──┬─▼──────┴─────────────────┴────────────┐ │ │ │ eBPF program │◄──────┘ │ └───────────────────────────────────────┤ │ eBPF kernel Virtual Machine (sandboxed) │ └──────────────────────────────────────────┘ 1. The tracing script compiles the eBPF code and loads the eBFP program into a kernel VM 2. The eBPF program hooks into one or more tracepoints 3. When the tracepoint is called, the arguments are passed to the eBPF program 4. The eBPF program processes the arguments and returns data to the tracing script ``` The two main [eBPF] front-ends with support for USDT are [bpftrace] an [BPF Compiler Collection (BCC)]. BCC is used for complex tools and daemons and `bpftrace` is preferred for one-liners and shorter scripts. Example tracing scripts for both are provided with this PR. [eBPF]: https://ebpf.io/ [bpftrace]: https://github.com/iovisor/bpftrace [BPF Compiler Collection (BCC)]: https://github.com/iovisor/bcc This PR adds three tracepoints: - `net:inbound_message` - `net:outbound_message` - `valildation:block_connected` See `doc/tracing.md` and `contrib/tracing/` for documentation and example tracing scripts. ## Open Questions (Not in scope for this PR) - How to use these tracepoints under macOS? - Release builds with USDT support? - Should and can the tracepoints be automatically tested? ## Todo (before undraft) - [x] bcc example showing how to read raw P2P messages up to 32kb - [x] document that you need `sys/sdt.h` from `systemtap` for USDT support in Bitcoin Core (`apt install systemtap-sdt-dev` on debian-like). See bitcoin/bitcoin@933ab8a - [ ] release notes? ACKs for top commit: laanwj: re-ACK 8f37f5c jb55: ACK 8f37f5c Tree-SHA512: a92a8a2dfcd28465f58a6e5f50d39486817ef5f51214ec40bdb02a6843b9c08ea154fadb31558825ff3a4687477b90f2a5da5d6451989eef978e128a264c289d
…User-Space, Statically Defined Tracing (USDT) 8f37f5c tracing: Tracepoint for connected blocks (0xb10c) 4224dec tracing: Tracepoints for in- and outbound P2P msgs (0xb10c) 469b71a doc: document systemtap dependency (0xb10c) 84ace9a doc: Add initial USDT documentation (0xb10c) Pull request description: This PR adds documentation for User-Space, Statically Defined Tracing (USDT) as well as three tracepoints (including documentation and usage examples). ## Context The `TRACEx` macros for tracepoints and build system changes for USDT were merged in bitcoin#19866 earlier this year. Issue bitcoin#20981 contains discussion about potential tracepoints and guidelines for adding them (also documented with this PR). USDT was a topic in a [core-dev-meeting discussion](https://bitcoin.jonasschnelli.ch/ircmeetings/logs/bitcoin-core-dev/2021/bitcoin-core-dev.2021-01-21-19.00.moin.txt) on 21st Jan, 2021. - [collabora.com: An eBPF overview, part 1: Introduction](https://www.collabora.com/news-and-blog/blog/2019/04/05/an-ebpf-overview-part-1-introduction/) - [collabora.com: An eBPF overview, part 2: Machine & bytecode](https://www.collabora.com/news-and-blog/blog/2019/04/15/an-ebpf-overview-part-2-machine-and-bytecode/) - [Brendan D. Gregg's blog posts, and book on on eBPF](http://www.brendangregg.com/) - [Brendan D. Gregg: Linux bcc/BPF Node.js USDT Tracing](http://www.brendangregg.com/blog/2016-10-12/linux-bcc-nodejs-usdt.html) ## USDT? Stablecoin? User-space, Statically Defined Tracing (USDT) allows for more observability during development, debugging, code review, and production usage. The tracepoints make it possible to keep track of custom statistics and enable detailed monitoring of otherwise hidden internals and have little to no performance impact when unused. Linux kernels (4.x or newer) can hook into the tracepoints and execute [eBPF] programs in a kernel VM once the tracepoint is called. This PR includes, for example, tracepoints for in- and outbound P2P messages. ``` USDT and eBPF Overview ====================== ┌──────────────────┐ ┌──────────────┐ │ tracing script │ │ bitcoind │ │==================│ 2. │==============│ │ eBPF │ tracing │ hooks │ │ │ code │ logic │ into┌─┤►tracepoint 1─┼───┐ 3. └────┬───┴──▲──────┘ ├─┤►tracepoint 2 │ │ pass args 1. │ │ 4. │ │ ... │ │ to eBPF User compiles │ │ pass data to │ └──────────────┘ │ program Space & loads │ │ tracing script │ │ ─────────────────┼──────┼─────────────────┼────────────────────┼─── Kernel │ │ │ │ Space ┌──┬─▼──────┴─────────────────┴────────────┐ │ │ │ eBPF program │◄──────┘ │ └───────────────────────────────────────┤ │ eBPF kernel Virtual Machine (sandboxed) │ └──────────────────────────────────────────┘ 1. The tracing script compiles the eBPF code and loads the eBFP program into a kernel VM 2. The eBPF program hooks into one or more tracepoints 3. When the tracepoint is called, the arguments are passed to the eBPF program 4. The eBPF program processes the arguments and returns data to the tracing script ``` The two main [eBPF] front-ends with support for USDT are [bpftrace] an [BPF Compiler Collection (BCC)]. BCC is used for complex tools and daemons and `bpftrace` is preferred for one-liners and shorter scripts. Example tracing scripts for both are provided with this PR. [eBPF]: https://ebpf.io/ [bpftrace]: https://github.com/iovisor/bpftrace [BPF Compiler Collection (BCC)]: https://github.com/iovisor/bcc This PR adds three tracepoints: - `net:inbound_message` - `net:outbound_message` - `valildation:block_connected` See `doc/tracing.md` and `contrib/tracing/` for documentation and example tracing scripts. ## Open Questions (Not in scope for this PR) - How to use these tracepoints under macOS? - Release builds with USDT support? - Should and can the tracepoints be automatically tested? ## Todo (before undraft) - [x] bcc example showing how to read raw P2P messages up to 32kb - [x] document that you need `sys/sdt.h` from `systemtap` for USDT support in Bitcoin Core (`apt install systemtap-sdt-dev` on debian-like). See bitcoin@933ab8a - [ ] release notes? ACKs for top commit: laanwj: re-ACK 8f37f5c jb55: ACK 8f37f5c Tree-SHA512: a92a8a2dfcd28465f58a6e5f50d39486817ef5f51214ec40bdb02a6843b9c08ea154fadb31558825ff3a4687477b90f2a5da5d6451989eef978e128a264c289d
Instead of writing ad-hoc logging everywhere (eg: #19509), we can take advantage of linux user static defined traces, aka. USDTs ( not the stablecoin 😅 )
The linux kernel can hook into these tracepoints at runtime, but otherwise they have little to no performance impact. Traces can pass data which can be printed externally via tools such as bpftrace. For example, here's one that prints incoming and outgoing network messages:
Examples
Network Messages
output: https://jb55.com/s/b11312484b601fb3.txt
if you look at the bottom of the output you can see a histogram of all the messages grouped by message type and IP. nice!
IBD Benchmarking
This one hooks into ConnectBlock and prints the IBD time to height 500,000 starting from the first call to ConnectBlock
Userspace static tracepoints give lots of flexibility without invasive logging code. It's also more flexible than ad-hoc logging code, allowing you to instrument many different aspects of the system without having to enable per-subsystem logging.
Other ideas: tracepoints for lock contention, threads, what else?
Let me know what ya'll think and if this is worth adding to bitcoin.
TODO
dtrace macos support? (is this still a thing?)going to focus on linux for now