btf: CO-RE relocation is very slow on bpf-next #1081

Closed
lmb opened this issue Jun 28, 2023 · 10 comments
Labels
bug Something isn't working

Comments

@lmb
Collaborator

lmb commented Jun 28, 2023

Using CO-RE to relocate against some common types like struct net, struct sk_buff and so on takes a very long time: btf.CORERelocate accounts for about 88s in the profile below.

This was originally reported against pwru: cilium/pwru#189

The library spends a lot of time copying types to remove qualifiers:

(pprof) top -cum
Showing nodes accounting for 2.86s, 1.68% of 169.87s total
Dropped 361 nodes (cum <= 0.85s)
Showing top 10 nodes out of 104
      flat  flat%   sum%        cum   cum%
         0     0%     0%     98.79s 58.16%  main.main
         0     0%     0%     98.79s 58.16%  runtime.main
     2.86s  1.68%  1.68%     88.28s 51.97%  github.com/cilium/ebpf/btf.copier.copy
         0     0%  1.68%     88.24s 51.95%  github.com/cilium/ebpf.(*CollectionSpec).LoadAndAssign
         0     0%  1.68%     88.24s 51.95%  github.com/cilium/ebpf.(*CollectionSpec).LoadAndAssign.func1
         0     0%  1.68%     88.24s 51.95%  github.com/cilium/ebpf.(*collectionLoader).loadProgram
         0     0%  1.68%     88.24s 51.95%  github.com/cilium/ebpf.assignValues
         0     0%  1.68%     88.23s 51.94%  github.com/cilium/ebpf.newProgramWithOptions
         0     0%  1.68%     88.20s 51.92%  github.com/cilium/ebpf.applyRelocations
         0     0%  1.68%     88.20s 51.92%  github.com/cilium/ebpf/btf.CORERelocate
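
The profile attributes the time to copying types in order to remove qualifiers. As a rough illustration of the alternative, the sketch below looks through qualifiers by following pointers instead of copying the type graph. The `Type`, `Struct`, `Const` and `Volatile` types, `skipQualifiers` and `maxQualifierDepth` are all hypothetical names made up for this example; this is not the cilium/ebpf API, and not necessarily how the library was changed.

```go
package main

import "fmt"

// Type is a hypothetical, simplified stand-in for a BTF type.
// It is NOT the btf.Type interface from cilium/ebpf.
type Type interface{ isType() }

type Struct struct{ Name string }
type Const struct{ Target Type }
type Volatile struct{ Target Type }

func (*Struct) isType()   {}
func (*Const) isType()    {}
func (*Volatile) isType() {}

// maxQualifierDepth bounds the walk so that a malformed, self-referential
// chain of qualifiers cannot loop forever. The value is an arbitrary choice.
const maxQualifierDepth = 32

// skipQualifiers follows const and volatile wrappers and returns the first
// type that is not a qualifier. It only follows pointers and never copies
// the type graph. It returns nil if no such type is found within
// maxQualifierDepth steps.
func skipQualifiers(t Type) Type {
	for i := 0; i < maxQualifierDepth; i++ {
		switch v := t.(type) {
		case *Const:
			t = v.Target
		case *Volatile:
			t = v.Target
		default:
			return t
		}
	}
	return nil
}

func main() {
	net := &Struct{Name: "net"}
	// Models "const volatile struct net".
	qualified := &Const{Target: &Volatile{Target: net}}

	underlying := skipQualifiers(qualified)
	fmt.Println(underlying == Type(net)) // prints: true
}
```

The cost of this approach is proportional to the length of the qualifier chain, not to the number of types reachable from the target.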

The problem is exacerbated by the fact that some common types appear many times in the kernel BTF. For example, struct net is defined 85 times:

$ bpftool btf dump file btf/testdata/vmlinux-bpf-next-6.4.0-rc3-g60548b825b08 | grep -F "STRUCT 'net'"
[1735] STRUCT 'net' size=3904 vlen=46
[15852] STRUCT 'net' size=3904 vlen=46
[31853] STRUCT 'net' size=3904 vlen=46
[35931] STRUCT 'net' size=3904 vlen=46
[38825] STRUCT 'net' size=3904 vlen=46
[41011] STRUCT 'net' size=3904 vlen=46
[42603] STRUCT 'net' size=3904 vlen=46
[44088] STRUCT 'net' size=3904 vlen=46
[46099] STRUCT 'net' size=3904 vlen=46
[47639] STRUCT 'net' size=3904 vlen=46
[48988] STRUCT 'net' size=3904 vlen=46
[50893] STRUCT 'net' size=3904 vlen=46
[73283] STRUCT 'net' size=3904 vlen=46
[74222] STRUCT 'net' size=3904 vlen=46
[81145] STRUCT 'net' size=3904 vlen=46
[82995] STRUCT 'net' size=3904 vlen=46
[84469] STRUCT 'net' size=3904 vlen=46
[86587] STRUCT 'net' size=3904 vlen=46
[88388] STRUCT 'net' size=3904 vlen=46
[90207] STRUCT 'net' size=3904 vlen=46
[92254] STRUCT 'net' size=3904 vlen=46
[94647] STRUCT 'net' size=3904 vlen=46
[95809] STRUCT 'net' size=3904 vlen=46
[98809] STRUCT 'net' size=3904 vlen=46
[100269] STRUCT 'net' size=3904 vlen=46
[102827] STRUCT 'net' size=3904 vlen=46
[105943] STRUCT 'net' size=3904 vlen=46
[116377] STRUCT 'net' size=3904 vlen=46
[118156] STRUCT 'net' size=3904 vlen=46
[142116] STRUCT 'net' size=3904 vlen=46
[143698] STRUCT 'net' size=3904 vlen=46
[145247] STRUCT 'net' size=3904 vlen=46
[146855] STRUCT 'net' size=3904 vlen=46
[155939] STRUCT 'net' size=3904 vlen=46
[157794] STRUCT 'net' size=3904 vlen=46
[159996] STRUCT 'net' size=3904 vlen=46
[161879] STRUCT 'net' size=3904 vlen=46
[164535] STRUCT 'net' size=3904 vlen=46
[167831] STRUCT 'net' size=3904 vlen=46
[170472] STRUCT 'net' size=3904 vlen=46
[172942] STRUCT 'net' size=3904 vlen=46
[174460] STRUCT 'net' size=3904 vlen=46
[176415] STRUCT 'net' size=3904 vlen=46
[179167] STRUCT 'net' size=3904 vlen=46
[182520] STRUCT 'net' size=3904 vlen=46
[184211] STRUCT 'net' size=3904 vlen=46
[186833] STRUCT 'net' size=3904 vlen=46
[189317] STRUCT 'net' size=3904 vlen=46
[190878] STRUCT 'net' size=3904 vlen=46
[196388] STRUCT 'net' size=3904 vlen=46
[202142] STRUCT 'net' size=3904 vlen=46
[204810] STRUCT 'net' size=3904 vlen=46
[206288] STRUCT 'net' size=3904 vlen=46
[213502] STRUCT 'net' size=3904 vlen=46
[216873] STRUCT 'net' size=3904 vlen=46
[223909] STRUCT 'net' size=3904 vlen=46
[225185] STRUCT 'net' size=3904 vlen=46
[227361] STRUCT 'net' size=3904 vlen=46
[229073] STRUCT 'net' size=3904 vlen=46
[236957] STRUCT 'net' size=3904 vlen=46
[254213] STRUCT 'net' size=3904 vlen=46
[255898] STRUCT 'net' size=3904 vlen=46
[258453] STRUCT 'net' size=3904 vlen=46
[260138] STRUCT 'net' size=3904 vlen=46
[262430] STRUCT 'net' size=3904 vlen=46
[264319] STRUCT 'net' size=3904 vlen=46
[265305] STRUCT 'net' size=3904 vlen=46
[275750] STRUCT 'net' size=3904 vlen=46
[277522] STRUCT 'net' size=3904 vlen=46
[278776] STRUCT 'net' size=3904 vlen=46
[288110] STRUCT 'net' size=3904 vlen=46
[294419] STRUCT 'net' size=3904 vlen=46
[297195] STRUCT 'net' size=3904 vlen=46
[299021] STRUCT 'net' size=3904 vlen=46
[305120] STRUCT 'net' size=3904 vlen=46
[306607] STRUCT 'net' size=3904 vlen=46
[309527] STRUCT 'net' size=3904 vlen=46
[312213] STRUCT 'net' size=3904 vlen=46
[315430] STRUCT 'net' size=3904 vlen=46
[316833] STRUCT 'net' size=3904 vlen=46
[319361] STRUCT 'net' size=3904 vlen=46
[321659] STRUCT 'net' size=3904 vlen=46
[324716] STRUCT 'net' size=3904 vlen=46
[326194] STRUCT 'net' size=3904 vlen=46
[331294] STRUCT 'net' size=3904 vlen=46

Current CO-RE code will make a copy of every one of these types.
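
To check other types for the same kind of duplication, the grep above can be generalised. The following is a small sketch that counts STRUCT definitions per name in the text printed by `bpftool btf dump file`. It assumes the line format shown in the listing above; `countStructs` and `structLine` are names made up for this example.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// structLine matches top-level definitions such as
//
//	[1735] STRUCT 'net' size=3904 vlen=46
//
// Member lines are indented and therefore do not match.
var structLine = regexp.MustCompile(`^\[\d+\] STRUCT '([^']+)'`)

// countStructs returns the number of STRUCT definitions per struct name.
// Anonymous structs are all counted under the name "(anon)".
func countStructs(dump string) map[string]int {
	counts := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(dump))
	for sc.Scan() {
		if m := structLine.FindStringSubmatch(sc.Text()); m != nil {
			counts[m[1]]++
		}
	}
	return counts
}

func main() {
	// A shortened sample in the format printed by bpftool.
	dump := `[1735] STRUCT 'net' size=3904 vlen=46
[2107] STRUCT 'netns_bpf' size=64 vlen=3
	'run_array' type_id=2108 bits_offset=0
[15852] STRUCT 'net' size=3904 vlen=46
`
	for name, n := range countStructs(dump) {
		if n > 1 {
			fmt.Printf("%s: %d definitions\n", name, n)
		}
	}
}
```

For a full dump it is probably more convenient to read from standard input instead of a string; the counting logic stays the same.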

lmb added the bug label Jun 28, 2023
@brycekahle
Contributor

@lmb I think this is essentially the issue I was seeing a while ago.

@lmb
Collaborator Author

lmb commented Jun 28, 2023

I did some digging into why these duplicate types exist. They really are different types:

$ diff -u 1735.txt 15852.txt
--- 1735.txt	2023-06-28 17:35:22.371062151 +0100
+++ 15852.txt	2023-06-28 17:35:36.980258636 +0100
@@ -27,21 +27,21 @@
 	'dev_index_head' type_id=1265 bits_offset=2432
 	'netdev_chain' type_id=519 bits_offset=2496
 	'hash_mix' type_id=22 bits_offset=2560
-	'loopback_dev' type_id=2037 bits_offset=2624
+	'loopback_dev' type_id=15870 bits_offset=2624
 	'rules_ops' type_id=47 bits_offset=2688
 	'core' type_id=1938 bits_offset=2816
 	'mib' type_id=1970 bits_offset=3072
 	'packet' type_id=1983 bits_offset=4032
 	'unx' type_id=1982 bits_offset=4352
 	'nexthop' type_id=2069 bits_offset=4608
-	'ipv4' type_id=2005 bits_offset=5632
-	'ipv6' type_id=2060 bits_offset=11264
+	'ipv4' type_id=15864 bits_offset=5632
+	'ipv6' type_id=15884 bits_offset=11264
 	'sctp' type_id=2070 bits_offset=17408
 	'nf' type_id=2075 bits_offset=19776
 	'ct' type_id=2095 bits_offset=21632
 	'gen' type_id=2157 bits_offset=23168
-	'bpf' type_id=2107 bits_offset=23232
-	'xfrm' type_id=2102 bits_offset=24064
+	'bpf' type_id=15927 bits_offset=23232
+	'xfrm' type_id=15926 bits_offset=24064
 	'net_cookie' type_id=24 bits_offset=30720
 	'ipvs' type_id=2158 bits_offset=30784
 	'diag_nlsk' type_id=777 bits_offset=30848

Following the 'bpf' member (type_id 2107 vs. 15927):

[2107] STRUCT 'netns_bpf' size=64 vlen=3
	'run_array' type_id=2108 bits_offset=0
...

[2108] ARRAY '(anon)' type_id=1755 index_type_id=13 nr_elems=2
[1755] PTR '(anon)' type_id=10598
[10598] STRUCT 'bpf_prog_array' size=16 vlen=2
	...
	'items' type_id=10763 bits_offset=128
[10763] ARRAY '(anon)' type_id=10762 index_type_id=13 nr_elems=0
[10762] STRUCT 'bpf_prog_array_item' size=24 vlen=2
	'prog' type_id=10666 bits_offset=0
	...
[10666] PTR '(anon)' type_id=10595
[10595] STRUCT 'bpf_prog' size=72 vlen=26
...
	'aux' type_id=10660 bits_offset=448
...
[10660] PTR '(anon)' type_id=10661
[10661] STRUCT 'bpf_prog_aux' size=1056 vlen=63
--- vs. ---

[15927] STRUCT 'netns_bpf' size=64 vlen=3
	'run_array' type_id=15928 bits_offset=0
...

[15928] ARRAY '(anon)' type_id=15929 index_type_id=13 nr_elems=2
[15929] PTR '(anon)' type_id=15930
[15930] STRUCT 'bpf_prog_array' size=16 vlen=2
	...
	'items' type_id=16910 bits_offset=128
[16910] ARRAY '(anon)' type_id=16909 index_type_id=13 nr_elems=0
[16909] STRUCT 'bpf_prog_array_item' size=24 vlen=2
	'prog' type_id=2110 bits_offset=0
	...
[2110] PTR '(anon)' type_id=15931
[15931] STRUCT 'bpf_prog' size=72 vlen=26
...
	'aux' type_id=16809 bits_offset=448
...
[16809] PTR '(anon)' type_id=16810
[16810] STRUCT 'bpf_prog_aux' size=1056 vlen=63

We end up at bpf_prog_aux:

$ diff -u 10661.txt 16810.txt 
--- 10661.txt	2023-06-28 17:52:28.696038452 +0100
+++ 16810.txt	2023-06-28 17:53:18.730720758 +0100
@@ -16,8 +16,8 @@
 	'attach_btf' type_id=10700 bits_offset=512
 	'ctx_arg_info' type_id=10737 bits_offset=576
 	'dst_mutex' type_id=153 bits_offset=640
-	'dst_prog' type_id=10666 bits_offset=896
-	'dst_trampoline' type_id=10738 bits_offset=960
+	'dst_prog' type_id=2110 bits_offset=896
+	'dst_trampoline' type_id=16872 bits_offset=960
 	'saved_dst_prog_type' type_id=10553 bits_offset=1024
 	'saved_dst_attach_type' type_id=10554 bits_offset=1056
 	'verifier_zext' type_id=38 bits_offset=1088
@@ -30,26 +30,26 @@
 	'xdp_has_frags' type_id=38 bits_offset=1144
 	'attach_func_proto' type_id=10657 bits_offset=1152
 	'attach_func_name' type_id=2 bits_offset=1216
-	'func' type_id=10739 bits_offset=1280
+	'func' type_id=16873 bits_offset=1280
 	'jit_data' type_id=57 bits_offset=1344
-	'poke_tab' type_id=10740 bits_offset=1408
+	'poke_tab' type_id=16874 bits_offset=1408
 	'kfunc_tab' type_id=10741 bits_offset=1472
 	'kfunc_btf_tab' type_id=10742 bits_offset=1536
 	'size_poke_tab' type_id=22 bits_offset=1600
 	'ksym' type_id=10721 bits_offset=1664
-	'ops' type_id=10743 bits_offset=6464
-	'used_maps' type_id=10744 bits_offset=6528
+	'ops' type_id=16875 bits_offset=6464
+	'used_maps' type_id=16876 bits_offset=6528
 	'used_maps_mutex' type_id=153 bits_offset=6592
-	'used_btfs' type_id=10745 bits_offset=6848
-	'prog' type_id=10666 bits_offset=6912
+	'used_btfs' type_id=16877 bits_offset=6848
+	'prog' type_id=2110 bits_offset=6912
 	'user' type_id=1084 bits_offset=6976
 	'load_time' type_id=24 bits_offset=7040
 	'verified_insns' type_id=22 bits_offset=7104
 	'cgroup_atype' type_id=13 bits_offset=7136
-	'cgroup_storage' type_id=10746 bits_offset=7168
+	'cgroup_storage' type_id=16878 bits_offset=7168
 	'name' type_id=125 bits_offset=7296
 	'security' type_id=57 bits_offset=7424
-	'offload' type_id=10747 bits_offset=7488
+	'offload' type_id=16879 bits_offset=7488
 	'btf' type_id=10700 bits_offset=7552
 	'func_info' type_id=10748 bits_offset=7616
 	'func_info_aux' type_id=10749 bits_offset=7680
@@ -58,7 +58,7 @@
 	'func_info_cnt' type_id=22 bits_offset=7872
 	'nr_linfo' type_id=22 bits_offset=7904
 	'linfo_idx' type_id=22 bits_offset=7936
-	'mod' type_id=8638 bits_offset=8000
+	'mod' type_id=16041 bits_offset=8000
 	'num_exentries' type_id=22 bits_offset=8064
 	'extable' type_id=894 bits_offset=8128
 	'(anon)' type_id=10736 bits_offset=8192

All of these fields either point at bpf_map or bpf_prog.

@lmb
Collaborator Author

lmb commented Jun 28, 2023

The only difference in bpf_prog seems to be bpf_prog_aux.

$ diff -u 38764.txt 33237.txt 
--- 38764.txt	2023-06-28 18:06:54.699936390 +0100
+++ 33237.txt	2023-06-28 18:06:29.086584142 +0100
@@ -22,6 +22,6 @@
 	'stats' type_id=10757 bits_offset=256
 	'active' type_id=259 bits_offset=320
 	'bpf_func' type_id=10608 bits_offset=384
-	'aux' type_id=38941 bits_offset=448
+	'aux' type_id=33356 bits_offset=448
 	'orig_prog' type_id=10758 bits_offset=512
 	'(anon)' type_id=10755 bits_offset=576

@borkmann
Member

And this is out of the same vmlinux.h?

[2107] STRUCT 'netns_bpf' size=64 vlen=3
	'run_array' type_id=2108 bits_offset=0
...
--- vs. ---
[15927] STRUCT 'netns_bpf' size=64 vlen=3
	'run_array' type_id=15928 bits_offset=0
...

Could you follow the ids of run_array further to see where some of the properties differ? Was this with the latest pahole?

@lmb
Collaborator Author

lmb commented Jun 29, 2023

I'll do that. I'm not sure which pahole version was used: the vmlinux is pulled from an lvh image, see cilium/pwru#189 (comment). I think it uses whatever Ubuntu lunar has packaged:

[lunar (23.04)](https://packages.ubuntu.com/lunar/pahole) (utils): set of advanced DWARF utilities [universe]
1.24-4ubuntu1: amd64 arm64 armhf ppc64el riscv64 s390x

Here is the BTF blob: vmlinux-bpf-next-6.4.0-rc3-g60548b825b08.gz

@lmb
Collaborator Author

lmb commented Jun 29, 2023

> Could you follow the ids of run_array further to see where some of the properties differ?

We end up at bpf_prog_aux again:

2107<>15927 STRUCT 'netns_bpf' size=64 vlen=3
a: 	'run_array' type_id=2108 bits_offset=0
b: 	'run_array' type_id=15928 bits_offset=0
2108<>15928 ARRAY '(anon)' type_id=1755 index_type_id=13 nr_elems=2
a: ARRAY '(anon)' type_id=1755 index_type_id=13 nr_elems=2
b: ARRAY '(anon)' type_id=15929 index_type_id=13 nr_elems=2
1755<>15929 PTR '(anon)' type_id=10598
a: PTR '(anon)' type_id=10598
b: PTR '(anon)' type_id=15930
10598<>15930 STRUCT 'bpf_prog_array' size=16 vlen=2
a: 	'items' type_id=10763 bits_offset=128
b: 	'items' type_id=16910 bits_offset=128
10763<>16910 ARRAY '(anon)' type_id=10762 index_type_id=13 nr_elems=0
a: ARRAY '(anon)' type_id=10762 index_type_id=13 nr_elems=0
b: ARRAY '(anon)' type_id=16909 index_type_id=13 nr_elems=0
10762<>16909 STRUCT 'bpf_prog_array_item' size=24 vlen=2
a: 	'prog' type_id=10666 bits_offset=0
b: 	'prog' type_id=2110 bits_offset=0
10666<>2110 PTR '(anon)' type_id=10595
a: PTR '(anon)' type_id=10595
b: PTR '(anon)' type_id=15931
10595<>15931 STRUCT 'bpf_prog' size=72 vlen=26
a: 	'aux' type_id=10660 bits_offset=448
b: 	'aux' type_id=16809 bits_offset=448
10660<>16809 PTR '(anon)' type_id=10661
a: PTR '(anon)' type_id=10661
b: PTR '(anon)' type_id=16810
10661<>16810 STRUCT 'bpf_prog_aux' size=1056 vlen=63
a: 	'dst_prog' type_id=10666 bits_offset=896
b: 	'dst_prog' type_id=2110 bits_offset=896
already diffed 10666<>2110

@brb
Member

brb commented Jun 29, 2023

A loop? 🤯

@lmb
Collaborator Author

lmb commented Jun 29, 2023

> A loop?

No, the script I had bailed out after the first diff, so it gave misleading results. I rewrote it in Go, and now it gives me a real diff:

Struct:"mm_struct": child 0: Struct: child 1: Pointer: child 0: FuncProto: child 0: Int:"long unsigned int"[unsigned size=64] != Void
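
For readers who want to reproduce this kind of comparison, here is a minimal sketch of a structural diff that reports the path to the first real difference. It is not the script used above. The `Type` struct, `firstDiff` and `describe` are hypothetical and operate on a toy type graph, not on cilium/ebpf's btf types. Pairs that were already visited are treated as equal, which is what keeps the walk from looping on self-referential types such as bpf_prog and bpf_prog_aux.

```go
package main

import "fmt"

// Type is a hypothetical, simplified node in a type graph.
type Type struct {
	Kind     string // e.g. "Struct", "Pointer", "FuncProto", "Int", "Void"
	Name     string // empty for anonymous types
	Children []*Type
}

// pair identifies one comparison of two nodes.
type pair struct{ a, b *Type }

// describe renders a node as Kind or Kind:"Name".
func describe(t *Type) string {
	if t.Name == "" {
		return t.Kind
	}
	return fmt.Sprintf("%s:%q", t.Kind, t.Name)
}

// firstDiff returns a description of the first structural difference between
// a and b, or "" if none is found. visited records pairs that were already
// compared (or are being compared), so cycles in the graph terminate.
func firstDiff(a, b *Type, visited map[pair]bool) string {
	if a == b {
		return ""
	}
	if a == nil || b == nil {
		return "one side is nil"
	}
	p := pair{a, b}
	if visited[p] {
		return ""
	}
	visited[p] = true

	if a.Kind != b.Kind || a.Name != b.Name {
		return fmt.Sprintf("%s != %s", describe(a), describe(b))
	}
	if len(a.Children) != len(b.Children) {
		return fmt.Sprintf("%s: %d children != %d children",
			describe(a), len(a.Children), len(b.Children))
	}
	for i := range a.Children {
		if d := firstDiff(a.Children[i], b.Children[i], visited); d != "" {
			return fmt.Sprintf("%s: child %d: %s", describe(a), i, d)
		}
	}
	return ""
}

// mmStruct builds a toy mm_struct with a single function pointer member
// whose prototype returns ret.
func mmStruct(ret *Type) *Type {
	proto := &Type{Kind: "FuncProto", Children: []*Type{ret}}
	ptr := &Type{Kind: "Pointer", Children: []*Type{proto}}
	return &Type{Kind: "Struct", Name: "mm_struct", Children: []*Type{ptr}}
}

func main() {
	a := mmStruct(&Type{Kind: "Int", Name: "long unsigned int"})
	b := mmStruct(&Type{Kind: "Void"})
	fmt.Println(firstDiff(a, b, make(map[pair]bool)))
}
```

Because the function returns as soon as a difference is found, treating an already visited pair as equal cannot hide a mismatch: any earlier mismatch would already have ended the walk.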

This says that mm_struct.get_unmapped_area differs: its function prototype sometimes returns void and other times unsigned long. Even wonkier, there are multiple identical copies of get_unmapped_area as well:

Number of struct mm_struct: 205
First five
FuncProto[args=5 return=Int:"long unsigned int"]
  "": Pointer[target=Struct:"file"]
  "": Int:"long unsigned int"[unsigned size=64]
  "": Int:"long unsigned int"[unsigned size=64]
  "": Int:"long unsigned int"[unsigned size=64]
  "": Int:"long unsigned int"[unsigned size=64]
FuncProto[args=5 return=Void]
  "": Pointer[target=Struct:"file"]
  "": Int:"long unsigned int"[unsigned size=64]
  "": Int:"long unsigned int"[unsigned size=64]
  "": Int:"long unsigned int"[unsigned size=64]
  "": Int:"long unsigned int"[unsigned size=64]
FuncProto[args=5 return=Void]
  "": Pointer[target=Struct:"file"]
  "": Int:"long unsigned int"[unsigned size=64]
  "": Int:"long unsigned int"[unsigned size=64]
  "": Int:"long unsigned int"[unsigned size=64]
  "": Int:"long unsigned int"[unsigned size=64]
FuncProto[args=5 return=Void]
  "": Pointer[target=Struct:"file"]
  "": Int:"long unsigned int"[unsigned size=64]
  "": Int:"long unsigned int"[unsigned size=64]
  "": Int:"long unsigned int"[unsigned size=64]
  "": Int:"long unsigned int"[unsigned size=64]
FuncProto[args=5 return=Void]
  "": Pointer[target=Struct:"file"]
  "": Int:"long unsigned int"[unsigned size=64]
  "": Int:"long unsigned int"[unsigned size=64]
  "": Int:"long unsigned int"[unsigned size=64]
  "": Int:"long unsigned int"[unsigned size=64]

@lmb
Collaborator Author

lmb commented Jun 29, 2023

The problem is in Ubuntu's pahole: https://bugs.launchpad.net/ubuntu/+source/dwarves/+bug/2025370

It doesn't just create duplicate BTF, it also creates really large types. For example, in the Ubuntu BTF there are 150983 types reachable from struct sk_buff, which is why copying takes so much time. BTF generated by upstream pahole only has ~8k.
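
A "types reachable from X" figure like the ones above can be obtained with a plain graph traversal. The sketch below counts reachable nodes with a breadth-first walk over a toy type graph. The `Type` struct and `reachable` function are assumptions made for this example and do not correspond to cilium/ebpf's API.

```go
package main

import "fmt"

// Type is a hypothetical, simplified node in a type graph. Children are the
// types it refers to: struct members, pointer targets and so on.
type Type struct {
	Name     string
	Children []*Type
}

// reachable returns the number of distinct types reachable from root,
// including root itself. Every node is visited once, so cycles are harmless.
func reachable(root *Type) int {
	if root == nil {
		return 0
	}
	seen := map[*Type]bool{root: true}
	queue := []*Type{root}
	for len(queue) > 0 {
		t := queue[0]
		queue = queue[1:]
		for _, c := range t.Children {
			if c != nil && !seen[c] {
				seen[c] = true
				queue = append(queue, c)
			}
		}
	}
	return len(seen)
}

func main() {
	// A toy graph: sk_buff has a pointer back to itself and an int member.
	skb := &Type{Name: "sk_buff"}
	next := &Type{Name: "ptr", Children: []*Type{skb}}
	length := &Type{Name: "unsigned int"}
	skb.Children = []*Type{next, length}

	fmt.Println(reachable(skb)) // prints: 3
}
```

Any code that copies a type together with everything it references does work proportional to this count for each copy, which is why the difference between ~8k and 150983 matters so much here.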

@lmb
Collaborator Author

lmb commented Jul 4, 2023

This should now be much better on master even when running against buggy BTF.

@lmb lmb closed this as completed Jul 4, 2023
brb added a commit to cilium/pwru that referenced this issue Jul 6, 2023
Most notably to fix [1] which caused pwru's slow loading on Ubuntu.

[1]: cilium/ebpf#1081

Signed-off-by: Martynas Pumputis <m@lambda.lt>
tpapagian added a commit to cilium/tetragon that referenced this issue Jul 11, 2023
It doesn't seem to be part of any renovate PR.

This also includes a fix for newer kernels: cilium/ebpf#1081

Signed-off-by: Anastasios Papagiannis <tasos.papagiannnis@gmail.com>