-
Notifications
You must be signed in to change notification settings - Fork 652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
elf_reader,btf: support multiple programs per ELF section #508
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I think splitting out / delaying the linking this way makes a lot of sense.
One aspect @lmb pointed out is that this makes looking at the final instructions actually loaded into the kernel hard. We already have this problem a bit with the -1
offsets that get resolved in NewProgram()
too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Phew, that took a while 😅 I know it's often difficult when doing big refactoring, however having several smaller commits would have helped me a lot here. Especially the BTF stuff can be split I think, some of the diffs are basically only reviewable by ignoring the old code.
I really like that BTF is now organised by program / function and not by section anymore. That was always a crutch. Having the various Infos exported could also help us to get rid of btf.Program
which would be nice.
Here are things that we need to discuss more:
- FuncInfos, LineInfos are basically 1:1 mappings of the BTF wire format. I think we shouldn't do that: it boxes us into what is convenient for the wire format, but it makes working with the type more cumbersome. Point in case is
FuncInfo.Name()
. We should export a type that has a good API and is useful in the future. The thought experiment I like to do is: what would aFuncInfo
look like if we added BTF write support? - Linking: I'm uneasy about allowing functions to call each other without any restrictions. Seems like it's currently possible to create a cyclic graph of function calls, is that true? Do we really want to support that?
- I also wonder whether we can only link the programs which we will end up including in
CollectionSpec
. - I think we shouldn't export functions from
.text
in CollectionSpec for now (can't exactly figure out whether that is the case right now).
a19acbb
to
a1c6631
Compare
34311ef
to
d27aede
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't respond to #508 (comment) so I'll copy the relevant parts here:
Things get tricky when a function D is added that calls both A and B:
D -|-> A --> B --> C
|-> B --> CB and C's instructions were already pulled in indirectly by pulling in A, and subsequently pulling in B will cause the same insns to be pulled in again.
Now if C would call back into A, this gets even more complicated, since all of A, B and C now need to include each other's instructions. In order to pre-calculate this, we'd need to execute a full dependency resolution pass for each node in the graph, starting with the deepest node(s) first. But if there are cycles, how do we determine the deepest node?
Maybe we are talking past each other, but the current linker handles this correctly I think. I've modified a testcase to be sure:
diff --git a/linker_test.go b/linker_test.go
index c531d08..4618ac8 100644
--- a/linker_test.go
+++ b/linker_test.go
@@ -13,9 +13,11 @@ func TestLink(t *testing.T) {
Instructions: asm.Instructions{
// Make sure the call doesn't happen at instruction 0
// to exercise the relative offset calculation.
- asm.Mov.Reg(asm.R0, asm.R1),
- asm.Call.Label("my_func"),
- asm.Return(),
+ asm.Mov.Reg(asm.R0, asm.R1).Sym("D"),
+ asm.Call.Label("A"),
+ asm.JNE.Imm(asm.R0, 0, "exit"),
+ asm.Call.Label("B"),
+ asm.Return().Sym("exit"),
},
License: "MIT",
}
@@ -23,13 +25,19 @@ func TestLink(t *testing.T) {
libs := []*ProgramSpec{
{
Instructions: asm.Instructions{
- asm.LoadImm(asm.R0, 1337, asm.DWord).Sym("my_other_func"),
+ asm.Call.Label("B").Sym("A"),
asm.Return(),
},
},
{
Instructions: asm.Instructions{
- asm.Call.Label("my_other_func").Sym("my_func"),
+ asm.Call.Label("C").Sym("B"),
+ asm.Return(),
+ },
+ },
+ {
+ Instructions: asm.Instructions{
+ asm.LoadImm(asm.R0, 1337, asm.DWord).Sym("C"),
asm.Return(),
},
},
The output for this is:
D:
0: MovReg dst: r0 src: r1
1: Call -1 <A>
2: JNEImm dst: r0 off: -1 imm: 0 <exit>
3: Call -1 <B>
exit:
4: Exit
A:
5: Call -1 <B>
6: Exit
B:
7: Call -1 <C>
8: Exit
C:
9: LdImmDW dst: r0 imm: 1337
11: Exit
The trick is to keep the following state:
- whether a function has been
linked
- the linked insns, currently stored in
prog.Instructions
- a stack of insns which are yet to be linked, currently
pending
Provided I've understood you correctly, I disagree with this argument you made:
For the same reason Instructions no longer contains all flattened dependencies: because it causes duplicates. 'Uniqueness' depends on the starting point of the dependency resolution process.
For the call graph you showed the current code doesn't produce duplicates. What am I missing?
839b10c
to
89ee7ea
Compare
…method Ignore LBB symbols since they are internal to clang, there are often many, and they serve no purpose for loading eBPF. Signed-off-by: Timo Beckers <timo@isovalent.com>
This patch adds support for emitting multiple eBPF functions/programs to the same ELF section. Significant changes were necessary, both to the BPF linker and the BTF ext_info parser, as previously most of the code (rightfully) made the assumption that one ELF section equaled one BPF function. In order to remove this limitation, all 'offset' logic that used to track positions within an ELF section had to be modified to track positions within a function body instead. Secondly, in order to reassemble instruction streams and BTF func_infos, line_infos and also CO-RE relocations back into a flat format the kernel expects, the linking logic had to be broken up into smaller pieces. This allows the instruction linker and the BTF extinfo linkers to operate on the same source of truth, so they can generate their outputs using the same program layout. Additionally, by computing references (which requires scanning all progs' instruction streams for references to other programs) only once during ELF loading, any marshaling logic can simply request the flattened program layout, saving cycles. To simplify the reference discovery process, the internal distinction between 'libs' and 'progs' has been eliminated. This allows any function to be called from anywhere, regardless of section name. For backwards compatibility reasons, programs of type UnspecifiedProgram are not yet emitted to the CollectionSpec. How we treat functions from .text and other unknown sections is still to be debated. From a high level, the linker is now split into multiple stages: - Finding references - a program's instruction stream is checked for references to any other program in the ELF. If a direct dependency is found (a jump to another function), a pointer to that function is stored in the program's reference list. - Flattening - when the program is about to be marshaled for hand-off to the kernel, a unique, flat list of dependent programs is generated by stepping through each program's reference list in a depth-first manner. This list must be used by both the instruction and BTF extinfo linkers, so they generate their outputs using the same layout. - Marshaling - the flat list of programs is simply iterated over and its instructions and BTF extinfo's are marshaled to their respective output buffers. Signed-off-by: Timo Beckers <timo@isovalent.com>
Adds the Instructions.Name() method for conveniently getting the name of the function insns belongs to. It returns the Symbol field of the first instruction in insns, if any. Eliminated (*elfCode).loadInstructions() in favour of iterating the ELF section sequentially, performing symbol lookups as we go. Function decoding no longer assumes an Exit instruction denotes the end of a function, which was obviously wrong. Signed-off-by: Timo Beckers <timo@isovalent.com>
Added the jumpTarget helper for calculating absolute offsets of jump targets within ELF program sections. Signed-off-by: Timo Beckers <timo@isovalent.com>
This patch adds support for emitting multiple eBPF functions/programs
to the same ELF section.
Significant changes were necessary, both to the BPF linker and the BTF
ext_info parser, as previously most of the code (rightfully) made the
assumption that one ELF section equaled one BPF function. In order to
remove this limitation, all 'offset' logic that used to track positions
within an ELF section had to be modified to track positions within a
function body instead.
Secondly, in order to reassemble instruction streams and BTF func_infos,
line_infos and also CO-RE relocations back into a flat format the kernel
expects, the linking logic had to be broken up into smaller pieces.
This allows the instruction linker and the BTF extinfo linkers to
operate on the same source of truth, so they can generate their outputs
using the same program layout. Additionally, by computing neighbours
(which requires scanning all progs' instruction streams for references
to other programs) only once during ELF loading, any marshaling logic
can simply request the flattened program layout, saving cycles.
To simplify the neighbour discovery process, the internal distinction
between 'libs' and 'progs' has been eliminated. This allows any function
to be called from anywhere, regardless of section name. For backwards
compatibility reasons, programs of type UnspecifiedProgram are not yet
emitted to the CollectionSpec. How we treat functions from .text and
other unknown sections is still to be debated.
From a high level, the linker is now split into multiple stages:
references to any other program in the ELF. If a direct dependency is
found (a jump to another function), a pointer to that function is
stored in the program's neighbour table.
the kernel, a unique, flat list of dependent programs is generated by
stepping through each program's neighbour table in a depth-first
manner. This list must be used by both the instruction and BTF extinfo
linkers, so they generate their outputs using the same layout.
instructions and BTF extinfo's are marshaled to their respective
output buffers.
Note: includes and depends on #503
Possible follow-ups to be discussed:
CollectionSpec.Programs
, including the section they were declared in. (need forcilium/cilium
).text
; hide it from the caller completely or expose inCollectionSpec.Functions
or something (cc @arthurfabre)