Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

btf: libbpf doesn't update instructions when deduplicating types #739

Open
lmb opened this issue Jul 19, 2022 · 1 comment
Open

btf: libbpf doesn't update instructions when deduplicating types #739

lmb opened this issue Jul 19, 2022 · 1 comment

Comments

@lmb
Copy link
Collaborator

lmb commented Jul 19, 2022

--- FAIL: TestLibBPFCompat (0.01s)
    --- FAIL: TestLibBPFCompat/netif_receive_skb.linked3.o (0.01s)
        elf_reader_test.go:665: Error during loading: program trace_netif_receive_skb: apply CO-RE relocations: apply fixup target_type_id=67->16253: invalid immediate 73, expected 67 (fixup: target_type_id=67->16253)

target_type_id=67->16253 means we expect to find type id 67 in Instruction.Constant. Instead we find invalid immediate 73.

We can dump the BTF to figure out what types this refers to:

$ bpftool btf dump file netif_receive_skb.linked3.o format raw | grep -F '[67]'
[67] STRUCT 'bpf_insn' size=8 vlen=5
$ bpftool btf dump file netif_receive_skb.linked3.o format raw | grep -F '[73]'
[73] STRUCT 'btf_enum' size=8 vlen=2

We think the instruction should reference a struct bpf_insn but the compiler encoded struct btf_enum. After adding some advanced printf debugging:

$ ./run-tests.sh 5.17 -run TestLibBPFCompat/netif_receive_skb.linked3.o . | grep -A1 -B1 302
offset 18 target_type_id 19 (from wire)
offset 302 target_type_id 67 (from wire)
offset 359 target_type_id 2 (from wire)
--
offset 18 target_type_id of Struct:"sk_buff" (0) local type id 19 (from ext info)
offset 302 target_type_id of Struct:"bpf_insn" (0) local type id 67 (from ext info)
offset 359 target_type_id of Int:"int" (0) local type id 2 (from ext info)
--
offset 18 LdImmDW dst: r2 imm: 19 target_type_id of Struct:"sk_buff" (0) (into metadata)
offset 302 LdImmDW dst: r3 imm: 73 target_type_id of Struct:"bpf_insn" (0) (into metadata)
offset 359 LdImmDW dst: r3 imm: 2 target_type_id of Int:"int" (0) (into metadata)
--
offset 18 LdImmDW dst: r2 imm: 19 target_type_id of Struct:"sk_buff" (0) (from metadata)
offset 302 LdImmDW dst: r3 imm: 73 target_type_id of Struct:"bpf_insn" (0) (from metadata)
offset 359 LdImmDW dst: r3 imm: 2 target_type_id of Int:"int" (0) (from metadata)
--
offset 4258 LdImmDW dst: r3 imm: 83 target_type_id of Struct:"__sk_buff" (0) (from metadata)
offset 302 LdImmDW dst: r3 imm: 73 target_type_id of Struct:"bpf_insn" (0) (apply)
--- FAIL: TestLibBPFCompat (0.02s)

From the last line we know that applying the fixup at offset 302 fails. The BTF relocation (from wire) encodes target type 67 for that offset, which bpftool agrees is struct bpf_insn. As soon as we write it into metadata we can see that the instruction at offset 302 disagrees, the immediate is 73. We can verify that the 73 comes from the ELF, not from a bug in the library:

$ llvm-objdump-12 --section=tp_btf/netif_receive_skb -D netif_receive_skb.linked3.o  | grep ' 302:'     302:	18 03 00 00 49 00 00 00 00 00 00 00 00 00 00 00	r3 = 73 ll

At this point it looks like the encoded instructions and the encoded CO-RE relocations disagree. Maybe this is related to which tool we're using to do the linking? An interesting experiment would be if @ti-mo rebuilt the 5.15 selftests on his machine, and then run the unit test against that. If that changes the outcome its a tooling issue, if not it might be a bug in 5.17.

Another thing to try is running the same example via libbpf and looking at the output it generates for this particular test case.

Diff for my quick hack:

diff --git a/internal/btf/ext_info.go b/internal/btf/ext_info.go
index 2c0e1af..e1c9996 100644
--- a/internal/btf/ext_info.go
+++ b/internal/btf/ext_info.go
@@ -124,6 +124,7 @@ func (ei *ExtInfos) Assign(insns asm.Instructions, section string) {
 		}
 
 		if len(reloInfos) > 0 && reloInfos[0].offset == iter.Offset {
+			fmt.Println("offset", iter.Offset, iter.Ins, reloInfos[0].relo, "(into metadata)")
 			iter.Ins.Metadata.Set(coreRelocationMeta{}, reloInfos[0].relo)
 			reloInfos = reloInfos[1:]
 		}
@@ -610,6 +611,10 @@ type CORERelocation struct {
 	kind     coreKind
 }
 
+func (cr *CORERelocation) String() string {
+	return fmt.Sprintf("%s of %s (%s)", cr.kind, cr.typ, cr.accessor)
+}
+
 func CORERelocationMetadata(ins *asm.Instruction) *CORERelocation {
 	relo, _ := ins.Metadata.Get(coreRelocationMeta{}).(*CORERelocation)
 	return relo
@@ -653,6 +658,7 @@ func newRelocationInfos(brs []bpfCORERelo, ts types, strings *stringTable) ([]co
 		if err != nil {
 			return nil, fmt.Errorf("offset %d: %w", br.InsnOff, err)
 		}
+		fmt.Println("offset", relo.offset, relo.relo, "local type id", br.TypeID, "(from ext info)")
 		rs = append(rs, *relo)
 	}
 	sort.Slice(rs, func(i, j int) bool {
@@ -713,7 +719,7 @@ func parseCOREReloRecords(r io.Reader, bo binary.ByteOrder, recordSize uint32, r
 		// ELF tracks offset in bytes, the kernel expects raw BPF instructions.
 		// Convert as early as possible.
 		relo.InsnOff /= asm.InstructionSize
-
+		fmt.Println("offset", relo.InsnOff, relo.Kind, relo.TypeID, "(from wire)")
 		out = append(out, relo)
 	}
 
diff --git a/linker.go b/linker.go
index 60cb7a6..723077c 100644
--- a/linker.go
+++ b/linker.go
@@ -117,11 +117,14 @@ func findReferences(progs map[string]*ProgramSpec) error {
 func applyRelocations(insns asm.Instructions, local, target *btf.Spec) error {
 	var relos []*btf.CORERelocation
 	var reloInsns []*asm.Instruction
+	var reloOffsets []asm.RawInstructionOffset
 	iter := insns.Iterate()
 	for iter.Next() {
 		if relo := btf.CORERelocationMetadata(iter.Ins); relo != nil {
+			fmt.Println("offset", iter.Offset, iter.Ins, relo, "(from metadata)")
 			relos = append(relos, relo)
 			reloInsns = append(reloInsns, iter.Ins)
+			reloOffsets = append(reloOffsets, iter.Offset)
 		}
 	}
 
@@ -144,6 +147,7 @@ func applyRelocations(insns asm.Instructions, local, target *btf.Spec) error {
 
 	for i, fixup := range fixups {
 		if err := fixup.Apply(reloInsns[i]); err != nil {
+			fmt.Println("offset", reloOffsets[i], reloInsns[i], relos[i], "(apply)")
 			return fmt.Errorf("apply fixup %s: %w", &fixup, err)
 		}
 	}

Originally posted by @lmb in #668 (comment)

lmb added a commit to ti-mo/ebpf that referenced this issue Jul 19, 2022
Trying to load the ELF gives the following error:

    elf_reader_test.go:665: Error during loading: program trace_netif_receive_skb:
    apply CO-RE relocations: apply fixup target_type_id=67->16253:
    invalid immediate 73, expected 67 (fixup: target_type_id=67->16253)

After some cursory digging this doesn't seem to be a bug in the library
but maybe one in either pahole or clang.

See cilium#739
lmb added a commit to ti-mo/ebpf that referenced this issue Jul 20, 2022
Trying to load the ELF gives the following error:

    elf_reader_test.go:665: Error during loading: program trace_netif_receive_skb:
    apply CO-RE relocations: apply fixup target_type_id=67->16253:
    invalid immediate 73, expected 67 (fixup: target_type_id=67->16253)

After some cursory digging this doesn't seem to be a bug in the library
but maybe one in either pahole or clang.

See cilium#739
ti-mo pushed a commit that referenced this issue Jul 20, 2022
Trying to load the ELF gives the following error:

    elf_reader_test.go:665: Error during loading: program trace_netif_receive_skb:
    apply CO-RE relocations: apply fixup target_type_id=67->16253:
    invalid immediate 73, expected 67 (fixup: target_type_id=67->16253)

After some cursory digging this doesn't seem to be a bug in the library
but maybe one in either pahole or clang.

See #739
@lmb
Copy link
Collaborator Author

lmb commented Nov 1, 2023

I think this is the same issue as https://lore.kernel.org/bpf/CAN+4W8i=7Wv2VwvWZGhX_mc8E7EST10X_Z5XGBmq=WckusG_fw@mail.gmail.com/

clang generates BTF with duplicates in it, libbpf deduplicates the BTF but doesn't update the instruction immediate.

@lmb lmb changed the title btf: pahole or clang maybe generates invalid CO-RE relocations btf: libbpf doesn't update instructions when deduplicating types Jan 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant