Use instruction metadata to simplify linking and CO-RE relocation #606

lmb · 2022-03-28T05:30:42Z

btf: move CORERelocate to core.go

btf: add constructors for various *Info

Pull constructors for FuncInfo, LineInfo and CORERelocation out of
splitExtInfos. This makes the function smaller and makes removing it
later easier.

Instead of taking a *Spec in the constructors, take a []Type and
stringTable directly. This makes splitting ExtInfos out of *Spec
easier later on.

btf: track bpfCORERelo.InsnOff in instructions instead of bytes

This makes bpfCORERelo consistent with bpfFuncInfo and bpfLineInfo. It seems like
there is a latent bug here as well, since we used to subtract bpfFuncInfo.InsnOff
from bpfCORERelo.InsnOff, but for some reason the tests pass.

btf: store ExtInfos in instruction metadata

Use per-instruction metadata to store FuncInfo, LineInfo and CO-RE relocations.
As a result, simply concatenating two Instruction slices preserves the ext_info
without extra code to keep track of offsets. The simplest way to achieve is to
assign ext_infos per section, before we split into individual functions. This
also avoids having to split ext_infos in the first place.

Storing ext_infos in metadata in turn allows / requires removing code that relies
on stable offsets, the most notable being applying COREFixups. Instead of tracking
which offset a fixup should be applied to we change coreRelocate to instead return
results in the same order as CORERelocations are passed. In the caller we remember
which instruction originated the relocation and apply the fixup directly instead of
iterating all instructions once more.

Instead of storing ExtInfos in Spec we split it off and keep a reference to the
necessary types and string table around. This makes more sense since every BTF ELF
has a Spec, but ExtInfos are optional. It turns out that most code except the ELF
loader doesn't care about ExtInfos in the first place, so we might be able to
add an API that doesn't parse ExtInfos in the future.

Finally we can remove btf.Program since we don't need an intermediate type to hold
metadata anymore.

btf: remove Map

btf.Program is gone, so let's get rid of Map as well.

btf-metadata.md

lmb · 2022-03-28T19:41:14Z

Split off the WithSource + sparse metadata changes into #609

lmb · 2022-04-01T21:41:46Z

I'll move program: remove references from ProgramSpec to a follow up PR once #609 is merged.

lmb · 2022-04-20T17:13:25Z

This is now in a state where it can be reviewed! I've updated the PR description with the latest commit messages, please take a look.

I'm a little bit unhappy with ExtInfos.Assign, it seems like it belongs into the ELF loader. I'd prefer to convert bpfFuncInfo to FuncInfo, etc. at BTF load time and then expose something like the following:

type ExtInfos struct{
    []FuncInfo
    []LineInfo
    []CORERelocation
}

// Per-section ExtInfos
func LoadSpecFromReader(...) (*Spec ,map[string]ExtInfos, error)

The snag is that FuncInfo, etc. don't contain the InsnOff, since having an offset doesn't make sense when they are attached to an instruction. I can see two ways to fix this:

type ExtInfo struct {
    map[uint64]FuncInfo
    ...
}

By abusing the map we can avoid introducing a new type, but none of our code really wants to use a map. An ordered slice is much better. For that we'd need:

type FuncInfoWithOffset struct {
    FuncInfo
    Offset uint32
}

type ExtInfo struct {
    []FuncInfoWithOffset
    ...
}

This means a lot of additional types, which I don't particularly like. Maybe you have another idea?

ti-mo

Looks great, have a few first observations that might or might not apply, let me know what you think.

internal/btf/ext_info.go

ti-mo · 2022-04-27T09:52:22Z

internal/btf/core.go

 		if relo.kind == reloTypeIDLocal {
 			// Filtering out reloTypeIDLocal here makes our lives a lot easier
 			// down the line, since it doesn't have a target at all.
 			if len(relo.accessor) > 1 || relo.accessor[0] != 0 {
 				return nil, fmt.Errorf("%s: unexpected accessor %v", relo.kind, relo.accessor)
 			}

-			result[uint64(relo.insnOff)] = COREFixup{
+			result[i] = COREFixup{


I wonder whether we can't use Type instead of TypeID here. What is the remaining significance of a type ID? If we know the IDs of the local and target types, and we have both specs available within this function, what's stopping us from putting inflated types here?

This is libbpf behaviour, since relo.kind == reloTypeIDLocal we need an ID.

This also creates problems for #643 of course.

Yes, this is probably something that needs solving in order to get rid of IDs. (as a follow-up, to implement #643)

internal/btf/core.go

internal/btf/ext_info.go

internal/btf/core.go

ti-mo · 2022-04-27T15:43:59Z

internal/btf/btf.go

-func LoadSpecFromReader(rd io.ReaderAt) (*Spec, error) {
+// Returns ErrNotFound if reading from an ELF which contains no BTF. ExtInfos
+// may be nil.
+func LoadSpecFromReader(rd io.ReaderAt) (*Spec, *ExtInfos, error) {


I've seen you refer to eliminating *ExtInfos here in the future in your commit message, but can we get away with doing that sooner rather than later? Could we put *ExtInfos (back?) into the btf.Spec instead of returning it as a separate value here? IMO LoadSpec* should return a single value.

I'd like to remove ExtInfos completely, but #606 (comment) is my best idea how to do it and I don't like it very much.

Embedding ExtInfos in Spec has it the wrong way round: when reading BTF we always have a Spec, but we don't always have ExtInfos. So modeling the relationship as ExtInfo{Spec} is more natural. For example, naked vmlinux BTF has no ext infos. Check the call sites of loadRawSpec, almost always we pass nil, nil and aren't interested in ExtInfos in the first place. With ExtInfos split out / removed a good amount of code can be refactored to be less deeply nested.

I was considering adding a separate constructor LoadExtInfos(io.Reader) and making LoadSpec skip ExtInfos. However I can't come up with a way to achieve that without (a) adding *elf.File to the btf API surface like LoadExtInfos(*elf.File) (and all the SafeELFFile baggage) or (b) parsing the ELF one more time. (We already parse the ELF header twice in the ELF loader due to this.)

Well, check this out: 640c4b0 Kinda terrible, but we could use it to have LoadExtInfosFromReader(io.ReaderAt) and LoadSpecFromReader(io.ReaderAt).

Regarding the following:

I'd like to remove ExtInfos completely, but #606 (comment) is my best idea how to do it and I don't like it very much.

FuncInfo is wire format, the metadata would be a btf.Func. Func currently still holds a TypeID, so we can easily generate a FuncInfo based on the ID and the instruction's offset when marshaling.

Later on, when BTF marshaling lands, we can remove TypeID from btf.Func and generate BTF blobs that only contain the program's func and lineinfos.

Regarding ExtInfos: could we somehow break out that review and discussion into a separate PR? Taking *elf.File is not the way to go for the reasons you mentioned, but for me primarily due to #550.

If we're reworking the btf/extinfo reader API, we should definitely take the opportunity to address this current limitation and come to a solution that works for module BTF as well. Actually parsing (and 'linking') module BTF might be a bigger chunk of work, but at least the API should allow for it to be added later if needed.

Regarding ExtInfos: could we somehow break out that review and discussion into a separate PR?

I was trying to find a way to not return ExtInfos from LoadSpec...() since you seemed opposed. If you can live with that solution I'm happy to shelve that whole discussion for now.

lmb · 2022-04-27T18:03:46Z

As a follow up to #606 (comment), what if ExtInfos becomes this:

type ExtInfos struct{
    Relocations []*CORERelocations
    RelocationOffsets []uint32
    ...
}

func Foo(...) (map[string]*ExtInfos, error)

lmb · 2022-04-29T09:58:02Z

Please take a look at da1770b:

I've gotten rid of ApplyRelocations and ExtInfos.Assign and MarshalExtInfos and moved them into linker.go instead. This is much closer to what we had before. This is nice because instruction metadata is now private to the ebpf package.
FuncInfo, LineInfo, CORERelocationInfo are now used to expose "metadata + offset" via ExtInfos. The real metadata is called Func (as you suggested), Line and CORERelocation.
I've split the constuctor for Spec in two, with one returning ExtInfos along with the Spec. The nice side effect is that we don't decode useless ExtInfo when loading BTF via handleCache.

ti-mo

Exciting! Left a few more comments. Should the latest commit be squashed?

internal/btf/btf.go

ti-mo · 2022-05-04T09:17:49Z

internal/btf/core.go

 		if relo.kind == reloTypeIDLocal {
 			// Filtering out reloTypeIDLocal here makes our lives a lot easier
 			// down the line, since it doesn't have a target at all.
 			if len(relo.accessor) > 1 || relo.accessor[0] != 0 {
 				return nil, fmt.Errorf("%s: unexpected accessor %v", relo.kind, relo.accessor)
 			}

-			result[uint64(relo.insnOff)] = COREFixup{
+			result[i] = COREFixup{


Yes, this is probably something that needs solving in order to get rid of IDs. (as a follow-up, to implement #643)

ti-mo · 2022-05-04T09:47:07Z

linker.go

-		return nil, nil
+// marshalExtInfos encodes function and line info embedded in insns into kernel
+// wire format.
+func marshalExtInfos(insns asm.Instructions) (funcInfos, lineInfos []byte, _ error) {


Could this live in package btf, or is there an issue with a dependency on asm?

My hope was to get rid of FuncInfo and LineInfo, since all the information they carry can now be represented by asm.Instructions. e.g. *Func and the instruction's position in the slice as you've shown below.

Ultimately, we should end up with only Line and Func, and their respective wire formats, without the Func/LineInfo intermediates.

Pull constructors for FuncInfo, LineInfo and CORERelocation out of splitExtInfos. This makes the function smaller and makes removing it later easier. Instead of taking a *Spec in the constructors, take a []Type and stringTable directly. This makes splitting ExtInfos out of *Spec easier later on.

This makes bpfCORERelo consistent with bpfFuncInfo and bpfLineInfo. It seems like there is a latent bug here as well, since we used to subtract bpfFuncInfo.InsnOff from bpfCORERelo.InsnOff, but for some reason the tests pass.

Use per-instruction metadata to store func info, line info and CO-RE relocations. As a result, simply concatenating two Instruction slices preserves the ext_info without extra code to keep track of offsets. The simplest way to achieve is to assign ext_infos per section, before we split into individual functions. This also avoids having to split ext_infos in the first place. Storing ext_infos in metadata in turn allows / requires removing code that relies on stable offsets, the most notable being applying COREFixups. Instead of tracking which offset a fixup should be applied to we change coreRelocate to instead return results in the same order as CORERelocations are passed. In the caller we remember which instruction originated the relocation and apply the fixup directly instead of iterating all instructions once more. Instead of storing ExtInfos in Spec we split it off into a separate exported type. This makes more sense since every BTF ELF has a Spec, but ExtInfos are optional. It turns out that only the ELF loader doesn't care about ExtInfos in the first place, so we allow skipping ExtInfos altogether by introducing LoadSpecAndExtInfosFromReader. Finally we can remove btf.Program since we don't need an intermediate type to hold metadata anymore. Fixes cilium#522

btf.Program is gone, so let's get rid of Map as well.

lmb mentioned this pull request Mar 28, 2022

asm: add Instruction Source metadata #594

Closed

lmb requested a review from ti-mo March 28, 2022 05:31

lmb force-pushed the more-metadata branch from 6977500 to 661a625 Compare March 28, 2022 05:55

ti-mo reviewed Mar 28, 2022

View reviewed changes

btf-metadata.md Outdated Show resolved Hide resolved

lmb force-pushed the more-metadata branch from 661a625 to a05fcff Compare March 28, 2022 18:46

lmb marked this pull request as draft March 28, 2022 19:40

lmb force-pushed the more-metadata branch from a05fcff to f4d4912 Compare April 1, 2022 21:39

lmb force-pushed the more-metadata branch from f4d4912 to 7a9a200 Compare April 4, 2022 18:25

lmb force-pushed the more-metadata branch 2 times, most recently from 149ab47 to ef79de7 Compare April 20, 2022 16:54

lmb marked this pull request as ready for review April 20, 2022 16:56

lmb requested a review from ti-mo April 20, 2022 16:56

lmb force-pushed the more-metadata branch from ef79de7 to fc6c476 Compare April 20, 2022 17:15

lmb mentioned this pull request Apr 25, 2022

btf: support marshaling Type #641

Closed

lmb added this to the Export `btf` milestone Apr 26, 2022

ti-mo reviewed Apr 27, 2022

View reviewed changes

lmb force-pushed the more-metadata branch from fc6c476 to 64dbe60 Compare April 29, 2022 09:52

lmb force-pushed the more-metadata branch from 64dbe60 to da1770b Compare April 29, 2022 10:00

ti-mo approved these changes May 4, 2022

View reviewed changes

lmb added 3 commits May 4, 2022 10:33

btf: move CORERelocate to core.go

bf04b9d

lmb force-pushed the more-metadata branch 2 times, most recently from 04f12e3 to 4d44715 Compare May 4, 2022 11:15

btf: remove Map

a455344

btf.Program is gone, so let's get rid of Map as well.

lmb force-pushed the more-metadata branch from 4d44715 to a455344 Compare May 4, 2022 11:23

lmb merged commit 9dce119 into cilium:master May 4, 2022

lmb deleted the more-metadata branch May 4, 2022 11:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use instruction metadata to simplify linking and CO-RE relocation #606

Use instruction metadata to simplify linking and CO-RE relocation #606

lmb commented Mar 28, 2022 •

edited

lmb commented Mar 28, 2022

lmb commented Apr 1, 2022

lmb commented Apr 20, 2022

ti-mo left a comment

ti-mo Apr 27, 2022

lmb Apr 27, 2022

lmb Apr 27, 2022

ti-mo May 4, 2022 •

edited

ti-mo Apr 27, 2022

lmb Apr 27, 2022

lmb Apr 27, 2022

ti-mo Apr 28, 2022

ti-mo Apr 28, 2022

lmb Apr 29, 2022

lmb commented Apr 27, 2022

lmb commented Apr 29, 2022 •

edited

ti-mo left a comment

ti-mo May 4, 2022 •

edited

ti-mo May 4, 2022

Use instruction metadata to simplify linking and CO-RE relocation #606

Use instruction metadata to simplify linking and CO-RE relocation #606

Conversation

lmb commented Mar 28, 2022 • edited

lmb commented Mar 28, 2022

lmb commented Apr 1, 2022

lmb commented Apr 20, 2022

ti-mo left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ti-mo May 4, 2022 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lmb commented Apr 27, 2022

lmb commented Apr 29, 2022 • edited

ti-mo left a comment

Choose a reason for hiding this comment

ti-mo May 4, 2022 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lmb commented Mar 28, 2022 •

edited

ti-mo May 4, 2022 •

edited

lmb commented Apr 29, 2022 •

edited

ti-mo May 4, 2022 •

edited